MySQL ROW_NUMBER() Explained (MySQL 8.0): Ranking, Top-N Queries, and Deduplication

目次

1. Introduction

MySQL version 8.0 introduced many new features, and one of the most notable is support for window functions. In this article, we’ll focus on one of the most frequently used functions: ROW_NUMBER().

The ROW_NUMBER() function provides powerful capabilities for data analysis and reporting, making it easy to sort and rank data based on specific conditions. This article explains everything from basic usage and practical examples to alternative approaches for older MySQL versions.

Target readers

  • Beginner to intermediate users with basic SQL knowledge
  • Engineers and data analysts who process and analyze data using MySQL
  • Anyone considering migrating to the latest MySQL version

Benefits of ROW_NUMBER()

This function lets you assign a unique number to each row based on specific conditions. For example, you can easily write queries like “create a ranking in descending order of sales” or “extract and organize duplicate data” in a concise way.

In older versions, you often had to write complex queries using user-defined variables. With ROW_NUMBER(), your SQL becomes simpler and more readable.

In this article, we’ll use concrete query examples and explain them in a beginner-friendly way. In the next section, we’ll take a closer look at the basic syntax and behavior of this function.

2. What Is the ROW_NUMBER() Function?

The ROW_NUMBER() function, newly added in MySQL 8.0, is a type of window function that assigns sequential numbers to rows. It can number rows by a specific order and/or within each group, which is extremely useful for data analysis and reporting. Here we’ll explain the basic syntax in detail with practical examples.

Basic syntax of ROW_NUMBER()

First, the basic format of ROW_NUMBER() is as follows.

SELECT
    column_name,
    ROW_NUMBER() OVER (PARTITION BY group_column ORDER BY sort_column) AS row_num
FROM
    table_name;

Meaning of each element

  • ROW_NUMBER(): Assigns a sequential number to each row.
  • OVER: Keyword used to define the window for a window function.
  • PARTITION BY: Groups data by the specified column. Optional. If omitted, numbering is applied across all rows.
  • ORDER BY: Defines the ordering used to assign numbers, i.e., the sorting criteria.

Basic example

For example, assume you have a table named “sales” with the following data.

employeedepartmentsale
ASales Department500
BSales Department800
CDevelopment Department600
DDevelopment Department700

To assign sequential numbers within each department in descending order of sales, use the following query.

SELECT
    employee,
    department,
    sale,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS row_num
FROM
    sales;

Result

employeedepartmentsalerow_num
BSales Department8001
ASales Department5002
DDevelopment Department7001
CDevelopment Department6002

From this result, you can see that rankings by sales within each department are displayed.

How to use PARTITION BY

In the example above, the data is grouped by the “department” column. This assigns a separate sequence for each department.

If you omit PARTITION BY, numbering is assigned across all rows as one sequence.

SELECT
    employee,
    sale,
    ROW_NUMBER() OVER (ORDER BY sale DESC) AS row_num
FROM
    sales;

Result

employeesalerow_num
B8001
D7002
C6003
A5004

Characteristics and caveats of ROW_NUMBER()

  • Unique numbering: Even if values are the same, the assigned numbers are unique.
  • Handling NULLs: If ORDER BY includes NULLs, they appear first in ascending order and last in descending order.
  • Performance impact: For large datasets, ORDER BY can be expensive, so proper indexing is important.

3. Practical Use Cases

Here are practical scenarios using MySQL’s ROW_NUMBER() function. This function is useful in many real-world cases, such as ranking data and handling duplicates.

3-1. Ranking within each group

For example, consider the case where you want to “rank employees by sales within each department” using sales data. Use the following dataset as an example.

employeedepartmentsale
ASales Department500
BSales Department800
CDevelopment Department600
DDevelopment Department700

Query example: Sales ranking by department

SELECT
    employee,
    department,
    sale,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
    sales;

Result:

employeedepartmentsalerank
BSales Department8001
ASales Department5002
DDevelopment Department7001
CDevelopment Department6002

In this way, each department gets its own sequence in descending order of sales, making it easy to generate rankings.

3-2. Extracting the Top N rows

Next, let’s look at a case where you want to “extract the top 3 employees by sales within each department.”

Query example: Extract Top N rows

WITH RankedSales AS (
    SELECT
        employee,
        department,
        sale,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
    FROM
        sales
)
SELECT
    employee,
    department,
    sale
FROM
    RankedSales
WHERE
    rank <= 3;

Result:

employeedepartmentsale
BSales Department800
ASales Department500
DDevelopment Department700
CDevelopment Department600

This example retrieves only the top 3 rows by sales within each department. As you can see, ROW_NUMBER() is suitable not only for ranking but also for filtering top results.

3-3. Finding and removing duplicate data

Databases sometimes contain duplicate records. In such cases, you can also handle them easily using ROW_NUMBER().

Query example: Detect duplicates

SELECT *
FROM (
    SELECT
        employee,
        sale,
        ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sale DESC) AS rank
    FROM
        sales
) tmp
WHERE rank > 1;

This query detects duplicates when multiple records exist for the same employee name.

Query example: Delete duplicates

DELETE FROM sales
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sale DESC) AS rank
        FROM
            sales
    ) tmp
    WHERE rank > 1
);

Summary

ROW_NUMBER() is useful in a variety of scenarios, such as:

  1. Ranking within each group
  2. Extracting Top N rows
  3. Detecting and deleting duplicates

This makes complex data processing and analysis simpler and more efficient.

4. Comparison with Other Window Functions

In MySQL 8.0, in addition to ROW_NUMBER(), there are window functions like RANK() and DENSE_RANK() that can be used for ranking and position calculations. While they have similar roles, their behavior and results differ. Here we’ll compare each function and explain when to use them.

4-1. RANK() function

The RANK() function assigns ranks, giving the same rank to equal values and skipping the next rank number.

Basic syntax

SELECT
    column_name,
    RANK() OVER (PARTITION BY group_column ORDER BY sort_column) AS rank
FROM
    table_name;

Example

Using the following data, calculate sales ranks.

employeedepartmentsale
ASales Department800
BSales Department800
CSales Department600
DSales Department500

Query example: Using RANK()

SELECT
    employee,
    sale,
    RANK() OVER (ORDER BY sale DESC) AS rank
FROM
    sales;

Result:

employeesalerank
A8001
B8001
C6003
D5004

Key points:

  • A and B with the same sales amount (800) are both treated as rank “1”.
  • The next rank “2” is skipped, so C becomes rank “3”.

4-2. DENSE_RANK() function

The DENSE_RANK() function also assigns the same rank to equal values, but it does not skip the next rank number.

Basic syntax

SELECT
    column_name,
    DENSE_RANK() OVER (PARTITION BY group_column ORDER BY sort_column) AS dense_rank
FROM
    table_name;

Example

Using the same data as above, try the DENSE_RANK() function.

Query example: Using DENSE_RANK()

SELECT
    employee,
    sale,
    DENSE_RANK() OVER (ORDER BY sale DESC) AS dense_rank
FROM
    sales;

Result:

employeesaledense_rank
A8001
B8001
C6002
D5003

Key points:

  • A and B with the same sales amount (800) are both treated as rank “1”.
  • Unlike RANK(), the next rank starts at “2”, so ranking continuity is preserved.

4-3. How ROW_NUMBER() differs

The ROW_NUMBER() function differs from the other two in that it assigns a unique number even when values are the same.

Example

SELECT
    employee,
    sale,
    ROW_NUMBER() OVER (ORDER BY sale DESC) AS row_num
FROM
    sales;

Result:

employeesalerow_num
A8001
B8002
C6003
D5004

Key points:

  • Even if values are the same, each row gets a unique number, so there are no duplicate ranks.
  • This is useful when you need strict ordering control or per-row uniqueness.

4-4. Quick use-case summary

FunctionRanking behaviorTypical use case
ROW_NUMBER()Assigns a unique numberWhen you need sequential numbering or unique identification per row
RANK()Same rank for ties; skips the next rank numberWhen you want rankings with gaps reflecting ties
DENSE_RANK()Same rank for ties; does not skip rank numbersWhen you want continuous ranks without gaps

Summary

ROW_NUMBER(), RANK(), and DENSE_RANK() should be used appropriately depending on the scenario.

  1. ROW_NUMBER() is best when you need unique numbers per row.
  2. RANK() is useful when you want ties to share a rank and you want to emphasize rank gaps.
  3. DENSE_RANK() is suitable when you want continuous ranks without gaps.

5. Alternatives for MySQL Versions Below 8.0

In versions earlier than MySQL 8.0, ROW_NUMBER() and other window functions are not supported. However, you can achieve similar behavior using user-defined variables. This section explains practical alternatives for MySQL versions below 8.0.

5-1. Sequential numbering using user-defined variables

In MySQL 5.7 and earlier, you can use user-defined variables to assign sequential numbers per row. Let’s look at the following example.

Example: Sales ranking by department

Sample data:

employeedepartmentsale
ASales Department500
BSales Department800
CDevelopment Department600
DDevelopment Department700

Query:

SET @row_num = 0;
SET @dept = '';

SELECT
    employee,
    department,
    sale,
    @row_num := IF(@dept = department, @row_num + 1, 1) AS rank,
    @dept := department
FROM
    (SELECT * FROM sales ORDER BY department, sale DESC) AS sorted_sales;

Result:

employeedepartmentsalerank
BSales Department8001
ASales Department5002
DDevelopment Department7001
CDevelopment Department6002

5-2. Extracting the Top N rows

To retrieve the Top N rows, you can use user-defined variables in a similar way.

Query:

SET @row_num = 0;
SET @dept = '';

SELECT *
FROM (
    SELECT
        employee,
        department,
        sale,
        @row_num := IF(@dept = department, @row_num + 1, 1) AS rank,
        @dept := department
    FROM
        (SELECT * FROM sales ORDER BY department, sale DESC) AS sorted_sales
) AS ranked_sales
WHERE rank <= 3;

Result:

employeedepartmentsalerank
BSales Department8001
ASales Department5002
DDevelopment Department7001
CDevelopment Department6002

This query assigns ranks by department and then extracts only rows within the top 3.

5-3. Detecting and deleting duplicates

You can also handle duplicate data using user-defined variables.

Query example: Detect duplicates

SET @row_num = 0;
SET @id_check = '';

SELECT *
FROM (
    SELECT
        id,
        name,
        @row_num := IF(@id_check = name, @row_num + 1, 1) AS rank,
        @id_check := name
    FROM
        (SELECT * FROM customers ORDER BY name, id) AS sorted_customers
) AS tmp
WHERE rank > 1;

Query example: Delete duplicates

DELETE FROM customers
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            @row_num := IF(@id_check = name, @row_num + 1, 1) AS rank,
            @id_check := name
        FROM
            (SELECT * FROM customers ORDER BY name, id) AS sorted_customers
    ) AS tmp
    WHERE rank > 1
);

5-4. Caveats when using user-defined variables

  1. Session dependency
  • User-defined variables are valid only within the current session. They cannot be reused across different queries or sessions.
  1. Dependency on processing order
  • User-defined variables depend on execution order, so setting ORDER BY properly is critical.
  1. SQL readability and maintainability
  • Queries can become complex, so in MySQL 8.0 and later, using window functions is recommended.

Summary

In MySQL versions below 8.0, you can use user-defined variables to implement sequential numbering and ranking in place of window functions. However, because queries tend to become more complex, it’s best to consider migrating to a newer version whenever possible.

6. Caveats and Best Practices

MySQL’s ROW_NUMBER() function and variable-based alternatives are very convenient, but there are important points to keep in mind to run them accurately and efficiently. This section explains practical caveats and best practices for performance optimization.

6-1. Performance considerations

1. ORDER BY cost

ROW_NUMBER() is always used with ORDER BY. Since it requires sorting, processing time can increase significantly for large datasets.

Mitigation:

  • Use indexes: Add indexes to the columns used in ORDER BY to speed up sorting.
  • Use LIMIT: Retrieve only the number of rows you actually need to reduce the amount of data processed.

Example:

SELECT
    employee,
    sale,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
    sales
LIMIT 1000;

2. Increased memory usage and disk I/O

Window functions are processed using temporary tables and memory. As data volume grows, memory consumption and disk I/O can increase.

Mitigation:

  • Split queries: Break processing into smaller queries and extract data step-by-step to reduce load.
  • Use temporary tables: Store extracted data in a temporary table and run aggregation from there to distribute the workload.

6-2. Query tuning tips

1. Check the execution plan

In MySQL, you can use EXPLAIN to check the query execution plan. This helps you verify whether indexes are being used correctly.

Example:

EXPLAIN
SELECT
    employee,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
    sales;

Example output:

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1SIMPLEsalesindexNULLsale4NULL500Using index

If you see Using index, it indicates that the index is being used appropriately.

2. Optimize indexes

Be sure to add indexes to columns used in ORDER BY and WHERE. Pay special attention to the following.

  • Single-column indexes: Good for simple sort conditions
  • Composite indexes: Effective when multiple columns are involved in conditions

Example:

CREATE INDEX idx_department_sale ON sales(department, sale DESC);

3. Use batch processing

Instead of processing a huge dataset all at once, you can reduce load by processing data in batches.

Example:

SELECT * FROM sales WHERE department = 'Sales Department' LIMIT 1000 OFFSET 0;
SELECT * FROM sales WHERE department = 'Sales Department' LIMIT 1000 OFFSET 1000;

6-3. Maintaining data consistency

1. Updates and recalculation

When rows are inserted or deleted, numbering can change. Build a mechanism to recalculate numbers as needed.

Example:

CREATE VIEW ranked_sales AS
SELECT
    employee,
    sale,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
FROM
    sales;

Using a view helps you keep rankings current based on the latest data.

6-4. Best-practice query example

Below is an example of best practices that consider performance and maintainability.

Example: Extract Top N rows

WITH RankedSales AS (
    SELECT
        employee,
        department,
        sale,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY sale DESC) AS rank
    FROM
        sales
)
SELECT *
FROM RankedSales
WHERE rank <= 3;

This structure uses a common table expression (CTE) to improve readability and reusability.

Summary

When using ROW_NUMBER() or its alternatives, keep these points in mind:

  1. Improve speed through index optimization.
  2. Identify bottlenecks by checking the execution plan.
  3. Plan for data updates and maintain consistency.
  4. Use batch processing and CTEs to distribute load.

Applying these best practices will enable efficient processing for large-scale data analysis and reporting.

7. Conclusion

In this article, we focused on MySQL’s ROW_NUMBER() function, explaining everything from basic usage and practical examples to alternatives for older versions, plus caveats and best practices. In this section, we’ll recap the key points and summarize practical takeaways.

7-1. Why ROW_NUMBER() is useful

The ROW_NUMBER() function is especially convenient for data analysis and reporting in the following ways:

  1. Sequential numbering within groups: Easily create sales rankings by department or category-based rankings.
  2. Extracting Top N rows: Efficiently filter and extract data based on specific conditions.
  3. Detecting and deleting duplicates: Useful for data cleanup and organization.

Because it simplifies complex queries, it significantly improves SQL readability and maintainability.

7-2. Comparison with other window functions

Compared to window functions such as RANK() and DENSE_RANK(), ROW_NUMBER() differs in that it assigns a unique number even for identical values.

FunctionFeatureUse case
ROW_NUMBER()Assigns a unique sequential number to each rowBest when you need unique identification or ranking with no duplicates
RANK()Same rank for ties; skips the next rank numberWhen you need tie-aware rankings and rank gaps matter
DENSE_RANK()Same rank for ties; does not skip rank numbersWhen you want continuous ranking while handling ties

Choosing the right function:
Selecting the best function for your purpose enables efficient data processing.

7-3. Handling older MySQL versions

For environments below MySQL 8.0, we also introduced approaches using user-defined variables. However, you should consider these caveats:

  • Reduced readability due to more complex SQL
  • Query optimization can be more difficult in some cases
  • Additional handling may be required to maintain data consistency

If possible, strongly consider migrating to MySQL 8.0 or later and using window functions.

7-4. Key points for performance optimization

  1. Use indexes: Add indexes to columns used in ORDER BY to improve speed.
  2. Check execution plans: Validate performance in advance with EXPLAIN.
  3. Adopt batch processing: Process large datasets in smaller chunks to distribute load.
  4. Use views and CTEs: Improve reusability and simplify complex queries.

By applying these techniques, you can achieve efficient and stable data processing.

7-5. Final notes

ROW_NUMBER() is a powerful tool that can significantly improve the efficiency of data analysis.
In this article, we covered everything from basic syntax and practical examples to caveats and alternatives.

We encourage you to run the queries yourself while following along with this article. Improving your SQL skills will help you tackle more complex data analysis and reporting with confidence.

Appendix: Reference resources