MySQL WITH Clause (CTE) Explained: Syntax, Examples, Recursive Queries & Performance Tips

目次

1. Introduction

MySQL is a database management system widely used by developers and database administrators, providing powerful and flexible SQL functionality. Among its features, the WITH clause (Common Table Expression: CTE), introduced in MySQL 8.0, is a powerful tool that makes SQL queries more readable and improves maintainability.

This article explains the WITH clause from the basics to advanced usage, targeting beginners to intermediate users. In particular, we cover practical topics such as replacing subqueries and implementing recursive queries.

For those learning SQL or struggling to optimize queries in daily work, this article aims to provide concrete solutions. Follow the content below to understand the fundamentals of the WITH clause and apply it in real-world scenarios.

2. Basics of the WITH Clause (Common Table Expressions)

What Is the WITH Clause?

The WITH clause is a syntax used to define a temporary result set (Common Table Expression, CTE) within an SQL query and reference it in subsequent queries. Supported from MySQL 8.0 onward, it allows complex subqueries to be rewritten in a clearer and more concise form.

For example, when writing subqueries directly, readability may decrease and the overall query can become lengthy. By using the WITH clause, you can divide queries into logical blocks, making them easier to understand.

Basic Syntax of the WITH Clause

Below is the basic syntax of the WITH clause:

WITH table_name AS (
  SELECT column1, column2
  FROM original_table
  WHERE condition
)
SELECT column1, column2
FROM table_name;

In this syntax, a virtual table (Common Table Expression) is defined after WITH, and it is then used in the main query. This makes repeatedly used subqueries easier to express concisely.

Differences from Subqueries and Views

The WITH clause creates a temporarily available result set and differs from subqueries and views in several aspects.

FeatureWITH ClauseSubqueryView
ScopeValid only within the queryUsable only where definedReusable across the entire database
PersistenceTemporaryTemporaryPermanent
PurposeSimplifies complex queriesTemporary data extractionFrequently reused data extraction

The WITH clause is more readable than subqueries and is ideal when you do not need to create a permanent object like a view.

Benefits of Using the WITH Clause

  1. Improved Query Readability
    Even when multiple subqueries exist, organizing them with the WITH clause clarifies the structure.
  2. Improved Reusability
    By defining a temporary result set, it can be referenced multiple times within the query.
  3. Improved Maintainability
    Since queries can be logically divided, modifications and extensions become easier.

3. Basic Usage of the MySQL WITH Clause

Replacing Subqueries

The WITH clause is a powerful tool for simplifying complex subqueries. Embedding subqueries directly can make the entire query complicated and difficult to read, but using the WITH clause improves readability.

Below is a basic example of replacing a subquery using the WITH clause.

Using a Subquery:

SELECT AVG(sales.total) AS average_sales
FROM (
  SELECT SUM(amount) AS total
  FROM orders
  GROUP BY customer_id
) AS sales;

Using the WITH Clause:

WITH sales AS (
  SELECT SUM(amount) AS total
  FROM orders
  GROUP BY customer_id
)
SELECT AVG(sales.total) AS average_sales
FROM sales;

In this example, a temporary result set named sales is defined using the WITH clause and then referenced in the main query. This makes the entire query easier to understand and better organized.

Defining Multiple Common Table Expressions (CTEs)

The WITH clause allows defining multiple CTEs. This enables further modularization of complex queries.

Example:

WITH 
  sales_per_customer AS (
    SELECT customer_id, SUM(amount) AS total_sales
    FROM orders
    GROUP BY customer_id
  ),
  high_value_customers AS (
    SELECT customer_id
    FROM sales_per_customer
    WHERE total_sales > 10000
  )
SELECT customer_id
FROM high_value_customers;

In this example, sales_per_customer calculates the total sales per customer, and based on that result, high_value_customers extracts customers with high purchase amounts. By using multiple CTEs sequentially, queries can be constructed step by step.

Using Nested CTEs

By using nested CTEs, you can perform more complex data operations.

Example:

WITH 
  sales_data AS (
    SELECT product_id, SUM(amount) AS total_sales
    FROM orders
    GROUP BY product_id
  ),
  ranked_sales AS (
    SELECT product_id, total_sales,
           RANK() OVER (ORDER BY total_sales DESC) AS rank
    FROM sales_data
  )
SELECT product_id, total_sales
FROM ranked_sales
WHERE rank <= 5;

In this query, sales_data aggregates sales per product, and ranked_sales assigns rankings based on sales volume. Finally, the top five products are extracted.

Key Points for Practical Usage

  1. Think in Logical Steps
    Build CTEs step by step to improve readability and make debugging easier.
  2. Store Intermediate Calculation Results
    Group calculation results or filtering conditions used multiple times into CTEs to reduce code duplication.
  3. Be Careful with Large Datasets
    Since CTEs generate temporary result sets, consider performance impact when working with large amounts of data.

4. Practical Examples of Recursive WITH Clauses

What Is a Recursive WITH Clause?

A recursive WITH clause (recursive CTE) is a method that uses a Common Table Expression to repeatedly execute a self-referencing query, enabling you to process hierarchical data and iterative calculations. Recursive CTEs are supported in MySQL 8.0 and later and are especially useful when working with parent-child relationships and hierarchical structures.

Basic Syntax of a Recursive CTE

To define a recursive CTE, use the WITH RECURSIVE keyword. The basic syntax is as follows:

WITH RECURSIVE recursive_table_name AS (
  initial_query -- starting point of the recursion
  UNION ALL
  recursive_query -- query called recursively
)
SELECT * FROM recursive_table_name;
  • Initial query: Retrieves the first dataset for the recursive process.
  • Recursive query: Generates new rows based on the initial query or the previous iteration’s results.
  • UNION ALL: Combines the results of the initial query and the recursive query.

Example: Processing Hierarchical Data

Recursive CTEs are often used to expand data with a hierarchical structure (e.g., organization trees or category trees).

Example: Expanding an employee management hierarchy

Consider the following employees table:

employee_idnamemanager_id
1AliceNULL
2Bob1
3Charlie1
4David2

Using this data, we can create a query that retrieves the full hierarchy starting from a given employee.

WITH RECURSIVE employee_hierarchy AS (
  -- Initial query: get top-level employees
  SELECT employee_id, name, manager_id, 1 AS level
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive query: get direct reports
  SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
  FROM employees e
  INNER JOIN employee_hierarchy eh
  ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy;

Result:

employee_idnamemanager_idlevel
1AliceNULL1
2Bob12
3Charlie12
4David23

In this query, subordinates are searched recursively based on manager_id, expanding the full hierarchy.

Limitations and Cautions for Recursive CTEs

  1. An End Condition Is Required
    If the recursive query does not meet an end condition, an infinite loop may occur. Always include appropriate conditions to prevent infinite recursion.
  2. Performance Impact
    Recursive CTEs can involve many calculations over large datasets, which may increase execution time. Use LIMIT clauses and filtering conditions to improve efficiency.
  3. Recursion Depth Limits
    MySQL has a limit on recursion depth, so take care when running very deep recursive processing. This limit can be configured with the max_recursive_iterations parameter.

Scenarios Where Recursive CTEs Are Useful

  • Traversing folder structures: Recursively search folders and subfolders.
  • Building org charts: Visualize hierarchies from managers to subordinates.
  • Displaying category trees: Retrieve hierarchical product categories or tag structures.

Recursive CTEs are a powerful way to write concise SQL queries for these scenarios while improving readability.

5. Notes and Considerations When Using the WITH Clause

Performance Impact and Optimization

  1. CTE Recalculation
    In general, a CTE defined with the WITH clause is recalculated each time it is referenced. Therefore, using the same CTE multiple times can increase query execution time. Example:
   WITH sales AS (
     SELECT product_id, SUM(amount) AS total_sales
     FROM orders
     GROUP BY product_id
   )
   SELECT * FROM sales WHERE total_sales > 1000;
   SELECT COUNT(*) FROM sales;

In the above case, sales is referenced twice, so it is computed twice. To avoid this, when you need to reference results multiple times, storing the result in a temporary table can be effective.

Solution:

   CREATE TEMPORARY TABLE temp_sales AS
   SELECT product_id, SUM(amount) AS total_sales
   FROM orders
   GROUP BY product_id;

   SELECT * FROM temp_sales WHERE total_sales > 1000;
   SELECT COUNT(*) FROM temp_sales;
  1. Splitting Complex CTEs
    If the WITH clause becomes too deeply nested, the overall query can become complex and harder to debug. It is important to split logic appropriately so that processing within a single CTE does not become excessively complicated.

Using WITH Clauses on Large Datasets

The WITH clause generates temporary datasets during execution. When handling large amounts of data, this can place a burden on memory or storage.

Countermeasures:

  • Filter Data with WHERE Clauses
    Reduce computation by filtering out unnecessary data inside the CTE.
  WITH filtered_orders AS (
    SELECT *
    FROM orders
    WHERE order_date > '2023-01-01'
  )
  SELECT customer_id, SUM(amount)
  FROM filtered_orders
  GROUP BY customer_id;
  • Use LIMIT Clauses
    If the dataset is large, use LIMIT to extract only the data you need.

MySQL Version Compatibility

The WITH clause in MySQL is supported in MySQL 8.0 and later. Since earlier versions do not support the WITH clause, you need to consider alternatives.

Alternatives:

  • Use Subqueries
    Use subqueries directly instead of the WITH clause.
  SELECT AVG(total_sales)
  FROM (
    SELECT customer_id, SUM(amount) AS total_sales
    FROM orders
    GROUP BY customer_id
  ) AS sales;
  • Create Views
    If you need reusable queries, using a view can be effective.
  CREATE VIEW sales_view AS
  SELECT customer_id, SUM(amount) AS total_sales
  FROM orders
  GROUP BY customer_id;

  SELECT AVG(total_sales) FROM sales_view;

How to Use the WITH Clause Properly

  1. Prioritize Readability
    The purpose of the WITH clause is to organize queries and improve readability. Overusing it can make queries more complex, so use it only when necessary.
  2. Verify Performance
    Check the execution plan (the EXPLAIN command) and consider how to optimize performance.
   EXPLAIN
   WITH sales AS (
     SELECT product_id, SUM(amount) AS total_sales
     FROM orders
     GROUP BY product_id
   )
   SELECT * FROM sales WHERE total_sales > 1000;

6. Practical Use Cases in Real-World Scenarios

Aggregating Sales Data

Here is an example of aggregating sales data by month and then using that result to calculate the monthly average sales.

Example: Aggregating Monthly Sales and Calculating the Average

WITH monthly_sales AS (
  SELECT 
    DATE_FORMAT(order_date, '%Y-%m') AS sales_month,
    SUM(amount) AS total_sales
  FROM orders
  GROUP BY sales_month
)
SELECT 
  sales_month, 
  total_sales,
  AVG(total_sales) OVER () AS average_sales
FROM monthly_sales;

In this query, monthly_sales calculates sales per month, and based on that result, the overall average sales are computed. This allows data to be organized clearly and simplifies analysis.

Filtering Data Based on Specific Conditions

By separating complex filtering logic into a WITH clause, readability can be improved.

Example: Creating a List of High-Spending Customers

WITH customer_totals AS (
  SELECT 
    customer_id, 
    SUM(amount) AS total_spent
  FROM orders
  GROUP BY customer_id
)
SELECT 
  customer_id, 
  total_spent
FROM customer_totals
WHERE total_spent > 100000;

In this query, customer_totals calculates the total purchase amount per customer, and customers meeting the specified condition are extracted.

Analyzing Hierarchical Data

When analyzing hierarchical data such as organizational structures or categories, recursive WITH clauses are extremely useful.

Example: Retrieving a List of Direct and Indirect Subordinates

WITH RECURSIVE employee_hierarchy AS (
  SELECT 
    employee_id, 
    name, 
    manager_id, 
    1 AS level
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  SELECT 
    e.employee_id, 
    e.name, 
    e.manager_id, 
    eh.level + 1
  FROM employees e
  INNER JOIN employee_hierarchy eh
  ON e.manager_id = eh.employee_id
)
SELECT 
  employee_id, 
  name, 
  manager_id, 
  level
FROM employee_hierarchy
ORDER BY level, manager_id;

This query builds hierarchical data in employee_hierarchy and retrieves employees grouped by level. It enables dynamic generation of organization chart–like information.

Advanced Analysis Using Multiple CTEs

By leveraging multiple WITH clauses, data can be processed step by step, making complex analysis simpler.

Example: Extracting Top-Selling Products per Category

WITH category_sales AS (
  SELECT 
    category_id, 
    product_id, 
    SUM(amount) AS total_sales
  FROM orders
  GROUP BY category_id, product_id
),
ranked_sales AS (
  SELECT 
    category_id, 
    product_id, 
    total_sales,
    RANK() OVER (PARTITION BY category_id ORDER BY total_sales DESC) AS rank
  FROM category_sales
)
SELECT 
  category_id, 
  product_id, 
  total_sales
FROM ranked_sales
WHERE rank <= 3;

In this query, sales per category are calculated, and the top three products within each category are extracted. This approach is effective when narrowing down data based on specific conditions.

Key Points for Practical Application

  1. Design Queries in Logical Stages
    Use the WITH clause to divide queries and process data step by step while maintaining readability.
  2. Extract Only Necessary Data
    Use WHERE and LIMIT clauses to avoid processing unnecessary data and design efficient queries.
  3. Flexible Business Applications
    The WITH clause can be flexibly applied to sales analysis, customer segmentation, inventory management, and more.

7. FAQ (Frequently Asked Questions)

Q1: When should I use the WITH clause?

A1:
The WITH clause is especially effective in the following scenarios:

  • When you want to simplify complex subqueries.
  • When you need to reuse the same dataset multiple times within a query.
  • When you want to logically divide a query to improve readability.

For example, in queries that use the same aggregation results multiple times, the WITH clause allows more efficient organization.

Q2: When is a recursive CTE useful?

A2:
Recursive CTEs are useful when handling hierarchical structures or iterative calculations. Specifically:

  • Processing hierarchical data (e.g., organization trees, category structures).
  • Displaying folder or file hierarchies.
  • Sequential calculations of numbers or time periods (e.g., calculating the Fibonacci sequence).

Using recursive CTEs makes it easier to expand and process self-referencing data.

Q3: Are queries using the WITH clause more efficient than views?

A3:
It depends on the use case.

  • WITH clause: Creates a temporary result set used only within the query. Suitable for data that does not need frequent reuse.
  • View: Permanently stored in the database and reusable by other queries. Suitable for queries that are used repeatedly.

Choosing the appropriate method depending on the scenario is important.

Q4: What causes performance degradation when using the WITH clause?

A4:
The main causes of performance degradation when using the WITH clause include:

  • CTE recalculation: Results are recalculated each time they are referenced, increasing processing time.
  • Handling large datasets: Generating large datasets within the CTE increases memory usage and reduces performance.
  • Lack of proper indexing: If queries inside the CTE do not use appropriate indexes, performance may slow down.

Countermeasures:

  • Consider temporary tables or views if reuse frequency is high.
  • Use WHERE and LIMIT clauses to properly narrow down data.

Q5: What alternatives exist for MySQL versions that do not support the WITH clause?

A5:
In versions prior to MySQL 8.0, the WITH clause is not supported, so use the following alternatives:

  • Use Subqueries
    Use subqueries directly instead of the WITH clause.
  SELECT AVG(total_sales)
  FROM (
    SELECT customer_id, SUM(amount) AS total_sales
    FROM orders
    GROUP BY customer_id
  ) AS sales;
  • Use Temporary Tables
    Store reusable datasets in a temporary table.
  CREATE TEMPORARY TABLE temp_sales AS
  SELECT customer_id, SUM(amount) AS total_sales
  FROM orders
  GROUP BY customer_id;

  SELECT AVG(total_sales) FROM temp_sales;

Q6: What are the best practices when using the WITH clause?

A6:
Keep the following best practices in mind:

  • Prioritize simplicity: Do not force complex logic into a single WITH clause. Split it appropriately.
  • Verify performance: Check the execution plan using the EXPLAIN command and optimize as necessary.
  • Consider reusability: If reuse frequency is high, leverage views or temporary tables.

8. Conclusion

This article covered the WITH clause (Common Table Expression, CTE) introduced in MySQL 8.0, from fundamentals to advanced applications. The WITH clause is an extremely useful feature for making complex queries concise and readable. Below are the key takeaways.

Main Benefits of the WITH Clause

  1. Improved Query Readability
    Organizes complex subqueries, enhancing readability and maintainability of SQL code.
  2. Query Reusability
    Efficiently processes data when referencing the same dataset multiple times.
  3. Enables Recursive Data Operations
    Recursive CTEs simplify processing hierarchical data and iterative calculations.

Practical Usage Points

  • Useful for sales and customer data analysis, enabling step-by-step aggregation.
  • Recursive CTEs are effective for hierarchical data processing (such as org charts or category structures).
  • Combining WITH clauses with views or temporary tables allows flexible and efficient database operations.

Important Considerations

  • The WITH clause is powerful, but improper use may reduce performance.
  • Evaluate reusability and performance case by case, and choose between views or temporary tables when appropriate.
  • Always verify query efficiency using the execution plan (EXPLAIN command).

Next Steps

By using the WITH clause, you can create more efficient and maintainable SQL queries. Try applying it in your real-world projects by following these steps:

  1. Start with simple queries and practice structuring them using the WITH clause.
  2. Challenge yourself with recursive CTEs for hierarchical data and complex scenarios.
  3. Focus on performance optimization to further improve your SQL skills.

This concludes the article. Use your knowledge of the MySQL WITH clause in your daily work and studies.