MySQL REGEXP Explained: Complete Guide with Examples (MySQL 8.0 Functions Included)

目次

1. Introduction

What Is MySQL Regular Expression (REGEXP)?

MySQL regular expressions are powerful tools for flexibly searching and manipulating strings within a database. They enable advanced pattern matching that is difficult to achieve with standard string searches, allowing you to extract data that matches specific formats or conditions.

For example, you can easily extract data such as “names that start with a specific character” or “codes containing only numeric values.” This functionality is particularly useful for data cleansing and handling complex search conditions.

Benefits of Using Regular Expressions in MySQL

  1. Support for Complex Search Conditions
  • You can specify complex string patterns that cannot be handled by the standard LIKE operator.
  1. Batch Extraction and Replacement of Data
  • For example, you can extract only data that matches a specific format or replace part of a string.
  1. Enhanced Features in MySQL 8.0 and Later
  • New functions such as REGEXP_LIKE and REGEXP_SUBSTR have been added, enabling more flexible operations.

Purpose of This Article

This article explains everything from the basic usage of MySQL regular expressions (REGEXP) to advanced examples and important considerations. Whether you are a beginner or a semi-professional, you will find practical insights useful for real-world applications.

In the next section, we will explain the fundamentals of regular expressions in MySQL.

2. Basics of Regular Expressions in MySQL

What Is the REGEXP Operator?

In MySQL, the REGEXP operator is used to work with regular expressions. This operator determines whether a value matches a specified pattern. Additionally, RLIKE functions as an alias for REGEXP.

The following example checks whether a string matches the pattern “abc”.

SELECT * FROM users WHERE name REGEXP 'abc';

Basic Syntax of the REGEXP Operator

The basic syntax for searching with regular expressions is as follows:

SELECT * FROM table_name WHERE column_name REGEXP 'pattern';

Common REGEXP Pattern List

SymbolDescriptionExample
^Matches the beginning of a line^abc → Strings that start with “abc”
$Matches the end of a lineabc$ → Strings that end with “abc”
.Matches any single charactera.c → Matches “abc”, “adc”, etc.
|OR (matches either pattern)abc|xyz → Matches “abc” or “xyz”
[]Matches any one of the specified characters[abc] → Matches “a”, “b”, or “c”
*Matches zero or more repetitionsab*c → Matches “ac”, “abc”, “abbc”, etc.

Difference Between REGEXP and LIKE

FeatureLIKEREGEXP
FlexibilitySupports only wildcards (% and _)Supports advanced pattern matching
PerformanceFastMay be slightly slower for complex patterns

Practical Examples: Searching with REGEXP

Example 1: Search for Valid Email Format

SELECT * FROM users WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

Example 2: Search for Fields Containing Only Numbers

SELECT * FROM orders WHERE order_id REGEXP '^[0-9]+$';

Summary

In this section, we covered the basic usage and patterns of the REGEXP operator in MySQL. This enables everything from simple searches to advanced pattern matching for flexible data operations.

3. Regular Expression Functions Added in MySQL 8.0

REGEXP_LIKE() – Match Check Using a Regular Expression

REGEXP_LIKE(string, pattern [, flags])

Example:

SELECT REGEXP_LIKE('abcdef', 'abc');

Result: 1 (matches)

REGEXP_INSTR() – Find the Match Position

REGEXP_INSTR(string, pattern [, start_position, occurrence, flags, return_type])

Example:

SELECT REGEXP_INSTR('abcdef', 'cd');

Result: 3

REGEXP_SUBSTR() – Extract the Matching Substring

REGEXP_SUBSTR(string, pattern [, start_position, occurrence, flags])

Example:

SELECT REGEXP_SUBSTR('abc123def', '[0-9]+');

Result: 123

REGEXP_REPLACE() – Replace Using a Regular Expression

REGEXP_REPLACE(string, pattern, replacement [, start_position, occurrence, flags])

Example:

SELECT REGEXP_REPLACE('Item123Price456', '[0-9]', '#');

Result: Item###Price###

Summary

The regular expression functions added in MySQL 8.0 enable detailed and flexible string operations. By using them effectively, you can extract and transform data more efficiently.

4. Practical Use Cases for Regular Expressions

Search for Data That Matches Specific Patterns

Example 1: Detect Email Address Format

SELECT * FROM users WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

Example 2: Detect Phone Number Format

SELECT * FROM contacts WHERE phone REGEXP '^[0-9]{3}-[0-9]{4}-[0-9]{4}$';

Extract Substrings

Example 1: Extract the Numeric Part

SELECT REGEXP_SUBSTR('abc123def456', '[0-9]+');

Result: 123

Replace Data

Example 1: Replace Numbers with “#”

SELECT REGEXP_REPLACE('Item123Price456', '[0-9]', '#');

Result: Item###Price###

Data Validation and Cleaning

Example 1: Detect Invalid Email Addresses

SELECT * FROM users WHERE email NOT REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

Summary

With these examples, you can efficiently handle tasks such as searching, extracting, replacing, and validating data.

5. Important Considerations and Best Practices

Handling Multibyte Characters (Full-Width Characters)

Since MySQL regular expressions are evaluated on a byte basis by default, special care is required when handling multibyte characters such as Japanese characters.

Solution:

ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Impact on Performance

Issue: Because regular expressions involve complex processing, performance may degrade when searching large datasets.
Solution:

SELECT * FROM users WHERE email LIKE '%@example.com' AND email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

Protection Against ReDoS (Regular Expression Denial of Service)

Issue: Malicious patterns may cause excessive system load.
Solution:

  1. Use simple patterns whenever possible.
  2. Strengthen input validation.
  3. Monitor query execution time.

Check Version Compatibility

The newer regular expression functions are not available in MySQL versions earlier than 8.0. Always verify your environment version before implementation.

Test in a Staging Environment

Test query behavior and performance in advance, including how edge cases and invalid values are handled.

Summary

Keep these best practices in mind to use regular expressions safely and efficiently while considering both performance and security.

6. Conclusion

Key Takeaways

  1. By learning basic operations and regular expression patterns, you can handle everything from simple searches to complex data extraction.
  2. The regular expression functions introduced in MySQL 8.0 allow for even more flexible operations.
  3. Using practical examples improves efficiency in real-world data manipulation.
  4. Applying best practices ensures safe and high-performance queries.

Benefits of Using MySQL Regular Expressions

  • Advanced Search Capabilities: Easily define conditions that are difficult with simple string searches.
  • Improved Data Processing Efficiency: Extraction, replacement, and validation can be completed directly within SQL.
  • Wide Range of Applications: Useful for everything from data cleansing to log analysis.

How to Continue Learning and Applying

  1. Practice queries using real datasets to deepen your understanding.
  2. Actively leverage the latest version features to optimize performance.
  3. Regularly review your queries to maintain security and speed.

Final Thoughts

By mastering MySQL regular expressions, you can improve operational efficiency and enhance your data analysis capabilities.

7. Frequently Asked Questions (FAQ)

Q1. What is the difference between REGEXP and LIKE in MySQL?

A. REGEXP supports advanced pattern matching, while LIKE is mainly used for partial string matching.

SELECT * FROM users WHERE email LIKE '%example.com';
SELECT * FROM users WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

Q2. How can I improve performance?

A.

  1. Apply filter conditions in advance.
  2. Use indexes effectively.
  3. Keep queries simple and optimized.

Q3. How do I handle multibyte characters?

A. Configure UTF-8 support.

ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Q4. Can you show an example of replacement using regular expressions?

A. Replace numbers with “#”.

SELECT REGEXP_REPLACE('Item123Price456', '[0-9]', '#');

Q5. What is a query to convert a date format?

A. Change “YYYY/MM/DD” to “YYYY-MM-DD”.

SELECT REGEXP_REPLACE('2023/12/20', '/', '-');

Q6. How can I specify multiple conditions using REGEXP?

A. Use the pipe symbol (|).

SELECT * FROM products WHERE name REGEXP 'phone|tablet';

Summary

The FAQ section addressed common questions and provided practical query examples.