- 1 1. Introduction
- 2 2. Regular Expression Basics and Syntax in MySQL
- 3 Summary
- 4 3. Regular Expression Functions Added in MySQL 8.0 and Later
- 5 4. Practical Examples of MySQL Regular Expressions
- 6 Summary
- 7 5. Important Considerations When Using Regular Expressions (Performance and Security)
- 8 Summary
- 9 6. FAQ (Frequently Asked Questions)
- 9.1 6.1 What regex patterns can I use in MySQL?
- 9.2 6.2 What is the difference between LIKE and REGEXP?
- 9.3 6.3 What are the differences in regex features between MySQL 5.x and MySQL 8.x?
- 9.4 6.4 What should I do if a regex does not work as expected?
- 9.5 6.5 How can I improve the performance of REGEXP queries?
- 10 Summary
- 11 7. Summary
1. Introduction
Using Regular Expressions in MySQL
MySQL is a powerful database management system for searching and manipulating data, and it also supports regular expressions (REGEXP) to enable more flexible searching. With regular expressions, you can efficiently extract data that matches specific patterns, as well as format or validate data.
Benefits of Using Regular Expressions in MySQL
Using MySQL regular expressions provides benefits such as:
- Advanced searching: Easily extract data that contains specific strings or patterns.
- Check data integrity: Useful for input validation (e.g., validating email address formats).
- More powerful than the LIKE operator: Compared to LIKE, which uses wildcards (
%and_), REGEXP allows you to define complex search conditions more flexibly.
Purpose and Structure of This Article
This article explains everything from the basics of regular expressions in MySQL to advanced usage, performance optimization, and the new regular expression functions introduced in MySQL 8.0 and later. With practical examples, the content is designed to be easy to understand even for beginners.
Article structure:
- MySQL regular expression basics and syntax
- Regular expression functions added in MySQL 8.0 and later
- Practical examples (validation, data extraction, format conversion)
- Important considerations when using regular expressions (performance, security)
- FAQ (Frequently Asked Questions)
- Summary
2. Regular Expression Basics and Syntax in MySQL
How to Use Regular Expressions in MySQL
In MySQL, you use the REGEXP operator (or RLIKE) to work with regular expressions. Similar to the LIKE operator, it is used to determine whether a column value matches a specified regex pattern.
Basic syntax
SELECT * FROM table_name WHERE column_name REGEXP 'regex_pattern';Or
SELECT * FROM table_name WHERE column_name RLIKE 'regex_pattern';RLIKE is an alias for REGEXP, and both behave the same.
Example
For example, to search for records where the name contains the character “山”:
SELECT * FROM users WHERE name REGEXP '山';Basic Regular Expression Patterns
In MySQL regular expressions, you can use patterns like the following.
| Symbol | Meaning | Example | Result |
|---|---|---|---|
. | Any single character | a.c | Matches “abc”, “aac”, “adc” |
^ | Start of the string | ^abc | Matches “abcde” but not “dabc” |
$ | End of the string | xyz$ | Matches “axyz” but not “xyzb” |
[] | Any one of the specified characters | [abc] | Matches “a”, “b”, or “c” |
[^] | Any character not in the specified set | [^abc] | Matches any character except “a”, “b”, or “c” |
* | Repeat the preceding character 0 or more times | a* | Matches “”, “a”, “aa”, “aaa”, etc. |
+ | Repeat the preceding character 1 or more times | a+ | Matches “a”, “aa”, “aaa”, etc. (does not match the empty string) |
{n} | Repeat the preceding character exactly n times | a{3} | Matches “aaa” |
{n,} | Repeat the preceding character at least n times | a{2,} | Matches “aa”, “aaa”, “aaaa”, etc. |
{n,m} | Repeat the preceding character between n and m times | a{2,4} | Matches “aa”, “aaa”, “aaaa” |
Differences from the LIKE Operator
MySQL also provides the LIKE operator, which is commonly used for simple pattern matching. However, REGEXP is more powerful than LIKE and supports searching with more complex conditions.
Example 1: Using the LIKE operator
SELECT * FROM users WHERE name LIKE '%山%';- With
LIKE, you can search for data that contains山, but you can only use simple wildcards (%and_).
Example 2: Using the REGEXP operator
SELECT * FROM users WHERE name REGEXP '^山';- With
REGEXP, you can extract only the data where the name starts with 山.
How to Specify Multiple Patterns
You can use | to specify multiple patterns with an OR condition.
Example: Get users whose last name is “佐藤” or “田中”
SELECT * FROM users WHERE name REGEXP '佐藤|田中';Negative Matching
If you use ^ inside square brackets, you can retrieve data that “does not contain the specified characters.”
Example: Data that starts with a character other than “山”
SELECT * FROM users WHERE name REGEXP '^[^山]';In this case, it returns records where the name does not start with 山.
Summary
- In MySQL,
REGEXPenables more powerful pattern searching thanLIKE. - Understanding basic syntax and patterns (
.,^,$,[],*,+, etc.) enables flexible searching. - It is important to understand the differences from
LIKEand use each appropriately. - Using
|allows searching with multiple patterns. - Using negative character classes
[^ ]helps you search for data that does not start with specific characters.
3. Regular Expression Functions Added in MySQL 8.0 and Later
In MySQL 8.0, in addition to the traditional REGEXP operator, four new functions were added to enable more flexible regular expression searching. These make it possible to retrieve match positions, extract substrings, and perform replacements, greatly expanding what you can do with regex in MySQL.
In this section, we explain each new regular expression function in detail and show concrete examples.
3.1 REGEXP_LIKE()
Overview
REGEXP_LIKE(), like the REGEXP operator, checks whether a column value matches a specified regex pattern.
Syntax
REGEXP_LIKE(column_name, 'regex_pattern' [, flags])column_name: the column to search'regex_pattern': the regex pattern used as the conditionflags (optional): case sensitivity (useifor case-insensitive matching)
Examples
“Get users whose email ends with gmail.com or yahoo.co.jp”
SELECT * FROM users WHERE REGEXP_LIKE(email, '(@gmail\.com|@yahoo\.co\.jp)$');“Search for usernames containing ‘admin’ (case-insensitive)”
SELECT * FROM users WHERE REGEXP_LIKE(username, 'admin', 'i');✅ REGEXP_LIKE() serves the same purpose as the REGEXP operator, but it allows a more standard-SQL-friendly style.
3.2 REGEXP_INSTR()
Overview
REGEXP_INSTR() returns the position (start index) where a regex match occurs within a string.
Syntax
REGEXP_INSTR(column_name, 'regex_pattern' [, start_position, occurrence, return_option, flags])start_position(optional): where to start searching (default: 1)occurrence(optional): which match occurrence to return (default: 1)return_option(optional): 0 (return start position) or 1 (return end position of the match)flags(optional): useifor case-insensitive matching
Examples
“If the phone number starts with 090 or 080, get the start position”
SELECT phone, REGEXP_INSTR(phone, '^(090|080)') AS match_pos FROM users;“Get the start position of the domain part in an email address”
SELECT email, REGEXP_INSTR(email, '@') AS domain_start FROM users;- It returns the position where
@is found (e.g.,user@example.comreturns5).
✅ With REGEXP_INSTR(), you can retrieve positional information, expanding your options for string processing.
3.3 REGEXP_SUBSTR()
Overview
REGEXP_SUBSTR() extracts the substring that matches a regex pattern from within a string.
Syntax
REGEXP_SUBSTR(column_name, 'regex_pattern' [, start_position, occurrence, flags])occurrence(optional): get the nth match (default: 1)flags(optional): case sensitivity (ito ignore case)
Examples
“Extract only the domain part from an email address”
SELECT email, REGEXP_SUBSTR(email, '@[a-zA-Z0-9.-]+') AS domain FROM users;“Get the first number sequence in a message”
SELECT message, REGEXP_SUBSTR(message, '[0-9]+') AS first_number FROM logs;✅ With REGEXP_SUBSTR(), you can extract specific patterns and use them for organizing and transforming data.
3.4 REGEXP_REPLACE()
Overview
REGEXP_REPLACE() replaces substrings that match a regex pattern with another string.
Syntax
REGEXP_REPLACE(column_name, 'regex_pattern', 'replacement' [, occurrence, flags])replacement: the string to replace the matched part withoccurrence(optional): replace only the nth match (if omitted, replaces all matches)flags(optional): useifor case-insensitive matching
Examples
“Remove hyphens (-) from phone numbers”
SELECT phone, REGEXP_REPLACE(phone, '-', '') AS clean_phone FROM users;“Remove HTML tags”
SELECT comment, REGEXP_REPLACE(comment, '<[^>]+>', '') AS clean_text FROM reviews;✅ With REGEXP_REPLACE(), format conversion and data cleaning become easy.
3.5 Summary
| Function | Purpose | Example |
|---|---|---|
REGEXP_LIKE() | Check whether a value matches a regex pattern | SELECT * FROM users WHERE REGEXP_LIKE(email, '@gmail\.com$'); |
REGEXP_INSTR() | Get the start position of the matched substring | SELECT REGEXP_INSTR(email, '@') FROM users; |
REGEXP_SUBSTR() | Extract the matched substring | SELECT REGEXP_SUBSTR(email, '@[a-zA-Z0-9.-]+') FROM users; |
REGEXP_REPLACE() | Replace the matched substring | SELECT REGEXP_REPLACE(phone, '-', '') FROM users; |
4. Practical Examples of MySQL Regular Expressions
By leveraging MySQL regular expressions, you can streamline real-world data processing tasks such as data validation, extracting specific data, and format conversion. In this section, we introduce practical use cases with SQL code and explanations.
4.1 Email Address Validation
Overview
Use regular expressions to check whether email addresses stored in the database follow a valid format.
SQL to Use
SELECT email FROM users
WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';Explanation
^[a-zA-Z0-9._%+-]+→ one or more alphanumeric characters, dots, underscores, or plus signs before@@[a-zA-Z0-9.-]+→ a domain name after@\.[a-zA-Z]{2,}$→ ends with a TLD (Top-Level Domain) of at least 2 letters (e.g.,.com,.jp,.net)
With this query, you can exclude invalid email addresses (e.g., user@@example.com, user@.com).
4.2 Checking Japanese Phone Number Formats
Overview
Determine whether common Japanese phone numbers (e.g., 090-1234-5678, 03-1234-5678) follow the correct format.
SQL to Use
SELECT phone FROM users
WHERE phone REGEXP '^(0[789]0-[0-9]{4}-[0-9]{4}|0[1-9]-[0-9]{4}-[0-9]{4})$';Explanation
0[789]0-[0-9]{4}-[0-9]{4}→ mobile numbers (090-xxxx-xxxx,080-xxxx-xxxx, etc.)0[1-9]-[0-9]{4}-[0-9]{4}→ landlines (03-xxxx-xxxx,06-xxxx-xxxx, etc.)
In this way, regular expressions are useful when format consistency is required.
4.3 Credit Card Number Format Validation
Overview
Validate the formats of credit card numbers (Visa, Mastercard, American Express, etc.).
SQL to Use
SELECT card_number FROM payments
WHERE card_number REGEXP '^(4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$';Explanation
^4[0-9]{12}(?:[0-9]{3})?$→ Visa (16 digits or 13 digits)^5[1-5][0-9]{14}$→ Mastercard (16 digits)^3[47][0-9]{13}$→ American Express (15 digits)
With this query, you can filter out invalid card numbers in the database (e.g., incorrect length or invalid starting digits).
4.4 Removing HTML Tags
Overview
If user-submitted data contains HTML tags, you can remove them and convert the content to plain text.
SQL to Use
SELECT REGEXP_REPLACE(comment, '<[^>]+>', '') AS clean_comment FROM reviews;Explanation
'<[^>]+>'→ removes all HTML tags between<and>
Example
| Input | Output |
|---|---|
<b>Hello</b> World! | Hello World! |
<p>これはサンプルです</p> | これはサンプルです |
This is useful for stripping HTML tags from user comments and blog posts.
4.5 Postal Code Format Validation (Japan)
Overview
Validate whether Japanese postal codes (e.g., 123-4567) follow the correct format.
SQL to Use
SELECT postal_code FROM addresses
WHERE postal_code REGEXP '^[0-9]{3}-[0-9]{4}$';Explanation
^[0-9]{3}-[0-9]{4}$→ a postal code in the “3 digits-4 digits” format
Using this regex helps maintain data consistency and prevent incorrect formats.
4.6 Username Filtering (Detecting Disallowed Words)
Overview
Check whether usernames contain disallowed words reserved for system use (e.g., admin, root, system) during registration.
SQL to Use
SELECT username FROM users
WHERE username REGEXP 'admin|root|system';Explanation
admin|root|system→ detects usernames containing these terms
This query helps prevent regular users from using names intended for system administrators.
Summary
- Regular expressions can be used for validating email addresses, phone numbers, and credit card numbers.
- They can also be applied to removing HTML tags and converting data formats.
- They are effective for improving database safety and consistency through tasks like filtering disallowed words and validating postal codes.

5. Important Considerations When Using Regular Expressions (Performance and Security)
Using regular expressions in MySQL enables powerful searching and data cleansing. However, improper usage can cause performance degradation and introduce security risks. This section explains key considerations for using MySQL regular expressions safely and efficiently.
5.1 Performance Optimization for Regular Expressions
MySQL regex searching is convenient, but it has a drawback: indexes generally do not apply, which can make queries slower.
Measures to Improve Performance
- Combine
LIKEorFULLTEXTindexes
SELECT * FROM users WHERE email LIKE '%gmail.com';- Use simpler regular expressions
SELECT * FROM users WHERE name REGEXP '^admin|admin$';- Filter the target data first
SELECT * FROM users WHERE email LIKE 'a%' AND email REGEXP '^[a-zA-Z0-9._%+-]+@gmail\.com$';5.2 Security Risks and Countermeasures
1. Regular Expression DoS (ReDoS)
- Avoid unbounded matches such as
.*. - Use
^and$to limit the search scope. - Do not pass user input directly into
REGEXP.
2. SQL Injection and Regular Expressions
❌ Unsafe query
SELECT * FROM users WHERE username REGEXP '$input';✅ Safe query (use placeholders)
$stmt = $pdo->prepare("SELECT * FROM users WHERE username REGEXP ?");
$stmt->execute([$sanitized_input]);5.3 Version Compatibility
| Version | Main Features |
|---|---|
| MySQL 5.x | Only the REGEXP operator is available |
| MySQL 8.x | Adds REGEXP_LIKE(), REGEXP_INSTR(), REGEXP_SUBSTR(), and REGEXP_REPLACE() |
Summary
- Because
REGEXPdoes not use indexes, performance measures are important. - To prevent ReDoS (Regular Expression DoS), avoid excessive use of wildcards.
- To prevent SQL injection, do not apply user input directly to
REGEXP.
6. FAQ (Frequently Asked Questions)
Here are common questions about using MySQL regular expressions (REGEXP). Resolving these issues will help you use regex more effectively.
6.1 What regex patterns can I use in MySQL?
| Symbol | Description | Example | Result |
|---|---|---|---|
. | Any single character | a.c | "abc", "aac", "adc" |
^ | Start of the string | ^abc | "abcde" (does not match "dabc") |
$ | End of the string | xyz$ | "axyz" (does not match "xyzb") |
[] | Any of the specified characters | [abc] | "a", "b", "c" |
[^] | Any character not in the specified set | [^abc] | Any character except "a", "b", "c" |
* | Repeat the preceding character 0 or more times | a* | "", "a", "aa", "aaa" |
+ | Repeat the preceding character 1 or more times | a+ | "a", "aa", "aaa" (does not match "") |
6.2 What is the difference between LIKE and REGEXP?
| Comparison | LIKE | REGEXP |
|---|---|---|
| Functionality | Simple pattern matching | Searching with complex regular expressions |
| Wildcards | % (any string), _ (any single character) | .* (any string), ^, $, [a-z], etc. |
| Search speed | Fast because indexes can apply | Often slower due to full table scans |
| Use cases | Simple searches (contains, prefix matching, etc.) | Complex searches based on specific patterns |
✅ Because LIKE is faster, prefer LIKE for simple searches and use REGEXP when complex matching is required.
6.3 What are the differences in regex features between MySQL 5.x and MySQL 8.x?
| Version | Main Features |
|---|---|
| MySQL 5.x | Only the REGEXP operator is available |
| MySQL 8.x | Adds REGEXP_LIKE(), REGEXP_INSTR(), REGEXP_SUBSTR(), and REGEXP_REPLACE() |
6.4 What should I do if a regex does not work as expected?
✅ Checklist
- Check whether escaping is correct
SELECT * FROM users WHERE email REGEXP '\.com$';- Try the
iflag withREGEXP_LIKE()
SELECT * FROM users WHERE REGEXP_LIKE(username, 'admin', 'i');- Check the data encoding
SHOW VARIABLES LIKE 'character_set_database';6.5 How can I improve the performance of REGEXP queries?
- Pre-filter with
LIKE
SELECT * FROM users
WHERE email LIKE '%gmail.com'
AND email REGEXP '^[a-zA-Z0-9._%+-]+@gmail\.com$';- Use indexes where appropriate
ALTER TABLE users ADD FULLTEXT(email);Summary
REGEXPis more powerful than LIKE, but you must be mindful of performance.- In MySQL 8.0, new regex functions (such as
REGEXP_LIKE()) were added, enabling more flexible processing. - To improve performance: combine with LIKE, use indexes, and design simple regex patterns.
7. Summary
MySQL regular expressions (REGEXP) are highly useful for searching data, validation, and format conversion. This article covered MySQL regex basics, new features added in MySQL 8.0 and later, practical examples, important considerations, and FAQs.
7.1 Key Takeaways
Here is a quick recap of the most important points.
1. MySQL Regular Expression Basics
- Using the
REGEXPoperator enables more flexible pattern matching thanLIKE. - Common regex patterns
^(start),$(end),.(any single character),[](character class),+(one or more repetitions), etc.
2. New Regular Expression Functions in MySQL 8.0+
In MySQL 8.0, these four functions were added, enabling more flexible processing:
REGEXP_LIKE(): an alternative to theREGEXPoperatorREGEXP_INSTR(): get the start position of the matching substringREGEXP_SUBSTR(): extract the matching substringREGEXP_REPLACE(): replace substrings using regex
3. Practical Examples
- Email address validation
- Phone number and credit card number checks
- Removing HTML tags
- Validating postal code formats
- Detecting disallowed words
- Data cleaning (trimming whitespace, removing commas, etc.)
4. Important Considerations When Using Regular Expressions
- Performance optimization
- Because
REGEXPdoes not use indexes, it can result in full table scans. - You can improve performance by combining
LIKEorFULLTEXTindexes. - Security measures
- To prevent ReDoS (Regular Expression DoS), avoid excessive use of
.*. - To prevent SQL injection, use
placeholders.
5. FAQ (Frequently Asked Questions)
- Regex patterns available in MySQL
- Differences between
LIKEandREGEXP - Feature differences between MySQL 5.x and 8.x
- Troubleshooting when regex does not behave as expected
- Ways to improve performance
7.2 Best Practices for Using Regular Expressions
- Keep regex patterns simple
- Example: Avoid overusing
.*and make use of^(start) and$(end).
- Combine with
LIKEorFULLTEXTwhere appropriate
- Pre-filter with
LIKE, then applyREGEXPto reduce query cost.
- Use
REGEXP_REPLACE()for data cleaning
- Example: Remove HTML tags and normalize unnecessary spaces.
- Take advantage of MySQL 8.0 and later
- Using new functions such as
REGEXP_LIKE()can make SQL more readable.
- Apply security measures thoroughly
- Do not apply user input directly to
REGEXP(SQL injection prevention). - Use
placeholdersfor dynamic queries.
7.3 Resources for Further Learning
Here are references to help you learn MySQL regular expressions more deeply.
Official Documentation
7.4 Final Notes
MySQL regular expressions can be used in many situations, including data searching, cleaning, and validation. However, it is important to use them appropriately while understanding the performance and security considerations.
We hope this article helps you when working with MySQL regular expressions. Try applying these techniques in your projects!


