MySQL Regular Expressions (REGEXP) Guide: Syntax, MySQL 8 Functions, Examples, and Best Practices

目次

1. Introduction

Using Regular Expressions in MySQL

MySQL is a powerful database management system for searching and manipulating data, and it also supports regular expressions (REGEXP) to enable more flexible searching. With regular expressions, you can efficiently extract data that matches specific patterns, as well as format or validate data.

Benefits of Using Regular Expressions in MySQL

Using MySQL regular expressions provides benefits such as:

  • Advanced searching: Easily extract data that contains specific strings or patterns.
  • Check data integrity: Useful for input validation (e.g., validating email address formats).
  • More powerful than the LIKE operator: Compared to LIKE, which uses wildcards (% and _), REGEXP allows you to define complex search conditions more flexibly.

Purpose and Structure of This Article

This article explains everything from the basics of regular expressions in MySQL to advanced usage, performance optimization, and the new regular expression functions introduced in MySQL 8.0 and later. With practical examples, the content is designed to be easy to understand even for beginners.

Article structure:

  1. MySQL regular expression basics and syntax
  2. Regular expression functions added in MySQL 8.0 and later
  3. Practical examples (validation, data extraction, format conversion)
  4. Important considerations when using regular expressions (performance, security)
  5. FAQ (Frequently Asked Questions)
  6. Summary

2. Regular Expression Basics and Syntax in MySQL

How to Use Regular Expressions in MySQL

In MySQL, you use the REGEXP operator (or RLIKE) to work with regular expressions. Similar to the LIKE operator, it is used to determine whether a column value matches a specified regex pattern.

Basic syntax

SELECT * FROM table_name WHERE column_name REGEXP 'regex_pattern';

Or

SELECT * FROM table_name WHERE column_name RLIKE 'regex_pattern';

RLIKE is an alias for REGEXP, and both behave the same.

Example
For example, to search for records where the name contains the character “山”:

SELECT * FROM users WHERE name REGEXP '山';

Basic Regular Expression Patterns

In MySQL regular expressions, you can use patterns like the following.

SymbolMeaningExampleResult
.Any single charactera.cMatches “abc”, “aac”, “adc”
^Start of the string^abcMatches “abcde” but not “dabc”
$End of the stringxyz$Matches “axyz” but not “xyzb”
[]Any one of the specified characters[abc]Matches “a”, “b”, or “c”
[^]Any character not in the specified set[^abc]Matches any character except “a”, “b”, or “c”
*Repeat the preceding character 0 or more timesa*Matches “”, “a”, “aa”, “aaa”, etc.
+Repeat the preceding character 1 or more timesa+Matches “a”, “aa”, “aaa”, etc. (does not match the empty string)
{n}Repeat the preceding character exactly n timesa{3}Matches “aaa”
{n,}Repeat the preceding character at least n timesa{2,}Matches “aa”, “aaa”, “aaaa”, etc.
{n,m}Repeat the preceding character between n and m timesa{2,4}Matches “aa”, “aaa”, “aaaa”

Differences from the LIKE Operator

MySQL also provides the LIKE operator, which is commonly used for simple pattern matching. However, REGEXP is more powerful than LIKE and supports searching with more complex conditions.

Example 1: Using the LIKE operator

SELECT * FROM users WHERE name LIKE '%山%';
  • With LIKE, you can search for data that contains , but you can only use simple wildcards (% and _).

Example 2: Using the REGEXP operator

SELECT * FROM users WHERE name REGEXP '^山';
  • With REGEXP, you can extract only the data where the name starts with 山.

How to Specify Multiple Patterns

You can use | to specify multiple patterns with an OR condition.

Example: Get users whose last name is “佐藤” or “田中”

SELECT * FROM users WHERE name REGEXP '佐藤|田中';

Negative Matching

If you use ^ inside square brackets, you can retrieve data that “does not contain the specified characters.”

Example: Data that starts with a character other than “山”

SELECT * FROM users WHERE name REGEXP '^[^山]';

In this case, it returns records where the name does not start with .

Summary

  • In MySQL, REGEXP enables more powerful pattern searching than LIKE.
  • Understanding basic syntax and patterns (., ^, $, [], *, +, etc.) enables flexible searching.
  • It is important to understand the differences from LIKE and use each appropriately.
  • Using | allows searching with multiple patterns.
  • Using negative character classes [^ ] helps you search for data that does not start with specific characters.

3. Regular Expression Functions Added in MySQL 8.0 and Later

In MySQL 8.0, in addition to the traditional REGEXP operator, four new functions were added to enable more flexible regular expression searching. These make it possible to retrieve match positions, extract substrings, and perform replacements, greatly expanding what you can do with regex in MySQL.

In this section, we explain each new regular expression function in detail and show concrete examples.

3.1 REGEXP_LIKE()

Overview

REGEXP_LIKE(), like the REGEXP operator, checks whether a column value matches a specified regex pattern.

Syntax

REGEXP_LIKE(column_name, 'regex_pattern' [, flags])
  • column_name: the column to search
  • 'regex_pattern': the regex pattern used as the condition
  • flags (optional): case sensitivity (use i for case-insensitive matching)

Examples

“Get users whose email ends with gmail.com or yahoo.co.jp

SELECT * FROM users WHERE REGEXP_LIKE(email, '(@gmail\.com|@yahoo\.co\.jp)$');

“Search for usernames containing ‘admin’ (case-insensitive)”

SELECT * FROM users WHERE REGEXP_LIKE(username, 'admin', 'i');

REGEXP_LIKE() serves the same purpose as the REGEXP operator, but it allows a more standard-SQL-friendly style.

3.2 REGEXP_INSTR()

Overview

REGEXP_INSTR() returns the position (start index) where a regex match occurs within a string.

Syntax

REGEXP_INSTR(column_name, 'regex_pattern' [, start_position, occurrence, return_option, flags])
  • start_position (optional): where to start searching (default: 1)
  • occurrence (optional): which match occurrence to return (default: 1)
  • return_option (optional): 0 (return start position) or 1 (return end position of the match)
  • flags (optional): use i for case-insensitive matching

Examples

“If the phone number starts with 090 or 080, get the start position”

SELECT phone, REGEXP_INSTR(phone, '^(090|080)') AS match_pos FROM users;

“Get the start position of the domain part in an email address”

SELECT email, REGEXP_INSTR(email, '@') AS domain_start FROM users;
  • It returns the position where @ is found (e.g., user@example.com returns 5).

✅ With REGEXP_INSTR(), you can retrieve positional information, expanding your options for string processing.

3.3 REGEXP_SUBSTR()

Overview

REGEXP_SUBSTR() extracts the substring that matches a regex pattern from within a string.

Syntax

REGEXP_SUBSTR(column_name, 'regex_pattern' [, start_position, occurrence, flags])
  • occurrence (optional): get the nth match (default: 1)
  • flags (optional): case sensitivity (i to ignore case)

Examples

“Extract only the domain part from an email address”

SELECT email, REGEXP_SUBSTR(email, '@[a-zA-Z0-9.-]+') AS domain FROM users;

“Get the first number sequence in a message”

SELECT message, REGEXP_SUBSTR(message, '[0-9]+') AS first_number FROM logs;

✅ With REGEXP_SUBSTR(), you can extract specific patterns and use them for organizing and transforming data.

3.4 REGEXP_REPLACE()

Overview

REGEXP_REPLACE() replaces substrings that match a regex pattern with another string.

Syntax

REGEXP_REPLACE(column_name, 'regex_pattern', 'replacement' [, occurrence, flags])
  • replacement: the string to replace the matched part with
  • occurrence (optional): replace only the nth match (if omitted, replaces all matches)
  • flags (optional): use i for case-insensitive matching

Examples

“Remove hyphens (-) from phone numbers”

SELECT phone, REGEXP_REPLACE(phone, '-', '') AS clean_phone FROM users;

“Remove HTML tags”

SELECT comment, REGEXP_REPLACE(comment, '<[^>]+>', '') AS clean_text FROM reviews;

✅ With REGEXP_REPLACE(), format conversion and data cleaning become easy.

3.5 Summary

FunctionPurposeExample
REGEXP_LIKE()Check whether a value matches a regex patternSELECT * FROM users WHERE REGEXP_LIKE(email, '@gmail\.com$');
REGEXP_INSTR()Get the start position of the matched substringSELECT REGEXP_INSTR(email, '@') FROM users;
REGEXP_SUBSTR()Extract the matched substringSELECT REGEXP_SUBSTR(email, '@[a-zA-Z0-9.-]+') FROM users;
REGEXP_REPLACE()Replace the matched substringSELECT REGEXP_REPLACE(phone, '-', '') FROM users;

4. Practical Examples of MySQL Regular Expressions

By leveraging MySQL regular expressions, you can streamline real-world data processing tasks such as data validation, extracting specific data, and format conversion. In this section, we introduce practical use cases with SQL code and explanations.

4.1 Email Address Validation

Overview

Use regular expressions to check whether email addresses stored in the database follow a valid format.

SQL to Use

SELECT email FROM users 
WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

Explanation

  • ^[a-zA-Z0-9._%+-]+ → one or more alphanumeric characters, dots, underscores, or plus signs before @
  • @[a-zA-Z0-9.-]+ → a domain name after @
  • \.[a-zA-Z]{2,}$ → ends with a TLD (Top-Level Domain) of at least 2 letters (e.g., .com, .jp, .net)

With this query, you can exclude invalid email addresses (e.g., user@@example.com, user@.com).

4.2 Checking Japanese Phone Number Formats

Overview

Determine whether common Japanese phone numbers (e.g., 090-1234-5678, 03-1234-5678) follow the correct format.

SQL to Use

SELECT phone FROM users 
WHERE phone REGEXP '^(0[789]0-[0-9]{4}-[0-9]{4}|0[1-9]-[0-9]{4}-[0-9]{4})$';

Explanation

  • 0[789]0-[0-9]{4}-[0-9]{4} → mobile numbers (090-xxxx-xxxx, 080-xxxx-xxxx, etc.)
  • 0[1-9]-[0-9]{4}-[0-9]{4} → landlines (03-xxxx-xxxx, 06-xxxx-xxxx, etc.)

In this way, regular expressions are useful when format consistency is required.

4.3 Credit Card Number Format Validation

Overview

Validate the formats of credit card numbers (Visa, Mastercard, American Express, etc.).

SQL to Use

SELECT card_number FROM payments 
WHERE card_number REGEXP '^(4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$';

Explanation

  • ^4[0-9]{12}(?:[0-9]{3})?$ → Visa (16 digits or 13 digits)
  • ^5[1-5][0-9]{14}$ → Mastercard (16 digits)
  • ^3[47][0-9]{13}$ → American Express (15 digits)

With this query, you can filter out invalid card numbers in the database (e.g., incorrect length or invalid starting digits).

4.4 Removing HTML Tags

Overview

If user-submitted data contains HTML tags, you can remove them and convert the content to plain text.

SQL to Use

SELECT REGEXP_REPLACE(comment, '<[^>]+>', '') AS clean_comment FROM reviews;

Explanation

  • '<[^>]+>' → removes all HTML tags between < and >

Example

InputOutput
<b>Hello</b> World!Hello World!
<p>これはサンプルです</p>これはサンプルです

This is useful for stripping HTML tags from user comments and blog posts.

4.5 Postal Code Format Validation (Japan)

Overview

Validate whether Japanese postal codes (e.g., 123-4567) follow the correct format.

SQL to Use

SELECT postal_code FROM addresses 
WHERE postal_code REGEXP '^[0-9]{3}-[0-9]{4}$';

Explanation

  • ^[0-9]{3}-[0-9]{4}$ → a postal code in the “3 digits-4 digits” format

Using this regex helps maintain data consistency and prevent incorrect formats.

4.6 Username Filtering (Detecting Disallowed Words)

Overview

Check whether usernames contain disallowed words reserved for system use (e.g., admin, root, system) during registration.

SQL to Use

SELECT username FROM users 
WHERE username REGEXP 'admin|root|system';

Explanation

  • admin|root|system → detects usernames containing these terms

This query helps prevent regular users from using names intended for system administrators.

Summary

  • Regular expressions can be used for validating email addresses, phone numbers, and credit card numbers.
  • They can also be applied to removing HTML tags and converting data formats.
  • They are effective for improving database safety and consistency through tasks like filtering disallowed words and validating postal codes.

5. Important Considerations When Using Regular Expressions (Performance and Security)

Using regular expressions in MySQL enables powerful searching and data cleansing. However, improper usage can cause performance degradation and introduce security risks. This section explains key considerations for using MySQL regular expressions safely and efficiently.

5.1 Performance Optimization for Regular Expressions

MySQL regex searching is convenient, but it has a drawback: indexes generally do not apply, which can make queries slower.

Measures to Improve Performance

  1. Combine LIKE or FULLTEXT indexes
   SELECT * FROM users WHERE email LIKE '%gmail.com';
  1. Use simpler regular expressions
   SELECT * FROM users WHERE name REGEXP '^admin|admin$';
  1. Filter the target data first
   SELECT * FROM users WHERE email LIKE 'a%' AND email REGEXP '^[a-zA-Z0-9._%+-]+@gmail\.com$';

5.2 Security Risks and Countermeasures

1. Regular Expression DoS (ReDoS)

  • Avoid unbounded matches such as .*.
  • Use ^ and $ to limit the search scope.
  • Do not pass user input directly into REGEXP.

2. SQL Injection and Regular Expressions

Unsafe query

SELECT * FROM users WHERE username REGEXP '$input';

Safe query (use placeholders)

$stmt = $pdo->prepare("SELECT * FROM users WHERE username REGEXP ?");
$stmt->execute([$sanitized_input]);

5.3 Version Compatibility

VersionMain Features
MySQL 5.xOnly the REGEXP operator is available
MySQL 8.xAdds REGEXP_LIKE(), REGEXP_INSTR(), REGEXP_SUBSTR(), and REGEXP_REPLACE()

Summary

  • Because REGEXP does not use indexes, performance measures are important.
  • To prevent ReDoS (Regular Expression DoS), avoid excessive use of wildcards.
  • To prevent SQL injection, do not apply user input directly to REGEXP.

6. FAQ (Frequently Asked Questions)

Here are common questions about using MySQL regular expressions (REGEXP). Resolving these issues will help you use regex more effectively.


6.1 What regex patterns can I use in MySQL?

SymbolDescriptionExampleResult
.Any single charactera.c"abc", "aac", "adc"
^Start of the string^abc"abcde" (does not match "dabc")
$End of the stringxyz$"axyz" (does not match "xyzb")
[]Any of the specified characters[abc]"a", "b", "c"
[^]Any character not in the specified set[^abc]Any character except "a", "b", "c"
*Repeat the preceding character 0 or more timesa*"", "a", "aa", "aaa"
+Repeat the preceding character 1 or more timesa+"a", "aa", "aaa" (does not match "")

6.2 What is the difference between LIKE and REGEXP?

ComparisonLIKEREGEXP
FunctionalitySimple pattern matchingSearching with complex regular expressions
Wildcards% (any string), _ (any single character).* (any string), ^, $, [a-z], etc.
Search speedFast because indexes can applyOften slower due to full table scans
Use casesSimple searches (contains, prefix matching, etc.)Complex searches based on specific patterns

Because LIKE is faster, prefer LIKE for simple searches and use REGEXP when complex matching is required.

6.3 What are the differences in regex features between MySQL 5.x and MySQL 8.x?

VersionMain Features
MySQL 5.xOnly the REGEXP operator is available
MySQL 8.xAdds REGEXP_LIKE(), REGEXP_INSTR(), REGEXP_SUBSTR(), and REGEXP_REPLACE()

6.4 What should I do if a regex does not work as expected?

Checklist

  1. Check whether escaping is correct
   SELECT * FROM users WHERE email REGEXP '\.com$';
  1. Try the i flag with REGEXP_LIKE()
   SELECT * FROM users WHERE REGEXP_LIKE(username, 'admin', 'i');
  1. Check the data encoding
   SHOW VARIABLES LIKE 'character_set_database';

6.5 How can I improve the performance of REGEXP queries?

  1. Pre-filter with LIKE
   SELECT * FROM users 
   WHERE email LIKE '%gmail.com' 
   AND email REGEXP '^[a-zA-Z0-9._%+-]+@gmail\.com$';
  1. Use indexes where appropriate
   ALTER TABLE users ADD FULLTEXT(email);

Summary

  • REGEXP is more powerful than LIKE, but you must be mindful of performance.
  • In MySQL 8.0, new regex functions (such as REGEXP_LIKE()) were added, enabling more flexible processing.
  • To improve performance: combine with LIKE, use indexes, and design simple regex patterns.

7. Summary

MySQL regular expressions (REGEXP) are highly useful for searching data, validation, and format conversion. This article covered MySQL regex basics, new features added in MySQL 8.0 and later, practical examples, important considerations, and FAQs.

7.1 Key Takeaways

Here is a quick recap of the most important points.

1. MySQL Regular Expression Basics

  • Using the REGEXP operator enables more flexible pattern matching than LIKE.
  • Common regex patterns
  • ^ (start), $ (end), . (any single character), [] (character class), + (one or more repetitions), etc.

2. New Regular Expression Functions in MySQL 8.0+

In MySQL 8.0, these four functions were added, enabling more flexible processing:

  • REGEXP_LIKE(): an alternative to the REGEXP operator
  • REGEXP_INSTR(): get the start position of the matching substring
  • REGEXP_SUBSTR(): extract the matching substring
  • REGEXP_REPLACE(): replace substrings using regex

3. Practical Examples

  • Email address validation
  • Phone number and credit card number checks
  • Removing HTML tags
  • Validating postal code formats
  • Detecting disallowed words
  • Data cleaning (trimming whitespace, removing commas, etc.)

4. Important Considerations When Using Regular Expressions

  • Performance optimization
  • Because REGEXP does not use indexes, it can result in full table scans.
  • You can improve performance by combining LIKE or FULLTEXT indexes.
  • Security measures
  • To prevent ReDoS (Regular Expression DoS), avoid excessive use of .*.
  • To prevent SQL injection, use placeholders.

5. FAQ (Frequently Asked Questions)

  • Regex patterns available in MySQL
  • Differences between LIKE and REGEXP
  • Feature differences between MySQL 5.x and 8.x
  • Troubleshooting when regex does not behave as expected
  • Ways to improve performance

7.2 Best Practices for Using Regular Expressions

  1. Keep regex patterns simple
  • Example: Avoid overusing .* and make use of ^ (start) and $ (end).
  1. Combine with LIKE or FULLTEXT where appropriate
  • Pre-filter with LIKE, then apply REGEXP to reduce query cost.
  1. Use REGEXP_REPLACE() for data cleaning
  • Example: Remove HTML tags and normalize unnecessary spaces.
  1. Take advantage of MySQL 8.0 and later
  • Using new functions such as REGEXP_LIKE() can make SQL more readable.
  1. Apply security measures thoroughly
  • Do not apply user input directly to REGEXP (SQL injection prevention).
  • Use placeholders for dynamic queries.

7.3 Resources for Further Learning

Here are references to help you learn MySQL regular expressions more deeply.

Official Documentation

7.4 Final Notes

MySQL regular expressions can be used in many situations, including data searching, cleaning, and validation. However, it is important to use them appropriately while understanding the performance and security considerations.

We hope this article helps you when working with MySQL regular expressions. Try applying these techniques in your projects!