MySQL Collation Explained: How to Set, Change, and Optimize Character Sorting

1. Introduction

MySQL is one of the most widely used database management systems, and its collation setting determines how string data is compared and sorted.

The Importance of Collation

With an unsuitable collation, queries can return unexpected matches or miss expected ones (for example, by treating 'a' and 'A' as different), results can sort in a surprising order, and performance can degrade. Multilingual systems are affected most, because the handling of accented and non-Latin characters depends heavily on the comparison rules.

This article explains collations in MySQL: how to configure them, how the main types differ, and what to watch for when changing them.

2. What Is Collation?

Collation refers to the rules that MySQL applies when comparing and sorting string values.

The Role of Collation

  • String Sorting: Determines the order of string data.
  • String Comparison: Decides whether two strings are equal in conditions such as WHERE name = 'Sagawa' (for example, whether 'sagawa' also matches).
  • Search Accuracy: Impacts multilingual support and matching precision.
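
As a minimal sketch of the comparison role, the query below runs the same equality test under two collations. It uses only string literals, so no table is needed, and it assumes the connection character set is utf8mb4 (set here with SET NAMES).

```sql
-- Assumption: the server supports utf8mb4; SET NAMES makes string literals utf8mb4.
SET NAMES utf8mb4;

SELECT
  'Sagawa' = 'sagawa' COLLATE utf8mb4_general_ci AS ci_match,   -- 1: case is ignored
  'Sagawa' = 'sagawa' COLLATE utf8mb4_bin        AS bin_match;  -- 0: character codes differ
```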

Relationship with Character Sets

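Collation is closely related to the character set: every collation belongs to exactly one character set, and every character set has one default collation. For example, the utf8 character set (in MySQL 8.0 an alias for the 3-byte utf8mb3, which is deprecated in favor of utf8mb4) includes the following collations: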

  • utf8_general_ci: Case-insensitive comparison.
  • utf8_bin: Binary comparison.
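
To see which collations a server actually offers for a character set, the statements below can be used. This is a small sketch; the output depends on the MySQL version installed.

```sql
-- List every collation that belongs to the utf8mb4 character set.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Show only the default collation of each character set.
-- Default is a reserved word, so the column name needs backticks.
SHOW COLLATION WHERE `Default` = 'Yes';
```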

Collation Naming Convention

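character_set_comparison_type

The name starts with the character set and ends with a suffix that describes the comparison: _ci (case-insensitive), _cs (case-sensitive), _ai (accent-insensitive), _as (accent-sensitive), or _bin (binary). A language or version label may appear in between.

Example:

  • utf8_general_ci: utf8 character set, "general" rules, case-insensitive (ci).
  • utf8_bin: utf8 character set, binary comparison of character code values.
  • utf8mb4_0900_ai_ci: utf8mb4 character set, Unicode Collation Algorithm 9.0.0, accent-insensitive and case-insensitive.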
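
The effect of the suffixes can be checked directly with literals. The sketch below assumes MySQL 8.0 or later (the utf8mb4_0900_* collations do not exist in 5.7) and a client that sends the statement as utf8mb4.

```sql
-- Assumption: MySQL 8.0+, connection character set utf8mb4.
SET NAMES utf8mb4;

SELECT
  'resume' = 'Résumé' COLLATE utf8mb4_0900_ai_ci AS ai_ci_match,  -- 1: accents and case ignored
  'resume' = 'Résumé' COLLATE utf8mb4_0900_as_ci AS as_ci_match,  -- 0: accents matter
  'resume' = 'Resume' COLLATE utf8mb4_0900_as_cs AS as_cs_match;  -- 0: case matters
```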

3. Collation Configuration Levels in MySQL

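In MySQL, collations can be configured at the following five levels. The server, database, table, and column settings act as cascading defaults: each applies only when the more specific level below it does not name its own collation.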

Server Level

SHOW VARIABLES LIKE 'collation_server';

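To change the setting, add the following to the [mysqld] section of my.cnf and restart the server. Setting the character set together with the collation keeps the two consistent. The new defaults apply to databases created afterwards; existing databases keep their settings.

[mysqld]
character_set_server=utf8mb4
collation_server=utf8mb4_unicode_ci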

Database Level

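ALTER DATABASE database_name DEFAULT COLLATE utf8mb4_unicode_ci;

This changes only the default for tables created afterwards; existing tables and columns keep their current collation.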
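
For a new database, the collation can be set when it is created. The sketch below uses a hypothetical database name, shop_demo, and then reads back the defaults that apply once the database is selected.

```sql
-- shop_demo is a hypothetical name used only for illustration.
CREATE DATABASE shop_demo
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

USE shop_demo;

-- Both variables reflect the default database selected by USE.
SELECT @@character_set_database, @@collation_database;
```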

Table Level

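ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

CONVERT TO changes the table default and converts every character column in the table, which rebuilds the table and can take a long time on large tables.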

Column Level

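ALTER TABLE table_name MODIFY COLUMN column_name VARCHAR(255) COLLATE utf8mb4_unicode_ci;

MODIFY COLUMN replaces the whole column definition, so repeat any NOT NULL, DEFAULT, or other attributes the column already has.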
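
Table and column collations can also be declared together at creation time. In the sketch below, the table and column names are hypothetical: display_name inherits the table default, while api_token overrides it with a binary collation.

```sql
-- users_demo and its columns are hypothetical names.
CREATE TABLE users_demo (
  id           INT PRIMARY KEY,
  display_name VARCHAR(100),                 -- inherits utf8mb4_unicode_ci from the table
  api_token    VARCHAR(64)
               CHARACTER SET utf8mb4
               COLLATE utf8mb4_bin NOT NULL  -- explicit column-level override
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- The Collation column shows which collation each column ended up with.
SHOW FULL COLUMNS FROM users_demo;
```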

String Literal Level

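SELECT * FROM table_name WHERE column_name = 'value' COLLATE utf8mb4_bin;

The COLLATE clause affects only this expression, and the collation must belong to the character set of the value it is applied to.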
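
The COLLATION() function reports which collation an expression ends up with, which is handy when checking a literal-level override. In this sketch, the _utf8mb4 introducer fixes the literal's character set so the second result does not depend on the connection settings.

```sql
SELECT
  COLLATION('value')                             AS connection_default,  -- depends on the session
  COLLATION(_utf8mb4'value' COLLATE utf8mb4_bin) AS explicit_collation;  -- utf8mb4_bin
```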

4. Main Collation Types and Their Characteristics

utf8_general_ci

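  • Features: Case-insensitive; compares strings one character at a time using simplified weights, which makes it fast.
  • Note: Does not implement the Unicode Collation Algorithm, so some characters match or sort differently than users expect (for example, 'ß' compares equal to 's' rather than 'ss').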

utf8_unicode_ci

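  • Features: Implements the Unicode Collation Algorithm (UCA 4.0.0), including cases where one character compares equal to a sequence of characters.
  • Note: Slightly slower than utf8_general_ci because the comparison rules are more complex.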

utf8_bin

  • Features: Compares strings by character code value, so it is case-sensitive and accent-sensitive.
  • Use Case: Tokens, hashes, and identifiers that must match exactly.

utf8mb4_unicode_ci

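  • Features: Applies the same UCA 4.0.0 rules as utf8_unicode_ci to the utf8mb4 character set, which can store every Unicode character, including 4-byte ones. Suitable for multilingual systems.
  • Use Case: Applications handling emojis and other characters outside the 3-byte range.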
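
The difference between the general and Unicode collations is easiest to see with the German sharp s. The sketch below uses the utf8mb4 variants of the collations and assumes the client sends the statement as utf8mb4.

```sql
-- Assumption: connection character set utf8mb4.
SET NAMES utf8mb4;

SELECT
  'ß' = 's'  COLLATE utf8mb4_general_ci AS general_s,   -- 1: one-to-one weight comparison
  'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_ss,  -- 0: no multi-character mapping
  'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_ss;  -- 1: UCA expansion
```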

5. How to Check and Change Collation

In MySQL, you can check and modify collation settings at the database, table, and column levels.

How to Check Collation

Check Database Collation

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME = 'database_name';

Check Table Collation

SHOW TABLE STATUS WHERE Name = 'table_name';

Check Column Collation

SHOW FULL COLUMNS FROM table_name;
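
To review every column in a database at once rather than table by table, INFORMATION_SCHEMA.COLUMNS can be queried. In this sketch, database_name is a placeholder, as in the other examples.

```sql
-- Non-character columns have a NULL collation and are skipped.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'database_name'
  AND COLLATION_NAME IS NOT NULL
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```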

How to Change Collation

Change Database Collation

ALTER DATABASE database_name
DEFAULT CHARACTER SET utf8mb4
DEFAULT COLLATE utf8mb4_unicode_ci;

Change Table Collation

ALTER TABLE table_name
CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
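
When many tables need the same conversion, one possible approach is to let MySQL generate the statements and then review them before running anything. The sketch below only prints the ALTER statements; database_name and the target collation are placeholders.

```sql
-- Generates one ALTER TABLE statement per table that is not yet on the target collation.
-- Nothing is changed until the generated statements are executed.
SELECT CONCAT(
         'ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME,
         '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
       ) AS alter_statement
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'database_name'
  AND TABLE_TYPE = 'BASE TABLE'
  AND TABLE_COLLATION <> 'utf8mb4_unicode_ci';
```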

Change Column Collation

ALTER TABLE table_name
MODIFY COLUMN column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Temporarily Changing Collation

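When a query compares columns that have different collations, add a COLLATE clause to one side so that both sides use the same collation for that comparison only. The stored data and the column definitions are not changed.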

SELECT * FROM table_name
WHERE column1 COLLATE utf8mb4_unicode_ci = column2;

Always back up your data before making changes and verify compatibility with your application.

6. Important Considerations and Best Practices for Collation

The notes below cover the most common pitfalls when configuring collation in MySQL, followed by practices that help avoid them.

Important Considerations

Operations Between Different Collations

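Comparing or joining columns with different collations can fail, because MySQL cannot decide which collation to apply.

  • Example Error (the message names both collations involved):
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation '='
  • Solution: Use the COLLATE clause so that both sides share one collation.
SELECT * FROM table_name
WHERE column1 COLLATE utf8mb4_unicode_ci = column2;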
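
The error can be reproduced and resolved with two small tables. This is a sketch with hypothetical table names (a_demo, b_demo); the final statement shows the longer-term fix of aligning the stored collations.

```sql
-- Two hypothetical tables whose name columns use different collations.
CREATE TABLE a_demo (name VARCHAR(50)) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
CREATE TABLE b_demo (name VARCHAR(50)) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Fails with ERROR 1267: both sides carry implicit, different collations.
SELECT * FROM a_demo JOIN b_demo ON a_demo.name = b_demo.name;

-- Works: the explicit COLLATE decides the collation for this comparison.
SELECT * FROM a_demo JOIN b_demo
  ON a_demo.name COLLATE utf8mb4_unicode_ci = b_demo.name;

-- Longer-term fix: store both columns with the same collation.
ALTER TABLE a_demo CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```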

Changing Collation and Existing Data

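Changing collation can affect existing data. Indexes on the affected columns are rebuilt, sort order may change, and values that were distinct under the old collation can become equal under the new one, which makes a UNIQUE index fail with a duplicate-entry error. Validate the change on a copy of the data first.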
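
One way to validate a change in advance is to look for values that would become equal under the target collation. The sketch below uses hypothetical names (users_demo, email) and converts the column to utf8mb4 inside the query so that the COLLATE clause is valid whatever the column's current character set.

```sql
-- Lists groups of rows whose email values would collide under utf8mb4_unicode_ci.
SELECT MIN(email) AS sample_value, COUNT(*) AS colliding_rows
FROM users_demo
GROUP BY CONVERT(email USING utf8mb4) COLLATE utf8mb4_unicode_ci
HAVING COUNT(*) > 1;
```

An empty result suggests that a UNIQUE index on the column would survive the conversion.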

Performance Impact

  • High-precision collations such as utf8mb4_unicode_ci may be slower than utf8mb4_general_ci, because each comparison follows more complex rules.
  • On large datasets the difference tends to show up in sorting and in comparisons that cannot use an index, so measure with representative queries before choosing.

Migration Issues

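When changing collation, check that the application's connection character set and collation (for example, values set with SET NAMES or in the driver configuration) and any external systems that exchange data with the database remain compatible.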

Best Practices

1. Choose Collation Based on Requirements

Consider the following when selecting a collation:

  • Accuracy Priority: Use utf8mb4_unicode_ci (or utf8mb4_0900_ai_ci on MySQL 8.0 and later) when comparisons must follow Unicode rules.
  • Performance Priority: Choose utf8mb4_general_ci if slightly lower accuracy is acceptable.
  • Multilingual Support: Use utf8mb4-based collations when handling emojis and other 4-byte characters, which the 3-byte utf8 character set cannot store.

2. Standardize Collation

Using a unified collation across the database prevents errors and inconsistencies.

  • Apply the same collation at the database, table, and column levels.
  • Use the COLLATE clause for temporary adjustments when needed.

3. Test and Back Up Before Changes

Before making changes:

  • Verify behavior in a test environment.
  • Create a full backup of production data.

4. Optimize Performance

To minimize the impact of collation on query performance:

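  • For frequently queried columns (e.g., indexed columns), pick the simplest collation that still gives correct results, and consider a _bin collation for columns that only ever need exact matches.
  • Use COLLATE in individual queries sparingly: applying it to an indexed column can prevent MySQL from using that column's index for the comparison.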
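
Whether a query-level COLLATE affects index use can be checked with EXPLAIN. The sketch below uses a hypothetical table, perf_demo, and makes no claim about the exact plan. Compare the key column of the two outputs on your own server and data.

```sql
-- perf_demo is a hypothetical table with an indexed, case-insensitive column.
CREATE TABLE perf_demo (
  id   INT PRIMARY KEY,
  name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  INDEX idx_name (name)
);

-- Comparison in the column's own collation: the index on name is a candidate.
EXPLAIN SELECT * FROM perf_demo WHERE name = 'Sagawa';

-- Comparison under a different collation: check whether idx_name is still chosen.
EXPLAIN SELECT * FROM perf_demo WHERE name COLLATE utf8mb4_bin = 'Sagawa';
```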

5. Use the Latest Unicode Support

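For multilingual systems on MySQL 8.0 and later, consider utf8mb4_0900_ai_ci. It is based on Unicode Collation Algorithm 9.0.0, is the default collation for utf8mb4 in MySQL 8.0, and is both accent-insensitive and case-insensitive. It is not available in MySQL 5.7, so check every server that must load the data before adopting it.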

7. Conclusion

MySQL collation is a critical factor that determines how string data is compared and sorted within a database. This article covered everything from the basic concept of collation to configuration methods, considerations, and best practices.

Key Takeaways

  1. What Is Collation?
    Collation defines how strings are compared and ordered. Selecting the appropriate collation improves both database accuracy and performance.
  2. Flexible Configuration Levels
    MySQL allows collation to be configured at multiple levels: server, database, table, column, and literal. This flexibility enables optimal configuration at each layer.
  3. Main Collation Characteristics
    • utf8_general_ci: Fast but less precise.
    • utf8_unicode_ci: More precise but slightly slower.
    • utf8mb4_unicode_ci: Unicode-compliant and supports emojis and special characters.
  4. Practical Verification and Modification
    This article introduced SQL commands to check and change collation, along with practical implementation guidance.
  5. Considerations and Best Practices
    Understand key precautions when changing collation and always perform testing and backups beforehand.

Using Collation Effectively

Since collation significantly impacts string comparison and sorting, configuring it properly is essential. Use the methods and best practices introduced in this article to select the optimal configuration for your system.