- 1. Introduction
- 2. Basic Knowledge of MySQL Character Sets and Collations
- 3. How to Check the Current Character Set in MySQL
- 4. How to Configure and Change Character Sets
- 5. Differences Between utf8 and utf8mb4
- 6. Causes and Solutions for Encoding Issues (Mojibake)
- 7. FAQ Section
- 8. Conclusion
1. Introduction
MySQL is a widely used relational database management system. Among its many configuration options, character set settings are especially important because they directly affect data integrity and performance. However, many developers run into problems, such as garbled text, because they do not know how to configure and verify character sets properly.
This article focuses on how to check MySQL character set settings, explains how to modify them, clarifies the differences between utf8 and utf8mb4, and covers practical strategies to prevent encoding issues. By reading this guide, you will gain both foundational knowledge and practical skills related to MySQL character set management.
2. Basic Knowledge of MySQL Character Sets and Collations
What Is a Character Set?
A character set is an encoding system that allows computers to represent text as digital data. For example, UTF-8 is widely used because it can encode every Unicode character, covering virtually all of the world's writing systems. In MySQL, latin1 (the default before MySQL 8.0) and utf8 have long been common choices, but utf8mb4 is now the recommended standard and has been the default since MySQL 8.0.
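To make "encoding" concrete, the following Python sketch (Python is used only for illustration here; it is not part of MySQL) reports how many bytes each character needs in UTF-8 and whether it can be represented in latin1 at all. As an assumption, Python's latin-1 codec (ISO-8859-1) stands in for MySQL's latin1, which is really cp1252; for the characters below the result is the same.

```python
def encoded_sizes(text: str) -> list[tuple[str, int, bool]]:
    """Return (character, UTF-8 byte count, representable in latin1) per character."""
    rows = []
    for ch in text:
        utf8_len = len(ch.encode("utf-8"))
        try:
            # ISO-8859-1 approximates MySQL's latin1 (cp1252) for this demo.
            ch.encode("latin-1")
            fits_latin1 = True
        except UnicodeEncodeError:
            fits_latin1 = False
        rows.append((ch, utf8_len, fits_latin1))
    return rows


if __name__ == "__main__":
    for ch, size, ok in encoded_sizes("Aé日😀"):
        print(f"{ch}: {size} byte(s) in UTF-8, latin1: {'yes' if ok else 'no'}")
```

ASCII letters take one byte, accented Latin letters two, most CJK characters three, and emoji four, which is why the 3-byte versus 4-byte distinction discussed later matters.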
What Is a Collation?
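A collation defines the rules for comparing and sorting strings, such as whether uppercase and lowercase letters are treated as equal (the _ci suffix means "case-insensitive"). For example, utf8_general_ci and utf8_unicode_ci are both collations for MySQL's utf8 character set, but utf8_unicode_ci follows the Unicode Collation Algorithm and therefore compares and sorts many languages more accurately.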
The Relationship Between Character Sets and Collations
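A character set defines how characters are encoded, while a collation defines how those encoded characters are compared and sorted. Each collation belongs to exactly one character set, and its name begins with that character set's name (for example, utf8mb4_unicode_ci belongs to utf8mb4). Choosing a compatible combination and using it consistently helps prevent encoding issues, "Illegal mix of collations" errors, and performance degradation.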
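The same encoded strings can compare and sort differently depending on the rules applied. The Python sketch below is only an analogy, not MySQL's collation algorithm: one hypothetical key orders strings by raw UTF-8 bytes (similar in spirit to a *_bin collation), and the other ignores case and accents (similar in spirit to MySQL 8.0's default utf8mb4_0900_ai_ci, where ai means accent-insensitive and ci case-insensitive).

```python
import unicodedata


def binary_key(s: str) -> bytes:
    """Order strings by their raw UTF-8 bytes, similar in spirit to a *_bin collation."""
    return s.encode("utf-8")


def insensitive_key(s: str) -> str:
    """Ignore case and accents: a rough analogy to an *_ai_ci collation.

    This is NOT MySQL's collation algorithm; it only illustrates the idea.
    """
    # Case-fold, split accented letters into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    words = ["banana", "Apple", "école", "apple", "Eclair"]
    print(sorted(words, key=binary_key))
    print(sorted(words, key=insensitive_key))
```

Under the byte-based rule, every uppercase word sorts before every lowercase one and "école" sorts last; under the insensitive rule, "Apple" and "apple" compare as equal and "école" sorts next to "Eclair".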
3. How to Check the Current Character Set in MySQL
In MySQL, character sets are configured at multiple levels: server level, database level, table level, and column level. Below are methods to check the character set settings at each level.
Check the Server-Wide Character Set Settings
To check the server-level character set configuration, run the following command:
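```sql
SHOW VARIABLES LIKE 'character_set_%';
```
The output will look similar to the following:
```
+--------------------------+------------------+
| Variable_name            | Value            |
+--------------------------+------------------+
| character_set_client     | utf8mb4          |
| character_set_connection | utf8mb4          |
| character_set_database   | utf8mb4          |
| character_set_results    | utf8mb4          |
| character_set_server     | utf8mb4          |
| character_set_system     | utf8             |
+--------------------------+------------------+
```
Meaning of each item:
- character_set_client: Character set of the statements sent by the client.
- character_set_connection: Character set used to interpret string literals in statements.
- character_set_database: Default character set of the current (default) database.
- character_set_results: Character set used to send query results back to the client.
- character_set_server: Default server character set, used for new databases when none is specified.
- character_set_system: Character set MySQL uses for metadata such as identifiers. It is fixed by MySQL and is normally utf8 (reported as utf8mb3 in recent releases).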
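Checking these values by eye is error-prone when many servers are involved. The following Python sketch assumes you have already fetched the SHOW VARIABLES rows as (name, value) pairs with any MySQL driver; the helper name and sample values are hypothetical. It flags every character_set_* variable that differs from utf8mb4, skipping character_set_system (fixed by MySQL) and character_set_filesystem (binary by default).

```python
# Variables expected to differ from the application character set:
# character_set_system is fixed by MySQL for metadata, and
# character_set_filesystem defaults to binary.
IGNORED = {"character_set_system", "character_set_filesystem"}


def find_mismatches(variables: dict[str, str], expected: str = "utf8mb4") -> dict[str, str]:
    """Return the character_set_* variables whose value differs from `expected`."""
    return {
        name: value
        for name, value in variables.items()
        if name.startswith("character_set_")
        and name not in IGNORED
        and value != expected
    }


if __name__ == "__main__":
    # Hypothetical result of SHOW VARIABLES LIKE 'character_set_%'.
    rows = [
        ("character_set_client", "latin1"),
        ("character_set_connection", "utf8mb4"),
        ("character_set_database", "utf8mb4"),
        ("character_set_results", "latin1"),
        ("character_set_server", "utf8mb4"),
        ("character_set_system", "utf8mb3"),
    ]
    print(find_mismatches(dict(rows)))
```

In this sample, only the client and results settings are reported, which is the pattern that typically causes the mojibake described in section 6.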
Check the Character Set for a Specific Database
To verify the character set configuration of a specific database, use the following command:
```sql
SHOW CREATE DATABASE database_name;
```
Example output:
```
CREATE DATABASE `database_name` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */
```
Check the Character Set for Tables and Columns
To check the character set for a table or its columns, use the following commands.
Table Level:
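```sql
SHOW CREATE TABLE table_name;
```
Column Level:
```sql
SHOW FULL COLUMNS FROM table_name;
```
Example output (trimmed to the relevant columns; the full output also includes Key, Default, Extra, Privileges, and Comment):
```
+----------------+--------------+----------------------+-------+
| Field          | Type         | Collation            | Null  |
+----------------+--------------+----------------------+-------+
| column_name    | varchar(255) | utf8mb4_unicode_ci   | YES   |
+----------------+--------------+----------------------+-------+
```
By using these commands, you can verify whether character sets are properly configured at each level. Note that SHOW FULL COLUMNS reports the collation rather than the character set; the character set is the prefix of the collation name.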
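For tables with many columns, the check can be automated. The Python sketch below assumes each row of SHOW FULL COLUMNS has been fetched as a dictionary (as many MySQL drivers can do); the helper name and sample rows are hypothetical. Non-string columns such as INT report a NULL collation and are skipped.

```python
from typing import Optional


def non_utf8mb4_columns(rows: list[dict[str, Optional[str]]]) -> list[str]:
    """Return names of character columns whose collation is not utf8mb4_*.

    Only the "Field" and "Collation" keys of each row are used.
    """
    flagged = []
    for row in rows:
        collation = row.get("Collation")
        if collation is None:  # non-string column such as INT
            continue
        if not collation.startswith("utf8mb4_"):
            flagged.append(row["Field"])
    return flagged


if __name__ == "__main__":
    # Hypothetical rows from SHOW FULL COLUMNS FROM users (other keys omitted).
    rows = [
        {"Field": "id", "Collation": None},
        {"Field": "username", "Collation": "utf8mb4_unicode_ci"},
        {"Field": "bio", "Collation": "utf8mb3_general_ci"},
        {"Field": "country", "Collation": "latin1_swedish_ci"},
    ]
    print(non_utf8mb4_columns(rows))
```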
4. How to Configure and Change Character Sets
The method for changing character sets in MySQL differs depending on whether you are modifying the server level, database level, table level, or column level. Below is a detailed explanation of each configuration method.
Changing Server-Wide Settings
To modify the default server-level character set, edit the MySQL configuration file (typically my.cnf or my.ini).
Configuration Steps:
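- Open the configuration file. Its location varies by platform (for example, /etc/my.cnf on RHEL-based systems or /etc/mysql/my.cnf on Debian and Ubuntu).
```bash
sudo nano /etc/my.cnf
```
- Add or modify the following settings in the [mysqld] section:
```ini
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
```
- Restart the MySQL server (on Debian and Ubuntu, the service is usually named mysql rather than mysqld).
```bash
sudo systemctl restart mysqld
```
The new server defaults apply to databases created afterward; existing databases keep their own settings.
Changing Database-Level Settings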
To change the character set of a specific database, use the following command:
Modification Command:
```sql
ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
Example:
```sql
ALTER DATABASE my_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
This command changes the database's default character set, which is applied to tables created afterward, but it does not affect existing tables or the data stored in them. If you need to convert existing tables as well, refer to the next section.
Changing Table-Level Settings
To modify the character set of an existing table, use the following command:
Modification Command:
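```sql
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
Example:
```sql
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
Besides changing the table's default character set, this command converts every character-type column (such as CHAR, VARCHAR, and TEXT) along with the data stored in it. By contrast, ALTER TABLE table_name CHARACTER SET utf8mb4 without CONVERT TO changes only the table default and leaves existing columns untouched.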
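Because ALTER DATABASE leaves existing tables unchanged, converting a whole database means running CONVERT TO once per table. The Python sketch below generates those statements from a list of table names; the helper names are hypothetical, and in practice the list could come from a query such as SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'my_database' AND TABLE_TYPE = 'BASE TABLE'. Table names are quoted with backticks, doubling any embedded backtick; the character set and collation are assumed to be trusted constants and are not escaped.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_convert_statements(
    tables: list[str],
    charset: str = "utf8mb4",
    collation: str = "utf8mb4_unicode_ci",
) -> list[str]:
    """Build one ALTER TABLE ... CONVERT TO statement per table name.

    `charset` and `collation` are inserted verbatim, so pass only trusted values.
    """
    return [
        f"ALTER TABLE {quote_identifier(t)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for t in tables
    ]


if __name__ == "__main__":
    for stmt in build_convert_statements(["users", "orders"]):
        print(stmt)
```

Generating the SQL instead of executing it directly leaves room to review the statements and take a backup first.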
Changing Column-Level Settings
If you need to change the character set of a specific column only, use the following command:
Modification Command:
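```sql
ALTER TABLE table_name MODIFY column_name column_type CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
Example:
```sql
ALTER TABLE users MODIFY username VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
MODIFY replaces the entire column definition, so restate any other attributes the column already has (for example, NOT NULL or a DEFAULT value); otherwise they are removed.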
5. Differences Between utf8 and utf8mb4
Technical Differences
- utf8: In MySQL, utf8 is an alias for utf8mb3, a partial implementation of UTF-8 that stores at most 3 bytes per character. It can therefore represent only characters from U+0000 to U+FFFF and cannot store emojis or supplementary characters (e.g., 𠮷). utf8mb3 is deprecated in MySQL 8.0.
- utf8mb4: A full implementation of UTF-8 that supports up to 4 bytes per character, covering all Unicode characters.
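Before migrating, it can help to know whether your data actually contains 4-byte characters. The following Python sketch (a hypothetical helper, not a MySQL feature) lists the characters in a string whose code points are above U+FFFF; inserting such characters into a utf8 column typically fails with an "Incorrect string value" error in strict SQL mode.

```python
def chars_needing_utf8mb4(text: str) -> list[str]:
    """Return the characters in `text` that take 4 bytes in UTF-8.

    These are code points above U+FFFF, which MySQL's 3-byte
    utf8 (utf8mb3) cannot store.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    sample = "吉野家 𠮷野家 🍣"
    print(chars_needing_utf8mb4(sample))
```

The common character 吉 fits in 3 bytes, while its variant 𠮷 and the sushi emoji each need 4, so only utf8mb4 can store the full sample.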
Why utf8mb4 Is Recommended
- Compatibility: Modern web and mobile applications frequently handle emojis and special characters.
- Standardization: Many CMS platforms (e.g., WordPress) recommend utf8mb4 as the default character set.
Important Considerations When Migrating
When migrating from utf8 to utf8mb4, pay attention to the following points:
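- Database Size: Characters that already fit in utf8 occupy the same number of bytes in utf8mb4; only characters that need 4 bytes (such as emojis) take more space. However, MySQL uses the 4-byte maximum when calculating column and index size limits, so some existing definitions (for example, long indexed VARCHAR columns) may need adjusting.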
- Existing Data: It is strongly recommended to back up your data before making changes.
- Application Configuration: The character set used by the application (e.g., client connection character set) must also be set to utf8mb4.
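The size consideration above is simple arithmetic: a column's worst-case byte size is its character length multiplied by the maximum bytes per character of its character set. The Python sketch below (hypothetical helper; it ignores the VARCHAR length prefix and row overhead) shows the difference for a VARCHAR(255) column.

```python
# Maximum bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def max_column_bytes(char_length: int, charset: str) -> int:
    """Worst-case size in bytes of CHAR/VARCHAR(char_length) data.

    The length prefix and other row overhead are not included.
    """
    return char_length * MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for cs in ("utf8mb3", "utf8mb4"):
        print(f"VARCHAR(255) in {cs}: up to {max_column_bytes(255, cs)} bytes")
```

The same column grows from a 765-byte to a 1,020-byte worst case, which is the figure MySQL checks against its size limits even if the stored text never uses 4-byte characters.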
6. Causes and Solutions for Encoding Issues (Mojibake)
Main Causes of Encoding Issues
- Character Set Mismatch Between Client and Server
- Example: The client uses latin1 while the server uses utf8mb4.
- Improper Data Migration
- The character set is not correctly specified when importing data.
- Application Misconfiguration
- The character set specified during the database connection is incorrect.
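The client/server mismatch above can be reproduced outside MySQL. The Python sketch below encodes text as UTF-8 and then decodes the bytes as latin1, which is what happens when UTF-8 data is read through a connection that assumes latin1. As an assumption, Python's latin-1 codec (ISO-8859-1) stands in for MySQL's latin1, which is really cp1252, so the exact garbled characters can differ in MySQL. The helper names are hypothetical, and the reverse step recovers the text only if no bytes were lost or replaced along the way.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate mojibake: UTF-8 bytes interpreted as latin1 characters."""
    return text.encode("utf-8").decode("latin-1")


def undo_latin1_misread(garbled: str) -> str:
    """Recover the original text by reversing the misreading above."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    garbled = misread_as_latin1("café")
    print(garbled)
    print(undo_latin1_misread(garbled))
```

The two bytes of "é" in UTF-8 are shown as two separate latin1 characters, which is the typical "Ã©" pattern seen in mojibake.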
Practical Measures to Prevent Encoding Issues
- Verify and Standardize Server Settings
- Check the server character set settings and maintain consistency across all levels.
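```sql
SHOW VARIABLES LIKE 'character_set_%';
```
- Adjust Client Settings
- Explicitly specify the character set when establishing a client connection. For example, running the following statement right after connecting sets character_set_client, character_set_connection, and character_set_results to utf8mb4 for the current session:
```sql
SET NAMES utf8mb4;
```
- Be Careful During Data Migration
- Specify the correct character set when importing data, for example with the mysql client's --default-character-set option:
```bash
mysql --default-character-set=utf8mb4 -u username -p database_name < dump.sql
```
7. FAQ Section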
Frequently Asked Questions
- Will changing to utf8mb4 affect performance?
- Since utf8mb4 may increase data size, there can be a slight performance impact in very large-scale databases. However, in typical production environments, this rarely becomes a significant issue.
- Is there any risk when migrating from utf8 to utf8mb4?
- The migration process itself is not inherently risky. However, to prevent potential data loss or application issues during character conversion, it is essential to take a full backup beforehand.
- What changes when modifying the collation?
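- The collation determines how strings are compared and sorted, for example whether uppercase and lowercase or accented and unaccented letters are treated as equal. Changing it can therefore change the results of WHERE comparisons, ORDER BY, and uniqueness checks on indexed columns. For multilingual applications, utf8mb4_unicode_ci is recommended.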
8. Conclusion
In this article, we explained how to check MySQL character sets, how to configure and modify them, the differences between utf8 and utf8mb4, and how to prevent encoding issues. Character set configuration is a foundational aspect of database management, and proper settings directly contribute to preventing errors and improving performance. Use this guide as a reference to select and configure the appropriate character set for your project.


