- 1. Introduction
- 2. What Is a MySQL Character Set? Understanding the Basics
- 3. How to Check the Current Character Set
- 4. How to Change the MySQL Character Set
- 5. Troubleshooting After Changing the Character Set
- 6. How Character Set Changes Affect Performance
- 7. Recommended Settings (Best Practices)
- 8. FAQ
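1. Introduction
Why You Might Need to Change the MySQL Character Set
The database character set is an important setting that determines how text data is encoded for storage and how it is processed. Before MySQL 8.0, the default character set was latin1, which can cause problems when handling Japanese or other non-Latin characters (MySQL 8.0 changed the default to utf8mb4). Switching to an appropriate character set becomes especially important during data migrations or system standardization.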
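Common Problems and Their Causes
Typical problems related to MySQL character sets include the following.
- Mojibake (garbled characters)
  - utf8 and latin1 are mixed in the same environment
  - The client and server character set settings do not match
- Search problems
  - Expected search results are not returned because of collation differences
  - The sort order differs from what you expect
- Problems during data migration
  - Emoji and special symbols cannot be saved because utf8mb4 is not used
  - Character set conversion is not handled correctly during export/import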
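Goal and Structure of This Article
This article gives a complete explanation of changing MySQL character sets, from the basic concepts to changing the settings and troubleshooting.
Outline
- Basics of MySQL character sets
- How to check the current character set
- How to change the MySQL character set
- Troubleshooting after the change
- How character set changes affect performance
- Recommended settings (best practices)
- Frequently asked questions (FAQ)
After reading this guide, you will understand MySQL character sets better, be able to choose the right settings, and avoid common problems.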
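2. What Is a MySQL Character Set? Understanding the Basics
What Is a Character Set?
A character set is a set of rules for storing and processing characters as numeric data. For example, when storing the Japanese character "あ", UTF-8 represents it as the byte sequence E3 81 82, while Shift_JIS uses 82 A0.
In MySQL, you can specify different character sets at the database, table, and even column level. Choosing an appropriate character set prevents mojibake and makes internationalization smoother.
Common Character Sets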
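To see these byte sequences for yourself, the following Python sketch (Python 3.8 or later; the helper name `encoded_bytes` is illustrative) encodes the same character with Python's built-in `utf-8` and `shift_jis` codecs. It runs without a MySQL server and only demonstrates the encodings themselves, not how MySQL stores them.

```python
def encoded_bytes(text: str, encoding: str) -> str:
    """Encode `text` with the given Python codec and return the bytes as uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


# The same character produces two different byte sequences.
print(encoded_bytes("あ", "utf-8"))      # E3 81 82
print(encoded_bytes("あ", "shift_jis"))  # 82 A0
```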
| Character Set | Characteristics | Notes |
|---|---|---|
| utf8 | UTF-8, up to 3 bytes per character | Does not support some special characters (such as emoji) |
| utf8mb4 | UTF-8, up to 4 bytes per character | Supports emoji and special characters (recommended) |
| latin1 | Single-byte, ASCII-compatible (Western European) | Used in older systems |
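What Is a Collation?
A collation is a set of rules for comparing and sorting data within a character set. For example, it defines whether "A" and "a" are treated as the same character and how the sort order is determined.
Common Collations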
| Collation | Description |
|---|---|
| utf8_general_ci | Case-insensitive, suitable for general use |
| utf8_unicode_ci | Unicode-based collation, more accurate for multilingual text (its utf8mb4 counterpart, utf8mb4_unicode_ci, is recommended) |
| utf8mb4_bin | Binary comparison (use when exact matches are required) |
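As a rough analogy for what a collation decides, this Python sketch contrasts binary comparison (ordering by code point, similar in spirit to utf8mb4_bin) with case-insensitive comparison approximated by `str.casefold()`. It is only an illustration: Python's case folding is not MySQL's collation algorithm, and real `_ci` collations also apply accent and other Unicode rules.

```python
names = ["b", "A", "a", "B"]

# Binary comparison (the idea behind utf8mb4_bin): order by code point,
# so every uppercase ASCII letter sorts before every lowercase one.
binary_order = sorted(names)

# Case-insensitive comparison (the idea behind *_ci collations), approximated
# with str.casefold(); sorted() is stable, so ties keep their input order.
ci_order = sorted(names, key=str.casefold)


def ci_equal(x: str, y: str) -> bool:
    """True if x and y are equal when case is ignored."""
    return x.casefold() == y.casefold()


print(binary_order)        # ['A', 'B', 'a', 'b']
print(ci_order)            # ['A', 'a', 'b', 'B']
print(ci_equal("A", "a"))  # True
print("A" == "a")          # False
```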
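Difference Between utf8 and utf8mb4
MySQL's utf8 (an alias for utf8mb3) can store at most 3 bytes per character, so it cannot handle certain characters such as emoji or some extended CJK characters. utf8mb4, by contrast, supports up to 4 bytes per character, which is why it is recommended for modern applications.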
| Character Set | Max Bytes | Emoji Support | Recommendation |
|---|---|---|---|
| utf8 | 3 bytes | ❌ Not supported | ❌ Not recommended |
| utf8mb4 | 4 bytes | ✅ Supported | ✅ Recommended |
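The 3-byte limit can be checked before migrating data. A minimal sketch, assuming the text is available as a Python string: a character needs 4 bytes in UTF-8 exactly when its code point is above U+FFFF, so `fits_mysql_utf8` (a hypothetical helper name) tests for that.

```python
def utf8_byte_lengths(text: str) -> list:
    """Number of bytes each character needs in UTF-8 (1 to 4)."""
    return [len(ch.encode("utf-8")) for ch in text]


def fits_mysql_utf8(text: str) -> bool:
    """True if every character fits in MySQL's 3-byte utf8 (utf8mb3).

    Characters above U+FFFF (emoji, some extended CJK ideographs) need
    4 bytes in UTF-8 and therefore require utf8mb4.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


print(utf8_byte_lengths("Aあ😀"))  # [1, 3, 4]
print(fits_mysql_utf8("田中"))     # True
print(fits_mysql_utf8("👍"))       # False
print(fits_mysql_utf8("𠮷"))       # False (U+20BB7, an extended CJK character)
```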
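Why You Should Switch from utf8 to utf8mb4
- Future compatibility: Modern systems increasingly standardize on utf8mb4.
- Storing special characters and emoji: With utf8mb4, you can safely handle data from social media posts and messaging apps.
- Internationalization: For multilingual systems, it reduces the risk of mojibake.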
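Summary
- The character set determines how data is stored and processed.
- The collation determines how characters are compared.
- MySQL's utf8 is limited to 3 bytes, so utf8mb4 is recommended.
- utf8mb4_unicode_ci is a commonly recommended collation for general use.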
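3. How to Check the Current Character Set
Before changing the MySQL character set, it is important to check the current settings.
Because the character set can be set at several levels (server, database, table, column), you should know exactly where changes are needed.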
Check the Server-Wide Character Set
First, check the default character set and collation settings for the entire MySQL server.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Example output:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
+--------------------------+----------------------------+
Check the Character Set of Each Database
To check the character set of a specific database, use the following query.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'database_name';
Example output
+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| my_database | utf8mb4                    | utf8mb4_unicode_ci     |
+-------------+----------------------------+------------------------+
Check the Character Set of a Table
Here is how to check the character set of a specific table.
SHOW CREATE TABLE table_name;
Example output
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;
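Checkpoints
- DEFAULT CHARSET=latin1 → not utf8mb4, so it needs to be changed
- COLLATE=latin1_swedish_ci → usually better switched to utf8mb4_unicode_ci
Check the Character Set of a Column
To check column-level character sets, run the following SQL.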
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'database_name'
AND TABLE_NAME = 'table_name';
Example output
+-------------+--------------------+----------------------+
| COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME |
+-------------+--------------------+----------------------+
| name | latin1 | latin1_swedish_ci |
| email | utf8mb4 | utf8mb4_unicode_ci |
+-------------+--------------------+----------------------+
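In this example, the name column uses latin1, so it should be changed to utf8mb4.
Summary
- MySQL character sets are configured at multiple levels (server, database, table, column).
- Checking the character set at each level tells you where changes are needed.
- Use commands such as SHOW VARIABLES and SHOW CREATE TABLE to get a complete picture of the current configuration.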
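With many tables, reading this output by eye gets tedious. The sketch below assumes you have already fetched the rows of the column query above (for example with a Python DB-API cursor's `fetchall()`) as `(COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME)` tuples; `columns_needing_conversion` is a hypothetical helper that lists string columns not yet in utf8mb4. Non-string columns such as INT have NULL in CHARACTER_SET_NAME, which arrives as `None` and is skipped.

```python
from typing import Iterable, List, Optional, Tuple

# One row of: SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS ...
Row = Tuple[str, Optional[str], Optional[str]]


def columns_needing_conversion(rows: Iterable[Row], target: str = "utf8mb4") -> List[Row]:
    """Return the string columns whose character set is not `target` (hypothetical helper)."""
    result = []
    for column, charset, collation in rows:
        # Non-string columns (INT, DATE, ...) have CHARACTER_SET_NAME = NULL, i.e. None here.
        if charset is not None and charset != target:
            result.append((column, charset, collation))
    return result


rows = [
    ("id", None, None),
    ("name", "latin1", "latin1_swedish_ci"),
    ("email", "utf8mb4", "utf8mb4_unicode_ci"),
]
print(columns_needing_conversion(rows))  # [('name', 'latin1', 'latin1_swedish_ci')]
```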

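4. How to Change the MySQL Character Set
Changing the MySQL character set appropriately prevents mojibake and makes multilingual data easier to support.
This section explains how to update the settings at each level: server-wide, database, table, and column.
Change the Server-Wide Default Character Set
To change the server-wide default character set, edit the MySQL configuration file (my.cnf or my.ini). The server default applies only to databases created afterwards; existing databases, tables, and columns keep their current character set.
Steps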
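- Open the configuration file
  - On Linux (the location varies by distribution; RHEL-based systems use /etc/my.cnf):
    sudo nano /etc/mysql/my.cnf
  - On Windows, open:
    C:\ProgramData\MySQL\MySQL Server X.X\my.ini
- Add or change the character set settings
  Add or update the following lines under the [mysqld] section.
  [mysqld]
  character-set-server=utf8mb4
  collation-server=utf8mb4_unicode_ci
- Restart MySQL
  sudo systemctl restart mysql
  (on some distributions the service is named mysqld)
  On Windows (the service name may differ, for example MySQL80):
  net stop MySQL && net start MySQL
- Verify the change
  SHOW VARIABLES LIKE 'character_set_server';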
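To double-check an edited option file before restarting, a small Python sketch can parse it with the standard `configparser` module. Assumptions: the file is a simple INI-style my.cnf without `!include`/`!includedir` directives (which `configparser` does not understand), and `check_mysqld_charset` is a hypothetical helper. MySQL treats dashes and underscores in option names as equivalent, so the sketch normalizes underscores to dashes.

```python
import configparser


def check_mysqld_charset(cnf_text: str) -> dict:
    """Return the character-set options found in the [mysqld] section of my.cnf text.

    Assumes a simple option file without !include / !includedir directives.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,  # flags such as skip-character-set-client-handshake have no value
        interpolation=None,   # treat % literally
        strict=False,         # tolerate repeated sections or options
    )
    # Lowercase option names and normalize underscores to dashes.
    parser.optionxform = lambda name: name.lower().replace("_", "-")
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return {}
    section = parser["mysqld"]
    wanted = ("character-set-server", "collation-server")
    return {key: section.get(key) for key in wanted if key in section}


sample = """
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
"""
print(check_mysqld_charset(sample))
```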
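Change the Character Set at the Database Level
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This changes only the database default, which applies to tables created afterwards; existing tables keep their character set until you convert them at the table level.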
Verify the change
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'mydatabase';
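Change the Character Set at the Table Level
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This converts the data in every string column and rebuilds the table, so on large tables it can take a long time and blocks concurrent writes while it runs.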
Verify the change
SHOW CREATE TABLE users;
Change the Character Set at the Column Level
ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Verify the change
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydatabase'
AND TABLE_NAME = 'users';
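Because ALTER DATABASE does not convert the tables inside the database, every existing table needs its own ALTER TABLE ... CONVERT TO statement. A minimal Python sketch that generates these statements for a list of table names; the function names and the `posts` table are illustrative assumptions (in practice the table list might come from information_schema.TABLES), and the charset and collation arguments are assumed to be trusted constants. Review the generated SQL before running it.

```python
def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def build_conversion_sql(database, tables, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Return ALTER statements that set the database default and convert each table (hypothetical helper)."""
    db = quote_ident(database)
    statements = [f"ALTER DATABASE {db} CHARACTER SET {charset} COLLATE {collation};"]
    for table in tables:
        statements.append(
            f"ALTER TABLE {db}.{quote_ident(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


for sql in build_conversion_sql("mydatabase", ["users", "posts"]):
    print(sql)
```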
Post-Change Verification and the Importance of Backups
To preserve data integrity after changing the character set, follow these steps.
Back up your data (ideally before making any of the changes above)
mysqldump -u root -p --default-character-set=utf8mb4 mydatabase > backup.sql
Re-check the settings
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
SHOW CREATE TABLE users;
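Insert and display test data that contains multibyte characters and emoji (plain ASCII text displays correctly even in latin1, so it proves nothing)
INSERT INTO users (name, email) VALUES ('テストユーザー 😀', 'test@example.com');
SELECT * FROM users;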
Summary
- Server-wide character set change: Edit my.cnf and set character-set-server=utf8mb4
- Database character set change: ALTER DATABASE mydatabase CHARACTER SET utf8mb4
- Table character set change: ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4
- Column character set change: ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4
- After changes, always verify the settings and test your data
5. Troubleshooting After Changing the Character Set
After changing the MySQL character set, you may encounter cases where the system does not behave correctly or stored data becomes garbled.
In this section, we explain common issues and how to fix them in detail.
Causes of Mojibake and How to Fix It
If mojibake occurs after changing the character set, the following causes are common.
| Cause | How to Check | Solution |
|---|---|---|
| The client character set setting differs | SHOW VARIABLES LIKE 'character_set_client'; | Run SET NAMES utf8mb4; |
| Existing data was stored using a different encoding | SELECT HEX(column_name) FROM table_name; | Use CONVERT() or re-export the data |
| The connection encoding is not correct | Connect with mysql --default-character-set=utf8mb4 | Adjust the client-side character set configuration |
| Application settings (PHP/Python, etc.) are incorrect | Review the character set option in the application's connection code | Set utf8mb4 in the application, e.g. mysqli_set_charset($conn, 'utf8mb4'); |
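The HEX() check is easier to interpret with a quick decode. A minimal Python sketch, assuming you paste in the hex string returned by SELECT HEX(column_name); `decode_hex_as_utf8` is a hypothetical helper. If the bytes are not valid UTF-8, the value was probably stored in another encoding such as latin1 or Shift_JIS. Valid UTF-8 does not prove the data is correct, though: double-encoded text is also valid UTF-8.

```python
from typing import Optional


def decode_hex_as_utf8(hex_value: str) -> Optional[str]:
    """Interpret the string returned by SELECT HEX(column) as UTF-8.

    Returns the decoded text, or None if the bytes are not valid UTF-8,
    which suggests the value was stored in another encoding.
    """
    try:
        return bytes.fromhex(hex_value).decode("utf-8")
    except ValueError:  # invalid hex, or UnicodeDecodeError (a ValueError subclass)
        return None


print(decode_hex_as_utf8("E38182"))  # あ   (UTF-8 bytes)
print(decode_hex_as_utf8("82A0"))    # None (Shift_JIS bytes for あ)
print(decode_hex_as_utf8("E9"))      # None (é in latin1)
```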
Fix #1: Set the client character set correctly
SET NAMES utf8mb4;  -- sets character_set_client, character_set_connection, and character_set_results for this session
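Fix #2: Repair double-encoded data
UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);
This reverses data whose UTF-8 bytes were interpreted as latin1 and then converted again. Apply it only to rows that are actually garbled, and back up first: on correctly stored text, CONVERT(name USING latin1) replaces every character latin1 cannot represent with ?, destroying the data.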
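The nested CONVERT/CAST expression undoes one specific kind of damage: UTF-8 bytes that were read as latin1 characters and then stored again. The Python sketch below reproduces that round trip under one simplifying assumption: it uses Python's `latin-1` codec, whereas MySQL's latin1 is actually the cp1252 variant, so a few byte values map to different characters. The function names are illustrative.

```python
def double_encode(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as latin1 characters."""
    return text.encode("utf-8").decode("latin-1")


def fix_double_encoded(garbled: str) -> str:
    """Python analogue of CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4):
    turn each character back into its latin1 byte, then read those bytes as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "田中 😀"
garbled = double_encode(original)
print(garbled == original)                      # False: this is the mojibake
print(fix_double_encoded(garbled) == original)  # True
```

Calling `fix_double_encoded("田中")` on correct text raises `UnicodeEncodeError` in Python, whereas MySQL silently substitutes `?`, which is why the SQL version must be limited to rows that are actually garbled.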
Notes When Converting from latin1 to utf8mb4
Safe procedure
- Back up current data
mysqldump -u root -p --default-character-set=latin1 mydatabase > backup.sql
- Change the database character set
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
- Change the table character set
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
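- Re-import the data
mysql -u root -p --default-character-set=utf8mb4 mydatabase < backup.sql
Note that backup.sql still contains SET NAMES latin1 and the original CREATE TABLE statements with DEFAULT CHARSET=latin1, so importing it unchanged recreates the tables as latin1 and undoes the previous steps. Update those declarations in the dump before re-importing.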
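Whether a re-import is needed depends on what the latin1 columns really contain. If they hold genuine latin1 text, ALTER TABLE ... CONVERT TO converts it correctly on its own. If they actually hold UTF-8 bytes, the dump taken with --default-character-set=latin1 preserves those bytes, but its declarations still say latin1. A minimal sketch of a hypothetical helper that rewrites only those declarations, assuming the default mysqldump output format; it does not handle column-level CHARACTER SET clauses and would also rewrite matching text inside the data, so inspect the result before importing.

```python
import re


def retarget_dump(sql_text: str, old: str = "latin1", new: str = "utf8mb4",
                  new_collation: str = "utf8mb4_unicode_ci") -> str:
    """Rewrite charset declarations in mysqldump output (hypothetical helper).

    Only the SET NAMES statement and table-level CHARSET/COLLATE clauses are
    changed; the data bytes are left as they are.
    """
    old_re = re.escape(old)
    sql_text = re.sub(rf"SET NAMES {old_re}\b", f"SET NAMES {new}", sql_text)
    sql_text = re.sub(rf"COLLATE={old_re}_\w+", f"COLLATE={new_collation}", sql_text)
    sql_text = re.sub(rf"CHARSET={old_re}\b", f"CHARSET={new}", sql_text)
    return sql_text


dump = (
    "/*!40101 SET NAMES latin1 */;\n"
    "CREATE TABLE `users` (\n"
    "  `name` varchar(255) DEFAULT NULL\n"
    ") ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;\n"
)
print(retarget_dump(dump))
```

When applying this to a real backup.sql, read and write the file with `encoding="latin-1"` so that every byte, including the UTF-8 bytes stored in the data, passes through unchanged.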
Data Is Not Searchable After the Change
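Case #1: LIKE search does not work
SELECT * FROM users WHERE name COLLATE utf8mb4_unicode_ci LIKE '%Tanaka%';
When the column and the search string use different collations, MySQL may reject the comparison with an "Illegal mix of collations" error; an explicit COLLATE clause makes both sides use the same collation.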
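Case #2: Sort order changed
SELECT * FROM users ORDER BY BINARY name;
This sorts by raw byte values instead of by the collation. The BINARY operator is deprecated as of MySQL 8.0.27; ORDER BY CAST(name AS BINARY) is the equivalent replacement.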
Application-Side Measures
For PHP
mysqli_set_charset($conn, 'utf8mb4');
For Python (MySQL Connector)
import mysql.connector

conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="password",
    database="mydatabase",
    charset="utf8mb4",
)
For Node.js (MySQL2)
const mysql = require('mysql2');

const connection = mysql.createConnection({
  host: 'localhost',
  user: 'root',
  password: 'password',
  database: 'mydatabase',
  charset: 'utf8mb4'
});
Summary
- Post-change issues generally fall into three categories: client settings, data conversion, and application settings.
- To prevent mojibake, standardize the client character set with SET NAMES utf8mb4.
- Watch for changes in LIKE search results and sort order, and specify COLLATE when needed.
- Set utf8mb4 in your application as well to avoid encoding mismatches.
6. How Character Set Changes Affect Performance
When changing the MySQL character set to utf8mb4, there are several performance considerations, such as increased storage usage and index limitations.
In this section, we explain the impact and the best countermeasures.
Increased Storage Usage
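Compared to MySQL's utf8, utf8mb4 can use up to 4 bytes per character, so the overall table size may increase.
In practice, only characters that need 4 bytes (such as emoji) take more space; ASCII and Japanese text occupy the same number of bytes in both character sets.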
Max bytes per character by character set
| Character Set | Max Bytes per Character |
|---|---|
| latin1 | 1 byte |
| utf8 | 3 bytes |
| utf8mb4 | 4 bytes |
For example, with utf8, VARCHAR(255) is up to 765 bytes (255×3),
but with utf8mb4, it becomes up to 1020 bytes (255×4).
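The 765 and 1020 figures are worst cases. VARCHAR stores only the bytes a value actually uses (plus a 1- or 2-byte length prefix), so the real size depends on the text. This Python sketch contrasts the two numbers; the per-character maximums come from the table above, and the helper names are illustrative.

```python
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_varchar_bytes(length: int, charset: str) -> int:
    """Worst-case data size of VARCHAR(length), excluding the 1- or 2-byte length prefix."""
    return length * MAX_BYTES_PER_CHAR[charset]


def actual_utf8_bytes(text: str) -> int:
    """Bytes a value really occupies in utf8/utf8mb4, which are variable-length."""
    return len(text.encode("utf-8"))


print(max_varchar_bytes(255, "utf8"))     # 765
print(max_varchar_bytes(255, "utf8mb4"))  # 1020
print(actual_utf8_bytes("Tanaka"))        # 6
print(actual_utf8_bytes("田中"))          # 6, the same in utf8 and utf8mb4
print(actual_utf8_bytes("田中😀"))        # 10, only the emoji needs 4 bytes
```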
Countermeasure: if the data allows it, reduce the declared length of columns that do not need 255 characters
ALTER TABLE posts MODIFY COLUMN title VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Increased Index Size
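InnoDB limits the length of an index key: 767 bytes for tables using the REDUNDANT or COMPACT row format, and 3072 bytes for DYNAMIC or COMPRESSED (DYNAMIC has been the default row format since MySQL 5.7.9).
After switching to utf8mb4, index keys become larger, and creating or rebuilding an index on a long column can fail once the key exceeds the limit.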
Check index impact
SHOW INDEX FROM users;
Example error
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
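Countermeasure: shorten the indexed column to VARCHAR(191), the longest length whose full value fits within 767 bytes at 4 bytes per character
ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;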
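The 191 figure comes from integer division of the key limit by the maximum bytes per character. A short Python sketch of that arithmetic, assuming the InnoDB limits of 767 bytes (REDUNDANT/COMPACT row formats) and 3072 bytes (DYNAMIC/COMPRESSED); the function name is illustrative.

```python
def max_indexable_chars(key_limit_bytes: int, bytes_per_char: int) -> int:
    """Longest VARCHAR(n) whose full value fits in an index key of key_limit_bytes."""
    return key_limit_bytes // bytes_per_char


print(max_indexable_chars(767, 3))   # 255: utf8 VARCHAR(255) fits (765 bytes)
print(max_indexable_chars(767, 4))   # 191: utf8mb4 VARCHAR(191) uses 764 bytes
print(max_indexable_chars(3072, 4))  # 768: DYNAMIC/COMPRESSED row formats
```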
Impact on Query Performance
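Changing the character set to utf8mb4 may affect query execution speed, mainly because larger keys and values mean more data to read, compare, and sort.
Operations that may be affected
- LIKE searches over large datasets
- ORDER BY processing
- JOIN query performance, especially when joining a utf8 column to a utf8mb4 column, because the implicit conversion can prevent index use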
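Countermeasure: index only the first characters of a long column (a prefix index)
CREATE INDEX idx_name ON users(name(100));
A prefix index helps searches such as LIKE 'Tanaka%', but not patterns with a leading wildcard such as LIKE '%Tanaka%'.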
Memory Usage and Buffer Tuning
With utf8mb4, memory usage may increase.
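Example settings (size innodb_buffer_pool_size to the memory available on your server)
[mysqld]
innodb_buffer_pool_size = 1G
# The query cache exists only in MySQL 5.7 and earlier; it was removed in MySQL 8.0,
# where this option prevents the server from starting.
# query_cache_size = 128M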
Summary
- Switching to utf8mb4 can increase storage usage.
- Index sizes increase and may exceed limits.
- Query performance can be affected.
- Because memory usage may increase, buffer sizes may need tuning.
7. Recommended Settings (Best Practices)
By setting MySQL character sets appropriately, you can maintain data integrity while optimizing performance.
In this section, we present recommended MySQL character set configurations and explain key points for an optimal setup.
Recommended MySQL Character Set Configuration
| Item | Recommended Setting | Reason |
|---|---|---|
| Character Set | utf8mb4 | Supports all Unicode characters including emoji and special characters |
| Collation | utf8mb4_unicode_ci | Case-insensitive and suitable for multilingual systems |
| Storage Engine | InnoDB | Good balance of performance and consistency |
| Indexed string length | VARCHAR(191) | Avoids exceeding the 767-byte index key limit of the REDUNDANT and COMPACT row formats |
Recommended my.cnf Settings
1. MySQL Server Character Set Settings
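[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
# init-connect is not executed for accounts with the SUPER or CONNECTION_ADMIN privilege
init-connect='SET NAMES utf8mb4'
# Ignore the character set requested by the client and use the server default
skip-character-set-client-handshake
innodb_file_per_table = 1
innodb_buffer_pool_size = 1G
# The following options exist only in MySQL 5.7 and earlier. They were removed in
# MySQL 8.0, where leaving them uncommented prevents the server from starting.
# innodb_large_prefix = ON
# innodb_file_format = Barracuda
# query_cache_size = 128M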
2. Client-Side Character Set Settings
[client]
default-character-set = utf8mb4
Recommended Database Settings
CREATE DATABASE mydatabase DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
To change an existing database character set:
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Recommended Table Settings
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Change the Character Set of an Existing Table
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Differences Between utf8mb4_general_ci and utf8mb4_unicode_ci
| Collation | Characteristics | Use Case |
|---|---|---|
| utf8mb4_general_ci | Faster comparisons, but less accurate | Performance-focused systems |
| utf8mb4_unicode_ci | Unicode-standard, more accurate comparisons | General-purpose use (recommended) |
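✅ If you need multilingual support or accurate sorting, choose utf8mb4_unicode_ci. For example, utf8mb4_unicode_ci treats the German ß as equal to ss, whereas utf8mb4_general_ci treats it as equal to s. On MySQL 8.0, the default utf8mb4 collation is utf8mb4_0900_ai_ci, which is based on a newer version of the Unicode Collation Algorithm.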
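Index Optimization
CREATE FULLTEXT INDEX idx_fulltext ON articles(content);
For Japanese or other CJK text, which has no spaces between words, use the ngram full-text parser instead:
CREATE FULLTEXT INDEX idx_fulltext ON articles(content) WITH PARSER ngram;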
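Summary
- The combination of utf8mb4 + utf8mb4_unicode_ci is recommended.
- Keep the server settings (my.cnf) and the connection character set consistent.
- Explicitly specify utf8mb4 at the database, table, and column levels.
- Use VARCHAR(191) for indexed columns to avoid index key length limits.
- Use utf8mb4_unicode_ci for accurate comparisons.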
8. FAQ
Here are common practical questions about changing the MySQL character set, including how to handle errors and how to choose the best settings.
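What is the difference between utf8 and utf8mb4?
MySQL's utf8 stores at most 3 bytes per character and cannot hold emoji, while utf8mb4 stores up to 4 bytes and supports all Unicode characters. To check the server's default character set:
SHOW VARIABLES LIKE 'character_set_server';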
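Will changing the MySQL character set cause data loss?
Converting from utf8 or latin1 to utf8mb4 does not drop characters, since utf8mb4 can represent everything those character sets can. A conversion based on a wrong assumption about the stored encoding can still garble data, so take a backup first:
mysqldump -u root -p --default-character-set=utf8mb4 mydatabase > backup.sql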
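How do I fix mojibake?
If the text was double-encoded (UTF-8 bytes treated as latin1 and converted again), convert it back through latin1 and binary. Test on a copy first, because this also damages rows that were stored correctly:
UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);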
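What are the risks when converting latin1 to utf8mb4?
The main risk is that the latin1 columns actually contain bytes in another encoding (often UTF-8), in which case a straight conversion produces mojibake. Inspect the raw bytes with HEX() and back up before running:
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;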
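Does switching to utf8mb4 affect performance?
Storage and index sizes can grow, and indexed columns may exceed the index key length limit. Shortening indexed columns avoids this:
ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;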
Should I use utf8mb4_general_ci or utf8mb4_unicode_ci?
| Collation | Characteristics | Use Case |
|---|---|---|
| utf8mb4_general_ci | Faster comparisons, but less accurate | Performance-focused systems |
| utf8mb4_unicode_ci | Unicode-standard, accurate comparisons | General-purpose use (recommended) |
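Will queries become slower after switching to utf8mb4?
They can, because keys and sort data are larger. Appropriate indexes, such as a full-text index for text searches, help keep queries fast:
CREATE FULLTEXT INDEX idx_fulltext ON articles(content);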
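Summary
✅ Use utf8mb4. utf8 is not recommended because of its limitations.
✅ Always check the settings with SHOW VARIABLES before making changes.
✅ Use an export/import workflow to prevent mojibake.
✅ Keep index limits in mind and use VARCHAR(191) where appropriate.
✅ Add appropriate indexes to improve performance.
Final Notes
Changing the MySQL character set is more than a simple configuration tweak: it is a critical task that can affect data integrity and performance. By following the correct settings and procedures, you can migrate to utf8mb4 safely and effectively.
🔹 Follow the steps in this article to configure your character set correctly! 🔹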


