- 1. Introduction
- 2. What Is a MySQL Character Set? Understanding the Basics
- 3. How to Check the Current Character Set
- 4. How to Change the MySQL Character Set
- 5. Troubleshooting After Changing the Character Set
- 6. How Character Set Changes Affect Performance
- 7. Recommended Settings (Best Practices)
- 8. FAQ
- 8.1 What’s the difference between utf8 and utf8mb4?
- 8.2 Will changing the MySQL character set cause data loss?
- 8.3 How do I fix mojibake if it happens?
- 8.4 What are the risks when converting from latin1 to utf8mb4?
- 8.5 Does switching to utf8mb4 affect performance?
- 8.6 Which should I use: utf8mb4_general_ci or utf8mb4_unicode_ci?
- 8.7 Will queries become slower after switching to utf8mb4?
- 8.8 Summary
- 8.9 Final Notes
1. Introduction
Why You May Need to Change the MySQL Character Set
A database character set is an important setting that determines how text data is encoded for storage and how it is processed. In MySQL 5.7 and earlier, the default character set is latin1, which cannot store Japanese and many other non-Western characters (MySQL 8.0 changed the default to utf8mb4). Switching to an appropriate character set becomes essential, especially during data migration or system standardization.
Common Issues and Their Causes
Typical problems related to MySQL character sets include the following.
- Mojibake (garbled characters)
  - utf8 and latin1 are mixed in the same environment
  - The client and server character set settings do not match
- Problems when searching
  - Due to collation differences, the expected search results are not returned
  - Sort order differs from what you expect
- Problems during data migration
  - Emoji and special symbols cannot be saved because utf8mb4 is not used
  - Character set conversion is not handled properly during export/import
Article Goals and Structure
This article provides a comprehensive explanation of MySQL character set changes, from fundamental concepts to how to change settings and troubleshooting.
Outline
- Basic knowledge of MySQL character sets
- How to check the current character set
- How to change the MySQL character set
- Troubleshooting after the change
- How character set changes affect performance
- Recommended settings (best practices)
- FAQ (frequently asked questions)
By reading this guide, you will deepen your understanding of MySQL character sets and be able to choose the right settings and avoid common issues.
2. What Is a MySQL Character Set? Understanding the Basics
What Is a Character Set?
A character set is a set of rules for encoding characters as bytes so that they can be stored and processed as digital data. For example, the Japanese character “あ” is stored as the byte sequence E3 81 82 in UTF-8, but as 82 A0 in Shift_JIS.
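The byte sequences above can be reproduced with Python's standard codecs. This is only an illustration of the encoding rules; MySQL performs the equivalent conversion internally.

```python
# Encode the same character with two different character sets.
char = "あ"

utf8_bytes = char.encode("utf-8")      # 3 bytes in UTF-8
sjis_bytes = char.encode("shift_jis")  # 2 bytes in Shift_JIS

print(utf8_bytes.hex(" ").upper())  # E3 81 82
print(sjis_bytes.hex(" ").upper())  # 82 A0

# Reading UTF-8 bytes as if they were Shift_JIS does not give back "あ";
# this kind of mismatch is what produces mojibake.
misread = utf8_bytes.decode("shift_jis", errors="replace")
print(misread != char)  # True
```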
In MySQL, you can specify character sets at the server, database, table, and column levels. By choosing an appropriate character set, you can prevent mojibake and make internationalization smoother.
Common Character Sets
| Character Set | Characteristics | Notes |
|---|---|---|
| utf8 | UTF-8, up to 3 bytes per character | Cannot store some characters, such as emoji |
| utf8mb4 | UTF-8, up to 4 bytes per character | Supports emoji and all other Unicode characters (recommended) |
| latin1 | Single-byte, Western European characters only (ASCII-compatible) | Used in older systems |
What Is Collation?
Collation is the rule set used to compare and sort data within a character set. For example, it defines whether “A” and “a” are treated as the same character and how ordering is determined.
Commonly Used Collations
| Collation | Description |
|---|---|
| utf8mb4_general_ci | Case-insensitive with simplified rules; suitable for general use |
| utf8mb4_unicode_ci | Case-insensitive, based on the Unicode Collation Algorithm (recommended) |
| utf8mb4_bin | Binary comparison (use when exact, case-sensitive matches are required) |
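As a rough analogy (not MySQL's actual collation algorithm), a case-insensitive (_ci) collation behaves like comparing case-folded strings, while a binary (_bin) collation compares the encoded values exactly. This Python sketch, with hypothetical helper names, shows how that difference affects both equality and sort order:

```python
words = ["b", "A", "a", "B"]

def ci_equal(x: str, y: str) -> bool:
    """Approximates a _ci collation: ignore letter case."""
    return x.casefold() == y.casefold()

def bin_equal(x: str, y: str) -> bool:
    """Approximates a _bin collation: compare the encoded bytes exactly."""
    return x.encode("utf-8") == y.encode("utf-8")

print(ci_equal("A", "a"), bin_equal("A", "a"))  # True False

# Binary order puts all uppercase ASCII letters before lowercase ones.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))
print(binary_order)  # ['A', 'B', 'a', 'b']

# Case-insensitive order groups "A"/"a" and "b"/"B" together
# (Python keeps ties in their original order).
ci_order = sorted(words, key=str.casefold)
print(ci_order)  # ['A', 'a', 'b', 'B']
```

Note that MySQL does not guarantee any particular order among values that a _ci collation considers equal, so "A" and "a" may come back in either order.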
Differences Between utf8 and utf8mb4
MySQL’s utf8 (an alias for utf8mb3, which is deprecated in MySQL 8.0) stores at most 3 bytes per character, so it cannot handle characters outside the Basic Multilingual Plane, such as emoji and some extended CJK characters. In contrast, utf8mb4 supports up to 4 bytes per character and covers all of Unicode, which is why it is recommended for modern applications.
| Character Set | Max Bytes | Emoji Support | Recommendation |
|---|---|---|---|
| utf8 | 3 bytes | ❌ Not supported | ❌ Not recommended |
| utf8mb4 | 4 bytes | ✅ Supported | ✅ Recommended |
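Whether a value needs utf8mb4 can be decided from its UTF-8 byte lengths: any character that encodes to 4 bytes (a code point above U+FFFF) cannot be stored in utf8. A minimal Python sketch of that check, using a hypothetical helper name:

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character needs 4 bytes in UTF-8, i.e. cannot be stored in utf8 (utf8mb3)."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)

# "𠮷" (U+20BB7) is an extended CJK character used in Japanese names.
for sample in ["hello", "あいう", "😀", "𠮷"]:
    sizes = [len(ch.encode("utf-8")) for ch in sample]
    print(sample, sizes, needs_utf8mb4(sample))
```

ASCII and common Japanese characters fit in 1 to 3 bytes, so only the emoji and the extended CJK character are flagged.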
Why You Should Switch from utf8 to utf8mb4
- Future compatibility: Modern systems are increasingly standardizing on utf8mb4.
- Storing special characters and emoji: With utf8mb4, you can safely handle content such as social media posts and chat messages.
- Internationalization: For multilingual systems, it reduces the risk of mojibake.
Summary
- Character set determines how data is stored and processed.
- Collation determines how characters are compared.
- MySQL’s utf8 is limited to 3 bytes per character, so utf8mb4 is recommended.
- utf8mb4_unicode_ci is a commonly recommended collation for general use.
3. How to Check the Current Character Set
Before changing the MySQL character set, it is important to check the current settings.
Because character sets can be set at multiple levels (database, table, column), you should understand exactly where changes are needed.
Check the MySQL Server-Wide Character Set
First, check the default character set and collation settings for the entire MySQL server.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Example output:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
+--------------------------+----------------------------+
Check the Character Set per Database
To check the character set for a specific database, use the following query.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'database_name';
Example output
+----------------+----------------------+----------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------+----------------------+----------------------+
| my_database | utf8mb4 | utf8mb4_unicode_ci |
+----------------+----------------------+----------------------+
Check a Table’s Character Set
This is how to check the character set for a specific table.
SHOW CREATE TABLE table_name;
Example output
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;
Checkpoints
- DEFAULT CHARSET=latin1 → Not utf8mb4, so a change is needed
- COLLATE=latin1_swedish_ci → Switching to utf8mb4_unicode_ci is usually more appropriate
Check a Column’s Character Set
To inspect the character set at the column level, run the following SQL.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'database_name'
AND TABLE_NAME = 'table_name';
Example output
+-------------+--------------------+----------------------+
| COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME |
+-------------+--------------------+----------------------+
| name | latin1 | latin1_swedish_ci |
| email | utf8mb4 | utf8mb4_unicode_ci |
+-------------+--------------------+----------------------+
In this example, the name column uses latin1, so it is recommended to change it to utf8mb4.
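When a schema has many tables, the same information_schema.COLUMNS query can be run from a script to list the columns that still need conversion. A minimal sketch, assuming a DB-API connection such as the mysql.connector connection shown in Section 5; the helper names are hypothetical:

```python
COLUMN_QUERY = """
    SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
    FROM information_schema.COLUMNS
    WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s
"""

def fetch_column_charsets(conn, schema: str, table: str):
    """Run the information_schema query with bound parameters and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(COLUMN_QUERY, (schema, table))
        return cursor.fetchall()
    finally:
        cursor.close()

def columns_to_convert(rows, target: str = "utf8mb4"):
    """Return names of character columns whose character set differs from target.

    Non-character columns (INT, DATETIME, ...) have CHARACTER_SET_NAME = NULL
    (None in Python) and are skipped.
    """
    return [name for name, charset, _collation in rows
            if charset is not None and charset != target]

# Rows in the same shape as the example output above, plus a non-character column.
example_rows = [
    ("id", None, None),
    ("name", "latin1", "latin1_swedish_ci"),
    ("email", "utf8mb4", "utf8mb4_unicode_ci"),
]
print(columns_to_convert(example_rows))  # ['name']
```

Against a live server, the two helpers combine as columns_to_convert(fetch_column_charsets(conn, "database_name", "table_name")).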
Summary
- MySQL character sets are configured at multiple levels (server, database, table, column).
- By checking the character set at each level, you can apply the right changes.
- Use SHOW VARIABLES, SHOW CREATE TABLE, and information_schema queries to fully understand the current configuration.

4. How to Change the MySQL Character Set
By changing the MySQL character set appropriately, you can prevent mojibake and support multilingual data more smoothly.
In this section, we explain how to update settings at each level: server-wide, database, table, and column.
Change the Server-Wide Default Character Set
To change the server-wide default character set, you need to edit the MySQL configuration file (my.cnf or my.ini).
Steps
- Open the configuration file
  - On Linux (the path varies by distribution; RHEL-based systems typically use /etc/my.cnf):
    sudo nano /etc/mysql/my.cnf
  - On Windows: open C:\ProgramData\MySQL\MySQL Server X.X\my.ini
- Add or change the character set settings
  Add or update the following lines under the [mysqld] section.
    [mysqld]
    character-set-server=utf8mb4
    collation-server=utf8mb4_unicode_ci
- Restart MySQL
  On Linux (the service may be named mysqld on RHEL-based systems):
    sudo systemctl restart mysql
  On Windows (the service name may differ, for example MySQL80):
    net stop MySQL && net start MySQL
- Verify the change
    SHOW VARIABLES LIKE 'character_set_server';
Change the Character Set at the Database Level
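ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This only changes the default for tables created afterward; existing tables keep their current character set until you convert them at the table level.
Verify the change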
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'mydatabase';
Change the Character Set at the Table Level
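ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CONVERT TO changes the table default and every character column, re-encoding the existing data. It rebuilds the table, which can take a long time on large tables, and it may change TEXT columns to a larger type (for example, TEXT to MEDIUMTEXT) so that they can still hold the same number of characters.
Verify the change
SHOW CREATE TABLE users;
Change the Character Set at the Column Level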
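ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
MODIFY COLUMN replaces the entire column definition, so repeat any existing attributes such as NOT NULL, DEFAULT, or COMMENT; otherwise they are dropped.
Verify the change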
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydatabase'
AND TABLE_NAME = 'users';
Post-Change Verification and the Importance of Backups
To preserve data integrity, take a backup before changing the character set, and verify the result afterward.
Back up your data (before making any changes)
mysqldump -u root -p --default-character-set=utf8mb4 mydatabase > backup.sql
Re-check the settings
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
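SHOW CREATE TABLE users;
Insert and display test data that contains non-ASCII characters and emoji (ASCII-only values look correct under almost any character set, so they do not prove the change worked)
INSERT INTO users (name, email) VALUES ('テストユーザー 😀', 'test@example.com');
SELECT * FROM users;
Summary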
- Server-wide character set change: Edit my.cnf and set character-set-server=utf8mb4
- Database character set change: ALTER DATABASE mydatabase CHARACTER SET utf8mb4
- Table character set change: ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4
- Column character set change: ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4
- After changes, always verify the settings and test your data
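When many tables need the table-level conversion, the ALTER TABLE statements can be generated rather than typed by hand. A minimal Python sketch with hypothetical helper names; review the generated SQL and take a backup before running any of it:

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"

def build_convert_statements(tables, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Return one ALTER TABLE ... CONVERT TO statement per table name."""
    return [
        f"ALTER TABLE {quote_identifier(t)} CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for t in tables
    ]

for stmt in build_convert_statements(["users", "posts"]):
    print(stmt)
```

The table names could come from a query such as SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'mydatabase' AND TABLE_TYPE = 'BASE TABLE', which excludes views.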
5. Troubleshooting After Changing the Character Set
After changing the MySQL character set, you may encounter cases where the system does not behave correctly or stored data becomes garbled.
In this section, we explain common issues and how to fix them in detail.
Causes of Mojibake and How to Fix It
If mojibake occurs after changing the character set, the following causes are common.
| Cause | How to Check | Solution |
|---|---|---|
| The client character set setting differs | SHOW VARIABLES LIKE 'character_set_client'; | Run SET NAMES utf8mb4; |
| Existing data was stored using a different encoding | SELECT HEX(column_name) FROM table_name; | Use CONVERT() or re-export the data |
| The connection encoding is not correct | Connect with mysql --default-character-set=utf8mb4 and compare the output | Adjust the client-side character set configuration |
| Application settings (PHP/Python, etc.) are incorrect | Check the charset option in the application’s database connection settings | Set utf8mb4 in the driver, for example mysqli_set_charset($conn, 'utf8mb4'); in PHP |
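The output of SELECT HEX(column_name) in the table above can be interpreted with a simple heuristic: if the bytes are valid UTF-8 and undoing one latin1 step yields different valid UTF-8, the value was probably double-encoded. A Python sketch of that heuristic, with a hypothetical helper name; it is not a definitive test, and because MySQL's latin1 is actually cp1252, bytes in the 0x80 to 0x9F range may behave differently in MySQL:

```python
def diagnose_hex(hex_value: str) -> str:
    """Classify a HEX() result from a utf8mb4 column (heuristic only)."""
    raw = bytes.fromhex(hex_value)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # For example, E9 is "é" in latin1 but is not valid UTF-8 on its own.
        return "not valid UTF-8 (possibly latin1 data)"
    try:
        # Undo one latin1 step, like CONVERT(col USING latin1) followed by
        # reading the resulting bytes as UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "valid UTF-8"
    return "likely double-encoded" if repaired != text else "valid UTF-8"

# "é" stored correctly, "é" double-encoded, "é" as latin1, "あ" stored correctly
for value in ["C3A9", "C383C2A9", "E9", "E38182"]:
    print(value, "->", diagnose_hex(value))
```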
Fix #1: Set the client character set correctly
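SET NAMES utf8mb4;
Fix #2: Convert existing data properly
The following statement repairs double-encoded text, that is, UTF-8 bytes that were stored in a latin1 column and then converted to utf8mb4. Run it only on affected rows, and test it on a copy first: applied to correctly encoded text, it corrupts the data (for example, characters with no latin1 equivalent become ?).
UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);
Notes When Converting from latin1 to utf8mb4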
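Safe procedure
- Back up current data
mysqldump -u root -p --default-character-set=latin1 mydatabase > backup.sql
- Change the database character set
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
- Change the table character set
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
- Re-import the data (only when needed)
If the latin1 columns held correctly encoded latin1 text, the ALTER TABLE step has already converted it, and no re-import is required. If they actually held UTF-8 bytes (check with HEX() beforehand), the ALTER TABLE step double-encodes them; in that case, restore from the backup instead. The unmodified dump would recreate the tables as latin1, so first edit its SET NAMES line and its CHARSET and COLLATE declarations to utf8mb4, then import it:
mysql -u root -p --default-character-set=utf8mb4 mydatabase < backup.sql
Data Is Not Searchable After the Change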
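Case #1: LIKE search does not work
When the column and the search string use different collations, LIKE may return unexpected results or fail with an "Illegal mix of collations" error. Specifying the collation explicitly resolves this:
SELECT * FROM users WHERE name COLLATE utf8mb4_unicode_ci LIKE '%Tanaka%';
Case #2: Sort order changed
A new collation can order strings differently. To sort by raw code point order instead of linguistic rules, use a binary collation (the older ORDER BY BINARY name form relies on the BINARY operator, which is deprecated as of MySQL 8.0.27):
SELECT * FROM users ORDER BY name COLLATE utf8mb4_bin;
Application-Side Measures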
For PHP
mysqli_set_charset($conn, 'utf8mb4');
For Python (MySQL Connector)
import mysql.connector
conn = mysql.connector.connect(
host="localhost",
user="root",
password="password",
database="mydatabase",
charset="utf8mb4"
)
For Node.js (MySQL2)
const mysql = require('mysql2');
const connection = mysql.createConnection({
host: 'localhost',
user: 'root',
password: 'password',
database: 'mydatabase',
charset: 'utf8mb4'
});
Summary
- Post-change issues generally fall into three categories: client settings, data conversion, and application settings.
- To prevent mojibake, standardize the client character set using SET NAMES utf8mb4.
- Watch for changes in LIKE search results and sort order, and specify COLLATE when needed.
- Set utf8mb4 in your application as well to avoid encoding mismatches.
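To confirm from application code that a connection really uses utf8mb4, one option is to run SHOW VARIABLES LIKE 'character_set%' through that connection and compare the results. A Python sketch of the comparison step, with a hypothetical helper name; it assumes the driver returns (name, value) pairs of strings, like the example output in Section 3:

```python
# Variables that should all be utf8mb4 on a fully converted setup.
# character_set_filesystem (usually binary) and character_set_system
# (utf8/utf8mb3) are expected to differ, so they are not checked.
CHECKED_VARIABLES = {
    "character_set_client",
    "character_set_connection",
    "character_set_results",
    "character_set_database",
    "character_set_server",
}

def find_charset_mismatches(rows, expected: str = "utf8mb4") -> dict:
    """Return {variable: value} for checked variables that differ from expected."""
    return {name: value for name, value in rows
            if name in CHECKED_VARIABLES and value != expected}

# Example rows as they might look when the client charset was not configured.
rows = [
    ("character_set_client", "latin1"),
    ("character_set_connection", "latin1"),
    ("character_set_database", "utf8mb4"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "latin1"),
    ("character_set_server", "utf8mb4"),
    ("character_set_system", "utf8mb3"),
]
print(find_charset_mismatches(rows))
```

With mysql.connector, the rows would come from cursor.execute("SHOW VARIABLES LIKE 'character_set%'") followed by cursor.fetchall(); if a driver returns bytes instead of str, decode the values first.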
6. How Character Set Changes Affect Performance
When changing the MySQL character set to utf8mb4, there are several performance considerations, such as increased storage usage and index limitations.
In this section, we explain the impact and the best countermeasures.
Increased Storage Usage
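Compared to MySQL’s utf8, utf8mb4 can use up to 4 bytes per character, so tables may grow. For VARCHAR and TEXT columns, the space used depends on the actual content: ASCII characters still take 1 byte each, and only characters that need 4 bytes (such as emoji) take more space than in utf8. When converting from latin1, however, every non-ASCII character grows from 1 byte to 2 or more bytes.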
Max bytes per character by character set
| Character Set | Max Bytes per Character |
|---|---|
| latin1 | 1 byte |
| utf8 | 3 bytes |
| utf8mb4 | 4 bytes |
For example, with utf8, VARCHAR(255) is up to 765 bytes (255×3),
but with utf8mb4, it becomes up to 1020 bytes (255×4).
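The difference between the maximum size and the space a value actually uses can be illustrated in Python. The per-character maximums come from the table above; the helper names are hypothetical, and the 1 to 2 length-prefix bytes that VARCHAR adds are not counted:

```python
# Maximum bytes per character for each MySQL character set (utf8 = utf8mb3).
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

def max_value_bytes(char_length: int, charset: str) -> int:
    """Worst-case byte size of a VARCHAR(char_length) value."""
    return char_length * MAX_BYTES_PER_CHAR[charset]

def actual_value_bytes(text: str) -> int:
    """Bytes a value actually occupies when encoded as UTF-8 (utf8mb4)."""
    return len(text.encode("utf-8"))

print(max_value_bytes(255, "utf8"))      # 765
print(max_value_bytes(255, "utf8mb4"))   # 1020
print(actual_value_bytes("hello"))       # 5: ASCII text is no larger
print(actual_value_bytes("こんにちは"))  # 15: 3 bytes per character
print(actual_value_bytes("👍"))          # 4: only 4-byte characters cost more
```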
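Countermeasure
If a column never needs 255 characters, declaring a shorter length reduces its maximum size (check that existing values fit before shortening it):
ALTER TABLE posts MODIFY COLUMN title VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Increased Index Size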
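InnoDB enforces a maximum index key length: 767 bytes for the COMPACT and REDUNDANT row formats, and 3072 bytes for the DYNAMIC and COMPRESSED row formats (DYNAMIC is the default in MySQL 5.7.9 and later).
Because index keys are sized for the worst case of 4 bytes per character, switching to utf8mb4 can push an index on a long column past the limit, and the statement that creates or converts the index then fails.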
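The longest indexable column follows from dividing the key length limit by the maximum bytes per character, which is where the common VARCHAR(191) recommendation comes from. A quick Python check using the 767-byte limit (the one in the example error that follows) and the 3072-byte limit InnoDB applies to the DYNAMIC and COMPRESSED row formats; the helper name is hypothetical, and it covers a single-column index only:

```python
def max_indexable_chars(key_limit_bytes: int, bytes_per_char: int) -> int:
    """Longest VARCHAR length whose worst-case size fits in the index key limit."""
    return key_limit_bytes // bytes_per_char

for limit in (767, 3072):
    for charset, width in (("utf8", 3), ("utf8mb4", 4)):
        print(f"{limit}-byte limit, {charset}: VARCHAR({max_indexable_chars(limit, width)})")
```

For a composite index, the worst-case sizes of all indexed columns must fit within the limit together.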
Check index impact
SHOW INDEX FROM users;
Example error
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
Countermeasure
VARCHAR(191) is the longest utf8mb4 column that fits within 767 bytes (191 × 4 = 764):
ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Impact on Query Performance
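Changing the character set to utf8mb4 may affect query execution speed.
Operations that may be affected
- LIKE searches over large datasets
- ORDER BY processing
- JOIN queries, especially when the joined columns use different character sets or collations, because the implicit conversion can prevent index use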
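Countermeasure
A prefix index covers only the first N characters of a column, which keeps the key within the length limit. It can help equality and leading-prefix searches such as LIKE 'Tanaka%', but not searches that start with a wildcard such as LIKE '%Tanaka%':
CREATE INDEX idx_name ON users(name(100));
Memory Usage and Buffer Tuning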
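With utf8mb4, memory usage may increase, because some internal operations, such as in-memory temporary tables, may allocate space based on a column’s maximum byte length rather than its actual content.
Recommended settings (size innodb_buffer_pool_size to the server’s memory and workload; 1G is only an example)
[mysqld]
innodb_buffer_pool_size = 1G
# query_cache_size applies only to MySQL 5.7 and earlier; the query cache was removed in MySQL 8.0
# query_cache_size = 128M
Summary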
- Switching to utf8mb4 can increase storage usage.
- Index sizes increase and may exceed limits.
- Query performance can be affected.
- Because memory usage may increase, buffer sizes may need tuning.
7. Recommended Settings (Best Practices)
By setting MySQL character sets appropriately, you can maintain data integrity while optimizing performance.
In this section, we present recommended MySQL character set configurations and explain key points for an optimal setup.
Recommended MySQL Character Set Configuration
| Item | Recommended Setting | Reason |
|---|---|---|
| Character Set | utf8mb4 | Supports all Unicode characters, including emoji and special characters |
| Collation | utf8mb4_unicode_ci | Case-insensitive and suitable for multilingual systems |
| Storage Engine | InnoDB | Good balance of performance and consistency |
| Indexed string length | VARCHAR(191) | Stays within the 767-byte index key limit of older MySQL versions and row formats |
Recommended my.cnf Settings
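1. MySQL Server Character Set Settings
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
# init_connect is not executed for accounts with the CONNECTION_ADMIN or SUPER privilege
init-connect='SET NAMES utf8mb4'
skip-character-set-client-handshake
innodb_file_per_table = 1
# Adjust to the server's memory and workload
innodb_buffer_pool_size = 1G
# The following options exist only in MySQL 5.7 and earlier and were removed in MySQL 8.0.
# A MySQL 8.0 server refuses to start if they are set, so uncomment them only on older servers.
# innodb_large_prefix = ON
# innodb_file_format = Barracuda
# query_cache_size = 128M
2. Client-Side Character Set Settings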
[client]
default-character-set = utf8mb4
Recommended Database Settings
CREATE DATABASE mydatabase DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
To change an existing database character set:
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Recommended Table Settings
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Change Character Set for Existing Tables
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Differences Between utf8mb4_general_ci and utf8mb4_unicode_ci
| Collation | Characteristics | Use Case |
|---|---|---|
| utf8mb4_general_ci | Faster comparisons, but less accurate | Performance-focused systems |
| utf8mb4_unicode_ci | Unicode-standard, more accurate comparisons | General-purpose use (recommended) |
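✅ If you need multilingual support or accurate sorting, choose utf8mb4_unicode_ci. On MySQL 8.0, the default collation utf8mb4_0900_ai_ci, which is based on a newer version of the Unicode Collation Algorithm, is also a good choice.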
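Index Optimization
For keyword searches over long text, a FULLTEXT index is usually more efficient than LIKE '%...%'. For Japanese and other CJK text, consider adding WITH PARSER ngram, because the default full-text parser splits words on whitespace:
CREATE FULLTEXT INDEX idx_fulltext ON articles(content);
Summary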
- The combination of utf8mb4 + utf8mb4_unicode_ci is recommended.
- Standardize server settings (my.cnf) and connection character sets.
- Explicitly specify utf8mb4 at the database, table, and column levels.
- Use VARCHAR(191) for indexed columns when the 767-byte index key limit applies.
- Use utf8mb4_unicode_ci for accurate comparisons.
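Because innodb_large_prefix, innodb_file_format, and the query cache options no longer exist in MySQL 8.0, carrying an old my.cnf forward can stop the server from starting. A rough pre-flight check in Python, with a hypothetical helper name; it only scans [mysqld] sections of a single file, ignores !include directives, and lists only a few of the removed options:

```python
# A few options that were removed in MySQL 8.0 (not an exhaustive list).
REMOVED_IN_80 = {
    "innodb_large_prefix",
    "innodb_file_format",
    "query_cache_size",
    "query_cache_type",
}

def find_removed_options(cnf_text: str) -> list:
    """Return removed option names found in [mysqld] sections of my.cnf text."""
    found = []
    section = None
    for raw_line in cnf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # blank lines, comments, and !include directives
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section != "mysqld":
            continue
        # Option names may use "-" or "_" interchangeably.
        name = line.split("=", 1)[0].strip().lower().replace("-", "_")
        if name.startswith("loose_"):
            continue  # unknown loose- options only produce a warning
        if name in REMOVED_IN_80:
            found.append(name)
    return found

sample = """
[mysqld]
character-set-server = utf8mb4
innodb_large_prefix = ON
query-cache-size = 128M

[client]
default-character-set = utf8mb4
"""
print(find_removed_options(sample))  # ['innodb_large_prefix', 'query_cache_size']
```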
8. FAQ
Here are common real-world questions about changing MySQL character sets.
We also cover how to handle errors and how to choose the best settings.
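What’s the difference between utf8 and utf8mb4?
MySQL’s utf8 stores at most 3 bytes per character and cannot hold emoji or some extended CJK characters, while utf8mb4 stores up to 4 bytes per character and covers all of Unicode. To check the server’s default character set:
SHOW VARIABLES LIKE 'character_set_server';
Will changing the MySQL character set cause data loss?
A correct conversion preserves existing data, but a mistaken procedure, such as converting latin1 columns that actually contain UTF-8 bytes, can garble it. Always take a backup first:
mysqldump -u root -p --default-character-set=utf8mb4 mydatabase > backup.sql
How do I fix mojibake if it happens?
First identify the cause: client settings, connection settings, or the stored data itself. If the stored text was double-encoded, the following statement can restore it; test it on a copy of the data first:
UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);
What are the risks when converting from latin1 to utf8mb4?
The main risks are garbled text when latin1 columns actually hold UTF-8 bytes, and index key length errors on long indexed columns. Take a backup, check the stored bytes with HEX(), and then convert the database default (shown here) and each table:
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Does switching to utf8mb4 affect performance?
Storage and index sizes can grow, and an indexed VARCHAR(255) column can exceed the 767-byte key limit on older row formats. Shortening the indexed column is one workaround:
ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Which should I use: utf8mb4_general_ci or utf8mb4_unicode_ci?
For most systems, utf8mb4_unicode_ci is the safer choice; choose utf8mb4_general_ci only when comparison speed matters more than accuracy.
| Collation | Characteristics | Use Case |
|---|---|---|
| utf8mb4_general_ci | Faster comparisons, but less accurate | Performance-focused systems |
| utf8mb4_unicode_ci | Unicode-standard, accurate comparisons | General-purpose use (recommended) |
Will queries become slower after switching to utf8mb4?
Usually not noticeably, but LIKE searches, sorts, and joins on large tables can slow down. Appropriate indexes help; for keyword searches over long text, consider a FULLTEXT index:
CREATE FULLTEXT INDEX idx_fulltext ON articles(content);
Summary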
✅ utf8mb4 is recommended. utf8 is not recommended due to its limitations.
✅ Before making changes, always check settings with SHOW VARIABLES.
✅ Specify --default-character-set explicitly when exporting and importing data to prevent mojibake.
✅ Consider index key length limits, and use VARCHAR(191) for indexed columns when the 767-byte limit applies.
✅ For performance, add appropriate indexes.
Final Notes
Changing the MySQL character set is not just a simple configuration tweak—it is a critical task that can affect data integrity and performance.
By following the right settings and procedures, you can migrate to utf8mb4 safely and effectively.
🔹 Follow the steps in this article and configure your character set correctly! 🔹


