MySQL Japanese Character Encoding Fix: Prevent Mojibake with utf8mb4 (Complete Guide)

1. Introduction

Having Trouble Handling Japanese in MySQL? Causes and Complete Solutions Explained

MySQL is widely used as a database for web applications and WordPress. However, have you ever encountered issues such as garbled Japanese text or characters displaying as “???”?

This problem frequently occurs for beginners and in local development environments such as XAMPP, MAMP, or virtualized setups like Docker. The primary cause is improper character encoding configuration in MySQL.

In this article, we clearly explain how to correctly configure MySQL to handle Japanese text, along with common issues and their solutions.

We also include practical guidance for real-world environments, such as Docker configuration, my.cnf settings, and modifying existing databases. This guide is suitable for both beginners and professional engineers.

In the next section, we will examine the fundamental reason why Japanese characters become garbled.

2. Main Causes of Japanese Text Garbling

Why Doesn’t MySQL Display Japanese Correctly?

If Japanese text appears as “???” or unreadable symbols in MySQL, the cause is almost certainly incorrect character encoding settings. MySQL is highly flexible, but if the character set and collation settings do not match, data cannot be stored and retrieved correctly.

Below are the three most common causes.

Cause 1: Default Character Set Remains latin1

MySQL 5.7 and earlier default to latin1 (a Western European encoding), and older installations or carried-over configuration files often still use it. Since latin1 cannot represent Japanese characters, they are corrupted at insertion time. This means the data is already damaged when it is stored in the database.
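As a rough illustration of why a Western European encoding loses Japanese text, the following Python sketch encodes a string with replacement enabled. Python's `latin-1` codec only approximates MySQL's latin1 here, and `to_latin1_lossy` is a hypothetical helper name used for this example.

```python
def to_latin1_lossy(text: str) -> bytes:
    """Encode text as Latin-1, replacing unrepresentable characters with '?'."""
    return text.encode("latin-1", errors="replace")


# Every Japanese character falls outside Latin-1, so each one becomes "?".
print(to_latin1_lossy("日本語"))        # b'???'
print(to_latin1_lossy("MySQL 日本語"))  # b'MySQL ???'
```

Once the original characters have been replaced like this, nothing in the stored bytes indicates what they were.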

Cause 2: Character Set Mismatch Between Client and Server

MySQL involves character encoding at three stages:

  • When the client sends a statement (character_set_client)
  • When the server converts the statement for processing (character_set_connection)
  • When the server returns results to the client (character_set_results)

For example, if the client sends utf8mb4 bytes but the connection is declared as latin1, the server misinterprets those bytes and the text is corrupted before it is stored. This mismatch is one of the most common pitfalls.
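The mismatch can be imitated outside MySQL. This is a minimal Python sketch, assuming the client sends UTF-8 bytes and the receiving side reads them as Latin-1; `misread_as_latin1` is a hypothetical name, and the real server-side conversion is more involved.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were Latin-1."""
    utf8_bytes = text.encode("utf-8")   # what a utf8mb4 client sends
    return utf8_bytes.decode("latin-1")  # what a latin1 connection assumes


original = "日本語"
garbled = misread_as_latin1(original)

# Three characters become nine, because each UTF-8 byte is read as one character.
print(len(original), len(garbled))  # 3 9
```

ASCII text survives this round trip unchanged, which is why the problem often goes unnoticed until Japanese text is stored.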

Cause 3: Inconsistent Database, Table, and Column Settings

When creating new tables without explicitly specifying a character set, MySQL applies its default configuration. This can result in inconsistent settings such as:

  • Database: utf8mb4
  • Table: utf8
  • Column: latin1

Such inconsistency causes garbled text during storage and display.
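One way to spot such a mix is to scan the output of SHOW CREATE TABLE for every declared character set. The sketch below is an assumption-laden example: the sample DDL is made up for illustration, and `declared_charsets` is a hypothetical helper that relies on a simple regular expression.

```python
import re

# Matches the column-level "CHARACTER SET x" form and the table-level "CHARSET=x" form.
CHARSET_PATTERN = re.compile(r"(?:CHARACTER SET\s+|CHARSET=)(\w+)", re.IGNORECASE)


def declared_charsets(create_table_sql: str) -> set:
    """Return the set of character set names declared in a CREATE TABLE statement."""
    return {name.lower() for name in CHARSET_PATTERN.findall(create_table_sql)}


# Hypothetical SHOW CREATE TABLE output with a latin1 column in a utf8mb4 table.
sample = """CREATE TABLE `users` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci"""

found = declared_charsets(sample)
if len(found) > 1:
    print("Mixed character sets:", sorted(found))  # Mixed character sets: ['latin1', 'utf8mb4']
```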

Summary: Most Issues Stem from Character Set Mismatches

In most cases, Japanese garbling in MySQL occurs because configured character sets do not match. In the next section, we will explain how to check the current character encoding settings in MySQL. Proper verification allows you to quickly identify and fix the issue.

3. How to Check MySQL Character Set Settings

The First Step to Finding the Cause Is Checking the Current Settings

When MySQL cannot handle Japanese correctly, the first thing you should check is the current settings for the character set and collation.
MySQL uses several character set variables for the exchange between the client and the server, and they must be consistent with each other.

Here, we explain how to check these settings using the command line and SQL queries.

Check Character Sets with the SHOW VARIABLES Command

While connected to MySQL, run the following SQL to check the current character set configuration:

SHOW VARIABLES LIKE 'character_set%';

After running this command, you will get output like the following:

+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8    |
+--------------------------+---------+

What Each Setting Means

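+--------------------------+--------------------------------------------------------------+
| Setting                  | Meaning and role                                             |
+--------------------------+--------------------------------------------------------------+
| character_set_client     | Encoding of the statements sent by the client                |
| character_set_connection | Encoding the server converts received statements into        |
| character_set_results    | Encoding of query results returned to the client             |
| character_set_database   | Default character set of the currently selected database     |
| character_set_server     | Server default, used for databases created without a charset |
| character_set_system     | Used internally by the server (usually no need to change)    |
+--------------------------+--------------------------------------------------------------+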

In particular, it is crucial that character_set_client, character_set_connection, and character_set_results all match. If they differ, strings can become corrupted when sent or returned.

Checkpoints to Prevent Garbled Text

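  • Confirm that the client, connection, results, database, and server variables are all set to utf8mb4 (character_set_system staying at utf8 is normal)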
  • If multiple character sets are mixed, apply the configuration changes introduced later
  • Be careful: tables and columns may have their own character set settings
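The checkpoints above can be sketched as a small Python check. It assumes the rows of SHOW VARIABLES LIKE 'character_set%' have already been loaded into a dictionary by whatever client you use; `find_mismatches` and the sample values are hypothetical.

```python
EXPECTED = "utf8mb4"

# These two are not expected to be utf8mb4: character_set_system is fixed by the
# server, and character_set_filesystem is normally "binary".
IGNORED = {"character_set_system", "character_set_filesystem"}


def find_mismatches(variables: dict) -> dict:
    """Return the character_set_* variables whose value is not utf8mb4."""
    return {
        name: value
        for name, value in variables.items()
        if name.startswith("character_set_")
        and name not in IGNORED
        and value != EXPECTED
    }


# Hypothetical values, as they might be read from SHOW VARIABLES.
current = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "utf8mb4",
    "character_set_results": "latin1",
    "character_set_server": "utf8mb4",
    "character_set_system": "utf8",
}
print(find_mismatches(current))  # {'character_set_results': 'latin1'}
```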

Note: Also Check Collation Settings

Collation affects string ordering and comparison behavior. You can check it with:

SHOW VARIABLES LIKE 'collation%';

Collation is less likely to directly cause mojibake, but it affects sorting and search accuracy for Japanese text. Confirm that a utf8mb4 collation such as utf8mb4_general_ci or utf8mb4_unicode_ci is in use.

In the next section, we will explain concrete configuration methods to properly handle Japanese in MySQL, including how to modify these settings.

4. How to Configure MySQL to Handle Japanese Correctly

Say Goodbye to Mojibake with the Right Settings

To handle Japanese correctly in MySQL, it’s essential to standardize all character set settings. In particular, utf8mb4 is the recommended choice because it supports not only Japanese, but also emojis and special characters.

In this section, we explain concrete configuration methods for the client side, server side, and database/table/column levels.

4.1 Client-Side Configuration: Explicitly Set It on Connection

Right after connecting to MySQL, run the following command to lock the connection character set to utf8mb4:

SET NAMES 'utf8mb4';

This command sets the following three variables at once:

  • character_set_client
  • character_set_connection
  • character_set_results

✅ Note:

  • If you connect from PHP, write something like mysqli_set_charset($conn, 'utf8mb4');.
  • When using the mysql CLI command, specifying --default-character-set=utf8mb4 is also effective.

4.2 Server-Side Configuration: Persistent Settings via my.cnf

By adding settings like the following to my.cnf (or my.ini), you can change the default character set for the entire MySQL server to utf8mb4:

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci

✅ Important Notes:

  • You must restart MySQL after changing the configuration.
  • Example: sudo systemctl restart mysql (Linux)
  • The file location varies by environment. Common Linux paths include /etc/mysql/my.cnf and /etc/my.cnf.
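Before restarting, it can help to confirm that the file says what you expect. The following Python sketch parses my.cnf-style text with the standard `configparser` module; `read_charset_settings` is a hypothetical helper, and real files containing directives such as `!includedir` would need extra handling that is not shown here.

```python
import configparser


def read_charset_settings(my_cnf_text: str) -> dict:
    """Extract the character set options from my.cnf-style text (None if absent)."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare options such as "skip-name-resolve"
        strict=False,         # tolerate repeated sections or options
        interpolation=None,   # treat "%" literally
    )
    parser.read_string(my_cnf_text)
    return {
        "client": parser.get("client", "default-character-set", fallback=None),
        "mysql": parser.get("mysql", "default-character-set", fallback=None),
        "mysqld": parser.get("mysqld", "character-set-server", fallback=None),
    }


sample_cnf = """
[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
"""

print(read_charset_settings(sample_cnf))
```

A value of None for any section suggests that the option is missing from the file being read.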

4.3 Specify Character Sets for Databases and Tables

When creating new databases or tables, explicitly specify the character set:

Example: Creating a Database

CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

Example: Creating a Table

CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

If You Need to Convert an Existing Table

ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
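When many tables need converting, the statement can be generated rather than typed by hand. This is a minimal Python sketch; `build_convert_statement` is a hypothetical helper, the table name `orders` is invented for the example, and the generated SQL should be reviewed and tested on a backup before it is run.

```python
def build_convert_statement(
    table: str,
    charset: str = "utf8mb4",
    collation: str = "utf8mb4_general_ci",
) -> str:
    """Build an ALTER TABLE ... CONVERT TO CHARACTER SET statement for one table."""
    # Backticks quote the identifier; a literal backtick inside a name is doubled.
    quoted = "`" + table.replace("`", "``") + "`"
    return f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} COLLATE {collation};"


for table_name in ["users", "orders"]:
    print(build_convert_statement(table_name))
```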

4.4 Recommended Character Set: Why utf8mb4?

MySQL also has a character set called utf8, but it only supports up to 3 bytes per UTF-8 character. As a result, emojis and some kanji variants cannot be stored properly.

In contrast, utf8mb4 supports up to 4 bytes and is therefore fully UTF-8 compatible. This is why it has become the standard recommendation today.
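The 3-byte versus 4-byte distinction is easy to check in Python. The sketch below lists the characters in a string whose UTF-8 encoding takes 4 bytes, which are the ones MySQL's utf8 cannot store; `needs_utf8mb4` is a hypothetical helper name.

```python
def needs_utf8mb4(text: str) -> list:
    """Return the characters whose UTF-8 encoding takes 4 bytes."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]


print(len("語".encode("utf-8")))           # 3
print(len("\U0001F600".encode("utf-8")))  # 4 (an emoji, U+1F600)

# U+5409 is an ordinary kanji; U+20BB7 is a variant form outside the 3-byte range.
sample_text = "\u5409\U00020BB7\U0001F600"
print(len(needs_utf8mb4(sample_text)))    # 2
```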

In the next chapter, we will explain Japanese-related settings and precautions specific to Docker environments. Let’s cover the key points to prevent mojibake even in containerized development setups.

5. Handling Japanese in a Docker Environment

Ensuring Proper Japanese Support in Containerized Environments

In recent years, Docker has become a common development environment. However, many developers report that “Japanese text becomes garbled in MySQL running on Docker.” This usually happens because the container locale or the initial MySQL character set options have not been set.

In this section, we introduce practical solutions for correctly handling Japanese when using MySQL in Docker.

5.1 Configure Locale Support in the Dockerfile

If your application server (not just the MySQL container) needs to handle Japanese, locale configuration is required. Below is an example for a Debian-based Dockerfile:

RUN apt-get update && apt-get install -y locales \
  && locale-gen ja_JP.UTF-8 \
  && update-locale LANG=ja_JP.UTF-8

ENV LANG=ja_JP.UTF-8
ENV LC_ALL=ja_JP.UTF-8

✅ Key Points:

  • Prevents encoding errors when reading or writing Japanese files on the application side.
  • Affects not only MySQL but also runtime environments such as PHP and Python.

5.2 Specify Character Sets in docker-compose

When launching a MySQL container with docker-compose.yml, you can specify character sets as follows:

services:
  db:
    image: mysql:8.0
    container_name: mysql-ja
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
      MYSQL_DATABASE: mydb
      MYSQL_USER: user
      MYSQL_PASSWORD: password
      TZ: Asia/Tokyo
      LANG: ja_JP.UTF-8
      LC_ALL: ja_JP.UTF-8
    command:
      --character-set-server=utf8mb4
      --collation-server=utf8mb4_general_ci
    ports:
      - "3306:3306"
    volumes:
      - ./mysql-data:/var/lib/mysql

✅ Additional Notes:

  • The command: section allows you to pass startup parameters to MySQL.
  • TZ and LANG help ensure a proper Japanese-compatible environment.

5.3 Verify Japanese Support Inside the MySQL Container

To confirm that MySQL is properly configured with utf8mb4, enter the container and check:

docker exec -it mysql-ja mysql -u root -p

After logging in, run:

SHOW VARIABLES LIKE 'character_set%';

If all relevant settings are utf8mb4, Japanese text storage and display should work reliably.

Summary: In Docker, Startup Settings and Locale Are Critical

To safely handle Japanese in MySQL within Docker:

  • Explicitly specify utf8mb4 when starting the MySQL container
  • Set the application container locale to ja_JP.UTF-8

These pre-configurations are extremely important.

In the next section, we will cover frequently reported issues and their practical solutions.

6. Common Problems and How to Fix Them

Still Seeing Garbled Text After Configuration? The Cause May Remain

Even after changing MySQL settings to utf8mb4, Japanese text may still not display or save correctly. In this section, we introduce frequently reported issues and their practical solutions.

Problem 1: Configuration Changes Do Not Take Effect

Cause:

After modifying configuration files such as my.cnf or docker-compose.yml, MySQL was not restarted.

Solution:

  • Server environment: sudo systemctl restart mysql
  • Docker environment: docker-compose down, then docker-compose up -d

Problem 2: Japanese Appears Garbled in the Terminal

Cause:

The issue may not be MySQL itself but the terminal’s display encoding. For example, Windows Command Prompt may not properly display UTF-8.

Solution:

  • Windows: Switch to UTF-8 using chcp 65001
  • macOS/Linux: Ensure terminal encoding is set to UTF-8 (usually default)
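If a scripting runtime is available, the encoding it sees for the terminal can be checked directly. This Python sketch reports whether standard output is UTF-8; `is_utf8` is a hypothetical helper, and the result reflects what Python detects, which may differ from what the terminal actually renders.

```python
import codecs
import sys


def is_utf8(encoding_name) -> bool:
    """Return True if the given encoding name is an alias of UTF-8."""
    if not encoding_name:
        return False
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


status = "OK" if is_utf8(sys.stdout.encoding) else "not UTF-8"
print("stdout encoding:", sys.stdout.encoding, "->", status)
```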

Problem 3: Existing Databases or Tables Were Created with latin1

Cause:

If existing databases or tables were originally created with latin1, Japanese data may already be corrupted.

Solution:

  1. Check the table structure:
SHOW CREATE TABLE your_table_name;
  2. Convert the table character set:
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

Important:

Already corrupted data cannot be repaired by conversion alone. Consider restoring from backup or manually correcting the data.
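Whether manual correction is feasible depends on how the data was damaged. The Python sketch below assumes one specific pattern, UTF-8 bytes that were read as Windows-1252 (`cp1252`), and tries to reverse it; `try_repair` is a hypothetical helper, and text that was already replaced with "?" cannot be recovered this way.

```python
from typing import Optional


def try_repair(text: str) -> Optional[str]:
    """Undo 'UTF-8 bytes read as cp1252' garbling; return None if that is not possible."""
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
    # An unchanged result means there was nothing to reverse.
    return repaired if repaired != text else None


garbled = "日本語".encode("utf-8").decode("cp1252")
print(try_repair(garbled) == "日本語")  # True
print(try_repair("???"))                # None
```

Try this kind of reversal on a copy of the data first, since applying it to text that was stored correctly would do further damage.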

Problem 4: Character Encoding Mismatch in PHP or Python Applications

Cause:

Even if MySQL uses utf8mb4, garbling occurs if the application sends data in a different encoding.

Solution:

  • PHP: mysqli_set_charset($conn, "utf8mb4");
  • Python (MySQL Connector): Specify charset='utf8mb4' when connecting
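For the Python case, the connection settings might be collected as follows. This is a sketch assuming the third-party mysql-connector-python package; `build_connection_params` and `connect` are hypothetical helper names, and the credentials shown in the usage line are placeholders.

```python
def build_connection_params(host: str, user: str, password: str, database: str) -> dict:
    """Collect connection settings with the character set fixed to utf8mb4."""
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        "charset": "utf8mb4",
        "collation": "utf8mb4_general_ci",
    }


def connect(params: dict):
    """Open a connection; requires mysql-connector-python to be installed."""
    import mysql.connector  # imported lazily so this file loads without the package

    return mysql.connector.connect(**params)


params = build_connection_params("localhost", "user", "password", "mydb")
print(params["charset"], params["collation"])  # utf8mb4 utf8mb4_general_ci
```

Keeping the parameters in one place makes it less likely that one code path connects without the charset option.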

Problem 5: Garbled Text When Importing/Exporting CSV or Excel Files

Cause:

CSV or Excel files may use Shift_JIS or UTF-8 with BOM, which may not align with MySQL’s utf8mb4 configuration.

Solution:

  • Convert CSV files to UTF-8 before importing
  • Explicitly execute SET NAMES 'utf8mb4'; before exporting
  • When saving from Excel, choose the “CSV UTF-8” format; Excel adds a BOM to such files, so strip it before importing into MySQL
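The conversion step can be sketched in Python as an alternative to command-line tools. The example assumes the source file is either UTF-8 with a BOM or Shift_JIS as written by Windows applications (Python's `cp932` codec); `to_utf8_without_bom` is a hypothetical helper and the sample rows are invented.

```python
import codecs


def to_utf8_without_bom(data: bytes, source_encoding: str = "cp932") -> bytes:
    """Convert CSV bytes to plain UTF-8, dropping a leading BOM if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        # Already UTF-8; the "utf-8-sig" codec drops the BOM while decoding.
        text = data.decode("utf-8-sig")
    else:
        text = data.decode(source_encoding)
    return text.encode("utf-8")


# Hypothetical Shift_JIS CSV content, as Excel might save it on Japanese Windows.
shift_jis_csv = "名前,年齢\r\n山田,30\r\n".encode("cp932")
utf8_csv = to_utf8_without_bom(shift_jis_csv)
print(utf8_csv.decode("utf-8"))
```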

Comprehensive Troubleshooting Checklist

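+-------------------------------------------------------------------+--------+
| Checkpoint                                                        | Status |
+-------------------------------------------------------------------+--------+
| Relevant character_set_* variables are utf8mb4                    |        |
| collation_server is utf8mb4_general_ci                            |        |
| Database, table, and column character sets are explicitly defined |        |
| Application sends data using utf8mb4                              |        |
| Environment (terminal/editor) encoding is UTF-8                   |        |
+-------------------------------------------------------------------+--------+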

In the next section, we will summarize the key points and provide final recommendations for safely handling Japanese in MySQL environments.

7. Conclusion

Reviewing the Essential Concepts and Settings for Handling Japanese in MySQL

To properly handle Japanese in MySQL, it is not enough to assume that “setting it to utf8 is sufficient.” What truly matters is configuration consistency and understanding the entire data flow.

Key Points Covered in This Article:

  • The main cause of Japanese mojibake is the use of inappropriate character sets such as latin1 or mismatched settings between client and server.
  • MySQL character set settings can be checked using the SHOW VARIABLES command.
  • The recommended character set is utf8mb4. It is fully UTF-8 compatible and supports emojis and extended kanji characters.
  • Configuration should be applied at three levels: client, server, and database/table level.
  • In Docker environments, specifying command: and LANG is essential. Both locale and character set must be properly configured.
  • If issues occur, isolate and troubleshoot step by step. Check not only MySQL itself but also the terminal, application layer, and external data interactions.

Best Practices for Future Operations

  • When setting up a new MySQL environment, design it with utf8mb4 as the default from the beginning.
  • In team or multi-environment development, document and share configuration files and connection parameters.
  • In Docker or CI/CD environments, automating configuration via environment variables and managed config files is key.
  • During data import/export, consider using character encoding conversion tools such as iconv or nkf.

Final Thoughts

Once your MySQL environment is properly configured for Japanese, ongoing development and operations become significantly smoother.
Understanding “why mojibake occurs” and “which settings must be configured” allows you to prevent problems before they happen and ensure stable data processing.

We hope this guide helps you build a more reliable and comfortable development environment.

8. Frequently Asked Questions (FAQ)

Common Questions About MySQL and Japanese Support

Q1. Japanese text appears as “???”. What is the cause?

A. The most common cause is a character encoding mismatch. For example, if the client sends Japanese text using utf8mb4 but the server receives it as latin1, mojibake occurs.
Executing SET NAMES 'utf8mb4'; when connecting resolves many cases.

Q2. I set utf8mb4 in my.cnf, but it does not apply.

A. Simply editing my.cnf is not enough. You must restart the MySQL server.
On Linux, run sudo systemctl restart mysql. In Docker, execute docker-compose down followed by docker-compose up -d.

Q3. Existing tables contain garbled Japanese. Can they be fixed?

A. Full recovery can be difficult, but you can try the following steps:

  1. Check the table structure (SHOW CREATE TABLE)
  2. Convert the character set
ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

If data has already been corrupted, restoring from backup or manual correction may be required.

Q4. I use MySQL in Docker and experience Japanese garbling.

A. In addition to MySQL settings, you must configure the locale in your Dockerfile or docker-compose.yml (e.g., LANG=ja_JP.UTF-8).
Also explicitly specify --character-set-server=utf8mb4 when starting the MySQL container.

Q5. What is the difference between utf8 and utf8mb4? Which should I use?

A. MySQL’s utf8 only supports 3-byte UTF-8 characters. In contrast, utf8mb4 supports 4-byte characters, including emojis and extended kanji.
From both compatibility and future-proof perspectives, utf8mb4 is strongly recommended.

Q6. CSV files exported from Excel become garbled. What should I do?

A. Excel may use Shift_JIS or UTF-8 with BOM by default, which can conflict with MySQL settings.
Save the CSV file explicitly in UTF-8 format before importing. If you load it with LOAD DATA, also add a CHARACTER SET utf8mb4 clause, because SET NAMES does not change how the file contents are interpreted.


If these FAQs do not resolve your issue, review your configuration from the beginning or consider rebuilding the environment from a clean setup.
Handling technical challenges patiently is the key to properly managing Japanese data in MySQL.