MySQL Character Set Conversion Guide: Switch to utf8mb4 (Fix Mojibake)

1. Introduction

Why You Need to Change the MySQL Character Set

A database's character set is a key setting that determines how text data is encoded for storage and processing. In MySQL 5.7 and earlier, the default character set is latin1 (MySQL 8.0 defaults to utf8mb4), which can cause problems when handling Japanese or other special characters. Switching to an appropriate character set becomes especially important during data migration or system integration.

Common Issues and Their Causes

Common problems related to MySQL character sets include the following.

  1. Mojibake (garbled characters)
    • utf8 and latin1 are mixed in the same environment
    • The client and server character set settings do not match
  2. Problems with searching
    • Because of collation differences, searches do not return the expected results
    • The sort order differs from what you expect
  3. Problems during data migration
    • Emoji and special symbols cannot be stored because utf8mb4 is not in use
    • Character set conversion is not handled properly during export/import
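
As a rough illustration of the first cause, the Python sketch below shows how mojibake arises when UTF-8 bytes are read as if they were latin1. It uses Python's latin-1 codec as a stand-in for MySQL's latin1 (which is closer to cp1252, so this is an approximation), and the sample string is arbitrary.

```python
# UTF-8 bytes written by one side of the connection...
original = "café"
stored_bytes = original.encode("utf-8")  # b'caf\xc3\xa9'

# ...interpreted by the other side as latin1 (ISO-8859-1 here, as an approximation).
garbled = stored_bytes.decode("latin-1")

print(garbled)  # cafÃ©
```

The single character "é" becomes two characters because its two UTF-8 bytes are each treated as a separate latin1 character.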

Goals and Structure of This Article

This article covers MySQL character set changes end to end, from basic concepts to changing the settings and troubleshooting.

Overview

  1. Basic knowledge of MySQL character sets
  2. How to check the current character set
  3. How to change the MySQL character set
  4. Troubleshooting after the change
  5. How character set changes affect performance
  6. Recommended settings (best practices)
  7. Frequently Asked Questions (FAQ)

After reading this guide, you will have a stronger understanding of MySQL character sets, and you will be able to choose the right settings and avoid common problems.

2. What Is a MySQL Character Set? Understanding the Basics

What Is a Character Set?

A character set is a set of rules for storing and processing characters as digital data. For example, when storing the Japanese character “あ”, UTF-8 represents it as the byte sequence E3 81 82, while Shift_JIS uses 82 A0.

In MySQL, you can specify different character sets at the database or table level (and also for the server and for individual columns). By choosing an appropriate character set, you can prevent mojibake and make internationalization smoother.
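
The byte sequences mentioned above can be reproduced with a few lines of Python. This is only a small sketch using Python's built-in utf-8 and shift_jis codecs; it does not involve MySQL at all.

```python
ch = "あ"

utf8_bytes = ch.encode("utf-8")      # the same character, encoded two ways
sjis_bytes = ch.encode("shift_jis")

print(utf8_bytes.hex(" ").upper())  # E3 81 82
print(sjis_bytes.hex(" ").upper())  # 82 A0
```

The same character maps to different bytes depending on the character set, which is why both sides of a connection have to agree on it.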

Common Character Sets

| Character Set | Characteristics | Use Case |
| --- | --- | --- |
| utf8 | UTF-8, up to 3 bytes per character | Does not support some special characters (such as emoji) |
| utf8mb4 | UTF-8, up to 4 bytes per character | Supports emoji and special characters (recommended) |
| latin1 | ASCII-compatible, 1 byte per character | Used in older systems |

What Is a Collation?

A collation is a set of rules used to compare and sort data within a character set. For example, it defines whether “A” and “a” are treated as the same character and how the sort order is determined.

Commonly Used Collations

| Collation | Description |
| --- | --- |
| utf8_general_ci | Case-insensitive, suitable for general use |
| utf8_unicode_ci | Unicode-based collation (recommended) |
| utf8mb4_bin | Binary comparison (use when exact matches are required) |
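
To make the idea concrete, the following Python sketch mimics the difference between a case-insensitive comparison and a binary one. It is only an analogy: str.casefold() and raw byte comparison stand in for a _ci collation and a _bin collation, and they do not reproduce MySQL's actual collation rules. The helper names equal_ci and equal_bin are made up for this example.

```python
def equal_ci(a: str, b: str) -> bool:
    """Case-insensitive equality, loosely analogous to a *_ci collation."""
    return a.casefold() == b.casefold()

def equal_bin(a: str, b: str) -> bool:
    """Byte-for-byte equality, loosely analogous to a *_bin collation."""
    return a.encode("utf-8") == b.encode("utf-8")

print(equal_ci("Alice", "alice"))   # True
print(equal_bin("Alice", "alice"))  # False

names = ["bob", "Alice", "alice", "Bob"]
# Sorting by code point puts uppercase ASCII letters before lowercase ones.
print(sorted(names))                    # ['Alice', 'Bob', 'alice', 'bob']
# Sorting case-insensitively keeps the same names next to each other.
print(sorted(names, key=str.casefold))  # ['Alice', 'alice', 'bob', 'Bob']
```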

Differences Between utf8 and utf8mb4

MySQL's utf8 actually stores at most 3 bytes per character, so it cannot handle some special characters (such as emoji or some extended CJK characters). In contrast, utf8mb4 supports up to 4 bytes per character, which is why modern applications are advised to use utf8mb4. In MySQL 8.0, utf8 is an alias for utf8mb3 and is deprecated.

| Character Set | Max Bytes | Emoji Support | Recommendation |
| --- | --- | --- | --- |
| utf8 | 3 bytes | ❌ Not supported | ❌ Not recommended |
| utf8mb4 | 4 bytes | ✅ Supported | ✅ Recommended |
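
The 3-byte limit is easy to check outside the database. The sketch below is a minimal Python example, with the hypothetical helper names max_utf8_bytes and fits_in_utf8mb3, that reports how many UTF-8 bytes a character needs and whether a string would fit in MySQL's 3-byte utf8.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest number of UTF-8 bytes used by any single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)

def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes (fits MySQL's legacy utf8)."""
    return max_utf8_bytes(text) <= 3

print(max_utf8_bytes("abc"))       # 1
print(max_utf8_bytes("あ"))         # 3
print(max_utf8_bytes("😀"))         # 4
print(fits_in_utf8mb3("Hello あ"))  # True
print(fits_in_utf8mb3("Hello 😀"))  # False
```

A check like this could be run over application input to estimate how much existing traffic already needs utf8mb4.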

Why You Should Switch from utf8 to utf8mb4

  1. Future compatibility: Modern systems continue to standardize on utf8mb4.
  2. Storing special characters and emoji: With utf8mb4, you can safely handle data from social media posts and messaging apps.
  3. Internationalization: For multilingual systems, it reduces the risk of mojibake.

Summary

  • The character set determines how data is stored and processed.
  • The collation determines how characters are compared.
  • MySQL's utf8 is limited to 3 bytes, so utf8mb4 is recommended.
  • utf8mb4_unicode_ci is the commonly recommended collation for general use.

3. How to Check the Current Character Set

Before changing the MySQL character set, it is important to check the current settings. Because character sets can be set at several levels (database, table, column), you should understand exactly where changes are needed.

Check the Server-Level Character Set

First, check the character set and collation settings for the entire MySQL server.

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

Example output:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
+--------------------------+----------------------------+

Check the Character Set for Each Database

To check the character set of a specific database, use the following query.

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'database_name';

Example output

+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| my_database | utf8mb4                    | utf8mb4_unicode_ci     |
+-------------+----------------------------+------------------------+

Check a Table's Character Set

Here is how to check the character set of a specific table.

SHOW CREATE TABLE table_name;

Example output

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `email` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;

Points to Check

  • DEFAULT CHARSET=latin1 → Not utf8mb4, so a change is needed
  • COLLATE=latin1_swedish_ci → Switching to utf8mb4_unicode_ci is usually more appropriate

Check Column Character Sets

To check the character set at the column level, run the following SQL.

SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME 
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = 'database_name' 
AND TABLE_NAME = 'table_name';

Example output

+-------------+--------------------+----------------------+
| COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME       |
+-------------+--------------------+----------------------+
| name        | latin1             | latin1_swedish_ci    |
| email       | utf8mb4            | utf8mb4_unicode_ci   |
+-------------+--------------------+----------------------+

In this example, the name column uses latin1, so converting it to utf8mb4 is recommended.
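
When a table has many columns, it can help to filter the query result programmatically. The sketch below assumes the rows have already been fetched from information_schema.COLUMNS as (COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME) tuples. The function name columns_needing_conversion and the sample rows are illustrative, not part of any library.

```python
from typing import Iterable, List, Optional, Tuple

# (COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME) as returned by the query above
ColumnRow = Tuple[str, Optional[str], Optional[str]]

def columns_needing_conversion(rows: Iterable[ColumnRow], target: str = "utf8mb4") -> List[str]:
    """Return names of text columns whose character set differs from the target."""
    return [
        name
        for name, charset, _collation in rows
        # Non-text columns (INT, DATE, ...) report NULL for the character set; skip them.
        if charset is not None and charset != target
    ]

# Hypothetical rows mirroring the example output above, plus a non-text column.
rows = [
    ("id", None, None),
    ("name", "latin1", "latin1_swedish_ci"),
    ("email", "utf8mb4", "utf8mb4_unicode_ci"),
]
print(columns_needing_conversion(rows))  # ['name']
```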

Summary

  • MySQL character sets are configured at several levels (server, database, table, column).
  • By checking the character set at each level, you can apply the right changes.
  • Use commands such as SHOW VARIABLES and SHOW CREATE TABLE to fully understand the existing configuration.

4. How to Change the MySQL Character Set

By changing the MySQL character set correctly, you can prevent mojibake and support multilingual data more easily.
In this section, we explain how to update the settings at each level: the whole server, database, table, and column.

Change the Server-Wide Default Character Set

To change the default character set for the whole server, you need to edit the MySQL configuration file (my.cnf or my.ini).

Steps

  1. Open the configuration file
    • On Linux (the path may differ by distribution):
    sudo nano /etc/mysql/my.cnf
    • On Windows: open C:\ProgramData\MySQL\MySQL Server X.X\my.ini
  2. Add or update the character set settings. Add or update the following lines under the [mysqld] section.
    [mysqld]
    character-set-server=utf8mb4
    collation-server=utf8mb4_unicode_ci

  3. Restart MySQL
    • On Linux:
    sudo systemctl restart mysql
    • On Windows (the service name depends on your installation):
    net stop MySQL && net start MySQL

  4. Verify the change
    SHOW VARIABLES LIKE 'character_set_server';

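Change the Character Set at the Database Level

This changes only the default used for tables created afterwards; existing tables and columns keep their current character set.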

ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Verify the change

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME 
FROM information_schema.SCHEMATA 
WHERE SCHEMA_NAME = 'mydatabase';

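Change the Character Set at the Table Level

CONVERT TO changes the table default and converts every character column in the table, including the stored data.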

ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Verify the change

SHOW CREATE TABLE users;
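
If many tables need the same conversion, the statements can be generated rather than typed by hand. The following is a minimal sketch, assuming you already have a list of table names; build_convert_statements is a hypothetical helper, and it only builds SQL text for review rather than executing anything.

```python
import re
from typing import Iterable, List

# Deliberately strict: plain identifiers only, so the name can be safely backtick-quoted.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")

def build_convert_statements(
    tables: Iterable[str],
    charset: str = "utf8mb4",
    collation: str = "utf8mb4_unicode_ci",
) -> List[str]:
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for table in tables:
        if not _IDENTIFIER.fullmatch(table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements

# Example table names; replace with the tables in your own schema.
for sql in build_convert_statements(["users", "posts"]):
    print(sql)
```

Reviewing the generated statements before running them keeps the conversion explicit and makes it easy to process large tables one at a time.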

Change the Character Set at the Column Level

ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Verify the change

SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME 
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = 'mydatabase' 
AND TABLE_NAME = 'users';

Post-Change Verification and the Importance of Backups

To preserve data integrity when changing the character set, follow these steps.

Back up your data (do this before making any change)

mysqldump -u root -p --default-character-set=utf8mb4 mydatabase > backup.sql

Re-check the settings

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
SHOW CREATE TABLE users;

Insert and display test data (include a 4-byte character such as an emoji to confirm that utf8mb4 is in effect)

INSERT INTO users (name, email) VALUES ('Test User 😀', 'test@example.com');
SELECT * FROM users;

Summary

  • Server-wide character set change: Edit my.cnf and set character-set-server=utf8mb4
  • Database character set change: ALTER DATABASE mydatabase CHARACTER SET utf8mb4
  • Table character set change: ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4
  • Column character set change: ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4
  • After the change, always verify the settings and test your data

5. Troubleshooting After Changing the Character Set

After changing the MySQL character set, you may find that the system does not behave correctly or that stored data appears garbled. In this section, we explain common issues and how to resolve them in detail.

Causes of Mojibake and How to Fix It

If mojibake occurs after changing the character set, the following causes are common.

| Cause | How to Check | Solution |
| --- | --- | --- |
| The client character set setting differs | SHOW VARIABLES LIKE 'character_set_client'; | Run SET NAMES utf8mb4; |
| Existing data was stored using a different encoding | SELECT HEX(column_name) FROM table_name; | Use CONVERT() or re-export the data |
| The connection encoding is not correct | Connect with mysql --default-character-set=utf8mb4 | Adjust the client-side character set configuration |
| Application settings (PHP/Python, etc.) are incorrect | mysqli_set_charset($conn, 'utf8mb4'); | Standardize the application's character set settings |

Solution #1: Set the client character set correctly

SET NAMES utf8mb4;

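Solution #2: Convert existing data properly

The statement below is meant for columns where UTF-8 text was stored through a latin1 connection. Try it on a copy of the table first, because running it on correctly encoded data can damage that data.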

UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);
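
The nested CONVERT/CAST above reinterprets the stored bytes rather than translating characters. A rough Python equivalent of that idea is sketched below, using Python's latin-1 codec as an approximation of MySQL's latin1; repair_double_encoded is a hypothetical name used only for this illustration.

```python
def repair_double_encoded(garbled: str) -> str:
    """Undo 'UTF-8 bytes read as latin1' by reversing the wrong decode."""
    raw = garbled.encode("latin-1")  # recover the original bytes
    return raw.decode("utf-8")       # decode them with the intended encoding

print(repair_double_encoded("cafÃ©"))  # café

# Text that was never garbled does not survive the same treatment.
try:
    repair_double_encoded("café")
except UnicodeDecodeError:
    print("not double-encoded; leave this value unchanged")
```

The second call shows why the repair should be applied only to rows that are known to be garbled.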

Notes on Converting from latin1 to utf8mb4

Safe procedure

  1. Back up the current data
    mysqldump -u root -p --default-character-set=latin1 mydatabase > backup.sql

  2. Change the database character set
    ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

  3. Change the table character set
    ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

  4. Re-import the data if needed. A default mysqldump file recreates each table with its original character set, so review the CREATE TABLE statements in backup.sql before loading it into the converted database.
    mysql -u root -p --default-character-set=utf8mb4 mydatabase < backup.sql


Data Cannot Be Found by Search After the Change

Case #1: LIKE searches do not work

Specify the collation explicitly in the comparison.

SELECT * FROM users WHERE name COLLATE utf8mb4_unicode_ci LIKE '%Tanaka%';

Case #2: The sort order has changed

To sort by raw byte values, as a binary collation would, use BINARY.

SELECT * FROM users ORDER BY BINARY name;

Application-Side Measures

For PHP

mysqli_set_charset($conn, 'utf8mb4');

For Python (MySQL Connector)

import mysql.connector

conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="password",
    database="mydatabase",
    charset="utf8mb4"
)

For Node.js (MySQL2)

const mysql = require('mysql2');

const connection = mysql.createConnection({
  host: 'localhost',
  user: 'root',
  password: 'password',
  database: 'mydatabase',
  charset: 'utf8mb4'
});

Summary

  • Post-change issues generally fall into three categories: client settings, data conversion, and application settings.
  • To prevent mojibake, standardize the client character set using SET NAMES utf8mb4.
  • Watch for LIKE search and sort order changes, and specify COLLATE when needed.
  • Set utf8mb4 in your application as well to avoid encoding mismatches.

6. How Character Set Changes Affect Performance

When changing the MySQL character set to utf8mb4, there are several performance considerations, such as increased storage usage and index limitations.
In this section, we explain the impact and the best countermeasures.

Increased Storage Usage

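Compared to MySQL's utf8, utf8mb4 can use up to 4 bytes per character,
so the overall table size may increase.
Characters that already fit in utf8 keep the same encoded length; only characters that need 4 bytes (such as emoji) take more space.
When converting from latin1, non-ASCII characters grow from 1 byte to 2 or more.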

Max bytes per character by character set

| Character Set | Max Bytes per Character |
| --- | --- |
| latin1 | 1 byte |
| utf8 | 3 bytes |
| utf8mb4 | 4 bytes |

For example, with utf8, VARCHAR(255) is up to 765 bytes (255×3),
but with utf8mb4, it becomes up to 1020 bytes (255×4).
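
These figures are worst-case sizes. UTF-8 is variable-length, so the space actually used depends on the text, as the short Python sketch below suggests. The sample strings and the helper utf8_byte_length are arbitrary choices for illustration.

```python
def utf8_byte_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

for text in ["hello", "café", "あいう", "😀"]:
    print(f"{text!r}: {len(text)} chars, {utf8_byte_length(text)} bytes")

# Worst-case sizing for a VARCHAR(255) column, matching the figures above.
max_chars = 255
print(max_chars * 3)  # 765
print(max_chars * 4)  # 1020
```

Plain ASCII text still takes one byte per character in utf8mb4, so tables that hold mostly ASCII data grow very little.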

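Countermeasure

Shorten columns that do not need the full length, after checking the longest stored value so that no data is truncated.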

ALTER TABLE posts MODIFY COLUMN title VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Increased Index Size

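MySQL enforces a maximum index key length.
After switching to utf8mb4, each indexed character reserves up to 4 bytes, so an index on a long string column can exceed the limit and the statement that creates or rebuilds the index fails.
The 767-byte limit applies to InnoDB tables that use the COMPACT or REDUNDANT row format (the default in MySQL 5.6 and earlier); with the DYNAMIC row format, the limit is 3072 bytes.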

Check index impact

SHOW INDEX FROM users;

Example error

ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

Countermeasure

ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
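
The value 191 follows from simple arithmetic on the 767-byte limit shown in the error message. A small Python sketch of the calculation, using the hypothetical helper max_indexable_chars:

```python
def max_indexable_chars(key_limit_bytes: int, bytes_per_char: int) -> int:
    """Largest column length (in characters) whose full index fits in the key limit."""
    return key_limit_bytes // bytes_per_char

print(max_indexable_chars(767, 3))   # 255  (utf8, 3 bytes per character)
print(max_indexable_chars(767, 4))   # 191  (utf8mb4, 4 bytes per character)
print(max_indexable_chars(3072, 4))  # 768  (utf8mb4 with a 3072-byte limit)
```

This is why VARCHAR(255) columns that were indexable under utf8 stop fitting once each character can take 4 bytes.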

Impact on Query Performance

Changing the character set to utf8mb4 may affect query execution speed.

Operations that may be affected

  • LIKE searches over large datasets
  • ORDER BY processing
  • JOIN query performance

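Countermeasure

Add an index on the column used for filtering or sorting; a prefix index keeps the key within the length limit. Note that a LIKE pattern with a leading wildcard, such as '%Tanaka%', cannot use a regular index.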

CREATE INDEX idx_name ON users(name(100));

Memory Usage and Buffer Tuning

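With utf8mb4, memory usage may increase.

Example settings (the values are illustrative; size them to your server's memory)

[mysqld]
innodb_buffer_pool_size = 1G
# MySQL 5.7 and earlier only: the query cache was removed in MySQL 8.0,
# where this line must be deleted or the server will not start
query_cache_size = 128M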

Summary

  • Switching to utf8mb4 can increase storage usage.
  • Index sizes increase and may exceed limits.
  • Query performance can be affected.
  • Because memory usage may increase, buffer sizes may need tuning.

7. Recommended Settings (Best Practices)

By setting MySQL character sets appropriately, you can maintain data integrity while optimizing performance.
In this section, we present recommended MySQL character set configurations and explain key points for an optimal setup.

Recommended MySQL Character Set Configuration

| Item | Recommended Setting | Reason |
| --- | --- | --- |
| Character Set | utf8mb4 | Supports all Unicode characters including emoji and special characters |
| Collation | utf8mb4_unicode_ci | Case-insensitive and suitable for multilingual systems |
| Storage Engine | InnoDB | Good balance of performance and consistency |
| Indexed string length | VARCHAR(191) | Avoids exceeding MySQL index limits |

Recommended my.cnf Settings

1. MySQL Server Character Set Settings

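[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4'
# Ignores the character set requested by clients (deprecated in recent MySQL 8.x releases)
skip-character-set-client-handshake
# The next two settings are for MySQL 5.6/5.7 only; they were removed in MySQL 8.0
innodb_large_prefix = ON
innodb_file_format = Barracuda
innodb_file_per_table = 1
innodb_buffer_pool_size = 1G
# MySQL 5.7 and earlier only; the query cache was removed in MySQL 8.0
query_cache_size = 128M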

2. Client-Side Character Set Settings

[client]
default-character-set = utf8mb4

Recommended Database Settings

CREATE DATABASE mydatabase DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

To change an existing database character set:

ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Recommended Table Settings

CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Change the Character Set of an Existing Table

ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Differences Between utf8mb4_general_ci and utf8mb4_unicode_ci

| Collation | Characteristics | Use Case |
| --- | --- | --- |
| utf8mb4_general_ci | Faster comparisons, but less accurate | Performance-focused systems |
| utf8mb4_unicode_ci | Unicode-standard, more accurate comparisons | General-purpose use (recommended) |

If you need multilingual support or accurate sorting, choose utf8mb4_unicode_ci.

Index Optimization

For keyword searches over long text columns, a FULLTEXT index can replace slow LIKE '%...%' scans.

CREATE FULLTEXT INDEX idx_fulltext ON articles(content);

Summary

  • The combination of utf8mb4 + utf8mb4_unicode_ci is recommended.
  • Standardize the server settings (my.cnf) and the connection character sets.
  • Specify utf8mb4 explicitly at the database, table, and column levels.
  • Use VARCHAR(191) to avoid index key length limits.
  • Use utf8mb4_unicode_ci for accurate comparisons.

8. Frequently Asked Questions (FAQ)

Here are common real-world questions about changing MySQL character sets.
We also cover how to handle errors and how to choose the best settings.

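What is the difference between utf8 and utf8mb4?

MySQL's utf8 stores at most 3 bytes per character and cannot hold emoji, while utf8mb4 stores up to 4 bytes. You can check what the server currently uses with: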

SHOW VARIABLES LIKE 'character_set_server';

Does changing the MySQL character set cause data loss?

It can if the conversion is handled incorrectly, so take a backup before making any change:

mysqldump -u root -p --default-character-set=utf8mb4 mydatabase > backup.sql

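How can I fix mojibake if it occurs?

If UTF-8 text was stored through a latin1 connection, reinterpret the stored bytes as utf8mb4 (test on a copy first):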

UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);

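What are the risks when converting from latin1 to utf8mb4?

The main risks are mojibake from mismatched encodings, larger storage, and index key length errors. After taking a backup, start with the database default and then convert each table: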

ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Does switching to utf8mb4 affect performance?

Storage and index sizes can grow. If an indexed column exceeds the key length limit, shorten it:

ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Which should I use: utf8mb4_general_ci or utf8mb4_unicode_ci?

| Collation | Characteristics | Use Case |
| --- | --- | --- |
| utf8mb4_general_ci | Faster comparisons, but less accurate | Performance-focused systems |
| utf8mb4_unicode_ci | Unicode-standard, accurate comparisons | General-purpose use (recommended) |

Will queries be slower after switching to utf8mb4?

Some searches and sorts may slow down, and appropriate indexes help. For text search, consider a FULLTEXT index:

CREATE FULLTEXT INDEX idx_fulltext ON articles(content);

Summary

  • utf8mb4 is recommended. utf8 is not recommended because of its limitations.
  • Before making changes, always check the settings with SHOW VARIABLES.
  • Use an export/import workflow to prevent mojibake.
  • Consider index limits and use VARCHAR(191) where appropriate.
  • For performance, add appropriate indexes.

Final Notes

Changing the MySQL character set is not just a simple settings tweak; it is an important task that can affect data integrity and performance.
By following the right settings and procedures, you can migrate to utf8mb4 safely and efficiently.

🔹 Follow the steps in this article and configure your character set correctly! 🔹