Time after time I get asked the same question: “How do I convert the garbled characters (krakozyabry) from a database that stores strings as latin1 into normal Cyrillic (windows-1251) or utf-8?”
Below I will try to answer this question as fully as I can, and also give a piece of PHP code that solves the problem unambiguously.
First, I do not recommend that anyone keep working in the windows-1251 encoding. This single-byte encoding no longer meets modern requirements. Move all your projects to utf-8 as soon as you can. The sooner you do it, the sooner your problems with garbled text will end.
Now about latin1. This encoding (which in MySQL is effectively windows-1252) was the usual one in MySQL up to version 4. Where windows-1251 has Cyrillic letters, latin1 has accented Western European letters. But since it is also single-byte, nothing goes wrong when you read the data from such a table and output it as windows-1251, because the byte codes are the same (0xA0-0xFF). All of this works only until you move to a MySQL 5+ setup that talks to the client in utf-8.
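To make the "same bytes, different letters" point concrete, here is a minimal sketch. The sample word and the variable names are mine, chosen only for illustration; it assumes the iconv extension knows the names windows-1251 and windows-1252.

```php
<?php
// The word "Привет" as raw windows-1251 bytes (one byte per letter).
$bytes = "\xCF\xF0\xE8\xE2\xE5\xF2";

// Read as windows-1251, the bytes give the intended Cyrillic text.
$asCyrillic = iconv('windows-1251', 'utf-8', $bytes);

// Read as windows-1252 (latin1), the very same bytes give accented Latin letters.
$asLatin = iconv('windows-1252', 'utf-8', $bytes);

echo $asCyrillic, "\n"; // Привет
echo $asLatin, "\n";    // Ïðèâåò
```

The bytes themselves never change; only the table used to interpret them does.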
What does MySQL 5+ do when it hands you such data? Before sending it to the client, it honestly recodes all the data into utf-8, placing the accented Latin characters (and in latin1 your Cyrillic text really is accented Latin characters) in the utf-8 code range where they belong. As a result, if you try to convert the resulting utf-8 string straight back to Cyrillic with iconv('utf-8', 'windows-1251', $str), you will fail: iconv will raise an error or return an empty string.
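The failure can be reproduced without a database. The sketch below imitates what the server does with a hypothetical stored value and then tries the direct conversion. The exact failure mode depends on the iconv implementation PHP was built with: usually a notice plus a false return value, sometimes a string of substitute characters.

```php
<?php
// Hypothetical stored value: "Привет" written as windows-1251 bytes into a latin1 column.
$stored = "\xCF\xF0\xE8\xE2\xE5\xF2";

// What a utf-8 connection delivers: the bytes decoded as windows-1252, re-encoded as utf-8.
$received = iconv('windows-1252', 'utf-8', $stored);

// Direct conversion attempt. Letters such as "Ï" or "ð" do not exist in windows-1251,
// so iconv cannot map them; @ only hides the notice.
$direct = @iconv('utf-8', 'windows-1251', $received);

var_dump($direct === $stored); // bool(false): the original bytes are not recovered
```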
The first thing a programmer tries is to change the table encoding from latin1 to windows-1251 in phpMyAdmin. But MySQL cannot do this (so it reports), because windows-1251 has no equivalents for those accented Latin characters. The second idea that comes to mind is to convert the table to utf-8. That works, but the texts are still displayed as garbage.
What to do? How do we solve this problem? The solution is quite simple, but to arrive at it yourself you need a clear understanding of what encodings are and how they work. My hand-drawn diagram should help.

And here is the algorithm that I use to get the encodings in order.
- I convert all database tables to utf-8. In the process, the supposedly Cyrillic characters stored as latin1, which are therefore really accented Latin characters, are converted to utf-8 and take their rightful places in the utf-8 code range reserved for those Latin characters.
- I write a micro-utility in PHP that does the following with each string:
- a) Converts the string to windows-1252. There should be no problems here. The accented Latin letters now occupy the code range A0-FF.
- b) Converts the resulting single-byte string to utf-8, treating it not as windows-1252 but as windows-1251, i.e. assigning the bytes in the range A0-FF to Cyrillic. As a result, the characters land in the utf-8 code range intended for Cyrillic.
- That's it: our string is now officially a Cyrillic string in utf-8. It can be written back to the same DB cell or sent straight to the output stream. However, I still recommend doing a one-time conversion of the whole database and forgetting latin1 like a bad dream.
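Steps a) and b) can be wrapped into one function. This is only a sketch: the name fix_latin1_cyrillic and the choice to return null on failure are my assumptions, made so that a caller can skip strings that cannot be repaired instead of overwriting them.

```php
<?php
/**
 * Hypothetical helper: repairs a utf-8 string that holds windows-1251 text
 * which was misread as windows-1252. Returns null if either step fails.
 */
function fix_latin1_cyrillic(string $s): ?string
{
    // Step a) go back to the single-byte form the data had in the latin1 column.
    $bytes = @iconv('utf-8', 'windows-1252', $s);
    if ($bytes === false) {
        return null;
    }

    // Step b) reinterpret those bytes as windows-1251 and encode them as utf-8.
    $fixed = @iconv('windows-1251', 'utf-8', $bytes);

    return $fixed === false ? null : $fixed;
}

echo fix_latin1_cyrillic('Ïðèâåò'), "\n"; // Привет
```

Pure ASCII strings pass through unchanged, because the three encodings agree on the first 128 codes.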
Below is sample PHP code that converts users' full names (the fio column) into a normal Cyrillic encoding.
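$q = 'select id, fio from `users`';
$res = mysql_query($q);
while (($row = mysql_fetch_assoc($res)) !== false) {
    // fio arrives as utf-8 built from latin1; convert it back to single-byte windows-1252
    $s = iconv('utf-8', 'windows-1252', $row['fio']);
    if ($s === false) {
        continue; // cannot be converted (e.g. already proper Cyrillic): leave the row alone
    }
    // now convert to utf-8, reading the bytes as windows-1251
    $s = iconv('windows-1251', 'utf-8', $s);
    if ($s === false) {
        continue;
    }
    // write the repaired value back into the same cell
    $q = 'update `users` set fio = "'.mysql_real_escape_string($s).'" where id = '.(int)$row['id'];
    mysql_query($q);
}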