⬆️ ⬇️

Correction and modification of MySQL encodings

We fight with krakozabrami Due to the fact that quite a lot of people are asking to help fix the problem with MySQL encodings, I decided to write an article describing how to “cure” the most frequent cases.



The article does not describe how to properly configure MySQL encodings (quite a lot has already been written about this), but about cases when there are rather large tables with incorrect encodings and you need to fix everything.



The worst thing about incorrectly configured encodings is that the problem is often difficult to detect, and at first glance it may seem that the site is working properly and there are no problems.



A small digression. Sypex viewer



At some point I got tired of sending people to the cumbersome phpMyAdmin, and the tiny utility Sypex Viewer was written. It is a single PHP file, uses modern Web 2.0 technologies AJAX, JSON and others. The main tasks that were set during creation were the minimum weight, and the maximum convenience and speed of work. Later in the examples there will be screenshots from it, but all the same actions can be done in phpMyAdmin.

')

Data in cp1251 table in latin1



Probably the most popular problem. When the data is encoded in cp1251 (Windows-1251), and the tables have the default encoding latin1. Such situations arise in the following cases:



As a result, the site seems to be all right, but if you look in the Sypex Viewer, the Russian characters will look like “krakozyabry” (as users usually call them).







In the article, I will consider one of the options for converting encodings using the free Sypx Dumper php script as a ready-made solution.

  1. On the "Export" tab, select the desired table.
  2. The encoding should be auto (the remaining parameters are unimportant, you can add a comment, for example, “Dump before correcting the encoding”).
  3. Click Run. Now we have a backup (in any case, it is desirable to do it with any database transformations).
  4. Go to the tab "Import"
  5. Select the backup file you just made.
  6. Select cp1251 encoding and mark the "Encoding correction" option.
  7. Click Run.
  8. That's all we go in Sypex Viewer, to make sure that the Russian characters are displayed correctly.






Data and tables in utf8, but the connection encoding is latin1



Now consider the more advanced case. Gaining popularity in recent times, in connection with the craze UTF-8. The creators of the software began to translate their offspring to UTF-8, but even here everything is not as smooth as we would like.



The problem arises mainly when the tables have the UTF-8 encoding, the data are in UTF-8, but the connection encoding is set to latin1 by default (a typical example, vBulletin 4, although there is a connection encoding setting in configs, but it is commented out default).



As a result, MySQL sends data to UTF-8, but since the encoding is latin1, MySQL tries to convert the data from latin1 to UTF-8. As a result, Russian characters look like this:







The situation is more advanced, but the problem is corrected almost as in the first case, only in clause 2 you need to select the latin1 encoding, and in clause 6 you need to select the utf8 encoding.



Change encoding



Also a common problem is the conversion of encoding from cp1251 to UTF-8. Before performing this step, make sure that your Russian characters are correctly displayed in Sypex Viewer or phpMyAdmin, if this is not the case, then first correct the encoding.

So, again we go into Sypex Dumper.

  1. In the "Export" tab, select the desired table.
  2. Set the encoding to which you want to convert the tables, in this case utf8.
  3. Click Run.
  4. Then go to the "Import" and select the desired file.
  5. We set the utf8 encoding and the "Encoding correction" option.
  6. Click Run.
  7. That's all the tables in UTF-8. Do not forget that you need to set the connection encoding, convert your scripts and templates to UTF-8, set the correct encoding in the headers.




Connection encoding



Do not forget that after coding corrections, you need to make sure that your scripts use the correct encoding of the connection (in principle, this will be immediately visible, they will incorrectly show Russian characters without the necessary encoding of the connection). For some, it is set in the settings, in some you have to add it yourself.



For what it is enough to go through the file search, and find where the function mysql_connect (or mysqli_connect) is called. After this line you need to add a line that indicates the encoding of the connection.



mysql_query("SET NAMES 'cp1251'"); 




Where instead of cp1251, specify the desired encoding of the connection.



Do not forget to make a backup before converting encodings, here as with condoms, it is better to have it and not be needed than when it is needed - it will not be.



PS Thank you Shorts for fun content for examples.

Source: https://habr.com/ru/post/131869/



All Articles