
Cheat Sheet on the transition to UTF-8

The scenario: there is a site in encoding X, and it needs to be converted to UTF-8.
Here is a short list of what has to be switched to UTF-8 so that the site works correctly.
  1. MySQL database
  2. Installing mbstring
  3. mbstring configuration
  4. Dealing with unsafe multibyte functions in PHP
  5. htmlentities() for multibyte strings
  6. Checking content-type headers
  7. Check binary files and strings




1. MySQL database


If the site is to work in UTF-8, everything in the database should be stored in UTF-8 as well, which is only logical. Keep in mind that MySQL's utf8 stores at most three bytes per character; MySQL 5.5.3 and later also offer utf8mb4, which covers all of Unicode. To create a new database:
CREATE DATABASE db_name
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci
;

To change an existing one:
ALTER DATABASE db_name
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci
;

For tables:
ALTER TABLE tbl_name
DEFAULT CHARACTER SET utf8
COLLATE utf8_general_ci
;
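
One caveat: ALTER TABLE ... DEFAULT CHARACTER SET only changes the default for columns added later. Existing text columns keep their old character set until they are converted with ALTER TABLE ... CONVERT TO CHARACTER SET. Below is a minimal sketch in PHP that builds such statements for a list of tables. The helper name build_convert_statements and the table names are hypothetical; run the resulting SQL with whatever client you use, and take a backup first, since a conversion of mislabelled data can damage it.

```php
<?php
// Hypothetical helper: builds ALTER TABLE ... CONVERT TO statements.
function build_convert_statements(array $tables, $charset = 'utf8', $collation = 'utf8_general_ci') {
    $statements = array();
    foreach ($tables as $table) {
        // Quote the identifier with backticks, doubling any backtick inside it.
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = "ALTER TABLE $quoted CONVERT TO CHARACTER SET $charset COLLATE $collation;";
    }
    return $statements;
}

// Example table names are placeholders.
foreach (build_convert_statements(array('users', 'posts')) as $sql) {
    echo $sql, "\n";
}
```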


2. Installing mbstring


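Windows: enable the extension in php.ini by uncommenting extension=php_mbstring.dll (extension=mbstring on PHP 7.2 and later).
Linux (Debian in particular): # aptitude install php-mbstring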

3. mbstring configuration


Set the following in php.ini, httpd.conf or .htaccess. In httpd.conf or .htaccess, do not forget to prefix each directive with php_value (php_flag for On/Off values) and drop the equals sign.
mbstring.language = Neutral ; Set default language to Neutral(UTF-8) (default)
mbstring.internal_encoding = UTF-8 ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On ; HTTP input encoding translation is enabled
mbstring.http_input = auto ; Set HTTP input character set detection to auto
mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
mbstring.detect_order = auto ; Set default character encoding detection order to auto
mbstring.substitute_character = none ; Do not print invalid characters
default_charset = UTF-8 ; Default character set for auto content type header
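
To check from a script that the settings took effect, something like the following may help. It is only a sketch: it sets the encodings at runtime with mb_internal_encoding() and mb_regex_encoding(), in case the ini values cannot be changed on your hosting, and then prints what PHP reports. Note that since PHP 5.6 mbstring.internal_encoding, mbstring.http_input and mbstring.http_output are deprecated in favour of default_charset.

```php
<?php
// Fall back to setting the encodings at runtime.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// Print what PHP actually uses now.
echo 'internal encoding: ', mb_internal_encoding(), "\n";
echo 'regex encoding:    ', mb_regex_encoding(), "\n";
echo 'default_charset:   ', ini_get('default_charset'), "\n";
```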



4. Dealing with unsafe multibyte functions in PHP


Here is a list of byte-oriented functions and the multibyte replacements you need to use for your scripts to work correctly:
mail() → mb_send_mail()
strlen() → mb_strlen()
strpos() → mb_strpos()
strrpos() → mb_strrpos()
substr() → mb_substr()
strtolower() → mb_strtolower()
strtoupper() → mb_strtoupper()
substr_count() → mb_substr_count()
ereg() → mb_ereg()
eregi() → mb_eregi()
ereg_replace() → mb_ereg_replace()
eregi_replace() → mb_eregi_replace()
split() → mb_split()
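
To see why the replacements matter, here is a small sketch comparing the byte-oriented and multibyte versions on a Cyrillic string. It assumes the script file itself is saved as UTF-8; the sample word is arbitrary.

```php
<?php
$s = 'Привет';  // 6 Cyrillic letters, 2 bytes each in UTF-8

echo strlen($s), "\n";                      // counts bytes: 12
echo mb_strlen($s, 'UTF-8'), "\n";          // counts characters: 6

echo mb_substr($s, 0, 3, 'UTF-8'), "\n";    // При
echo mb_strtoupper($s, 'UTF-8'), "\n";      // ПРИВЕТ

// substr() cuts in the middle of the second letter, so the result is not valid UTF-8.
var_dump(mb_check_encoding(substr($s, 0, 3), 'UTF-8'));  // bool(false)
```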




5. htmlentities() for multibyte strings



A simple replacement function from php.net. It should suffice for simple text.
/**
 * Multibyte equivalent for htmlentities() [lite version :)]
 *
 * @param string $str
 * @param string $encoding
 * @return string
 **/
function mb_htmlentities($str, $encoding = 'utf-8') {
    mb_regex_encoding($encoding);
    // Characters to escape and their HTML entities, in matching order.
    $pattern = array('<', '>', '"', '\'');
    $replacement = array('&lt;', '&gt;', '&quot;', '&#039;');
    for ($i = 0; $i < sizeof($pattern); $i++) {
        $str = mb_ereg_replace($pattern[$i], $replacement[$i], $str);
    }
    return $str;
}
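
If all you need is to escape the markup characters, PHP's built-in htmlspecialchars() also accepts an encoding argument, so a hand-written function may not be necessary. A short sketch (the sample string is made up); unlike the function above, it also escapes the ampersand:

```php
<?php
$s = '<a href="x">Жан\'s</a>';

// ENT_QUOTES escapes both double and single quotes; Cyrillic letters pass through untouched.
$escaped = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";  // &lt;a href=&quot;x&quot;&gt;Жан&#039;s&lt;/a&gt;
```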


6. Checking content-type headers



This one is simple. Change any existing Content-Type header to:
header('Content-Type: text/html; charset=UTF-8');


7. Check binary files and strings


Here you will have to go through everything by hand and not forget anything =)
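
For strings, part of this check can be automated. A rough sketch, assuming the old encoding was Windows-1251 (substitute whatever X was for your site; the helper name to_utf8 is hypothetical): mb_check_encoding() tells you whether a string is already valid UTF-8, and mb_convert_encoding() converts it if not. A legacy string can occasionally look like valid UTF-8 by coincidence, so treat this as an aid and not a guarantee.

```php
<?php
// Hypothetical helper: returns $s as UTF-8, converting from $from if needed.
function to_utf8($s, $from = 'Windows-1251') {
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;  // already valid UTF-8 (pure ASCII also passes)
    }
    return mb_convert_encoding($s, 'UTF-8', $from);
}

$legacy = "\xCF\xF0\xE8\xE2\xE5\xF2";  // "Привет" in Windows-1251
echo to_utf8($legacy), "\n";           // Привет
echo to_utf8('Привет'), "\n";          // already UTF-8, returned unchanged
```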

This is a cheat sheet, not an article.

Source: https://habr.com/ru/post/13969/

