
UTF-8 in PHP. Part 1

Hello! With this post I would like to bring a little closer the bright future in which everyone uses the one “kosher” encoding, UTF-8. In particular, this concerns the environment closest to me: the web and the PHP programming language. At the end of the series we will get to the practical part and build yet another reinvented-wheel library.

1. Introduction


To follow the text below, beginners need to know a few details about encodings in general, and I will try to keep the presentation simple. If you know nothing about bitwise operations, first read the introductory material on Wikipedia.

Start from the fact that a computer works with numbers, so it stores a string (and each character in it) in numeric form as well. Encodings exist for this purpose: they are essentially tables that map numbers to characters. Historically, the basic encoding, ASCII, contains only control codes and basic Latin characters, 128 in total (codes 0 to 127, where 127 is the largest number that fits in 7 bits).
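As a small illustration of this number-to-character table, PHP's built-in `ord()` and `chr()` convert between a one-byte character and its code. This is only a sketch; the helper name `asciiCodes` is made up for the example.

```php
<?php
// Hypothetical helper: list the numeric code of every byte in an ASCII string.
function asciiCodes(string $text): array
{
    $codes = [];
    for ($i = 0; $i < strlen($text); $i++) {
        $codes[] = ord($text[$i]); // ord() returns the byte value, 0..255
    }
    return $codes;
}

print_r(asciiCodes('Hi'));      // H is 72 (0x48), i is 105 (0x69)
echo chr(72) . chr(105), "\n";  // the reverse lookup gives "Hi" back
echo decbin(127), "\n";         // 1111111: the largest value that fits in 7 bits
```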

To store text in other languages, many ASCII-based encodings were created that added an 8th bit. They can hold up to 256 characters: the first 128 traditionally match ASCII, and everyone filled the remaining 128 with whatever they wanted. As a result, each operating system vendor ended up with its own set of encodings, each serving only a relatively narrow group of people. The lack of common standards made things worse: it became impossible to tell these encodings apart algorithmically, and detection now looks more like guessing (more on this in the following parts).

As a result, a universal solution was needed: an encoding that can store all possible characters and takes into account the differences between writing systems (for example, the direction of writing). The problem was solved by creating Unicode, which can represent virtually all of the world's writing systems in a single character set.
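To make the ambiguity concrete, here is a sketch that interprets the same byte under two 8-bit encodings. The function names are hypothetical, and only the parts of the tables needed for the demonstration are implemented: ISO-8859-1 maps every byte to the code point with the same value, and Windows-1251 keeps the Russian letters in bytes 0xC0..0xFF.

```php
<?php
// ISO-8859-1: every byte value equals the Unicode code point.
function latin1ToCodePoint(int $byte): int
{
    return $byte;
}

// Windows-1251: bytes 0xC0..0xFF hold the Russian letters U+0410..U+044F.
function cp1251ToCodePoint(int $byte): ?int
{
    if ($byte < 0x80) {
        return $byte;                   // the ASCII half is shared
    }
    if ($byte >= 0xC0) {
        return 0x0410 + ($byte - 0xC0); // consecutive Cyrillic letters
    }
    return null;                        // 0x80..0xBF is left out of this sketch
}

$byte = 0xCF;
printf("ISO-8859-1:   U+%04X\n", latin1ToCodePoint($byte)); // U+00CF
printf("Windows-1251: U+%04X\n", cp1251ToCodePoint($byte)); // U+041F
```

Nothing in the byte itself says which of the two readings is the intended one, which is why detection ends up as guessing.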

UTF-8 has become the most popular web encoding, which has several significant advantages:

I would like to elaborate on the last point. Previously it was enough to look a character up in a table and write down the result; now the way the result is stored is also defined, and it depends on how many bits are needed to store it. The table below shows the storage layout (x marks a bit of stored data):
Bits  Maximum stored value  Octet 1 (initial)  Octet 2    Octet 3    Octet 4
7     U+007F                0xxxxxxx
11    U+07FF                110xxxxx           10xxxxxx
16    U+FFFF                1110xxxx           10xxxxxx   10xxxxxx
21    U+10FFFF (*)          11110xxx           10xxxxxx   10xxxxxx   10xxxxxx

Octets 2 to 4 are continuing octets.
(*) U+10FFFF is the limit set by the standard; the bit layout itself could hold up to U+1FFFFF.
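The ranges in the table translate directly into a length check. A minimal sketch, where the function name `utf8Length` is an assumption made for this example:

```php
<?php
// Hypothetical helper: number of octets UTF-8 uses for a given code point.
function utf8Length(int $codePoint): int
{
    if ($codePoint < 0 || $codePoint > 0x10FFFF) {
        throw new InvalidArgumentException('Code point outside U+0000..U+10FFFF');
    }
    if ($codePoint <= 0x7F) {
        return 1; // up to 7 bits
    }
    if ($codePoint <= 0x7FF) {
        return 2; // up to 11 bits
    }
    if ($codePoint <= 0xFFFF) {
        return 3; // up to 16 bits
    }
    return 4;     // up to 21 bits
}

foreach ([0x0048, 0x041F, 0x20AC, 0x1F600] as $cp) {
    printf("U+%04X -> %d octet(s)\n", $cp, utf8Length($cp));
}
```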


It is easy to see that the high bits of the initial octet always hold a counter giving the number of octets in the sequence: it is the number of leading ones, followed by a zero. Note: if the sequence consists of a single octet, no leading one is written at all. A single leading one (10xxxxxx) is therefore left for continuing octets, so initial octets are easy to tell apart from them.
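Reading the counter comes down to a few bit masks. The sketch below assumes a hypothetical helper named `sequenceLength` that returns 0 for anything that cannot start a sequence:

```php
<?php
// Hypothetical helper: read the octet counter from an initial octet.
// Returns the sequence length, or 0 when the octet is a continuing octet
// (10xxxxxx) or has more leading ones than the layout allows.
function sequenceLength(int $octet): int
{
    if (($octet & 0x80) === 0x00) {
        return 1; // 0xxxxxxx
    }
    if (($octet & 0xE0) === 0xC0) {
        return 2; // 110xxxxx
    }
    if (($octet & 0xF0) === 0xE0) {
        return 3; // 1110xxxx
    }
    if (($octet & 0xF8) === 0xF0) {
        return 4; // 11110xxx
    }
    return 0;
}

var_dump(sequenceLength(0xD0)); // 0xD0 = 11010000, two leading ones
var_dump(sequenceLength(0x9F)); // 0x9F = 10011111, a continuing octet
```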

For example, let's see how the string “Привет Hi” looks in UTF-8.

Step one. Translate each character into its numeric representation (I will use hexadecimal) according to the Unicode character table.

Привет Hi = 0x041F 0x0440 0x0438 0x0432 0x0435 0x0442 0x0020 0x0048 0x0069
Do not forget that the space is a character too.

Step two. Convert the numbers from hexadecimal to binary. I used the Windows 7 calculator (in Programmer mode).

0x041F = 0000 0100 0001 1111
0x0440 = 0000 0100 0100 0000
0x0438 = 0000 0100 0011 1000
0x0432 = 0000 0100 0011 0010
0x0435 = 0000 0100 0011 0101
0x0442 = 0000 0100 0100 0010
0x0020 = 0010 0000
0x0048 = 0100 1000
0x0069 = 0110 1001
For clarity, I padded the high bits with zeros. Note that characters can occupy a different number of bytes.
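The same conversion can be done without a calculator. A sketch using PHP's `decbin()`, `str_pad()` and `str_split()`; the function name `toBinaryGroups` is made up for this example:

```php
<?php
// Hypothetical helper: format a number as groups of four bits,
// padded with leading zeros to a whole number of bytes.
function toBinaryGroups(int $codePoint): string
{
    $bits  = decbin($codePoint);
    $width = max(8, (int) ceil(strlen($bits) / 8) * 8);
    $bits  = str_pad($bits, $width, '0', STR_PAD_LEFT);
    return implode(' ', str_split($bits, 4));
}

echo toBinaryGroups(0x041F), "\n"; // 0000 0100 0001 1111
echo toBinaryGroups(0x0020), "\n"; // 0010 0000
```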

Step three. Translate the numeric representations into UTF-8 octet sequences.

0x041F = 100 0001 1111 = 110xxxxx 10xxxxxx = 110 10000 10 011111
0x0440 = 100 0100 0000 = 110xxxxx 10xxxxxx = 110 10001 10 000000
0x0438 = 100 0011 1000 = 110xxxxx 10xxxxxx = 110 10000 10 111000
0x0432 = 100 0011 0010 = 110xxxxx 10xxxxxx = 110 10000 10 110010
0x0435 = 100 0011 0101 = 110xxxxx 10xxxxxx = 110 10000 10 110101
0x0442 = 100 0100 0010 = 110xxxxx 10xxxxxx = 110 10001 10 000010
0x0020 = 010 0000 = 0xxxxxxx = 0 0100000
0x0048 = 100 1000 = 0xxxxxxx = 0 1001000
0x0069 = 110 1001 = 0xxxxxxx = 0 1101001
In the last column the counter bits (110 or 0) and the 10 marker of each continuing octet are separated from the data bits by a space. Please note: characters with codes below 0x80 are stored unchanged; this is the ASCII compatibility. It should also be understood that UTF-8 takes twice as much space for Russian text (2 bytes per letter) as Windows-1251, which uses only 1 byte.
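The bit shuffling of step three can be reproduced in a few lines. This is a sketch limited to 1- and 2-octet code points, which is all the example string needs; `utf8Bits` is a hypothetical name.

```php
<?php
// Sketch: encode one code point as UTF-8 and show each octet in binary.
function utf8Bits(int $codePoint): string
{
    if ($codePoint < 0) {
        throw new InvalidArgumentException('Negative code point');
    }
    if ($codePoint < 0x80) {
        $octets = [$codePoint];            // 0xxxxxxx
    } elseif ($codePoint < 0x800) {
        $octets = [
            0xC0 | ($codePoint >> 6),      // 110xxxxx: the upper 5 bits
            0x80 | ($codePoint & 0x3F),    // 10xxxxxx: the lower 6 bits
        ];
    } else {
        throw new InvalidArgumentException('Only 1- and 2-octet code points are handled here');
    }
    $bits = [];
    foreach ($octets as $octet) {
        $bits[] = str_pad(decbin($octet), 8, '0', STR_PAD_LEFT);
    }
    return implode(' ', $bits);
}

$codePoints = [0x041F, 0x0440, 0x0438, 0x0432, 0x0435, 0x0442, 0x0020, 0x0048, 0x0069];
echo implode(' ', array_map('utf8Bits', $codePoints)), "\n";
```

For the nine characters of the example this yields fifteen octets: twelve for the six Russian letters and three for the space and the two Latin letters.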

Putting it all together, the whole sequence (hopefully without mistakes) is: “11010000 10011111 11010001 10000000 11010000 10111000 11010000 10110010 11010000 10110101 11010001 10000010 00100000 01001000 01101001”.

You can check the result with this code:
$bits = '11010000 10011111 11010001 10000000 11010000 10111000 11010000 10110010 11010000 10110101 11010001 10000010 00100000 01001000 01101001';
$tmp = '';
// Split on spaces, turn each group of bits into a number and then into a byte.
foreach (explode(' ', $bits) as $octet) {
    $tmp .= chr(bindec($octet));
}
echo $tmp; // shows "Привет Hi" when the output is read as UTF-8


To perform the reverse operation in code we need to (simplified):
  1. Determine the number of octets in the first character and save this value;
  2. Discard the octet counter from the first byte and keep the remainder;
  3. If the sequence has more than 1 octet, shift the value kept in step 2 left by 6 bits and write the lower 6 bits of the next octet into the freed bits, repeating this for every continuing octet;
  4. Repeat from step 1 for the next character until the string ends :).
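The four steps above can be sketched as a loop. This version assumes well-formed input and performs almost none of the validation a real decoder needs (no checks for overlong forms, surrogates, or stray continuing octets inside a sequence); the function name `decodeUtf8` is an assumption of this example.

```php
<?php
// Simplified decoder that follows the four steps literally.
function decodeUtf8(string $bytes): array
{
    $codePoints = [];
    $length = strlen($bytes);
    $i = 0;
    while ($i < $length) {
        $octet = ord($bytes[$i]);
        // Step 1: the leading ones of the initial octet give the octet count.
        if ($octet < 0x80) {
            $count = 1;
        } elseif (($octet & 0xE0) === 0xC0) {
            $count = 2;
        } elseif (($octet & 0xF0) === 0xE0) {
            $count = 3;
        } elseif (($octet & 0xF8) === 0xF0) {
            $count = 4;
        } else {
            throw new InvalidArgumentException("Unexpected octet at offset $i");
        }
        if ($i + $count > $length) {
            throw new InvalidArgumentException("Truncated sequence at offset $i");
        }
        // Step 2: drop the counter bits and keep the remainder.
        $value = ($count === 1) ? $octet : ($octet & (0xFF >> ($count + 1)));
        // Step 3: shift left by 6 and add the low 6 bits of each continuing octet.
        for ($k = 1; $k < $count; $k++) {
            $value = ($value << 6) | (ord($bytes[$i + $k]) & 0x3F);
        }
        $codePoints[] = $value;
        // Step 4: continue with the next character.
        $i += $count;
    }
    return $codePoints;
}

foreach (decodeUtf8("\xD0\x9F\xD1\x80 Hi") as $cp) {
    printf("U+%04X\n", $cp);
}
```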


Optimized PHP code that returns the numeric representation of a character and performs the inverse operation (I will publish the full version at the end of the series):
class String_Multibyte
{
    /**
     * Returns the code point of the UTF-8 character that starts at offset $index of $char.
     * Returns FALSE for malformed or overlong sequences, for the range 0xD800-0xF8FF
     * (surrogates and private use), for the BOM (0xFEFF) and for code points from
     * 0xF0000 up (which includes 0x10FFFE-0x10FFFF).
     *
     * [...]
     *
     * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}.
     * @param string $char   UTF-8 string (or a single character).
     * @param int    &$index Offset of the initial octet; it is advanced to the last octet that was read.
     * @return int|false Code point, or FALSE if the sequence is rejected.
     */
    public function getCodePoint($char, &$index = 0)
    {
        $octet1 = ord($char[$index]);
        // ASCII (0bbb bbbb): the octet is the code point.
        if ($octet1 >> 7 == 0x00) {
            return $octet1;
        } elseif ($octet1 >> 6 != 0x02) {
            // A second octet is required.
            if (!isset($char[++$index])) {
                return false;
            }
            $octet2 = ord($char[$index]);
            // It must be a continuing octet (10bb bbbb).
            if ($octet2 >> 6 != 0x02) {
                --$index;
                return false;
            }
            // Keep the 6 data bits.
            $octet2 &= 0x3F;
            if ($octet1 >> 5 == 0x06) {
                // Two octets (110b bbbb).
                $result = ($octet1 & 0x1F) << 6 | $octet2;
                // Reject overlong forms: two octets must encode at least 0x80.
                if (0x80 <= $result) {
                    return $result;
                }
            } else {
                if (!isset($char[++$index])) {
                    return false;
                }
                $octet3 = ord($char[$index]);
                if ($octet3 >> 6 != 0x02) {
                    --$index;
                    return false;
                }
                $octet3 &= 0x3F;
                if ($octet1 >> 4 == 0x0E) {
                    // Three octets (1110 bbbb).
                    $result = ($octet1 & 0x0F) << 12 | $octet2 << 6 | $octet3;
                    // Reject overlong forms; the range 0xD800-0xF8FF; the BOM.
                    if (0x800 <= $result && !(0xD7FF < $result && $result < 0xF900) && $result != 0xFEFF) {
                        return $result;
                    }
                } else {
                    if (!isset($char[++$index])) {
                        return false;
                    }
                    $octet4 = ord($char[$index]);
                    if ($octet4 >> 6 != 0x02) {
                        --$index;
                        return false;
                    }
                    $octet4 &= 0x3F;
                    if ($octet1 >> 3 == 0x1E) {
                        // Four octets (1111 0bbb).
                        $result = ($octet1 & 0x07) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4;
                        // Reject overlong forms and everything from 0xF0000 up
                        // (Unicode itself ends at 10FFFF).
                        if (0x10000 <= $result && $result < 0xF0000) {
                            return $result;
                        }
                    }
                }
            }
        }
        // Also reached when the first octet is itself a continuing octet.
        return false;
    }

    /**
     * Returns the UTF-8 character for a Unicode code point.
     * [...]
     * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}.
     * @param int $codePoint Unicode character ordinal.
     * @return string|FALSE UTF-8 character, or FALSE if the code point is out of range.
     */
    public function getChar($codePoint)
    {
        if ($codePoint < 0x80) {
            return chr($codePoint);
        } elseif ($codePoint < 0x800) {
            return chr(0xC0 | $codePoint >> 6)
                . chr(0x80 | $codePoint & 0x3F);
        } elseif ($codePoint < 0x10000) {
            return chr(0xE0 | $codePoint >> 12)
                . chr(0x80 | $codePoint >> 6 & 0x3F)
                . chr(0x80 | $codePoint & 0x3F);
        } elseif ($codePoint < 0x110000) {
            return chr(0xF0 | $codePoint >> 18)
                . chr(0x80 | $codePoint >> 12 & 0x3F)
                . chr(0x80 | $codePoint >> 6 & 0x3F)
                . chr(0x80 | $codePoint & 0x3F);
        } else {
            return false;
        }
    }
}
  1. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  2. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  3. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  4. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  5. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  6. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  7. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  8. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  9. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
class String_Multibyte
{
    /**
     * Returns the code point of the UTF-8 character that starts at position $index of $char.
     * For surrogates, private-use code points, the BOM, and 0x10FFFE-0x10FFFF it returns FALSE.
     *
     * [...]
     *
     * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}.
     * @param string $char String (character) to read from.
     * @param int &$index Offset of the first octet of the character; it is advanced as further octets are read.
     * @return int|false Code point, or FALSE if the octet sequence is rejected.
     */
    public function getCodePoint($char, &$index = 0)
    {
        // Nothing to read at this offset
        if (!isset($char[$index])) {
            return false;
        }

        // First octet
        $octet1 = ord($char[$index]);

        // ASCII (0bbb bbbb): the octet itself is the code point
        if ($octet1 >> 7 == 0x00) {
            return $octet1;
        } elseif ($octet1 >> 6 != 0x02) {
            // A second octet must exist
            if (!isset($char[++$index])) {
                return false;
            }
            $octet2 = ord($char[$index]);
            // It must be a continuation octet (10bb bbbb)
            if ($octet2 >> 6 != 0x02) {
                --$index;
                return false;
            }
            // Keep the 6 payload bits
            $octet2 &= 0x3F;

            // Two-octet sequence (110b bbbb)
            if ($octet1 >> 5 == 0x06) {
                $result = ($octet1 & 0x1F) << 6 | $octet2;
                // Reject overlong forms
                if (0x80 <= $result) {
                    return $result;
                }
            } else {
                if (!isset($char[++$index])) {
                    return false;
                }
                $octet3 = ord($char[$index]);
                if ($octet3 >> 6 != 0x02) {
                    --$index;
                    return false;
                }
                $octet3 &= 0x3F;

                // Three-octet sequence (1110 bbbb)
                if ($octet1 >> 4 == 0x0E) {
                    $result = ($octet1 & 0x0F) << 12 | $octet2 << 6 | $octet3;
                    // Reject overlong forms; surrogates, private use, and the BOM
                    if (0x800 <= $result && !(0xD7FF < $result && $result < 0xF900) && $result != 0xFEFF) {
                        return $result;
                    }
                } else {
                    if (!isset($char[++$index])) {
                        return false;
                    }
                    $octet4 = ord($char[$index]);
                    if ($octet4 >> 6 != 0x02) {
                        --$index;
                        return false;
                    }
                    $octet4 &= 0x3F;

                    // Four-octet sequence (1111 0bbb)
                    if ($octet1 >> 3 == 0x1E) {
                        $result = ($octet1 & 0x07) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4;
                        // Reject overlong forms; private-use planes;
                        // and anything above the Unicode maximum of 10FFFF
                        if (0x10000 <= $result && $result < 0xF0000) {
                            return $result;
                        }
                    }
                }
            }
        }

        // Stray continuation octet or rejected sequence
        return false;
    }

    /**
     * Returns the UTF-8 character for a code point.
     * [...]
     * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}.
     * @param int $codePoint Unicode character ordinal.
     * @return string|false UTF-8 character, or FALSE on failure.
     */
    public function getChar($codePoint)
    {
        if ($codePoint < 0x80) {
            // One octet: 0bbb bbbb
            return chr($codePoint);
        } elseif ($codePoint < 0x800) {
            // Two octets: 110b bbbb, 10bb bbbb
            return chr(0xC0 | ($codePoint >> 6))
                 . chr(0x80 | ($codePoint & 0x3F));
        } elseif ($codePoint < 0x10000) {
            // Three octets: 1110 bbbb, 10bb bbbb, 10bb bbbb
            return chr(0xE0 | ($codePoint >> 12))
                 . chr(0x80 | (($codePoint >> 6) & 0x3F))
                 . chr(0x80 | ($codePoint & 0x3F));
        } elseif ($codePoint < 0x110000) {
            // Four octets: 1111 0bbb, 10bb bbbb, 10bb bbbb, 10bb bbbb
            return chr(0xF0 | ($codePoint >> 18))
                 . chr(0x80 | (($codePoint >> 12) & 0x3F))
                 . chr(0x80 | (($codePoint >> 6) & 0x3F))
                 . chr(0x80 | ($codePoint & 0x3F));
        } else {
            return false;
        }
    }
}
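To see the bit layout from the listing at work, here is a minimal round-trip sketch, assuming PHP 7 or later. The functions `encodeCodePoint` and `decodeCodePoint` are hypothetical standalone helpers written for this illustration, not part of String_Multibyte. The decoder only checks the lead and continuation octets, so it skips the overlong, surrogate, and private-use filtering that getCodePoint applies.

```php
<?php
// Hypothetical helper (not from the library): encode one code point as UTF-8.
function encodeCodePoint(int $codePoint)
{
    if ($codePoint < 0 || $codePoint > 0x10FFFF) {
        return false;
    }
    if ($codePoint < 0x80) {
        return chr($codePoint);
    }
    if ($codePoint < 0x800) {
        return chr(0xC0 | ($codePoint >> 6))
             . chr(0x80 | ($codePoint & 0x3F));
    }
    if ($codePoint < 0x10000) {
        return chr(0xE0 | ($codePoint >> 12))
             . chr(0x80 | (($codePoint >> 6) & 0x3F))
             . chr(0x80 | ($codePoint & 0x3F));
    }
    return chr(0xF0 | ($codePoint >> 18))
         . chr(0x80 | (($codePoint >> 12) & 0x3F))
         . chr(0x80 | (($codePoint >> 6) & 0x3F))
         . chr(0x80 | ($codePoint & 0x3F));
}

// Hypothetical helper: decode the first UTF-8 sequence of $bytes.
// Simplified on purpose: no overlong, surrogate, or private-use checks.
function decodeCodePoint(string $bytes)
{
    if ($bytes === '') {
        return false;
    }
    $lead = ord($bytes[0]);
    if ($lead < 0x80) {
        return $lead;                       // 0bbb bbbb
    } elseif (($lead & 0xE0) === 0xC0) {
        $length = 2; $result = $lead & 0x1F; // 110b bbbb
    } elseif (($lead & 0xF0) === 0xE0) {
        $length = 3; $result = $lead & 0x0F; // 1110 bbbb
    } elseif (($lead & 0xF8) === 0xF0) {
        $length = 4; $result = $lead & 0x07; // 1111 0bbb
    } else {
        return false;                       // stray continuation or invalid lead
    }
    if (strlen($bytes) < $length) {
        return false;
    }
    for ($i = 1; $i < $length; $i++) {
        $octet = ord($bytes[$i]);
        if (($octet & 0xC0) !== 0x80) {     // must be 10bb bbbb
            return false;
        }
        $result = ($result << 6) | ($octet & 0x3F);
    }
    return $result;
}

// U+0041 -> 41, U+00E9 -> c3a9, U+20AC -> e282ac, U+1F600 -> f09f9880
foreach ([0x41, 0xE9, 0x20AC, 0x1F600] as $codePoint) {
    $utf8 = encodeCodePoint($codePoint);
    printf("U+%04X -> %s -> U+%04X\n", $codePoint, bin2hex($utf8), decodeCodePoint($utf8));
}
```

Each line of output shows the code point, the hexadecimal octets produced for it, and the code point recovered from those octets, so the one-, two-, three-, and four-octet branches are each exercised once.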
  24. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  25. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  26. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  27. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  28. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  29. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  30. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  31. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  32. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  33. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  34. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  35. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  36. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  37. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  38. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  39. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  40. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  41. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  42. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  43. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  44. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  45. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  46. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  47. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  48. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
class String_Multibyte
{
    /**
     * Returns the code point of the UTF-8 character that starts at byte $index of $char.
     * Returns FALSE for malformed sequences, overlong forms, the range U+D800..U+F8FF
     * (surrogates and the BMP private use area), the BOM (U+FEFF) and everything
     * from U+F0000 upwards, which includes 0x10FFFE-0x10FFFF.
     *
     * [...]
     *
     * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}.
     * @param string $char   String (character) to read from.
     * @param int    &$index Byte offset of the first octet. After the call it points at the last octet that was consumed.
     * @return int|false Code point, or FALSE if the sequence is not accepted.
     */
    public function getCodePoint( $char, &$index = 0 )
    {
        // Leading octet.
        $octet1 = ord( $char[$index] );

        // ASCII (0bbb bbbb): the octet is the code point.
        if ( $octet1 >> 7 == 0x00 ) {
            return $octet1;
        }

        // A continuation octet (10bb bbbb) cannot start a character.
        if ( $octet1 >> 6 == 0x02 ) {
            return false;
        }

        // The second octet must exist...
        if ( !isset( $char[++$index] ) ) {
            return false;
        }
        $octet2 = ord( $char[$index] );
        // ...and must be a continuation octet (10bb bbbb).
        if ( $octet2 >> 6 != 0x02 ) {
            --$index;
            return false;
        }
        // Keep its 6 payload bits.
        $octet2 &= 0x3F;

        // Two octets: 110b bbbb 10bb bbbb.
        if ( $octet1 >> 5 == 0x06 ) {
            $result = ( $octet1 & 0x1F ) << 6 | $octet2;
            // Anything below U+0080 would be an overlong form.
            return 0x80 <= $result ? $result : false;
        }

        if ( !isset( $char[++$index] ) ) {
            return false;
        }
        $octet3 = ord( $char[$index] );
        if ( $octet3 >> 6 != 0x02 ) {
            --$index;
            return false;
        }
        $octet3 &= 0x3F;

        // Three octets: 1110 bbbb 10bb bbbb 10bb bbbb.
        if ( $octet1 >> 4 == 0x0E ) {
            $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3;
            // Reject overlong forms; surrogates and the private use area; the BOM.
            if ( 0x800 <= $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) {
                return $result;
            }
            return false;
        }

        if ( !isset( $char[++$index] ) ) {
            return false;
        }
        $octet4 = ord( $char[$index] );
        if ( $octet4 >> 6 != 0x02 ) {
            --$index;
            return false;
        }
        $octet4 &= 0x3F;

        // Four octets: 1111 0bbb 10bb bbbb 10bb bbbb 10bb bbbb.
        if ( $octet1 >> 3 == 0x1E ) {
            $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4;
            // Reject overlong forms and everything from U+F0000 upwards
            // (Unicode itself ends at 10FFFF).
            if ( 0x10000 <= $result && $result < 0xF0000 ) {
                return $result;
            }
        }

        return false;
    }

    /**
     * Returns the UTF-8 character for a code point.
     * [...]
     * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}.
     * @param int $codePoint Unicode character ordinal.
     * @return string|FALSE UTF-8 character, or FALSE if the code point is 0x110000 or greater.
     */
    public function getChar( $codePoint )
    {
        if ( $codePoint < 0x80 ) {
            return chr( $codePoint );
        } elseif ( $codePoint < 0x800 ) {
            return chr( 0xC0 | $codePoint >> 6 )
                 . chr( 0x80 | $codePoint & 0x3F );
        } elseif ( $codePoint < 0x10000 ) {
            return chr( 0xE0 | $codePoint >> 12 )
                 . chr( 0x80 | $codePoint >> 6 & 0x3F )
                 . chr( 0x80 | $codePoint & 0x3F );
        } elseif ( $codePoint < 0x110000 ) {
            return chr( 0xF0 | $codePoint >> 18 )
                 . chr( 0x80 | $codePoint >> 12 & 0x3F )
                 . chr( 0x80 | $codePoint >> 6 & 0x3F )
                 . chr( 0x80 | $codePoint & 0x3F );
        } else {
            return false;
        }
    }
}
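To make the bit shuffling in the listing easier to follow, here is a minimal standalone sketch of the three-octet case only, worked through for the euro sign (U+20AC). The functions `encodeThreeByte` and `decodeThreeByte` are hypothetical names invented for this illustration; they are not part of `String_Multibyte` and deliberately skip all validation.

```php
<?php
// Hypothetical helper, for illustration only: encodes one code point
// from the range U+0800..U+FFFF as three UTF-8 octets. No validation.
function encodeThreeByte(int $codePoint): string
{
    return chr(0xE0 | ($codePoint >> 12))          // 1110bbbb: top 4 bits
         . chr(0x80 | (($codePoint >> 6) & 0x3F))  // 10bbbbbb: middle 6 bits
         . chr(0x80 | ($codePoint & 0x3F));        // 10bbbbbb: low 6 bits
}

// Hypothetical inverse: strips the marker bits from each octet
// and glues the payload bits back together. No validation.
function decodeThreeByte(string $bytes): int
{
    return ((ord($bytes[0]) & 0x0F) << 12)
         | ((ord($bytes[1]) & 0x3F) << 6)
         |  (ord($bytes[2]) & 0x3F);
}

$euro = encodeThreeByte(0x20AC);
echo bin2hex($euro), "\n";                    // e282ac
printf("U+%04X\n", decodeThreeByte($euro));   // U+20AC
```

The same pattern extends to two and four octets: only the marker of the leading octet and the number of 6-bit groups change.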
  63. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  64. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  65. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  66. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  67. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  68. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  69. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  70. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  71. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  72. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  73. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
  74. Copy Source | Copy HTML class String_Multibyte { /** <br/> * UTF-8 , $index $char. <br/> * , , BOM 0x10FFFE-0x10FFFF FALSE. <br/> * <br/> * [...] , . <br/> * <br/> * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}. <br/> * @param string $char (). <br/> * @param int &$index , . , . <br/> * @return int|false FALSE , . <br/> */ public function getCodePoint( $char , & $index = 0 ) { // $octet1 = ord( $char [ $index ]); // ASCII ( 0bbb bbbb), . if ( $octet1 >> 7 == 0x00 ) { return $octet1 ; } elseif ( $octet1 >> 6 != 0x02 ) { // if (! isset ( $char [++ $index ])) { return false ; } // $octet2 = ord( $char [ $index ]); // ( 10bb bbbb) if ( $octet2 >> 6 != 0x02 ) { -- $index ; return false ; } // 6 $octet2 &= 0x3F ; // , if ( $octet1 >> 5 == 0x06 ) { $result = ( $octet1 & 0x1F ) << 6 | $octet2 ; // if ( 0x80 < $result ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet3 = ord( $char [ $index ]); if ( $octet3 >> 6 != 0x02 ) { -- $index ; return false ; } $octet3 &= 0x3F ; if ( $octet1 >> 4 == 0x0E ) { $result = ( $octet1 & 0x0F ) << 12 | $octet2 << 6 | $octet3 ; // ; , BOM if ( 0x800 < $result && !( 0xD7FF < $result && $result < 0xF900 ) && $result != 0xFEFF ) { return $result ; } } else { if (! isset ( $char [++ $index ])) { return false ; } $octet4 = ord( $char [ $index ]); if ( $octet4 >> 6 != 0x02 ) { -- $index ; return false ; } $octet4 &= 0x3F ; if ( $octet1 >> 3 == 0x1E ) { $result = ( $octet1 & 0x07 ) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4 ; // ; ; // , Unicode 10FFFF if ( 0x10000 < $result && $result < 0xF0000 ) { return $result ; } } } } return false ; } } /** <br/> * UTF-8 . <br/> * [...] <br/> * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}. <br/> * @param string $codePoint Unicode character ordinal. <br/> * @return string|FALSE UTF-8 FALSE . 
<br/> */ public function getChar( $codePoint ) { if ( $codePoint < 0x80 ) { return chr( $codePoint ); } elseif ( $codePoint < 0x800 ) { return chr( 0xC0 | $codePoint >> 6 ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x10000 ) { return chr( 0xE0 | $codePoint >> 12 ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } elseif ( $codePoint < 0x110000 ) { return chr( 0xF0 | $codePoint >> 18 ) . chr( 0x80 | $codePoint >> 12 & 0x3F ) . chr( 0x80 | $codePoint >> 6 & 0x3F ) . chr( 0x80 | $codePoint & 0x3F ); } else { return false ; } } }
class String_Multibyte
{
    /**
     * Returns the Unicode code point of the UTF-8 character that starts at byte offset $index in $char.
     * Invalid sequences, surrogates, the BOM and 0x10FFFE-0x10FFFF yield FALSE.
     *
     * [...]
     *
     * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}.
     * @param string $char   String (UTF-8) that contains the character.
     * @param int    &$index Byte offset of the first octet. Passed by reference; on success it points at the last octet of the character.
     * @return int|false Code point, or FALSE if the sequence is rejected.
     */
    public function getCodePoint($char, &$index = 0)
    {
        // Nothing to read at this offset.
        if (!isset($char[$index])) {
            return false;
        }
        // First octet.
        $octet1 = ord($char[$index]);
        // ASCII (0bbb bbbb): the octet is the code point itself.
        if ($octet1 >> 7 == 0x00) {
            return $octet1;
        }
        // A leading octet must not look like a continuation octet (10bb bbbb).
        if ($octet1 >> 6 != 0x02) {
            // There must be a second octet.
            if (!isset($char[++$index])) {
                return false;
            }
            $octet2 = ord($char[$index]);
            // It must be a continuation octet (10bb bbbb).
            if ($octet2 >> 6 != 0x02) {
                --$index;
                return false;
            }
            // Keep the 6 payload bits.
            $octet2 &= 0x3F;
            // Two-octet sequence (110b bbbb).
            if ($octet1 >> 5 == 0x06) {
                $result = ($octet1 & 0x1F) << 6 | $octet2;
                // Reject overlong forms: values below 0x80 fit in one octet.
                if (0x80 <= $result) {
                    return $result;
                }
            } else {
                if (!isset($char[++$index])) {
                    return false;
                }
                $octet3 = ord($char[$index]);
                if ($octet3 >> 6 != 0x02) {
                    --$index;
                    return false;
                }
                $octet3 &= 0x3F;
                // Three-octet sequence (1110 bbbb).
                if ($octet1 >> 4 == 0x0E) {
                    $result = ($octet1 & 0x0F) << 12 | $octet2 << 6 | $octet3;
                    // Reject overlong forms, the range U+D800..U+F8FF
                    // (surrogates and the BMP private use area) and the BOM.
                    if (0x800 <= $result && !(0xD7FF < $result && $result < 0xF900) && $result != 0xFEFF) {
                        return $result;
                    }
                } else {
                    if (!isset($char[++$index])) {
                        return false;
                    }
                    $octet4 = ord($char[$index]);
                    if ($octet4 >> 6 != 0x02) {
                        --$index;
                        return false;
                    }
                    $octet4 &= 0x3F;
                    // Four-octet sequence (1111 0bbb).
                    if ($octet1 >> 3 == 0x1E) {
                        $result = ($octet1 & 0x07) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4;
                        // Reject overlong forms and everything from U+F0000 upward,
                        // which also covers values beyond the Unicode maximum 10FFFF.
                        if (0x10000 <= $result && $result < 0xF0000) {
                            return $result;
                        }
                    }
                }
            }
        }
        return false;
    }

    /**
     * Returns the UTF-8 character for the given code point.
     * [...]
     * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}.
     * @param int $codePoint Unicode character ordinal.
     * @return string|FALSE UTF-8 character, or FALSE if the code point is out of range.
     */
    public function getChar($codePoint)
    {
        if ($codePoint < 0x80) {
            // One octet: 0bbb bbbb.
            return chr($codePoint);
        } elseif ($codePoint < 0x800) {
            // Two octets: 110b bbbb, 10bb bbbb.
            return chr(0xC0 | ($codePoint >> 6))
                 . chr(0x80 | ($codePoint & 0x3F));
        } elseif ($codePoint < 0x10000) {
            // Three octets: 1110 bbbb, 10bb bbbb, 10bb bbbb.
            return chr(0xE0 | ($codePoint >> 12))
                 . chr(0x80 | (($codePoint >> 6) & 0x3F))
                 . chr(0x80 | ($codePoint & 0x3F));
        } elseif ($codePoint < 0x110000) {
            // Four octets: 1111 0bbb, 10bb bbbb, 10bb bbbb, 10bb bbbb.
            return chr(0xF0 | ($codePoint >> 18))
                 . chr(0x80 | (($codePoint >> 12) & 0x3F))
                 . chr(0x80 | (($codePoint >> 6) & 0x3F))
                 . chr(0x80 | ($codePoint & 0x3F));
        } else {
            return false;
        }
    }
}
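To check the bit layout used by getChar() by hand, here is a minimal sketch that encodes a few code points and compares the result with known UTF-8 byte sequences. The function name `encodeCodePoint` and the `$samples` table are assumptions made for this illustration only; they are not part of the String_Multibyte class, and the explicit range check at the top is an addition of the sketch.

```php
<?php
// Hypothetical standalone helper (not part of String_Multibyte): encodes one
// code point as UTF-8 with the same bit layout that getChar() uses.
function encodeCodePoint(int $codePoint)
{
    // Assumption of this sketch: anything outside 0..0x10FFFF is rejected.
    if ($codePoint < 0 || $codePoint > 0x10FFFF) {
        return false;
    }
    if ($codePoint < 0x80) {
        return chr($codePoint);
    }
    if ($codePoint < 0x800) {
        return chr(0xC0 | ($codePoint >> 6))
             . chr(0x80 | ($codePoint & 0x3F));
    }
    if ($codePoint < 0x10000) {
        return chr(0xE0 | ($codePoint >> 12))
             . chr(0x80 | (($codePoint >> 6) & 0x3F))
             . chr(0x80 | ($codePoint & 0x3F));
    }
    return chr(0xF0 | ($codePoint >> 18))
         . chr(0x80 | (($codePoint >> 12) & 0x3F))
         . chr(0x80 | (($codePoint >> 6) & 0x3F))
         . chr(0x80 | ($codePoint & 0x3F));
}

// Known UTF-8 byte sequences, one for each encoded length.
$samples = [
    0x41    => "\x41",             // U+0041 LATIN CAPITAL LETTER A
    0x416   => "\xD0\x96",         // U+0416 CYRILLIC CAPITAL LETTER ZHE
    0x20AC  => "\xE2\x82\xAC",     // U+20AC EURO SIGN
    0x1F600 => "\xF0\x9F\x98\x80", // U+1F600 GRINNING FACE
];

foreach ($samples as $codePoint => $expected) {
    $bytes = encodeCodePoint($codePoint);
    printf(
        "U+%04X -> %s (%s)\n",
        $codePoint,
        bin2hex($bytes),
        $bytes === $expected ? 'ok' : 'MISMATCH'
    );
}
```

Taking U+0416 as an example: the upper bits `0x416 >> 6` give 0x10, which becomes the leading octet 0xD0, and the lower six bits 0x16 become the continuation octet 0x96.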
class String_Multibyte
{
    /**
     * Returns the Unicode code point of the UTF-8 character that starts at byte $index of $char.
     * Returns FALSE for malformed or overlong sequences and for the code points this method
     * rejects: surrogates, the BMP private use area, the BOM (U+FEFF) and everything from U+F0000 up.
     *
     * [...]
     *
     * @author Andrew Dryga <anddriga at gmail>, {@link http://andryx.habrahabr.ru}.
     * @param string $char   Byte string to read from.
     * @param int    &$index Offset of the first byte; on return it points at the last byte read.
     * @return int|false Code point, or FALSE if the sequence is invalid or rejected.
     */
    public function getCodePoint($char, &$index = 0)
    {
        // First octet of the character
        $octet1 = ord($char[$index]);
        // ASCII (0bbb bbbb) is returned as is.
        if ($octet1 >> 7 == 0x00) {
            return $octet1;
        } elseif ($octet1 >> 6 != 0x02) {
            // A lead byte needs at least one more octet
            if (!isset($char[++$index])) {
                return false;
            }
            $octet2 = ord($char[$index]);
            // It must be a continuation byte (10bb bbbb)
            if ($octet2 >> 6 != 0x02) {
                --$index;
                return false;
            }
            // Keep the 6 payload bits
            $octet2 &= 0x3F;
            // Two-byte sequence (110b bbbb)
            if ($octet1 >> 5 == 0x06) {
                $result = ($octet1 & 0x1F) << 6 | $octet2;
                // Reject overlong forms; U+0080 itself is valid (the original "<" excluded it)
                if (0x80 <= $result) {
                    return $result;
                }
            } else {
                if (!isset($char[++$index])) {
                    return false;
                }
                $octet3 = ord($char[$index]);
                if ($octet3 >> 6 != 0x02) {
                    --$index;
                    return false;
                }
                $octet3 &= 0x3F;
                // Three-byte sequence (1110 bbbb)
                if ($octet1 >> 4 == 0x0E) {
                    $result = ($octet1 & 0x0F) << 12 | $octet2 << 6 | $octet3;
                    // Reject overlong forms, U+D800..U+F8FF (surrogates and private use) and the BOM
                    if (0x800 <= $result && !(0xD7FF < $result && $result < 0xF900) && $result != 0xFEFF) {
                        return $result;
                    }
                } else {
                    if (!isset($char[++$index])) {
                        return false;
                    }
                    $octet4 = ord($char[$index]);
                    if ($octet4 >> 6 != 0x02) {
                        --$index;
                        return false;
                    }
                    $octet4 &= 0x3F;
                    // Four-byte sequence (1111 0bbb)
                    if ($octet1 >> 3 == 0x1E) {
                        $result = ($octet1 & 0x07) << 18 | $octet2 << 12 | $octet3 << 6 | $octet4;
                        // Reject overlong forms, planes 15-16 and anything above the Unicode limit 10FFFF
                        if (0x10000 <= $result && $result < 0xF0000) {
                            return $result;
                        }
                    }
                }
            }
        }
        return false;
    }

    /**
     * Returns the UTF-8 byte sequence for a code point.
     * [...]
     * @author ur001 <ur001ur001@gmail.com>, {@link http://ur001.habrahabr.ru}.
     * @param int $codePoint Unicode character ordinal.
     * @return string|false UTF-8 character, or FALSE if the code point is above U+10FFFF.
     */
    public function getChar($codePoint)
    {
        if ($codePoint < 0x80) {
            return chr($codePoint);
        } elseif ($codePoint < 0x800) {
            return chr(0xC0 | $codePoint >> 6)
                . chr(0x80 | $codePoint & 0x3F);
        } elseif ($codePoint < 0x10000) {
            return chr(0xE0 | $codePoint >> 12)
                . chr(0x80 | $codePoint >> 6 & 0x3F)
                . chr(0x80 | $codePoint & 0x3F);
        } elseif ($codePoint < 0x110000) {
            return chr(0xF0 | $codePoint >> 18)
                . chr(0x80 | $codePoint >> 12 & 0x3F)
                . chr(0x80 | $codePoint >> 6 & 0x3F)
                . chr(0x80 | $codePoint & 0x3F);
        } else {
            return false;
        }
    }
}

The getChar() method is taken from the Jevix library. I had already seen that code and remembered it well, so even if I had rewritten it from memory, it would have been dishonest not to credit the author.
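To make the bit arithmetic in getChar() easier to follow, here is a minimal sketch of its two-byte branch in isolation. The function name encodeTwoByte is hypothetical and exists only for this illustration; it assumes the code point lies in the range U+0080..U+07FF and does not check that.

```php
<?php
// Hypothetical helper for illustration only; it mirrors the two-byte branch of getChar().
// Assumption: 0x80 <= $codePoint < 0x800 (the range is not checked here).
function encodeTwoByte(int $codePoint): string
{
    // Lead byte 110b bbbb: the marker 0xC0 plus the upper 5 bits of the code point.
    $lead = 0xC0 | ($codePoint >> 6);
    // Continuation byte 10bb bbbb: the marker 0x80 plus the lower 6 bits.
    $tail = 0x80 | ($codePoint & 0x3F);

    return chr($lead) . chr($tail);
}

// U+041F (Cyrillic capital letter Pe), the first character of the test string.
echo bin2hex(encodeTwoByte(0x041F)), PHP_EOL; // prints "d09f"
```

The result, D0 9F, matches the first two octets of the test string (11010000 10011111). The three- and four-byte branches follow the same pattern with more continuation bytes.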

You can test the resulting class with the code:
// Create an instance of the object
$obj = new String_Multibyte();
// Build the string from binary octets, the most convenient form for testing
$tmp = '';
foreach (explode(' ', '11010000 10011111 11010001 10000000 11010000 10111000 11010000 10110010 11010000 10110101 11010001 10000010 00100000 01001000 01101001') as $octet) {
    $tmp .= chr(bindec($octet));
}
// Build the code point map
$map = array();
$len = strlen($tmp);
for ($i = 0; $i < $len; $i++) {
    // Compare strictly with false so that code point 0 is not dropped
    if (false !== ($result = $obj->getCodePoint($tmp, $i))) {
        $map[] = $result;
    }
}
// Clear the string and restore it from the map
$tmp = '';
$count = count($map);
for ($i = 0; $i < $count; $i++) {
    $tmp .= $obj->getChar($map[$i]);
}
// Display the restored string
echo $tmp, '<br />' . PHP_EOL;
// Check it for validity (this is the easiest way)
echo preg_match('#.{1}#u', $tmp) ? 'Valid Unicode' : 'Unknown', '<br />' . PHP_EOL;
I was not trying to write the most elegant or correct test code, but it lets you change individual character values and see the result immediately. Every invalid sequence is ignored, so the output string is always valid. That is not the whole story, though.
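The validity check used in the test relies on the /u pattern modifier: when the subject is not well-formed UTF-8, preg_match() returns false instead of 0 or 1. A minimal sketch of that check as a standalone function follows; the name isValidUtf8 is hypothetical, and the mb_check_encoding() call assumes the mbstring extension is installed, so it is guarded.

```php
<?php
// Hypothetical helper for illustration: true when $text is well-formed UTF-8.
function isValidUtf8(string $text): bool
{
    // The empty pattern matches any valid subject (even an empty one) and yields 1;
    // with /u, a malformed subject makes preg_match() return false.
    return preg_match('//u', $text) === 1;
}

$valid   = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82 Hi"; // the test string from above
$invalid = "\xD0\x9F\xD0";                                        // ends with a lone lead byte

var_dump(isValidUtf8($valid));   // bool(true)
var_dump(isValidUtf8($invalid)); // bool(false)

// The same check with mbstring, if that extension is available.
if (function_exists('mb_check_encoding')) {
    var_dump(mb_check_encoding($invalid, 'UTF-8')); // bool(false)
}
```

Unlike the pattern '#.{1}#u' from the test, the empty pattern also accepts an empty string, which the test would report as 'Unknown'.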

To be sure that the text contains nothing superfluous, you also need to remove unwanted characters (non-printable, markup, unassigned, surrogate, and so on) and normalize it. That is the subject of the next section.

P.S.
The next parts will cover normalization, security, encoding detection, and working with UTF-8 in PHP.

Source: https://habr.com/ru/post/113715/

