Index access to multibyte strings in PHP, or learning OOP in practice

Background: quite recently, it seems, the firm where I worked as a Delphi programmer breathed its last and collapsed. Whether that was legally sound or not, a large number of employees, me included, had to start looking for a new job. I will not discuss the demand for desktop development or the relevance of Delphi today, but I decided to use the situation to change my line of work. Having studied the job offers in my region (a small regional center), I decided to become a PHP web developer. As a result, I spent more of my New Year holidays over a book than at the festive table.

Almost immediately I ran into something that puzzled me a little: PHP is the language most websites run on, and UTF-8 has clearly won on the web, yet the language has no reasonable built-in support for it. After reading about the standard solution, the mbstring extension, I calmed down, but some inconveniences left an unpleasant aftertaste. Namely: there is no way to access a character as an element of an array, and a number of standard string functions have no multibyte counterparts. There was no time to linger, though. I kept studying the literature and turning to Google with various questions, and kept running into beginners' threads about how inconvenient multibyte strings are and about waiting for PHP 6. Honestly, they even began to bore me: mbstring may not offer direct analogues, but it has everything needed to implement them yourself.
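The inconvenience can be seen in a minimal sketch. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the sample word is chosen purely for illustration.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
$s = 'Привет'; // illustrative sample word: 6 characters, 12 bytes in UTF-8

$bytes = strlen($s);             // counts bytes
$chars = mb_strlen($s, 'UTF-8'); // counts characters

$firstByte = $s[0];                        // one byte: only half of the first letter
$firstChar = mb_substr($s, 0, 1, 'UTF-8'); // the whole first letter

echo "bytes: $bytes, characters: $chars\n";
echo 'first byte (hex): ', bin2hex($firstByte), "\n";
echo 'first character: ', $firstChar, "\n";
```

Plain index access works on bytes, so for a multibyte string it returns a fragment of a character rather than the character itself.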

And so, having reached the topic of standard interfaces, I saw what I had been missing as a beginner:

1. An OOP approach for encapsulating the encoding information and the string-processing methods

    class MBString
    {
        private $string;
        private $encoding;

        public function __construct($string, $encoding = 'UTF-8')
        {
            $this->string = $string;
            $this->encoding = $encoding;
        }

        public function __toString()
        {
            return (string)$this->string;
        }

        /**
         * Length of the string in characters
         * @return int
         */
        public function Length()
        {
            return mb_strlen($this->string, $this->encoding);
        }

        /**
         * Size of the string in bytes
         * @return int
         */
        public function Size()
        {
            return strlen($this->string);
        }

        /**
         * Current encoding of the string
         * @return string
         */
        public function getEncoding()
        {
            return $this->encoding;
        }

        /**
         * Convert the string to another encoding
         * @param $encoding
         */
        public function setEncoding($encoding)
        {
            $this->string = mb_convert_encoding($this->string, $encoding, $this->encoding);
            $this->encoding = $encoding;
        }

        /**
         * List of supported encodings
         * @return array
         */
        public static function SupportEncodings()
        {
            return mb_list_encodings();
        }

        /**
         * Get the character at index $i
         * @param int $i
         * @return string
         */
        public function GetChar($i)
        {
            return mb_substr($this->string, $i, 1, $this->encoding);
        }

        /**
         * Replace the character at index $i
         * @param int $i
         * @param string $char
         */
        public function SetChar($i, $char)
        {
            $this->string = mb_substr($this->string, 0, $i, $this->encoding)
                .mb_substr($char, 0, 1, $this->encoding) // only the first character of $char is used
                .mb_substr($this->string, $i+1, $this->Length()-($i+1), $this->encoding);
        }

        /**
         * Remove the character at index $i
         * @param int $i
         */
        public function UnSetChar($i)
        {
            $this->string = mb_substr($this->string, 0, $i, $this->encoding)
                .mb_substr($this->string, $i+1, $this->Length()-($i+1), $this->encoding);
        }

        public function UCFirst()
        {
            $this->SetChar(0, mb_strtoupper($this->GetChar(0), $this->encoding));
            return $this->string;
        }

        public function UCWords()
        {
            return $this->string = mb_convert_case($this->string, MB_CASE_TITLE, $this->encoding);
        }
    }

There is nothing unusual in the class: it uses standard mbstring functions for index access to the characters of the string and then, building on those methods, implements a couple of functions that mbstring lacks, UCFirst and UCWords. The class can later be extended with whatever else you find missing. It also implements the magic method __toString(), which turned out to be very useful here.

A couple of comments:

Since the class controls every method in which the length of the string (in characters) can change, it is advisable to cache the result of Length: store the length in a private variable, recompute it (mb_strlen($this->string, $this->encoding)) only when that variable is null, and reset the variable to null in every method that changes the length of the string. The reason is that Length is called often, and mb_strlen is not the fastest function.
You can also add an Add(MBString $string) method to the class to concatenate strings with their encodings taken into account (the appended string is converted to the encoding of the class).
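Both comments can be sketched together. This is a reduced, hypothetical variant of the class (CachedMBString and the $length property are names assumed for illustration), showing only the parts needed for the length cache and the Add method. It is one possible shape, not the definitive implementation.

```php
<?php
// Reduced, hypothetical variant of the class: only what is needed to show
// the cached Length() and the Add() method. Assumes mbstring is enabled.
class CachedMBString
{
    private $string;
    private $encoding;
    private $length = null; // cached length in characters; null means "not computed yet"

    public function __construct($string, $encoding = 'UTF-8')
    {
        $this->string = $string;
        $this->encoding = $encoding;
    }

    public function __toString()
    {
        return (string)$this->string;
    }

    public function getEncoding()
    {
        return $this->encoding;
    }

    public function Length()
    {
        // Call mb_strlen only when there is no cached value.
        if ($this->length === null) {
            $this->length = mb_strlen($this->string, $this->encoding);
        }
        return $this->length;
    }

    // Appends another string, converting it to this object's encoding first.
    public function Add(CachedMBString $other)
    {
        $this->string .= mb_convert_encoding(
            (string)$other,
            $this->encoding,
            $other->getEncoding()
        );
        $this->length = null; // the length changed, so drop the cached value
        return $this;
    }
}

$a = new CachedMBString('жук');                          // UTF-8
$b = new CachedMBString("\xE6\xF3\xEA", 'Windows-1251'); // the same word in Windows-1251
echo $a->Length(), "\n"; // computed once, then served from the cache
$a->Add($b);
echo $a, ' ', $a->Length(), "\n";
```

In the full class, the same reset to null would go into every method that changes the length, such as UnSetChar and setEncoding.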

2. The standard Iterator (IteratorAggregate) interface for foreach

    protected $iterator_index = 0;

    public function rewind()
    {
        $this->iterator_index = 0;
    }

    public function current()
    {
        return $this->GetChar($this->iterator_index);
    }

    public function key()
    {
        return $this->iterator_index;
    }

    public function next()
    {
        ++$this->iterator_index;
    }

    public function valid()
    {
        return ($this->iterator_index < $this->Length());
    }

The Iterator interface requires implementing a handful of simple methods, and in return lets foreach walk through the string character by character, just as it would through an array.

Remarks:

Update the class header to class MBString implements Iterator
There is also an alternative interface, IteratorAggregate, with similar functionality. Its advantage is that the iteration methods can be moved out of the class: the class itself implements only getIterator, which returns a separate mediator class that does the iterating. The implementation is trivial and differs from the methods listed above in practically nothing except that it refers to the parent object.
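The IteratorAggregate variant could look roughly like this. All names are hypothetical: MBChars stands in for a reduced version of the class, and MBCharsIterator is the mediator. The #[\ReturnTypeWillChange] attributes are there only to silence deprecation notices on PHP 8.1 and later; older versions treat those lines as comments.

```php
<?php
// Hypothetical mediator: walks over the characters of an MBChars object.
class MBCharsIterator implements Iterator
{
    private $owner;     // the string object being iterated
    private $index = 0; // current character position

    public function __construct(MBChars $owner)
    {
        $this->owner = $owner;
    }

    public function rewind(): void { $this->index = 0; }
    public function next(): void { ++$this->index; }
    public function valid(): bool { return $this->index < $this->owner->Length(); }

    #[\ReturnTypeWillChange]
    public function key() { return $this->index; }

    #[\ReturnTypeWillChange]
    public function current() { return $this->owner->GetChar($this->index); }
}

// Reduced stand-in for the string class: the iteration logic lives outside it.
class MBChars implements IteratorAggregate
{
    private $string;
    private $encoding;

    public function __construct($string, $encoding = 'UTF-8')
    {
        $this->string = $string;
        $this->encoding = $encoding;
    }

    public function Length()
    {
        return mb_strlen($this->string, $this->encoding);
    }

    public function GetChar($i)
    {
        return mb_substr($this->string, $i, 1, $this->encoding);
    }

    public function getIterator(): Traversable
    {
        return new MBCharsIterator($this);
    }
}

$chars = [];
foreach (new MBChars('жук') as $i => $char) {
    $chars[$i] = $char;
}
echo implode('-', $chars), "\n";
```

A side effect of this design is that each foreach gets its own iterator object, so nested loops over the same string do not share a position.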

3. ArrayAccess interface for index access

    public function offsetExists($offset)
    {
        return ($offset < $this->Length());
    }

    public function offsetGet($offset)
    {
        return $this->GetChar($offset);
    }

    public function offsetSet($offset, $value)
    {
        $this->SetChar($offset, $value);
    }

    public function offsetUnset($offset)
    {
        $this->UnSetChar($offset);
    }

Just a few lines of code, but OOP and PHP now let you access a character of a string as an element of an array, regardless of the encoding!

Remarks:

Update the class header to class MBString implements Iterator, ArrayAccess
When assigning a single character, the encoding still matters: the assigned character must be in the same encoding as the string stored in the object.
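The encoding remark can be illustrated with a small sketch. The helper replaceCharAt is hypothetical and not part of the class; it assumes the caller knows the encoding of the incoming character and converts it before splicing it into the string.

```php
<?php
// Hypothetical helper: replaces the character at index $i, converting the
// new character from $charEncoding to the encoding of the target string first.
function replaceCharAt(string $s, int $i, string $char, string $charEncoding, string $encoding = 'UTF-8'): string
{
    $char = mb_convert_encoding($char, $encoding, $charEncoding);

    return mb_substr($s, 0, $i, $encoding)
        . mb_substr($char, 0, 1, $encoding)
        . mb_substr($s, $i + 1, null, $encoding);
}

$word = 'жук';     // UTF-8
$capital = "\xC6"; // the capital letter Ж encoded in Windows-1251

// Splicing the raw Windows-1251 byte into a UTF-8 string corrupts it.
$broken = $capital . mb_substr($word, 1, null, 'UTF-8');
var_dump(mb_check_encoding($broken, 'UTF-8'));

// Converting the character first keeps the string valid.
$fixed = replaceCharAt($word, 0, $capital, 'Windows-1251');
var_dump(mb_check_encoding($fixed, 'UTF-8'));
echo $fixed, "\n";
```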

4. And finally, the Countable interface for a more complete emulation of arrays

    public function count()
    {
        return $this->Length();
    }

Now the count function can be applied to an object of the class, and iterating over a string with a for loop becomes akin to traversing an array.

Remark: update the class header to class MBString implements Iterator, ArrayAccess, Countable

Usage example

    require_once('MBString.php');

    // sample text: 14 characters, 27 bytes in UTF-8 (see the output below)
    $mbStr = new MBString(' ');
    echo '  __toString(): ', "$mbStr<br/>";
    echo 'Length: ', $mbStr->Length(), ' Size: ', $mbStr->Size(), '<br/>';
    echo '$mbStr[0]: ', $mbStr[0], '<br/>';

    $mbStr->SetChar(0, 'z');
    echo $mbStr, '<br/>';
    echo 'UCFirst: ', $mbStr->UCFirst(), '<br/>';
    echo 'UCWords: ', $mbStr->UCWords(), '<br/>';

    foreach ($mbStr as $k => $v) {
        echo $v, '-';
    }
    echo '<br>';

    for ($i = 0; $i < $mbStr->Length(); $i++) {
        echo $mbStr[$i], '+';
    }

    __toString():
    Length: 14 Size: 27
    $mbStr[0]:
    z
    UCFirst: Z
    UCWords: Z
    Z---------- ----
    Z++++++++++ ++++

P.S. Tomorrow... or rather, today I have my first PHP interview, so I will respond to all criticism and comments later.

Source: https://habr.com/ru/post/165107/
