Methods on primitive types in PHP

Some time ago, Anthony Ferrara wrote down his thoughts about the future of PHP. I agree with most of his views, but not all of them. In this article I will focus on one specific aspect: turning primitive data types, such as strings or arrays, into "pseudo-objects" that allow method calls on them.

Let's start with a few examples:

    $str = "test foo bar";
    $str->length();       // == strlen($str)        == 12
    $str->indexOf("foo")  // == strpos($str, "foo") == 5
    $str->split(" ")      // == explode(" ", $str)  == ["test", "foo", "bar"]
    $str->slice(4, 3)     // == substr($str, 4, 3)  == "foo"

    $array = ["test", "foo", "bar"];
    $array->length()      // == count($array)             == 3
    $array->join(" ")     // == implode(" ", $array)      == "test foo bar"
    $array->slice(1, 2)   // == array_slice($array, 1, 2) == ["foo", "bar"]
    $array->flip()        // == array_flip($array)        == ["test" => 0, "foo" => 1, "bar" => 2]

Here $str is a regular string and $array is a plain array; they are not objects. We simply give them a little object-like behavior by allowing methods to be called on them.
Note that this is not some far-off dream; it already exists today. The scalar objects PHP extension allows you to define methods for primitive types.



Introducing method call support for primitive types has a number of advantages, which I will go through below:

An opportunity for a cleaner API


Probably the most common complaint from anyone who has ever heard of PHP is the inconsistent and unclear naming of functions in the standard library, along with the equally inconsistent and unclear parameter order. Typical examples:

    // inconsistent naming conventions
    strpos
    str_replace

    // unclear names
    strcspn // STRing Complement SPaN
    strpbrk // STRing Pointer BReaK

    // inconsistent parameter order
    strpos($haystack, $needle)
    array_search($needle, $haystack)

Although this problem is often overstated (we do have IDEs, after all), it is hard to deny that the current situation is suboptimal. It should also be noted that many functions have problems that go beyond an odd name. Edge cases were often not properly considered and are therefore not handled, so the calling code has to handle them specially. For string functions, these edge cases usually involve empty strings or offsets at the very end of the string.

The obvious solution is to simply add a huge number of function aliases to PHP6 that unify the names and the parameter order. We would get string\pos(), string\replace(), string\complement_span() or something along those lines. To me personally (and, it seems, to many php-src developers) this does not make much sense. The current function names are deeply rooted in the muscle memory of every PHP programmer, and there seems to be little point in making trivial cosmetic changes.

On the other hand, introducing an OO API for primitive types makes it possible to redesign the API as a side effect of moving to a new paradigm. It also lets us start from a truly clean slate, without having to meet any expectations set by the old procedural API. Two examples:



My main goal with an OO API for primitive types is to start from scratch, which would let us implement a set of properly designed solutions. But this is, of course, not the only advantage of such a step. The OO syntax offers a number of additional benefits, which are discussed below.

Improved readability


Procedural calls usually cannot be chained. Consider the following example:

    $output = array_map(function($value) {
        return $value * 42;
    }, array_filter($input, function($value) {
        return $value > 10;
    }));

At first glance, it is not clear what array_map and array_filter are applied to, or in what order they are invoked. The $input variable is hidden somewhere in the middle between two closures, and the function calls are written in the reverse of the order in which they are actually applied. Now the same example using OO syntax:

    $output = $input->filter(function($value) {
        return $value > 10;
    })->map(function($value) {
        return $value * 42;
    });

I suppose that in this case the order of actions (first the filter, then the mapping) and the original $input array are shown more clearly.
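The difference in reading order can be tried out in plain PHP today with a small userland wrapper. The following is only a sketch: the ArrayPipeline class and its method names are hypothetical, invented for this illustration, and have nothing to do with how the scalar objects extension is implemented.

```php
<?php
// Hypothetical userland wrapper used only to illustrate chaining.
// It is NOT part of PHP or of the scalar objects extension.
final class ArrayPipeline
{
    private array $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // Keep only the elements for which $predicate returns true.
    public function filter(callable $predicate): self
    {
        // array_filter() preserves keys, so reindex with array_values().
        return new self(array_values(array_filter($this->items, $predicate)));
    }

    // Apply $transform to every element.
    public function map(callable $transform): self
    {
        return new self(array_map($transform, $this->items));
    }

    public function toArray(): array
    {
        return $this->items;
    }
}

$input = [5, 11, 20];

// Chained version: reads in the order in which the steps run.
$output = (new ArrayPipeline($input))
    ->filter(function ($value) { return $value > 10; })
    ->map(function ($value) { return $value * 42; })
    ->toArray();

// Procedural version: the outermost call is the last step to run.
$procedural = array_values(array_map(
    function ($value) { return $value * 42; },
    array_filter($input, function ($value) { return $value > 10; })
));
```

Both variants produce the same result; only the reading order differs. The wrapper, of course, requires an explicit conversion at both ends, which is exactly the overhead that methods on the primitive type itself would avoid.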

The example is, of course, a bit contrived, since you can always extract the closures into variables or rely on autocompletion and syntax highlighting in the IDE. Another example (this time from real code) shows roughly the same situation:

 substr(strtr(rtrim($className, '_'), '\\', '_'), 15); 

In this case, the trailing run of arguments '_'), '\\', '_'), 15 is completely confusing; it is hard to match each argument to the function call it belongs to. Compare with this version:

 $className->trimRight('_')->replace('\\', '_')->slice(15); 

Here, operations and their arguments are tightly grouped and the order of calling the methods corresponds to the order in which they are executed.

Another bonus of this syntax is that the "needle / haystack" problem disappears. Aliases could only work around it by introducing a naming convention, whereas in an OO API the problem simply does not exist:

    $string->contains($otherString)
    $array->contains($someValue)
    $string->indexOf($otherString)
    $array->indexOf($someValue)

There can be no confusion as to which part performs which role.

Polymorphism


PHP currently provides the Countable interface, which classes can implement in order to customize the result of count($obj). Why is this necessary? Because we do not have polymorphism for functions. We do, however, have polymorphism for methods.

If arrays implemented $array->count() as a (pseudo-)method, the calling code would no longer need to care whether $array is actually an array. It could be any object that has a count() method. In effect we get the same behavior as with Countable, but without the need for any special machinery.
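To make the current workaround concrete, here is a minimal sketch of how Countable bridges the gap today. Countable and count() are built into PHP; the Playlist class and the isEmptyCollection() helper are hypothetical names made up for this example.

```php
<?php
// Hypothetical example class. Only Countable and count() come from PHP itself.
final class Playlist implements Countable
{
    private array $tracks;

    public function __construct(array $tracks)
    {
        $this->tracks = $tracks;
    }

    // Invoked by count($playlist); also callable directly as a method.
    public function count(): int
    {
        return count($this->tracks);
    }
}

// Accepts either an array or a Countable object, because the count()
// function special-cases objects that implement Countable.
function isEmptyCollection($collection): bool
{
    return count($collection) === 0;
}

$asArray  = ['a', 'b', 'c'];
$asObject = new Playlist(['a', 'b', 'c']);

$arrayCount  = count($asArray);     // plain function on a plain array
$objectCount = count($asObject);    // function dispatching via Countable
$methodCount = $asObject->count();  // ordinary method polymorphism
```

With a count() pseudo-method on arrays, the last form would work for both values, and the interface would only be needed for type declarations.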

In fact, there is a much more general idea hidden here. For example, you could write a UnicodeString class that implements all the methods of the string type, and then use regular strings and UnicodeString objects interchangeably. Well, at least in theory. Obviously this only works as long as usage is limited to string methods, and it breaks down as soon as the concatenation operator is used, because full operator overloading is currently supported only for internal classes.

Still, I hope it is clear that this is a rather powerful concept. The same applies to arrays: you could, for example, use the SplFixedArray class, which would behave the same way as an array by implementing the same interface.

Now that we have considered some of the advantages of this approach, let's also consider some of the problems that will have to be faced:

Loose typing


A quote from Anthony's blog:
Scalars are not objects, but, more importantly, they are not any type. PHP relies on a type system that truly believes that strings are integers. Much of the flexibility of the system is based on the fact that any scalar type can be converted to any other with ease. [...]

More importantly, however, because of this loose type system, you cannot know 100% what type a variable will be. You can say how you want to treat it, but you cannot say what it is under the hood. Even with casting, the situation is not ideal, as there are cases where the type can still change.


To illustrate this problem, consider the following example:

    $num = 123456789;
    $sumOfDigits = array_sum(str_split($num));

Here $num is treated as a string of digits, which str_split breaks into individual characters and array_sum then adds up. Now try doing the same thing using methods:

    $num = 123456789;
    $sumOfDigits = $num->chunk()->sum();

The chunk() method, which belongs to string, is called on a number. What should happen? Anthony offers one solution:

This means that all scalar operations would need to work on all scalar types. Which leads to an object model where scalars have all the mathematical methods as well as all the string methods. What a nightmare.

As the quote itself says, such a solution is unacceptable. However, I think we can avoid such cases entirely simply by throwing an error (an exception!). To explain why this idea is viable, let's take a look at which types in PHP could meaningfully have methods.

Primitive types in PHP


In addition to objects, PHP has the following types of variables:

    null
    bool
    int
    float
    string
    array
    resource

Now let's consider which entries in this list could actually have meaningful methods. We can immediately rule out resource (a legacy type) and look at the rest. null and bool obviously do not need methods, unless you want to invent abominations like $bool->invert().

The vast majority of mathematical functions do not look too good as methods. Consider:

    log($n)      $n->log()
    sqrt($n)     $n->sqrt()
    acosh($n)    $n->acosh()

I hope you agree that the mathematical functions read much more nicely in the current notation. There are, of course, a few methods that could reasonably be assigned to the number types; for example, $num->format(10) reads quite nicely. But beyond that? There is no real need for an OO number API, because only a few functions would qualify. In addition, the current mathematical API is not really problematic in terms of naming: the names of mathematical operations are fairly well standardized.

That leaves only strings and arrays. We have already seen that a method API works well for these two types. But what does all this have to do with the weak typing problem? The important point is the following:

It is very common to work with integers that are really strings, for example because they arrived via HTTP or from a database. The reverse is not true: you only very rarely need to use an integer as if it were a string. The following code would confuse me:

 strpos(54321, 32, 1); 

Treating numbers as strings is a pretty odd thing to do. I think it is perfectly reasonable to require a cast in this case. Using the original sum-of-digits example:

    $num = 123456789;
    $sumOfDigits = ((string) $num)->chunk()->sum();

Here we have made it explicit that, yes, we really do want to treat the number as a string. To me it is acceptable to require this in order to use such a hack.
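The rule "throw on the wrong type, cast to opt in" can be imitated in today's PHP with an ordinary function. This is only a sketch under assumptions: chunk() here is a hypothetical free function standing in for the string-only method, and the choice of InvalidArgumentException is mine, not something the extension prescribes.

```php
<?php
// Hypothetical stand-in for a string-only chunk() method.
// It refuses anything that is not already a string instead of juggling types.
function chunk($value): array
{
    if (!is_string($value)) {
        throw new InvalidArgumentException(
            'chunk() is only defined for strings, got ' . gettype($value)
        );
    }
    return str_split($value);
}

$num = 123456789;

// Calling the string operation on an integer is rejected outright.
$error = null;
try {
    chunk($num);
} catch (InvalidArgumentException $e) {
    $error = $e->getMessage();
}

// An explicit cast states the intent, and the call goes through.
$sumOfDigits = array_sum(chunk((string) $num));
```

The failure is loud and local, and the cast documents in the calling code that the number is deliberately being treated as text.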

With arrays the situation is even simpler: it makes no sense to apply array operations to something that is not an array.

Another factor that eases this issue is scalar type hinting (which I assume would be present in any PHP version that has this feature). If you use a string type hint, you will always receive a string as input (even if the value passed to the function was not a string, depending on the details of the type hint implementation).

This does not mean that there is no problem at all. Because of badly designed functions, an unexpected type can sometimes sneak into the code. For example, for substr($str, strlen($str)) someone very "wise" decided to return bool(false) instead of string(0) "". However, that issue concerns only the procedural substr function. It does not carry over to a method API, so you would not run into it there.

Object passing semantics


In addition to the loose typing problem, there is another semantic question about pseudo-methods on primitive types: objects and non-objects in PHP have different passing semantics. If we start allowing method calls on strings and arrays, they will start to look like objects, and some people may start to expect them to have object semantics. This issue affects both strings and arrays:

    function change($arg) {
        echo $arg->length(); // $arg looks like an object
        $arg[0] = 'x';       // but does not have object passing semantics
    }

    $str = 'foo';
    change($str); // $str stays the same

    $array = ['f', 'o', 'o'];
    change($array); // $array stays the same

We could, of course, change the passing semantics. In my eyes, passing large structures such as arrays by value was a rather bad idea in the first place, and I would prefer them to be passed like objects. However, such a change would be a rather large backward compatibility break; at least I think so, as I have not run tests to determine its actual impact. For strings, on the other hand, passing them like objects would be disastrous unless we also made strings fully immutable. Personally, I find the current approach, which lets you change a single character in a string at any time, very convenient (try doing the same in Python).

I do not know if there is a good way to solve this problem, except for the explicit mention in our documentation that strings and arrays are only pseudo-objects with methods, and not real objects.
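The difference in passing semantics is easy to observe in current PHP by running the same function against a string, an array, and a real object. The sketch below uses the built-in ArrayObject class as the "real object"; the function name changeFirst() is just an illustrative choice.

```php
<?php
// Overwrites the first element/character of whatever it receives.
function changeFirst($arg): void
{
    $arg[0] = 'x';
}

// Strings are passed by value: the caller's copy is untouched.
$str = 'foo';
changeFirst($str);

// Arrays are passed by value as well.
$array = ['f', 'o', 'o'];
changeFirst($array);

// Objects are passed by handle: the change is visible to the caller.
$object = new ArrayObject(['f', 'o', 'o']);
changeFirst($object);
$objectFirst = $object[0];
```

After these calls $str and $array are unchanged, while the first element of $object has been overwritten. A reader who sees $array->length() might reasonably expect the third behavior, which is the confusion the documentation would have to address.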

The problem also extends to other object-related features. For example, you might wonder whether something like $string instanceof string should be true. I do not know how far all this should go. It may be better to stick strictly to methods and state explicitly that these are not real objects. On the other hand, it may also be good to support more of the features of the OO system. This needs more thought.

Current state


In conclusion, this approach has a number of problems, but I do not consider them particularly significant. At the same time, it offers a great opportunity to introduce clean APIs for our basic types and to improve the readability (and writability) of code that operates on them.

What is the state of the idea? From what I can tell, the internals developers are not particularly opposed to this approach and would rather have it than aliases everywhere. The main thing missing to move this forward is a well-developed API specification.

I created the scalar objects project, which is implemented as a PHP extension. It allows you to register a class that will handle method calls for the corresponding primitive type. Example:

    class StringHandler {
        public function length() {
            return strlen($this);
        }

        public function contains($str) {
            return false !== strpos($this, $str);
        }
    }

    register_primitive_type_handler('string', 'StringHandler');

    $str = "foo bar baz";
    var_dump($str->length());          // int(11)
    var_dump($str->contains("bar"));   // bool(true)
    var_dump($str->contains("hello")); // bool(false)

Work has begun on the string handler, which includes the API specification, but I have not finished the project. I hope I will someday find the motivation to continue developing this idea. There are already a number of projects working on similar APIs.

This is one of the things I would like to see in a new PHP.

Source: https://habr.com/ru/post/240561/

