Translator's note: I, too, thought that the era of articles like "Which is faster: single or double quotes?" ended 10 years ago. But a similar article ("What performance tricks actually work") recently collected a fairly high rating on Reddit and even made it into the PHP digest on Habré. So I decided to translate an article that critically analyzes these and similar "tests."
There are many articles (and even entire sites) devoted to running benchmarks that compare the performance of various syntactic constructions and declare, on that basis, that one is faster than the other.
Such tests are incorrect for many reasons, starting with how the question is posed and ending with implementation errors. But most importantly, such tests are meaningless and harmful at the same time.
This should be enough to close the question. But even if we accept the rules of the game and pretend that these "tests" make at least some sense, it turns out that their results amount to nothing more than a demonstration of the tester's lack of knowledge and experience.
Take the notorious quotes, "single vs. double." Of course, neither kind is faster. First, there is such a thing as the opcode cache, which stores the result of parsing a PHP script. The code is saved in opcode format, where identical string literals are stored as exactly the same entities, regardless of which quotes were used in the source. That means there is not even a theoretical difference in performance.
But even if we do not use the opcode cache (although we should, if our goal is a real performance gain), it turns out that the difference in the parsing code is so small (a few conditional jumps comparing single-byte characters, literally a handful of processor instructions) that it is absolutely undetectable. This means that any results obtained will demonstrate only problems in the test environment. There is a very detailed article, Disproving the Single Quotes Performance Myth, by PHP core developer Nikita Popov, which examines this issue in depth. Nevertheless, almost every month an energetic tester appears to reveal to the public an imaginary "difference" in performance.
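As a quick illustration (a sketch of mine, not taken from Nikita's article): both kinds of literals yield exactly the same string value, and an opcode dumper such as the third-party VLD extension or phpdbg shows the same compiled constant for both.

```php
<?php
// Both literals compile down to the same interned string constant,
// so there is nothing for a benchmark to detect.
$single = 'hello world';
$double = "hello world";

var_dump($single === $double); // bool(true) — identical values

// To check at the opcode level, dump the compiled code with a tool such as
// the VLD extension or phpdbg: both literals appear as the same constant.
```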
Some tests are meaningless simply because of how the question is posed. For example, the test entitled "Is a throw really a super-expensive operation?" is essentially the question "Is it really more expensive to handle an error than not to handle it?" Are you serious? Of course, adding any fundamental functionality to code will make it "slower". But that does not mean new functionality should never be added, under such a ridiculous pretext. By that logic, the fastest program is one that does nothing at all! A program should first of all be useful and work without errors. Only after that has been achieved, and only if it is slow, should it be optimized. But if the question itself makes no sense, why even measure performance? It is funny that the tester failed to correctly implement even this meaningless test, as will be shown in the next section.
Or another example, a test entitled "Is $row[id] really slower than $row['id']?" This is essentially the question "Which code is faster: the one that works with errors, or the one without?" (writing id without quotes here is an error of the E_NOTICE level, and such usage is slated to be deprecated in future versions of PHP). WTF? What is the point of measuring the performance of code that contains errors? An error should be corrected simply because it is an error, not because it makes the code run slower. It is funny that the tester failed to correctly implement even this meaningless test, as will be shown in the next section.
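To make the comparison concrete, here is a small sketch of mine (not the tester's code) of what the unquoted key actually does: PHP only "makes it work" by complaining and then falling back to the string 'id'.

```php
<?php
// Sketch of the code being "benchmarked".
$row = ['id' => 42];

echo $row['id']; // 42 — correct code
echo $row[id];   // also 42 in old PHP versions, but only after PHP reports an
                 // undefined constant "id" (historically E_NOTICE, deprecated
                 // since 7.2, a fatal Error in PHP 8) and falls back to 'id'
```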
And again: even an obviously useless test must at least be consistent, that is, it must measure comparable things. But, as a rule, such tests are thrown together carelessly, and as a result the numbers they produce are meaningless and irrelevant to the question being asked.
For example, our hapless tester undertook to measure the "excessive use of the try..catch operator". But in the actual test he measured not only try..catch, but also throw, throwing an exception on every iteration of the loop. Such a test is simply incorrect, since in real life errors do not occur on every single execution of the script.
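A rough sketch (my own illustration, not the original test code) of the difference: merely wrapping code in try..catch costs next to nothing, whereas actually throwing on every iteration is what the flawed test ends up measuring.

```php
<?php
// Compare "uses try..catch" with "throws an exception on every iteration".

function withTryCatchOnly(int $n): void
{
    for ($i = 0; $i < $n; $i++) {
        try {
            $x = $i * 2;               // normal work, nothing is thrown
        } catch (Throwable $e) {
            // never reached
        }
    }
}

function withThrowEveryIteration(int $n): void
{
    for ($i = 0; $i < $n; $i++) {
        try {
            throw new RuntimeException('error'); // error on EVERY iteration
        } catch (RuntimeException $e) {
            // handled
        }
    }
}

$n = 100000;

$t = hrtime(true);
withTryCatchOnly($n);
printf("try..catch only:  %.2f ms\n", (hrtime(true) - $t) / 1e6);

$t = hrtime(true);
withThrowEveryIteration($n);
printf("throw every time: %.2f ms\n", (hrtime(true) - $t) / 1e6);
```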
Of course, tests should not be run on beta versions of PHP, and they should not compare mainstream solutions with experimental ones. And if the tester undertakes to compare the "parsing speed of JSON and XML", he should not use an experimental function in the tests.
Some tests simply demonstrate a complete misunderstanding of the task the tester has set for himself. A similar example from a recently published article has already been mentioned above: the author tried to find out whether code that causes an error ("Use of undefined constant") is slower than code without errors (which uses a syntactically correct string literal), but failed even at this obviously meaningless test, comparing the performance of a number written with quotes against a number written without them. Numbers, unlike strings, can of course be written in PHP without quotes, so the author ended up testing completely different functionality and got meaningless results.
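Here is a minimal sketch (my reconstruction, not the original test code) of that mistake: with a numeric key, the quoted and unquoted forms are the very same, perfectly valid code, so there was nothing left to compare.

```php
<?php
$row = [1 => 'first'];

var_dump($row[1]);   // string(5) "first"
var_dump($row['1']); // string(5) "first" — the string '1' is cast to the int key 1

// Both lines are error-free and behave identically, so the "benchmark"
// compared the same thing with itself.
```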
There are other issues that need to be taken into account, such as the test environment. There are PHP extensions, such as Xdebug, that can have a huge impact on test results. Or the already mentioned opcode cache, which must be enabled in performance tests for the results to have at least some meaning.
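A tiny sanity check of the environment might look like this (a sketch, not a complete harness):

```php
<?php
// Debuggers such as Xdebug distort timings badly, and OPcache should be
// enabled so that parsing cost does not pollute the results.
if (extension_loaded('xdebug')) {
    fwrite(STDERR, "Xdebug is loaded: benchmark results will be distorted\n");
}

if (!function_exists('opcache_get_status') || opcache_get_status(false) === false) {
    fwrite(STDERR, "OPcache is not active: enable it before measuring\n");
}
```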
The way testing is done also matters. Since the PHP script dies entirely after each request, it makes sense to test the performance of the entire life cycle, starting with opening a connection to the web server and ending with closing it. There are utilities, such as Apache Benchmark (ab) or Siege, that allow you to do this.
All this is well and good, but what conclusion should the reader draw from this article? That performance tests are useless by definition? Of course not. What really matters is the reason for running them. Testing out of the blue is a waste of time. There should always be a specific reason for running performance tests, and that reason is called profiling. When your application starts to work slowly, you profile it, that is, measure the speed of different sections of the code to find the slowest one. Once such a section is found, you must determine the cause. Most often it is either processing a much larger volume of data than necessary, or a request to an external data source. In the first case the optimization is to reduce the amount of data processed; in the second, to cache the results of the request.
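In its crudest form, profiling can be as simple as timing each section separately (a sketch with made-up stand-in steps; real projects would rather use Xdebug's profiler, Blackfire, or similar tools):

```php
<?php
// Stand-ins for the real application steps (hypothetical):
$loadData    = fn () => range(1, 100000);
$processData = fn (array $d) => array_sum($d);

$timings = [];

$t = hrtime(true);
$data = $loadData();
$timings['load'] = (hrtime(true) - $t) / 1e6;      // milliseconds

$t = hrtime(true);
$result = $processData($data);
$timings['process'] = (hrtime(true) - $t) / 1e6;

arsort($timings);
print_r($timings); // the slowest section comes first — that is what to optimize
```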
For example, in terms of performance there is no difference between using an explicitly written loop and a built-in PHP array function (which is essentially just syntactic sugar). What really matters is the amount of data we hand over for processing. If it is unreasonably large, we must cut it down, or move the processing somewhere else (to the database). That will give us a huge performance boost, and a real one, while the difference between ways of writing the loop is unlikely to be noticeable at all.
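As a hedged illustration (the table, column, and connection details are invented), the change that actually matters looks like this:

```php
<?php
// Hypothetical connection — DSN and credentials are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

// Slow: drags every row into PHP just to compute a total.
$total = 0;
foreach ($pdo->query('SELECT amount FROM orders') as $row) {
    $total += $row['amount'];
}

// Fast: let the database aggregate and transfer a single value.
$total = $pdo->query('SELECT SUM(amount) FROM orders')->fetchColumn();
```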
Only after performing such mandatory improvements, or if we cannot cut down the amount of data processed, can we move on to performance tests. But again, such tests should not be run out of the blue. Before we start comparing the performance of an explicit loop and a built-in function, we must be sure that the loop itself is the cause of the problem, and not its contents (spoiler: it is always the contents).
A recent example from my practice: there was a query in the code, built with the Doctrine Query Builder, that had to accept several thousand parameters. The query itself is quite fast, but Doctrine takes a noticeably long time to digest several thousand parameters. As a result, the query was rewritten in plain SQL, and the parameters were passed to the execute() method of PDO, which copes with that many parameters almost instantly.
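A rough sketch of that rewrite (identifiers and connection details are illustrative, not the actual project code):

```php
<?php
// Hypothetical connection and a list of several thousand ids collected elsewhere.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$ids = range(1, 5000);

// Build one placeholder per parameter and hand them all straight to PDO.
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT * FROM items WHERE id IN ($placeholders)");
$stmt->execute($ids); // PDO binds thousands of parameters almost instantly

$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```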
Does this mean that I will no longer use the Doctrine Query Builder? Of course not. It is perfect for 99% of tasks, and I will continue to use it for all queries. Only in exceptional cases is it worth resorting to a less convenient but faster method.
The query and its parameters in that example were built in a loop. If I had gotten the foolish idea of optimizing the way the loop itself is written, I would simply have wasted time without any positive result. And this is the essence of all performance optimization: optimize only the code that is slow in your particular case, not code that was considered slow a long time ago in a galaxy far, far away, or code that someone decided to call slow on the basis of meaningless tests.
Source: https://habr.com/ru/post/419743/