
An atomic reactor in every site

Everyone has heard that PHP was created to die. Well, this is not entirely true. If you want, PHP does not have to die: it can work asynchronously and even supports honest multithreading. Just not all at once; this time we will talk about how to make PHP live for a long time, and an atomic reactor will help us with that!



The nuclear reactor is the ReactPHP project, whose description reads "Nuclear Reactor Written in PHP". This article introduced me to it (the picture above is taken from there). I re-read it several times over the course of a year, but could not get around to trying it in practice, although the performance gains of more than an order of magnitude for long-running code were very appealing.

The initial state


The experimental system is CleverStyle CMS (development version) with the APCu caching engine and all available components installed; the tests request a page of the Static pages module.
The test machine is a work laptop with a Core i7-4900MQ (4 cores, 8 threads) running Ubuntu 15.04 x64. The disk subsystem is two SATA3 SSDs in software RAID0 with btrfs (not the best option for a database, and it turned out to be quite a bottleneck in the tests, but it is what it is). Before each test sudo sync is run. Every request makes 2-4 queries to the database (creating the visitor session is not cached at the database level). Nginx runs 16 workers.
The conditions are not laboratory-grade, but you have to work with something :)
Performance will be measured with the simple Apache Benchmark.
First, PHP-FPM (PHP 5.5, a static pool of 16 workers):
nazar-pc@nazar-pc ~> ab -n5000 -c128 cscms.org:8080/uk
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking cscms.org (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software: nginx/1.6.2
Server Hostname: cscms.org
Server Port: 8080

Document Path: /uk
Document Length: 99320 bytes

Concurrency Level: 128
Time taken for tests: 22.280 seconds
Complete requests: 5000
Failed requests: 4239
(Connect: 0, Receive: 0, Length: 4239, Exceptions: 0)
Total transferred: 498328949 bytes
HTML transferred: 496603949 bytes
Requests per second: 224.41 [#/sec] (mean)
Time per request: 570.373 [ms] (mean)
Time per request: 4.456 [ms] (mean, across all concurrent requests)
Transfer rate: 21842.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       3
Processing:    26  563 101.6    541     880
Waiting:       24  559 101.3    537     872
Total:         30  564 101.4    541     881

Percentage of the requests served within a certain time (ms)
50% 541
66% 559
75% 572
80% 584
90% 759
95% 795
98% 817
99% 829
100% 881 (longest request)

The concurrency level is 128 because at 256 PHP-FPM simply falls over.

Now HHVM. To begin with we warm HHVM up with 50,000 requests (why), then run the test:
nazar-pc@nazar-pc ~> ab -n5000 -c256 cscms.org:8000/uk
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking cscms.org (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software: nginx/1.6.2
Server Hostname: cscms.org
Server Port: 8000

Document Path: /uk
Document Length: 99309 bytes

Concurrency Level: 256
Time taken for tests: 20.418 seconds
Complete requests: 5000
Failed requests: 962
(Connect: 0, Receive: 0, Length: 962, Exceptions: 0)
Total transferred: 498398875 bytes
HTML transferred: 496543875 bytes
Requests per second: 244.88 [#/sec] (mean)
Time per request: 1045.408 [ms] (mean)
Time per request: 4.084 [ms] (mean, across all concurrent requests)
Transfer rate: 23837.54 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.5      0       8
Processing:   505 1019 102.6   1040    1582
Waiting:      505 1017 102.9   1039    1579
Total:        513 1019 102.5   1040    1586

Percentage of the requests served within a certain time (ms)
50% 1040
66% 1068
75% 1080
80% 1087
90% 1108
95% 1126
98% 1179
99% 1397
100% 1586 (longest request)

We got 245 requests per second, and this is the figure we will be working against.

The first steps


I wanted the code not to depend on whether it is launched under an HTTP server written in PHP or in the more familiar mode.
For this, headers_list()/header_remove(), http_response_code() and the superglobals $_GET, $_POST, $_REQUEST, $_COOKIE, $_SERVER were reworked by hand.
System objects were destroyed after each request and created anew.
On the whole it worked, but there were nuances.
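For orientation, here is a minimal sketch of such an entry point against the react/http API of that era (~0.4); System_objects with its render_page()/destroy() methods is a hypothetical stand-in for the CMS internals, not the actual code:

<?php
// Minimal long-running HTTP server (react/http ~0.4 API).
// System_objects, render_page() and destroy() are hypothetical stand-ins.
require 'vendor/autoload.php';

$loop   = React\EventLoop\Factory::create();
$socket = new React\Socket\Server($loop);
$http   = new React\Http\Server($socket);

$http->on('request', function ($request, $response) {
    // Recreate the request-dependent system objects, render the page the
    // usual way, then tear everything down again.
    $system = new System_objects($request);
    $body   = $system->render_page();
    $response->writeHead(200, ['Content-Type' => 'text/html; charset=utf-8']);
    $response->end($body);
    $system->destroy();
});

$socket->listen(9990);
$loop->run();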

Optimization, asynchronous support


First, the system objects were divided into two groups: those that depend on the user and the specific request, and those that are completely independent of it.
Independent objects are no longer destroyed after each request, which gave a significant speed-up.
The object that receives the request from ReactPHP and forms the response gained an additional __request_id field. When a request-dependent system object is obtained, this __request_id is looked up via debug_backtrace(), which makes it possible to keep such objects separate for each individual request, even in asynchronous mode.
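A sketch of how such a lookup might work; only the __request_id field comes from the article, the helper itself is an assumption:

<?php
// Walk the call stack up to the object carrying __request_id (the ReactPHP
// request handler), so a request-dependent object can tell which request
// it belongs to even with several requests in flight.
function current_request_id () {
    foreach (debug_backtrace(DEBUG_BACKTRACE_PROVIDE_OBJECT) as $frame) {
        if (isset($frame['object']->__request_id)) {
            return $frame['object']->__request_id;
        }
    }
    return 0; // CLI or other non-HTTP context
}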
The system functions that work with global state were also split out; the HTTP servers load modified versions of them that take __request_id into account. _header() was added in place of header() (so that headers work under PHP-CLI), _http_response_code() in place of http_response_code(), and the existing _getcookie()/_setcookie() were modified; the latter now forms the cookie headers manually under the hood and sends them through _header().
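A rough sketch of such a wrapper (the storage layout is my assumption, not the actual CMS code):

<?php
// Under PHP-CLI header() is useless, so headers are collected per request
// and later written to the ReactPHP response object.
$__headers = [];

function _header ($string, $replace = true) {
    global $__headers;
    $id   = current_request_id();
    $name = strtolower(trim(strstr($string, ':', true) ?: $string));
    if ($replace) {
        $__headers[$id][$name] = [$string];
    } else {
        $__headers[$id][$name][] = $string;
    }
}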
The superglobals are replaced with array-like objects, and when accessing the elements of such a pseudo-array you get the data of the specific request. Compatibility with regular code is high; the main things are not to overwrite the superglobals and to remember that it is not actually an array (so it cannot be fed to array_merge(), for example).
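The idea in miniature (a real implementation would also cover iteration, counting and defaults):

<?php
// Array-like stand-in for a superglobal: every access is routed to the
// data of the current request, keeping concurrent requests isolated.
class Superglobal implements ArrayAccess {
    protected $data = []; // request data, keyed by __request_id

    function offsetExists ($offset) {
        return isset($this->data[current_request_id()][$offset]);
    }
    function offsetGet ($offset) {
        return $this->data[current_request_id()][$offset];
    }
    function offsetSet ($offset, $value) {
        $this->data[current_request_id()][$offset] = $value;
    }
    function offsetUnset ($offset) {
        unset($this->data[current_request_id()][$offset]);
    }
}

$_GET = new Superglobal;
// $_GET['page'] works as before, but array_merge($_GET, $defaults) does not:
// it is an object, which is exactly the caveat mentioned above.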
As another compromise, \ExitException was added to the system as a replacement for exit()/die() calls (third-party libraries are modified where needed, except for the situations where you really do need to terminate the whole script). It allows the output to be intercepted at the top level while aborting further execution of the request.
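In code the pattern looks roughly like this (the handler placement and function names are assumptions):

<?php
// Throwing unwinds the stack only up to the request handler instead of
// killing the whole long-running server process the way exit()/die() would.
class ExitException extends \Exception {}

function deny_access () {
    _header('Location: /login');
    throw new \ExitException; // was: exit;
}

// At the very top, inside the request handler:
try {
    execute_request(); // hypothetical dispatcher running the usual CMS flow
} catch (\ExitException $e) {
    // the output produced so far is sent, and the server keeps running
}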

We test the result on a pool of 16 running HTTP servers (HHVM interpreter), with Nginx balancing requests between them (after warming the pool up with 50,000 requests):
nazar-pc@nazar-pc ~> ab -n5000 -c256 cscms.org:9990/uk
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking cscms.org (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software: nginx/1.6.2
Server Hostname: cscms.org
Server Port: 9990

Document Path: /uk
Document Length: 99323 bytes

Concurrency Level: 256
Time taken for tests: 16.092 seconds
Complete requests: 5000
Failed requests: 1646
(Connect: 0, Receive: 0, Length: 1646, Exceptions: 0)
Total transferred: 498418546 bytes
HTML transferred: 496643546 bytes
Requests per second: 310.71 [#/sec] (mean)
Time per request: 823.928 [ms] (mean)
Time per request: 3.218 [ms] (mean, across all concurrent requests)
Transfer rate: 30246.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.9      0       6
Processing:   100  804 308.3    750    2287
Waiting:       79  804 308.2    750    2285
Total:        106  804 308.1    750    2287

Percentage of the requests served within a certain time (ms)
50% 750
66% 841
75% 942
80% 990
90% 1180
95% 1381
98% 1720
99% 1935
100% 2287 (longest request)

Not bad already: 310 requests per second is 1.26 times more than HHVM in normal mode.

Optimizing further


Since the code was not originally written to be asynchronous (one request never interleaves with another), we can add a regular, non-asynchronous mode and assume that requests are executed strictly one after another.
In this case we can get by with regular arrays in the superglobals, there is no need to call debug_backtrace() when creating system objects, and some system objects can be partially reinitialized instead of fully re-created, saving even more work. A sketch of this mode follows below.
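Roughly, the per-request bookkeeping degenerates into plain assignments (the helper names are illustrative; getQuery()/getPath() are methods of the react/http Request of that era):

<?php
// Strictly sequential requests: plain arrays are enough, no
// debug_backtrace() magic; state is simply reset before each request.
function handle_request_sync ($request) {
    $_GET  = $request->getQuery(); // react/http ~0.4 Request
    $_POST = [];
    $_SERVER['REQUEST_URI'] = $request->getPath();

    reinit_system_objects(); // hypothetical: partial reinit, not re-creation
    return render_page();    // hypothetical: the usual CMS pipeline
}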
Here is the result on the same pool of 16 running HTTP servers (HHVM), with Nginx balancing requests (after warming the pool up with 50,000 requests):
nazar-pc@nazar-pc ~> ab -n5000 -c256 cscms.org:9990/uk
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking cscms.org (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software: nginx/1.6.2
Server Hostname: cscms.org
Server Port: 9990

Document Path: /uk
Document Length: 8497 bytes

Concurrency Level: 256
Time taken for tests: 5.716 seconds
Complete requests: 5000
Failed requests: 4983
(Connect: 0, Receive: 0, Length: 4983, Exceptions: 0)
Total transferred: 44046822 bytes
HTML transferred: 42381822 bytes
Requests per second: 874.69 [#/sec] (mean)
Time per request: 292.676 [ms] (mean)
Time per request: 1.143 [ms] (mean, across all concurrent requests)
Transfer rate: 7524.85 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.9      0       7
Processing:     6  284 215.9    241     976
Waiting:        6  284 215.9    241     976
Total:          6  284 215.8    241     976

Percentage of the requests served within a certain time (ms)
50% 241
66% 337
75% 409
80% 442
90% 623
95% 728
98% 829
99% 869
100% 976 (longest request)

875 requests per second is 3.57 times more than the original HHVM figure, which is good (sometimes it is a couple of hundred requests per second more, sometimes less; the weather on the desktop varies, but these were the numbers at the time of writing).

There are also prospects for even greater gains (for example, keep-alive support and other improvements are expected in ReactPHP), but a lot depends on the project where this is used.

Limitations


Since we maintain maximum compatibility with existing code, asynchronous mode with different time zones requires using them explicitly; otherwise date() may return an unexpected result.
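For example, instead of relying on date() and the process-wide default zone, request code can carry the zone explicitly:

<?php
// date() uses the process-wide default time zone, which is shared by all
// requests inside one worker; DateTime with an explicit zone is unambiguous.
$user_tz = new DateTimeZone('Europe/Kiev'); // e.g. taken from the user profile
$now     = new DateTime('now', $user_tz);
echo $now->format('Y-m-d H:i:s');
// as opposed to date('Y-m-d H:i:s'), whose zone another request may change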
Also, file uploads are not supported yet, but 2 pull requests for multipart support already exist; they may soon be merged into react/http, and then uploads will work here as well.

Pitfalls


The main pitfall of this mode is memory leaks: after 1,000 requests memory consumption stood at one level, and after 5,000 requests it was a couple of megabytes higher. The simplest way to catch such leaks is to watch memory consumption over long runs, for example as sketched below.
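One possible watcher, continuing the entry-point sketch from earlier ($http and the handler wiring are assumptions):

<?php
// Log memory usage every 1000 requests; if the figure keeps creeping up
// between equal-sized batches, something is leaking.
$served = 0;
$http->on('request', function ($request, $response) use (&$served) {
    // ... normal request handling ...
    if (++$served % 1000 === 0) {
        error_log(sprintf(
            'after %d requests: %.2f MiB in use',
            $served,
            memory_get_usage(true) / 1048576
        ));
    }
});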

The second pitfall is the database connection: it can drop, so be ready to re-establish it when it does. With the conventional approach this is simply never an issue, so here it can cause problems right away (a reconnect sketch follows after this list).
Third, catch errors and do not use exit()/die() unless you really mean to stop the whole process.
Fourth, you need to somehow separate the global state of different requests if you are going to run asynchronous code; if there is no asynchronous code, you simply need to fake the global state per request. The main thing is to avoid request-dependent constants, static variables in functions and similar things, unless you want to suddenly turn a guest into an admin :)
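Returning to the second point, a reconnect guard might look like this (mysqli is used purely for illustration):

<?php
// A long-running process will eventually outlive its MySQL connection;
// ping before use and reconnect when the ping fails.
function db (&$connection) {
    if (!$connection instanceof mysqli || !@$connection->ping()) {
        $connection = new mysqli('localhost', 'user', 'password', 'cms');
    }
    return $connection;
}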

Conclusion


With this approach, a significant performance gain can be achieved either without changes to existing code or with minimal ones (automated search and replace), and with Request/Response frameworks it is even easier to do.
How big the speed-up is depends on the interpreter and on what the code does: with heavy computations HHVM compiles the hot parts to machine code, and with requests to external APIs you can run the less efficient asynchronous mode but load the external data asynchronously (if a request to the API takes hundreds of milliseconds, this gives a significant boost to the overall request processing speed).
If you want to try it, all of this and much more is available in CleverStyle CMS out of the box and just works.

Sources


There is not much source code; if desired, it can be adapted and used in many other systems.
The class in Request.php accepts a request from ReactPHP and sends the response; functions.php contains the functions for working with the global context (including several CleverStyle CMS-specific ones); Superglobals_wrapper.php contains the class used for the array-like superglobal objects; Singleton.php is a modified version of the trait, used instead of the system one to create system objects (it also determines which objects are shared across requests and which are not).

Source: https://habr.com/ru/post/252013/

