Badoo switched to PHP7 and saved $1M

We did it! Several hundred of our application servers have been migrated to PHP7 and are doing great. As far as we know, this is the second migration to PHP7 of a project of this scale (after Etsy). Along the way we found some very unpleasant bugs in the PHP7 bytecode caching system, but they have been fixed. And now, hurray, good news for the entire PHP community: PHP7 is really ready for production, it is stable, it consumes much less memory, and it gives a very good performance boost. Below we describe in detail how we switched to PHP7, what difficulties we encountered, how we dealt with them, and what results we got. But let's start with a short introduction.

The belief that the database is the bottleneck in web projects is one of the most common misconceptions. A well-designed system is balanced: as the input load grows, all parts of the system take the hit, and once thresholds are exceeded everything starts to slow down, the CPU and the network included, not just the disks on the databases. In this reality, the CPU capacity of the application cluster is arguably the most important characteristic. In many projects this cluster consists of hundreds or even thousands of servers, so tuning the CPU load on the application cluster more than pays for itself (a million dollars in our case).
PHP web applications consume as much CPU as applications in any high-level dynamic language: a lot. But for years PHP developers had a particular sore point (and a reason for the heaviest “trolling” from other communities): the lack of a “real” JIT in PHP, or at least a translator into compiled languages like C or C++. The community's inability to provide such solutions within the main project gave rise to an unpleasant trend: major players began to build their own. This is how HHVM appeared at Facebook and KPHP at VKontakte, and there were probably other homegrown projects.

Fortunately, in 2015 the first step was taken to make PHP “more mature”: PHP7 was released. JIT did not appear in PHP7, but the effect of the engine changes is hard to overstate: on many tasks PHP7, even without JIT, is now on par with HHVM (see, for example, the benchmarks from the LiteSpeed blog and the benchmarks from the PHP7 developers' presentation). The new PHP7 architecture also makes it easier to add JIT later.

Platform developers at Badoo have followed these developments closely over the past few years and even ran a pilot project with HHVM, but decided to wait for PHP7, considering it more promising. And we recently launched Badoo on PHP7! It was an epic project if only because of its size: we have over 3 million lines of PHP code and 60,000 tests. Read on to find out how we coped with all this, created a new framework for testing PHP applications (already released as open source, and similar to Go! AOP), and saved a million.

Experiments with HHVM


Before switching to PHP7, we spent some time looking for other ways to optimize our backend. Naturally, the first thing we decided to try was HHVM.

After a couple of weeks of research we got very decent results: once the JIT had warmed up on our framework, the gains in speed and CPU usage were in the hundreds of percent.

However, HHVM had unpleasant disadvantages:


And we waited for PHP7.

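Moving to a new version of the interpreter is an important and complex process, so we prepared for it by drawing up a clear transition plan. It consisted of three stages of preparation: changes to the PHP core and extensions, changes to the testing infrastructure, and changes to utilities and application code.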


We will describe each stage in more detail.

Core and extension fixes


We maintain and actively develop our own PHP branch. We started the project to move Badoo to PHP7 before its official release, so we had to regularly rebase our tree onto the PHP7 upstream in order to receive the updates of each release candidate. All the patches and customizations that we use in our daily work (see the Patches section of our tech site, tech.badoo.com/open-source) also had to be portable between versions and work correctly.

We automated the fetching and building of all dependencies, extensions, and the PHP tree for both 5.5 and 7.0. This not only simplified the work but also laid the groundwork for the future: when version 7.1 is released, everything will be ready.

The extensions took some sweat, too. We support about 40 extensions, and more than half of them are external open source extensions with our modifications.

For the fastest possible transition, we decided to run two processes in parallel. The first was to rewrite ourselves the extensions most critical for us: the Blitz template engine, the APCu data cache in shared memory, the Pinba statistics collection, and some custom ones for working with internal services (about 20 extensions in the end).
The second was to actively get rid of extensions used in non-critical parts of the infrastructure. We managed to drop 11 extensions without much effort, which is a lot!

And, of course, we began to actively communicate with the maintainers of the main open source extensions we use about PHP7 compatibility (special thanks to Derick Rethans, who develops Xdebug).

Next, we will go into the technical details of porting extensions to PHP7.

In version 7 the PHP developers changed many internal APIs, which meant that a large amount of extension code had to be edited. The most important changes are as follows:

All this made it possible to drastically reduce the number of small memory allocations and, as a result, to speed up the PHP core by tens of percent.

It should be noted that all these changes meant that every extension had to be, if not rewritten, then heavily edited. For the bundled extensions we could rely on their authors, but our own extensions only we could fix, and many changes were needed: because of the internal API changes, some parts of the code were easier simply to rewrite.

Unfortunately, the new garbage-collected structures, while speeding up code execution, also make the engine itself more complex and problems in it harder to find. One such problem was in OPcache: when the cache was cleared, the bytecode of a cached file was destroyed at a moment when it could still be in use by another process, which crashed that process. From the outside it looked like this: strings (zend_string) holding the names of functions or constants suddenly got corrupted and garbage appeared in their place.

Since we use a significant number of extensions of our own, many of which work heavily with strings, suspicion fell first on incorrect string handling in them. We wrote a lot of tests and ran many experiments, but to no avail. In the end we had to ask for help from Dmitry Stogov, one of the lead developers of the PHP core.

The first thing he asked was whether the cache was being cleared. It turned out that in every case it was. It became clear that the problem was not in our code but in OPcache. We quickly reproduced the problem, and Dmitry fixed it within a couple of days. Without this fix, which was included in PHP 7.0.4, PHP7 could not be used stably in production!

Changes to the testing infrastructure


Testing at Badoo is a special source of pride for us. We deploy PHP code to production twice a day, and each release contains 20-50 tasks (we use feature branches in Git and automated builds with tight JIRA integration). With this schedule and volume of tasks, there is no way to manage without automated tests.

Today we have about 60 thousand unit tests with approximately 50% coverage, which run in 2-3 minutes on average in the cloud (we have already written about this on Habr). In addition to unit tests we use higher-level automated tests: integration and system tests, Selenium tests for web pages, and Calabash tests for mobile applications. Together they let us judge the quality of each specific version of the code in the shortest possible time and make the appropriate decisions.

Moving to a new version of the interpreter is a fundamental change. Problems can arise anywhere, so it is imperative that all tests work. To make clear what we did, how, and why, we need a short digression into history to describe how test development evolved in our company.

People who start thinking about testing their products often discover during their experiments (and some only during rollout) that their code is not ready for it. Indeed, developers must keep in mind that their code has to be testable. The architecture should allow unit tests to substitute calls and objects of external dependencies in order to isolate the code under test from external conditions. It must be said that this requirement complicates life, and many programmers refuse on principle to write code so that it can be tested: the imposed restrictions enter an unequal struggle with other “good code” values and usually lose. And often, picturing the amount of existing code that was not written by the rules, experimenters simply postpone testing until better times or try to make do with little, covering only what can be covered (and as a result the tests do not always deliver the expected benefit).

Our company is no exception. We, too, did not introduce testing right at the start of the project. Many lines of code had already been written that worked quite well in production and brought in good money. Rewriting all that code just so that it could be covered with tests in the recommended way would have taken too long and cost too much.

Fortunately, at that time there was already a great tool that solved most of the problems with untestable code: runkit. This is a PHP extension that lets you change, delete, and add methods, classes, and functions used by a program while the script is running. It can do much more, but we did not use its other features. The tool was developed and maintained for several years (from 2005 to 2008) by Sara Golemon, who now works at Facebook, on HHVM among other things. Since 2008 the project has been maintained by our compatriot Dmitry Zenovich (who has headed the testing departments at Begun and Mail.Ru). We, too, have contributed a little to the project.

Runkit itself is a very dangerous extension. With it you can change constants, functions, and classes while the script that uses them is running. In effect, it is a tool for rebuilding your plane in mid-flight. Runkit reaches into the very guts of PHP on the fly; one error or flaw in runkit and the plane explodes beautifully in the air: PHP crashes, or you spend many hours hunting memory leaks and doing other low-level debugging. Nevertheless, it was a necessary tool for us: introducing testing into a project without serious rewriting is only possible by changing the code on the fly, simply replacing it with what is needed.

During the move to PHP7, runkit turned out to be a big problem: it did not support this version of PHP. We had the option of sponsoring the development of a new version, but that path did not seem the most reliable in the long term. In parallel, we considered several other options.

One promising solution was to switch from runkit to uopz. This is also a PHP extension with similar functionality; it appeared in April 2014. It was suggested to us by colleagues from Mamba, who gave it very good reviews, above all for its speed. The project is maintained by Joe Watkins from First Beat Media (UK). It looked more alive and more promising than runkit. But, unfortunately, we failed to move all our tests to uopz. In some places there were fatal errors, in others segfaults; we filed several reports, but unfortunately there has been no movement on them (see, for example, this bug on GitHub). Rewriting the tests around these problems would have been very expensive, with no guarantee that nothing else would surface.

As a result, we arrived at a solution that seemed obvious to us: since we have to rewrite a lot of code anyway and would still depend on external projects like runkit or uopz, with which we constantly have problems that are very expensive or impossible to solve ourselves, why not rewrite the code so as to remove these dependencies as far as possible? So thoroughly that we never face such problems again, even if we decide to switch to HHVM or any similar product. And that is how we got our own framework.

The system is called SoftMocks. The word “soft” emphasizes that the system works in pure PHP rather than through extensions. It is an open source project, publicly available as a plug-in library. SoftMocks is not tied to the specifics of the PHP core implementation and works by rewriting code on the fly, similar to Go! AOP.

Our test code mainly relies on the following capabilities:
  1. Substituting the implementation of a class method.
  2. Substituting the result of a function.
  3. Changing the value of a global constant or class constant.
  4. Adding a method to a class.

All of these capabilities are implemented perfectly well by runkit. With code rewriting they also become possible, but with some caveats.
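
To illustrate the general idea of substituting a function by rewriting code, here is a minimal sketch in Python rather than PHP. It is not SoftMocks code and does not reflect its internals: the names MOCKS, InjectMockCheck, load_rewritten, and get_price are invented for this illustration. The loader parses the source, inserts a guard at the top of every function that hands the call over to a registered replacement, and only then executes the rewritten code.

```python
import ast

# Registry of replacements: function name -> callable (hypothetical name).
MOCKS = {}


class InjectMockCheck(ast.NodeTransformer):
    """Prepend 'if mocked: return replacement(...)' to every function body."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        params = ", ".join(arg.arg for arg in node.args.args)
        guard_source = (
            f"if {node.name!r} in MOCKS:\n"
            f"    return MOCKS[{node.name!r}]({params})"
        )
        # ast.parse only builds a tree, so a bare 'return' is accepted here.
        guard = ast.parse(guard_source).body[0]
        node.body.insert(0, guard)
        return node


def load_rewritten(source):
    """Rewrite the source, execute it, and return the resulting namespace."""
    tree = InjectMockCheck().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {"MOCKS": MOCKS}
    exec(compile(tree, "<rewritten>", "exec"), namespace)
    return namespace


# Code under test, written with no thought for testability.
SOURCE = '''
def get_price(item_id):
    return 100
'''

module = load_rewritten(SOURCE)
original_price = module["get_price"](1)   # the real implementation runs
MOCKS["get_price"] = lambda item_id: 5    # substitute the function
mocked_price = module["get_price"](1)     # the guard returns the mock's result
del MOCKS["get_price"]                    # restore the original behaviour
```

The code under test never has to know about the registry: the guard is added at load time, which is why no interpreter extension is needed.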

How SoftMocks works is material for a separate article, which we will write in the near future. For now we will limit ourselves to a brief description of how the system operates:


Let's return to our task, the transition to PHP7. After we started using SoftMocks in the project, about 1,000 tests remained that had to be “repaired” by hand. This can be considered a good result, given that we started with 60,000 tests. The test run speed has not dropped compared to runkit, so there is no serious performance loss from using SoftMocks. In fairness, we note that uopz should still be much faster.

Utilities and application code


Along with many new features, PHP7 brought some backward incompatibilities. The first thing we did to study the problem was read the official migration guide. It quickly became clear that without fixing the existing code we risked both fatal errors in production and behavior changes that would not show up in the logs but would lead to incorrect application logic.

Badoo consists of several PHP code repositories, the largest of which contains over 2 million lines of code. And we have implemented a lot in PHP: from the business logic of the website and the backend of the mobile applications to testing and deployment utilities. The situation was further complicated by the fact that Badoo is a project with a history: it is already 10 years old, and a PHP4 legacy was unfortunately still present. So simply eyeballing the code was not an option. The “Brazilian system”, that is, putting the code into production as is and seeing what happens, was not an option either: it carries too high a risk of breaking the business logic for too large a share of users. Therefore we began to look for a way to automate the search for incompatible places.

At first we tried the IDEs most popular among developers, but at that time they unfortunately either did not support the syntax and features of PHP7 at all or found suspiciously few problems, skipping obviously dangerous places in the code. After a little research we decided to try the php7mar utility. It is a simple static code analyzer implemented in PHP. It is very easy to use, it works quite quickly, the result is provided in the form of a text file, it requires PHP7. Of course, the utility is no panacea: there are both false positives and misses of especially “tricky” places in the code. But it found about 90% of the problems, which significantly sped up and simplified preparing the code for PHP7.

The most common and potentially dangerous problems for us were:

The remaining incompatibilities were either extremely rare (for example, the 'e' modifier for regular expressions) or fixed by a simple replacement (for example, all constructors should now be named __construct(); naming them after the class is deprecated).

But before starting to fix the code, we realized that while some developers were making the changes needed for compatibility, others would keep writing code incompatible with PHP7. To solve this, we added a pre-receive hook to each Git repository that runs php7 -l on the files being modified, i.e. checks them for PHP7 syntax compliance. This does not guarantee full protection against incompatibilities, but it already eliminates a number of problems. For the rest, developers simply had to be a little more careful. In addition, we started running the full test suite on PHP7 regularly and comparing the results with the PHP5 runs.

At the same time, developers were forbidden to use any new PHP7 features, i.e. we did not turn off the old pre-receive hook with php5. This let us reach a point where the code was compatible with both the seventh and the fifth version of the interpreter. Why is this important? Because, besides problems with the PHP code, an upgrade to a new version can bring problems with PHP7 itself and its extensions (and, as mentioned above, we did run into them). Unfortunately, not all of them reproduced in the test environment; some we could only see under considerable load in production.
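
The pre-receive check described above could look roughly like the following sketch. It is written in Python purely for illustration and is not Badoo's actual hook. The binary names php5 and php7, the helper names, and the decision to skip newly created or deleted branches are all assumptions; the sketch also assumes that the linter reads the file contents from standard input when run with -l.

```python
import subprocess

ZERO_REV = "0" * 40  # git passes all zeros for created or deleted refs (SHA-1 repos)


def parse_updates(stdin_text):
    """pre-receive gets one '<old> <new> <ref>' line per pushed ref."""
    return [tuple(line.split()) for line in stdin_text.splitlines() if line.strip()]


def git(*args):
    """Run a git command in the receiving repository and return its stdout."""
    return subprocess.run(["git", *args], capture_output=True, check=True).stdout


def passes_lint(linter_cmd, source):
    """Pipe file contents to the linter; exit code 0 means the syntax is valid."""
    return subprocess.run(linter_cmd, input=source, capture_output=True).returncode == 0


def rejected_files(old, new, linter_cmd):
    """Return added or changed .php files in old..new that fail the linter."""
    names = git("diff", "--name-only", "--diff-filter=ACM", old, new).decode().splitlines()
    return [
        name for name in names
        if name.endswith(".php") and not passes_lint(linter_cmd, git("show", f"{new}:{name}"))
    ]


def check_push(stdin_text, linters):
    """Return the hook's exit status: 0 accepts the push, 1 rejects it."""
    bad = []
    for old, new, _ref in parse_updates(stdin_text):
        if ZERO_REV in (old, new):
            continue  # new or deleted branch: not handled in this sketch
        for linter_cmd in linters:
            bad.extend(rejected_files(old, new, linter_cmd))
    return 1 if bad else 0


# In a real hook script (binary names are assumptions):
#   sys.exit(check_push(sys.stdin.read(), [["php5", "-l"], ["php7", "-l"]]))
```

Running both linters on every push is what keeps the code base valid for both interpreter versions at once.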

"Run into battle" and results


Obviously, we needed a simple and fast way to change the PHP version on any number of servers. To do this, every path to the CLI interpreter in the code was replaced with /local/php, which in turn was a symlink to either /local/php5 or /local/php7. Thus, to change the PHP version on a server, we had to change the link (the atomicity of this operation matters for CLI scripts), stop php5-fpm, and start php7-fpm. We could have had two upstreams for php-fpm in nginx and run php5-fpm and php7-fpm on different ports, but we did not like this option because it complicates the nginx configuration.
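
One common way to change a symlink atomically is to create the new link under a temporary name and rename it over the old one. The sketch below shows that technique in Python on a POSIX system. The function name and the temporary-name scheme are assumptions, and the article does not say which mechanism was actually used.

```python
import os


def switch_symlink(link_path, new_target):
    """Atomically repoint link_path at new_target (POSIX only)."""
    tmp_path = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(new_target, tmp_path)   # build the new link under a temporary name
    os.replace(tmp_path, link_path)    # rename(2) swaps it in as a single step


# Example with the paths from the article:
#   switch_symlink("/local/php", "/local/php7")
```

Because the rename replaces the old link in one step, a CLI script started at any moment resolves /local/php to either the old or the new interpreter, never to a missing path.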

With all of the above done, we could move on to running Selenium tests in the preproduction environment, which revealed a number of problems that had gone unnoticed before. They concerned both the PHP code (for example, we had to abandon the obsolete global variable $HTTP_RAW_POST_DATA in favor of file_get_contents("php://input")) and extensions (all sorts of segmentation faults).

Having fixed the problems found at the previous stage and finished rewriting the unit tests (during which we also found several bugs in the interpreter, for example, this one), we finally started the “quarantine” in production. “Quarantine” is what we call running a new PHP version on a limited number of servers. We started with one server in each large cluster (web backend, mobile application backend, cloud) and gradually increased the number as long as no errors occurred. The first large cluster fully switched to PHP7 was the cloud, because it does not need php-fpm. The clusters that do run fpm had to wait until we had tracked down the OPcache problem and Dmitry Stogov had fixed it. After that we switched the fpm clusters as well.

Now about the results. In short, they are more than impressive. Below are graphs of response time, rusage, memory consumption, and CPU usage for the largest of our clusters (263 servers), the mobile application backend in the Prague data center:

Distribution of response times:


RUsage (CPU time):


Memory usage:


CPU load (%) on the entire cluster:


Thus, CPU time was halved, which improved the overall response time by about 40%, since part of the time spent processing a request goes to communicating with databases and daemons, and that part, as expected, is not sped up by the move to PHP7. In addition, the effect is somewhat amplified by the fact that the total load on the cluster dropped below 50%, which has to do with how Hyper-Threading technology works. Roughly speaking, when the load rises above 50%, the HT cores come into play, and they are not as “useful” as physical cores, but that is a topic for another article.

Memory consumption, although it was never a bottleneck for us, decreased roughly 8 times! And finally, we saved on hardware: we can now handle a much greater load on the same number of servers, which in effect reduces the cost of buying and maintaining them. The results on the other clusters differ only slightly, except that the gain on the cloud is somewhat more modest (about 40% of CPU) because OPcache is not used there.
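
Why halving CPU time yields less than a 50% gain in response time can be shown with a small calculation. The sketch below is in Python, and the 80/20 split between CPU time and waiting time is a hypothetical figure chosen only for illustration. The article does not state the real split, and the sketch ignores the Hyper-Threading effect mentioned above.

```python
def relative_response_time(cpu_share, cpu_speedup):
    """Response time after the speedup, as a fraction of the original.

    cpu_share   -- fraction of the original response time spent on CPU
    cpu_speedup -- factor by which CPU time shrinks (2.0 means halved)
    """
    waiting_share = 1.0 - cpu_share  # databases and daemons: unchanged
    return cpu_share / cpu_speedup + waiting_share


# Hypothetical split: 80% CPU, 20% waiting on databases and daemons.
after = relative_response_time(cpu_share=0.8, cpu_speedup=2.0)
improvement_percent = round((1.0 - after) * 100)
```

With these assumed numbers the response time improves by about 40%. The larger the share of time spent waiting on external services, the smaller the visible gain from a faster interpreter.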

How much money did we save? Let's count. Our application server cluster consists of more than 600 servers. By halving CPU usage we free up about 300 servers. Taking the initial price of this hardware (about $4,000 each) together with depreciation, we get about a million dollars in savings, plus about a hundred thousand a year on hosting! And that is not counting the cloud, whose performance has also grown. We think this is an excellent result!
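
The hardware part of this estimate can be reproduced with a few lines of Python. This is a rough sketch: the article does not give the depreciation rate, so the calculation stops at the purchase price, and the variable names are made up for the example.

```python
TOTAL_SERVERS = 600           # "more than 600" in the article, rounded down
CPU_REDUCTION = 0.5           # CPU usage roughly halved
PRICE_PER_SERVER_USD = 4000   # approximate purchase price per server

freed_servers = int(TOTAL_SERVERS * CPU_REDUCTION)
purchase_value_usd = freed_servers * PRICE_PER_SERVER_USD
```

This gives 300 freed servers with a purchase value of $1,200,000, which is presumably the figure that depreciation brings down to the article's "about a million".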

Have you switched to PHP7? We will be glad to hear your opinions and questions in the comments.

Source: https://habr.com/ru/post/279047/

