How we accelerated PHP projects 40 times with caching

The questions of SEO-optimization and improvement of User eXperience, which at some point stood before the Wrike team, demanded a significant increase in the speed of our web projects. At that time there were about ten of them (main site , blog , help center , etc.). The decision to accelerate projects was made on the basis of the Nginx + fastcgi cache + LUA + LSYNC bundle.

')
Given

On most projects, we used a bunch of Wordpress + themosis for convenience, versatility and extensibility with plugins, and for some, just Wordpress. Naturally, WordPress still had a lot of plug-ins added + our theme: Nginx + php-fpm for server nodes with web projects, and Entry Point (Nginx + proxy_pass for them) before.

Each of the applications was on its server upstream, which was proxy_pass on round-robin. You understand that there was no reason to expect good results from such a bundle.
At that time, TTFB (Time To First Byte) and Upstream Response Time in most cases ranged from 1 to 3 seconds. Such indicators did not suit us.

Before finding a solution, we determined that we would be satisfied with 50 ms for an upstream response time. Upstream response time was chosen as the most significant value, which showed only the response time of the server with the web application and did not depend on the Internet connection.

Step 1: fastcgi

According to the results, they stopped on a fastcgi cache . The thing turned out to be really good, customizable and does a wonderful job of its task.

After it was turned on and settings on the nodes, the indicators improved, but only slightly. Significant results were not achieved due to the fact that Entry Point scattered requests for the round-robin algorithm inside the Upstream, and, accordingly, the cache on each of the servers for the same application had its own, albeit the same. Our architecture did not allow us to add a cache to our Entry Point, so I had to think further.

Step 2: lsyncd

The solution was the following: use lsyncd to distribute the cache between upstream nodes on the inotify event. It is said - done: the cache immediately during creation on one node by inotify began to “fly away” to the other nodes, but this, of course, did not lead to success. About the page in the cache knew only Nginx of the node in which the request was processed.

We thought a bit and found a way that other nodes can teach how to work with the cache obtained through lsyncd. The method turned out to be not elegant - it is Nginx restart, after which it starts the cache loader (after a minute), and that in turn begins loading information on the cached data into the cache zone - thereby he learns about the cache that was synchronized from other nodes. At this stage, it was also decided that the cache should live for a very long time and, in most cases, be generated through a special bot that would go through the necessary pages, and not through site visitors and search bots. Accordingly, the options fastcgi_cache_path and fastcgi_cache_valid were tuned.

All is well, but how to revalidate the cache, which, for example, is necessary after each deployment. The question of revalidation was decided with the help of a special header in the type option fastcgi_cache_bypass:

fastcgi_cache_bypass $skip $http_x_specialheader;

Now it was necessary to ensure that our bot, after deployment, began revalidation of the project using the following header:

--header='x-specialheader: 1'

During the revalidation process, the cache immediately “scattered” to all the nodes (lsyncd), and since the cache lifetime is long and Nginx knows that the pages are cached, it begins to give the visitors a new cache. Yes, just in case we add the option:

fastcgi_cache_use_stale error timeout updating invalid_header http_500;

This option is useful if, for example, php-fpm suddenly fell off accidentally, or a code arrived in production, which for some improbable reason returns 500. Now Nginx will not return 500-ku, but will return the old "working" cache.

Also, the revalidation scheme using the header allowed us to make a web interface for revalidation of certain URLs. It was made on the basis of php scripts that sent a special header to the required URL and revalidated it.
Here we felt the desired increase in the speed of return of pages. It went back on track :)

Step 3: LUA

But there was only one “but”: we had to manage caching logic depending on certain conditions: requests with a certain parameter, cookie, etc ... I didn’t want to work with “if” in Nginx, and he wouldn’t have decided all those tasks with logic that we faced.

A new search has begun, and LUA has been chosen as a layer for controlling the caching logic.

The language was very simple, fast, and, most importantly, it was well integrated through the module with Nginx. The build process is well documented here .
Having evaluated the capabilities of the Nginx + LUA bundle, we decided to assign the following responsibilities to it:
redirects with several conditions;
experiments with the distribution of requests for different landings on the same URL (different percentage of requests for different landings);
blocking conditions;
deciding whether to cache this or that page. This was done according to predetermined conditions, constructions of the form:

location ~ \.php {
set $skip 0;
set_by_lua $skip '
local skip = ngx.var.skip;
if string.find(ngx.var.request_uri, "test.php") then
app = "1";
end
return app;
';
...
fastcgi_cache_bypass $skip $http_x_specialheader;
fastcgi_no_cache $skip;
...
}

The work done has allowed us to obtain the following results:

Upstream response time for the vast majority of requests has ceased to go beyond 50 ms, and in most cases it is even less.
Also, it was noted in the Google console ~ 25% reduction in Time spent downloading a page (work).
Significantly improved Apdex-indicators for Request Time.
A bonus was the option fastcgi_cache_use_stale, which will serve as a kind of protector from 500-k in case of unsuccessful deployment or problems with php-fpm.
The ability to hold much more RPS due to the fact that php calls were minimized, and the cache, roughly speaking, is static html, which is given directly from the disk.

In the most illustrative example of the upstream response time of one of the applications, the dynamics looked like this:

The dynamics in the Google console during the implementation of the solution is as follows:

The discrepancies in the charts are logical, since the console shows the dynamics of all domain projects. The caching itself did not bring significant inconvenience, since the tools for its revalidation were very simple.

Thus, we have achieved a significant increase in the speed of our web projects, almost without spending developer resources on upgrading applications, and they could continue to develop features. This method is good for speed and convenience of implementation, but its weak side is that the cache, although it solves the problem of slow page rendering, does not eliminate the root of the problem - the slowness of the scripts themselves.

Source: https://habr.com/ru/post/306932/

All Articles

How we accelerated PHP projects 40 times with caching

More articles: