In the life of every project there comes a moment when the server no longer meets the SLA requirements and literally starts to choke on the volume of incoming traffic. Then begins the long process of hunting for bottlenecks, heavy queries, badly built indexes, uncached data (or, on the contrary, data in the cache that is updated too often) and other dark corners of the project.
But what do you do when your code is already "perfect", every heavy request has been moved to the background, everything that can be cached is cached, and the server still does not reach the SLA figures we need? If the option exists, you can of course buy new machines, spread part of the traffic across them and forget about the problem for a while.
But if you have the feeling that your server is capable of more, or that there is a magic parameter that will speed the site up a hundredfold, you can recall the built-in nginx feature that allows responses from the backend to be cached. Let's sort out what it is and how it can help increase the number of requests the server can handle.
What is Nginx cache and how does it work?
The nginx cache can significantly reduce the number of requests that reach the backend. This is achieved by storing the HTTP response for a certain time and, on repeated requests to the same resource, returning it from the cache without proxying the request to the backend. Caching even for a short period gives a noticeable increase in the number of requests the server can process.
Before moving on to the nginx configuration, make sure that nginx is built with the "ngx_http_proxy_module" module, since this is the module we will be configuring.
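The proxy module is part of standard builds; one way to double-check a particular binary is to look at its configure arguments. A small sketch, assuming nginx is in PATH (the module is present unless it was explicitly disabled at build time):

nginx -V 2>&1 | grep -q without-http_proxy_module \
    && echo 'proxy module was disabled at build time' \
    || echo 'ngx_http_proxy_module is available'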
For convenience, you can put the configuration in a separate file, for example “/etc/nginx/conf.d/cache.conf”. Let's take a look at the “proxy_cache_path” directive, which allows you to configure cache storage options.
proxy_cache_path /var/lib/nginx/proxy_cache levels=1:2 keys_zone=proxy_cache:15m max_size=1G;
"/var/lib/nginx/proxy_cache" specifies the path where the cache is stored on the server. It is in this directory that nginx will save the files with responses from the backend. Note that nginx will not create the cache directory on its own; you need to take care of that yourself.
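A minimal sketch of creating that directory, assuming the nginx workers run as the "nginx" user (the user name depends on your distribution and build):

mkdir -p /var/lib/nginx/proxy_cache
chown nginx:nginx /var/lib/nginx/proxy_cache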
"levels=1:2" sets the nesting of the cache directories. The levels are separated by ":"; in this case two levels of subdirectories are created, and up to 3 levels of nesting are allowed. Each level takes a value from 1 to 2, which determines how many characters of the hash are used to form the directory name.
The important point is that the directory names are not chosen at random: they are derived from the cache file name. The file name, in turn, is the md5 hash of the cache key; the cache key itself will be discussed a little later.
Let's take a practical look at how the path to the cache file is built:
/var/lib/nginx/proxy_cache/2/49/07edcfe6974569ab4da6634ad4e5d492
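For levels=1:2 the path is assembled from the end of the hash; decomposing the example above:

md5(cache key)    -> 07edcfe6974569ab4da6634ad4e5d492
level 1 directory -> 2   (the last 1 character of the hash)
level 2 directory -> 49  (the next 2 characters)
resulting path    -> /var/lib/nginx/proxy_cache/2/49/07edcfe6974569ab4da6634ad4e5d492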
The "keys_zone=proxy_cache:15m" parameter sets the name of the shared-memory zone in which all active keys and information about them are stored. The value after the ":" is the size of the allocated memory in MB. According to nginx, 1 MB is enough to store about 8 thousand keys.
"max_size=1G" defines the maximum size of the cache across all pages; once it is exceeded, nginx takes care of removing the least-demanded data.
It is also possible to control the lifetime of data in the cache: just set the "inactive" parameter of the "proxy_cache_path" directive, which defaults to 10 minutes. If the cached data is not accessed within the time specified by "inactive", it is removed even if the cache entry has not yet expired.
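As a sketch, the storage directive from above extended with an explicit "inactive" value (the 30 minutes here are illustrative):

proxy_cache_path /var/lib/nginx/proxy_cache levels=1:2 keys_zone=proxy_cache:15m max_size=1G inactive=30m;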
What does this cache look like? In fact, it is a regular file on the server, into which the following is written:
• cache key;
• cache headers;
• content response from the backend.
If everything is clear with the headers and the response from the backend, the "cache key" raises a number of questions. How is it built and how can it be managed?
To describe the pattern for building the cache key, nginx provides the "proxy_cache_key" directive, which takes a string as its parameter. The string can consist of any variables available in nginx.
For example:
proxy_cache_key $request_method$host$orig_uri:$cookie_some_cookie:$arg_some_arg;
The ":" symbol between the cookie value and the GET parameter is used to prevent collisions between cache keys; you can choose any other symbol to your liking. By default, nginx uses the following string to generate the key:
proxy_cache_key $scheme$proxy_host$request_uri;
The following directives should be noted to help manage caching more flexibly:
proxy_cache_valid - sets the response caching time. You can specify particular response statuses, for example 200, 302, 404, etc., or cover everything at once with the "any" keyword. If only the caching time is specified, nginx caches only the 200, 301 and 302 statuses by default.
Example:
proxy_cache_valid 15m;
proxy_cache_valid 404 15s;
In this example we set the cache lifetime to 15 minutes for statuses 200, 301 and 302 (nginx uses them by default, since we did not specify a particular status). The next line sets the caching time to 15 seconds, but only for responses with status 404.
proxy_cache_lock - this directive helps avoid several simultaneous requests to the backend while the cache is being populated; to enable it, set the value to "on". All other requests for the same key will wait either for the response to appear in the cache or for the lock timeout on that page to expire. All the relevant timeouts can be configured.
proxy_cache_lock_age - sets the time limit for a response from the server, after which the next request to populate the cache will be sent to it. The default is 5 seconds.
proxy_cache_lock_timeout - sets how long to wait for the lock, after which the request is passed to the backend, but the response is not cached. The default is 5 seconds.
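A sketch of how these three directives might sit together inside a proxying location (the values are illustrative):

proxy_cache_lock on;
proxy_cache_lock_age 10s;
proxy_cache_lock_timeout 3s;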
proxy_cache_use_stale - another useful directive that lets you define when it is acceptable to serve a stale cache entry.
Example:
proxy_cache_use_stale error timeout updating;
In this case the stale cache will be used if an error occurs while establishing a connection with the server, passing the request to it or reading its response, if the corresponding timeouts are exceeded, or if the cached data is being updated at the moment of the request.
proxy_cache_bypass - sets the conditions under which nginx will not take the response from the cache but will immediately forward the request to the backend. The cache is bypassed if at least one of the parameters is non-empty and not equal to "0". Example:
proxy_cache_bypass $cookie_nocache $arg_nocache;
proxy_no_cache - Sets the condition under which nginx will not save the response from the backend to the cache. The principle of operation is the same as that of the “proxy_cache_bypass” directive.
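For example, a sketch that skips saving responses for authorized users, assuming the application marks them with a hypothetical "session" cookie:

proxy_no_cache $cookie_session;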
Possible problems with caching pages
As mentioned above, nginx stores the headers received from the backend along with the cached HTTP response. If your site uses sessions, the session cookie will be cached too. All users who visit a page that happened to be cached will receive the personal data stored in that session.
The next task is cache invalidation. Of course, you can set a small cache time of 2-5 minutes, and in most cases that is enough. But it is not applicable in every situation, so we will reinvent our own wheel. Now, about everything in order.
Cookie Preservation Management
Nginx caching imposes certain restrictions on development. For example, we cannot use sessions on cached pages, since the user never reaches the backend; another restriction concerns returning cookies from the backend. Since nginx caches all headers, to avoid saving someone else's session in the cache we have to prevent cookies from being set on cached pages. This is where the "proxy_ignore_headers" directive helps: the headers from the backend that should be ignored are listed as its arguments.
Example:
proxy_ignore_headers "Set-Cookie";
With this line we ignore cookie setting from the proxied server, i.e. the user will receive the response without the "Set-Cookie" header. Accordingly, everything that the backend tried to write into the cookie will be ignored on the client side, since the client never even learns that anything was intended for it. This restriction on setting cookies must be taken into account when developing the application. For the authorization request, for example, you can stop ignoring the header so that the user does receive a session cookie.
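A sketch of how this might be laid out in the config: the Set-Cookie header is dropped only on cached locations, while the authorization endpoint is proxied without caching (the "/login" path and "backend" upstream name here are illustrative).

# cached "public" pages: drop Set-Cookie so a session never ends up in the cache
location / {
    proxy_pass           http://backend;
    proxy_cache          proxy_cache;
    proxy_ignore_headers "Set-Cookie";
}

# authorization endpoint: no caching, Set-Cookie reaches the client
location /login {
    proxy_pass  http://backend;
    proxy_cache off;
}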
You should also keep the session lifetime in mind; it can be seen in the "session.gc_maxlifetime" parameter of the php.ini config. Imagine that a user has logged in and started browsing the news feed, all of which is already in the nginx cache. After a while the user notices that his authorization has disappeared and he has to log in again, even though all this time he was on the site reading the news. This happened because nginx returned all of his requests from the cache without sending anything to the backend. The backend therefore decided the user was inactive and, after the time specified in "session.gc_maxlifetime", deleted the session file.
To prevent this, we can emulate requests to the backend, for example by sending an ajax request that is guaranteed to reach it. To get to the backend past the nginx cache, it is enough to send a POST request; you can also use a rule from the "proxy_cache_bypass" directive, or simply disable the cache for that page. The request does not have to return anything; it can be a file with a single line that starts the session. The purpose of such a request is to extend the session lifetime while the user is on the site and nginx dutifully serves cached data for all of his other requests.
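A minimal sketch of a dedicated "ping" location that always reaches the backend (the "/session-ping" path and "backend" upstream are hypothetical names; the handler behind it only needs to start the session):

location /session-ping {
    proxy_pass  http://backend;
    proxy_cache off;
}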
Cache Reset Control
First we need to settle on the requirements: what goal are we trying to achieve? For example, our site has a section with live text coverage of popular sporting events. On load the page is served from the cache, and all new messages then arrive over sockets. For the user to see the current messages on first load, rather than ones from 15 minutes ago, we must be able to reset the nginx cache ourselves at any moment. Moreover, nginx may not be located on the same machine as the application. Another requirement for the reset is the ability to delete the cache for several pages at once.
Before writing our own solution, let's see what nginx offers out of the box. For resetting the cache, nginx has a special "proxy_cache_purge" directive, which takes the cache-reset condition. The condition is in fact an ordinary string: when it is non-empty and not "0", the cache entry for the given key is deleted. Let's consider a small example.
proxy_cache_path /data/nginx/cache keys_zone=cache_zone:10m;

map $request_method $purge_method {
    PURGE   1;
    default 0;
}

server {
    ...
    location / {
        proxy_pass http://backend;
        proxy_cache cache_zone;
        proxy_cache_key $uri;
        proxy_cache_purge $purge_method;
    }
}
The example is taken from the official nginx site. The $purge_method variable, which serves as the condition for the "proxy_cache_purge" directive, is "0" by default, meaning nginx works in "normal" mode (saving responses from the backend). But if the request method is changed to "PURGE", then instead of proxying the request to the backend and saving the answer, the cache entry for the corresponding caching key is deleted. It is also possible to delete by mask, by adding a "*" at the end of the caching key. Thus we do not need to know where the cache lives on disk or how the key is formed: nginx takes on those responsibilities. But there are downsides to this approach.
- the "proxy_cache_purge" directive is available only as part of a commercial subscription;
- only point removal of a single cache entry is possible, or removal by a mask of the form {cache key}"*".
Since the addresses of cached pages can be completely different, with no common parts, the "*" mask approach with the "proxy_cache_purge" directive does not suit us. It remains to recall a bit of theory and open your favorite IDE.
We know that the nginx cache is a regular file on the server. We specified the directory for storing cache files ourselves in the "proxy_cache_path" directive, and we even defined, via "levels", how the path below that directory is formed. The only thing missing is the correct construction of the caching key, but we can see that in the "proxy_cache_key" directive. Now all we have to do is:
- build the cache key for the page exactly as specified in the "proxy_cache_key" directive;
- hash the resulting string with md5;
- build the subdirectories according to the rule from the "levels" parameter.
And now we have the full path to the cache file on the server. All that is left is to delete the file itself. From the introduction we know that nginx may not live on the application machine, so we also need to be able to delete several entries at once. Let's describe the algorithm again:
- We will write the generated paths to the cache files to a file;
- We write a simple bash script and put it on the machine with the application. Its job is to connect via ssh to the server where our caching nginx lives and delete all the cache files listed in the file generated in step 1.
Let's move from theory to practice, write a small example illustrating our algorithm of work.
Step 1. Creating a file with paths to the cache.
<?php

// Cache keys of the pages to reset, already in the proxy_cache_key format
$urls = [
    'httpGETdomain.ru/news/111/1:2',
    'httpGETdomain.ru/news/112/3:4',
];

// Builds the path to the nginx cache file for a key (levels=1:2)
function to_nginx_cache_path($url)
{
    $nginxHash = md5($url);
    $firstDir  = substr($nginxHash, -1, 1); // last character of the hash
    $secondDir = substr($nginxHash, -3, 2); // next two characters

    return "/var/lib/nginx/proxy_cache/$firstDir/$secondDir/$nginxHash";
}

// Temporary file with the list of cache paths
$filePath = tempnam('tmp', 'nginx_cache_');
$fileStream = fopen($filePath, 'a');

foreach ($urls as $url) {
    // Write the path to the cache file into the list
    $cachePath = to_nginx_cache_path($url);
    fwrite($fileStream, $cachePath . PHP_EOL);
}

fclose($fileStream);

// Hand the list over to the bash removal script
exec("/usr/local/bin/cache_remover $filePath");
Note that the $urls variable contains the URLs of the cached pages already in the "proxy_cache_key" format specified in the nginx config. The URL acts as a kind of tag for the entities displayed on the page. For example, you can create an ordinary table in the database where each entity is linked to the specific pages on which it is displayed. Then, whenever we change some data, we can select from that table and delete the cache of all the pages we need.
Step 2. Connect to a caching server and delete cache files.
#!/bin/bash

# The file from step 1 with cache file paths, one per line
FILE_LIST=`cat $1 | tr '\n' ' '`

# Path to the ssh binary
SSH=`which ssh`
USER=   # user on the caching nginx server
HOST=   # host where the caching nginx is running
KEY=    # path to the private ssh key

# Connect over SSH and remove the cache files
$SSH -i ${KEY} ${USER}@${HOST} rm -rf ${FILE_LIST}

# Remove the temporary file with the list of paths
rm -f $1
The examples above are illustrative; you should not use them in production. They omit input parameter checks and command restrictions. One of the problems you may run into is the limit on the length of the "rm" command's argument list. When testing in a dev environment on small volumes this is easy to miss, while in production you can get the error "rm: Argument list too long".
Caching personalized blocks
Let's summarize what we managed to do:
- reduced the load on the backend;
- learned how to manage caching;
- learned how to flush the cache at any time.
But not everything is as rosy as it might seem at first glance. These days, if not every site then at least every other one has registration/authorization functionality, after which we want to display the user's name somewhere in the header. The block with the name is unique and must show the name of the user who is logged in. Since nginx saves the backend's response, and for a page that is its HTML content, the block with personal data gets cached as well. All visitors of the site will see the name of the first user whose request went to the backend and populated the cache.
Consequently, the backend should not output blocks containing personal information, so that this information does not end up in the nginx cache.
We have to consider loading such parts of the page by other means. As always, this can be done in many ways; for example, after the page loads, send an ajax request and show a loader in place of the personal content. Another way, which we will look at today, is using SSI tags. Let's first figure out what SSI is, and then how we can use it together with the nginx cache.
What is SSI and how does it work
SSI (Server-Side Includes, server-side inclusions) is a set of commands embedded in the html page that tell the server what to do.
Here is a list of such commands (directives):
• if / elif / else / endif - branching operator;
• echo - displays the value of a variable;
• include - allows the contents of another file to be inserted into the document.
It is the last directive that interests us here. The include directive has two parameters:
• file - specifies the path to a file on the server, relative to the current directory;
• virtual - specifies the virtual path to a document on the server.
We are interested in the "virtual" parameter, since it is not always convenient to specify the full path to a file on the server, and in the case of a distributed architecture the file simply does not exist on the server. An example of such a directive:
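A sketch of an include insert as it could appear in the page markup (the "/user/info" path is purely illustrative):

<!--# include virtual="/user/info" -->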
In order for nginx to start processing ssi inserts, it is necessary to modify the location as follows:
location / {
    ssi on;
    ...
}
Now all requests processed by the location “/” will be able to perform ssi inserts.
How will our request travel through this whole scheme?
- the client requests a page;
- nginx proxies the request to the backend;
- the backend returns the page with SSI inserts;
- the result is saved to the cache;
- nginx additionally requests the missing blocks;
- the assembled page is sent to the client.
As you can see from these steps, the SSI constructs end up in the nginx cache, which lets us avoid caching the personal blocks, while the client receives a ready-made HTML page with all the inserts. This is where our partial loading works: nginx requests the missing blocks of the page on its own. But like any other solution, this approach has its pros and cons. Imagine that a page contains several blocks that should be displayed differently depending on the user; each such block will then be replaced by an SSI insert. nginx, as expected, will request every such block from the backend, i.e. one user request will generate several requests to the backend at once, which is not at all what we want.
Getting rid of constant SSI requests to the backend
The nginx "ngx_http_memcached_module" module will help us solve this problem. It allows values to be fetched from a memcached server. Writing through the module is not possible; the application server has to take care of that. Let's look at a small example of configuring nginx together with this module:
server {
    location /page {
        set $memcached_key "$uri";
        memcached_pass 127.0.0.1:11211;
        error_page 404 502 504 = @fallback;
    }

    location @fallback {
        proxy_pass http://backend;
    }
}
In the $memcached_key variable we set the key by which nginx will try to fetch data from memcached. The connection parameters for the memcached server are specified in the "memcached_pass" directive. The connection can be specified in several ways:
• Domain name;
memcached_pass cache.domain.ru;
• IP address and port;
memcached_pass localhost:11211;
• unix socket;
memcached_pass unix:/tmp/memcached.socket;
• upstream directive.
upstream cachestream {
    hash $request_uri consistent;
    server 10.10.1.1:11211;
    server 10.10.1.2:11211;
}

location / {
    ...
    memcached_pass cachestream;
    ...
}
If nginx managed to get a response from the cache server, it returns it to the client. If there is no data in the cache, the request is passed to the backend via "@fallback". This small configuration of the memcached module in nginx helps us reduce the number of backend requests generated by SSI inserts.
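Since the module only reads, the application has to put the rendered block into memcached itself. A minimal sketch, assuming the Memcached PHP extension and the "$uri" key format from the config above (render_user_block() is a hypothetical function that renders the personal block):

<?php

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

// The key must match the $memcached_key nginx will build, i.e. the request URI
$key  = '/page';
$html = render_user_block(); // hypothetical rendering of the personal block

// Store the block for 60 seconds
$memcached->set($key, $html, 60);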
We hope this article was useful and that we managed to show one way of optimizing server load, go over the basic principles of configuring nginx caching, and address the problems that arise when using it.