
Drupal 8 + Varnish: Cache HTML correctly

Drupal 8 is the latest release of the Drupal CMS. Varnish is a caching HTTP reverse proxy: it sits in front of a web application and caches HTTP responses in the server's RAM.

When we put Varnish in front of Drupal (or any other web application), the processing of an incoming HTTP request changes: the request first reaches Varnish, which answers from its cache when it can and forwards the request to Drupal only when it cannot.

Back in the days of Drupal 6 and 7, Varnish made it very convenient to cache static resources (images, CSS, JavaScript files). Caching HTML pages was a problem, though: there was no convenient mechanism for selectively invalidating the cache. The only options were to knowingly serve stale content or to clear the entire Varnish cache on any change in Drupal. Both approaches had their drawbacks.

If you did not invalidate the Varnish cache when something changed in Drupal, you had to set a relatively short TTL so that cached entries expired quickly and Varnish refreshed them with new data from Drupal.

On the other hand, if you cleared the Varnish cache on every little change in Drupal, the measure was often excessive. A user could post an innocuous comment on some obscure page, and that invalidated the entire Varnish cache, including heavily visited pages that the comment did not affect at all.

Drupal 8 Cache API


Fortunately, things are much better in Drupal 8, which added a flexible caching system based on cacheability metadata. I highly recommend that the curious read the official documentation. In short, this system lets Drupal keep track of what (nodes, comments, configs, etc.) a generated page depends on. Each page is associated with a set of cache tags. On any state change, Drupal determines which cache tags must be invalidated. This in turn lets Drupal invalidate only the minimal necessary set of cached pages when something (a node, a comment, a config) changes. Drupal core uses this approach to invalidate its internal cache. With a couple of deft moves, you can achieve the same “minimal necessary” invalidation in Varnish's cache.
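To make the idea of cache tags more tangible, here is a minimal sketch in Python. It is a toy model, not Drupal's Cache API: the class name, the URLs and the tag values are all made up for illustration. It only shows that invalidating one tag drops the pages that depend on it and leaves the rest alone.

```python
from collections import defaultdict


class TaggedPageCache:
    """Toy page cache in which every entry carries a set of cache tags."""

    def __init__(self):
        self._pages = {}                      # url -> rendered HTML
        self._urls_by_tag = defaultdict(set)  # cache tag -> urls that depend on it

    def store(self, url, html, tags):
        self._pages[url] = html
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url):
        return self._pages.get(url)

    def invalidate_tag(self, tag):
        """Drop only the pages that depend on the given tag; return their urls."""
        dropped = set()
        for url in self._urls_by_tag.pop(tag, set()):
            if self._pages.pop(url, None) is not None:
                dropped.add(url)
        return dropped


# Hypothetical pages and tags.
cache = TaggedPageCache()
cache.store("/popular", "<html>popular</html>", {"node:1", "config:system.menu.main"})
cache.store("/obscure", "<html>obscure</html>", {"node:2", "comment:7"})

# A change to node 2 invalidates only the page that depends on it.
dropped = cache.invalidate_tag("node:2")
```

The popular page stays cached, which is exactly what the "clear everything" approach from the Drupal 6 and 7 days could not offer.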

Extending the Cache API to Varnish


This tutorial assumes that you have already installed and configured Varnish for static resources (images, CSS, JavaScript). To extend the cache to HTML pages for anonymous users, we need 3 steps:
  1. Tell Varnish which page depends on which cache tags. Drupal already has this knowledge; it just needs to be passed to Varnish.
  2. When a cache tag is invalidated in Drupal, also invalidate all pages in the Varnish cache that depend on that cache tag.
  3. Prepare Varnish for all these innovations.

All of this is already implemented as modules in the Drupal ecosystem. We will need the Purge (drupal/purge) and Varnish purge (drupal/varnish_purge) projects.

If you use composer, then:
    composer require "drupal/purge:^3.0"
    composer require "drupal/varnish_purge:^1.0"

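Enable the modules provided by these projects, including the Varnish purger and one of the two queue processors, Cron processor or Late runtime processor.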


Drupal 8 setup


You need to tell Drupal where Varnish is located and how to partially invalidate its cache. After installing the above modules, go to /admin/config/development/performance/purge. Add a Varnish purger and specify the correct host where Varnish is running.
In Type, specify Tag, because we use Drupal's cache tags for invalidation. In Request method, specify BAN, because the corresponding part of the Varnish config expects exactly this HTTP request method.

On the Headers tab, add a Cache-Tags header with the value [invalidation:expression]. The Token module replaces this token with the cache tag that is being invalidated.

The rest can (and should) be configured to suit your taste and your specific situation. Drupal will now include a Cache-Tags header in all of its responses, listing the cache tags that took part in generating the page. Varnish stores these headers as part of its cached objects and uses them to invalidate the cache.

At this point, make sure that Drupal's response, when requested directly (bypassing Varnish), does indeed contain a Cache-Tags header. In my case, it looks like this for the first page:
Cache-Tags: block_view config:block.block.seven_breadcrumbs config:block.block.seven_content config:block.block.seven_help config:block.block.seven_local_actions config:block.block.seven_login config:block.block.seven_messages config:block.block.seven_page_title config:block.block.seven_primary_local_tasks config:block.block.seven_secondary_local_tasks config:block_list config:coffee.configuration config:shortcut.set.default config:system.menu.admin config:user.role.administrator config:user.role.authenticated http_response rendered user:1
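If you want to inspect this header from a script rather than by eye, a small helper is enough. The sketch below assumes the tags are separated by whitespace, as in the sample above; the function name and the shortened sample headers are mine, not part of any Drupal or Varnish API.

```python
def parse_cache_tags(headers):
    """Return the set of cache tags from a response's Cache-Tags header."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return set(lowered.get("cache-tags", "").split())


# Shortened version of the header shown above.
sample = {
    "Content-Type": "text/html; charset=UTF-8",
    "Cache-Tags": "block_view config:block_list http_response rendered user:1",
}
tags = parse_cache_tags(sample)
```

A response without the header simply yields an empty set, which is a handy signal that the header is not being emitted.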

By default, Drupal responds with Cache-Control: private, no-cache (i.e., it forbids caching) for all HTML it generates, so Varnish will not cache it. You need to enable external caching in the Drupal settings: go to /admin/config/development/performance and select a non-zero value for Page cache maximum age.

I recommend starting with a harmless 5-minute cache lifetime and moving to full-scale values of hours or days only once everything is debugged and has run in successfully.

Again, double-check: when an anonymous user requests a Drupal page, the response should contain Cache-Control: public, max-age=[ ]. When an authenticated user requests it, the response should contain Cache-Control: private, no-cache. This way we allow Varnish to cache anonymous, and only anonymous, HTML pages.
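This check can also be expressed in a few lines of Python. The sketch below is a deliberate simplification of what the article expects to see, not a reimplementation of Varnish's caching rules; both function names are hypothetical.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of lower-cased directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives


def looks_cacheable_by_varnish(value):
    """True for 'public, max-age=N' with N > 0, False for private/no-cache."""
    d = parse_cache_control(value)
    if "private" in d or "no-cache" in d or "no-store" in d:
        return False
    max_age = d.get("max-age") or ""
    return "public" in d and max_age.isdigit() and int(max_age) > 0


anonymous_ok = looks_cacheable_by_varnish("public, max-age=300")
authenticated_ok = looks_cacheable_by_varnish("private, no-cache")
```

The anonymous response should pass the check and the authenticated one should fail it; anything else means the Drupal side is not configured as described.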

Varnish configuration


We now turn to preparing Varnish. The following should be added to your VCL.

First, Drupal will now send HTTP BAN requests to your Varnish. We must restrict who may issue this request, otherwise anyone will be able to clear the Varnish cache. To do this, we introduce an acl:
    # Hosts that are allowed to send BAN requests.
    acl purge {
        "127.0.0.1";
    }

Next, add the handling of BAN requests. At the very beginning of your vcl_recv, add:
    if (req.method == "BAN") {
        # Only hosts from the purge acl may ban.
        if (!client.ip ~ purge) {
            return (synth(403, "Not allowed."));
        }
        # Ban every cached object whose Cache-Tags header matches the request.
        if (req.http.Cache-Tags) {
            ban("obj.http.Cache-Tags ~ " + req.http.Cache-Tags);
        } else {
            return (synth(403, "Cache-Tags header missing."));
        }
        # Report success to the caller.
        return (synth(200, "Ban added."));
    }
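The ban expression is a regular-expression match against the Cache-Tags header stored with each cached object. The Python sketch below models only that matching step, with Python's re module standing in for Varnish's regex engine; the cached URLs, tags and the function name are invented for illustration.

```python
import re


def banned_urls(cached_objects, ban_pattern):
    """Return the urls whose stored Cache-Tags header matches the ban pattern.

    Models the ban expression `obj.http.Cache-Tags ~ <pattern>`: an unanchored
    regular-expression search over the header value stored with each object.
    """
    regex = re.compile(ban_pattern)
    return sorted(url for url, tags in cached_objects.items() if regex.search(tags))


# Hypothetical cached objects: url -> Cache-Tags header stored with the object.
objects = {
    "/node/1": "node:1 node_view rendered",
    "/node/10": "node:10 node_view rendered",
    "/about": "node:3 node_view rendered",
}

loose = banned_urls(objects, "node:1")
strict = banned_urls(objects, r"(^|\s)node:1(\s|$)")
```

In this model the bare pattern node:1 also hits the page tagged node:10, because the search is unanchored. That only costs some extra invalidation, never stale content; the anchored pattern is shown purely for comparison.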

Also, by default, Varnish will pass the Cache-Tags and Cache-Control headers it receives from Drupal on to the final client (the browser).

In the case of Cache-Tags, this is plain garbage that you would push across the network, and an unnecessary source of information about the internals of your infrastructure. In the case of Cache-Control: public, it is rather undesirable behavior: Drupal allowed caching for Varnish, but it would be wrong to cache the pages in your visitors' browsers (they would receive updated content late, and our entire clever cache invalidation scheme would be useless). Therefore, in vcl_deliver you need to add:
    # Do not expose the cache tags to clients.
    unset resp.http.Cache-Tags;
    # Do not let browsers cache HTML pages.
    if (resp.http.Content-Type ~ "text/html") {
        set resp.http.Cache-Control = "private,no-cache";
    }
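To see what the browser ends up with, the same two rules can be replayed on a plain dictionary of headers. This is only a Python model of the vcl_deliver snippet above, with a hypothetical function name; it does not talk to Varnish.

```python
def deliver(backend_headers):
    """Return the headers a browser would see after the vcl_deliver rules above."""
    resp = dict(backend_headers)
    # Cache tags are internal bookkeeping and never leave the proxy.
    resp.pop("Cache-Tags", None)
    # HTML may live in Varnish, but browsers must always come back for it.
    if "text/html" in resp.get("Content-Type", ""):
        resp["Cache-Control"] = "private,no-cache"
    return resp


html_response = deliver({
    "Content-Type": "text/html; charset=UTF-8",
    "Cache-Control": "public, max-age=300",
    "Cache-Tags": "node:1 rendered",
})
css_response = deliver({
    "Content-Type": "text/css",
    "Cache-Control": "public, max-age=86400",
})
```

The HTML response loses its tags and becomes uncacheable for the browser, while the stylesheet keeps its original Cache-Control.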

At this stage, you can cross your fingers and restart Varnish on a live server. Varnish can now selectively clear its cache using the HTTP BAN requests that Drupal sends it. It will also cache HTML (whatever Drupal allows it to cache) but will not let browsers cache it. That is exactly what this class of software, the caching reverse HTTP proxy, is named for: we cache on the HTTP proxy, not on the client.

Extras, tips and checks


Recheck the entire configuration. Finding out about a mistake from a client at 2 am is much more painful than finding it right after making the changes. Trust my experience and intuition.

Cron processor and Late runtime processor modules


I mentioned that we need one of these 2 modules. Varnish's cache is invalidated on a queue basis: while processing a request, Drupal puts the invalidated cache tags into a queue, and then something must process this queue by sending the corresponding HTTP BAN requests to Varnish. Both modules process the queue, but they do it in different ways.

The Cron processor processes the queue when Drupal's cron runs. This makes cron strategically important (it should have been treated as such before, too). It also follows from this model that there is a delay of up to X (where X is the cron interval) during which Varnish may serve cached but already outdated HTML. In some cases this is acceptable.

An alternative, more aggressive queue handler is the Late runtime processor. It processes the cache tag queue at the end of each incoming HTTP request, i.e. in real time.
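The difference between the two processors is only in when the queue gets drained. Here is a minimal Python sketch of that queue; it is a toy model with invented names and tags, not the Purge module's actual classes.

```python
from collections import deque


class InvalidationQueue:
    """Toy model of the queue that sits between Drupal and Varnish."""

    def __init__(self, send_ban):
        self._pending = deque()
        self._send_ban = send_ban  # callable that delivers one tag to Varnish

    def add(self, tag):
        if tag not in self._pending:
            self._pending.append(tag)

    def size(self):
        return len(self._pending)

    def process(self):
        """Send a BAN for every queued tag; return how many were sent."""
        sent = 0
        while self._pending:
            self._send_ban(self._pending.popleft())
            sent += 1
        return sent


sent_to_varnish = []
queue = InvalidationQueue(sent_to_varnish.append)

# Cron-style: tags pile up during requests and are flushed on the next cron run.
queue.add("node:2")
queue.add("comment:7")
size_before_cron = queue.size()
queue.process()

# Late-runtime-style: the queue is flushed at the end of the same request.
queue.add("node:5")
queue.process()
```

With the cron-style flow, Varnish keeps serving the old pages for as long as the tags sit in the queue; with the late-runtime flow the queue is empty again by the time the request finishes.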

Choose one of the two handlers according to your taste and the requirements of the website. My impression is that most Drupal 8 websites use the Cron processor. That said, no problems or obvious bugs have been reported with the Late runtime processor either.

Drupal → Varnish communication channel


From the host where Drupal runs, execute:
    curl -X BAN --header 'Cache-Tags: dummy-cache-tag' http://varnish-server.com
Make sure Varnish responds with HTTP 200 OK. This means that Varnish is configured correctly and is ready to process the invalidation requests that Drupal will send.
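If curl is not available on the Drupal host, the same request can be sent from Python's standard library. This is a sketch under the same assumptions as the curl command (the placeholder host varnish-server.com and the dummy tag); the two function names are my own.

```python
import urllib.error
import urllib.request


def build_ban_request(varnish_url, cache_tag):
    """Build (but do not send) a BAN request carrying one cache tag."""
    return urllib.request.Request(
        varnish_url,
        method="BAN",
        headers={"Cache-Tags": cache_tag},
    )


def send_ban(varnish_url, cache_tag, timeout=5):
    """Send the BAN request and return the HTTP status code."""
    request = build_ban_request(varnish_url, cache_tag)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx answers (e.g. the 403 from the acl check) arrive as exceptions.
        return error.code


request = build_ban_request("http://varnish-server.com", "dummy-cache-tag")
```

Calling send_ban("http://varnish-server.com", "dummy-cache-tag") from the Drupal host should return 200; a 403 suggests that the host is missing from the acl or that the header did not arrive.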

Debugging info in Drupal watchdog


On the /admin/config/development/performance/purge page, you can configure the logging level for Drupal's watchdog. This is very useful when you need to track down a problem.

Invalidation queue size


The /admin/config/development/performance/purge page shows some current status information. In particular, pay attention to the Queue size counter. It shows the current length of the cache tag invalidation queue, i.e. the number of cache tags that have already been invalidated in Drupal but not yet in Varnish.

If you use invalidation through cron, make sure that this number drops to zero after running cron.

If you use the Late runtime processor, you should see non-zero values only occasionally. A number that is too high, or that does not drop to 0 for a long time, is a sign of a problem.

Cache hit in Varnish


Do not take the theory on faith. Once the work is done, verify in practice that Varnish really caches anonymous HTML pages. If your Varnish does not already add cache-hit headers to its responses, it is enough to add to vcl_deliver:

    # Tell the client whether the response came from the cache.
    if (obj.hits > 0) {
        set resp.http.X-Varnish-Cache = "HIT";
    } else {
        set resp.http.X-Varnish-Cache = "MISS";
    }

Verify that when an anonymous user requests a page, Varnish caches it (the second response should carry the X-Varnish-Cache: HIT header). Then trigger invalidation of that page in Drupal: clear the Drupal cache, re-save the node (if you are examining a node page), etc. Do not forget to run Drupal's cron if it is responsible for invalidating the cache in Varnish.

Request the page again. The first response should contain the X-Varnish-Cache: MISS header. This confirms that Drupal notified Varnish of the invalidation of the relevant cache tags and that Varnish processed the invalidation request.
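The whole procedure can be written down as a small script so that it is easy to repeat after every config change. The Python sketch below only encodes the order of the steps; how you fetch the X-Varnish-Cache header and how you trigger the invalidation is up to you, and the FakeSite class exists purely to exercise the function without a real server.

```python
def run_invalidation_check(fetch_status, invalidate):
    """Perform the request / request / invalidate / request sequence.

    fetch_status() must return the X-Varnish-Cache value of one anonymous
    request; invalidate() must trigger the change in Drupal (and cron, if used).
    """
    warm_up = fetch_status()
    cached = fetch_status()
    invalidate()
    after_invalidation = fetch_status()
    return {
        "observed": [warm_up, cached, after_invalidation],
        "ok": cached == "HIT" and after_invalidation == "MISS",
    }


# Stand-in for a real Varnish + Drupal pair, used only to exercise the function.
class FakeSite:
    def __init__(self):
        self.cached = False

    def fetch_status(self):
        status = "HIT" if self.cached else "MISS"
        self.cached = True
        return status

    def invalidate(self):
        self.cached = False


site = FakeSite()
result = run_invalidation_check(site.fetch_status, site.invalidate)
```

The very first status is recorded but not judged, since the page may or may not be in the cache already when you start.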

HTTP response header length


Drupal 8 sites can often be large. In particular, this can result in a long list of cache tags associated with a particular HTML page. Many web servers, and the web server + PHP stack in general, limit the maximum length of response headers in one way or another. You can really trip over this. Moreover, if the test environment has a reduced test data set, the problem will appear only in production. How you fix it depends on how your stack is deployed.
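A quick way to estimate whether a page is at risk is to measure the header it would produce. In the Python sketch below, the 8192-byte limit is only an assumed example value; the real limit depends on your web server, PHP handler and Varnish settings. The page with 1000 node tags is likewise hypothetical.

```python
def cache_tags_header_size(tags):
    """Size in bytes of the full 'Cache-Tags: ...' header line."""
    return len(("Cache-Tags: " + " ".join(tags)).encode("utf-8"))


def exceeds_limit(tags, limit_bytes):
    """True if the header line would be longer than the given limit."""
    return cache_tags_header_size(tags) > limit_bytes


ASSUMED_LIMIT = 8192  # example value only, check your own stack

# Hypothetical listing page that depends on 1000 nodes.
many_tags = ["node:%d" % n for n in range(1, 1001)]
size = cache_tags_header_size(many_tags)
too_long = exceeds_limit(many_tags, ASSUMED_LIMIT)
```

Running this kind of check against production-sized data, rather than the reduced test set, is what reveals the problem before your visitors do.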

Update: it turned out that the length of the response headers is not such a simple matter. Here is an issue on drupal.org where I uploaded a patch. The patch does a pretty good job of reducing the length of the headers: www.drupal.org/project/purge/issues/2952277

Once again about the Cache API in Drupal 8


What is described in this article is only a small part of how the Cache API is used in Drupal 8. I personally consider the Cache API the main innovation compared to Drupal 7. If readers are interested, I can write a follow-up article that looks more deeply at the internal design of the Cache API and at how caching is intertwined with rendering in Drupal 8.

Source: https://habr.com/ru/post/350978/

