Partial caching, or block caching, has become increasingly common on high-load sites. As a rule it is built on the seemingly long-forgotten SSI or on a closely related technology such as ESI, for example in stacks like Nginx + Memcached + SSI or Varnish + ESI.
A post describing this caching method recently appeared on Habr. In the third variant of the solution, its author invited readers to share their own solutions to this problem.
This post is devoted to exactly that.
Problem statement
In most cases a web page consists of blocks. For the simplest page these are a header, a footer, a left or right sidebar, and the main content block. A more complex site has correspondingly more blocks; on Habr, for example, there are “latest posts”, “latest comments”, “similar posts”, and so on. This causes problems if we want to cache the page at the presentation level, i.e. as the generated HTML, because the cached page has to be invalidated whenever any one of its blocks changes.
This is why caching is usually applied at the model or data level, and the cached data is then used to fill a page template.
This is where SSI comes to the rescue: it lets us split the page into these logical blocks and cache each block separately.
An example of a page that uses SSI includes:
<html>
<body>
<div class="header">
<!--# include virtual="/header.php" -->
</div>
<div class="main_content">
<!--# include virtual="/main.php" -->
</div>
<!--# include virtual="/footer.php" -->
</body>
</html>
Everything seems fine so far, but there are a few caveats worth dwelling on.
Problems
- Personalized blocks are blocks that contain a user's personal data, for example “Hi, %username%!”. There can be a great deal of such data; take a VKontakte profile form, for example. Do not confuse them with blocks for authorized users! The latter have only two copies in the cache (one for logged-in users and one for everyone else), whereas for personalized blocks you have to cache a rendered copy for every user, stored in Memcached under keys of the form {%block_id%}_{%PHPSESSID|user_id%}. Because caching happens at the presentation level, each copy holds not only the data but also a chunk of HTML that is repeated for every user, so the memory consumed by the cache (Memcached) is very high. On top of that, a large farm runs many Memcached servers, some of them drop out from time to time, and even consistent hashing does not remove all the problems.
- Warming up the cache takes a long time (typically after reboots, new releases, and so on).
What is proposed?
The proposed caching mechanism is as follows:
- The blocks responsible for presentation are made generic for all users: we strip all personalized data out of them so that the cache holds a single copy of each block for every user of the site. What remains of these blocks? Ordinary presentation templates, which we send to the user, and each client fills in the template itself with JavaScript. In other words, a client requesting a page receives a page made of logical blocks, each of which is a template. For example:
<html>
<body>
<div id="head_block">
Some {%personified%} data here
</div>
<div id="main_block">
Hello {%username%}!
</div>
</body>
</html>
Or, for example, like this:
<html>
<body>
<div id="head_block">
Some <div id="{%personified%}"></div> data here
</div>
<div id="main_block">
Hello <div id="{%username%}"></div>!
</div>
</body>
</html>
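The client-side filling step for the first template variant could look roughly like the sketch below. It assumes placeholders of the form {%name%}, with or without inner spaces; fillTemplate and escapeHtml are hypothetical helper names that do not come from the original post, and values are HTML-escaped as a precaution.

```javascript
// Hypothetical helper: escape characters that are special in HTML so that
// user data cannot inject markup into the template.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: replace every {%name%} placeholder with the matching
// value from `data`. Placeholders without data are left as they are.
function fillTemplate(template, data) {
  return template.replace(/\{%\s*(\w+)\s*%\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(data, name)
      ? escapeHtml(String(data[name]))
      : placeholder
  );
}

const filled = fillTemplate("Hello {%username%}!", { username: "Alice" });
console.log(filled); // prints: Hello Alice!
```

In a browser the result would then be written into the block, for instance by assigning it to the innerHTML of the element with id main_block.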
- To fill in the data with JavaScript, we need to get it from somewhere. We fetch it through an iframe container or with an AJAX request, whichever you prefer. That is, the returned page contains an invisible iframe or a hidden input holding a URL, and requesting that URL returns the list of URLs with the data for each block.
As a result, the user receives a page like this:
<html>
<body>
<div id="head_block">
Some <div id="{%personified%}"></div> data here
</div>
<div id="main_block">
Hello <div id="{%username%}"></div>!
</div>
<iframe src="all_blocks_data_urls.php" style="display: none"></iframe>
<!-- or like this -->
<input type="hidden" name="all_blocks_data_urls" value="all_blocks_data_urls.php" />
</body>
</html>
- The all_blocks_data_urls.php script is a very simple script that checks whether a cache entry exists for the given user. It looks like this:
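<?php
// .......
// Look up the per-user semaphore key; Memcached::get() returns false on a miss.
$key = $memcached->get($user_id . 'all_blocks_data_urls');
if ($key) {
    // The list of URLs held by the browser is still valid.
    header("HTTP/1.1 304 Not Modified");
    exit;
} else {
    // Build the list of URLs here and send it to the user
}
?>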
That is, in this case we do not store any data in the cache. The per-session key acts as a semaphore that tells us whether the version cached on the user's side is still valid. If it is, we simply return a 304 header to say the data has not changed; if not, we rebuild the list of URLs. More on this below.
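The semaphore logic can be sketched independently of PHP. The JavaScript below is only an illustration under stated assumptions: a Map stands in for Memcached, and handleUrlListRequest and buildUrlList are hypothetical names. It sets the key after a miss, which the PHP script leaves to its elided branch.

```javascript
// A Map stands in for Memcached here (an assumption for illustration only).
const cache = new Map();

// Hypothetical handler: decides what all_blocks_data_urls.php would answer.
// `buildUrlList` is a caller-supplied function returning the URLs for a user.
function handleUrlListRequest(cache, userId, buildUrlList) {
  const semaphoreKey = userId + "all_blocks_data_urls";
  if (cache.has(semaphoreKey)) {
    // The flag is set: the list held by the browser is still valid.
    return { status: 304, body: null };
  }
  // No flag: rebuild the list, set the flag, and send the list.
  const urls = buildUrlList(userId);
  cache.set(semaphoreKey, true);
  return { status: 200, body: urls };
}

// Hypothetical URL builder used only for this demonstration.
const buildUrlList = (userId) => ["/data?block_id=1&user=" + userId];

const first = handleUrlListRequest(cache, "42", buildUrlList);
const second = handleUrlListRequest(cache, "42", buildUrlList);
console.log(first.status, second.status); // prints: 200 304
```

Only a one-value flag per user lives on the server; the list itself and the block data stay in the browser.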
- As I wrote above, the data set for each block is represented by a URL, and the list of these URLs is returned in response to the iframe request to “all_blocks_data_urls.php”. On the server side we have a semaphore for each user, accessible by the key
$memcached->get($user_id . 'all_blocks_data_urls')
as well as a list of keys in Memcached holding the generated URLs for the block data, for example of the form:
$user_id . '_' . $block_id => 'hash_for_url'
Now the most interesting part: each URL for fetching user-specific data, for example
www.site.com?block_id=1&hash=hash_for_url, is served with an Expires header set far, far in the future, i.e. the data is cached in the browser via HTTP headers forever.
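A far-future expiry can be expressed with standard HTTP headers. The sketch below builds such a header set; farFutureHeaders is a hypothetical helper, the one-year lifetime is an assumed value standing in for "forever", and Cache-Control is added next to Expires because caches that understand max-age use it in preference to Expires.

```javascript
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // assumed lifetime

// Hypothetical helper: headers for a response that the browser may keep
// for a year without revalidating. `now` is injectable to ease testing.
function farFutureHeaders(now = new Date()) {
  const expires = new Date(now.getTime() + ONE_YEAR_SECONDS * 1000);
  return {
    // "private" because the data belongs to one user and should not be
    // stored by shared caches such as proxies.
    "Cache-Control": "private, max-age=" + ONE_YEAR_SECONDS + ", immutable",
    // toUTCString() produces the HTTP date format.
    "Expires": expires.toUTCString(),
  };
}

console.log(farFutureHeaders(new Date(Date.UTC(2020, 0, 1))));
```

Because the hash is part of the URL, new data gets a new URL, so a stale copy is simply never requested again.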
- When the user updates the data associated with a logical block, for example the education section in a social network, we delete the semaphore key in Memcached, $memcached->delete($user_id . 'all_blocks_data_urls'), and also delete the key that stores the URL with the data for that block, $memcached->delete($user_id . '_' . $block_id). On the next request to the iframe container, the all_blocks_data_urls.php script will not return a 304 response; instead it regenerates the URL for the missing block and returns the list to the user, having first set both keys in Memcached ($user_id . 'all_blocks_data_urls' and $user_id . '_' . $block_id). Note that the data itself is not cached anywhere on the server! All caching happens in the user's browser cache.
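The invalidation cycle described here can be sketched as follows. This is JavaScript for illustration only, with a Map standing in for Memcached; invalidateBlock, getUrlList, and the counter used to produce hashes are assumptions, since the post does not say how hash_for_url is generated.

```javascript
const cache = new Map(); // stands in for Memcached (assumption)
let hashCounter = 0;     // hypothetical source of fresh hashes

const semaphoreKey = (userId) => userId + "all_blocks_data_urls";
const blockKey = (userId, blockId) => userId + "_" + blockId;

// Called when the user changes the data behind one block.
function invalidateBlock(userId, blockId) {
  cache.delete(semaphoreKey(userId));
  cache.delete(blockKey(userId, blockId));
}

// Returns null when the browser's list is still valid (the 304 case),
// otherwise the full list of URLs, creating hashes only for missing blocks.
function getUrlList(userId, blockIds) {
  if (cache.has(semaphoreKey(userId))) return null;
  const urls = blockIds.map((blockId) => {
    const key = blockKey(userId, blockId);
    if (!cache.has(key)) {
      hashCounter += 1;
      cache.set(key, "h" + hashCounter);
    }
    return "/?block_id=" + blockId + "&hash=" + cache.get(key);
  });
  cache.set(semaphoreKey(userId), true);
  return urls;
}

const initial = getUrlList("42", [1, 2]);
const unchanged = getUrlList("42", [1, 2]);
invalidateBlock("42", 2);
const updated = getUrlList("42", [1, 2]);
console.log(initial, unchanged, updated);
```

In this run block 1 keeps its URL, so the browser keeps serving it from its own cache, while block 2 gets a new URL and is fetched once more.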
- Once the client has received all the URLs pointing to the block data, it starts requesting them. Because these URLs were served with the Expires header, the client gets most, if not all, of the data from the browser's cache. By the way, you can choose any data format: JSON, XML, or HTML. You can even make the format configurable and serve different ones! Finally, JavaScript processes our templates and fills them with the user's data.
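Put together, the client side might look like the sketch below. It assumes JSON as the data format and a URL list shaped as an object mapping block ids to data URLs; loadBlocks and the injectable fetchJson parameter are hypothetical, the latter so that the flow can be tried without a browser. In a real page fetchJson would wrap fetch() or XMLHttpRequest, and the browser's HTTP cache would answer most of the requests.

```javascript
// Fill {%name%} placeholders in a template with values from `data`.
// Values are inserted verbatim here; escape them before using innerHTML.
function fillTemplate(template, data) {
  return template.replace(/\{%\s*(\w+)\s*%\}/g, (placeholder, name) =>
    name in data ? String(data[name]) : placeholder
  );
}

// Hypothetical orchestration: get the URL list, request every block's data
// in parallel, and return the filled HTML keyed by block id.
async function loadBlocks(listUrl, templates, fetchJson) {
  const urlsByBlock = await fetchJson(listUrl);
  const entries = await Promise.all(
    Object.entries(urlsByBlock).map(async ([blockId, dataUrl]) => {
      const data = await fetchJson(dataUrl);
      return [blockId, fillTemplate(templates[blockId], data)];
    })
  );
  return Object.fromEntries(entries);
}

// Stand-in for the network, for demonstration only.
const fakeResponses = {
  "all_blocks_data_urls.php": { main_block: "/?block_id=1&hash=abc" },
  "/?block_id=1&hash=abc": { username: "Alice" },
};
const fetchJson = async (url) => fakeResponses[url];

loadBlocks(
  "all_blocks_data_urls.php",
  { main_block: "Hello {%username%}!" },
  fetchJson
).then((html) => console.log(html.main_block)); // prints: Hello Alice!
```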
Results
- We get block caching, which I call CSI (client-side includes), based on client-side caching via HTTP headers, while the amount of data stored in the server-side cache is greatly reduced.
- To a large extent, logic is separated from presentation. With this approach it is very easy to offer different layouts or user-customized layouts for a site: you only need to plug in different JavaScript libraries. The same goes for different presentations on different platforms, mobile in particular.
The disadvantages of this caching method include problems with indexing, because some search engine crawlers do not execute JavaScript.
I would be interested to hear your opinions on this approach, as well as your suggestions.