
Experience in developing a total caching engine

I would like to describe my experience building an engine for a specialized website whose distinguishing feature is that, ideally, it does not have to touch the database at all. I want to share my solution to the problem of occasional load spikes and get feedback on similar solutions and possible improvements.

So, I was tasked with developing an informational site based on user content: blog posts. The site has an editorial team that collects posts from the Internet and assembles them into stories, supplementing them with relevant background information. The site's specifics are such that, with an average load of 5-10 thousand visitors per day, traffic to particular materials grows many times over whenever a socially important topic arises on which the blogosphere has fresh information (sometimes by orders of magnitude, as in the case of a terrorist attack or an unexpected political decision). The decision was made: we cache whatever is requested most. But let's first define some assumptions:

At the same time, there is a problem: the page structure is too complex and variable to cache pages whole. A page consists of blocks, some of which are constant, some of which depend on request parameters (for example, “other materials on the topic”), and some of which are not cached at all (for example, site search). To avoid producing a mass of near-duplicate pages, it was decided to assemble each page from a template and blocks, where each block is generated by a module that takes into account only the parameters it needs.

Let's take a look at the structure of the template for the Subject page:

[Figure: Subject template]

We have a three-column design, with modules embedded in each column. The top_menu module does not depend on any parameters, the content_subject module depends on the material ID and page number, and the rest depend only on the material ID.
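The idea that each module looks only at the parameters it needs can be sketched as a cache-key builder. The article does not show the engine's code, so this is a minimal illustration in Python under stated assumptions: the key format, the MODULE_PARAMS table, and the related_materials module name are hypothetical; only top_menu and content_subject and their dependencies come from the text.

```python
# Which request parameters each module depends on.
# top_menu and content_subject are named in the article; related_materials is
# a hypothetical stand-in for "the rest" of the modules (material ID only).
MODULE_PARAMS = {
    "top_menu": (),
    "content_subject": ("id", "page"),
    "related_materials": ("id",),
}


def block_cache_key(module, request):
    """Build a cache key from only the parameters the module declares.

    Parameters the module does not care about are ignored, so requests that
    differ only in irrelevant parameters share one cached block.
    """
    parts = ["c", module]
    for name in MODULE_PARAMS[module]:
        parts.append(f"{name}={request.get(name, '')}")
    return ":".join(parts)


request_a = {"id": 42, "page": 1, "utm_source": "rss"}
request_b = {"id": 42, "page": 2}

print(block_cache_key("top_menu", request_a))         # c:top_menu
print(block_cache_key("content_subject", request_a))  # c:content_subject:id=42:page=1
print(block_cache_key("related_materials", request_b))  # c:related_materials:id=42
```

With keys built this way, both pages of material 42 reuse the same top_menu and related_materials blocks, and only content_subject is stored once per page.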

Now let's look at the structure of the module that forms the HTML code of the block:


The module interface contains three methods needed to work with the caching system:

How a display request is processed

The engine performs the following actions:

I want to dwell on the last point in particular. Amid all the “static paradise” served safely from the cache, there are exceptions. These are the exception modules mentioned above, which are not cached, and also the view counters for materials. When a module such as subject_content, which generates the main part of the subject page, is called, the view count is incremented automatically and immediately in the cache (cnt, subject, $id). These values, however, are also actively used in the layout of material announcements. So for them we have special markers, in place of which the values are taken from the cache and inserted on the fly.
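A rough sketch of the counter mechanism, in Python for illustration only. The marker syntax `{{cnt:subject:42}}`, the key format, and the function names are assumptions; the article states only that counters live in the cache under (cnt, subject, $id) and are substituted through special markers.

```python
import re

cache = {}  # in-memory stand-in for the real cache


def counter_key(kind, material_id):
    """Hypothetical key format for the (cnt, kind, id) counters."""
    return f"cnt:{kind}:{material_id}"


def increment_views(kind, material_id):
    """Increment the view counter directly in the cache and return it."""
    key = counter_key(kind, material_id)
    cache[key] = cache.get(key, 0) + 1
    return cache[key]


# Hypothetical marker syntax left inside cached HTML blocks.
MARKER = re.compile(r"\{\{cnt:(\w+):(\d+)\}\}")


def fill_counters(html):
    """Replace every counter marker with the current value from the cache."""
    def replace(match):
        key = counter_key(match.group(1), int(match.group(2)))
        return str(cache.get(key, 0))
    return MARKER.sub(replace, html)


# Two views of material 42, then a cached announcement is served.
increment_views("subject", 42)
increment_views("subject", 42)
announcement = "<a href='/subject/42'>Title</a> ({{cnt:subject:42}} views)"
print(fill_counters(announcement))  # <a href='/subject/42'>Title</a> (2 views)
```

Because the cached announcement stores a marker rather than a number, it never has to be regenerated when the view count changes.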

In general, the cache structure is obtained as follows:

[Figure: Cache structure]

Moreover, the page is assembled in exactly the order shown in the diagram: the counter values (cnt) are inserted into the code blocks generated by the modules (c), and those blocks are inserted into the template (tpl).
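The assembly order (cnt into c, c into tpl) might look roughly like the following. This is a Python sketch under assumptions, not the engine's code: the slot syntax `[[module]]`, the marker syntax `{{cnt:...}}`, the key formats, and the generator functions are all hypothetical.

```python
import re

SLOT = re.compile(r"\[\[(\w+)\]\]")               # module slot in a template
COUNTER = re.compile(r"\{\{(cnt:\w+:\d+)\}\}")    # counter marker in a block


def top_menu(req):
    return "<nav>menu</nav>"


def content_subject(req):
    # The block is cached with a marker instead of a concrete view count.
    return "<h1>Subject %d</h1><i>{{cnt:subject:%d}} views</i>" % (req["id"], req["id"])


# module name -> (parameters it depends on, generator function)
MODULES = {
    "top_menu": ((), top_menu),
    "content_subject": (("id", "page"), content_subject),
}

cache = {
    "tpl:subject": "<main>[[content_subject]]</main><aside>[[top_menu]]</aside>",
    "cnt:subject:42": 17,
}


def get_block(module, req):
    """Return the cached block (c), generating it on a cache miss."""
    params, generate = MODULES[module]
    key = ":".join(["c", module] + [str(req[p]) for p in params])
    if key not in cache:
        cache[key] = generate(req)
    return cache[key]


def render_block(module, req):
    """Step 1: insert counter values (cnt) into the cached block (c)."""
    raw = get_block(module, req)
    return COUNTER.sub(lambda m: str(cache.get(m.group(1), 0)), raw)


def render(template_name, req):
    """Step 2: insert the finished blocks into the template (tpl)."""
    tpl = cache[f"tpl:{template_name}"]
    return SLOT.sub(lambda m: render_block(m.group(1), req), tpl)


page = render("subject", {"id": 42, "page": 1})
print(page)
```

On a warm cache, a request involves only cache reads and string substitution, which is what lets the site avoid the database entirely in the ideal case.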

In addition to the engine, the server runs scripts launched by the scheduler: robots. Leaving site-specific functions aside, I will mention two of them: one flushes the counters from the cache to the database, and the other asynchronously updates various site statistics and blocks such as “popular materials”. The first is needed so that view counts are not lost, and the second keeps up to date the blocks whose contents must be recalculated periodically.
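The first robot could be sketched as follows, in Python with an in-memory SQLite database standing in for the real one. The table name subject, the views column, the key format, and the assumption that the cache holds the total view count (rather than a delta) are all hypothetical; the article says only that counters are flushed from the cache to the database.

```python
import sqlite3


def flush_counters(cache, db, prefix="cnt:subject:"):
    """Copy every cached view counter into the database.

    Assumes the cache stores the absolute view count per material.
    Returns the number of counters written.
    """
    flushed = 0
    for key, views in cache.items():
        if not key.startswith(prefix):
            continue  # not a counter: templates, blocks, and so on
        material_id = int(key[len(prefix):])
        db.execute("UPDATE subject SET views = ? WHERE id = ?", (views, material_id))
        flushed += 1
    db.commit()
    return flushed


# Demo with an in-memory database and a dict in place of the real cache.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE subject (id INTEGER PRIMARY KEY, views INTEGER NOT NULL)")
db.executemany("INSERT INTO subject VALUES (?, ?)", [(42, 10), (7, 5)])

cache = {"cnt:subject:42": 17, "c:top_menu": "<nav>menu</nav>"}
flushed = flush_counters(cache, db)
print(flushed, db.execute("SELECT views FROM subject WHERE id = 42").fetchone()[0])
```

Run from the scheduler at a fixed interval, such a robot bounds how many views can be lost if the cache is wiped to whatever accumulated since the last run.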

CMS algorithm

Here everything is much simpler. When a material is updated, added, or deleted, every module is asked whether the action on the given table and ID (if any) is relevant to it. For example, on an update action for a material, only the one index page that carries the announcement of that particular material is reset. Each module checks whether it uses data from the table with the given name and, if so, how. In my implementation, the “Fresh articles” blocks are always reset without checking whether the specific material ID appears in the list; I try to keep a balance between technical sophistication and common sense.

So, for the entire list of modules from var, modules, the getDependencies($tableName, $action, $id = 0) method is called, and the resulting list of blocks to reset is passed to the kernel, which marks them with an “outdated” flag. Blocks are regenerated on request from the front end (or perhaps never, if the material is buried deep and nobody needs it anymore).
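The invalidation flow described above can be sketched in Python. Only the getDependencies($tableName, $action, $id = 0) signature comes from the article; the module classes, key prefixes, table name, and the set-based “outdated” flag are assumptions made for illustration.

```python
cache = {}        # block key -> HTML
outdated = set()  # keys flagged as outdated by the CMS


class ContentSubject:
    def get_dependencies(self, table_name, action, material_id=0):
        # Depends on one specific material: reset only that material's pages.
        if table_name == "subject" and material_id:
            return [f"c:content_subject:{material_id}:"]
        return []


class FreshArticles:
    def get_dependencies(self, table_name, action, material_id=0):
        # Always reset on any change, without checking the material ID.
        return ["c:fresh_articles"] if table_name == "subject" else []


MODULES = [ContentSubject(), FreshArticles()]


def on_cms_change(modules, table_name, action, material_id=0):
    """Poll every module, then flag the matching cached blocks as outdated."""
    prefixes = []
    for module in modules:
        prefixes.extend(module.get_dependencies(table_name, action, material_id))
    for key in cache:
        if any(key.startswith(p) for p in prefixes):
            outdated.add(key)
    return prefixes


def get_block(key, generate):
    """Front-end read: regenerate lazily, only if missing or flagged."""
    if key not in cache or key in outdated:
        cache[key] = generate()
        outdated.discard(key)
    return cache[key]


cache.update({
    "c:content_subject:42:1": "old 42",
    "c:content_subject:7:1": "old 7",
    "c:fresh_articles": "old list",
})
on_cms_change(MODULES, "subject", "update", 42)
print(sorted(outdated))  # ['c:content_subject:42:1', 'c:fresh_articles']
```

Flagging instead of deleting means the CMS does no rendering work itself: a flagged block costs nothing until a visitor actually asks for it.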

Practice using the engine

The site has been operating successfully since 2010 and has lived through a number of cataclysms that we survived thanks to the engine's architecture. Once the hard drives in the RAID array burned out, both at once. The editors were told to temporarily stop updating the site so as not to reset the cache, and the site kept working the whole time while new disks were installed, the data was restored from backup, and the disks were synchronized. Another time, there was a terrorist attack at Domodedovo, and visitors rushed to look for the latest information about the event: about 70 thousand visitors came to the relevant topic within half an hour of the tragedy. Page response time grew to 10 seconds, but the server survived.

If you are curious how traffic growth affects CPU time and memory consumption, let's look at a recent incident that occurred on September 25. Here is what Liveinternet.ru says about it:

[Figure: Liveinternet statistics]

Traffic grew about sevenfold. As I wrote above, traffic usually goes to a few particular materials, and this case is no exception:

Memory consumption varied within the margin of statistical error:

As for CPU time, the load was only slightly noticeable:

(Two “spikes” at the end of the 20th and the 27th are associated with a weekly full backup.)

Memcached statistics:

[uptime] => 6371668
[get_hits] => 409123948
[get_misses] => 6869860
[incr_misses] => 1259
[incr_hits] => 2476204
[bytes_read] => 13353236827
[bytes_written] => 135590836194
[bytes] => 358927266
[curr_items] => 1246460
[total_items] => 1733562

Read misses: roughly 1 per 60 hits; uptime: 74 days.
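These two figures follow directly from the statistics above; a short Python check using only the values from the dump:

```python
# Values copied from the memcached statistics above.
stats = {"uptime": 6371668, "get_hits": 409123948, "get_misses": 6869860}

hits_per_miss = stats["get_hits"] / stats["get_misses"]                    # about 59.6
hit_ratio = stats["get_hits"] / (stats["get_hits"] + stats["get_misses"])  # about 0.983
uptime_days = stats["uptime"] / 86400                                      # about 73.7

print(f"{hits_per_miss:.1f} hits per miss, "
      f"{hit_ratio:.1%} hit ratio, {uptime_days:.1f} days up")
```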

I will be glad to hear questions and opinions. How could the engine be improved? Made more versatile? What similar solutions exist?

Source: https://habr.com/ru/post/239871/
