Experience in developing a total caching engine

I would like to tell you about my experience in creating an engine for a specialized website, a feature of which is the ability, ideally, not to use a database at all. I would like to share my solution to the problem of episodic high loads and get feedback on similar solutions and improvements.

So, I was tasked with developing an informational site based on user content — blog posts. The site is edited, which collects posts from the Internet and makes up plots of them, supporting with various relevant information. The specificity of the site is such that with an average load of 5-10 thousand visitors per day in the event of a socially important topic, where you can get fresh information in the blogosphere, traffic to specific materials increases many times (sometimes by orders of magnitude, as in the case of a terrorist attack or unexpected political solutions). The decision was made: we cache the most requested. But let's first define some assumptions:

Front-end is almost static - the materials come to the database through the CMS, and the user does not add or change anything. The content on the site is rarely entered relative to the number of views, so the CMS has the right to be more voracious than the front-end;
We have only one weak server at our disposal, but it is possible to add memory;
The amount of RAM is much larger than the volume of the database (RAM at the initial stage is 8GB against the current 500 MB of text data in the database);
Individual materials have tens and hundreds of thousands of visits, while most - hundreds;
We use PHP / MySQL / Memcached.

')
At the same time, there is a problem: the structure of the pages is rather complicated and variable to cache it head-on. The page consists of blocks, some of which are constant, some depend on the request parameters (for example, “other materials on the topic”), and some are not cached at all (for example, search on the site). In order not to produce an abundance of duplicate pages, it was decided to compile the page based on the template and blocks, which are formed by modules that take into account only the necessary parameters.

Let's take a look at the structure of the template for the Subject page:

Subject template

We have a three-column design. Modules are embedded in each column. The top_menu module does not depend on any parameters, the content_subject module depends on the material ID and page number, the rest - only on the material ID.

Now let's look at the structure of the module that forms the HTML code of the block:

Module

The module interface contains 3 methods necessary for working with the caching system:

getCode () - is engaged in the generation of code for the block and takes into account the parameters passed from the kernel;
getDependencies () - returns a list of dependencies. Here the module receives: the name of the table in the database, the name of the action with this table (add, delete, update) and the material ID in this table (if any). According to them, the module calculates the names of the dependent blocks and returns their list. Example: an action to add an article and return a list of all pages of the section so that they are marked as obsolete by the engine engine;
getParameters () - returns an array of those parameters that affect the formation of the code. It is necessary for the correct connection of modules to the templates. Some parameters may be redundant and we would get a numerous list of duplicates in Memcache.

How a mapping request is processed

The engine performs the following actions:

Router determines by the URL the name of the action $ action and its parameters. In my implementation, they are written hard;
The tpl, $ action template corresponding to the action is connected (for convenience, their names are the same), from the cache (in the case of a miss, we consider the template from the disk);
Getting the list of modules from the var, modules cache (in case of a slip, we get a list of module files);
Getting the parameters of modules from the var, params cache (in case of a miss for all modules, we will execute the getParams () method);
Bypassing the template in search of plugins. The found module is checked via in_array with a list of modules to prevent errors. For each module:
- If the module parameters contain “nocache”, the generated block will not be cached;
- If among the “increment” parameters, we will increase the view counter, which we also have in the cache (if not, we obtain from the database);
- We select the parameters of the module call: those that are specified, from those that are required;
- Choose from cache or execute getCode ();
  - In the found code we are looking for counter markers for viewing materials in order to substitute actual values.

On the last point I want to stay special. Among all the "static paradise" safely received from the cache, we have exceptions. These are exception-modules, which are mentioned above, and which are not cached, but also counters for viewing materials. When you call a module such as subject_content, which generates the main part of the theme page, the number of views will automatically increase and immediately in the cache (cnt, subject, $ id), but these values are actively used in the design of material announcements. Therefore, for them we have special markers, by which values will be taken from the cache and inserted on the fly.

In general, the cache structure is obtained as follows:

Moreover, the order of the “assembly” of the page is exactly the same as in the diagram: the code calls generated by the modules ©, into which the values of the counters (cnt) are inserted, are inserted into the template (tpl).

In addition to the engine on the server, scripts executed from the scheduler are executed - robots. Without taking into account site-specific functions, I will mention two robots: resetting the counters from the cache to the database, and asynchronously updating various site statistics and blocks like “popular materials”. The first is needed so that the number of views of materials is not lost, and the second is to keep the blocks up to date for which calculations must be made periodically.

Algorithm CMS

Here everything is much easier. When updating / adding / deleting materials, all modules are polled if it is relevant to this action with this table and given ID (if any). For example, during the update action for a material, only one index page is reset, where the announcement of this particular material is located. All modules check if they use data from the table with the transferred name and, if so, how. In my implementation, the blocks with “Fresh articles” are always reset, without checking whether a specific material ID is used in this list, I keep the balance between manufacturability and reasonableness.

So, for the entire list of modules from var, modules, the getDependencies method ($ tableName, $ action, $ id = 0) is executed, and the resulting list of blocks for reset is passed to the kernel to set the checkbox “outdated”. Blocks will be regenerated upon request from the front-end (and maybe they will not if the material is deep and no one needs it anymore).

Practice using the engine

The site has been successfully operating since 2010 and has experienced a number of cataclysms that we managed to endure thanks to the architecture of the engine. Once we burned the hard drives in the raid, and, both at once. The editors were given a go-ahead to temporarily stop updating the site so as not to reset the cache, and the site worked successfully all the time the disks were installed and data was returned from the backup and the disks were synchronized. Another time there was a terrorist attack in Domodedovo and visitors rushed to look for the most current information about the event and about 70 thousand visitors came to the relevant topic within half an hour after the tragedy. The time of issuance of pages increased to 10 seconds, but the server survived.

If it is interesting to see how the growth of attendance affects the consumption of CPU time and memory, let's look at the recent incident that occurred on September 25. That's what Liveinternet.ru says about it:

Liveinternet statistics

Attendance growth is about 7 times. As I wrote above, as a rule, traffic goes to some separate materials, and this case is no exception:

Memory consumption varied within the framework of statistical error:

About the CPU time, the load was slightly felt:

(Two “spikes” at the end of the 20th and the 27th are associated with a weekly full backup.)

Memcached statistics:

[uptime] => 6371668
[get_hits] => 409123948
[get_misses] => 6869860
[incr_misses] => 1259
[incr_hits] => 2476204
[bytes_read] => 13353236827
[bytes_written] => 135590836194
[bytes] => 358927266
[curr_items] => 1246460
[total_items] => 1733562

Misses when reading: 1 to 60, uptime 74 days.

I will be glad to hear questions and opinions. How could the engine be improved? Make it more versatile? What are some similar solutions?

Source: https://habr.com/ru/post/239871/

All Articles

Experience in developing a total caching engine

How a mapping request is processed

Algorithm CMS

Practice using the engine

More articles: