What to cache?

Everything! That could be the end of the article, since this axiom is repeated year after year at forums and conferences and wanders from text to text across technical resources. However, this all-encompassing "everything" explains none of the details. After all, there is a fairly broad layer of programmers, engines and projects that get by without memcached and sharding. But the moment of load arrives, and you have to figure things out.
For such people I have taken this universal answer of high-load projects apart, piece by piece.
We limit ourselves to web systems. Simply put, to an ordinary site. Whether you use a ready-made CMS, have already grown into a framework, or have written the code for a non-standard project from scratch, the same elements will always make up the process of receiving and returning data. I will look at each of them in terms of where caching is possible.
Stages of delivery

Database

Even if you have a file-based database, there is a driver (right?) that is responsible for handing out the information you ask for.
What do we have here? Scanning data in alphabetical order is usually inefficient, so the best minds have already invented a huge number of methods and hacks for serving information on a platter at high speed. You should know that databases do not just build indexes: they cache them, cache query results, optimize the structure of relational tables, and try not to scan tons of data yet again. But faith in geniuses is one thing, and checking the configuration file to see that these mechanisms are enabled is the duty of every developer! If you have a dedicated server administrator (or several), then at the first recorded problem have him tighten the nuts in the settings of your most important node. Examine memory usage, fragmentation, and the disk space available for temporary files.
Timely, competent query caching can buy a fast-growing project enough time to tighten the nuts at all the subsequent stages.
This optimization is available to every participant in the regatta with full access to the hosting, and sooner or later they will have to deal with the topic.
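What a query cache does can be shown in a few lines. The sketch below is not how any particular database implements it; it imitates the idea on the application side, using Python's built-in sqlite3 module. The class name QueryCache, the posts table and the 30-second lifetime are all assumptions made for illustration.

```python
import sqlite3
import time


class QueryCache:
    """Tiny in-process cache for read-only query results (hypothetical helper)."""

    def __init__(self, conn, ttl_seconds=30.0):
        self.conn = conn
        self.ttl = ttl_seconds
        self._store = {}  # (sql, params) -> (expires_at, rows)
        self.hits = 0
        self.misses = 0

    def fetch(self, sql, params=()):
        key = (sql, tuple(params))
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]  # still fresh: no trip to the database
        self.misses += 1
        rows = self.conn.execute(sql, key[1]).fetchall()
        self._store[key] = (now + self.ttl, rows)
        return rows

    def invalidate(self):
        """Drop everything, e.g. after a write that changes the cached tables."""
        self._store.clear()


# Demo on an in-memory database with an assumed "posts" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)", [("first",), ("second",)])

cache = QueryCache(conn, ttl_seconds=30.0)
first = cache.fetch("SELECT id, title FROM posts ORDER BY id")
second = cache.fetch("SELECT id, title FROM posts ORDER BY id")  # served from memory
```

The hard part is not the dictionary but deciding when the stored rows stop being valid, which is why the sketch has both a lifetime and an explicit invalidate().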
There is another clever, full-size cache called a database replica. But you are unlikely to be reading this article if your project already has a structure as complex as a master database, its replica, a web server, a file server and a statistics-gathering group. In any case, a separate read-only database can, at a stretch, be considered a cache.

Logic (or our Code, which we are proud of)

The programming language does not matter. It is just a tool for processing the data received. Even if you have no modular structure and the entire project is a single file, you are still working with data.
If the code is your own, you must switch your head on while writing it. The apologists of rapid development and the holy ORM will say there is nothing to look for here and it is better to cache somewhere else. There is a grain of truth in that, but what monsters those cursors sometimes give birth to!
So, what should we pay attention to once we are already looking at the code?

Optimize database queries. Why pull all the data when you need only the index and the date? The smaller the query, the easier it is to cache.
Using the database's tools, identify the so-called slow queries. Think about indexes on the tables or about restructuring the queries and the data they need. This has nothing to do with caching, but since we are here anyway, why not do it right now?
Once the code has stabilized, look for places where you hit the database twice for the same or similar data. Might it be better to make one fatter query and then use the data it returns in several modules, functions or loops? This is not caching yet, but a further optimization will be very useful: putting the results of complex or massive computations into a cache (in a database or a file) as a serialized array, or in whatever form suits you. The intermediate calculations then become faster, cheaper and more comfortable for the site, for scheduled tasks and for the API.
Collect all the little things that usually hang around in the session, in variables and in cookies into one place (if the security policy allows) and refer to that notebook. This is better than hitting the file system or calling class methods that call a function of a general class that returns the user's login or, oh horror, pokes the database with a simple query. After all, are you sure that the session is read from RAM and not from a session file? Or maybe the admin moved sessions into the database without asking, and you did not take this into account? Or the code logic implies internationalization through tons of abstractions and calls to subfunctions, and you need to return the word "Cart" to Ajax three times.
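The "notebook" from the last point can be as simple as an object that lives for one request and loads each value at most once. This is a minimal sketch, not a recipe for any particular framework; RequestContext and load_user_login are hypothetical names, and the loader only stands in for a session read or a small query.

```python
class RequestContext:
    """Per-request 'notebook': each value is loaded at most once (hypothetical)."""

    def __init__(self, loaders):
        self._loaders = loaders  # name -> zero-argument function
        self._values = {}

    def get(self, name):
        if name not in self._values:
            # First access: pay the real cost (session, file, database).
            self._values[name] = self._loaders[name]()
        return self._values[name]


calls = {"user_login": 0}


def load_user_login():
    # Stand-in for a session read, a file read, or a small database query.
    calls["user_login"] += 1
    return "alice"


ctx = RequestContext({"user_login": load_user_login})
header = "Hello, " + ctx.get("user_login")
footer = "Logged in as " + ctx.get("user_login")  # no second load
```

The header and the footer both need the login, but the expensive loader runs only once per request.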

Touching the code is recommended last. That is: the business is up and running, the database is optimized, the external cache works, everything is in order, we are expanding. Caching data in this node belongs to the stage where you optimize the cost of expanding the server side, or when a narrow mathematical part cannot keep up with large data streams. This happens with the cumbersome marketing plans of network companies, with report generation over elaborate statistics, and the like. The result of the optimization will merely be a lower load on a server that is not heavily loaded anyway, which postpones the purchase of a balancer and a second server.
If you are still on a CMS and are starting to think about optimizing it, then it is time to rewrite the engine for your high-load needs or buy more RAM. No money? Then urgently work on monetizing the resource instead of digging in the depths of the code!

Template engine

It seems to be code too, but a bit more specialized. This element is almost always present and also matters for our research.
What is a template? It is a construct that serves as raw material for generating code that is then compiled. The only difference is that this code contains a huge number of tags, strings and links interspersed with fragments of logic. The thing is fairly clear, but I have noticed that many beginners do not dig into the details of how Smarty and Blitz work. And that is a mistake!

Roughly speaking, the template engine first looks for previously compiled code (or checks that it is up to date). If there is none, it processes the template again and writes the file (sometimes deleting the previous one). Sometimes the developer simply forgets to enable this template caching (in a development copy it is inconvenient to keep clearing the cache). It is worse when the option is enabled but the lifetime of the precompiled file is short. Then the delay is irregular and hard for a beginner to “catch”. The "it is not slow on my machine" syndrome.
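The "look for compiled code, check that it is up to date, otherwise recompile" cycle can be sketched like this. It is not how Smarty or Blitz are implemented; Python's string.Template plays the role of the compiled form, the modification time of the source file is the freshness check, and TemplateCache and hello.tpl are names invented for the example.

```python
import os
import tempfile
from string import Template


class TemplateCache:
    """Keeps 'compiled' templates in memory, recompiling when the source changes."""

    def __init__(self):
        self._compiled = {}  # path -> (mtime_ns, Template)
        self.compilations = 0

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._compiled.get(path)
        if entry is None or entry[0] != mtime:
            # No compiled copy, or the source file changed: compile again.
            with open(path, encoding="utf-8") as f:
                entry = (mtime, Template(f.read()))
            self._compiled[path] = entry
            self.compilations += 1
        return entry[1]

    def render(self, path, **values):
        return self.get(path).substitute(**values)


# Demo with a throwaway template file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "hello.tpl")
with open(path, "w", encoding="utf-8") as f:
    f.write("<p>Hello, $name</p>")

cache = TemplateCache()
page_one = cache.render(path, name="Ann")
page_two = cache.render(path, name="Bob")  # reuses the compiled template
```

Note that even this version pays for one stat call per render; real engines usually let you switch the freshness check off in production for exactly that reason.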

Learn the caching capabilities of your template engine and choose optimal refresh intervals (if needed). As a last resort, schedule a task that forcibly refreshes this cache at night so that it does not happen at peak time. Examine the speed of the drive where the caches are stored. Maybe it makes sense, and is possible, to move them into RAM?
Caching at this stage must be set up at the very beginning of project development. The search for optimal values, however, can be postponed until after the database is tuned and output caching is introduced.
For those on a ready-made CMS there is little room for imagination. Still, it will not hurt to check what is used (a custom algorithm or a ready-made product), where the template cache is stored and how it behaves.

Result

His Majesty the generated page should be delivered to the client as quickly as possible and be as fresh as necessary. Here we can already think of something separate, for example a file cache or everyone's favorite memcached.
If you are still reading this article with interest, then your project has most likely already run into slowdowns but is still alive and working as usual.
For the elders of project deployment and site building, there is nothing new in setting up a memcached cluster, and their problems come down to the freshness of the cached data. But, I hasten to surprise you, there are a huge number of normal-sized projects where the webmaster's hands have not yet reached a proper cache in a NoSQL database. First they got the project up as best they could and shaped the idea on the fly. Then there may simply not have been enough money for a second server, to which the plump database should have been moved. There are a huge number of people who are well versed in business and design and who write good texts. They got the project up, but they no longer have the strength to dig into *nix administration to install memcached.
In short, if you have grown out of diapers, the first thing to do is to cache the page that comes out of the template engine. First the static pages. Then the static fragments of dynamic pages. Then look for your bottleneck. If you have no permissions to set up a key-value store, start by making yourself a basic file cache of the most requested pages. Even such a primitive thing will save more than months of optimization at the previous nodes.
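A basic file cache of rendered pages, as described above, fits in a few dozen lines. This is a sketch under assumptions rather than a finished module: PageFileCache, render_page and the 60-second lifetime are made up for the example, and a real site would also need to decide which pages must never be cached (anything personalized, for a start).

```python
import hashlib
import os
import tempfile
import time


class PageFileCache:
    """Stores rendered pages as files and serves them until they expire."""

    def __init__(self, directory, ttl_seconds=60.0):
        self.directory = directory
        self.ttl = ttl_seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, url):
        # Hash the URL so that any URL maps to a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".html")

    def get_or_render(self, url, render):
        path = self._path(url)
        try:
            fresh = time.time() - os.path.getmtime(path) < self.ttl
        except OSError:
            fresh = False  # no cached copy yet
        if fresh:
            with open(path, encoding="utf-8") as f:
                return f.read()
        html = render(url)
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(html)
        os.replace(tmp_path, path)  # swap in one step, no half-written pages
        return html


renders = []


def render_page(url):
    # Stand-in for the whole database + logic + template engine chain.
    renders.append(url)
    return "<html><body>Page for " + url + "</body></html>"


cache = PageFileCache(tempfile.mkdtemp(), ttl_seconds=60.0)
first = cache.get_or_render("/news", render_page)
second = cache.get_or_render("/news", render_page)  # read from the file
```

The second request never reaches render_page, which is the whole point: the database, the code and the template engine are skipped entirely.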
Users of content management systems should look through the repositories for a caching module for their version. Or even for the best variant of it. For example, MODx ships with a mechanism for file caching of both entire pages and individual chunks and snippets. But there is a module that offers to use memcached instead of file storage.

Balancer

If you do not have one yet, it will appear with time. Once it appears, all the stuff above will be called the backend, and this subsection will be called the frontend. In fairness, file caching of the result and the frontend can, at the architect's whim, sit side by side or in reverse order. But let's not complicate the taxonomy of a simple article for beginners.
The task of the balancer is to queue ordinary requests across all the backends, decide by request type where to send each one, and so on. The main optimization at this stage is serving static elements while bypassing all the heavy barricades of the web server with its logic and the database with its data. These are mainly images, style sheets and documents. You can set up a separate file cache that is served not by the backend but directly by the balancer. You just need to arrange for the html (js, css, xml) pages to be delivered to the right path in time.
You can write your own module with custom logic for caching backend results. After all, the response is not necessarily many megabytes in size. You can keep a JSON with the current top player or the admin's latest tweet right in the memory of the balancer server.
You can introduce the balancer itself at an early stage of project development. nginx in front of Apache will not eat much, and the project's scalability will grow.

Client

The client, too, is subject to our persuasion to remember something and not to ask us, the busy ones, to torment all the demons of the system yet again. And browsers themselves successfully try to spare the remote server unnecessary suffering. They cache images and other multimedia locally. They remember login and password pairs, recently visited resources and so on. But we can build an even more advanced version: after all, we know what we want from our page!
At a minimum, we can count on storing personalized trifles in cookies. Flash has a margin of a few megabytes to play with. HTML5 gives us even more ground, for storing little notes the size of huge Talmuds. Keep the information that is safe to keep on the client side and modify the page with JavaScript: the shopping cart, the last messages sent, current ammunition and so on.
It is even better when you untie the logic of your application from the familiar PHP and move it to the side of our beloved user. The script itself is cached by the browser (and otherwise served by nginx) and calmly calculates the statistics of personal data (purchases, bonuses, monsters killed) from the local cache.
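Browser caching of scripts and images, mentioned above, is driven by response headers. The article does not name them, so treat the following as an illustration: a sketch of the standard HTTP ETag and Cache-Control mechanism, written as plain functions without any web framework. The function names and the one-hour lifetime are assumptions, and the If-None-Match comparison is simplified to a single exact value.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # A short content hash in quotes serves as the resource's version tag.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict, max_age: int = 3600):
    """Return (status, headers, body) for a cacheable resource."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"public, max-age={max_age}"}
    # Simplified: real clients may send several tags or weak validators.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""  # the browser's copy is still valid
    return 200, headers, body


script = b"console.log('hi');"

# First visit: the full script is sent along with its ETag.
status_1, headers_1, body_1 = respond(script, {})

# Revisit after max-age ran out: the browser sends the tag back.
status_2, headers_2, body_2 = respond(script, {"If-None-Match": headers_1["ETag"]})
```

Within the max-age window the browser does not ask at all; after that, a 304 answer costs the server a hash comparison instead of a full transfer.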

Summing up

I hope that for all startup founders, businessmen and non-techies, the phrase in the title and its short answer now mean much more than just having to pay money to solve the problem. We have reviewed the general scheme of an ordinary web project in terms of delivering a single response to the user, and now we understand what, when and in which direction we can optimize with caching. Before putting programmers to work on optimizing the code, you can already calmly analyze whether it is worth buying RAM or bringing in a new server with a balancer and memcached on it.
How and what exactly to cache is another question. Where and when, I think, you can already determine:

Quickly configure the basic trivia in the database configuration, check the caching of the template engine and/or CMS. If possible, set up the balancer right away and serve static files bypassing the web server.
Cache the result (static pages, dynamic fragments, full-page cache) in files or a key-value store.
Study the database in depth and set up the most efficient query caching; optimize the keys. Maybe it is time to introduce a replica.
Study the insides of the template engine and tune its parameters. Perhaps replace the familiar Smarty with the faster but harder to adopt Blitz. Reduce and speed up file operations at this level.
Eliminate the blunders and excesses in the code. Optimize the work with data from the database and put the handling of trifles in order.
With working code and mature business logic, start moving routine operations, personalization and its processing to the client side.

And do not forget: caching is not the only method of project optimization!

Source: https://habr.com/ru/post/129623/
