
Caching data may be the last thing you should reach for

Recently, I had a fairly heated run-in with a popular PHP e-commerce package. As a result, I want to talk about a common error in web application architecture.

What is this error?

The package I was working with relied heavily on caching. It could not render more than 10 pages per second unless some “optional” cache settings were turned on. Obviously, with performance like that, these settings are not really optional but mandatory.
I think that when you have a great tool like memcached, you are tempted to use it to solve every performance problem. But in many cases it should not be the first tool you reach for. Here is why:

- Caching may not work for all users - you open the page and it loads quickly. But is that true for all users? Caching very often lets you optimize load time for most visitors, but in reality you often need the page to load quickly for everyone without exception (if you follow the six sigma principle). In practice, requests from one and the same user may miss the cache every time, which makes the situation even worse ( Translator's note : I know of a very real case where the cache in an online store worked for 99% of users and did not work for the 1% of visitors who had a long shopping history; as a result, the store was slow precisely for its most active buyers).

- Caching can lead you away from solving the problem - you look at the slowest-loading page and try to optimize it. But the catch is that the real performance problem may lie somewhere else (six sigma again). You “cure” the problem by caching, for example, the entire page, but the underlying performance problem does not go anywhere and stays hidden ( Note : only to surface on other pages again and again).

- Cache management is not an easy task in practice - have you ever struggled with a “cache runaway”, or with a situation where a large number of cache entries are invalidated at the same time?

Alternative approach

Caching should be seen as a burden that many applications cannot live without. You should try to avoid taking on this burden until you have exhausted the whole arsenal of easily applied optimization techniques.

What are these techniques?

Before you bring in caching, make sure you have gone through this fairly simple checklist:

- Do you understand the execution plan of every query? If not, set long_query_time = 0 and use the mk-query-digest command to get a complete list of queries. Run EXPLAIN on each of them and analyze the execution plan.

- Do you use SELECT * and then use only a small subset of the columns? Or do you fetch many rows from the database but use only some of them? If so, you are fetching too much data and limiting the optimizations available at the DBMS level, such as the use of indexes.

- Do you know exactly how many queries it takes to generate a single page? Are they all really necessary? Can any of them be combined into a single query or removed altogether? ( Translator's note : A very common problem. I know of a real case where a page displayed a list of students in a class and then requested additional information for each student, including the name of the class. After the rewrite, the number of queries dropped from 61 to 3).
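
The last point of the checklist can be illustrated with a small sketch. This is not the code from the case in the translator's note; the table names, column names and data are invented for illustration, and SQLite (via Python's standard sqlite3 module) stands in for whatever database the application really uses. The first function issues one query for the list plus one per student, the second gets the same rows with a single JOIN that names only the columns the page needs.

```python
import sqlite3

# Hypothetical schema and data, made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (
        id INTEGER PRIMARY KEY,
        name TEXT,
        class_id INTEGER REFERENCES classes(id)
    );
    INSERT INTO classes VALUES (1, '7B');
    INSERT INTO students VALUES (1, 'Anna', 1), (2, 'Boris', 1), (3, 'Vera', 1);
""")


def roster_one_query_per_student(conn, class_id):
    """Fetch the list, then ask again for every student (the slow pattern)."""
    queries = 0
    students = conn.execute(
        "SELECT id, name FROM students WHERE class_id = ? ORDER BY id",
        (class_id,),
    ).fetchall()
    queries += 1
    rows = []
    for student_id, name in students:
        class_name = conn.execute(
            "SELECT c.name FROM classes c "
            "JOIN students s ON s.class_id = c.id WHERE s.id = ?",
            (student_id,),
        ).fetchone()[0]
        queries += 1
        rows.append((name, class_name))
    return rows, queries


def roster_single_query(conn, class_id):
    """Fetch the same rows with one JOIN and only the needed columns."""
    rows = conn.execute(
        "SELECT s.name, c.name FROM students s "
        "JOIN classes c ON c.id = s.class_id "
        "WHERE s.class_id = ? ORDER BY s.id",
        (class_id,),
    ).fetchall()
    return rows, 1


slow_rows, slow_queries = roster_one_query_per_student(conn, 1)
fast_rows, fast_queries = roster_single_query(conn, 1)
print(slow_queries, fast_queries)
```

With three students the first version already needs four queries where one is enough, and the gap grows with every extra row on the page.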

I think one can conclude with this: “Optimization very rarely reduces the complexity of an application. Try to avoid added complexity by optimizing only what really needs to be optimized” - a quote from Justin’s slides, instrumentation-for-php.

In the long run, many applications would do well to keep their architecture simple and resist the temptation to solve problems “the way the real guys do”.

Translator's note : An absolutely real dialogue that took place not long ago:
- So, we have performance problems; we need to add caching, vertical partitioning and a NoSQL DB for logins.
- Guys, I ran EXPLAIN here - you have a query doing a full scan over 4,000 rows. I tried creating an index and everything became 26 times faster.
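
The kind of check described in this dialogue can be sketched roughly as follows. The article is about MySQL and its EXPLAIN; here SQLite's EXPLAIN QUERY PLAN is swapped in so that the example runs with nothing but the Python standard library. The orders table, its columns and the index name are assumptions made up for the sketch, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

# Hypothetical table with 4,000 rows and no index on the filtered column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(4000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def query_plan(conn, sql, params):
    """Return SQLite's plan for a query as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)


# Without an index the plan reports a scan of the whole table.
plan_before = query_plan(conn, QUERY, (42,))

# After adding an index on the filtered column the plan switches to a search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))

print(plan_before)
print(plan_after)
```

The speedup itself depends on the data and the engine, so the sketch only shows the plan changing from a full scan to an index lookup; the 26 times figure above belongs to the translator's case, not to this example.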

A few notes on the translation

1. The term cache stampede - I translated it as “cache runaway” (it was tempting to translate it as “splicing”, but that would be wrong). In short, this is the situation where, for example, some query takes a long time to run and its results are cached; sooner or later that data leaves the cache, 10 pages that need it are rendered simultaneously, and 10 slow queries hit the database at once. Usually, this is what a rewrite is trying to do before requesting data from the cache. see for example
2. I want to stress that the article does not say you should not cache data. Data should be cached, but only after you have tried some simple ways of optimizing your database queries. In other words, start with the simple things.
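
To make the situation from note 1 concrete, here is a minimal sketch of one common way to soften it: a lock per key, so that only one caller recomputes an expired entry while the others wait for the result. This is my own illustration, not something taken from the article; the class SingleFlightCache and the function slow_query are hypothetical names, and the cache lives inside one Python process rather than in memcached.

```python
import threading
import time


class SingleFlightCache:
    """Tiny in-process cache: one thread recomputes a key, the rest wait."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._values = {}  # key -> (expires_at, value)
        self._locks = {}   # key -> lock guarding recomputation of that key
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._values.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    def get(self, key, compute):
        entry = self._fresh(key)
        if entry is not None:
            return entry[1]
        with self._lock_for(key):
            # Another thread may have filled the entry while we were waiting.
            entry = self._fresh(key)
            if entry is not None:
                return entry[1]
            value = compute()
            self._values[key] = (time.monotonic() + self.ttl, value)
            return value


calls = []


def slow_query():
    """Stand-in for a slow database query; counts how often it runs."""
    calls.append(1)
    time.sleep(0.1)
    return "report"


cache = SingleFlightCache(ttl_seconds=60)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get("report", slow_query)))
    for _ in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))
```

Ten simultaneous "pages" ask for the same missing entry, but the slow query runs only once. Across several servers the same idea needs a shared lock, which is exactly the kind of extra machinery the article warns about.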

Source: https://habr.com/ru/post/101227/
