This article is about caching on the web, or rather about how and where to start. I often see web developers with no caching experience get everything wrong at first and then wonder why they are served stale data (some even assume it cannot be otherwise) or why the server load has not gone down.
Of course, everything depends on the task, so the approach will differ from project to project. Using a news site as an example, I will show in which cases you need caching and in which you do not.
Continued in: Basics of caching. Practice
Do we need to use caching?
Before we start caching everything, we should decide whether we need caching at all. It is usually needed in two cases:
- Reducing server load. This one is obvious: the server is choking and cannot cope with the task.
- Reducing page generation time. Sometimes processing data before output takes a long time. Instead of processing it on every request, you can process it once and put the result in the cache. After that, the data is served from the cache almost instantly.
Where to begin?
So, we have decided that we need caching like air. But how do we find the places that need it and the ones that definitely do not? Let's take an ordinary news site as an example. In most cases the database becomes the bottleneck, so we need to cache query results. What are our most visited pages?
- The home page, which is made up of many blocks (latest news, popular over the last week, most commented news, recent comments, and so on).
- The news page itself, together with its comments.
- Private messages, available to logged-in users. Here we have to query the database on every page to check for new messages and notify the user if there are any.
So we know what to cache. But what does not need caching, or is at least questionable? Take the list of private messages. In our case there is no need to cache it: the site is about news, and users open the list only when they receive a new message, which is rare.
Getting to the theory
There are several caching tactics:
- Expiration (cache for a fixed time).
- Invalidation (cache forever and delete the entry yourself when necessary).
- Combined (cache for a fixed time, but also delete the entry early when necessary).
With time-based expiration, the cache lifetime is chosen according to how often the data changes and how important it is to show up-to-date data. We have identified the places we will work with, so let's proceed.
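To make the three tactics concrete, here is a minimal in-memory sketch. The class name `SimpleCache` and its methods are my own illustration, not the API of any particular cache server; a real site would more likely use a shared cache, but the idea is the same.

```python
import time


class SimpleCache:
    """Tiny in-memory cache; ttl=None means 'keep until deleted by hand'."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def get(self, key, default=None):
        item = self._items.get(key)
        if item is None:
            return default
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lifetime is over: treat as a miss
            return default
        return value

    def delete(self, key):
        self._items.pop(key, None)


cache = SimpleCache()
cache.set("popular_week", ["news 1"], ttl=24 * 60 * 60)  # fixed lifetime
cache.set("latest_news", ["news 2"])                      # forever, delete by hand
cache.set("recent_comments", ["comment 1"], ttl=60)       # both: lifetime + manual delete
cache.delete("recent_comments")
```

The only difference between the tactics is whether a lifetime is passed to `set` and whether your own code ever calls `delete`.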
Home page
This page has many blocks, so it issues many database queries. We could cache the whole home page and refresh it every 10 minutes, but the blocks change at different rates, so they have to be cached separately. Let's consider each block.
- Latest news. Cache forever; delete the cache when news is added to the site.
- Popular news for the last week. Cache for a day.
- The most commented news. Cache for an hour.
- Recent comments on the news. Cache forever; delete the cache when a new comment is added. If new comments arrive very quickly, cache the block for one minute instead.
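The per-block policy above can be sketched roughly like this. The block names, the `BLOCK_TTL` table, and the `build` callback (whatever runs the real database query) are assumptions made for the example.

```python
import time

# Block name -> lifetime in seconds; None means "keep until invalidated".
BLOCK_TTL = {
    "latest_news": None,          # deleted when news is added
    "popular_week": 24 * 60 * 60,
    "most_commented": 60 * 60,
    "recent_comments": None,      # or 60 if comments arrive very quickly
}

_cache = {}  # block name -> (value, expires_at or None)


def get_block(name, build, now=time.monotonic):
    """Return the cached block; call build() only on a miss or after expiry."""
    entry = _cache.get(name)
    if entry is not None:
        value, expires_at = entry
        if expires_at is None or now() < expires_at:
            return value
    value = build()
    ttl = BLOCK_TTL[name]
    _cache[name] = (value, None if ttl is None else now() + ttl)
    return value


def invalidate_block(name):
    """Call this from the code that adds news or comments."""
    _cache.pop(name, None)
```

Each block is rebuilt independently, so a new comment never forces the weekly "popular" query to run again.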
View news
Here caching is split into two parts: the news itself and its comments.
a) News. Suppose the news text is marked up with BB codes and converting it to HTML is expensive (regular expressions alone eat a lot of CPU time). That means we should convert once and cache the finished HTML. We cache the news forever and delete the cache when the news is edited or deleted. But what if we have a view counter, you ask? We could keep updating the cache of the news item itself, but that trick is risky because data integrity may be violated. Instead, we create a separate cache for the view count. On each view we send the database a query that updates the number of views, and we also increment the cached count. This cache is also kept forever and is deleted when the news item is deleted.
b) Comments. Comments also use BB codes, so here too we store the ready-made HTML of each comment, but what we cache is a serialized array of comments. Cache forever; delete the cache when a comment on this news item is added, edited, or deleted, or when the news item itself is deleted. And what if there are several pages of comments? We keep all comments in one cache entry and split them into pages just before output.
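A rough sketch of both parts follows. Everything here is illustrative: `render_bbcode` is a stand-in for the real converter, plain dicts stand in for the cache, `db_views` stands in for the views column in the database, and JSON is just one possible serialization.

```python
import json

html_cache = {}      # news_id -> rendered HTML
views_cache = {}     # news_id -> view count
comments_cache = {}  # news_id -> JSON-serialized list of comments


def render_bbcode(text):
    # Stand-in for the real, expensive BB-code converter.
    return text.replace("[b]", "<b>").replace("[/b]", "</b>")


def get_news_html(news_id, load_source):
    # Convert once; later views reuse the stored HTML.
    if news_id not in html_cache:
        html_cache[news_id] = render_bbcode(load_source(news_id))
    return html_cache[news_id]


def register_view(news_id, db_views):
    # Update the database counter on every view.
    db_views[news_id] = db_views.get(news_id, 0) + 1
    # Increment the cached counter, seeding it from the database on a miss.
    if news_id in views_cache:
        views_cache[news_id] += 1
    else:
        views_cache[news_id] = db_views[news_id]
    return views_cache[news_id]


def get_comments_page(news_id, page, per_page, load_comments):
    blob = comments_cache.get(news_id)
    if blob is None:
        blob = json.dumps(load_comments(news_id))
        comments_cache[news_id] = blob
    comments = json.loads(blob)
    start = (page - 1) * per_page  # pages are numbered from 1
    return comments[start:start + per_page]


def on_news_changed(news_id):
    html_cache.pop(news_id, None)


def on_comment_changed(news_id):
    # Called when a comment is added, edited, or deleted.
    comments_cache.pop(news_id, None)


def on_news_deleted(news_id):
    for cache in (html_cache, views_cache, comments_cache):
        cache.pop(news_id, None)
```

Keeping the counter in its own entry means a page view never touches the cached HTML, so the expensive conversion is not repeated just because the number changed.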
Check for new messages
Here you need to think carefully before choosing a caching tactic, because the right choice depends on the type of load. Consider several options:
a) Few users, all regular visitors. Cache forever.
b) Many users, all regular visitors. It depends on what costs us more: memory (for the cache) or load on the database. If memory is plentiful and we can spare it, cache forever; otherwise cache for the duration of the session.
c) Any number of users, mostly unique visitors. Cache for the duration of the session.
The cached result of the new-message check is always deleted when a new message arrives and when the user is deleted.
That is the end of the theory; the practice is up to you.
P.S. I hope this will be useful to those who want to get started with caching but do not know where to begin. Thanks for your attention.