
Application Caching Strategy

Whenever caching comes up, a paradoxical situation arises. On the one hand, everyone understands how important and necessary caching is in an application's architecture. On the other hand, few people can clearly explain what to cache and how.

Usually people immediately start suggesting ready-made cache implementations, such as memcached or the HTTP cache, but that only answers the question of where to cache.

Caching is one of those topics, along with security and logging, that everyone talks about, but few get right.


Why you need a cache


A cache brings data closer to where it is used. In the modern world, which consists 98% of the Internet, data usually lives very far from the user. All along the path from storage to the user there are caches that serve a single purpose: to deliver data to the user as quickly as possible.

Looking more closely, you can see that precious time is spent processing data on the supplier's side and transferring it from the supplier to the consumer (we do not count the time spent processing data on the consumer's side).

Under high load, caching is a must. It lets you serve more clients with the same resources, because data providers are under less pressure. But even under low load, caching has a positive effect on application responsiveness.

You can't just turn a cache on


One of the main misconceptions about caching is the belief that you can simply turn the cache on.

At the dawn of my programming career I once simply turned caching on; literally an hour later I had to turn it off. That is when I ran into the main problem of caching: stale data. After changing the data, the user did not see the result for 15 minutes.

It is very important to understand what and how you are going to cache, so as not to break the application's logic. The first question you need to answer is how stale the data given to the client may be. Of course, you can keep a separate cache per client, which simplifies reasoning about data freshness, but it brings many other problems.

Types of caching


By mechanism, there are three main types of caching:

- lazy caching, where data is cached on first read and served until it expires;
- synchronized caching, where the client can ask the supplier whether its cached copy is still current;
- write-through caching, where every change is written to the cache as well as to the store.

You can probably come up with other types of caches, but I have not encountered any.

Obsolescence and cache coherence


The cache size is always limited, and it is often smaller than the amount of data that could be put into it. Therefore, entries placed in the cache will sooner or later be evicted. Modern caching frameworks allow very flexible eviction management, taking into account priorities, expiration times, data volumes, and so on.
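As an illustrative sketch (not tied to any particular framework), a bounded cache with a least-recently-used eviction policy can be implemented in a few lines:

```python
from collections import OrderedDict

class LRUCache:
    """A bounded cache that evicts the least recently used entry
    when its capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Mark the entry as recently used by moving it to the end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (at the front).
            self._items.popitem(last=False)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used
cache.put("c", 3)  # evicts "b", the least recently used
```

Real frameworks combine several such policies (TTL, priorities, size limits); LRU is just the most common building block.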

If the same data ends up in different caches, the cache coherence problem arises. For example, suppose the same data is used to build different pages, and the pages are cached. Pages generated later will contain updated data, while pages cached earlier will contain stale data, so consistency of behavior is broken.

A simple way to maintain coherence is to forcibly invalidate (reset) the cache when the data changes. For this reason, increasing the cache memory so that entries expire less often is not always a good idea.

Cache efficiency


The main parameter characterizing a caching system is the cache hit ratio: the percentage of requests served from the cache. It is fairly easy to measure, which makes it a good gauge of how effective your caching system is.

Frequent cache resets, caching of rarely requested data, insufficient cache size: all of these waste memory (usually RAM) without improving performance.

Sometimes data changes so often and so unpredictably that caching has no effect and the hit ratio is close to zero. But usually data is read much more often than it is written, so caches are effective.
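Measuring the hit ratio can be as simple as counting hits and misses around the cache lookup. A minimal sketch:

```python
class InstrumentedCache:
    """A dictionary-backed cache that tracks its hit ratio."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = value

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

cache = InstrumentedCache()
cache.put("user:1", {"name": "Alice"})
cache.get("user:1")  # hit
cache.get("user:2")  # miss
print(cache.hit_ratio)  # 0.5
```

Production systems such as memcached expose these counters themselves (hits and misses in their statistics), so you rarely need to count by hand.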

Application of different types of caching


Lazy cache

This is the simplest type of caching, but it should be used carefully, because it serves stale data. You could reset a lazy cache on every write to keep the data current, but then the implementation cost becomes comparable to more complex types of caching.

This type of caching can be used for data that almost never changes. Another use case is a lazy cache with a short expiration time, for stable operation during load spikes.

This type of caching lets you answer requests quickly.
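A lazy cache with a short expiration time can be sketched as follows (the `load_user` loader is a hypothetical stand-in for a slow backend call):

```python
import time

class LazyCache:
    """Caches values on first read and serves them until a TTL expires.
    Stale data may be returned for up to ttl_seconds after a change."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader   # function that fetches fresh data
        self._ttl = ttl_seconds
        self._store = {}        # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]     # still fresh: no backend call
        value = self._loader(key)  # lazy load on miss or expiry
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

# Hypothetical slow loader; `calls` records how often it is invoked.
calls = []
def load_user(key):
    calls.append(key)
    return {"id": key}

cache = LazyCache(load_user, ttl_seconds=60)
cache.get(1)  # miss: calls the loader
cache.get(1)  # hit: served from the cache, loader not called again
```

The TTL directly expresses the trade-off from earlier: it is the maximum time a client may see stale data.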

Synchronized cache

This is the most useful type of caching, as it serves fresh data and allows you to build a multi-level cache.

This type of caching is built into the HTTP protocol. The server returns a change tag (an ETag), the client caches the result and sends the tag with subsequent requests. The server can reply that the state has not changed, so the client may use its cached copy. The server, in turn, upon receiving the tag, can ask the store whether anything has changed since then.

This type of caching does not eliminate the overhead of communication between systems, so it is often supplemented with other types of caching to speed things up.
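The HTTP revalidation mechanism described above can be sketched without a web framework; here the server side is modeled as a single function, with the ETag derived from the resource content:

```python
import hashlib

# Server side: a resource with a version tag derived from its content.
resource = {"body": b"hello", "etag": None}

def compute_etag(body):
    return hashlib.sha256(body).hexdigest()

resource["etag"] = compute_etag(resource["body"])

def handle_get(if_none_match=None):
    """Return (status, body, etag). A 304 means the client's cached
    copy is still valid, so no body is transferred."""
    if if_none_match == resource["etag"]:
        return 304, None, resource["etag"]
    return 200, resource["body"], resource["etag"]

# Client side: the first request fills the cache, later ones revalidate.
status, body, etag = handle_get()                    # 200 with full body
status2, body2, _ = handle_get(if_none_match=etag)   # 304, reuse cached copy
```

Note that the second request still costs a round trip; only the payload transfer is saved, which is why this is combined with other cache types.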

Write-through cache

If a distributed caching system is available (memcached, Windows Server AppFabric, Azure Cache), a write-through cache can be used. A hand-rolled implementation of cache synchronization between nodes is a large project in its own right, so you should not take it on as part of application development.

You should not try to cache everything in a synchronized cache, otherwise most of the application code will be busy rebuilding the cache.

Also, do not forget that distributed caching systems themselves require communication between nodes, which can affect performance.
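The write-through idea itself is simple: every write goes to the backing store and to the cache in the same operation, so reads never see stale cache entries. A minimal single-node sketch (the real value, as noted above, comes from a distributed implementation):

```python
class WriteThroughCache:
    """Every write goes to both the backing store and the cache,
    so cached reads never go stale relative to the store."""

    def __init__(self, store):
        self._store = store  # backing store, e.g. a DB wrapper
        self._cache = {}

    def write(self, key, value):
        self._store[key] = value  # write through to the store first
        self._cache[key] = value  # then update the cache

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._store.get(key)  # cache miss: fall back to the store
        if value is not None:
            self._cache[key] = value
        return value

db = {}  # stands in for the real backing store
cache = WriteThroughCache(db)
cache.write("k", "v1")
cache.write("k", "v2")  # cache and store stay consistent
```

In a distributed setting the hard part is propagating the write to the caches on every node, which is exactly what systems like memcached or AppFabric take off your hands.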

What else should be considered in the cache strategy?


Choose the right granularity for cached data. For example, caching data per user is likely to be inefficient when there are many users; caching data for all users at once creates problems with staleness and cache coherence.

Cache data as late as possible, just before returning it to the external system. Cache data received from outside only if there is a performance problem at that stage. External storage, such as DBMSs and file systems, implements caching itself, so there is usually no point in caching query results.

Do not reinvent the wheel for caching in applications: ready-made tools usually exist, and you need to know how to use them.

Conclusion


I hope the article was interesting and useful for you. I will be glad to hear any comments and suggestions.

Source: https://habr.com/ru/post/168725/

