
How Hard Can It Be to Build a Cache?


Quite a few good articles have already been written on the topic of what, how, and where to cache. So why bring it up again? Because the topic is important, and many developers do not consider it worth studying until they run into specific problems. The audience I am counting on are those who were not yet interested when the existing articles came out, but who are interested now and will not pass this one by.

I will try to briefly cover the main points of organizing a cache, and then review the .NET Framework 4.0 additions that should make developers' lives easier (we will be talking about an in-memory cache outside of the ASP.NET infrastructure).

Introduction


Often, when it comes to performance, it is quite difficult to get by without caching techniques. But before we can apply them effectively, we need to answer the following questions:

- What do we cache?
- How do we cache it?
- Where do we cache it?

Interestingly, the answers must be given in exactly the order in which these questions are listed: it is hard to say where to cache without understanding what and how we cache. It is also highly desirable to take care of caching at the early stages of system design, since, contrary to the popular belief that "a cache can always be added at the last moment," this is often not the case. If caching was not considered at the initial stage, adding and testing it later can be extremely difficult. Let's try to find answers to the questions above, but first a clarification: most of the thoughts below concern a general-purpose cache inside a .NET application, stored in RAM; that is, not a processor cache or a browser cache. Also, it is extremely difficult to cover all possible caching theory in detail and intelligibly within a single article, so I will limit myself to basic recommendations and tips that I hope will help you avoid common mistakes.

What? How? Where?


When we decide what we will cache, we need to understand that a cache is not free and is not useful for every kind of data. Regardless of the chosen caching strategy and implementation, we will one way or another run into the problems of data becoming stale and of the memory the data occupies. And since caching adds complexity to building, testing, and maintaining our application, we definitely should not cache everything. We need a balanced approach to choosing the places where adding a cache will bring benefits in addition to problems.
You should not cache data that is only useful in real time. For example, currency quotes in a trading system that are 30 seconds stale can significantly affect the correctness of the decisions being made. But if our system shows summary statistics of the company's sales over the last week on the home page, that data is perfect for caching.

There is also no sense in caching something that can be obtained from the data source or computed on the fly just as quickly. But if the data source is remote and very slow, and the nature of the data allows using it with some delay, then we have a good candidate for caching.

As another example, consider data that is computed on the fly, takes a lot of CPU time, and produces a result that is quite large. When trying to cache such data, we can very quickly fill all available memory with only a few results. Under these conditions it will be difficult to make the cache effective: it is likely that just a few new items will evict the freshly computed values, and the hit rate (the share of lookups that find the desired item in the cache) will be extremely low. In this case you should think about speeding up the computation algorithm rather than caching the result. When choosing candidates for the cache, always think about cache efficiency: we should strive to select data that, once placed in the cache, is read from it many times before it is evicted by newer data or becomes stale.

Thinking about how to properly store our data in the cache, we should pay attention to the following points:

- how and when the data becomes stale, i.e. the invalidation strategy;
- how much memory the stored data occupies;
- how the data will be accessed safely from multiple threads.

Some of the problems behind the "how to store" question can be so complicated that separate projects are created and specialists with the relevant experience are assigned to solve them. I hope this does not apply to your project, because, as I said, this article will not dive into the depths of cache-related issues.

So, having answered the "what" and "how" questions, it may turn out that our answer to the "where" question is simply a Dictionary<TKey, TValue> created inside our application. If so, we are very lucky. But, as a rule, everything is a bit more complicated, and we will still have to either write a full-fledged cache implementation ourselves or choose one of the ready-made solutions.
Note: there is no consensus on whether a dictionary-based implementation counts as a cache at all. Personally, I prefer to treat it as a special case that stands apart. I have even met the term "static" cache for it, i.e. a cache from which data is never removed and is considered valid forever.

Handwritten cache


I am not going to tell you how to write your own cache. On the contrary, I will try to protect you from the false impression that doing so is quick and easy. Except for the case when a Dictionary-like implementation perfectly covers your needs, writing a full-fledged cache is quite a challenge.

The first difficulty that comes to mind is working in a multithreaded environment. After all, if we use a cache, the system is surely not small, and running it on a single thread would be extremely inefficient. That means all write / read / invalidate operations must be thread safe. Without extensive experience with threads, we are practically guaranteed deadlocks, or slow operation due to a poorly chosen thread synchronization approach. A sketch of what even a minimal thread-safe cache entails is shown below.
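
For illustration, here is a minimal sketch (the class and its members are invented for this example) of a thread-safe cache with absolute expiration. Even this toy version has to take a lock on every operation, and it already glosses over eviction, memory limits, and callbacks:

```csharp
using System;
using System.Collections.Generic;

// A toy thread-safe cache with absolute expiration.
// Illustrative only: no eviction policy, no memory limit, no callbacks.
public class NaiveCache<TKey, TValue>
{
    private class Entry
    {
        public TValue Value;
        public DateTimeOffset Expires;
    }

    private readonly object _sync = new object();
    private readonly Dictionary<TKey, Entry> _items = new Dictionary<TKey, Entry>();

    public void Set(TKey key, TValue value, DateTimeOffset expires)
    {
        lock (_sync) // one coarse lock: simple and correct, but a bottleneck
        {
            _items[key] = new Entry { Value = value, Expires = expires };
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        lock (_sync)
        {
            Entry entry;
            if (_items.TryGetValue(key, out entry))
            {
                if (entry.Expires > DateTimeOffset.UtcNow)
                {
                    value = entry.Value;
                    return true; // cache hit
                }
                _items.Remove(key); // stale: invalidate lazily, on read
            }
            value = default(TValue);
            return false; // cache miss
        }
    }
}
```

Replacing the coarse lock with finer-grained synchronization is exactly where the deadlocks and subtle races mentioned above begin.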

If you think about how data becomes stale, a number of far-from-simple scenarios that our cache must support come to mind. Taking the Cartesian product of all the possible options, we get the set of states our system can end up in. That alone is enough to make debugging and testing a home-grown cache very hard.

Many caching examples that can be found on the Internet use the weak references mechanism, and there may be an irresistible urge to apply it in your own implementation. But without sufficient experience in this area, we greatly increase the chances of producing code that not only most of the team will not understand, but that is also unlikely to work even after the first five rewrites. A sketch of the idea is shown below.
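
To show what such code looks like, here is a minimal sketch (invented for this example) of a cache whose entries are held through WeakReference. The value may be collected at any moment, and reasoning about when that happens is precisely what makes the approach treacherous:

```csharp
using System;
using System.Collections.Generic;

// A toy cache whose values may be reclaimed by the GC at any moment.
// Illustrative only: not thread safe, never cleans up dead entries.
public class WeakCache<TKey, TValue> where TValue : class
{
    private readonly Dictionary<TKey, WeakReference> _items =
        new Dictionary<TKey, WeakReference>();

    public void Set(TKey key, TValue value)
    {
        _items[key] = new WeakReference(value);
    }

    public TValue Get(TKey key)
    {
        WeakReference weakRef;
        if (_items.TryGetValue(key, out weakRef))
        {
            // A null here can mean "already collected" -- and when that
            // happens depends on GC timing, not on anything in our code.
            return weakRef.Target as TValue;
        }
        return null;
    }
}
```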
I could continue this list for a long time, but I hope the reasons already given are enough to make you lose the desire to test your strength and perseverance. If not, then I can only wish you "strength, wisdom and patience" (c).

Now, realizing that writing your own cache is far from simple, let's move on to the final part of the article, which covers what the .NET Framework already offers to make our lives easier.

Life before .NET Framework 4.0


Caching has always been an integral part of ASP.NET web applications, and the .NET Framework offered excellent tools for them. Historically, therefore, all the classes for working with the cache lived in the System.Web assembly. When a cache was needed outside the web (for example, in a Windows service), many developers sacrificed the cleanliness of their solutions and added a reference to System.Web. This made it possible to use the cache, but it dragged in a huge amount of unneeded code. The problem remained unsolved for a long time, but fortunately it was finally addressed in .NET Framework 4.0. As a result we got the System.Runtime.Caching namespace, which, among other things, contains the abstract ObjectCache class and its only implementation, MemoryCache. These are what I would like to introduce you to.

ObjectCache


ObjectCache is an abstract class that standardizes how we work with different cache implementations. Having a single interface (API) for the cache, we do not have to study each new implementation in detail: from the user's point of view, all implementations should look the same and behave according to the expectations expressed by this class's API. The main properties and methods and their purpose are given below.

Properties:

- DefaultCacheCapabilities - a set of flags describing which features (regions, change monitors, kinds of expiration, callbacks) a particular implementation supports;
- Name - the name of the cache instance;
- Item[key] - an indexer for reading and writing values by key.

Methods:

- Add and AddOrGetExisting - add an item, respecting an already existing key;
- Set - add an item or overwrite an existing one unconditionally;
- Get, GetCacheItem and GetValues - read one or several items;
- Contains - check whether a key is present in the cache;
- Remove - delete an item by key;
- GetCount - get the number of items in the cache;
- CreateCacheEntryChangeMonitor - create a monitor that watches a set of cache entries for changes.

I hope it is now clear, in general terms, how you could create your own cache implementation. But there are a couple of points I would like to examine in more detail, namely the methods for adding data to the cache. I suggest doing this using the already existing cache implementation, the MemoryCache class.
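
Code that depends only on the abstraction can accept any implementation. A small sketch (the method and the report-building helper are invented for this example), assuming MemoryCache.Default as the concrete instance:

```csharp
using System;
using System.Runtime.Caching;

public static class ObjectCacheDemo
{
    // Works against the abstract ObjectCache, so any implementation
    // (MemoryCache or another) can be passed in.
    public static string GetReport(ObjectCache cache, string key)
    {
        string cached = cache.Get(key) as string;
        if (cached != null)
        {
            return cached; // cache hit
        }

        string report = BuildExpensiveReport(); // cache miss: compute...
        cache.Set(key, report, DateTimeOffset.Now.AddMinutes(5)); // ...and keep for 5 minutes
        return report;
    }

    // Stand-in for a slow query or computation.
    private static string BuildExpensiveReport()
    {
        return "weekly sales summary";
    }
}

// Usage: ObjectCacheDemo.GetReport(MemoryCache.Default, "weekly-report");
```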

MemoryCache


As the name implies, MemoryCache is an implementation that stores its data in RAM. It is currently the only class in the .NET Framework that inherits ObjectCache, but there are NuGet packages offering other implementations (for example, the SqlCache NuGet package can be used to store the data in SQL Server). Below, only those methods whose behavior may not be immediately obvious are considered. The methods are demonstrated with listings of unit tests written with xUnit.

The AddOrGetExisting(...) method

Adds an element only if the key is not yet in use; otherwise it ignores the new value and returns the existing one.
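
An xUnit test along these lines shows the behavior (the cache name, key, and values are arbitrary):

```csharp
using System;
using System.Runtime.Caching;
using Xunit;

public class AddOrGetExistingTests
{
    [Fact]
    public void AddOrGetExisting_KeepsFirstValue_AndReturnsIt()
    {
        var cache = new MemoryCache("AddOrGetExistingDemo");
        var expiration = DateTimeOffset.Now.AddMinutes(1);

        // The key is free: the value is added and null is returned.
        object first = cache.AddOrGetExisting("key", "first", expiration);
        Assert.Null(first);

        // The key is taken: "second" is ignored, the existing value comes back.
        object second = cache.AddOrGetExisting("key", "second", expiration);
        Assert.Equal("first", second);
        Assert.Equal("first", cache.Get("key"));
    }
}
```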


The Add(...) method

It is a wrapper over AddOrGetExisting(...) and works almost identically, with the only difference that it returns true if the element was successfully added and false if the key already exists (in which case the value is not added).
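
Again, an illustrative test:

```csharp
using System;
using System.Runtime.Caching;
using Xunit;

public class AddTests
{
    [Fact]
    public void Add_ReturnsFalse_WhenKeyAlreadyExists()
    {
        var cache = new MemoryCache("AddDemo");
        var expiration = DateTimeOffset.Now.AddMinutes(1);

        Assert.True(cache.Add("key", "first", expiration));   // added
        Assert.False(cache.Add("key", "second", expiration)); // key taken, nothing added
        Assert.Equal("first", cache.Get("key"));              // original value survived
    }
}
```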


The Set(...) method

Adds a new item or replaces an existing one without checking for existing keys. That is, unlike the Add and AddOrGetExisting methods, the value passed to Set always ends up in the cache.
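
And a test for Set:

```csharp
using System;
using System.Runtime.Caching;
using Xunit;

public class SetTests
{
    [Fact]
    public void Set_AlwaysStoresTheNewValue()
    {
        var cache = new MemoryCache("SetDemo");
        var expiration = DateTimeOffset.Now.AddMinutes(1);

        cache.Set("key", "first", expiration);
        cache.Set("key", "second", expiration); // silently replaces "first"

        Assert.Equal("second", cache.Get("key"));
    }
}
```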


Regions in MemoryCache

All the methods for adding data to MemoryCache have overloads that take a regionName parameter. But if you try to pass any non-null value to it, you get a NotSupportedException. Someone may say this violates the Liskov substitution principle (the L in SOLID), but it does not: before using regions, client code must make sure they are supported by the specific implementation. This is done by checking the DefaultCacheCapabilities property for the corresponding bit flag (DefaultCacheCapabilities.CacheRegions), and MemoryCache does not set it.
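
A test illustrating both facts:

```csharp
using System;
using System.Runtime.Caching;
using Xunit;

public class RegionTests
{
    [Fact]
    public void MemoryCache_DoesNotSupportRegions()
    {
        var cache = new MemoryCache("RegionDemo");

        // The capability flag is not set...
        Assert.False(cache.DefaultCacheCapabilities
                          .HasFlag(DefaultCacheCapabilities.CacheRegions));

        // ...so passing a non-null regionName throws.
        Assert.Throws<NotSupportedException>(
            () => cache.Set("key", "value", DateTimeOffset.Now.AddMinutes(1), "myRegion"));
    }
}
```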

CacheItemPolicy



To demonstrate the methods for adding data, the simplest overloads were chosen, taking a key, a value, and the point in time until which the value is considered valid. But all of them also have a version that accepts a parameter of type CacheItemPolicy. It is this parameter that gives us quite rich control over the lifetime of an item in the cache, and it is what makes the MemoryCache implementation so useful.

Most of the properties of this type look self-explanatory, but in practice we will run into many unexpected surprises. Strictly speaking, many of them live not in the CacheItemPolicy class itself but in the logic of the MemoryCache methods that accept it; but since these types are almost always used together, I propose considering them together.

AbsoluteExpiration and SlidingExpiration properties

The names make clear what these properties are responsible for. But the curious may ask: "How will the cache behave if values are set for both properties at once?" Someone may assume that AbsoluteExpiration has higher priority and the object will be removed at the AbsoluteExpiration moment even if it is requested regularly (more often than SlidingExpiration). Someone else, on the contrary, will assume that SlidingExpiration lets the object outlive AbsoluteExpiration. But the Microsoft developers decided that there is no single correct answer and did something else entirely: they throw an ArgumentException at the moment the item is added to the cache. So we can choose only one time-based invalidation strategy per element.
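
An illustrative test:

```csharp
using System;
using System.Runtime.Caching;
using Xunit;

public class ExpirationPolicyTests
{
    [Fact]
    public void BothExpirations_CauseArgumentException_OnAdd()
    {
        var cache = new MemoryCache("ExpirationDemo");
        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5),
            SlidingExpiration = TimeSpan.FromMinutes(1) // both strategies at once
        };

        Assert.Throws<ArgumentException>(
            () => cache.Set("key", "value", policy));
    }
}
```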

The second surprise awaits those who decide to write tests for functionality that uses the cache. To speed up the test run, we will surely want to set a sufficiently small SlidingExpiration (less than one second). In that case our tests will behave erratically and fail often. The reason is an optimization: when an element is read (the Get method and its derivatives), a new expiration time is set only if it differs from the old one by at least one second. I could not find confirmation of this in the documentation, but I was able to verify it by decompiling the MemoryCache class and studying the UpdateSlidingExp(...) method of the internal MemoryCacheEntry class.
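
A test along these lines demonstrates the problem; note that, being timing-dependent, it is itself unstable, which is exactly the point:

```csharp
using System;
using System.Runtime.Caching;
using System.Threading;
using Xunit;

public class SlidingExpirationTests
{
    [Fact]
    public void SubSecondSlidingExpiration_IsNotProlongedByReads()
    {
        var cache = new MemoryCache("SlidingDemo");
        cache.Set("key", "value", new CacheItemPolicy
        {
            SlidingExpiration = TimeSpan.FromMilliseconds(500)
        });

        // Reads every 200 ms "should" keep the item alive, but the new
        // expiration differs from the old one by less than a second,
        // so it is never actually pushed forward.
        for (int i = 0; i < 3; i++)
        {
            Thread.Sleep(200);
            cache.Get("key");
        }

        Assert.Null(cache.Get("key")); // gone, despite the constant reads
    }
}
```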



Priority property

When I first saw this property, I expected it could take values like low / medium / high to set the order in which items are evicted when the maximum size is reached. But it can have only two values: CacheItemPriority.Default or CacheItemPriority.NotRemovable.
MSDN says that setting CacheItemPriority.NotRemovable causes the item never to be removed from the cache. I read this as meaning that by adding all elements with this priority we would get a Dictionary-like implementation, but that is far from true. Items will still be removed once they expire (AbsoluteExpiration arrives or SlidingExpiration elapses); but unlike the default mode, they will not be evicted from memory when the memory limit is reached. By the way, the limits are exposed by the CacheMemoryLimit property (in bytes) and the PhysicalMemoryLimit property (a percentage of the total memory in the system); they are set through the configuration passed to the MemoryCache constructor or in the application config file.
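
A sketch of configuring the limits and the priority (the config keys are the ones MemoryCache understands; the cache name, key, and values are arbitrary):

```csharp
using System;
using System.Collections.Specialized;
using System.Runtime.Caching;

public static class LimitsDemo
{
    public static void Run()
    {
        // Limits are passed to the constructor; the corresponding
        // properties on the instance are read-only.
        var config = new NameValueCollection
        {
            { "cacheMemoryLimitMegabytes", "100" },    // hard cap: 100 MB
            { "physicalMemoryLimitPercentage", "10" }, // or 10% of total RAM
            { "pollingInterval", "00:00:30" }          // how often limits are checked
        };
        var cache = new MemoryCache("LimitedCache", config);

        Console.WriteLine(cache.CacheMemoryLimit);    // bytes
        Console.WriteLine(cache.PhysicalMemoryLimit); // percent

        // NotRemovable: survives memory-pressure eviction, but still expires.
        cache.Set("key", "value", new CacheItemPolicy
        {
            Priority = CacheItemPriority.NotRemovable,
            AbsoluteExpiration = DateTimeOffset.Now.AddHours(1)
        });
    }
}
```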

RemovedCallback and UpdateCallback

Another surprise: both properties take delegates, which are called both when an item is updated and when it is removed from the cache.

If you think about it, an update is essentially a removal immediately followed by the addition of a new value. This explains why RemovedCallback fires when an item is updated. And the fact that UpdateCallback fires on removal is simply a fact from MSDN.

The difference between the properties is that RemovedCallback is called after, and UpdateCallback before, the item is actually removed from the cache. The delegates stored in these properties receive arguments containing a reference to the cache, a reference to the item being removed, and the reason for the removal.

Another gift hides in the MemoryCache implementation, which has somewhat odd validation logic for the CacheItemPolicy parameter. First, it checks that the two delegates are not set at the same time; otherwise we get an ArgumentException at the moment the item is added to the cache.
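
An illustrative test:

```csharp
using System;
using System.Runtime.Caching;
using Xunit;

public class CallbackValidationTests
{
    [Fact]
    public void BothCallbacks_CauseArgumentException_OnAdd()
    {
        var cache = new MemoryCache("CallbackDemo");
        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5),
            RemovedCallback = args => { }, // both delegates at once
            UpdateCallback = args => { }   // -> invalid combination
        };

        Assert.Throws<ArgumentException>(
            () => cache.Set("key", "value", policy));
    }
}
```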



And everything would be fine if, for UpdateCallback to work, it were enough to make sure that RemovedCallback has no value. But in fact, Add and AddOrGetExisting throw an ArgumentException whenever a non-null UpdateCallback is set, regardless of RemovedCallback; of the methods for adding data, only Set accepts such a policy.
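
A test of that behavior:

```csharp
using System;
using System.Runtime.Caching;
using Xunit;

public class UpdateCallbackTests
{
    [Fact]
    public void UpdateCallbackAlone_StillThrows_OnAdd()
    {
        var cache = new MemoryCache("UpdateCallbackDemo");
        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5),
            UpdateCallback = args => { } // RemovedCallback is left null
        };

        // Add is a wrapper over AddOrGetExisting, and both reject
        // a policy with a non-null UpdateCallback.
        Assert.Throws<ArgumentException>(
            () => cache.Add("key", "value", policy));
    }
}
```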


As a result, RemovedCallback remains the only delegate you can rely on for change notifications with every method of adding data; UpdateCallback can be used with Set alone (all of this applies specifically to the MemoryCache implementation).

ChangeMonitors property

The ChangeMonitors property is a collection of ChangeMonitor objects, which allow a cache entry to be invalidated not by a timer, but in response to changes in the external data it depends on.

Out of the box, the .NET Framework offers the following ChangeMonitor descendants:

- CacheEntryChangeMonitor - watches a set of other cache entries;
- HostFileChangeMonitor - watches files and directories on disk;
- SqlChangeMonitor - watches the result of a SQL Server query (built on SqlDependency).

Monitors are attached to an item through this very CacheItemPolicy.ChangeMonitors collection. I will not dwell on them in more detail here.
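
A small sketch using HostFileChangeMonitor (the method and the file path are invented for the example): the cached value is evicted as soon as the watched file changes, with no need to wait for a timer.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public static class ChangeMonitorDemo
{
    public static void CacheFileBoundValue(ObjectCache cache, string parsedConfig)
    {
        var policy = new CacheItemPolicy();

        // Invalidate the entry whenever the watched file changes on disk.
        policy.ChangeMonitors.Add(
            new HostFileChangeMonitor(new List<string> { @"C:\data\settings.xml" }));

        cache.Set("settings", parsedConfig, policy);
    }
}
```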

Conclusion

As you can see, MemoryCache is far from perfect and holds a number of "surprises," but it is a ready-made, full-featured cache available out of the box. Knowing its quirks in advance, you can apply it effectively without writing your own implementation.

Source: https://habr.com/ru/post/258247/

