Some time ago I faced the task of caching requests to a large on-disk database in a high-load multi-threaded C++ application. The application was intended for deployment in the cloud, and the disk would obviously become the bottleneck. The database itself was a huge graph that many threads crawled simultaneously, so the cache had to be multi-threaded as well.
I dropped the idea of using an external application such as memcached right away: it would add an unavoidable extra lag to every traversal along an edge of the graph. That left an implementation inside the application.
The documentation helpfully pointed to an LRU cache implementation in the TBB library I was already using. Unfortunately, at that time the implementation was in preview state (where it remains to this day).
I had a choice: rely on an insufficiently tested implementation, where fixing any bug would be a great adventure, or write my own. Besides, a cursory study of the available publications on caching algorithms suggested that LRU may not be the most efficient scheme; for high-load applications, even a few percent of extra efficiency is bread and butter. After going through all the schemes I could find, I settled on the CART algorithm.
This scheme combines several useful advantages:
- Scan protection. A single pass over the entire database (for example, by an analytical tool) will not degrade the cache's performance.
- Protection against double access. It often happens that the same database item is requested twice within a short interval and then never touched again. This can significantly degrade schemes that lack protection against such behavior, since any element requested this way receives priority.
- A cache hit requires no significant work and no maintenance of internal structures (a notable disadvantage of LRU). This is very important for a multi-threaded implementation, since no lock has to be taken on every access; see the sketch after this list.
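To make the last point concrete, here is a minimal sketch of the "cheap hit" idea used by CLOCK-family schemes such as CART. The names are illustrative, not taken from the actual implementation:

```cpp
#include <atomic>

// A hit only sets a per-entry reference bit with a relaxed atomic store:
// no list manipulation, no locking on the hot path.
struct CacheEntry {
    std::atomic<bool> referenced{false};
    // ... key, value, CART-specific flags ...
};

// Called on every cache hit: O(1) and lock-free.
inline void on_hit(CacheEntry& e) {
    e.referenced.store(true, std::memory_order_relaxed);
}

// The eviction pass (the "clock hand") later inspects and clears these
// bits, but that work happens only on misses, not on hits.
```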
Let's see how effective the scheme is in practice. Since I did not have enough time to implement and debug other schemes, I will compare it only against the LRU implementation from the TBB library.
First, we test both schemes on a randomly generated sequence of numbers (0 ... 10,000) for three cache sizes (100, 500, and 1000). The numbers reported are miss ratios, so lower is better.
Random numbers, cache size 100
CART result: 0.99, missed 994950/1005000
LRU result: 0.990057, missed 995007/1005000
Random numbers, cache size 500
CART result: 0.949862, missed 954611/1005000
LRU result: 0.95016, missed 954911/1005000
Random numbers, cache size 1000
CART result: 0.90002, missed 904520/1005000
LRU result: 0.900309, missed 904811/1005000
CART comes out slightly ahead of LRU, but, to be honest, completely random access is not a good test; in a real application with this access pattern, any cache would be of little use.
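For reference, the measurement behind these numbers is just a miss counter over the request stream. A rough sketch, with a hypothetical cache interface standing in for both CART and the TBB LRU:

```cpp
#include <cstddef>
#include <random>

// Hypothetical interface: lookup() returns false on a miss and the
// caller then inserts the element. Both caches under test are assumed
// to expose something equivalent.
template <class Cache>
double miss_ratio(Cache& cache, std::size_t requests, int key_space) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> key(0, key_space - 1);
    std::size_t misses = 0;
    for (std::size_t i = 0; i < requests; ++i) {
        int k = key(rng);
        if (!cache.lookup(k)) {
            cache.insert(k, /*value*/ k);
            ++misses;
        }
    }
    return double(misses) / double(requests);
}
```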
Therefore, I made the second test closer to a practical application. Here the numbers are drawn from six bins, each containing progressively more elements, which reduces the probability of any particular number being chosen: the numbers 0 to 150 collectively have the same probability of being selected as the numbers 5,000 to 10,000. This resembles the access pattern of, say, a user database, where a small set of frequently returning users generates most of the load.
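For illustration, such a generator might look like the sketch below; the intermediate bin boundaries are guesses for the sketch, not the exact values used in the test:

```cpp
#include <random>

// Six bins of growing size are picked with equal probability, so keys
// in the small bins are requested far more often than keys in the
// large ones.
inline int draw_key(std::mt19937& rng) {
    static const int bounds[7] = {0, 150, 500, 1200, 2500, 5000, 10000};
    std::uniform_int_distribution<int> bin(0, 5);   // pick a bin uniformly
    int b = bin(rng);
    std::uniform_int_distribution<int> key(bounds[b], bounds[b + 1] - 1);
    return key(rng);                                // pick a key inside it
}
```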
Bins draw, cache size 100
CART result: 0.920258, missed 924859/1005000
LRU result: 0.965795, missed 970624/1005000
Bins draw, cache size 500
CART result: 0.721484, missed 725091/1005000
LRU result: 0.84106, missed 845265/1005000
Bins draw, cache size 1000
CART result: 0.581132, missed 584038/1005000
LRU result: 0.71412, missed 717691/1005000
In this case CART shows significantly better results than LRU, which is exactly what we wanted to achieve. I encourage everyone to download the example and try it for themselves.
You can find the implementation here. It was tested in a real application under load, which, admittedly, does not guarantee the complete absence of bugs; use it, so to speak, at your own risk. To integrate it into your project you only need the files in the Include folder. You can also get rid of the HashMurmur3 dependency by replacing it with any hash function that suits you.
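For example, a plain std::hash wrapper could serve as a replacement (the wrapper name and the expected functor signature are my assumptions; adapt them to whatever the cache templates actually require):

```cpp
#include <cstdint>
#include <functional>

// Assumed drop-in for HashMurmur3: any functor mapping a key to an
// integer hash should work, as long as its signature matches what the
// cache templates expect.
struct StdHash {
    template <class Key>
    std::uint64_t operator()(const Key& k) const {
        return static_cast<std::uint64_t>(std::hash<Key>{}(k));
    }
};
```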
The only other dependency is TBB, but most people writing multi-threaded C++ applications use it anyway. As a bonus, a good random number generator implementation is included. The original implementation also used EASTL instead of STL, but I removed that dependency before publishing the public version.
The sample sources are set up for VS2010, but a Linux port should not cause any problems. Since I do not have Linux at hand right now, I would be grateful if someone in the community spent a little time making that port.
P.S. One can come up with plenty of uses for a good caching scheme. I, for example, also use this implementation for accessing memory mapped files: instead of mapping the whole file, which can consume a huge amount of virtual memory on large files, only a limited number of small 16 MB pieces are mapped, and the cache decides which pieces to evict from memory. With a few hundred more lines you could also write your own memcached, if the desire arises.
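A sketch of that memory-mapped-file setup, in POSIX terms (the cache type itself and the way it invokes these callbacks are assumptions; on Windows the calls would be MapViewOfFile/UnmapViewOfFile):

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <cstddef>

// Map the big file in fixed 16 MB windows on demand and let the cache
// decide which windows to drop, bounding virtual memory use to roughly
// cache_capacity * kChunkSize.
static const std::size_t kChunkSize = 16u * 1024 * 1024;

// Cache miss: map the requested window of the file descriptor fd.
void* map_chunk(int fd, std::size_t chunk_index) {
    return mmap(nullptr, kChunkSize, PROT_READ, MAP_PRIVATE, fd,
                static_cast<off_t>(chunk_index * kChunkSize));
}

// Eviction callback invoked by the cache: release the window.
void unmap_chunk(void* addr) {
    munmap(addr, kChunkSize);
}
```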