
Efficient caching. From theory to practice


As a rule, articles about caching start on a promising note and end with the LRU cache. Let's try to reverse the trend: start with why LRU is bad, and end on a high note. I hope.

Whether you are building a highload service for millions of visitors, designing a mobile application, or writing an operating system or a DBMS, the key element affecting the final cost of the system and the responsiveness of the interface or service is the cache.

Ask at an interview which caching algorithms the candidate knows, and as a rule you will hear: mmm... LRU cache... Ask about sorting algorithms instead, and the odds of hearing something besides "bubble sort" are much higher. People are willing to spend several days hunting for the optimal image-resizing library or web framework, without realizing that an effective cache would let them take just about any library with similar characteristics: the cache minimizes calls into it and smooths out the difference in speed.
For Relap.io, as a highload service, caching is especially important. For example, yesterday we served recommendations on various sites 789301033 times. That is why we have caching smeared thickly over everything: recommendations, pictures, advertising, and so on.

Not all caches are equally useful.


A prime example is the LRU cache.

It is rarely brought to algorithm competitions: nobody wants anything to do with a loser. It is hard to come up with a less efficient algorithm. Probably the only algorithm LRU beats in efficiency is a plain queue such as FIFO. Nevertheless, LRU is built in everywhere as the default and, unfortunately, often the only option, simply because it is easy to implement.

Would you like to use a site, application, or operating system that is sluggish, inefficient, and devours resources with abandon, just because it was written in an easy-to-implement language, some hypothetical BASIC? If not, welcome under the cut.

I love the Pareto principle. On a statistically significant sample it can be applied to absolutely anything.

Let's try: 20% of the items account for 80% of the requests.

This is a rather interesting pattern that holds for large data sets. It would seem: what does Pareto have to do with caching?

<Lyrical digression>
A few years ago we were writing an Android application for Surfingbird. We switched to RxJava, made asynchronous everything that could be made asynchronous, smoothed every transition with animation. And still one unsolved problem remained: constant reloading of images. The application literally teems with pictures; they rotate and change all the time, and there is never enough memory to hold them.

I admit, at first I thought the whole problem was the image loader: pick an efficient one and voila. I went through all of them: Picasso, Facebook Fresco, UIL, I do not remember all the names. But the problem remained. Somewhere images loaded a little faster, somewhere a little smoother, but they kept reloading. Then I sat down and wrote my own. Simple. Clean. Lightweight. And it did not help. The stupid image loader went on pulling pictures over and over, unnerving the user, unable to separate the wheat from the chaff. Then I remembered the Pareto rule.
</Lyrical digression>


If we assume that 20% of the pictures are shown 80% of the time, everything falls into place. All that remains is to understand exactly which pictures to keep.

How does LRU cache work?



Let's look at a spherical application in a vacuum. Let it be an instant messenger; that is the easiest to picture.

(screenshot of a Telegram-style messenger dialog)

If you look closely at the screenshot, you can see that user messages are accompanied by avatars and that message bodies contain pictures. Your task is to make the interface as smooth as possible. Let's take another look at the screenshot above: we see two recurring avatars in the dialog, and then user 1 sends a large picture.



So far so good.

The picture arrives at 1024 by 768 pixels, we write 1024 * 768 * 4 bytes into the cache - and BAM! Our beautiful avatars are knocked clean out of the cache. Now the cache solemnly holds a picture that had to be shown exactly once and made no sense to cache at all.
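To make the failure mode concrete, here is a minimal sketch of a byte-budgeted LRU cache built on Java's LinkedHashMap with access order (my illustration, not code from any project mentioned here; the names, the 1 MB budget, and the sizes are made up). Running it reproduces exactly the disaster above: one throwaway picture flushes out all the avatars.

import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A minimal byte-budgeted LRU cache sketched on top of LinkedHashMap.
 * Illustrative only; names and sizes are hypothetical.
 */
class LruByteCache {
  private final LinkedHashMap<String, byte[]> map =
      new LinkedHashMap<String, byte[]>(16, 0.75f, true); // accessOrder = true
  private final int maxBytes;
  private int bytes;

  LruByteCache(int maxBytes) {
    this.maxBytes = maxBytes;
  }

  byte[] get(String key) {
    return map.get(key); // a hit moves the entry to the most recently used end
  }

  void put(String key, byte[] value) {
    byte[] prev = map.put(key, value);
    if (prev != null) {
      bytes -= prev.length;
    }
    bytes += value.length;
    // evict from the least recently used end until we fit the budget again
    Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
    while (bytes > maxBytes && it.hasNext()) {
      Map.Entry<String, byte[]> eldest = it.next();
      if (eldest.getKey().equals(key)) {
        break; // never evict the entry we have just inserted
      }
      bytes -= eldest.getValue().length;
      it.remove();
    }
  }

  public static void main(String[] args) {
    LruByteCache cache = new LruByteCache(1024 * 1024);  // 1 MB budget
    for (int i = 0; i < 20; i++) {
      cache.put("avatar" + i, new byte[64 * 64 * 4]);    // 16 KB avatars
    }
    cache.get("avatar0");                                // avatars are in active use
    cache.put("bigPicture", new byte[1024 * 768 * 4]);   // ~3 MB one-off picture
    System.out.println(cache.map.keySet());              // prints [bigPicture]
  }
}

The simplicity is the whole appeal: LinkedHashMap keeps entries in usage order for us, and eviction is just "drop from the head". The price is that one cold, oversized value outranks twenty hot little ones.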

How do we beat this?



If you look at the AQuery library, for example, its developer suggests splitting the cache into several piles: a separate pile for small avatars and a separate pile for large pictures. A sketch of the idea follows below.
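Here is that idea in a few lines, reusing the LruByteCache from the sketch above (the 64 KB threshold and both budgets are numbers I made up, and this is not AQuery's actual API): entries are routed into separate piles by size, so a one-off photo can no longer displace a single avatar.

/**
 * Sketch of the "several piles" idea: small and large entries live in
 * separate byte-budgeted LRU caches, so they cannot displace each other.
 * The threshold and budgets are hypothetical.
 */
class SplitImageCache {
  private static final int SMALL_LIMIT = 64 * 1024; // hypothetical threshold

  private final LruByteCache small = new LruByteCache(1024 * 1024);     // avatars
  private final LruByteCache large = new LruByteCache(8 * 1024 * 1024); // photos

  void put(String key, byte[] value) {
    // route by size: small entries and large entries never compete
    (value.length <= SMALL_LIMIT ? small : large).put(key, value);
  }

  byte[] get(String key) {
    byte[] v = small.get(key);
    return v != null ? v : large.get(key);
  }
}

That is already something. But the solution is not universal: the threshold and the budgets have to be programmed and tuned by hand for each kind of content, and we want everything at once. Since everything interesting has already been invented before us, it is time to look at what other caching algorithms exist.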

Wikipedia article on caching algorithms

Forgive me a bit of dryness here; I will state the basic truths very briefly.

LRU - the element unused for the longest time is evicted from the cache.
MRU - the most recently used element is evicted (a special case: we cherish the old elements).
LFU - the least frequently used element is evicted.

This is the base. You may fear that the computational cost grows with the complexity of the algorithms, but it is not critical. Compare the time of a key lookup in memory with the time to render a 1024 by 768 picture, because that is exactly what happens every time the algorithm "misses".
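LRU was sketched earlier; for contrast, here is the LFU rule from the list above in a few lines (a naive sketch of my own: the linear scan for the victim is O(n), and real implementations keep frequency buckets instead):

import java.util.HashMap;
import java.util.Map;

/** Naive LFU sketch: evict the least frequently used key. Illustrative only. */
class NaiveLfuCache<K, V> {
  private final int maxEntries;
  private final Map<K, V> values = new HashMap<K, V>();
  private final Map<K, Integer> hits = new HashMap<K, Integer>();

  NaiveLfuCache(int maxEntries) {
    this.maxEntries = maxEntries;
  }

  V get(K key) {
    V v = values.get(key);
    if (v != null) {
      hits.merge(key, 1, Integer::sum); // count the hit
    }
    return v;
  }

  void put(K key, V value) {
    if (!values.containsKey(key) && values.size() >= maxEntries) {
      // find and evict the least frequently used key (naive O(n) scan)
      K coldest = null;
      int min = Integer.MAX_VALUE;
      for (Map.Entry<K, Integer> e : hits.entrySet()) {
        if (e.getValue() < min) {
          min = e.getValue();
          coldest = e.getKey();
        }
      }
      values.remove(coldest);
      hits.remove(coldest);
    }
    values.put(key, value);
    hits.putIfAbsent(key, 0);
  }
}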

SNLRU (segmented LRU) - several "boxes" with LRU. A new element is put into the first box; on a repeated request it is moved to the second, and from the second to the third.

If you name the boxes, it becomes clearer:

cold - elements that have just been added;
warm - elements that have been requested at least once more;
hot - elements that are requested all the time.
This is already a good algorithm; it is used somewhere in the depths of FreeBSD, if I am not mistaken. At least I have come across it in that context.
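A sketch of the segmented idea with the boxes named as above (my illustration; it counts entries rather than bytes to stay short): a repeated hit promotes a key one box up, overflow demotes the least recently used key one box down, and whatever falls out of cold leaves the cache.

import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

/** SNLRU sketch: three LRU segments with promote-on-hit, demote-on-overflow. */
class SegmentedLru<K, V> {
  private final Map<K, V> values = new HashMap<K, V>();
  private final LinkedHashSet<K> cold = new LinkedHashSet<K>();
  private final LinkedHashSet<K> warm = new LinkedHashSet<K>();
  private final LinkedHashSet<K> hot = new LinkedHashSet<K>();
  private final int perSegment; // capacity of each box, in entries

  SegmentedLru(int perSegment) {
    this.perSegment = perSegment;
  }

  void put(K key, V value) {
    values.put(key, value);
    cold.add(key); // a new key always starts in the cold box
    trim(cold);
  }

  V get(K key) {
    V v = values.get(key);
    if (v == null) {
      return null;
    }
    if (cold.remove(key)) {        // second hit: cold -> warm
      warm.add(key);
      trim(warm);
    } else if (warm.remove(key)) { // third hit: warm -> hot
      hot.add(key);
      trim(hot);
    } else if (hot.remove(key)) {  // already hot: just refresh the LRU order
      hot.add(key);
    }
    return v;
  }

  // Overflowing keys are demoted one box down; out of cold they leave the cache.
  private void trim(LinkedHashSet<K> seg) {
    while (seg.size() > perSegment) {
      K lru = seg.iterator().next(); // the oldest key in the box
      seg.remove(lru);
      if (seg == hot) {
        warm.add(lru);
        trim(warm);
      } else if (seg == warm) {
        cold.add(lru);
        trim(cold);
      } else {
        values.remove(lru);
      }
    }
  }
}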

Midpoint LRU - a segmented LRU in which there are only two boxes.

MQ - a segmented LRU with memory. The position from which an element was evicted is remembered, and on a repeated request the element returns to where it was, unless it has already dropped out of the queue of remembered positions. In essence, the cache warms up faster under cyclic rotation of elements (once-popular items can become popular again). Looks like a rather exotic use case.

ARC, GCLOCK and other more complex algorithms will have to be left outside the scope for now. Not that they are bad or uninteresting; ARC, for one, is used (or rather, was probably used, judging by this painful article: www.varlena.com/GeneralBits/96.php ) in PostgreSQL. I cannot refrain from a small quote:

Many usage patterns are not handled well by LRU. For example, a page fetched once may never be used again. The cache does not become helpful until it is re-populated with the more commonly used pages.


2Q - or two queues. It is remarkable in that, while staying simple to implement, it adapts well. The cache is divided into three parts, as in segmented LRU, but with a more elaborate strategy:

In - a FIFO box where all new elements land;
Out - a FIFO box (the "settling tank") that receives elements pushed out of In;
Main - the box where the long-lived "hot" elements are stored.

Cache displacement strategy:

a new element is put into In;
an element pushed out of In moves into Out;
an element pushed out of Out leaves the cache for good;
an element requested again while sitting in Out is promoted to Main;
Main is evicted by LRU.
Link to the canonical description: www.vldb.org/conf/1994/P439.PDF

First of all, it is beautiful. We give the Main box, say, 20% (remember Pareto?). That is where our avatars will accumulate. Out, though, needs to be bigger, around 60 percent, since it is a "settling tank".

The beauty of In is that new elements calmly float down the FIFO pipe from In to Out, without jumping around or moving anywhere on access. But if an element is requested again (for example, the user scrolled back up) after the picture has managed to travel from In to Out, there it is: the winner picture. At the architecture level the cache corrects for the typical correlations present in ordinary life. And after it was introduced, the permanent reloads under a limited memory size disappeared. Pareto worked. But we will return to Pareto more than once.

Now, the nuances. When you read about the three boxes, there is an involuntary temptation to just bluntly make three linked lists and push elements through them. But that is an inefficient way, and somehow not Jedi. In fact, we only need to know which box a key is in; the values themselves can lie in one common ugly heap all the while. Let's move on to programming.

Implementation in Java, bam!
package com.squareup.picasso;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/**
 * 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm
 * Based on description: http://www.vldb.org/conf/1994/P439.PDF
 * Created by recoilme on 22/08/15.
 * email: vadim-kulibaba@yandex.ru
 */
public class TwoQCache<K, V> {

  /** Primary container */
  final HashMap<K, V> map;

  /** Sets for the 2Q algorithm */
  private final LinkedHashSet<K> mapIn, mapOut, mapHot;

  protected float quarter = .25f;

  /** Sizes of the boxes in units. Not necessarily the number of elements. */
  private int sizeIn;
  private int sizeOut;
  private int sizeHot;

  private int maxSizeIn;
  private int maxSizeOut;
  private int maxSizeHot;

  private int putCount;
  private int createCount;
  private int evictionCount;
  private int hitCount;
  private int missCount;

  /**
   * Two queues cache.
   *
   * @param maxSize for caches that do not override {@link #sizeOf}, this is
   *                the maximum sum of the sizes of the entries in this cache
   */
  public TwoQCache(int maxSize) {
    if (maxSize <= 0) {
      throw new IllegalArgumentException("maxSize <= 0");
    }
    calcMaxSizes(maxSize);
    map = new HashMap<K, V>(0, 0.75f);
    mapIn = new LinkedHashSet<K>();
    mapOut = new LinkedHashSet<K>();
    mapHot = new LinkedHashSet<K>();
  }

  /**
   * Sets sizes:
   * mapIn  ~ 25% - 1st level: store for new keys, FIFO
   * mapOut ~ 50% - 2nd level: store for keys paged out of mapIn, FIFO
   * mapHot ~ 25% - hot level: store for keys promoted from mapOut, LRU
   *
   * @param maxSize if mapIn/mapOut sizes == 0, works like LRU for mapHot
   */
  private void calcMaxSizes(int maxSize) {
    if (maxSize <= 0) {
      throw new IllegalArgumentException("maxSize <= 0");
    }
    synchronized (this) {
      maxSizeIn = (int) (maxSize * quarter);
      maxSizeOut = maxSizeIn * 2;
      maxSizeHot = maxSize - maxSizeOut - maxSizeIn;
    }
  }

  /**
   * Sets the size of the cache.
   *
   * @param maxSize the new maximum size
   */
  public void resize(int maxSize) {
    calcMaxSizes(maxSize);
    synchronized (this) {
      HashMap<K, V> copy = new HashMap<K, V>(map);
      evictAll();
      Iterator<K> it = copy.keySet().iterator();
      while (it.hasNext()) {
        K key = it.next();
        put(key, copy.get(key));
      }
    }
  }

  /**
   * Returns the value for {@code key} if it exists in the cache or can be
   * created by {@link #create}. If a value was returned, its position in the
   * boxes is updated according to 2Q. This returns null if a value is not
   * cached and cannot be created.
   */
  public V get(K key) {
    if (key == null) {
      throw new NullPointerException("key == null");
    }
    V mapValue;
    synchronized (this) {
      mapValue = map.get(key);
      if (mapValue != null) {
        hitCount++;
        if (mapHot.contains(key)) {
          // hot hit: refresh the LRU position (remove and re-append)
          mapHot.remove(key);
          mapHot.add(key);
        } else if (mapOut.contains(key)) {
          // repeated request while in Out: promote the key to the hot box
          mapHot.add(key);
          sizeHot += safeSizeOf(key, mapValue);
          trimMapHot();
          sizeOut -= safeSizeOf(key, mapValue);
          mapOut.remove(key);
        }
        // a hit in mapIn changes nothing: In is a plain FIFO
        return mapValue;
      }
      missCount++;
    }

    /*
     * Attempt to create a value. This may take a long time, and the map
     * may be different when create() returns. If a conflicting value was
     * added to the map while create() was working, we leave that value in
     * the map and release the created value.
     */
    V createdValue = create(key);
    if (createdValue == null) {
      return null;
    }
    synchronized (this) {
      createCount++;
      if (!map.containsKey(key)) {
        // there was no conflict, insert the created value
        return put(key, createdValue);
      } else {
        return map.get(key);
      }
    }
  }

  /**
   * Caches {@code value} for {@code key}.
   *
   * @return the previous value mapped by {@code key}
   */
  public V put(K key, V value) {
    if (key == null || value == null) {
      throw new NullPointerException("key == null || value == null");
    }
    if (map.containsKey(key)) {
      // the key is already present: replace the value in place.
      // The cache size may temporarily exceed the limit at this moment.
      synchronized (this) {
        V oldValue = map.get(key);
        if (mapIn.contains(key)) {
          sizeIn -= safeSizeOf(key, oldValue);
          sizeIn += safeSizeOf(key, value);
        }
        if (mapOut.contains(key)) {
          sizeOut -= safeSizeOf(key, oldValue);
          sizeOut += safeSizeOf(key, value);
        }
        if (mapHot.contains(key)) {
          sizeHot -= safeSizeOf(key, oldValue);
          sizeHot += safeSizeOf(key, value);
        }
      }
      return map.put(key, value);
    }
    V result;
    synchronized (this) {
      putCount++;
      final int sizeOfValue = safeSizeOf(key, value);
      // if any box has a free page slot, put the value there
      if (add2slot(key, sizeOfValue)) {
        map.put(key, value);
        result = value;
      } else if (trimMapIn(key, sizeOfValue)) {
        // reclaimed a slot in mapIn by paging its tail out to mapOut
        map.put(key, value);
        result = value;
      } else {
        // the value does not fit into mapIn at all: put it into the hot box
        map.put(key, value);
        mapHot.add(key);
        sizeHot += sizeOfValue;
        trimMapHot();
        result = value;
      }
    }
    return result;
  }

  /**
   * Removes items by LRU from mapHot.
   */
  public void trimMapHot() {
    while (true) {
      K key;
      V value;
      synchronized (this) {
        if (sizeHot < 0 || (mapHot.isEmpty() && sizeHot != 0)) {
          throw new IllegalStateException(getClass().getName()
              + ".sizeOf() is reporting inconsistent results!");
        }
        if (sizeHot <= maxSizeHot || mapHot.isEmpty()) {
          break;
        }
        // new items are appended to the set, so the iterator's first
        // element is the least recently used one
        key = mapHot.iterator().next();
        mapHot.remove(key);
        value = map.get(key);
        sizeHot -= safeSizeOf(key, value);
        map.remove(key);
        evictionCount++;
      }
      entryRemoved(true, key, value, null);
    }
  }

  /**
   * Pages keys out of mapIn (FIFO) into mapOut until {@code sizeOfValue}
   * fits, then places {@code key} into the reclaimed slot. Keys displaced
   * from mapOut leave the cache entirely.
   *
   * @param key         the new key to place into mapIn
   * @param sizeOfValue size of the new value
   * @return true if the key was placed into mapIn
   */
  private boolean trimMapIn(final K key, final int sizeOfValue) {
    if (maxSizeIn < sizeOfValue) {
      return false;
    }
    while (!mapIn.isEmpty() && (sizeIn + sizeOfValue) > maxSizeIn) {
      // page out the tail of mapIn, call it Y
      K keyIn = mapIn.iterator().next();
      mapIn.remove(keyIn);
      final int removedItemSize = safeSizeOf(keyIn, map.get(keyIn));
      sizeIn -= removedItemSize;
      // free space in mapOut: identifiers of Z displaced from its tail
      // leave the cache for good
      while (!mapOut.isEmpty() && (sizeOut + removedItemSize) > maxSizeOut) {
        K keyOut = mapOut.iterator().next();
        mapOut.remove(keyOut);
        V valueOut = map.remove(keyOut);
        sizeOut -= safeSizeOf(keyOut, valueOut);
        evictionCount++;
        entryRemoved(true, keyOut, valueOut, null);
      }
      // add the identifier of Y to the head of mapOut
      mapOut.add(keyIn);
      sizeOut += removedItemSize;
    }
    // put X into the reclaimed page slot
    mapIn.add(key);
    sizeIn += sizeOfValue;
    return true;
  }

  /**
   * Checks for a free slot in any box and adds the key there if one exists.
   *
   * @param key         key
   * @param sizeOfValue size
   * @return true if the key was added
   */
  private boolean add2slot(final K key, final int sizeOfValue) {
    boolean hasFreeSlot = false;
    if (!hasFreeSlot && maxSizeIn >= sizeIn + sizeOfValue) {
      mapIn.add(key);
      sizeIn += sizeOfValue;
      hasFreeSlot = true;
    }
    if (!hasFreeSlot && maxSizeOut >= sizeOut + sizeOfValue) {
      mapOut.add(key);
      sizeOut += sizeOfValue;
      hasFreeSlot = true;
    }
    if (!hasFreeSlot && maxSizeHot >= sizeHot + sizeOfValue) {
      mapHot.add(key);
      sizeHot += sizeOfValue;
      hasFreeSlot = true;
    }
    return hasFreeSlot;
  }

  /**
   * Removes the entry for {@code key} if it exists.
   *
   * @return the previous value mapped by {@code key}
   */
  public V remove(K key) {
    if (key == null) {
      throw new NullPointerException("key == null");
    }
    V previous;
    synchronized (this) {
      previous = map.remove(key);
      if (previous != null) {
        if (mapIn.contains(key)) {
          sizeIn -= safeSizeOf(key, previous);
          mapIn.remove(key);
        }
        if (mapOut.contains(key)) {
          sizeOut -= safeSizeOf(key, previous);
          mapOut.remove(key);
        }
        if (mapHot.contains(key)) {
          sizeHot -= safeSizeOf(key, previous);
          mapHot.remove(key);
        }
      }
    }
    if (previous != null) {
      entryRemoved(false, key, previous, null);
    }
    return previous;
  }

  /**
   * Called for entries that have been evicted or removed. This method is
   * invoked when a value is evicted to make space, removed by a call to
   * {@link #remove}, or replaced by a call to {@link #put}. The default
   * implementation does nothing.
   * <p>
   * The method is called without synchronization: other threads may access
   * the cache while this method is executing.
   *
   * @param evicted  true if the entry is being removed to make space, false
   *                 if the removal was caused by a {@link #put} or {@link #remove}
   * @param newValue the new value for {@code key}, if it exists. If non-null,
   *                 this removal was caused by a {@link #put}. Otherwise it
   *                 was caused by an eviction or a {@link #remove}
   */
  protected void entryRemoved(boolean evicted, K key, V oldValue, V newValue) {
  }

  /**
   * Called after a cache miss to compute a value for the corresponding key.
   * Returns the computed value or null if no value can be computed. The
   * default implementation returns null.
   * <p>
   * The method is called without synchronization: other threads may access
   * the cache while this method is executing.
   * <p>
   * If a value for {@code key} exists in the cache when this method returns,
   * the created value will be released with {@link #entryRemoved} and
   * discarded. This can occur when multiple threads request the same key at
   * the same time (causing multiple values to be created), or when one thread
   * calls {@link #put} while another is creating a value for the same key.
   */
  protected V create(K key) {
    return null;
  }

  private int safeSizeOf(K key, V value) {
    int result = sizeOf(key, value);
    if (result < 0) {
      throw new IllegalStateException("Negative size: " + key + "=" + value);
    }
    return result;
  }

  /**
   * Returns the size of the entry for {@code key} and {@code value} in
   * user-defined units. The default implementation returns 1 so that size
   * is the number of entries and max size is the maximum number of entries.
   * <p>
   * An entry's size must not change while it is in the cache.
   */
  protected int sizeOf(K key, V value) {
    return 1;
  }

  /**
   * Clears the cache, calling {@link #entryRemoved} on each removed entry.
   */
  public synchronized final void evictAll() {
    for (K key : new ArrayList<K>(map.keySet())) {
      remove(key);
    }
    mapIn.clear();
    mapOut.clear();
    mapHot.clear();
    sizeIn = 0;
    sizeOut = 0;
    sizeHot = 0;
  }

  /**
   * For caches that do not override {@link #sizeOf}, this returns the number
   * of entries in the cache. For all other caches, this returns the sum of
   * the sizes of the entries in this cache.
   */
  public synchronized final int size() {
    return sizeIn + sizeOut + sizeHot;
  }

  /**
   * For caches that do not override {@link #sizeOf}, this returns the maximum
   * number of entries in the cache. For all other caches, this returns the
   * maximum sum of the sizes of the entries in this cache.
   */
  public synchronized final int maxSize() {
    return maxSizeIn + maxSizeOut + maxSizeHot;
  }

  /**
   * Returns the number of times {@link #get} returned a value that was
   * already present in the cache.
   */
  public synchronized final int hitCount() {
    return hitCount;
  }

  /**
   * Returns the number of times {@link #get} returned null or required a new
   * value to be created.
   */
  public synchronized final int missCount() {
    return missCount;
  }

  /**
   * Returns the number of times {@link #create(Object)} returned a value.
   */
  public synchronized final int createCount() {
    return createCount;
  }

  /**
   * Returns the number of times {@link #put} was called.
   */
  public synchronized final int putCount() {
    return putCount;
  }

  /**
   * Returns the number of values that have been evicted.
   */
  public synchronized final int evictionCount() {
    return evictionCount;
  }

  /**
   * Returns a copy of the current contents of the cache.
   */
  public synchronized final Map<K, V> snapshot() {
    return new HashMap<K, V>(map);
  }

  @Override
  public synchronized final String toString() {
    int accesses = hitCount + missCount;
    int hitPercent = accesses != 0 ? (100 * hitCount / accesses) : 0;
    return String.format("Cache[size=%d,maxSize=%d,hits=%d,misses=%d,hitRate=%d%%]",
        size(), maxSize(), hitCount, missCount, hitPercent)
        + "\n map:" + map.toString();
  }

  public List<Object> getMapHotSnapshot() {
    List<Object> result = new ArrayList<Object>();
    Iterator<K> it = mapHot.iterator();
    while (it.hasNext()) {
      K key = it.next();
      result.add(key);
      result.add(map.get(key));
    }
    return result;
  }
}



Pay attention to the containers:
/** Primary container */
final HashMap<K, V> map;

/** Sets for the 2Q algorithm */
private final LinkedHashSet<K> mapIn, mapOut, mapHot;


The bookkeeping of the "piles" is built on the fast and memory-efficient LinkedHashSet; the values themselves do not matter here. All that matters is which "pile" a given key is in at the moment. That is the key to speed.

More practice. Suppose we want to hook it up to an image loader; here is a pull request to Picasso: github.com/square/picasso/pull/1134
But in general this is not necessary. Decent image loaders allow you to plug in an arbitrary caching algorithm: just copy the class and override the cache (Glide could do this, and Picasso, starting from some version).
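For example, wiring the TwoQCache above up as a bitmap memory cache comes down to overriding sizeOf so that the boxes are accounted in bytes rather than entry counts. A sketch under assumptions: Bitmap is android.graphics.Bitmap, avatar is some decoded bitmap, and the 10 MB budget is arbitrary.

// A sketch: account the cache in bytes instead of entry counts.
TwoQCache<String, Bitmap> bitmapCache =
    new TwoQCache<String, Bitmap>(10 * 1024 * 1024) { // ~10 MB budget (arbitrary)
      @Override
      protected int sizeOf(String key, Bitmap value) {
        // entry size in bytes; must not change while the entry is cached
        return value.getByteCount();
      }
    };

bitmapCache.put("avatar:42", avatar);      // a new key always lands in the In box
Bitmap hit = bitmapCache.get("avatar:42");
// a repeat request after the key has drifted into Out promotes it to the hot box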

I do not remember the exact hit-rate numbers in my case. I only remember that LRU had a hit rate above 70% but below 80%, while 2Q gave just over 80%. And that worked a miracle, because all we need is to cache the 20% of the information that makes up 80% of the traffic. Another miracle, by the way, was that 2Q turned out to be even faster than LRU.
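The class above already carries the counters needed to check such numbers on your own workload. A toy measurement sketch of mine, with a synthetic 80/20 request stream (all the numbers are made up; assumes TwoQCache is on the classpath):

import com.squareup.picasso.TwoQCache;
import java.util.Random;

public class HitRateDemo {
  public static void main(String[] args) {
    // 80% of requests go to the 100 popular keys, 20% to the 900 in the tail
    TwoQCache<Integer, byte[]> cache = new TwoQCache<Integer, byte[]>(200);
    Random rnd = new Random(42);
    for (int i = 0; i < 100000; i++) {
      int key = rnd.nextInt(100) < 80 ? rnd.nextInt(100) : 100 + rnd.nextInt(900);
      if (cache.get(key) == null) {
        cache.put(key, new byte[0]); // miss: pretend to load, then cache
      }
    }
    int accesses = cache.hitCount() + cache.missCount();
    System.out.println("hit rate: " + (100 * cache.hitCount() / accesses) + "%");
  }
}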

At Relap.io we have several cache implementations, for example mine: github.com/recoilme/2qcache (full disclosure: I am not a Perl programmer, this is my first and, I hope, only program in that language, and its only virtue is that it is simple).

So instead I recommend looking at the 2Q implementation in Perl from our lead developer:

Implementation in Perl, bam: github.com/grinya007/2q

So it does not matter what you are writing: a website, a highload service, a mobile application, or an operating system. Implement a smart caching algorithm once, and you can reuse it everywhere, saving resources and your users' nerves.

Source: https://habr.com/ru/post/306252/

