
Implementing a cache with a limit on the number of elements in Python: a simple solution and a more complicated one

Task statement


Suppose we have a service that must return some piece of information on request, and return it as quickly as possible. What does any sensible person do here? Caches the most frequently requested data. And if you think about the future even a little, the size of that cache has to be limited.
For ease of implementation in Python, we will limit the number of items in the cache rather than its size in bytes (this assumes the data items are roughly uniform in size, and also reflects the fact that determining how much memory a Python object actually occupies is far from a trivial task; anyone interested can dig into that separately). To keep the most frequently used information in the cache, we will build it on the least recently used (LRU) principle: the longer ago a piece of information was last requested, the greater the chance that it will be evicted from the cache.

In this article I will cover two solutions: a simpler one and a more complicated one.

A philosophical digression


I have noticed that code written in Python often does not shine in terms of memory consumption or speed. (In passing: this is not only the fault of the programmers using the language; there is simply no good tooling for it, or at least none that I know of. Yes, I know about cProfile, but it does not always help, for example it has no attach/detach; and hunting for memory leaks when pympler gives up is not for the faint-hearted. If someone can point me to better tools, I would be grateful.) For the most part this is even reasonable: Python is usually used when execution time or memory consumption are not critical, while the time spent writing the code and the simplicity of maintaining it are, so the obvious, less economical solution is often the more convenient one.
That said, none of the above means that Python code should be written carelessly. Besides, performance and memory consumption can sometimes become critical even when the code runs on a server with a good processor and plenty of memory.
It was exactly such a case (a shortage of CPU time) that motivated me to look into this issue and replace one caching class with another.

A simple solution


It is often said that "the simplest solution is the right one", although in programming that is by no means always true. Still, if you need a cache but have no particular speed requirements (for example, whatever uses this cache is slow in itself, so there is no point investing in a sophisticated implementation), you can get by with a trivial solution.

Implementation

So, a relatively simple solution:
import weakref

class SimpleCache:
    def __init__(self, cacheSize):
        self.cache = weakref.WeakValueDictionary()
        self.cache_LRU = []
        self.cacheSize = cacheSize

    def __getitem__(self, key):
        # no exception catching - let calling code handle them if any
        value = self.cache[key]
        return self.touchCache(key, value)

    def __setitem__(self, key, value):
        self.touchCache(key, value)

    def touchCache(self, key, value):
        self.cache[key] = value
        if value in self.cache_LRU:
            self.cache_LRU.remove(value)
        self.cache_LRU = self.cache_LRU[-(self.cacheSize - 1):] + [value]
        return value


Using

Once created, such a cache can be used like a regular dict, except that on every read or write the element is moved to the end of the queue. Thanks to the combination of a WeakValueDictionary() and a list that keeps no more than cacheSize elements (and holds the only strong references to the values), we get a limit on the number of stored items.
Thus, after executing a piece of code
cache = SimpleCache(2)
cache[1] = 'a'
cache[2] = 'b'
cache[3] = 'c'

only the 'b' and 'c' entries will remain in the cache ('a', as the oldest, will be evicted).
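The trick that makes the size limit work is that cache_LRU holds the only strong references to the cached values: as soon as a value drops off that list, its entry silently disappears from the WeakValueDictionary. A tiny standalone demonstration of that behaviour (a sketch, not part of the original article; the Box class is just a placeholder):

import weakref

class Box(object):
    pass                    # any weak-referenceable object will do

wd = weakref.WeakValueDictionary()
strong = Box()
wd['k'] = strong
print 'k' in wd             # True: a strong reference to the value still exists
del strong                  # drop the last strong reference
print 'k' in wd             # False (in CPython the entry vanishes immediately)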

Advantages and disadvantages

The advantage is its relative simplicity: about 20 lines of code, and after reading the documentation on WeakValueDictionary the principle of operation becomes clear.
The disadvantage is the speed: on every touch of the cache the whole list is effectively rebuilt, with an element removed from an arbitrary position. In fact, a whole pile of slow list operations happens there: "if value in self.cache_LRU" is a linear search through the list, .remove() is another linear search, and then a slice is taken, i.e. an almost complete copy of the list is created. In short, relatively slow.
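If you want to get a feel for that cost, a rough standalone measurement of one such "touch" might look like this (a sketch with arbitrary sizes, not part of the original article; absolute numbers depend on the machine):

import timeit

setup = "lst = list(range(10000))"
# one "touch": linear membership test, linear remove(), near-full copy via slicing
stmt = """
if 5000 in lst:
    lst.remove(5000)
lst = lst[-9999:] + [5000]
"""
print timeit.timeit(stmt, setup=setup, number=1000)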

A more complex solution


Now let's think: how can such a class be sped up? Clearly the main problem is keeping cache_LRU up to date; as noted above, searching through it, deleting an item and rebuilding it are expensive operations. What comes to the rescue here is a doubly linked list: it lets us maintain the "last used goes to the end" order cheaply, while a WeakValueDictionary() gives fast lookup into that list (dictionary lookup is much faster than a linear scan, since Python dictionaries are hash tables and a lookup takes roughly constant time on average).

Implementation

First we create a class for a list node, in which we will wrap the stored data:
class Element(object):
    # '__weakref__' is listed so that instances can be stored in a WeakValueDictionary
    __slots__ = ['prev', 'next', 'value', '__weakref__']

    def __init__(self, value):
        self.prev, self.next, self.value = None, None, value

Note the use of __slots__ here, which saves a significant amount of memory, and a little performance, compared to the same class without this attribute (what it is and how it works is described in the official documentation).
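For illustration, here is a minimal sketch of what __slots__ changes (class names are made up for the example): instances lose their per-instance __dict__, so only the listed attributes can be set, which is exactly where the memory saving comes from.

class WithDict(object):
    def __init__(self):
        self.value = 1

class WithSlots(object):
    __slots__ = ['value']
    def __init__(self):
        self.value = 1

w, s = WithDict(), WithSlots()
w.extra = 2        # fine: the instance has a __dict__
try:
    s.extra = 2    # AttributeError: only names listed in __slots__ are allowed
except AttributeError as e:
    print e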
Now we create the cache class itself (the element class is nested inside it, just to keep it out of the way):
import weakref

class FastCache:
    class Element(object):
        # '__weakref__' is listed so that nodes can be stored in a WeakValueDictionary
        __slots__ = ['prev', 'next', 'value', '__weakref__']

        def __init__(self, value):
            self.prev, self.next, self.value = None, None, value

    def __init__(self, maxCount):
        self.dict = weakref.WeakValueDictionary()
        self.head = None
        self.tail = None
        self.count = 0
        self.maxCount = maxCount

    def _removeElement(self, element):
        prev, next = element.prev, element.next
        if prev:
            assert prev.next == element
            prev.next = next
        elif self.head == element:
            self.head = next
        if next:
            assert next.prev == element
            next.prev = prev
        elif self.tail == element:
            self.tail = prev
        element.prev, element.next = None, None
        assert self.count >= 1
        self.count -= 1

    def _appendElement(self, element):
        if element is None:
            return
        element.prev, element.next = self.tail, None
        if self.head is None:
            self.head = element
        if self.tail is not None:
            self.tail.next = element
        self.tail = element
        self.count += 1

    def get(self, key, *arg):
        element = self.dict.get(key, None)
        if element:
            self._removeElement(element)
            self._appendElement(element)
            return element.value
        elif arg:
            return arg[0]
        else:
            raise KeyError("'%s' is not found in the dictionary" % key)

    def __len__(self):
        return len(self.dict)

    def __getitem__(self, key):
        element = self.dict[key]
        # At this point the element is not None
        self._removeElement(element)
        self._appendElement(element)
        return element.value

    def __setitem__(self, key, value):
        try:
            element = self.dict[key]
            self._removeElement(element)
        except KeyError:
            if self.count == self.maxCount:
                # evict the least recently used element (the head of the list)
                self._removeElement(self.head)
            element = FastCache.Element(value)
        self._appendElement(element)
        self.dict[key] = element

    def __del__(self):
        while self.head:
            self._removeElement(self.head)

A few points here are worth noting:
- Element lists '__weakref__' in its __slots__, which keeps the memory savings while still allowing weak references to the nodes (required for WeakValueDictionary).
- The dictionary holds only weak references to the list nodes; the strong references live in the linked list itself, so as soon as a node is unlinked, its dictionary entry disappears on its own.
- All list manipulations (_removeElement / _appendElement) are O(1), so a cache "touch" no longer depends on the cache size; the assert statements help catch list-corruption bugs early.

Using

Exactly the same as for the previous implementation (and the internal logic is similar too, except that instead of Python's built-in list it maintains its own doubly linked list).
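For example, a quick check along the lines of the earlier snippet (a small sketch, not from the original article; the expected results follow from the code above):

cache = FastCache(2)
cache[1] = 'a'
cache[2] = 'b'
cache[3] = 'c'                  # 'a' is unlinked and its weak dict entry disappears
print cache.get(1, 'missing')   # -> 'missing'
print cache[3]                  # -> 'c'
print len(cache)                # -> 2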

Another simple solution with good performance.


Thanks to seriyPS for the tip!
Another solution has emerged that looks as simple as the first (if not simpler) and runs almost as fast as the complex one. Here is the implementation:
from collections import OrderedDict

class ODCache(OrderedDict):
    def __init__(self, cacheSize):
        self.cacheSize = cacheSize
        OrderedDict.__init__(self)

    def __getitem__(self, key):
        value = OrderedDict.__getitem__(self, key)
        return self._touchCache(key, value)

    def __setitem__(self, key, value):
        self._touchCache(key, value)

    def _touchCache(self, key, value):
        try:
            OrderedDict.__delitem__(self, key)
        except KeyError:
            pass
        OrderedDict.__setitem__(self, key, value)
        toDel = len(self) - self.cacheSize
        if toDel > 0:
            for k in OrderedDict.keys(self)[:toDel]:
                OrderedDict.__delitem__(self, k)
        return value


Another solution with good performance.


A solution from the comments (the leader in speed according to my tests), thanks to tbd:
class ListDictBasedCache(object):
    __slots__ = ['__key2value', '__maxCount', '__weights']

    def __init__(self, maxCount):
        self.__maxCount = maxCount
        self.__key2value = {}  # key -> value
        self.__weights = []    # keys ordered from least to most recently used

    def __updateWeight(self, key):
        try:
            self.__weights.remove(key)
        except ValueError:
            pass
        self.__weights.append(key)  # add key to end
        if len(self.__weights) > self.__maxCount:
            _key = self.__weights.pop(0)  # remove first (least recently used) key
            self.__key2value.pop(_key)

    def __getitem__(self, key):
        try:
            value = self.__key2value[key]
            self.__updateWeight(key)
            return value
        except KeyError:
            raise KeyError("key %s not found" % key)

    def __setitem__(self, key, value):
        self.__key2value[key] = value
        self.__updateWeight(key)

    def __str__(self):
        return str(self.__key2value)
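A quick check of its behaviour (a small sketch, not from the original article):

cache = ListDictBasedCache(2)
cache[1] = 'a'
cache[2] = 'b'
cache[1]            # touch key 1 so it becomes the most recently used
cache[3] = 'c'      # key 2, now the least recently used, is evicted
print cache         # -> {1: 'a', 3: 'c'} (plain dict, so key order may vary)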


Comparison


Words are one thing, impartial numbers are another. So do not take my word for it; instead, let's measure these classes.
Here Python's built-in timeit module comes to the rescue:
class StubClass:
    def __init__(self, something):
        self.something = something

def testCache(cacheClass, cacheSize, repeat):
    d = cacheClass(cacheSize)
    for i in xrange(repeat * cacheSize):
        d[i] = StubClass(i)

import random

class testCacheReadGen:
    def __init__(self, cacheClass, cacheSize):
        d = cacheClass(cacheSize)
        for i in xrange(cacheSize):
            d[i] = StubClass(i)
        self.d = d
        self.s = cacheSize

    def __call__(self, repeat):
        cacheSize, d = self.s, self.d
        for i in xrange(cacheSize * repeat):
            tmp = d[random.randint(0, cacheSize - 1)]

def minMaxAvg(lst):
    return min(lst), max(lst), 1.0 * sum(lst) / len(lst)

import timeit

def testAllCaches(classes, cacheSize, repeats):
    templ = '%s: min %.5f, max %.5f, avg %.5f'
    genmsg = lambda cls, res: templ % ((cls.__name__.ljust(20),) + tuple(minMaxAvg(res)))
    for cls in classes:
        t = timeit.Timer(lambda: testCache(cls, cacheSize, repeats[0]))
        print genmsg(cls, t.repeat(*repeats[1:]))

def testAllCachesRead(classes, cacheSize, repeats):
    templ = '%s: min %.5f, max %.5f, avg %.5f'
    genmsg = lambda cls, res: templ % ((cls.__name__.ljust(20),) + tuple(minMaxAvg(res)))
    for cls in classes:
        tst = testCacheReadGen(cls, cacheSize)
        t = timeit.Timer(lambda: tst(repeats[0]))
        print genmsg(cls, t.repeat(*repeats[1:]))

if __name__ == '__main__':
    print 'write'
    testAllCaches((SimpleCache, FastCache, ODCache, ListDictBasedCache), 100, (100, 3, 3))
    print 'read'
    testAllCachesRead((SimpleCache, FastCache, ODCache, ListDictBasedCache), 100, (100, 3, 3))


Results on my machine (Intel Core i5 2540M 2.6 GHz, Win7 64-bit, ActivePython 2.7.2 x64):
  write
 SimpleCache: min 9.36119, max 9.49077, avg 9.42536
 FastCache: min 0.39449, max 0.41835, avg 0.40880
 ODCache: min 0.79536, max 0.82727, avg 0.81482
 ListDictBasedCache: min 0.25135, max 0.27334, avg 0.26000
 read
 SimpleCache: min 9.61617, max 9.73143, avg 9.66337
 FastCache: min 0.19294, max 0.21941, avg 0.20552
 ODCache: min 0.22270, max 0.25816, avg 0.23911
 ListDictBasedCache: min 0.16475, max 0.17725, avg 0.16911 

The difference between the simple and the complex solution is quite noticeable: about 20 times! The OrderedDict-based solution lags behind the complex one somewhat on writes, but only slightly on reads. For any further conclusions, more elaborate measurements would be needed, ones that reflect the nature of the caching problem: fast access to arbitrary pieces of data rather than the linear pattern used above.
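One possible shape for such a measurement, reusing StubClass from the harness above, is a run over random keys drawn from a range larger than the cache, so that hits, misses and evictions all occur (a sketch, not part of the original article; the 10x key-space ratio is arbitrary):

import random

def testCacheMixed(cacheClass, cacheSize, repeat):
    keySpace = cacheSize * 10              # arbitrary ratio of key space to cache size
    d = cacheClass(cacheSize)
    for i in xrange(repeat * cacheSize):
        key = random.randint(0, keySpace - 1)
        try:
            tmp = d[key]                   # hit: only updates the LRU order
        except KeyError:
            d[key] = StubClass(key)        # miss: insert, possibly evicting an old entry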

To check memory consumption, we run the previous script with a different __main__ section and watch the Python interpreter's memory usage in the task manager:
if __name__ == '__main__':
    import time
    print 'measure me - no cache'
    try:
        while True:
            time.sleep(10)
    except:
        # let user interrupt this with Ctrl-C
        pass
    testCache(FastCache, 1000, 1)
    print 'measure me - cached'
    while True:
        time.sleep(10)
    exit()

The measurement results are in the table:
Cache class           Before creating cache   After creating cache   Cache consumption
SimpleCache           4228K                   4768K                  540K
FastCache             4232K                   4636K                  404K
ODCache               4496K                   4936K                  440K
ListDictBasedCache    4500K                   4880K                  380K

As you can see, the complex solution not only adds elements noticeably faster (~20 times), but also consumes slightly less memory than the simple one.

In the specific task for which I created this cache, the replacement reduced the response time from about 90 seconds to about 70 (more thorough profiling and rewriting almost the entire response-generation logic later brought the response time down to 30 seconds, but that is another story entirely).

Source: https://habr.com/ru/post/147756/

