
More about caching in Django

Everyone knows what caching is and why it is needed. Traffic grows, the load on the database increases, and we decide to serve data from the cache. In an ideal world it would probably be enough to add the line USE_CACHE = True to settings.py, but until that day comes, a few more steps are required.

When we are about to use the cache in Django, we need to make a choice: take a ready-made solution that does everything "behind the scenes", or implement our own. Unlike many other situations, this choice is not at all obvious, since existing ready-made solutions come with quite a few limitations and potential inconveniences.

First, we will take a quick look at the ready-made solutions, and then figure out how best to implement caching yourself.

Turnkey solutions


We go to djangopackages.com and see what it has to offer.

Johnny Cache


Monkey-patches Django querysets so that all ORM queries are cached automatically. Installation is as simple as it gets: a couple of new lines in settings.py; everything else stays the same, and the query syntax does not change. Data is cached forever and invalidated on changes.

But it is precisely the invalidation that can negate all of Johnny's effectiveness: it invalidates entire tables in the database. In other words, if you have 9000 users and change even one of them, the cache is flushed for all of them. If you need to cache a rarely changing table, this solution may be appropriate; in other cases (and they are the majority), alas.

Django Cache Machine


Also caches ORM queries, but only querysets: .get() calls are not cached, and neither are .values() and .values_list(). To use it, you add a mixin and a manager to the model.

It takes a reasonable approach to invalidation: when one object changes, only the cache entries that include that object are invalidated (including relations such as ForeignKey and ManyToMany).

Django-cachebot



Automatically caches all .get() queries. To cache querysets, you call their cache() method. Uses its own manager.

Invalidation works much like in Django Cache Machine, but changes in ManyToMany relations are not invalidated.

Summary


Django Cache Machine and django-cachebot solve the task acceptably; Johnny Cache is too coarse-grained in its invalidation, and I would not recommend it.

It would seem you could just take one and use it, but there are a couple of things that must be kept in mind.

If such things do not put you off, then a ready-made solution is for you. If for some reason you want to implement caching yourself, read on.

Do it yourself


Architecturally, you need to implement two things. The first is the data retrieval logic: check whether the data is in the cache; if it is, return it; if not, fetch it from the database, put it in the cache, and return it. The second is the invalidation logic.

Get the data


Everything is simple and obvious:

  cached = cache.get('my_key')
  if cached is not None:
      return cached
  result = make_heavy_query()
  cache.set('my_key', result)
  return result


Q : How do I store data forever (an infinite timeout)? A : Use a backend that supports it, for example django-newcache

Q : What if I want to store None in the cache? A : Read the documentation and find out that you can use any other value as the "missing" marker instead of None :

  cached = cache.get('my_key', -1)
  if cached != -1:
      # ...


Where to keep the code associated with the cache?


The main thing is to keep it in one place rather than smearing it across the whole project. In my (and not only my) opinion, the most suitable place is the manager of the corresponding model. You can override MyModel.objects , or you can add a separate one like MyModel.cached .

You often need code that caches access to related objects. For example, for an article you need to get its list of tags. There is a temptation to put the caching code in a model method, but I am in favor of being consistent and doing it through the manager, with the model delegating to the manager:

  class Article(models.Model):
      # ...
      def get_tags(self):
          return Article.cached.tags_for_instance(self.id)


How to store data?


Model instances can be put into the cache as-is; they serialize perfectly well. All their methods will work, such as get_FOO_display() . Just remember that related objects ( ForeignKey and ManyToMany ) will not make it into the cache, and when you try to access them the database will be queried again. So it is better to add your own methods for accessing them (see the example above).

If you want to cache a queryset, it is better to convert it to a list first ( list ). A queryset can be cached as-is, but that may run into compatibility issues between Django versions .

If the list of objects is relatively small and changes rarely (for example, a list of cities, faculties, etc.) and the order of elements does not matter, then you can store it as a dictionary, in this form:
 dict((x.id, x) for x in MyModel.objects.all()) 

This will make it possible to get by with one cache entry, rather than making an entry for each object.

Sometimes it makes sense to store not a list of objects but only a list of their IDs, and fetch the objects themselves from the cache with a separate get_many request. Plus: invalidation is needed only when the composition of the list changes, i.e. less often. Minus: sometimes there is no benefit from the plus. An example is probably in order here. Suppose we have a list of "10 recent articles". If you store only IDs, you need to invalidate the list only when a new article is added to the site or an article is deleted from the list. If you store the whole list of objects, you have to invalidate it on any change to any article (say, a fixed typo). On the other hand, if articles are added infrequently, there is no gain here, so this method will not work everywhere.

How to name keys?


If we store something that exists in a single copy site-wide, for example a list of articles, you can name the key whatever you like. For example, 'articles' . You do not need to add a unique prefix each time; doing it once is enough .

If the key name depends on an object, then you need string formatting. People often do this: 'article::%d' . That works, but it can be done better: 'article::%(id)d' . In the first case it is "some integer", in the second it is clearly an id. Or compare 'tags_for::%d' with 'tags_for::%(article_id)d' . If this syntax seems odd to you, that is fixable .

Invalidation


Invalidation is best done with signals. The signal code can live anywhere; I prefer @staticmethod s on the model class for this. Invalidation is often done inefficiently. Here is a typical example:

  @receiver(post_save, sender=Article)
  @receiver(pre_delete, sender=Article)
  def invalidate(instance, **kwargs):
      cache.delete('article::%(id)d' % {'id': instance.id})


But you can do better!

  @receiver(post_save, sender=Article)
  def on_change(instance, **kwargs):
      cache.set('article::%(id)d' % {'id': instance.id}, instance)

  @receiver(pre_delete, sender=Article)
  def on_delete(instance, **kwargs):
      cache.delete('article::%(id)d' % {'id': instance.id})


Why delete a value when you can replace it with a fresh one? We save a database query and insure ourselves against the dogpile effect. Of course, we now need two handlers: one for changes and one for deletion. It is worth doing this whenever possible.

Invalidating ManyToMany


For each ManyToManyField you need to hook up additional invalidation, something like this:

  @receiver(m2m_changed, sender=Article.tags.through)
  def on_tags_changed(instance, **kwargs):
      # update or invalidate the affected entries, for example:
      cache.delete('tags_for::%(article_id)d' % {'article_id': instance.id})


Caching ModelChoiceField and ModelMultipleChoiceField


Django has no built-in way to cache the choices for these fields, which means every rendering of such a field triggers a database query. You can manually replace them with ChoiceField and MultipleChoiceField respectively (plus some extra logic), or you can use my little application . It works on Django 1.2-1.4. I will not go into detail here; everything is described at the link.

Never rely on cache persistence!


Finally, a few words about persistence. Firstly, the word sounds very clever, and secondly, remember that there are exactly two things you can do with the cache: read a value from it and write a new value to it. Never try to modify data in the cache, for example like this:

  mylist = cache.get('mylist')
  mylist.append(value)
  cache.set('mylist', mylist)


This operation is not atomic, i.e. there is no guarantee that two clients will not change the list at the same time. And when that happens, you will spend a sleepless night figuring out what is wrong and why your data is bad. So better not to. You can, of course, use operations whose atomicity is guaranteed by the backend, for example cache.incr() / cache.decr() on memcached.

Conclusion


If anything above is suboptimal or wrong, write in the comments and I will correct the article. It will become more useful, and its readers happier. Thanks.

Source: https://habr.com/ru/post/143789/
