We finish HTTP caching in Django

This post focuses on HTTP caching ( translation ) and its use in conjunction with the Django framework. Few would argue with the statement that using HTTP caching is a very correct and reasonable practice of developing web applications. However, it is in this functional that Django contains a number of errors and inaccuracies that severely limit the practical benefits of this approach. For example, the bug # 15855 , instituted in April 2011, is still relevant, which can lead to very unpleasant errors in the operation of the web application.

Middleware vs. explicit decorator

There are two standard ways to enable HTTP caching in Django: by activating UpdateCacheMiddleware / FetchFromCacheMiddleware , or by decorating the presentation function using the cache_page decorator. The first method has one major drawback - it includes HTTP caching for all project views (view) without exception, but the second one contains the same # 15855 bug. If it were not for this bug, the option using cache_page would be preferable. Plus, this option is in good agreement with the most important of the postulates of The Zen of Python , which is “obvious better than implicit”.

The reason for the appearance of # 15855 lies in the mechanism for processing Django requests using so-called middleware. Schematically, this mechanism is shown in the figure below.

Decorators for the views on the diagram are located along with the views themselves (view function), that is, after they have been worked out, each middleware has the opportunity to additionally affect the final result (HttpResponse). For example, SessionMiddleware does this by adding a Vary header with the value “Cookie” to the response if inside the view function there was an access to the session (a normal case when working with authorized users). Disregarding the Vary header values while maintaining the cache may cause the application user to retrieve data from another user's cache. By the way, in the comments to the described bug there are examples of its solution specifically for the case of SessionMiddleware, but the problem is also relevant when using other middleware, for example, LocaleMiddleware, which expands the Vary header with the “Accept-Language” value.
')

Fix bug

For a complete fix # 15855, you must update the HttpResponse cache after all middleware have been processed. Now it is clear why in the case of UpdateCacheMiddleware / FetchFromCacheMiddleware this error does not exist, because if we put UpdateCacheMiddleware above all other middleware, then it is executed last and takes into account all the response headers. The only non-middleware way to implement a similar solution is to process the request_finished signal. But in this method there are two problems that need to be solved: first, the signal handler does not receive information about the current request / response, and second, the signal is sent after the response has been sent to the client. To update the cache, the second item is in general irrelevant (we can update the cache after sending the answer), but we need to add our own headers - Expires and Cache-Control (the most important!) To the response, which we cannot do if The request has already been processed.

Before proceeding, you should familiarize yourself with the source code of the original cache_page decorator. As you can see, it is based on the same UpdateCacheMiddleware and FetchFromCacheMiddleware, which in general is not surprising, because the tasks they solve are the same. We can do the same and write our own decorator, who will use the slightly modified versions of the mentioned middleware:

cache_page.py

from django.utils import decorators from .middleware import CacheMiddleware def cache_page(**kwargs): """    django.views.decorators.cache.cache_page """ cache_timeout = kwargs.get('cache_timeout') cache_alias = kwargs.get('cache_alias') key_prefix = kwargs.get('key_prefix') decorator = decorators.decorator_from_middleware_with_args(CacheMiddleware)( cache_timeout=cache_timeout, cache_alias=cache_alias, key_prefix=key_prefix, ) return decorator

middleware.py

 from django.middleware import cache as cache_middleware class CacheMiddleware(cache_middleware.CacheMiddleware): pass #   middleware,

First, let's solve two existing problems with request_finished, which I mentioned earlier. We know for sure that only one request is processed at the same time in one thread, which means that the current response can be saved to the user, correctly, in threading.local . We do this at the moment when the management is still at the decorator's place in order to subsequently use it in the request_finished handler. Thus, we can “kill two birds with one stone”: adding Expires and Cache-Control headers before sending a response to the client and deferred saving to the cache with all possible changes:

middleware.py

 import threading from django.core import signals from django.middleware import cache as cache_middleware response_handle = threading.local() class CacheMiddleware(cache_middleware.CacheMiddleware): def __init__(self, *args, **kwargs): super(CacheMiddleware, self).__init__(*args, **kwargs) signals.request_finished.connect(update_response_cache) def process_response(self, request, response): response_handle.response = response return super(CacheMiddleware, self).process_response(request, response) def update_response_cache(*args, **kwargs): """   request_finished """ response = getattr(response_handle, 'response', None) #  response if response: try: pass #  response   finally: response_handle.__dict__.clear()

But in this simplest case, saving to the cache will occur twice, and for the first time without taking into account all values of Vary. Technically, this problem can be solved. Who cares, under the spoiler below set out such a solution.

middleware.py

 import contextlib import threading import time from django.core import signals from django.core.cache.backends.dummy import DummyCache from django.middleware import cache as cache_middleware from django.utils import http, cache response_handle = threading.local() dummy_cache = DummyCache('dummy_host', {}) @contextlib.contextmanager def patch(obj, attr, value, default=None): original = getattr(obj, attr, default) setattr(obj, attr, value) yield setattr(obj, attr, original) class CacheMiddleware(cache_middleware.CacheMiddleware): def __init__(self, *args, **kwargs): super(CacheMiddleware, self).__init__(*args, **kwargs) signals.request_finished.connect(update_response_cache) def process_response(self, request, response): if not self._should_update_cache(request, response): return super(CacheMiddleware, self).process_response(request, response) response_handle.response = response response_handle.request = request response_handle.middleware = self with patch(cache_middleware, 'learn_cache_key', lambda *_, **__: ''): #        ( ) with patch(self, 'cache', dummy_cache): #       ,  #   response     , #       Vary, # . https://code.djangoproject.com/ticket/15855 return super(CacheMiddleware, self).process_response(request, response) def update_cache(self, request, response): with patch(cache_middleware, 'patch_response_headers', lambda *_: None): #      response  super(CacheMiddleware, self).process_response(request, response) def update_response_cache(*args, **kwargs): middleware = getattr(response_handle, 'middleware', None) request = getattr(response_handle, 'request', None) response = getattr(response_handle, 'response', None) if middleware and request and response: try: CacheMiddleware.update_cache(middleware, request, response) finally: response_handle.__dict__.clear()

Eliminate other inaccuracies

At the beginning I mentioned that Django contains several errors in the HTTP caching mechanism, and it is. And the bug solved above is not the only one, although the most critical one. Another inaccuracy of Django is that when reading a saved query from the cache, the value of the max-age parameter of the Cache-Control header is returned as it was when the response was saved in the cache, that is, the max-age may not correspond to the Expires header value due to the difference in time between these two events. And since browsers prefer to use Cache-Control instead of Expires, we get another error. Let's solve it. To do this, our middleware needs to override the “process_request” method:

process_request

 def process_request(self, request): response = super(CacheMiddleware, self).process_request(request) if response and 'Expires' in response: #  'max-age'  'Cache-Control' # ,    'Expires' expires = http.parse_http_date(response['Expires']) timeout = expires - int(time.time()) cache.patch_cache_control(response, max_age=timeout) return response

If there is no urgent need to save all HTTP responses in the cache (and only HTTP caching headers are needed), then instead of everything described above in the project settings, you can replace the main cache driver with the fake one (this solution also protects against the consequences # 15855 ):

 CACHES = { 'default': { 'BACKEND': 'django.core.cache.backends.dummy.DummyCache', }, }

Further, it is not clear why, but UpdateCacheMiddleware, in addition to the standard Expires and Cache-Control, also adds the Last-Modified and ETag headers. And this is despite the fact that FetchFromCacheMiddleware does not process the corresponding requests in any way (with the headers of If-Modified-Since, If-None-Match, etc.). There is a violation of the fundamental principle of a single duty . I suppose the calculation was that the developer would not forget to include ConditionalGetMiddleware or at least CommonMiddleware, the benefits of which are in fact very doubtful, and I never turn them on in my projects. Moreover, if something still returns 304 Not Modified (this happens, for example, when using last_modified or etag decorators), then such a response will not get caching headers (Expires and Cache-Control), which will cause the browser to return again and again (and get 304 Not Modified), despite the fact that we seemingly enabled HTTP caching, which should tell the browser that there is no point in going back in time for the specified time. We eliminate this inaccuracy in the “process_response”:

process_response

 def process_response(self, request, response): if not self._should_update_cache(request, response): return super(CacheMiddleware, self).process_response(request, response) last_modified = 'Last-Modified' in response etag = 'ETag' in response if response.status_code == 304: #    Not Modified  Expires  Cache-Control cache.patch_response_headers(response, cache_timeout) else: response_handle.response = response response_handle.request = request response_handle.middleware = self with patch(cache_middleware, 'learn_cache_key', lambda *_, **__: ''): #        ( ) with patch(self, 'cache', dummy_cache): #       ,  #   response     , #       Vary, # . https://code.djangoproject.com/ticket/15855 response = super(CacheMiddleware, self).process_response(request, response) if not last_modified: #   Last-Modified,        del response['Last-Modified'] if not etag: #   ETag,        del response['ETag'] return response

It is worth explaining a little here that if we want Expires and Cache-Control headers to be added to the 304 Not Modified response, then the last_modified and etag decorators should go after the cache_page, otherwise the latter will not have the chance to process these type of responses:

 @cache_page(cache_timeout=3600) @etag(lambda request: 'etag') def view(request): pass

Adding useful features

Having eliminated all the shortcomings, you suddenly realize that in the resulting solution, well, there is very little opportunity to set a calculated (on-demand) value of the caching time, especially if you look at the last_modified decorators and etag, where such an opportunity exists.

And that is not all. I would also like to somehow cleverly invalidate the cache, for example, when changing the returned entity. It is most convenient to do this by automatically changing the key for the cache, that is, you also want to set the key not statically, but calculate on-demand.

The simplest and most elegant way to realize both of these needs is to set the necessary parameters in the form of a lazy expression:

 from django.utils.functional import lazy @cache_page( cache_timeout=lazy(lambda: 3600, int)(), key_prefix=lazy(lambda: 'key_prefix', str)(), ) def view(request): pass

In this case, the function passed as an argument to lazy will only be executed (and always) when an attempt is made to refer to the expression in the context of the types specified by the subsequent arguments.

Another more flexible way is the ability to pass ordinary values for the cache_timeout and key_prefix values with the signature corresponding to the presentation function:

 @cache_page( cache_timeout=lambda request, foo: 3600, key_prefix=lambda request, foo: 'key_prefix', ) def view(request, foo): pass

This option would allow to calculate cache_timeout and key_prefix based on the request itself and its parameters, but it requires one more refinement. In order not to bore the reader with large chunks of source code, I will simply give a link to the component, where this and everything mentioned above is already implemented as a separate Python module: django-cache .

Conclusion

I did not mention one more useful feature that it would be nice to have, about the client’s ability to force the server to skip the cache, so that the latter would give the most recent data to the client’s request. This is done using the Cache-Control request header: max-age = 0. In django-cache, there is no such possibility yet, but perhaps in the future such an option will appear.

UPD : the mentioned option still appeared .

Anticipating questions on the topic of why all fixes and new features should not be immediately attributed to Django, I will answer that I plan to do this in the near future. But the new features will only get into the next version of Django, most likely already in 1.11, and django-cache can already work with all the latest versions (starting with 1.8). Although bug fixes are added, as a rule, in all currently supported branches.

Another bug

When the note was already being prepared for publication, on one of the projects I found another inaccuracy in the Django query caching functionality. Its essence is that the so-called conditional requests (containing If-Modified-Since headers, etc.), the cache_page always tries to get the result from the cache and, if successful, returns a response with the code 200. This behavior is undesirable in cases where the handler request may return 304 Not Modified. Fix code here .

UPD : in fact, you can do without threading.local and signals if you add a special “callback” to the response._closable_objects list, which will save the response to the cache after all the middleware have been processed.

Source: https://habr.com/ru/post/305774/

All Articles

We finish HTTP caching in Django

Middleware vs. explicit decorator

Fix bug

Eliminate other inaccuracies

Adding useful features

Conclusion

Another bug

More articles: