
HTTP server optimization through resource versioning. Implementation details

  1. Essence of optimization
  2. Page load vs. forced refresh
  3. Automation will be required
  4. Server side implementation
  5. Server side optimization
  6. Features of Google App Engine
  7. Source code of the finished example
  8. Summary


The example implementation targets Google App Engine / Python.


Essence of optimization


In their well-known article, Yahoo engineers described an interesting technique for optimizing HTTP handling through file versioning. The idea is this. Normally, HTML simply contains:

    <img src="image.jpg">


Once image.jpg is in the browser's cache, the browser later reads the HTML again and again finds a reference to the same image there. In general, the browser cannot know on its own whether the image has been updated on the server, so it has to send a request to the server.

To avoid the extra request, you can put the resource version into its address, making the address unique:

    <img src="image.v250.jpg">


The browser can then be sure that version 250 of the file will never change (and neither will version 251), so if version 250 is in the cache, it can be used without asking the server anything. Two HTTP headers give the browser this confidence:

    Expires: Sun, 30 Oct 2050 14:19:41 GMT
    Cache-Control: max-age=12345678, public
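A minimal sketch of generating these two headers, assuming Python's standard `email.utils.formatdate` for the HTTP date format. The function name `far_future_headers` and the one-year `max_age` are illustrative choices, not taken from the original code; `now` is a parameter only so the output is reproducible.

```python
import time
from email.utils import formatdate

# Hypothetical default: about one year. Any large value works, because a
# versioned URL never changes its content.
ONE_YEAR = 365 * 24 * 60 * 60


def far_future_headers(max_age=ONE_YEAR, now=None):
    """Return Expires and Cache-Control headers for a versioned resource."""
    if now is None:
        now = time.time()
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
        "Cache-Control": "max-age=%d, public" % max_age,
    }
```

For example, `far_future_headers(now=0)` yields `Expires: Fri, 01 Jan 1971 00:00:00 GMT` and `Cache-Control: max-age=31536000, public`. Both headers are sent because older HTTP/1.0 caches only understand `Expires`, while `Cache-Control` takes precedence where supported.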


As a result, viewing a page for the nth time only requires downloading the HTML; the numerous resources it references no longer need to be requested.

Page load vs. forced refresh


In its current form, this optimization works when following links and when pressing Ctrl+L, Enter. But if the user refreshes the page with F5, the browser forgets that these resources were marked “do not bother asking again”, and “extra” requests flood the server, one per resource. This browser behavior cannot be changed, but what can and should be done is to stop sending the files in full every time and add logic that, whenever possible, answers “nothing has changed, take it from your cache”.

When the browser requests “image.v250.jpg” and already has a copy in its cache, it sends an “If-Modified-Since: Mon, 01 Jan 1990 00:00:00 GMT” header. A browser requesting this image for the first time sends no such header. Accordingly, the server should answer the first browser “nothing has changed” and honestly send the image to the second. In our particular case the date does not need to be parsed: what matters is that the image is in the cache, and that the cached image is the right one (thanks to file versioning and unique URLs).
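The decision described above reduces to a check for the presence of one header. A minimal sketch, assuming the request headers arrive as a plain dictionary; `choose_status` is a hypothetical name, not part of any framework.

```python
def choose_status(request_headers):
    """Pick 304 or 200 for a versioned (never-changing) URL.

    The date inside If-Modified-Since is deliberately ignored: since the URL
    embeds the version, any copy the browser has cached is the right one.
    """
    # HTTP header names are case-insensitive, so compare them in lower case.
    names = {name.lower() for name in request_headers}
    if "if-modified-since" in names:
        return 304  # Not Modified: the browser reuses its cached copy
    return 200  # first request: the full body must be sent
```

Ignoring the date is only safe because of the versioned URLs; for an unversioned resource the server would have to compare the date against the file's real modification time.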

However, the browser will not send "If-Modified-Since" on its own, even if the image is in its cache. To make the browser send this header, the (chronologically) previous response must have included a "Last-Modified: Mon, 01 Jan 1990 00:00:00 GMT" header. In practice this simply means the server should always send this header. You can send the real date of the file's last modification, or any date in the past: the same date will come back to the server, where, as we have seen, it is of no particular interest.

Strictly speaking, the optimization described in this section is not directly related to the Yahoo technique, but the two should be used together to avoid unnecessary load; otherwise the effect will be incomplete.

Automation will be required


The technique is good, but in practice assigning file versions by hand is not feasible. In GAE / Django the problem is solved with a custom template tag. The template contains:

    <img src="{% static 'image.jpg' %}">


which is converted into the following HTML:

    <img src="/never-expire/12345678/image.jpg">


And the implementation of this tag:

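    from django import template

    register = template.Library()

    def static(path):
        # Turn 'image.jpg' into its versioned URL, e.g. '/never-expire/12345678/image.jpg'
        return StaticFilesInfo.get_versioned_resource_path(path)

    register.simple_tag(static)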
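The body of `StaticFilesInfo.get_versioned_resource_path` is not shown in the article. Assuming it looks the file name up in a dictionary of versions (file name to modification time), its core could be as small as this hypothetical helper; the `/never-expire` prefix matches the URL shown above.

```python
def versioned_resource_path(name, versions, prefix="/never-expire"):
    """Build a URL such as '/never-expire/12345678/image.jpg'.

    `versions` maps file names to integer versions (e.g. file mtimes).
    An unknown name raises KeyError, so a typo in a template fails loudly
    instead of producing a broken link.
    """
    return "%s/%d/%s" % (prefix, versions[name], name)
```

Failing loudly on unknown names is a design choice; a production tag might instead fall back to the unversioned `/static/` URL.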


Server side implementation


This optimization is mainly useful for static files: images, CSS, JavaScript. But App Engine serves files declared as static by itself (not very efficiently) and does not let you change their HTTP headers. Therefore, in addition to the standard “static” directory, a second one appears: “never-expire”.

First, the GET request handler checks that the requested version of the file is the latest one. If it is not, the handler redirects to the current address:

    # Some previous version of resource requested - redirect to the right version
    correct_path = StaticFilesInfo.get_resource_path(resource)
    if self.request.path != correct_path:
        self.redirect(correct_path)
        return
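Outside of a webapp handler, the same check can be written as a pure function that is easy to test. This sketch assumes URLs of the form `/never-expire/<version>/<name>` and a version dictionary; `resolve_request` and its return tuples are hypothetical, not from the original source.

```python
def resolve_request(path, versions, prefix="/never-expire/"):
    """Classify a request path of the form '/never-expire/<version>/<name>'.

    Returns ('serve', name), ('redirect', correct_path) or ('not_found', None).
    """
    if not path.startswith(prefix):
        return ("not_found", None)
    # Split off the version segment; everything after it is the file name.
    _, sep, name = path[len(prefix):].partition("/")
    if not sep or name not in versions:
        return ("not_found", None)
    correct_path = "%s%d/%s" % (prefix, versions[name], name)
    if path != correct_path:
        # An older (or bogus) version was requested: point the client to
        # the current URL so it caches the right content.
        return ("redirect", correct_path)
    return ("serve", name)
```

Comparing the whole path, as the original handler does, also catches malformed version segments without parsing them as numbers.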


Then it sets the response headers:
- Content-Type according to the file extension
- Expires, Cache-Control, Last-Modified as already described.
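A minimal sketch of the first header and of Last-Modified, assuming the standard `mimetypes` module maps extensions to types and the file's mtime is used as the Last-Modified date. The helper name `content_headers` is hypothetical.

```python
import mimetypes
from email.utils import formatdate


def content_headers(filename, mtime):
    """Content-Type from the file extension, Last-Modified from the mtime.

    Sending Last-Modified is what makes the browser send If-Modified-Since
    on a later forced refresh.
    """
    content_type, _encoding = mimetypes.guess_type(filename)
    return {
        # Fall back to a generic binary type for unknown extensions.
        "Content-Type": content_type or "application/octet-stream",
        "Last-Modified": formatdate(mtime, usegmt=True),
    }
```

As the article notes, any date in the past would do for Last-Modified; using the mtime simply keeps the header honest.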

If the request contains an If-Modified-Since header, the handler does nothing except set status 304 (Not Modified): the resource has not changed. Otherwise, the file contents are copied into the response body:

    if 'If-Modified-Since' in self.request.headers:
        # This flag means the client has its own copy of the resource
        # and we may not return it. We won't.
        # Just set the response code to 304 Not Modified.
        self.response.set_status(304)
    else:
        time.sleep(1)  # todo: just making resource loading process noticeable
        abs_file = os.path.join(os.path.split(__file__)[0], WHERE_STATIC_FILES_ARE_STORED, resource)
        transmit_file(abs_file, self.response.out)
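The `transmit_file` helper is not shown in the article. A plausible version, offered only as an assumption, copies the file into the response stream in chunks so large files are never read into memory at once; it assumes `out` accepts bytes.

```python
import shutil


def transmit_file(abs_file, out, chunk_size=64 * 1024):
    """Copy a file into a writable binary stream in fixed-size chunks.

    Hypothetical stand-in for the article's helper of the same name.
    """
    with open(abs_file, "rb") as f:
        shutil.copyfileobj(f, out, chunk_size)
```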


If the GAE datastore is faster than the file system, it may be worth copying the file contents into the datastore on the first request and reading them only from there afterwards. This remains an open question for me.

Server side optimization


Either the revision number from the VCS or the file's last modification time can serve as its version; there is no fundamental difference. I chose the latter, since it is simpler:

    os.path.getmtime(file)


However, querying the file system on every request does not seem like a good idea: I/O is always slow. Instead, you can collect the current versions of all static files on the first request and store them in memcache. The result is a dictionary like this:

    {'cover.jpg': 123456, 'style.css': 234567}


The custom tag uses it to look up the latest version. Naturally, you also need something like a singleton that rebuilds the data in case the memcache entry expires or is evicted:

    import os

    from google.appengine.api import memcache

    # WHERE_STATIC_FILES_ARE_STORED, MEMCACHE_TIME_PRODUCTION,
    # MEMCACHE_TIME_DEV_SERVER and is_production() are defined elsewhere.

    class StaticFilesInfo(object):
        @classmethod
        def __get_static_files_info(cls):
            # Try memcache first; rebuild the data if the entry is missing
            info = memcache.get(cls.__name__)
            if info is None:
                info = cls.__grab_info()
                expire_seconds = MEMCACHE_TIME_PRODUCTION if is_production() else MEMCACHE_TIME_DEV_SERVER
                memcache.set(cls.__name__, info, expire_seconds)
            return info

        @classmethod
        def __grab_info(cls):
            """
            Obtain info about all files in managed 'static' directory.
            This is done rarely.
            """
            static_dir = os.path.join(os.path.split(__file__)[0], WHERE_STATIC_FILES_ARE_STORED)
            versions = {}
            for name in os.listdir(static_dir):
                abs_file = os.path.join(static_dir, name)
                versions[name] = int(os.path.getmtime(abs_file))
            return versions


Features of Google App Engine


Collecting information about all static files is fine, but what if the designer changes an image? How does the server know it is time to refresh the cached file versions? In the general case I don't have a good answer: you would need either a daemon watching the file system for changes, or a script that you must remember to run after every deployment.

App Engine, however, is a special case. Development happens on a local machine, after which the finished code (and static files) are deployed to the server. Importantly, the files on the server cannot be changed afterwards (until the next deployment). So it is enough to read the versions once and never worry about them changing.

The only catch is local development, where files change often; if the versions are not refreshed there, the browser will, for example, show the developer an old version of an image, which is inconvenient. Performance matters little in this case, so you can keep the data in memcache for just a few seconds, or not cache it at all.
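The article keeps the versions in memcache with an environment-dependent lifetime. As a memcache-free illustration of the same policy, here is a minimal in-process cache with a TTL; it swaps memcache for a per-instance object, and the TTL values, `VersionCache` and `is_dev_server` are hypothetical. The dev-server check assumes the App Engine development server sets `SERVER_SOFTWARE` to a value starting with `Development`.

```python
import os
import time

# Hypothetical TTLs: long in production (files are immutable until the next
# deploy), a couple of seconds on the dev server so edits show up quickly.
TTL_PRODUCTION = 24 * 60 * 60
TTL_DEV_SERVER = 2


def is_dev_server(environ=os.environ):
    # Assumption: the dev server's SERVER_SOFTWARE starts with 'Development'.
    return environ.get("SERVER_SOFTWARE", "").startswith("Development")


class VersionCache(object):
    """Recompute the {file name: version} mapping once the TTL has passed."""

    def __init__(self, compute, ttl, clock=time.time):
        self._compute = compute  # callable returning {file name: version}
        self._ttl = ttl
        self._clock = clock  # injectable for testing
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value
```

Usage would look like `VersionCache(grab_versions, TTL_DEV_SERVER if is_dev_server() else TTL_PRODUCTION)`, where `grab_versions` is any function that scans the static directory.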

Source code of the finished example


code.google.com/p/investigations/source/browse/#svn%2Ftrunk%2Fnever-expire-http-resources

svn checkout investigations.googlecode.com/svn/trunk/never-expire-http-resources investigations

Summary


I haven't uploaded it to appspot yet, but locally everything works and is fast. Folks, take advantage of client-server optimization and don't just blindly respond 200 OK :)

UPD. Commenters point out (and I confirm) that for static files the same effect can be achieved with the standard static handlers. So such “manual” code is hardly worth using for static files; GAE handles them better. However, the approach may be useful for dynamically generated resources, and in that context ETag may be more convenient to implement than Last-Modified.
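For a dynamically generated resource, an ETag check could look like the sketch below. It assumes the ETag is a SHA-1 hash of the generated body, which is one common choice rather than the author's method; `respond` and `etag_for` are hypothetical names. Generating the body still costs CPU, but an unchanged response is no longer re-sent over the network.

```python
import hashlib


def etag_for(body):
    # A strong validator derived from the bytes: same content, same tag.
    return '"%s"' % hashlib.sha1(body).hexdigest()


def respond(body, request_headers):
    """Return (status, headers, body) for a generated resource."""
    etag = etag_for(body)
    # Header names are case-insensitive; normalize before the lookup.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # If-None-Match may hold several comma-separated tags, or '*'.
    for tag in lowered.get("if-none-match", "").split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]  # If-None-Match uses weak comparison
        if tag == etag or tag == "*":
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```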

Source: https://habr.com/ru/post/126083/

