UPD0 (2016-07-19 23:31): it turns out that in the first half of this article I successfully reinvented the wheel. Thanks to the Habr readers for the link to the specification.
The article is now worth no more than an informal description of an already invented technology.
A July Saturday evening was drawing to a close. Having chopped firewood for a shish kebab, I hung the USB modem on the curtain rod, ran sudo wvdial
, opened the browser and refreshed the tab with GitHub open. Or rather, I tried to refresh it. The speed was disappointing: the page eventually reloaded, but some style files were obviously missing. It was not a matter of blocking, since I had seen similar problems on other sites, and they were often solved simply by refreshing the page several times. The overloaded 3G network was to blame.
Stop! But what about the cache?
A quick googling led me to the official Google manual. I will not retell it in full; most likely, the browser was diligently waiting for the server to send its ETag responses, and those responses were getting lost in the crowded 3G jungle.
A couple of days later, walking back from a cafe on a stuffy day, I came up with an improvement proposal that solves this (and several other) problems, and I present it in this article.
The idea: add to every tag that includes a subresource (styles, scripts, images) a checksum
attribute that stores the hash (for example, SHA-1, as in git) of the expected file:
<link href="//habracdn.net/habr/styles/1468852450/_build/topic_form.css" rel="stylesheet" media="all" checksum="ef4547a3d5731e77f5a1a19e0a6d89915a56cd3a"/>
Having found such a tag in the body of a web page, the browser checks whether an object with that hash is already in the cache, and if so, it sends no request at all: the file is clearly exactly the one required. It is best to store files in the browser cache under names equal to their hashes, as git itself does.
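The lookup described above can be sketched in a few lines of Python. This is a minimal model, not a browser implementation: the cache is a plain dict mapping hash to bytes, standing in for a directory of files named after their hashes, and `download` is any callable that fetches a URL. (The standardized form of this idea is the `integrity` attribute of Subresource Integrity, which uses base64-encoded SHA-256/384/512 digests.)

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def fetch_with_checksum(url, checksum, cache, download):
    """Return the resource body, skipping the network entirely when the
    cache already holds content whose SHA-1 equals `checksum`.
    `cache` is a dict {hash: bytes}, standing in for a directory of
    files named after their hashes (the way git stores objects)."""
    if checksum in cache:
        return cache[checksum]          # cache hit: no request at all
    body = download(url)                # cache miss: fetch as usual
    if sha1_hex(body) == checksum:
        cache[checksum] = body          # content-address the new entry
    return body
```

Note that it does not matter which URL the cached copy originally came from: only the hash is consulted, which is exactly what makes the CDN-vs-own-domain choice below irrelevant.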
The backward compatibility of the proposed solution is obvious.
Owners of small sites often have to choose: either connect jQuery and/or similar libraries from a CDN (Google's, for example), or serve them from their own domain.
In the first case, page load time decreases (including the very first visit), because a file served from Google's servers is more likely to already be in the browser cache. But WordPress developers, for example, stick to the second option, prioritizing autonomy. And given that CDNs go down, get blocked, and so on, one can understand them.
Now this dilemma can be resolved for good: does it matter where the file comes from, if its contents are exactly what the HTML page expects, and the hash certifies that? You can safely point to your own domain, and if the library is already in the cache (no matter whether it was downloaded from this site, from another small site, or from some CDN), the browser will use it.
One of the reasons for prohibiting HTTP resources on HTTPS pages is the possibility of HTTP content being tampered with in transit. That is no longer a barrier: the browser can fetch the content over HTTP and verify its hash against the one delivered within the HTTPS page. Lifting the ban on mixed content (when a hash is present and matches) would speed up the spread of HTTPS.
It is known that the owner of a certain site evilsite.org
can (with some probability) determine whether a visitor has been to another site goodsite.org
by requesting, for example, the image goodsite.org/favicon.ico
. If the icon loads in negligible time, it is in the cache, and therefore the visitor has been to goodsite.org
. Now this attack becomes harder: a near-zero response time only means that the visitor has been on some site with the same favicon. This does not solve the problem entirely, of course, but it does complicate the attacker's life somewhat.
As usual (I am a mathematician, I cannot help it), let us formulate the axioms built into the proposal:
Knowing the hash of a required auxiliary file, you can request it from almost anyone fairly safely; the main danger is that if the polled node really does have the file, then it knows the file's contents and, most likely, at least one URI from which the file can (or once could) be obtained. With this threat in mind, here are two scenarios for using the proposed technology as a smooth path toward a mesh network:
For example, an office has programmers whose computers are joined into a local network. Programmer Vasya arrives early in the morning, opens GitHub, and his cache picks up the styles from the new design rolled out overnight (it is night here, daytime there). When programmer Petya arrives and also downloads the HTML of the GitHub page, his computer asks every machine on the network: "Does anyone have a file with such-and-such hash?" "Catch!" replies Vasya's computer, saving external traffic.
Then comes the lunch break, and Vasya and Petya go off to look at cat pictures and send them to each other. But each cat is downloaded over the office channel only once...
Anya rides the tram home from work and reads the news, say, on Yandex News. On meeting the next <img>
, Anya's phone, from a random MAC address, asks everyone in range: "Guys, does anyone have a file with such-and-such hash?" If an answer arrives within a reasonable time, profit: Anya has saved her far-from-cheap mobile traffic. It is important to change the MAC address to a random one, and not to "shout" when there are too few nodes in sight and the asker could be identified visually.
The reasonable response time is determined based on the cost of traffic.
A photo on a social network can be represented as a blob containing the hash and address of the actual image (possibly in several sizes), as well as the list of comments and likes. Such a blob can itself be treated as an auxiliary file: cached and passed from peer to peer.
Moreover, a photo album also turns into a blob easily: a list of image hashes plus a list of photo-blob hashes (the former lets the photos display immediately, with likes, comments, and other meta-information filled in as it arrives).
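A minimal sketch of such blobs, with hypothetical field names (the article does not fix a schema). Serializing to canonical JSON before hashing ensures the same blob always gets the same identifier, so blobs can themselves be cached and exchanged by hash like any other file.

```python
import hashlib
import json

def blob_id(blob: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so that the same
    # blob always hashes to the same identifier.
    raw = json.dumps(blob, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha1(raw).hexdigest()

# Stand-in image data; real code would hash the actual JPEG bytes.
photo_bytes = {"small": b"...small jpeg...", "large": b"...large jpeg..."}

photo_blob = {
    "image": {size: hashlib.sha1(data).hexdigest()
              for size, data in photo_bytes.items()},
    "comments": ["nice shot!"],
    "likes": 3,
}

album_blob = {
    "image_hashes": sorted(photo_blob["image"].values()),  # for instant display
    "photo_blob_hashes": [blob_id(photo_blob)],            # meta arrives later
}
```

A like or comment produces a new photo blob with a new hash; the "supersedes blob such-and-such" field mentioned below is what would link the new version to the old one.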
All that remains is to add electronic signatures and a "supersedes blob such-and-such" field, and a cozy mesh social network is ready.
Ideally, hashes should be written not in hexadecimal but in a system with a larger base (we did decide to save traffic, after all). Another idea is a magnet
attribute containing a magnet link: cheap and cheerful, already standardized, and it also lets you list several classic source addresses, which matters under blanket blocking and in cases where the browser knows that traffic to different servers is billed differently.
The hash of a received file may fail to match the required one. In that case it would be reasonable to provide meta tags telling the browser whether to use such a file anyway (by default, no) and whether to report the incident to the server (by default, no).
In some cases, any of several files with different hashes will do. For example, a site uses minified jQuery, but if the non-minified version is already in the browser cache, what prevents using it?
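That could look like a cache lookup over a list of acceptable digests in the page author's order of preference — a small sketch, with made-up stand-in file contents instead of the real jQuery builds:

```python
import hashlib

def pick_from_cache(cache, acceptable):
    """Return the first acceptable variant found in the cache, or None.
    `cache` maps hash -> file bytes; `acceptable` lists digests in the
    page author's order of preference (e.g. minified first)."""
    for digest in acceptable:
        if digest in cache:
            return cache[digest]
    return None

# Stand-ins for jquery.min.js and jquery.js.
minified = b"!function(e,t){/* ... */}"
full = b"(function( global, factory ) {/* ... */})"

acceptable = [hashlib.sha1(minified).hexdigest(),
              hashlib.sha1(full).hexdigest()]

# Only the full build happens to be cached: still no network request needed.
cache = {hashlib.sha1(full).hexdigest(): full}
```

With only the full build cached, `pick_from_cache(cache, acceptable)` returns it and the network is never touched.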
Many devices work in two modes: when the Internet is effectively unlimited (say, a phone on Wi-Fi) and when it is constrained (a traffic cap or a narrow channel). A browser, or an extension to it, could use the unlimited connection to pre-download popular libraries (jQuery and its plugins, for instance) and keep them updated. Isn't having jQuery effectively bundled with the browser what many have dreamed of?
The proposed improvement matters, since the battle to optimize site load times is in full swing. Small and medium sites will benefit most from having shared libraries (and perhaps some frequently used images) in the cache. Mobile traffic consumption will drop, which matters given the limited bandwidth of cellular channels. Large sites can also reduce the load on their servers if the mesh techniques are implemented.
Thus, supporting the proposed technology benefits webmasters, whose sites will load faster; browser vendors, whose browsers will render pages faster; and providers, whose bandwidth consumption will decrease (not by much, but still).
P.S.
I would be very glad to hear the opinions of Mithgol , Shpankov and BarakAdama .
P.P.S.
Omniscient Habr, to which "Sportloto" should one send an improvement proposal like this?
Source: https://habr.com/ru/post/305898/