
HTTP headers: a lot of garbage?

HTTP is the main data transfer protocol of the World Wide Web (WWW).
By IT industry standards it was developed a long time ago: the version 1.0 specification was published in 1996, and the current version 1.1 in 1999 (RFC 2616).

The total volume of daily HTTP traffic is incalculable. The problem I see is that a noticeable share of the transferred data consists of HTTP headers, most of them unnecessary.
Some may say that this hardly matters in our age of high speeds and unlimited data plans, and they will be partly right. But there are other considerations.


What browsers send


Consider, for example, yandex.ru.
On the first load of the main page, the browser fetches 19 resources over HTTP, 44 KB in total (plus the headers, whose size depends on the browser). The Yandex engineers have clearly done a good job of optimizing the load. Still, by a rough estimate, about 10% of the transferred data (4-5 KB) is superfluous header information.
10% may seem insignificant, but it can be looked at from several sides.
For the user, the page loads 10% faster (greetings to the regions still on dial-up).
For the site owner, traffic drops by 10%, which also means serving 10% more users or, in the case of a cluster, buying 9 servers instead of 10.
For web developers it is downright insulting: you squeeze a PNG down to the smallest possible size, say 250 bytes, and the headers add about as much again on top.
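The arithmetic behind these figures can be sketched in a few lines of Python. The 10% saving, the 10 servers, and the 250-byte PNG come from the text above; the function names are hypothetical, and taking the headers to be exactly 250 bytes is an assumption made for illustration.

```python
import math


def servers_needed(current_servers: int, percent_saved: int) -> int:
    """Servers required after cutting traffic by percent_saved percent."""
    return math.ceil(current_servers * (100 - percent_saved) / 100)


def header_share(body_bytes: int, header_bytes: int) -> float:
    """Fraction of one transfer that is taken up by headers."""
    return header_bytes / (body_bytes + header_bytes)


# A 10-server cluster with 10% less traffic.
print(servers_needed(10, 10))    # 9

# A 250-byte PNG accompanied by (assumed) 250 bytes of headers.
print(header_share(250, 250))    # 0.5
```

For tiny resources the headers can weigh as much as the payload, which is why the overhead hurts most on pages built from many small files.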

Below are the headers sent by modern browsers (taken from requests for the yandex.ru main page). Further thoughts on the subject follow after them.

Firefox 3.0
GET / HTTP/1.1
Host: yandex.ru
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6
Accept: text/html, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8
Accept-Language: ru, en-us;q=0.7, en;q=0.3
Accept-Encoding: gzip, deflate
Accept-Charset: windows-1251, utf-8;q=0.7, *;q=0.7
Keep-Alive: 300
Connection: keep-alive


IE7
GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/xaml+xml, application/vnd.ms-xpsdocument, application/x-ms-xbap, application/x-ms-application, */*
Accept-Language: ru
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)
Host: yandex.ru
Connection: Keep-Alive


Safari 3.1
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-RU) AppleWebKit/525.19 (KHTML, like Gecko) Version/3.1.2 Safari/525.21
Accept-Encoding: gzip, deflate
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5
Accept-Language: en-RU
Connection: keep-alive
Host: yandex.ru


Opera 9
GET / HTTP/1.1
User-Agent: Opera/9.63 (Windows NT 5.1; U; en) Presto/2.1.1
Host: yandex.ru
Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Accept-Language: ru-RU, ru;q=0.9, en;q=0.8
Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Cache-Control: no-cache
Connection: Keep-Alive, TE
TE: deflate, gzip, chunked, identity, trailers


Chrome
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.48 Safari/525.19
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5
Accept-Encoding: gzip, deflate, bzip2, sdch
Accept-Language: en-US, ru, en-US, en
Accept-Charset: windows-1251, *, utf-8
Host: yandex.ru
Connection: Keep-Alive


This is roughly what our browsers send when loading each resource (HTML pages, JavaScript, CSS, images). Add to this the cookies, which are often present.
Because HTTP is a stateless protocol, the headers have to be repeated with every request.
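To get a feel for the cost of this repetition, the request size can be measured directly. The sketch below uses the Firefox 3.0 request quoted above and compares it with a five-line request; the helper names are hypothetical, and treating all 19 requests as identical is a simplification (real requests differ in path and usually carry cookies and a Referer).

```python
# Request lines as sent by Firefox 3.0 (from the listing above).
FIREFOX_REQUEST = [
    "GET / HTTP/1.1",
    "Host: yandex.ru",
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.6) "
    "Gecko/2009011913 Firefox/3.0.6",
    "Accept: text/html, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8",
    "Accept-Language: ru, en-us;q=0.7, en;q=0.3",
    "Accept-Encoding: gzip, deflate",
    "Accept-Charset: windows-1251, utf-8;q=0.7, *;q=0.7",
    "Keep-Alive: 300",
    "Connection: keep-alive",
]

# A stripped-down request that still asks for compression and keep-alive.
SHORT_REQUEST = [
    "GET / HTTP/1.1",
    "Host: yandex.ru",
    "Accept: */*",
    "Accept-Encoding: gzip, deflate",
    "Connection: keep-alive",
]


def request_size(lines: list) -> int:
    """Bytes on the wire: CRLF after every line plus the closing empty line."""
    return len(("\r\n".join(lines) + "\r\n\r\n").encode("ascii"))


def extra_bytes_per_page(full: list, short: list, resources: int) -> int:
    """Extra request bytes per page if every resource repeats the same headers."""
    return (request_size(full) - request_size(short)) * resources


print(request_size(FIREFOX_REQUEST), request_size(SHORT_REQUEST))
print(extra_bytes_per_page(FIREFOX_REQUEST, SHORT_REQUEST, 19))
```

This counts request headers only; response headers add their own overhead on top.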

Meanwhile, strictly speaking, HTTP/1.1 requires only the request line and the Host header; a minimal practical ("valid") HTTP/1.1 request looks like this:
GET / HTTP/1.1
Host: yandex.ru
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive


An HTTP/1.0 request is smaller still:
GET / HTTP/1.0

but using it is impractical, because compression, proxies, and so on are not supported.
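As an illustration, both minimal requests can be assembled byte for byte and sent over a plain socket. This is only a sketch: `build_request` and `read_status_line` are hypothetical helper names, port 80 and the 5-second timeout are assumptions, and nothing here handles redirects, compression, or response bodies.

```python
import socket


def build_request(host: str, path: str = "/", version: str = "1.1") -> bytes:
    """Assemble the minimal GET request shown above for the given HTTP version."""
    lines = [f"GET {path} HTTP/{version}"]
    if version == "1.1":
        lines += [
            f"Host: {host}",
            "Accept: */*",
            "Accept-Encoding: gzip, deflate",
            "Connection: keep-alive",
        ]
    # Every line ends with CRLF; an empty line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def read_status_line(host: str, raw_request: bytes, port: int = 80) -> str:
    """Send the raw request and return the first line of the response."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(raw_request)
        first_chunk = sock.recv(4096)
    return first_chunk.split(b"\r\n", 1)[0].decode("ascii", "replace")


print(build_request("yandex.ru").decode("ascii"))
# Network call, left commented out:
# print(read_status_line("yandex.ru", build_request("yandex.ru")))
```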

Detailed header analysis


User-Agent has always been present historically and, in theory, lets the server determine the browser type. But for some reason, instead of a string like “MSIE 7.0.5730.13”, the browser sends an incredibly long string full of data that essentially no one uses.

Accept is an unnecessary header. Name even a dozen sites that filter incoming requests by this parameter (I cannot even think of where it might be useful). The idea is that it tells the server which file types the browser can handle. In practice, “*/*” is always included, which means the browser is ready to accept any file and makes the header meaningless.
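A small parser makes the point concrete. The sketch below splits an Accept value into media ranges with their q weights and checks whether a given type is acceptable; `parse_accept` and `accepts` are hypothetical names, and the matching is deliberately simplified (it ignores the rule that a more specific range overrides a wildcard).

```python
def parse_accept(header: str) -> list:
    """Return (media_range, q) pairs from an Accept header, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0]
        if not media_range:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        entries.append((media_range, q))
    # sorted() is stable, so equal weights keep their original order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def accepts(header: str, media_type: str) -> bool:
    """True if some range with q > 0 covers media_type (simplified matching)."""
    main_type = media_type.partition("/")[0]
    candidates = (media_type, f"{main_type}/*", "*/*")
    return any(rng in candidates and q > 0 for rng, q in parse_accept(header))


firefox_accept = "text/html, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8"
print(parse_accept(firefox_accept))
print(accepts(firefox_accept, "image/png"))   # True, because of */*
print(accepts("text/html", "image/png"))      # False
```

With “*/*” in the list, `accepts` returns True for any type, which is exactly why the header tells the server so little.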

Accept-Language makes sense as a way to choose the default language on multilingual sites. In practice, sites use this value extremely rarely, and even giants like Google determine the display language by geolocation (GeoIP).

Accept-Charset is also practically unused. In reality, browsers accept any charset, even ones that are not in the list.

Conclusion


It would be great if browser developers dropped the unnecessary headers from the HTTP requests they send, so as not to increase the entropy of the Internet more than necessary.
However, I suspect that stern supervisors will not allow such liberties.

The web developer has one way out: optimize. Without optimization the results can be deplorable (fetching 50 small images while loading a page causes a noticeable delay). You can learn more about optimization methods on webo.in, for example.

By the way, the information in this article may be useful to developers of web parsers (robots, crawlers) that must be indistinguishable from a real browser. I advise you to honestly send “User-Agent: crawler”, but sometimes you need to send a request that is identical to a real one.
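Both variants can be sketched with Python's standard urllib. The header values are copied from the Firefox 3.0 listing above; `make_request` is a hypothetical helper and the URL is only an example. Note that urllib normalises header names and sets its own Connection header, so the result approximates a browser request rather than reproducing it byte for byte; a raw socket would be needed for that.

```python
import urllib.request

# The honest variant recommended in the text.
HONEST_HEADERS = {"User-Agent": "crawler"}

# A browser-like variant, values taken from the Firefox 3.0 request above.
# Accept-Encoding is left out because urllib does not decompress responses itself.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.6) "
                  "Gecko/2009011913 Firefox/3.0.6",
    "Accept": "text/html, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8",
    "Accept-Language": "ru, en-us;q=0.7, en;q=0.3",
    "Accept-Charset": "windows-1251, utf-8;q=0.7, *;q=0.7",
}


def make_request(url: str, headers: dict) -> urllib.request.Request:
    """Prepare (but do not send) a GET request carrying the given headers."""
    return urllib.request.Request(url, headers=headers)


request = make_request("http://yandex.ru/", HONEST_HEADERS)
print(request.header_items())

# Sending is a network call, left commented out:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```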

Source: https://habr.com/ru/post/51379/

