
HTTP headers can tell a lot about your device


If you collect and interpret the headers correctly, you can say a lot about the device, and possibly about the user as well. In this article I’ll explain how we at Wapstart use the information in HTTP headers.
Disclaimer: this is an overview article, so some things may strike the community as too “Captain Obvious”.

I'll start with the basics:
on the web, client and server usually communicate over the HTTP protocol.

The minimal valid HTTP request using the GET method looks like this:
 GET / HTTP/1.0\r\n
 \r\n

or, for HTTP/1.1, which requires a Host header:
 GET / HTTP/1.1\r\n
 Host: wapstart.ru\r\n
 \r\n
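To make the wire format concrete, here is a small Python sketch that assembles the two minimal requests above as raw bytes. `build_request` is a hypothetical helper, not part of any library; for HTTP/1.1 it assumes the caller passes the mandatory Host header.

```python
def build_request(path="/", version="HTTP/1.1", headers=()):
    """Assemble a raw GET request: request line, header lines, empty line.

    `headers` is a sequence of (name, value) pairs; every line ends in CRLF.
    """
    lines = ["GET %s %s" % (path, version)]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    # Terminate the last line, then add the empty line that ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


http10 = build_request(version="HTTP/1.0")
http11 = build_request(headers=[("Host", "wapstart.ru")])
print(http10)  # b'GET / HTTP/1.0\r\n\r\n'
print(http11)  # b'GET / HTTP/1.1\r\nHost: wapstart.ru\r\n\r\n'
```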

Each header is a pair: a field name and its value, separated by a colon. See the RFC for details.
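A minimal sketch of splitting a raw request head into its request line and (name, value) pairs. The function name is hypothetical; it splits each line at the first colon only, because values may themselves contain colons, lower-cases the names because field names are case-insensitive, and does not handle obsolete multi-line header folding.

```python
def parse_request_head(raw):
    """Split a raw request head into the request line and (name, value) pairs.

    Names are lower-cased; values are kept as sent, minus surrounding
    whitespace. Repeated names are preserved in order.
    """
    text = raw.decode("latin-1")
    head = text.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    headers = []
    for line in header_lines:
        name, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        headers.append((name.strip().lower(), value.strip()))
    return request_line, headers


raw = b"GET / HTTP/1.1\r\nHost: wapstart.ru\r\nUser-agent: dovg\r\n\r\n"
line, headers = parse_request_head(raw)
print(line)     # GET / HTTP/1.1
print(headers)  # [('host', 'wapstart.ru'), ('user-agent', 'dovg')]
```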
Browsers usually send some additional headers, which may or may not be described in the RFC. :)
The User-Agent header is almost always sent, and when a request goes through a proxy, Via or X-Forwarded-For headers are usually added as well. Strictly speaking, the RFC does not forbid sending your own headers; it simply says that headers the recipient does not recognize should be ignored.
For example, this request is still valid:
 GET / HTTP/1.1\r\n
 Host: wapstart.ru\r\n
 User-agent: dovg\r\n
 x-ololo: trololo\r\n
 x-habrauser: dovg\r\n
 \r\n
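Proxy headers are a good illustration of client-side data that cannot be trusted. By convention, X-Forwarded-For is a comma-separated list to which each proxy appends the address it received the request from, so the leftmost entry is only the *claimed* client. The sketch below assumes you know how many proxies directly in front of your server you control; `trusted_hops` and both function names are hypothetical.

```python
def forwarded_chain(xff):
    """Split an X-Forwarded-For value into its comma-separated entries."""
    return [part.strip() for part in xff.split(",") if part.strip()]


def client_address(xff, peer, trusted_hops=0):
    """Return the address that connected to our outermost trusted proxy.

    `peer` is the address of the TCP connection itself. Each of our own
    `trusted_hops` proxies appended exactly one entry, so we step that many
    entries back from the right. Anything further left was written by the
    client or by proxies we do not control, and may be forged.
    """
    chain = forwarded_chain(xff) + [peer]
    index = max(len(chain) - 1 - trusted_hops, 0)
    return chain[index]


xff = "10.0.0.1, 203.0.113.7"
print(client_address(xff, "198.51.100.2"))                  # 198.51.100.2
print(client_address(xff, "198.51.100.2", trusted_hops=1))  # 203.0.113.7
print(client_address(xff, "198.51.100.2", trusted_hops=2))  # 10.0.0.1
```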

Headers will most likely reach your application in the form defined by the CGI specification (RFC 3875, section 4.1).
Roughly speaking, each name gets a protocol prefix (HTTP_), is converted to upper case, and has its hyphens replaced with underscores: x-habrauser, for example, becomes HTTP_X_HABRAUSER. The values are left unchanged.
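The renaming rule just described fits in one line of Python. This is a sketch of that rule only: the helper names are hypothetical, and real servers also treat a few headers (such as Content-Type and Content-Length) specially, which is not modelled here.

```python
def cgi_meta_variable(header_name):
    """Map an HTTP header name to its CGI-style name: upper-case, '-' -> '_', 'HTTP_' prefix."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def to_cgi_environ(headers):
    """Build a CGI-style dict from (name, value) pairs; values are unchanged.

    Simplification: if a name repeats, the later value overwrites the earlier one.
    """
    return {cgi_meta_variable(name): value for name, value in headers}


print(cgi_meta_variable("x-habrauser"))  # HTTP_X_HABRAUSER
print(to_cgi_environ([("Host", "wapstart.ru"), ("x-habrauser", "dovg")]))
# {'HTTP_HOST': 'wapstart.ru', 'HTTP_X_HABRAUSER': 'dovg'}
```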
In practice, Opera Mini and the stock browser on Nokia phones add a lot of extra headers.
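Because Opera Mini renders pages on its own servers, its User-Agent describes Opera Mini itself, while the handset's own User-Agent usually arrives in a separate header, X-OperaMini-Phone-UA. A minimal sketch for picking the header most likely to describe the physical device; the priority order below is our assumption, built from header names that appear in the statistics at the end of this article, and is not Wapstart's actual rule set.

```python
# Assumed priority order (hypothetical): headers that may carry the handset's
# own UA come first; the generic User-Agent is the fallback.
DEVICE_UA_HEADERS = (
    "HTTP_X_OPERAMINI_PHONE_UA",
    "HTTP_X_DEVICE_USER_AGENT",
    "HTTP_USER_AGENT",
)


def device_user_agent(environ):
    """Return the first non-empty candidate UA from a CGI-style dict, or None."""
    for name in DEVICE_UA_HEADERS:
        value = environ.get(name, "").strip()
        if value:
            return value
    return None


environ = {
    "HTTP_USER_AGENT": "Opera/9.80 (J2ME/MIDP; Opera Mini/6.24093/27.1324; U; ru) "
                       "Presto/2.8.119 Version/11.10",
    "HTTP_X_OPERAMINI_PHONE_UA": "LG-KP500 Teleca/WAP2.0 MIDP-2.0/CLDC-1.1",
}
print(device_user_agent(environ))  # LG-KP500 Teleca/WAP2.0 MIDP-2.0/CLDC-1.1
```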

Back to our tasks.
We work in mobile web advertising, so one of our priorities is to separate, roughly speaking, “mobile” users from non-mobile ones.
Of course, this task cannot be solved with 100% accuracy: the information is generated on the client side, and, as we know, no user-supplied data can be trusted.
In addition to determining whether a user is “mobile”, we want to know the following:

This data lets us show more targeted, and therefore more interesting, advertising to the user.

Let's start with the first goal: how to tell whether a user is “mobile”. Some time ago we wrote a script for this.
It is implemented in PHP, but the algorithms are so trivial that the script could even be ported to an nginx config. In fact, we had the idea of doing this at the nginx level, but never got around to implementing it.
I won't reproduce the code in this article; it is on GitHub. An important remark: you cannot fully trust the User-Agent header alone!
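The actual script is on GitHub and is not reproduced here. Purely to illustrate why looking at several headers beats trusting the User-Agent alone, here is a hypothetical scoring sketch; the keyword list, the header list, and the threshold are our assumptions, not the rules of the Wapstart script.

```python
# Hypothetical signals; each one alone is weak, several together are stronger.
MOBILE_UA_KEYWORDS = ("midp", "cldc", "symbian", "opera mini", "android",
                      "iphone", "windows ce", "blackberry", "nokia")
MOBILE_ONLY_HEADERS = ("HTTP_X_WAP_PROFILE", "HTTP_X_OPERAMINI_PHONE_UA",
                       "HTTP_X_OPERAMINI_FEATURES")


def mobile_score(environ):
    """Count independent hints in a CGI-style dict that the client is mobile."""
    score = 0
    ua = environ.get("HTTP_USER_AGENT", "").lower()
    if any(keyword in ua for keyword in MOBILE_UA_KEYWORDS):
        score += 1
    # Headers that desktop browsers do not normally send.
    score += sum(1 for name in MOBILE_ONLY_HEADERS if environ.get(name))
    # WAP content types in Accept hint at a WAP-capable handset.
    if "vnd.wap" in environ.get("HTTP_ACCEPT", "").lower():
        score += 1
    return score


def is_mobile(environ, threshold=1):
    """Classify as mobile once the score reaches a (hypothetical) threshold."""
    return mobile_score(environ) >= threshold


# A desktop-looking UA is still caught by a UAProf header.
spoofed = {"HTTP_USER_AGENT": "Mozilla/5.0 (Windows NT 6.1)",
           "HTTP_X_WAP_PROFILE": "http://example.com/uaprof.xml"}
print(is_mobile(spoofed))  # True
```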

For the other tasks we built gdi (Get Device Info), a database that can currently determine the device, OS and browser from a set of HTTP headers.
Our interface to it speaks the memcache text protocol: the key is the URL-encoded set of headers, and the value is a serialized device object:
 get header:HTTP_ACCEPT_ENCODING=gzip%2C+deflate&HTTP_USER_AGENT=Opera%2F9.80+%28J2ME%2FMIDP%3B+Opera+Mini%2F6.24093%2F27.1324%3B+U%3B+ru%29+Presto%2F2.8.119+Version%2F11.10&HTTP_X_OPERAMINI_FEATURES=advanced%2C+file_system%2C+camera%2C+touch%2C+folding%2C+routing&HTTP_USER_AGENT=LG+%23+KP500&HTTP_USER_AGENT=LG-KP500+Teleca%2FWAP2.0+MIDP-2.0%2FCLDC-1.1
 VALUE header:HTTP_ACCEPT_ENCODING=gzip%2C+deflate&HTTP_USER_AGENT=Opera%2F9.80+%28J2ME%2FMIDP%3B+Opera+Mini%2F6.24093%2F27.1324%3B+U%3B+ru%29+Presto%2F2.8.119+Version%2F11.10&HTTP_X_OPERAMINI_FEATURES=advanced%2C+file_system%2C+camera%2C+touch%2C+folding%2C+routing&HTTP_USER_AGENT=LG+%23+KP500&HTTP_USER_AGENT=LG-KP500+Teleca%2FWAP2.0+MIDP-2.0%2FCLDC-1.1 0 234
 O:12:"CuttedDevice":6:{s:5:"*id";i:3027;s:7:"*name";s:5:"KP500";s:10:"*deleted";b:0;s:9:"*parent";O:18:"CuttedDeviceParent":2:{s:5:"*id";i:23;s:7:"*name";s:2:"LG";}s:14:"*screenWidth";i:240;s:15:"*screenHeight";i:400;}
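A client for such an interface can be sketched in Python. The key shape (`header:` plus URL-encoded pairs, with repeated names allowed) is inferred from the exchange above; the function names, host and port are hypothetical. Note that stock memcached limits keys to 250 bytes, so this sketch assumes a server that, like gdi apparently does, accepts longer keys. The returned value is the raw PHP-serialized object, which this sketch does not decode.

```python
import socket
from urllib.parse import urlencode


def build_gdi_key(headers):
    """Build a key shaped like the one above: 'header:' + URL-encoded pairs.

    `headers` is a list of (CGI name, value) tuples; a list rather than a dict
    keeps repeated names such as several HTTP_USER_AGENT entries.
    """
    return "header:" + urlencode(headers)


def parse_get_response(raw):
    """Parse a memcache text-protocol reply to a single-key 'get'.

    Returns the data block as bytes, or None on a miss (a bare 'END' line).
    """
    first_line, _, rest = raw.partition(b"\r\n")
    if first_line == b"END":
        return None
    parts = first_line.split()
    # Expected: VALUE <key> <flags> <bytes>
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % first_line)
    return rest[:int(parts[3])]


def lookup(host, port, headers, timeout=2.0):
    """Send one 'get' and return the raw value (hypothetical host/port)."""
    key = build_gdi_key(headers)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"get " + key.encode("ascii") + b"\r\n")
        buf = b""
        # Read until the terminating END line or until the server closes.
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_get_response(buf)


key = build_gdi_key([
    ("HTTP_ACCEPT_ENCODING", "gzip, deflate"),
    ("HTTP_USER_AGENT", "LG-KP500 Teleca/WAP2.0 MIDP-2.0/CLDC-1.1"),
])
print(key)
# header:HTTP_ACCEPT_ENCODING=gzip%2C+deflate&HTTP_USER_AGENT=LG-KP500+Teleca%2FWAP2.0+MIDP-2.0%2FCLDC-1.1
```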

I hope to describe this interaction in more detail in my talk at DevConf.

Known problems that we cannot solve:


And finally, some statistics on the headers in our database:
 gdi=> select count(distinct name) from request;
  count
 -------
    134
 (1 row)

 gdi=> select count(*) from request;
  count
 --------
  651655
 (1 row)

 gdi=> select name, count(value) as different_values from request group by name order by different_values desc limit 10;
            name            | different_values
 ---------------------------+------------------
  HTTP_USER_AGENT           |           648494
  HTTP_X_WAP_PROFILE        |              701
  HTTP_X_OPERAMINI_PHONE_UA |              698
  HTTP_VIA                  |              572
  HTTP_X_PROXY_ID           |              245
  HTTP_X_OPERAMINI_FEATURES |              184
  HTTP_X_OPERAMINI_PHONE    |              109
  HTTP_X_MSISDN             |               96
  HTTP_X_BLUECOAT_VIA       |               84
  HTTP_X_DEVICE_USER_AGENT  |               77
 (10 rows)
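A quick back-of-the-envelope reading of these numbers shows how thoroughly User-Agent dominates the variety. The arithmetic assumes that every row of `request` holds one (name, value) pair with a non-NULL value, so that the per-name counts add up to the total row count; the figures themselves are copied from the query output above.

```python
# Figures copied from the query output above.
total_rows = 651655
distinct_names = 134
top10 = {
    "HTTP_USER_AGENT": 648494, "HTTP_X_WAP_PROFILE": 701,
    "HTTP_X_OPERAMINI_PHONE_UA": 698, "HTTP_VIA": 572,
    "HTTP_X_PROXY_ID": 245, "HTTP_X_OPERAMINI_FEATURES": 184,
    "HTTP_X_OPERAMINI_PHONE": 109, "HTTP_X_MSISDN": 96,
    "HTTP_X_BLUECOAT_VIA": 84, "HTTP_X_DEVICE_USER_AGENT": 77,
}

ua_share = top10["HTTP_USER_AGENT"] / total_rows
# Rows left for the header names outside the top 10 (assumes no NULL values).
rest_rows = total_rows - sum(top10.values())

print("User-Agent share: %.1f%%" % (100 * ua_share))  # User-Agent share: 99.5%
print(rest_rows, "rows over", distinct_names - len(top10), "other names")  # 395 rows over 124 other names
```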

Source: https://habr.com/ru/post/139722/

