
If you collect and interpret HTTP headers correctly, you can learn a lot about the device, and perhaps about the user as well. In this article I will describe how we use the information in HTTP headers at Wapstart.
Disclaimer: this article is an overview. Some of it may seem too obvious to the community.
I'll start with the basics:
As a rule, interaction on the web takes place over the HTTP protocol.
The minimal valid HTTP request using the GET method looks like this:
GET / HTTP/1.0\r\n\r\n
or like this:
GET / HTTP/1.1\r\nHost: wapstart.ru\r\n\r\n
A header is a pair: a field name and its value, separated by a colon. See the RFC for details.
As a rule, browsers send some additional headers, which may or may not be described in the RFC. :)
The User-Agent header is almost always sent, and when the request passes through a proxy, the Via or X-Forwarded-For headers are added as well. Strictly speaking, the RFC does not forbid sending your own headers; it simply says that a recipient should ignore headers it does not recognize.
For example, this request is still valid:
GET / HTTP/1.1\r\nHost: wapstart.ru\r\nUser-agent: dovg\r\nx-ololo: trololo\r\nx-habrauser: dovg\r\n\r\n
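To make the "name, colon, value" structure concrete, here is a minimal Python sketch that splits a raw request like the one above into its request line and headers. The function name parse_request is made up for this illustration, and the parser is deliberately simplified: it ignores the body and keeps only the last value when a header is repeated.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into the request line parts and a header dict."""
    # Everything before the first empty line is the request line plus headers.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        # Split on the first colon only: values may contain colons themselves.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


raw = (
    b"GET / HTTP/1.1\r\n"
    b"Host: wapstart.ru\r\n"
    b"User-agent: dovg\r\n"
    b"x-ololo: trololo\r\n"
    b"x-habrauser: dovg\r\n"
    b"\r\n"
)
method, target, version, headers = parse_request(raw)
```

The custom x-ololo and x-habrauser headers are parsed exactly like the standard ones; whether anything acts on them is up to the application.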
Headers will most likely reach your application in the form defined in the RFC on the CGI protocol (section 4.1).
Roughly speaking, the protocol prefix (HTTP) is added, the name is converted to upper case, and hyphens are replaced with underscores: x-habrauser becomes HTTP_X_HABRAUSER, for example. Values are not changed.
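The renaming rule is simple enough to write down directly. The sketch below is an illustration in Python, not code from our stack; the names to_cgi_name and to_cgi_env are hypothetical.

```python
def to_cgi_name(header_name: str) -> str:
    """Convert an HTTP header name to its CGI-style variable name."""
    # Prefix with HTTP_, upper-case the name, replace hyphens with underscores.
    return "HTTP_" + header_name.upper().replace("-", "_")


def to_cgi_env(headers: dict) -> dict:
    """Apply the renaming to every header; values are passed through unchanged."""
    # Note: real CGI servers expose Content-Type and Content-Length as
    # CONTENT_TYPE and CONTENT_LENGTH (no HTTP_ prefix); this sketch skips that.
    return {to_cgi_name(name): value for name, value in headers.items()}


env = to_cgi_env({"User-agent": "dovg", "x-habrauser": "dovg"})
```

One consequence of the rule is that x-habrauser and X_Habrauser map to the same variable, so the original spelling of a header name cannot be recovered from the application side.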
In the real world, Opera Mini and the stock browser on Nokia phones add a lot of additional headers.
Let's return to our tasks.
We work in mobile web advertising, so one of our priorities is to separate what we loosely call "mobile users" from non-mobile ones.
Of course, this task cannot be solved with 100% accuracy, because the information is generated on the client side, and as we know, no user data can be trusted.
In addition to determining whether a user is "mobile", we want to know the following:
- information about the user's device (screen resolution, Wi-Fi support, etc.);
- the operating system running on the device;
- the browser (application) from which the request is made.
This data allows us to show advertising that is better targeted and therefore more interesting to the user.
Let's start with the first goal: how to determine whether the user is "mobile". Some time ago we wrote a script for this.
The implementation is in PHP, but the algorithms it uses are so trivial that the script could even be ported to an nginx config. We did, by the way, have the idea of doing this at the nginx level, but we never got around to implementing it.
I will not give the code in this article; it is on GitHub. One important remark: you cannot rely on the User-Agent alone!
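To illustrate why more than one header should be consulted, here is a minimal Python sketch of such a check. It is not the script from GitHub: the function name looks_mobile, the token list and the header list are assumptions chosen for this example, and a real list would be much longer.

```python
# Illustrative values only; these lists are assumptions, not the real script's data.
MOBILE_UA_TOKENS = ("opera mini", "midp", "symbian", "iphone", "android", "windows ce")
MOBILE_ONLY_HEADERS = ("HTTP_X_WAP_PROFILE", "HTTP_X_OPERAMINI_FEATURES")
WAP_MIME_TYPES = ("application/vnd.wap.xhtml+xml", "text/vnd.wap.wml")


def looks_mobile(env: dict) -> bool:
    """Guess whether a request came from a mobile device, given CGI-style headers."""
    # 1. Some headers are practically only sent by mobile browsers or their proxies.
    if any(name in env for name in MOBILE_ONLY_HEADERS):
        return True
    # 2. WAP content types in Accept are another strong hint.
    accept = env.get("HTTP_ACCEPT", "").lower()
    if any(mime in accept for mime in WAP_MIME_TYPES):
        return True
    # 3. Only then fall back to substrings of the User-Agent.
    user_agent = env.get("HTTP_USER_AGENT", "").lower()
    return any(token in user_agent for token in MOBILE_UA_TOKENS)
```

With this ordering, a request whose User-Agent has been replaced is still classified as mobile when it carries a header such as HTTP_X_OPERAMINI_FEATURES.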
For the other tasks, we built the gdi (Get Device Info) database, which can currently return information about the device, OS and browser from a set of HTTP headers.
From our side, the interface looks like this (a request followed by the response):
get header:HTTP_ACCEPT_ENCODING=gzip%2C+deflate&HTTP_USER_AGENT=Opera%2F9.80+%28J2ME%2FMIDP%3B+Opera+Mini%2F6.24093%2F27.1324%3B+U%3B+ru%29+Presto%2F2.8.119+Version%2F11.10&HTTP_X_OPERAMINI_FEATURES=advanced%2C+file_system%2C+camera%2C+touch%2C+folding%2C+routing&HTTP_USER_AGENT=LG+%23+KP500&HTTP_USER_AGENT=LG-KP500+Teleca%2FWAP2.0+MIDP-2.0%2FCLDC-1.1
VALUE header:HTTP_ACCEPT_ENCODING=gzip%2C+deflate&HTTP_USER_AGENT=Opera%2F9.80+%28J2ME%2FMIDP%3B+Opera+Mini%2F6.24093%2F27.1324%3B+U%3B+ru%29+Presto%2F2.8.119+Version%2F11.10&HTTP_X_OPERAMINI_FEATURES=advanced%2C+file_system%2C+camera%2C+touch%2C+folding%2C+routing&HTTP_USER_AGENT=LG+%23+KP500&HTTP_USER_AGENT=LG-KP500+Teleca%2FWAP2.0+MIDP-2.0%2FCLDC-1.1 0 234
O:12:"CuttedDevice":6:{s:5:"*id";i:3027;s:7:"*name";s:5:"KP500";s:10:"*deleted";b:0;s:9:"*parent";O:18:"CuttedDeviceParent":2:{s:5:"*id";i:23;s:7:"*name";s:2:"LG";}s:14:"*screenWidth";i:240;s:15:"*screenHeight";i:400;}
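Judging by the example, the key is the prefix header: followed by the headers as URL-encoded name=value pairs joined with &. Assuming that is all there is to it, such a key could be built and taken apart in Python roughly as follows; build_key and parse_key are hypothetical names, and the real client may order or filter headers differently.

```python
from urllib.parse import parse_qsl, urlencode

KEY_PREFIX = "header:"


def build_key(pairs):
    """Build a lookup key from (name, value) header pairs, keeping their order."""
    # urlencode escapes each value the same way as in the example above
    # (spaces become "+", "," becomes "%2C", "/" becomes "%2F").
    return KEY_PREFIX + urlencode(pairs)


def parse_key(key):
    """Decode a key back into a list of (name, value) pairs."""
    if not key.startswith(KEY_PREFIX):
        raise ValueError("not a header key: %r" % key)
    # A list of pairs rather than a dict, because names may repeat in a key.
    return parse_qsl(key[len(KEY_PREFIX):], keep_blank_values=True)


pairs = [
    ("HTTP_ACCEPT_ENCODING", "gzip, deflate"),
    ("HTTP_USER_AGENT", "LG-KP500 Teleca/WAP2.0 MIDP-2.0/CLDC-1.1"),
]
key = build_key(pairs)
```

Decoding into a list of pairs matters here: in the example key the same name occurs more than once, and a plain dictionary would silently drop all but one of the values.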
I hope you will learn more about how this interaction works from this talk at DevConf.
Known problems that we cannot solve:
- Apple devices (iPhone, iPad, iPod) send only the version of the operating system, not the model of the device. In other words, given an HTTP request from the standard browser, you cannot tell which iPhone it came from. In terms of the headers sent, a 3GS and a 4 look the same. Yes, we know that this can be solved with JavaScript.
- Some Opera Mini builds strip (or replace) all information about the phone.
And finally, some statistics on the headers in our database:
gdi=> select count(distinct name) from request; count
gdi=> select count(*) from request; count
gdi=> select name, count(value) as different_values from request group by name order by different_values desc limit 10; name | different_values