
Persistent, unblockable cookies using HTTP headers

Last week, a study made headlines by claiming that the analytics firm KissMetrics was tracking users across sites using unique ETag header values. KissMetrics denied using ETags and eventually sued the authors of the study (see the update at the end of the article).

Using the ETag header (short for "entity tag") to track users is a known technique that affiliate networks have used since the beginning of the last decade. It is also known that the Last-Modified header can, in theory, be used to track users by giving each one a unique modification time.

What few people seem to know, however, is that the Last-Modified header can carry any string as its value; the value does not have to be a valid date.

This is best shown with an example. The example uses my demo page, which sets different values for the last-modified date. If you refresh the page after the value has been set, you will see that the browser sends the value back.
First request

The first request to the server looks like this. Nothing unusual happens.

 GET /tracking-cookie HTTP/1.1
 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
 Accept-Encoding: gzip,deflate,sdch
 Accept-Language: en-US,en;q=0.8
 Cache-Control: max-age=0
 Host: nikcub.appspot.com
 User-Agent: Mozilla/5.0


Server response: setting the token

The server responds and sets a unique identifier (in this case, a UUID) as the value for Last-Modified:

 HTTP/1.0 200 OK
 Server: Dev/1.0
 Date: Sat, 19 August 2011 7:48:25 GMT
 Content-type: text/html; charset=utf8
 Last-Modified: d5ee23de-ca05-11e0-ab0b-c336b05508a0
 Cache-Control: no-cache
 Content-Length: 1634


Note that when this header is used for caching in the normal way, the value is a standard time string:

 Last-Modified: Sat, 29 Oct 1994 19:43:31 GMT

Subsequent requests

The browser will now send this token with every request to the same URI, in the If-Modified-Since header. The browser is asking, "If this resource has been modified since this date, send it to me," but what it sends is the unique identifier, not a date.

 GET /tracking-cookie HTTP/1.1
 Host: nikcub.appspot.com
 Connection: keep-alive
 Cache-Control: max-age=0
 User-Agent: Mozilla/5.0
 If-Modified-Since: d5ee23de-ca05-11e0-ab0b-c336b05508a0
 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
 Accept-Encoding: gzip,deflate,sdch
 Accept-Language: en-US,en;q=0.8
 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3


This works even if you close the browser and open it again, and it works in all major browsers. The ETag-based method does not always work, especially when web proxies are in the way, but the Last-Modified method always does.
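
The exchange above can be sketched from the server's side in a few lines of Python. This is only an illustration under stated assumptions, not the code behind my demo page: the function name `handle_request`, the dict-based header interface, and the host `tracker.example` are all hypothetical. The point it shows is that the server never has to treat the value as a date; it only has to hand back whatever the browser echoed.

```python
import uuid


def handle_request(request_headers):
    """Hypothetical tracking endpoint; returns (status, headers, visitor_id)."""
    # Header names are case-insensitive, so normalize before the lookup.
    incoming = {name.lower(): value for name, value in request_headers.items()}
    token = incoming.get("if-modified-since")
    if token is None:
        # First visit: nothing was echoed back, so mint a new identifier.
        token = str(uuid.uuid1())
    response_headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Cache-Control": "no-cache",
        # The token goes where a date is expected. It is returned unchanged
        # on later visits so the browser keeps echoing the same value.
        "Last-Modified": token,
    }
    return 200, response_headers, token


# First visit: the browser has nothing cached for this URI.
_, first_headers, first_id = handle_request({"Host": "tracker.example"})

# Later visit: the browser echoes the cached Last-Modified value.
_, _, second_id = handle_request({
    "Host": "tracker.example",
    "If-Modified-Since": first_headers["Last-Modified"],
})
print(first_id == second_id)  # prints True
```

Always answering with a full 200 response, rather than 304 Not Modified, is a choice made for this sketch so that the header is set again on every visit.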

Solutions

The problem with these methods is that they bypass the user and software security settings associated with cookies. You can block all cookies, but ETag, Last-Modified and similar methods will still allow your browser to be tracked.

The specification says that the Last-Modified value should be a date, with a note about the problems that unsynchronized clocks can cause. Most library implementations simply send the value back without validating it, not least because date parsing is a real headache. Browsers do the same, which is what causes the problem described here. In effect, Last-Modified works like a cookie, but without any of the security checks.

I will file a bug report with all the open source browsers asking them to parse dates correctly. This is not a complete solution, since users can still be tracked with unique dates. Perhaps a solution can be found in rounding the date to the hour, or in other basic checks on the validity of the date. Otherwise there is no solution other than clearing and disabling the cache, and even then some browsers still make conditional GET requests within a browser session.
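
As a rough illustration of the kind of check I have in mind, here is a minimal Python sketch. It is not taken from any browser; the function name `sanitize_http_date` is hypothetical, and truncating down to the start of the hour is just one way to read "rounding". A value that does not parse as a date is dropped, and a value that does parse is coarsened so that it carries less identifying information.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def sanitize_http_date(value):
    """Return a coarsened HTTP date string, or None if value is not a date."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Not a date at all (for example a UUID): do not echo it back.
        return None
    if parsed.tzinfo is None:
        # Assumption for this sketch: treat zone-less values as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Normalize to UTC and truncate to the start of the hour.
    coarse = parsed.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )
    return format_datetime(coarse, usegmt=True)


print(sanitize_http_date("d5ee23de-ca05-11e0-ab0b-c336b05508a0"))  # prints None
print(sanitize_http_date("Sat, 29 Oct 1994 19:43:31 GMT"))
# prints Sat, 29 Oct 1994 19:00:00 GMT
```

A client applying this to the stored value before sending If-Modified-Since would still leak the hour, so a determined tracker could spread identifiers across many distinct hours; the sketch narrows the channel rather than closing it.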

You can see the problem for yourself on my demo page.

Addendum: Parley, the privacy plugin I am working on, will solve this problem, since it blocks requests to all third-party sites. I am thinking of adding date parsing to it as well. There is a lot of work to do when I pick this project up again (plugins can only do so much, and it is sometimes tempting to fork WebKit and build a secure browser of my own).

Update: KissMetrics did not sue the authors of the study. It filed a counterclaim against the law firm, and that firm's clients, who had filed a lawsuit against KissMetrics. The author of the study, Ashkan Soltani, has nothing to do with these suits.

Source: https://habr.com/ru/post/126643/

