
Browser history sniffing using favicon

The method lets a page determine which websites the user has visited by requesting those websites' icons. The idea came to me while a friend and I were discussing ways to analyze user behavior on his website. We talked about which visitor metrics are worth collecting and which are not. I thought it would be nice to know what other sites his visitors use. The old but well-known method based on CSS styles immediately came to mind.

That method relies on the DOM method window.getComputedStyle. Called on an HTMLAnchorElement, it let a page distinguish the visited state of links to popular sites from their ordinary, unvisited state.

That hole has long since been closed, and the method can no longer be used.
My method is based on the fact that the favicon.ico files of sites the user has visited will most likely be in his browser cache. They will therefore load faster than the icons of sites he has never been to. Browsers cache favicon.ico very aggressively, which only increases the reliability of this method.

Below is the full source code of a proof-of-concept implementation of this method. It can be used to demonstrate that you visit habrahabr.ru but have never been to hornet.com.

var diffTreshold = 200; // max load time (ms) still treated as a cache hit; chosen empirically
var body = document.querySelector('body');
var testResults = [];
var testCases = [
  'hornet.com',
  'habrahabr.ru'
];

testCases.forEach(test);

// Requests the host's favicon and records how long it took to load
function test (host) {
  var start = new Date();
  var img = new Image();
  img.src = 'http://' + host + '/favicon.ico';
  img.onload = function () {
    saveResult(host, start, new Date());
  };
  body.appendChild(img);
}

// Stores the load time and marks the host as visited if it loaded fast enough
function saveResult (host, start, end) {
  var diff = end - start; // elapsed milliseconds
  testResults.push({
    host: host,
    start: start,
    end: end,
    diff: diff,
    visited: diff <= diffTreshold
  });
}
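The proof of concept above reports results only through callbacks. It silently drops any host whose favicon fails to load, and it times loads with Date. The following is a minimal sketch of a promise-based variant. It assumes a browser with Promise and performance.now(), and it is not the author's code. The names measureFavicon and measureAll are hypothetical, and the optional createImage factory exists only so the timing logic can be exercised outside a browser.

```javascript
// Hypothetical promise-based variant of the proof of concept.
// createImage is an optional factory (defaults to new Image()) so the
// timing logic can also be exercised outside a browser.
function measureFavicon(host, createImage) {
  var makeImage = createImage || function () { return new Image(); };
  return new Promise(function (resolve) {
    var img = makeImage();
    var start = performance.now(); // monotonic clock, unaffected by system time changes
    function finish(loaded) {
      resolve({ host: host, diff: performance.now() - start, loaded: loaded });
    }
    // Unlike the original, failed loads are reported instead of dropped.
    img.onload = function () { finish(true); };
    img.onerror = function () { finish(false); };
    img.src = 'http://' + host + '/favicon.ico';
  });
}

// Starts all measurements at once (as the original forEach does) and
// resolves with the results in the same order as hosts.
function measureAll(hosts, createImage) {
  return Promise.all(hosts.map(function (host) {
    return measureFavicon(host, createImage);
  }));
}
```

In a browser, `measureAll(['hornet.com', 'habrahabr.ru']).then(console.log)` logs one record per host. Setting src on an Image object is enough to start the request, so appending the image to the page, as the original does, is not required.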


This code gives imprecise results because it relies on the diffTreshold constant, which is chosen empirically. The constant is the maximum number of milliseconds between the start and the end of an image download that should still be considered a browser cache hit.

A more accurate approach would set the threshold to the midpoint between the minimum and maximum image load times. For this to work, one of the tested links should point to a site icon that is known not to be in the cache. Anything that loads faster than this midpoint can then be considered a cache hit and, therefore, an indication that the user has been to that site.
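The midpoint rule described above can be sketched as a small helper. This is a minimal sketch that assumes each measurement is an object with a host and a load time diff in milliseconds, the same fields the proof of concept stores in testResults. The names classifyByMidpoint and controlFaviconUrl are hypothetical. To make sure the control icon is not cached, the sketch appends a unique query string, which gives the request a URL the browser cache has never seen.

```javascript
// Hypothetical helper: classify hosts using the midpoint between the fastest
// and slowest load times instead of a fixed diffTreshold.
// results: array of { host, diff } with diff in milliseconds; at least one
// entry should be a control icon known not to be cached.
function classifyByMidpoint(results) {
  var diffs = results.map(function (r) { return r.diff; });
  var min = Math.min.apply(null, diffs);
  var max = Math.max.apply(null, diffs);
  var midpoint = (min + max) / 2;
  return results.map(function (r) {
    return { host: r.host, diff: r.diff, visited: r.diff < midpoint };
  });
}

// Hypothetical helper: a unique query string makes the control URL one the
// cache has never stored, so loading it measures a cache miss.
function controlFaviconUrl(host) {
  var token = Date.now() + '-' + Math.random().toString(36).slice(2);
  return 'http://' + host + '/favicon.ico?nocache=' + token;
}
```

The control entry should be measured alongside the real hosts and passed in with them. The midpoint only separates hits from misses if at least one tested icon really is cached. If none are, it simply splits ordinary cache misses into two groups.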

This method also has one drawback that I have not solved: the first test itself puts every tested icon into the cache, so any subsequent run reports all sites as visited and its results are useless.
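One possible mitigation, which is not part of the original method and has its own limits, is to measure only once and remember the first results. The sketch below assumes two inputs. The first is a storage object with getItem/setItem, such as window.localStorage. The second is a runTests function that returns a promise of results. The names loadOrRun, runTests and the key faviconSniffResults are all hypothetical.

```javascript
// Hypothetical mitigation, not part of the original method: keep the results
// of the first run and reuse them, because later runs would find every
// tested icon already in the cache.
var STORAGE_KEY = 'faviconSniffResults'; // hypothetical key name

function loadOrRun(storage, runTests) {
  var saved = storage.getItem(STORAGE_KEY);
  if (saved !== null) {
    // A previous run already warmed the cache; its numbers are the only
    // meaningful ones, so return them instead of measuring again.
    return Promise.resolve(JSON.parse(saved));
  }
  return runTests().then(function (results) {
    storage.setItem(STORAGE_KEY, JSON.stringify(results));
    return results;
  });
}
```

This only helps on the same site and only while the stored value survives. It cannot detect the cache being warmed by some other page, and the saved answer goes stale if the user visits one of the tested sites after the first run.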

The very need for this kind of history sniffing may seem dubious. Still, owners of online stores or landing pages could use it to show users relevant advertising, based on knowing which competitor or partner sites they have already visited.

UPD:
I was surprised that this post was considered worthy of an invite. Apparently, this really is a very interesting topic. I will try to turn this method into a ready-to-use library.

Source: https://habr.com/ru/post/259073/

