
Study: more than 400 popular websites record user sessions



Most popular, heavily visited sites run third-party analytics scripts that record which pages a user visits and which queries they type into the search box. But things have moved on: some companies now put scripts on their websites that record keystrokes, mouse movements, and even scrolling, along with the entire contents of the pages, and then send this data to third-party servers.

Unlike conventional analytics services, which provide aggregate statistics, these scripts record and replay individual browsing sessions, as if someone were looking over the user's shoulder. On some high-traffic sites, the software records every word the user types and the timing of each keystroke. Such scripts are called session replay scripts.
The stated purpose of collecting this data is to understand how users interact with web pages and to find pages that work poorly. However, these scripts collect far more data than users would expect from a site's user agreement. For example, if you start filling out a form on a site and then abandon it, what you typed is still recorded. Clipboard contents pasted by accident are recorded too.
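To see why abandoning a form does not help, consider a minimal sketch of how a replay script might buffer input events. This is an illustration under stated assumptions, not code from any vendor named in this article: the `SessionRecorder` class, its event format, and the in-memory "upload" are all hypothetical. The key point is that each keystroke is logged with a timestamp the moment it happens, so the text is already captured before the user decides whether to submit.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputEvent:
    """One recorded keystroke (hypothetical event format)."""
    timestamp_ms: int
    field_name: str
    value: str  # full contents of the field after this keystroke


@dataclass
class SessionRecorder:
    """Hypothetical recorder that buffers events and uploads them in batches."""
    events: List[InputEvent] = field(default_factory=list)
    uploaded: List[InputEvent] = field(default_factory=list)

    def on_input(self, field_name: str, value: str) -> None:
        # Called on every keystroke, long before any form submission.
        now_ms = int(time.time() * 1000)
        self.events.append(InputEvent(now_ms, field_name, value))

    def flush(self) -> None:
        # A real script would send this batch to a third-party server;
        # this sketch just moves it to an in-memory list.
        self.uploaded.extend(self.events)
        self.events.clear()


recorder = SessionRecorder()
# The user types a name into a form field, one character at a time...
typed = ""
for ch in "jane":
    typed += ch
    recorder.on_input("name", typed)
# ...then abandons the form. The periodic flush runs anyway.
recorder.flush()
print([e.value for e in recorder.uploaded])  # ['j', 'ja', 'jan', 'jane']
```

Because the recorder never waits for a submit event, "abandoning" the form changes nothing about what has already been collected.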

Companies use session replay scripts to understand how customers use their sites. The scripts do not run on every page, but they are often placed on pages where users enter sensitive information. Something similar has happened before: in 2013, Facebook users noticed that the social network recorded what they typed into the status update box, even if they never published the post.

Researchers from Princeton University, Steven Englehardt, Gunes Acar, and Arvind Narayanan, set out to assess how widespread this practice is and how legitimate it is. They studied the most popular services that let companies analyze user behavior with session replay scripts, including FullStory, SessionCam, Smartlook, UserReplay, Hotjar, and Yandex. The results showed that at least one of these scripts runs on 482 of the world's 50,000 most popular sites as ranked by Alexa.

Among the large companies using such scripts are the clothing retailer Bonobos, the US pharmacy chain Walgreens, and the investment firm Fidelity. The researchers note that the figure of 482 is likely an undercount. Some of these services record the actions not of every visitor but only of a small, statistically significant sample. The researchers' automated crawler may simply not have been selected for recording on some sites.

Companies that sell session replay scripts offer a number of redaction tools that let websites exclude sensitive content from recordings, and some explicitly prohibit the collection of such data. Even so, the use of session replay scripts on many of the world's most popular websites has serious privacy implications.

Passwords sometimes end up in the recordings, even though the scripts are not supposed to collect them. This happens most often on mobile sites, which sometimes use ordinary text fields for passwords so that the characters stay visible. The researchers found that other personal information was also often left unredacted or only partially redacted. Protections vary by provider. Two companies, UserReplay and SessionCam, replace all user input with masking text of equivalent length, while FullStory, Hotjar, and Smartlook by default mask input only in fields of certain types.
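The difference between the two default behaviors can be sketched as follows. This is a hypothetical illustration, not any vendor's actual code: `mask_all` stands in for the "mask everything" approach attributed to UserReplay and SessionCam, and `mask_by_type` for type-based masking. The set of sensitive field types (`SENSITIVE_TYPES`) is an assumption. The example shows why a password shown in an ordinary text field slips through type-based masking.

```python
from dataclasses import dataclass


@dataclass
class FormField:
    name: str
    input_type: str  # the HTML "type" attribute, e.g. "text" or "password"
    value: str


def mask_all(f: FormField) -> str:
    """Mask every input with text of equivalent length."""
    return "*" * len(f.value)


# Assumed set of field types that a type-based policy masks by default.
SENSITIVE_TYPES = {"password"}


def mask_by_type(f: FormField) -> str:
    """Mask only fields whose type is considered sensitive."""
    if f.input_type in SENSITIVE_TYPES:
        return "*" * len(f.value)
    return f.value  # everything else is recorded verbatim


fields = [
    FormField("password", "password", "hunter2"),
    # A mobile site showing the password in a plain text field so it stays visible.
    FormField("password_visible", "text", "hunter2"),
]

for f in fields:
    print(f.name, mask_all(f), mask_by_type(f))
# password ******* *******
# password_visible ******* hunter2
```

Length-preserving masking still leaks how long each input was, but type-based masking leaks the input itself whenever a site uses an unexpected field type.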

Keyboard input is not the only problem. What a site displays on screen can also be personal information. The researchers found that none of the companies redacts displayed content automatically by default; everything on the user's screen leaks.

For example, the researchers examined the Walgreens site, which runs FullStory's script. Although Walgreens.com does extensively redact user input, medical information such as symptoms and prescriptions is still collected by the session replay script, along with users' real names.


Requesting a prescription for the antidepressant Zoloft on Walgreens.com. While the request is being created, the name of the selected drug is passed to the FullStory script. The user's name, the doctor's name, and the dosage are redacted here (highlighted in red). However, the user's full name has already leaked in another dialog (not shown in this image), which lets anyone with access to the recording associate this prescription with the user's identity.


In a dedicated section of the Walgreens site, users can record their health history, which may include other prescriptions. During this process, most of the user's personal and medical information is excluded from the FullStory recording through manual redaction. However, the selected medications and health conditions are still captured, as shown above.


During registration, Walgreens asks users to verify their identity by answering a standard set of questions. The answer choices, which can reveal the user's personal information, are displayed on the page and therefore sent to FullStory. In addition, FullStory's mouse tracking will most likely reveal which answer the user picked, even though the selection itself is redacted. Including this data in recordings directly contradicts the statement at the top of the page: "Walgreens does not save this data and cannot access it or view your answers."
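Why mouse tracking can undo input redaction is easy to see with a small hit-testing sketch. It assumes, hypothetically, that a recording contains the on-screen rectangle of each displayed answer choice and the coordinates of the user's click; the `Box` class, `infer_choice` function, and the example answers are illustrative, not FullStory's data format. If the choices themselves are visible in the recording, matching the click to a rectangle reveals the answer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """On-screen rectangle of one displayed answer choice, in page pixels."""
    label: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def infer_choice(choices: List[Box], click_x: int, click_y: int) -> Optional[str]:
    """Return the label of the choice under the recorded click, if any."""
    for box in choices:
        if box.contains(click_x, click_y):
            return box.label
    return None


# Hypothetical layout: the answer text is captured with the page content,
# even though the value of the selected radio button is masked.
choices = [
    Box("123 Oak St", left=40, top=200, width=300, height=30),
    Box("456 Elm Ave", left=40, top=240, width=300, height=30),
    Box("None of the above", left=40, top=280, width=300, height=30),
]
print(infer_choice(choices, click_x=120, click_y=252))  # 456 Elm Ave
```

Redacting the form value alone is therefore not enough; the displayed options and the pointer data would both have to be excluded.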

Finally, the study's authors are concerned that analytics companies are an attractive target for attackers. Site owners view recordings of user activity in dashboards. But the Yandex, Hotjar, and Smartlook dashboards deliver these recordings over plain HTTP, even for actions on HTTPS pages. Worse, Yandex and Hotjar deliver the content of the recorded site's pages over HTTP, including pages originally served over HTTPS. This leaves room for man-in-the-middle attacks.

In an email to a Motherboard journalist, a Yandex representative said the company tries to use HTTPS wherever it can and that HTTP delivery would be removed in future updates. “HTTP is used deliberately, because session recordings load sites in an iframe. Unfortunately, loading HTTP content on HTTPS sites is prohibited at the browser level, so the player has to run over HTTP to support HTTP sites for this feature,” the statement says.
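The browser rule the Yandex representative refers to is the mixed-content policy: a page served over HTTPS may not load an `http://` iframe, while an HTTP page may embed either scheme. The sketch below is simplified and assumes only the URL schemes matter (real browsers have exceptions, for example for `localhost`); `iframe_allowed`, `player_scheme_needed`, and the example URLs are hypothetical. It shows why a replay player that embeds recorded HTTP sites ends up being served over HTTP itself.

```python
from urllib.parse import urlparse


def iframe_allowed(player_url: str, recorded_page_url: str) -> bool:
    """Simplified mixed-content check for an <iframe>.

    An https:// page may not embed an http:// iframe;
    an http:// page may embed either scheme.
    """
    player_scheme = urlparse(player_url).scheme
    frame_scheme = urlparse(recorded_page_url).scheme
    return not (player_scheme == "https" and frame_scheme == "http")


def player_scheme_needed(recorded_page_url: str) -> str:
    """Scheme a replay player must use to embed the recorded page."""
    if urlparse(recorded_page_url).scheme == "http":
        return "http"  # the downgrade the researchers flagged
    return "https"


print(iframe_allowed("https://player.example/replay", "http://shop.example/"))  # False
print(iframe_allowed("http://player.example/replay", "http://shop.example/"))   # True
```

The downside is that the whole player page, including replayed content from HTTPS sites, then travels unencrypted, which is what opens the door to the man-in-the-middle attacks described above.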

Is there any way to protect yourself from this kind of surveillance? The researchers found that the EasyList and EasyPrivacy block lists do not block the FullStory, Smartlook, or UserReplay scripts, but they do contain filter rules that stop Yandex, Hotjar, ClickTale, and SessionCam from collecting data. UserReplay lets companies disable data collection for users whose browsers send the Do Not Track HTTP header.
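Honoring Do Not Track comes down to checking one request header: browsers with the setting enabled send `DNT: 1`. The sketch below is a server-side version of that check. `should_record` and the `honor_dnt` flag are hypothetical names; the flag models the fact that, according to the study, honoring DNT is something UserReplay lets companies enable.

```python
from typing import Mapping


def should_record(headers: Mapping[str, str], honor_dnt: bool) -> bool:
    """Decide whether to start a session recording for this request.

    `honor_dnt` models a hypothetical per-site setting exposed by the
    replay vendor to its customers.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    if honor_dnt and normalized.get("dnt") == "1":
        return False
    return True


request_headers = {"User-Agent": "ExampleBrowser/1.0", "DNT": "1"}
print(should_record(request_headers, honor_dnt=True))   # False
print(should_record(request_headers, honor_dnt=False))  # True
```

A script running inside the page could read the same preference from `navigator.doNotTrack` instead of the request header.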

Update as of November 22, 2017, 13:35

A Yandex representative added: “In fact, we use only HTTPS to deliver data from users to Metrica. The issue here is replaying visits inside the Metrica interface itself. In this playback scenario, we are forced to use HTTP because of browser restrictions on loading HTTP content on HTTPS sites. This worries us a lot too, and it was one of the drivers for creating a new version that uses a completely different approach and that we plan to make HTTPS-only.”

Source: https://habr.com/ru/post/357444/

