An innovative way of telling people and bots apart on the Internet raises a number of serious problems.
Surprisingly, a lot of effort goes into letting websites make sure that a visitor is not a robot. That is why, when you visit a site, you often run into CAPTCHA challenges: blurry photos of crosswalks, traffic lights and storefronts that you are asked to identify with a few mouse clicks.
The challenges come in many forms, from fuzzy letters that you must decipher and type into a field to branded slogans like “Comfort Plus” on the Delta website, as if modern air travel were not dystopian enough already. The most widespread, however, is Google's reCAPTCHA service, whose third version was released at the end of 2018. It aims to sharply reduce how much a user has to do on a site, and it achieves this by giving each visitor a hidden score based on how “human” their behavior looks. After all, the original purpose of CAPTCHA was to weed out bot accounts that flooded websites for less than honest ends.
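To make the idea of a hidden score more concrete, here is a minimal Python sketch of how a site's server might act on a reCAPTCHA v3 score. It assumes the JSON reply from Google's `siteverify` endpoint has already been fetched and parsed into a dict with `success`, `score` and `action` fields. The helper name `decide`, the three outcomes and the 0.5 cut-off are illustrative assumptions: reCAPTCHA only supplies the score, and each site picks its own policy.

```python
from typing import Any, Mapping

# Hypothetical cut-off: v3 scores run from 0.0 (very likely a bot) to 1.0 (very likely human).
DEFAULT_THRESHOLD = 0.5


def decide(verify_response: Mapping[str, Any], expected_action: str,
           threshold: float = DEFAULT_THRESHOLD) -> str:
    """Map an already-parsed siteverify reply to "allow", "challenge" or "reject".

    The policy is an assumption for illustration, not Google's own logic.
    """
    # The token could not be validated at all (expired, forged, reused, ...).
    if not verify_response.get("success", False):
        return "reject"
    # The token was issued for a different action than the one being performed.
    if verify_response.get("action") != expected_action:
        return "reject"
    score = float(verify_response.get("score", 0.0))
    # Low scorers get an explicit challenge instead of being blocked outright.
    return "allow" if score >= threshold else "challenge"


if __name__ == "__main__":
    print(decide({"success": True, "score": 0.9, "action": "login"}, "login"))
    print(decide({"success": True, "score": 0.2, "action": "login"}, "login"))
```

Falling back to a visible challenge rather than rejecting low scorers outright is one common design choice; it is exactly what makes users with deleted cookies or unusual browsers see more image puzzles.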
But the innovative system built by Google's engineers has a downside: to decide whether a visitor really is human, the new version tracks every move that visitor makes on a website.
Translated by Alconost
Source: Alexey Bezrodny / iStock / Getty Images Plus
Necessary improvement?
Before we look at how this new product works, it helps to understand where it came from. The new reCAPTCHA replaces a relatively old web technology that has been used for more than just protecting sites.
The CAPTCHA tool (short for “Completely Automated Public Turing test to tell Computers and Humans Apart”) first appeared in the late 1990s. It was developed by the team behind one of the first search engines, AltaVista. Until then, it was quite easy to write a bot that automatically signed up for a service and posted spam comments by the thousands. AltaVista's solution drew on a printer manual's recommendations for avoiding poor optical character recognition (OCR): CAPTCHA's characteristic blurry text was deliberately distorted so that a computer would struggle to read it while a person could read it easily, which made it possible to filter out bots.
By the early 2000s, these tests were everywhere. Then came reCAPTCHA, developed by researchers at Carnegie Mellon and acquired by Google in 2009. It used the same idea in a new way: while typing the verification text, users also deciphered words that software could not recognize. A program scans a text and flags the words it fails to read; reCAPTCHA then shows each such word next to a word whose answer is already known. The user's answer for the known word checks that they are human, and their answer for the unknown word helps identify it.
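The pairing scheme described above can be sketched as a toy Python model. It only illustrates the principle: grade the user on the known (control) word, and if they pass, count their reading of the unknown word as one vote. The class name, the scan IDs and the rule that three matching votes resolve a word are hypothetical; the real aggregation rules reCAPTCHA used are not described in this article.

```python
from collections import Counter, defaultdict


class WordPairChallenge:
    """Toy model of the original reCAPTCHA idea: one known word, one unknown word.

    VOTES_NEEDED is a hypothetical agreement threshold, not reCAPTCHA's real rule.
    """

    VOTES_NEEDED = 3

    def __init__(self) -> None:
        # Readings collected so far for each unreadable word, keyed by a scan ID.
        self.guesses = defaultdict(Counter)
        # Unknown words that enough users agreed on.
        self.resolved = {}

    def submit(self, control_answer: str, control_typed: str,
               unknown_id: str, unknown_typed: str) -> bool:
        """Return True if the user passes; record their reading of the unknown word."""
        # Only the control word can be graded, because its answer is already known.
        if control_typed.strip().lower() != control_answer.lower():
            return False
        # The user looks human, so their reading of the unknown word counts as one vote.
        self.guesses[unknown_id][unknown_typed.strip().lower()] += 1
        best, votes = self.guesses[unknown_id].most_common(1)[0]
        if votes >= self.VOTES_NEEDED:
            self.resolved[unknown_id] = best
        return True
```

A user who mistypes the control word is rejected and contributes nothing, so bots guessing at random cannot pollute the transcription.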
By 2011, Google had digitized the entire archive of The New York Times using only reCAPTCHA tests. Users transcribed newspaper scans one blurry word at a time, which ultimately made it possible to digitize the paper's back catalog and make it searchable. By building a handy tool to protect websites from bots, Google got people to do its own tedious work.
Building on that success, in 2014 reCAPTCHA switched to showing images from Google's Street View. After clicking the “I'm not a robot” checkbox, you may be asked to pick which of nine images contain “bicycles” or “street lights.” At the same time, Google reduced how often users had to take a test at all. This was made possible by behavioral analysis: reCAPTCHA can now run in the background and track how we use websites.
If your computer has a Google cookie, or if you use the mouse and keyboard on a page in a way that does not look like a bot, you won't be asked to take the Street View test. But some privacy-conscious users have complained that after they delete their cookies, or while they browse in incognito mode, the number of reCAPTCHA tests they are asked to pass rises dramatically.
Users have also noticed that browsers competing with Google Chrome, such as Firefox, are served more tests, which naturally raises the question: is reCAPTCHA being used to strengthen Google's dominance of the browser market?
This raises serious privacy concerns, especially since most of Google's revenue comes from advertising, a business that relies on tracking data. The worry is that reCAPTCHA is essentially an ad-tracking tool hiding on ordinary websites, much like Facebook's Like button embedded in web pages.
Google's point of view
To use the latest version of reCAPTCHA, developers are encouraged to add its tracking tags to as many pages of their website as possible, which gives the system a fuller picture of what users do. But reCAPTCHA does not exist in a vacuum: there is also Google Analytics, a platform that helps developers and marketers understand how visitors use a website. It is an excellent tool, used on more than 100,000 of the 1 million most visited websites, but it is also part of a strategy for tracking people's habits across the Internet.
The new version of reCAPTCHA fills in the missing pieces of this picture and lets Google reach even further, onto sites that do not use Google Analytics. In response to such concerns, the company told Fast Company that it would not use reCAPTCHA user data for advertising and that the data collected is used to improve the service.
But this data stays locked in a black box, even for the developers who deploy the technology on their own sites.
The reCAPTCHA documentation says nothing about user data, how users may be tracked, or where the resulting information ends up; it only describes the practical implementation.
I asked Google for more detail on what commitments it makes about keeping reCAPTCHA independent of its advertising business in the long term: just because the two are not connected now does not mean they won't be in the future.
A Google spokesperson said that “reCAPTCHA may only be used to fight spam and abuse [of websites]” and that “the reCAPTCHA API works by collecting hardware and software information, such as device and application data, and sending this data to Google for analysis. Information collected in connection with your use of the service will be used to improve reCAPTCHA and for general security purposes. Google will not use reCAPTCHA for personalized ads.”
That is great, and I hope Google keeps its promise. The problem is that there is no guarantee it will stay that way. Introducing such a powerful tracking technology deserves close public scrutiny, because we have already seen how easily things can go wrong. In 2014, for example, WhatsApp was promised that it would remain independent and run separately from Facebook's back-end infrastructure, but only two years later that decision was reversed. When Google acquired Nest, it also promised independence, and it went back on that promise only five years later: device owners had to switch to a Google account or give up some features.
Google can build a service like reCAPTCHA thanks to its vast resources and reach, but that same reach is also a reason to suspect that things may take a turn for the worse.
Unfortunately, there is little we as users can do. You cannot opt out of reCAPTCHA: you must either accept the tracking or stop using the site you need. If you don't like airport body scanners, you can at least decline them and go through a manual search instead. But if a site uses reCAPTCHA, you have no way to opt out.
If Google intends to build such tools in the public interest rather than for its own profit, the company needs to find more convincing ways to assure everyone that it will not go back on its word when that becomes convenient. Open-sourcing the project (as it has done with many other products), spinning it out of the company, or at least establishing third-party oversight could be a good start toward winning users' trust.
About the translator
The article was translated by Alconost.
Alconost localizes games, apps and websites into 70 languages. Native-speaker translators, linguistic testing, a cloud platform with an API, continuous localization, 24/7 project managers, and support for any string resource format.
We also make advertising and training videos: sales and brand-image videos for websites, ads, tutorials, teasers, explainers, and trailers for Google Play and the App Store.