We are the developers of the Shop-Script online store engine, which, like it or not, was drawn into yesterday's story about Yandex indexing the private data of customers of many online stores. The article concerned stores running on the Shop-Script engine. I understand that I may draw a lot of angry criticism from those who believe the fault lies entirely with the engine developer. Still, I consider it necessary to state our position, and I will try to describe objectively what happened and discuss possible solutions.
Finding out about the situation yesterday was, of course, unexpected. The first thing we did was check whether all stores were affected. It turned out that not all were. We suspected that the problem might lie in stores that had installed some plugin or third-party server module, but this was not the case. All the stores whose pages with private customer data appeared in Yandex search results had one thing in common: the installed Yandex.Metrica code. Exactly as in the recent case with MegaFon.
Below: where the links to private pages came from, how they could get into the Yandex index, the scale of the problem, and possible solutions.
The problem was the following:
- In an online store based on Shop-Script, you can place an order without mandatory registration, that is, without entering a username and password.
- After an order is placed, the buyer receives an e-mail notification containing a direct link to a page with detailed information about the order: its status, the option to pay for it, and its processing history. Since the order was placed by an unregistered user, this page opens straight from the link in the notification e-mail (the link authenticates the user by a hash, and all parameters are passed, naturally, in a GET request). No password is requested, because no password exists in this context. Requiring registration just to place an order is, you will agree, inconvenient for the buyer, yet the buyer still needs to be shown the order history page.
- Yandex indexed such pages. More precisely, it indexed the pages that buyers had visited by clicking the links in the notification e-mails.
- Google and other search engines indexed the same pages after information about them appeared in the news feeds on Habr and became “public domain”.
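To make the mechanism above concrete, here is a minimal sketch of a hash-authenticated order link. It is not the Shop-Script implementation: the HMAC construction, the parameter names `order_id` and `hash`, the secret key, and the `shop.example` address are all assumptions chosen for illustration. The point it shows is that anyone who holds the URL passes the check, so the URL itself is the credential.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical per-store secret; a real store would load it from configuration.
SECRET_KEY = b"store-secret"


def order_token(order_id: int, email: str) -> str:
    """Derive the hash that goes into the e-mailed link (assumed scheme)."""
    message = f"{order_id}:{email}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def build_order_link(base_url: str, order_id: int, email: str) -> str:
    """Build the link sent to the buyer; everything travels as GET parameters."""
    query = urlencode({"order_id": order_id, "hash": order_token(order_id, email)})
    return f"{base_url}?{query}"


def link_is_valid(url: str, email_on_order: str) -> bool:
    """Check the hash from the URL against the e-mail stored with the order."""
    params = parse_qs(urlparse(url).query)
    try:
        order_id = int(params["order_id"][0])
        supplied = params["hash"][0]
    except (KeyError, ValueError):
        return False
    expected = order_token(order_id, email_on_order)
    # Compare as bytes in constant time; no password is involved anywhere.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

Because the check depends only on what is in the address, any party that records the visited address (a counter script, a toolbar, a proxy) can later open the same page.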
As an immediate fix, we added user authorization by last name. If the user follows the link from the order notification e-mail, we first ask for the last name, and only if it is entered correctly do we show the order information. You can of course say that this is not a very elegant solution either, but in this case the last name is the only parameter we could “rely” on. In addition, we added a redirect that “eats” all the “problematic” GET parameters at once, without handing them to Yandex.Metrica (more on this later in the post). The patch and the updated version are published in a blog on our site.
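A last-name check of this kind could look roughly like the sketch below. The function names and the normalization rules (trimming whitespace, ignoring case) are assumptions for illustration and are not taken from the published patch.

```python
import hmac
import unicodedata


def normalize_name(name: str) -> str:
    """Normalize a last name so that case and stray spaces do not matter."""
    return unicodedata.normalize("NFKC", name).strip().casefold()


def last_name_matches(entered: str, on_order: str) -> bool:
    """Return True only if the entered last name matches the one on the order."""
    expected = normalize_name(on_order).encode("utf-8")
    if not expected:
        # An order without a stored last name must never be unlocked this way.
        return False
    supplied = normalize_name(entered).encode("utf-8")
    return hmac.compare_digest(supplied, expected)
```

A last name is a weak secret, so a check like this raises the bar for a casual visitor arriving from a search result rather than replacing real authentication.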
Now the fun part. How could Yandex learn about addresses (URLs) that were sent only to the user personally by e-mail? Since all the affected online stores have one feature in common, the installed Yandex.Metrica code, it is easy to conclude that the addresses recorded by Yandex.Metrica went into the general Yandex index. These addresses were never published anywhere by a Shop-Script store, so it would be wrong to say that they were available in public sources.
We read Yandex's answer, with its five points on how Yandex can learn about a new page, but did not find the answer to this question there. Now, of course, it is obvious how the addresses “leaked”, and we will certainly take this into account in the future. (By the way, many thanks to the author of the first comment in the discussion “Why is everything” on the Yandex website for being constructive and giving good examples.)
The scale of the problem lies in the fact that passing authentication parameters as GET parameters, in an e-mail or anywhere else, is a very widespread practice. This means that cases like MegaFon's and ours can and will arise in the future with other sites, engines and services. This is obvious and only a matter of time. The MegaFon case drew considerable attention to this problem.
Take, for example, the common services for tracking parcels online by shipment number. The sender of the parcel is not registered on the site and simply receives the address of the tracking page, for example by e-mail. You can add the address to your bookmarks to check the status regularly (I do this all the time) or forward it to the recipient. At the same time, you cannot say that such a tracking system is built incorrectly a priori because it does not ask the user for credentials: in this case there is no username and password at all. You would not force the sender to create an account with Russian Post.
Examples of notifications where the user receives an e-mail link that leads directly into the account (with automatic authorization) are plentiful on the Internet. As far as I know, the Yandex My Circle service uses such a scheme to authorize users.
Now let's talk about the solution.
As in the MegaFon case, the developers' main omission was said to be an incorrectly formed (or missing) robots.txt file. I agree that correct directives in that file would have helped prevent the problem here. In a general sense, however, this is an incomplete solution, since robots.txt directives are advisory, not mandatory. It is good that search bots honor robots.txt today, but how it will be handled in the future is an open question. What if something in the bot breaks? Or the bot turns out not to be a search bot at all ...
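For illustration, here is what such a directive might look like and how a well-behaved crawler would interpret it, checked with Python's standard `urllib.robotparser`. The `/order_status/` path and the `shop.example` host are hypothetical, not actual Shop-Script URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that closes the private order pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /order_status/
"""


def allowed_for_crawler(url: str, user_agent: str = "Yandex") -> bool:
    """Tell whether a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `allowed_for_crawler("https://shop.example/order_status/?hash=abc")` is false and a catalog page is still allowed. Nothing enforces this on the server side, though: the file only helps against crawlers that choose to obey it.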
In this regard, the issue can be solved completely (in the most general case) only by requiring user authorization to view any private page. Where this is unacceptable, and the user must be shown some private information immediately after clicking a link with explicitly specified GET parameters (as in the shipment tracking example), it is technically reasonable to redirect in PHP, right after the click, to the same page without these parameters, remembering them in the session. This keeps the parameters away from the JS counter installed on the page (Yandex.Metrica, Google Analytics, whatever it is), and I would recommend that developers pay attention to this and add such redirects to their projects. However, this is also only a partial solution, because the address may still be noticed before the redirect by an installed toolbar or browser plugin. Or you can introduce additional authorization by indirect parameters (as we did with the last-name check) ...
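The redirect described above can be sketched as follows. The example is written in Python rather than PHP and is framework-neutral: the parameter names in `SENSITIVE_PARAMS`, the plain `dict` standing in for the session, and the function name are assumptions for illustration only.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of the GET parameters that authenticate the visitor.
SENSITIVE_PARAMS = {"order_id", "hash"}


def strip_sensitive_params(url: str, session: dict) -> Optional[str]:
    """Move sensitive GET parameters into the session.

    Returns the clean URL to redirect to, or None when the URL carries no
    sensitive parameters and the page can be rendered as is.
    """
    parts = urlsplit(url)
    kept = []
    found = False
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in SENSITIVE_PARAMS:
            session[key] = value  # remembered server-side, not in the address
            found = True
        else:
            kept.append((key, value))
    if not found:
        return None
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

The handler would answer the first request with an HTTP redirect to the returned address and render the page, counter code included, only on the second request, reading the order parameters from the session. The counter script then sees only the clean address.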
I am not making any accusations. However, I believe that Yandex should review how it adds new addresses to the general search index and abandon the practice of turning any address visited by a user into a public one, no matter by what clever means that address was recorded. This practice is, at the very least, unethical. And it would be better to do this as soon as possible, before the problem becomes too widespread.
Learn from the mistakes of others, and try to do it in time. Thank you for your attention.
UPD: It seems the problem is only gaining momentum. Now it is train tickets:
http://news.yandex.ru/yandsearch?cl4url=www.ria.ru%2Fsociety%2F20110725%2F407118103.html