Let's consider one of the most popular ways to collect information on the Internet, parsing, from a legal point of view.
Attention! This publication deals with some general legal issues related to parsing, but it is not legal advice. The article is a continuation of the publication "10 tools that allow parsing information from websites, including competitors' prices + legal assessment for Russia".
Parsing is the automated extraction of data from someone else's website. But is it really one of the most useful IT tools for data collection, or a trap that causes inevitable problems with the law? Parsing could well be one of the most effective ways to extract content across the entire network, but it comes with a reservation: this tool is very difficult to deal with from a legal point of view. Parsing is the process by which automated software retrieves website data by "combing" multiple pages. Search engines like Google and Bing do something similar when they index web pages; parsing engines go further and convert the information into a format that lets you actually use the data, loading it into databases or spreadsheets.
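To make that "combing and structuring" step concrete, here is a minimal sketch using only Python's standard library; the HTML fragment, class names, product names, and prices are all invented for illustration:

```python
from html.parser import HTMLParser

# Hypothetical page fragment a price parser might have downloaded (invented for this example).
PAGE = """
<ul>
  <li class="product"><span class="name">Kettle</span> <span class="price">1990</span></li>
  <li class="product"><span class="name">Toaster</span> <span class="price">2490</span></li>
</ul>
"""

class PriceExtractor(HTMLParser):
    """Collects (name, price) rows from <span class="name"> / <span class="price"> tags."""
    def __init__(self):
        super().__init__()
        self.rows = []       # structured result: list of dicts, ready for a DB or spreadsheet
        self._field = None   # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self.rows.append({"name": data.strip()})
        elif self._field == "price":
            self.rows[-1]["price"] = int(data.strip())
        self._field = None

extractor = PriceExtractor()
extractor.feed(PAGE)
print(extractor.rows)
# [{'name': 'Kettle', 'price': 1990}, {'name': 'Toaster', 'price': 2490}]
```

A real parser would first fetch the page over HTTP and would typically use a dedicated HTML library, but the idea is the same: turn free-form markup into rows you can load into a database or spreadsheet.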
Parsing is not the same as using an API. A company may expose an API precisely so that other systems can interact with its data; however, the quality and quantity of data available through an API are, as a rule, lower than what can be obtained by parsing. In addition, parsing yields more up-to-date information than an API and is much easier to set up structurally.
The applications of parsed information are numerous. A sports journalist can use parsing to examine baseball statistics for an article. In e-commerce, you can retrieve product names and prices from different sources for further analysis (a Russian example is xmldatafeed.com, an open service for parsing and monitoring competitors' prices).
And yet, although parsing is undoubtedly a powerful tool, difficulties may arise when it comes to legal issues. Because the process of parsing appropriates content originally created elsewhere for those who use this tool, ethical and legal complications arise.
To date, there is no clearly defined legal framework around parsing; the field is in constant motion, but we can roughly outline the areas of greatest risk. The most notable court cases, which took place in the United States and set precedents, are outlined below.
2000-2009: eBay
For a long time after parsing appeared, it caused no legal problems. But in 2000 the use of this tool triggered a real battle: eBay took a stand against the auction data aggregator Bidder's Edge. eBay accused Bidder's Edge of illegal data extraction, invoking the doctrine of trespass to chattels. The judge supported the plaintiff, holding that the robot software's high activity could undermine eBay's operation.
Then, in Intel's 2003 lawsuit against Hamidi, the California Supreme Court rejected the reasoning that eBay had used against Bidder's Edge, ruling that the doctrine of trespass to chattels could not be extended to the computer environment when no actual property damage was caused.
All the earliest cases against parsing relied on the doctrine of trespass to chattels and ended in the plaintiffs' favor. But this approach no longer works.
2009: Facebook
In 2009, Facebook sued Power.com, a website that integrated various social networks into one centralized resource, after the latter incorporated Facebook into its service. Since Power.com parsed Facebook's content instead of adhering to the giant's standards, Facebook sued for copyright infringement, accusing Power.com of copying Facebook's website in the process of extracting user information and arguing that this constituted direct and indirect copyright infringement. The court ruled in Facebook's favor, and from then on decisions on the legality of parsing began to be made in favor of the sites' content authors.
Even if a parser encounters copyrighted content only incidentally while searching for publicly available information, its actions can be characterized as copyright infringement, because technically the protected content is still "copied".
2011-2014: Auernheimer
In 2010, hacker Andrew Auernheimer found a security hole on the AT&T website and extracted the email addresses of users who had visited the site from their iPads. Exploiting the lack of security with parsing, Auernheimer was able to harvest thousands of email addresses from the AT&T website. He was found guilty of unauthorized access to AT&T's server and misappropriation of other people's data.
Using parsing to extract confidential personal information can lead to charges even if the information was nominally public. You can try to convince the court that no passwords or codes were cracked to gain access to the information, but this is dangerous territory.
2013: Meltwater
Meltwater is a software company whose Global Media Monitoring product uses parsing to gather news. The Associated Press sued Meltwater for parsing articles, some of which were copyrighted, and for misappropriating news. Facts cannot be protected by copyright, but the court ruled that the articles themselves, as the authors' expression of those facts, were illegal to copy. In addition, Meltwater's use of the articles did not meet the fair use standard. Authored content cannot always be parsed!
2014: QVC
In 2014, QVC (the well-known television retailer) sued Resultly (a shopping app) over what QVC called "excessive parsing." QVC's charge was that Resultly disguised its crawlers to hide their source IP addresses, so QVC could not block the unwanted parsers. Because the bots hit QVC's servers quite aggressively, an overload and outage occurred, causing $2 million in damages. The court found for Resultly, ruling that there had been no intent to cause damage.
And what about Russia?
Let's start with the simplest and most common question: photographing price tags in stores. Although this has no direct relation to site parsing, the problems are similar (indeed, it seems there is no difference between photographing price tags in stores and parsing prices from competitors' sites).
So, the question is:
Is it possible to establish a rule for buyers prohibiting unauthorized photography and video recording in the store? Without going into a detailed interpretation of the law, let's look at the most important article about information:
In accordance with Article 5 of the Law "On Information, Information Technologies and Protection of Information":
1. Information may be subject to public, civil and other legal relations. Information may be freely used by any person and transferred from one person to another person unless federal laws establish restrictions on access to information or other requirements regarding the procedure for its provision or distribution.
2. Information, depending on the category of access to it, is divided into publicly available information, as well as information, access to which is restricted by federal laws (information of limited access).
3. Information, depending on the order of its provision or distribution, is divided into:
1) free information;
2) information provided by agreement of persons participating in relevant relationships;
3) information that is subject to provision or distribution in accordance with federal laws;
4) information whose distribution in the Russian Federation is limited or prohibited.
4. The legislation of the Russian Federation may establish types of information depending on its content or owner.
Thus, information about prices in stores is publicly available, because no legislation restricts access to such information. Accordingly, copying down or photographing prices in a store is not prohibited; indeed, there is no violation of the law. Moreover, Article 29 of the Constitution of the Russian Federation enshrines the right of every citizen "to freely seek, receive, transmit, produce and disseminate information in any legal way".
Now, on to parsing sites. Here is the question we put to a law firm ("Frese and partners"): "Is an organization entitled to perform automated collection of information that is publicly available on Internet sites (parsing)?"
Under the legislation in force in the Russian Federation, everything that is not prohibited by law is permitted. Parsing sites is legal as long as no statutory prohibitions are violated. Thus, automated data collection must comply with current legislation. The legislation of the Russian Federation establishes the following restrictions relevant to the Internet:
- Infringement of copyright and related rights is not allowed.
- Unauthorized access to legally protected computer information is not allowed.
- Collecting information constituting a trade secret by illegal means is not allowed.
- Manifestly unfair exercise of civil rights (abuse of right) is not allowed.
- Exercising civil rights in order to restrict competition is not allowed.
From the above prohibitions it follows that an organization has the right to carry out automated collection of information that is publicly available on Internet sites (parsing) if the following conditions are met:
- Information is publicly available and is not protected by copyright and related rights.
- Automated collection is done by legal means.
- Automated collection of information does not lead to disruption of websites on the Internet.
- Automated collection of information does not limit competition.
There are several recommendations that should be followed when parsing is used:
- Extractable content must not be protected by copyright.
- The parsing process should not interfere with the operation of the site that is being parsed.
- Parsing should not violate the terms of use of the site.
- The parser should not extract users' personal (identifying) information.
- Content that is parsed must meet fair use standards.
P.S. The "thinnest" point is the possibility of a claim that "parsing interferes with the operation of our site, and we incur losses." In response to such a claim, you can point out that the search engines Google and Yandex parse (index) entire sites and collect all available information, and do so quite regularly. Accordingly, it sounds logical that a similar parser visiting a company's website to collect pricing information performs the same technical action. It can be difficult to prove that this action interferes with the site's operation while the work of search engines does not. But in any case, a well-behaved parser should follow the rules in robots.txt...
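Respecting robots.txt is easy to automate: Python's standard library ships urllib.robotparser for exactly this. A minimal sketch, where the robots.txt contents, bot names, and URLs are all hypothetical:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt a store might publish (invented for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 5

User-agent: PriceBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A generic crawler may fetch product pages, but not the admin area.
print(parser.can_fetch("MyParser", "https://shop.example.com/products/123"))  # True
print(parser.can_fetch("MyParser", "https://shop.example.com/admin/users"))   # False

# A bot named "PriceBot" is banned from the whole site.
print(parser.can_fetch("PriceBot", "https://shop.example.com/products/123"))  # False
```

Checking can_fetch() before every request (and honoring any Crawl-delay directive) is cheap insurance: it mirrors the behavior of the search-engine crawlers that the argument above relies on.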