
When an online archive forgets



There are certain institutions on the Internet that we rely on every day to keep the truth from becoming elastic or indefinite. Not necessarily in the way that stupid projects such as Verrit aspire to, but at least in a way that confirms you are not going crazy and that the old post or article you remember reading really did exist. It can be something as superficial as reading a deleted tweet through the Google cache, or something as deep as studying a now-defunct site through the Wayback Machine. But what happens when the archive becomes less reliable and, for seemingly sound reasons, decides to bend and remove controversial material it has already archived?

A few weeks ago, while recording a podcast, we started talking about an old blog run by The Ultimate Warrior: a bodybuilder who became a chiropractor, who became a professional wrestler, who became a political speaker prone to pompous speeches delivered under his legal name, which was, yes, Warrior. After Warrior died in 2014, Barry Petchesky of Deadspin described him as an insane jerk who ranted on blogs and college campuses against people with disabilities, gay people, New Orleans, and many others. However, when I went looking for a specific blog entry, I found that the posts had not merely been deleted: the site was not in the Internet Archive at all. Instead there was an error message: "This URL has been excluded from the Wayback Machine."

It turned out that Warrior's website had been gone from the archive for several months. It disappeared shortly after Rob Rousseau dug through it for an article in Vice Sports that accused WWE of hypocrisy for using Warrior's image to promote Breast Cancer Awareness Month. The campaign called on women to “unleash their inner warrior,” but since Warrior had written on his blog that he wanted cancer survivors to die, the situation looked bad. Rousseau was surprised that the archive removed the site “almost immediately after the release of my article, literally within a week,” as he told Gizmodo.
Rousseau suspected that WWE was behind the removal, but a company representative told Gizmodo that WWE had nothing to do with it. Steve Wilton, managing director of Ultimate Creations, also denied any involvement. An Internet Archive spokesperson told Gizmodo that the archive was removed in response to a DMCA takedown request from Wilton's business manager dated October 29, 2017, two days after the Vice article was published.

Over the past few years, the political climate has changed how the Wayback Machine is perceived. For a long time it was simply a useful tool for recovering the contents of broken links; now it is seen as an arbiter of truth and a bulwark against the erasure of history.

The fact that archived sites show the digital trail and provenance of content is useful not only to journalists but to almost anyone trying to track down web pages that have disappeared. Given that, it becomes a problem that the Internet Archive puts up almost no resistance to content removal requests. And this is not the only example: when a site administrator blocks the Wayback Machine's crawler through the robots.txt file, the archive does not just stop crawling the site but also removes the site's entire history from public access.

In other words, if you publish controversial content and want to avoid liability, there are at least two standard ways to remove it from the most reliable third-party web archive on the public Internet.
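The robots.txt route can be illustrated with a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a crawler that honors robots.txt would read such a rule; it says nothing about how the Internet Archive actually implements its exclusions. The `ia_archiver` user-agent token, the sample file, the `example.com` URLs, and the `is_blocked` helper are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish. "ia_archiver" is
# assumed here to be the token the archive's crawler responds to.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post"
    print(is_blocked(ROBOTS_TXT, "ia_archiver", page))   # archive crawler
    print(is_blocked(ROBOTS_TXT, "SomeOtherBot", page))  # any other crawler
```

In this sketch a single two-line rule shuts out the archive's crawler for the whole site while other crawlers are only kept out of `/private/`, which is why the approach takes so little effort from a site owner.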

For the Internet Archive, both its quick compliance with takedown demands that target seemingly fair-use copies of websites and its handling of robots.txt do little more than slightly reduce its risk, and they run contrary to the spirit of its work. And if someone did sue the service for failing to comply with such a demand, even with a ready-made legal defense available, the litigation could be incredibly expensive. It does not matter that the use of the material does not violate anything by any standard. If a copyright holder makes the attempt, you still have to defend yourself in court.

“In this context, no one has yet tested fair use in court,” said Annemarie Bridy, a law professor at the University of Idaho and an affiliate scholar at the Center for Internet and Society at Stanford Law School. “The Internet Archive is a non-profit organization, so the threat of lawsuits carries great risk for it. Considering the scope of their work, the fact that they archive almost everything that is publicly accessible on the Internet, their risk is phenomenal. One can understand why they behave cautiously, even if it goes against their main mission: to create an accurate historical archive of everything that has been on the Internet, and to prevent people from erasing evidence of their own history.”

The Internet Archive did not respond to specific questions about its handling of robots.txt, its willingness to comply with removal requests, or whether it could rely on a fair use argument in court. However, a spokesperson sent the following statement:

“A few months after the launch of the Wayback Machine in 2001, we took part, together with a group of outside archivists, librarians and lawyers, in drafting a set of recommendations for responding to content removal requests. The Internet Archive adopted these recommendations as its guidelines and followed them for the first ten years of the service's existence.

This year we met with a similarly composed group to review these recommendations and explore the possible value of an updated version. We are still discussing some of the issues and hope to post updated information on our website very soon, to help the public better understand how we approach removal requests. Some of our thoughts on robots.txt are outlined in a separate article.

In essence, we are trying to find a balance between the concerns of site owners and rights holders and the interest of the public, which deserves free access to the fullest possible history of the Internet.”

Given all this, keep in mind that the Internet Archive has always positioned itself as a library. Shouldn't that count for something?

“Current copyright law contains special provisions that give certain rights to libraries, but it does not define what a library is,” explained Brandon Butler, director of information policy at the University of Virginia Library. “Rights holders have always been upset about this, and about organizations such as the Internet Archive, which are not 200-year-old public or university libraries. They often claim to be afraid of fake libraries that will call themselves libraries while in fact serving as a refuge for pirates.” The only relevant case Butler could recall was that of American Buddha, a non-profit online library of Buddhist texts, which Penguin sued over several books to which it held the rights. “The court didn’t care that this place called itself a library; that did not protect them from infringement claims.” Butler notes that although library status would not protect the Internet Archive as much as one might hope, the “right to make copies for storage,” as he calls it, works in its favor.

“Libraries are not usually sued; it makes for bad publicity,” says Butler. As a result, there is no large body of modern legal precedent on libraries in the digital era, apart from a few cases involving Google Books.

As Bridy notes, copyright in the United States is “commercial law.” It is not about damage to reputation but about protecting the value of a work and, more specifically, the ability to keep earning money from it. “We justify this by wanting to encourage artists and other creative people to publish and sell their work,” she said. “Using copyright to try to control privacy or reputation... It can, of course, be used that way, but one can argue that this is a misuse of copyright that goes beyond its proper scope.”

We take a lot of things for granted, especially as we rely more and more on technology. “The Internet is forever” is a refrain often heard in the media, and the caution it advises is probably justified, but it should not be taken literally. People delete posts. Websites and entire platforms disappear for business and other reasons. The rich, the famous, and the powerful feel free to intimidate small non-profit organizations. It is good to have a safeguard just in case, but the permanence of the Internet has limits, and where there are limits, there are ways around them.

Source: https://habr.com/ru/post/433806/

