Yesterday, Google sent a letter to publishers announcing that it is ending the News Archive project, which scanned and indexed microfilm and other newspaper archives. Instead, Google will focus its efforts on “newer projects that help the newspaper industry, including the development of the Google One Pass platform, which allows publishers to sell content directly from their sites.”
The five-year News Archive project was an ambitious attempt to build for old newspapers the same kind of archive that Google Books provides for books: a worldwide library. The idea seemed sound. Google paid for scanning the films, and the profits from advertisements displayed on the service's pages were shared among the partners.
The service complements the regular news search, whose archive is limited to the last 30 days. The News Archive has no such restriction. It draws on far fewer sources, but the earliest reports date back to the mid-18th century. Search results for each query are sorted by year and by source. You can also see how often various words are mentioned, broken down by year and decade. Most texts are available for free.
In five years, Google has scanned 60 million newspaper pages covering 250 years of history. In its letter to partners, Google said it will continue to provide access to the existing archive, but the site will not be updated and no new functionality will be developed for it.
Moreover, newspapers can now take all scanned content free of charge and host it on their own sites (previously this service was paid).
It is hard to say why Google decided to discontinue the project. There may have been copyright problems or risks. Or Google may simply have judged the value of new material to be relatively low: the archive is already fairly complete, and working with small regional newspapers would only lose money.
Another likely explanation is the technical complexity and high cost of processing. Google published the archives as page images with full-text search. Recognizing text in newspapers is much harder than in books because of their specific layout and the way articles are continued (an article often jumps from one page to another, landing on an arbitrary page and at an arbitrary place on it). Careful human oversight is unavoidable.
It is also possible that traffic to the News Archive was low and advertising did not come close to recouping the cost of digitizing newspapers.
It is also unclear whether Google will add to the index the films scanned in recent months. Publishers reported that films were scanned and returned quickly but took a very long time to be added to the index, so a large amount of scanned but unprocessed material has now accumulated in the “stack”.