As you know, the Google Books project is one of the most ambitious undertakings of our time. Building a single electronic database of books is a serious task, complicated by the need to negotiate with authors, publishers and other copyright holders. The project is interesting in many ways - socially, technologically and logistically. It also has a real influence on modern society, although for now that influence is not very strong. But that is not the point here. The point is that the project's creators tried to count every book in the world - not the total number of copies, but the total number of distinct titles. Clearly, errors are inevitable in such a count, but one can still give Google the benefit of the doubt. The resulting number is huge: 129,864,880 titles.
Unfortunately, the counting methods used by the specialists are not widely publicized. It is only known that various catalogs were consulted and that requests were sent to university libraries, public libraries, private collections, museums and other organizations. Building a robust algorithm for separating the wheat from the chaff is a difficult task, but it seems Google has managed it. Of course, it also had to devise algorithms for sorting, classifying and analyzing the books - an intricate system of algorithms that I would like to know more about.
Generally speaking, the count was made not out of idle curiosity, but to assess the real extent of the work already done within the project, and to estimate the effort still required to continue and (if it is possible at all) complete it.
When counting the books, the corporation most often relied on various ISBN catalogs as its information source; ISBNs have been around since the early 1960s. Interestingly, the analysis turned up errors in catalog entries - about one and a half thousand books had been assigned the same identifier, and Google employees have already notified the libraries whose catalogs contained the error.
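Google has not published how exactly it detected those identifier clashes, but the basic idea - finding ISBNs that have been assigned to more than one distinct title - is easy to sketch. Here is a toy illustration with made-up records; the record format and the `find_isbn_collisions` helper are my own assumptions, not Google's actual pipeline:

```python
from collections import defaultdict

# Hypothetical catalog records as (isbn, title) pairs.
records = [
    ("978-0-306-40615-7", "Flowers for Algernon"),
    ("978-0-306-40615-7", "A Brief History of Time"),  # same ISBN, different book
    ("978-0-14-044913-6", "The Odyssey"),
]

def find_isbn_collisions(records):
    """Map each ISBN to the set of distinct titles it was assigned to,
    and report the ISBNs that point at more than one title."""
    titles_by_isbn = defaultdict(set)
    for isbn, title in records:
        titles_by_isbn[isbn].add(title)
    return {
        isbn: titles
        for isbn, titles in titles_by_isbn.items()
        if len(titles) > 1
    }

collisions = find_isbn_collisions(records)
# → {"978-0-306-40615-7": {"Flowers for Algernon", "A Brief History of Time"}}
```

In a real catalog the same book can legitimately appear under slightly different title spellings, so an actual checker would need fuzzy title matching before declaring a collision.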
Interestingly, Google's first pass produced a figure close to one billion. After removing copies and duplicate records, the count dropped to 600 million, and after an even more thorough analysis the final figure settled at 129,864,880. It would be interesting to know, in quantitative terms, how much information such a mass of books contains. All in all, a most interesting study by the Google team, successfully completed. Any book lovers out there can now start assembling the complete collection in print :-)
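The drop from a billion raw records to 600 million suggests aggressive deduplication: the same work shows up in many catalogs under slightly different spellings, and those variants must be collapsed onto one key. A minimal sketch of that idea, assuming simple made-up records and a crude normalization rule of my own (the real matching surely weighs many more fields - author, edition, publisher, year):

```python
import re

# Toy records: one work appears in several catalogs with different spellings.
raw_records = [
    {"title": "War and Peace", "author": "Leo Tolstoy"},
    {"title": "War and Peace ", "author": "LEO TOLSTOY"},  # case/whitespace variant
    {"title": "War & Peace", "author": "Leo Tolstoy"},     # '&' vs 'and'
    {"title": "Anna Karenina", "author": "Leo Tolstoy"},
]

def normalize(record):
    """Build a crude matching key: lowercase, unify '&' with 'and',
    collapse runs of whitespace."""
    key = f'{record["title"]} {record["author"]}'.lower()
    key = key.replace("&", "and")
    key = re.sub(r"\s+", " ", key).strip()
    return key

unique_works = {normalize(r) for r in raw_records}
# Four raw records collapse to two distinct works.
```

Over-aggressive normalization risks merging genuinely different books, while a too-strict rule leaves duplicates in place - which is presumably why the count fell in stages rather than in one pass.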
More information about the project can be found in the original source.