
Google's database reaches a trillion pages

Google's database of known URLs has passed the landmark of one trillion and continues to grow exponentially.

This trillion counts only unique web pages, after all duplicates have been removed. Although the crawler has registered every one of them, not all are actually indexed for full-text search: many are too similar to one another, and others contain only service information.

The search engine launched in 1998 with 26 million pages in its index, and by 2000 the database had reached 1 billion. Over the following eight years it grew another thousandfold. According to the official Google blog, even Google's own developers could not have anticipated such rapid growth in the amount of information on the web. The web is currently growing by several billion pages per day.
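To put the word "exponentially" into numbers, the growth from 1 billion to 1 trillion pages over eight years can be turned into a yearly multiplier and a doubling time. This is only a rough sketch: it assumes the growth rate was constant over the whole period, which the source does not claim, and the variable names are illustrative.

```python
import math

# Figures quoted in the article.
pages_in_2000 = 1_000_000_000          # 1 billion
pages_now = 1_000_000_000_000          # 1 trillion, eight years later
years = 8

# Total growth over the period.
growth_factor = pages_now / pages_in_2000

# Assumption: the same multiplier applies every year.
annual_factor = growth_factor ** (1 / years)

# Time needed for the page count to double at that rate.
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"total growth: {growth_factor:.0f}x")
print(f"yearly multiplier: {annual_factor:.2f}x")
print(f"doubling time: {doubling_time_years * 12:.1f} months")
```

Under that assumption the page count multiplies by roughly 2.4 each year, which means it doubles in a little under ten months.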
To process data sets of this size, Google has significantly increased the capacity of its data centers in recent years. Ten years ago, a single workstation in a server rack could compute the PageRank graph for the entire web (26 million pages) in a couple of hours, and the search engine then ran for a week without reindexing. Today Google updates the index far more often: the links between a trillion web pages are recalculated several times a day.
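For readers unfamiliar with what "calculating the PageRank graph" involves, here is a minimal sketch of the textbook power-iteration form of PageRank on a four-page toy web. It is an illustration only, not Google's production code: the function name `pagerank`, the `toy_web` data, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "c" is linked to by three of the four pages.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

Each pass touches every link once, so the cost of one recalculation grows with the number of links in the graph. That is why a graph of 26 million pages fit on one workstation, while a trillion pages calls for data-center scale.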

Source: https://habr.com/ru/post/30204/

