
Bigtable: a distributed database created by Google

According to the published description of Bigtable (PDF), the distributed system is designed to store and manage a huge volume of structured data. The main requirement for such a distributed database is scalability. The system holds hundreds of terabytes of information spread across thousands of interchangeable Google servers.

The Bigtable distributed database is used in many of Google's own services, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, Google Earth and, of course, the main web indexing system. Each of these applications has its own database requirements, and the amount of stored information varies considerably. For example, the satellite photos of Google Earth take up about as much space as the search index of the entire Internet.

The Bigtable description lists how much information each service stores in the distributed database and how well it compresses. All figures are as of August 2006.

The search database of web documents consists of two tables: 800 TB and 50 TB, with compression levels (compressed size as a share of the original) of 11% and 33%, respectively. The Google Analytics data is likewise stored in two tables: 200 TB (14%) and 20 TB (29%).
Google Earth occupies 70.5 TB, of which 70 TB is the original imagery and 500 GB is the index.

Personalized Search takes up very little space compared with the most resource-intensive applications: only 4 TB (compression level 47%). Each user in the system is assigned a unique identifier, and all of that user's actions on the search site are recorded in the database.

Google Base uses 2 TB, and the Orkut social network takes up a total of 9 TB of database space.

If you add up how much physical disk space all of these Google services occupy once compression is taken into account, you get about 220 TB.
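The total of about 220 TB can be reproduced from the figures quoted above. The sketch below is only a rough check under two assumptions that the article does not spell out: the compression percentage is read as compressed size divided by raw size, and tables quoted without a percentage (Google Earth, Google Base, Orkut) are counted at full size. The table labels and the helper name `compressed_total_tb` are illustrative, not taken from the Bigtable description.

```python
# Raw sizes in TB and compression percentages as quoted in the article.
# Assumption: the percentage is compressed size / raw size.
# Assumption: None means no percentage was quoted, so the table counts at full size.
TABLES = [
    ("web documents, table 1", 800.0, 11),
    ("web documents, table 2", 50.0, 33),
    ("Google Analytics, table 1", 200.0, 14),
    ("Google Analytics, table 2", 20.0, 29),
    ("Google Earth, imagery", 70.0, None),
    ("Google Earth, index", 0.5, None),
    ("Personalized Search", 4.0, 47),
    ("Google Base", 2.0, None),
    ("Orkut", 9.0, None),
]


def compressed_total_tb(tables):
    """Sum the on-disk size in TB after applying each table's compression percentage."""
    total = 0.0
    for _name, raw_tb, percent in tables:
        ratio = 1.0 if percent is None else percent / 100
        total += raw_tb * ratio
    return total


if __name__ == "__main__":
    print(f"{compressed_total_tb(TABLES):.1f} TB")
```

Under these assumptions the sum comes to a little under 222 TB, which agrees with the rounded figure of about 220 TB.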

Unfortunately, the published document makes no mention of the Gmail mail system, even though millions of mailboxes of several gigabytes each must require considerable resources.

However, even allowing for Gmail accounts, all of Google's disk arrays can be called quite small. For example, oil companies and other corporations that work with geographic information systems may store even larger amounts of data on their servers than Google does. Their totals can run not to hundreds of terabytes but to petabytes. In this sense, Google’s slogan about “organizing all the information in the world” looks a little ridiculous.

Source: https://habr.com/ru/post/4395/

