
Sorting a petabyte of data took 6 hours and 2 minutes


Google ran an experiment in which it sorted 1 PB of data using the MapReduce framework. The data consisted of 10 trillion records of 100 bytes each, and 4,000 computers were used for the sort. This volume, unprecedented for a sort of this kind, was sorted in 6 hours and 2 minutes.
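
As a rough check on these figures, the short Python sketch below multiplies out the record count and derives the implied throughput. It assumes decimal units (1 PB = 10^15 bytes) and an even spread of work across machines; the article states neither, and the variable names are illustrative only.

```python
# Figures quoted in the article.
RECORDS = 10 * 10**12          # 10 trillion records
RECORD_BYTES = 100             # bytes per record
MACHINES = 4000
ELAPSED_S = 6 * 3600 + 2 * 60  # 6 hours 2 minutes, in seconds

# Assumption: decimal units, so 1 PB = 10**15 bytes.
total_bytes = RECORDS * RECORD_BYTES
aggregate_gb_s = total_bytes / ELAPSED_S / 10**9

# Assumption: the load is spread evenly over all machines.
per_machine_mb_s = total_bytes / ELAPSED_S / MACHINES / 10**6

print(f"total: {total_bytes / 10**15:.0f} PB")
print(f"aggregate: {aggregate_gb_s:.1f} GB/s")
print(f"per machine: {per_machine_mb_s:.1f} MB/s")
```

Under those assumptions the record count does come to exactly 1 PB, and the sort averages roughly 46 GB/s across the cluster, or about 11.5 MB/s of sorted output per machine.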

During the experiment, Google employees had to solve the problem of storing 1 PB of data reliably. On every sorting run, at least one of the 48,000 hard drives in use failed. As a result, they decided to configure the Google File System to keep three copies of each file on different hard drives.
Sorting a smaller data set of 1 TB on 1,000 computers took 68 seconds. Google thereby beat the previous record for sorting the same amount of data, which stood at 209 seconds on 910 computers.
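
To put the two 1 TB results side by side, the sketch below compares wall-clock time and total machine-seconds. It uses only the numbers quoted above. The two runs used different hardware that the article does not describe, so the ratios are a loose indication, not a like-for-like benchmark.

```python
# Figures quoted in the article for the 1 TB sort.
NEW_SECONDS, NEW_MACHINES = 68, 1000   # Google's run
OLD_SECONDS, OLD_MACHINES = 209, 910   # previous record

# How much faster the new run finished in wall-clock terms.
wall_clock_speedup = OLD_SECONDS / NEW_SECONDS

# Total machine time consumed by each run (seconds x machines).
new_machine_seconds = NEW_SECONDS * NEW_MACHINES
old_machine_seconds = OLD_SECONDS * OLD_MACHINES
machine_seconds_ratio = old_machine_seconds / new_machine_seconds

print(f"wall-clock speedup: {wall_clock_speedup:.2f}x")
print(f"machine-seconds: {old_machine_seconds} vs {new_machine_seconds}")
print(f"machine-seconds ratio: {machine_seconds_ratio:.2f}x")
```

The new run finished about 3.1 times faster while using roughly 10% more machines, so it consumed about 2.8 times fewer machine-seconds overall.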

For comparison, the photos stored on Facebook total 1 PB, the Large Hadron Collider will produce 15 PB of data per year, and Google processes about 20 PB of data per day.

Source: https://habr.com/ru/post/45340/

