
PetaBox, or where the Internet Archive (archive.org) lives

Not so long ago, on October 25, 2012, the Internet Archive (archive.org) announced that the volume of data it had archived from the Internet exceeded 10 petabytes (10,240 terabytes). But how and where is it all stored?

This short review gives some details and a look at the storage itself. Since Habrastorage is temporarily down, we had to upload the images to the ua-hosting.com.ua server. I hope it holds up; if not, please do not be too hard on us, and we will re-upload the images properly later :)

image
PetaBox was developed specifically for the Internet Archive to store data at this scale. It is a storage solution from Capricorn Technologies, designed by Internet Archive staff and C. R. Saikley to store and process 1 petabyte of information.

image

Specification:

- Capacity: 650 terabytes / rack;
- Power consumption: 6 kW / petabyte;
- No air conditioning; instead, the excess heat is used to heat the building.

image

Infrastructure in use as of December 2010:

- 4 data centers, 1,300 nodes, 11,000 hard drives;
- Wayback Machine: 2.4 petabytes;
- Books / videos / music in collection: 1.7 petabytes;
- Total stored: 5.8 petabytes.

image

History of creation


PetaBox™ was designed by Internet Archive staff for the safe storage and processing of 1 petabyte of information. The design goals were as follows:

- Low power consumption: 6 kW per rack, 60 kW for the entire storage cluster;
- High "density" of data placement: 100+ TB / rack;
- Data processing on the storage nodes themselves (800 low-end PCs);
- Support for multiple operating systems;
- Fits in standard 19-inch cabinets / racks;
- Can be housed in a 20x8x8-foot shipping container;
- Ease of maintenance: one system administrator per petabyte;
- Software to automate full backup (mirroring);
- Easy to scale;
- Inexpensive design;
- Low cost of storage.
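The per-rack and whole-cluster power figures in the goals above are consistent with each other if the 1-petabyte target is spread over racks of about 100 TB each. A minimal sketch of that arithmetic follows; it assumes 1 PB is rounded to 1000 TB and that every rack holds exactly 100 TB (the list says "100+"), and the function and constant names are purely illustrative.

```python
import math

# Figures taken from the design goals above.
RACK_CAPACITY_TB = 100   # "100+ TB / rack" (lower bound used here)
RACK_POWER_KW = 6        # "6 kW per rack"

# Assumption: 1 petabyte rounded to 1000 TB for a round rack count.
TARGET_TB = 1000


def racks_needed(target_tb: int, rack_capacity_tb: int) -> int:
    """Smallest whole number of racks that covers the target capacity."""
    return math.ceil(target_tb / rack_capacity_tb)


racks = racks_needed(TARGET_TB, RACK_CAPACITY_TB)
cluster_power_kw = racks * RACK_POWER_KW

print(f"{racks} racks, {cluster_power_kw} kW for the cluster")
```

Under these assumptions the result is ten racks drawing 60 kW in total, which matches the "60 kW for the entire storage cluster" goal.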

image

History


The first 100 TB rack went into service at the European Archive in June 2004. A second, 80 TB rack began operating in San Francisco that same year. The Internet Archive then spun off Capricorn Technologies, a company focused exclusively on developing and deploying PetaBox.

image

Between 2004 and 2007, Capricorn Technologies built PetaBox systems for large academic institutions, government agencies and other enterprises. Its highest-capacity product used 750-gigabyte drives. In 2007, the Internet Archive's data center stored about 3 petabytes of information using PetaBox technology.

The fourth version of PetaBox is now in use. Its main specifications: 24 disks per 4U unit, 10 such units per rack, all running Ubuntu, for 240 disks of 2 TB each in one rack.
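The per-rack disk count quoted for the fourth version follows directly from the unit layout. The short sketch below reproduces that count and derives the raw capacity per rack; it assumes decimal terabytes and ignores any space lost to mirroring or filesystem overhead, and the names are illustrative only.

```python
# Figures from the fourth-generation PetaBox description above.
DISKS_PER_UNIT = 24   # disks in one 4U unit
UNITS_PER_RACK = 10   # 4U units in one rack
DISK_SIZE_TB = 2      # capacity of a single disk


def rack_disk_count(disks_per_unit: int, units_per_rack: int) -> int:
    """Total number of disks in one rack."""
    return disks_per_unit * units_per_rack


def rack_raw_capacity_tb(disk_count: int, disk_size_tb: int) -> int:
    """Raw (unformatted, unmirrored) capacity of one rack in TB."""
    return disk_count * disk_size_tb


disks = rack_disk_count(DISKS_PER_UNIT, UNITS_PER_RACK)
raw_tb = rack_raw_capacity_tb(disks, DISK_SIZE_TB)

print(f"{disks} disks per rack, {raw_tb} TB raw")
```

This gives 240 disks, as stated, and 480 TB of raw capacity per rack; usable capacity would be lower once mirroring is taken into account.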

image

Internet Archive in a container


Finally, I would like to draw attention to the shipping container that Sun developed for the Internet Archive. A single 20x8x8-foot container has enough capacity to hold the entire US Library of Congress 55 times over!

Source: https://habr.com/ru/post/156383/

