
The largest database in the world belongs to Yahoo, and it runs on PostgreSQL!

Yahoo claims to have broken a world record by building the largest and most heavily loaded database in the world.

The database, launched a year ago, has reached 2 petabytes. The system was built for analytics: it stores the history of web users' behavior (reportedly covering half a billion users per month). The Internet giant also claims that this is not only the largest database in the world but also the most heavily loaded, recording 24 billion events per day.
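To put the claimed load in perspective, the daily figure can be converted into an average per-second rate. This is only back-of-the-envelope arithmetic on the number quoted above. It assumes events arrive evenly over the day, which a real workload almost certainly does not, and the variable names are illustrative.

```python
# Average write rate implied by the quoted figure of 24 billion events per day.
# Assumption: events are spread evenly across the day (real traffic has peaks).
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 24_000_000_000
events_per_second = events_per_day / SECONDS_PER_DAY

print(f"{events_per_second:,.0f} events per second on average")
```

The result is in the range of several hundred thousand events per second, averaged over the day.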
And now the fun part. This monster runs on a modified PostgreSQL. It came to Yahoo through the acquisition of the startup Mahat Technologies, which had worked from the start with PostgreSQL, the most advanced open-source database. The Postgres code was modified to handle such huge volumes of data. One of the biggest changes is a switch to column-oriented storage instead of the traditional row-oriented layout, which slows down writes to disk but speeds up data access for analytical queries. The positive result is evident: some tables in the database contain trillions of rows, and they are not just dead weight on the disks but can be queried and processed with standard SQL in a standard ACID-compliant environment.
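The trade-off between row-oriented and column-oriented storage can be illustrated with a toy in-memory model. This is a minimal sketch and has nothing to do with Yahoo's actual modifications to PostgreSQL, which are not public. The field names (`user_id`, `url`, `ms`) and the helper functions are hypothetical.

```python
# Toy model of row-oriented vs. column-oriented layouts.
# All names and data here are illustrative assumptions, not Yahoo's schema.

rows = [
    {"user_id": 1, "url": "/home", "ms": 120},
    {"user_id": 2, "url": "/mail", "ms": 340},
    {"user_id": 1, "url": "/news", "ms": 95},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def append_event(columns, event):
    """A write must touch every column, which is why writes get slower."""
    for name, value in event.items():
        columns[name].append(value)


def total_ms_row_store(records):
    """An aggregate over a row store has to visit every whole record."""
    return sum(record["ms"] for record in records)


def total_ms_column_store(columns):
    """An aggregate over a column store reads only the one column it needs."""
    return sum(columns["ms"])


columns = to_columns(rows)
print(total_ms_row_store(rows), total_ms_column_store(columns))
```

Both functions return the same total, but the column version never touches `user_id` or `url`. On disk, that means an analytical query scanning one attribute across trillions of rows reads far less data, at the cost of each insert being scattered across all columns.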

Yahoo engineers expect the database to grow to 5 petabytes by next year, and they are ready for such growth. For comparison, enterprise-level databases rarely exceed tens of terabytes. For example, one of the largest publicly known databases in the world, that of the US Internal Revenue Service, “weighs” only 150 terabytes. eBay says it runs systems that process 10 billion rows per day, holding 6 petabytes of data in total, with about 1.4 petabytes in the largest of those systems.
Note that we are talking here about DBMSs and the databases built on them. There are data stores with even more impressive volumes, but the data in them is practically inaccessible for analysis and processing. For example, the World Data Center for Climate in Hamburg has an archive of more than 6 petabytes stored on magnetic tape, while “only” 220 terabytes are in an “active” state (served by a Linux-based DBMS, see PDF).

“PostgreSQL continues to evolve, confirming its title as the most advanced open-source database,” comments Nikolay Samokhvalov, a representative of Postgresmen. “Last year, Sun engineers showed the world that PostgreSQL is not inferior to Oracle. At the recent PGCon 2008 international conference in Canada, NASA representatives spoke about their experience using PostgreSQL to work with large climate databases. Yahoo's experience is another vivid confirmation of PostgreSQL's maturity. This is very good news for all of us. It is a pity that, as far as I know, Yahoo is not planning to share its experience with the community.”

Source: https://habr.com/ru/post/26289/

