
Google Platform. 10+ years

Storing and processing data is a task that humanity has been solving, with varying success, for thousands of years. The difficulties stem not only from the physical volume of the data (volume), but also from the rate at which the data changes (velocity) and the variety of data sources (variety), which Gartner analysts designated in their articles [11, 12] as the "3V".

Computer science has recently run into the problem of Big Data, and private companies, governments, and the scientific community are all waiting for IT to solve it.

Yet there is already a company that has been coping with the Big Data problem, with varying success, for 10 years. My impression (to state it as fact would require open data, which is not freely available) is that no commercial or non-profit organization operates on a larger amount of data than the company in question.
This company was the main contributor of the ideas behind the Hadoop platform, as well as behind many components of the Hadoop ecosystem, such as HBase, Apache Giraph, and Apache Drill.

As you have guessed, that company is Google.



Chronology of Big Data at Google


Roughly speaking, the history of “Big Data” solutions at Google can be divided into two periods:

2003-2008


During this period, Google engineers published freely available research papers describing three systems that Google uses to solve its problems: GFS, MapReduce, and BigTable.

The impact of the work published by Google on the first steps of the Big Data industry is hard to overstate.
The most famous implementation of the concepts described by Google is the Hadoop platform: GFS is the prototype of the HDFS file system; the ideas behind the HBase architecture are taken from BigTable; and the Hadoop MapReduce computing framework (without YARN) implements the principles embodied in Google's own MapReduce framework.

From 2008 on, the Hadoop platform steadily gained popularity, and by 2010-2011 it had become the de facto standard for working with Big Data.

Today Hadoop is the “locomotive” of the Big Data world and has a huge impact on this segment of IT. But at one time the architectural approaches that Google described for building a Big Data platform had the same huge influence on Hadoop itself.

The Google platform itself kept developing all this time and adapting to ever newer requirements: the search engine gained new services, including ones better suited to an interactive processing mode than to batch processing; the chunk sizes (the “clusters” of GFS) were not efficient for storing every type of data; and requirements emerged for geo-distribution and support for distributed transactions.

By 2009-2010, both within Google itself and in academia, the merits and limitations of the approaches to building a Big Data platform that Google engineers described from 2003 to 2008 had been studied in sufficient detail. The Google platform itself had also evolved considerably by 2009.

2009-2013


So, during what is conventionally the second stage in the development of Google's Big Data platform, 2009-2013, researchers described the following software systems in varying degrees of detail:

Subsequent articles in this series on the Google platform will review most of the internal Google software products listed above, with which Google successfully handles storing, structuring, and searching data, detecting spam, increasing the effectiveness of ad impressions in contextual advertising services, maintaining data consistency in the Google+ social network, and so on.

Instead of a conclusion


Instead of a conclusion, I’ll quote a person who has already proven his ability to successfully predict the future of the Big Data industry, Cloudera CEO Mike Olson:
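“If you want to know what the large-scale, high-performance data processing infrastructure of the future looks like, my advice would be to read the Google research papers that are coming out right now.”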
- Mike Olson, Cloudera CEO

List of sources used to prepare the series


Main sources



Additional sources



Post change history
Commit 01 [Dec 23, 2013]. Changed the title of the article.
- Google Platform. 2003-2013
+ Google Platform. 10+ years
Commit 02 [Dec 24, 2013].
+ link to the post with a description of Colossus.
Commit 03 [Dec 25, 2013].
+ link to the post with a description of Spanner.
Commit 04 [Dec 26, 2013].
+ link to the post with a description of Dremel.
Commit 05 [Dec 27, 2013].
+ link to the post with a description of Photon.


Dmitry Petukhov
MCP, PhD Student, IT Zombies,
a man with caffeine instead of red blood cells.

Source: https://habr.com/ru/post/206972/

