Google launches the beta of Cloud Spanner, a NewSQL-generation DBMS
Google has opened the beta of its Cloud Spanner service to everyone. Cloud Spanner is a globally distributed, highly scalable, multi-version NewSQL database with support for distributed transactions.
For several years, Google used this service exclusively for internal needs. Key Google systems, including AdWords and Google Play, run on it. Spanner is an evolutionary development of its well-known predecessor, Google's Bigtable. Spanner itself belongs to the NewSQL family of solutions, that is, it combines the advantages of relational and non-relational DBMSs: the ACID transactions and SQL syntax of traditional DBMSs, without sacrificing the horizontal scaling and high availability inherent in NoSQL.
Based on the company's in-house experience, Google offers 99.9999% uptime (six nines, i.e. at most 31.5 seconds of downtime per year) and client libraries for Java, Go, Python, Node.js, and other languages. The working principles of Spanner are described in a research paper: James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, et al. Spanner: Google's Globally-Distributed Database. Proceedings of OSDI, 2012. See also the description of Spanner on Habr (2013).
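The "six nines" figure quoted above can be checked with simple arithmetic. The snippet below is a minimal sketch, assuming a 365-day year; the function name max_downtime_seconds is made up for illustration and is not part of any Google API.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def max_downtime_seconds(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return SECONDS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Compare the downtime budget for three to six nines.
    for pct in (99.9, 99.99, 99.999, 99.9999):
        print(f"{pct}% uptime -> {max_downtime_seconds(pct):.1f} s of downtime per year")
```

For 99.9999% this gives roughly 31.5 seconds per year, which matches the number in the text.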
After several years of internal use, Google decided to make this infrastructure service generally available. It is aimed at enterprise customers who need to deploy highly reliable, fault-tolerant cloud applications. Previously, they had to choose between traditional databases with guaranteed transactional consistency and NoSQL databases with simple management, horizontal scaling and distributed data storage. Cloud Spanner is designed to remove this trade-off by combining the advantages of both technologies.
Cloud Spanner completes the range of database services in the Google Cloud Platform (GCP), complementing Cloud SQL, Cloud Datastore and Cloud Bigtable.
Cloud Spanner has no theoretical limit on the maximum database size. At the same time, the service can also be used for small projects. Its main advantage is not only scalability, but also the ability to run global transactions that span data centers around the world.
Cloud Spanner is priced at $0.90 per node per hour and $0.30 per gigabyte of used disk space per month. Traffic within a region is free; traffic between US regions costs $0.01 per gigabyte, between countries from $0.08 to $0.12 per gigabyte, to China from $0.20 to $0.23 per gigabyte, and to Australia from $0.15 to $0.19 per gigabyte.
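To get a feel for these rates, here is a rough monthly estimate in Python. It is only a sketch: it uses the node and storage prices quoted above, ignores network traffic, and assumes an average month of 730 hours. The function name estimate_monthly_cost is hypothetical.

```python
# Prices as quoted in the article for the beta launch; they may have changed since.
NODE_PRICE_PER_HOUR = 0.90         # USD per node per hour
STORAGE_PRICE_PER_GB_MONTH = 0.30  # USD per gigabyte per month
HOURS_PER_MONTH = 730              # assumption: average month (8760 hours / 12)


def estimate_monthly_cost(nodes: int, storage_gb: float,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Estimate the monthly bill in USD for compute and storage only."""
    if nodes < 0 or storage_gb < 0 or hours < 0:
        raise ValueError("nodes, storage_gb and hours must be non-negative")
    compute = nodes * NODE_PRICE_PER_HOUR * hours
    storage = storage_gb * STORAGE_PRICE_PER_GB_MONTH
    return round(compute + storage, 2)


if __name__ == "__main__":
    # Example: a 3-node instance holding 500 GB of data.
    print(f"${estimate_monthly_cost(nodes=3, storage_gb=500):,.2f} per month")
```

Under these assumptions, a 3-node instance with 500 GB of data comes to about $2,121 per month, and the node charge clearly dominates the storage charge.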
Key Innovation in Spanner
An earlier article on Habr explained why Google had to abandon NTP (Network Time Protocol) and build its own, more accurate and reliable time-keeping system based on GPS and atomic clocks. It is exposed as the TrueTime API. Such a system was necessary to guarantee the integrity of the Google Spanner database.
Ensuring data integrity through a new scheme for encoding transaction timestamps is one of the key innovations of Spanner (as stated in the research paper mentioned above). Google engineers developed a multi-level system that checks the time, records the time intervals of transactions, and estimates how reliable each timestamp is. The reliability of the whole system depends on it.
Instead of relying on external clocks, Google has equipped its data centers with their own atomic clocks and GPS receivers. This equipment is connected to a number of servers that distribute timestamps to all other servers in the data center. In practice, every machine in the data center runs a background daemon that constantly polls the time servers in its own data center and similar time servers in other data centers. As a result, Google servers around the world agree on the time to within a small, known margin of error.
The TrueTime API keeps data consistent when different data centers try to write to the same cell in the database at the same time. Rather than a single timestamp, the API returns a TTinterval: a time interval that reflects the measured clock error and uncertainty. If the TTinterval values of two competing transactions do not overlap, it is safe to say which of them occurred earlier. If they overlap, the order cannot be determined from the timestamps alone.
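The interval comparison described above can be sketched in a few lines of Python. This is an illustration only, not Google's API: the TTInterval class, its earliest and latest fields, and the helper functions are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """Hypothetical stand-in for a TrueTime interval: the true time lies in [earliest, latest]."""
    earliest: float
    latest: float

    def __post_init__(self) -> None:
        if self.earliest > self.latest:
            raise ValueError("earliest must not be later than latest")


def definitely_before(a: TTInterval, b: TTInterval) -> bool:
    """True only if every possible instant in a precedes every possible instant in b."""
    return a.latest < b.earliest


def order(a: TTInterval, b: TTInterval) -> str:
    """Classify two transaction intervals as ordered or uncertain."""
    if definitely_before(a, b):
        return "a before b"
    if definitely_before(b, a):
        return "b before a"
    return "uncertain"  # the intervals overlap, so timestamps alone cannot decide


if __name__ == "__main__":
    t1 = TTInterval(earliest=100.000, latest=100.004)
    t2 = TTInterval(earliest=100.006, latest=100.010)
    t3 = TTInterval(earliest=100.003, latest=100.007)
    print(order(t1, t2))  # disjoint intervals
    print(order(t1, t3))  # overlapping intervals
```

The narrower the intervals, the less often two transactions fall into the uncertain case, which is why the accuracy of the clocks matters so much here.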
Compliance with the CAP theorem
Spanner combines the properties of relational and non-relational DBMSs without violating the CAP theorem. How this became possible is explained by the author of the CAP theorem, Eric Brewer, formerly a professor at the University of California and now Google's vice president of infrastructure.
Selected partners have been using the beta version of Cloud Spanner for some time. For example, an engineer at Quizlet described his experience with it. It is an interesting inside look at the Spanner interface and protocols, because apart from the official documentation there is not yet much information about this unique service.