Hi Habr! Today, Pyrus customers upload about 60 GB of data to us daily. Our storage technology has repeatedly proven its reliability. The company is growing, and we are thinking hard about our choice of database for the next 10 years. Our goal is to be ready for 100-fold growth without having to change the platform every 2-3 years. The database market is highly competitive: many solutions are available, most of them open source and/or free. We are looking for the "perfect solution"™ for our task.
Requirements
The main requirement for a database is that it must not lose information. Surprisingly, many databases fail this key requirement: even solutions proven over the years break down in simple scenarios (examples: one, two). We want to maintain redundancy even while any single server is shut down for maintenance, which means that every piece of information must be stored on at least 3 servers.
Another requirement is the ability to take advantage of modern hardware. Ten years from now, processors will have more than 100 cores, RAM will be integrated into the chips themselves, and the cost of flash memory will drop noticeably. What will not change in 10 years is the speed of light. A network packet from Europe to America takes about 100 ms round trip (RTT), and this is already close to the theoretical limit. The data centers of the future are therefore clusters of powerful compute cores with a fast internal network, connected around the world by high-latency links. A modern database should support synchronous replication within a data center and asynchronous replication between data centers.
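The "at least 3 servers" requirement combined with synchronous replication can be sketched with a quorum write: a minimal illustration in plain Python (a stand-in for real replication code, not any particular database's implementation).

```python
# Minimal sketch: a write succeeds only after a quorum (2 of 3) of
# replicas acknowledges it, so any single server can be taken down
# for maintenance without losing acknowledged data.
REPLICAS = 3
QUORUM = REPLICAS // 2 + 1  # 2 of 3

def replicated_write(replicas, key, value):
    """Synchronously write to all replicas; succeed on a quorum of acks."""
    acks = 0
    for replica in replicas:
        try:
            replica[key] = value  # stand-in for a network call
            acks += 1
        except OSError:
            pass  # an unreachable replica does not fail the whole write
    return acks >= QUORUM

replicas = [{}, {}, {}]
assert replicated_write(replicas, "doc", "v1")      # all 3 replicas up
assert sum("doc" in r for r in replicas) >= QUORUM  # survives 1 server loss
```

Asynchronous cross-data-center replication would then ship the same writes to remote replicas outside this acknowledgment path.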
In our analysis, we relied on the statements of the database vendors themselves, the results of independent tests (where they exist), and real-world case studies (there are many examples on highscalability.com). We excluded embedded databases from consideration because they do not provide automatic replication over the network.
Commercial SQL databases
The most famous representatives of this segment are Microsoft SQL Server and Oracle Database. These are excellent, time-tested products, and with the latest innovations (in-memory tables and column stores) they take full advantage of modern hardware. Both databases support clustering technologies, and both offer a rich SQL feature set (although each has its own dialect).
Both databases can be licensed per processor core, in which case the price does not depend on the number of users. After analyzing our workload and forecasting growth, we concluded that the cost would be disproportionately high, and decided to explore alternatives.
Open Source SQL Databases
MySQL and PostgreSQL, the most famous representatives of this group, are the best choice for most tasks. Both support clustering, and there are examples of their use in large projects and even of migration from one to the other in large projects. Perhaps the main drawback for us is manual sharding and, as a result, the lack of automatic cluster rebalancing.
In our system, the natural sharding key is the organization (user group): the parameter that determines which cluster server stores a given data item. However, some organizations stay small, with 1-2 users, while others grow to tens of thousands of users over their time in the service. Distributing load by such a key will sooner or later overload some servers in the cluster while leaving others underloaded. At that point rebalancing is required, that is, splitting a cluster node in two. This is hard to do on a running 24x7 cluster without sacrificing reliability.
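The hot-shard problem described above follows directly from how static sharding works; a hypothetical sketch (the function name and parameters are illustrative, not our actual schema):

```python
import hashlib

def shard_for(org_id: int, num_servers: int) -> int:
    """Deterministically map an organization to one cluster server."""
    digest = hashlib.sha1(str(org_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_servers

# Every row of an organization lands on the same server, regardless of
# whether the organization has 2 users or 20,000 - which is exactly why
# one large organization can eventually overflow its shard, and why
# splitting a node requires re-mapping keys on a live cluster.
assert shard_for(42, 8) == shard_for(42, 8)
assert all(0 <= shard_for(org, 8) < 8 for org in range(1000))
```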
NoSQL databases
Fashionable in the 2000s, the NoSQL movement is now entering maturity. All the players are well known and have their supporters. Born during the rapid growth of the Internet, these databases were built for the tasks of that era, for example, storing and processing billions of unstructured documents. Many solutions declare "eventual consistency", which means giving up the strict "C" in the CAP theorem. We cannot lose customer data, so such a compromise is unacceptable for us.
Some NoSQL solutions instead reduce availability ("A") and declare themselves "CP", for example Cassandra. This suits our tasks, but we were surprised by the lack of row-level consistency: two concurrent writes to different columns of the same row can lead to data corruption. Although you do not expect that kind of anomaly from a database, a workaround exists (for example, updating rows only as a whole), so we kept Cassandra on our shortlist.
Cloud databases
This category deserves a separate review of its own. Each of the major PaaS players (Amazon, Google and Microsoft) offers 6-8 different products for storing structured data (plus many more BLOB storage services). A ready-made solution exists for virtually any type of load.
We rejected cloud databases for personal-data reasons. Our clients are located in different countries, and no service offers personal-data storage in every country of the world in accordance with local legislation. Another reason was strong vendor lock-in: you cannot take the vendor's technology and deploy it on your own hardware. If you ever want to leave the vendor (when prices rise or reliability drops), the migration project can be very long. Dropbox took more than 2 years to move from the Amazon cloud to its own storage.
NewSQL databases
The popularity of the SQL language and advances in hardware created a new movement: distributed databases with an SQL query interface. Google Spanner stands out among them, guaranteeing linearizability, a global ordering of all transactions. To achieve this on a global scale, the clocks on database servers around the world must be synchronized. Google uses atomic clocks for this, with GPS receivers as a backup.
However, atomic clocks are still a luxury for mere mortals, so former Google engineers built a similar database with slightly weaker transaction-ordering guarantees that are nevertheless sufficient for most applications. This database is called CockroachDB, and its name reflects the cluster's survivability in the face of hardware failures or broken links between data centers. CockroachDB provides full-fledged distributed transactions and automatic cluster rebalancing when a node is lost, which, together with the familiar SQL query language, distinguishes it from Cassandra. Among the shortcomings are the lack of full-text indexes and the relative youth of the solution.
Moving code to data
Business logic often resides on an application server, which receives client requests and fetches data from the database server for processing. When there is a lot of data, transferring it over the network from the database server starts to take significant time. Hence the natural desire to move processing into the database itself, and technologies like Apache Hadoop that let you program such tasks. (Ordinary relational databases also let you put query logic inside stored procedures, but many developers dislike them because they are inconvenient to debug.)
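The difference can be illustrated with a toy comparison (the table and numbers are made up; SQLite stands in for the database server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (org_id INTEGER, bytes INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, 100), (1, 250), (2, 40)])

# Application-side processing: every matching row crosses the network
# to the application server before being summed there.
rows = conn.execute("SELECT bytes FROM events WHERE org_id = 1").fetchall()
total_app = sum(b for (b,) in rows)

# Code moved to the data: the database computes the aggregate and
# ships back a single number instead of every row.
total_db = conn.execute(
    "SELECT SUM(bytes) FROM events WHERE org_id = 1").fetchone()[0]

assert total_app == total_db == 350
```

With three rows the difference is invisible; with billions of rows, the second query moves orders of magnitude less data over the network.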
Recently, the idea of combining the application and database servers for near-real-time OLTP loads has been gaining popularity, and relevant technologies are emerging, for example Tarantool. Its lock-free "cooperative multitasking" architecture is very appealing, although writing such applications is harder. What stops us is the Lua programming language: although popular among game developers, it remains a niche language that evolves slowly, and no one on our team has real experience with it.
Conclusion
Today we consider CockroachDB the most promising option. We are impressed by the company's openness (the database source code is published on GitHub) and the quality of the documentation (architectural and other key decisions, down to the low-level data storage format, are published on the site). We are following the product's evolution and would be happy to exchange notes with colleagues who run this database in production.
In the meantime, we are launching a pilot project and will share our experience of running it under real load.