Why sql?

When we gave a general description of the architecture of our service on our English-language technical blog, readers who have experience with other large services, the most frequent questions were:

Why are your structured data stored in SQL databases instead of using NoSQL solutions?
Why do you use your own hardware instead of using cloud hosting services?

Both of these questions are logical and interesting. Today we will respond to the first, and the second we will save for a separate post.

When properly applied, a modern data storage mechanism in associative arrays (key-value) can provide significant performance and scalability compared to a single instance of a SQL server. However, there are several reasons why we decided to post all the data of your MySQL account.

SQL benefits

For starters, ACID properties of transactional databases, such as InnoDB with MySQL, are essential for our application and synchronization model .
')
Here is a small fragment of the database tables for storing notebooks and notes in the server database:

CREATE TABLE notebooks (
id int UNSIGNED NOT NULL PRIMARY KEY,
guid binary(16) NOT NULL,
user_id int UNSIGNED NOT NULL,
name varchar(100) COLLATE utf8_bin NOT NULL,
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE notes (
id int UNSIGNED NOT NULL PRIMARY KEY,
guid binary(16) NOT NULL,
user_id int UNSIGNED NOT NULL,
notebook_id int UNSIGNED NOT NULL,
title varchar(255) NOT NULL,
...
FOREIGN KEY (notebook_id) REFERENCES notebooks(id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

If you create a notebook called “Recipes” on your Windows client Evernote, and then immediately copy your favorite casserole recipe into this notebook, your client will do the following at the next sync:

Call NoteStore.createNotebook () to request creation of a notepad on the server. That will return him the identifier (GUID) of the created notepad.
Call NoteStore.createNote () to request the creation of a note on the server with the already specified notepad GUID.

Each of these high-level API requests is done through a single SQL transaction that ensures that the client can completely trust any server response.

ACID-compatible database provides such advantages as:

Atomicity If the API call is successful, then 100% of the changes will be executed, and if the API call fails, none of them will be made. This means that if we fail to place the fourth image in your note, then your account will not have a half-formed note, which, moreover, will be calculated from the remaining monthly volume for downloading data.

Consistency At the end of any API call, the account is in a fully operational and internally consistent state. Each note has its own notebook, and none of them is in limbo. The database will not allow us to delete the notebook, in which there are still notes, thanks to the FOREIGN KEY constraint.

Warranty When the server reports that a notebook has been created, the client may assume that he already has a notebook and take this into account in further operations (such as a call to create a note). This change is guaranteed, and the client knows that it consistently reflects the state of the service at any time.

The principle of warranty plays the most important role for our synchronization protocol. If the client application would not be sure that the changes made by the service were guaranteed to occur, the synchronization protocol would become much more complicated and less effective. Each synchronized client would have to constantly check the compliance of each server object with the current situation. Such an implementation of absolute control over consistency for an account with 20 thousand notes, 40 thousand resource files and 10 thousand tags would be extremely expensive if the changes did not imply a warranty.

How about a lot of data?

The advantages of ACID in the case of a transactional database make it very difficult to scale data beyond the limits of a single server. Database clustering and replication with multiple master servers is very dark, and associative data warehouses provide a much simpler approach to scaling a single repository to multiple servers.

Fortunately, Evernote doesn't need to solve this problem right now. Although we already have about a billion notes and almost 2 billion resource files on servers, this is actually not a single large data set — these are 20 million individual sets, one per user.

Such fragmentation means that we do not have the problem of storing a large single amount of data, we are dealing with the storage of a variety of isolated data sets of average size, which neatly falls into our server shard architecture.

Maybe in the future…

We, meanwhile, are trying to keep abreast of modern technologies in the field of data storage for future projects that do not require strict transaction ACID and provide horizontal scalability. For example, our reporting and analytics system has already outgrown the capabilities of the MySQL platform and needs to be replaced with something larger, faster and more interesting.

However, we are quite satisfied with the existing MySQL-storage of user data, although it is not so cool in the opinion of some guys.

Source: https://habr.com/ru/post/139021/

All Articles

Why sql?

SQL benefits

How about a lot of data?

Maybe in the future…

More articles: