📜 ⬆️ ⬇️

MySQL is the best NoSQL system

When considering the use cases of NoSQL, such as storing key-value pairs, it turns out that MySQL is more preferable in terms of performance, ease of use, and stability. MySQL is a solid system with an abundance of online materials that cover all topics from basic operations and error analysis to replication and various usage patterns. This gives MySQL an advantage over younger NoSQL systems that have no such experience.

In recent years, NoSQL-systems have become the dominant trend. Many developers see in NoSQL-systems, such as MongoDB, Cassandra, Redis or Hadoop, the best option for building their applications, considering them as a single product family that devalues ​​the old SQL-systems.

Often, the decision to use the NoSQL database is based on advertising hype or the mistaken belief that relational databases cannot provide the same performance as NoSQL databases. When it comes to choosing a database, engineers often lose sight of operating costs, as well as considerations of stability and maturity of a technology. To learn more about the limitations and flaws of various NoSQL (as well as SQL) systems, look at the Jepsen project series published on Aphyr.com.
')
In this article, we will explain why, in our opinion, using MySQL to store key-value pairs is better than most specialized NoSQL systems, and also provide instructions for using MySQL.

Defining Wix Sites


When someone clicks a link on a Wix site, their browser sends an HTTP request to the Wix server with the site address. This is also the case with the Wix premium site, which has its own domain (for example, domain.com), and with a free site on the Wix subdomain (for example, user.wix.com/site). The server must recognize the requested site at the site address, performing a key-value search for a URL-site pair. In the following discussion, we will denote the URL as a “route”.

The routes table is used to convert the site address to the site object. Since different paths can be used to enter the site, there is a “many-to-one” relationship. When the site is found, the application loads it to work with it. The site object, in turn, has a complex structure that includes two lists of child objects — various services used by the site. Below is a sample model of our facilities. Here we assume that a standard SQL database with a normalized structure is used:



When updating a site with a traditional normalized model, we need to use a transaction to update multiple tables to ensure data integrity (note that the transaction uses a lock at the database level, which prevents simultaneous writing and sometimes reading from the affected tables). Continuing to work with such a model, we will probably get to the SERIAL type key in each table, foreign keys and index in the URL field of the routes table.

However, data modeling based on a normalized scheme is fraught with a number of complications:

• locks restrict access to the table, so with large amounts of data this may limit our performance;
• reading an object requires either a few SQL queries (4 in our case) or using a JOIN - and this also affects the delay time;
• keys with the SERIAL attribute require locking, which again limits write performance.

These problems limit the bandwidth and parallelization of queries that MySQL (or any other SQL system) can provide for us. Because of these weaknesses, and also due to the fact that this is essentially a key-value pair, many developers prefer to look for a NoSQL solution that provides better performance and parallelization, even at the cost of stability, integrity and availability.

We found in Wix that MySQL, if used creatively as a repository of key-value pairs, can work better than MySQL with the normalized data model (given above), and also better than most NoSQL systems. Our current system provides such scaling, bandwidth, query parallelization, delay times that would honor any NoSQL system. Here are some data from our system:

• active installation on three data centers (active-active-active);
• throughput of about 200,000 RPM;
• The routes table has a volume of about 100,000,000 entries, 10 GB of disk space;
• table sites has about 100 000 000 records, 200 GB of disk space;
• the average read delay is 1.0-1.5 milliseconds (in fact, 0.2-0.3 ms in one data center).

Notice a delay of about 1.0 ms. It is considered an impressive figure in most systems built on a key-value pair, including both cloud systems and open source systems. And we achieved this using MySQL (the simplest, as is commonly believed, SQL-system).

Here is the diagram we use:



CREATE TABLE `routes` ( `route` varchar(255) NOT NULL, `site_id` varchar(50) NOT NULL, `last_update_date` bigint NOT NULL, PRIMARY KEY (`key`), KEY (`site_id`) ) CREATE TABLE `sites` ( `site_id` varchar(50) NOT NULL, `owner_id` varchar(50) NOT NULL, `schema_version` varchar(10) NOT NULL DEFAULT '1.0', `site_data` text NOT NULL, `last_update_date` bigint NOT NULL, PRIMARY KEY (`site_id`) ) /*ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16*/; 

All fields that are not used as query conditions were collapsed into a single blob type field (site_data text field). It includes the sub-obj tables, as well as any fields of the table object itself. Note that we do not use SERIAL keys, instead we use a VARCHAR (50) type field that contains user-created GUID variables (more on this in the next section).

Below is the query we use, it has high bandwidth and low latency:

 select * from sites where site_id = ( select site_id from routes where route = ? ) 

Here, first a query is made to the routes table using a unique index; the query must return only one value. Then we browse the sites by the primary key, again looking for one value. The nested syntax allows you to process both SQL queries in one call to the database.

The result shown above requires approximately 1 ms. consistent work in high traffic and high frequency of updates. Semi-transactional updates, even without the use of transactions. This is possible due to the fact that we enter the entire site with a single INSERT command and, until we enter data into routes, requests will not detect it. That is, when we enter data about a site first, and then about paths, we are sure of the integrity of our data, even in a border situation, while our data in the sites table are not connected.

Instructions for using MySQL as a NoSQL system
Using the experience gained in the example described above (and other similar cases from Wix practice), we developed a short list of recommendations for using MySQL as a NoSQL system.

The main thing to remember when using MySQL as a NoSQL system is to avoid locks at the base level and complex queries.

• Do not use transactions that involve locks. Instead, use transactions in the application;
• Do not use SERIAL keys. Such keys entail blocking and complicate the configuration of the active-active type;
• use unique keys created by customers. We use GUID.

Optimizing the structure for reading, pay attention to a few additional recommendations:

• do not perform normalization;
• if there is a field - it should be indexed. If the field for the index is not needed, save it in one field of type BLOB / TEXT (as JSON or XML);
• do not use foreign keys;
• design your structure so that upon request you can read a separate series;
• Do not use the ALTER TABLE statement. These commands entail blockages and periods of temporary inoperability. Instead, use Live Migration data migration.

When requesting data:

• look for records by primary key or by index;
• do not use JOIN;
• do not use aggregation functions;
• run validation functions (BI, data mining) on ​​replicas, and not on the head base.
We are planning to write another article where we will tell in more detail about the migration of Live Migrations data and transactions through the application.

You can think in new ways


Perhaps this is the most important conclusion from this article. It's great to use MySQL as a NoSQL system, that is, not the way it was designed. As shown in this article, an example is the use of MySQL for working with key-value pairs instead of NoSQL-systems specially designed for this. In Wix, we chose MySQL to work with key-value pairs (and not only), because it is easy to use, easy to manage, and it is a great ecosystem. As a bonus, it provides latency, throughput, and parallelization rates, which are almost the same as most NoSQL systems.

Chief Software Architect for Wix site creation ,
Yoav Abrahami

Original article: Wix engineers blog

Source: https://habr.com/ru/post/283158/


All Articles