SQL or NoSQL - that is the question

We all know that in the world of database technology there are two main areas: SQL and NoSQL, relational and non-relational databases. The differences between them lie in how they are designed, what types of data support, how they store information.

Relational databases store structured data that typically represents real-world objects. Let's say it can be information about a person, or about the contents of a shopping basket in a store, grouped in tables whose format is specified at the design stage of the repository.

Non-relational databases are arranged differently. For example, document-oriented databases store information in the form of hierarchical data structures. We can talk about objects with an arbitrary set of attributes. The fact that a relational database will be split into several interrelated tables can be stored in a non-relational table as a holistic entity.
')
The internal structure of various database management systems affects the way they work. For example, non-relational bases are more scalable.

What technology to choose? The answer to this question depends on the specifics of the project in question.

About choosing SQL databases

There is no database that will suit everyone. That is why many companies use both relational and non-relational databases to solve various problems. Although NoSQL databases have become popular due to speed and good scalability, in some situations structured SQL storages may be preferred. Here are two reasons that may lead to the choice of SQL-database:

The need for database compliance with the requirements of ACID (Atomicity, Consistency, Isolation, Durability - atomicity, consistency, isolation, durability). This reduces the likelihood of unexpected system behavior and ensures the integrity of the database. This is achieved by a hard determination of exactly how transactions interact with the database. This is different from the approach used in NoSQL databases, which prioritize flexibility and speed, not 100% data integrity.
The data you work with is structured, and the structure is not subject to frequent changes. If your organization is not in the stage of exponential growth, there is probably no convincing reason to use a database that allows you to be fairly free with data types and is aimed at processing huge amounts of information.

About the choice of NoSQL-databases

If there is a suspicion that the database may become a bottleneck of a certain project based on working with large amounts of information, it is worth looking in the direction of NoSQL databases, which allow something that relational databases do not know how.

Here are the features that have caused the popularity of NoSQL databases such as MongoDB, CouchDB, Cassandra, HBase:

Storage of large amounts of unstructured information. NoSQL database does not impose restrictions on the types of stored data. Moreover, if necessary, in the course of work, you can add new data types.
Use of cloud computing and storage. Cloud storage is a great solution, but it requires that data can be easily distributed across multiple servers to ensure scalability. Using local equipment for testing and development, and then transferring the system to the cloud, where it works, is exactly what the NoSQL database was created for.
Quick development. If you are developing a system using agile methods, the use of a relational database can slow down. NoSQL databases do not need the same amount of preparatory actions that are usually needed for relational databases.

In the next section, we’ll look at some of the differences between SQL and NoSQL. Namely, first take a look at a simple example that shows the fundamental difference between the two approaches to database organization, then we will talk about scalability and data indexing. As a result, let us dwell on the example of a large CRM system that needs high performance data storage.

SQL and NoSQL

Let's start with some key concepts of relational and non-relational databases. Below is a database containing information about people's relationships. Variant a is a diagramless structure, built as a graph, characteristic of NoSQL solutions. Option b shows how the same data can be presented in a structured form typical of SQL.

Two options for presenting data

Artificiality means that two documents in the NoSQL data structure should not have the same fields and can store data of different types. Here, for example, an array of objects whose set of fields does not match.

var cars = [ { Model: "BMW", Color: "Red", Manufactured: 2016 }, { Model: "Mercedes", Type: "Coupe", Color: "Black", Manufactured: "1-1-2017" } ];

With the relational approach, data should be stored in a pre-designed structure from which this data can then be extracted. For example, using the JOIN operator when selecting from two tables:

 SELECT Orders.OrderID, Customers.Name, Orders.Date FROM Orders INNER JOIN Customers ON Orders.CustID = Customers.CustID

As a more advanced example, to demonstrate when SQL is preferable to NoSQL, consider the features of the use of compression algorithms in NoSQL databases. The problem is that in some NoSQL databases (for example, in CouchDB and HBase), it is constantly necessary to form so-called sstables — string tables in the key-value format, sorted by key. In such tables, which are stored on the disk, the data gets from the tables stored in memory, when they are full, and in other situations. With intensive work with the database, the creation of tables, over time, leads to the fact that the I / O subsystem of the data storage device becomes a bottleneck for data reading operations. As a result, reading in the NoSQL database is slower than writing, which negates one of the main advantages of non-relational databases. It is in order to reduce this effect that NoSQL systems use, in the background, data compression algorithms, trying to merge multiple tables into one. But by itself, this operation is very resource-intensive, the system works under increased load.

Scalability

One of the main differences of the considered technologies is that NoSQL-bases are more scalable. For example, MongoDB has built-in support for replication and sharding (horizontal data sharing) for scalability. Although scaling is supported in SQL databases, it requires much more human and hardware resources.

Data Warehouse Type	Usage scenario	Example	Recommendations
Key-value store	Suitable for simple applications, with one type of object, in situations where the search for objects is performed only on one attribute.	An interactive update of the user's homepage on Facebook.	Recommended familiarity with memcached technology. If you have to search for objects by several attributes, consider the option of moving to a document-oriented repository.
Document Oriented Storage	Suitable for storing objects of various types.	A transport application that operates data about drivers and cars, working with which you need to search for objects by different fields, for example - the name or date of birth of the driver, the number of rights, the vehicle he owns.	It is suitable for applications in the course of which it is allowed to implement the principle of “consistency in the long run” with limited atomicity and isolation. It is recommended to apply a quorum reading mechanism to ensure timely atomic consistency.
Storage System with Extensible Records	Higher bandwidth and better parallel processing at the cost of a slightly higher complexity than document-oriented storage.	Applications similar to eBay. Vertical and horizontal separation of data to store customer information.	To simplify the separation of data used HBase or Hypertable.
Scalable RDBMS	The use of ACID semantics frees programmers from having to work at a fairly low level, namely, to be responsible for locking and consistency of data, to handle obsolete data, collisions.	Applications that do not require updates or merging data covering multiple nodes.	It is worth paying attention to systems such as MySQL Cluster, VoltDB, Clustrix, focused on improved scaling.

A more detailed comparison of SQL and NoSQL can be found in this article. Here are its main points. Namely, three basic characteristics of the systems were tested: parallel data processing, work with information storages, data replication. Parallel processing capabilities were assessed by analyzing blocking mechanisms, managing multi-versioned parallel access, and ACID. Testing of storage covered both physical media and storage using memory. Replication was tested in synchronous and asynchronous modes.

Using the data obtained during the tests, the authors conclude that SQL-databases with the possibility of clustering showed promising performance results per node, and, besides, have the ability to scalability, which gives RDBMS systems an advantage over NoSQL at the expense of ACID compliance guidelines.

Indexing

In RDBMS systems, indexing is used to speed up the extraction of data from databases. The absence of an index means that the table must be viewed in its entirety in order to fulfill the read request.

In both SQL and NoSQL databases, indexes serve the same purpose — to speed up and optimize data retrieval. But how exactly they work is different due to the different database architectures and peculiarities of information storage in the database. While SQL indices are represented as B-trees, which reflect the hierarchical structure of relational data, in NoSQL databases they point to documents, or to parts of documents, between which, basically, there are no relations. Here is the detailed material on this topic.

CRM systems

CRM applications are one of the best examples of systems characterized by huge volumes of daily processed data and a very large number of transactions. All developers of such applications use both SQL and NoSQL databases. And, although most of the transaction data is still stored in SQL databases, the use of publicly available DBaaS class systems (data-base-as-a-service, database as a service), like AWS DynamoDB and Azure DocumentDB, as a result, is a serious load. data processing can be transferred to cloud NoSQL-bases.

While the use of such services frees the developer from solving storage maintenance tasks, this is also the area where NoSQL bases are used for what they were mainly created for, for example, for in-depth data analysis. The amount of information stored in huge CRM systems of financial and telecommunication companies would be almost impossible to analyze using tools like SAS or R. This would require huge hardware resources.

The main advantage of such systems is the use of unstructured data similar to documents. Such data can be fed into the input of statistical models that give companies the opportunity to perform various types of analysis. CRM applications, moreover, are a very good example in which two database systems are not competitors, but exist in harmony, each playing its part in a large data management architecture.

Results

When searching for a database management system, you can select one technology, and later, after specifying requirements, switch to something else. However, reasonable planning will save a lot of time and money.

Here are signs of projects for which SQL databases are ideal:

There are logical data requirements that can be defined in advance.
Data integrity is very important.
We need a well-established technology based on well-established standards, using which one can rely on extensive experience of developers and technical support.

But the properties of projects for which something suitable from the scope of NoSQL:

Data requirements are vague, ambiguous, or developing with project development.
The goal of the project can be adjusted over time, with the possibility of starting development immediately.
One of the main database requirements is data processing speed and scalability.

As a result, I would like to say that in the modern world there is no opposition between relational and non-relational databases. Instead, it’s worth talking about sharing them to solve problems on which a particular technology performs best. In addition, the integration of these technologies into each other is increasingly observed. For example, Microsoft, Oracle and Teradata are now offering some form of integration with Hadoop to connect SQL-based analytics tools to the world of unstructured big data.

Dear readers, have you had to choose database management systems for your own projects? If yes - please share your experience, tell us what and why you chose in the end.

Source: https://habr.com/ru/post/324936/

All Articles