The data volumes and requirements for their processing speed have increased many times over the past decades. Database management systems (DBMS) are trying to meet the new realities and are undergoing significant evolutionary and revolutionary changes. One of these evolutionary factors is the movement towards the so-called. vertical (column-based) storage systems.
Let's look at the classic table EMPLOYEE as an example:Employees / EMPLOYEE table
In traditional (row-based) DBMS, these data will be placed line by line:
In this case, in order to select, for example, the names of all employees, you will first have to read all the lines, then throw away the unnecessary ones.
In vertical (column-based) DBMSs, the same data will be represented by several blocks:
(presented in a very simplified form)
In fact, this new data presentation is rotated 90 degrees relative to the traditional one. This has several advantages:
- The “columnar” data blocks are smaller in size, and the smaller the data size, the easier it is to manage them.
- Blocks can be spread to different disks and even servers (vertical partitioning).
- Vertical (“columnar”) data is much easier to compress, which saves memory space and offloads the disk system.
At the same time, it can be noted that the creation of a new line in the database will entail writing in three (!) Physically separated data blocks instead of one record in the classic “line-by-line” version. From the point of view of the disk system, this is not very cool. Therefore, the “columnar” repository is considered to be more focused on reading and analytical sampling (OLAP).
In 2009, Michael Stonebraker (who can be called Einstein relational DBMS without exaggeration) predicted the imminent decline of the era of classic DBMS; not to be unsubstantiated, he announced the development of a new “vertical” DBMS Vertica
Vendors of "traditional" DBMS (such as Oracle and IBM), of course, do not believe that their era is over. Therefore, they offer “hybrid” storage systems as a modern paradigm; in fact, it is an add-on over a traditional storage system, where data is not stored in whole rows and not in columns, but in groups of blocks in several columns. The hybrid model is not as efficient in data compression as the “vertical” storage, but is a good compromise.
Hybrid DBMS is not always officially described as hybrid. For example, Oracle calls its technology Advanced Compression (11g), and he calls the “hybrid” blocks the compression unit. In non-relational DBMS Cassandra, which Facebook developed for its needs, such blocks are called column family.
As in the case of object databases, vendors obviously act according to the proven principle “the fashion will soon pass - and the relational databases will remain forever.”
A small FAQ. Why do we need vertical DBMS, if almost all modern DBMS support:
- Data compression
- Cluster indices, materialized views, etc.
Let's see. Theoretically,
we can make a repository that is very effective in reading (read-oriented) from the indices and materialized views in the usual “horizontal” DBMS. Suppose even that our database effectively compresses data and even indexes. As a result, we will be able
to get something like an improvised vertical storage at the output. Of course, it will be redundant (which means that data recording becomes more complicated ...) and data storage will not always be optimal - but in fact it will correspond to the “vertical” or hybrid representation.In practice
, the vertical DBMS will implement all this more optimally already at the data storage level. There is nothing new in the technologies used, as Stonebriker writes about his new C-Store engine:
Lastly, materialized views, snapshot isolation,
transaction management, and high availability have also
been extensively studied. The contribution of C-Store is
an innovative combination of these techniques
simultaneously provides improved performance, K-safety,
efficient retrieval, and high performance transactions.
Ps. To be on the safe side, we’ll clarify that the separation between “vertical” and “horizontal” DBMS has nothing to do with the popular SQL / NoSQL dichotomy.