InterSystems Caché and NoSQL Technology

Modern high-load applications have changed what is required of a DBMS: today, effective technologies are needed for building specialized solutions with guaranteed response times when processing large data sets. At the same time, despite the emergence of relatively new technologies such as NoSQL, the potential of long-established approaches has still not been fully realized.

High-load Internet projects and XTP (extreme transaction processing) applications have changed the requirements for database technologies. The priority requirements are now ease of development, the ability to tailor the data storage technology to a specific project, a constant system response time under increasing load, and low costs for scaling and for processing large amounts of data.

In response to these needs, the NoSQL movement has emerged: a new class of databases that promises developers rapid application change, low costs for scaling and for processing and storing large amounts of data, and high speed on relatively inexpensive hardware. These values have always been important for InterSystems technology. NoSQL databases almost always implement a paradigm of working with databases that differs from the usual one: a shift from the concept of a DBMS that integrates several applications to the concept of a DBMS for one application or one project, or even for a single specific task within a project.

NoSQL solutions rely on generic data models, APIs or access protocols that are simple compared to traditional ones, the ability to scale out on demand across many servers for a certain set of operations, distributed data storage on many servers, efficient use of distributed indexes and memory for queries, and a relaxed attitude to things that are serious and inviolable for a traditional DBMS: data integrity and transactions.

Today there are more than a hundred NoSQL solutions, differing in their approaches to scaling and distributed data storage, in the data models they support, and in their storage schemes (including the storage implementation). A separate point of comparison is undoubtedly how queries against stored data are expressed and executed: the NoSQL world does not yet have a standard query language, and the developer needs a clear understanding of how a given system works in order to implement queries successfully.

At its core, the design of a NoSQL solution is aimed at coping either with a large volume of data or with its increased complexity. This idea is taken from a presentation by Emil Eifrem, the author of Neo4j. Interestingly, the presentation also talks about the theoretical isomorphism of data: the same information can be represented in different models, from graph to key/value. InterSystems demonstrates that this approach is practical through the Unified Data Model principle in Caché.

What unites NoSQL projects is the widespread use of trade-offs between mutually contradictory requirements, something that was impossible and ran against dogma for traditional DBMSs: for example, giving up support for all of the ACID properties in favor of horizontal scalability. The most popular way of explaining why such trade-offs are natural is the CAP theorem, which states that a distributed system cannot guarantee consistency, availability and partition tolerance all at once. Interpreting it loosely, one can say that it is impossible to be reliable, fast, distributed and complete at the same time, although different combinations are possible.

Another source of trade-offs is the nature of the problem's data. This is where an important requirement on the technology arises: it must be flexible and make the most of the features of the subject area. For example, if data processing can be parallelized and the “shared nothing” principle (no resource sharing) can be applied, then this should be exploited both for storage and for query execution. In that case a model of data storage and distribution that relies on this possibility has to be built. Relational databases, however, leave almost no freedom of choice, and you have to use what is available: for example, you cannot store data on different servers column by column. Unlike traditional DBMSs, NoSQL gives the developer more freedom to exploit the natural features of the task at hand, for example for horizontal scaling. In return, the developer bears more responsibility for decisions about the data persistence architecture. NoSQL databases can therefore be compared partly by the presence of mechanisms that are standard for traditional DBMSs, and partly classified by the engineering solutions they offer as alternatives to traditional DBMS properties.
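
As a rough illustration of the shared-nothing idea, the sketch below hash-partitions records across independent in-memory "nodes" and answers a query by running it on each node separately and merging the partial results. The Node class, the node_for helper and the sample data are hypothetical names introduced for this example and do not belong to any product.

```python
import hashlib


class Node:
    """One shared-nothing node: it owns its data and shares no state."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def local_sum(self):
        # Each node computes its partial result from its own data only.
        return sum(self.rows.values())


def node_for(key, nodes):
    """Pick the owning node from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


nodes = [Node("n0"), Node("n1"), Node("n2")]
orders = {"order:1": 10, "order:2": 25, "order:3": 40, "order:4": 5}

for key, amount in orders.items():
    node_for(key, nodes).put(key, amount)

# The query runs on every node independently; only partial sums are merged.
total = sum(node.local_sum() for node in nodes)
```

The storage layout and the query plan rely on the same partitioning rule, which is exactly the kind of decision the developer becomes responsible for.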

Despite the diversity of NoSQL projects, there is currently none that can safely be called a universal, comprehensive NoSQL platform; that would contradict the very principle of specialization that runs, explicitly or implicitly, through NoSQL. So if you are thinking of using a NoSQL approach in your next project, you will most likely have to answer a number of questions and deal with a number of risks, for example: which data model to choose; how stable and mature the chosen technology is; how much code would have to change if you tried to replace the NoSQL solution with another, more efficient one; and whether the query language is complete and advanced enough to meet the project's requirements. It is also worth noting that many NoSQL technologies were created for one specific project and are to some extent one-sided: having been shaped to cover the requirements and tasks of the original project perfectly, they may not suit your case very well.

For a first project that uses the NoSQL approach, a hybrid design of the data storage subsystem is a wise choice. Two hybrid designs are possible: using both a NoSQL technology and a conventional DBMS side by side in the project, or using a technology that supports the concepts of both worlds to the extent needed. For the latter, InterSystems Caché offers a unique opportunity: a hybrid technology platform that is mature, proven and supported.

The first, obvious, even phonetic similarity that catches the eye when comparing NoSQL and InterSystems Caché is that both are non-relational. At the core of Caché is an implementation of a model simpler than the relational one, named after its atomic elements: globals (in full, Global Persistent Variables). Globals have no schema, allow columns to be added dynamically, and store column values sparsely. At the global level you can use locks, transactions, distributed storage and, if desired, partitioning. At the cost of some imprecision, you can think of a global as a structure similar to an associative array in PHP or a HashMap in Java.
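
To make the associative-array analogy concrete, here is a minimal in-memory sketch of a global-like structure: a sparse, schemaless array addressed by a list of subscripts. It models only the shape of the data; the Global class and its set, get and children methods are hypothetical names for this example, not a Caché API, and nothing here is persisted.

```python
class Global:
    """Toy in-memory model of a global: a sparse array keyed by subscript tuples."""

    def __init__(self, name):
        self.name = name
        self._nodes = {}  # maps a tuple of subscripts to a value

    def set(self, *args):
        # The last argument is the value; everything before it is a subscript.
        *subscripts, value = args
        self._nodes[tuple(subscripts)] = value

    def get(self, *subscripts, default=None):
        return self._nodes.get(tuple(subscripts), default)

    def children(self, *prefix):
        """Return the sorted distinct next-level subscripts under a prefix."""
        depth = len(prefix)
        return sorted({
            key[depth] for key in self._nodes
            if len(key) > depth and key[:depth] == prefix
        })


person = Global("Person")
person.set(1, "name", "Alice")
person.set(1, "email", "alice@example.com")
person.set(2, "name", "Bob")        # no email: nothing is stored for it
person.set(2, "phone", "555-0100")  # a new "column" added on the fly
```

In Caché ObjectScript the first of these nodes would be addressed as ^Person(1,"name"); the sketch only mirrors that addressing scheme.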

As a simple and flexible data model, globals provide an excellent basis for building the non-relational models used in NoSQL: key-value, extensible records, column-oriented, and graph. Detailed examples of implementing models popular in NoSQL are given in the article A Universal NoSQL Engine (a translation is available on Habr), whose authors offer solutions for four types of data models.

[Figure: an implementation of column storage, from the article A Universal NoSQL Engine, Using a Tried and Tested Technology]
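
One possible column-oriented layout on top of a subscripted structure can be sketched as follows. This is an illustrative layout chosen for this example, with the column name as the first subscript and the row id as the second; it is not claimed to be the layout used in the cited article, and insert_row and read_row are hypothetical helper names.

```python
from collections import defaultdict

# Column-oriented layout: first subscript is the column name, second the row id.
columns = defaultdict(dict)


def insert_row(row_id, **values):
    """Store each supplied value under its own column; absent columns cost nothing."""
    for column, value in values.items():
        columns[column][row_id] = value


def read_row(row_id):
    """Reassemble one row by probing every column for the row id."""
    return {c: cells[row_id] for c, cells in columns.items() if row_id in cells}


insert_row(1, name="Alice", city="Boston")
insert_row(2, name="Bob")
insert_row(3, name="Carol", city="Boston", age=41)

# A scan of a single column touches only that column's cells.
boston_ids = sorted(rid for rid, city in columns["city"].items() if city == "Boston")
```

Reading a whole row is the expensive operation in this layout, while scanning or aggregating one column is cheap, which is the usual trade-off of column storage.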

There is no declarative query language at the global level of the kind familiar from relational databases. Queries are defined algorithmically: executing a query comes down to executing code written in Caché ObjectScript, which provides a sufficient set of simple, efficient operations for working with data stored in globals. What makes Caché ObjectScript unique as a programming language is that it is perhaps the only language whose syntax has an explicit construct for indicating where a variable is stored: in memory or, roughly speaking, on disk. Imagine if traditional platforms such as Java or .NET offered this; in many respects the problem of bridging the gap between the program and the database would simply not exist. After working with Caché, the absence of such a construct in general-purpose programming languages seems strange, because it is natural to assume that code works not only with in-memory variables but also with stored ones. Moreover, you do not need to define structures in the database in advance: you simply work with them just as you would with variables in weakly typed languages.
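
What an algorithmic query looks like can be sketched in a few lines. The example below is a Python stand-in, not ObjectScript: a sorted list of (age, person_id) pairs plays the role of an index, and the hypothetical next_key helper is loosely modeled on the $ORDER function of Caché ObjectScript, which returns the next subscript in collation order. Person ids are assumed to be non-negative.

```python
import bisect

# Sorted (age, person_id) pairs play the role of an index stored in a global.
age_index = sorted([(25, 4), (31, 1), (38, 3), (38, 7), (52, 2)])


def next_key(keys, current):
    """Return the first key strictly after `current`, or None at the end."""
    pos = bisect.bisect_right(keys, current)
    return keys[pos] if pos < len(keys) else None


def ids_with_age_between(low, high):
    """Walk the index in key order, stopping as soon as the range is passed."""
    result = []
    key = next_key(age_index, (low, -1))
    while key is not None and key[0] <= high:
        result.append(key[1])
        key = next_key(age_index, key)
    return result


matches = ids_with_age_between(30, 40)
```

The query is simply a loop over ordered keys: the developer, not a query planner, decides which index to walk and when to stop.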

The Unified Data Model concept is based on the principle of "one set of data, many models for viewing it".

Following the example of InterSystems, which has already implemented object-oriented and relational (SQL) access on top of globals, you can implement your own unique data model and, just like the models that ship ready to use in Caché, rely on the Unified Data Model principle of persistence: working with the same data through different models, depending on which is more convenient for a specific task. For example, a key-value model can be used for fast inserts and reads, while the relational model is used for queries. When building your own query language for a NoSQL solution, you can use Caché ObjectScript, which provides a set of simple operations for working with data stored in globals.
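
The idea of one data set seen through several models can be sketched as below. A single dictionary holds the data; a key-value view writes and reads individual nodes, and a relational-style view presents the same nodes as filterable rows. The names store, kv_put, kv_get and select are assumptions made for this illustration and do not reproduce how Caché implements its object or SQL projections.

```python
# One shared store: (table, row_id, column) -> value.
store = {}


# Key-value view: fast writes and reads addressed by the full key.
def kv_put(table, row_id, column, value):
    store[(table, row_id, column)] = value


def kv_get(table, row_id, column):
    return store.get((table, row_id, column))


# Relational view: the same nodes presented as rows that can be filtered.
def select(table, where=lambda row: True):
    rows = {}
    for (t, row_id, column), value in store.items():
        if t == table:
            rows.setdefault(row_id, {"id": row_id})[column] = value
    return [row for _, row in sorted(rows.items()) if where(row)]


kv_put("Person", 1, "name", "Alice")
kv_put("Person", 1, "age", 34)
kv_put("Person", 2, "name", "Bob")
kv_put("Person", 2, "age", 27)

over_30 = select("Person", where=lambda row: row.get("age", 0) > 30)
```

Neither view owns a copy of the data, so a value written through one is immediately visible through the other.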

A separate aspect of comparing Caché and NoSQL, less obvious than non-relationality, is distribution and scaling. If we compare Caché in a category as important for NoSQL as horizontal scaling and distributed storage through mechanisms such as sharding and partitioning, then on the one hand Caché has no out-of-the-box features with those names. On the other hand, this is not quite true: in Caché these things are either done a little differently or, as with globals, reliable basic technologies are provided on which such capabilities can be built effectively. Using ECP (Enterprise Cache Protocol), namespaces and Subscript Level Mapping (SLM), efficient distributed data processing can be implemented.

[Figure: partitioning and distributed storage using ECP and SLM]
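
The general idea behind mapping by subscript can be sketched as a routing table: ranges of the first subscript of one logical global are assigned to different databases, and application code keeps addressing a single structure. The sketch below is a conceptual model only; the MAPPING table, the database names and the helper functions are hypothetical, and real Caché mappings are defined in the namespace configuration rather than in application code.

```python
# Hypothetical mapping table: ranges of the first subscript -> database name.
MAPPING = [
    (range(1, 1000), "db_on_server_a"),
    (range(1000, 2000), "db_on_server_b"),
]
DEFAULT_DB = "db_default"

databases = {"db_on_server_a": {}, "db_on_server_b": {}, DEFAULT_DB: {}}


def database_for(first_subscript):
    """Resolve which database holds nodes with this first subscript."""
    for subscripts, db_name in MAPPING:
        if first_subscript in subscripts:
            return db_name
    return DEFAULT_DB


def set_node(subscripts, value):
    """Application code addresses one logical global; routing is transparent."""
    databases[database_for(subscripts[0])][subscripts] = value


def get_node(subscripts):
    return databases[database_for(subscripts[0])].get(subscripts)


set_node((42, "name"), "Alice")
set_node((1500, "name"), "Bob")
set_node((9000, "name"), "Carol")
```

Because the routing rule lives outside the application logic, the partitioning scheme can change without touching the code that reads and writes nodes.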

Recognizing that NoSQL now attracts the attention of many developers, InterSystems has released a free DBMS called InterSystems Globals. The goal of the Globals release is to introduce developers to the technology at the heart of Caché and to widen the circle of developers and architects who know how to apply it.

Globals, like many other NoSQL projects, is free to use for development and distribution. Non-relational models can be implemented in Globals, and as an example of such an implementation the open source Globals Document Store (GDS) API project has been created. Globals can be used successfully in projects that require high speed and performance: its throughput is on the order of tens to hundreds of thousands of records per second.

InterSystems Globals provides simple APIs for .NET, Java and Node.js. In contrast to the Caché ObjectScript approach described above, Globals takes a different route: operations on globals are available from a programming language external to the DBMS. At the same time, the process of an application working with Globals (for example, a JVM) in effect becomes one of the DBMS processes.

In the case of Java, the technology lets you quickly implement your own stored data structures that are natural to the language. For example, you can quickly implement an analogue of HashMap whose data is stored in the DBMS. With this approach, as with Caché ObjectScript, the difference between a variable in memory and one on disk begins to disappear.
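
The "map whose data lives in the database" idea can be sketched in a few dozen lines. The example is written in Python rather than Java, and SQLite from the standard library stands in for the DBMS; it does not use the Globals API. PersistentMap is a hypothetical class name, and values are assumed to be JSON-serializable with string keys.

```python
import json
import sqlite3
from collections.abc import MutableMapping


class PersistentMap(MutableMapping):
    """Dict-like object whose entries live in a database, not in a Python dict."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def __getitem__(self, key):
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])

    def __setitem__(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, json.dumps(value))
        )
        self._db.commit()

    def __delitem__(self, key):
        cur = self._db.execute("DELETE FROM kv WHERE k = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._db.commit()

    def __iter__(self):
        rows = self._db.execute("SELECT k FROM kv ORDER BY k").fetchall()
        return iter([row[0] for row in rows])

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


settings = PersistentMap(":memory:")  # pass a file path to survive restarts
settings["theme"] = "dark"
settings["recent"] = ["a.txt", "b.txt"]
```

Code that uses settings looks like code that uses an ordinary dictionary, which is the sense in which the line between an in-memory variable and a stored one starts to blur.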

For Node.js, access to Globals immediately makes it possible to work with data types that are natural for JavaScript: you can save, read and modify arrays and JavaScript objects without additional development overhead, which simplifies data persistence in JavaScript. In addition, Globals combined with Node.js is fast: for comparison, in tests Globals runs faster than Redis (one of the most widely used NoSQL projects, and one known for its speed).
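
How a nested object can map onto subscripted nodes and back is sketched below, in Python for consistency with the other examples rather than in JavaScript. The flatten and rebuild functions are hypothetical helpers, not the Globals Node.js API. The sketch assumes the top level is a dictionary, that dictionaries use string keys (integer subscripts are treated as list positions), and that empty containers do not need to be preserved.

```python
def flatten(value, prefix=()):
    """Yield (subscripts, leaf) pairs for a nested dict/list structure."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, prefix + (index,))
    else:
        yield prefix, value


def _listify(value):
    """Turn dicts whose keys are all integers back into lists."""
    if not isinstance(value, dict):
        return value
    converted = {k: _listify(v) for k, v in value.items()}
    if converted and all(isinstance(k, int) for k in converted):
        return [converted[i] for i in sorted(converted)]
    return converted


def rebuild(nodes):
    """Inverse of flatten for an iterable of (subscripts, leaf) pairs."""
    root = {}
    for subscripts, leaf in nodes:
        target = root
        for key in subscripts[:-1]:
            target = target.setdefault(key, {})
        target[subscripts[-1]] = leaf
    return _listify(root)


doc = {"name": "Alice", "tags": ["admin", "dev"], "address": {"city": "Boston"}}
nodes = dict(flatten(doc))       # what would be written as individual nodes
restored = rebuild(nodes.items())
```

Each leaf becomes one node addressed by its path, so a single field of a large object can be read or updated without loading the whole object.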

InterSystems Globals is positioned as a NoSQL database, but it differs from the NoSQL mainstream in several respects: it is not restricted to a specific data model; unlike many other solutions, it lets you use locks and transactions; and it provides simultaneous, efficient work with data in memory together with data integrity on disk. At the moment, Globals does not support distributed work with stored data. Globals is built on a stable base technology that is guaranteed to be developed and maintained.

Unlike Caché, Globals provides only the core for working with globals, without object and relational access. But as a project grows, there is always the option of switching to Caché without changing the application code, because the Globals API is a subset of the Caché Extreme technology.

To summarize, Caché can now be thought of as a NoSQL database and more: a universal, stable platform for NoSQL projects, with support for object and relational models, that is not inferior to traditional DBMSs in performance. And although NoSQL is a term that became popular only recently, it fully matches the company's values, which have remained the same for 30 years: the technology underlying Caché lets you quickly create solutions tailored entirely to your project. Why would you need a technology that came out of someone else's project if you can use your own?

Source: https://habr.com/ru/post/194818/
