Using the experience of testing a relational database for NoSQL technology

Tarantool is an open, high-performance data warehouse. Such system characteristics as simplicity and performance largely determine the area of its applicability. On the scale of storage solutions, Tarantool lies somewhere in the middle between traditional relational systems, such as MySQL, and data caching solutions, such as memcached. The system is based on a persistent data store, fully mapped to RAM, and high-performance indexes for accessing data based on hashes and binary trees.

In my first months of working on Tarantool, I tried to create testing tools similar to the one I had in my previous open source project - MySQL.

Because tools like mysql-test-run, mysqltest, pushbuild are little known outside the MySQL ecosystem, I will describe what we managed to create, in several blog posts, each will be devoted to a separate component.
')
In short, modern open source projects use a set of tools that allow them to be developed in a “civilized” way. Some projects use only part of the toolkit, but the automation of the full development cycle, in my opinion, allows to achieve the best results.

We are talking about the following automation components:

- automated regression testing,
- tools for functional and unit testing,
- automatic continuous integration testing,
- automatic creation of packages and distributions for product installation.

These tools, assembled and working together, determine the habitat of each new feature added to the product, accompany development throughout the cycle, allow engineers to create better code, eliminate routine, give the project a whole opportunity to grow.

I have to admit that when I started working on the automation task, I did not fully know what I would have to face.

Tarantool is a key / value repository that can be distributed, but I approached the task, paying maximum attention to the issues of functional regression testing, that is, as it was done in MySQL.

My initial understanding of NoSQL was that it was just a more efficient storage, which, however, provides only part of the functionality of a full-fledged DBMS.
I was not quite right.

Functional testing Tarantool

Tarantool provides its customers with a simple binary protocol that supports the basic set of commands for working with data - GET, PUT, SET, DELETE.

Administrative commands, such as 'SAVE SNAPSHOT', 'SHOW STAT', are sent in text as a separate administrative port.

A separate port for the administration is necessary if only because at the moment Tarantool does not support any authorization scheme.

By the beginning of my work on the project in Tarantool, there were already several functional tests, but they did not work in automated mode. In a nutshell, if this test is written by you, you most likely will not use it.

Unlike existing tests, I was looking for a testing language in which it would be equally easy to write tests as well as to understand those already written. This was supposed to be a language that Tarantool developers and testing engineers could speak.

It was also important to be able to create test scripts for all errors found in the new environment.

Of course, for an ex-SQL programmer, SQL looked very attractive. In the end, for SQL, at least there is a standard, whereas the “languages” of modern NoSQL systems do not even have a common subset.

So, SQL was chosen as the language for writing tests. In terms of SQL, Tarantool provides 4 data manipulation operators: INSERT, SELECT, UPDATE, DELETE.

Of course, the SQL grammar is much richer than the functionality of Tarantool, but this, in my opinion, was acceptable. It was not a very pleasant surprise that the part of the storage key / value functional was impossible to express in terms of SQL.

First of all, Tarantool does not operate relations (sets of tuples), like the SQL standard, but key / value pairs. There are no multiple operations, and even those operations that work with several tuples, in fact, combine several operations into a package, and do not work with the set as a whole.
In SQL terms, this means that all operators that manipulate data must contain a WHERE clause and a link to a unique key.

Further, the objects of the Tarantool database are addressed by index, not by name. Instead of SQL “tables”, “key spaces” (namespaces) are used, numbered from 0 to N.

At the same time, in each space, which, in fact, consists of many different-sized tuples, there can be several indices for different fields or combinations of fields.
The data is accessed by index number (instead of “index” the term “key” is also often used, but I prefer to call an individual value rather than a data structure) rather than the number or name of a column. Index number 0 is always considered a primary key.

OK, Tarantool's SQL grammar has been shortened to allow only identifiers that end in numbers.

Then, because the key space has no dimension or fixed structure, there is no possibility to work only with a part of the tuple - all operators, with the exception of UPDATE, work with the entire tuple.

SELECT SQL, therefore, can only accept '*'.

When the grammar abbreviation was completed, I found that Tarantool allows INSERT, UPDATE and DELETE to return the old value of the tuple as a return value. For this purpose, there was no suitable construction in the standard SQL subset.
But in general, I achieved my goal - the creation of a micro-language with a simple, well-known, easy-to-understand syntax.

Because I chose Python as the programming language for the automation infrastructure, and the YAPPS library was used to create the SQL parser. I wrote about the advantages and disadvantages of this solution separately in my blog.

In order not to skip administrative commands through the parser, I simply added a regular expression that allows you to filter them from SQL queries at an early stage. Such commands were sent directly to the server through the administrative port.

The result was a fairly convenient command line client, on which you could easily write tests:

kostja@shmita:~/work/tarantool/test$ ./tarantool
tarantool> select * from t0 where k0 = 1
No match
tarantool> insert into t0 values (1, "My first tuple"
Insert OK, 1 row affected
tarantool> select * from t0 where k0 = 1
Found 1 tuple:
[1, 'My first tuple']
tarantool> save snapshot
---
ok
...
tarantool> show info
---
info:
version: "1.3.2-264-ga7bb270"
uptime: 73
pid: 22101
wal_writer_pid: 22102
lsn: 2
recovery_lag: 0.000
recovery_last_update: 0.000
status: primary
...

The next component was the automation of the test run, written in a new language. I will write about this in a separate article.

PS Yes, Tarantool is an open source product, and everything I've done is already available in the main project tree on github .

Trying how it works is simple enough: clone the source code, install the necessary libraries (mostly we are talking about Python libraries), compile (cmake. && make), start the server (cd test && ./run --start-and-exit) and connect to it using the new client (cd test && ./tarantool).

Source: https://habr.com/ru/post/113421/

All Articles

Using the experience of testing a relational database for NoSQL technology

More articles: