
Tarantool is a NoSQL DBMS developed and widely used at Mail.Ru Group. The scale of its use can be judged from the publications about it.
Recently, Mail.Ru Group released a virtual machine image with Tarantool preinstalled for Microsoft Azure.
We decided to check how well Tarantool works in Microsoft Azure compared with similar offerings: Azure Redis Cache, Bitnami Memcached, Aerospike, and VoltDB. Here “well” means “fast”: we compare the number of requests processed per second (throughput, RPS).
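The metric can be sketched in a few lines of C++. This is only an illustration of how RPS is derived from a request count and a wall-clock interval. The names `RequestsPerSecond` and `MeasureThroughput` are hypothetical and do not come from the benchmark's source, and `request` stands in for one database call.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Throughput in requests per second for `requests` completed in `elapsed_seconds`.
double RequestsPerSecond(std::size_t requests, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return static_cast<double>(requests) / elapsed_seconds;
}

// Calls `request` `count` times and returns the measured throughput.
// `request` is a hypothetical stand-in for a single database query.
double MeasureThroughput(const std::function<void()>& request, std::size_t count) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < count; ++i) request();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return RequestsPerSecond(count, elapsed.count());
}
```

A monotonic clock (`std::chrono::steady_clock`) is used so that system clock adjustments during a run do not distort the result.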
Azure Redis Cache
We need a Basic C4 (13 GB) Azure Redis Cache instance with the non-SSL port enabled (SSL is unnecessary for a fair comparison). The Basic tier is required to exclude replication. Azure Redis Cache is provided as a service, so we have no access to the underlying virtual machine: we do not know how it is configured and cannot influence it. The estimated cost of a Redis Cache of this size is 9765 rubles per month.
Tarantool VM
We need one Tarantool VM on a Standard D11 virtual machine with 14 GB of RAM and HDD storage. This configuration costs 9067 rubles per month. We test Tarantool in two modes: with write-ahead logging (WAL) enabled, for data persistence, and with it disabled, because we do not know for sure whether the corresponding setting is enabled for Redis.
To change the logging mode, edit the wal_mode setting in /etc/tarantool/instances.enabled/example.lua (none to work without WAL, write to work with WAL; fsync is a very specific mode of operation that we do not test).
Unlike Redis, for example, Tarantool has TREE indexes, but we need to compare like with like, so a HASH index was used.
Memcached
For the test we took a Standard D11 virtual machine image with Memcached preinstalled from Bitnami.
The Azure Marketplace has another Memcached, a cloud service from Redis Labs, but it is available only in the US, and we were unable to test it.
After deploying the virtual machine, we disabled authentication in the memcached.conf configuration file (the -S option).
Memcached cannot persist data.
Aerospike
For Aerospike, we took the official image from the Azure Marketplace (Standard D11).
VoltDB
VoltDB, however, is not available in the Azure Marketplace. I had to take a clean virtual machine (Ubuntu 14.04 LTS) and install it manually from source. But I was pleasantly surprised by the out-of-the-box web admin console, which includes live graphs, among them the number of requests per second.
Synchronous-asynchronous test
We will run a “synchronous-asynchronous” test: the interface is synchronous, but internally we work with the connection asynchronously. This type of test simulates several synchronous clients working through a single connection. To remove any doubt that the test is identical for Redis Cache, Tarantool VM, and Memcached, we move all the common logic into the abstract class NoSQLConnection, from which TarantoolConnection, RedisConnection, and MemcachedConnection inherit (see the source).
The class has two queues (plain std::list): OutputQueue (to be sent to the socket) and InputQueue (received from the socket). It also has the SendThreadFunc and ReceiveThreadFunc methods, which run on separate threads and, when the corresponding queue is non-empty, send or receive data in batches using the Send and Receive methods (pure virtual, implemented in the subclasses).
The synchronous interface is the DoSyncQuery method, which puts the query in the OutputQueue and waits for the response in the InputQueue.
The test virtual machine should be powerful enough (we used Standard D3) and geographically close to the database (we used the “West US” location).
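The NoSQLConnection design described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual source. The class, queue, and method names come from the text, but everything else is an assumption: the `Message` struct and its `id` field for matching replies to queries, the `Send`/`Receive` signatures, the `Start`/`Stop` helpers, and the in-memory `LoopbackConnection` that stands in for a real server.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <list>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical wire unit: `id` pairs a reply with its query.
struct Message { std::uint64_t id; std::string body; };

class NoSQLConnection {
public:
    virtual ~NoSQLConnection() = default;
    // Synchronous interface: enqueue the query, then block until its reply arrives.
    std::string DoSyncQuery(const std::string& query) {
        assert(running_ && "call Start() first");
        const std::uint64_t id = next_id_++;
        { std::lock_guard<std::mutex> lock(out_mutex_); OutputQueue.push_back({id, query}); }
        out_cv_.notify_one();
        std::unique_lock<std::mutex> lock(in_mutex_);
        for (;;) {
            for (auto it = InputQueue.begin(); it != InputQueue.end(); ++it) {
                if (it->id != id) continue;
                std::string reply = it->body;
                InputQueue.erase(it);
                return reply;
            }
            in_cv_.wait(lock);  // woken whenever new replies are queued
        }
    }
protected:
    virtual void Send(const std::list<Message>& batch) = 0;  // write one packet to the socket
    virtual std::list<Message> Receive() = 0;  // read ready replies; may return empty
    void Start() {
        running_ = true;
        send_thread_ = std::thread(&NoSQLConnection::SendThreadFunc, this);
        receive_thread_ = std::thread(&NoSQLConnection::ReceiveThreadFunc, this);
    }
    void Stop() {  // subclasses call this in their destructor, before their members die
        { std::lock_guard<std::mutex> lock(out_mutex_); running_ = false; }
        out_cv_.notify_all();
        if (send_thread_.joinable()) send_thread_.join();
        if (receive_thread_.joinable()) receive_thread_.join();
    }
private:
    void SendThreadFunc() {
        std::unique_lock<std::mutex> lock(out_mutex_);
        while (running_) {
            out_cv_.wait(lock, [this] { return !running_ || !OutputQueue.empty(); });
            std::list<Message> batch;
            batch.swap(OutputQueue);  // everything queued so far goes out as one packet
            lock.unlock();
            if (!batch.empty()) Send(batch);
            lock.lock();
        }
    }
    void ReceiveThreadFunc() {
        while (running_) {
            std::list<Message> replies = Receive();
            if (replies.empty()) continue;
            { std::lock_guard<std::mutex> lock(in_mutex_); InputQueue.splice(InputQueue.end(), replies); }
            in_cv_.notify_all();
        }
    }
    std::list<Message> OutputQueue, InputQueue;
    std::mutex out_mutex_, in_mutex_;
    std::condition_variable out_cv_, in_cv_;
    std::atomic<bool> running_{false};
    std::atomic<std::uint64_t> next_id_{0};
    std::thread send_thread_, receive_thread_;
};

// In-memory stand-in for a server: echoes each query back as its reply.
class LoopbackConnection : public NoSQLConnection {
public:
    LoopbackConnection() { Start(); }
    ~LoopbackConnection() override { Stop(); }
protected:
    void Send(const std::list<Message>& batch) override {
        { std::lock_guard<std::mutex> lock(wire_mutex_); wire_.insert(wire_.end(), batch.begin(), batch.end()); }
        wire_cv_.notify_one();
    }
    std::list<Message> Receive() override {
        std::unique_lock<std::mutex> lock(wire_mutex_);
        wire_cv_.wait_for(lock, std::chrono::milliseconds(10), [this] { return !wire_.empty(); });
        std::list<Message> out;
        out.swap(wire_);
        return out;
    }
private:
    std::mutex wire_mutex_;
    std::condition_variable wire_cv_;
    std::list<Message> wire_;
};
```

With this sketch, any number of client threads can call `DoSyncQuery` on one shared connection object. Queries that accumulate while a send is in progress leave together in the next batch, which is what lets a single connection serve many synchronous callers.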
Because of the way the Aerospike and VoltDB client libraries are built (they have a built-in event loop), the test for them was written separately.
With up to 10 client threads in increments of 1, we are close to a fully synchronous mode of operation (and a single thread, in fact, is exactly that). The graph shows more or less linear growth. Redis and Memcached give approximately equal performance, Tarantool is faster, Aerospike is the fastest, and VoltDB is the slowest.


The next graph goes up to 100 threads in increments of 10. For Tarantool, Redis, and Memcached, linear growth continues, while Aerospike and VoltDB slow down, each at a different value.


Next, we go up to 1000 threads in increments of 100. Growth slows everywhere, and for Memcached it stops altogether.


Finally, we go up to 8000 threads in increments of 1000. Growth stops. Beyond 4000 clients Memcached stops working (it closes the connection), so it cannot be tested further. VoltDB fails even earlier, at 3000 clients.
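The four load stages described above can be written down as a small schedule generator. This is a sketch under one stated assumption: the text gives only each stage's upper bound and step, so the starting point of each stage (1, 10, 100, 1000) is a guess. The function names `StageLevels` and `LoadSchedule` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Client counts for one stage: first, first + step, ..., up to and including last.
std::vector<int> StageLevels(int first, int last, int step) {
    assert(step > 0 && first <= last);
    std::vector<int> levels;
    for (int n = first; n <= last; n += step) levels.push_back(n);
    return levels;
}

// The four stages from the text. Upper bounds and steps are from the article;
// the lower bound of each stage is an assumption.
std::vector<std::vector<int>> LoadSchedule() {
    return {
        StageLevels(1, 10, 1),
        StageLevels(10, 100, 10),
        StageLevels(100, 1000, 100),
        StageLevels(1000, 8000, 1000),
    };
}
```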


As a result, we see Tarantool leading under heavy load (under light load, though, Aerospike is faster).
And what about the synchronous test?
Here everything is simple. We run the synchronous-asynchronous test in one thread, and it obviously becomes purely synchronous. But if there are many clients, many connections are required, so we run several tests in parallel and sum the results.
Aerospike and VoltDB were not tested in this mode.
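The “run several tests in parallel and sum the results” step might look like the sketch below. It assumes that each parallel run reports its own RPS figure. `SumParallelThroughput` and the `run_one` callback are hypothetical names, and in a real test `run_one` would open its own connection and issue synchronous queries.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Runs `clients` copies of `run_one` in parallel, one thread each, and sums the
// RPS value each copy returns. `run_one` receives the client index.
double SumParallelThroughput(int clients, const std::function<double(int)>& run_one) {
    assert(clients > 0);
    std::vector<double> rps(static_cast<std::size_t>(clients), 0.0);
    std::vector<std::thread> threads;
    threads.reserve(rps.size());
    for (int i = 0; i < clients; ++i) {
        // Each thread writes only its own slot, so no locking is needed.
        threads.emplace_back([&rps, &run_one, i] {
            rps[static_cast<std::size_t>(i)] = run_one(i);
        });
    }
    for (std::thread& t : threads) t.join();
    return std::accumulate(rps.begin(), rps.end(), 0.0);
}
```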

We see that the synchronous test has a certain “ceiling” that is lower than that of the synchronous-asynchronous connection. This ceiling is caused by network overhead.
Price comparison
Tarantool, Memcached, Aerospike, and VoltDB are free; you pay only for the virtual machine they run on. We used Standard D11 (14 GB of RAM), which costs ~9067 rubles per month. Azure Redis Cache is a bit more expensive: ~9765 rubles per month for the Basic C4 instance (13 GB of RAM). Let's visualize this.

Not very informative, is it? The prices are almost equal. However, as we saw earlier, these databases differ in performance. Let's express the cost not per month but per billion queries. First, let's see how much a billion write requests cost with 1000 clients.
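The conversion from a monthly price to a price per billion requests is simple arithmetic: divide the monthly cost by the number of requests the server can handle in a month at a sustained RPS, then scale to a billion. The sketch below assumes a 30-day month, and the function name `CostPerBillionRequests` is hypothetical.

```cpp
#include <cassert>

// Assumption: a month is 30 days long.
constexpr double kSecondsPerMonth = 30.0 * 24.0 * 60.0 * 60.0;

// Cost of one billion requests, in the same currency as `monthly_cost`,
// for a server that sustains `rps` requests per second around the clock.
double CostPerBillionRequests(double monthly_cost, double rps) {
    assert(rps > 0.0);
    const double requests_per_month = rps * kSecondsPerMonth;
    return monthly_cost / requests_per_month * 1e9;
}
```

For example, at a purely hypothetical 100,000 RPS (not a measured value), the Standard D11 machine at 9067 rubles per month comes out at roughly 35 rubles per billion requests. Doubling the throughput halves that figure, which is why near-equal monthly prices can still produce very different per-request costs.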

VoltDB is the obvious outsider here. Let's remove it.

Now let's remove Aerospike and Memcached to take a closer look at the leaders.

And now let's see how the cost changes if we count read requests with 100 clients.

We leave only the leaders.

Findings
During testing, the test client loaded the Tarantool process to up to 70% of the CPU in the synchronous-asynchronous test and to 100% in synchronous mode. On all graphs, Tarantool VM, regardless of the WAL mode, performed better than its competitors. Note that the presence or absence of WAL does not affect the read speed from Tarantool (the orange and gray curves on the read charts coincide), since the disk is not used when reading from Tarantool. In addition, Tarantool VM turned out to be the cheapest solution both per unit of time (per month) and per request.