
Tarantool vs. Microsoft Azure Comparison


Tarantool is a NoSQL DBMS developed and widely used at Mail.Ru Group. The publications about it give an idea of how widely it is used:


Recently, Mail.Ru Group released a Microsoft Azure virtual machine with Tarantool preinstalled:

We decided to check how well Tarantool performs in Microsoft Azure compared with other similar offerings: Azure Redis Cache, Bitnami Memcached, Aerospike and VoltDB. Here “well” means “fast”, that is, we compare the number of requests processed per second (throughput, RPS).

Azure Redis Cache


We need a Basic C4 (13 GB) Azure Redis Cache instance with the non-SSL port enabled (the other systems are tested without SSL, so for a fair comparison we do not use it here either). The Basic tier is required to rule out replication. Azure Redis Cache is provided as a service: we have no access to the virtual machine, do not know how it is configured, and cannot influence that. The estimated cost of a Redis Cache of this size is 9765 rubles per month.

Tarantool VM


We need one Tarantool VM on a Standard D11 virtual machine with 14 GB of RAM and HDD storage. This configuration costs 9067 rubles per month. We will test Tarantool in two modes, with write-ahead logging (WAL) enabled (for data persistence) and with it disabled, because we do not know for sure whether the equivalent setting is enabled in Azure Redis Cache.

To switch the logging mode, change the wal_mode setting in /etc/tarantool/instances.enabled/example.lua: none runs without the WAL, write runs with the WAL, and fsync is a rather specialized mode that we do not test.

Unlike Redis, for example, Tarantool also has TREE indexes, but to compare like with like we used a HASH index.

Memcached


For the test we used Bitnami's Standard D11 virtual machine image with Memcached preinstalled.

The Azure Marketplace also offers another Memcached, a cloud service from Redis Labs, but it is available only in the US and we were unable to test it.

After deploying the virtual machine, we disabled authentication in the memcached.conf configuration file by removing the -S option (which enables SASL authentication).

Memcached cannot persist data.

Aerospike


For Aerospike, we took the official image from the Azure Marketplace (Standard D11).

VoltDB


VoltDB, however, is not available in the Azure Marketplace, so I had to take a clean virtual machine (Ubuntu 14.04 LTS) and build it manually from source. I was pleasantly surprised by the out-of-the-box web admin console, which shows live charts, including the number of requests per second.

Synchronous-asynchronous test


We will run a “synchronous-asynchronous” test: the interface is synchronous, but internally the connection is handled asynchronously. This kind of test simulates many synchronous clients working through a single connection. To make sure the test is identical for Redis Cache, Tarantool VM and Memcached, we moved all the common logic into the abstract class NoSQLConnection, from which TarantoolConnection, RedisConnection and MemcachedConnection inherit (see the source).

The class has two queues (plain std::list): OutputQueue (data to be sent to the socket) and InputQueue (data received from the socket). It also has the SendThreadFunc and ReceiveThreadFunc methods, which run in separate threads and, whenever there is something to send or receive, transfer the data in packets using the Send and Receive methods (pure virtual, implemented in the subclasses).

The synchronous interface is the DoSyncQuery method, which puts a query into OutputQueue and waits for the response in InputQueue. The test virtual machine must be powerful enough (we used Standard D3) and located geographically close to the database (we used the West US region).
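A minimal sketch of such a base class, reconstructed from the description above rather than taken from the author's source. The names NoSQLConnection, OutputQueue, InputQueue, SendThreadFunc, ReceiveThreadFunc, Send, Receive and DoSyncQuery come from the text; the Message type, the request id used to match a reply to its query, the Start/Stop methods and the LoopbackConnection test double are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <list>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

struct Message { std::uint64_t id; std::string body; };  // hypothetical: id matches reply to query

class NoSQLConnection {
public:
    virtual ~NoSQLConnection() = default;

    // Started outside the constructor so the threads call the derived Send/Receive.
    void Start() {
        sender_ = std::thread(&NoSQLConnection::SendThreadFunc, this);
        receiver_ = std::thread(&NoSQLConnection::ReceiveThreadFunc, this);
    }
    // Derived destructors must call Stop() while their own members are still alive.
    void Stop() {
        { std::lock_guard<std::mutex> lock(outMutex_); stop_ = true; }
        outCv_.notify_all();
        if (sender_.joinable()) sender_.join();
        if (receiver_.joinable()) receiver_.join();
    }
    // Synchronous interface: enqueue the query, then wait for the reply with the same id.
    std::string DoSyncQuery(const std::string& query) {
        const std::uint64_t id = nextId_++;
        { std::lock_guard<std::mutex> lock(outMutex_); OutputQueue.push_back({id, query}); }
        outCv_.notify_one();
        std::unique_lock<std::mutex> lock(inMutex_);
        for (;;) {
            for (auto it = InputQueue.begin(); it != InputQueue.end(); ++it) {
                if (it->id != id) continue;
                std::string reply = std::move(it->body);
                InputQueue.erase(it);
                return reply;
            }
            inCv_.wait(lock);  // woken whenever the receiver thread adds replies
        }
    }

protected:
    // Protocol-specific; implemented by TarantoolConnection, RedisConnection, etc.
    virtual void Send(const std::list<Message>& batch) = 0;
    virtual std::list<Message> Receive() = 0;  // should return periodically, even if empty

private:
    void SendThreadFunc() {
        for (;;) {
            std::unique_lock<std::mutex> lock(outMutex_);
            outCv_.wait(lock, [this] { return stop_ || !OutputQueue.empty(); });
            if (OutputQueue.empty()) return;  // woken only to stop
            std::list<Message> batch = std::exchange(OutputQueue, {});  // all queued requests, one packet
            lock.unlock();
            Send(batch);
        }
    }
    void ReceiveThreadFunc() {
        while (!stop_) {
            std::list<Message> replies = Receive();
            if (replies.empty()) continue;
            { std::lock_guard<std::mutex> lock(inMutex_); InputQueue.splice(InputQueue.end(), replies); }
            inCv_.notify_all();
        }
    }
    std::list<Message> OutputQueue, InputQueue;
    std::mutex outMutex_, inMutex_;
    std::condition_variable outCv_, inCv_;
    std::atomic<bool> stop_{false};
    std::atomic<std::uint64_t> nextId_{0};
    std::thread sender_, receiver_;
};

// Hypothetical test double: an in-process "server" that echoes every request back.
class LoopbackConnection : public NoSQLConnection {
    std::mutex m_;
    std::condition_variable cv_;
    std::list<Message> wire_;  // stands in for the socket

protected:
    void Send(const std::list<Message>& batch) override {
        { std::lock_guard<std::mutex> lock(m_); wire_.insert(wire_.end(), batch.begin(), batch.end()); }
        cv_.notify_one();
    }
    std::list<Message> Receive() override {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait_for(lock, std::chrono::milliseconds(10), [this] { return !wire_.empty(); });
        return std::exchange(wire_, {});
    }

public:
    ~LoopbackConnection() override { Stop(); }
};
```

In this sketch SendThreadFunc hands everything queued so far to Send in one call, so concurrent DoSyncQuery callers can share a packet over one connection; a real subclass would serialize the batch into its protocol (Tarantool, Redis or Memcached) inside Send and parse replies inside Receive.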

Because the Aerospike and VoltDB client libraries are structured differently (they have a built-in event loop), the test for them was written separately.

In the range of up to 10 client threads in increments of 1, we are close to fully synchronous operation (and with one thread it is exactly that). The graph shows roughly linear growth. Redis and Memcached deliver about the same performance, Tarantool is faster, Aerospike is the fastest, and VoltDB, by contrast, is the slowest.





The next graph goes up to 100 threads in increments of 10. For Tarantool, Redis and Memcached the linear growth continues, while Aerospike and VoltDB level off, each at a different point.





Next, we go up to 1000 threads in increments of 100. Growth slows down everywhere, and for Memcached it stops altogether.





Finally, we go up to 8000 threads in increments of 1000. Growth stops. Beyond 4000 clients Memcached stops working (it closes the connection), so it cannot be tested further. VoltDB fails even earlier, at 3000 clients.
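The four sweeps above can be driven by a small harness. A minimal sketch, assuming each client thread calls a synchronous query function (for example, a wrapper around DoSyncQuery on one shared connection) in a loop for a fixed time; SweepStage and MeasureRps are hypothetical names, and whether each graph starts at its step value is a guess. The author's actual harness may measure differently.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Client counts for one graph: step, 2*step, ..., limit (inclusive).
std::vector<int> SweepStage(int step, int limit) {
    assert(step > 0 && limit >= step);
    std::vector<int> counts;
    for (int n = step; n <= limit; n += step) counts.push_back(n);
    return counts;
}

// Runs `clients` threads that all call the same synchronous `query` in a loop
// for `duration` and returns the number of completed requests per second.
double MeasureRps(int clients, std::chrono::milliseconds duration,
                  const std::function<void()>& query) {
    assert(clients > 0);
    std::atomic<bool> stop{false};
    std::atomic<std::uint64_t> completed{0};
    std::vector<std::thread> workers;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < clients; ++i) {
        workers.emplace_back([&] {
            while (!stop.load(std::memory_order_relaxed)) {
                query();
                completed.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    std::this_thread::sleep_for(duration);
    stop = true;
    for (auto& w : workers) w.join();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return static_cast<double>(completed.load()) / elapsed.count();
}
```

For example, iterating over SweepStage(100, 1000) and calling MeasureRps for each count would produce the points of the third graph. The sketch also counts requests that finish after the stop flag is set and includes thread start-up in the elapsed time, which is acceptable for a rough throughput figure but not for precise measurements.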





As a result, Tarantool leads under heavy load (under light load, however, Aerospike is faster).

And what about the synchronous test?


Here everything is simple. If we run the synchronous-asynchronous test with a single thread, it effectively becomes synchronous. But many clients require many connections... so we run several such tests in parallel and add up the results.

Aerospike and VoltDB were not tested in this mode.
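Summing the results of parallel single-connection runs could look like this. A minimal sketch, assuming each run is a callable that opens its own connection, performs the single-threaded test and returns its RPS; RunParallelSyncTests and runOneTest are hypothetical names, and the author may have run the tests as separate processes rather than threads.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Runs `tests` independent synchronous tests at the same time and adds up their
// throughputs. `runOneTest` (hypothetical) opens its own connection, runs the
// single-threaded test and returns the RPS it measured.
double RunParallelSyncTests(int tests, const std::function<double()>& runOneTest) {
    assert(tests > 0);
    std::vector<std::future<double>> runs;
    for (int i = 0; i < tests; ++i)
        runs.push_back(std::async(std::launch::async, runOneTest));  // one thread per test
    double total = 0.0;
    for (auto& run : runs) total += run.get();
    return total;
}
```

std::launch::async forces each run onto its own thread, so the runs overlap in time and adding their individual throughputs gives the aggregate load on the server.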





We see that the synchronous test hits a certain “ceiling”, which is lower than that of the synchronous-asynchronous test. This ceiling is caused by network overhead.
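One hedged way to see why such a ceiling appears is Little's law. With N synchronous connections, each keeping exactly one request in flight, and a mean response time R per request (network round trip plus server time), the total throughput X is

$$X = \frac{N}{R}$$

Adding connections raises X only while R stays roughly constant; once per-request overhead makes R grow in proportion to N, X levels off. The formula only describes the shape of the curve; attributing the growth of R to network overhead follows the author's explanation.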

Price comparison


Tarantool, Memcached, Aerospike and VoltDB are free; you only pay for the virtual machine they run on. We used Standard D11 (14 GB of RAM), which costs ~9067 rubles per month. Azure Redis Cache is a bit more expensive: ~9765 rubles per month for the Basic C4 instance (13 GB of RAM). Let's visualize this.



Not very informative, is it? The prices are almost equal. However, as we saw earlier, these databases differ in performance. So let's express the cost not per month but per billion requests. First, let's see how much a billion write requests cost with 1000 clients.
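The conversion from a monthly price to a price per billion requests is simple arithmetic: cost = P * 10^9 / (RPS * T), where P is the monthly price and T is the number of seconds in a month. A minimal sketch, assuming a 30-day month and a server running at its measured throughput around the clock (the article does not state the exact conversion it used); CostPerBillionRequests is a hypothetical name.

```cpp
#include <cassert>

// Assumption: a 30-day month, with the server busy around the clock.
constexpr double kSecondsPerMonth = 30.0 * 24 * 60 * 60;  // 2,592,000 s

// Price of one billion requests for a server that costs `monthlyPrice` per month
// and sustains `rps` requests per second.
double CostPerBillionRequests(double monthlyPrice, double rps) {
    assert(monthlyPrice >= 0.0 && rps > 0.0);
    const double requestsPerMonth = rps * kSecondsPerMonth;
    return monthlyPrice * 1e9 / requestsPerMonth;
}
```

With the prices above, CostPerBillionRequests(9067, rps) for the D11-based setups and CostPerBillionRequests(9765, rps) for Azure Redis Cache, where rps is each system's measured throughput, give the per-billion cost in rubles. A faster system at the same monthly price is proportionally cheaper per request.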



VoltDB clearly lags behind here, so let's remove it.



Now let's remove Aerospike and Memcached to take a closer look at the leaders.



Now let's see how the cost changes if we count read requests with 100 clients.



Again, we keep only the leaders.



Findings


During testing, the test client loaded the Tarantool process up to 70% CPU in the synchronous-asynchronous test and up to 100% in the synchronous test. On all graphs, Tarantool VM outperformed its competitors regardless of the WAL mode. Note that WAL has no effect on Tarantool's read speed (the orange and gray curves on the read charts coincide), since reads from Tarantool do not touch the disk. In addition, Tarantool VM turned out to be the cheapest solution both per unit of time (per month) and per request.

Source: https://habr.com/ru/post/281841/

