
Many of you have already heard about our project, Tarantool. It is a DBMS, or, put simply, a database with an application server inside. Tarantool is an open source project, and anyone can work with it. The project has been under development for more than eight years. At Mail.Ru Group, Tarantool is used in more than half of our products: Mail, Cloud, My World, Agent, and others. We commit all our improvements to this database to GitHub, so the community has the same version of the database that we use. We now have client libraries for almost every language, and we made a lot of progress here over the past year. Some of them are written by the community and some by us. If a more efficient library appears, we simply make it the official one. We try to make everything work out of the box: both the database and the libraries.
One of Tarantool's main features is that it combines the properties of a database and a cache. A database is something reliable, with transactions and a server-side query language. A cache is something fast. Tarantool merges both worlds organically. It is intended for high-load projects and for working with hot data.
Comparison with classic solutions
If you work with “traditional” databases such as MySQL or Oracle, you have probably found that your system lacks cache properties: high speed, low latency, and so on. Traditional databases do not provide these. Caches have their own drawbacks, including the lack of transactions. Even if you combine a cache with a database, for example MySQL with Memcached or PostgreSQL with Redis, you still end up with compromises: you partially lose database properties such as transactions, stored procedures, and secondary indexes, and you also lose some cache properties, such as high write throughput. At the same time, new problems appear, the most serious of which are data inconsistency and cold starts.
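The inconsistency problem is easy to reproduce with the common cache-aside pattern, where the application reads from the cache, falls back to the database on a miss, and invalidates the cache after writes. The toy below is only an illustration: two Python dicts stand in for the database and the cache, all names are hypothetical, and it replays one unlucky interleaving of a reader and a writer.

```python
# Toy illustration: two separate stores (think MySQL + Memcached).
# Nothing keeps them in sync except application code. Names are hypothetical.
db: dict[str, int] = {"balance:42": 100}
cache: dict[str, int] = {}


def read_through_miss(key: str) -> int:
    """Step 1 of a cache-aside read: on a cache miss, fetch from the database."""
    return db[key]


def fill_cache(key: str, value: int) -> None:
    """Step 2 of a cache-aside read: store the fetched value in the cache."""
    cache[key] = value


def write(key: str, value: int) -> None:
    """Cache-aside write: update the database, then invalidate the cache entry."""
    db[key] = value
    cache.pop(key, None)


if __name__ == "__main__":
    key = "balance:42"
    stale = read_through_miss(key)  # reader A misses the cache and reads 100
    write(key, 50)                  # writer B commits 50 and invalidates
    fill_cache(key, stale)          # reader A now caches the old value
    print(db[key], cache[key])      # 50 100: cache and database disagree
```

The cache keeps serving the stale value until it expires or is invalidated again; a single system that owns both the in-memory copy and the durable copy avoids this class of race.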
But if this compromise does not suit you and you need all the advantages of both a cache and a database, take a look at Tarantool. It has none of the drawbacks listed above. Tarantool is very simple. Roughly speaking, it stores two files on disk: a snapshot of the data at some point in time and a log of all transactions since that point. We tested cold start speed: files are read into memory from a magnetic disk at 100 MB/s. So, for example, a 100 GB database is loaded in 1000 seconds, roughly 17 minutes.
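To make the two-file model concrete, here is a toy Python sketch of recovery: load the snapshot, then replay every operation logged after it. The record format (operation, key, value), the operation names, and the function names are assumptions for illustration only, not Tarantool's actual snapshot or log format. The second function reproduces the cold start arithmetic above.

```python
from typing import Any

# Hypothetical log record: (operation, key, value). Illustrative only,
# not Tarantool's real on-disk snapshot/log layout.
LogRecord = tuple[str, Any, Any]


def recover(snapshot: dict[Any, Any], log: list[LogRecord]) -> dict[Any, Any]:
    """Rebuild in-memory state: start from the snapshot, then replay the log."""
    state = dict(snapshot)  # copy so the snapshot itself stays untouched
    for op, key, value in log:
        if op in ("insert", "replace", "update"):
            state[key] = value  # toy model: an update overwrites the whole value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return state


def cold_start_seconds(data_bytes: float, read_bytes_per_s: float) -> float:
    """Time to read the files sequentially at a given disk speed."""
    return data_bytes / read_bytes_per_s


if __name__ == "__main__":
    snap = {1: "alice", 2: "bob"}
    wal = [("insert", 3, "carol"), ("update", 1, "alice2"), ("delete", 2, None)]
    print(recover(snap, wal))  # {1: 'alice2', 3: 'carol'}
    # 100 GB read at 100 MB/s, decimal units
    print(cold_start_seconds(100e9, 100e6))  # 1000.0
```

Because both files are read sequentially, startup time is bounded by disk read speed rather than by random I/O.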
For comparison, things were much worse when we experimented with MySQL and PostgreSQL. They keep data on disk, so they do not have the problem of being unresponsive until everything is loaded into memory. But their cache warms up much more slowly (1–2 MB/s), so you have to resort to various tricks, such as preheating indexes. Anyone who administers MySQL knows this well. Tarantool simply starts up and runs: its cold start time is the minimum possible.
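A quick back-of-envelope comparison of the rates quoted above, assuming (purely for illustration) that the same 100 GB has to be brought into memory in each case and that the warm-up rates hold constant. The function name is hypothetical.

```python
def load_time_hours(data_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to pull data_gb into memory at rate_mb_per_s (decimal units)."""
    seconds = data_gb * 1000 / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    data_gb = 100
    for label, rate in [("sequential read (Tarantool)", 100.0),
                        ("cache warm-up, fast end", 2.0),
                        ("cache warm-up, slow end", 1.0)]:
        print(f"{label}: {load_time_hours(data_gb, rate):.1f} h")
```

Under these assumptions, sequential loading takes well under half an hour, while warming a cache at 1–2 MB/s takes on the order of 14 to 28 hours.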
Disadvantages
However, not everything about this database satisfies us. The first thing we are working on is disk storage. Tarantool was originally created as an in-memory database. Because it is fast and needs few servers, its total cost of ownership is still lower than that of traditional disk-based databases. But since Tarantool keeps everything in memory, the question arises: what to do with cold data? Tarantool works efficiently with hot data, but all data, cold data included, is held in memory. That is why we are developing disk storage. By the way, in our production all Tarantool instances run on the cheapest SATA disks. SSDs can be installed purely for faster startup, but when replicas are available this does not matter.
For now, we do nothing special with cold data. The cost of keeping it in memory is more than offset by the speed and the remarkably small number of servers. For example, user profiles are handled by just eight Tarantool instances, whereas in MySQL this would be a farm of thousands of servers. But if we were better at evicting cold data, four Tarantool instances could do the job instead of eight.
We are also developing automatic clustering solutions. We already have several, but none of them is universal. We want to build one proper, universal solution: you deploy Tarantool on ten servers, and it shards, reshards, and replicates everything by itself, without any headaches.
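One common way to get "shards and reshards by itself" behavior is to hash each key into a fixed number of virtual buckets and map buckets to servers, so that resharding moves whole buckets instead of rehashing every key. The sketch below is a generic illustration under that assumption; the bucket count, server names, and class are hypothetical and are not the API of any Tarantool module.

```python
import hashlib


class BucketRouter:
    """Route keys to servers through a fixed set of virtual buckets (toy sketch)."""

    def __init__(self, servers: list[str], bucket_count: int = 3000) -> None:
        self.bucket_count = bucket_count
        # Initial placement: assign buckets to servers round-robin.
        self.bucket_owner = {b: servers[b % len(servers)] for b in range(bucket_count)}

    def bucket_of(self, key: str) -> int:
        # Stable hash; Python's built-in hash() of str is randomized per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.bucket_count

    def server_of(self, key: str) -> str:
        return self.bucket_owner[self.bucket_of(key)]

    def move_bucket(self, bucket: int, new_server: str) -> None:
        # Resharding = reassigning a bucket; a key's bucket never changes.
        self.bucket_owner[bucket] = new_server


if __name__ == "__main__":
    servers = [f"tarantool-{i}" for i in range(10)]  # hypothetical names
    router = BucketRouter(servers)
    key = "user:42"
    b = router.bucket_of(key)
    print(key, "-> bucket", b, "->", router.server_of(key))
    router.move_bucket(b, "tarantool-new")
    print(key, "->", router.server_of(key))  # tarantool-new
```

A real system would also have to copy the bucket's data and keep serving requests during the move; this sketch shows only the routing side.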
We are also adding support for other interfaces, such as SQL. It is still unstable, but we have high hopes for it. SQL support is needed mainly to make migration easy. For instance, Mail.Ru Mail runs on about a hundred MySQL servers, whose load could be moved to a couple of Tarantool instances. But without SQL support, a ton of code would have to be rewritten. So it is easier to implement SQL support once.
We use our own slab allocator, which minimizes the effects of memory fragmentation. It is still not perfect, and we are constantly working to improve it.
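The idea behind a slab allocator can be shown with a bookkeeping-only toy: memory is carved into fixed-size slots grouped by size class, so a freed slot is reused by the next object of the same class instead of leaving an odd-sized hole. The size classes (growing by a factor of 1.5 from 16 bytes), the slab size, and all names below are assumptions for illustration; Tarantool's actual allocator differs in detail.

```python
import math


class ToySlabAllocator:
    """Bookkeeping-only model of a slab allocator with size classes."""

    def __init__(self, min_size: int = 16, factor: float = 1.5,
                 max_size: int = 1 << 20, slab_size: int = 1 << 20) -> None:
        if max_size > slab_size:
            raise ValueError("max_size must not exceed slab_size")
        self.slab_size = slab_size
        self.classes: list[int] = []
        size = min_size
        while size < max_size:
            self.classes.append(size)
            size = max(size + 1, math.ceil(size * factor))
        self.classes.append(max_size)
        self.free: dict[int, list[int]] = {c: [] for c in self.classes}  # free slot ids
        self.slabs: dict[int, int] = {c: 0 for c in self.classes}  # slabs per class

    def size_class(self, nbytes: int) -> int:
        """Smallest size class that fits nbytes."""
        for c in self.classes:
            if nbytes <= c:
                return c
        raise MemoryError(f"object of {nbytes} bytes exceeds the largest size class")

    def alloc(self, nbytes: int) -> tuple[int, int]:
        cls = self.size_class(nbytes)
        if not self.free[cls]:
            # Take a new slab and cut it into equal slots of this class.
            per_slab = self.slab_size // cls
            first = self.slabs[cls] * per_slab
            self.free[cls] = list(range(first + per_slab - 1, first - 1, -1))
            self.slabs[cls] += 1
        return cls, self.free[cls].pop()

    def free_slot(self, handle: tuple[int, int]) -> None:
        cls, slot = handle
        self.free[cls].append(slot)  # reused by the next alloc of the same class


if __name__ == "__main__":
    a = ToySlabAllocator()
    h1 = a.alloc(20)  # 20 bytes fall into the 24-byte class
    h2 = a.alloc(20)
    a.free_slot(h1)
    h3 = a.alloc(22)  # same class, so it reuses h1's slot
    print(h1, h2, h3)  # (24, 0) (24, 1) (24, 0)
```

The trade-off is internal waste (a 20-byte object occupies a 24-byte slot) in exchange for avoiding the external fragmentation that a general-purpose allocator accumulates over time.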
How to estimate the memory needed for Tarantool
Tarantool has a small memory footprint, which means little overhead for storing data: the size of the data on disk (or in memory) is only slightly larger than the size of the raw data. If you need to store 1 billion rows, each with ten four-byte fields, that takes 4 × 10 × 1 billion bytes (40 GB), plus 1–10% overhead for control structures.
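The rule of thumb above translates directly into a small helper. The function name is hypothetical, sizes are in decimal gigabytes, and the 1–10% overhead range is the one quoted above.

```python
def estimate_memory_bytes(rows: int, fields: int, field_bytes: int,
                          overhead: float) -> float:
    """Raw data size multiplied by (1 + overhead fraction for control structures)."""
    raw = rows * fields * field_bytes
    return raw * (1 + overhead)


if __name__ == "__main__":
    rows, fields, field_bytes = 1_000_000_000, 10, 4
    low = estimate_memory_bytes(rows, fields, field_bytes, 0.01)
    high = estimate_memory_bytes(rows, fields, field_bytes, 0.10)
    print(f"{low / 1e9:.1f} GB to {high / 1e9:.1f} GB")  # 40.4 GB to 44.0 GB
```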
Our use cases
Mail.Ru Group uses Tarantool for a wide variety of tasks, with several hundred installations in total. The three most heavily loaded are the authentication system, the push notification system, and the ad serving system. I'll describe each of them in more detail.
Authentication system
[Figures: login/password authentication system; session/token authentication system]
Almost every website and mobile application has such a system. We are talking about checking a login and password, or a session. This is a central system: our entire portal uses it to authenticate users. Its requirements are very interesting and may even seem incompatible:
- Heavy demand. Every page, every Ajax request, and every mobile API call goes to this system for authentication.
- Low response time. Users do not like to wait; they want to get all the information quickly. So every call must be processed quickly.
- High availability. The authentication system must never be the cause of an HTTP 500 error. If it cannot serve a request, the user is not served at all, because the whole server-side request flow stops there.
- Constant requests to the storage. Every hit on the authorization system is a session or login/password check, i.e., a SELECT against the database, and sometimes an UPDATE as well. There are also anti-brute-force and anti-fraud systems: on every hit, they must check whether the user has good intentions. Each hit may update something, for example, the time of the last session. A login/password authorization requires creating a session, which means an INSERT into the database. An anti-brute-force check must record the user's location (IP address or something else). So there is a great deal of both reading and writing. The authorization system is also under constant hacking attempts, which adds load, because every attempt hits the database before the attacker can be denied.
- Large amount of data. This system should contain information about all users.
- Fast data expiration. Expiring data is also a kind of update: user sessions, for example, must expire.
- Persistence. Every change must be saved to disk. User sessions cannot be stored in Memcached, because they would be lost if the server crashed, and users would have to re-enter their login and password. And they do not like doing that.
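The mix of requirements above (reads and writes on every hit, expiring sessions, brute-force counters) can be illustrated with a toy in-memory model. Everything here is hypothetical: the class name, the failure limit, the TTL, sliding expiration as the "update on every hit", and the injected clock. Plain-text passwords are used only for brevity, and persistence, which the list above requires, is deliberately omitted.

```python
import secrets
import time
from typing import Callable, Optional


class ToyAuthStore:
    """Sessions with a TTL plus a per-IP failed-login counter (in memory only)."""

    def __init__(self, passwords: dict[str, str], session_ttl: float = 3600.0,
                 max_failures: int = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.passwords = passwords  # toy: a real system stores password hashes
        self.session_ttl = session_ttl
        self.max_failures = max_failures
        self.clock = clock
        self.sessions: dict[str, tuple[str, float]] = {}  # token -> (user, expires_at)
        self.failures: dict[str, int] = {}  # ip -> failed attempts

    def login(self, user: str, password: str, ip: str) -> Optional[str]:
        if self.failures.get(ip, 0) >= self.max_failures:
            return None  # anti-brute-force: even a rejected attempt costs a read
        expected = self.passwords.get(user)
        if expected is None or not secrets.compare_digest(
                expected.encode(), password.encode()):
            self.failures[ip] = self.failures.get(ip, 0) + 1  # write on failure
            return None
        self.failures.pop(ip, None)
        token = secrets.token_hex(16)
        self.sessions[token] = (user, self.clock() + self.session_ttl)  # insert
        return token

    def check(self, token: str) -> Optional[str]:
        entry = self.sessions.get(token)  # select
        if entry is None:
            return None
        user, expires_at = entry
        now = self.clock()
        if now >= expires_at:
            del self.sessions[token]  # expiration is itself a write
            return None
        # Sliding expiration: each successful check extends the session (update).
        self.sessions[token] = (user, now + self.session_ttl)
        return user
```

Even this toy shows why the workload is write-heavy: a successful session check, a failed login, and an expiration all modify state.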
Some of these requirements can be met only with a cache: high load, expiration, and other things characteristic of caches. Others can be met only with a database. Therefore, the system must be built on a cache and a database combined into one solution. It should be as reliable and durable as a truck, yet as fast as a sports car.
Currently, our authentication system handles about 50,000 login/password checks per second. That may not sound like much, but each request involves a lot of work, including anti-brute-force checks and many database transactions. Tarantool copes with all of this successfully.
Session-based authentications, however, reach 1 million per second. This is the load coming from across the whole portal. Only 12 servers handle it: four with sessions and eight with user profiles. They are loaded to only 15–20%, so the safety margin is very large. We simply like to play it safe, as usual.
Push notification system

More and more users are moving to mobile. There they mainly use applications rather than the regular mobile web, and applications have push notifications. How does this usually work when an event occurs on the server and the mobile device must be notified? The application itself does not need to keep a connection to the server; that happens at the operating system level: the OS connects to the corresponding gateway and periodically checks for push notifications. So the server code calls special APIs provided for iOS and Android, which in turn are connected to the operating systems on the devices.
To connect to these APIs and send data, you need to identify the user somehow, so a token is sent. The token must be stored somewhere. Moreover, one user needs several tokens, because they may have several devices. And a token must be sent for every server-side event the user should be notified about. There are many such events: the more often you notify users, the more often they use your application. Therefore, a push notification system needs a fast and reliable database.
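The data model described above (one user, several device tokens, one push per token per event) fits in a few lines. This is a hypothetical in-memory sketch; the class, method names, and token strings are illustrative only.

```python
from collections import defaultdict


class ToyTokenRegistry:
    """Map each user to the push tokens of all their devices (toy sketch)."""

    def __init__(self) -> None:
        self.tokens: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, user: str, token: str) -> None:
        self.tokens[user].add(token)  # one user, many devices; duplicates ignored

    def unregister(self, user: str, token: str) -> None:
        self.tokens[user].discard(token)

    def notifications_for(self, user: str, message: str) -> list[tuple[str, str]]:
        # One outgoing push per device token; .get() avoids creating empty entries.
        return [(token, message) for token in sorted(self.tokens.get(user, ()))]


if __name__ == "__main__":
    reg = ToyTokenRegistry()
    reg.register("u1", "iphone-token")
    reg.register("u1", "android-token")
    print(reg.notifications_for("u1", "New message"))
```

Every server event triggers a lookup like this, which is why the token store sits on the hot path.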
We chose Tarantool simply because there is a huge number of requests and transactions, and we need to perform many checks before sending each push, and do so quickly. We cannot afford to be slow here, because this is server-side code that many processes depend on, and those processes consume a lot of memory.
Is it a good idea for server-side code to connect directly to Android or iOS? No, for several reasons. First, architecturally, you lose versatility: there may also be Windows Mobile or other platforms, development becomes more complex, and you have to modify many systems. Second, you add another point of failure, and the whole interaction becomes much more complicated. Third, these mobile APIs can slow down or go down. They do not guarantee a fast response and may take several seconds to reply. So we need an intermediate layer, a queue, into which all changes are placed and from which they are sent on to Apple's and Google's APIs. We cannot lose these notifications, so the queue must persist everything to disk while remaining very fast. Tarantool fully meets these criteria. Our system withstands quite a large load: 200,000 requests per second, both reads and writes. Each call to the queue is a write, a transaction replicated to several replicas. Nevertheless, everything works very quickly.
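A minimal sketch of such a layer, assuming a local append-only file as the durable store: every put and ack is written and fsync'ed before the call returns, and on restart the file is replayed so that tasks taken but never acknowledged become ready again. The class, its put/take/ack/release methods, and the JSON-lines format are assumptions for illustration, not the interface of any particular Tarantool queue module; replication, which the text mentions, is left out.

```python
import json
import os
from typing import Optional


class ToyDurableQueue:
    """Persistent task queue between app servers and push APIs (toy sketch)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.next_id = 0
        self.tasks: dict[int, dict] = {}  # id -> {"data": ..., "state": ...}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    # "taken" is never logged, so unacked tasks replay as ready.
                    self._apply(json.loads(line))
        self.log = open(path, "a", encoding="utf-8")

    def _apply(self, rec: dict) -> None:
        if rec["op"] == "put":
            self.tasks[rec["id"]] = {"data": rec["data"], "state": "ready"}
            self.next_id = max(self.next_id, rec["id"] + 1)
        elif rec["op"] == "ack":
            self.tasks.pop(rec["id"], None)

    def _write(self, rec: dict) -> None:
        self.log.write(json.dumps(rec) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # on disk before we return to the caller

    def put(self, data: dict) -> int:
        rec = {"op": "put", "id": self.next_id, "data": data}
        self._write(rec)
        self._apply(rec)
        return rec["id"]

    def take(self) -> Optional[tuple[int, dict]]:
        for task_id, task in self.tasks.items():
            if task["state"] == "ready":
                task["state"] = "taken"  # in memory only, by design
                return task_id, task["data"]
        return None

    def ack(self, task_id: int) -> None:
        rec = {"op": "ack", "id": task_id}  # delivered: remove permanently
        self._write(rec)
        self._apply(rec)

    def release(self, task_id: int) -> None:
        self.tasks[task_id]["state"] = "ready"  # e.g. the push API timed out

    def close(self) -> None:
        self.log.close()
```

A worker would loop over take(), call the push API, then ack() on success or release() on failure, so a slow or unavailable Apple or Google endpoint delays delivery without losing notifications or blocking the application servers.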
Advertising display system

We run a large portal and show ads to users on almost every page. This process is controlled by an ad system called Target. One of the main challenges for an advertising system is that it must be extremely fast and handle a very heavy load, even heavier than the authentication system. A session check is a single database call, whereas each ad request can involve several.
Ads are shown not only on our own pages but also on partners' pages, which adds a very large load as well. For example, a page may contain a dozen ad units. For each of them, you need to query data sources with information about the user's profile, aggregate the results, decide which ad to show, and display it. All of this must be done quickly (the current standard is 50 ms), because users do not like to wait. Besides, advertising provides no functionality to the user, and it certainly cannot be an excuse for slow services.
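One way to respect a hard latency budget like the 50 ms above is to query all profile sources concurrently and drop any that miss the deadline. The sketch below uses Python's asyncio for illustration; the source functions, their return values, and the drop-on-timeout policy are assumptions, not a description of how Target works.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical data source: takes a user id, returns some profile facts.
Source = Callable[[str], Awaitable[dict]]


async def gather_profile(user_id: str, sources: list[Source],
                         budget_s: float = 0.050) -> dict:
    """Query all sources concurrently; ignore any that miss the time budget."""
    if not sources:
        return {}
    tasks = [asyncio.ensure_future(src(user_id)) for src in sources]
    done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for task in pending:
        task.cancel()  # too slow: ads must not delay the page
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellation finish
    profile: dict = {}
    for task in done:
        if not task.cancelled() and task.exception() is None:
            profile.update(task.result())  # failed sources are simply skipped
    return profile


async def fast_source(user_id: str) -> dict:
    await asyncio.sleep(0.001)  # stands in for a quick lookup
    return {"age_group": "25-34"}


async def slow_source(user_id: str) -> dict:
    await asyncio.sleep(1.0)  # stands in for a source that blows the budget
    return {"interests": ["cars"]}


if __name__ == "__main__":
    print(asyncio.run(gather_profile("u42", [fast_source, slow_source])))
```

The design choice is to degrade the targeting data rather than the page: a late source costs some relevance for one impression, never extra latency for the user.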
Our advertising display system is one of the largest and most heavily loaded Tarantool clusters in the world: 3 million operations and about 1 million transactions (updates) are performed every second.
Conclusion
Tarantool was built for high loads. But even under low load it provides good response times: one millisecond or less. Traditional databases cannot respond that fast even under light load. And you often need several round trips, so those milliseconds add up, with rather sad results. Tarantool gives you high RPS, low latency, and high uptime, and helps you squeeze every bit of performance out of your hardware, while still giving you a database with transactions, replication, and stored procedures.