
Our report from "Strike": "How to get rid of persistent database dependencies"

On April 10-11, our team took part in "Strike", the largest IT conference in the Russian regions, held for the fourth time in Ulyanovsk. IT companies presented stands where visitors could get acquainted with their products, learn about open vacancies, and take part in contests.

This year, XIMAD also decided to present a stand with its products at the conference, in connection with the growth of the mobile section: this is our core field. We listened to the talks, exchanged experience with colleagues, and answered many questions about the technologies used in our games.

Over the two days, 130 experts spoke on 8 stages, delivering a total of 150 talks in various areas. Our developer Alexey Klyuchnikov presented the talk "Doing interactivity in mobile games, or how to get rid of persistent database dependencies", using our flagship Magic Jigsaw Puzzles as the example (2M MAU, 600K DAU, and up to 50K players online interacting with each other).
Below is the text of his speech:


What is the main problem with the server side of game development? The fact that it is high load! It is not high load in only one case: if the project fails. If the game reaches the market, advertising starts running, players flow in, and for the first few days there will be real, if modest, high load. And if success overtakes you...

Listening to the high-load talks at the conference, it became clear that everyone is working in roughly the same direction. The first thing any highly loaded project runs up against is the data, and most of the speakers explained how to replicate, shard, denormalize, and so on. We did not escape this fate either, but the path we chose is somewhat different: minimize work with the database. Essentially, get rid of it. How? Very simply.

Idea
We write a server that the player logs into; a dedicated process is started for that player, their profile is loaded into it from the database, and from then on all the player's actions happen inside the process. As soon as the player shows no activity for some time, say 10 minutes, we save the profile back to the database and terminate the process. As a result, we have one read from the database at login and one write at logout, and that's it!
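The idea above can be sketched as one gen_server per player (a minimal sketch; `db:load_profile/1` and `db:save_profile/2` are hypothetical stand-ins for the real profile storage):

```erlang
%% A minimal sketch of a per-player process: load the profile on login,
%% keep all state in the process, save and stop after 10 minutes of idleness.
-module(player).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

-define(IDLE_TIMEOUT, 10 * 60 * 1000).  %% 10 minutes of inactivity

start_link(PlayerId) ->
    gen_server:start_link(?MODULE, PlayerId, []).

init(PlayerId) ->
    Profile = db:load_profile(PlayerId),          %% the single read at login
    {ok, #{id => PlayerId, profile => Profile}, ?IDLE_TIMEOUT}.

handle_call(Request, _From, State) ->
    %% every player action is handled here, touching only in-process state
    {reply, handle_action(Request, State), State, ?IDLE_TIMEOUT}.

handle_cast(_Msg, State) ->
    {noreply, State, ?IDLE_TIMEOUT}.

handle_info(timeout, State) ->
    %% no activity for ?IDLE_TIMEOUT: shut the process down
    {stop, normal, State}.

terminate(_Reason, #{id := Id, profile := Profile}) ->
    db:save_profile(Id, Profile).                 %% the single write at logout

handle_action(_Request, _State) -> ok.            %% game logic would go here
```

The `timeout` tuple element of the gen_server return values is what gives the "10 minutes of silence, then stop" behavior almost for free.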

What does this give us?
We got one read and one write per gaming session, so we can calculate and predict how many players our solution will handle. Many people can estimate how many read operations per second a simple key/value table can deliver in, say, MySQL, and therefore how many players per day such a database will sustain. The number turns out to be impressive, and it is by this number that we have fenced ourselves off from database problems. What could be better?

Implementation
For the implementation we take Erlang, because it works well with processes and... that's it.
What Erlang gives us: processes out of the box; we can start them, stop them, and send messages between them. This means that one player's process can send a message to another player's process, and interact on the same principle with the processes that provide the game logic. Interactivity, in this case, also comes almost out of the box.
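Those primitives in a nutshell, as a self-contained sketch you can paste into an `erl` shell:

```erlang
%% Spawning a process and exchanging messages with it: everything the
%% interactivity in this architecture is built on.
Echo = spawn(fun Loop() ->
                 receive
                     {hello, From} -> From ! world, Loop();
                     stop          -> ok
                 end
             end),
Echo ! {hello, self()},
receive world -> io:format("got a reply~n") end,
Echo ! stop.
```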

Let's go through the nuances in order.


Addressing

Everything is canonical: each player is assigned a unique identifier at registration, and from then on all addressing is done by it. It may sometimes be tempting to use additional keys for addressing, but this should be avoided for the following reason: addressing is used to deliver messages, and when a message is sent to an offline player, we must start a process, load that player's profile into it, and only then deliver the message. If we address by different keys, we risk a hard-to-track collision in which two or more processes are started for one player.


Registrar

The process registry built into Erlang has serious drawbacks that rule it out for dynamically started and terminated processes, so we take the gproc registry. The registrar is needed to register processes and return their Pid on request, and to deregister them when they terminate or crash.
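With gproc, registration and lookup by player id look roughly like this (in gproc's key format, `{n, l, Key}` means a locally unique name):

```erlang
%% Inside the player's process, right after it starts:
gproc:reg({n, l, {player, PlayerId}}),

%% Anywhere else, to find a player's Pid (returns `undefined` if offline):
Pid = gproc:where({n, l, {player, PlayerId}}),

%% gproc automatically unregisters a process when it terminates or crashes,
%% but we can also do it explicitly during a graceful shutdown:
gproc:unreg({n, l, {player, PlayerId}}).
```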


Starting processes

As mentioned above, when a message arrives for a player, we ask the registrar which Pid to send it to; if there is no process for that player, one must be started. Each operation takes time, however little, so it is possible that two messages for the same player arrive at about the same moment, both get a negative answer from the registrar, and both try to start a process. One of them starts first, the second gets an exception, and its message is lost. Therefore we cannot start processes asynchronously and must organize a queue for starting them. To do this, we start a process through which all requests to the registrar are directed and which starts the player processes. But that gives us a bottleneck, so we need not one such "process_starting_worker" but a pool of them, say 100, and we distribute requests among them by any convenient function of the user id, even the remainder of division.
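The routing into the starter pool can be sketched like this (the `starter_N` workers are assumed to be registered gen_servers that serialize "lookup or start" for the players hashed to them):

```erlang
-define(POOL_SIZE, 100).

%% Pick a worker deterministically from the player id, e.g. by remainder
%% of division (erlang:phash2/2 already returns a value in 0..Range-1):
starter_for(PlayerId) ->
    N = erlang:phash2(PlayerId, ?POOL_SIZE),
    list_to_atom("starter_" ++ integer_to_list(N)).

%% All "find or start" requests for one player go through the same worker,
%% so two simultaneous messages can never race to start two processes.
ensure_started(PlayerId) ->
    gen_server:call(starter_for(PlayerId), {ensure_started, PlayerId}).
```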


Stopping processes

Stopping processes is no less interesting. When it is time to terminate a process, we need to perform a series of actions: save the profile to the database, deregister from the registrar, send a farewell message to all the friends, and actually terminate. These actions cannot simply be performed one after another, because the player may suddenly come back to life, or simply receive a message while we are busy saving the profile. Therefore, after each step we need to check the message queue, and if something is found there: before deregistration, process it and return the process to its normal state; after deregistration, honestly reply to the sender that the process is unregistered, so that the sender re-sends the message.
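The graceful shutdown can be sketched as a sequence with a mailbox check after each step (`db:save_profile/2`, `resume/2`, and `send_farewell_to_friends/1` are hypothetical helpers; `receive ... after 0` is the idiomatic non-blocking mailbox peek):

```erlang
%% Shutdown sequence for a player process. Before deregistration, a new
%% message revives the process; after deregistration, senders are told
%% to re-send through the registrar.
shutdown(State = #{id := Id, profile := Profile}) ->
    db:save_profile(Id, Profile),
    case peek_mailbox() of
        {ok, Msg} -> resume(State, Msg);            %% player came back to life
        empty ->
            gproc:unreg({n, l, {player, Id}}),
            drain_and_bounce(),
            send_farewell_to_friends(State),
            exit(normal)
    end.

peek_mailbox() ->
    receive Msg -> {ok, Msg}
    after 0     -> empty
    end.

drain_and_bounce() ->
    receive
        {msg, From, _Payload} ->
            From ! {unregistered, self()},          %% sender must re-send
            drain_and_bounce()
    after 0 -> ok
    end.
```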


Optimizations

As you can see, the registrar is consulted for every message, which makes it a bottleneck, so it makes sense to cache Pids inside the processes. After the first exchange of messages, each of the two processes remembers its counterpart's Pid, and from then on they communicate without contacting the registrar. That is why an extra action is added at process termination: notify every Pid in the cache about the termination, so that everyone can purge the terminating process from their caches.
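A Pid cache in the process state can be a simple map (a sketch; it assumes the target player's process has already been started, otherwise `gproc:where/1` returns `undefined` and the "find or start" pool from above must be used instead):

```erlang
%% Send to a player, using the cached Pid when we have one and falling
%% back to the registrar otherwise; remember the Pid for next time.
send_to(PlayerId, Msg, State = #{pid_cache := Cache}) ->
    case maps:find(PlayerId, Cache) of
        {ok, Pid} ->
            Pid ! Msg,
            State;
        error ->
            Pid = gproc:where({n, l, {player, PlayerId}}),
            Pid ! Msg,
            State#{pid_cache := Cache#{PlayerId => Pid}}
    end.

%% On termination, let everyone purge us from their caches:
notify_cached_pids(#{pid_cache := Cache}) ->
    [Pid ! {terminating, self()} || Pid <- maps:values(Cache)],
    ok.
```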

The second thing to think about is optimizing for reads. Should a player see how his friends play, and how should he receive this information? Either query all the friends every time, or have each friend "boast" of a new result to all his friends, who record it in their own profiles and then serve friends' results instantly from their own profile, without generating any queries at all. Which approach to choose depends on how the data is used: if reads are more frequent than writes, it makes sense to go the second way.
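The "boast" (push) model is straightforward with player processes: on a new result, the player's process fans it out to every friend's process, and each friend stores it in its own in-memory profile (a sketch; `friends/1` and `send_to_player/2` are hypothetical helpers):

```erlang
%% Write path: one fan-out per new result...
publish_score(Score, State = #{id := Id}) ->
    [send_to_player(F, {friend_score, Id, Score}) || F <- friends(State)],
    State.

%% ...read path: friends' results are served straight from the process
%% state, with no queries at all.
handle_msg({friend_score, FriendId, Score}, State = #{friend_scores := FS}) ->
    State#{friend_scores := FS#{FriendId => Score}}.
```

This trades one write for N writes in order to make every read free, which pays off exactly when reads dominate.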


Scaling

First, we can estimate the load and find that one server for the database plus one server for our code will handle several hundred thousand players a day. Erlang uses memory fairly honestly, so if a player profile takes about 100 KB on average, serving 50 thousand simultaneous players will require about 5 GB of RAM. In other words, we take a server with 32-64 GB and with high probability forget about the need to scale until the project's deafening success.

Secondly, if the deafening success has nevertheless arrived, nothing prevents us from sharding the database by player id and distributing the players across different Erlang nodes via DNS. The only problem is our registrar: it must be able to work in cluster mode. Gproc can, but, as tests showed, not fully. All that remains is to patch it up a bit or take another registrar, but that is a separate topic, perhaps for a separate article.

Conclusion
The solution turned out not as simple as it might seem. There are still many open questions about messaging: how to guarantee delivery, how to roll back, for example, chains of messages, which messages to send synchronously and which asynchronously, and so on.

But the most important conclusion from using such an architecture is that we squeezed into improbably little hardware and got a workable service, which would have been impossible with the classical implementation, where every sneeze of a player turns into a database query.

Source: https://habr.com/ru/post/257155/

