Hi, Habr! We recently launched the multiplayer mobile shooter Guns of Boom in Russia and several other countries, where more than half a million people have already downloaded it. Keeping the game smooth and uninterrupted for that many users takes a good backend. In this article we explain why we decided to build it in the cloud and briefly describe what building a backend on cloud services involves.
In general, cloud services have a number of clear advantages over the classic backend in the form of a server running in the developer's basement. A fixed fleet of servers cannot be grown or shrunk quickly, so when the game is unpopular or it is simply "not the season", the equipment sits idle. Cloud services, by contrast, can switch additional capacity on and off automatically, depending on the load.
Two main advantages of the cloud help to understand its deep essence (tm): redundancy and scaling speed. Scaling speed here means the ability to provision additional hardware resources quickly when the number of users rises sharply and, when it falls, to cut the consumption of server time and save money and resources.
Redundancy means the automatic duplication of game resources, data or hardware systems in the cloud. Most often it is achieved by storing the same data set on multiple servers in one or several data centers within the same geographic region. This is needed for disaster recovery after failures, for automatic failover to servers that are still available, and for protection against DDoS attacks.
Since we are building an online shooter, low network latency is critical. At least a quarter of all users play Guns of Boom over 3G/4G, and smooth gameplay requires a delay of no more than 100 milliseconds. The video above illustrates the pace of the game, which has to be delivered and maintained at a high level of quality for every player anywhere in the world.
First, we achieve this by automatically connecting the player to the nearest server, which is usually located in the same geographic region. Second, for maximum performance and smoothness, we use session servers and meta servers. The meta server is a group of real-time services that we have combined, for simplicity, into a single role whose location is not critical for gameplay. At one point we considered a microservices model, but we did not find significant advantages in it and settled on the current design.
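As a rough illustration of "connect the player to the nearest server", the sketch below picks the region with the lowest measured round-trip time and checks it against the 100 ms ceiling mentioned earlier. This is only a minimal sketch: the region names, the sample measurements and the function names are hypothetical, and the article does not describe how the real client measures latency.

```python
from typing import Dict, Optional

LATENCY_BUDGET_MS = 100.0  # ceiling for smooth gameplay quoted in the article


def pick_region(rtt_ms: Dict[str, float]) -> Optional[str]:
    """Return the region with the lowest measured round-trip time.

    Returns None when no measurements are available.
    """
    if not rtt_ms:
        return None
    return min(rtt_ms, key=rtt_ms.get)


def within_budget(rtt_ms: Dict[str, float], region: str,
                  budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """Tell whether the chosen region meets the latency ceiling."""
    return rtt_ms[region] <= budget_ms


# Hypothetical measurements; names and numbers are made up for illustration.
sample = {"eu-central": 48.0, "us-east": 131.0, "ap-southeast": 210.0}
nearest = pick_region(sample)
print(nearest, within_budget(sample, nearest))
```

Keeping the selection and the budget check separate lets a client still connect to the best available region when none of them meets the ceiling, and simply report the degraded case.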
The game client contacts the meta server to authenticate and synchronize data; this connection is persistent and is not dropped during a game session. Shooters usually try to avoid such a connection, but at the current level of load we decided that the approach is justified. Among other things, it is the meta server that decides which session server the client connects to at any given time. The game session itself runs, as the name suggests, on the session server.
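The meta server's routing decision can be pictured roughly as follows. The article does not describe the actual selection policy, so everything here is an assumption: the `SessionServer` fields, the function name, and in particular the choice to prefer the fullest server that still has free slots.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionServer:
    # All fields are hypothetical; the real server record is not described.
    name: str
    region: str
    free_slots: int
    accepting: bool = True  # False once the server stops taking new clients


def choose_session_server(servers: List[SessionServer],
                          region: str) -> Optional[SessionServer]:
    """Pick a session server in the player's region that can take one more player."""
    candidates = [
        s for s in servers
        if s.accepting and s.free_slots > 0 and s.region == region
    ]
    if not candidates:
        return None
    # Assumed policy: fill the fullest server first so that lightly loaded
    # servers can empty out and be switched off.
    return min(candidates, key=lambda s: s.free_slots)


fleet = [
    SessionServer("eu-1", "eu", free_slots=20),
    SessionServer("eu-2", "eu", free_slots=3),
    SessionServer("eu-3", "eu", free_slots=1, accepting=False),
    SessionServer("us-1", "us", free_slots=5),
]
chosen = choose_session_server(fleet, "eu")
print(chosen.name if chosen else "no capacity")
```

When the function returns `None`, the region has no spare capacity, which is the point at which additional session servers would have to be started.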
Thanks to the cloud, we can scale the infrastructure automatically and consume only the resources that are needed at the moment. The cloud can easily provide more hardware resources (CPU, RAM, network bandwidth) to handle a growing number of users and, accordingly, game sessions.
Scaling also works in the opposite direction. In Guns of Boom a game session lasts 5 minutes and hosts 8 simultaneously connected players. When a player disconnects from the session server, the load on it decreases. If fewer than 4 combat sessions are running on a server (i.e. fewer than 32 people are connected), it stops being available to game clients. Once all sessions on such a server have ended, its players are redirected to other session servers with free slots, and the server is marked as inactive.
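The scale-down rule above can be read as a small state machine. The sketch below is one possible reading, not the actual implementation: the state names and the `next_state` function are hypothetical, and only the numbers (8 players per session, a threshold of 4 sessions) come from the text.

```python
from enum import Enum

PLAYERS_PER_SESSION = 8  # from the article
MIN_SESSIONS = 4         # below this a server stops accepting new clients


class State(Enum):
    ACTIVE = "active"      # accepts new clients
    DRAINING = "draining"  # hidden from clients, finishing running sessions
    INACTIVE = "inactive"  # empty, can be shut down


def next_state(state: State, running_sessions: int) -> State:
    """Apply the scale-down rule once for a single session server."""
    if state is State.INACTIVE:
        return State.INACTIVE
    if state is State.DRAINING:
        # A draining server gets no new sessions, so it only waits to empty.
        return State.INACTIVE if running_sessions == 0 else State.DRAINING
    # ACTIVE: too few sessions means the server is taken out of rotation.
    return State.DRAINING if running_sessions < MIN_SESSIONS else State.ACTIVE


state = State.ACTIVE
for sessions in (6, 3, 1, 0):
    state = next_state(state, sessions)
    print(sessions, state.value)
```

Taken literally, this rule would also drain a freshly started server before it has received any players, so a real system would presumably need some warm-up exemption; that detail is an assumption and is not covered in the article.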
When we made the decision to use the cloud in Guns of Boom, we formulated the following requirements for it:
Support for mainstream *nix systems out of the box;
Support for custom images;
Support for hybrid infrastructure;
Support for static external IP addresses;
CDN support;
Support for autoscaling by specified parameters (taking deployment speed into account);
A high SLA;
Support for VMs of various configurations;
Support for internal and external load balancers;
Support for services and containers;
Support for CLI and orchestration tools;
A track record of game projects already running on the platform;
Data centers on different continents.
At the same time, it was important to choose a platform that had already proven itself in production projects of a similar scale. As a result, we chose Amazon Web Services for Guns of Boom.
The next step was sizing the infrastructure and choosing the types of virtual machines to build on. After lengthy testing, we settled on c4.xlarge machines as the main platform for most services and c4.4xlarge for the servers that run combat sessions.
Among the main selection criteria for us were a fast, uninterrupted network subsystem and a large reserve of RAM. This proved important for scaling: since we keep all resources in RAM for the sake of quality (and disable the swap file), we need a guaranteed memory reserve even when a server is under heavy load.
AWS, for its part, had a few surprises in store that we were not quite ready for. For example, the network subsystem turned out to be less straightforward than expected, despite Amazon's Enhanced Networking. It all started when, under load, the network on several combat servers failed at the same time. This caused serious problems in the cluster and took part of the nodes offline, and their players along with them.
Our internal investigation found no cause for the failure in the backend architecture (the logs were clean and the console showed no errors). Once we brought in Amazon Premium Support, it turned out that this is unlikely but acceptable behavior for virtual machines, falling within the small percentage of cases that the stated SLA allows. After that, we had to rework the architecture for fault tolerance, including against hardware failures in the cloud infrastructure itself.
So even cloud services with an impressive SLA do not solve all your problems, and they carry additional risks that have to be taken into account.
The game is currently in soft launch in several countries, with more than half a million installs. Before launching worldwide, we need to make sure that the backend holds up and scales correctly, providing a good experience for millions of players.
Unfortunately, many online games run into critical backend problems right after launch, as soon as they reach a certain critical mass of players online. Recent examples include Pokemon GO, Tom Clancy's The Division and Total War: Warhammer.
At the moment, the functions described above are not implemented perfectly, and we continue to improve the client-server protocols to reduce network load and latency. Using the geographic spread of Amazon's data centers, we are actively improving our geo-distributed architecture so that everyone can play Guns of Boom equally comfortably. Only when all of this is done will it be possible to launch Guns of Boom worldwide and make all the players who are waiting for it happy. We will write about that, too.