Skyforge load testing. One year later

More than a year has passed since we published our articles on load testing Skyforge, a new MMORPG from the Allods Team studio. Much has changed since then: Habr got a new design, Ubuntu was updated to 14.04.1 LTS, Java 8 was released, and, most importantly, the project moved to a new stage of development. The first closed test with external users has taken place, and a stress test is coming soon: we will invite as many “live” users as possible onto the servers as part of a closed or open beta test. But I won't take work away from our marketing team. Instead, I will tell you what is new in our load testing, what we have rethought, and how this may be useful to a wider audience.

Summary of the previous parts

Skyforge is an MMORPG set in a sci-fantasy world. The game world will be a single shared one: all players in Russia and other countries of the former USSR will be able to complete quests together, save the world and become gods. There will be no division into servers.
The Skyforge server is written in Java; its architecture is described in great detail in the corresponding post by randll.
Databases - PostgreSQL plus distributed transactions.
Bot - a program written in C++ that imitates the actions of a real player. Bots use the same protocol and the same set of commands as an honest game client, so from the server's point of view they differ little from a regular client.
Load testing - a set of measures aimed at finding out whether the server can hold the load. We run load tests of various kinds several times a day. The average test takes 40 minutes, while the net test time is in the range from 60 to 80 minutes.

More load tests

For quite a long time, “client” load tests were the only load tests we ran. But as time went on, ambitions grew, needs changed, and tasks appeared that required more load than client bots could generate. The limitation came primarily from the fact that client bots did a great deal of “extra” work: they made decisions, honestly checked conditions and, in the end, actually played. So server bots began to appear: written in Java, devoid of any logic, and simply turning up the heat. We now have three types of such “bots”:

database bots - blindly send database operations, taking the operation profile from real players in closed tests and filling it with random data;
chat bots - do the same thing as the database bots, only for the chat services;
statistics generators - exactly the same idea as in the two previous cases, but for the statistics subsystem.
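As a rough illustration of "blindly sending operations according to a profile", here is a minimal sketch of a weighted random picker. The class name `LoadProfile`, the operation names and the weights are all hypothetical; the article does not show how the real server bots are implemented, only that they replay a profile recorded from real players.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Picks operation names at random, in proportion to weights taken from a recorded profile. */
final class LoadProfile {
    private final List<String> names = new ArrayList<>();
    private final List<Double> cumulative = new ArrayList<>();
    private double total = 0.0;

    /** Registers an operation with its relative frequency (must be positive). */
    LoadProfile add(String operation, double weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        total += weight;
        names.add(operation);
        cumulative.add(total); // running sum, so each operation owns a slice of [0, total)
        return this;
    }

    /** Returns the next operation to send; heavier operations come up more often. */
    String next(Random random) {
        if (names.isEmpty()) {
            throw new IllegalStateException("empty profile");
        }
        double point = random.nextDouble() * total;
        for (int i = 0; i < names.size(); i++) {
            if (point < cumulative.get(i)) {
                return names.get(i);
            }
        }
        return names.get(names.size() - 1); // guard against rounding at the upper edge
    }
}
```

A bot built on this would call `next(...)` in a loop and fire the chosen operation with random arguments, for example with a profile such as `new LoadProfile().add("loadCharacter", 70).add("saveInventory", 30)` (both names invented for the example).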

These tests proved very good precisely as load tests, and we did not expect more from them. They cannot find errors more subtle than a plain “does not work”. But their results are highly repeatable, and they are much cheaper to develop and, as a result, to maintain. They also save hardware; the numbers look roughly like this:

testing 10k CCU with client bots takes 7 (systems under load) + 10 (bots) = 17 servers;
a 50k CCU database test: 4 + 2 = 6 servers;
a 100k CCU chat test: 4 + 2 = 6 servers;
a 100k CCU statistics system test: 2 + 1 = 3 servers.

This is primarily because the further we move from the production configuration, the more we can afford. In the statistics system test, for example, there is not a single component of the game itself, only the applications that process the data. In chat or database tests, we deliberately do not load the game mechanics: the game realm runs in its minimal launch configuration, and only the system under load runs in full production mode. It is also worth noting that the fewer subsystems are involved in a test, the more stable the test is.
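To make the hardware comparison easier to read, the server counts listed above can be turned into "CCU per server". The sketch below only divides the figures already given in this article; the class and method names are made up for the example.

```java
/** Divides the target CCU of a test by the total number of servers the test needs. */
final class HardwareCost {
    static int ccuPerServer(int ccu, int serversUnderLoad, int botServers) {
        return ccu / (serversUnderLoad + botServers); // integer division, rounds down
    }

    public static void main(String[] args) {
        System.out.println("client bots: " + ccuPerServer(10_000, 7, 10)); // 588
        System.out.println("database:    " + ccuPerServer(50_000, 4, 2));  // 8333
        System.out.println("chat:        " + ccuPerServer(100_000, 4, 2)); // 16666
        System.out.println("statistics:  " + ccuPerServer(100_000, 2, 1)); // 33333
    }
}
```

The figures are not directly comparable, since each test loads a different subsystem, but they show the order of magnitude of the savings.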

Client bots

But no matter how good the server bots are, we are not going to give up client bots. They bring significantly more benefit, and their load profile is as close to the real one as possible. So over the past year they have been significantly improved as well. They can now play through a significant part of the game content almost completely honestly, while requiring minimal maintenance. It looks something like this: the bot appears on the map, looks at its quest tracker, sees an instruction to run to point A, and runs there. Because the bot is trained to interact with the outside world, at point A it will try, in turn, to talk to someone, interact with something, or kill all the aggressors. Almost like in that old joke: Can it eat me? Can I eat it? Can I mate with it? Can it mate with me? :)
Optimization of the game client itself did not pass our bots by either: their memory consumption dropped significantly. We can now run twice as many bots on one physical machine, 2k instead of 1k.

We now run client tests according to the following scheme: everyone goes through the start of the game (the most important moment for us in terms of load), everyone plays in some way (the profile of players taking part in different activities is one we made up ourselves), and everyone plays on one particular map. This lets us find maps that are bad in terms of load and intervene promptly while they are being created, see what load profile we have in quiet times, and be sure that we are fine at the start of the game.

Without these tools, load tests would be 10 times dirtier.

Perhaps this is the most useful part of the article. When running load tests, it is not enough to know whether the server holds the load. The most important thing is being able to quickly understand what is going wrong. Here Java Mission Control and its Flight Recorder feature make an invaluable contribution. Unfortunately, this option is quite expensive ($) on production servers, so we use it only in tests. It looks something like this:

-XX:+UnlockCommercialFeatures # unlocks commercial features, which include Flight Recorder
-XX:+FlightRecorder # turns Flight Recorder on
-XX:StartFlightRecording=name=skyforge,filename=skyforge.jfr,delay=40m,duration=10m,settings=jmc.jfc

You can read more on the Oracle website.

The resulting dump can then be opened in JMC. It contains all the necessary information: allocation statistics, who consumed CPU time, the process's share of the total server CPU load, and more. JMC is good, but since we cannot afford it on production servers, there we use the old-fashioned method: GC logs, from which we extract how much time per minute was spent in GC, the total application stop time for the same period, and which objects were in the heap before and after a Full GC:

-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintClassHistogramBeforeFullGC
-XX:+PrintClassHistogramAfterFullGC
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-Xloggc:memory/gc.log
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=memory/heap.dump
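As an illustration of how "total application stop time per minute" can be pulled out of such a log, here is a minimal sketch. It assumes the line format that Java 8 HotSpot writes with -XX:+PrintGCApplicationStoppedTime and -XX:+PrintGCDateStamps; the class name `GcPauseStats` is hypothetical, and this is not the tool we actually use, only one possible way to do the aggregation.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sums application stop time per calendar minute from GC log lines. */
final class GcPauseStats {
    // Expected line shape (assumption about the log format):
    // 2014-08-20T12:34:56.789+0400: 12.345: Total time for which application threads were stopped: 0.0123450 seconds
    // Group 1 is the timestamp truncated to the minute, group 2 is the pause in seconds.
    private static final Pattern STOPPED = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}):\\d{2}.*?"
            + "Total time for which application threads were stopped: (\\d+[.,]\\d+) seconds");

    /** Returns a map from "yyyy-MM-ddTHH:mm" to total stopped seconds in that minute. */
    static Map<String, Double> stoppedSecondsPerMinute(Iterable<String> lines) {
        Map<String, Double> perMinute = new TreeMap<>(); // sorted by minute for plotting
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                // Some locales print a decimal comma, so normalize it before parsing.
                double seconds = Double.parseDouble(m.group(2).replace(',', '.'));
                perMinute.merge(m.group(1), seconds, Double::sum);
            }
        }
        return perMinute;
    }
}
```

Feeding it the lines of memory/gc.log gives one data point per minute, which is enough to draw a pause-time graph.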

Sample graphs:

An example of the before/after statistics:

Just in case, we start all servers with remote debugging enabled. This saves a lot of time when something goes wrong and the exact cause of the problem is not clear from the logs:

-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=51003

Our own statistics

In addition to ready-made profiling tools, we have been actively developing our own. For example, we log every spell a player casts and measure how much CPU time was spent on it. This lets us decide which abilities and mechanics should be optimized first.
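One way to measure CPU time per spell on the JVM is the standard `ThreadMXBean.getCurrentThreadCpuTime()` call. The sketch below is only an assumption about how such accounting could look: the class `SpellCpuStats`, its methods and the spell names are hypothetical, and it assumes the whole spell logic runs on the calling thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Accumulates CPU time and cast counts per spell name. */
final class SpellCpuStats {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();
    private final Map<String, LongAdder> cpuNanos = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> casts = new ConcurrentHashMap<>();

    /** CPU time of the current thread, or wall-clock time if the JVM cannot measure CPU time. */
    private static long now() {
        return THREADS.isCurrentThreadCpuTimeSupported()
                ? THREADS.getCurrentThreadCpuTime()
                : System.nanoTime();
    }

    /** Runs the spell logic on the current thread and records what it cost. */
    void measure(String spellName, Runnable spellLogic) {
        long start = now();
        try {
            spellLogic.run();
        } finally {
            // Recorded even if the spell logic throws, so failed casts are not invisible.
            cpuNanos.computeIfAbsent(spellName, k -> new LongAdder()).add(now() - start);
            casts.computeIfAbsent(spellName, k -> new LongAdder()).increment();
        }
    }

    long totalCpuNanos(String spellName) {
        LongAdder adder = cpuNanos.get(spellName);
        return adder == null ? 0 : adder.sum();
    }

    long castCount(String spellName) {
        LongAdder adder = casts.get(spellName);
        return adder == null ? 0 : adder.sum();
    }
}
```

Dividing `totalCpuNanos` by `castCount` gives the average cost of a spell, while the total alone shows which spell eats the most CPU overall.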

We keep similar statistics for database operations. We know not only which operations were performed:

but also how long they took:

To optimize traffic, we also had to come up with our own solution. We measure exactly which messages were sent, taking into account both their number and their volume.
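Counting both the number and the volume matters because a rare but large message can outweigh a frequent small one. A minimal sketch of such accounting is shown below; the class `TrafficStats` and the message type names used with it are hypothetical and are not taken from the Skyforge code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Tracks how many messages of each type were sent and how many bytes they took. */
final class TrafficStats {
    private static final class Entry {
        final LongAdder count = new LongAdder();
        final LongAdder bytes = new LongAdder();
    }

    private final Map<String, Entry> byType = new ConcurrentHashMap<>();

    /** Records one sent message of the given type and serialized size. */
    void record(String messageType, int sizeBytes) {
        if (sizeBytes < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        Entry entry = byType.computeIfAbsent(messageType, k -> new Entry());
        entry.count.increment();
        entry.bytes.add(sizeBytes);
    }

    long count(String messageType) {
        Entry entry = byType.get(messageType);
        return entry == null ? 0 : entry.count.sum();
    }

    long bytes(String messageType) {
        Entry entry = byType.get(messageType);
        return entry == null ? 0 : entry.bytes.sum();
    }

    /** Message types ordered from the largest total volume to the smallest. */
    List<String> typesByVolume() {
        List<String> types = new ArrayList<>(byType.keySet());
        types.sort(Comparator.comparingLong((String type) -> bytes(type)).reversed());
        return types;
    }
}
```

The first entries of `typesByVolume()` are the natural candidates for traffic optimization.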

Optimization when building test reports

As the number of tests and graphs grew, it became clear that preparing a test, running it, and analyzing it in a single process was an unaffordable luxury. So the analysis of test results and the building of the report were moved into a separate service that is not tied to the CI system. This freed up time for running additional tests.

Moving report building into a separate service also gave us a single entry point for viewing data from load tests, production servers, or other test benches.

Rakes we stepped on

During tests it is very important to control the infrastructure on which the tests run. I already mentioned in previous articles that we had problems with CPU frequency governors, when the processor clock frequency was artificially lowered to save power. Well, we stepped on that rake again. Now we are thinking about how to build a check of these settings into the server itself. In the database services, for example, we have already added a check that a synchronous replica is configured on the databases, because suddenly “switching it off” gives a noticeable performance boost. In general, I advise adding environment checks directly to the services themselves. This ensures that your servers are operated and tested in the environment they were designed for.
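As an example of an environment check that could live inside a service, the sketch below reads the scaling governor of every CPU from Linux sysfs and reports those that differ from the expected value. The class `GovernorCheck` is hypothetical, and treating "performance" as the required governor is an assumption; this is not the check we have built, only one way such a check might look.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Startup check: every CPU should run with the expected frequency governor. */
final class GovernorCheck {
    /**
     * Scans cpuRoot (on Linux normally /sys/devices/system/cpu) and returns
     * "cpuN=governor" for each CPU whose governor differs from the expected one.
     */
    static List<String> misconfiguredCpus(Path cpuRoot, String expected) throws IOException {
        List<String> bad = new ArrayList<>();
        // The glob matches cpu0, cpu1, ... but not sibling directories such as cpufreq or cpuidle.
        try (DirectoryStream<Path> cpus = Files.newDirectoryStream(cpuRoot, "cpu[0-9]*")) {
            for (Path cpu : cpus) {
                Path governor = cpu.resolve("cpufreq").resolve("scaling_governor");
                if (!Files.isReadable(governor)) {
                    continue; // no frequency scaling exposed for this CPU
                }
                String actual = new String(Files.readAllBytes(governor), StandardCharsets.UTF_8).trim();
                if (!actual.equals(expected)) {
                    bad.add(cpu.getFileName() + "=" + actual);
                }
            }
        }
        Collections.sort(bad); // directory order is unspecified, so sort for stable output
        return bad;
    }
}
```

A service could call `misconfiguredCpus(Paths.get("/sys/devices/system/cpu"), "performance")` at startup and log a warning or refuse to start if the list is not empty.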

Conclusions

First of all, I would like to note that load testing, like any other means of improving software quality, brings the maximum benefit only when it is used constantly. Yes, maintaining the tests takes effort, but it is worth it. It is better to spend this effort in a calm atmosphere than in firefighting mode.

Secondly, if you have a large and complex distributed system, then, apart from integration load tests, it may also be advisable to carry out load tests on individual components. This is usually cheaper, and such tests can be made more flexible.

And thirdly, load tests are also useful because a significant part of the tooling built around them works very well in production.

That's all. As always, I will be happy to answer your questions in the comments.

Source: https://habr.com/ru/post/234223/
