
Thanks to everyone who tried to play the
first time . It is a pity that I disappointed so many people with the terrible lag, but without you I would never have tracked down its cause. The server is now fairly well optimized, yet the number of simultaneous games has only been raised to five. That is not much, but the bottleneck is no longer server performance: at the worst evening hours my Internet connection simply will not handle more. As a little extra, you can now choose the level before the game starts. And in response to the
“offensive” comment , it is now possible to play 2 vs 2. So: the
demo ,
an alternative server ,
another server . It remains to hope that I did not rush things too much and the server will hold up. Under the cut I'll tell you what nonsense I did in the first version.
Profiling
So, before optimizing anything, you need to find the bottlenecks. In the case of node.js I did the following: run the game with the --prof flag
node --prof src/server
When the script finishes, a v8.log file appears in the current folder. To turn it into something digestible I used the linux-tick-processor utility from the v8 sources. I did not dig into the details of linux-tick-processor, but to get it and make it work I had to run a few commands:
svn co http://v8.googlecode.com/svn/branches/bleeding_edge/ v8
cd v8
svn co http://gyp.googlecode.com/svn/trunk build/gyp
make native
After make native, an out folder with the binaries used by linux-tick-processor appears in the current directory. To process v8.log, run the following from the v8 folder:
tools/linux-tick-processor /full/path/to/v8.log > /full/path/to/v8.out
The results can then be examined in v8.out. The information about profiling is taken
from here . If I am doing this the hard way and someone knows a better approach, I would be glad to hear about it.
We imitate the load
I did not want to tempt fate a second time and dash the hopes of hundreds of habra users wanting to play tanchiki, so I decided to simulate the load on the server myself.
Selenium , or rather Selenium Server, is quite suitable for this. For those not familiar with it, Selenium Server is a Java application that can launch almost any browser and execute various commands: follow links, click buttons, check for the presence or absence of certain text on a page, and much more. Client libraries exist for many programming languages.
For my purposes I wrote a
small php-script that opens the page, logs in, starts the game and waits 30 seconds. By running this script in several consoles:
phpunit --repeat=10 test.php
we get a reasonably good way to simulate the load.
Optimization 1 - closures
We start node with the profiler, process the log and look at v8.out. In v8.out the functions are sorted by descending execution time, and the first one in the log is Array.forEach:
ticks  total  nonlib  name
 1514   2.3%   14.9%  LazyCompile: *forEach native array.js:1019
At first I did not understand what was going on. I decided it was because in some places I use the following construction:
someArray.forEach(function(item){ ... }, this);
With such code a new closure is created on every call. But in most places I do not actually need the closure, so I painlessly rewrote the code as follows:
Class.prototype.handler = function(item) {
    ...
}

Class.prototype.func = function() {
    someArray.forEach(this.handler, this);
}
Such code should be executed faster, but as it turned out later, this was not the case.
Optimization 2 - intersection search
We start the profiler, run the selenium tests and look further:
ticks  total  nonlib  name
31043   3.8%   16.5%  LazyCompile: MapArrayed.intersects src/utils/map_arrayed.js:45
26763   3.3%   14.3%  Stub: CEntryStub
22800   2.8%   12.1%  LazyCompile: *forEach native array.js:1019
16323   2.0%    8.7%  LazyCompile: IN native runtime.js:353
13800   1.7%    7.4%  Stub: KeyedLoadElementStub
 6911   0.9%    3.7%  LazyCompile: *MapArrayed.forEach src/utils/map_arrayed.js:59
Well, no surprise there. Finding intersections takes too much time, so the map needs to be rewritten. In the
last article I described three approaches: a three-dimensional array indexed by coordinates, a flat list of rectangles, and a tree of nested rectangles. After playing with the tree I concluded that with about 1000 objects on the map it gives no big performance gain, and I dislike the idea of a three-dimensional array with coordinate indices in principle. I settled on the following approach: it is still based on objects defined by rectangles, but to avoid scanning all map objects every time, the objects are grouped into small cells. In effect we get the same three-dimensional array, but with a different meaning: the first two dimensions are the cell indices (essentially the same coordinates, only coarsely quantized), and the third dimension is the list of rectangle objects intersecting a given cell. The difference from
the Oblitus proposal is that this method imposes no restrictions on the minimum movement step of objects, and the cell size of such an array can be varied freely. It is quite possible that this is still not the best option.
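To make the idea concrete, here is a minimal sketch of such a cell-based map. The class name TileMap and all the method and field names are mine for illustration only, not the project's actual MapTiled code:

// Sketch: rectangles grouped into coarse cells, so an intersection query only
// looks at the cells a rectangle overlaps instead of at every object on the map.
function TileMap(cellSize) {
    this.cellSize = cellSize;
    this.cells = {};                     // "cx,cy" -> list of items touching that cell
}

// Visit every cell key that a rectangle {x, y, width, height} overlaps.
TileMap.prototype._eachCell = function(rect, callback) {
    var minX = Math.floor(rect.x / this.cellSize),
        maxX = Math.floor((rect.x + rect.width) / this.cellSize),
        minY = Math.floor(rect.y / this.cellSize),
        maxY = Math.floor((rect.y + rect.height) / this.cellSize);
    for (var cx = minX; cx <= maxX; cx++)
        for (var cy = minY; cy <= maxY; cy++)
            callback(cx + ',' + cy);
};

TileMap.prototype.add = function(item) {
    var self = this;
    this._eachCell(item, function(key) {
        (self.cells[key] || (self.cells[key] = [])).push(item);
    });
};

// Return all items whose rectangles intersect the given rectangle.
TileMap.prototype.intersects = function(rect) {
    var self = this, found = [], seen = {};
    this._eachCell(rect, function(key) {
        var items = self.cells[key] || [];
        for (var i = 0; i < items.length; i++) {
            var item = items[i];
            if (seen[item.id]) continue;               // an item may sit in several cells
            seen[item.id] = true;
            if (rect.x < item.x + item.width && item.x < rect.x + rect.width &&
                rect.y < item.y + item.height && item.y < rect.y + rect.height) {
                found.push(item);
            }
        }
    });
    return found;
};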
Optimization 3 - forEach
We look at the profiling results again, and a strange picture emerges. If I have played only one game, we get:
ticks  total  nonlib  name
   51   0.1%   11.6%  LazyCompile: *Loggable.log battlecity/src/server/loggable.js:32
   12   0.0%    2.7%  LazyCompile: MapTiled.intersects battlecity/src/utils/map_tiled.js:106
    8   0.0%    1.8%  Stub: CEntryStub
    6   0.0%    1.4%  LazyCompile: *forEach native array.js:1019
    4   0.0%    0.9%  Stub: StringAddStub
    4   0.0%    0.9%  Stub: ArgumentsAccessStub_NewNonStrictFast
But if I run selenium and the server has played through a few dozen games:
ticks  total  nonlib  name
 4108   2.0%   16.1%  LazyCompile: *forEach native array.js:1019
 3626   1.8%   14.3%  Stub: CEntryStub
 2176   1.1%    8.6%  LazyCompile: *MapTiled.forEach battlecity/src/utils/map_tiled.js:139
 2048   1.0%    8.0%  LazyCompile: IN native runtime.js:353
 1755   0.9%    6.9%  Stub: KeyedLoadElementStub {1}
 1475   0.7%    5.8%  LazyCompile: *Loggable.log battlecity/src/server/loggable.js:32
  337   0.2%    1.3%  LazyCompile: MapTiled.intersects battlecity/src/utils/map_tiled.js:106
  336   0.2%    1.3%  Stub: ToBooleanStub_Bool
Strange: over time forEach() eats more and more time. I should mention that every object on the map has its own unique id, and this id is global, that is, it only grows over time, while the number of objects on the map does not change from game to game. This is not a great design — the object ids could be made local to each playing field so they would not grow indefinitely — but javascript arrays are not stored the way C++ arrays are. Under the hood they should behave more like a hash table, so why does forEach keep getting slower? At this point I banged my head against the wall for a long time, until it occurred to me to run the following experiment:
a = [1];
a[1000000] = 1;

console.time('qwe');
a.forEach(function(i) { console.log(i); });
console.timeEnd('qwe');

console.time('asd');
for (var i in a) console.log(a[i]);
console.timeEnd('asd');
The results are disappointing.
FF:
qwe: 163ms
asd: 2ms
Chrome:
qwe: 254ms
asd: 1ms
and Opera, for the full picture:
qwe: 0ms (188µsec)
asd: 0ms (87µsec)
As we can see, in Opera forEach() and for (var i in ...) do not fundamentally differ in execution time, but Chrome and Firefox upset me badly — this is exactly why the server (and the client) started to slow down noticeably after a few games. Nothing to be done: I rewrote forEach() as for (var i in ...). And, oh miracle, the slowdowns I had been blaming on memory leaks disappeared!
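For illustration, the rewrite looked roughly like this. This is only a sketch: the items field as the name of the sparse, id-indexed storage is my assumption, not the actual field name in MapTiled:

function MapTiled() {
    // objects stored by their global id; the array becomes sparse as ids grow
    this.items = [];
}

// Before: Array#forEach scans all indices up to the largest id ever used,
// so it keeps getting slower as the global id counter grows.
// MapTiled.prototype.forEach = function(callback, thisArg) {
//     this.items.forEach(callback, thisArg);
// };

// After: for..in visits only the keys that actually exist, so the cost
// depends on the number of live objects, not on the largest id.
MapTiled.prototype.forEach = function(callback, thisArg) {
    for (var id in this.items) {
        callback.call(thisArg, this.items[id], id);
    }
};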
We leave node running for a couple of hours with “phpunit --repeat=100 test.php” going in several consoles, and see:
ticks  total  nonlib  name
  746   0.2%   16.1%  LazyCompile: *Loggable.log battlecity/src/server/loggable.js:28
  128   0.0%    2.8%  LazyCompile: *Game._stepItem battlecity/src/core/game.js:77
  101   0.0%    2.2%  LazyCompile: MapTiled.intersects battlecity/src/utils/map_tiled.js:102
   61   0.0%    1.3%  Stub: CEntryStub
   52   0.0%    1.1%  Function: EventEmitter.emit events.js:38
   50   0.0%    1.1%  Stub: SubStringStub
   46   0.0%    1.0%  LazyCompile: *MapTiled.add battlecity/src/utils/map_tiled.js:24
   45   0.0%    1.0%  LazyCompile: FILTER_KEY native runtime.js:398
Finally, the profiling results show the things I expected to see there, rather than forEach() calls coming from who knows where.
Optimization 4 - traffic
Then I decided to step away from the profiler for a while. While hunting for the previous optimization I had measured the traffic between the client and the server, and it turned out that at the height of a game a client can receive up to 30 KB/s. For a game like tanchiki that is obviously an exorbitant figure, and there are several reasons for it. First, when only one property changes, the object is sent to the client in its entirety. Second, the objects are serialized to JSON, which also significantly inflates the amount of transmitted data. Initially objects were sent like this:
Bullet.prototype.serialize = function() {
    return {
        type: 'Bullet',
        id: this.id,
        x: this.x,
        y: this.y,
        z: this.z,
        speedX: this.speedX,
        speedY: this.speedY,
        finalX: this.finalX,
        finalY: this.finalY
    };
};
which meant transferring a string like {"type":"Bullet","id":777,"x":123,"y":456,"z":1,"speedX":2,"speedY":0,"finalX":123,"finalY":456} of about 100 bytes. After a little thought I reworked the serialization so that the result is not an object but an array:
Bullet.prototype.serialize = function() {
    return [
        battleCityTypesSerialize['Bullet'],   // numeric type code instead of the "Bullet" string
        this.id, this.x, this.y,
        this.speedX, this.speedY, this.finalX, this.finalY
        // (field order reconstructed here from the example output below)
    ];
};
As a result we get roughly 25 bytes: [0,777,123,456,2,0,123,456]. Traffic dropped to about 7-8 KB/s at the height of a game. It could be cut several times more by sending only changed properties, or only control commands, but I left that rework for the future.
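For context, the type-code table and the matching client-side decoding might look roughly like this. Only battleCityTypesSerialize appears in the article's code; the reverse table and the unserialize helper are my assumptions for illustration:

// Map object types to small numeric codes so the type name is not
// transmitted as a string with every update.
var battleCityTypesSerialize = {
    'Bullet': 0
    // ... other game object types would get codes 1, 2, ...
};

// Hypothetical reverse table used on the client to turn codes back into types.
var battleCityTypesUnserialize = ['Bullet' /* , ... */];

// Hypothetical client-side decoding of the compact array form.
function unserializeBullet(data) {
    return {
        type:   battleCityTypesUnserialize[data[0]],
        id:     data[1],
        x:      data[2],
        y:      data[3],
        speedX: data[4],
        speedY: data[5],
        finalX: data[6],
        finalY: data[7]
    };
}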
Optimization 5 - synchronization with the client
The synchronization algorithm from the
previous article turned out to be a poor choice. The only reason I had chosen it, rather than immediately pushing changes to all clients interested in them, was that a newly connected client could receive all past changes in exactly the same way as the current data. While implementing it I also came up with the idea of grouping objects into collections and looking for updates not on individual objects but on collections of objects. Such collections are “all users”, “messages in the general chat”, “list of games”, “users in the current game”, “messages in the current game” and “objects on the map of the current game”.
The new synchronization method is to push changes immediately to the User objects, where they accumulate; the User object still sends data to the browser in packets every 50 ms. The remaining question is when and how to synchronize the initial data. I decided to add two methods to the User object, watchCollection() and unwatchCollection(), and at the moment of attaching to a group of objects User sends all of its objects to the client as newly created:
ServerUser.prototype.watchCollection = function(collection, syncKey) {
    this.unwatchCollection(syncKey);
    // ... (rest of the original method is omitted here: the collection is
    // remembered under syncKey and all of its current objects are queued
    // for the client as newly created ones)
};
Thus, immediately after the user is authorized on the server, the User object is connected to three groups of objects (collections):
user.watchCollection(registry.users, 'users');
user.watchCollection(registry.premades, 'premades');
user.watchCollection(registry.messages, 'messages');
And when entering or leaving a game, the user respectively attaches to and detaches from the other collections of interest:
Premade.prototype.join = function(user, clanId) {
    // ... (body omitted here: on joining, the user subscribes via watchCollection()
    // to that game's own collections — its users, messages and map objects —
    // and unsubscribes from them again on leaving)
};
It is a pity that simple and correct thoughts do not immediately come to mind.
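To tie the pieces together, here is a minimal sketch of the accumulate-and-flush idea described above. The field and method names (changes, pushChange, flush) and the socket API are my assumptions for illustration, not the project's actual code:

// Changes are pushed to the watching users immediately and accumulate there;
// each user flushes its accumulated changes to the browser every 50 ms.
function ServerUser(socket) {
    this.socket = socket;
    this.changes = [];                         // hypothetical accumulation buffer
    var self = this;
    this.flushTimer = setInterval(function() { self.flush(); }, 50);
}

// Called by a watched collection whenever one of its objects changes.
ServerUser.prototype.pushChange = function(change) {
    this.changes.push(change);
};

// Send everything accumulated since the last packet in a single message.
ServerUser.prototype.flush = function() {
    if (!this.changes.length) return;
    this.socket.send(JSON.stringify(this.changes));   // hypothetical transport call
    this.changes = [];
};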
As a result, we have:
ticks  total  nonlib  name
 2074   0.5%    9.9%  LazyCompile: *Game._stepItem battlecity/src/core/game.js:29
  751   0.2%    3.6%  LazyCompile: MapTiled.intersects battlecity/src/utils/map_tiled.js:102
  489   0.1%    2.3%  LazyCompile: MapTiled.traversal battlecity/src/utils/map_tiled.js:132
  376   0.1%    1.8%  LazyCompile: FILTER_KEY native runtime.js:398
Judging by the fact that Game._stepItem came out on top and now accounts for 9.9% of the time instead of 2.8% as before, the rework can be considered a success. At this point the server sits at roughly 50% load with 10 simultaneous games. I did not dare allow 20 simultaneous games in the demo, because at the worst evening hours the speed of my Internet connection drops to 200 KB/s and below.
Optimization 6 - iterating over game objects
Initially I did not give it much thought and simply called the _stepItem() function in a loop for every object on the field:
Game.prototype._stepItem = function(item) {
    // ... (body omitted: advances an object by one tick, but it was called even
    // for static objects such as wall pieces and checked each object's prototype)
};
This function was called even for every piece of wall, and it also checked the prototype of each object. To get rid of this disgrace I introduced a stepableItems array, which is updated when objects are added to or removed from the map, so the frequently called function no longer needs any complex checks:
Game = function Game(level, premade) {
    // ... (constructor body omitted: among other things it now initializes
    // this.stepableItems, the list of objects that actually need stepping)
};
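A minimal sketch of the idea follows; the method names addItem, removeItem and step, and the this.map field, are my assumptions for illustration:

// Only objects that actually move or act each tick are kept in stepableItems,
// so static wall pieces are never touched by the game loop.
Game.prototype.addItem = function(item) {
    this.map.add(item);
    if (item.step) {                              // e.g. tanks and bullets define step()
        this.stepableItems.push(item);
    }
};

Game.prototype.removeItem = function(item) {
    this.map.remove(item);
    var i = this.stepableItems.indexOf(item);
    if (i !== -1) this.stepableItems.splice(i, 1);
};

// The game loop iterates only over stepable objects.
Game.prototype.step = function() {
    for (var i = 0; i < this.stepableItems.length; i++) {
        this._stepItem(this.stepableItems[i]);
    }
};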
As a result, intersection of objects is back at the top of the profile, but at a completely different level:
ticks  total  nonlib  name
  129   0.0%    2.4%  LazyCompile: MapTiled.intersects battlecity/src/utils/map_tiled.js:102
   66   0.0%    1.2%  Stub: SubStringStub
   54   0.0%    1.0%  Stub: CEntryStub
   47   0.0%    0.9%  Function: EventEmitter.emit events.js:38
   39   0.0%    0.7%  LazyCompile: MapTiled.add battlecity/src/utils/map_tiled.js:24
   30   0.0%    0.6%  Function: Socket._writeOut net.js:389
Now 5 simultaneous games take 15-16% of CPU time, which means my old server should handle about 30 games in a single thread.
Bug with inheritance
I also had to contend with one bug whose cause deserves attention: when setting up inheritance, I forgot to call the parent "constructor":
function Parent() {
    this.property = [];
}

function Child() {
    // Parent.call(this); — this call was forgotten
}

// inheritance was set up roughly like this (reconstructed from the description below)
Child.prototype = new Parent();
As a result, the property array lived only on the prototype, not on this — that is, it was shared between all instances of Child. This caused no run-time errors, but led to a hard-to-find bug. In javascript it really costs nothing to shoot yourself in the foot.
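A short illustration of how the bug manifests itself with the setup sketched above:

var a = new Child();
var b = new Child();

a.property.push(1);
console.log(b.property);   // [1] — both instances share the array from the prototype

// The fix: call the parent constructor so each instance gets its own array.
// function Child() {
//     Parent.call(this);
// }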
Future plans
The first thing I plan to do is reduce traffic further, and I think I will start with the idea that is floating in the air: transmitting only control commands to the client, such as “start”, “stop” and so on. I thought there might be problems with desync, but it seems to me they are easy to solve if the coordinates are sent along with each control command for synchronization.
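One way such a command packet could look — purely a sketch of the plan, nothing here exists in the project yet, and the command codes, field order and socket API are invented:

// Instead of streaming positions every tick, send an event only when a tank's
// state changes, plus its current coordinates so the client can re-sync.
var CMD_START = 1, CMD_STOP = 2;

function sendCommand(socket, cmd, tank) {
    socket.send(JSON.stringify([cmd, tank.id, tank.x, tank.y, tank.direction]));
}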
The client also needs optimizing, but I have no specific ideas yet — more precisely, I have not profiled the client, so I do not know what is slow there. In general, this project interests me not so much for the game itself as for being a testbed for various ideas: for example, while reading books on OOP or assorted articles, I have code at hand to apply the new knowledge to. So I would be glad to hear suggestions for what to read.
First part