
The story of one optimization: transmitting and processing battle results

Today I will tell you about a small part of a large project: World of Tanks. Many of you probably know World of Tanks from the player's side, but I suggest looking at it from the developer's point of view. This article covers the evolution of one of the project's technical solutions, namely the transfer and processing of battle results.
(Image: the post-battle results screen)


Under the hood


To understand the problem at hand, I will first describe how World of Tanks is structured. The project is a distributed client-server application; the server side consists of several nodes of different types. All physics and in-game logic are calculated on the server, while the client is responsible for user interaction: handling input and displaying results. A simplified cluster diagram looks like this:
(Image: simplified cluster diagram)

BaseApp nodes are responsible for interaction with the client. They receive information from the user over the network and pass it into the cluster, and they also send the user information about events occurring inside the cluster. All physical interactions in the arena are calculated on CellApp nodes. Information about everything that happened to a tank during a battle is also collected there: the number of shots, hits, penetrations, and so on. One battle can be served by several CellApps, and each CellApp can serve users from several different battles.

At the end of the battle, packets of statistics on all tanks are sent from the CellApps to a BaseApp. It aggregates the packets and processes the results of the battle: it calculates penalties and rewards, checks quest completion, awards medals, and forms the data packets that it then sends to the users.

It is important to note that CellApps and BaseApps are isolated processes. Some of them may run on different machines within the same cluster, so data transfer between them goes over network interfaces, and between the client and the cluster over the Internet.
Data packets are transferred with a guaranteed-delivery protocol implemented on top of UDP. Under the hood, all I/O and guaranteed delivery over UDP are handled by the Mercury library, which is part of the BigWorld Technology platform. In practice, we can only affect the data before it is sent: prepare it in a way that minimizes the cost of sending and receiving.

For obvious reasons, we want to keep the amount of data transmitted between nodes to a minimum: this reduces traffic bursts and transmission latency, and significantly shrinks the battle-results data sent to the client. One tick of 100 ms is allocated for data pre-processing, during which other events may also need to be processed. Therefore, pre-processing should take as little time as possible; ideally, no time at all.

Start


Let's take a look at the data preparation process.
Most of the in-game logic of World of Tanks is written in Python. The data sent between CellApp and BaseApp, as well as the data sent to the user at the end of the battle, is a dict with values ranging from simple ones (integers, floats, True/False) to nested dictionaries and sequences. Neither code objects, nor classes, nor instances of user-defined classes are ever passed from one node to another. This is due to security and performance requirements.
To make the rest of the material easier to follow, here is an example of the data we will be working with:
 data = {
     "someIntegerValue" : 1,
     "someBoolValue" : True,
     "someListOfIntsValue" : [1, 2, 3],
     "someFloatValue" : 0.5,
     "someDictionaryValue" : {
         "innerVal" : 1,
         "innerVal2" : 2,
         "listOfVals" : [1, 2, 3, 4],
     },
     "2dArray" : [
         [1, 2, 3, 4, 5, 6],
         [7, 8, 9, 10, 11, 12],
         [13, 14, 15, 16, 17, 18],
         [19, 20, 21, 22, 23, 24]
     ]
 }

For direct data transfer between nodes, the dict instance must be converted into a binary packet of minimal size. When making a remote call, the BigWorld engine lets the programmer convert Python objects into binary data and back using the cPickle module.
As long as the amount of data transferred was small, it was simply converted to binary form with the cPickle module before sending (let's call this exchange protocol version 0). It was in this form that the first closed beta of Tanks, version 0.1, shipped back in 2010.

The advantages of this method are its simplicity and reasonable efficiency, both in speed and in the compactness of the resulting binary representation.
 >>> p = cPickle.dumps(data, -1)
 >>> len(p)
 277


The disadvantages follow from the advantages, in particular from the simplicity of the method: the string keys of the dictionary are transferred between cluster nodes, and from the cluster to the user, along with the data. In some cases these string keys can occupy a significant portion of the transmitted binary packet.
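To get a feel for the overhead, one can measure how many bytes the keys alone contribute. A rough sketch (shown in Python 3, where cPickle lives on as pickle; exact sizes differ slightly from Python 2, but the proportion is similar):

```python
import pickle

# A trimmed-down version of the example data from the article.
data = {
    "someIntegerValue": 1,
    "someBoolValue": True,
    "someListOfIntsValue": [1, 2, 3],
}

packed = pickle.dumps(data, 2)            # protocol 2, as in the article
key_bytes = sum(len(k) for k in data)     # raw length of the string keys
print("packet: %d bytes, keys alone: %d bytes" % (len(packed), key_bytes))
```

Even on this tiny dict the keys account for roughly half of the packet.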

We optimize


As time went on, the amount of data and the number of concurrent players grew significantly, and with them the number of simultaneously finishing battles and the amount of information transmitted at the end of each battle. To reduce traffic, we dropped the string keys from the data being sent and replaced them with indices into a list. This operation loses no data: knowing the order of the keys, restoring the original dict is easy.

To delete and restore the keys, we needed a template storing all possible keys and the corresponding conversion functions:
 NAMES = (
     "2dArray",
     "someListOfIntsValue",
     "someDictionaryValue",
     "someFloatValue",
     "someIntegerValue",
     "someBoolValue",
 )

 INDICES = {x[1] : x[0] for x in enumerate(NAMES)}

 def dictToList(indices, d):
     l = [None, ] * len(indices)
     for name, index in indices.iteritems():
         l[index] = d[name]
     return l

 def listToDict(names, l):
     d = {}
     for x in enumerate(names):
         d[x[1]] = l[x[0]]
     return d

 >>> dictToList(INDICES, data)
 [[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]], [1, 2, 3], {'listOfVals': [1, 2, 3, 4], 'innerVal2': 2, 'innerVal': 1}, 0.5, 1, True]
 >>>
 >>> len(cPickle.dumps(dictToList(INDICES, data), -1))
 165

As the example shows, the binary representation has become more compact than in version 0. But we paid for this compactness with time spent on pre-processing and with new code that has to be maintained.
This solution shipped in version 0.9.5, at the very beginning of 2015. The protocol that converts the dict to a list and then calls cPickle.dumps(data, -1) is called version 1.

Optimize further


With the release of the "Superiority" mode, data volume grew significantly: data is collected for each tank separately, and in this mode a user can go into battle on as many as three vehicles. The need to pack the data even more tightly became quite acute.
In order to transmit even less redundant information, we applied the following techniques:

1. To each nested dictionary we applied the same approach as to the main one: we threw out the keys and converted the dict to a list. The "template" used to convert data from dict to list and back thus became recursive.

2. Looking carefully at the data before sending, we noticed that in some cases a sequence is wrapped in a set or frozenset container. The binary representation of these containers in cPickle protocol version 2 takes up considerably more space:
 >>> l = list(xrange(3))
 >>> cPickle.dumps(set(l), -1)
 '\x80\x02c__builtin__\nset\nq\x01]q\x02(K\x00K\x01K\x02e\x85Rq\x03.'
 >>>
 >>> cPickle.dumps(frozenset(l), -1)
 '\x80\x02c__builtin__\nfrozenset\nq\x01]q\x02(K\x00K\x01K\x02e\x85Rq\x03.'
 >>>
 >>> cPickle.dumps(l, -1)
 '\x80\x02]q\x01(K\x00K\x01K\x02e.'

We saved a few more bytes by converting set and frozenset to a list before sending. Since the receiving side usually does not care about the concrete sequence type, only about the data, this replacement caused no errors.
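Such a normalization pass might look like the following (a hypothetical helper, shown in Python 3 with pickle in place of cPickle; the project's real code differs):

```python
import pickle

def normalize(value):
    # Replace set/frozenset with a plain list and recurse into containers,
    # so the pickled representation stays compact.
    if isinstance(value, (set, frozenset)):
        return [normalize(v) for v in sorted(value)]
    if isinstance(value, list):
        return [normalize(v) for v in value]
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    return value

# "spottedEnemies" is an invented field name for illustration.
before = {"spottedEnemies": frozenset([3, 1, 2])}
after = normalize(before)
print(len(pickle.dumps(before, 2)), "->", len(pickle.dumps(after, 2)))
```

The receiver gets a list instead of a frozenset, which is acceptable here because only the values matter.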

3. Quite often, not all keys in the dictionary are assigned values. Some may be absent, while others hold default values known in advance on both the sending and receiving sides. Keep in mind that "default" values of different types have different binary representations. Occasionally the defaults are a bit more complex than just an empty value of a given type; in our case these are several counters combined into one field and represented as sequences of zeros. In such cases the defaults can take up a lot of space in the binary data sent between nodes. To squeeze out even more savings, we replace default values with None before sending. As a result, the binary representation either becomes more compact or stays the same length.
 >>> len(cPickle.dumps(None, -1))
 4
 >>> len(cPickle.dumps((), -1))
 4
 >>> len(cPickle.dumps([], -1))
 4
 >>> len(cPickle.dumps({}, -1))
 4
 >>> len(cPickle.dumps(False, -1))
 4
 >>> len(cPickle.dumps(0, -1))
 5
 >>> len(cPickle.dumps(0.0, -1))
 12
 >>> len(cPickle.dumps([0, 0, 0], -1))
 14

When looking at these examples, keep in mind that cPickle adds a header and a terminator to the binary packet, 3 bytes in total, so the actual amount of serialized data is (X - 3), where X is the value from the example.
Moreover, replacing default values pays off again when the binary data is compressed with zlib. In the binary representation, list elements follow one another without separators, so several consecutive defaults replaced by None become a run of identical bytes, which compresses very well.
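This can be illustrated directly (a Python 3 sketch with pickle in place of cPickle; exact byte counts vary between interpreter versions, so only the direction of the comparison matters):

```python
import pickle
import zlib

# 100 records whose counter fields are all at their defaults...
with_defaults = pickle.dumps([[[0, 0, 0], 0, 0.0, False] for _ in range(100)], 2)
# ...and the same records with defaults replaced by None before sending.
with_nones = pickle.dumps([[None, None, None, None] for _ in range(100)], 2)

# None-runs are shorter before compression and compress better afterwards.
print(len(with_defaults), len(with_nones))
print(len(zlib.compress(with_defaults, 1)), len(zlib.compress(with_nones, 1)))
```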

4. The data is compressed with zlib at compression level 1, since this level gives the best ratio of compression to running time for our data.
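The trade-off is easy to check on one's own data (an illustrative sketch; the payload here is an invented repetitive byte pattern, and the numbers depend entirely on the data):

```python
import time
import zlib

# A hypothetical payload standing in for a pickled battle-results packet.
payload = bytes(range(256)) * 200

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    print("level %d: %d bytes, %.6f s" % (level, len(compressed), elapsed))
```

Higher levels shave off a few bytes at a disproportionate cost in time, which is why level 1 won for this workload.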

If we put together steps 1–3, we’ll get something like this:
 import copy

 class DictPacker(object):
     def __init__(self, *metaData):
         self._metaData = tuple(metaData)

     # Packs input dataDict into a list.
     def pack(self, dataDict):
         metaData = self._metaData
         l = [None] * len(metaData)
         for index, metaEntry in enumerate(metaData):
             try:
                 name, transportType, default, packer = metaEntry
                 default = copy.deepcopy(default) # prevents modification of default.
                 v = dataDict.get(name, default)
                 if v is None:
                     pass
                 elif v == default:
                     v = None
                 elif packer is not None:
                     v = packer.pack(v)
                 elif transportType is not None and type(v) is not transportType:
                     v = transportType(v)
                     if v == default:
                         v = None
                 l[index] = v
             except Exception as e:
                 LOG_DEBUG_DEV("error while packing:", index, metaEntry, str(e))
         return l

     # Unpacks input dataList into a dict.
     def unpack(self, dataList):
         ret = {}
         for index, meta in enumerate(self._metaData):
             val = dataList[index]
             name, _, default, packer = meta
             default = copy.deepcopy(default) # prevents modification of default.
             if val is None:
                 val = default
             elif packer is not None:
                 val = packer.unpack(val)
             ret[name] = val
         return ret

 PACKER = DictPacker(
     ("2dArray", list, 0, None),
     ("someListOfIntsValue", list, [], None),
     ("someDictionaryValue", dict, {}, DictPacker(
         ("innerVal", int, 0, None),
         ("innerVal2", int, 0, None),
         ("listOfVals", list, [], None),
     )),
     ("someFloatValue", float, 0.0, None),
     ("someIntegerValue", int, 0, None),
     ("someBoolValue", bool, False, None),
 )

 >>> PACKER.pack(data)
 [[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]], [1, 2, 3], [1, 2, [1, 2, 3, 4]], 0.5, 1, True]
 >>> len(cPickle.dumps(PACKER.pack(data), -1))
 126

As a result, all of these techniques went into protocol version 2, which was released with version 0.9.8 in May 2015.
Yes, we further increased the time spent on pre-processing, but the size of the binary packet decreased significantly.
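Putting the whole version-2 send path together, the pipeline amounts to pack, pickle, compress (a simplified sketch in Python 3, with pickle standing in for cPickle and a hand-packed list standing in for the output of DictPacker.pack):

```python
import pickle
import zlib

def serialize(packed_list):
    # Version 2: the dict is already packed into a list,
    # then pickled (protocol 2) and compressed (zlib level 1).
    return zlib.compress(pickle.dumps(packed_list, 2), 1)

def deserialize(blob):
    return pickle.loads(zlib.decompress(blob))

packed = [[1, 2, 3], None, None, 0.5, 1, True]  # hypothetical packed record
blob = serialize(packed)
assert deserialize(blob) == packed
print(len(blob), "bytes on the wire")
```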

Comparison of results


To see what applying the above techniques to real data achieved, here is a graph of the size of the data packet for a single tank, transmitted from CellApp to BaseApp at the end of a battle, across the different protocol versions.
(Graph: size of the per-tank information packet sent from CellApp to BaseApp)

Recall that in the "Superiority" mode in version 0.9.8 a player can go into battle on three tanks, so the total amount of data triples accordingly.

And here is the time spent on pre-processing the same data before sending.
(Graph: time to pre-process the packet sent from CellApp to BaseApp)

Here 0.9.8u is processing without zlib compression (uncompressed), and 0.9.8c is with compression (compressed). Time is given in seconds per 10,000 iterations.

Note that the data was collected for version 0.9.8 and then extrapolated for 0.9.5 and 0.1 based on the keys used. Moreover, the data varies considerably from user to user and tank to tank, since its amount depends on the player's behavior (how many opponents were spotted, damaged, and so on). So the graphs should be treated as an illustration of the trend.

Conclusion


In our case it was crucial to reduce the amount of data transferred between nodes, and desirable to reduce the pre-processing time as well, so the version 2 protocol was the best solution for us. The next logical step is to move the serialization functionality into a separate binary module that performs all the manipulations itself and does not store data-describing information in the binary stream the way pickle does. This would shrink the data even further and, possibly, reduce processing time. For now, though, we work with the solution described above.

Source: https://habr.com/ru/post/272265/

