Age of Empires network code: 1500 archers on a 28.8 kbit / s modem

Translator's note: this article is already 17 years old, and it is interesting only from a historical point of view. It is curious to find out how developers managed to achieve a smooth network game in the era of 28.8k-modems and the first Pentiums.

This article describes the architecture and implementation of the multiplayer (network) code for Age of Empires 1 and 2, along with some lessons learned in creating it. It also outlines the current and future approaches to network architecture used by Ensemble Studios in its game engines.

Multiplayer Age of Empires: structure requirements

When work began on the multiplayer code for Age of Empires in 1996, we set some very specific goals that had to be met to deliver the gameplay we wanted.

Large-scale and epic historical battles with many different combat units
Support up to 8 players in multiplayer mode
Smooth gameplay simulation over LAN, via direct dial-up connection and over the Internet
Target platform support: Pentium 90 with 16 MB of RAM and a 28.8 kbit / s modem
The communication system should work with the existing engine (Genie)
Stable 15 frames per second on machines with minimal configuration

The Genie engine already existed, and the single-player gameplay was beginning to take shape. Genie is a two-dimensional, single-threaded game-loop engine. Sprites are rendered in 256 colors in a tile-based world. Randomly generated maps are filled with thousands of objects, from trees that can be chopped down to leaping gazelles. The approximate breakdown (after optimization) of processing time by engine task was 30% for graphics rendering, 30% for AI and pathfinding, and 30% for running the simulation and maintenance tasks.
Even at a fairly early stage the engine was reasonably stable, so the multiplayer communications had to work with the existing code without requiring significant changes to the existing (working) architecture.

To complicate matters, the time spent on each simulation step could vary greatly: rendering time depended on whether the user was watching units, scrolling, or looking at unexplored terrain, and long paths or strategic planning by the AI significantly affected the duration of a game turn, which could fluctuate by as much as 200 ms.

Quick calculations showed that transmitting even a small set of data about each unit and trying to update it in real time would severely limit the number of units and objects a player could interact with. Simply sending the X and Y coordinates, status, action, facing and damage of each unit would have limited the game to no more than 250 moving units.
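
The shape of that estimate can be sketched as follows. The article gives neither the per-unit update size nor the update rate, so the 14 bytes and the updates-per-second values used with this function are purely illustrative assumptions, and protocol overhead is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the number of units whose full state fits through the link.
// All three inputs are assumptions supplied by the caller; the article does
// not state the real packet layout or update rate.
std::uint32_t maxUnits(std::uint32_t linkBitsPerSecond,
                       std::uint32_t bytesPerUnitUpdate,
                       std::uint32_t updatesPerSecond) {
    assert(bytesPerUnitUpdate > 0 && updatesPerSecond > 0);
    const std::uint32_t bytesPerSecond = linkBitsPerSecond / 8;  // 28.8 kbit/s -> 3600 B/s
    return bytesPerSecond / (bytesPerUnitUpdate * updatesPerSecond);
}
```

With a 28.8 kbit/s link and a hypothetical 14-byte update, one update per second allows about 257 units, and five updates per second allows only about 51, which is why sending unit state does not scale to large armies.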

We wanted players to devastate Greek cities with catapults, archers and warriors while triremes besieged them from the sea. Clearly, we needed a different approach.

Simultaneous Simulations

Rather than transmitting the state of every unit in the game, we wanted to run exactly the same simulation on every machine, feeding each one the same set of commands issued by the players at the same time. The players' computers would, in effect, synchronize their watches in the best war-movie tradition, let players issue commands, and then execute them in exactly the same way at exactly the same time, so that every machine ran an identical game.

This tricky synchronization was difficult to get working at first, but it ended up bringing unexpected benefits in other areas.

Base Model Improvement

At the simplest conceptual level, implementing a simultaneous simulation seems very straightforward. For some games, a hard lock-step simulation with fixed game timings may even be feasible.

Because this approach has to move hundreds or thousands of objects simultaneously, the system must remain workable when communication lag ranges from 20 to 1000 milliseconds, and it must cope with changing frame rates.

Sending out the players' commands, acknowledging every message, and only then processing them before moving on to the next turn would have been a gameplay nightmare of constant stalls and slow command turnaround. We needed a scheme that could keep processing the game while waiting in the background for communications to complete.

Mark Terrano used a system of tagging commands to be executed two “communications turns” in the future (communications turns in AoE were separate from the rendering frames themselves).

That is, commands issued during turn 1000 are scheduled for execution during turn 1002 (see Figure 1). During turn 1001, the commands issued during turn 0999 are executed. This allowed messages to be received, acknowledged and made ready for processing while the game continued to animate and run the simulation.

Figure 1. Scheduling commands for execution two communications turns in the future.

Turns were typically 200 ms long, with commands being sent out during the turn. After 200 ms the turn was cut off and the next one began. At any moment in the game, commands were being processed for one turn, received and stored for the next turn, and sent out for execution two turns in the future.
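
The two-turn tagging scheme can be illustrated with a minimal sketch. The Command structure and the TurnScheduler class are hypothetical names invented for this example; the article does not describe the real command format or data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical command record; the real AoE command format is not given.
struct Command {
    int playerId;
    std::string action;
};

class TurnScheduler {
public:
    static constexpr std::uint32_t kTurnDelay = 2;  // execute two turns ahead

    // Tag a command issued during currentTurn for execution two turns later.
    std::uint32_t issue(std::uint32_t currentTurn, const Command& cmd) {
        const std::uint32_t executeTurn = currentTurn + kTurnDelay;
        pending_[executeTurn].push_back(cmd);
        return executeTurn;
    }

    // Remove and return every command scheduled for the given turn.
    std::vector<Command> take(std::uint32_t turn) {
        std::vector<Command> due;
        auto it = pending_.find(turn);
        if (it != pending_.end()) {
            due = std::move(it->second);
            pending_.erase(it);
        }
        return due;
    }

private:
    std::map<std::uint32_t, std::vector<Command>> pending_;
};
```

A command issued in turn 1000 comes back from take(1002), which leaves one full turn in between for the message to be sent, acknowledged and stored on every machine.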

"Speed Control"

Figure 2. Speed control.

Because the simulations must always receive exactly the same input, the game can only run as fast as the slowest machine can process the communications, render the turn, and send out new commands. We called the system that adjusts the turn length to keep animation and gameplay smooth under varying communication lag and processing speed “Speed Control”.

Gameplay can feel “laggy” for two reasons. If one machine's frame rate drops (or is lower than the others'), the other machines process their commands, render everything in the allotted time, and then have to wait for the next turn; any pause of this kind is immediately noticeable. Communication lag also slows the game down, because players have to wait until their machine has received enough data to complete the turn.

Each client calculated a frame rate that it believed it could sustain consistently by averaging the processing time over a number of frames. Because this value changed during the game with the visible field of view, the number of units, the map size and other factors, it was sent with every “turn done” message.

Each client also periodically measured the round-trip “ping time” from itself to the other clients. It sent the longest average ping time it saw to any client along with the “turn done” message (speed control used 2 bytes in total).

Each turn, the machine designated as host analyzed the “turn done” messages and worked out a target frame rate and an adjustment for Internet latency. The host then sent out the new frame rate and communications turn length to be used. Figures 3-5 show how a communications turn was broken up under different conditions.

Figure 3. A normal communications turn.

Figure 4. High Internet latency with normal machine performance.

Figure 5. Slow machine performance with normal latency.

The “communications turn”, which was roughly the round-trip ping time for a message, was divided into the number of simulation frames that the slowest machine could, on average, complete in that time.

The communications turn length was weighted so that it rose quickly in response to changes in Internet latency and settled back down slowly to the best average speed that could be sustained. The game tended to pause or slow down only at the very worst spikes: command latency would go up, but it stayed smooth (adjusting by only a few milliseconds per turn) as the game worked its way back down to the best possible speed. This gave the smoothest possible gameplay while still adapting to changing conditions.
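
A rough sketch of what the host's per-turn calculation might look like is given below. The field names, the planNextTurn function and the 5 ms settle step are assumptions made for illustration; the article states only that the turn length rose quickly, fell by a few milliseconds per turn, and was divided into frames the slowest machine could complete.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// What each client reports with its "turn done" message (names are assumptions).
struct TurnDoneReport {
    int sustainableFps;  // frame rate the client believes it can hold
    int worstPingMs;     // longest average round trip it sees to any peer
};

struct TurnPlan {
    int turnLengthMs;
    int framesPerTurn;
};

// Hypothetical tuning value: how fast the turn length settles back down.
constexpr int kSettleStepMs = 5;

TurnPlan planNextTurn(const std::vector<TurnDoneReport>& reports, int previousTurnMs) {
    assert(!reports.empty());
    int slowestFps = reports.front().sustainableFps;
    int worstPing = reports.front().worstPingMs;
    for (const TurnDoneReport& r : reports) {
        slowestFps = std::min(slowestFps, r.sustainableFps);
        worstPing = std::max(worstPing, r.worstPingMs);
    }
    assert(slowestFps > 0);

    int turnMs;
    if (worstPing >= previousTurnMs) {
        turnMs = worstPing;  // rise immediately to cover a latency spike
    } else {
        // settle back slowly, never below the latency currently observed
        turnMs = std::max(worstPing, previousTurnMs - kSettleStepMs);
    }

    // Number of simulation frames the slowest machine can run in one turn.
    const int frames = std::max(1, turnMs * slowestFps / 1000);
    return {turnMs, frames};
}
```

The asymmetry is the point of the design: a single latency spike lengthens the turn at once so nobody stalls, while recovery is spread over many turns so players never see an abrupt change of pace.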

Guaranteed delivery

UDP was used at the network layer, and each client was responsible for command ordering, drop detection, and resending. Each message carried a couple of bytes identifying the turn for which its commands were scheduled and the sequence number of the message. A message received for a turn that had already passed was discarded, and other incoming messages were stored for execution. Given the nature of UDP, Mark's rule for receiving messages was “when in doubt, assume the message was lost”. If messages arrived out of order, the receiver immediately sent a resend request for the missing messages. If an acknowledgement arrived later than predicted, the sender simply sent the message again without waiting to be told that it had been lost.
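
The receive-side rule can be sketched as follows. The header layout, the Receiver class and the use of 32-bit counters are assumptions for readability; the article says only that a couple of bytes carried the turn and the sequence number, so the real format was more compact and had to handle wraparound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed header layout; the real one used only "a couple of bytes".
struct MessageHeader {
    std::uint32_t executeTurn;  // turn in which the commands must run
    std::uint32_t sequence;     // per-sender message sequence number
};

enum class ReceiveResult { Stored, StoredWithGap, DiscardedStale };

class Receiver {
public:
    explicit Receiver(std::uint32_t currentTurn) : currentTurn_(currentTurn) {}

    // Classifies an incoming message. Every sequence number that was skipped
    // is appended to resendRequests ("when in doubt, assume it was lost").
    ReceiveResult onMessage(const MessageHeader& h,
                            std::vector<std::uint32_t>& resendRequests) {
        bool gap = false;
        for (std::uint32_t seq = nextExpected_; seq < h.sequence; ++seq) {
            resendRequests.push_back(seq);
            gap = true;
        }
        if (h.sequence >= nextExpected_) nextExpected_ = h.sequence + 1;

        // Commands for a turn that has already run are useless: drop them.
        if (h.executeTurn < currentTurn_) return ReceiveResult::DiscardedStale;
        return gap ? ReceiveResult::StoredWithGap : ReceiveResult::Stored;
    }

    void advanceTurn() { ++currentTurn_; }

private:
    std::uint32_t currentTurn_;
    std::uint32_t nextExpected_ = 0;
};
```

This sketch leaves out duplicate suppression and the sender-side timer that resends when an acknowledgement is late; both would be needed in a complete implementation.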

Hidden benefits

Because the game's outcome depended on every user running exactly the same simulation, it was extremely difficult to hack a client (or the client's data stream) and cheat. Any simulation that ran differently was flagged as “out of sync” and the game stopped. It was still possible to cheat locally to reveal information, but those leaks were relatively easy to close in subsequent patches and revisions. Security was a big win for us.

Hidden problems

At first it may seem that getting two copies of the same code to run identically is easy, but it is not. Microsoft product manager Tim Znamenacek told Mark very early in the project: “Every project has one stubborn bug that goes all the way to the wire. I think in our case it will be out-of-sync.” He was right. The difficulty of tracking down out-of-sync errors was multiplied by every subtle change. A deer whose position is slightly different when the random map is created will move slightly differently, and minutes later a hunter will stray slightly off his path or miss with his spear and go home without meat. So what sometimes showed up as just a difference in the checksum of the food total had causes that were very hard to trace.

Although we checksummed the world, the objects, pathfinding, targeting and every other system, there was always one more thing that slipped past. Huge (50 MB) volumes of trace messages and world object dumps made the problem even harder to wade through. Part of the difficulty was conceptual: programmers were not used to writing code that made the same number of calls to the random number generator inside the simulation on every machine (the random numbers were also seeded and synchronized).
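
Why a single extra random call matters can be shown with a toy example. Everything here (the SimRandom generator, the World structure, the FNV-1a checksum) is a simplified stand-in chosen for illustration, not the code or the checksum algorithm used in Age of Empires.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Small deterministic generator (a classic linear congruential generator).
// Using our own generator keeps results identical on every machine.
struct SimRandom {
    std::uint32_t state;
    std::uint32_t next() {
        state = state * 1664525u + 1013904223u;
        return state >> 16;
    }
};

struct Unit {
    std::int32_t x, y, hitPoints;
};

struct World {
    SimRandom rng;
    std::vector<Unit> units;

    // One simulation step: every unit wanders by a pseudo-random offset.
    void step() {
        for (Unit& u : units) {
            u.x += static_cast<std::int32_t>(rng.next() % 3) - 1;
            u.y += static_cast<std::int32_t>(rng.next() % 3) - 1;
        }
    }

    // FNV-1a hash over everything that must match on all machines.
    std::uint32_t checksum() const {
        std::uint32_t h = 2166136261u;
        auto mix = [&h](std::uint32_t v) {
            for (int i = 0; i < 4; ++i) {
                h ^= (v >> (8 * i)) & 0xffu;
                h *= 16777619u;
            }
        };
        mix(rng.state);
        for (const Unit& u : units) {
            mix(static_cast<std::uint32_t>(u.x));
            mix(static_cast<std::uint32_t>(u.y));
            mix(static_cast<std::uint32_t>(u.hitPoints));
        }
        return h;
    }
};
```

Two worlds started from the same seed and stepped the same number of times produce the same checksum. If one of them makes a single extra rng.next() call outside the agreed sequence, its generator state never lines up with the other again, and comparing checksums each turn is how such a divergence would be noticed.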

Lessons learned

When developing the network part of Age of Empires, we learned a few lessons that can be applied to the development of any gaming multi-user system.

Know your user. Studying the user is the key to understanding his expectations of multiplayer performance, perceived lag and command latency. Every genre is different, and you need to understand what suits your particular gameplay and control style.

Early in development, Mark and the lead designer built a prototype for communication latency (it was revisited several times during development). Because it ran as a single-player game, it was very easy to simulate different command latencies and collect player feedback (“the controls feel fine / sluggish / jerky / just awful”).

For RTS games, a command latency of up to 250 ms was not even noticed; between 250 and 500 ms the game was quite playable, and lag became noticeable at 500 ms and above. Interestingly, players developed a “game pace” and a mental expectation of the delay between clicking and the unit's response. A consistent slower response was better than latency that jumped around (say, between 80 and 500 ms): a steady 500 ms was considered playable, while variable latency felt “jerky” and made the game harder to play.

In practice this directed much of the programming effort toward smoothness: it is better to pick a longer turn length and be certain that everything stays smooth and consistent than to run as fast as possible and hit regular slowdowns. Any change in speed should be gradual, in the smallest increments possible.

We also measured the demands users placed on the system. They typically issued commands (move, attack, chop trees) about once every one and a half to two seconds, with occasional spikes of 3-4 commands per second during heated battles. Because activity in our game builds up steadily, the heaviest communication demands came in the middle and toward the end of a game.

If you take the time to study how users behave, you will notice other things about the way they play that can help network performance. In AoE, users clicked repeatedly when attacking (click-click-click-click: go, go, go!), which caused huge spikes in the number of commands issued. They also sent out large groups of units that all needed paths computed, which produced more huge spikes in network demand. A simple filter that discarded repeated commands at the same location drastically reduced the impact of this behavior.
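
One way such a repeat-command filter could work is sketched below. The MoveCommand fields and the exact-match rule are assumptions; the article does not say how "the same location" was defined or which commands were filtered.

```cpp
#include <cassert>

// Hypothetical move order: which group is sent to which map tile.
struct MoveCommand {
    int groupId;
    int targetX;
    int targetY;
};

class RepeatCommandFilter {
public:
    // Returns true if the command should go out on the network, false if it
    // merely repeats the last accepted command (same group, same target).
    bool accept(const MoveCommand& cmd) {
        if (hasLast_ && cmd.groupId == last_.groupId &&
            cmd.targetX == last_.targetX && cmd.targetY == last_.targetY) {
            return false;
        }
        last_ = cmd;
        hasLast_ = true;
        return true;
    }

private:
    MoveCommand last_{};
    bool hasLast_ = false;
};
```

Four rapid clicks on the same spot then cost one network command instead of four, and only one round of pathfinding for the group.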

In summary, observing your users will let you:

Learn what users expect of the game in terms of latency
Prototype multiplayer aspects early in development
Spot behavior that hurts multiplayer performance

Metering is king. If you put metrics in early, you will discover surprising things about how your communications system behaves. Make the metrics readable by testers, and use them to understand what is going on under the hood of the network engine.

Lesson: some of the problems with AoE communications arose because Mark removed the metering too early and did not re-verify message levels (length and frequency) after the final code was in. Undetected things such as occasional AI race conditions, hard-to-compute paths, and poorly structured command packets could cause huge performance problems in an otherwise well-tuned system.

Have the system notify testers and developers when something appears to exceed its boundary conditions, so that programmers and testers can see during development which tasks are stressing the system. This lets problems be dealt with as soon as they appear.

Take the time to explain to your testers how the communications system works, and show and explain the metrics to them. You may be surprised at what they notice when the network code inevitably runs into strange failures.

In summary, your metrics should:

Be human-readable and understandable by testers
Point out bottlenecks, slowdowns and problems
Have little impact on performance and be left running all the time
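
A minimal sketch of a meter with these properties follows. The TurnMeter class, its limits and its report format are invented for this example and are not taken from the AoE code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Per-turn traffic meter: cheap counters plus a readable one-line report.
class TurnMeter {
public:
    TurnMeter(std::size_t maxBytesPerTurn, std::size_t maxMessagesPerTurn)
        : maxBytes_(maxBytesPerTurn), maxMessages_(maxMessagesPerTurn) {
        assert(maxBytesPerTurn > 0 && maxMessagesPerTurn > 0);
    }

    void recordMessage(std::size_t bytes) {
        bytes_ += bytes;
        ++messages_;
    }

    // True when this turn exceeded one of the configured boundary conditions.
    bool overLimit() const { return bytes_ > maxBytes_ || messages_ > maxMessages_; }

    // Line a tester can read without knowing the protocol.
    std::string report(unsigned turn) const {
        char line[128];
        std::snprintf(line, sizeof line, "turn %u: %zu msgs, %zu bytes%s",
                      turn, messages_, bytes_, overLimit() ? " [OVER LIMIT]" : "");
        return line;
    }

    // Call at the start of every turn.
    void reset() {
        bytes_ = 0;
        messages_ = 0;
    }

private:
    std::size_t maxBytes_, maxMessages_;
    std::size_t bytes_ = 0, messages_ = 0;
};
```

Two additions and one comparison per message are cheap enough to leave enabled in every build, and the "[OVER LIMIT]" marker gives testers something concrete to report.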

Educating the developers. It is hard to get programmers who are used to writing single-user applications to think about the separation between a command being issued, received, and processed. It is easy to forget that you may be requesting something that never happens, or that happens seconds after the command was issued. Commands must be checked for validity both when they are sent and when they are received.

With the synchronous model, programmers also had to be aware that code inside the simulation must not depend on any local factor (such as having spare time, special hardware, or different settings). The code must execute identically on every machine. For example, playing random terrain sounds from inside the simulation can make games behave differently.

Other lessons. This should be common sense, but if you depend on a third-party network layer (DirectPlay in our case), write an independent test application to verify that when its authors say “guaranteed delivery” the messages really do arrive, that “guaranteed packet order” really is guaranteed, and that the product has no hidden bottlenecks or strange behavior when handling your game's communications.

Be prepared to build simulation applications and stress-test simulators. We ended up with three different minimal test applications, each used to isolate and highlight a specific problem: connection flooding, simultaneous connection problems during matchmaking, and dropped guaranteed packets.

Test with modems (and, if you are lucky, with modem simulators) as early as possible, and keep testing with modems (painful as it is) throughout development. Problems are hard to isolate (is a sudden drop in performance caused by the ISP, the game, the communications software, the modem, the matchmaking service, or something else?), and users do not want to fiddle with slow dial-up connections once they are used to instant LAN speeds. It is vital to test over modem connections with the same rigor as multiplayer games over a LAN.

Improvements for Age of Empires 2

In Age of Empires 2: The Age of Kings we added new multiplayer features such as recorded games, file transfer, and persistent statistics tracking on The Zone website. We also improved multiplayer systems such as the DirectPlay integration and speed control to deal with bugs and performance issues identified after Age of Empires shipped.

Recorded games were one of those things that started out as a debugging aid and ended up as a full-fledged game feature. Recorded games are extremely popular on fan sites: they let players share and analyze strategies, watch famous battles, and review games they took part in. They also proved to be an invaluable debugging tool. Because our simulation is deterministic and recorded games are synchronous in the same sense as multiplayer, a recorded game gave us a great way to reproduce bugs, since it was guaranteed to play back exactly the same way every time.
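
The idea that a recorded game is nothing more than the starting seed plus the command log can be shown with a toy deterministic simulation. The RecordedCommand structure and the update rule below are made up for illustration and have nothing to do with the real game state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One logged command: the turn it executed in and its (toy) effect.
struct RecordedCommand {
    std::uint32_t turn;
    std::int32_t delta;
};

// Toy deterministic simulation: the whole "world" is a single number.
// The log must be sorted by turn, as it would be when recorded live.
std::int64_t runSimulation(std::uint32_t seed, std::uint32_t turns,
                           const std::vector<RecordedCommand>& log) {
    std::int64_t state = seed;
    std::size_t nextCmd = 0;
    for (std::uint32_t turn = 0; turn < turns; ++turn) {
        // Apply every command that was recorded for this turn.
        while (nextCmd < log.size() && log[nextCmd].turn == turn) {
            state += log[nextCmd].delta;
            ++nextCmd;
        }
        // Fixed update rule: no clocks, no local input, no unsynced randomness.
        state = (state * 31 + turn) % 1000003;
    }
    return state;
}
```

Replaying the same seed and log always reaches the same final state, so a bug seen once can be reproduced as often as needed; changing or dropping a single logged command produces a different game.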

In Age of Empires, our integration with The Zone's matchmaking system was limited to simply launching the game. In Age of Kings we extended it to allow control over launch parameters and to provide persistent statistics reporting. Players could find the games they were interested in more easily, because they could see the game parameters at the matchmaking level rather than waiting until they reached the game setup screen. On the back end, we implemented persistent statistics reporting and tracking. We gave The Zone a common structure that was filled in and uploaded to the server at the end of a game; the data in this structure was used to build user rankings shown on The Zone website.

Multiplayer RTS3: tasks

RTS3 is the codename for Ensemble's next-generation strategy game (translator's note: the game was released as Age of Mythology). The RTS3 design builds on the successful formula of the Age of Empires series, adding many new features and multiplayer requirements.

Builds on the feature set of Age of Empires 1 and 2: design requirements such as Internet play, large and varied maps, and thousands of controllable units are a given.
3D: RTS3 is a fully three-dimensional game with interpolated animation and non-faceted unit positions and rotations.
More players: support for more than eight players.
TCP/IP support: our primary target is a 56 kbit/s TCP/IP Internet connection.
Home network support: support for end-user network configurations, including firewalls and NAT.

Early in the development of RTS3 we decided to stay with the same underlying network model as Age of Empires 1 and 2, synchronous simulation, because the RTS3 design can exploit the strengths of this architecture in many ways. In AoE/AoK we relied on DirectPlay for transport and session management services, but for RTS3 we decided to build our own core network library, using only the most basic socket routines as its foundation.

The move to a fully 3D world meant that we had to pay more attention to frame-rate problems and overall simulation smoothness in multiplayer. It also meant that simulation update times and frame rates would be even more prone to variation, and that more time would be spent rendering. In the Genie engine, unit rotations were faceted and animations were locked to the frame rate; BANG! allows arbitrary unit rotation and smooth animation, so the game is visually far more sensitive to latency and to stuttering update rates.

Coming out of the development of Age of Kings, we wanted to address the critical areas where up-front design and tool work would give the biggest reduction in debugging time. We had also realized how important iterative playtesting is to the design of our games, so bringing the multiplayer game online as early as possible was a high priority.

RTS3 communication architecture

Figure 6. RTS3's strongly object-oriented network architecture.

Object-oriented approach. The RTS3 network architecture is strongly object-oriented (see Figure 6). The requirement to support multiple network configurations plays to the strengths of an OO design, which abstracts the specifics of platform, protocol, and topology behind a set of common objects and systems.

Protocol-specific and topology-specific versions of the network objects contain as little code as possible. The bulk of their functionality lives in the higher-level parent objects. To implement a new protocol, we extended only those network objects that needed protocol-specific code (such as the client and the session, which have to behave slightly differently depending on the protocol). No other objects in the system (such as Channels, TimeSync, and so on) needed changes, because they talk to the client and the session only through their high-level abstract interfaces.

Peer-to-peer topology. The Genie engine supported a peer-to-peer network topology, in which all clients in a session are connected to each other in a star configuration. We kept this topology in RTS3 because it has inherent advantages when combined with the synchronous simulation model.

A peer-to-peer topology implies a star configuration of the clients connected to a session (Figure 7): every client is connected to every other client. The same scheme was used in Age of Empires 1 and 2.

Figure 7. Star configuration of peer-to-peer clients in a session.

Peer-to-peer benefits:

Reduced latency due to the “client-client” message transfer scheme instead of “client-server-client”.
There is no central weak link - if the client (even the host) disconnects from the session, the game can continue.

Disadvantages of Peer-to-peer:

More active connections in the system (the sum of n for n = 0 to k-1, that is, k(k-1)/2 connections for k clients), which means more potential failure points and a higher likelihood of latency.
Some NAT configurations cannot be supported in this scheme.

Net.lib. In designing the RTS3 communications architecture, our goal was a system tailored to strategy games, but one that could also be used for our in-house tools and extended to support future games. To achieve this, we created a layered architecture that supports game-level objects such as the client and the session, as well as low-level transport objects such as links and network addresses.

Figure 8. The four service layers of our network model.

RTS3 is built on our next-generation BANG! engine, which uses a modular architecture with component libraries such as sound, rendering, and networking. The network subsystem fits in as one of these components and links with the BANG! engine (as well as with various in-house tools). Our network model is divided into four service layers that resemble, but do not exactly match, the OSI network model (see Figure 8).

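Socks, Level 1

The first level, Socks, provides the fundamental socket-level C API, abstracted to offer a common set of low-level network routines across a variety of operating systems. Its interface resembles Berkeley sockets. The Socks level is used mainly by the higher levels of the network library and is not really intended to be called from application code.

Link, Level 2

Level 2, the Link level, provides transport-layer services. The objects at this level, such as Link, Listener, NetworkAddress and Packet, are the elements needed to establish a connection and send messages over it (see Figure 9).

Packet: the basic message structure, an extensible object that automatically handles its own serialization and deserialization (through pack and unpack methods) when sent across a Link object.
Link: a connection between two network endpoints. It can also be a loopback link, in which case both endpoints are on the same machine. The send and receive methods of a link work with Packets as well as with void* data buffers.
Listener: a link generator. This object listens for incoming connections and creates a link when a connection is established.
Data stream: an arbitrary stream of data across a given link, used for file transfer, for example.
Net Address: a protocol-independent network address object.
Ping: a simple ping class that reports the network latency on a given link.

Figure 9. The Link level.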

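Multiplayer, Level 3

The multiplayer level is the highest level of objects and routines available in the net.lib API. This is the level RTS3 interacts with; it gathers lower-level objects such as links into more useful concepts and objects such as clients and sessions.

The most interesting objects in the BANG! network library are probably those at the multiplayer level. Here the API presents a set of objects that the game level interacts with, while the implementation remains independent of any particular game.

Client: the most basic abstraction of a network endpoint. It can be configured as a remote client (link) or a local client (loopback link). Clients are not created directly; they are spawned by a session object.
Session: the object responsible for creating clients, negotiating connections, and collecting and managing clients. The session contains all other multiplayer-level objects. To use it, the application simply calls host() or join(), passing a local or a remote address respectively, and the session takes care of the rest. Its duties include automatically creating and deleting clients, notifying about session events, and dispatching traffic to the appropriate objects.
Channel and Ordered Channel: a virtual message conduit. Messages sent over a channel are automatically separated out and received on the corresponding channel object on remote clients. An ordered channel works with the TimeSync object to guarantee that messages received on it arrive in the same order on all clients.
Shared Data: a collection of data shared among all clients. You extend this object to create specific instances holding your own data types, and then use its built-in methods to have those data elements updated automatically and synchronously across the network.
Time Sync: manages the smooth progression of synchronized network time across all clients in a session.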

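Game Communications, Level 4

The communications level is the RTS3 side of things. It is the main collection of systems through which the game interfaces with the network library, and it actually lives in the game code itself. The communications layer provides many utility functions for creating and managing multiplayer-level network objects, and tries to boil the game's multiplayer needs down to a small, easy-to-use interface.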

Improved sync system. Nobody on the Age of Empires development team would deny that we needed the best synchronization tools possible. As on any project, a post-mortem look at the development process shows that some areas took far more than their share of time, and that this time could have been greatly reduced had we done the groundwork in advance. When we began developing RTS3, sync debugging was at the top of that list.

The RTS3 sync tracking system is geared primarily toward rapid turnaround on sync bugs. Its other priorities were ease of use for developers, the ability to handle arbitrarily large amounts of sync data pumped through the system, the ability to compile the sync code out of a release build completely, and finally the ability to change the test configuration entirely by toggling variables rather than recompiling.

Sync checking in RTS3 is done with two sets of macros:

#define syncRandCode(userinfo) \
    gSync->addCodeSync(cRandSync, userinfo, __FILE__, __LINE__)

#define syncRandData(userinfo, v) \
    gSync->addDataSync(cRandSync, v, userinfo, __FILE__, __LINE__)

Both macros take a userinfo string parameter, which is a name or description of the particular item being synced. For example, a sync call might look like this:

syncRandData("syncing the random seed", seed);
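
How logged sync entries might be used to locate a bug quickly is sketched below. SyncLog, SYNC_DATA and firstDivergence are hypothetical stand-ins written for this example; they are not the gSync object or the RTS3 macros, whose internals the article does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One logged value together with the place in the code that logged it.
struct SyncEntry {
    std::string userinfo;
    long value;
    const char* file;
    int line;
};

// Hypothetical stand-in for a sync recorder.
class SyncLog {
public:
    void addDataSync(const std::string& userinfo, long value,
                     const char* file, int line) {
        entries_.push_back(SyncEntry{userinfo, value, file, line});
    }
    const std::vector<SyncEntry>& entries() const { return entries_; }

private:
    std::vector<SyncEntry> entries_;
};

// Illustrative macro: captures the call site automatically.
#define SYNC_DATA(log, userinfo, v) \
    (log).addDataSync((userinfo), (v), __FILE__, __LINE__)

// Index of the first entry where two machines' logs disagree, or -1 if the
// logs are identical. A length mismatch counts as a divergence.
long firstDivergence(const SyncLog& a, const SyncLog& b) {
    const std::vector<SyncEntry>& ea = a.entries();
    const std::vector<SyncEntry>& eb = b.entries();
    const std::size_t common = ea.size() < eb.size() ? ea.size() : eb.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (ea[i].userinfo != eb[i].userinfo || ea[i].value != eb[i].value) {
            return static_cast<long>(i);
        }
    }
    return ea.size() == eb.size() ? -1 : static_cast<long>(common);
}
```

Because every entry carries its file and line, the first mismatching entry points directly at the code that produced different values on two machines, instead of leaving a developer to search a multi-megabyte dump by hand.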

Synchronous console commands and config variables. As any Quake mod developer can confirm, console commands and config variables are vital to the development process. Console commands are simple function calls, made from a startup configuration file, the in-game console, or the UI, that invoke arbitrary game functionality. Config variables are named data types, exposed through simple get, set, define and toggle functions, that we use for all kinds of testing and configuration parameters.

Paul created multiplayer-enabled versions of our console command and config variable systems. With them, we can easily turn an ordinary config variable (for example, enableCheating) into a multiplayer config variable by adding a flag to the variable's definition. If the flag is set, the config variable is passed around in a multiplayer game, and synchronous in-game decisions (such as whether free resource tributes are allowed) can be based on its value. Multiplayer console commands work on a similar principle: calls to a multiplayer console command are transmitted over the network and executed synchronously on all client machines.

With these two tools, developers have a simple way to use the multiplayer system without writing any code. They can quickly add new testing tools and configurations and easily enable them in a networked environment.

Summing up

The synchronous simulation and peer-to-peer model were used successfully in the Age of Empires series. Although it is critically important to invest time in the tools and technology that address the main challenges of this approach (such as synchronization and network metrics), experience has proven the viability of this architecture for real-time strategy games. The subsequent improvements we made in RTS3 have produced a multiplayer experience that is almost indistinguishable from single-player, even under the worst network conditions.

Source: https://habr.com/ru/post/417703/
