Hi, my name is Bill "LtRandolph" Clark, and I'm the technical lead of the LoL Champions Team. Over the past few years I've worked in several different parts of League development, but the one thing I've been constantly obsessed with is technical debt: finding it, understanding it, and, where possible, eliminating it.
Whenever developers discuss an existing technology, such as League of Legends patch 8.4, technical debt tends to come up. I define technical debt as code or data that future developers will have to pay for. Countless posts, articles, and definitions are devoted to this unfortunate side of software development. In this post I want to discuss the types of technical debt I've encountered while working at Riot and describe the model we've started using to evaluate it. If I had to single out the most important lesson of this article, it would be the "infection" metric described below.
Metrics
To make good decisions about which problems to fix now and which can be put off until later (or, realistically, forgotten entirely), we need some way to measure each individual piece of technical debt. I evaluate debt along three main axes: impact, cost of elimination, and infection.
Impact
The first axis is the most obvious: the debt's impact. It shows up as problems for players (bugs, missing features, unexpected behavior) and for developers (slower implementation, workflow problems, arbitrary trivia that has to be memorized). Note that "developer" here means anyone who creates the game, working on any aspect of it. Some technical debt slows down programmers writing new code, some keeps designers from creating new scripts, some prevents artists from building new particle effects, and so on.
Costs of elimination
The second axis is the cost of getting rid of the technical debt. If we decide to fix a problem in the code or data, it will take some measurable amount of time. If the debt is a deep-rooted assumption that affects every line of code in the game, the fix can take weeks or months of developer time. If it's a silly mistake in a single function, it can be fixed in minutes. Regardless of how long the fix takes to implement, we also need to account for the risk it carries. Even a system I consider "bad" may be serving as a tool for building a great game. If I change the way the scripting engine handles errors, or the way particles calculate their creation time, I could break the behavior of more than 500 spells across more than 140 champions.
Infection
The third axis measures the thing I'm obsessed with: infection. If we allow a piece of technical debt to keep existing, how far will it spread? It can spread because other systems interact with the system that carries the debt, because data built on top of the system gets copied and pasted, or because the debt shapes how other developers implement new features.
If a piece of technical debt is well contained, the cost of eliminating it in the future is about the same as it is today, so when deciding whether to fix it we only need to weigh its current impact. If, on the other hand, a piece of debt is highly contagious, it will get steadily harder to fix. What makes contagious technical debt especially nasty is that its impact also tends to grow as more and more systems become infected with the compromises underlying it.
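To make the three axes concrete, here is a toy model. It assumes each axis is scored out of 5, as in the examples below, and treats infection as a per-release growth rate for the cost of fixing. The DebtItem struct, the 10%-per-point rate, and counting time in releases are illustrative assumptions, not a formula Riot uses.

```cpp
#include <cmath>

// Hypothetical scores on the three axes, each out of 5. Infection may
// be negative when the fix spreads faster than the debt does.
struct DebtItem {
    int impact;     // cost paid today by players and developers
    int fixCost;    // time and risk needed to eliminate the debt now
    int infection;  // how fast the debt spreads if left alone
};

// Toy projection: each point of infection makes the fix 10% more
// expensive per release. Infection 0 means the future fix costs the
// same as today's; negative infection makes it cheaper over time.
double ProjectedFixCost(const DebtItem& debt, int releases) {
    const double growthPerRelease = 1.0 + 0.10 * debt.infection;
    return debt.fixCost * std::pow(growthPerRelease, releases);
}
```

Under this toy model, a well-contained item keeps a flat fix cost no matter how long it waits, while an item with infection 4 costs 1.4 times as much after one release and roughly 5.4 times as much after five.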
Types of debt
Now that we have a way to measure individual pieces of technical debt, let's look at some of the general categories of technical debt I've noticed in League of Legends.
Local debt
Local debt resembles the classic "black box" programming model. As far as the rest of the game is concerned, the local system (spells, the network layer, the scripting engine) looks quite solid. Nobody needs to keep the debt in mind as long as their work doesn't touch that system. But anyone who opens the lid and looks inside will be horrified at what they find.
A real-world example of local debt can be found in our own eyes. Because of the way the eye is built, the image projected onto the retina is upside down. More importantly, the point where the optic nerve leaves the retina creates a blind spot in each eye. This distorted data is passed to the visual centers of the brain, which must invert the image and fill in the blind spots so that the rest of the brain can work with the "correct" image. These quirks are confined to the eye and optic nerve and are hidden from other systems, so they are "good enough."
One of the best-known examples of local debt in League of Legends is Jarvan's Cataclysm, which is still built out of minions. When designers need to attach gameplay effects to a point (or several points), one of the tools available to them is spawning an "invisible minion." RiotXypherous describes what I mean by "minion" here. These game objects are a stable, well-understood way to track positions and run script logic. In cases like Jarvan's wall, a large number of minions (24, to be exact) has to be created to make sure nobody can squeeze through. An alternative would be a ring-shaped piece of terrain backed by a single logical object that controls passage through the Cataclysm. That approach would clean up the logic and slightly reduce the computational cost. Let's run the Cataclysm through our impact / cost / infection model to see why fixing it right now isn't the best option.
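The wall needs so many minions because of simple geometry: neighboring minions sit one chord apart on the circle, and anything narrower than the leftover gap can slip between them. The sketch below assumes evenly spaced minions with circular collision; MinionWall, its fields, and the sample radii are hypothetical, not League's actual values.

```cpp
#include <cmath>

// Hypothetical wall model: minionCount invisible minions spaced evenly
// on a circle. All radii are illustrative game units.
struct MinionWall {
    double ringRadius;    // distance from the center to each minion
    double minionRadius;  // collision radius of each invisible minion
    int minionCount;
};

// Width of the open gap between two neighboring minions.
double GapWidth(const MinionWall& wall) {
    const double pi = std::acos(-1.0);
    // Chord length between adjacent centers: 2 * R * sin(pi / N).
    const double centerSpacing =
        2.0 * wall.ringRadius * std::sin(pi / wall.minionCount);
    return centerSpacing - 2.0 * wall.minionRadius;
}

// A unit slips through if its diameter fits in the gap.
bool UnitCanSqueezeThrough(const MinionWall& wall, double unitRadius) {
    return GapWidth(wall) > 2.0 * unitRadius;
}
```

With a 300-unit ring, 30-unit minions, and a 35-unit champion, 12 minions leave a gap of roughly 95 units, wide enough to slip through, while 24 minions leave about 18.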
Cataclysm metrics
1. Impact: 1/5
Back when the wall was made of 12 minions, players could sometimes squeeze through it, so Riot Exgeniar increased the count to 24. The fact that the wall is built from minions almost never affects other developers making new content. (A small digression: the infamous "Jarvan ult hitch" was caused by a combination of this debt and a loading bug triggered by an attempt to read missing auto-attack definitions.)
2. Costs of elimination: 2/5
We currently have no way to create arbitrary shapes without writing new code. If we wanted to implement the Jarvan barrier more efficiently as a ring-shaped "trigger region," we would have to write special math for computing collisions with the ring. We do use constructive solid geometry for other purposes, and that could drastically reduce the cost of the fix.
3. Infection: 1/5
Nobody has to account for how Jarvan's wall is implemented when developing new features, so the debt is well contained. The only infection risk is that other designers might copy this implementation into new champions (which does happen from time to time). But even if the implementation problems spread that way, the Cataclysm's potential for infection is low and well understood.
This is a fairly typical piece of local debt. Local debt usually scores low on infection. If its impact outweighs its cost of elimination, a conscientious developer usually fixes it before it becomes a problem.
When deciding whether to eliminate local debt, first ask yourself: is it worth it? If the debt isn't really contagious, it's safe to leave it alone for as long as necessary. One of the biggest mistakes I see is the instinctive, perfectionist urge to stamp out local debt when its impact doesn't justify the effort. If you do decide to fix it, the locality of the change usually makes both the fix and its regression testing straightforward.
Recent examples of local debt we've eliminated include bugs that, under certain conditions, made champions path to the coordinate (0, 0, 0); Janna's Monsoon ignoring spell shields; and Tear of the Goddess gaining stacks from spells with no mana cost.
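For comparison, the heart of a ring-shaped trigger region is one distance test: a circular unit touches the band if the range of distances it covers from the ring's center overlaps the band. This is a minimal 2D sketch assuming circular units; RingRegion and OverlapsRing are hypothetical names. The real cost described above lies in wiring a new shape into pathing and collision, which this sketch does not attempt.

```cpp
#include <cmath>

struct Vec2 {
    double x;
    double y;
};

// Hypothetical ring-shaped trigger region: one logical object standing
// in for the whole ring of invisible minions.
struct RingRegion {
    Vec2 center;
    double innerRadius;
    double outerRadius;
};

// A unit of radius r at distance d from the center covers distances
// [d - r, d + r]. It touches the ring when that interval intersects
// [innerRadius, outerRadius].
bool OverlapsRing(const RingRegion& ring, Vec2 pos, double unitRadius) {
    const double d = std::hypot(pos.x - ring.center.x, pos.y - ring.center.y);
    return d + unitRadius > ring.innerRadius &&
           d - unitRadius < ring.outerRadius;
}
```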
MacGyver Debt
MacGyver debt is named after the mid-'80s television series. Angus MacGyver solved problems with his Swiss Army knife, duct tape, and whatever he found at hand.
His solutions often combined two unexpected things; in the context of technical debt, this means two conflicting systems duct-taped together wherever they interact in the codebase.
Seattle (like many other cities) has a sad example of MacGyver debt. The city began as two competing settlements, each with its own street grid. When the settlements grew into the modern Emerald City, the slightly mismatched grids were joined together, producing awkwardly shaped blocks and buildings and very inefficient use of space. I'm particularly struck by the small clipped corner of the building at the lower left.
One of the best examples of MacGyver debt in the LoL codebase is the use of C++'s std::string alongside our own AString class. Both are ways to store, modify, and pass around character strings. In general, we found that std::string leads to a lot of "hidden" memory allocations and computational overhead, and that it's easy to write code that does bad things with it. AString was designed specifically for careful memory management. Our strategy for replacing std::string with AString was to let both exist in the codebase and provide conversions between the two types (using .c_str() and .Get(), respectively). We made many easy-to-adopt improvements to AString that made it more pleasant to work with, and encouraged developers to replace std::string gradually as they changed code. In this way we slowly displaced std::string, and the "duct tape" interface between the two systems steadily shrank.
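The boundary described above can be sketched as explicit conversion functions at every crossing. Riot's AString is not public, so the class here is a hypothetical stand-in that mimics only the two calls the post mentions (construction from a C string obtained via .c_str(), and .Get() to read it back); ToAString, ToStdString, LegacyNameLength, and NewCodeCallsLegacy are also illustrative.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an in-house string class.
class AString {
public:
    // Copies the text, including its terminating '\0'.
    explicit AString(const char* text)
        : buffer_(text, text + std::strlen(text) + 1) {}
    const char* Get() const { return buffer_.data(); }
    std::size_t Length() const { return buffer_.size() - 1; }

private:
    std::vector<char> buffer_;
};

// The duct-tape seam: one conversion in each direction.
// Note: going through C strings drops anything after an embedded '\0'.
AString ToAString(const std::string& s) { return AString(s.c_str()); }
std::string ToStdString(const AString& s) { return std::string(s.Get()); }

// Legacy code still takes std::string...
std::size_t LegacyNameLength(const std::string& name) { return name.size(); }

// ...so new AString-based code pays a conversion to call it.
std::size_t NewCodeCallsLegacy(const AString& name) {
    return LegacyNameLength(ToStdString(name));
}
```

In this sketch every crossing copies the string, so each call site migrated to AString also removes a conversion.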
std::string vs. AString metrics
1. Impact: 2/5
By now, most of the std::string allocations with a significant impact have been found through profiling and replaced, so the main remaining cost is the small mental effort of switching from one system to the other.
2. Costs of elimination: 3/5
Converting to AString was not a simple find-and-replace job. AString comes in several flavors for different purposes: besides the basic AString with dynamic memory allocation, there is AStackString, which starts out in stack memory, and ARefString, for references to static strings. To do the conversion properly, a real, thinking person has to look at each call site. Driving out the old system will be a long, slow process.
3. Infection: -2/5
By making AString easier to use than std::string, we actually turned infection in our favor. Every time a developer changes game code, there's a chance AString will spread a little further, like a virus.
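The stack flavor is one reason the conversion needs a human: someone has to decide, per call site, whether a string is short-lived and how big its inline buffer should be. This is a minimal sketch of a stack-first string, assuming "starts out in stack memory" means a fixed inline buffer with a heap fallback; StackString is hypothetical and is not Riot's AStackString.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stack-first string: text shorter than N characters lives
// in an inline buffer (on the stack when the object is a local);
// longer text spills to the heap.
template <std::size_t N>
class StackString {
public:
    explicit StackString(const char* text) {
        const std::size_t len = std::strlen(text);
        onHeap_ = len >= N;  // N must also hold the terminating '\0'
        if (onHeap_) {
            heap_.assign(text, text + len + 1);  // spill long text to the heap
        } else {
            std::memcpy(inline_, text, len + 1);  // short text stays inline
        }
    }
    const char* Get() const { return onHeap_ ? heap_.data() : inline_; }
    bool OnHeap() const { return onHeap_; }

private:
    char inline_[N] = {};
    std::vector<char> heap_;  // fallback storage for long strings
    bool onHeap_ = false;
};
```

A StackString<16> holding "Ezreal" never touches the heap, while a StackString<4> holding "Cataclysm" spills over; picking N well depends on knowing the data at that call site.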
Usually the biggest cost of MacGyver debt is the mental effort needed to switch modes of thinking when crossing the boundary. If a bug fix or feature is held up because it lives in the "wrong" system, moving it to the "right" system is usually the logical step. The key metric to watch here is the relative infectiousness of the new and old systems. If you can tip the balance in favor of the new system, the better system will inevitably win.
When deciding whether to eliminate MacGyver debt, look for ways to make the better (global) system attractive at the local level. If a time-pressed developer making greedy, locally optimal choices in their daily work still ends up moving toward the desired end state, you're on the right track.
Another approach that can work is large-scale brute-force refactoring. When the systems are closely related, part or all of the MacGyver debt can sometimes be eliminated with clever regexes.
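A brute-force pass might look like the following: a regex that rewrites only one narrow, easy-to-verify declaration pattern and leaves everything else for review. MigrateDeclarations is hypothetical, and its output assumes AString can be constructed from whatever expression the old declaration used.

```cpp
#include <regex>
#include <string>

// Rewrites simple local declarations such as
//   std::string name = GetName();
// into
//   AString name(GetName());
// References, parameters, and members do not match the pattern and are
// left unchanged for a human to handle.
std::string MigrateDeclarations(const std::string& source) {
    static const std::regex decl(R"(std::string\s+(\w+)\s*=\s*([^;]+);)");
    return std::regex_replace(source, decl, "AString $1($2);");
}
```

Keeping the pattern narrow is deliberate: a regex that tried to handle every use of std::string would need the same case-by-case judgment described in the AString metrics.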
Fundamental debt
Fundamental debt is when a certain assumption lies very deep in the heart of a system and is inextricably linked with all its work. It is sometimes difficult for experienced users of the system to recognize fundamental debt, because it seems to be something “natural”.
A ridiculously silly real-world example of fundamental debt is the system of measurement known as US customary units. I grew up in the US, and my brain is full of useless conversions: I remember that a mile is 5,280 feet, a quart is 2 pints, and a gallon is 4 quarts. The US government has tried many times to switch to the metric system, yet we remain one of seven countries that have not adopted the International System of Units as their official system of measurement. This debt is embedded in road signs, recipes, elementary schools, and people's brains.
We've written about some of the biggest pieces of fundamental debt Riot has been fighting in previous posts on our tech blog, for example in Determinism in League of Legends and the Game Data Server.
Another example of fundamental debt that I think about a lot is our use of the Lua scripting language.
League designers use a tool called BlockBuilder to create complex behaviors by assembling functional blocks, such as computing the distance between points, spawning minions, dealing damage, or controlling script flow. The set of operations designers can choose from is large but finite, and each operation takes only a few parameters. However, many years ago, in the prehistoric era of League of Legends, someone decided not to store the blocks and their parameters in a simple, limited format that matched the data. Instead, they were stored as arrays and tables in Lua, a powerful, elegant language that is extremely ill-suited to this purpose. A decade has passed since that decision, and today one of the most frequent operations in the engine is manipulating Lua objects.
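For contrast, a format that matches the data might be as plain as a fixed set of operations with a few parameters each. Everything below (BlockOp, BlockParam, Block, Script, CountOps) is a hypothetical sketch of the "simple, limited format" the post describes, not BlockBuilder's real schema.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// A closed set of operations, mirroring the kinds of blocks listed above.
enum class BlockOp { GetDistance, SpawnMinion, ApplyDamage, If };

struct BlockParam {
    std::string name;      // parameter name, e.g. "Damage"
    std::string variable;  // script variable to read; empty if none
    double constant;       // literal value added to the variable
};

struct Block {
    BlockOp op;
    std::vector<BlockParam> params;
};

// A script is just a flat list of blocks: easy to diff, search, and load.
using Script = std::vector<Block>;

// Answering "what does this script do?" becomes a plain loop.
std::size_t CountOps(const Script& script, BlockOp op) {
    return static_cast<std::size_t>(std::count_if(
        script.begin(), script.end(),
        [op](const Block& block) { return block.op == op; }));
}
```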
BlockBuilder Lua Metrics
1. Impact: 4/5
The mismatch between Lua and this problem space is costly. Every call stack is cluttered with an average of six extra stack frames for each frame of BlockBuilder logic, and those extra operations are not cheap in terms of server CPU usage. Reading diffs of script changes is unreasonably hard. Parsing or searching through script files to figure out what they do requires a fairly deep understanding of Lua.
2. Costs of elimination: 4/5
Because Lua is so deeply embedded in the engine, digging it out would be hard. There is currently a proposal to create a wrapper class that behaves like Lua objects but has a much simpler internal structure, so that we can gradually convert the insides of scripts into something more appropriate. But however we approach this problem, we need to be careful and deliberate.
3. Infection: 4/5
Every system that touches scripting (the basic unit of LoL logic) is subject to the operations and requirements of the Lua backend. On average, we create a new building block every 3-4 days, and each one manipulates Lua objects directly. The longer we put off replacing Lua, the harder it becomes to replace.
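The wrapper proposal mentioned above might start as a class with a Lua-table-like get/set surface over plain typed storage, so callers can migrate one at a time while the internals change underneath them. ScriptTable and ScriptValue are hypothetical; the post doesn't describe the proposal's actual interface. The sketch needs C++17 for std::variant.

```cpp
#include <map>
#include <string>
#include <utility>
#include <variant>

// Plain typed values; std::monostate plays the role of Lua's nil.
using ScriptValue = std::variant<std::monostate, double, bool, std::string>;

class ScriptTable {
public:
    void Set(const std::string& key, ScriptValue value) {
        fields_[key] = std::move(value);
    }

    // Like indexing a missing key in Lua, an absent field reads as nil.
    ScriptValue Get(const std::string& key) const {
        auto it = fields_.find(key);
        return it == fields_.end() ? ScriptValue{} : it->second;
    }

    // Convenience accessor: returns the fallback if the field is missing
    // or is not a number.
    double GetNumber(const std::string& key, double fallback = 0.0) const {
        auto it = fields_.find(key);
        if (it != fields_.end()) {
            if (const double* n = std::get_if<double>(&it->second)) return *n;
        }
        return fallback;
    }

private:
    std::map<std::string, ScriptValue> fields_;
};
```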
Fundamental debt typically scores high on all three axes. The high cost of elimination often makes sticking with the outdated system the right call, but high impact and high infection mean that fixing egregious fundamental debt pays off many times over.
At Riot, our most common strategy for eliminating fundamental debt is to build a new system alongside the old one. Where possible, I recommend converting the fundamental debt into MacGyver debt: gradually port code over to the new system while providing conversion operations between the new and old systems. This makes it easy to start reaping the benefits in targeted areas while limiting risk. Sometimes, however, such a conversion isn't possible. In that case, performing the transition at compile time (or, where possible, at load time) lets you build confidence in the new system without betting everything on it. The compile-time approach was used in the GDS conversion, and the load-time approach worked for determinism.
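A load-time transition can be sketched as a converter that reads content in the old format and hands the engine only the new representation, so content creators keep working unchanged while the new runtime proves itself. The "Key=Value;Key=Value" legacy format, NewSpellData, and ConvertAtLoad are hypothetical; the post doesn't say how the GDS or determinism transitions were implemented.

```cpp
#include <map>
#include <sstream>
#include <string>

// New runtime representation the engine works with.
using NewSpellData = std::map<std::string, double>;

// Converts legacy "Key=Value;Key=Value" text when content is loaded.
// Values are assumed numeric; std::stod throws if one is not.
NewSpellData ConvertAtLoad(const std::string& legacy) {
    NewSpellData out;
    std::istringstream in(legacy);
    std::string entry;
    while (std::getline(in, entry, ';')) {
        const auto eq = entry.find('=');
        if (eq == std::string::npos) continue;  // skip malformed entries
        out[entry.substr(0, eq)] = std::stod(entry.substr(eq + 1));
    }
    return out;
}
```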
Data debt
Data debt starts out as a piece of technical debt from one of the other categories. It might be a bug in the scripting system, a less-than-ideal file format for items, or two systems that don't interact well with each other. But then a large amount of content (graphics, scripts, sounds, and so on) gets built on top of that flaw in the code. Soon, fixing the original technical debt becomes incredibly risky, and it's very hard to say what might break if you try.
My favorite real-world example of data debt is DNA. The genome is a body of data that has grown slowly over millions of years through lossy copying (mutations), transcription errors, and evolutionary pressure. Some copying errors are useless but harmless, some are harmful, and some provide tremendous benefits. Figuring out what each fragment of DNA actually does is incredibly difficult. We understand well what base pairs mean and how sets of base pairs are translated into amino acids to build proteins. We're even starting to understand some of the non-coding roles DNA can play. But across the three billion base pairs of the human genome, there is still a great deal we don't even remotely understand.
A Radiolab episode about CRISPR discusses one way to tackle these puzzles.
Data debt in League of Legends hurts the most when it turns a trivial fix into an exhausting testing effort. I'll describe only one small example, but trust me: data debt is one of the biggest reasons changes to the LoL engine are risky. Our gameplay engineers need in-depth knowledge of how the game systems are implemented, and enough skill to predict what data may break when a given piece of code changes.
One memorable example of data debt, fixed several years ago, involved block parameters in our BlockBuilder scripting language. The image above shows an example in which I increase Owner's armor by a variable plus a constant. I expect Owner to receive a 25-point armor bonus: 20 from the Delta variable passed into the block and 5 from the constant. However, because the variable's name matched the parameter's name, this action actually added 40. (Don't even ask why it wasn't 45; I have no idea what thought process led to that.)
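The symptom can be modeled in a few lines. The author doesn't know why the constant was dropped, so this is only a toy reproduction of the observed behavior (value doubled, constant lost when the names match), with hypothetical names ParamBinding and ResolveParam and an applyFix flag standing in for the actual fix.

```cpp
#include <map>
#include <string>

// Toy model of a block parameter bound to a script variable plus a constant.
struct ParamBinding {
    std::string paramName;     // e.g. "Delta" on the armor block
    std::string variableName;  // script variable passed into the block
    double constant;           // literal added on top
};

double ResolveParam(const ParamBinding& p,
                    const std::map<std::string, double>& vars,
                    bool applyFix) {
    const double value = vars.at(p.variableName);
    if (!applyFix && p.variableName == p.paramName) {
        return value * 2.0;  // buggy path: value doubled, constant dropped
    }
    return value + p.constant;  // intended: variable plus constant
}
```

With Delta = 20 and a constant of 5, the buggy path returns 40 and the fixed path returns 25, matching the numbers in the example.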
When NoopMoney, a developer on the Champions Team, set out to fix this ridiculous behavior, all it took was removing four lines of code. But with debt this contagious, even small changes require careful planning. Because of this bug, any numeric parameter in 400,000 lines of LoL scripts might have been doubled. Worse, those scripts "worked," in the sense that the game was balanced and tuned around those possibly doubled values. NoopMoney had to make sure the fix could be turned off at runtime (in case of unexpected bugs), run detailed regex searches, and enlist the QA department to determine which scripts were working correctly only because of the bug. In the end, fixing the bug caused very few problems; only a small group of champions needed script changes. But because of data debt, that outcome was hard to predict.
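The runtime off switch can be as simple as an atomic flag checked at the point of use, so a live configuration change takes effect on the next evaluation without a new build. KillSwitch and Guarded are hypothetical names; the post doesn't describe how the toggle was actually delivered.

```cpp
#include <atomic>

// Hypothetical live kill switch: the fix ships enabled, and operators
// can turn it off at runtime if it breaks something.
class KillSwitch {
public:
    explicit KillSwitch(bool enabled) : enabled_(enabled) {}
    bool Enabled() const { return enabled_.load(std::memory_order_relaxed); }
    void Set(bool enabled) { enabled_.store(enabled, std::memory_order_relaxed); }

private:
    std::atomic<bool> enabled_;
};

// Chooses the fixed or legacy path each time it is called.
template <typename Fixed, typename Legacy>
auto Guarded(const KillSwitch& sw, Fixed fixed, Legacy legacy) {
    return sw.Enabled() ? fixed() : legacy();
}
```

Relaxed ordering is enough here because the flag guards no other shared data; it only picks which path runs next.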
Parameter name bug metrics
1. Impact: 2/5
The bug itself had little effect on the game: it doubled the passed-in value and could drop the constant. But it became one more scrap of useless collective knowledge that designers and developers had to keep in mind (once they learned about it). Developer attention is too valuable a resource to squander this way.
2. Costs of elimination: 2/5
As I said, the fix itself was easy. Adding a way to roll back the fix at runtime let us build confidence in its safety. The most expensive part was the initial analysis to estimate the scope of the problem for targeted testing.
3. Infection: 4/5
The unfortunate thing about this bug was that it was triggered by perfectly logical behavior. For example, if you want to deal damage to a unit, it's natural to store the amount in a variable called Damage. Alas, the ApplyDamage block that received this value had a parameter with the same name, which triggered the bug. Then, when someone else wanted to make a similar spell, they simply copied those blocks, spreading the bug further.
The cost of eliminating data debt is typically high because the effects of a change are hard to assess. Worse, data debt is almost always extremely contagious, thanks to certain properties of data (as opposed to code). First, it's usually considered acceptable to create a new data element by copying and pasting an existing one. If you're making a new skillshot, you can save a lot of time by starting from Ezreal's Mystic Shot, and every problem in the existing data item is inherited by its descendants. Second, unlike code, data rarely goes through technical review, so it's hard to notice and stop the spread of bad practices, even well-known ones. Finally, fixing errors in data usually takes a person with eyes and a brain; a compiler and formal logic can't handle it.
I've seen two main approaches to eliminating data debt. The first I call the "do it right" flag: data creators can opt in to the new, fixed behavior instead of the old, broken one. Ideally, once the existing content that relies on the broken behavior has been identified, the fixed behavior becomes the default. Then, as with MacGyver debt, you can migrate to the new version slowly and gradually. The downside is the ongoing cost of piling more and more toggles into the editor UI.
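A "do it right" flag can be sketched as a per-item field that defaults to the fixed behavior for new content, while content created before the fix keeps the old behavior it was balanced around. SpellData, LoadLegacySpell, and CountStillBroken are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct SpellData {
    std::string name;
    bool useFixedBehavior = true;  // new content opts in automatically
};

// Content that predates the fix keeps the behavior it was tuned against.
SpellData LoadLegacySpell(const std::string& name) {
    SpellData spell;
    spell.name = name;
    spell.useFixedBehavior = false;
    return spell;
}

// Tracks the migration: how much content still depends on the old path.
std::size_t CountStillBroken(const std::vector<SpellData>& spells) {
    std::size_t count = 0;
    for (const auto& spell : spells) {
        if (!spell.useFixedBehavior) ++count;
    }
    return count;
}
```

Watching CountStillBroken fall toward zero is one way to track the slow migration; the cost, as noted above, is one more toggle in the editor.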
The second approach I call "just fix the bug." It's the one NoopMoney used for the parameter-name bug: fix the error and repair all the data it affected. A few techniques make this less daunting. First, run plenty of grep and regex searches to estimate the bug's theoretical reach. Second, do targeted testing. Finally, prepare a switch that restores the old behavior after the patch ships, in case you missed something worse than the bug you fixed. It's also worth noting that determinism helped us a lot when testing these kinds of changes: it let us verify that the server produced the same results before and after the change.
Summary
When evaluating a piece of technical debt, you can use three metrics: impact (on players and developers), cost of elimination (in time and risk), and infection. I suspect most developers routinely weigh impact and cost of fixing, but I've rarely seen infection discussed. As a problem grows deeper and harder to fix, infection can become a developer's worst enemy. Sometimes, though, you can turn infection into a weapon by making the fix more contagious than the problem.

Most of the technical debt I see while working on League falls into one of these four categories. Local debt is a black box with horrifying contents. In MacGyver debt, two or more systems are duct-taped together and supplemented with conversion functions. In fundamental debt, the entire structure is built on some unfortunate assumptions. In data debt, huge amounts of data are layered on top of some other type of debt, which makes fixing it risky and slow.

I hope this post gives you useful food for thought and discussion about technical debt.