Pragmatic approach to performance

Is premature optimization the root of all evil? Or does the “fix it later” attitude turn programmers from proud “computer scientists” into universally despised “script kiddies”?

These questions have no clear answers, but in this article I will try to describe my own approach to performance: what I do to make sure my systems run at a decent speed without compromising other requirements such as modularity, maintainability and flexibility.

1. Programmer time is a finite resource

If you are writing a large program, some parts of the code will not run as fast as theoretically possible. Sorry, let me rephrase. If you are writing a large program, no part of the code will run as fast as theoretically possible. Yes, I think it is worth accepting that any line of your code could be changed to run a little faster.

Writing a fast program is not about achieving maximum performance everywhere. It is about achieving acceptable performance where it matters. If you spend three weeks optimizing a piece of code that is called once a year, those are three weeks you could have spent on something more meaningful. Spent on something that really matters, they might have made a real difference to the game's frame rate.
There is never enough time to add every new feature, fix every bug and optimize all of the code. So the goal should always be maximum performance for minimum effort.

2. Do not underestimate the power of simplicity.

Simple solutions are easier to implement than complex ones. But that is only the tip of the iceberg. The real benefits of simple solutions show up over time. A simple solution is easier to understand, easier to debug, easier to maintain, easier to port, easier to profile, easier to optimize, easier to parallelize and easier to replace. Over time, all these advantages add up.

Using a simple solution can pay off even when it is slower than a complex one, because overall your program will run faster: you can spend the time you saved on optimizing other parts of it, the parts that really matter.

I use complex solutions only when they are truly justified: for example, when the complex solution is much faster than the simple one (by a factor of 2 or so) and it sits in a system where that really matters (one that consumes a significant percentage of processor time).

Of course, simplicity is in the eye of the beholder. I think arrays are simple. I think POD data types are simple. I think blobs are simple. I do not think classes with 12 levels of inheritance are simple. I do not think classes built on 8 different policies are simple. I do not think geometric algebra is simple.

3. Take advantage of the system design opportunity.

Some people think that avoiding “premature optimization” means designing a system without paying any attention to performance: you cobble something together first and fix it later, when you get around to “optimizing”.

I completely disagree with this approach. Not because I love performance for its own sake, but for purely pragmatic reasons.

When you design a system, you have a complete picture in your head of how its parts fit together, what the requirements on them are and how often the various functions will be called. At that stage it takes little extra effort to think about how fast the system will run and how to organize the data structures so that it runs as fast as possible.

Conversely, if you build a system without regard to performance and try to fix it later, things can get much harder. If you need to reorganize the fundamental data structures or add multithreading support, you may end up rewriting the entire system from scratch. Only now the system is already in production, so you are constrained by the published API and by dependencies on other systems. On top of that, you must not break the projects that use the system. And since several months will have passed since you (or someone else) wrote the code, you will first have to recall and understand all the thinking that went into it. All the small fixes and tweaks made along the way will probably be lost, and you will start debugging with a fresh batch of bugs.

So, following our general guideline of “maximum performance for minimum effort”, it is clear that you should consider performance from the outset, simply because that takes less effort than fixing it afterwards.

Of course, some restraint is needed here too. Performance improvements are easier to make at design time, but at that point we cannot know exactly how much they will matter to the system as a whole. Later, after profiling, changes take more effort, but we know better what to focus on. As elsewhere in life, it is important to keep a balance.

When I design a system, I make a rough estimate of how many times each piece of code will be called per unit of time, and derive design requirements from that:

1-10: Performance does not matter. Do whatever you like.
100: Make sure it is O(n), data-oriented and cache-friendly.
1000: Make sure it is multithreaded.
10,000: Think really hard about what you are doing.

There are also a few recommendations that I try to follow when writing new systems:

Put static data in immutable memory blocks
Allocate dynamic data in contiguous memory
Use as little memory as possible
Prefer arrays to complex data structures
Access memory linearly (in a cache-friendly way)
Make sure functions run in O(n) time
Avoid “do nothing” updates; keep track of the active objects instead
If the system handles many objects, support data-parallel processing
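
As a rough illustration of several of these recommendations at once (POD data, one contiguous array, linear access, O(n) updates, tracking only active objects), here is a minimal C++ sketch. The Particle type and the function names are hypothetical and do not come from any particular engine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical POD particle; the fields are placeholders for illustration.
struct Particle {
    float x, y;    // position
    float vx, vy;  // velocity
    float life;    // seconds remaining; <= 0 means the particle is dead
};

// All live particles sit in one contiguous array with no gaps,
// so the update loop never visits an inactive slot.
struct ParticleSystem {
    std::vector<Particle> active;
};

inline void spawn(ParticleSystem &ps, const Particle &p) {
    ps.active.push_back(p);
}

// One linear O(n) pass over the active particles only.
inline void update(ParticleSystem &ps, float dt) {
    std::size_t i = 0;
    while (i < ps.active.size()) {
        Particle &p = ps.active[i];
        p.life -= dt;
        if (p.life <= 0.0f) {
            // Swap-and-pop: O(1) removal that keeps the array packed.
            // Element order is not preserved.
            p = ps.active.back();
            ps.active.pop_back();
            continue;  // slot i now holds a different particle; process it next
        }
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        ++i;
    }
}
```

The swap-and-pop removal trades stable ordering for a packed array. If other systems hold indices into the array, they would need an extra indirection, which this sketch leaves out.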

By now I have written many systems in this “style”, and following these recommendations no longer takes any effort. I have also found that following them gives me decent baseline performance. The recommendations target the most important and most easily reached ways of improving performance: algorithmic complexity, memory access and parallelism. Most importantly, they deliver significant gains for relatively little effort.

Of course, it is not always possible to follow all of the recommendations. Some algorithms, for example, really do take more than O(n) time. But I know that whenever I deviate from these recommendations, I need to slow down and think about whether I am hurting performance.

4. Use top-down profiling to look for bottlenecks.

No matter how good your initial design is, your code will be slow in unexpected places. People will use your system in the craziest ways and hit bottlenecks you could never have imagined. There will also be bugs in your code. Some bugs do not crash the system; they just quietly hurt performance. And these are things you could not have known about in advance.

To understand where your program actually spends its time, a top-down profiler is an invaluable tool. We explicitly mark the sections of our code to be profiled and send live data over the network to an external tool, which can visualize it in various ways:

Screenshot of (old) BitSquid Profiler

Top-down profiling tells you where to focus your optimization efforts. Are you spending 60% of the time in animation and 0.5% in the interface? Then work on the animation, where the effort will pay off; the interface is not worth a penny of it.

With top-down profiling you can keep narrowing the profiled code sections until you reach the performance problem itself: the exact place where the time is really being spent.
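
The explicitly marked code sections described above could look something like the following C++ sketch. It is only an assumption about the general shape of such a tool, not the BitSquid profiler's API: Profiler, ProfileScope and the subsystem names are all hypothetical, and the data is printed locally instead of being sent over the network.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// One profiled scope. All names are hypothetical, for illustration only.
struct ScopeRecord {
    std::string name;
    int depth;      // nesting level: 0 is a top-level scope
    double millis;  // wall-clock time spent inside the scope
};

struct Profiler {
    std::vector<ScopeRecord> records;
    int depth = 0;
};

// RAII marker: construction opens a scope, destruction closes it
// and stores the elapsed time.
class ProfileScope {
public:
    ProfileScope(Profiler &p, const char *name)
        : profiler_(p), index_(p.records.size()),
          start_(std::chrono::steady_clock::now()) {
        // Reserve the record now so that parents are listed before children.
        profiler_.records.push_back({name, profiler_.depth++, 0.0});
    }
    ~ProfileScope() {
        auto end = std::chrono::steady_clock::now();
        profiler_.records[index_].millis =
            std::chrono::duration<double, std::milli>(end - start_).count();
        --profiler_.depth;
    }
    ProfileScope(const ProfileScope &) = delete;
    ProfileScope &operator=(const ProfileScope &) = delete;

private:
    Profiler &profiler_;
    std::size_t index_;
    std::chrono::steady_clock::time_point start_;
};

// A frame with hypothetical subsystems, instrumented from the top down.
// To narrow the search, add more nested scopes inside the slowest one.
inline void simulate_frame(Profiler &p) {
    ProfileScope frame(p, "frame");
    {
        ProfileScope s(p, "animation");
        { ProfileScope s2(p, "evaluate_curves"); }
        { ProfileScope s2(p, "blend_poses"); }
    }
    { ProfileScope s(p, "gui"); }
}

// Prints the scope tree, indented by nesting depth.
inline void print_report(const Profiler &p) {
    for (const ScopeRecord &r : p.records)
        std::printf("%*s%s: %.3f ms\n", r.depth * 2, "", r.name.c_str(), r.millis);
}
```

Calling simulate_frame followed by print_report yields an indented tree of scopes with their timings. A real tool would presumably also handle multiple threads and aggregate many frames, which this sketch does not attempt.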

I use the basic recommendations to get good baseline performance in all systems, and then drill down with top-down profiling to find the systems that need extra optimization effort.

5. Use bottom-up profiling to search for low-level optimization goals.

In general, I find that top-down profiling with well-defined code segments is more useful than bottom-up profiling.

But bottom-up profiling still has its uses. It is good at finding hot spots: functions that are called from many different parts of the program and that you can therefore miss when profiling top-down. Such hot spots can be good targets for low-level, instruction-by-instruction optimization. Or their presence may indicate that something is being done wrong.

For example, if strcmp() shows up as a hot spot, then your program is behaving very badly and should be sent to stand in the corner without dessert.
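
The article does not say how to get rid of such a hot spot. One common remedy, shown here purely as an assumption, is to hash names once (for instance at load time) and compare integers in the hot loop. The Unit type and function names in this C++ sketch are hypothetical; the hash is the standard 64-bit FNV-1a.

```cpp
#include <cstdint>
#include <vector>

// 64-bit FNV-1a hash of a NUL-terminated string.
inline std::uint64_t hash_name(const char *s) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    while (*s) {
        h ^= static_cast<unsigned char>(*s++);
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

// Hypothetical game object that stores a hashed type name
// instead of the string itself.
struct Unit {
    std::uint64_t type_id;
    int health;
};

// Hot loop: one integer comparison per unit, no strcmp().
inline int count_of_type(const std::vector<Unit> &units, std::uint64_t type_id) {
    int n = 0;
    for (const Unit &u : units)
        if (u.type_id == type_id) ++n;
    return n;
}
```

Two different names can in principle hash to the same value, so a system built this way may want to check for collisions when the data is compiled or loaded.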

A hot spot that often shows up in our code is luaV_execute(). That is not surprising: it is the main function of the Lua VM, a big switch statement that runs most of the Lua code. But it does tell us that low-level, platform-specific optimization of this function could give a noticeable performance gain.

6. Avoid synthetic tests.

I do not do a lot of synthetic tests such as running the code in a loop 10,000 times and measuring the execution time.

If I reach a point where I cannot tell right away whether a change will make the code faster, I prefer to test with real game data. Otherwise I cannot be sure that I am not optimizing for data that never occurs in real life.

Testing 500 instances of the same entity, all playing the same animation, is different from testing the same scene with 50 different units that all play different animations. The data access patterns will be completely different. An optimization that helps in one case may make no difference in the other.
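
To put rough numbers on that difference, the C++ sketch below counts how much distinct animation data each scenario touches. The model is deliberately crude and entirely assumed: each unit plays exactly one animation, identified by an index, and the size of each animation's data is known.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Returns the number of bytes of distinct animation data touched when every
// unit is updated once. unit_animation[i] is the index of the animation
// played by unit i; animation_bytes[a] is the size of animation a.
inline std::size_t animation_working_set(
        const std::vector<int> &unit_animation,
        const std::vector<std::size_t> &animation_bytes) {
    std::set<int> distinct(unit_animation.begin(), unit_animation.end());
    std::size_t total = 0;
    for (int a : distinct)
        total += animation_bytes[static_cast<std::size_t>(a)];
    return total;
}
```

With an assumed 64 KiB per animation, 500 units sharing animation 0 touch 64 KiB of animation data, while 50 units with 50 different animations touch 50 × 64 KiB = 3200 KiB. The synthetic case may fit comfortably in cache where the realistic one does not, so timings from one say little about the other.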

7. Optimization is gardening

Programmers optimize the engine. Script writers cram more stuff into it. So it has always been, and so it always will be. And that is good.

Optimization is not an isolated activity that happens within a strictly limited period of time. It is part of the whole life cycle: design, implementation and evolution. Optimization is an ongoing dialogue between programmers and script writers about what the engine should be capable of.

Managing performance is like tending a garden: you check that everything is in order, pull out the weeds and figure out how to help the plants grow better.

The script writers' job is to bring the engine to its knees. The programmers' job is to raise it back up, stronger than before. Somewhere in this tug-of-war lies the point where the game shines brightest.

Translated by unconnected.
Original: A Pragmatic Approach to Performance

Happy New Year! Write a lot of good and fast programs!

Source: https://habr.com/ru/post/135484/
