
It was back in 2005, when Sid Meier's Civilization 4 had just been released. By that time I had been deep into Civilization 3, having finished it on a variety of maps, and then the long-awaited fourth installment came out. Those were the years when a Pentium 3 with 512MB was a mid-range machine and a Pentium 4 with 1GB was high-end; only the top configurations had two gigabytes of memory on board.
Civilization 4 shipped with graphics at the 2002-2003 level, which in principle is normal for mainstream titles of those times, especially considering it is a turn-based strategy rather than a shooter. But over the course of a game it consumed up to 900MB of RAM, which led to terrible swapping: especially on large maps, especially towards the end of the game, especially on laptops. People were puzzled, and so was I. Considering that Far Cry came out in the same years with far more beautiful graphics, and played perfectly well at maximum settings even with 512MB on board, this behavior of Civilization 4 looked extremely strange. I wanted to get to the bottom of it and punish whoever was responsible...
So I started poking around. The first suspect was Python, because Firaxis mentioned its use as an important feature at every opportunity, and Python brings garbage collectors, memory not being released after peak loads, and all sorts of other fun. It was candidate number one on the gluttony charge. I downloaded the source code of the Python DLL, added logging of all memory allocations, slipped the rebuilt DLL to Civilization in place of the native one and... nothing interesting showed up. Python used a mere 25MB of memory. Something else was devouring resources.
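A minimal sketch of that kind of instrumentation, assuming you have the DLL's source to rebuild; the function and counter names here are mine, not Python's or Firaxis's:

```cpp
#include <cstdio>
#include <cstdlib>

static size_t g_totalAllocated = 0;   // crude running total (assumes one thread)

// Logging wrapper: every allocation reports its size and call site.
static void* logged_malloc(size_t n, const char* file, int line)
{
    g_totalAllocated += n;
    fprintf(stderr, "[py-alloc] %s:%d +%u bytes, total ~%u\n",
            file, line, (unsigned)n, (unsigned)g_totalAllocated);
    return malloc(n);
}

// In the DLL's own sources, a macro can route every call through the logger:
// #define malloc(n) logged_malloc((n), __FILE__, __LINE__)
```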

Next crept in the idea of a memory leak in ordinary C memory allocations. Since the DLL used the CRT (C runtime library), a Detours-style hook was hung on all calls to malloc, realloc and free, and... again with zero result. After that I suspected that the cause of it all was memory fragmentation from frequent reallocations, perhaps even Python's fault. I wrote my own allocator that always rounded allocation sizes up to a power of two: to no avail. Civilization ate its 800MB just as before. I fell into a bit of a stupor: where does so much memory go?
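For reference, this is roughly what a Detours hook on the CRT allocators looks like; the logging format and installing from an injected DLL's DllMain are my assumptions, not the original code:

```cpp
#include <windows.h>
#include <detours.h>   // Microsoft Detours
#include <cstdio>
#include <cstdlib>

// Pointers to the real CRT functions; DetourAttach rewrites these
// to point at trampolines that reach the original code.
// Note: this hooks the CRT instance this DLL links against; the game
// may carry its own statically linked copy.
static void* (__cdecl* Real_malloc)(size_t)         = malloc;
static void* (__cdecl* Real_realloc)(void*, size_t) = realloc;
static void  (__cdecl* Real_free)(void*)            = free;

static void* __cdecl Hooked_malloc(size_t n)
{
    void* p = Real_malloc(n);
    fprintf(stderr, "malloc(%u) -> %p\n", (unsigned)n, p);
    return p;
}

static void* __cdecl Hooked_realloc(void* p, size_t n)
{
    void* q = Real_realloc(p, n);
    fprintf(stderr, "realloc(%p, %u) -> %p\n", p, (unsigned)n, q);
    return q;
}

static void __cdecl Hooked_free(void* p)
{
    fprintf(stderr, "free(%p)\n", p);
    Real_free(p);
}

BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID)
{
    if (reason == DLL_PROCESS_ATTACH) {
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourAttach(&(PVOID&)Real_malloc,  Hooked_malloc);
        DetourAttach(&(PVOID&)Real_realloc, Hooked_realloc);
        DetourAttach(&(PVOID&)Real_free,    Hooked_free);
        DetourTransactionCommit();
    }
    return TRUE;
}
```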
I hooked all VirtualAlloc calls, computing the caller's CS:EIP, to learn whether civ4.exe or python24.dll was allocating most of the address space. And it turned out that d3d9.dll was eating it. Now this was getting more interesting: why would it eat system resources when everything (or almost everything) should live in video memory? After that I started hooking DirectX calls, from device creation to the creation of textures and vertex and index buffers.
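A sketch of such caller attribution; the original computed CS:EIP by hand, while here I use the _ReturnAddress intrinsic and GetModuleHandleEx as a rough equivalent, so take the details as my assumption (installation via Detours is the same pattern as above):

```cpp
#include <windows.h>
#include <intrin.h>    // _ReturnAddress
#include <cstdio>

static LPVOID (WINAPI* Real_VirtualAlloc)(LPVOID, SIZE_T, DWORD, DWORD) = VirtualAlloc;

static LPVOID WINAPI Hooked_VirtualAlloc(LPVOID addr, SIZE_T size,
                                         DWORD type, DWORD protect)
{
    void* caller = _ReturnAddress();      // return EIP of whoever called us
    char module[MAX_PATH] = "?";
    HMODULE mod = NULL;
    // Map the caller's address back to the module it lives in.
    if (GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                           GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                           (LPCSTR)caller, &mod))
        GetModuleFileNameA(mod, module, MAX_PATH);

    LPVOID p = Real_VirtualAlloc(addr, size, type, protect);
    fprintf(stderr, "VirtualAlloc(%u bytes) from %s -> %p\n",
            (unsigned)size, module, p);
    return p;
}
```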
While studying what Civilization 4 does and how, it turned out that it keeps some of its resources in D3DPOOL_DEFAULT (these later turned out to be the resources of the graphical interface, written by a third-party company, Scaleform; those guys have done quite well for themselves, since even the CryTek folks use their products). Everything else Civilization 4 stored in the MANAGED pool, and that was eating 500MB of memory.
A slight digression about the MANAGED and DEFAULT resource pools in DirectX. Video memory can be used by several applications at once, and the presence or absence in it of the needed texture or vertex/index buffer is critical for the ability to draw anything. With the DEFAULT pool, if one process's video resources are evicted by another, the video memory is simply overwritten, and subsequent draw operations using the lost region return an error saying "the object is lost". The response to such a loss should be to restore the object: re-read the texture from disk and re-create it in video memory.
With the MANAGED pool the mechanism is about the same, but all textures and other resources are additionally cached by DirectX in system RAM, which makes handling the error transparent: DirectX restores the copy in video memory from RAM if it gets evicted by other programs, by analogy with the swap file serving as the "backup" for ordinary memory. This simplifies the programmer's life, since he no longer needs to worry about re-reading video objects during the render loop, but it inflates memory consumption, and largely pointlessly, since a game spends most of its time in the foreground and rarely has to share video memory with anyone.
The analogy with ordinary memory and a hard disk, unfortunately, does not hold all the way. A hard disk is usually 100 times larger than the onboard RAM, so a swap file the size of the RAM is nothing to worry about. The ratio of RAM to video memory, however, is usually about 4:1, not 100:1. So MANAGED is not a production solution but rather a mode for the lazy, appropriate for small things that do not weigh much. If you shove everything into MANAGED, you get what you see in the picture:

All serious engines work with the DEFAULT pool, re-reading objects from disk or, as a last resort, doing their own in-memory caching, but not with the MANAGED pool. In fact, the ability to correctly handle resource loss when working with the DEFAULT pool is exactly what makes games (un)friendly to Alt-Tab.
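For context, the canonical D3D9 shape of that Alt-Tab handling; the two helper functions are hypothetical placeholders for "release and recreate everything living in D3DPOOL_DEFAULT":

```cpp
#include <d3d9.h>

extern D3DPRESENT_PARAMETERS g_presentParams;  // saved at device creation
void ReleaseDefaultPoolResources();            // hypothetical helpers: drop and
void RecreateDefaultPoolResources();           // re-read DEFAULT-pool objects

// Called once per frame before drawing.
bool EnsureDeviceUsable(IDirect3DDevice9* dev)
{
    HRESULT hr = dev->TestCooperativeLevel();
    if (hr == D3DERR_DEVICELOST)
        return false;  // still lost (e.g. Alt-Tabbed away); skip this frame
    if (hr == D3DERR_DEVICENOTRESET) {
        ReleaseDefaultPoolResources();          // Reset fails while they exist
        if (FAILED(dev->Reset(&g_presentParams)))
            return false;
        RecreateDefaultPoolResources();         // re-read from disk, re-upload
        return true;
    }
    return SUCCEEDED(hr);
}
```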
First, through hooks on object creation, I tried to move everything into the DEFAULT pool. The rendering fell apart. At first I didn't understand what was going on, then it hit me... naturally, 500MB will not fit into any video memory (in those years top video cards had 256MB on board). It became clear why they resorted to the MANAGED pool: their graphics objects couldn't fit into video memory even on their own, let alone compete for it with other processes during Alt-Tab!
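The experiment itself was roughly this: a hook on the creation calls that rewrites the pool argument (shown for CreateVertexBuffer; textures are analogous, and fetching the real method pointer from the device vtable is left out):

```cpp
#include <d3d9.h>

// COM methods on x86 are __stdcall with an explicit `this` first argument.
typedef HRESULT (STDMETHODCALLTYPE* CreateVB_t)(
    IDirect3DDevice9*, UINT, DWORD, DWORD, D3DPOOL,
    IDirect3DVertexBuffer9**, HANDLE*);

static CreateVB_t Real_CreateVertexBuffer = NULL;  // taken from the vtable

static HRESULT STDMETHODCALLTYPE Hooked_CreateVertexBuffer(
    IDirect3DDevice9* dev, UINT length, DWORD usage, DWORD fvf,
    D3DPOOL pool, IDirect3DVertexBuffer9** vb, HANDLE* shared)
{
    if (pool == D3DPOOL_MANAGED)
        pool = D3DPOOL_DEFAULT;  // force everything out of the MANAGED pool
    // Caveat: DEFAULT-pool resources carry extra locking restrictions,
    // and nobody restores them after eviction, hence the broken picture.
    return Real_CreateVertexBuffer(dev, length, usage, fvf, pool, vb, shared);
}
```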
It also became clear why the swapping never subsided. If it were a mere excess over video memory, the managed part of RAM would gradually migrate into the swap file, pushed out by actively used pages, and the stuttering would disappear. But that did not happen, because this managed cache was constantly being touched. In general, everything was clear. One puzzle remained: Far Cry managed fine with 512MB of RAM and a 128MB video card without any horrific swapping, and with graphics like that! Civ4, transplanted into our time, would be a game with 2005-level graphics demanding 5GB of RAM.
An attempt to find a memory leak at the DirectX level also led nowhere. I load early saves: memory consumption is small. I load late saves: memory consumption is insane. If something is leaking, it leaks into the saves too, and I won't track down such a leak without source code. So I proceed from the hypothesis that somewhere something is being created needlessly... and start checking the textures as the heaviest resources.
Checking the textures brought another surprise. They took about 50MB on low settings and about 120MB on high. Where did the other 400MB go? I start logging everything that goes through Direct3D and find... 400 megabytes of vertex buffers! A vertex buffer holds geometry data: the three-dimensional models of units, cities and landscape. Dark thoughts creep into my head, that they baked the animation of every unit frame by frame and nothing can be done about it... To calm my conscience, I sort the vertex buffer memory consumption by FVF (the vertex format: what each vertex contains and what it doesn't, e.g. lighting data, texture coordinates, etc.). There are several varieties of vertex buffers, and none of them eats much, except one, which eats a whole 280MB.
How to find out what this buffer holds? Textures and vertex buffers are filled in three steps: first Lock is called, then the data is written, then Unlock. That is where I wired myself in: at Unlock. Before the unlock I add random offsets to the vertex coordinates and watch what changes. I expect soldiers to start twitching their arms and legs. But hell no... the terrain swam! And here the insight descends on me: why it slows down towards the end of the game, and why the bigger the map, the worse the stutter. It is not the number of unit types on the map. It is the number of visible tiles of the map! The landscape of each cell of the game world takes into account all the nearby mountains, rivers, seas and surface types (sand, grass, snow), so each landscape cell is affected by several neighbors at once, which makes it difficult to prepare all possible geometric configurations of tiles in advance.
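A sketch of that perturbation experiment, under loud assumptions: the vertex stride and "position is the first three floats" layout are guesses for illustration, and the vtable hooking itself is omitted:

```cpp
#include <d3d9.h>
#include <cstdlib>
#include <map>

struct LockedRange { void* data; UINT size; };
static std::map<IDirect3DVertexBuffer9*, LockedRange> g_locked;

typedef HRESULT (STDMETHODCALLTYPE* Lock_t)(
    IDirect3DVertexBuffer9*, UINT, UINT, void**, DWORD);
typedef HRESULT (STDMETHODCALLTYPE* Unlock_t)(IDirect3DVertexBuffer9*);
static Lock_t   Real_Lock   = NULL;   // taken from the vtable before hooking
static Unlock_t Real_Unlock = NULL;

static HRESULT STDMETHODCALLTYPE Hooked_Lock(
    IDirect3DVertexBuffer9* vb, UINT off, UINT size, void** data, DWORD flags)
{
    HRESULT hr = Real_Lock(vb, off, size, data, flags);
    if (SUCCEEDED(hr)) {
        if (size == 0) {                       // 0 means "the whole buffer"
            D3DVERTEXBUFFER_DESC d;
            vb->GetDesc(&d);
            size = d.Size;
        }
        LockedRange r = { *data, size };
        g_locked[vb] = r;                      // remember what the game writes to
    }
    return hr;
}

static HRESULT STDMETHODCALLTYPE Hooked_Unlock(IDirect3DVertexBuffer9* vb)
{
    std::map<IDirect3DVertexBuffer9*, LockedRange>::iterator it = g_locked.find(vb);
    if (it != g_locked.end()) {
        const UINT stride = 32;                // assumed vertex size in bytes
        for (UINT i = 0; i + stride <= it->second.size; i += stride) {
            // Assume position = first three floats of every vertex.
            float* pos = (float*)((char*)it->second.data + i);
            pos[0] += (rand() % 100) * 0.01f;  // if soldiers twitch: unit models
            pos[1] += (rand() % 100) * 0.01f;  // if the ground swims: terrain
        }
        g_locked.erase(it);
    }
    return Real_Unlock(vb);                    // the data pointer dies here
}
```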

But I did not believe there were really as many distinct configurations as there are cells on the map; there had to be repetition, at least on water and plains. Naturally, without the source code I could not classify them properly, so I decided to hash all these buffers. The situation was further complicated by the fact that new tiles were generated periodically, i.e. the hashing had to be fast enough.
As the hash I chose the representation of the vertex buffer data as a base-5 number, with the original bytes as digits. It mixed the input data well, practically without collisions, because 5 is a prime rather than a power of two, and because multiplication by 5 is implemented in the processor as LEA EAX, [EAX*4+EAX], which executes quickly. The hashing itself was bolted onto the same vertex buffer Unlock(). During unlock, an identical vertex buffer was searched for in the hash cache, and if one was found, the unlocked buffer's data was dropped; when drawing via DrawIndexedPrimitive, instead of my IDirect3DVertexBuffer that I had slipped to civ4.exe, the real buffer with the cached data was substituted, one for all identical buffers.
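The hash itself is a one-liner; here is a sketch of it as described:

```cpp
#include <cstdint>
#include <cstddef>

// Treat the buffer's bytes as the digits of one huge base-5 number
// (truncated to 32 bits): h = h*5 + byte. 5 is prime, not a power of
// two, so bytes do not simply shift out of the low bits; and x*5
// compiles to a single LEA EAX, [EAX*4+EAX], keeping the loop cheap.
static uint32_t HashVertexData(const uint8_t* data, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; ++i)
        h = h * 5u + data[i];
    return h;
}
```

In a dedup cache like the one described, a hash hit would normally still be confirmed with a byte-wise memcmp before merging buffers; the author reports the hash behaved collision-free in practice.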
In half a day I wrote this contraption out of C++ templates mixed with assembler inserts and interception of COM calls through Detours hooks (patching the vtbl didn't work for some reason), launched it and... wow! Memory consumption dropped from 800MB to 300-400MB! By the standards of 2010, that is equivalent to cutting memory consumption from four gigs to one and a half.
I was finally able to finish the game I had started, no longer draining a cup of tea over every turn while the wild swapping ground on. I posted the patch on civfanatics.com and people were delighted (link). 150k downloads in the first days on civfanatics alone, and the patch was re-posted on a host of fan sites. What I felt then is hard to put into words... it was a thrill! I had finally fixed this serious problem, without source code, and a problem that was not a banal memory leak but a serious architectural flaw. Firaxis could not repeat this feat for several months, even with their own code in hand. The public spanking for lame coding was a success.

PS: How things are in Civilization 5, I haven't looked.
PPS: While playing through StarCraft 2, I couldn't help comparing its graphics-to-memory-consumption ratio with Crysis. Fortunately for my sleep and, possibly, for Blizzard, the laptop turned out to have 4GB of RAM, so SC2's unjustified appetite for memory caused bewilderment rather than swapping. Had it caught me on a 2GB machine, I would have had to take up my old ways again :)