
Alternative memory allocators

Posted by Stephen Tovey at 2:29 am, Programming
A note from the translator: this article, recommended on the Alena C++ blog, is aimed primarily at console game developers, but it will probably be useful to anyone who has to deal with heavy dynamic memory allocation. Fans of comparing memory management in C++ and Java may also find something to ponder.

The original article, with an interesting discussion in the comments: altdevblogaday.org/2011/02/12/alternatives-to-malloc-and-new


Mandatory opening fable

I really like sushi. It is tasty and convenient. I like that you can go to a conveyor-belt sushi restaurant, take a seat, and grab something fresh and tasty from the belt without blowing your entire lunch hour. What I really would not want, though, is to be a waiter at a sushi restaurant, especially if my job were to seat the customers.


image

A party of three comes in and asks to be seated. Pleased to have customers, you kindly show them to their seats. No sooner have they sat down and helped themselves to some tasty Tekka Maki than the door opens again and in walk four more! Business is booming today! The restaurant now looks like this ...

image

Since it is lunchtime, the restaurant quickly fills to capacity. Finally, having eaten all they could and paid the bill, the first party (the group of three) leaves, and a couple comes in to take their place; you offer the newcomers the newly vacated seats. This happens a few times, and eventually the restaurant looks like this ...

image

And now four people come in and ask to be seated. Pragmatic by nature, you have been keeping careful track of how many free seats are left, and look, today is your day: there are exactly four! There is one "but": these four are terribly sociable and want to sit together. You look around desperately, but even though you have four empty seats, you cannot seat this party side by side! Asking the existing customers to move in the middle of their meal would be rude, so, unfortunately, you have no choice but to turn the newcomers away, and they may never come back. All of this is terribly sad. If you are wondering how this fable relates to game development, read on.

In our programs, we have to manage memory. Memory is a precious resource that must be managed carefully, like the seats in our metaphorical sushi restaurant. Every time we allocate memory dynamically, we reserve space in a region called the heap. In C and C++, this is usually done with the malloc function and the new operator, respectively. Continuing our crude analogy (I will not do it again, I promise!), this is rather like our fearless party of diners asking to be shown to their seats. What really matters is that our hypothetical scenario also happens when working with memory, and its consequences are far worse than a couple of empty stomachs. It is called fragmentation, and it is a nightmare!

What is wrong with malloc and new?

Unfortunately, this is where the fish stories end; the rest of this text is about malloc and new, and why they are of very limited use in the context of embedded systems (such as game consoles). Although fragmentation is only one aspect of the problems associated with dynamic memory allocation, it is perhaps the most serious. But before we can come up with alternatives, we need to look at all of the problems:

  1. malloc and new try to be all things to all programmers ...
    They will allocate you a few bytes in exactly the same way as a few megabytes. They have no idea what the data they are giving you space for is, or what its lifetime will be. In other words, they do not have the bigger picture that programmers have.
  2. Relatively poor performance ...
    Allocating memory with the standard library functions or operators usually requires kernel calls. These can have all sorts of nasty side effects on your application's performance, including flushing translation lookaside buffers, copying blocks of memory, and so on. This reason alone is enough to make dynamic memory allocation very expensive in terms of performance. The cost of free or delete can also be high in some allocation schemes, since in many cases a lot of extra work is done to try to improve the state of the heap before subsequent allocations. "Extra work" is a rather vague term, but it can mean coalescing memory blocks, and in some cases it can mean walking the entire list of memory regions allocated to your application! This is definitely not what you want to spend precious processor cycles on if you can avoid it!
  3. They cause heap fragmentation ...
    If you have never worked on a project that suffers from fragmentation problems, consider yourself very lucky, but the rest of us know that heap fragmentation can be a complete and utter nightmare.
  4. Tracking dynamically allocated memory can be a daunting task ...
    With dynamic allocation comes the inevitable risk of memory leaks. I am sure we all know what memory leaks are, but if not, read here. Most studios build infrastructure on top of their dynamic allocators to keep track of which memory is used and where.
  5. Poor locality of reference ...
    In essence, there is no way to know where the memory returned to you by malloc or new will be relative to the other memory regions in your application. This can lead to more costly cache misses than necessary, and we can end up dancing across memory as if on hot coals.

So what is the alternative? The purpose of this note is to give you information (and a bit of code!) about several different alternatives that you can use instead of malloc and new in order to fend off the problems just described.


The humble linear allocator

The name of this section seems to hint that this allocator is by far the simplest of those I am going to present, although in truth they are all simple (and that is what makes them beautiful). A linear allocator essentially assumes no piecemeal freeing of allocated memory, and allocates chunks of memory one after another, linearly, from a fixed pool. Look at the picture.

image

A good example of where a linear allocator is extremely useful can be found almost everywhere when programming a synergistic processing unit (SPU, one of the coprocessors of the PlayStation 3's Cell). The persistence of data in local store beyond the scope of the currently executing task is not important, and in many cases the amount of data arriving in local store (at least data that varies in size) is fairly limited. However, do not be fooled: this is far from its only application. Here is an example of how a simple linear allocator might be implemented in C.

/** Header structure for linear buffer. */
typedef struct _LinearBuffer {
    uint8_t *mem;       /*!< Pointer to buffer memory. */
    uint32_t totalSize; /*!< Total size in bytes.      */
    uint32_t offset;    /*!< Offset.                   */
} LinearBuffer;

/* non-aligned allocation from linear buffer. */
void* linearBufferAlloc(LinearBuffer* buf, uint32_t size)
{
    if(!buf || !size)
        return NULL;

    uint32_t newOffset = buf->offset + size;
    if(newOffset <= buf->totalSize)
    {
        void* ptr = buf->mem + buf->offset;
        buf->offset = newOffset;
        return ptr;
    }
    return NULL; /* out of memory */
}
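To make the reset-at-the-end-of-a-level pattern concrete, here is a short usage sketch. The buffer size and the names exampleLevel, meshes, and textures are illustrative only; LinearBuffer and linearBufferAlloc are repeated from the listing above so that the fragment is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Repeated from the listing above so this sketch compiles on its own. */
typedef struct _LinearBuffer {
    uint8_t *mem;       /*!< Pointer to buffer memory. */
    uint32_t totalSize; /*!< Total size in bytes.      */
    uint32_t offset;    /*!< Offset.                   */
} LinearBuffer;

void* linearBufferAlloc(LinearBuffer* buf, uint32_t size)
{
    if(!buf || !size)
        return NULL;

    uint32_t newOffset = buf->offset + size;
    if(newOffset <= buf->totalSize)
    {
        void* ptr = buf->mem + buf->offset;
        buf->offset = newOffset;
        return ptr;
    }
    return NULL; /* out of memory */
}

/* Hypothetical per-level usage: allocations come one after another from a
   fixed backing array, and a single assignment releases them all at once. */
void exampleLevel(void)
{
    static uint8_t backing[64 * 1024];
    LinearBuffer level = { backing, sizeof(backing), 0 };

    void* meshes   = linearBufferAlloc(&level, 40 * 1024);
    void* textures = linearBufferAlloc(&level, 24 * 1024);
    (void)meshes; (void)textures; /* ... use the data for the level ... */

    level.offset = 0; /* end of level: everything is "freed" in one go */
}
```

Note that no free calls appear anywhere: resetting the offset is the only release mechanism, which is exactly what makes this allocator so cheap.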


By applying a few bitwise operations to the offset value during allocation, aligned allocations can be supported. This can be very useful, but be aware that, depending on the sizes of the data you place in your buffer (and in some cases the order in which the allocations are made), you can end up with unused space between allocations in the buffer. For reasonable alignments this is usually acceptable, but it can become overly wasteful if you allocate with much stricter alignment, say 1 MB. In such situations, the order in which objects are allocated from a linear allocator can have a drastic effect on the amount of wasted memory. To reset the allocator (perhaps at the end of a level), all you need to do is set the offset back to zero. As always when dealing with dynamic memory, clients of the allocator must make sure they hold no pointers into memory allocated this way when you reset it, otherwise you risk corrupting the allocator's contents. Any C++ objects you placed in the buffer will need their destructors called manually.
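The bitwise trick mentioned above can be sketched as follows. This is an assumption-laden sketch, not the author's code: the function name linearBufferAllocAligned is invented, the alignment must be a power of two, and the rounding uses the standard (x + a - 1) & ~(a - 1) idiom. It also assumes the backing buffer itself starts at an address at least as strictly aligned as any alignment you request.

```c
#include <stddef.h>
#include <stdint.h>

/* Same structure as in the article's listing. */
typedef struct _LinearBuffer {
    uint8_t *mem;
    uint32_t totalSize;
    uint32_t offset;
} LinearBuffer;

/* Aligned allocation (hypothetical helper): round the current offset up to
   the next multiple of 'alignment', which must be a power of two, then
   allocate from the rounded offset. The gap, if any, is simply wasted. */
void* linearBufferAllocAligned(LinearBuffer* buf, uint32_t size, uint32_t alignment)
{
    if(!buf || !size || !alignment)
        return NULL;
    if(alignment & (alignment - 1))
        return NULL; /* not a power of two */

    /* Classic round-up-to-multiple idiom using only bit operations. */
    uint32_t alignedOffset = (buf->offset + alignment - 1) & ~(alignment - 1);

    if(alignedOffset + size <= buf->totalSize)
    {
        void* ptr = buf->mem + alignedOffset;
        buf->offset = alignedOffset + size;
        return ptr;
    }
    return NULL; /* out of memory */
}
```

Because only the offset is rounded, the alignment is relative to the start of the buffer; in practice you would allocate the backing memory itself with the strictest alignment you ever intend to request.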

Stack

A few caveats before we start talking about this allocator. When I say "stack allocator" in this particular case, I am not talking about the call stack; however, as we will see later, that stack does play an important role in avoiding dynamic heap allocations. So what am I talking about, then? The linear allocator described above is an excellent solution for many allocation tasks, but what if you want a little more control over how you free your resources? The stack allocator gives you that. Toward the end of the description of the linear allocator, I noted that to reset it you can simply set the offset to zero, which frees all of the allocator's resources at once. The principle of setting the offset to a specific value is the basis of the stack allocator's implementation. If you are not familiar with the stack as a data structure, now is probably the time to fix that, for example here. Done? Excellent. Each allocation made by our stack allocator can optionally hand back a handle representing the state of the allocator just before that allocation. This means that if we restore the allocator to that state (by changing its offset), we effectively "free" the memory allocated after it, and it can be reused. This is shown in the diagram below.

image

This can be very useful if you want to allocate memory temporarily for data with a limited lifetime, for example, one bounded by a particular function or subsystem. This strategy can also be useful for things like level resources that have a clearly defined order of release (the reverse of the order of allocation). Here is a small C example illustrating what has just been said.

typedef uint32_t StackHandle;

void* stackBufferAlloc(StackBuffer* buf, uint32_t size, StackHandle* handle)
{
    if(!buf || !size)
        return NULL;

    const uint32_t currOffset = buf->offset;
    if(currOffset + size <= buf->totalSize)
    {
        uint8_t* ptr = buf->mem + currOffset;
        buf->offset += size;
        if(handle)
            *handle = currOffset; /* set the handle to old offset */
        return (void*)ptr;
    }
    return NULL;
}

void stackBufferSet(StackBuffer* buf, StackHandle handle)
{
    buf->offset = handle;
    return;
}
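The listing above never shows the StackBuffer structure itself. Here is a self-contained sketch in which the fields are assumed to mirror LinearBuffer from the previous section (an assumption on my part), together with a hypothetical usage function showing the roll-back-via-handle pattern:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: StackBuffer is not shown in the original listing; the
   mem/totalSize/offset layout here simply mirrors LinearBuffer. */
typedef struct _StackBuffer {
    uint8_t *mem;
    uint32_t totalSize;
    uint32_t offset;
} StackBuffer;

typedef uint32_t StackHandle;

void* stackBufferAlloc(StackBuffer* buf, uint32_t size, StackHandle* handle)
{
    if(!buf || !size)
        return NULL;

    const uint32_t currOffset = buf->offset;
    if(currOffset + size <= buf->totalSize)
    {
        uint8_t* ptr = buf->mem + currOffset;
        buf->offset += size;
        if(handle)
            *handle = currOffset; /* allocator state just before this alloc */
        return (void*)ptr;
    }
    return NULL;
}

void stackBufferSet(StackBuffer* buf, StackHandle handle)
{
    buf->offset = handle; /* frees everything allocated after the handle */
}

/* Hypothetical usage: a subsystem borrows scratch memory, then rolls the
   allocator back, leaving earlier allocations untouched. */
void exampleSubsystem(StackBuffer* buf)
{
    StackHandle mark = 0;
    void* scratch = stackBufferAlloc(buf, 256, &mark);
    if(scratch)
    {
        /* ... use scratch ... */
        stackBufferSet(buf, mark); /* scratch space is reusable again */
    }
}
```

Because the handle is just the old offset, "freeing" is a single assignment; the price is that releases must happen in the reverse of the allocation order.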


Double-ended stack

If you have mastered the stack concept described above, we can move on to the double-ended stack. It is like the stack allocator, except that there are two stacks: one growing up from the bottom of the buffer, the other growing down from the top. See the picture:

image

Where might this be used? A good example would be any situation where you have data of a similar type but with completely different lifetimes, or data that is closely related and should live in the same memory region but differs in size. It should be noted that the point where the offsets of the two stacks meet is not fixed; in particular, the stacks do not have to occupy half the buffer each. The bottom stack can grow very large while the top stack stays small, and vice versa. This extra flexibility can sometimes be necessary to make the best use of a memory buffer. I do not think you need Captain Obvious here, so I will not give example code for the double-ended stack allocator, since it naturally follows from the one-sided stack allocator already discussed.
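For readers who want to see it anyway, the two allocation paths might be sketched as follows. All names here are invented for illustration (the article deliberately gives no listing); the key invariant is simply that the bottom offset may never cross the top offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: two offsets growing towards each other in one buffer. */
typedef struct _DoubleStack {
    uint8_t *mem;
    uint32_t totalSize;
    uint32_t bottomOffset; /* grows upward from 0         */
    uint32_t topOffset;    /* grows downward from totalSize */
} DoubleStack;

void* doubleStackAllocBottom(DoubleStack* buf, uint32_t size)
{
    if(!buf || !size)
        return NULL;
    if(buf->bottomOffset + size > buf->topOffset)
        return NULL; /* the stacks would collide */

    void* ptr = buf->mem + buf->bottomOffset;
    buf->bottomOffset += size;
    return ptr;
}

void* doubleStackAllocTop(DoubleStack* buf, uint32_t size)
{
    if(!buf || !size)
        return NULL;
    if(buf->topOffset - buf->bottomOffset < size)
        return NULL; /* the stacks would collide */

    buf->topOffset -= size;
    return buf->mem + buf->topOffset;
}
```

Rolling either end back works exactly as in the single stack allocator: restore the corresponding offset from a saved handle.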

Pool

Now let us shift focus slightly away from the family of allocators described above, all based on linearly advancing pointers or offsets, and move on to something else. The pool allocator I am about to describe is designed to work with data of the same type or size. It divides the memory buffer it manages into equally sized segments, and then lets the client allocate and free these chunks at will (see the diagram below). To do this, it must keep track of the free segments, and I know of several ways to implement that. I personally avoid implementations that use a stack of indices (or pointers) to the free segments, because they need extra space, which can often be unacceptable, but they do exist. The implementation I discuss here requires no additional space to manage the free segments in the pool. In fact, the header structure of this allocator contains only two fields, making it the smallest of all the allocators described in this note.

image

So how does it work? To manage the free segments, we use a data structure known as a linked list. Again, if you are not familiar with this data structure, try reading about it first. Having experience with the PlayStation 3 and Xbox 360, where memory access is expensive, I generally find node-based data structures (such as linked lists) rather grim, but this is probably one of those applications I approve of. Essentially, the allocator's header structure holds a pointer to a linked list. The linked list itself is spread throughout the pool, occupying the same space as the free segments in the memory buffer. When we initialise the allocator, we walk over the segments of the buffer and write into the first four (or eight) bytes of each segment a pointer holding the address of the next free segment. The header points to the first element of this list. One limitation of storing pointers in the free segments of the pool is that each segment must be at least the size of a pointer on the target hardware. See the diagram below.

image

When allocating from the pool, you simply move the head pointer of the linked list in the header to the second element of the list, and then return a pointer to the first. It is very simple: when allocating, we always return the first element of the linked list. Similarly, when we free a segment and return it to the pool, we simply insert it at the head of the linked list. Inserting a freed segment at the head rather than the tail has several advantages: first, we do not have to walk the linked list (or store an extra tail pointer in the header), and second (and more importantly), the just-freed node is more likely to still be in the cache on subsequent allocations. After several allocations and frees, your pool might look like this:
image

Here is a bit of C code for initialising the allocator and for the allocate and free operations.

/* allocate a chunk from the pool. */
void* poolAlloc(Pool* buf)
{
    if(!buf)
        return NULL;
    if(!buf->head)
        return NULL; /* out of memory */

    uint8_t* currPtr = buf->head;
    buf->head = (*((uint8_t**)(buf->head)));
    return currPtr;
}

/* return a chunk to the pool. */
void poolFree(Pool* buf, void* ptr)
{
    if(!buf || !ptr)
        return;

    *((uint8_t**)ptr) = buf->head;
    buf->head = (uint8_t*)ptr;
    return;
}

/* initialise the pool header structure,
   and set all chunks in the pool as empty. */
void poolInit(Pool* buf, uint8_t* mem, uint32_t size, uint32_t chunkSize)
{
    if(!buf || !mem || !size || !chunkSize)
        return;

    const uint32_t chunkCount = (size / chunkSize) - 1;
    for(uint32_t chunkIndex = 0; chunkIndex < chunkCount; ++chunkIndex)
    {
        uint8_t* currChunk = mem + (chunkIndex * chunkSize);
        *((uint8_t**)currChunk) = currChunk + chunkSize;
    }

    *((uint8_t**)&mem[chunkCount * chunkSize]) = NULL; /* terminating NULL */
    buf->mem = buf->head = mem;
    return;
}
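The listing uses buf->mem and buf->head but never shows the Pool structure. A self-contained sketch follows: the two-field layout is my assumption (the text says the header holds only two fields), and the functions are repeated from the listing above so the fragment compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the article states the header has only two fields,
   and the listing accesses buf->mem and buf->head. */
typedef struct _Pool {
    uint8_t *mem;  /* base address of the managed buffer  */
    uint8_t *head; /* first free chunk (NULL when full)   */
} Pool;

void* poolAlloc(Pool* buf)
{
    if(!buf)
        return NULL;
    if(!buf->head)
        return NULL; /* out of memory */

    uint8_t* currPtr = buf->head;
    buf->head = (*((uint8_t**)(buf->head))); /* head -> next free chunk */
    return currPtr;
}

void poolFree(Pool* buf, void* ptr)
{
    if(!buf || !ptr)
        return;

    /* push the freed chunk onto the head of the free list */
    *((uint8_t**)ptr) = buf->head;
    buf->head = (uint8_t*)ptr;
}

void poolInit(Pool* buf, uint8_t* mem, uint32_t size, uint32_t chunkSize)
{
    if(!buf || !mem || !size || !chunkSize)
        return;

    /* link every chunk to the next; the last one terminates the list */
    const uint32_t chunkCount = (size / chunkSize) - 1;
    for(uint32_t chunkIndex = 0; chunkIndex < chunkCount; ++chunkIndex)
    {
        uint8_t* currChunk = mem + (chunkIndex * chunkSize);
        *((uint8_t**)currChunk) = currChunk + chunkSize;
    }
    *((uint8_t**)&mem[chunkCount * chunkSize]) = NULL;

    buf->mem = buf->head = mem;
}
```

A freed chunk is handed out again on the very next allocation, which is the cache-friendly LIFO behaviour described above.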


A few words about stack allocation (alloca to the rescue)

If you remember, I said earlier that I would mention stack allocation in the context of the call stack. I am sure you have seen code that looks conceptually like this:

 myFunction()
 {
     myTemporaryMemoryBuffer = malloc(myMemorySize);
     /* ... use the temporary buffer ... */
     free(myTemporaryMemoryBuffer);
 }


Most C compilers support a feature that means (depending on the size of the allocation) you may not have to resort to heap allocations for temporary buffers of this kind. That feature is alloca. The internals of alloca are architecture-dependent, but in essence it allocates memory by adjusting your function's stack frame, allowing you to write data into the call stack area. This may be just a move of the stack pointer (exactly as in the linear allocator mentioned at the beginning). The memory allocated by alloca is freed when the function returns. There are, however, several pitfalls to be aware of when using alloca. Remember to check the size of the request, to be sure you are not asking for something crazy relative to the stack. If your stack overflows, the consequences will be unpleasant. For the same reason, if you are considering allocating a large chunk with alloca, it is better to know all the places your function can be called from in the context of the program's overall flow. Using alloca may affect portability in some limited cases (obviously), but I have yet to come across a compiler that does not support it.
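A minimal sketch of the pattern follows. The function and the size cap are invented for illustration; the header is <alloca.h> on glibc-style systems (MSVC exposes _alloca via <malloc.h> instead), which is exactly the portability caveat mentioned above.

```c
#include <alloca.h> /* non-standard; on MSVC use <malloc.h> and _alloca */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: make a temporary working copy of a string on the
   call stack instead of the heap, and count the spaces in it. The cap is
   an arbitrary safety limit so we never request a crazy amount of stack. */
#define SCRATCH_CAP 1024u

size_t countSpaces(const char* text)
{
    size_t len = strlen(text);
    if(len + 1 > SCRATCH_CAP)
        return 0; /* too big for the stack: fail (or fall back to the heap) */

    /* freed automatically when this function returns */
    char* scratch = (char*)alloca(len + 1);
    memcpy(scratch, text, len + 1);

    size_t spaces = 0;
    for(size_t i = 0; i < len; ++i)
        if(scratch[i] == ' ')
            ++spaces;
    return spaces;
}
```

Note the size check before the alloca call: it is exactly the "do not ask for a stack of crazy size" advice from the paragraph above, made mechanical.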

Further details can be found in your favorite search engine.

And finally ...

Often, memory allocated during a task's execution is temporary and lives only for a single frame. To combat the nasty effects of fragmentation in memory-limited systems, it is important to exploit this fact and place such allocations in separate buffers. Any of the allocation schemes above will work, but I would probably suggest trying one of the linear ones, since they are much easier to reset than the pool. Noel Llopis covers this topic in more detail in this excellent post. Which allocator is best for you depends on many factors, and on what the problem you are trying to solve requires. I would advise you to think carefully about the allocation and deallocation patterns you expect in your system, think about the sizes and lifetimes of the allocations, and try to manage them with allocators that make sense in that context. It sometimes helps to draw the memory layout on paper (yes, with a pencil and everything) or to sketch a graph of your memory usage over time. Believe it or not, this can really help you understand how a system produces and consumes data. Once you have done this, it is easier to see how to divide up your allocations so as to make memory management as easy, fast, and fragmentation-free as possible.

Keep in mind that I have covered here only a small selection of the simplest strategies, the ones I usually prefer when programming console games. I hope that if you have read this far and have not done such things before, your head is now buzzing with ideas about code you have written in the past that could have used these techniques to mitigate the significant drawbacks of dynamic memory allocation, or about other strategies you could use to solve limited-memory management problems without dynamic allocation.

In conclusion, I believe programmers should consider the consequences of dynamic memory allocation, especially in console games, and think twice about using the malloc function or the new operator. It is easy to convince yourself that you are not doing many allocations, so it does not matter much, but this kind of thinking spreads like an avalanche through a team and leads to a slow and painful death. Fragmentation and the performance costs of dynamic memory, if not nipped in the bud, can have disastrous, intractable consequences late in your development cycle. Projects in which memory management and allocation are not properly thought out often suffer random crashes after long play sessions due to running out of memory (which, by the way, is almost impossible to reproduce), and they cost hundreds of programmer hours spent trying to free up memory and reorganise allocations.

Remember: it is never too early to start thinking about memory, and whenever you do start, you will regret not having started even earlier!

Additional Information...

Here are some links on related topics, or on topics I could not cover:

gamesfromwithin.com/start-pre-allocating-and-stop-worrying
en.wikipedia.org/wiki/Circular_buffer
en.wikipedia.org/wiki/Buddy_memory_allocation
www.memorymanagement.org/articles/alloc.html

Thanks to Sarah, Jaymin, Richard and Dat for proof-reading this nonsense.

Translator again: I am not a game developer, so some domain-specific terms may not be translated quite correctly. For example, I had no idea what a synergistic processor is :) Translating the English wordplay was not easy either, but I hope the result is not the worst possible one.

Source: https://habr.com/ru/post/274827/

