Working with dynamic memory is often a bottleneck in many algorithms unless special tricks are used.
In this article I will look at a couple of such techniques. The examples here differ (for example, from this one) in that they overload the operators new and delete, so the syntactic constructions stay minimal and reworking an existing program is simple. I will also describe the pitfalls found along the way (of course, the gurus who have read the standard from cover to cover will not be surprised).
0. Do we need manual work with memory?
First of all, let's check how much a smart allocator can speed up work with memory.
Let's write simple tests in C++ and C# (C# is known for its excellent memory manager, which divides objects into generations, uses different pools for objects of different sizes, and so on).
C++:
class Node { public: Node* next; };

C#:
class Node { public Node next; }
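The measurement loop itself is not listed here, so below is a minimal sketch of what the C++ side of such a test might look like; the node count, the list-building pattern and the use of clock() are my assumptions, not the author's original benchmark.

#include <ctime>
#include <cstdio>

class Node { public: Node* next; };

int main()
{
    const int N = 10000000;            // assumed number of nodes
    clock_t start = clock();

    // build a singly linked list with plain new
    Node* head = NULL;
    for (int i = 0; i < N; ++i) {
        Node* n = new Node();
        n->next = head;
        head = n;
    }

    // the "good manners" cleanup that widens the gap versus C#
    while (head) {
        Node* n = head;
        head = head->next;
        delete n;
    }

    printf("%.0f ms\n", (clock() - start) * 1000.0 / CLOCKS_PER_SEC);
    return 0;
}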
Despite the “spherical horse in a vacuum” nature of the example, the difference in time turned out to be tenfold (62 ms versus 650 ms). Moreover, the C# example is finished at this point, whereas good manners in C++ require deleting the allocated objects, which widens the gap even further (to 2580 ms).
1. Pool of objects
The obvious solution is to take a large block of memory from the OS and divide it into equal blocks of sizeof(Node) bytes; when memory is allocated, a block is taken from the pool, and when it is freed, it is returned to the pool. The pool is easiest to organize as a singly linked list (a stack).
Since the goal is minimal intervention in the program, all we can do is add the BlockAlloc mixin to the Node class:
class Node : public BlockAlloc<Node>
First of all, we need a pool of large blocks (pages) taken from the OS or the C runtime. It could be built on top of the malloc and free functions, but for greater efficiency (to skip an extra level of abstraction) we use VirtualAlloc/VirtualFree. These functions allocate memory in multiples of 4 KB and reserve process address space in multiples of 64 KB. By specifying the commit and reserve options at the same time, we skip yet another level of abstraction, reserving the address space and committing memory pages in a single call.
PagePool class

inline size_t align(size_t x, size_t a) { return ((x-1) | (a-1)) + 1; }
//#define align(x, a) ((((x)-1) | ((a)-1)) + 1)

template<size_t PageSize = 65536>
class PagePool
{
public:
    void* GetPage()
    {
        void* page = VirtualAlloc(NULL, PageSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        pages.push_back(page);
        return page;
    }
    ~PagePool()
    {
        for (vector<void*>::iterator i = pages.begin(); i != pages.end(); ++i) {
            VirtualFree(*i, 0, MEM_RELEASE);
        }
    }
private:
    vector<void*> pages;
};
Next, on top of it, we organize a pool of blocks of a given size.
BlockPool class

template<class T, size_t PageSize = 65536, size_t Alignment = 8 /* sizeof(void*) */>
class BlockPool : PagePool<PageSize>
{
public:
    BlockPool() : head(NULL)
    {
        BlockSize = align(sizeof(T), Alignment);
        count = PageSize / BlockSize;
    }
    void* AllocBlock()
    {
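        // todo: lock (this)
        // (the listing stops at this point; the rest of the class below is a minimal
        //  sketch of how it could continue: a singly linked free list threaded through
        //  the blocks themselves, with FormatNewPage as an assumed helper name)
        if (head == NULL)
            FormatNewPage();
        void* tmp = head;
        head = *(void**)head;        // pop the first free block
        return tmp;
    }
    void FreeBlock(void* tmp)
    {
        // todo: lock (this)
        *(void**)tmp = head;         // push the block back onto the free list
        head = tmp;
    }
private:
    void* head;
    size_t BlockSize;
    size_t count;
    void FormatNewPage()
    {
        char* page = (char*)this->GetPage();   // this-> is needed because the base class is a template
        head = page;
        // link the blocks of the fresh page in increasing address order,
        // so that consecutive allocations move forward through memory
        char* p = page;
        for (size_t i = 1; i < count; ++i, p += BlockSize)
            *(void**)p = p + BlockSize;
        *(void**)p = NULL;
    }
};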
Places marked // todo: lock (this) require inter-thread synchronization (for example, with EnterCriticalSection or boost::mutex).
Let me explain why, when “formatting” a page, I do not use the FreeBlock abstraction to add each block to the pool. If it were written as something like
for (size_t i = 0; i < PageSize; i += BlockSize) FreeBlock((char*)tmp+i);
then, because the free list is a stack (LIFO), the page would end up marked up “backwards”:

Several blocks requested from the pool in a row would have decreasing addresses, and the processor does not like walking backwards through memory, since that defeats hardware prefetching (UPD: no longer relevant for modern processors). If the markup were instead done with the loop
for (size_t i = PageSize - (BlockSize - (PageSize % BlockSize)); i >= BlockSize; i -= BlockSize) FreeBlock((char*)tmp + i - BlockSize); // loop end condition and body assumed; the point is that it walks the page from its end toward its beginning
then the markup loop itself would walk backwards through the addresses.
Now that the preparations are done, we can describe the mixin class itself.
template<class T>
class BlockAlloc
{
public:
    static void* operator new(size_t s)
    {
        if (s != sizeof(T)) {
            return ::operator new(s);
        }
        return pool.AllocBlock();
    }
    static void operator delete(void* m, size_t s)
    {
        if (s != sizeof(T)) {
            ::operator delete(m);
        } else if (m != NULL) {
            pool.FreeBlock(m);
        }
    }
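    // (the listing stops here; presumably the class also holds the shared pool,
    //  roughly like this: a sketch, the exact template arguments are an assumption)
private:
    static BlockPool<T> pool;
};

// out-of-class definition of the static pool (for a class template it may live in the header)
template<class T> BlockPool<T> BlockAlloc<T>::pool;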
Let me explain why the checks
if (s != sizeof(T))
are needed. When do they fire? When a class derived from T is created or deleted.
Derived classes will use the ordinary new/delete, although BlockAlloc can be added to them as well. This way we can easily and safely decide which classes should use pools, without fear of breaking something in the program. Multiple inheritance also works fine with this mixin.
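For illustration, here is a hypothetical subclass (the name FatNode and its extra field are mine, not from the article); because its size differs from sizeof(Node), the inherited operator new silently falls back to the global one:

class Node : public BlockAlloc<Node> { public: Node* next; };

class FatNode : public Node { public: double weight; };   // hypothetical subclass with extra data

Node*    a = new Node();      // s == sizeof(Node): taken from the block pool
FatNode* b = new FatNode();   // s != sizeof(Node): handled by ::operator new
delete a;                     // returned to the pool
delete b;                     // released by ::operator delete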
Done. We inherit Node from BlockAlloc and re-run the test.
The test now takes 120 ms, five times faster. But the C# allocator is still better; apparently there is more to it than a simple linked list. (If delete is called immediately after each new, so that little memory is wasted and the data stays in the cache, we get 62 ms. Strange; that is exactly like the .NET CLR, as if it returned freed local objects to the corresponding pool immediately, without waiting for the GC.)
2. A container and its motley contents
Do you often come across classes that store many different child objects whose lifetime is no longer than the lifetime of their parent?
For example, it could be an XmlDocument class filled with Node and Attribute objects, as well as C strings (char*) taken from the text inside the nodes. Or the list of files and directories in a file manager, loaded once when the directory is read and never changing afterwards.
As the introduction showed, delete is more expensive than new. The idea of the second part of the article is to allocate memory for the child objects from a large block tied to the parent object. When the parent object is destroyed, the children's destructors are called as usual, but their memory does not need to be returned one piece at a time: it is released in a few large blocks.
Let's create a PointerBumpAllocator class that can carve pieces of various sizes off a large block and grab a new large block when the old one runs out.
PointerBumpAllocator class

template<size_t PageSize = 65536, size_t Alignment = 8>
class PointerBumpAllocator
{
public:
    PointerBumpAllocator() : free(0) { }
    void* AllocBlock(size_t block)
    {
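        // todo: lock (this)
        // (the listing stops at this point; below is a minimal sketch of how the class
        //  could continue: round the request up, bump the pointer, grab a fresh page
        //  when the current one runs out; member names and the typedef are assumptions)
        block = align(block, Alignment);
        if (block > free) {              // assumes block <= PageSize
            head = GetPage();
            free = PageSize;
        }
        void* tmp = head;
        head = (char*)head + block;
        free -= block;
        return tmp;
    }
    ~PointerBumpAllocator()
    {
        for (vector<void*>::iterator i = pages.begin(); i != pages.end(); ++i)
            VirtualFree(*i, 0, MEM_RELEASE);
    }
private:
    void* GetPage()
    {
        void* page = VirtualAlloc(NULL, PageSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        pages.push_back(page);
        return page;
    }
    void* head;
    size_t free;
    vector<void*> pages;
};

typedef PointerBumpAllocator<> DefaultAllocator;   // assumed definition of the DefaultAllocator used below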
Finally, let's describe the ChildObject mixin, whose overloaded new and delete use the given allocator:
template<class T, class A = DefaultAllocator>
struct ChildObject
{
    static void* operator new(size_t s, A& allocator)
    {
        return allocator.AllocBlock(s);
    }
    static void* operator new(size_t s, A* allocator)
    {
        return allocator->AllocBlock(s);
    }
    static void operator delete(void*, size_t) { }   // *1
    static void operator delete(void*, A*) { }
    static void operator delete(void*, A&) { }
private:
    static void* operator new(size_t s);
};
In this case, besides adding the mixin to the child class, we also have to adjust every call to new (or use the “factory” pattern). The syntax of the new operator is as follows:
new (... parameters for the operator ...) ChildObject (... constructor parameters ...)
For convenience, I specified two new operators that accept A & or A *.
If the allocator is added to the parent class as a member, the first option is more convenient:
node = new(allocator) XmlNode(nodename)
If the allocator is added as an ancestor (admixture), the second is more convenient:
node = new(this) XmlNode(nodename)
Obviously, a pointer and a reference are trivially convertible to one another; separating the two cases just saves a few extra characters.
There is no special syntax for calling delete; the compiler calls the ordinary operator delete (marked *1), no matter which of the new operators was used to create the object. So the delete syntax is the usual one:
delete node;
If an exception is thrown in the constructor of ChildObject (or of a class derived from it), the operator delete whose signature matches the operator new used to create the object is called (with the first size_t parameter replaced by void*).
Declaring the plain operator new in the private section protects against calling new without specifying an allocator.
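A quick illustration (using the same names as in the snippets above):

XmlNode* bad = new XmlNode(nodename);             // does not compile: operator new(size_t) is private
XmlNode* ok  = new(allocator) XmlNode(nodename);  // fine: the public allocator-taking overload is used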
Let me give you a complete example of using the Allocator-ChildObject pair:
Example

class XmlDocument : public DefaultAllocator
{
public:
    ~XmlDocument()
    {
        for (vector<XmlNode*>::iterator i = nodes.begin(); i != nodes.end(); ++i) {
            delete (*i);
        }
    }
    void AddNode(char* content, char* name)
    {
        char* c = (char*)AllocBlock(strlen(content) + 1);
        strcpy(c, content);
        char* n = (char*)AllocBlock(strlen(name) + 1);
        strcpy(n, name);
        nodes.push_back(new(this) XmlNode(c, n));
    }
    class XmlNode : public ChildObject<XmlNode, XmlDocument>
    {
    public:
        XmlNode(char* _content, char* _name) : content(_content), name(_name) { }
    private:
        char* content;
        char* name;
    };
private:
    vector<XmlNode*> nodes;
};
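A small usage sketch (the main function and the sample strings are mine, added for illustration):

int main()
{
    XmlDocument* doc = new XmlDocument();
    doc->AddNode((char*)"first node text",  (char*)"node1");
    doc->AddNode((char*)"second node text", (char*)"node2");
    // deleting the document runs the XmlNode destructors as usual, while all the
    // memory carved out by the allocator is released in whole pages
    delete doc;
    return 0;
}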
Conclusion
The article was written a year and a half ago for the sandbox, but alas, the moderator did not like it.