Garbage Collector & C ++

Manual memory management with C ++ is simultaneously one of the biggest advantages and disadvantages in the language. Indeed, this paradigm allows you to create very productive programs, but it also gives rise to a lot of problems. There are several ways to get rid of them. I tried several and eventually came to the garbage collector. In this article, I want to present not so much the implementation of the garbage collector in C ++, but the history of the idea and its use, what it is to use, and why it was finally abandoned.

So, like most programmers, I have my own project, namely a two-dimensional game engine. All "experiments" were made on it.
')

Stage One: Memory Debugging

The most common problem with manual memory management is leaks. To learn about them you need to monitor the memory. That was my initial approach. We follow the creation and deletion of objects, check after the end of the program that remains not deleted. It is done very simply: we reload the new and delete operators so that they can take in the parameters the name of the source code file and the line where the allocation comes from, and store all the locations in one place. For convenience, we declare a macro, which transmits the file name and string to the operator. Accordingly, when deleting an object, we delete the record about the corresponding allocation.

Code

#include <vector> class MemoryManager { struct AllocInfo { const char* source; int line; int size; void* data; }; static MemoryManager instance; std::vector<AllocInfo> allocations; public: static void RegAllocation(void* ptr, int size, const char* source, int line) { AllocInfo info; info.data = ptr; info.size = size; info.source = source; info.line = line; instance.allocations.push_back(info); } static void UnregAllocation(void* ptr) { for (std::vector<AllocInfo>::iterator it = instance.allocations.begin(); it != instance.allocations.end(); ++it) { if (it->data == ptr) { instance.allocations.erase(it); return; } } } static void PrintInfo() { for (std::vector<AllocInfo>::iterator it = instance.allocations.begin(); it != instance.allocations.end(); ++it) printf("0x%x: %i bytes at %s: %i", it->data, it->size, it->source, it->line); } }; MemoryManager MemoryManager::instance; void* operator new(size_t size, const char* location, int line) { void* res = ::operator new(size); MemoryManager::RegAllocation(res, size, location, line); return res; } void operator delete(void* allocMemory, const char* location, int line) { MemoryManager::UnregAllocation(allocMemory); } void operator delete(void* allocMemory) { MemoryManager::UnregAllocation(allocMemory); } #define mnew new(__FILE__, __LINE__) int main() { int* data = mnew int; MemoryManager::PrintInfo(); return 0; }

pros

simple and clear
not expensive in terms of resources
during the execution of the program, we know the amount of memory involved

Minuses

to learn about leaks, you need to complete the program
not very informative exactly where allocations happen
using overloaded operators slightly changes the syntax

Stage Two: Smart Pointers

And the truth is, the most common solution to the problem of memory management is smart pointers. You are sure to be asked about them at the interview. The idea is simple: instead of the usual pointers, we use template classes that work as pointers, but have additional functionality for automatically releasing objects. Together with the object, we store the reference counter, when assigning a value to a pointer, we increase the counter, when we destroy the pointer, we decrease it. If the counter is zero, the object is not needed by anyone and it can be freed. However, there is a small problem - if two objects refer to each other, they will never be released, since both have a reference count equal to one.

Weak pointers solve this obsession problem. One of the pointers is made weak, and now the reference counters are zero and both objects can be freed.

Well, let's think of a more complicated situation.

This situation can also be resolved with weak / strong pointers, but will it be easier to manually control? If the programmer does something wrong, the result will be the same: a leaked, un-freed object. In fact, such situations are rare and, in general, such an approach greatly simplifies the work with memory.

pros

does not require a lot of resources
guarantees the correct release of objects with the right architecture
easy to learn

Minuses

complicates the syntax
in some situations it is difficult to properly build weak / strong pointers
looped links lead to leaks

Stage Three: Cycling

After testing smart pointers, I remained dissatisfied. I just want to use the same type of smart pointer everywhere, and not to think about looped links. Let him think about them. But how to do that? Imagine the situation:

It is necessary to somehow find out that the two lower objects are looped and can be deleted, because no one refers to them. According to the picture, it is already easy to guess: if the links from the object do not lead to higher-level objects, then it can be released. Top-level objects are, roughly speaking, those objects that begin application initialization. For C ++, these are objects on the stack and static.

Stage Four: Garbage Collector

I think the attentive reader already understood that this is the scheme of the garbage collector, only the opposite. The simplest collector works like this: going down the links from top-level objects, we mark everyone as relevant, after that we have the right to remove those that are not marked.

For implementation, we need the same template pointer class as in the case of smart pointers. Plus, you need the collector himself, who will monitor everything and do the actual garbage collection.

Slightly more code

 #include <vector> #include <map> #include <algorithm> class IPtr; template<typename T> struct Destroyer { static void Destroy(void* obj) { (*(T*)(obj)).~T(); } }; class MemoryManager { public: struct ObjectInfo { void* object; size_t size; bool mark; const char* source; int line; std::vector<IPtr*> pointers; void(*destroy)(void*) = nullptr; }; private: static MemoryManager instance; std::map<void*, ObjectInfo*> objects; std::vector<IPtr*> pointers; size_t allocatedBytes = 0; bool currentMark = true; public: static void CollectGarbage(); protected: void MarkObject(ObjectInfo* info); friend void* operator new(size_t size, const char* source, int line); friend void operator delete(void* object, const char* source, int line); friend void operator delete(void* object); template<typename T> friend class Ptr; }; MemoryManager MemoryManager::instance; class IPtr { protected: void* object; MemoryManager::ObjectInfo* info; public: virtual ~IPtr() {} virtual bool IsRoot() const = 0; protected: void MarkInvalid() { object = nullptr; info = nullptr; } friend void operator delete(void* object); friend class MemoryManager; }; template<typename T> class Ptr: public IPtr { public: Ptr() { object = nullptr; info = nullptr; MemoryManager::instance.pointers.push_back(this); } Ptr(T* object) { this->object = object; auto fnd = MemoryManager::instance.objects.find(object); if (fnd != MemoryManager::instance.objects.end()) { info = fnd->second; info->pointers.push_back(this); if (!info->destroy) info->destroy = &Destroyer<T>::Destroy; } MemoryManager::instance.pointers.push_back(this); } Ptr(const Ptr<T>& other) { object = other.object; info = other.info; if (info) info->pointers.push_back(this); MemoryManager::instance.pointers.push_back(this); } ~Ptr() { if (info) { auto fnd = std::find(info->pointers.begin(), info->pointers.end(), this); if (fnd != info->pointers.end()) info->pointers.erase(fnd); } auto fnd = std::find(MemoryManager::instance.pointers.begin(), MemoryManager::instance.pointers.end(), this); if (fnd != MemoryManager::instance.pointers.end()) MemoryManager::instance.pointers.erase(fnd); } T* Get() const { return (T*)object; } bool IsValid() const { return object != nullptr; } bool IsRoot() const { return false; } operator bool() { return object != nullptr; } operator T*() { return (T*)object; } T* operator->() { return (T*)object; } T& operator*() { return *(T*)object; } const T& operator*() const { return *(T*)object; } Ptr<T>& operator=(const Ptr<T>& other) { if (info) { auto fnd = std::find(info->pointers.begin(), info->pointers.end(), this); if (fnd != info->pointers.end()) info->pointers.erase(fnd); } object = other.object; info = other.info; if (info) info->pointers.push_back(this); return *this; } Ptr<T>& operator=(T* other) { if (info) { auto fnd = std::find(info->pointers.begin(), info->pointers.end(), this); if (fnd != info->pointers.end()) info->pointers.erase(fnd); } object = other; auto fnd = MemoryManager::instance.objects.find(object); if (fnd != MemoryManager::instance.objects.end()) { info = fnd->second; info->pointers.push_back(this); if (!info->destroy) info->destroy = &Destroyer<T>::Destroy; } else info = nullptr; return *this; } }; template<typename T> class RootPtr: public Ptr<T> { public: RootPtr():Ptr() {} RootPtr(T* object):Ptr(object) {} RootPtr(const Ptr<T>& other):Ptr(other) {} bool IsRoot() const { return true; } operator bool() { return object != nullptr; } operator T*() { return (T*)object; } T* operator->() { return (T*)object; } T& operator*() { return *(T*)object; } const T& operator*() const { return *(T*)object; } RootPtr<T>& operator=(const Ptr<T>& other) { Ptr<T>::operator=(other); return *this; } RootPtr<T>& operator=(T* other) { Ptr<T>::operator=(other); return *this; } }; void* operator new(size_t size, const char* source, int line) { void* res = malloc(size); MemoryManager::ObjectInfo* objInfo = new MemoryManager::ObjectInfo(); objInfo->object = res; objInfo->size = size; objInfo->mark = MemoryManager::instance.currentMark; objInfo->source = source; objInfo->line = line; MemoryManager::instance.objects[res] = objInfo; MemoryManager::instance.allocatedBytes += size; return res; } void operator delete(void* data, const char* source, int line) { delete data; } void operator delete(void* data) { auto fnd = MemoryManager::instance.objects.find(data); if (fnd != MemoryManager::instance.objects.end()) { MemoryManager::instance.allocatedBytes -= fnd->second->size; for (auto ptr : fnd->second->pointers) ptr->MarkInvalid(); delete fnd->second; MemoryManager::instance.objects.erase(fnd); } free(data); } void MemoryManager::CollectGarbage() { instance.currentMark = !instance.currentMark; for (auto ptr : instance.pointers) { if (ptr->IsRoot()) { if (ptr->info) instance.MarkObject(ptr->info); } } std::vector< std::map<void*, ObjectInfo*>::iterator > freeObjects; for (auto obj = instance.objects.begin(); obj != instance.objects.end(); ++obj) { if (obj->second->mark != instance.currentMark) freeObjects.push_back(obj); } for (auto obj : freeObjects) { instance.allocatedBytes -= obj->second->size; obj->second->destroy(obj->first); free(obj->first); for (auto ptr : obj->second->pointers) ptr->MarkInvalid(); delete obj->second; instance.objects.erase(obj); } } void MemoryManager::MarkObject(ObjectInfo* info) { info->mark = MemoryManager::instance.currentMark; char* left = (char*)info->object; char* right = left + info->size; for (auto ptr : instance.pointers) { char* cptr = (char*)ptr; if (cptr >= left && cptr < right) { if (ptr->info && ptr->info->mark != MemoryManager::instance.currentMark) MarkObject(ptr->info); } } } #define mnew new(__FILE__, __LINE__) struct B; struct C; struct D; struct A { Ptr<B> pb; Ptr<C> pc; A() { printf("A()\n"); } ~A() { printf("~A()\n"); } }; struct B { Ptr<C> pc; B() { printf("B()\n"); } ~B() { printf("~B()\n"); } }; struct C { Ptr<D> pd; C() { printf("C()\n"); } ~C() { printf("~C()\n"); } }; struct D { Ptr<C> pc; D() { printf("D()\n"); } ~D() { printf("~D()\n"); } }; int main() { RootPtr<A> pa = mnew A; pa->pb = mnew B; pa->pc = mnew C; pa->pc->pd = mnew D; pa->pc->pd->pc = pa->pc; pa->pc = nullptr; MemoryManager::CollectGarbage(); return 0; }

How it works

For each created object, ObjectInfo meta-information is created and stored in the MemoryManager. Each such object is created by an overloaded new operator. ObjectInfo stores information about the size of an object, the place from which it was created, a list of pointers to it and a pointer to a function to call the destructor of this object.

To work with objects, instead of the usual pointers, the template classes Ptr and RootPtr are used . RootPtr is required to indicate top-level objects, since during the execution of the program it is impossible to know that the object was created on the stack or statically (correct me if I'm wrong). When initializing or copying pointers, pointers are added and removed from the corresponding ObjectInfo .

When you call garbage collection, the Boolean variable of the marker is reversed and the object marking cycle begins. The cycle goes through high-level pointers, then recursively through their child pointers. The child pointer is the one that is in the body of the object. After that, all objects for which the marker does not match the current one are deleted.

An attentive reader will surely notice one more thing: we pay for productivity with productivity. We get all the typical cons of garbage collectors: excessive memory consumption and a lot of garbage collection. It is on my project of the game engine that is a fatal minus, because every frame should take a few milliseconds and there is simply no time to stop everything and collect garbage. However, there is a good point, this garbage collector is completely ours and we can do whatever we want!

Stage Five: Improvisation

The first thing that comes to mind is that we completely manage the moment of garbage collection. It is not necessary to do this in the middle of the gameplay. You can do this just when the greatest number of initializations and destruction of objects occur during loading levels. When developing games, it is not customary to do a lot of allocations / destruction during an active game. If at each shot to load and create an object bullet, no memory optimization will not save you. Therefore, to do a long garbage collection between levels seems like a rather tenacious idea.

The next idea is not to call garbage collection at all or to do it extremely rarely. Objects are still deleted manually or by smart pointers, and the garbage collector can be used as a debugger and safety net. In this embodiment, we get the performance equivalent to manual memory management and safety from leaks during garbage collection.

Thus, we can use the collector in different ways. With rapid prototyping, we are not concerned with performance, but we need stability, in which case we use automatic assembly. In a normal situation, we try to manually manage the memory, and the collector tells us and insures. And of course, you can simply turn it off and go to fully manual control.

In addition, you can realize a little more "goodies". Since we use template pointer classes, we can check them for correctness, that is, when deleting an object manually, make all references to it invalid. And then in the right places to check them. Another possible improvement is memory defragmentation. At the stage of garbage collection, you can rearrange the actual objects in the memory to reduce the processor cache misses, which will undoubtedly give a performance boost.

pros

protection against leaks and the use of incorrect pointers
minimum costs when working in semi-automatic mode
maximum convenience with fully automatic mode

Minuses

syntax complication
understanding of the pattern of operation and use is necessary
garbage collection process still takes considerable time

Stage Six: Return to Top

In the end, the decision to refuse the collector was influenced by the specifics of my project. The project is planned to be open source and it focuses primarily on usability. The increasing complexity of the already complicated C ++ syntax with specific pointers and the addition of a garbage collector will undoubtedly have a bad impact on the project. Just imagine the developer’s acquaintance with the new technology: he needs to learn a new API, and also with a tricky memory management model, especially since most C ++ programmers manage the memory well enough so that they can be managed manually.
Finally, I made sure to return to the manual model when I decided to use scripts. That they are needed for simplicity and convenience. You make a prototype or a simple game - use scripts, save time and resources. Flexibility and performance are needed - welcome to C ++. That who will use With ++ actually the garbage collector is hardly required.

That's how I returned to the beginning. I hope my experience will be useful or at least interesting to other cyclists.

PS The code from the article is not optimized and is provided only for understanding the work.

Source: https://habr.com/ru/post/282544/

All Articles

Garbage Collector & C ++

Stage One: Memory Debugging

Stage Two: Smart Pointers

Stage Three: Cycling

Stage Four: Garbage Collector

Stage Five: Improvisation

Stage Six: Return to Top

More articles: