📜 ⬆️ ⬇️

C ++ garbage collector

Hi, Habr! I conceived this article for a long time. It will be about the simplest copying garbage collector in C ++. It has quite a lot of restrictions (the part does not interfere, some can be circumvented if you aim to write some serious library, and for something it would be nice to have rudimentary support from the language), but the code in it is just over 100 lines. Interested please under the cat. There is a minimum of PLO, simple patterns and terrible magic rituals with pointers.

Start over. What is a garbage collector and what is it for?

A garbage collector (Gargabe Collector, GC) is such a way to manage resources, usually RAM, allocated on the heap. The essence is simple - the programmer asks the garbage collector to allocate a piece of memory to him, and he himself already determines when he becomes unnecessary and can be released. This solves most problems with memory leaks, although it makes it impossible to heroically exalt segmentation errors. What a pity.

Garbage collectors are of two kinds - conservative and copying.
')
Something like the first type is implemented in the last standard. The mechanism shared_ptr allows you to abandon the explicit use of the operators new and delete. It considers references that point to an object and gets rid of it when their number becomes zero. Problems with this approach arise when many small objects are created with a short but not the same lifespan. As a result, the heap turns into a bloody mess of valid objects and rags of free memory of several dozen bytes. Because of this, the creation of new objects begins to take forever and the native code begins to envy Python.

To solve this problem - heap fragmentation - a second type of collector was invented - copying. For the time being, his strategy of dealing with garbage is contemplation. When it becomes too much, he does the following:
1. Copies all the necessary data to another area of ​​memory.
2. Changes all pointers to actual ones.
3. Frees all memory that is no longer in use.

At once I’ll clarify that I didn’t look closely at a single garbage collection algorithm, nor how the “adult” GC libraries for C ++ work. Probably, the algorithm, which I will now describe, has a name, perhaps a nominal one, but here I will manage without references to sources.

To identify the objects that the program still needs, I suggest treating memory as a regular graph. Then the “living” objects after garbage collection will be those that are accessible through a chain of pointers. Here questions arise. First, as for any possible object that a programmer can ask to create, determine where he has pointers. The first way is using template magic for each class of objects to create its own allocator. A terrible idea for many reasons. The second is to write in the header of each object all the information that is needed for the operation of the GC (oh, yes, for classes with virtual functions this implementation is not appropriate. I have a couple of ideas, if necessary).

The title can also be used in many ways. Mine is the simplest one that can exist at all (for implementation, but not use). First, each object that plans to be created through the garbage collector must have this structure at the beginning of its definition:

enum { STRUCT_SZ = 0, REF_COUNT = 1 }; struct gcHeader { union { int gcData[2]; gcHeader* post_gcAddress; }; }; 

Secondly, right after the heading, and nowhere else, all the pointers, which also belong to the garbage collector, should go. Accordingly, their number will be stored in gcData [REF_COUNT]. This is one of the limitations that my implementation imposes. GcData [STRUCT_SZ] will store the size of the structure in bytes. I will reveal the purpose of the pointer later. Conveniently, the size of the structure was equal to the size of the pointer (now 2014, folks!).

Great, now we can bypass all our memory. The question is where to start the traversal. The only memory area that is 100% available to us at any time is the stack. The problem is the same as with pointers in structures - we have no idea which bunch of bytes points to a GC object. Therefore, you need to write down the location of each such pointer explicitly.

 vector<gcHeader**> referencesStack; template <class T> class stackRef { public: T* ref; stackRef(){ ref = nullptr; referencesStack.push_back(reinterpret_cast<gcHeader**>(&ref)); } ~stackRef(){ referencesStack.pop_back(); } }; 

The stackRef class is simple. When initialized, it only adds its address to the stack of links. The destructor, accordingly, deletes the irrelevant address from the same stack. The work of the call stack and constructors with destructors is synchronized, so there will be no anomalies.

In the class, you need to redefine a bunch of operators - dereferencing, assignment, etc, but it will make sense to do this no sooner than the guys from the Boost Foundation will contact me.

Auxiliary structures are ready. You can go to the memory allocation.

A cool feature of this way of managing resources is exactly the way they are allocated. The standard C ++ C ++ allocator has to update the lists of free blocks after each delete, and after new find the appropriate block among the blocks, then split it into two small blocks, and then something else that modern allocators do there. In the case of a garbage collector, the delete operation is simply not needed, so that all the occupied memory will go in one solid block. To create a new object, you only need to increase the size of this block, i.e., move one pointer. A simple operation that is performed in O (1). It really ceased to seem such a great idea, because it provokes a bunch of assemblies when it would be possible to simply reuse the unnecessary memory, but for now you can stop there.

We divide the memory controlled by the garbage collector into pieces of 4 kilobytes and link them to the list. In fact, a little more than 4 kilobytes, but this is a question of my laziness.

 const int CHUNK_SIZE = 4096; const int OVERDRAFT = 128; const int ACTUAL_SIZE = CHUNK_SIZE + OVERDRAFT; //      15  ,  -   . struct gcChunk { gcChunk* next; char data[ACTUAL_SIZE];//     . }; gcChunk* firstChunk; gcChunk* currentChunk; int chunkCount; int currentOffset; 

firstChunk is the beginning of the list, currentChunk is the last created block of memory. urrentOffset - the beginning of a free memory segment relative to currentChunk.

 gcHeader* gcRawAlloc(int size, int refCount){ if (size > CHUNK_SIZE)//  proof of concept,      return nullptr; if (currentOffset + size > CHUNK_SIZE){ //     list<gcChunk>  STL,   ,       . ++chunkCount; currentChunk->next = new gcChunk(); currentChunk = currentChunk->next; currentChunk->next = nullptr; currentOffset = 0; } gcHeader* new_obj = reinterpret_cast<gcHeader*>(&(currentChunk->data[currentOffset])); new_obj->gcData[STRUCT_SZ] = size; new_obj->gcData[REF_COUNT] = (refCount << 1)| 1; currentOffset += size; if (currentOffset % 4)//  4 .  ,    , ..         . currentOffset += 4 - currentOffset % 4; return new_obj;//        ,       ,    —   . } 


There are more unobvious moments here. We analyze the 12th line.
At this stage, it is more convenient for us not to think about exactly which type of the new object. We know for sure that it has our gcHeader, and that’s enough for now.
After we allocate memory for a new object, we need to initialize its header. What can mean mysterious assignment

 temp->gcData[REF_COUNT] = (refCount << 1)| 1; 

?

To do this, look again at the title definition.
 struct gcHeader { union { int gcData[2]; gcHeader* post_gcAddress; }; }; 


The union keyword means that the gcData array and the post_gcAddress pointer are located at the same address. This is useful for saving memory, but the problem is that C ++ does not remember how the data in the union was used last time — as an array, or as a reference. This feature of processor architectures, such as the need for data alignment, helps.

In short, any variables longer than one byte must be located at addresses that are multiples of the machine word in bytes. Disruption of alignment on modern processors slows down the work of the program, and the old ARMs generally refuse to work in such conditions. As a result, 2 or 3 low bits of the pointer can be used at any time at the discretion of the programmer. For example, when implementing red-black trees, the last bit is often used instead of a boolean variable.

This is the same thing. If the low-order bit is one, then these eight bytes are guaranteed to be an array of two int. You can, for example, use another bit to indicate to the garbage collector, something like "this is a reference to a polymorphic object, it has a vtable pointer, do not rub it."

Well, a small wrapper over the function so that the use of the allocator does not cause much pain.
 template <class T> T* gcAlloc(){ return reinterpret_cast<T*>(gcRawAlloc(sizeof(T), T::refCount)); } 
Here it is necessary to put emplace new so that objects with constructors are correctly initialized. As you can see, the class of the object we want to create must have a static constant refCount. It can be calculated automatically using an external preprocessor. Otherwise, I see at least three ways to shoot myself a leg.

Before using this function, you need to initialize the heap.
 void gcInit(){ firstChunk = currentChunk = new gcChunk; firstChunk->next = nullptr; currentOffset = 0; chunkCount = 1; } 

It's time to look at the implementation of the garbage collection itself.

The first function, gcCollect, should start the heap from scratch, not forgetting to save pointers to the old list. These lines almost repeat initialization.

 void gcCollect(){ //execvp("rm", "cppgc.cpp");//       . gcChunk* newFirstChunk = currentChunk = new gcChunk; currentChunk->next = nullptr; currentOffset = 0; chunkCount = 1; 


Next, we begin the build process with each pointer that is stored on the stack.

  for (auto i = referencesStack.begin();i != referencesStack.end(); ++i ) gcMove(*i); 


And now just freeing unnecessary memory.

  // ,       ,     gcChunk iter = firstChunk; firstChunk = newFirstChunk; while (iter != nullptr){ gcChunk* t = iter->next; delete[] iter; iter = t; } } 


Note that delete is called only for large blocks of memory. Thus, destructors of objects in the garbage collector will never be called. This is not a problem for classes where destructors only free memory, but there is no possibility, for example, to automatically close connections and file descriptors. Mark & ​​Sweep algorithms can help with this, but writing them is much more difficult.

The final touch is the gcMove function.

 bool isPointer(gcHeader a){ return (a.gcData[REF_COUNT] & 1) == 0; } void gcMove(gcHeader** current){ if (*current == nullptr) return; if (isPointer(**current)){//    .   ,    . (*current) = (*current)->post_gcAddress; return; } gcHeader* new_obj = gcRawAlloc((*current)->gcData[STRUCT_SZ], (*current)->gcData[REF_COUNT]); memcpy(new_obj, (*current), sizeof(char) * (*current)->gcData[STRUCT_SZ]); gcHeader** iterator = reinterpret_cast<gcHeader**>(new_obj) + 1; (*current)->post_gcAddress = new_obj; (*current) = new_obj; int refCount = new_obj->gcData[REF_COUNT] >> 1; for (int i = 0; i < refCount; ++i, ++iterator) gcMove(iterator); } 


Let's sort the function from the middle. We need the ability to redirect links, so a pointer to a pointer to the data is passed to the function.

GC allocates the object the necessary amount of memory in the new heap (it knows how much from the header) and copies all the data from the old incarnation to the new one. Then he writes a new address of the object in the old title. Now, if the object is pointed to by several references, the algorithm will be able to determine that the object has already been moved once (the low-order bit, guaranteed, 0) and will not copy it once again afterwards. It remains to redirect the old pointer to a new copy of the object.
Now, you need to deal with the pointers of the object itself. You need to do the same with them. Line

 gcHeader** iterator = reinterpret_cast<gcHeader**>(temp) + 1; 


gets a pointer to the first link in the structure (if there is one, of course). Remember that sizeof (gcHeader) == sizeof (void *). Otherwise, it will take a couple more lines.
What to do next, the question is already controversial. I simply recursively call the gcMove function for each pointer. Such an algorithm corresponds to bypassing the graph in depth . However, this is not the best choice.
The killer feature of copying garbage collectors for me is the ability to maintain link locality. In short, objects that refer to each other, and in the memory should also be as close as possible. So the processor will be able to more efficiently use its cache memory.

Hidden text
You can conduct an experiment. Create a linked list and start inserting elements into arbitrary places. And then - just go around and print the entire list. Most likely, the C ++ program will perform the last stage longer than Java or C # after the forced garbage collection. This is due to the fact that in the case of C ++, the processor will constantly stop at the cache misses and wait for the data from the slow RAM to arrive. In the case of Java, this will be practically a traversal of the whole array.


My GC does not know how. I chose to crawl in depth because of the simplicity. It is advisable to move objects in the order of walking wide . It will be very cool to be confused and to line up objects in memory in accordance with the statistics of calls to them, as in this algorithm , in order to achieve the optimal number of misses.

That's all. It remains to demonstrate the work with the collector.

As an example, take the simplest search tree .
 struct searchTree { gcHeader gc; searchTree* left; searchTree* right; int key; static const int refCount = 2; }; 


As already mentioned, there should be a header at the beginning, and after the title all pointers to objects of our heap.

 void stAdd(searchTree* &target, int key){ if (target == nullptr){ target = gcAlloc<searchTree>(); target->left = target->right = nullptr; target->key = key; return; } if (target->key == key) return; if (target->key < key) stAdd(target->left, key); else stAdd(target->right, key); } 


The usual addition to the tree. Notice how gcAlloc is used.

 searchTree* stFind(searchTree* target, int key){ if (target == nullptr || target->key == key) return target; if (target->key < key) return stFind(target->left, key); else return stFind(target->right, key); } void stPrint(searchTree* t, int indent = 0){ if (t == nullptr) return; for (int i = 0; i < indent; ++i) cout << " "; cout << t << ' ' << t->key << endl; stPrint(t->left, indent + 1); stPrint(t->right, indent + 1); } void stCut(searchTree* &target, int key){ if (target == nullptr || target->key == key){ target = nullptr; return; } if (target->key < key) stCut(target->left, key); else stCut(target->right, key); } 


stFind returns a reference to the subtree with the necessary key, stPrint prints out the keys and addresses of the subtree, stCut clips the subtree in which the desired key is stored.

Finally, the main.

 int main(){ gcInit(); stackRef<searchTree> root; stAdd(root.ref, 2); stAdd(root.ref, 1); stAdd(root.ref, 3); stAdd(root.ref, 6); stAdd(root.ref, 5); stAdd(root.ref, 4); stAdd(root.ref, 8); stackRef<searchTree> additionalRef; additionalRef.ref = stFind(root.ref, 3); cout << "Before GC" << endl; cout << additionalRef.ref << ' ' << currentOffset << endl <<endl; stPrint(root.ref); cout << endl; gcCollect(); cout << "After GC" << endl; cout << additionalRef.ref << ' ' << currentOffset << endl << endl; stPrint(root.ref); cout << endl; stCut(root.ref, 5); gcCollect(); cout << "Deleted some elements and GC'd." << endl; cout << additionalRef.ref << ' ' << currentOffset << endl << endl; stPrint(root.ref); return 0; } 


 Before GC 0xd92058 224 0xd92018 2 0xd92058 3 0xd92078 6 0xd920d8 8 0xd92098 5 0xd920b8 4 0xd92038 1 After GC 0xd93108 224 0xd930e8 2 0xd93108 3 0xd93128 6 0xd93148 8 0xd93168 5 0xd93188 4 0xd931a8 1 Deleted some elements and GC'd. 0xd92038 160 0xd92018 2 0xd92038 3 0xd92058 6 0xd92078 8 0xd92098 1 


As you can see, working with structures for the programmer does not change much. What's going on here:
1. We randomly fill the search tree.
2. Create another stack reference to one of the elements to check how the GC responds to multiple references to a single object.
3. Print the tree, additionalReference and currentOffset.
4. Empty the cause garbage collection.
5. We print the tree again. All pointers that are needed - have changed.
6. We prune one subtree and call garbage collection again. Everything works fine again. Notice the currentOffset has decreased, and the root of the tree has returned to the same address where it was located for the first time.

findings

So, in C ++, you can use the Garbage Collector. Moreover, it is quite a pretty one, in my zamylenny eyes, and even with a minimum overhead. I will try to list everything that needs to be done so that it is really convenient and useful.
First - the organizational moments.

1. Global variables are, of course, not cool at all. It is necessary to arrange everything as a human class and / or C ++ allocator.
2. To force a cheder into each class is almost sadism. You just need to give inheritance from the abstract class, which should have two methods - getSize and getLayout. The latter should return a reference to the array, in which the relative coordinates of all pointers are stored (the idea with “all references are at the beginning” is not at all suitable for something serious). This array should definitely be filled in automatically, but I still have no idea how to do this.

The next question is automatic build. When the idea of ​​GC itself was put forward, no one meant that someone would actually invoke something like the gcCollect function all the time, everything should happen by itself. But C ++ is a special case. It is famous for the fact that the entire flow of execution under the nose, or at least predictable. The capricious Garbage Collector of any other language here is almost an ideological crime. So, it should have at least two modes:

1. Transparent.
2. Throw an exception after exhausting a certain quota of memory. Here it is up to the programmer to decide whether to get out or allocate memory forcefully.

And one more question. Multithreading It's all bad. To start garbage collection, you need to suspend all threads, so as not to break anything. As a result, you have to write half the JVM. The best solution seems to me his absence. For each thread, you can simply create your own dedicated GC, and if you need to transfer something to another subprocess, nobody has canceled the usual shared_ptr. Without shared memory, life is usually much more fun.

Finish on a sad note. Such a garbage collector is totally incompatible with any ready-made library. Their objects will not be able to provide the necessary data. Despite the fact that std :: list, std :: map and std :: set will only benefit if you rewrite them specifically for GC, redoing N gigabytes of Boost sources, for example, is completely hopeless. However, for dealing with fragmentation in the case of small objects, such a thing seems to me very useful.

You can download and play from here .

Source: https://habr.com/ru/post/213225/


All Articles