Who among us doesn't like refactoring? I think each of us, while refactoring old code, has more than once discovered something new or recalled something important but long forgotten. Having recently refreshed my knowledge of how std::shared_ptr works with a custom allocator, I decided not to forget it again. Everything I managed to refresh is collected in this article.
In one project it became necessary to optimize performance. Profiling pointed to a large number of calls to operator new / delete and the corresponding malloc / free calls, which not only lead to expensive locking in a multithreaded environment, but can also trigger heavy functions such as malloc_consolidate at the most unexpected moment. The large number of dynamic memory operations was caused by intensive use of std::shared_ptr smart pointers.
There were only a few classes whose objects were created this way, and I did not want to rewrite the application. So it was decided to investigate the object pool pattern, that is, to keep using shared_ptr but rework the memory allocation mechanism so as to get rid of the intensive acquisition and release of dynamic memory.
Replacing the standard malloc implementation with alternatives (tcmalloc, jemalloc) was not considered: in my experience such a replacement did not change performance fundamentally, while the change would have affected the entire program, with possible consequences.
Later the idea evolved into using a custom memory pool together with a special allocator. In my case, the advantage of a memory pool over an object pool is transparency for the calling code. With an allocator, objects are placed in already allocated memory (placement new is used) with the corresponding constructor call, and are destroyed by explicit destructor calls. That is, the extra steps typical of an object pool, initializing an object when it is taken from the pool and resetting it to its initial state before it is returned, are not needed.
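The split between obtaining memory and constructing an object in it can be sketched as follows. This is only an illustration under assumptions: pooled_item and construct_use_destroy are hypothetical names, and the storage is supplied by the caller rather than by a real pool.

```cpp
#include <cstdint>
#include <new>

// Hypothetical payload type, used only for this illustration.
struct pooled_item {
    explicit pooled_item(std::uint64_t v) : value(v) { ++live_count(); }
    ~pooled_item() { --live_count(); }

    std::uint64_t value;

    // Number of currently constructed objects, to make lifetimes observable.
    static int& live_count() {
        static int n = 0;
        return n;
    }
};

// Constructs an object in caller-provided storage, reads it, then destroys it.
// The storage itself is neither allocated nor freed here.
std::uint64_t construct_use_destroy(void* storage, std::uint64_t v) {
    pooled_item* item = ::new (storage) pooled_item(v);  // placement new: runs the constructor only
    std::uint64_t result = item->value;
    item->~pooled_item();                                // explicit destructor call: no deallocation
    return result;
}
```

The caller is responsible for passing storage that is large enough and suitably aligned for pooled_item, which is the job an allocator takes over in the rest of the article.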
Below I go through the interesting features of shared_ptr memory handling that I came to understand and sorted out for myself. To avoid overloading the text with details, the code is simplified and relates to the real project only in the most general terms. The focus is not on the implementation of the allocator but on how std::shared_ptr works with a custom allocator.
The existing code created pointers with std::make_shared:
auto ptr = std::make_shared<foo_struct>();
As you know, this way of creating a pointer avoids some potential memory-leak problems that arise if you create the pointer the plain, old-fashioned way (although in some cases that option is justified too, for example when you need to pass a custom deleter):
auto ptr = std::shared_ptr<foo_struct>(new foo_struct);
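The deleter case mentioned above can be sketched like this. The names widget, deleter_calls and make_with_deleter are hypothetical and exist only for the illustration; the deleter here simply counts its calls and then deletes the object.

```cpp
#include <memory>

// Hypothetical type, used only for this illustration.
struct widget {
    int id = 7;
};

// Counts how many times the custom deleter has run.
int& deleter_calls() {
    static int n = 0;
    return n;
}

// Creates a shared_ptr from a raw pointer and attaches a custom deleter.
// std::make_shared offers no way to pass a deleter, hence the constructor form.
std::shared_ptr<widget> make_with_deleter() {
    return std::shared_ptr<widget>(new widget, [](widget* p) {
        ++deleter_calls();
        delete p;
    });
}
```

With this form the object and the control block are allocated separately, which is the two-allocation pattern visible in the "Construct shared" output further down.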
The key point in how std::shared_ptr works with memory is the creation of the control block. As we know, this is the special structure that makes the pointer smart, and memory has to be allocated for it as well.
Full control over memory usage with std::shared_ptr is available through std::allocate_shared, which accepts your own allocator:
auto ptr = std::allocate_shared<foo_struct>(allocator);
If you replace the global new and delete operators, you can see how much memory is allocated for the structure from the example:
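#include <cstdint>
#include <iostream>

struct foo_struct {
    // The constructor and destructor log their calls so object lifetime is visible.
    foo_struct() { std::cout << "foo_struct()" << std::endl; }
    ~foo_struct() { std::cout << "~foo_struct()" << std::endl; }

    // Four 8-byte fields: sizeof(foo_struct) is 32 on the platform used here.
    std::uint64_t value1 = 1;
    std::uint64_t value2 = 2;
    std::uint64_t value3 = 3;
    std::uint64_t value4 = 4;
};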
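The replaced operators themselves are not listed in the text. A minimal sketch that would log in the same format as the output shown later might look like this; backing the operators with malloc / free and the g_new_calls counter are assumptions made for the illustration, and the counter is not thread-safe.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical counter, added only so the effect can be checked programmatically.
std::size_t g_new_calls = 0;

// Replacement for the global allocation function: log, then defer to malloc.
// printf is used instead of std::cout to avoid allocating inside the operator.
void* operator new(std::size_t size) {
    void* p = std::malloc(size != 0 ? size : 1);
    if (p == nullptr) throw std::bad_alloc();
    ++g_new_calls;
    std::printf("operator new: size = %zu p = %p\n", size, p);
    return p;
}

// Replacement for the global deallocation function: log, then defer to free.
void operator delete(void* p) noexcept {
    std::printf("operator delete: p = %p\n", p);
    std::free(p);
}

// Sized variant (C++14): forwards to the unsized one so both paths are logged.
void operator delete(void* p, std::size_t) noexcept {
    ::operator delete(p);
}
```

Because these are replacements of the global operators, they affect every allocation in the program, including those made by the standard library.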
Take for example the simplest allocator:
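#include <cstddef>
#include <new>

template <class T>
struct custom_allocator {
    typedef T value_type;

    custom_allocator() noexcept {}

    // Converting constructor: lets the library rebind the allocator to another type.
    template <class U>
    custom_allocator(const custom_allocator<U>&) noexcept {}

    // Returns raw, uninitialized storage for n objects of type T.
    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    // Releases storage obtained from allocate(); the count is not needed here.
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// The allocator is stateless, so any two instances compare equal.
template <class T, class U>
bool operator==(const custom_allocator<T>&, const custom_allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const custom_allocator<T>&, const custom_allocator<U>&) noexcept { return false; }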
---- Construct shared ----
operator new: size = 32 p = 0x1742030
foo_struct()
operator new: size = 24 p = 0x1742060
~foo_struct()
operator delete: p = 0x1742030
operator delete: p = 0x1742060
---- Construct shared ----

---- Make shared ----
operator new: size = 48 p = 0x1742080
foo_struct()
~foo_struct()
operator delete: p = 0x1742080
---- Make shared ----

---- Allocate shared ----
operator new: size = 48 p = 0x1742080
foo_struct()
~foo_struct()
operator delete: p = 0x1742080
---- Allocate shared ----
An important feature of both std::make_shared and std::allocate_shared with a custom allocator is something that seems insignificant at first glance: memory for both the object itself and the control block is allocated in a single call to the allocator. Books often mention this, but it does not stick in memory until you run into it in practice.
If you lose sight of this, the behavior of the system when creating a pointer looks rather strange. We intend to use the allocator to allocate memory for the specific object the pointer should point to, but the actual allocation request asks for more space than the object occupies. And the type the allocator is instantiated with does not match the one we specified.
---- Allocate shared ----
Allocating: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
operator new: size = 48 p = 0x1742080
foo_struct()
~foo_struct()
Deallocating: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
operator delete: p = 0x1742080
---- Allocate shared ----
So the memory is not allocated for an object of class foo_struct. More precisely, not only for foo_struct.
Everything falls into place once we recall the std::shared_ptr control block. If we now add some debug output to the allocator's converting copy constructor, we can see the type of the object actually being created.
---- Allocate shared ----
sizeof control_block_type: 48
sizeof foo_struct: 32
custom_allocator<T>::custom_allocator(const custom_allocator<U>&):
T: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
U: foo_struct
Allocating: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
operator new: size = 48 p = 0x1742080
foo_struct()
~foo_struct()
custom_allocator<T>::custom_allocator(const custom_allocator<U>&):
T: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
U: foo_struct
Deallocating: std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>
operator delete: p = 0x1742080
---- Allocate shared ----
This is allocator rebinding at work, that is, obtaining an allocator for one type from an allocator for another type. This "trick" is used not only in std::shared_ptr but also in other standard library classes such as std::list or std::map, where the object actually stored differs from the user's type. The required variant is created from the original allocator in order to allocate the needed amount of memory.
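Rebinding can be reproduced by hand with std::allocator_traits. The sketch below assumes a hypothetical tiny_allocator (equivalent in spirit to the simple allocator above) and a hypothetical node type that stands in for a list node or a control block.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical minimal allocator for the illustration.
template <class T>
struct tiny_allocator {
    using value_type = T;
    tiny_allocator() noexcept = default;
    template <class U>
    tiny_allocator(const tiny_allocator<U>&) noexcept {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// Hypothetical internal type that the library stores instead of the user's int.
struct node {
    int payload;
    node* next;
};

using user_alloc = tiny_allocator<int>;
// rebind_alloc swaps the value type of the allocator template.
using node_alloc = std::allocator_traits<user_alloc>::rebind_alloc<node>;
static_assert(std::is_same<node_alloc, tiny_allocator<node>>::value,
              "rebinding tiny_allocator<int> to node yields tiny_allocator<node>");

// Allocates and constructs a node using only the user's allocator as input.
node* make_node(const user_alloc& a, int payload) {
    node_alloc na(a);  // converting constructor: allocator<int> -> allocator<node>
    node* n = std::allocator_traits<node_alloc>::allocate(na, 1);
    std::allocator_traits<node_alloc>::construct(na, n, node{payload, nullptr});
    return n;
}

// Destroys and deallocates a node created by make_node.
void destroy_node(const user_alloc& a, node* n) {
    node_alloc na(a);
    std::allocator_traits<node_alloc>::destroy(na, n);
    std::allocator_traits<node_alloc>::deallocate(na, n, 1);
}
```

The user only ever names tiny_allocator<int>, yet the memory request that reaches allocate is for sizeof(node) bytes, which mirrors what happens with the control block.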
So, with a custom allocator, memory is allocated for both the control block and the object itself, all in a single call. This must be taken into account when writing an allocator, especially if the memory is pre-allocated in fixed-size blocks. The problem is to correctly determine the block size that will actually be needed when the allocator runs.
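A fixed-block pool with an allocator on top could be sketched as follows. This is not the pool from the real project: block_pool, pool_allocator and payload are hypothetical names, the pool is single-threaded, types with extended alignment are not handled, and requests larger than a block fall back to ::operator new.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool (single-threaded, illustration only).
class block_pool {
public:
    block_pool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)) {
        free_.reserve(block_count);  // reserve first: later push_back calls cannot throw
        slab_ = static_cast<unsigned char*>(::operator new(block_size_ * block_count));
        for (std::size_t i = 0; i < block_count; ++i) free_.push_back(slab_ + i * block_size_);
    }
    ~block_pool() { ::operator delete(slab_); }
    block_pool(const block_pool&) = delete;
    block_pool& operator=(const block_pool&) = delete;

    std::size_t block_size() const noexcept { return block_size_; }
    std::size_t free_blocks() const noexcept { return free_.size(); }

    // Hands out one block; throws when the pool is exhausted.
    void* take() {
        if (free_.empty()) throw std::bad_alloc();
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    // Accepts back a block previously returned by take().
    void give_back(void* p) noexcept { free_.push_back(p); }

private:
    // Rounds the block size up so every block stays aligned for ordinary types.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }
    std::size_t block_size_;
    unsigned char* slab_ = nullptr;
    std::vector<void*> free_;
};

// Allocator that serves requests from the pool when they fit into one block.
template <class T>
struct pool_allocator {
    using value_type = T;
    explicit pool_allocator(block_pool& pool) noexcept : pool_(&pool) {}
    template <class U>
    pool_allocator(const pool_allocator<U>& other) noexcept : pool_(other.pool_) {}

    T* allocate(std::size_t n) {
        const std::size_t bytes = n * sizeof(T);
        if (bytes > pool_->block_size()) return static_cast<T*>(::operator new(bytes));
        return static_cast<T*>(pool_->take());
    }
    // The same size test as in allocate() decides where the memory goes back.
    void deallocate(T* p, std::size_t n) noexcept {
        if (n * sizeof(T) > pool_->block_size()) ::operator delete(p);
        else pool_->give_back(p);
    }
    block_pool* pool_;
};

template <class T, class U>
bool operator==(const pool_allocator<T>& a, const pool_allocator<U>& b) noexcept { return a.pool_ == b.pool_; }
template <class T, class U>
bool operator!=(const pool_allocator<T>& a, const pool_allocator<U>& b) noexcept { return !(a == b); }

// Hypothetical 32-byte payload, similar in size to foo_struct from the text.
struct payload {
    std::uint64_t value1 = 1, value2 = 2, value3 = 3, value4 = 4;
};
```

With a pool created as block_pool pool(128, 4), the call std::allocate_shared<payload>(pool_allocator<payload>(pool)) takes a single block for the object and the control block together, provided the chosen block size is large enough for the standard library in use.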
I have not yet found anything better than using either a deliberately oversized value or a completely non-portable method:
using control_block_type = std::_Sp_counted_ptr_inplace<foo_struct, custom_allocator<foo_struct>, (__gnu_cxx::_Lock_policy)2>;
constexpr static size_t block_size = sizeof(control_block_type);
By the way, the size of the control block differs depending on the compiler version.
I would be grateful for a hint on how to solve this puzzle more elegantly.
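One option that avoids naming library internals, offered only as a sketch, is to measure the request at run time: create one throwaway pointer through a probing allocator and record how many bytes std::allocate_shared asks for. The names probe_allocator, measure_allocate_shared_request and payload are hypothetical. Note that the measurement constructs and destroys one object of the type, and that the control block stores a copy of the allocator, so the result depends on the size of the allocator itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>

// Hypothetical probe: records the byte size of the most recent allocation request.
template <class T>
struct probe_allocator {
    using value_type = T;
    explicit probe_allocator(std::size_t* last_request) noexcept : last_request_(last_request) {}
    template <class U>
    probe_allocator(const probe_allocator<U>& other) noexcept : last_request_(other.last_request_) {}

    T* allocate(std::size_t n) {
        *last_request_ = n * sizeof(T);  // remember what the library asked for
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }

    std::size_t* last_request_;
};

template <class T, class U>
bool operator==(const probe_allocator<T>& a, const probe_allocator<U>& b) noexcept {
    return a.last_request_ == b.last_request_;
}
template <class T, class U>
bool operator!=(const probe_allocator<T>& a, const probe_allocator<U>& b) noexcept { return !(a == b); }

// Creates and drops one shared_ptr<T> to learn how many bytes allocate_shared requests.
template <class T, class... Args>
std::size_t measure_allocate_shared_request(Args&&... args) {
    std::size_t bytes = 0;
    {
        auto tmp = std::allocate_shared<T>(probe_allocator<T>(&bytes), std::forward<Args>(args)...);
        (void)tmp;  // destroyed at the end of this scope, while bytes is still alive
    }
    return bytes;
}

// Hypothetical 32-byte payload, similar in size to foo_struct from the text.
struct payload {
    std::uint64_t value1 = 1, value2 = 2, value3 = 3, value4 = 4;
};
```

A pool could call measure_allocate_shared_request<payload>() once at startup and size its blocks from the result. This trades a compile-time constant for a run-time value, so it is not a full replacement for the constexpr variant above.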
In conclusion, I would like to repeat that an important result of using an alternative allocator was the ability to optimize without major changes to the existing code or to the interface for working with objects. And of course, do not forget to periodically refresh your memory of the subtle aspects of your programming language!
The source code of the example is available on GitHub.
Thanks for your attention!
Source: https://habr.com/ru/post/304308/