
Place and conquer! Using placement new to optimize C++ code.



When creating an object, we often pay no attention to such a "trifle" as dynamic memory allocation. Together with copying and serialization, heap allocation through new gradually negates the speed advantage of C++. The more intensively we use the cherished new, the harder life becomes for the application: memory runs out, gets fragmented, and seeks to leak in every possible way. This fate has already befallen the convenient but implicitly performance-hazardous STL containers: vector, string, deque, map. It is especially annoying to lose speed on allocating small objects in large quantities. But there is a way to handle memory allocation for such objects on the stack, while still hiding the implementation details in a dedicated data class. Placement new will help us here: an unsurpassed way to optimize an application full of frequent, small heap allocations.

In the previous article we did amazing things: we worked with C++ objects as containers holding values of a type computed at runtime and filled in dynamically. We actively used a Copy-on-Write layer on top of std::shared_ptr, which referred to the actual data type when the object was filled. It was assumed that we would allocate memory dynamically for every data initialization, calling new each time we needed new data of an arbitrary type.

This approach has its advantages. Data can be shared between multiple objects, deferring copying. You do not even need to know the data type in advance. However, the method also has a number of drawbacks, which is why Copy-on-Write is used, as a rule, for potentially rather large objects.
The first drawback shows up immediately. Mass dynamic memory allocation seriously slows down program execution, especially mass implicit allocation through new. Yes, I am well aware of both std::string and std::vector, which often, without asking the programmer, begin reallocating memory, triggering one new after another (we will return to how std::vector relocates its data later). A good C++ developer always knows about these amusing features of the standard containers and understands how to avoid the extra cost of allocating new memory segments. What plain C has always been good at is precisely that all work with memory was transparent; in C++ you constantly need to keep in mind a number of cases of implicit work with memory.

The second drawback is a consequence of the first. Frequent allocation of many small memory segments leads to terrible memory fragmentation and to the inability to allocate even a fairly small memory block in one contiguous piece, for example to initialize that same std::vector or std::string. As a result we get bad_alloc for no apparent reason: there is far more free memory than needed, yet no contiguous block of even modest size can be carved out of highly fragmented memory.

Thus, for small objects comparable in size to int64_t, which can safely live on the stack, a different data-handling technique can and should be used. Such objects can be passed by value and copied as many times as you like, without deferring the copy until the first change, because they simply fit into one or two registers.

However, we should not abandon the practice of declaring the data details in the implementation. But we will have to sacrifice something: we will need to know the exact size of the data in bytes in advance. It is required in order to reserve, inside the class, a buffer for storing the object's data alongside the usual data pointer. Now for the details.

First class


Outwardly, almost nothing changes. It is the same class that provides the object API. The class contains a reference to the data, whose class is declared through a forward declaration and will be moved into the implementation details. Because of this, a class field cannot be declared as an object of that type; however, we can refer to the data type with a plain pointer and reserve a buffer in advance to store the object's data inside the object itself. If the object is created, for example, on the stack, then all the data will be stored on the stack as part of the object. Let's look at an example so that everything falls into place:

class object
{
public:
    ...
protected:
    // forward declaration of the data class
    class data;
    // maximum size of the data in bytes
    static const size_t max_data_size = N;
private:
    // pointer to the data
    data* m_data;
    // buffer that stores the data in-place
    char m_buffer[max_data_size];
};

In this code snippet we continue the ideology of hiding the data in the implementation: all we know about class data is the class name and the presence of a pointer to the data. However, now we have the opportunity not to go to the heap at all. In C++ terminology, the class still stores its data in the form of fields. In fact, the data will be placed in the m_buffer buffer, whose memory is already allocated when the class instance is created. It only remains to explain the details of how to place the data into the buffer's bytes.

Placement new


As a rule, few remember such a useful property of the new operator as the ability to specify a ready-made memory area in which to place the object being created. All we need is to write new (m_buffer) to create an object of any type in the preallocated buffer. It sounds simple, but remember that we pay a high price: the maximum buffer size must be specified in advance. Moreover, the buffer size ends up in the header file and is explicitly involved in the API declaration.

But we win in speed. If, by allocating data on the heap for every initialization, we risk lagging behind Java, then by placing all the data on the stack we get the speed of pure C, unattainable for almost any high-level language other than C++. At the same time the level of abstraction stays extremely high: we build the API on ordinary C++ objects while hiding the implementation details. The only limitation is the size we specify: we can no longer freely change the set of fields in the data class in the implementation, and we always need to keep the size in mind. Moreover, we need to check that the data size described in the implementation matches the one specified in the header file, simply because the library build may differ from the version of the header files, for example when they come from different sources. Let's look at what such a check should look like, together with creating an object in prepared memory with placement new.

object::object()
    : m_data(new(m_buffer) object::data)
{
    static_assert(sizeof(object::data) <= max_data_size, "...");
}

Here static_assert is evaluated at compile time, so the initialization of m_data will compile only if object::data fits into the m_buffer buffer. Similarly, for a descendant class such as flower, derived from object, the data must also not exceed the specified bound, since we store the data in the implementation of the base class.

flower::flower(std::string const& name)
    : object(new(get_buffer()) flower::data(name))
{
    static_assert(sizeof(flower::data) <= max_data_size, "...");
}

Obviously, for this we need a protected method get_buffer() that returns the address of m_buffer in the base class, as well as a protected object constructor taking an object::data*. Just as in the previous article, we derive the descendants' data classes from the base class's data, so flower::data* is compatible with object::data*. For safety, the base constructor taking object::data* should check that the address passed in is that of the preallocated buffer:

object::object(object::data* data_ptr)
{
    if (static_cast<void*>(data_ptr) != static_cast<void*>(m_buffer))
        throw some_exception(...);
    m_data = data_ptr;
}

As a result, just as before, we can emulate dynamic typing while working with ordinary class objects. For example:

object rose = flower("rose");

Objects with large data


It remains to figure out what to do with objects whose data size exceeds the specified maximum. Here, in fact, everything is quite simple. It suffices that copy_on_write<data::impl>, which is essentially a wrapper over std::shared_ptr<data::impl>, fits into the limit, where impl is the implementation of a data class of arbitrary size. Since the size of std::shared_ptr<data::impl> does not depend on the size of the data::impl objects themselves, we get a universal way of storing data, switching from storage by value to storage by reference.

class huge
{
public:
    ...
protected:
    class data;
};

class huge::data
{
public:
    ...
protected:
    class impl;
private:
    copy_on_write<impl> m_impl;
};

However, let's digress from the problem of a single API for objects with dynamic typing and consider another example of optimization through placement new.

copy_on_write::flashback


If anyone missed the previous article: the copy_on_write class is a template class for storing data with copy optimization. Emulating a pointer, this class has a tricky operator-> overload for the const and non-const cases. When objects are copied, we refer to the same data without triggering an expensive copy. However, as soon as we call a non-const method of the data class, one that can potentially change the data, we detach a private copy of the data for the current object. A simplified implementation looks like this:

template <typename impl_type>
class copy_on_write
{
public:
    copy_on_write(impl_type* pimpl)
        : m_pimpl(pimpl)
    {
    }

    impl_type* operator -> ()
    {
        if (!m_pimpl.unique())
            m_pimpl.reset(new impl_type(*m_pimpl));
        return m_pimpl.get();
    }

    impl_type const* operator -> () const
    {
        return m_pimpl.get();
    }

private:
    std::shared_ptr<impl_type> m_pimpl;
};

Thus, when choosing the maximum data size for the built-in buffer, take into account the size of the class that contains copy_on_write as a field.

Data Fields


The most impressive way to optimize via placement new is allocating the record fields of an SQL query result. A selection returns data of the most varied types, from integers and reals to strings and arrays. Although the data itself arrives dynamically, and fields of types coming from the database have to be initialized with dynamic typing emulation, all the records contain the same set of field types, from which the total data size of each record can be determined. This allows us to allocate memory for a record's fields only once, by computing the combined size of the field types included in every record of the selection. We could also allocate memory once for all records as a single block; however, as a rule, various operations are performed on the records after selection, including filtering and sorting, so it makes sense to describe the records themselves as Copy-on-Write objects for the convenience of subsequent operations. Allocating memory for each field from the heap is unreasonably expensive.

This is how our record class will look if we simplify the declaration and use copy_on_write directly from the data class:

class record
{
public:
    record(std::vector<field::type> const& types);
    ...
protected:
    class data;
private:
    copy_on_write<data> m_data;
};

class record::data
{
public:
    data(std::vector<field::type> const& types);
    ...
private:
    std::vector<char> m_buffer;
    std::vector<field*> m_fields;
};

Here, to simplify the explanation, a vector of field types std::vector<field::type> is introduced: an array of enum values. In reality this array could be deduced from the arguments through Boost.Fusion, or, using Boost.Preprocessor, you could build an array of generalized objects of type object from arguments of any types. What matters to us now is the mechanism of a single heap allocation per record.

record::data::data(std::vector<field::type> const& types)
    : m_buffer(field::calc_size(types)), m_fields(types.size())
{
    size_t offset = 0;
    std::transform(types.begin(), types.end(), m_fields.begin(),
        [this, &offset](field::type type) -> field*
        {
            // construct each field in the shared buffer, in-place
            field* field_ptr = new(m_buffer.data() + offset) field(type);
            offset += field::size(type);
            return field_ptr;
        });
}

where field::size computes the size of the data for a given field::type, and field::calc_size computes the total size needed for the whole set of record types passed as std::vector<field::type>.

A field is implemented similarly to the object type, being essentially a container with dynamic content. Most types, such as int64_t, bool and double, are scalars and are stored by value. The std::string type can also be stored by value, but keep in mind that the string contents will almost certainly live on the heap and be allocated dynamically. If you want to support a varchar of a fixed length, then most likely you will need your own copy_on_write-style type with a fixed-length character array.

Different field types are analogous to the different object types inherited from the object class. You could even avoid the enum and bind directly to types, but, as a rule, parsing the result of an SQL query means deserializing a packet of bytes whose field types are all known in advance, so the enum imposes no real limitation on convenience. Besides, metaprogramming is not for the faint of heart, and we will not consider MPL and Boost.Fusion here.

It remains to touch on the last important aspect of using placement new: a pool of similar objects in C++.

Pool of similar objects


As before, we are optimizing dynamic memory allocation. What is an object pool? It is an array of blanks, allocated in advance as one large group, for initializing objects of a specific type. In a sense, the record above was a pool for field objects. You have probably also met object pools if you have worked with high-level languages (C#, Python, Java), because to allocate new objects they use prepared memory segments in which they place objects, grouped in essence by object type. After one of the pool's objects is no longer needed, in other words, nothing references it anymore, it is either deinitialized immediately or awaits its sad fate until the next pass of the Garbage Collector, a special mechanism for removing orphaned objects. Generally speaking, deinitialization of objects in a pool is its weak point. In return we get very fast allocation of objects that are, as a rule, either already initialized or prepared for initialization. If, on the basis of our object type, we built a full-fledged object pool with reference-counted deinitialization and a Garbage Collector, we would end up with Java or Python. If that is what you need, perhaps you should not bother and should simply take a ready-made language with garbage collection. However, if, to optimize objects of the same type, it makes sense to allocate a large memory segment in advance, and the task really requires mass initialization of a large number of objects with a common base class, then an object pool avoids a mass of dynamic memory allocations.

For clarity we need a concrete application. How about an actual SQL query result set with a pool for its records? This optimizes the mass memory allocation when constructing the record objects of the selection.

class selection
{
public:
    selection(std::vector<field::type> const& types, size_t row_count);
    ...
protected:
    class data;
private:
    copy_on_write<data> m_data;
};

class selection::data
{
public:
    data(std::vector<field::type> const& types, size_t row_count);
    ...
private:
    std::vector<field::type> m_types;
    std::vector<char> m_buffer;
    std::vector<record> m_rows;
};

selection::data::data(std::vector<field::type> const& types, size_t row_count)
    : m_types(types)
{
    if (!row_count)
        return;
    m_rows.reserve(row_count);
    size_t row_data_size = field::calc_size(types);
    m_buffer.resize(row_count * row_data_size);
    char* offset = m_buffer.data();
    for (size_t i = 0; i < row_count; ++i)
    {
        m_rows.push_back(record::inplace(offset, types));
        offset += row_data_size;
    }
}

where record::inplace creates the record's data not on the heap but at the given address.

record record::inplace(void* address, std::vector<field::type> const& types)
{
    return record(new(address) record::data(types));
}

We need a record constructor with such initialization and a special destructor; more on that later. This version of record initialization makes it impossible to use the record as in the previous variant, that is, as a class containing only a copy_on_write field. We can no longer, calmly relying on dynamic heap allocation of the data, shuffle the records around however we like. On the other hand, on a large dataset we get a crazy performance boost. However, placement new hides a dirty trick you should be aware of.

Explicit destructor call


WARNING


If anyone has a habit of not reading to the end, or of reading diagonally, that would be a big mistake here. By skipping this important section you can saddle your application with memory leaks, and in large quantities.

There is one more "but" in using placement new: you have to call the destructor yourself, manually, because delete will do nothing at all here. Therefore, a class whose data is placed into previously prepared memory must, in its own destructor, explicitly call the destructor of the class created in that memory. So the object::~object destructor must explicitly call the object::data::~data destructor, and the record::data::~data destructor will have to call a whole series of field::~field destructors, one per field. To show clearly how this should happen, I will write out the object class in more detail.

class object
{
public:
    object();
    virtual ~object();
    ...
protected:
    class data;
    char* get_buffer();
    object(data* derived_data);
    static const size_t max_data_size = N;
private:
    data* m_data;
    char m_buffer[max_data_size];
};

object::object()
    : m_data(new(m_buffer) data)
{
    static_assert(sizeof(data) <= max_data_size, "...");
}

object::~object()
{
    m_data->~data();
}

Since the destructor of the data class is declared virtual, the data will be deinitialized correctly no matter which descendant of object::data is actually stored.

You also need to redefine the copy constructor and the copy assignment operator, as well as their move counterparts, because unlike the copy_on_write case, where the auto-generated ones were sufficient, here each object points at its own data area with a plain pointer. So let's correct the default behavior:

object::object(object const& another)
    : m_data(another.clone_data_at(m_buffer))
{
}

object& object::operator = (object const& another)
{
    destruct_data(); // destroy the old data first
    m_data = another.clone_data_at(m_buffer);
    return *this;
}

object::data* object::clone_data_at(void* address) const
{
    return m_data->clone_at(address);
}

// a virtual method overridden in each descendant of object::data
// so that the actual derived data type gets cloned
object::data* object::data::clone_at(void* address) const
{
    return new(address) data(*this);
}

void object::destruct_data()
{
    m_data->~data();
}

Our new destruct_data() method also belongs in the object::~object destructor; since we introduced it, there is a natural place for it. For the move constructor and the move assignment operator the behavior is similar:

object::object(object&& another)
    : m_data(another.move_data_to(m_buffer))
{
}

object& object::operator = (object&& another)
{
    destruct_data(); // destroy the old data first
    m_data = another.move_data_to(m_buffer);
    return *this;
}

object::data* object::move_data_to(void* address)
{
    return m_data->move_to(address);
}

// a virtual method overridden in each descendant of object::data
// so that the actual derived data type gets moved
object::data* object::data::move_to(void* address)
{
    return new(address) data(std::move(*this));
}

object::~object()
{
    destruct_data();
}

With that, the danger of memory leaks is eliminated. Users of your API can breathe easy.

Placement new versus heap new


As you have already noticed, classes that use placement new are much harder to implement. Every aspect of a class built on the technique of placing an object into prepared memory should be thoroughly tested. With ordinary new, the complexity of using any class usually reduces to wrapping it in a smart pointer. So what is the benefit, if emulating dynamic typing is complicated by having to specify the maximum data type size explicitly?

The gain is in speed. The power of C++ compared to the more convenient C#, Java and Python lies in execution speed. Here we reach top speed because we do not go to the heap for new objects, and we do not slow the application down in the long run, since we avoid memory fragmentation. Fragmented memory is like cheese: full of holes, and although in total those holes could hold an orange, in fact the orange does not fit into any single one of them; each is too small. Similarly, std::vector and std::string, which require a contiguous memory segment, can at some point get std::bad_alloc while relocating their elements.

Placement new in the standard library


Remember, at the beginning of the article I promised to tell you about placement new in std::vector? Well, all element constructors in std::vector are called in prepared memory, and destructors are also actively invoked on the elements. For vectors of simple POD types like int or char this does not matter, but if we build a std::vector of a custom type with a non-trivial, heavy default constructor and an equally heavy copy constructor, we are in for a lot of trouble if we do not know how std::vector works with its data.

So what happens when we ask a vector to change its size? First the vector checks whether it has already reserved the necessary number of bytes (the vector always allocates its buffer with a margin); if not, it allocates a new buffer. All existing elements are transferred into the new buffer by the move (or copy) constructor via placement new at the appropriate addresses. As a result, all the elements are in place. After that, the vector appends the required number of elements to the end of the array, creating each one with placement new and the default constructor. Similarly in the opposite direction: reducing the number of elements calls the destructors "manually" as elements are removed.

Unlike std::vector, the std::string container does not use placement new, simply because it always stores chars, which need neither constructors nor destructors. But a number of standard library containers (deque, list, map, and other class templates for storing arbitrary data) actively use placement new in their implementations.

Do not think of placement new as some kind of hack; it is a full-fledged language feature that lets you run an object's constructor on specified memory. The operation is similar to the old C trick in which an allocated block of bytes was cast to a pointer to some type (usually a struct) and all further work with that memory block went through that type's API.

What is the result?


Of course, the ability to use placement new where it is needed, and only where it is truly effective and justified, does not come immediately. Some will push back, citing the harm of premature optimization; others, on the contrary, right after reading this article will rush to embed new (m_buffer) everywhere, needed or not. With time, both come to a golden mean.

The essence of the method is simple: if it is both possible and necessary to place a class object into memory prepared in advance, it is relatively easy to do, provided you remember a couple of simple rules:

always check that the object fits into the prepared buffer (for example, with a static_assert on its size);
always call the destructor explicitly, since nobody will call delete for you.

Everything else is limited only by the accuracy and boundless imagination of the developer. That is you.


First published in Hacker Magazine # 190.
Author: Vladimir Qualab Kerimov, Lead C ++ Developer, Parallels


Source: https://habr.com/ru/post/257893/
