The STL library vector template wins in almost all parameters from a regular C ++ array. It allows you to add and remove items, frees the allocated memory during destruction, allows you to control the output beyond the array, etc. Nevertheless, he has one drawback - for his work requires additional memory, small, but in some cases substantial. Below is the implementation of the container, which allows to slightly reduce memory costs and improve performance.
First you need to figure out what this memory is used for and how much is required. In addition to the pointer to the beginning of the array, the template stores a pointer to the end of the filled array, a pointer to the end of the selected array, and a memory allocator. For illustration, here is a piece of code:
//
template < class TEntry, class TAllocator > class TArrayBase
{
//
TEntry* GetStart() const { return m_Start; }
// ( , )
TEntry* GetFinish() const { return m_Finish; }
// ( , )
TEntry* GetEndAllocated() const { return m_Storage._M_data; }
//
TEntry* m_Start;
TEntry* m_Finish;
// ,
//
_STLP_alloc_proxy<TEntry*, TEntry, allocator_type> m_Storage;
};
For a 32-bit x86 code, where the pointer requires 4 bytes, such an object will take 12 bytes. If a memory allocator without data fields and virtual functions is used (the standard allocator is just that), then it will not take a place in a successful implementation, at least in the implementation of STLPort this is so. In the library for VS2005, the implementation is less successful and the allocator will add another 4 bytes. Thus, at least 8 extra bytes are required, compared to a pointer to a regular array.
It would seem that these are 8 bytes, a trifle, lost next to the size of memory that will be needed to store the elements of the array itself. However, it often happens that a significant number of arrays will be empty. For example, there is a game model, it may have light sources, sound sources, damage, and other attributes. They need to be stored somewhere, and the arrays are quite suitable for this, but most models have only geometry, and these arrays will remain empty.
')
What is bad these extra bytes? Memory is cheap nowadays, and a dozen or two extra bytes per model will result in a total of several hundred kilobytes, which is not such a big loss against the available gigabytes. The fact is that the extra data in the object also reduces performance. The model, instead of fitting in, say, two cache lines, begins to occupy three, which increases the time it takes to sample its data during processing. Therefore, there is a need to somehow get rid of unnecessary data, while maintaining the functionality of the vector.
The first solution was to store the auto_ptr <vector> pointer. The pointer occupies the same 4 bytes, a null pointer means that the array is empty, non-empty refers to the vector object in the heap. But this solution has several disadvantages:
- for a non-empty array, additional memory is required per pointer (4 bytes) and per block on the heap (the heap implementation used adds 4 service bytes per block);
- additional indirection appears, as the vector lies in a separate heap block, which is also bad for the cache;
- when using, you need to constantly check the pointer, which leads to errors.
The last flaw could be easily eliminated by making a proxy container storing a pointer and implementing a check inside its methods.
A better solution is to split the data into two parts. In the object, leave only one pointer to the beginning of the array, and store the remaining two pointers and the memory allocator in the same memory block as the array elements, as a prefix immediately before the first element. In this case, for a non-empty memory array, you need exactly the same amount as for a regular vector. For an empty array, it suffices to store one pointer to a static empty array. This is better than a null pointer, since it avoids unnecessary checks in many methods.
In the following section of the code, the same functionality is implemented as in the first, but with the storage of part of the data in the prefix:
// ( )
template < class TAllocator >
class TTinyArrayStorage
{
//
char *Finish;
//
_STLP_alloc_proxy< char *, char , allocator_type> Storage;
//
static TTinyArrayStorage<TAllocator> EmptyStorage;
};
//
template < class TEntry, class TAllocator >
class TTinyArrayBase
{
//
TEntry* GetStart() const { return m_Data; }
// ( , )
TEntry* GetFinish() const { return (TEntry*)GetStorage()->Finish; }
// ( , )
TEntry* GetEndAllocated() const
{ return (TEntry*)GetStorage()->Storage._M_data; }
//
TTinyArrayStorage<TAllocator> *GetStorage() const
{ return (TTinyArrayStorage<TAllocator>*)((( char *)m_Data) -
sizeof (TTinyArrayStorage<TAllocator>)); }
//
TEntry *m_Data;
};
Additional indirection remains, although not in all methods. To take an element by index is enough a pointer to the beginning of the array, but to add or remove an element, you need information from the prefix block. To pass through the array elements, you also need a pointer from the prefix block, but in this case you will still need to select data from the first array element, which will often be in the same cache line as the prefix block. If we talk about the size of the generated code, it almost does not increase. For example, consider the usual loop through all the elements of an array:
for (vector<TFoo>::iterator i = in_data.begin(), e = in_data.end(); i != e; ++i)
i->Process();
The VS2005 compiler creates the following code for the standard vector:
00401340 8B 44 24 04 mov eax,dword ptr [esp+4] 00401344 56 push esi 00401345 8B 30 mov esi,dword ptr [eax] 00401347 57 push edi 00401348 8B 78 04 mov edi,dword ptr [eax+4] 0040134B 3B F7 cmp esi,edi 0040134D 74 0F je Test+1Eh (40135Eh) 0040134F 90 nop 00401350 8B CE mov ecx,esi 00401352 E8 D9 FF FF FF call TFoo::Process (401330h) 00401357 83 C6 04 add esi,4 0040135A 3B F7 cmp esi,edi 0040135C 75 F2 jne Test+10h (401350h) 0040135E 5F pop edi 0040135F 5E pop esi 00401360 C3 ret
And for the “lightweight” vector:
00401370 8B 44 24 04 mov eax,dword ptr [esp+4] 00401374 56 push esi 00401375 8B 30 mov esi,dword ptr [eax] 00401377 57 push edi 00401378 8B 7E F8 mov edi,dword ptr [esi-8] 0040137B 3B F7 cmp esi,edi 0040137D 74 0F je Test+1Eh (40138Eh) 0040137F 90 nop 00401380 8B CE mov ecx,esi 00401382 E8 A9 FF FF FF call TFoo::Process (401330h) 00401387 83 C6 04 add esi,4 0040138A 3B F7 cmp esi,edi 0040138C 75 F2 jne Test+10h (401380h) 0040138E 5F pop edi 0040138F 5E pop esi 00401390 C3 ret
As you can see, the code was created almost the same, except for the fifth line, in the first case the pointer to the end of the array is taken relative to the pointer to the vector object ([eax + 4]), and in the second - relative to the pointer to the first element of the array ([esi-8] ).
It should be noted some difficulties with memory allocators that arise with this implementation. If the distributor is nonstandard, and, for example, a multiple of a power of two that is equalized to the border, it will align the beginning of the prefix block, and the elements will be shifted by the size of this block. In such cases, you will have to either make a special allocator, taking into account the size of the prefix block, or introduce such a possibility into the array template itself.
As a conclusion, I will cite figures from a real project in which such a container is actively used. With a total number of vector objects of about 270 thousand, about 170 thousand remain empty, which means saving more than one megabyte of memory. Compared to the code using the standard vector container, the amount of code increased insignificantly - by 48 kb, with a total of about 10 MB. Performance increased slightly, by about 1%.