📜 ⬆️ ⬇️

C ++ library from Google with map and set containers on B-trees

One of the Google employees in 20% of free time developed and laid out under the free license the cpp-btree library (C ++ B-Tree), which contains containers working as map, set, multimap and multiset from the standard template library (STL).

The difference is that STL containers are implemented on red-black trees, and similar cpp-btree containers are implemented on B-trees. At the same time, in certain situations a significant gain is achieved in the use of memory (on small elements) and in performance (on large container sizes).

B-trees are known as a tool for working with disk storage: databases and file systems. But the same properties that give a win there, allow more efficient use of RAM. Each node of the red-black tree contains one element, requires three pointers plus a bit of information on the element for balance. For comparison, containers on B-trees store several items per node, therefore decreasing the overhead of pointers and saving a significant amount of memory.

According to tests, data structures on B-trees take up 50-80% less space compared to red-black trees. The table shows the average number of bytes per item in different types of set / map containers. For example, the standard container set <int32_t> creates an overhead of 16 bytes on each of the many small elements, while the alternative container btree_set <int32_t> creates an overhead of only 1 byte per element.
')
Type ofInsertB-Tree (32-bit)Red-Black (32-bit)B-Tree (64-bit)Red-Black (64-bit)
set <int32_t>sorted4.1920.004.4040.00
random4.9020.005.1540.00
set <int64_t>sorted8.3924.008.8040.00
random9.9624.0010.4740.00
set <string>sorted24.5740.0033.6064.00
random29.4940.0040.7464.00
map <int32_t, void *>sorted8.3924.008.8048.00
random9.9624.0010.4748.00
map <int64_t, void *>sorted12.6028.0013.2048.00
random15.1628.0015.9248.00
map <string, void *>sorted28.6744.0038.1672.00
random34.4944.0046.5372.00

On some large containers there is also a gain in access speed due to more efficient use of the cache. The graph shows the average execution time of the insert, depending on the size of the container.


Especially for the “Russian friends”, the author of the library explained that he did not sign the Y axis, because the specific time of the operation depends on the equipment used. In this case, for 10 million elements, the operation takes 1.6 microseconds on red-black trees, versus 400 nanoseconds on B-trees. Intel Xeon CPU E5-1650 @ 3.20GHz with L1 = 32K, L2 = 256K, L3 = 12M.

The developer warns about the difference between cpp-btree and standard containers: since any operations of inserting and deleting nodes into the B-tree will invalidate existing iterators, so if necessary, you need to use secure versions .

via Google Open Source Blog

Source: https://habr.com/ru/post/169199/


All Articles