Hello.
From time to time, like most programmers, I study some new structures and algorithms.
This time, I came across articles on cache oblivious algorithms. These are algorithms that are initially more optimized for working with the caching subsystem of modern processors.
')
One of the representatives of this group is the Unrolled linked list.
What is it?
To understand what it is and what it is eaten with, I suggest recalling the most popular implementations of lists: ArrayList (with some simplification, we will talk about arrays instead of the list itself) and LinkedList.
Consider their main advantages and disadvantages:
ArrayList :
Virtues- Compactness — an array is the most (at least one of) compact data structures for representing lists;
- The insertion speed at the end of an empty array is a very fast operation, fact !;
- The speed of taking an element by index is also very high, the number of cache miss can be estimated at N / B, where B is the size of the cache line, and N is the number of array elements, which can be considered a reference for this kind of structure.
disadvantages- Inserting in the middle is not a very convenient operation for an array, you have to shift the right-hand side, which, even with arraycopy, is not free;
- The length of the memory area - to store large arrays requires a large continuous chunk of memory. And while many garbage collectors effectively deal with memory fragmentation, the likelihood of OutOfMemmory especially for large lists is essential.
LinkedList :
Virtues- Relatively simpler and faster insertion into the middle;
- No problem when working with highly fragmented memory.
disadvantages- Significantly larger amount of necessary memory for storing data of the same size (up to 4 times);
- When iterating through the list, the number of cache miss in the worst case may be equal to the number of nodes in the list. Even in the best case, when all the nodes are in memory one after another, due to the larger size of the node, the amount of cache miss will be several times larger than that for ArrayList.
So, UnrolledLinkedList tries to combine the merits of the two lists without inheriting their disadvantages.
Consider the device of this list in more detail.
In simple terms, UnrolledLinkedList is a connected (doubly-connected) list of arrays of length n. That is, each node contains not one value, but an array of values. At the same time, the array is small enough for insertion and deletion operations to be very fast and memory fragmentation had less influence on the ability to create a node, but large enough to fit completely into the cache line to reduce the number of cache miss. Additionally, each node stores the number of elements added to it. Graphically, this list can be represented as follows:

Consider the operations for this list:
Insert in the node:In the case of free cells, a new element is simply added to the end of the array. If the array is already full, then we create a new node for the current one and transfer the right half of the array to it, thereby creating a place to insert new values. A new item in this case will be inserted at the end of the new node. The same can be applied to the insert in the middle of the array.
Graphically, this operation can be represented as follows:
Remove from node:The operation is quite simple, we simply remove the desired element from the array and make the shift of the remaining elements, if necessary. But there is one thing, but if the number of elements in the array decreases to n / 2, then n / 2 elements are transferred from the neighboring cells. If after this one of the cells is empty, it is deleted. This operation is needed to reduce memory consumption with the gradual clearing of large lists.
Search node by index:As in the coherent list, the search begins with a head or tail, from which it is closer, the difference is that there is no need to iterate over arrays - you just need to summarize the values of the fields containing the number of elements in the node. Thus, the search for the desired node is much faster than in LinkedList, although the asymptotic behavior is still O (N).
Initially it seemed to me that half of the empty cells in each node is too wasteful. However, as shown by small experiments, the average cell fill level is 70%, and even with a 50% fill, the memory consumption should be significantly less than that of the linked list. Moreover, the list allows you to customize yourself for various needs. I tried to play with the fill factor of the array, but did not extract significant benefits, so I left everything as it is.
Speaking about the number of cache miss, when iterating over such a list, it can be estimated as (N / n + 1) (n / B), where N is the list length, n is the length of the node array, B is the length of the cache line. This estimate is quite close to the estimate for ArrayList.
At the time of writing, I also had the idea that such a list could be a fairly effective implementation of a concurrent collection with the ability to block only one node. However, this is a topic for a new study and a new article.
Sources are posted on
github .
They are a rather crude code, so please do not kick your feet for quality. However, in any case, I would appreciate any criticism.
There are some attempts in the source code to measure and compare different lists, but I’m embarrassed to present them in the article, because I’m not so sure that all measurements were made correctly and I didn’t shoot myself in the leg. I would be very happy with the help of experts in the field of benchmarking for the final placement of points above the lists!
Thanks for attention.
Sources:
blogs.msdn.com/b/devdev/archive/2005/08/22/454887.aspxen.wikipedia.org/wiki/Unrolled_linked_listUPD: Still got to measure performance and size. For microbenchmarking, I used the Google caller caliper.
Here are the results:
Linear iteration
length benchmark us linear runtime
1000 UnrolledListIteration 6.34 =
1000 ArrayListIteration 5.56 =
1000 LinkedListIteration 5.93 =
100,000 UnrolledListIteration 827.68 ==
100,000 ArrayListIteration 637.05 =
100,000 LinkedListIteration 1124.38 ==
1,000,000 UnrolledListIteration 9056,77 =======================
1,000,000 ArrayListIteration 6731,73 =================
1,000,000 LinkedListIteration 11800,96 ==============================
Random Access:
length benchmark us linear runtime
1000 UnrolledListIteration 6.34 =
1000 ArrayListIteration 5.53 =
1000 LinkedListIteration 5.97 =
100,000 UnrolledListIteration 844.30 ==
100,000 ArrayListIteration 652.21 =
100,000 LinkedListIteration 1106.90 ===
1,000,000 UnrolledListIteration 9066,88 ==========================
1,000,000 ArrayListIteration 6723.21 ===================
1000000 LinkedListIteration 10339,15 =============================
Size of:
ArrayList is 45172432,000 bytes // This is not fair results, but the memory consumption is indicative of a gradual increase in volume.
ArrayList with lenght 1KK is 24000064,000 bytes
UnrolledList is 26250048,000 bytes
LinkedList is 56000048,000 bytes
These are the results that are consistent with the theory)
The repository is updated, everyone who wants to get acquainted with the updates or run their tests - you are welcome.