
When "O" brings a big


The "O" is a great tool. It allows you to quickly select the appropriate data structure or algorithm. But sometimes a simple analysis of “O” big can fool us, if you don’t think carefully about the influence of constant factors. An example that is often encountered when programming on modern processors is associated with the choice of data structure: an array, a list, or a tree.


Memory, slow, slow memory


In the early 1980s, the time needed to fetch data from RAM and the time needed to perform computations on that data were roughly the same. You could use an algorithm that hopped randomly around dynamic memory, gathering and processing data as it went. Since then, processors have become far faster at computation, by a factor of 100 to 1000, than at fetching data from RAM. This means that while a processor waits for data from memory, it sits idle for hundreds of cycles, doing nothing. That would, of course, be a terrible waste, so modern processors include several levels of built-in cache. Every time you request a single piece of data from memory, adjacent chunks of memory are written into the processor cache as well. As a result, when walking through memory sequentially you can access it almost as fast as the processor can process it, because chunks of memory keep streaming into the L1 cache. If you jump to random memory addresses, the cache often cannot help you, and performance can suffer badly. If you want to learn more, Mike Acton's CppCon talk is a great starting point (and a great time).
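Here is a rough, self-contained sketch (my example, not code from the article) that makes this effect visible: it sums the same array once in sequential order and once in a shuffled order. Both loops do identical arithmetic; on typical hardware the random-order pass is several times slower, though the exact ratio depends on your CPU and cache sizes.

    using System;
    using System.Diagnostics;

    class CacheDemo
    {
        static long checksum; // written so the JIT can't remove the summing loops

        static void Main()
        {
            const int n = 1 << 24;              // ~16M ints, far larger than any CPU cache
            var data = new int[n];
            var order = new int[n];
            for (int i = 0; i < n; i++)
            {
                data[i] = 1;
                order[i] = i;
            }

            Console.WriteLine("sequential: " + SumInOrder(data, order) + " ms");

            // Shuffle the visiting order (Fisher-Yates) so the same loop
            // becomes a random-access walk over the same data.
            var rng = new Random(42);
            for (int i = n - 1; i > 0; i--)
            {
                int j = rng.Next(i + 1);
                (order[i], order[j]) = (order[j], order[i]);
            }

            Console.WriteLine("random:     " + SumInOrder(data, order) + " ms");
        }

        static long SumInOrder(int[] data, int[] order)
        {
            var sw = Stopwatch.StartNew();
            long sum = 0;
            for (int i = 0; i < order.Length; i++)
            {
                sum += data[order[i]];          // identical work, different access pattern
            }
            checksum = sum;
            return sw.ElapsedMilliseconds;
        }
    }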


As a result, arrays have become the better data structure when performance matters, even when a Big O analysis says the array should be slower. Where you would reach for a tree, a sorted array with binary search may serve better. Where you would reach for a queue, a growable array may work better, and so on.
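As a minimal illustration of the "sorted array plus binary search" idea (my sketch, not code from the article): the lookup walks a contiguous block of memory instead of chasing tree-node pointers.

    using System;

    class SortedArrayLookup
    {
        static void Main()
        {
            int[] keys = { 41, 7, 19, 3, 28 };
            Array.Sort(keys);                         // sort once up front

            int index = Array.BinarySearch(keys, 19); // O(log n), cache-friendly lookup
            Console.WriteLine(index >= 0 ? "found at " + index : "not found");
        }
    }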


Linked list and dynamic array


Once you understand the importance of sequential memory access, it will come as no surprise that if you need to traverse a collection quickly, an array will be faster than a linked list. Environments with smart memory management and garbage collection may keep linked-list nodes mostly contiguous, but they cannot guarantee it. Using a raw array usually requires more complex code, especially when you need to insert or append elements, because you have to handle growing the array, shifting elements, and so on. Most languages have standard libraries with some implementation of a dynamic array: C++ has vector, C# has List (which F# uses under the name ResizeArray), and Java has ArrayList. These structures usually expose the same or a similar interface as a linked list. In this article I will call such structures dynamic arrays (array lists), but keep in mind that the C# examples use the List class, not the older ArrayList.


What if you need a data structure that supports both fast insertion and fast traversal? For this example, let's imagine a workload where we insert at the front of the collection about 5 times more often than we traverse it. Let's also imagine that both the linked list and the dynamic array in our environment have equally pleasant interfaces, so all that remains is to decide which will be more efficient. We can turn to Big O analysis to make the best use of our valuable time. Consulting the familiar Big O cheat sheet, the relevant complexities for these data structures are:


                 Traverse   Insert
Dynamic array    O(n)       O(n)
Linked list      O(n)       O(1)

The problem with dynamic arrays is insertion: at minimum, every element past the insertion point must be copied and shifted one position to make room for the new element, hence O(n). Sometimes a new, larger array has to be allocated just so there is room to insert at all. In our case insertion happens 5 times more often than traversal, so the conclusion seems obvious: as long as n is large enough, the linked list should generally be more efficient.
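To make that O(n) concrete, here is a hand-rolled sketch (a hypothetical helper, not the article's code) of what inserting at the front of a dynamic array has to do: grow the backing array if needed, then shift every existing element one slot to the right.

    using System;

    class TinyDynamicArray
    {
        private int[] _items = new int[4];
        public int Count { get; private set; }

        public void InsertAtFront(int value)
        {
            if (Count == _items.Length)
            {
                // Out of room: allocate a bigger array and copy everything over.
                var bigger = new int[_items.Length * 2];
                Array.Copy(_items, bigger, Count);
                _items = bigger;
            }
            // Shift all existing elements right by one position: O(n).
            Array.Copy(_items, 0, _items, 1, Count);
            _items[0] = value;
            Count++;
        }

        public int this[int index] => _items[index];
    }

Insert n elements at the front this way and you copy on the order of n² ints in total, which is exactly the cost the benchmarks below measure for the dynamic array.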


Experiment


But to be sure, it is better to measure. Let's run an experiment in C# using BenchmarkDotNet. C# has a LinkedList collection, which is a classic linked list, and a List, which is a dynamic array. Their interfaces are similar, and both fit our scenario. Consider the worst case for the dynamic array: the insert always happens at the front, so the entire array has to be copied on every insert. The test environment configuration is as follows:


Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4712HQ CPU 2.30GHz, ProcessorCount=8
Frequency=2240910 ticks, Resolution=446.2473 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT]
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1590.0
Type=Bench  Mode=Throughput

Test Cases:


    [Benchmark(Baseline = true)]
    public int ArrayTest()
    {
        // In C#, List<T> is an array backed list.
        List<int> local = arrayList;
        int localInserts = inserts;
        int sum = 0;
        for (int i = 0; i < localInserts; i++)
        {
            local.Insert(0, 1); // Insert the number 1 at the front
        }
        // For loops iterate over List<T> much faster than foreach
        for (int i = 0; i < local.Count; i++)
        {
            sum += local[i]; // Do some work here so the JIT doesn't elide the loop entirely
        }
        return sum;
    }

    [Benchmark]
    public int ListTest()
    {
        LinkedList<int> local = linkedList;
        int localInserts = inserts;
        int sum = 0;
        for (int i = 0; i < localInserts; i++)
        {
            local.AddFirst(1); // Insert the number 1 at the front
        }
        // Again, iterating the fastest possible way over this collection
        var node = local.First;
        for (int i = 0; i < local.Count; i++)
        {
            sum += node.Value;
            node = node.Next;
        }
        return sum;
    }

Results:


Method       Size   Inserts   Median
ArrayTest    100    5         38.9983 us
ListTest     100    5         51.7538 us

The dynamic array wins by a comfortable margin. But this is a small list; Big O describes behavior only as n grows large, so the results should eventually flip as n increases. Let's try:


Method       Size         Inserts   Median
ArrayTest    100          5         38.9983 us
ListTest     100          5         51.7538 us
ArrayTest    1,000        5         42.1585 us
ListTest     1,000        5         49.5561 us
ArrayTest    100,000      5         208.9662 us
ListTest     100,000      5         312.2153 us
ArrayTest    1,000,000    5         2,179.2469 us
ListTest     1,000,000    5         4,913.3430 us
ArrayTest    10,000,000   5         36,103.8456 us
ListTest     10,000,000   5         49,395.0839 us


Surprising results for many. No matter how large n gets, the dynamic array is still faster. For its performance to become worse, the ratio of inserts to traversals has to change, not just the size of the collection. Note that this is not a failure of Big O analysis, just a human error: we applied the method incorrectly. Dig into the math and Big O will tell you that both data structures scale at the same rate as long as the ratio of inserts to traversals does not change.


Where the tipping point lies depends on many factors, but a good rule of thumb comes from Chandler Carruth of Google: a dynamic array will be more efficient than a linked list until inserts become an order of magnitude more frequent than traversals. In our case the rule holds up well, because at a 10:1 ratio you can see the shift toward the linked list:


Method       Size      Inserts   Median
ArrayTest    100,000   10        328,147.7954 ns
ListTest     100,000   10        324,349.0560 ns

The devil is in the details


The dynamic array wins because the numbers being traversed sit next to each other in memory. Each time a number is fetched from memory, a whole cache line is loaded into L1, so the next 64 bytes of data are ready for processing. With a linked list, each node.Next call follows a pointer to the next node, and there is no guarantee that this node sits in memory right after the previous one, so sometimes we miss the cache. But we do not always work with types that store their values inline, especially in object-oriented languages, where traversal often runs over reference types. In that case, even though the pointers in the dynamic array are laid out contiguously in memory, the objects they point to are not. The situation is still better than with a linked list, where each element requires dereferencing pointers twice, but how much does this affect overall performance?


It degrades significantly, depending on the size of the objects and the specifics of the hardware and runtime. If in the example above the numbers are replaced with small objects (12 bytes), the break-even point drops to 4 inserts per traversal:


Method            Size      Inserts   Median
ArrayTestObject   100,000   0         674.1864 us
ListTestObject    100,000   0         1,140.9044 us
ArrayTestObject   100,000   2         959.0482 us
ListTestObject    100,000   2         1,121.5423 us
ArrayTestObject   100,000   4         1,230.6550 us
ListTestObject    100,000   4         1,142.6658 us

Managed C# code suffers noticeably here, because iterating over a dynamic array adds redundant bounds checks. A vector in C++ would likely do better. If we are aggressive about attacking the problem, we can write a faster dynamic array class using unsafe C# code to avoid the bounds checks. The relative difference also depends heavily on how the memory allocator and garbage collector manage dynamic memory, how large the objects are, and other factors. In my environment, larger objects tend to improve the relative performance of dynamic arrays. At the scale of whole applications, the relative performance of dynamic arrays can also improve as the heap becomes more fragmented, but to be sure, you have to test.
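As a rough sketch of that "aggressive" unsafe option (my illustration, not the article's code), summing a raw int[] through a pinned pointer avoids the per-element bounds check entirely; it requires compiling with unsafe blocks enabled:

    using System;

    static class UnsafeSum
    {
        // Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> (or /unsafe).
        public static unsafe int Sum(int[] values)
        {
            int sum = 0;
            fixed (int* p = values)       // pin the array so the GC can't move it
            {
                int* cur = p;
                int* end = p + values.Length;
                while (cur < end)
                {
                    sum += *cur;          // raw pointer read: no index, no bounds check
                    cur++;
                }
            }
            return sum;
        }
    }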


One more thing. If the objects are small enough (roughly 16 to 32 bytes or less, depending on the situation), it is worth considering storing them by value (a struct in .NET) rather than as objects. This not only improves performance noticeably thanks to contiguous memory layout, but can also reduce garbage collection overhead, depending on how the data is used (a sketch of the two layouts follows the table below):


Method            Size      Inserts   Median
ArrayTestObject   100,000   10        2,094.8273 us
ListTestObject    100,000   10        1,154.3014 us
ArrayTestStruct   100,000   10        792.0004 us
ListTestStruct    100,000   10        1,206.0713 us
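For reference, the two layouts differ roughly like this (my reconstruction with hypothetical PointClass/PointStruct types; the article does not show this code): the class version stores an array of references to scattered heap objects, while the struct version stores the values themselves contiguously inside the list's backing array.

    using System.Collections.Generic;

    // A small payload (~12 bytes of fields), once as a reference type, once as a value type.
    class PointClass  { public int X, Y, Z; }
    struct PointStruct { public int X, Y, Z; }

    class StructVsClassDemo
    {
        static void Main()
        {
            // Each element is a reference to a separate heap object; iteration
            // chases one pointer per element, and the objects may be scattered.
            var asObjects = new List<PointClass>();

            // The values sit contiguously inside the list's backing array,
            // so iteration walks straight through memory, like the plain int case.
            var asStructs = new List<PointStruct>();

            for (int i = 0; i < 100000; i++)
            {
                asObjects.Add(new PointClass { X = i, Y = i, Z = i });
                asStructs.Add(new PointStruct { X = i, Y = i, Z = i });
            }
        }
    }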

Java may fare better here because it automatically applies clever optimizations to small objects, or you can simply use separate arrays of primitive types. That is tedious to write, but it can be faster than an array of structs. It all depends on how your code accesses the data. Keep this in mind when performance is especially important.
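The "separate arrays of primitives" idea, sketched in C# (my example, not from the article): instead of one array of point objects, keep one primitive array per field, so scanning a single field is a pure sequential pass over a primitive array.

    // Structure-of-arrays layout for a hypothetical 3-field point record.
    class Points
    {
        public int[] X;
        public int[] Y;
        public int[] Z;

        public Points(int capacity)
        {
            X = new int[capacity];
            Y = new int[capacity];
            Z = new int[capacity];
        }

        // Summing one field touches only that field's memory, nothing else.
        public long SumX()
        {
            long sum = 0;
            for (int i = 0; i < X.Length; i++)
            {
                sum += X[i];
            }
            return sum;
        }
    }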


Make sure the abstraction is worth it


People often push back on these conclusions, arguing for code cleanliness, correctness, and maintainability. Of course, every domain has its own priorities, but I believe that when an abstraction improves cleanliness only slightly while performance suffers greatly, the default choice should be performance. If you take the time to study your environment, you will discover cases where there is a faster and no less clear solution, as is often the case with dynamic arrays instead of linked lists.


Something to think about: here are eight different ways to sum a list of numbers in C#, with execution time and memory use. Checked arithmetic is used everywhere so that the comparison with LINQ, whose Sum method uses checked arithmetic, is fair. Notice how much faster the best method is than the others, and how costly the most popular one is. Notice that the foreach abstraction works well with raw arrays, but not with dynamic arrays or linked lists. Whatever your language and environment, it is important to understand these details in order to make good decisions by default.


Method              Length    Median           Bytes Allocated/Op
LinkedListLinq      100,000   990.7718 us      23,192.49
RawArrayLinq        100,000   643.8204 us      11,856.39
LinkedListForEach   100,000   489.7294 us      11,909.99
LinkedListFor       100,000   299.9746 us      6,033.70
ArrayListForEach    100,000   270.3873 us      6,035.88
ArrayListFor        100,000   97.0850 us       1,574.32
RawArrayForEach     100,000   53.0535 us       1,574.84
RawArrayFor         100,000   53.1745 us       1,577.77

    [Benchmark(Baseline = true)]
    public int LinkedListLinq()
    {
        var local = linkedList;
        return local.Sum();
    }

    [Benchmark]
    public int LinkedListForEach()
    {
        var local = linkedList;
        int sum = 0;
        checked
        {
            foreach (var node in local)
            {
                sum += node;
            }
        }
        return sum;
    }

    [Benchmark]
    public int LinkedListFor()
    {
        var local = linkedList;
        int sum = 0;
        var node = local.First;
        for (int i = 0; i < local.Count; i++)
        {
            checked
            {
                sum += node.Value;
                node = node.Next;
            }
        }
        return sum;
    }

    [Benchmark]
    public int ArrayListFor()
    {
        // In C#, List<T> is an array backed list
        List<int> local = arrayList;
        int sum = 0;
        for (int i = 0; i < local.Count; i++)
        {
            checked
            {
                sum += local[i];
            }
        }
        return sum;
    }

    [Benchmark]
    public int ArrayListForEach()
    {
        // In C#, List<T> is an array backed list
        List<int> local = arrayList;
        int sum = 0;
        checked
        {
            foreach (var x in local)
            {
                sum += x;
            }
        }
        return sum;
    }

    [Benchmark]
    public int RawArrayLinq()
    {
        int[] local = rawArray;
        return local.Sum();
    }

    [Benchmark]
    public int RawArrayForEach()
    {
        int[] local = rawArray;
        int sum = 0;
        checked
        {
            foreach (var x in local)
            {
                sum += x;
            }
        }
        return sum;
    }

    [Benchmark]
    public int RawArrayFor()
    {
        int[] local = rawArray;
        int sum = 0;
        for (int i = 0; i < local.Length; i++)
        {
            checked
            {
                sum += local[i];
            }
        }
        return sum;
    }


Source: https://habr.com/ru/post/308818/

