I want to share some experience I recently gained in C# with loading and processing large amounts of data in memory. Everything below applies to Visual Studio 2008 and .NET Framework 3.5.1; other versions of the language or libraries may behave differently.
So, we have the following tasks:
1. Place in memory up to 100 million records, each consisting of a 16-character string (a unique key) and two integer values of 4 bytes each;
2. Quickly find and edit a record by its key.
Solving problem #1: placement in memory.
Let's start with placement. The useful data in each record totals 24 bytes, so the entire data set should take about 2.4 GB (we will not bother with exact recalculation along the 1024 boundary). Here the first trouble awaits us: on x32, a Windows process can allocate only 2 GB. In practice a .NET process gets even less; I managed to allocate around 1.8 GB. I assume the rest is taken by the libraries used in the project. So the problem can be solved completely only on the x64 platform, but in the examples we will still look at x32 for comparison.
So, without further ado, we take up the pen (mouse and keyboard) and sculpt what first comes to mind. Let's make a representation of the record in memory:
public class Entry
{
    public string Key;
    public int IntValue1;
    public int IntValue2;
}
The int alias is cross-platform and always corresponds to Int32, so it will be the same for x32 and x64.
We make a simple array and load 10 million records into it:
public Entry[] Entries = new Entry[10000000];

private void btnCreate_Click(object sender, EventArgs e)
{
    int i;
    for (i = 0; i < Entries.Length; i++)
    {
        Entries[i] = new Entry() { Key = i.ToString("0000000000000000") };
    }
    GC.Collect();
    var msg = string.Format("Created {0}, Memory {1}",
        i.ToString("0,0"), GC.GetTotalMemory(true).ToString("0,0"));
    lbLog.Items.Add(msg);
}
And we see that we have occupied about 760 MB (x32) or 1040 MB (x64), i.e. 76/104 B per record, of which only 24 are useful. It turns out that 100 million records would occupy 7.6/10.4 GB!
Where did the extra 52/80 B go? There are two factors: the infrastructure of .NET objects and memory alignment. Let's start with the first; a good article on the subject can be read here. It turns out that we lose 8/16 B on the object infrastructure and another 4/8 B on the reference (our array of class instances is in fact an array of pointers to the instances). Let's check by replacing the class with a struct:
public struct Entry ...
Measurement now shows a cost of 64/80 B per record. Everything adds up: we saved 12/24 B. A structure is a value type, so it is logical to assume that the array now holds the values themselves rather than references, and indeed it does; that is how we saved the 4/8 B reference, while the structure itself carries no infrastructure overhead. Before we go further and start optimizing Key, which is clearly where we lose the most, let's stop at one important point. By turning the record into a structure we have dug ourselves a hole: if we place the data in a single array of structures, we will need one very large block of memory. Even if there is enough free memory, such a contiguous block may simply not be available because of fragmentation; in practice I started running into problems when allocating more than 500 MB. We will deal with this in problem #2. In addition, every time a record is passed around we now pay for cloning the structure, and wherever a reference is required we pay for boxing/unboxing. But there is nowhere to go; we need to save space.
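To make this trade-off concrete, here is a purely illustrative snippet (not project code; the method name is invented) showing both effects with the struct version of Entry:

static void StructCosts(Entry[] entries)
{
    // 1. Value semantics: assignment copies the whole structure,
    //    and changes to the copy are not visible in the array element.
    Entry copy = entries[0];
    copy.IntValue1 = 42;           // entries[0].IntValue1 is unchanged

    // 2. Boxing: handing the struct to anything that expects an object
    //    allocates a box on the heap (object header + all fields).
    object boxed = entries[0];     // boxing
    Entry unboxed = (Entry)boxed;  // unboxing
}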
Let's continue with the Key field. The String type is a class, so we immediately lose 12/24 B per record. In fact, a string is quite heavy; you can read more about it here. In practice, with an array of 16-character strings, we measure that the Key field costs 56/72 B per record. In total:
For the class: 8 (two ints) + 56/72 (Key) + 12/24 (class infrastructure + reference) = 76/104 B.
For the structure: 8 (two ints) + 56/72 (Key) = 64/80 B.
Let's try converting the Key field to char[]. We gain 8 B on both platforms. An array is also a class, so we saved nothing on infrastructure, only on the internal organization of the string itself, given that internally it also stores a char array. Each character is Unicode and occupies 2 bytes. Memory alignment may also play a role here; it's time to talk about it.
I think many people know that it pays to align data in memory on a boundary defined by the platform, for example an Int32 boundary; details can be found here. In our case we lost nothing in the layout of the Entry class/structure, since all the fields fall exactly on a 4-byte boundary (remember that Key is a reference). You can experiment by changing the settings of the StructLayout attribute and measuring with Marshal.SizeOf(new Entry()). But if you make one of the fields a byte instead of an int, it turns out that the array occupies just as much memory as with an int, even if you set Pack = 1 on StructLayout. That is, the structure itself is packed on a 1-byte boundary, but the array elements are still laid out on a 4-byte boundary. I'll say right away that I did not solve the problem of packing the array, since the final solution did not require it, so this topic, like the topic of strings and characters, is not fully covered here. I will also add that setting CharSet = CharSet.Ansi on the StructLayout attribute did not affect the amount of memory occupied by either the string or the char[].
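As an illustration of this kind of measurement (a sketch only; PackedEntry is an invented name, not project code), one can compare what Marshal.SizeOf reports for a packed structure with the actual per-element footprint of a managed array:

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedEntry
{
    public byte SmallValue;   // a byte instead of an int
    public int IntValue2;
}

class AlignmentDemo
{
    static void Main()
    {
        // Marshal.SizeOf reports the packed (marshaled) size: 1 + 4 = 5 bytes.
        Console.WriteLine(Marshal.SizeOf(new PackedEntry()));

        // The managed array, however, may still lay elements out on a
        // 4-byte boundary; measure it instead of trusting the attribute.
        long before = GC.GetTotalMemory(true);
        var array = new PackedEntry[10000000];
        long after = GC.GetTotalMemory(true);
        Console.WriteLine((after - before) / (double)array.Length);
        GC.KeepAlive(array);
    }
}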
So far we have saved 20/32 B per record. Since the string in this problem was never meant to be Unicode, it was decided to use byte[] instead of char[], saving another 16 B on both platforms. One record now occupies 40/56 B; the remaining overhead is 16/32 B for the organization of the array. It turns out that .NET can organize a static fixed-size buffer in unsafe mode:
private unsafe fixed byte _key[16];
Here we finally achieve 100% efficiency: our record takes exactly 24 B in memory on both platforms. On x32 we will be able to load about 75 million records (1.8 GB); on x64, all 100 million (2.4 GB), the maximum on x64 being limited essentially by the amount of available memory. This happiness can only be used inside unsafe and fixed blocks. I suspect we lose some performance here, but I don't know how much, since I did not measure it: the final version fully satisfied the requirements.
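As a purely illustrative sketch (field and method names are invented, and the project must be compiled with /unsafe), such a record and a key write might look like this:

// A 24-byte record: 16 bytes of key plus two 4-byte integers.
public unsafe struct Entry
{
    public fixed byte Key[16];
    public int IntValue1;
    public int IntValue2;
}

public static class EntryHelper
{
    public static unsafe void SetKey(ref Entry entry, byte[] source)
    {
        // The fixed buffer of a heap-resident struct can only be addressed
        // inside a fixed statement; the source array is pinned as well.
        fixed (byte* dst = entry.Key)
        fixed (byte* src = source)
        {
            for (int i = 0; i < 16 && i < source.Length; i++)
                dst[i] = src[i];
        }
    }
}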
We take a smoke break and move on to the second problem.
Solving problem #2: fast access.
For quick access to 100 million records we need something like a hash table. The standard Hashtable is not suitable because it works with Object, which means every record structure would be boxed into a separate object and we would again lose on class infrastructure. We need a generic collection, and there is one: Dictionary<TKey, TValue>. After an unsuccessful attempt to use it, and after disassembling it, we find the following (those unfamiliar with how hash tables work should read up on them here):
- an overhead of 8 B per entry:
[StructLayout(LayoutKind.Sequential)]
private struct Entry
{
    public int hashCode;
    public int next;
    public TKey key;
    public TValue value;
}
We lose on the stored hashCode and on the linked-list organization (the chaining method). By the way, this also forces us to split the record into two structures: one for the key and one for the values.
- one segment for data:
private Entry<TKey, TValue>[] entries;
So the acute problem of allocating one large block of memory is back.
- the hash segment is at least as large as the capacity (which already raises questions):
private void Initialize(int capacity)
{
    …
    num = HashHelpers.GetPrime(capacity);
    this.buckets = new int[num];
    …
- when the dictionary fills up, it resizes with at least a two-fold margin, the hash segment is adjusted to the same size, and everything is rehashed accordingly:
private unsafe void Resize()
{
    …
    num = HashHelpers.GetPrime(this.count * 2);
    numArray = new int[num];
    …
    entryArray = new Entry<TKey, TValue>[num];
    …
    this.buckets = numArray;
    this.entries = entryArray;
}
I don't think I need to describe in detail where all this leads. Just consider the doubling logic at a scale of tens of millions of records: during a resize the old and the new entry arrays exist simultaneously, so the transient memory spike alone can exceed the size of the data itself.
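To get a feel for the effect without reading decompiled code, one can simply watch the managed heap while filling a Dictionary with no capacity hint (an illustrative sketch; exact numbers depend on the machine and runtime):

using System;
using System.Collections.Generic;

class DictionaryGrowthDemo
{
    static void Main()
    {
        const int count = 10000000;
        var dict = new Dictionary<long, int>();   // no capacity hint on purpose

        long start = GC.GetTotalMemory(true);
        long peak = start;
        for (int i = 0; i < count; i++)
        {
            dict.Add(i, i);
            if ((i & 0xFFFFF) == 0)               // sample the heap occasionally
                peak = Math.Max(peak, GC.GetTotalMemory(false));
        }
        long end = GC.GetTotalMemory(true);

        // During every resize the old and the new entry arrays coexist,
        // so the peak is noticeably higher than the final footprint.
        Console.WriteLine("Final: {0:0.0} B/entry, peak extra: {1:0,0} B",
            (end - start) / (double)count, peak - start);
    }
}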
Having discovered this unsuitable behavior, I did not dig through the built-in collections any further in search of something more economical, but wrote my own dictionary. It is all quite trivial, especially since I do not need 90% of the functionality of the standard one. I will not give the code, only the key points (a rough sketch follows the list):
- the key and the value are not separated: everything sits in a single Entry structure;
- the Entry structure overrides the GetHashCode() method (can you really override something in a struct?! you can), using the modified FNV hash from here (fig. 4);
- the Entry structure also overrides the Equals(object obj) method; comparison is by key only;
- the size of the hash segment is specified in the constructor; resizing and rehashing are not provided;
- the index into the hash segment is computed as the remainder of the hash code divided by the size of the hash segment;
- each cell of the hash segment is an ordinary List that holds the records whose hash codes fall into that cell. This is what solves the problem of one large block of memory.
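For illustration only, a rough sketch of how these points might fit together could look like this (all names, the FNV constants and the details are illustrative assumptions, not the actual implementation; compile with /unsafe, and note that Equals(object) itself boxes its argument):

using System.Collections.Generic;

public unsafe struct Entry
{
    public fixed byte Key[16];
    public int IntValue1;
    public int IntValue2;

    // FNV-style hash over the 16 key bytes (constants are illustrative).
    public override int GetHashCode()
    {
        uint hash = 2166136261;
        fixed (byte* k = Key)
        {
            for (int i = 0; i < 16; i++)
                hash = (hash ^ k[i]) * 16777619;
        }
        return (int)hash;
    }

    // Equality is by key only; the two int values are ignored.
    public override bool Equals(object obj)
    {
        if (!(obj is Entry)) return false;
        Entry other = (Entry)obj;
        // 'this' may live inside an array on the heap, so its buffer is pinned;
        // 'other' is a local copy and can be read directly.
        fixed (byte* a = Key)
        {
            for (int i = 0; i < 16; i++)
                if (a[i] != other.Key[i]) return false;
        }
        return true;
    }
}

public class EntryDictionary
{
    private readonly List<Entry>[] _buckets;   // the hash segment

    public EntryDictionary(int segmentSize)
    {
        // The segment size is fixed at construction; no resizing, no rehashing.
        _buckets = new List<Entry>[segmentSize];
    }

    private int IndexOf(int hashCode)
    {
        // Index = remainder of the hash code divided by the segment size.
        return (int)((uint)hashCode % (uint)_buckets.Length);
    }

    public void Add(Entry entry)
    {
        int index = IndexOf(entry.GetHashCode());
        if (_buckets[index] == null)
            _buckets[index] = new List<Entry>();   // one small block per cell
        _buckets[index].Add(entry);
    }

    public bool TryGet(Entry key, out Entry result)
    {
        List<Entry> chain = _buckets[IndexOf(key.GetHashCode())];
        if (chain != null)
        {
            for (int i = 0; i < chain.Count; i++)
            {
                if (chain[i].Equals(key))          // boxes 'key' on each call
                {
                    result = chain[i];
                    return true;
                }
            }
        }
        result = default(Entry);
        return false;
    }
}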
Let's calculate how much memory this organization costs:
- the hash segment = an array of references to lists (segment size * 4/8 B);
- the list instances = class infrastructure + the list's own fields (segment size * (8/16 + 28/44) B).
In total, a segment of 1 million cells costs about 40/68 MB.
Everything would be great and it would be time to drink champagne, but no such luck. It turns out that List:
- grows by doubling, just like the standard dictionary:
private void EnsureCapacity(int min)
{
    …
    num = (this._items.Length == 0) ? 4 : (this._items.Length * 2);
    …
- its method for releasing excess memory does nothing unless more than 10% of the capacity is unused:
public void TrimExcess()
{
    int num;
    num = (int) (((double) this._items.Length) * 0.9);
    if (this._size >= num)
    {
        goto Label_002A;
    }
    this.Capacity = this._size;
Label_002A:
    return;
}
- and on top of that, it is redundant (for our purposes) both in its fields and in the fact that it is a class:
public class List...
{
    private const int _defaultCapacity = 4;
    private static T[] _emptyArray;
    private T[] _items;
    private int _size;
    [NonSerialized]
    private object _syncRoot;
    private int _version;
    ...
I had to write my own list, with a configurable linear growth step, full release of excess memory, and 8 B saved on the _syncRoot and _version fields. It can also be turned into a structure, saving another 12 B.
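For illustration, a minimal sketch of such a list might look like this (the names are invented and the details differ from the actual implementation); it grows by a fixed step and always trims to the exact size, and could equally be declared as a struct to drop the object header:

using System;

public class CompactList<T>
{
    private T[] _items;
    private int _size;
    private readonly int _growthStep;

    public CompactList(int growthStep)
    {
        _growthStep = growthStep;   // e.g. 20-25 in the experiments below
    }

    public int Count { get { return _size; } }

    public T this[int index]
    {
        get { return _items[index]; }
        set { _items[index] = value; }
    }

    public void Add(T item)
    {
        if (_items == null)
        {
            _items = new T[_growthStep];
        }
        else if (_size == _items.Length)
        {
            // Linear growth instead of doubling.
            T[] bigger = new T[_items.Length + _growthStep];
            Array.Copy(_items, bigger, _size);
            _items = bigger;
        }
        _items[_size++] = item;
    }

    public void TrimExcess()
    {
        // Unlike List<T>.TrimExcess, release every unused slot unconditionally.
        if (_items != null && _size < _items.Length)
        {
            T[] exact = new T[_size];
            Array.Copy(_items, exact, _size);
            _items = exact;
        }
    }
}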
That's it, the problem is solved: time to drink the champagne!
And some practice
PC: Core 2 Quad Q9400 @ 2.66 GHz, 4 GB RAM, Windows Server 2008 R2 Standard x64.
Hash segment of 1 million, list growth step 25.
Filling with 100 million records: 5 min 50 sec. Memory 2.46 GB, peak 3.36 GB.
Statistics: fill rate 100%, minimum chain 56, maximum 152, standard deviation 10.
Comparing 100 million records in memory against 100 million external records: 4 min 10 sec.
Hash segment of 2 million, list growth step 20.
Filling with 100 million records: 4 min 43 sec. Memory 2.53 GB, peak 3.57 GB.
Statistics: fill rate 100%, minimum chain 20, maximum 89, standard deviation 7.
Comparing 100 million records in memory against 100 million external records: 2 min 56 sec.
A small remark: .NET does not release memory back to the OS immediately, even after we free a large number of objects; this may not happen even after a garbage collection. The memory stays reserved for future use and is released when the system runs seriously short of memory (and perhaps by some other criteria as well).
I hope this experience will be useful to someone else when solving optimization problems.