
Memory allocation / release speed in C#

I had the following task: a data file needs to be processed. The file is divided into sections about 1 MB long, each containing roughly 100,000 records in packed form. The number of records varies from section to section and is written in each section's header. During processing a section is unpacked, and each record is converted into 20 integers. Processing requires keeping the current unpacked section plus several previous ones (about 5-10, possibly more; it is not known in advance how many). The question is how to allocate memory for the unpacked sections.

The project in which this has to be solved is written in C# under VS 2008 (mixing in code from other languages is absolutely not welcome), and the main system the finished program will run on is Windows 7, 64-bit (at least for now). And, as usual, processing needs to be fast.
The first question is whether it is necessary to organize a pool of arrays for unpacking, or whether a new array can simply be allocated for each new section. The second question is what the structure of that array should be: is it better to work with linear arrays 8 MB long, or to break the data into smaller pieces and organize, say, an array of arrays? And in the second case, what should the length of those pieces be?
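For concreteness, here is a minimal sketch of the two candidate layouts for one unpacked section; the class and names are mine, not taken from the original project:

```csharp
// Illustrative sketch only: the two candidate layouts for one unpacked
// section (100,000 records x 20 ints, i.e. about 8 MB of ints).
class SectionLayouts
{
    const int Records = 100000;
    const int Fields = 20;

    // Option 1: a jagged array, many small allocations,
    // natural data[record][field] access.
    static int[][] AllocateJagged()
    {
        int[][] data = new int[Records][];
        for (int i = 0; i < Records; i++)
            data[i] = new int[Fields];
        return data;
    }

    // Option 2: one flat array, a single large allocation;
    // the element index is computed as record * Fields + field.
    static int[] AllocateFlat()
    {
        return new int[Records * Fields];
    }
}
```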

I took a few kinds of objects to test:

- a jagged array int[M][N] (M inner arrays of N ints each);
- a one-dimensional array int[M*N];
- a hand-rolled list of ints (my own implementation);
- a List<int>.

The numbers M and N for the two-dimensional case were chosen so that M * N = 40,000,000 (which corresponds to the memory for 20 sections).
For each object I measured the average time for creation + filling + reading (after which the object was discarded), and, as a control, the time for filling + reading alone (the object was created only once). Time is given in nanoseconds per processed element. Each measurement was done twice: running on a single processor core and running in parallel on 4 cores (in the second case the time was not multiplied by 4, so the per-element result should, as a rule, be lower than in the single-core case).
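The article does not include the benchmark code itself; a rough sketch of the kind of measurement described (Stopwatch-based, reporting ns per element; the sizes and names here are mine) could look like this:

```csharp
using System;
using System.Diagnostics;

// Rough sketch of the measurement described above, not the author's actual code:
// time allocation + fill + read of M arrays of N ints and report ns per element.
class AllocBenchmark
{
    static void Main()
    {
        const int M = 100;       // number of arrays
        const int N = 400000;    // elements per array, M * N = 40,000,000
        const int repeats = 5;

        Stopwatch sw = Stopwatch.StartNew();
        long sink = 0;
        for (int r = 0; r < repeats; r++)
        {
            for (int i = 0; i < M; i++)
            {
                int[] a = new int[N];           // allocation
                for (int j = 0; j < N; j++)     // fill
                    a[j] = j;
                for (int j = 0; j < N; j++)     // read
                    sink += a[j];
                // the array is then abandoned and left to the GC
            }
        }
        sw.Stop();

        double nsPerElement = sw.Elapsed.TotalMilliseconds * 1e6
                              / ((double)repeats * M * N);
        Console.WriteLine("{0:F2} ns/element (checksum {1})", nsPerElement, sink);
    }
}
```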

The results look like this:
| M × N | 8000 × 5000 | 2000 × 20000 | 1000 × 40000 | 100 × 400000 | 10 × 4000000 | 1 × 40000000 |
|---|---|---|---|---|---|---|
| int[][] | 8.34 / 7.30 | 8.34 / 7.02 | 4.08 / 2.69 | 3.76 / 2.55 | 3.62 / 2.58 | 3.63 / 2.78 |
| int[][], R+W | 2.57 / 1.60 | 2.64 / 1.60 | 2.22 / 1.04 | 2.20 / 1.00 | 2.18 / 1.00 | 2.09 / 1.03 |
| int[], full | 1.94 / 1.04 | 1.85 / 0.96 | 3.4 / 1.58 | 3.44 / 2.69 | 3.60 / 3.63 | 3.60 / 2.78 |
| int[], R+W | 1.58 / 0.46 | 1.56 / 0.47 | 1.56 / 0.47 | 1.57 / 0.63 | 1.83 / 0.93 | 2.00 / 1.05 |
| list | 16.30 / 9.14 | 19.16 / 19.00 | 21.69 / 35.17 | 53.8 / 85.65 | 145 / 130 |  |
| list, read | 2.32 / 0.60 | 2.29 / 0.61 | 2.31 / 1.12 | 6.4 / 2.58 | 7.2 / 3.67 |  |
| List<int> | 8.95 / 4.21 | 11.06 / 4.74 | 11.98 / 5.03 | 11.85 / 6.38 | 11.85 / 6.98 | 13.71 / 8.10 |
| List<int>, read | 2.95 / 0.88 | 2.96 / 0.92 | 2.96 / 0.92 | 2.96 / 0.92 | 3.13 / 1.05 | 4.13 / 1.65 |

Each cell gives two times, in ns per element: one core / four cores. The rows marked "R+W" and "read" are the control measurements (the object is created only once); the other rows include allocation.
What can be learned from this table? First, it turns out that the time to allocate memory depends roughly linearly on the length of the array: one linear array of 160 MB takes about 100 times longer to allocate than an array of 1.6 MB. Second, if we want to allocate a single array for a short time, short arrays have the advantage: allocating them costs about 0.3 ns/word, while allocating long arrays costs about 1.8 ns/word (the difference between the 3rd and 4th rows). This is consistent with the frequently cited claim that objects smaller than about 85 KB are allocated from a separate, faster heap. But when there are many arrays the picture reverses: about 1.5 ns/word for long arrays versus about 5.8 ns/word for short ones (the difference between the 1st and 2nd rows), almost 4 times more. So if you need a multidimensional array for a short time, don't make it a jagged array of short inner arrays; look for another option instead, for example allocate a one-dimensional array and compute the indices yourself, as sketched below.
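One possible shape of that "one-dimensional array with computed indices" option, as a hedged sketch (the wrapper and its names are mine, not from the article):

```csharp
// A thin wrapper that behaves like a 2D table but does a single allocation.
// Illustrative sketch, not code from the original project.
class FlatTable
{
    private readonly int[] data;
    private readonly int cols;

    public FlatTable(int rows, int cols)
    {
        this.cols = cols;
        this.data = new int[rows * cols];  // one large allocation instead of 'rows' small ones
    }

    public int this[int row, int col]
    {
        get { return data[row * cols + col]; }
        set { data[row * cols + col] = value; }
    }
}
```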
In addition, it is clear that the runtime did not like my list implementation at all: once its length approached a million elements, the time to create one element grew roughly 6-fold compared to short lists.
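The article does not show that list implementation; purely as an assumption, it was probably something like one heap-allocated node per element, which is exactly the kind of workload whose per-element cost grows as millions of small objects pile up for the garbage collector:

```csharp
// Assumption only: a plausible shape of the hand-rolled "list" row,
// with one small GC allocation per element.
class IntNode
{
    public int Value;
    public IntNode Next;
}

class ListSketch
{
    // Builds a list of 'count' nodes; each node is a separate small heap object.
    public static IntNode Build(int count)
    {
        IntNode head = null;
        for (int i = 0; i < count; i++)
            head = new IntNode { Value = i, Next = head };
        return head;
    }
}
```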

The optimal choice for my task, apparently, would be to allocate long arrays (one per unpacked section), if I decide to allocate arrays every time at all. For a file of 1600 sections (a typical size) the time lost would be about 1.5 ns/word × 2,000,000 words per section × 1600 sections ≈ 5 seconds. True, one of the current processing options (without extra memory allocations) takes only 11 seconds, so there is something to think about: other kinds of processing will be longer and more complex. It may well be that I will have to keep reusing memory wherever possible and not lean too heavily on dynamic allocation. Or maybe not.
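If reuse does turn out to be necessary, one simple scheme (a sketch under my own naming, not code from the project) is a small pool of section-sized buffers that are rented and returned instead of being reallocated:

```csharp
using System.Collections.Generic;

// Minimal sketch of reusing section buffers instead of allocating new ones.
class SectionBufferPool
{
    private readonly Stack<int[]> free = new Stack<int[]>();
    private readonly int bufferLength;

    public SectionBufferPool(int bufferLength)
    {
        this.bufferLength = bufferLength;
    }

    // Returns a section-sized buffer, reusing a previously released one when possible.
    public int[] Rent()
    {
        return free.Count > 0 ? free.Pop() : new int[bufferLength];
    }

    // Hands a buffer back once the corresponding section is no longer needed.
    public void Return(int[] buffer)
    {
        free.Push(buffer);
    }
}
```

In the scenario above, a buffer would be rented before unpacking a section and returned once that section falls out of the 5-10 section window that processing needs to keep.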

Source: https://habr.com/ru/post/137066/

