The iter_content method has been around for a long time; if it had slowed down significantly in the common usage mode, word of it would certainly have reached us. The problem appeared only with large chunk sizes, which I reproduced like this:

```python
import requests

url = ("https://az792536.vo.msecnd.net/vms/VMBuild_20161102/VirtualBox/"
       "MSEdge/MSEdge.Win10_preview.VirtualBox.zip")
https = requests.get(url, stream=True)

for content in https.iter_content(100 * 2 ** 20):  # 100 MB chunks
    pass
```
With a 10 MB chunk size there is no extra load on the processor and no effect on throughput. At 1 GB the processor is pegged at 100%, just as at 100 MB, but throughput falls below 100 KB/s, versus 1 MB/s at 100 MB.
The time was being spent in this call inside PyOpenSSL:

```
File "/home/user/.local/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1299, in recv
  buf = _ffi.new("char[]", bufsiz)
```
The default behavior of FFI.new is to return zeroed memory. That meant overhead that grew linearly with the size of the allocation: bigger buffers took longer to zero. Hence the bad behavior was tied to large allocations. We took advantage of CFFI's ability to disable zeroing for these buffers, and the problem went away¹. So it was solved, right?
The malloc function from the standard C library (you can read your OS's documentation for it by typing man 3 malloc) takes a single argument: the number of bytes of memory to allocate. The standard C library allocates memory using one of several different techniques, but one way or another it returns a pointer to a region of memory at least as large as the amount you requested. Crucially, malloc returns uninitialized memory: the standard C library allocates the region and hands it straight to your program, without changing the data that is already there. In other words, when using malloc your program can be handed a buffer into which it has previously written data. This is a common cause of bugs in memory-unsafe languages such as C; in general, reading from uninitialized memory is very risky.
malloc has a friend, documented on the same manual page: calloc. Its main difference is that it takes two arguments: a count and a size. With malloc you ask the standard C library: "please allocate at least n bytes for me." With calloc you ask: "please allocate enough memory for n objects of m bytes each." The original idea behind calloc was to safely allocate heap memory for arrays of objects². And calloc has a side effect tied to that original array-allocating purpose, mentioned very modestly in the manual: "The allocated memory is filled with bytes of value zero."
This zeroing goes hand in hand with calloc's purpose. For example, when you allocate an array of values, it is often very useful for it to start in a default state. Some modern memory-safe languages have even made this the standard behavior when creating arrays and structures. Say, when you initialize a structure in Go, all of its members default to their so-called "zero values", equivalent to "the values everything would have if it were all zeroed out." You can think of this as a promise that all Go structures are allocated with calloc³. So malloc returns uninitialized memory, and calloc returns initialized memory. And if so, and especially in light of such a strict promise, the operating system can optimize the allocation. Indeed, many modern operating systems do.
The simplest way to implement calloc is to write something like:

```c
void *calloc(size_t count, size_t size) {
    assert(!multiplication_would_overflow(count, size));

    size_t allocation_size = count * size;
    void *allocation = malloc(allocation_size);
    memset(allocation, 0, allocation_size);
    return allocation;
}
```
The cost of this is roughly the cost of memset, which is cheap per byte (implementations usually use specialized vector instructions that zero a large number of bytes with a single instruction), but it still grows linearly with the size of the allocation. malloc, by contrast, costs almost nothing for large allocations: if you call malloc(1024 * 1024 * 1024) to allocate 1 GB of memory, it returns almost instantly, because the memory is not actually handed to the process at that moment. The upshot is that programs can instantly "allocate" many gigabytes for themselves, even though actually providing that memory would be nowhere near as fast.
What about calloc? The OS can map each brand-new page onto the so-called "zero page": a read-only page of memory from which only zeros are ever read. Initially this mapping is copy-on-write: when your process tries to write to this new memory, the kernel intervenes, copies all the zeros to a fresh page, and only then allows the write.
Thanks to this trick, calloc can do the same thing as malloc when allocating large volumes and simply request new virtual memory pages. This stays free until the memory is actually used. The optimization means that the cost of calloc(1024 * 1024 * 1024, 1) equals the cost of a malloc of the same size, even though calloc also promises memory filled with zeros. Clever!
So if the memory was being allocated with calloc, why was it being zeroed at all? As it turned out, calloc was not always being used. But I suspected that in this case I could reproduce the slowdown with calloc directly, so I threw together another program:

```c
#include <stdlib.h>

#define ALLOCATION_SIZE (100 * 1024 * 1024)

int main(int argc, char *argv[]) {
    for (int i = 0; i < 10000; i++) {
        void *temp = calloc(ALLOCATION_SIZE, 1);
        free(temp);
    }
    return 0;
}
```
This program requests 100 MB from calloc ten thousand times, then exits. There are two possibilities⁵:

1. calloc can use the virtual-memory trick described above. Then the program should run quickly: the allocated memory is never actually used, never paged in, and the pages never become dirty. The OS lies to us about the allocation, we never call its bluff, and everything works fine.

2. calloc can call malloc and zero the memory manually with memset. That should be very, very slow: in total we would be zeroing a terabyte of memory (ten thousand iterations of 100 MB each), which is a lot of work.
On macOS the program was slow. But if you increase ALLOCATION_SIZE (for example, to 1000 * 1024 * 1024), it runs almost instantly! What the hell?
To figure out what was happening, I used the macOS sample utility (see man 1 sample), which can tell you a lot about a running process by periodically recording its state. For our code, sample gives the following:

```
Sampling process 57844 for 10 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Sample analysis of process 57844 written to file /tmp/a.out_2016-12-05_153352_8Lp9.sample.txt

Analysis of sampling a.out (pid 57844) every 1 millisecond
Process:         a.out [57844]
Path:            /Users/cory/tmp/a.out
Load Address:    0x10a279000
Identifier:      a.out
Version:         0
Code Type:       X86-64
Parent Process:  zsh [1021]

Date/Time:       2016-12-05 15:33:52.123 +0000
Launch Time:     2016-12-05 15:33:42.352 +0000
OS Version:      Mac OS X 10.12.2 (16C53a)
Report Version:  7
Analysis Tool:   /usr/bin/sample
----

Call graph:
    3668 Thread_7796221   DispatchQueue_1: com.apple.main-thread  (serial)
      3668 start  (in libdyld.dylib) + 1  [0x7fffca829255]
        3444 main  (in a.out) + 61  [0x10a279f5d]
        + 3444 calloc  (in libsystem_malloc.dylib) + 30  [0x7fffca9addd7]
        +   3444 malloc_zone_calloc  (in libsystem_malloc.dylib) + 87  [0x7fffca9ad496]
        +     3444 szone_malloc_should_clear  (in libsystem_malloc.dylib) + 365  [0x7fffca9ab4a7]
        +       3227 large_malloc  (in libsystem_malloc.dylib) + 989  [0x7fffca9afe47]
        +       ! 3227 _platform_bzero$VARIANT$Haswell  (in libsystem_platform.dylib) + 41  [0x7fffcaa3abc9]
        +       217 large_malloc  (in libsystem_malloc.dylib) + 961  [0x7fffca9afe2b]
        +         217 madvise  (in libsystem_kernel.dylib) + 10  [0x7fffca958f32]
        221 main  (in a.out) + 74  [0x10a279f6a]
        + 217 free_large  (in libsystem_malloc.dylib) + 538  [0x7fffca9b0481]
        + ! 217 madvise  (in libsystem_kernel.dylib) + 10  [0x7fffca958f32]
        + 4 free_large  (in libsystem_malloc.dylib) + 119  [0x7fffca9b02de]
        +   4 madvise  (in libsystem_kernel.dylib) + 10  [0x7fffca958f32]
        3 main  (in a.out) + 61  [0x10a279f5d]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
        _platform_bzero$VARIANT$Haswell  (in libsystem_platform.dylib)      3227
        madvise  (in libsystem_kernel.dylib)      438
```
Almost all the time is spent in the _platform_bzero$VARIANT$Haswell method, which is used to zero buffers. So macOS really is zeroing them. Why? Fortunately, Apple publishes the source code of libsystem_malloc: I went to opensource.apple.com, downloaded the libmalloc-116 archive with the code I needed, and began to investigate.
It turned out that all the magic happens in large_malloc, where a whole pile of code hides behind the #define constant CONFIG_LARGE_CACHE. Essentially, that code maintains a "free list" of large memory pages already allocated to the program: if macOS allocates a contiguous buffer between 127 KB and LARGE_CACHE_SIZE_ENTRY_LIMIT (approximately 125 MB), then libsystem_malloc will try to reuse those pages if they can satisfy another allocation. This spares it from asking the Darwin kernel for pages, which saves a context switch and a system call: non-trivial savings, in principle.
The trouble comes with calloc, which needs the bytes zeroed. If macOS finds a page that can be reused and it is being requested via calloc, it zeroes the memory. All of it. Every single time.
There is some justification for this: calloc must provide zeroed memory pages. And it would not be so bad if the zeroing applied only to dirty pages: if the application never wrote to a page, it probably does not need re-zeroing. But macOS does this unconditionally. That means that even if you call malloc, free, and then calloc without ever touching the memory, the second call to calloc takes the pages allocated by the first call, pages never backed by physical memory, and the OS has to page in all that memory just to zero it, even though it is already zeroed. This is exactly what the virtual-memory-based allocation scheme for large allocations is meant to avoid: memory that was never used becomes used, by the free list itself.
As a result, on macOS the cost of calloc grows linearly with the allocation size up to 125 MB, even though other operating systems show O(1) behavior from 127 KB onward. Past 125 MB macOS stops caching the pages, so the speed magically takes off.
2. A common way to allocate an array by hand is type *array = malloc(number_of_elements * size_of_element). The danger is that the multiplication of number_of_elements by size_of_element can overflow. calloc instead checks that multiplication for overflow and fails the allocation rather than returning a too-small buffer.

Source: https://habr.com/ru/post/317476/