
The page cache, or how RAM and files are interconnected



Earlier we looked at how the kernel manages a process's virtual memory, but we left out files and I/O. In this article we examine an important and often misunderstood question: what is the relationship between RAM and file operations, and how does it affect system performance?

When it comes to files, the operating system must solve two important problems. The first is the surprisingly low speed of hard drives (seek operations in particular) compared to RAM. The second is the need to share a file, once loaded into RAM, between different programs. Looking at processes in Process Explorer, we can see that roughly 15 MB of RAM in each process goes to common DLLs. My computer is currently running 100 processes, so without the ability to share files in memory, about 1.5 GB of RAM would be spent on shared DLLs alone. That is clearly unacceptable. On Linux, programs likewise rely on shared libraries such as ld.so, libc, and others.

Fortunately, both problems can be solved, as they say, in one fell swoop: with the page cache. The kernel uses the page cache to store fragments of files, each fragment one page in size. To illustrate the idea, I invented a program called render, which opens the file scene.dat, reads it in 512-byte chunks, and copies them into a buffer allocated on the heap. The first read is performed as shown in the figure above.
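To make the example concrete, here is a minimal sketch of what render's read loop might look like in C (the file name and the 512-byte chunk size come from the description above; the 1 MiB heap buffer and the trimmed error handling are simplifications for brevity):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("scene.dat", O_RDONLY);
    if (fd < 0)
        return 1;

    char buf[512];                 /* user-space buffer for read() */
    char *heap = malloc(1 << 20);  /* destination on the heap; assumes
                                      scene.dat fits in 1 MiB */
    size_t used = 0;
    ssize_t n;

    /* each read() is served from the page cache; the kernel fills
       the cache from disk in whole 4 KB pages as needed */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        memcpy(heap + used, buf, n);   /* second copy: cache -> heap */
        used += n;
    }

    close(fd);
    free(heap);
    return 0;
}
```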
After 12 KB have been read, render's heap and the related physical pages look like this:


It all looks simple, but a lot happens behind the scenes. First, even though our program uses ordinary read() calls, three 4 KB pages with the contents of scene.dat now sit in the page cache. Many people are surprised by this, but all standard file I/O goes through the page cache. On x86 Linux, the kernel sees the file as a sequence of 4 KB fragments. If you ask to read just one byte from a file, the entire 4 KB fragment containing that byte is read from disk and placed into the page cache. This makes sense: first, sustained disk throughput is quite high, and second, programs usually read more than one byte from a given area of a file. The page cache knows which place in the file each cached fragment corresponds to, pictured here as #0, #1, and so on. Windows uses 256 KB fragments, called views, which play a role analogous to pages in the Linux page cache.
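In other words, finding the page that holds a given byte is simple integer division. A tiny illustration, assuming the 4 KB page size discussed above:

```c
#include <stdio.h>

#define PAGE_SIZE 4096

int main(void)
{
    long offset = 5000;                 /* a byte somewhere in scene.dat */
    long page   = offset / PAGE_SIZE;   /* -> page #1 */
    printf("byte %ld lives in page #%ld (bytes %ld..%ld)\n",
           offset, page, page * PAGE_SIZE, (page + 1) * PAGE_SIZE - 1);
    return 0;
}
```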

With normal read operations, the data first lands in the page cache. The program then receives it piece by piece through a buffer and copies it (in our example) to an area on the heap. This approach is extremely inefficient: not only does it burn CPU cycles and pollute the CPU caches, it also wastes RAM on duplicate copies of the same data. Look at the previous picture: the contents of scene.dat are stored twice, and every new process working with the file will copy the data yet again. We have reduced the disk-latency problem somewhat, but otherwise failed completely. However, a solution exists: memory-mapped files.


When a programmer maps a file into memory, the kernel maps the virtual pages directly onto the physical pages of the page cache. This yields a significant performance win: Windows System Programming reports run-time improvements of 30% and more compared to standard file I/O, and Advanced Programming in the UNIX Environment gives similar figures for Linux and Solaris. Depending on the nature of the program, memory mapping can also greatly reduce its RAM consumption.
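For comparison with the read() loop above, here is a sketch of the same idea using mmap on a Unix-like system (scene.dat as before; the file size is obtained via fstat):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("scene.dat", O_RDONLY);
    if (fd < 0)
        return 1;

    struct stat st;
    if (fstat(fd, &st) < 0)
        return 1;

    /* virtual pages are mapped straight onto page-cache pages:
       no intermediate buffer, no second copy of the data */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return 1;

    /* the file contents can now be read as ordinary memory */
    printf("first byte: %d\n", data[0]);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```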

As always with performance, the main thing is to measure and look at the results. But even without measurements, memory mapping pays for itself. The programming interface is quite pleasant, letting you read a file as ordinary bytes in memory, and it demands no particular sacrifices; code readability, for one, does not suffer at all. Do not be afraid to experiment with your address space: call mmap on Unix-like systems or CreateFileMapping on Windows, or use the various wrapper functions available in high-level programming languages.

When a file is mapped into memory, its contents do not arrive all at once but gradually, as the processor takes page faults on accesses to not-yet-loaded parts of the file. The page-fault handler finds the needed page frame in the page cache and maps the virtual page onto it. If the required data was not previously cached, a disk read is initiated first.
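On Linux you can actually observe this lazy loading with mincore(), which reports whether the pages of a mapping are resident in the page cache. A diagnostic sketch (note that page #0 may already be resident if scene.dat was cached by an earlier run):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("scene.dat", O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    size_t pages = (st.st_size + 4095) / 4096;
    unsigned char vec[pages];

    mincore(data, st.st_size, vec);   /* before touching anything */
    printf("page #0 resident? %d\n", vec[0] & 1);

    volatile char c = data[0];        /* fault in the first page */
    (void)c;

    mincore(data, st.st_size, vec);   /* now page #0 is cached */
    printf("page #0 resident? %d\n", vec[0] & 1);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```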

Now, a question. Imagine that render has finished, and no child processes remain. Will the pages in the page cache holding fragments of scene.dat be released immediately? Many people assume so, but that would be inefficient. Think about it: we quite often create a file in one program, that program exits, and then the file is used by another program. The page cache must allow for such situations. And in general, why should the kernel ever discard the contents of the page cache? Remember that the hard disk is five orders of magnitude slower than RAM; if the data happens to be cached already, we are very lucky. That is why nothing is evicted from the page cache, at least as long as free RAM remains. The page cache does not belong to any particular process; on the contrary, it is a resource the entire system shares. Run render again a week later, and if scene.dat is still in the page cache, lucky us! This is why the page cache grows steadily and then its growth suddenly stops. Not because the operating system is garbage that devours all your RAM, but because that is how it should be. Unused RAM is a wasted resource, and it is better to use as much of it as possible for the page cache than not to use it at all.
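On Linux this system-wide cache is plainly visible: the Cached field of /proc/meminfo shows how much RAM currently holds file data. A small sketch that prints it:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return 1;

    char line[256];
    while (fgets(line, sizeof line, f)) {
        /* e.g. "Cached:  8123456 kB" -- file data kept in the page cache */
        if (strncmp(line, "Cached:", 7) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```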

When a program calls write(), the data is simply copied into the corresponding page in the page cache, and the page is marked dirty. The actual write to disk does not happen right away, and there is no point in blocking the program while waiting for the disk subsystem to become available. The downside is that if the machine crashes, the data may never reach the disk. That is why critical files, such as database transaction logs, must be synchronized with a special fsync() call (though the hard-disk controller has a cache of its own, so even then you cannot be absolutely sure the write succeeded). A read() call, in contrast, blocks the program until the disk becomes available and the data is read. Operating systems soften this with so-called eager loading, an example of which is read-ahead: the kernel proactively loads some number of file fragments into the page cache, anticipating subsequent read requests. You can help the kernel choose good read-ahead parameters by telling it whether you intend to read the file sequentially or randomly (the madvise() and readahead() calls, or cache hints on Windows). Linux performs read-ahead for memory-mapped files; about Windows, I am not sure. Finally, you can bypass the page cache entirely, using O_DIRECT on Linux or FILE_FLAG_NO_BUFFERING on Windows, which is exactly what databases often do.
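Here is a sketch showing the write()/fsync() pairing and a portable read-ahead hint; posix_fadvise() is a close relative of the madvise() and readahead() calls mentioned above, and journal.log is just a hypothetical file name:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return 1;

    const char *rec = "transaction committed\n";
    /* write() only dirties a page in the page cache... */
    if (write(fd, rec, strlen(rec)) < 0)
        return 1;
    /* ...fsync() actually pushes the dirty pages to the disk */
    if (fsync(fd) < 0)
        return 1;
    close(fd);

    /* hint for a file we are about to read sequentially:
       the kernel may read ahead more aggressively */
    int in = open("scene.dat", O_RDONLY);
    if (in >= 0) {
        posix_fadvise(in, 0, 0, POSIX_FADV_SEQUENTIAL);
        close(in);
    }
    return 0;
}
```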

A file can be mapped into memory in one of two modes: private or shared. The terms refer only to how the system reacts to changes of the in-memory data: with a shared mapping, any change is written back to disk or is visible to other processes; with a private mapping, it is not. To implement private mappings, the kernel relies on the copy-on-write mechanism, a trick based on page-table entries. In the following example, our render and a program called render3d (yes, I have a talent for naming programs!) both create private mappings of scene.dat. render then writes to the virtual memory area that maps the file:



The read-only page-table entries in the figure should not confuse us; they do not mean the mapping is read-only. They are simply the kernel's technique for sharing a page between processes and putting off the copy until the last possible moment. Looking at the picture, you can see that "private" is perhaps not the most apt term, but if you remember that it only describes what happens when the data is changed, everything falls into place. The mapping mechanism has one more quirk. Suppose two programs, not related as parent and child, work with the same file, but one maps it privately and the other maps it shared. The program with the private mapping (call it the first program) will see all the changes the second program makes to a given page, right up until the first program itself writes to that page (which triggers the creation of its own private copy). That is, once copy-on-write has fired, changes made by other programs are no longer visible. The kernel does not guarantee this behavior, but it is what you get on x86 processors, and it even makes a certain sense from an API point of view.
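A sketch of the private-mapping behavior just described: the write below triggers copy-on-write, so only this process's copy of the page changes, and the file on disk stays intact:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("scene.dat", O_RDONLY);
    struct stat st;
    fstat(fd, &st);

    /* PROT_WRITE on a MAP_PRIVATE mapping is legal even though the
       file was opened read-only: writes go to private page copies */
    char *data = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return 1;

    char before = data[0];
    data[0] = 'X';          /* page fault -> kernel copies the page */
    printf("in memory: %c -> %c (file on disk is unchanged)\n",
           before, data[0]);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```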

With shared mappings the situation is as follows: the pages get read/write permissions and are simply mapped onto the page cache, so whoever makes a change to a page, every process sees it. The data is also written back to the hard disk. Finally, if the pages in the previous figure were truly read-only, a page fault taken on access would trigger a segmentation fault rather than copy-on-write logic.
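And the shared counterpart: with MAP_SHARED the store lands directly on the page-cache page, every process mapping the file sees it, and msync() can push the dirty page to disk. Note that the file must now be opened read-write:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("scene.dat", O_RDWR);   /* shared writes need O_RDWR */
    struct stat st;
    fstat(fd, &st);

    char *data = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (data == MAP_FAILED)
        return 1;

    data[0] = 'X';                     /* visible to all mappers at once */
    msync(data, st.st_size, MS_SYNC);  /* flush the dirty page to disk */

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```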

Shared libraries are mapped into memory like any other file. There is nothing special about them: they use the same private mappings available to the programmer through an API call. Below is an example showing part of the address space of two instances of render that use memory-mapped files, together with the corresponding areas of physical memory. It ties together the concepts we have met in this article:
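On Linux you can see these mappings for yourself by printing /proc/self/maps, where libc and friends appear as file-backed mappings shared with every other process that uses them:

```c
#include <stdio.h>

int main(void)
{
    /* each line is one mapping: address range, permissions, offset,
       device, inode, and the backing file (e.g. .../libc.so.6) */
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return 1;

    int ch;
    while ((ch = fgetc(f)) != EOF)
        putchar(ch);
    fclose(f);
    return 0;
}
```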



This concludes our three-part series on the basics of memory. I hope the information was useful and gave you a general picture of the topic.




This material was prepared by the staff of Smart-Soft (smart-soft.ru).

Source: https://habr.com/ru/post/228937/

