
Memory-mapped files

In this article I would like to talk about a wonderful thing: memory-mapped files (MMF from here on).
In some cases they give a significant performance boost over ordinary buffered file I/O.

This is a mechanism that lets you map a file into memory. Reading from the mapped memory then reads the corresponding bytes of the file, and writing works the same way.
"Cool, but what does that give me?" you ask. Let me explain with an example.
Suppose we need to process a large file (tens or even hundreds of megabytes). The task looks trivial: open the file, copy it block by block into memory, process it. But what actually happens? Each block is first copied into the kernel's cache and then from the cache into our own buffer, and so on for every block. The same data occupies memory twice (the cache plus our buffer), and there are a lot of copy operations. What can we do?
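For reference, the buffered approach described above might look roughly like the sketch below. The names copy_buffered and BUF_SIZE, the 64 KB block size, and the 0644 file mode are illustrative assumptions, not taken from the original experiment.

```c
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE 65536  /* illustrative block size */

/* Copy src_path to dst_path through a user-space buffer.
 * Returns the number of bytes copied, or -1 on error.
 * copy_buffered is a hypothetical helper name. */
long copy_buffered(const char *src_path, const char *dst_path)
{
    static char buf[BUF_SIZE];
    long total = 0;
    ssize_t n;

    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* Each read() copies data from the kernel's cache into buf,
     * and each write() copies it from buf back into the kernel. */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {              /* write() may be partial */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            done += w;
        }
        total += n;
    }

    close(in);
    close(out);
    return n < 0 ? -1 : total;
}
```

Every pass through the loop performs the two copies discussed above, which is the overhead a mapping avoids.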
This is where MMF comes to the rescue. When we access memory to which a file is mapped, the data is loaded from disk into the cache (if it is not there already), and the cache pages are then mapped into our program's address space. If those pages are later evicted from the cache, their mapping is dropped. This gets rid of the copy from the cache into a buffer. On top of that, we don't have to worry about optimizing disk access: the OS kernel takes on all the dirty work.
I once ran an experiment. Using Quantify, I measured the speed of a program that copies a large (500 MB) file to another file with buffered I/O, and of a program that does the same with MMF. The second one was almost 30% faster (on Solaris; results on other OSes may differ). Not bad, you'll agree.
To use this feature, we have to tell the kernel that we want to map a file into memory. This is done with the mmap() function.
#include <sys/mman.h>
void *mmap(void *addr, size_t len, int prot, int flag, int filedes, off_t off);

It returns the starting address of the mapped region, or MAP_FAILED on failure.
addr, the first argument, is the desired starting address of the mapped region. I don't know when that might be useful. Pass 0 and the kernel will choose the address itself.
len is the number of bytes to map.
prot is a number that sets the protection of the mapped region (readable, writable, executable, or not accessible). The usual values are PROT_READ and PROT_WRITE (they can be combined with OR). I won't dwell on it; see the man page for details. I will only note that the protection cannot grant more access than the mode in which the file was opened.
flag describes the attributes of the region. The usual value is MAP_SHARED. For the rest, read the man page. I will note, though, that using MAP_FIXED reduces the portability of the application, since its support is optional on POSIX systems.
filedes, as you have already guessed, is the descriptor of the file to map.
off is the offset of the mapped region from the beginning of the file.
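The arguments above can be tried out with a small sketch like the following. The helper names read_byte_at and shared_write_map_errno are hypothetical. The sketch assumes that the system requires off to be a multiple of the page size (which is why the offset is rounded down) and that asking for a writable MAP_SHARED mapping on a descriptor opened with O_RDONLY fails with EACCES, as POSIX describes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the byte at position pos of the file, or -1 on error.
 * Only the page containing pos is mapped. Hypothetical helper. */
int read_byte_at(const char *path, off_t pos)
{
    int result = -1;
    struct stat st;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || fstat(fd, &st) < 0 || pos < 0 || pos >= st.st_size) {
        close(fd);
        return -1;
    }

    /* Assumption: off must be page-aligned, so round pos down. */
    off_t off = pos - (pos % page);
    size_t len = (size_t)(pos - off) + 1;   /* up to and including pos */

    /* addr = NULL: the kernel chooses where to place the mapping. */
    void *m = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, off);
    if (m != MAP_FAILED) {
        result = ((unsigned char *)m)[pos - off];
        munmap(m, len);
    }
    close(fd);
    return result;
}

/* Ask for a writable shared mapping on a read-only descriptor.
 * Returns 0 if mmap() unexpectedly succeeds, otherwise the errno it set. */
int shared_write_map_errno(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    void *m = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int err = (m == MAP_FAILED) ? errno : 0;
    if (m != MAP_FAILED)
        munmap(m, 1);
    close(fd);
    return err;
}
```

The first function exercises all six arguments; the second illustrates the remark about prot and the mode in which the file was opened.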

Important note. If you plan to write to a file through MMF, then before mapping you must set the file's final size to at least the size of the mapped region! Otherwise you will get hit with SIGBUS.
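A minimal sketch of that rule, under some assumptions: the article's own listing sets the size with lseek() plus a one-byte write(), while this sketch swaps in ftruncate(), which serves the same purpose. The helper name write_via_mmap and the 0644 mode are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create (or overwrite) path with len bytes of data, written through
 * a mapping. Returns 0 on success, -1 on error. Hypothetical helper. */
int write_via_mmap(const char *path, const void *data, size_t len)
{
    if (len == 0)
        return -1;                  /* zero-length mappings are rejected */

    /* O_RDWR rather than O_WRONLY: a shared mapping needs read access. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Set the final size BEFORE mapping; touching mapped pages that
     * lie beyond the end of the file would raise SIGBUS. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (dst == MAP_FAILED)
        return -1;

    memcpy(dst, data, len);         /* the "write" is a plain memory copy */

    int rc = msync(dst, len, MS_SYNC);   /* flush modified pages to the file */
    if (munmap(dst, len) < 0)
        rc = -1;
    return rc;
}
```

A call such as write_via_mmap("out.bin", "abc", 3) should leave a three-byte file behind; skipping the ftruncate() step is what leads to the SIGBUS mentioned above.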

Below is an example (honestly borrowed from the wonderful book "Unix. Professional Programming") of a program that copies a file using MMF.
#include "apue.h"   /* assumed: the book's helper header, which provides err_quit(), err_sys() and FILE_MODE */
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
    int fdin, fdout;
    void *src, *dst;
    struct stat statbuf;

    if (argc != 3)
        err_quit("usage: %s <fromfile> <tofile>", argv[0]);

    if ((fdin = open(argv[1], O_RDONLY)) < 0)
        err_sys("can't open %s for reading", argv[1]);
    if ((fdout = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, FILE_MODE)) < 0)
        err_sys("can't create %s for writing", argv[2]);

    if (fstat(fdin, &statbuf) < 0)      /* find out the size of the input file */
        err_sys("fstat error");

    /* set the size of the output file */
    if (lseek(fdout, statbuf.st_size - 1, SEEK_SET) == -1)
        err_sys("lseek error");
    if (write(fdout, "", 1) != 1)
        err_sys("write error");

    if ((src = mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, fdin, 0)) == MAP_FAILED)
        err_sys("mmap error for input");
    if ((dst = mmap(0, statbuf.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdout, 0)) == MAP_FAILED)
        err_sys("mmap error for output");

    memcpy(dst, src, statbuf.st_size);  /* this is the actual file copy */
    exit(0);
}


That, in general, is all. I hope this article was useful. Constructive criticism is welcome.


Source: https://habr.com/ru/post/55716/

