Virtual File Systems in Linux: Why We Need Them and How They Work. Part 1

Hello! We continue to open new groups for the courses you already like, and we are pleased to announce a new intake for the Linux Administrator course, which starts in late April. This publication is timed to coincide with that launch. The original material can be found here.

Virtual file systems are the magic abstraction that makes the “everything is a file” philosophy of Linux possible.

What is a file system? According to early Linux contributor and author Robert Love, “A filesystem is a hierarchical storage of data adhering to a specific structure.” However, this description applies equally well to VFAT (Virtual File Allocation Table), Git and Cassandra (a NoSQL database). So what exactly distinguishes a file system?

File System Basics

The Linux kernel has specific requirements for an entity to be considered a file system. It must implement the open(), read() and write() methods for persistent objects that have names. From the point of view of object-oriented programming, the kernel treats the generic file system as an abstract interface, and these big-three functions are “virtual”, with no default definition. Accordingly, the kernel's default file system implementation is called a virtual file system (VFS).
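
The abstract-interface idea can be sketched in a few lines of Python. This is only an analogy under stated assumptions: the class names GenericFilesystem and MemoryFilesystem are hypothetical, and the kernel itself expresses the same idea in C with tables of function pointers rather than classes.

```python
from abc import ABC, abstractmethod


class GenericFilesystem(ABC):
    """Hypothetical stand-in for the abstract interface: three virtual methods."""

    @abstractmethod
    def open(self, name):
        """Make the named object available for reading and writing."""

    @abstractmethod
    def read(self, name):
        """Return the bytes stored under the given name."""

    @abstractmethod
    def write(self, name, data):
        """Store bytes under the given name and return how many were accepted."""


class MemoryFilesystem(GenericFilesystem):
    """Toy implementation that keeps named objects in a dictionary."""

    def __init__(self):
        self._objects = {}  # maps name -> bytes

    def open(self, name):
        # Create the named object on first open.
        self._objects.setdefault(name, b"")

    def read(self, name):
        return self._objects[name]

    def write(self, name, data):
        # Append the data and report the number of bytes written.
        self._objects[name] += data
        return len(data)
```

Trying to instantiate GenericFilesystem directly raises TypeError, which mirrors the point that the three functions have no default definition; only a concrete file system such as the toy MemoryFilesystem can be used.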

If we can open(), read() and write() an entity, then it is a file, as the console example shows.
VFS underlies the famous observation that in Unix-like systems “everything is a file”. Consider how odd it is that the little example with /dev/console actually works. The picture shows an interactive Bash session. Sending a string to the virtual console device makes it appear on the virtual screen. VFS has other, even odder properties. For example, it is possible to seek in such files.

Familiar file systems such as ext4, NFS and /proc all provide definitions of the big-three functions in a C data structure called file_operations. In addition, particular file systems extend and override the VFS functions in the usual object-oriented way. As Robert Love points out, the VFS abstraction lets Linux users blithely copy files to and from foreign operating systems or abstract entities such as pipes without worrying about their internal data format. On behalf of userspace, via a system call, a process can copy from a file into the kernel's data structures with the read() method of one file system, and then use the write() method of another file system to output the data.
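
A minimal sketch of that copy, assuming a pipe as the source and an ordinary temporary file as the destination. The helper name copy_fd is hypothetical; os.read() and os.write() are thin wrappers around the read() and write() system calls, and the same loop works regardless of what kind of object sits behind each descriptor.

```python
import os
import tempfile


def copy_fd(src_fd, dst_fd, chunk_size=4096):
    """Copy bytes from src_fd to dst_fd until end of file; return the byte count."""
    total = 0
    while True:
        chunk = os.read(src_fd, chunk_size)
        if not chunk:  # an empty read means end of file
            break
        # os.write() may accept fewer bytes than offered, so loop until done.
        view = memoryview(chunk)
        while view:
            written = os.write(dst_fd, view)
            view = view[written:]
            total += written
    return total


if __name__ == "__main__":
    read_end, write_end = os.pipe()        # source: a pipe
    os.write(write_end, b"hello from a pipe\n")
    os.close(write_end)                    # closing the write end signals EOF

    with tempfile.TemporaryFile() as dst:  # destination: a regular file
        copied = copy_fd(read_end, dst.fileno())
        dst.seek(0)
        print(copied, dst.read())
    os.close(read_end)
```

Nothing in copy_fd knows or cares which file system implements either end, which is the convenience the paragraph above describes.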

The definitions of the functions that belong to the basic VFS types are in the fs/*.c files of the kernel source code, while the subdirectories of fs/ contain the specific file systems. The kernel also contains file-system-like entities, such as cgroups, /dev and tmpfs, which are needed early in the boot process and are therefore defined in the kernel's init/ subdirectory. Note that cgroups, /dev and tmpfs do not call the big-three file_operations functions, but read from and write to memory directly.
The diagram shows how userspace accesses the different types of file systems commonly mounted on Linux systems. Constructs such as pipes, dmesg and POSIX clocks, which also implement the file_operations structure and whose accesses therefore pass through the VFS layer, are not shown.

VFS is a “shim layer” between system calls and the implementors of specific file_operations, such as ext4 and procfs. The file_operations functions can then communicate either with device drivers or with memory accessors. tmpfs, devtmpfs and cgroups do not use file_operations, but access memory directly.
The existence of VFS promotes code reuse, since the basic methods associated with file systems need not be reimplemented by every file system type. Code reuse is a widely accepted software engineering practice! However, if the reused code contains serious bugs, all implementations that inherit the common methods suffer from them.

/tmp: A simple tip

A simple way to find out which VFSes are present on a system is to type mount | grep -v sd | grep -v :/ , which on most computers lists all mounted file systems that neither reside on a disk nor are NFS. One of the listed VFS mounts is undoubtedly /tmp, right?
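
The same filter can be sketched in Python. The function name virtual_mounts is hypothetical, and reading /proc/mounts instead of running mount is an assumption made to keep the example free of subprocesses; the file carries the same information in a slightly different column layout. Like the grep pipeline, this is a plain substring match, so any line that merely contains “sd” somewhere is dropped too.

```python
import os


def virtual_mounts(mount_lines):
    """Keep the mount entries that mention neither 'sd' nor ':/'.

    Mirrors `grep -v sd | grep -v :/`: a substring test on the whole line.
    """
    return [
        line for line in mount_lines
        if "sd" not in line and ":/" not in line
    ]


if __name__ == "__main__":
    mounts_file = "/proc/mounts"
    if os.path.exists(mounts_file):  # present on Linux only
        with open(mounts_file) as mounts:
            for entry in virtual_mounts(mounts.read().splitlines()):
                print(entry)
```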

Everyone knows that keeping /tmp on a physical storage device is crazy!

Why is it undesirable to keep /tmp on physical storage? Because the files in /tmp are temporary, and storage devices are slower than memory, where tmpfs lives. Moreover, physical media wear out from repeated writes faster than memory does. Finally, files in /tmp may contain sensitive information, so having them disappear at every reboot is a feature.

Unfortunately, the installation scripts of some Linux distributions create /tmp on a storage device by default. Do not despair if this has happened to your system. Follow the simple instructions on the Arch Wiki to fix it, and keep in mind that memory allocated to tmpfs is not available for other purposes. In other words, a system with a gigantic tmpfs holding large files can run out of memory and crash. Another tip: when editing the /etc/fstab file, make sure it ends with a newline, otherwise your system will not boot.
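
The newline tip is easy to check mechanically before rebooting. The following is a minimal sketch; the function name ends_with_newline is hypothetical, and the fstab-style line in the demonstration is only an illustrative entry written to a scratch file, not a recommended configuration.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file is non-empty and its last byte is a newline."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False  # an empty file has no final newline
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"


if __name__ == "__main__":
    # Demonstrate on a scratch file rather than on the real /etc/fstab.
    with tempfile.NamedTemporaryFile("w", suffix=".fstab", delete=False) as scratch:
        scratch.write("tmpfs /tmp tmpfs rw,nosuid,nodev 0 0")  # no trailing newline
    print(ends_with_newline(scratch.name))
    os.unlink(scratch.name)
```

On a real system the call would be ends_with_newline("/etc/fstab"), which only reads the file.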

/proc and /sys

Besides /tmp, the VFSes most familiar to Linux users are /proc and /sys. (/dev resides in shared memory and has no file_operations.) Why are there two of them? Let's take a closer look.

procfs offers userspace a snapshot of the instantaneous state of the kernel and the processes it controls. In /proc, the kernel publishes information about the facilities it provides, for example interrupts, virtual memory and the scheduler. In addition, /proc/sys is where the settings configurable with the sysctl command are accessible to userspace. The status and statistics of individual processes are reported in per-process directories under /proc, one for each process ID.
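
The relationship between sysctl names and /proc/sys can be sketched as a simple path mapping. The helper names sysctl_path and read_sysctl are hypothetical, and the sketch assumes the common case in which every dot in the setting name is a directory separator; names with components that themselves contain dots need more care.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted setting name such as 'kernel.ostype' to its path under root."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a setting as stripped text."""
    with open(sysctl_path(name, root)) as handle:
        return handle.read().strip()
```

For example, sysctl_path("kernel.ostype") yields /proc/sys/kernel/ostype, and on a Linux system read_sysctl("kernel.ostype") reads that file through the ordinary open() and read() calls.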

/proc/meminfo is an empty file that nevertheless contains valuable information.

The behavior of /proc files shows how unlike on-disk file systems a VFS can be. On the one hand, /proc/meminfo contains the information presented by the free command. On the other hand, it is also empty! How can this be? The situation resembles the famous article “Is the moon there when nobody looks? Reality and the quantum theory”, written in 1985 by David Mermin, a professor of physics at Cornell University. The point is that the kernel gathers the memory statistics when a process requests them from /proc, and there is actually nothing in the /proc files when no one is looking. As Mermin said, “It is a fundamental quantum doctrine that a measurement does not, in general, reveal a preexisting value of the measured property.” (The question about the moon is left as homework!)
The apparent emptiness of procfs makes sense, since the information available there is dynamic. The situation with sysfs is somewhat different. Let's compare how many files of at least one byte there are in /proc and in /sys.
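
The “empty yet informative” behavior can be observed directly. The sketch below is a minimal illustration, assuming the usual “Key: value kB” layout of /proc/meminfo; the function name parse_meminfo is hypothetical.

```python
import os


def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integers (units dropped)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):  # present on Linux only
        # stat() reports the size recorded for the file ...
        print("size reported by stat():", os.stat(path).st_size)
        # ... while read() makes the kernel generate the contents on demand.
        with open(path) as handle:
            stats = parse_meminfo(handle.read())
        print("fields generated on read:", len(stats))
```
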
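
One way to make that comparison is to walk each tree and count the regular files whose reported size is at least one byte. This is a sketch under stated assumptions: the function name count_nonempty_files is hypothetical, and entries that disappear or refuse access during the walk are simply skipped.

```python
import os
import stat


def count_nonempty_files(root):
    """Count regular files under root whose reported size is at least one byte."""
    count = 0
    # os.walk() does not follow symbolic links and skips unreadable directories.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # the entry vanished or access was denied mid-walk
            if stat.S_ISREG(info.st_mode) and info.st_size >= 1:
                count += 1
    return count
```

Calling count_nonempty_files("/proc") and count_nonempty_files("/sys") on a Linux machine gives the two numbers to compare; the exact values depend on the kernel and hardware.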

procfs has exactly one such file, namely the exported kernel configuration, which is an exception because it needs to be generated only once per boot. On the other hand, /sys contains many larger files, many of which occupy a whole page of memory. Typically, sysfs files contain exactly one number or string, unlike the tables of information obtained by reading files such as /proc/meminfo.

The purpose of sysfs is to expose to userspace the readable and writable properties of what the kernel calls “kobjects”. The only purpose of kobjects is reference counting: when the last reference to a kobject is dropped, the system reclaims the resources associated with it. Yet /sys makes up most of the kernel's famous “stable ABI to userspace”, which no one may, under any circumstances, “break”. This does not mean that the files in sysfs are static, which would contradict the reference counting of volatile objects.
The kernel's stable ABI instead constrains what may appear in /sys, not what is actually present at any given moment. Listing the permissions on files in sysfs gives an idea of how the configurable parameters of devices, modules, file systems and so on can be set or read. Logic leads to the conclusion that procfs is also part of the kernel's stable ABI, although the documentation does not say so explicitly.
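
The reference counting described above for kobjects can be illustrated with a toy model. This is an analogy only, not the kernel's API: the class RefCounted and its get() and put() methods are hypothetical names chosen to echo the kernel's kobject_get() and kobject_put().

```python
class RefCounted:
    """Toy model of a reference-counted object; not the kernel's kobject API."""

    def __init__(self, name, release):
        self.name = name
        self._release = release  # callback that reclaims the resources
        self._refs = 1           # the creator holds the first reference

    def get(self):
        """Take an additional reference and return the object."""
        if self._refs <= 0:
            raise RuntimeError("object already released")
        self._refs += 1
        return self

    def put(self):
        """Drop one reference; run the release callback when none remain."""
        if self._refs <= 0:
            raise RuntimeError("object already released")
        self._refs -= 1
        if self._refs == 0:
            self._release(self)
```

The release callback runs exactly once, when the last holder calls put(). That is why an entry in /sys can disappear while the system is running without the interface itself changing.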

Files in sysfs describe exactly one property of an entity each and can be readable, writable, or both. The “0” in the file indicates that the SSD is not removable.
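
Reading one such attribute together with its permissions might look like the sketch below. The function names access_kind and describe_attribute are hypothetical, only the owner's permission bits are inspected, and the example path in the usage note assumes a disk named sda, which may not exist on a given machine.

```python
import os
import stat


def access_kind(mode):
    """Classify the owner permission bits of a file mode."""
    readable = bool(mode & stat.S_IRUSR)
    writable = bool(mode & stat.S_IWUSR)
    if readable and writable:
        return "read-write"
    if readable:
        return "read-only"
    if writable:
        return "write-only"
    return "none"


def describe_attribute(path):
    """Return (access kind, value) for a file holding a single value."""
    kind = access_kind(os.stat(path).st_mode)
    value = None
    if kind in ("read-write", "read-only"):
        with open(path) as handle:
            value = handle.read().strip()
    return kind, value
```

For instance, describe_attribute("/sys/block/sda/removable") would return the access kind and the single value stored in that attribute, if the device is present.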

We will begin the second part of the translation with how to observe VFS using eBPF and the bcc tools. In the meantime, we look forward to your comments and, as usual, invite you to an open webinar, which our instructor Vladimir Drozdetsky will hold on April 9.


Source: https://habr.com/ru/post/446614/
