sysfs files is to look at it in practice, and the easiest way to watch this on ARM64 is to use eBPF. eBPF (short for extended Berkeley Packet Filter) consists of a virtual machine running in the kernel, which privileged users can query from the command line. The kernel sources tell the reader what the kernel can do; running eBPF tools on a booted system shows what the kernel actually does.
The bcc tools are Python scripts with small embedded snippets of C code, which means that anyone familiar with both languages can easily modify them. There are 80 Python scripts in bcc/tools, so a developer or system administrator will most likely find one suited to the problem at hand.
Try vfscount or vfsstat. This will show, say, that dozens of calls to vfs_open() and its relatives occur literally every second.
vfsstat.py is a Python script with embedded C code that simply counts VFS function calls.
With eBPF, you can see what happens in /sys when a USB flash drive is inserted. Here are a simple example and a more complex one.
In the simple example, the bcc trace.py tool prints a message whenever sysfs_create_files() runs. We see that sysfs_create_files() was invoked by a kworker thread in response to the flash drive being inserted, but which file was created? The second example shows the full power of eBPF. Here, trace.py prints the kernel backtrace (option -K) and the name of the file created by sysfs_create_files(). The snippet inside the single quotes is C code, including an easily recognizable format string, which the Python script passes to the LLVM just-in-time compiler; the result is compiled and executed in a virtual machine inside the kernel. The full signature of the sysfs_create_files() function must be reproduced in the second command so that the format string can refer to one of the parameters. Errors in this C fragment lead to recognizable C compiler errors. For example, if the -I option is omitted, you will see “Failed to compile BPF text.” Developers who are familiar with C and Python will find the bcc tools easy to extend and modify.
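The structure of such a trace.py invocation can be sketched in Python. This is only an illustration of how the pieces fit together (the -K and -I options, the reproduced signature, the format string and its argument). The header path `linux/sysfs.h`, the exact signature and the `(*ptr)->name` expression are assumptions that should be checked against the kernel sources in use; the helper `build_trace_argv` is hypothetical and not part of bcc.

```python
import shlex


def build_trace_argv(function, signature, fmt, args, header=None, backtrace=False):
    """Assemble an argv list for bcc's trace.py (hypothetical helper)."""
    argv = ["trace.py"]
    if backtrace:
        argv.append("-K")        # print the kernel backtrace for each event
    if header:
        argv += ["-I", header]   # extra header needed by the C snippet
    # Probe layout: function(signature) "format string", arg1, arg2, ...
    probe = f'{function}({signature}) "{fmt}"'
    if args:
        probe += ", " + ", ".join(args)
    argv.append(probe)
    return argv


argv = build_trace_argv(
    function="sysfs_create_files",
    # Assumed signature: compare with include/linux/sysfs.h in your kernel tree.
    signature="struct kobject *kobj, const struct attribute **ptr",
    fmt="%s",
    args=["(*ptr)->name"],       # assumed: name of the first attribute created
    header="linux/sysfs.h",      # assumed header path
    backtrace=True,
)

# Quote the arguments so the line can be pasted into a shell (run as root).
command_line = shlex.join(argv)
print(command_line)
```

Keeping the probe as a single argv element mirrors the single quotes on the command line: the shell passes the whole C snippet to trace.py untouched.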
The kernel backtrace shows that PID 7711 is the kworker thread that created the events file in sysfs. Correspondingly, tracing sysfs_remove_files() shows that removing the drive results in removal of the events file, in keeping with the general idea of reference counting. Likewise, watching sysfs_create_link() with eBPF while a USB drive is inserted shows that at least 48 symbolic links are created.
disk_add_events(), and either "media_change" or "eject_request" can be written to the events file. Here, the kernel block layer informs userspace about the appearance and removal of the “disk”. Notice how informative this method of investigation is, with USB drive insertion as the example, compared with trying to work out how everything fits together from the source alone.
fsck filesystem-recovery utility and, in the worst case, lose data.
fsck, when the engine finally starts working? The answer is simple: embedded devices rely on a read-only root filesystem (ro-rootfs for short).
ro-rootfs offers many benefits that are less obvious than resistance to corruption. One advantage is that malware cannot write to /usr or /lib if no Linux process can write there. Another is that a largely immutable filesystem is critical for field support of remote devices, since support personnel work with local systems that are nominally identical to those in the field. Perhaps the most important (but also the most subtle) advantage is that ro-rootfs forces developers to decide, at the system design stage, which system objects will be immutable. Working with ro-rootfs can be inconvenient and painful, as is often the case with const variables in programming languages, but the benefits easily repay the extra overhead.
A read-only rootfs requires some extra effort from developers of embedded systems, and this is where VFS comes into play. Linux requires that the files in /var be writable, and, in addition, many popular applications that run on embedded systems will try to create configuration dot-files in $HOME. One solution for configuration files in the home directory is usually to pregenerate them and build them into the rootfs. For /var, one possible approach is to mount it on a separate writable partition while / itself is mounted read-only. Another popular alternative is to use bind or overlay mounts.
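A quick way to confirm such a layout is to check the mount options of each mount point. The sketch below parses text in the /proc/mounts format (device, mount point, filesystem type, options) and reports which mount points are read-only. The sample table, device names and the helper `readonly_status` are hypothetical; on a real system the text could be read from /proc/mounts instead.

```python
def readonly_status(mounts_text):
    """Map each mount point to True if it is mounted read-only."""
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        status[mount_point] = "ro" in options
    return status


# Hypothetical table in /proc/mounts format: read-only /, writable /var.
SAMPLE = """\
/dev/mmcblk0p2 / ext4 ro,relatime 0 0
/dev/mmcblk0p3 /var ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

status = readonly_status(SAMPLE)
for mount_point, is_ro in status.items():
    print(f"{mount_point}: {'read-only' if is_ro else 'writable'}")
```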
The man mount command is the best way to learn about bind and overlay mounts, which give developers and system administrators the ability to create a filesystem at one path and then provide it to applications at another. For embedded systems, this means that the files in /var can be stored on a read-only flash device, while overlay-mounting or bind-mounting a path from tmpfs onto /var at boot lets applications scribble there as much as they like. At the next power-on, the changes to /var will be lost. An overlay mount creates a union between tmpfs and the underlying filesystem and allows apparent modification of existing files in the ro-rootfs, while a bind mount can make new, empty tmpfs directories appear as writable at ro-rootfs paths. While overlayfs is a proper filesystem type, bind mounts are implemented by the VFS namespace facility.
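As a rough illustration of the two approaches for /var, the sketch below only assembles the corresponding mount command lines; it does not run them, since that requires root. The directories under /run/var-rw are hypothetical and assumed to live on a tmpfs. For overlayfs, upperdir and workdir must be on the same filesystem.

```python
import shlex


def overlay_mount_argv(lower, upper, work, target):
    """Command that unions a read-only lower dir with a writable upper dir."""
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]


def bind_mount_argv(source, target):
    """Command that makes an existing directory visible at a second path."""
    return ["mount", "--bind", source, target]


# Hypothetical paths: writable directories on a tmpfs mounted under /run.
overlay_argv = overlay_mount_argv(
    lower="/var",
    upper="/run/var-rw/upper",
    work="/run/var-rw/work",
    target="/var",
)
bind_argv = bind_mount_argv("/run/var-rw/log", "/var/log")

print(shlex.join(overlay_argv))
print(shlex.join(bind_argv))
```

With the overlay variant, existing files under /var remain visible and appear modifiable; with the bind variant, /var/log simply starts out empty at every boot.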
bcc mountsnoop tool.
systemd-nspawn starts the container while mountsnoop.py is running.
Running mountsnoop while the container is “booting” shows that the container runtime relies heavily on bind mounts (only the beginning of the long output is displayed).
systemd-nspawn provides selected files from the host's procfs and sysfs to the container as paths in its rootfs. In addition to the MS_BIND flag, which sets up a bind mount, some of the other flags passed to the mount system call define the relationship between changes in the host and container namespaces. For example, a bind mount can either propagate changes in /proc and /sys to the container or hide them, depending on the invocation.
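To make the flag combinations easier to read, here is a small sketch that decodes a mount(2) flags bitmask into symbolic names. The values are the MS_* constants from the kernel's mount header; only a subset is listed, and the helper `decode_mount_flags` is hypothetical.

```python
# Subset of the MS_* mount flags (values as defined in linux/mount.h).
MS_FLAGS = {
    "MS_RDONLY": 1 << 0,    # mount read-only
    "MS_NOSUID": 1 << 1,
    "MS_NODEV": 1 << 2,
    "MS_NOEXEC": 1 << 3,
    "MS_REMOUNT": 1 << 5,   # change flags of an existing mount
    "MS_BIND": 1 << 12,     # bind mount
    "MS_REC": 1 << 14,      # apply recursively to submounts
    "MS_PRIVATE": 1 << 18,  # no propagation in either direction
    "MS_SLAVE": 1 << 19,    # receive propagation, do not send it
    "MS_SHARED": 1 << 20,   # propagate in both directions
}


def decode_mount_flags(flags):
    """Return a 'NAME|NAME' string for the known bits set in flags."""
    names = [name for name, bit in MS_FLAGS.items() if flags & bit]
    known = sum(bit for bit in MS_FLAGS.values())
    leftover = flags & ~known
    if leftover:
        names.append(hex(leftover))  # bits not covered by the table above
    return "|".join(names) if names else "0"


print(decode_mount_flags(MS_FLAGS["MS_BIND"] | MS_FLAGS["MS_REC"]))
print(decode_mount_flags(MS_FLAGS["MS_REC"] | MS_FLAGS["MS_SLAVE"]))
```

The propagation flags (MS_SHARED, MS_SLAVE, MS_PRIVATE) are the ones that decide whether later mount changes on the host become visible inside the container.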
glibc. One way to make progress is to read the source code of a single kernel subsystem, with a focus on understanding the system calls and headers that face user space, as well as the main internal interfaces of the kernel, for example, the file_operations table. File operations are what make the “everything is a file” principle work, so getting a handle on them is especially satisfying. The kernel C source files in the top-level fs/ directory are the implementation of virtual filesystems, which form a shim layer that provides broad and relatively simple interoperability between popular filesystems and storage devices. Bind and overlay mounts in Linux are the VFS magic that makes containers and read-only root filesystems possible. Combined with studying the source code, the eBPF kernel facility and its bcc interface
Source: https://habr.com/ru/post/447748/