In August 2016, without any official announcement from Google, the source code of a new operating system, Fuchsia, was discovered. The OS is built on a microkernel called Zircon, which in turn is based on LK (Little Kernel).
Fuchsia is not Linux
I am not a real welder, that is, I am not a developer of and/or an expert on Zircon. The text under the cut is a compilation of partial translations: the official Zircon vDSO documentation and the article Admiring the Zircon Part 1: Understanding Minimal Process Creation by @depletionmode, with some commentary of my own added (tucked away under spoilers). Constructive suggestions for improving the article are, as always, welcome.
The vDSO in Zircon is the only means of accessing system calls (syscalls).
But can't we execute the processor's SYSENTER / SYSCALL instructions directly from our own code? No: these instructions are not part of the system ABI, and user code is prohibited from executing them directly.
Those who want to learn more about this architectural decision are invited under the cut.
The abbreviation vDSO stands for virtual Dynamic Shared Object.
The kernel enforces the vDSO as the only supported ABI for user-mode applications in two ways:
Mapping the virtual memory object (VMO).
When zx_vmar_map is called on the vDSO's VMO (with ZX_VM_PERM_EXECUTE requested in the arguments), the kernel requires that the offset and size exactly match the vDSO's executable segment. Among other things, this guarantees that the vDSO is mapped into a process's memory only once. After the first successful mapping of the vDSO into a process, it can no longer be unmapped. An attempt to map the vDSO into the process a second time, an attempt to unmap the vDSO code, or a mapping request with the wrong offset and/or size fails with the error ZX_ERR_ACCESS_DENIED.
The offset and size of the vDSO code are extracted from the ELF file at compile time and then used in the kernel code to perform the checks described above. After the first successful mapping of the vDSO, the kernel remembers its address for the target process to speed up those checks.
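The mapping rules above can be modelled in a few lines. This is only a sketch under stated assumptions, not the kernel's actual code: the Status values, the VdsoCodeSegment and ProcessState structures and both function names are hypothetical and exist only to illustrate the policy (exact offset and size, a single mapping, no unmapping).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status codes; the real ones come from Zircon's headers.
enum class Status { Ok, AccessDenied };

// Hypothetical description of the vDSO executable segment, known at build time.
struct VdsoCodeSegment {
    uint64_t offset;  // offset of the code segment inside the vDSO VMO
    uint64_t size;    // size of the code segment in bytes
};

// Hypothetical per-process bookkeeping.
struct ProcessState {
    bool vdso_mapped = false;
    uint64_t vdso_code_base = 0;  // remembered after the first mapping
};

// Models an executable mapping request against the vDSO VMO.
Status MapVdsoCode(ProcessState& proc, const VdsoCodeSegment& seg,
                   uint64_t vmo_offset, uint64_t len, uint64_t chosen_base) {
    assert(seg.size > 0);
    if (proc.vdso_mapped) return Status::AccessDenied;  // second mapping attempt
    if (vmo_offset != seg.offset || len != seg.size) {
        return Status::AccessDenied;  // must match the code segment exactly
    }
    proc.vdso_mapped = true;
    proc.vdso_code_base = chosen_base;
    return Status::Ok;
}

// Models an unmap request; any overlap with the vDSO code is refused.
Status UnmapRange(const ProcessState& proc, const VdsoCodeSegment& seg,
                  uint64_t addr, uint64_t len) {
    if (!proc.vdso_mapped) return Status::Ok;
    const uint64_t begin = proc.vdso_code_base;
    const uint64_t end = begin + seg.size;
    const bool overlaps = addr < end && addr + len > begin;
    return overlaps ? Status::AccessDenied : Status::Ok;
}
```

In this model a request for only part of the code segment is refused, the first exact request succeeds, and every later map or unmap attempt that touches the vDSO code is denied.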
Checking the return addresses of system calls.
When user-mode code enters the kernel, the low-level system call number is passed in a register. Low-level system calls are the internal (private) interface between the vDSO and the Zircon kernel. Some of them (most) correspond directly to system calls of the public ABI, while others do not.
For each low-level system call, the vDSO code contains a fixed set of locations (offsets in the code) from which that call is made. The vDSO source code defines internal symbols that identify each such location. At compile time, these locations are extracted from the vDSO symbol table and used to generate kernel code that defines, for each low-level system call, a predicate that decides whether a given code address is valid. Given an offset from the beginning of the vDSO code segment, these predicates make it possible to validate the calling code quickly.
If the predicate determines that the calling code is not allowed to make the system call, a synthetic exception is raised, just as if the calling code had tried to execute a nonexistent or privileged instruction.
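A rough illustration of such a predicate follows. It is a sketch, not generated kernel code: the syscall numbers, the offsets in kCallSites and the function name IsValidSyscallPc are all made up for the example, and a real table would be produced at build time from the vDSO symbol table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical low-level syscall numbers.
enum Syscall : uint32_t { kSysClockGet = 0, kSysHandleClose = 1, kSyscallCount = 2 };

constexpr int kMaxSites = 2;

// Hypothetical table: for each low-level syscall, the code offsets
// (relative to the start of the vDSO code segment) that may issue it.
struct CallSites {
    uint64_t offsets[kMaxSites];
    int count;
};

constexpr CallSites kCallSites[kSyscallCount] = {
    {{0x120, 0x188}, 2},  // kSysClockGet may be issued from two places
    {{0x1c4, 0}, 1},      // kSysHandleClose from exactly one
};

// Returns true if `pc` is one of the permitted locations for syscall `num`,
// given the address at which the vDSO code segment is mapped in the process.
bool IsValidSyscallPc(uint32_t num, uint64_t pc, uint64_t vdso_code_base) {
    if (num >= kSyscallCount) return false;   // unknown syscall number
    if (pc < vdso_code_base) return false;    // below the vDSO code
    const CallSites& sites = kCallSites[num];
    assert(sites.count <= kMaxSites);
    const uint64_t offset = pc - vdso_code_base;
    for (int i = 0; i < sites.count; ++i) {
        if (sites.offsets[i] == offset) return true;
    }
    return false;
}
```

Because the check works on an offset from the remembered base address, it stays cheap even though the vDSO is loaded at a different address in every process.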
To start the first thread of a newly created process, the zx_process_start system call is used. The last parameter of this call (see arg2 in the documentation) is an argument for the first thread of the new process. By convention, the program loader maps the vDSO into the address space of the new process (at a random location chosen by the system) and passes the base address of the mapping to the first thread in arg2. This is the address of the ELF file header, which can be used to look up the named functions needed to make system calls.
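As a small illustration of what the first thread can do with that argument, the sketch below treats arg2 as the vDSO base address and checks that it points at an ELF header by looking for the standard four-byte ELF magic. The helper names LooksLikeElfImage and VdsoBaseFromArg2 are hypothetical; the actual symbol lookup (walking the dynamic section and the symbol table) is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Every ELF image starts with the bytes 0x7f 'E' 'L' 'F'.
bool LooksLikeElfImage(const void* base) {
    static const unsigned char kMagic[4] = {0x7f, 'E', 'L', 'F'};
    return base != nullptr && std::memcmp(base, kMagic, sizeof(kMagic)) == 0;
}

// Hypothetical helper for the entry point of the first thread:
// interprets arg2 as the base address of the mapped vDSO.
const unsigned char* VdsoBaseFromArg2(uintptr_t arg2) {
    const auto* base = reinterpret_cast<const unsigned char*>(arg2);
    assert(LooksLikeElfImage(base));  // the loader convention was not followed otherwise
    return base;
}
```

From this base address a runtime would go on to parse the ELF headers and resolve the functions it needs by name.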
The vDSO is a regular ELF shared library that can be handled like any other. For the vDSO, however, only a small subset of the full ELF format is deliberately used. This has several advantages:
All vDSO memory is represented by two consecutive segments, each of which contains aligned whole pages:
The entire vDSO image consists only of the pages of these two segments. Only two values extracted from the ELF headers are needed to map the vDSO into memory: the number of pages in each segment.
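To show why two page counts are enough, here is a minimal sketch that derives the address range of each segment from a base address. The 4096-byte page size, the structure names and ComputeLayout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPageSize = 4096;  // assumed page size

// The only two values needed from the ELF headers.
struct VdsoImage {
    uint64_t first_segment_pages;
    uint64_t second_segment_pages;
};

struct Range {
    uint64_t begin;  // inclusive
    uint64_t end;    // exclusive
};

struct VdsoLayout {
    Range first;
    Range second;
};

// The segments are consecutive and page-aligned, so the second one
// starts exactly where the first one ends.
VdsoLayout ComputeLayout(const VdsoImage& image, uint64_t base) {
    assert(base % kPageSize == 0);
    const uint64_t first_end = base + image.first_segment_pages * kPageSize;
    const uint64_t second_end = first_end + image.second_segment_pages * kPageSize;
    return {{base, first_end}, {first_end, second_end}};
}
```

For an image with 2 and 3 pages mapped at 0x10000, the first segment covers 0x10000 to 0x12000 and the second 0x12000 to 0x15000.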
Some system calls simply return values that are constant (although the values must be queried at run time and cannot be compiled into user-mode code). These values are either fixed in the kernel at compile time or determined by the kernel at boot time (from boot parameters and hardware parameters). Examples: zx_system_get_version(), zx_system_get_num_cpus() and zx_ticks_per_second(). The return value of the last function, for instance, is affected by a kernel command-line parameter.
Interestingly, the description of zx_system_get_num_cpus() also states explicitly that the OS does not support changing the number of processors on the fly:
This number cannot change during a run of the system, only at boot time.
This indicates, at least indirectly, that the OS is not positioned as a server OS.
Since these values are constant, there is no point in paying for a real system call into the kernel. Instead, they are implemented as simple C++ functions that return data read from the vDSO's constants segment. Values fixed at compile time (such as the system version string) are simply compiled into the vDSO.
For values determined at boot time, the kernel must modify the contents of the vDSO. This is done by early-boot code that sets up the vDSO VMO before the kernel starts the first user process (and passes it the VMO handle). At compile time, the offset of the constants structure ( vdso_constants ) within the vDSO image is extracted from the ELF file and embedded in the kernel. At boot time, the kernel temporarily maps the pages covering vdso_constants into its own address space to initialize the structure with the correct values for the current boot.
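The mechanism can be sketched as follows. This is an illustration under assumptions, not the real layout: the fields of VdsoConstants, the offset kVdsoConstantsOffset and all function names are hypothetical, and the vDSO image is modelled as a plain byte buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout of the constants block inside the vDSO image.
struct VdsoConstants {
    uint32_t num_cpus;
    uint64_t ticks_per_second;
};

// Hypothetical offset, standing in for the one extracted from the ELF file at build time.
constexpr size_t kVdsoConstantsOffset = 0x40;

// Kernel side, at boot: write the values for this boot into the image
// before any user process sees it.
void PatchConstants(std::vector<uint8_t>& image, const VdsoConstants& values) {
    assert(image.size() >= kVdsoConstantsOffset + sizeof(VdsoConstants));
    std::memcpy(image.data() + kVdsoConstantsOffset, &values, sizeof(values));
}

// User side: ordinary functions that read the mapped image, no kernel entry.
uint32_t GetNumCpus(const uint8_t* vdso_base) {
    VdsoConstants c;
    std::memcpy(&c, vdso_base + kVdsoConstantsOffset, sizeof(c));
    return c.num_cpus;
}

uint64_t GetTicksPerSecond(const uint8_t* vdso_base) {
    VdsoConstants c;
    std::memcpy(&c, vdso_base + kVdsoConstantsOffset, sizeof(c));
    return c.ticks_per_second;
}
```

The point of the design is visible here: once the image has been patched, answering the query costs a memory read rather than a transition into the kernel.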
One of the most important reasons for this design is security. If an attacker manages to execute arbitrary code (shellcode), they will have to go through vDSO functions to make system calls. The first obstacle is the randomization of the vDSO load address for each new process, mentioned above. And since the vDSO's VMO (virtual memory object) is controlled by the kernel, the kernel can choose to map a completely different vDSO into a particular process, thereby prohibiting system calls that are dangerous (and not needed by that process). For example, drivers can be prevented from spawning child processes or from mapping MMIO regions. This is a great tool for reducing the attack surface.
Note: support for multiple vDSOs is currently under active development. A proof-of-concept implementation and simple tests already exist, but more work is needed to make the implementation robust and to decide which variants will be available. The current concept provides vDSO image variants that export only a subset of the full vDSO system call interface.
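The idea of vDSO variants can be pictured as a set of exported calls per image. The sketch below is purely illustrative: the syscall names, the two variants and the bitmask representation are assumptions made for the example and say nothing about how Zircon actually builds its variants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical set of public system calls.
enum Sys : uint32_t { kClockGet = 0, kProcessCreate = 1, kMmioMap = 2, kSysCount = 3 };

using SyscallMask = uint64_t;

constexpr SyscallMask Bit(Sys s) {
    return SyscallMask{1} << static_cast<uint32_t>(s);
}

// A vDSO variant is described here only by the calls it exports.
struct VdsoVariant {
    const char* name;
    SyscallMask exported;
};

// Hypothetical variants: a full image and a restricted one that
// can neither create processes nor map MMIO.
constexpr VdsoVariant kFullVdso{"full", Bit(kClockGet) | Bit(kProcessCreate) | Bit(kMmioMap)};
constexpr VdsoVariant kRestrictedVdso{"restricted", Bit(kClockGet)};

// A call that the variant does not export simply has no entry point
// in the image mapped into the process.
bool Exports(const VdsoVariant& variant, Sys s) {
    assert(s < kSysCount);
    return (variant.exported & Bit(s)) != 0;
}
```

A process that receives the restricted image has no valid location from which to issue the missing calls, so the return-address check described earlier rejects any attempt to make them.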
It should be noted that similar techniques are already used successfully in other operating systems. Windows, for example, has ProcessSystemCallDisablePolicy:
Win32k System Call Disable: restricts the ability to use NTUser and GDI.
Source: https://habr.com/ru/post/435482/