
FMA3 instructions on Ryzen hard-lock the operating system



As it turned out, executing certain FMA3 instructions on an AMD Ryzen processor causes the operating system to hang completely.

FMA3 (fused multiply-add) instructions are supported by both Intel (starting with Haswell) and AMD. They compute d = round(a × b + c), where d must be in the same register as a, b, or c. For comparison, FMA4 instructions are supported only by AMD (in Bulldozer and later processors). There, a, b, c, and d can all be in different registers.
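To illustrate what the single rounding in d = round(a × b + c) means, here is a minimal Python sketch. It does not execute a hardware FMA instruction and is not taken from the Flops source. It only emulates the fused result with exact rational arithmetic and compares it with a separate multiply and add, which round twice. The function names and the input values are hypothetical, chosen so that the difference is visible.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate d = round(a * b + c): exact arithmetic, then ONE rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding happens here
    return float(exact)  # the single rounding to double precision


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary multiply followed by add: the product is rounded, then the sum."""
    return a * b + c


# Hypothetical inputs: the exact product is 1 - 2**-60, which rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(separate_multiply_add(a, b, c))  # 0.0: the small term was rounded away
print(fused_multiply_add(a, b, c))     # equals -(2.0 ** -60)
```

The fused form keeps the low-order bits of the product until the final rounding, which is why numerical code and benchmarks like this one care about the instruction.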

The processor bug was found with Flops version2, a simple and little-known CPU testing utility. Its developer, Alexander "Mystical" Yee, positions it as a specialized benchmark that is sensitive to processor microarchitecture. The bug never showed up in other benchmarks.
The Flops version2 utility ships with dedicated binaries for all major x64 architectures (Core2, Bulldozer, Sandy Bridge, Piledriver, Haswell, Skylake). At the moment, however, neither the Windows nor the Linux builds include a version for Zen, so Ryzen is tested with binaries built for other architectures, namely the closest one, Haswell. The FMA3 error described above was discovered two weeks ago by the author of Flops when he ran the test with the stock Haswell binary on a computer of the following configuration:


It turned out that the system usually freezes during the following operation:

Single-Precision - 128-bit FMA3 - Fused Multiply Add:

Sometimes the test gets past this operation, but then hangs on a later one.

The developer points out that his test is open source: if you don't trust the results, you can compile the binary yourself in Visual Studio and recheck them.

Alexander understood how much attention a bug report against a heavily advertised processor would attract, so he rechecked the results repeatedly. The processor hung the system at every clock speed, and in single-threaded mode the hang occurred on every core.

There was still some chance that the cause was not the processor but something else: a specific motherboard, a specific BIOS, a specific operating system... What else could it be?

The developer shared the results with his colleagues so that they could test other versions of Zen on their computers. Crashes were confirmed for other processors, on different motherboards, under different versions of Windows and under Linux.

In the first days after Alexander's call, the test was run on five Ryzen processors. Here are the results:

Confirmed failures:

Confirmed trouble free operation:

The benchmark developer checked all FMA variants (128-bit, 256-bit, single precision, double precision). In every case the computer hung completely.

Only one detail bothered him: although the test was written correctly, the hang did not occur in other benchmarks such as prime95 and y-cruncher, even though they also use FMA.

So some uncertainty remained.

Finally, on March 16, an AMD representative officially announced that the bug would be fixed in new AGESA (AMD Generic Encapsulated Software Architecture) code, which is also used to initialize AMD processor cores. In other words, the company's engineers had checked and confirmed the bug. Later, AMD representatives also confirmed it officially in comments to the media.

Fortunately, a bug like this can be fixed without replacing hardware, simply by updating the microcode. The bug is minor, so it will not lead to a processor recall or other problems for the company. Under real-world workloads hardly anyone is likely to run into it, and it does not affect the performance of the computer or the processor.

The bad news is that attackers can use it for DoS attacks, which makes this primarily an information security problem. An ordinary program running in user mode, rather than at the OS kernel level, should never be able to hang the whole system. Yet here it does.

The fact that the test used a binary built for a different architecture is not that important. Any processor must correctly run any binary if it supports the instruction sets the binary uses, the benchmark's author writes. And even if a program uses unsupported instructions, it should not hang the system.
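As a rough illustration of what "supports the appropriate set of instructions" looks like in practice, the sketch below checks the CPU feature flags that Linux exposes in /proc/cpuinfo, where FMA3 is reported as `fma` and FMA4 as `fma4`. This is an assumption-laden sketch, not part of Flops: the helper names and the sample text are hypothetical, and the approach applies to Linux only.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from the 'flags' lines of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_fma3(cpuinfo_text: str) -> bool:
    """Linux lists FMA3 under the flag name 'fma'."""
    return "fma" in cpu_flags(cpuinfo_text)


def supports_fma4(cpuinfo_text: str) -> bool:
    """Linux lists FMA4 under the flag name 'fma4'."""
    return "fma4" in cpu_flags(cpuinfo_text)


# Hypothetical, shortened sample; real output lists many more flags.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 fma\n"

print(supports_fma3(SAMPLE))  # True
print(supports_fma4(SAMPLE))  # False
```

On a real Linux machine the text would come from reading /proc/cpuinfo. A flag check like this only says that the instructions are advertised; as the article shows, it says nothing about whether they execute correctly.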

The security risk is made worse by the fact that malicious code can be run even from inside a virtual machine and will still hang the entire system. Any malware can hang a computer with a new Ryzen processor, perhaps even through the browser.

As already mentioned, AMD is working on an AGESA update. After that, patched BIOS versions will be released for all motherboards.

Source: https://habr.com/ru/post/402551/

