How IA-32 vendors played a dirty trick on virtualization developers

It will hardly surprise anyone that not only Intel but also companies such as AMD and VIA contribute to the development of the IA-32 architecture. More on this can be found, for example, in A. Fog's article. Today I want to talk about one ISA change introduced by AMD that, in my opinion, was not fully thought through.


When people think about AMD's influence on the IA-32 architecture, the REX prefix and support for the 64-bit processor mode come to mind first. That was definitely a “positive” effect, one that made IA-32 better. However, AMD also made other interesting changes that I personally cannot call positive.

After its long evolution, IA-32 instruction encoding has become an extremely complex structure (the prefixes alone are proof of that). When I discussed some decoding problems and their solutions in the articles "Is your disassembler working correctly?" and "How to cope with the IA-32 code or features of a Simics decoder", I forgot to mention a few interesting facts. The maximum length of an IA-32 instruction is 15 bytes. An encoding may contain several prefixes, and in practice their number is limited only by that length limit. The same prefix may occur several times, and prefixes that cannot affect the instruction in any way may also be present. Such prefixes are simply ignored.
In my opinion, the NOP instruction (No OPeration, an instruction that does nothing, encoded as 0x90) illustrates this well.

0x66 0x66 0x66 0x66 0x66 0x66 0x66 0x66 0x66 0x66 0x66 0x66 0x66 0x66 0x90 is also a NOP instruction: all 14 0x66 prefixes are simply ignored, and the whole instruction is exactly 15 bytes long.
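To make the two rules above concrete (repeated or irrelevant prefixes are allowed, and the total length is capped at 15 bytes), here is a minimal sketch of how a decoder might skip legacy prefixes. The function names are hypothetical, and the sketch deliberately ignores REX and VEX prefixes and everything after the opcode byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Maximum length of an IA-32 instruction, in bytes. */
#define MAX_INSN_LEN 15

/* Returns 1 if b is one of the legacy IA-32 prefixes. */
static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:             /* LOCK, REPNE, REP */
    case 0x2E: case 0x36: case 0x3E: case 0x26:  /* segment overrides */
    case 0x64: case 0x65:
    case 0x66: case 0x67:                        /* operand/address size */
        return 1;
    default:
        return 0;
    }
}

/* Counts the legacy prefixes at the start of code. Returns -1 when the
 * prefixes leave no room for an opcode byte, either because the buffer ends
 * or because the 15-byte instruction limit would be exceeded. */
static int count_prefixes(const uint8_t *code, size_t len)
{
    size_t n = 0;
    while (n < len && n < MAX_INSN_LEN && is_legacy_prefix(code[n]))
        n++;
    if (n == len || n >= MAX_INSN_LEN)
        return -1;
    return (int)n;
}
```

For the 15-byte NOP above, count_prefixes returns 14 and the opcode byte that follows is 0x90; adding a fifteenth 0x66 makes the sequence invalid.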

This is certainly a very strange feature, but there is no getting away from it. Some compilers may even use such prefixes to pad code for alignment.

That was just the warm-up; now the real trouble begins.

The BSR instruction has been part of the Intel architecture for many years; it first appeared in the Intel 80386 processor. It finds the index of the most significant bit that is set to 1.

For example, for the number 0x11aa00bb this instruction returns 28.
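As a reference point for the comparison that follows, here is a small software model of 32-bit BSR. It is a sketch written for this example, not taken from any simulator: a zero input is reported as -1, because the hardware sets ZF in that case and does not produce a meaningful index.

```c
#include <stdint.h>

/* Software model of 32-bit BSR: the index of the most significant set bit.
 * Returns -1 for a zero input, where hardware BSR sets ZF instead. */
static int bsr32(uint32_t x)
{
    int idx = -1;
    while (x != 0) {
        x >>= 1;
        idx++;
    }
    return idx;
}
```

bsr32(0x11aa00bb) gives 28, because the top hex digit 0x1 puts the highest set bit at position 28.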

Let's see how it is encoded:

Nothing interesting: 0x0F 0xBD, followed by a ModR/M byte for the operands.
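For the register-to-register form, the ModR/M byte packs mod = 3, the destination register in the reg field, and the source register in the r/m field. This sketch builds that byte for two 32-bit registers. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit general-purpose register numbers as used in the ModR/M byte. */
enum reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* ModR/M layout: mod in bits 7..6, reg in bits 5..3, r/m in bits 2..0. */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

/* Encodes "BSR dst, src" for two 32-bit registers into out[0..2].
 * mod = 3 selects a register operand in r/m; the destination goes in reg. */
static size_t encode_bsr_r32(enum reg32 dst, enum reg32 src, uint8_t out[3])
{
    out[0] = 0x0F;
    out[1] = 0xBD;
    out[2] = modrm(3, (unsigned)dst, (unsigned)src);
    return 3;
}
```

For example, bsr eax, ecx encodes as 0F BD C1.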

Now let's add a prefix to the encoding of this instruction, say 0xF3. The result is still a valid instruction: the prefix is simply ignored, because it applies only to string operations and input/output instructions. Nothing wrong with that.

So what did our comrades at AMD do?

After some research, they found that software very rarely combines the 0xF3 prefix with the BSR instruction, so they reassigned this combination to a new instruction, LZCNT, which counts the number of leading zero bits.
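The effect on a decoder can be sketched as follows: the same bytes F3 0F BD /r name a different instruction depending on whether the CPU implements LZCNT. This is a simplified, hypothetical helper. It ignores REX, the 15-byte limit, and the case where both F2 and F3 are present.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 for the legacy prefixes (LOCK/REP, segment, size overrides). */
static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:
    case 0x2E: case 0x36: case 0x3E: case 0x26: case 0x64: case 0x65:
    case 0x66: case 0x67:
        return 1;
    default:
        return 0;
    }
}

/* Names the instruction behind a 0F BD opcode, depending on whether the
 * CPU implements LZCNT. Returns NULL if the bytes are not 0F BD. */
static const char *name_0fbd(const uint8_t *code, size_t len, int cpu_has_lzcnt)
{
    size_t i = 0;
    int rep = 0;
    while (i < len && is_legacy_prefix(code[i])) {
        if (code[i] == 0xF3)
            rep = 1;
        i++;
    }
    if (i + 1 >= len || code[i] != 0x0F || code[i + 1] != 0xBD)
        return NULL;
    /* Older CPUs silently ignore F3; with LZCNT it selects a new opcode. */
    return (rep && cpu_has_lzcnt) ? "lzcnt" : "bsr";
}
```

With cpu_has_lzcnt set, F3 0F BD C1 decodes as lzcnt eax, ecx. Without it, the same bytes decode as bsr eax, ecx.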

For the same input number 0x11aa00bb with a 32-bit operand, this instruction returns not 28 but 3.
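For a nonzero 32-bit input, LZCNT equals 31 minus the BSR result, which is why 28 turns into 3. Unlike BSR, LZCNT also defines a result for zero: the operand size. Here is a minimal software model written for this illustration.

```c
#include <stdint.h>

/* Software model of 32-bit LZCNT: the number of leading zero bits.
 * A zero input yields 32, the operand size. */
static unsigned lzcnt32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```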

This instruction appeared as part of the ABM (Advanced Bit Manipulation) extension, which consists of two instructions, LZCNT and POPCNT (I personally see nothing wrong with the latter). Each has its own CPUID feature bit.
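The feature bits can be checked from C. The sketch below assumes GCC or Clang on an x86 host, whose <cpuid.h> provides __get_cpuid. It reads LZCNT from CPUID.80000001H:ECX bit 5 (AMD calls this bit ABM) and POPCNT from CPUID.01H:ECX bit 23. On other hosts it reports no support.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* LZCNT (AMD: "ABM"): CPUID.80000001H:ECX bit 5. */
static int cpu_has_lzcnt(void)
{
    unsigned a, b, c, d;
    if (!__get_cpuid(0x80000001u, &a, &b, &c, &d))
        return 0;  /* extended leaf not supported */
    return (int)((c >> 5) & 1u);
}

/* POPCNT: CPUID.01H:ECX bit 23. */
static int cpu_has_popcnt(void)
{
    unsigned a, b, c, d;
    if (!__get_cpuid(1u, &a, &b, &c, &d))
        return 0;
    return (int)((c >> 23) & 1u);
}
#else
/* Not an x86 host built with GCC/Clang: report no support. */
static int cpu_has_lzcnt(void) { return 0; }
static int cpu_has_popcnt(void) { return 0; }
#endif
```

These bits only report what the CPU does. A hypervisor can clear the LZCNT bit in the CPUID values it shows a guest, but the hardware still decodes F3 0F BD as LZCNT.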

Unfortunately, this instruction cannot be disabled.

AMD processors based on the Barcelona microarchitecture were the first to support the ABM instructions. Intel added POPCNT to the instruction set of its Nehalem processors. One might have thought Intel would stop there, but no: LZCNT appeared in Haswell processors.

Why is this bad?

First, this change clearly breaks backward compatibility. In my opinion, though, that is not its worst aspect. As mentioned above, AMD's research showed that BSR with this prefix is extremely rare. Still, such code is possible in theory.

But this article is not about that, so let's step away from the typical needs of an ordinary user and look at the needs of developers.

As you know, most of the software stack is written and debugged on a simulator before the chip itself is manufactured. So let's see how this change affects the speed and accuracy of simulation.

Of course, everyone wants simulation to be as fast as possible. A plain interpreter is never fast enough: everyone wants the BIOS to boot in seconds and the operating system in minutes. So the model becomes much more complicated, with an optimizing binary translator added to reduce simulation time. But even that is not enough! Next comes support for direct execution of guest instructions on the host, which complicates the model further while improving performance several times over. More information about the simulator's various modes of operation can be found in the article "Software simulation of a microprocessor. Transmission".

It is easy to see that neither the interpreter nor the binary translator should have any problems here, since both decode guest instructions themselves. Problems arise with hardware virtualization: neither LZCNT nor BSR causes an exit to the VM monitor.

As a result, if you need to simulate a Haswell or newer processor on an older host such as Sandy Bridge, the host may execute BSR instead of LZCNT. Conversely, if you want to model a simpler processor, for example Quark, on a Haswell host, you risk the opposite effect: LZCNT instead of BSR.
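The mismatch can be modeled directly: compute what F3 0F BD /r produces on the guest CPU being simulated and on the host that actually executes it, then compare the two results. This is an illustrative sketch with hypothetical names. For a zero source, BSR is modeled as leaving the destination unchanged. AMD documents that behavior, while Intel's manual calls the destination undefined.

```c
#include <stdint.h>

/* Destination value produced by "F3 0F BD /r" with 32-bit operands.
 * Without LZCNT the F3 prefix is ignored and BSR executes instead. */
static uint32_t run_f3_0fbd(uint32_t src, uint32_t old_dst, int cpu_has_lzcnt)
{
    uint32_t msb = 0;
    if (src == 0)
        return cpu_has_lzcnt ? 32u : old_dst;
    for (uint32_t t = src; t > 1; t >>= 1)
        msb++;  /* msb ends up as the BSR result */
    return cpu_has_lzcnt ? 31u - msb : msb;
}

/* Direct execution is faithful only if guest and host agree on the result. */
static int direct_exec_is_faithful(uint32_t src, uint32_t old_dst,
                                   int guest_has_lzcnt, int host_has_lzcnt)
{
    return run_f3_0fbd(src, old_dst, guest_has_lzcnt) ==
           run_f3_0fbd(src, old_dst, host_has_lzcnt);
}
```

For a Quark-like guest (no LZCNT) on a Haswell host, the guest expects 28 for 0x11aa00bb but the host produces 3.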

They broke virtualization!


However, there is a workaround: scanning pages in advance.

The existing virtualization mechanism lets you restrict the set of memory pages that guest software can access. So direct execution can be allowed only for code on pages that contain no encodings of the LZCNT instruction, and every new page is scanned for such instructions beforehand.
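A pre-scan has to be conservative, because the scanner cannot know where instructions begin. One possible approach, sketched below with hypothetical names, is to flag a page whenever a 0F BD (or, for TZCNT, 0F BC) byte pair is preceded by prefix-like bytes that include F3. False positives, such as matching bytes inside data, only cost speed because that page falls back to the binary translator. A real implementation would also need to handle instructions that straddle a page boundary, which this sketch does not do.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INSN_LEN 15u

/* Legacy prefixes, plus REX (0x40..0x4F) in case the code is 64-bit. */
static int may_be_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:
    case 0x2E: case 0x36: case 0x3E: case 0x26: case 0x64: case 0x65:
    case 0x66: case 0x67:
        return 1;
    default:
        return b >= 0x40 && b <= 0x4F;
    }
}

/* Returns 1 if the page may contain LZCNT (F3 0F BD) or TZCNT (F3 0F BC),
 * i.e. if some 0F BD / 0F BC pair has an F3 among the prefix-like bytes
 * immediately before it. */
static int page_may_contain_f3_bitcount(const uint8_t *page, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (page[i] != 0x0F || (page[i + 1] != 0xBC && page[i + 1] != 0xBD))
            continue;
        /* Walk back over at most 14 prefix-like bytes. */
        for (size_t k = 1; k <= i && k < MAX_INSN_LEN; k++) {
            uint8_t b = page[i - k];
            if (!may_be_prefix(b))
                break;
            if (b == 0xF3)
                return 1;
        }
    }
    return 0;
}
```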

Of course, this costs performance and further complicates an already far from simple simulator. To me, that is the negative effect of this change.

P.S. This instruction is not the only one of its kind. With the BMI1 extension, Intel added the TZCNT instruction (F3 0F BC), which is tied to the BSF instruction (0F BC) in the same way.
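The BSF/TZCNT pair is a milder case than BSR/LZCNT. For a nonzero source, the lowest set bit's index equals the number of trailing zeros, so both instructions return the same value. The difference appears only for a zero source, and in the flags, which this sketch does not model. The model below was written for this illustration. Its -1 stands for the zero-source case of BSF, where the hardware sets ZF and leaves the destination undefined (AMD documents it as unchanged).

```c
#include <stdint.h>

/* BSF model: index of the least significant set bit, -1 for zero input. */
static int bsf32(uint32_t x)
{
    int idx = 0;
    if (x == 0)
        return -1;
    while ((x & 1u) == 0) {
        x >>= 1;
        idx++;
    }
    return idx;
}

/* TZCNT model: number of trailing zero bits, 32 for a zero input. */
static unsigned tzcnt32(uint32_t x)
{
    return x == 0 ? 32u : (unsigned)bsf32(x);
}
```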

Source: https://habr.com/ru/post/226065/
