How to decode IA-32 code, or features of the Simics decoder

Hi, %username%!
Decoding IA-32 code is an extremely complex task. To see this for yourself, consult the Intel Software Developer's Manual or the articles previously published on Habr: "Prefixes in the IA-32 command system" and "Does your disassembler work correctly?". Let's see how Wind River Simics, a functionally accurate full-platform simulator, copes with this task.

Most libraries for decoding IA-32 instructions generate or use tables that map operation codes to instructions. An example of this approach is described in the article "Disassembler with own hands". Decoding of prefixes and operands, however, is usually hand-written, as in libopcodes, metasm, beaengine, and distorm. This approach has a significant drawback: adding support for new instruction set extensions requires a lot of manual work.
There are other ways to build decoders, for example with the GDSL language. That approach is universal and lets you create decoders for any architecture.
Simics, on the other hand, uses a completely different, ~~no less universal~~ approach to working with IA-32 instructions, called split decoding. Simics also has the ability to use external decoders, but more on that later.

Input and output of the decoder procedure

In a real processor, a dedicated block of logic on the chip is responsible for decoding. In a simulator, its counterpart is a procedure written in a programming language. Let's consider what is fed to its input and what results should be expected from it.

The decoder input is an array of bytes of known length, obtained during the instruction fetch phase. In addition, the decoder must know the current processor mode (see "Prefixes in the IA-32 command system").

The decoder should return a return code and the results of analysing the byte sequence as a list of result fields. The return code can take the following values:

Decoding succeeded (the return code equals the instruction length and is > 0). The byte array was recognized as a valid instruction, and the list of fields contains information about the operation code and its operands.
Decoding failed (code 0). No instruction defined in the architecture corresponds to the input byte array, and the contents of the result fields are meaningless. What happens next, in the execution phase, depends on the architecture. Most often a failure to decode raises an exception, but in some cases an invalid instruction may be treated as a NOP, that is, no operation.
For ISAs with variable-length instructions, a third situation is possible: there is not enough input data to make an unambiguous decision (code < 0). In other words, only part of an instruction was passed to the decoder, and since it has no information about which bytes follow in memory, it reports this.
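The three outcomes above can be illustrated with a minimal sketch. It does not decode IA-32: the two-instruction toy ISA, the `DecodeResult` type, and the `decode` function are all assumptions made up for this example, and only the return-code convention (> 0, 0, < 0) comes from the text.

```python
from typing import NamedTuple, Optional

# Hypothetical toy ISA, used only for illustration (this is NOT IA-32):
#   0x01        -> "nop",  1 byte
#   0x02 imm8   -> "push", 2 bytes
TOY_OPCODES = {0x01: ("nop", 1), 0x02: ("push", 2)}


class DecodeResult(NamedTuple):
    code: int               # > 0: instruction length, 0: invalid, < 0: need more bytes
    fields: Optional[dict]  # meaningful only when code > 0


def decode(data: bytes) -> DecodeResult:
    """Decode one toy instruction from the start of `data`."""
    if not data:
        return DecodeResult(-1, None)   # nothing fetched yet: ask for more
    entry = TOY_OPCODES.get(data[0])
    if entry is None:
        return DecodeResult(0, None)    # no such instruction in the toy ISA
    mnemonic, length = entry
    if len(data) < length:
        return DecodeResult(-1, None)   # only part of the instruction was passed
    operands = list(data[1:length])
    return DecodeResult(length, {"mnemonic": mnemonic, "operands": operands})
```

For example, `decode(b"\x02")` reports that more bytes are needed, while `decode(b"\x02\x2a")` returns length 2 together with the decoded fields.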

The figure below shows an example of an algorithm that interleaves iterations of the Fetch and Decode phases and makes it possible to decode variable-length instructions.
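A loop that interleaves fetching and decoding might look roughly like the sketch below. It is an illustration under assumptions, not the Simics algorithm: `toy_decode` and `fetch_and_decode` are hypothetical names, the toy encoding is invented, and the only architectural fact used is that an IA-32 instruction is never longer than 15 bytes.

```python
from typing import Callable, Tuple

MAX_INSN_LEN = 15  # an IA-32 instruction is at most 15 bytes long


def toy_decode(data: bytes) -> int:
    """Hypothetical decoder: 0x01 is a 1-byte instruction, 0x02 takes one
    immediate byte, everything else is invalid. Returns the instruction
    length (> 0), 0 for an invalid encoding, or -1 if more bytes are needed."""
    lengths = {0x01: 1, 0x02: 2}
    if not data:
        return -1
    length = lengths.get(data[0], 0)
    if length == 0:
        return 0
    return length if len(data) >= length else -1


def fetch_and_decode(memory: bytes, pc: int,
                     decode: Callable[[bytes], int] = toy_decode) -> Tuple[int, bytes]:
    """Fetch one byte at a time until the decoder reaches a verdict."""
    window = b""
    while len(window) < MAX_INSN_LEN:
        address = pc + len(window)
        if address >= len(memory):
            raise IndexError("fetch past the end of memory")
        window += memory[address:address + 1]  # Fetch: one more byte
        code = decode(window)                  # Decode: try again
        if code >= 0:                          # success or definite failure
            return code, window
    return 0, window                           # too long: treat as invalid
```

Fetching byte by byte keeps the sketch short; a real simulator would more likely fetch larger chunks and fall back to this loop only near page boundaries.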

Split decoding

The main idea of this approach is to divide the decoding phase into two stages:

Prefix decoding. This stage includes both decoding of all prefixes and checking for conflicts between them.
Decoding of operation codes and operands. This stage involves calling the decoder generated with SimGen.

The algorithm of the built-in Simics decoder.

It goes without saying that the second stage depends on the result of the first, since the IA-32 instruction set has so-called mandatory prefixes, which are in fact part of the operation code (see "Prefixes in the IA-32 command system").
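To make the dependency between the two stages concrete, here is a heavily simplified sketch. The function names and the tiny opcode table are assumptions for illustration; the table is hand-written rather than generated by SimGen, prefix conflict checking and operand (ModRM) decoding are omitted, and the rule for picking the mandatory prefix is a simplification. The encodings themselves (`90` NOP, `F3 90` PAUSE, `0F 10` MOVUPS and its `66`/`F3`/`F2` variants) are real IA-32 ones.

```python
# Legacy IA-32 prefix bytes
LEGACY_PREFIXES = {
    0xF0: "lock", 0xF2: "repne", 0xF3: "rep",
    0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x26: "es", 0x64: "fs", 0x65: "gs",
    0x66: "opsize", 0x67: "addrsize",
}

# (mandatory prefix or None, opcode bytes) -> mnemonic; a tiny excerpt only
OPCODES = {
    (None, b"\x90"): "nop",
    (0xF3, b"\x90"): "pause",
    (None, b"\x0f\x10"): "movups",
    (0x66, b"\x0f\x10"): "movupd",
    (0xF3, b"\x0f\x10"): "movss",
    (0xF2, b"\x0f\x10"): "movsd",
}


def decode_prefixes(data: bytes) -> list:
    """Stage 1: collect the leading prefix bytes (no conflict checks here)."""
    prefixes = []
    for byte in data:
        if byte not in LEGACY_PREFIXES:
            break
        prefixes.append(byte)
    return prefixes


def decode_opcode(data: bytes, prefixes: list):
    """Stage 2: look up the opcode, taking the stage 1 result into account."""
    # Simplified choice of the mandatory prefix: F3/F2 take priority over 66.
    mandatory = next((p for p in (0xF3, 0xF2, 0x66) if p in prefixes), None)
    size = 2 if data[:1] == b"\x0f" else 1  # 0x0F escapes to two-byte opcodes
    opcode = bytes(data[:size])
    if len(opcode) < size:
        return None
    # Try the prefixed form first, then fall back to the plain opcode.
    return OPCODES.get((mandatory, opcode), OPCODES.get((None, opcode)))


def split_decode(data: bytes):
    prefixes = decode_prefixes(data)
    return decode_opcode(data[len(prefixes):], prefixes)
```

With this table, `split_decode(b"\x90")` gives `"nop"` while `split_decode(b"\xf3\x90")` gives `"pause"`: the same opcode byte means a different instruction depending on what the prefix stage found.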

Using external decoders

Simics lets you attach additional decoders through external interfaces described in the Model Builder User's Guide that ships with the simulator. You can attach many external decoders; they are called one after another until one of them reports success or the list is exhausted. In the latter case, the opcode is considered invalid in this model.

For flexibility, external decoders in Simics are divided into two types:

User decoders can override any existing opcode and, of course, can also add decoding of new instructions.
Extension decoders are designed to extend the built-in decoder, that is, to decode instructions that it does not support.

The difference between the two types is the order in which they run. User decoders run first, even before the built-in decoder is called, which lets them override decoding results built into the original model. Extension decoders run only when neither the user decoders nor the built-in decoder could recognize the instruction.
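The calling order described above can be sketched as a simple chain. This is not the Simics interface: `table_decoder`, `decode_with_chain`, the `(length, mnemonic)` result shape, and the `0xD6`/`"newop"` entry are hypothetical and exist only to show the priority of user, built-in, and extension decoders.

```python
from typing import Callable, Iterable, Optional, Tuple

Result = Optional[Tuple[int, str]]  # (length, mnemonic), or None if not recognized
Decoder = Callable[[bytes], Result]


def table_decoder(table: dict) -> Decoder:
    """Build a hypothetical one-byte decoder from an opcode -> mnemonic table."""
    def decoder(data: bytes) -> Result:
        if data and data[0] in table:
            return 1, table[data[0]]
        return None
    return decoder


def decode_with_chain(data: bytes,
                      user_decoders: Iterable[Decoder],
                      builtin_decoder: Decoder,
                      extension_decoders: Iterable[Decoder]) -> Result:
    """Try user decoders first, then the built-in one, then the extensions."""
    for decoder in (*user_decoders, builtin_decoder, *extension_decoders):
        result = decoder(data)
        if result is not None:
            return result
    return None  # nobody recognized the bytes: the opcode is invalid in this model


builtin = table_decoder({0x90: "nop"})
extension = table_decoder({0xD6: "newop"})  # made-up instruction the built-in lacks
user = table_decoder({0x90: "my_nop"})      # overrides a built-in opcode
```

Here `decode_with_chain(b"\x90", [user], builtin, [extension])` returns the user's result, and dropping `user` from the call brings back the built-in one.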

And one more obvious remark

User decoders are defined by the user, while extension decoders are “wired” into the model by its author and cannot be changed.

That is, a user developing an ISA can simply plug in their own decoder and see what changes, without modifying the original processor model.

Example

Suppose you suddenly want to swap the NOP and HLT instructions and see whether your system still works. To do this, you write a small decoder that decodes 0x90 as HLT and 0xF4 as NOP, attach it to Simics, and try to boot the system.
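The idea of such a decoder can be shown without any simulator at all. The sketch below is only an illustration: `builtin_decoder`, `swapping_user_decoder`, and `decode` are hypothetical stand-ins, and attaching a real decoder to Simics goes through the interfaces documented in the Model Builder User's Guide, which are not reproduced here.

```python
NOP, HLT = 0x90, 0xF4  # the real one-byte IA-32 opcodes for NOP and HLT


def builtin_decoder(data: bytes):
    """Stand-in for the model's own decoder (knows only these two opcodes)."""
    names = {NOP: "nop", HLT: "hlt"}
    name = names.get(data[0]) if data else None
    return (1, name) if name else None


def swapping_user_decoder(data: bytes):
    """User decoder that swaps the meaning of NOP and HLT."""
    swapped = {NOP: "hlt", HLT: "nop"}
    name = swapped.get(data[0]) if data else None
    return (1, name) if name else None  # None means "not mine, ask the next one"


def decode(data: bytes, user_decoders=()):
    """User decoders are asked before the built-in decoder."""
    for decoder in (*user_decoders, builtin_decoder):
        result = decoder(data)
        if result is not None:
            return result
    return None
```

Without the user decoder, `decode(b"\x90")` yields `nop`; with `user_decoders=[swapping_user_decoder]` the same byte decodes as `hlt`, and the built-in model stays untouched.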

In addition, this approach allows reuse of existing decoders instead of writing them from scratch, which significantly reduces the development time of the model.

Source: https://habr.com/ru/post/215687/
