
How to handle IA-32 code, or features of the Simics decoder

Hi, %username%!
Decoding IA-32 code is an extremely complex task. To see this for yourself, refer to the Intel Software Developer's Manual or to the articles previously published on Habr: "Prefixes in the IA-32 command system" and "Does your disassembler work correctly?". Let's see how Wind River Simics, a functionally accurate full-platform simulator, copes with this task.
Most libraries for decoding IA-32 instructions generate or use tables that map opcodes to instructions. An example of this approach is described in the article "Disassembler with your own hands". However, prefixes and operands are usually decoded by handwritten code: libopcodes, metasm, beaengine, distorm. This approach has a significant drawback: adding support for new instruction sets requires a lot of manual work.
There are other ways to create decoders, for example using the GDSL language. This approach is universal and allows you to create decoders for any architecture.
Simics, on the other hand, uses a completely different, no less universal approach to working with IA-32 instructions, called split decoding. Simics also has the ability to use external decoders, but more on that later.

Input and output of the decoder procedure


In a real processor, a dedicated logic block of the chip is responsible for decoding. In a simulator, it corresponds to a procedure written in some programming language. Let's consider what goes into this procedure and what results we should expect from it.

The decoder input is, obviously, an array of bytes of known length obtained during the instruction fetch phase. In addition, the decoder must know the current processor mode (see "Prefixes in the IA-32 command system").
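Why the mode must be part of the input is easy to see on opcode bytes 0x40 through 0x4F: in 16- and 32-bit code they encode one-byte INC/DEC instructions, while in 64-bit mode the same bytes are REX prefixes. The minimal Python sketch below illustrates this; the `Mode` enum and the `classify_40_4f` function are hypothetical names, not part of Simics, and the operand size is assumed to equal the mode's default (no 0x66 override).

```python
from enum import Enum

class Mode(Enum):
    """Simplified processor modes for this example (hypothetical)."""
    REAL_16 = 16
    PROTECTED_32 = 32
    LONG_64 = 64

# General-purpose registers in encoding order 0..7.
REGS16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def classify_40_4f(byte: int, mode: Mode) -> str:
    """Interpret an opcode byte in 0x40..0x4F according to the processor mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("byte is outside 0x40..0x4F")
    if mode is Mode.LONG_64:
        # 64-bit mode: a REX prefix with the bit layout 0100WRXB.
        w, r, x, b = (byte >> 3) & 1, (byte >> 2) & 1, (byte >> 1) & 1, byte & 1
        return f"REX prefix (W={w} R={r} X={x} B={b})"
    # 16/32-bit code: INC r (0x40..0x47) or DEC r (0x48..0x4F), register in bits 0..2.
    regs = REGS16 if mode is Mode.REAL_16 else REGS32
    op = "inc" if byte < 0x48 else "dec"
    return f"{op} {regs[byte & 7]}"
```

The same byte 0x48 thus decodes as `dec eax` in 32-bit code but as a REX.W prefix in 64-bit mode, so a decoder given only the bytes cannot decide between them.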

As its result, the decoder should return an error code together with the results of analyzing the byte sequence, in the form of a list of result fields. The following error code values are possible:

The figure below shows an example of an algorithm that interleaves iterations of the Fetch and Decode phases and thus makes it possible to decode variable-length instructions.
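The combined Fetch/Decode loop can be sketched in Python as follows. The three status values (`OK`, `NEED_MORE`, `INVALID`), the `Result` structure and the toy decoder, which knows only NOP, HLT and MOV r32, imm32, are assumptions made for this example; they are not the actual error codes or field lists of the Simics decoder. The only architectural fact relied on is the 15-byte limit on IA-32 instruction length.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Status(Enum):
    # Hypothetical status codes for this sketch only.
    OK = auto()         # a complete instruction was recognized
    NEED_MORE = auto()  # the bytes so far are a valid but incomplete start
    INVALID = auto()    # the bytes cannot start any known instruction

@dataclass
class Result:
    status: Status
    length: int = 0
    fields: dict = field(default_factory=dict)  # decoded "result fields"

def toy_decode(buf: bytes) -> Result:
    """Stand-in decoder: NOP (90), HLT (F4) and MOV r32, imm32 (B8+r id)."""
    if not buf:
        return Result(Status.NEED_MORE)
    op = buf[0]
    if op == 0x90:
        return Result(Status.OK, 1, {"mnemonic": "nop"})
    if op == 0xF4:
        return Result(Status.OK, 1, {"mnemonic": "hlt"})
    if 0xB8 <= op <= 0xBF:
        if len(buf) < 5:
            return Result(Status.NEED_MORE)
        imm = int.from_bytes(buf[1:5], "little")
        return Result(Status.OK, 5, {"mnemonic": "mov", "reg": op - 0xB8, "imm": imm})
    return Result(Status.INVALID)

MAX_INSN_LEN = 15  # architectural limit on IA-32 instruction length

def fetch_and_decode(memory: bytes, pc: int) -> Result:
    """Fetch one byte at a time and re-run the decoder until it stops asking for more."""
    buf = bytearray()
    while len(buf) < MAX_INSN_LEN and pc + len(buf) < len(memory):
        buf.append(memory[pc + len(buf)])  # Fetch phase: one more byte
        res = toy_decode(bytes(buf))       # Decode phase on the bytes fetched so far
        if res.status is not Status.NEED_MORE:
            return res
    # Hit the 15-byte limit or the end of memory without a complete instruction.
    return Result(Status.INVALID)
```

For example, `fetch_and_decode(bytes([0xB8, 0x78, 0x56, 0x34, 0x12]), 0)` fetches five bytes before the decoder reports `OK` with `imm == 0x12345678`. Fetching byte by byte keeps the sketch simple; a real fetch unit would read larger chunks.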



Split decoding


The main idea of this approach is to split the decoding phase into two stages:
  1. Prefix decoding. This stage covers both decoding all prefixes and checking them for conflicts.
  2. Opcode and operand decoding. This stage calls the decoder generated by SimGen.

Algorithm of the built-in Simics decoder.
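The first stage, prefix decoding with a conflict check, might look like the Python sketch below. It uses the four legacy prefix groups from the Intel SDM and, as an assumption, reports two different prefixes from the same group as a conflict, since the SDM states that at most one prefix from each group is useful; Simics or real hardware may resolve such cases differently. REX prefixes are not handled, and `decode_prefixes` is a hypothetical name.

```python
# Legacy prefix groups as described in the Intel SDM, Vol. 2.
PREFIX_GROUPS = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,                             # LOCK, REPNE, REP/REPE
    0x2E: 2, 0x36: 2, 0x3E: 2, 0x26: 2, 0x64: 2, 0x65: 2,  # segment overrides / branch hints
    0x66: 3,                                               # operand-size override
    0x67: 4,                                               # address-size override
}

def decode_prefixes(buf: bytes):
    """Stage 1 sketch: collect legacy prefixes and report same-group conflicts.

    Returns (prefixes, conflicts, offset of the first non-prefix byte).
    """
    prefixes = []
    seen_in_group = {}  # group number -> last prefix byte seen from that group
    conflicts = []
    i = 0
    while i < len(buf) and buf[i] in PREFIX_GROUPS:
        p = buf[i]
        g = PREFIX_GROUPS[p]
        # Repeating the same prefix is tolerated; a different prefix from the same group is flagged.
        if g in seen_in_group and seen_in_group[g] != p:
            conflicts.append((seen_in_group[g], p))
        seen_in_group[g] = p
        prefixes.append(p)
        i += 1
    return prefixes, conflicts, i
```

The returned offset is where the second stage would start reading the opcode.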


Naturally, the second stage depends on the result of the first, because the IA-32 instruction set has mandatory prefixes, which are actually part of the opcode (see "Prefixes in the IA-32 command system").
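This dependency can be illustrated with opcode 0F 10, where the mandatory prefix selects the instruction: no prefix gives MOVUPS, 66 gives MOVUPD, F3 gives MOVSS and F2 gives MOVSD. In the Python sketch below, the names `mandatory_prefix` and `decode_opcode` are hypothetical, and the rule that the last F2/F3 prefix takes priority over 66 is a simplifying assumption.

```python
# Opcode 0F 10: the mandatory prefix is effectively part of the opcode.
OPCODE_0F10 = {
    None: "movups",  # no mandatory prefix
    0x66: "movupd",
    0xF3: "movss",
    0xF2: "movsd",
}

def mandatory_prefix(prefixes):
    """Pick the prefix acting as part of the opcode (assumed rule: last F2/F3, else 66)."""
    for p in reversed(prefixes):
        if p in (0xF2, 0xF3):
            return p
    return 0x66 if 0x66 in prefixes else None

def decode_opcode(prefixes, opcode: bytes) -> str:
    """Stage 2 sketch: the mnemonic depends on stage 1's prefixes, not only on the opcode bytes."""
    if opcode[:2] == b"\x0f\x10":
        return OPCODE_0F10[mandatory_prefix(prefixes)]
    raise NotImplementedError("only 0F 10 is modeled in this sketch")
```

Without the prefix list from the first stage, the bytes 0F 10 alone are ambiguous among four instructions.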

Using external decoders


Simics allows you to connect additional decoders through external interfaces described in the Model Builder User's Guide that ships with the simulator. You can therefore connect many external decoders and call them one by one until one of them returns a positive result or the list of decoders is exhausted. In the latter case, you can conclude that this opcode is considered invalid in this model.
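The chain of external decoders reduces to a simple loop. Here a decoder is modeled, as an assumption, as a Python function that returns a description of the instruction or `None`; the real external decoder interface is the one documented in the Model Builder User's Guide, and this sketch does not reproduce it. `decode_with_chain` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence

# Hypothetical decoder model: returns a mnemonic, or None if the bytes are not recognized.
Decoder = Callable[[bytes], Optional[str]]

def decode_with_chain(decoders: Sequence[Decoder], code: bytes) -> Optional[str]:
    """Try each external decoder in order; the first positive result wins."""
    for decoder in decoders:
        result = decoder(code)
        if result is not None:
            return result
    return None  # list exhausted: the opcode is invalid in this model
```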



For flexibility, external decoders in Simics are divided into two types: user decoders and extension decoders.



The difference between these types is that user decoders are called first, even before the built-in decoder, which allows you to override the decoding results built into the original model. Extension decoders are called only when neither the user decoders nor the built-in decoder could recognize the instruction.
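This calling order can be sketched in Python as follows. Decoders are modeled as hypothetical functions returning a mnemonic or `None`, and `first_match` and `decode` are illustrative names rather than a Simics API.

```python
from typing import Callable, Optional, Sequence

# Hypothetical decoder model: returns a mnemonic, or None if the bytes are not recognized.
Decoder = Callable[[bytes], Optional[str]]

def first_match(decoders: Sequence[Decoder], code: bytes) -> Optional[str]:
    """Return the first positive result from a list of decoders."""
    for decoder in decoders:
        result = decoder(code)
        if result is not None:
            return result
    return None

def decode(user: Sequence[Decoder], builtin: Decoder,
           extensions: Sequence[Decoder], code: bytes) -> Optional[str]:
    """Call user decoders, then the built-in decoder, then extension decoders."""
    result = first_match(user, code)            # user decoders may override the model
    if result is None:
        result = builtin(code)                  # the model's own decoder
    if result is None:
        result = first_match(extensions, code)  # extensions only fill the gaps
    return result
```

Because extensions run last, an extension decoder that also recognizes 0x90 is never consulted for it while the built-in decoder handles that byte.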

One more obvious remark
User decoders are defined by the user, while extension decoders are “hard-wired” into the model by its developer and cannot be changed.




In other words, a user working on an ISA can simply plug in their own decoder and see what changes, without modifying the original processor model.

Example
Suppose you suddenly want to swap the NOP and HLT instructions and see whether your system still works. To do this, you just write a small decoder that decodes 0x90 as HLT and 0xF4 as NOP, attach it to Simics, and try to boot the system.
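A user decoder for this experiment could be as small as the Python sketch below. It is written against a hypothetical interface (a function returning a mnemonic or `None`), not the real Simics one, and `builtin_decoder` is a stand-in that models only these two opcodes.

```python
from typing import Optional

def builtin_decoder(code: bytes) -> Optional[str]:
    """Stand-in for the model's built-in decoder (only two opcodes modeled)."""
    return {0x90: "nop", 0xF4: "hlt"}.get(code[0]) if code else None

def swap_nop_hlt(code: bytes) -> Optional[str]:
    """Hypothetical user decoder: 0x90 becomes HLT and 0xF4 becomes NOP."""
    if code[:1] == b"\x90":
        return "hlt"
    if code[:1] == b"\xf4":
        return "nop"
    return None  # defer everything else to the built-in decoder

def decode(code: bytes, user_decoders=(swap_nop_hlt,)) -> Optional[str]:
    """User decoders run first and override the built-in result."""
    for user in user_decoders:
        result = user(code)
        if result is not None:
            return result
    return builtin_decoder(code)
```

Passing `user_decoders=()` restores the original behavior, which mirrors detaching the user decoder without touching the model.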


In addition, this approach makes it possible to reuse existing decoders instead of writing them from scratch, which significantly reduces model development time.

Source: https://habr.com/ru/post/215687/

