
Assembler for simulation tasks. Part 1: guest assembler

[Image: instructions, registers, and assembler directives are always with us. Source: golang.org/doc/asm]
On Habr and on the Internet in general there is plenty of information about using assembly languages for various architectures. Looking through the available material, I found that the areas of assembler use and related technologies most often covered are the following:
  1. Embedded systems.
  2. Decompilation, reverse engineering, computer security.
  3. High performance computing (HPC) and program optimization.
Naturally, each of these areas has its own specific requirements, and therefore its own notions of the proper tools and of "its own" assembler. Embedded developers look at the code through an editor and a debugger, reverse engineers see it in disassemblers and decompilers such as IDA and radare2 and in ICE debuggers, and HPC specialists view it through profilers such as Intel® VTune™ Amplifier, xperf, or perf.
I wanted to talk about one more area of programming in which assemblers are frequent companions, namely their role in the development of software models of computing systems, commonly known as simulators.

The objectives of this and subsequent articles are as follows.

The task of a software model of a computing device, such as a central processor, is to correctly simulate the operation of every machine instruction encountered while the computer runs.
A programmer working on a simulator faces the need to use assembler at least three times: when decoding machine instructions, when writing code that simulates their behavior, and when debugging the model.

Back and forth: decoding


The first thing to do with a machine instruction after fetching it from memory is to find out what it does and which arguments it operates on.
Decoding (in simulation) is the translation of a machine word read from the program's memory into an internal representation of the simulator that facilitates subsequent modeling. During decoding, the bit fields described in the specification are extracted from a stream of otherwise faceless zeros and ones, their values are checked against the valid ones, and the values of some fields are combined into a single whole. In general, the level of abstraction of the available information about the instruction rises: instead of an offset from the current instruction, an absolute jump target; instead of scattered pieces of a literal argument, a constant already assembled and correctly sign-extended; instead of a jumble of prefixes that redefine each other's meaning and the instruction as a whole, precise information about the address width and the operand width, and so on.
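To make this concrete, here is a minimal sketch, in C, of what a decoded-instruction structure and a decoder entry point might look like inside a simulator. The toy 6-bit-opcode encoding, the field layout, and all names are invented for illustration and belong to no real architecture:

#include <stdint.h>
#include <stdbool.h>

typedef enum { OP_ADD, OP_BRANCH, OP_INVALID } opcode_t;

typedef struct {
    opcode_t op;      /* abstract operation instead of raw bits        */
    uint8_t  rd, rs1; /* register numbers extracted from bit fields    */
    int64_t  imm;     /* literal argument, already sign-extended       */
    uint64_t target;  /* absolute branch target instead of a PC offset */
    bool     valid;   /* whether the encoding is architecturally valid */
} decoded_insn_t;

static decoded_insn_t decode(uint32_t word, uint64_t pc)
{
    decoded_insn_t d = { .valid = true };
    switch (word >> 26) {            /* hypothetical 6-bit opcode field */
    case 0x01:                       /* add rd, rs1                     */
        d.op  = OP_ADD;
        d.rd  = (word >> 21) & 0x1f;
        d.rs1 = (word >> 16) & 0x1f;
        break;
    case 0x02:                       /* PC-relative branch              */
        d.op  = OP_BRANCH;
        d.imm = (int16_t)(word & 0xffff);           /* sign-extend      */
        d.target = pc + 4 + (uint64_t)(d.imm << 2); /* absolute address */
        break;
    default:                         /* reserved encoding               */
        d.op    = OP_INVALID;
        d.valid = false;
    }
    return d;
}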
On generating decoders from descriptions
The task of (software) decoding of machine instructions is structurally similar to the parsing of strings in high-level languages (done in compiler front ends). In both cases the input is a language with a known grammar, and the output is an intermediate representation corresponding to the parsed phrase. In both cases the grammars are often complex enough that implementing a parser by hand is difficult, so code generators driven by DSL (domain-specific language) descriptions are used.
For high-level languages, parser generators (working in tandem with lexer generators) have been developed and used successfully: Lex/YACC, ANTLR, and many other tools for all kinds of target languages.
For machine languages there are also decoder generators: SimGen, ISDL, The New Jersey Machine-Code Toolkit.
What surprised me was the absence of projects applying classical parser generators to the grammars of machine languages. Everyone uses something of their own, a home-grown reinvention of the wheel, effective though it may be. Machine languages are not so simple that bringing in Yacc would be using a sledgehammer to crack a nut, and they are hardly so complex that the expressiveness of ANTLR would not suffice.
I was so intrigued by this question that I even started a discussion on the ANTLR forum, but did not receive a clear answer there.

Disassembling is the translation of information about instructions from the machine representation into a text string convenient for a person to read, process, and remember: into mnemonics. Unlike decoding results, which are processed by a soulless machine and therefore must be unambiguous, the result of disassembling should be understandable to humans, and here even a slight ambiguity is permitted. For example, the same mnemonic "PUSH" in the Intel® IA-32 architecture is used for a rather scattered group of machine instructions: some work with general-purpose registers, some with segment registers, some with memory operands, and some with literal constants. The machine code and semantics of all the PUSH variants differ considerably, while their mnemonic records look alike.
It is no secret that even the syntax used for the mnemonic representation can differ; more on that below.
In a simulator, disassembling is useful when implementing a built-in debugger, which makes it possible, even without the application's source code, to figure out whether the application works correctly and whether the model executes instructions correctly.
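Such a debugger's disassembler can then be a simple renderer from the internal representation to text. A sketch under the same assumptions as the decoder fragment above (it reuses the decoded_insn_t structure and toy ISA from there; the AT&T-style operand order is a stylistic choice):

#include <stddef.h>
#include <stdio.h>

/* decoded_insn_t, OP_ADD, OP_BRANCH come from the decoder sketch above */

static void disassemble(const decoded_insn_t *d, char *buf, size_t len)
{
    switch (d->op) {
    case OP_ADD:
        /* AT&T operand order: source first, destination last */
        snprintf(buf, len, "add %%r%u, %%r%u",
                 (unsigned)d->rs1, (unsigned)d->rd);
        break;
    case OP_BRANCH:
        /* print the absolute target the decoder already computed */
        snprintf(buf, len, "b 0x%llx", (unsigned long long)d->target);
        break;
    default:
        snprintf(buf, len, "(bad)");  /* what objdump prints, too */
    }
}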
(En)coding is the transformation inverse to decoding: from the internal representation to machine code. The ability to encode instructions of the target architecture is essential for assembler programs, whereas in a simulator it is rarely required. For a simulator, the ability to generate code matters if it "writes itself", i.e. is a binary translator. In that case, code has to be generated not for the guest (simulated, target) architecture but for the host. More on this in the second part of the article.
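For completeness, a sketch of this inverse path: packing fields back into a machine word of the same assumed toy encoding. A binary translator would do the equivalent for host instructions:

#include <stdint.h>

/* Pack the fields back into a machine word of the assumed toy format. */
static uint32_t encode_add(uint8_t rd, uint8_t rs1)
{
    return (UINT32_C(0x01) << 26)          /* opcode field */
         | ((uint32_t)(rd  & 0x1f) << 21)  /* destination  */
         | ((uint32_t)(rs1 & 0x1f) << 16); /* source       */
}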
Assembly is the translation of instructions from the mnemonic record into the intermediate representation (or directly into machine code). This is the area of responsibility of the various assembler programs: MASM, TASM, GAS, NASM, YASM, WASM, <insert your favorite ASM>...
Since mnemonics participate in the assembly process, we can expect ambiguities in the transformation. Indeed, an assembler has the right to choose, for any mnemonic, any valid machine encoding that satisfies it. Most often it chooses the most compact one. In the following listing I use the objdump disassembler to show what the GNU as assembler turns the vector instruction VADDPS with various arguments into:

 $ cat vaddps1.s    # the source: the same instruction, with the source register varied
 vaddps %ymm0, %ymm1, %ymm1
 vaddps %ymm1, %ymm1, %ymm1
 vaddps %ymm2, %ymm1, %ymm1
 vaddps %ymm3, %ymm1, %ymm1
 vaddps %ymm4, %ymm1, %ymm1
 vaddps %ymm5, %ymm1, %ymm1
 vaddps %ymm6, %ymm1, %ymm1
 vaddps %ymm7, %ymm1, %ymm1
 vaddps %ymm8, %ymm1, %ymm1
 vaddps %ymm9, %ymm1, %ymm1
 vaddps %ymm10, %ymm1, %ymm1
 vaddps %ymm11, %ymm1, %ymm1
 vaddps %ymm12, %ymm1, %ymm1
 vaddps %ymm13, %ymm1, %ymm1
 vaddps %ymm14, %ymm1, %ymm1
 vaddps %ymm15, %ymm1, %ymm1
 $ as vaddps1.s     # assemble
 $ objdump -d a.out # disassemble
 a.out:     file format pe-x86-64
 Disassembly of section .text:
 0000000000000000 <.text>:
    0:  c5 f4 58 c8             vaddps %ymm0,%ymm1,%ymm1   # two-byte VEX
    4:  c5 f4 58 c9             vaddps %ymm1,%ymm1,%ymm1
    8:  c5 f4 58 ca             vaddps %ymm2,%ymm1,%ymm1
    c:  c5 f4 58 cb             vaddps %ymm3,%ymm1,%ymm1
   10:  c5 f4 58 cc             vaddps %ymm4,%ymm1,%ymm1
   14:  c5 f4 58 cd             vaddps %ymm5,%ymm1,%ymm1
   18:  c5 f4 58 ce             vaddps %ymm6,%ymm1,%ymm1
   1c:  c5 f4 58 cf             vaddps %ymm7,%ymm1,%ymm1
   20:  c4 c1 74 58 c8          vaddps %ymm8,%ymm1,%ymm1   # three-byte VEX
   25:  c4 c1 74 58 c9          vaddps %ymm9,%ymm1,%ymm1
   2a:  c4 c1 74 58 ca          vaddps %ymm10,%ymm1,%ymm1
   2f:  c4 c1 74 58 cb          vaddps %ymm11,%ymm1,%ymm1
   34:  c4 c1 74 58 cc          vaddps %ymm12,%ymm1,%ymm1
   39:  c4 c1 74 58 cd          vaddps %ymm13,%ymm1,%ymm1
   3e:  c4 c1 74 58 ce          vaddps %ymm14,%ymm1,%ymm1
   43:  c4 c1 74 58 cf          vaddps %ymm15,%ymm1,%ymm1

In this example I varied one of the source registers through all its options, from YMM0 to YMM15. The instructions with the first eight registers, YMM0-YMM7, could be encoded with the shorter two-byte VEX prefix, and GAS chose that format, whereas instructions in the YMM8-YMM15 range can only be represented with a three-byte VEX and therefore came out a byte longer. In principle, nothing prevented using the three-byte VEX in all cases, but GAS chose not to:
 $ cat vaddps2.s
 .byte 0xc5, 0xf4, 0x58, 0xc8        # hand-encoded with a two-byte VEX, as GAS would emit it
 .byte 0xc4, 0xe1, 0x74, 0x58, 0xc8  # the same operation with a three-byte VEX
 $ as vaddps2.s
 $ objdump.exe -d a.out
 a.out:     file format pe-x86-64
 Disassembly of section .text:
 0000000000000000 <.text>:
    0:  c5 f4 58 c8             vaddps %ymm0,%ymm1,%ymm1
    4:  c4 e1 74 58 c8          vaddps %ymm0,%ymm1,%ymm1   # the same instruction, different encoding

This example shows that the same VADDPS mnemonic with the first register YMM0 can be represented by at least two machine code sequences.

Assemblers for different architectures perform many such tricks. For example, many RISC architectures have no machine representation for the operation "copy register X into register Y", so the assembler converts the mnemonic mov r1, r2 into the code corresponding to add 0, r1, r2, i.e. "add r1 to zero and put the result in r2" (a sketch of such a lowering follows below). Another example: an assembler for the IA-64 architecture (Intel® Itanium) has to pack several instructions into one 128-bit bundle. However, not all instructions combine freely: they cannot simply be placed together because of conflicts over the execution resources they consume. The assembler has to either signal an error or try to distribute the instructions across different bundles. The second approach requires the assembler to know the number and organization of the execution units inside the VLIW processor; that is more like the work a compiler performs.
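Here is how such a mov lowering might look, again under the same invented toy-encoding assumptions as before (the hard-wired zero register and the field layout are illustrative, not any real ISA):

#include <stdint.h>

enum { REG_ZERO = 0 };  /* assumed: the ISA wires register 0 to zero */

/* Generic three-register encoder for the assumed toy format. */
static uint32_t encode_rrr(uint8_t op, uint8_t rd, uint8_t rs1, uint8_t rs2)
{
    return ((uint32_t)op << 26)         | ((uint32_t)(rd  & 0x1f) << 21)
         | ((uint32_t)(rs1 & 0x1f) << 16) | ((uint32_t)(rs2 & 0x1f) << 11);
}

/* "mov dst, src" has no encoding of its own: lower it to an add with
 * the zero register, as described in the text for RISC assemblers. */
static uint32_t assemble_mov(uint8_t dst, uint8_t src)
{
    return encode_rrr(/* ADD */ 0x01, dst, src, REG_ZERO);
}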
An inconvenience already mentioned above is that there are painfully many variants of mnemonic notation (I leave aside non-architectural differences between assemblers, such as macro-processing capabilities, supported output formats like ELF and PE, and other "sugar"). As many tools, so many formats. Even within a single target architecture, almost everything in the notation can differ: the naming of opcodes, the naming of registers, the order of operands, the way addresses are written. Let alone across different architectures!
Once again I want to emphasize that this underdetermination of the mnemonic record is what distinguishes disassembling from decoding and makes the former unsuitable for intermediate-representation tasks.
On the one hand, the "correct" syntax could be considered the one used by the original vendor of the hardware the assembler targets: as written in the processor documentation, so the assembly should look.
On the other hand, for many architectures the de facto common notation is the so-called AT&T notation, the default in the GNU binutils toolkit. I use it in this article, even for examples on Intel architectures; I have simply grown more accustomed to it. GNU as can generate code for a very large number of systems, and being able to read exactly this notation is practical.

Is there a standard for assembly language?
The good news: it turns out there is one, IEEE 694-1985, IEEE Standard for Microprocessor Assembly Language. The bad news: it proved unnecessary and now has "withdrawn" status. Tying the whole variety of formats for CISC, RISC, VLIW, DSP, and goodness knows which other architectures into a single book did not work out.

And in practice? In practice you need to be able to recognize and read everything: the Intel notation, the AT&T notation, and the notation of your favorite or unloved assembler program.
The relationship between the representations of machine code and the processes that transform them is illustrated in the following figure.

Let us put aside for now the most interesting part, the development of the simulation core; we will leave it for dessert. We now turn to another important task in creating a software simulator, namely testing it.

Testing


How can a processor simulator be tested? Of course, one can try to run and debug software compiled for it right away, including a BIOS, an OS, and application programs. However, this path is unproductive: the debugging will be a nightmare. Strategically it is more correct to first make sure that individual instructions are simulated correctly, that is, to check the code with unit tests.
What operations should a unit test for a machine instruction consist of? (A sketch in C follows the list.)
  1. Set the registers and device memory to a known input state.
  2. Write the instruction's machine code into memory and point the instruction pointer (the PC, IP, RIP, or IC register) at its beginning.
  3. Tell the simulator to execute one instruction.
  4. Read the final state of the registers and memory.
  5. Compare it with the expected state. If there are differences, look for the error either in the test or in the implementation of the simulating routine.
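Here is how these five steps might look as a unit test against a hypothetical simulator API; all sim_*() functions, register names, and instruction bytes below are assumptions for illustration, not any real product's interface:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulator API: an opaque handle plus accessors. */
typedef struct sim sim_t;
enum { REG_R1 = 1, REG_R2 = 2 };
void     sim_write_reg(sim_t *s, int reg, uint64_t v);
uint64_t sim_read_reg (sim_t *s, int reg);
void     sim_write_mem(sim_t *s, uint64_t addr, const uint8_t *b, size_t n);
void     sim_set_pc   (sim_t *s, uint64_t pc);
uint64_t sim_get_pc   (sim_t *s);
void     sim_step     (sim_t *s, unsigned n);

void test_add_instruction(sim_t *sim)
{
    /* 1. Bring the registers (and memory) to a known input state. */
    sim_write_reg(sim, REG_R1, 2);
    sim_write_reg(sim, REG_R2, 3);

    /* 2. Place the instruction bytes and point the PC at them. */
    const uint8_t code[4] = { 0x01, 0x22, 0x00, 0x00 }; /* made-up "add" */
    sim_write_mem(sim, 0x1000, code, sizeof code);
    sim_set_pc(sim, 0x1000);

    /* 3. Execute exactly one instruction. */
    sim_step(sim, 1);

    /* 4-5. Read the final state and compare with the reference. */
    assert(sim_read_reg(sim, REG_R1) == 5);
    assert(sim_get_pc(sim) == 0x1004);
}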

Each such test checks one aspect of an instruction. For reasonably complete coverage, many such tests are needed: some check "normal" operation, others the situations in which exceptions must be raised (and they must verify that the exception actually occurred), and still others the "boundary cases" that seemingly should never arise in normal programs but, by Murphy's law, will in practice "fire" constantly.
While ordinary applications run under the control of some operating system, unit tests do not need one. Moreover, an OS is harmful here: it takes time to boot, it then keeps processes away from system resources, its task scheduler tries to run something of its own, and so on. In general, the OS considers itself the mistress of the system, but here it is we who command the simulator!
Assembler comes to the rescue. In most cases, a unit test needs to load initial values into registers and memory, execute the instruction under study, and compare the changed values in registers and memory with the reference ones. All of this is fully expressible in assembly language; higher-level languages are often less convenient, since they may either lack the means to express the required functionality, or may helpfully "optimize" the resulting code by rearranging and replacing machine instructions.
This assembly source file with the test must then be translated into an ELF or even a "raw" memory image and loaded into the simulator; the instruction pointer is set to the first instruction; the condition for ending the test is defined (a predetermined number of executed instructions, a "magic" instruction, reaching a debug point, an attempted device access, etc.), as is the condition for the test's success (a flag being set, a known processor state).
Of course, I am being a little sly here. The minimal working environment for a unit test is not always easy to prepare. Often virtual memory is required, which means page tables set up for it, access rights, and other such pleasures. The possibility of exceptions and interrupts requires at least a minimal configuration of the interrupt tables (in the Intel IA-32 architecture, the IDT and GDT). And testing instructions related to virtualization without the help of an OS means configuring the virtual machine structures by hand (in the case of Intel IA-32, the VMCS).
On the other hand, an environment created once can be reused across all the tests, and how to configure it can be gleaned from operating systems. Or from the processor documentation.

To be continued


That's all for today. In the next article I will show the assembler's place in building the simulator core, the part directly involved in modeling guest code.

Thanks for your attention!

Source: https://habr.com/ru/post/254419/

