A Crash Course in Assembly

This is a translation of one of Lin Clark's articles. If you have not read the others, we recommend starting from the beginning.

To understand how WebAssembly works, it helps to understand what assembly code is and how compilers produce it. In the article on JITs, I compared communicating with a computer to communicating with an alien.

Now I want to look at how that alien brain works: how the computer's brain parses and understands the messages it receives. One part of the brain is dedicated to thinking, such as addition, subtraction, or logical operations. Another part provides short-term memory, and another provides long-term memory.

They all have names.

The part that does the thinking is the arithmetic logic unit (ALU).
Short-term memory is provided by registers.
Long-term memory is random access memory (RAM).

The sentences in machine code are called instructions.

What happens when one of these instructions enters the brain? It is split into parts that mean different things. How an instruction is split depends on how the brain is wired. For example, a brain might be wired to take the first 6 bits and pipe them into the arithmetic logic unit (ALU). From the pattern of zeros and ones, the ALU works out that it needs to add two things together.

This chunk is called the opcode (operation code), because it tells the ALU what operation to perform.

After that, the brain takes the next two chunks of 3 bits each to determine which two numbers to add. These chunks are the addresses of the registers.
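The splitting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 12-bit layout (a 6-bit opcode followed by two 3-bit register addresses) follows the text, but the opcode bit patterns and the names `OPCODES`, `decode`, and `execute` are invented for the example and do not belong to any real architecture.

```python
# Assumed layout: 6-bit opcode, then two 3-bit register addresses.
OPCODE_BITS = 6
REGISTER_BITS = 3

# Hypothetical opcode table; these bit patterns are made up for the example.
OPCODES = {0b000001: "add", 0b000010: "subtract"}


def decode(instruction):
    """Split a 12-bit instruction into (operation, register_a, register_b)."""
    register_mask = (1 << REGISTER_BITS) - 1
    opcode = instruction >> (2 * REGISTER_BITS)          # first 6 bits
    register_a = (instruction >> REGISTER_BITS) & register_mask  # next 3 bits
    register_b = instruction & register_mask             # last 3 bits
    return OPCODES[opcode], register_a, register_b


def execute(instruction, registers):
    """Let a toy ALU apply the decoded operation to two register values."""
    operation, a, b = decode(instruction)
    if operation == "add":
        return registers[a] + registers[b]
    return registers[a] - registers[b]


registers = [0, 0, 5, 7, 0, 0, 0, 0]   # 3-bit addresses give 8 registers
instruction = 0b000001010011           # opcode 000001, registers 010 and 011
print(decode(instruction))             # ('add', 2, 3)
print(execute(instruction, registers)) # 12
```

Three address bits can name at most eight registers, which is why the toy register file has eight slots.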

Note the annotations on the machine code, which help us humans understand what is going on. That is what assembly is. It is also called symbolic machine code: a way for people to make sense of machine code.

There is a direct relationship between assembly and machine code. Because of this, different machine architectures have different kinds of assembly. When you encounter a new architecture, you will most likely need a new assembly dialect.
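One way to picture this direct relationship is a toy translator that maps each line of assembly to exactly one machine word and back again. The sketch below assumes an invented 12-bit architecture (a 6-bit opcode followed by two 3-bit register addresses). The mnemonics, the bit patterns, and the functions `assemble` and `disassemble` are hypothetical, chosen only for illustration.

```python
# Assumed mnemonic-to-opcode table for an invented 12-bit architecture.
MNEMONIC_TO_OPCODE = {"ADD": 0b000001, "SUB": 0b000010}
OPCODE_TO_MNEMONIC = {code: name for name, code in MNEMONIC_TO_OPCODE.items()}


def assemble(line):
    """Turn text such as 'ADD R2 R3' into a 12-bit machine word."""
    mnemonic, reg_a, reg_b = line.split()
    a, b = int(reg_a[1:]), int(reg_b[1:])
    if not (0 <= a < 8 and 0 <= b < 8):
        raise ValueError("register addresses must fit in 3 bits")
    return (MNEMONIC_TO_OPCODE[mnemonic] << 6) | (a << 3) | b


def disassemble(word):
    """Turn a 12-bit machine word back into its assembly text."""
    mnemonic = OPCODE_TO_MNEMONIC[word >> 6]
    return f"{mnemonic} R{(word >> 3) & 0b111} R{word & 0b111}"


word = assemble("ADD R2 R3")
print(format(word, "012b"))  # 000001010011
print(disassemble(word))     # ADD R2 R3
```

Because the mapping is one-to-one, a different architecture with a different opcode table or bit layout would need its own version of both functions, which is the sense in which each architecture has its own assembly dialect.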

It turns out that we have more than one target for translation. There is not one language called machine code, but many different kinds of machine code. Just as people speak different languages, machines speak different languages too.

When translating from human to alien, you might go from English, Russian, or Chinese to alien language A or alien language B. In programming terms, this is like going from C, C++, or Rust to x86 or ARM.

Suppose you want to be able to translate any of these high-level programming languages into any of these assembly languages (one for each architecture). One way to do this is to create a separate translator for every pairing of a source language with an assembly language.

That would be rather inefficient. To solve this problem, most compilers add another step to the process. The high-level programming language is transformed into something simpler that still does not work at the level of machine code. This is called an intermediate representation (IR).
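A quick count shows why the extra step pays off. The sketch below assumes one translator per pairing of language and architecture in the direct approach, and one translator per language plus one per architecture once an IR sits in the middle. The function names are made up for the example.

```python
def direct_translators(language_count, architecture_count):
    """One translator for every (language, architecture) pair."""
    return language_count * architecture_count


def translators_with_ir(language_count, architecture_count):
    """One translator into the IR per language, one out of it per architecture."""
    return language_count + architecture_count


# The three languages and two architectures named in the article.
print(direct_translators(3, 2))    # 6
print(translators_with_ir(3, 2))   # 5

# The gap widens quickly as either side grows.
print(direct_translators(10, 5))   # 50
print(translators_with_ir(10, 5))  # 15
```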

This means that the compiler can take any of these high-level programming languages and translate it into the one IR language. From there, another part of the compiler can take the IR and compile it down to something specific to the target architecture.

The compiler's frontend translates the high-level programming language into the intermediate representation. The backend translates the intermediate representation into assembly for the target architecture.
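The split between frontend and backend can be sketched with a toy pipeline. Everything here is an assumption made for illustration: the source syntax (`v0 = v1 + v2`), the `IRInstruction` tuple, and the function names are invented, and the emitted text is only loosely modeled on x86 and ARM syntax. Real compilers use far richer intermediate representations.

```python
from collections import namedtuple

# Hypothetical IR: one three-address instruction over numbered virtual registers.
IRInstruction = namedtuple("IRInstruction", "op dest lhs rhs")

OPERATORS = {"+": "add", "-": "sub"}


def frontend(source):
    """Frontend for a made-up language with lines like 'v0 = v1 + v2'."""
    dest, _equals, lhs, operator, rhs = source.split()
    return IRInstruction(OPERATORS[operator],
                         int(dest[1:]), int(lhs[1:]), int(rhs[1:]))


def backend_x86_like(ir):
    """Backend emitting two-operand, x86-flavored text."""
    regs = ["eax", "ebx", "ecx", "edx"]
    if ir.dest == ir.rhs and ir.dest != ir.lhs:
        # The copy below would overwrite rhs; this toy backend does not handle it.
        raise ValueError("destination must differ from the right operand")
    # Copy the left operand into the destination, then operate in place.
    return [f"mov {regs[ir.dest]}, {regs[ir.lhs]}",
            f"{ir.op} {regs[ir.dest]}, {regs[ir.rhs]}"]


def backend_arm_like(ir):
    """Backend emitting three-operand, ARM-flavored text."""
    return [f"{ir.op} r{ir.dest}, r{ir.lhs}, r{ir.rhs}"]


ir = frontend("v0 = v1 + v2")
print(ir)                    # IRInstruction(op='add', dest=0, lhs=1, rhs=2)
print(backend_x86_like(ir))  # ['mov eax, ebx', 'add eax, ecx']
print(backend_arm_like(ir))  # ['add r0, r1, r2']
```

Supporting another source language would mean writing one more frontend that produces `IRInstruction` values, and both backends would work with it unchanged.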

Conclusion

That is what assembly is, and this is how compilers translate high-level programming languages into assembly. In the next article, we will see how this relates to WebAssembly.

Source: https://habr.com/ru/post/348738/
