Of course, an assembler for Unix differs from an assembler for DOS or Windows. Assemblers for those operating systems use the syntax defined by Intel, which is full of ambiguities that have to be resolved with explicit size overrides (byte ptr, word ptr, dword ptr). Assemblers on Unix systems use the AT&T syntax (SysV/386), which was designed specifically to remove ambiguity in how instructions are interpreted. There are, of course, Unix assemblers with Intel syntax, such as NASM, but this article discusses the syntax of the assemblers that are standard for the platform.
In general, it is worth starting with the rules, so we will do that. An assembler with AT&T syntax accepts all Latin letters and digits, plus additional symbols such as the percent sign, comma, period, underscore, asterisk and dollar sign. Processor instructions: any sequence of allowed characters that does not begin with a special character or a digit and does not end with a colon is treated as a processor instruction:
hlt
If such a sequence begins with a percent sign, it is a processor register:
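pushl %eax
This pushes the contents of %eax onto the stack. If the sequence begins with a dollar sign ($), it is an immediate operand. For example, to push 0, 10h and the address of the variable qwerty:
pushl $0
pushl $0x10
pushl $qwerty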
If the sequence starts with a dot, it is an assembler directive:
.align 2
Finally, if the sequence ends with a colon, it is a label, used in the same way as in assemblers for DOS and Windows. One special symbol deserves mention: the dot (.). Like $ in assemblers for DOS, it stands for the current address.
In AT&T syntax, the type conversion instructions have four-letter names: C, the source size, T, the destination size:
//cbw
cbtw
//cwde
cwtl
//cwd
cwtd
//cdq
cltd
Where:
b - byte
w - word
l - double word (in integer instructions)
q - quadruple word
s - 32-bit floating-point number
l - 64-bit floating-point number (in floating-point instructions)
t - 80-bit floating-point number
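To make the suffixes concrete, here is a small sketch for 32-bit x86 that widens a signed byte step by step with the conversion instructions. The label _start, the exit system call through int $0x80 and the file name widen.s in the build line are assumptions of this example (32-bit Linux with GNU as); the rules above do not prescribe them.

```asm
# Sketch, assuming 32-bit Linux and GNU as.
        .text
        .globl  _start
_start:
        movb    $-5, %al        # b suffix: 8-bit operand, %al = 0xFB
        cbtw                    # byte -> word:  %ax  = 0xFFFB
        cwtl                    # word -> long:  %eax = 0xFFFFFFFB
        cltd                    # long -> quad:  %edx:%eax, %edx = 0xFFFFFFFF

        movl    %eax, %ebx      # use the widened value as the exit status
        movl    $1, %eax        # system call 1: exit
        int     $0x80
```

Built with `as --32 widen.s -o widen.o` and `ld -m elf_i386 widen.o -o widen`, the program should finish with exit status 251, which is the low byte of -5.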
One of the most important differences between the assemblers is the order of the destination and the source. Unlike DOS assemblers, Unix assemblers always put the source operand first and the destination last:
//mov ax,bx
movw %bx,%ax
//imul eax,ecx,16
imull $16,%ecx,%eax
Types of addressing: as mentioned earlier, register and immediate operands are distinguished by the prefixes % and $:
//xor ebx,ebx
xorl %ebx,%ebx
//mov edx,offset qwerty
movl $qwerty,%edx
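To address the contents of a variable in memory, the unmodified variable name is used, as in the Intel version:
//push dword ptr qwerty
pushl qwerty
Note the missing $: pushl $qwerty would push the address of qwerty, not its contents.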
More complex addressing modes are best shown with operations that combine a displacement, a base and a scaled index. The general form is displacement(base,index,scale):
//mov eax,base_addr[ebx+edi*4]
movl base_addr(%ebx,%edi,4),%eax
//lea eax,[eax+eax*4]
leal (%eax,%eax,4),%eax
//mov ax,word ptr [ebp-2]
movw -2(%ebp),%ax
//mov edx,dword ptr [edi*2]
movl (,%edi,2),%edx
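A sketch of scaled-index addressing inside a loop may help here. It assumes the GNU as operand form displacement(base,index,scale) on 32-bit Linux; the names table and next, the array contents and the use of the exit status to report the result are all made up for illustration.

```asm
# Sketch, assuming 32-bit Linux and GNU as: sum a four-element array.
        .data
table:  .long   10, 20, 30, 40          # four 32-bit elements

        .text
        .globl  _start
_start:
        xorl    %eax, %eax              # running sum
        xorl    %edi, %edi              # element index
next:
        addl    table(,%edi,4), %eax    # sum += table[index], 4 bytes per element
        incl    %edi
        cmpl    $4, %edi
        jl      next                    # repeat while index < 4

        movl    %eax, %ebx              # exit status = sum
        movl    $1, %eax                # system call 1: exit
        int     $0x80
```

Here the displacement is the address of table, the base register is left out, and the index %edi is scaled by the element size. The sum is 100, so the program should finish with exit status 100.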
Programming itself falls into two kinds: with the libc library and without it. Since the system itself is written in C and many functions go through this library, programs written in assembly language can use it too. A library function is called with the call instruction. There is one catch: Unix systems are not all alike, and on some of them the name of a library function must be preceded by an underscore. Consider the following program, which prints the famous phrase:
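.text
.globl main
main:
    pushl $message      # argument for puts: address of the string
    call puts
    addl $4,%esp        # the caller removes the argument from the stack
    xorl %eax,%eax      # return 0 from main
    ret
.data
message:
    .string "Hello world!"      # .string appends the terminating zero itself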
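When a library function takes several arguments, they are pushed in reverse order, so that the first argument ends up on top of the stack. The sketch below assumes 32-bit Linux with glibc and the usual C calling convention; printf, the label fmt and the value 42 are chosen only for illustration. On systems that need the underscore, the names would be _main and _printf.

```asm
# Sketch, assuming 32-bit Linux, glibc and the C calling convention.
        .text
        .globl  main
main:
        subl    $4, %esp            # padding: keeps %esp 16-byte aligned at the call
        pushl   $42                 # second argument (pushed first)
        pushl   $fmt                # first argument: the format string
        call    printf
        addl    $12, %esp           # drop both arguments and the padding
        xorl    %eax, %eax          # return 0 from main
        ret

        .data
fmt:    .string "answer = %d\n"
```

One way to build it is `gcc -m32 -no-pie file.s -o file`, which takes care of linking with libc and providing the startup code that calls main.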
The same greeting written without libc looks like this:
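.text
.globl _start
_start:
    movl $4,%eax            # system call 4: write
    xorl %ebx,%ebx
    incl %ebx               # %ebx = 1: file descriptor of stdout
    movl $message,%ecx      # address of the text
    movl $mesg_len,%edx     # its length in bytes
    int $0x80
    xorl %eax,%eax
    incl %eax               # system call 1: exit
    xorl %ebx,%ebx          # exit status 0
    int $0x80
    hlt                     # never reached
.data
message:
    .ascii "Hello World!\012"       # .ascii adds no terminating zero
mesg_len = . - message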
This example uses two system calls to produce its output and finish: write and exit. To call write, the value 4 is placed in the %eax register; this is the number under which the function is listed in the system call table. The arguments go in %ebx, %ecx and %edx: the file descriptor, the address of the text and its length.
The call itself is made by raising interrupt 0x80 (int $0x80). Exiting the program, that is, terminating it, corresponds to system call number 1.
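A system call also hands a result back. The sketch below assumes the 32-bit Linux convention, where the result is returned in %eax and a negative value signals an error. It writes to stderr (file descriptor 2) and turns a failed write into exit status 1. The labels warn, warn_len and done and the text of the message are invented for this example.

```asm
# Sketch, assuming 32-bit Linux: check the result of write.
        .text
        .globl  _start
_start:
        movl    $4, %eax            # system call 4: write
        movl    $2, %ebx            # file descriptor 2: stderr
        movl    $warn, %ecx         # address of the text
        movl    $warn_len, %edx     # its length in bytes
        int     $0x80               # result comes back in %eax

        xorl    %ebx, %ebx          # assume success: exit status 0
        testl   %eax, %eax
        jns     done                # non-negative: number of bytes written
        incl    %ebx                # negative: error, exit status 1
done:
        movl    $1, %eax            # system call 1: exit
        int     $0x80

        .data
warn:   .ascii  "something went wrong\n"
warn_len = . - warn
```

The exit status is passed in %ebx, just like the first argument of any other call made through int $0x80.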