📜 ⬆️ ⬇️

A simple x86 assembly program: Sieve of Eratosthenes

opening speech


In my profession, I do not encounter low-level programming: I do programming in scripting languages. But since the soul requires diversity, expanding the horizons of knowledge or simply understanding how the machine works at a low level, I do programming in languages ​​other than those with which I make money - this is my hobby.

And so, I would like to share the experience of creating a simple program in assembly language for processors of the x86 family, with the analysis of which you can begin your journey into conquering lowlands of abstraction levels.

Before writing it, I formulated the following requirements for the future program:

The task for my program, I chose to search for primes using the Sieve of Eratosthenes . As an assembler, I chose nasm .
')
I wrote the code with more emphasis on style and clarity than on its speed. For example, I did not reset the register using xor eax, eax , but using mov eax, 0 due to more appropriate instruction semantics. I decided that since the program is purely for educational purposes, it is possible to unleash and pursue the style of the code in assembler.

So, let's see what happened.

Where to begin?


Perhaps the most difficult thing that you encounter when moving from high-level languages ​​to assembler is the organization of memory. Fortunately, on this topic on Habré was already a good article .

It also raises the question of how data is exchanged between the internal world of the program and the external environment at such a low level. Here comes the operating system API. In DOS, as already mentioned, the interface was fairly simple. For example, the “Hello, world” program looked like this:

 SECTION .text org 0x100 mov ah, 0x9 mov dx, hello int 0x21 mov ax, 0x4c00 int 0x21 SECTION .data hello: db "Hello, world!", 0xD, 0xA, '$' 


In Windows, the Win32 API is used for these purposes, respectively, the program should use the methods of the corresponding libraries:

 %include "win32n.inc" extern MessageBoxA import MessageBoxA user32.dll extern ExitProcess import ExitProcess kernel32.dll SECTION code use32 class=code ..start: push UINT MB_OK push LPCTSTR window_title push LPCTSTR banner push HWND NULL call [MessageBoxA] push UINT NULL call [ExitProcess] SECTION data use32 class=data banner: db "Hello, world!", 0xD, 0xA, 0 window_title: db "Hello", 0 


Here, the win32n.inc file is used , where macros are defined that reduce the code for working with the Win32 API.

I decided not to use the OS API directly and chose the way to use functions from the C library. It also opened up the possibility of compiling a program in Linux (and, most likely, in other operating systems) - not a great achievement, but necessary for this program, but a pleasant achievement.

Subroutine call


The need to call subroutines entails several topics for study: organization of subroutines, passing arguments, creating a stack frame, working with local variables.

Subroutines are the label on which the code is located. The routine ends with the ret instruction. For example, this subroutine in DOS displays the string “Hello, world” to the console:

 print_hello: mov ah, 0x9 mov dx, hello int 0x21 ret 


To call her, you would use the call instruction:

 call print_hello 


For myself, I decided to pass arguments to subroutines through registers and specify in the comments which registers should have any arguments, but in high-level languages, arguments are passed through the stack. For example, the printf function from the C library is called like this:

 push hello call _printf add esp, 4 


Arguments are passed from right to left; the responsibility for clearing the stack lies with the caller.

When entering the subroutine, you need to create a new stack frame. This is done as follows:

 print_hello: push ebp ;       mov ebp, esp ;      


Accordingly, before exiting, you need to restore the previous state of the stack:

  mov esp, ebp pop ebp ret 


For local variables , the stack is also used, on which, after creating a new frame, the required number of bytes is allocated:

 print_hello: push ebp mov ebp, esp sub esp, 8 ;     8 ,    


The x86 architecture also provides special instructions with which you can more concisely implement these actions:

 print_hello: enter 8, 0 ;  ,  8     leave ;  ret 


The second parameter of the enter instruction is the nesting level of the subroutine. It is needed for linking with high-level languages ​​that support this method of organizing subroutines. In our case, this value can be left zero.

Directly the program


The project contains the following files:

Consider the code of the main file:

main.asm
 %define SUCCESS 0 %define MIN_MAX_NUMBER 3 %define MAX_MAX_NUMBER 4294967294 global _main extern _printf extern _scanf extern _malloc extern _free SECTION .text _main: enter 0, 0 ;   call input_max_number cmp edx, SUCCESS jne .custom_exit mov [max_number], eax ;     mov eax, [max_number] call allocate_flags_memory cmp edx, SUCCESS jne .custom_exit mov [primes_pointer], eax ;   mov eax, [primes_pointer] mov ebx, [max_number] call find_primes_with_eratosthenes_sieve ;  mov eax, [primes_pointer] mov ebx, [max_number] call print_primes ;     mov eax, [primes_pointer] call free_flags_memory ; .success: push str_exit_success call _printf jmp .return .custom_exit: push edx call _printf .return: mov eax, SUCCESS leave ret %include "functions.asm" SECTION .data max_number: dd 0 primes_pointer: dd 0 %include "string_constants.asm" 


It can be seen that the program is divided into 5 blocks, designed in the form of subroutines:
  1. input_max_number - using the console prompts the user for the maximum number to which the search for primes is performed; to avoid errors, the value is limited by the constants MIN_MAX_NUMBER and MAX_MAX_NUMBER
  2. allocate_flags_memory — request the OS to allocate memory for an array of number marks (simple / compound) on the heap; if successful, returns a pointer to the allocated memory through the eax register
  3. find_primes_with_eratosthenes_sieve - filter out composite numbers using the classic sieve of Eratosthenes;
  4. print_primes - output a list of prime numbers to the console;
  5. free_flags_memory - free memory allocated for flags

For functions, the following rule was agreed: the value is returned via the eax register, the edx contains the status. If successful, it contains the value of SUCCESS , that is, 0 , in case of failure, the address of the string with an error message that will be displayed to the user.

The string_constants.asm file contains the definition of string variables whose values, as the file name suggests, are not supposed to be changed. Only for the sake of these variables an exception was made to the rule “not to use global variables”. I never found a more convenient way to deliver string constants to I / O functions - I even thought of writing to the stack just before the function calls, but I decided that this idea was much worse than the idea with global variables.

string_constants.asm
 ; -,  str_max_number_label: db "Max number (>=3): ", 0 str_max_number_input_format: db "%u", 0 str_max_number_output_format: db "Using max number %u", 0xD, 0xA, 0 str_print_primes_label: db "Primes:", 0xD, 0xA, 0 str_prime: db "%u", 0x9, 0 str_cr_lf: db 0xD, 0xA, 0 ;  str_exit_success: db "Success!", 0xD, 0xA, 0 str_error_max_num_too_little: db "Max number is too little!", 0xD, 0xA, 0 str_error_max_num_too_big: db "Max number is too big!", 0xD, 0xA, 0 str_error_malloc_failed: db "Can't allocate memory!", 0xD, 0xA, 0 


The following script is used for the build:

Makefile
 ifdef SystemRoot format = win32 rm = del ext = .exe else format = elf rm = rm -f ext = endif all: primes.o gcc primes.o -o primes$(ext) $(rm) primes.o primes.o: nasm -f $(format) main.asm -o primes.o 



Subprograms (functions)


input_max_number

Subroutine code
 ;    ; : EAX -   input_max_number: ; -, ;4     enter 4, 1 ;  push str_max_number_label ;. string_constants.asm call _printf add esp, 4 ; scanf mov eax, ebp sub eax, 4 push eax push str_max_number_input_format ;. string_constants.asm call _scanf add esp, 8 mov eax, [ebp-4] ; cmp eax, MIN_MAX_NUMBER jb .number_too_little cmp eax, MAX_MAX_NUMBER ja .number_too_big jmp .success ; .number_too_little: mov edx, str_error_max_num_too_little ;. string_constants.asm jmp .return .number_too_big: mov edx, str_error_max_num_too_big ;. string_constants.asm jmp .return .success: push eax push str_max_number_output_format ;. string_constants.asm call _printf add esp, 4 pop eax mov edx, SUCCESS .return: leave ret 


The subroutine is designed to enter into the program the maximum number, which will be searched for simple ones. The key points here are calling the scanf function from the C library:

  mov eax, ebp sub eax, 4 push eax push str_max_number_input_format ;. string_constants.asm call _scanf add esp, 8 mov eax, [ebp-4] 

Thus, first, eax records the memory address 4 bytes below the stack base pointer. This is the memory allocated for the local needs of the subroutine. A pointer to this memory is passed to the scanf function as a target for writing data entered from the keyboard.

After calling the function, the entered value is moved to the eax memory.

allocate_flags_memory and free_flags_memory

Subroutine code
 ;      ; : EAX -   ; : EAX -    allocate_flags_memory: enter 8, 1 ; EAX+1  inc eax mov [ebp-4], eax push eax call _malloc add esp, 4 ; cmp eax, 0 je .fail mov [ebp-8], eax ; mov byte [eax], 0 cld mov edi, eax inc edi mov edx, [ebp-4] add edx, eax mov al, 1 .write_true: stosb cmp edi, edx jb .write_true ; mov eax, [ebp-8] jmp .success .fail: mov edx, str_error_malloc_failed ;. string_constants.asm jmp .return .success: mov edx, SUCCESS .return: leave ret ;      ; : EAX -    free_flags_memory: enter 0, 1 push eax call _free add esp, 4 leave ret 


The key points of these routines are calls to the malloc and free functions from the C library.

In case of success, malloc returns the address of the allocated memory through the eax register, in case of failure, this register contains 0 . This is the bottleneck of the program for the maximum number. 32 bits is quite enough to search for primes up to 4,294,967,295, but it will not be possible to allocate so much memory at once.

find_primes_with_eratosthenes_sieve

Subroutine code
 ;       ;: EAX -    , EBX -   find_primes_with_eratosthenes_sieve: enter 8, 1 mov [ebp-4], eax add eax, ebx inc eax mov [ebp-8], eax ;   cld mov edx, 2 ;p = 2 mov ecx, 2 ;  = 2 .strike_out_cycle: ;x = c*p mov eax, edx push edx mul ecx pop edx cmp eax, ebx jbe .strike_out_number jmp .increase_p .strike_out_number: mov edi, [ebp-4] add edi, eax mov byte [edi], 0 inc ecx ;c = c + 1 jmp .strike_out_cycle .increase_p: mov esi, [ebp-4] add esi, edx inc esi mov ecx, edx inc ecx .check_current_number: mov eax, ecx mul eax cmp eax, ebx ja .return lodsb inc ecx cmp al, 0 jne .new_p_found jmp .check_current_number .new_p_found: mov edx, ecx dec edx mov ecx, 2 jmp .strike_out_cycle .return: leave ret 



The subroutine implements the classical algorithm for crossing out compound numbers, the sieve of Eratosthenes, in assembler language x86. Pleasant because it does not use calls to external functions and does not require error handling :)

print_primes

Subroutine code
 ;    ; : EAX -    , EBX -   print_primes: enter 12, 1 mov [ebp-4], eax mov [ebp-8], ebx push str_print_primes_label call _printf add esp, 4 cld mov esi, [ebp-4] mov edx, esi add edx, [ebp-8] inc edx mov [ebp-12], edx mov ecx, 0 .print_cycle: lodsb cmp al, 0 jne .print jmp .check_finish .print: push esi push ecx push str_prime ;. string_constants.asm call _printf add esp, 4 pop ecx pop esi mov edx, [ebp-12] .check_finish: inc ecx cmp esi, edx jb .print_cycle push str_cr_lf call _printf add esp, 4 leave ret 


The subroutine outputs prime numbers to the console. The key point here is to call the printf function from the C library.

Conclusion


Well, the program meets all the stated requirements and seems to be easy to understand. I would like to hope that someone will help her to understand programming at a low level and he will receive from him the same pleasure that I received.

Also cite the full source code of the program .

I can also bring an interesting fact. Since we were taught from childhood that assembly language programs run faster, I decided to compare the speed of this program with the speed of the C ++ program I wrote once and which looked for simple numbers using Reshet Atkin . A C ++ program compiled in Visual Studio with /O2 performed a search up to the number 2 30 in about 25 seconds on my machine. The assembler program showed 15 seconds with the Sieve of Eratosthenes.

This, of course, is more like a bike than a scientific fact, since there was no serious testing, there was no clarification of the reasons, but it would seem to me like an interesting fact to complete the article.

useful links


  1. List of resources for learning assembler
  2. Memory organization
  3. Sieve of Eratosthenes
  4. Sieve Atkina
  5. Stack
  6. Stack frame

Source: https://habr.com/ru/post/165397/


All Articles