Good afternoon, Habr!
Today I will show you how to write a real web server of your own in assembly.
I will say right away that we will not use any additional libraries such as libc. We will use only what the kernel gives us.
Only the lazy have not written an article like this yet: a server in Perl, in Node.js, and I believe there have even been attempts in PHP.
The only thing missing so far is assembly language, which means the gap needs filling.
A bit of history
Once I needed to store small files (under 1 KB each), and there were a great many of them. I was worried about ext3, so I decided to keep all these files in one big file and serve them through a web server, passing the offset and length within that file as a GET parameter in hex form.
I had time to spare, so I decided to indulge myself a little and write it in assembly.
So let's get started
We will write in FASM, because I like it and I am used to Intel syntax.
So, the standard preamble for creating an ELF executable:
    format elf executable 3
    entry _start
    segment readable writeable executable
Next, some data for the headers:
    HTTP200   db "HTTP/1.1 200 OK", 0xD,0xA
    CTYPE     db "Content-Type: application/octet-stream", 0xD,0xA
    CNAME     db 'Content-Disposition: attachment; filename="BIGTABLE"',0xD,0xA,0xD,0xA
    SERVER    db 'Server: Kylie',0xD,0xA
    KeepClose db 'Connection: close',0xD,0xA,0xD,0xA
    ; sendfile
    off_set   dd 0x00
    n_bytes   dd 0x00
And also the path to the big file in which all the pictures are stored:
FILE1 db "/home/andrew/FILE.FBF",0
We define several constants for convenience:
    IPPROTO_TCP equ 0x06
    SOCK_STREAM equ 0x01
    PF_INET     equ 0x02
    AF_INET     equ 0x02
Let's include our hand-written routine that converts a hex string to a number (str to hex):
include 'str2hex.asm'
The principle of operation of this function is simple:
We type "ASCII table" into google.com.ua, print it out, and look at it...
We notice that the digits 0 - 9 correspond to the ASCII codes 30h to 39h,
and the letters A - F to the codes 41h to 46h.
The input parameter of the macro is the buffer address in esi (the string to be converted from str to hex is located at this address).
The macro simply checks the ASCII code of each character: if it is greater than 39h, we are dealing with A - F; if it is less than or equal to 39h, with 0 - 9.
Here is its full code:
    ; esi,- id :
    ; eax -
    Macro STR2HEX4 {
    local str2hex,bin2hex, out_buff, func, result, nohex
    ;
P.S. The function has no error handling, so I hope you will specify the size and offset correctly (note that the parameters are case-sensitive, i.e. A != a, B != b, etc.).
Also, the maximum size and the maximum offset are 32 bits each.
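The digit rule described above is easy to try outside the assembler. Here is a rough Python sketch of the same logic. It is not the macro itself: the names `hex_digit` and `str2hex` are made up for this illustration, and like the macro it assumes uppercase input and does no error checking.

```python
def hex_digit(code):
    """Convert one ASCII code to its 4-bit value, using the same test as the macro."""
    if code > 0x39:              # above '9', so we are in the 'A'..'F' range (41h..46h)
        return code - 0x41 + 10
    return code - 0x30           # '0'..'9' (30h..39h)


def str2hex(text):
    """Fold up to 8 hex characters into a 32-bit value, most significant digit first."""
    value = 0
    for code in text[:8]:        # iterating over bytes yields integer ASCII codes
        value = (value << 4) | hex_digit(code)
    return value


print(hex(str2hex(b"00003F48")))
```

Feeding it a lowercase letter gives a wrong value rather than an error, which is exactly the case-sensitivity caveat from the P.S.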
With that sorted out, let's move on:
Now it's finally time to create a socket.
;
The socket is created; now we bind it to the address 0.0.0.0 (commonly known as INADDR_ANY) and port 8080. I use 8080 because lighttpd is already running on port 80: if I switch to 80, bind does not return 0 in eax but fails with -EADDRINUSE, which says that the port is already in use.
    ; binding
    push 16             ; socklen_t addrlen
    push ecx            ; const struct sockaddr *my_addr
    push edi            ; int sockfd
    mov eax, 102        ; socketcall() syscall
    mov ebx, 2          ; bind() = int call 2
    mov ecx, esp        ;
A note on INADDR_ANY, by the way. If you want to use localhost or any other address, you have to write it "backwards", i.e. with the bytes in reverse order:
localhost = 127.0.0.1 = 0x0100007F
habrahabr.ru = 212.24.43.44 = 0x2C2B18D4
The same applies to port numbers:
8080 = 901Fh
25 = 1900h
Of course, nothing prevents you from specifying the IP like this:
localhost db 127,0,0,1
habrahabr.ru db 212,24,43,44
etc.
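If you do not feel like reversing bytes in your head, the values can be double-checked with a few lines of Python. This is only a helper sketch for the arithmetic above; the function names `as_le_dword` and `as_le_word` are invented here and have nothing to do with the server code.

```python
import socket
import struct


def as_le_dword(ip):
    """Value to write in the source so that memory holds the address in network order."""
    packed = socket.inet_aton(ip)             # 4 bytes in network (big-endian) order
    return struct.unpack("<I", packed)[0]     # how a little-endian x86 reads those bytes


def as_le_word(port):
    """The same idea for a 16-bit port number."""
    packed = struct.pack(">H", port)          # 2 bytes in network order
    return struct.unpack("<H", packed)[0]


print("%08X" % as_le_dword("127.0.0.1"))
print("%04X" % as_le_word(8080))
```

The point is that the bytes must lie in memory in network order, while x86 stores a dword or word constant with the least significant byte first, hence the "reversed" notation.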
Finally, we start listening on the socket so that it accepts new connections:
push 1 ;
Now an important point. Since we work with processes, after a fork the parent is expected to collect the exit code of the child. If it never does, the finished child is still listed as a process for the parent, and that is how zombies appear from child processes. If we tell the kernel that the parent ignores this signal (SIGCHLD), nobody waits for anybody and no zombies appear either:
    mov eax,48          ; signal() syscall
    mov ebx,17          ; SIGCHLD
    mov ecx,1           ; SIG_IGN
    int 0x80
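The effect of ignoring SIGCHLD can be observed without the server. Below is a small Python sketch for Linux, run from the main thread; the function name `child_is_reaped_automatically` is made up for the illustration. It forks a child that exits at once and then tries to wait for it.

```python
import os
import signal


def child_is_reaped_automatically():
    """Fork a child with SIGCHLD ignored and report whether it left no zombie behind."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)                  # child: exit immediately
        try:
            os.waitpid(pid, 0)           # blocks until the child is gone
        except ChildProcessError:
            return True                  # ECHILD: the kernel already discarded the child
        return False                     # we had to collect the exit status ourselves
    finally:
        # Put the old disposition back so the rest of the program is unaffected.
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGCHLD, restore)


print(child_is_reaped_automatically())
```

With the signal ignored there is nothing left to wait for, so waitpid fails with ECHILD instead of returning an exit status. That is the situation our parent process is in after the three instructions above.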
Prepare the arguments for accept and start accepting connections:
    push 0x00
    push 0x00           ; struct sockaddr *addr
    push edi            ; int sockfd
    sock_accept:
    mov eax, 102        ; socketcall() syscall
    mov ebx, 5          ; accept() = int call 5
    mov ecx, esp
    int 0x80
    ;
If no errors have occurred and we are in this part of the code, then a new client has connected.
Create a process to handle it:
mov eax,2 ;
Now we find out who we are: the forked child or the parent process:
    test eax,eax
    jnz fork
    ; edi - accept descriptor
    ;
Everything! The “head” of our server is ready.
Next comes the code exclusively for the child process.
Send the 200 OK status to the client:
    mov eax, 4          ; write() syscall
    mov ebx, edi        ; sockfd
    mov ecx, HTTP200    ; Send 200 Ok
    mov edx, 17         ; 17 characters in length
    int 0x80            ; Call the kernel
Then the content type. "application/octet-stream" is the most universal choice in this case:
    mov eax, 4          ; write() syscall
    mov ebx, edi        ; sockfd
    mov ecx, CTYPE      ; Content-type - 'application/octet-stream'
    mov edx, 40         ; 40 characters in length
    int 0x80            ; Call the kernel
Server Name:
    mov eax, 4          ; write() syscall
    mov ebx, edi        ; sockfd
    mov ecx, SERVER     ; our string to send
    mov edx, 15         ; 15 characters in length
    int 0x80            ; Call the kernel
Since our server does not support Keep-Alive yet, we admit it honestly:
    mov eax, 4          ; write() syscall
    mov ebx, edi        ; sockfd
    mov ecx, KeepClose  ; Connection: Close
    mov edx, 21         ; 21 characters in length
    int 0x80            ; Call the kernel
Please note that 0xD 0xA must be sent twice at the end (we did this together with Connection: close); after that the headers can be considered done.
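The lengths passed in edx are counted by hand, so it does not hurt to recount them. Here is a quick Python check, a sketch only: the dictionary simply repeats the strings from the data section, CR LF included, and the variable names are mine.

```python
HEADERS = {
    "HTTP200": b"HTTP/1.1 200 OK\r\n",
    "CTYPE": b"Content-Type: application/octet-stream\r\n",
    "SERVER": b"Server: Kylie\r\n",
    "KeepClose": b"Connection: close\r\n\r\n",
}

# Length of every header exactly as it is sent, CR LF included.
lengths = {name: len(data) for name, data in HEADERS.items()}

# What the client receives before the file data starts.
response_head = b"".join(HEADERS.values())

for name, size in lengths.items():
    print(name, size)
```

In FASM the counting can also be left to the assembler by defining a constant right after each string, for example `HTTP200_LEN = $ - HTTP200` (the constant name is an assumption of mine, it is not in the original source).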
Well, now let's actually find out which file the client wants to download. To do this, we take the GET request in the buffer and shift it 5 bytes to the left, cutting off the unneeded part ('GET /') and leaving only the bare ID, 16 bytes in size.
Oh yes, I keep saying ID, ID... So what is it? I decided to keep everything simple: the GET request carries a 32-bit value for the offset in the file, immediately followed by a 32-bit value for the size of the requested data.
I.e. if the request URL looks like this:
127.0.0.1/00003F480000FFFF
then the offset in the file is 00003F48 and the size of the requested data is 0000FFFF.
mov esi,buffer ;
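To make the request layout concrete, here is how the same cut could look in Python. It is a sketch under the assumptions stated in the text (the request starts with 'GET /' and the ID is exactly 16 hex characters); the function name `parse_request` is hypothetical.

```python
def parse_request(raw):
    """Return (offset, size) from a request such as b'GET /00003F480000FFFF HTTP/1.1'."""
    ident = raw[5:21]                # skip 'GET /' (5 bytes), keep the 16-byte ID
    offset = int(ident[:8], 16)      # first 8 hex digits: offset in the big file
    size = int(ident[8:], 16)        # last 8 hex digits: number of bytes to send
    return offset, size


offset, size = parse_request(b"GET /00003F480000FFFF HTTP/1.1\r\n")
print(hex(offset), hex(size))
```

Unlike the macro, Python's int() also accepts lowercase digits and raises an error on garbage, so this sketch is more forgiving than the real server.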
Now we need to open the big file, which will be read starting from the given offset.
We simply open it (the descriptor is returned in eax):
    ; Open BIG file
    mov eax,5
    mov ebx,FILE1
    mov ecx, 2
    int 0x80
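Now, for complete satisfaction, it is time to use the sendfile function.
As the man page says:
Because this copying is done within the kernel, sendfile() is more efficient than the combination of read(2) and write(2), which would require transferring data to and from user space.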
    ; Send [n_bytes] from BIGTABLE starting at [off_set]
    send_file:
    mov ecx,eax         ; file descriptor from previous function
    mov eax,187
    mov ebx,edi         ; socket
    mov edx,off_set     ; pointer
    mov esi,[n_bytes]
    int 0x80
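The same call is available from Python as os.sendfile, which makes it easy to play with the offset and length without a browser. The sketch below assumes Linux. The helper `send_slice`, the throwaway file and the local socket pair are all inventions of this example, not part of the server, and the file is opened read-only here, which is all sendfile needs from its source.

```python
import os
import socket
import tempfile


def send_slice(sock_fd, path, offset, n_bytes):
    """Send n_bytes of the file at path, starting at offset, to the socket sock_fd."""
    file_fd = os.open(path, os.O_RDONLY)
    try:
        sent = 0
        while sent < n_bytes:
            chunk = os.sendfile(sock_fd, file_fd, offset + sent, n_bytes - sent)
            if chunk == 0:               # end of file reached before n_bytes were sent
                break
            sent += chunk
        return sent
    finally:
        os.close(file_fd)


# Demo on a temporary 256-byte file and a socket pair instead of a real client.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)))
server_side, client_side = socket.socketpair()
sent = send_slice(server_side.fileno(), tmp.name, 0x10, 0x20)
server_side.close()                      # lets the reader below see end of stream
received = b""
while True:
    data = client_side.recv(1024)
    if not data:
        break
    received += data
client_side.close()
os.unlink(tmp.name)
print(sent, len(received))
```

One difference from the assembly version: the raw syscall takes a pointer to the offset and updates it in place, while os.sendfile takes the offset by value, so the loop advances it by hand.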
As you can see, we copied the descriptor from eax straight into ecx for sendfile, without saving it in intermediate registers or memory.
This is where, at one point, I lost sleep for quite a while: I could not understand why, after all the bytes had been sent, the file did not download completely, and a second before the download finished the browser reported "Network error" and did not save it. sendfile returned no errors, so I had to learn how to use Chrome Developer Tools.
It turns out that after the file itself has been sent, the browser sends headers that the server must receive. It does not matter what data is in them, it can go straight to /dev/null, but it is very important that the server reads it. Otherwise the browser decides that something is wrong with the file. Why exactly it works this way, I do not know for certain. It seems to me that it has to do with the possible absence of Content-Length in the headers: the file has to be received, but the browser does not know how much data to expect. I would be grateful if someone could reveal the secret)))
So, we receive the browser's headers.
We read from the descriptor in edi into the buffer at the address buffer:
    ; Read the header
    mov eax,3
    mov ebx,edi
    mov ecx,buffer
    mov edx,1024
    int 0x80
If the headers are not too large, 1024 bytes will be enough
(that is, if you do not use long cookies on this domain, etc.).
Closing the socket and exiting:
    mov eax, 6          ; close() syscall
    mov ebx, edi        ; The socket descriptor
    int 0x80            ; Call the kernel
    ; exit() from the forked child
    mov eax,1
    xor ebx,ebx
    int 0x80
In principle, the file could be opened once in the parent, kept open for a while, and reused by the forks to save time. But that is not quite the right approach.
And the most important thing!
No external libraries!
root@server:/home/andrew# ldd server
not a dynamic executable
Download link (you can check whether it works or not, for example by testing it with the ab benchmark)))
http://ubuntuone.com/3yNexPG0yewlGnjNd6219WPS
The code omits a lot of error checks; in some places the stack is not cleaned up, some values were picked by hand (for lack of proper documentation), and in general the code makes no claim to the title of "the cleanest".
The server works well on multi-core systems (tested on a Core i7 2600). On my server it outperforms lighttpd on static content by almost 4 times, although I suspect that my lighttpd is simply not configured for multi-core.
What could be added quickly:
Well, for example, CGI for any language (PHP, Perl, Python), etc. It is also possible to drop reading from the single file and work with the file system directly, as well as to add virtual hosts. In general, everything is limited only by your imagination.