
Executable file formats for FASM programs under Windows

When you write a program in assembly language under Windows (FASM is used here as the example), you have to decide which executable file format to target.
In FASM, the format of the output file is selected with the “format” directive followed by a format identifier.
This article briefly describes COM programs and EXE programs in the MZ and PE formats, with a program template for each (the traditional “Hello World!”).

The default format is a flat binary file; it can also be selected explicitly with the “format binary” directive. This is the format used to build .COM programs.
The “use16” and “use32” directives tell the assembler to generate 16-bit or 32-bit code, overriding the default setting of the selected output format. “use64” enables code generation for the long mode of x86-64 processors.
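The effect of these directives is easiest to see in the generated bytes. The following is a minimal sketch, not taken from the original article: it assembles the same two instructions in both modes, and the comments show the standard x86 encodings, where the 66h operand-size prefix appears whenever the operand size differs from the current mode.

```asm
format binary           ; flat binary output, no headers

use16                   ; 16-bit code from here on
    mov ax,1234h        ; B8 34 12
    mov eax,1           ; 66 B8 01 00 00 00 (32-bit operand needs a prefix)

use32                   ; 32-bit code from here on
    mov eax,1           ; B8 01 00 00 00
    mov ax,1234h        ; 66 B8 34 12       (16-bit operand needs a prefix)
```

The output of this file is only meant to be inspected in a hex viewer; it is not a runnable program.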
The following sections describe the individual output formats and the directives specific to each.

.COM programs

A .COM program loaded into memory is an unmodified image of the machine-code file on disk. It is one of the simplest executable file formats on the x86 architecture. A .COM program is limited to a single 64 KB segment (which also holds the PSP and the stack), and all data must be defined in the same segment as the code. When a COM program starts, all segment registers contain the segment address of the program segment prefix (PSP), a 256-byte (100h) block that DOS creates immediately before a COM or EXE program in memory. Because the program itself therefore begins at offset 100h from the start of the PSP, the source contains the ORG 100h directive, which sets the address at which the code is assumed to start. The loader uses the same address as the initial value of the instruction pointer.

An example of a simple program in .COM format:
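use16                           ; generate 16-bit code
org 100h                        ; the program is loaded at offset 100h

    mov dx,hello                ; DX = address of the string
    mov ah,9                    ; DOS function 9: print string
    int 21h                     ; call DOS
    mov ax,4C00h                ; AH = 4Ch (terminate), AL = 00h (return code)
    int 21h                     ; call DOS
;-------------------------------------------------------
hello db 'Hello, world!$'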

The “use16” directive selects 16-bit code generation. “org 100h” tells the assembler that the code is loaded at offset 100h, that is, after the first 256 bytes of the segment (addresses 0000h to 00FFh), which are reserved for service data (the PSP).
The instructions follow. The address of the string hello is placed in the DX register, and then function 9 of interrupt 21h is called to print the string on the screen.
The program terminates by calling function 4Ch of the same interrupt 21h, with the return code in AL.
The hello string ends with a '$' character, which DOS function 9 treats as the end-of-string marker.
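Because '$' only marks the end of the string and does not move the cursor, line breaks have to be written explicitly as the bytes 13 and 10 (CR, LF). The sketch below is an illustration and not part of the original article; the labels line1 and line2 are made up for the example.

```asm
use16
org 100h

    mov ah,9            ; DOS function 9: print '$'-terminated string
    mov dx,line1
    int 21h
    mov ah,9            ; reload AH, since DOS changes AL on return
    mov dx,line2
    int 21h
    mov ax,4C00h        ; terminate with return code 0
    int 21h

line1 db 'First line',13,10,'$'     ; 13,10 = carriage return, line feed
line2 db 'Second line',13,10,'$'
```

A consequence of this convention is that function 9 cannot print a '$' character itself.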

Keep in mind that COM programs are not supported by 64-bit versions of Windows. To run them there, use an emulator such as DOSBox, or switch to the PE format described below.

MZ format

MZ is the standard format of 16-bit executable files with the .EXE extension for DOS. It is named after its signature: the ASCII characters MZ (4D 5A) in the first two bytes of the file.

An example of a simple program using the MZ format:

format MZ                       ; DOS EXE (MZ EXE)
entry code_seg:start            ; entry point
stack 200h                      ; stack size
;--------------------------------------------------------------------
segment data_seg                ; data segment
hello db 'Hello, asmworld!$'    ; string to print
;--------------------------------------------------------------------
segment code_seg                ; code segment
start:                          ; entry point
    mov ax,data_seg             ; initialize DS
    mov ds,ax
    mov ah,09h
    mov dx,hello                ; DX = address of the string
    int 21h
    mov ax,4C00h
    int 21h                     ; exit to DOS


To produce this format, use the “format MZ” directive. The default code setting for this format is 16-bit.
The “segment” directive defines a new segment. It must be followed by a label whose value will be the number of the segment being defined. Optionally, “use16” or “use32” may follow to set the bitness of the code in the segment. The start of a segment is aligned to a paragraph (16 bytes). All labels defined after it have values relative to the beginning of this segment. The example above declares two segments: “data_seg” and “code_seg”.
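Since a segment label evaluates to the segment number, it can be loaded into a segment register or used as the segment part of a far address. The following sketch, modelled on the multi-segment sample shipped with FASM, illustrates both uses; the segment and label names are chosen only for this example.

```asm
format MZ
entry main_seg:start
stack 100h

segment main_seg
start:
    mov ax,text_seg             ; segment label = segment number
    mov ds,ax
    mov dx,hello                ; offset relative to the start of text_seg
    call extra_seg:write_text   ; far call into another segment
    mov ax,4C00h
    int 21h

segment text_seg
hello db 'Hello from another segment!$'

segment extra_seg
write_text:
    mov ah,9                    ; DOS function 9: print string at DS:DX
    int 21h
    retf                        ; far return, matching the far call
```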
“entry” sets the entry point for the MZ format. It is followed by the far address (segment name, colon, and offset within the segment) of the desired entry point. In our case this is the “start” label in “code_seg”.
“stack” sets up the stack for MZ. The directive may be followed by a numeric expression giving the size of a stack to be created automatically, or by the far address of the initial stack frame if you want to set up the stack manually. If no stack is defined, one is created with the default size of 4096 bytes.
“heap”, followed by a value, sets the maximum size of the additional heap in paragraphs (this is memory in addition to the stack and uninitialized data). Use “heap 0” to always allocate only the memory the program really needs.
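As an illustration of the manual variant, the sketch below reserves its own stack segment and passes the far address of its top to “stack”, and uses “heap 0” to request no additional memory. It is a hedged example rather than code from the original article; stack_seg and stack_top are hypothetical names.

```asm
format MZ
entry code_seg:start
stack stack_seg:stack_top   ; manual stack: far address of the initial SS:SP
heap 0                      ; allocate only what the program itself needs

segment code_seg
start:
    mov ax,4C00h            ; nothing to do, just exit with code 0
    int 21h

segment stack_seg
    rb 100h                 ; reserve 256 bytes for the stack
stack_top:                  ; SP starts here and grows downwards
```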

Like COM programs, MZ executables are not supported by 64-bit versions of Windows.

PE format

PE is short for Portable Executable, i.e. a portable (universal) executable file. The format dates from the Windows 3.11 era (it was introduced with Windows NT 3.1) but became truly widespread with the rise of Windows 95. Today roughly 95% of the executable files (exe, dll, and sys drivers) on computers running Windows 9x / 2K / XP / Vista / 7 are PE files.

To select the PE format, use the “format PE” directive. It can be followed by additional format settings: “console”, “GUI” or “native” selects the target subsystem (optionally followed by the subsystem version as a floating-point value), and “DLL” marks the output file as a dynamic-link library. Next may come the “at” operator with a numeric expression giving the base address of the PE image, and then optionally the “on” operator followed by a quoted string with the name of a file to use as the MZ stub of the PE program (if that file is not in MZ format, it is treated as a flat binary executable and converted to MZ format). The default code setting for this format is 32-bit.

An example of a PE format declaration with all properties:
format PE GUI 4.0 DLL at 7000000h on 'stub.exe'
The “section” directive defines a new section. It must be followed by a quoted string giving the section name, and then one or more section flags may follow. The available flags are: “code”, “data”, “readable”, “writeable”, “executable”, “shareable”, “discardable”, “notpageable”. The start of a section is aligned to a page (4096 bytes).

Example of declaring a PE section:
section '.text' code readable executable
Together with the flags, one of the special PE data identifiers can be specified to mark the whole section as special data. The possible identifiers are “export”, “import”, “resource” and “fixups”. If a section is marked as containing fixups (relocations), they are generated automatically and no further data needs to be defined in it. Resource data can also be generated automatically from a resource file by writing the “from” operator and a quoted file name after the “resource” identifier.

Below you can see examples of sections containing some special data:
section '.reloc' data discardable fixups
section '.rsrc' data readable resource from 'my.res'
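To show how the “export” and “fixups” identifiers fit into a whole file, here is a minimal DLL sketch modelled on the DLL sample shipped with FASM. It is an illustration only: the function AddNumbers and the name 'SKETCH.DLL' are hypothetical, and the proc and export macros are assumed to come from the standard win32a.inc include.

```asm
format PE GUI 4.0 DLL
entry DllEntryPoint

include 'win32a.inc'

section '.text' code readable executable

proc DllEntryPoint hinstDLL,fdwReason,lpvReserved
    mov eax,TRUE                ; report successful initialization
    ret
endp

proc AddNumbers a,b             ; hypothetical exported function (stdcall)
    mov eax,[a]
    add eax,[b]
    ret
endp

section '.edata' export data readable

  export 'SKETCH.DLL',\
         AddNumbers,'AddNumbers'

section '.reloc' fixups data readable discardable

  if $=$$
    dd 0,8                      ; dummy entry if no fixups were generated
  end if
```

The fixups section is filled in by the assembler; the conditional block only guards against the section ending up empty.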
“entry” sets the entry point for PE; it is followed by the value of the entry point.
“stack” sets the stack size for PE. It is followed by the stack reserve size and, optionally, a comma and the stack commit size. If no stack is defined, it defaults to 4096 bytes.
“heap” sets the heap size for PE. It is followed by the heap reserve size and, optionally, a comma and the heap commit size. If no heap is defined, it defaults to 65,536 bytes; if the commit size is not given, it defaults to 0.
“data” begins the definition of special PE data. The directive must be followed by one of the data identifiers (export, import, resource or fixups) or by the number of the data entry in the PE header. The data is defined on the following lines and ends with the “end data” directive. If fixups are selected, they are generated automatically and no further data needs to be defined. The same applies to resources if the “resource” identifier is followed by the “from” operator and a quoted file name: in that case the data is taken from the given resource file.
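A short sketch may help tie these directives together. It sets explicit stack and heap sizes and places the import table inside an ordinary data section with “data import” instead of giving it a section of its own. The sizes are arbitrary example values, and the second value of “stack” and “heap” is taken to be the commit size, as the FASM manual describes it.

```asm
format PE console
entry start
stack 100000h,4000h         ; reserve 1 MB, commit 16 KB (example values)
heap 100000h,1000h          ; reserve 1 MB, commit 4 KB (example values)

include 'win32a.inc'

section '.text' code readable executable
start:
    push 0                  ; exit code
    call [ExitProcess]

section '.data' data readable writeable

    data import             ; import table embedded in this section
        library kernel32,'kernel32.dll'
        import kernel32,ExitProcess,'ExitProcess'
    end data
```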

An example of a simple program using the PE format:

format PE console                       ; Windows console EXE
entry start                             ; entry point

include 'win32a.inc'

section '.text' code executable
start:
        push    hello
        call    [printf]                ; print the string (cdecl; the argument stays on the stack)
        push    0                       ; exit code for ExitProcess
        ccall   [getchar]               ; wait for a key press
        call    [ExitProcess]           ; terminate the process

section '.rdata' data readable
hello   db 'Hello World!', 0

section '.idata' data readable import
        library kernel32, 'kernel32.dll', \
                msvcrt, 'msvcrt.dll'
        import  kernel32, ExitProcess, 'ExitProcess'
        import  msvcrt, printf, 'printf', getchar, '_fgetchar'

This example uses the C runtime functions printf and getchar from msvcrt.dll to work with the console, and the WinAPI function ExitProcess to terminate the program.
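The same program can also be written with the invoke and cinvoke macros from win32a.inc, which push the arguments and, for cdecl functions, clean up the stack after the call. This is an alternative sketch and not the author's original listing.

```asm
format PE console
entry start

include 'win32a.inc'

section '.text' code readable executable
start:
    cinvoke printf,hello        ; cdecl call: the macro removes the argument afterwards
    cinvoke getchar             ; wait for a key press
    invoke  ExitProcess,0       ; stdcall call: the callee cleans the stack

section '.rdata' data readable
hello db 'Hello World!',0

section '.idata' import data readable
    library kernel32,'kernel32.dll',\
            msvcrt,'msvcrt.dll'
    import kernel32,ExitProcess,'ExitProcess'
    import msvcrt,printf,'printf',\
           getchar,'_fgetchar'
```

Using the macros keeps the calling convention visible at each call site, which makes it harder to forget the stack cleanup for C runtime functions.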

That concludes this brief (and, I hope, useful to someone) overview of the MZ and PE formats. ELF and COFF were left out of this article; please do not judge too harshly.

Source: https://habr.com/ru/post/257551/

