How to get on the road to OS development

This article has one simple goal: to help someone who has suddenly decided to write their own operating system (the kernel in particular) for the x86 architecture reach the point where they can simply add functionality without worrying about building, running, and other details only loosely related to development itself. There are already materials on this topic on the Internet, and on Habr in particular, but it is still hard to write even a “Hello world” kernel without opening dozens of tabs, which is what I will try to fix. The code examples are mostly in C, but many other languages can be adapted for OSDev as well. If you have long wanted to develop an operating system from scratch, or have only just realized that you do, read on.

Theory

To understand what the role of the OS developer is, let's imagine what happens after pressing the PC power button.

First, the BIOS starts and initializes the essential hardware, then loads into memory the first sector of the boot disk, the MBR, which contains the first part of the bootloader. Only 446 bytes of the MBR are set aside for executable code, which is very little, so few bootloaders actually fit within this limit. For this reason the bootloader is usually split into two parts, and the only thing the first part does is read the second part from disk and start it. The second part can occupy as much as the entire disk; it usually switches the processor into protected mode, loads the kernel and modules into memory, and then transfers control to the kernel.
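The 446-byte figure comes from the classic MBR sector layout: 512 bytes in total, of which 64 are taken by four 16-byte partition entries and 2 by the boot signature. A rough sketch of that layout in C follows; the type and function names (mbr_partition, mbr_sector, mbr_is_bootable) are my own, chosen for illustration, and are not part of any real API.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the classic MBR partition table (16 bytes). */
struct mbr_partition {
    uint8_t status;          /* 0x80 = bootable, 0x00 = inactive */
    uint8_t chs_first[3];    /* CHS address of the first sector */
    uint8_t type;            /* partition type, e.g. 0x0B for FAT32 */
    uint8_t chs_last[3];     /* CHS address of the last sector */
    uint8_t lba_first[4];    /* first sector (LBA), little-endian */
    uint8_t sector_count[4]; /* number of sectors, little-endian */
};

/* The whole first sector of the disk. Only byte-sized members are used,
   so the compiler inserts no padding. */
struct mbr_sector {
    uint8_t code[446];                  /* first-stage bootloader code */
    struct mbr_partition partitions[4]; /* partition table */
    uint8_t signature[2];               /* 0x55, 0xAA */
};

_Static_assert(sizeof(struct mbr_partition) == 16, "entry must be 16 bytes");
_Static_assert(sizeof(struct mbr_sector) == 512, "sector must be 512 bytes");
_Static_assert(offsetof(struct mbr_sector, partitions) == 446,
               "partition table must start right after the code area");

/* Returns nonzero if the sector ends with the 0x55 0xAA boot signature. */
static int mbr_is_bootable(const struct mbr_sector *mbr) {
    return mbr->signature[0] == 0x55 && mbr->signature[1] == 0xAA;
}
```

The static assertions are the point here: 446 + 4 * 16 + 2 = 512, so there is simply no room left for more code in the first sector.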

The kernel fully prepares the hardware and starts the first user processes, providing them and their descendants with runtime support.
Thus, a minimal kernel must be able to read programs from disk, run them, and then service their system calls. Output to the monitor and protection mechanisms are also highly desirable.

Tools

In theory, development can be done on any OS, but most of the tools are designed for UNIX-like systems, and merely building them on Windows is already painful. Moreover, since WSL does not support kernel modules, mounting a disk image there will not work, and you would have to set up communication between WSL and Windows. At that point it is easier to install a Linux virtual machine. This article gives instructions for Linux and macOS.

For the full development cycle you will need a code editor, a build system, a debugger with remote debugging support, a bootloader, a virtual machine and, preferably, a separate physical machine for testing.

Bochs and QEMU are the best choices for the virtual machine, since they start quickly and let you debug the running kernel.

Some especially masochistic (sorry, especially principled) developers write the bootloader themselves, but if the goal is to develop the operating system itself, writing a bootloader is both tedious and unnecessary, because ready-made solutions exist. Thanks to the Multiboot specification, you can write a kernel that will boot almost out of the box with a Multiboot-compliant bootloader such as GRUB.

The build is less straightforward: you need a cross-compiler for x86. Why a cross-compiler, if you are building for the same architecture? The point is that a standard compiler generates code that targets the same OS it runs on, so-called hosted code. Hosted code uses system calls and interacts with other processes, but it is tied to its operating system. Freestanding code stands on its own and needs only the hardware to run. The OS kernel is freestanding code, and the programs it runs are hosted. A cross-compiler needs only the appropriate flag to generate freestanding code.
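The C standard exposes this distinction to the code itself through the __STDC_HOSTED__ macro, which is 1 in a hosted environment and 0 in a freestanding one (with GCC, that means building with -ffreestanding). A small sketch of how a source file can find out which world it is being compiled for; the build_environment function and the BUILD_ENVIRONMENT macro are names invented for this example.

```c
/* <stddef.h> and <stdint.h> are among the few headers that a freestanding
   compiler provides on its own, so they are safe to use in a kernel. */
#include <stddef.h>
#include <stdint.h>

#if __STDC_HOSTED__
#define BUILD_ENVIRONMENT "hosted"        /* an OS and its libc are assumed */
#else
#define BUILD_ENVIRONMENT "freestanding"  /* only the hardware is assumed */
#endif

/* Returns the name of the environment this file was compiled for. */
const char *build_environment(void) {
    return BUILD_ENVIRONMENT;
}
```

A kernel source file could use the same macro with #error to refuse to build when someone accidentally compiles it as hosted code.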

Preparation

Build tools

This part is easiest to do from the command line. For convenience, you can create a separate directory for the build, which is easy to delete afterwards, and set a few environment variables:

 $ export TARGET=i686-elf
 $ export PREFIX=<installation path>

$TARGET is the system for which the resulting compiler will build. Usually it looks like i686-linux-gnu, but here the result runs without an OS, so only the executable format is given. Why i686 rather than i386? The 80386 architecture is, ahem, many years old, and much has changed since then: caches, multi-core and multiprocessor systems, built-in FPUs, “large” atomic instructions like CMPXCHG. Building for i386 would therefore cost a lot of performance and gain only a little support for old computers.

$PREFIX is where the tools will be installed. Paths like /usr/i686-elf or /usr/local/i686-elf are typical, but you can install into any directory. This directory is also called the sysroot, since it will be the root directory for the cross-compiler and utilities. Strictly speaking, it is not a full path but a prefix to a path; thus, to install into the root, $PREFIX would be an empty string rather than /. While building GCC, $PREFIX/bin must be in your PATH.

If the OS later needs to be built for a different architecture, it is enough to set different environment variables and repeat the commands.

Binutils

Download and unpack the latest version from the official FTP server. Caution: minor version numbers passed 10 long ago, so alphabetical sorting no longer works; to find the current version, sort by last-modified date. At the time of writing, the current version of Binutils is 2.29.

Binutils does not support building inside the source directory, so create a directory next to the unpacked sources and change into it. Then comes the usual build from source:

 $ ../binutils-2.29/configure --target=$TARGET --prefix="$PREFIX" --with-sysroot --disable-nls --disable-werror

More about the parameters:

--with-sysroot - use the sysroot;
--disable-nls - disable native language support. The OSDev community is not large enough for every obscure build error to reach someone who speaks the language it was printed in, so it is better to keep the messages in English;
--disable-werror - the compiler produces warnings while building Binutils, and with -Werror they stop the build.

 $ make
 $ make install

GCC

Again, download, unpack, and create a build directory. The build process differs slightly. The GMP, MPFR and MPC libraries are required. They can be installed from the standard repositories of many package managers, or you can run the contrib/download_prerequisites script from the source directory, which downloads them so that they are used during the build. Configuration is done as follows:

 $ ../gcc-7.2.0/configure --target=$TARGET --prefix="$PREFIX" --disable-nls --enable-languages=c,c++ --without-headers

--disable-nls - same as for Binutils;
--without-headers - do not assume that the target system has a standard library (this is precisely what distinguishes the compiler we need from a standard one);
--enable-languages=c,c++ - build compilers for the selected languages only. Optional, but it speeds up the build significantly.

Without a target OS, the usual make && make install will not work, since some GCC components expect a ready-made operating system, so we build and install only what is needed:

 $ make all-gcc all-target-libgcc
 $ make install-gcc install-target-libgcc

libgcc is the library that contains the internal functions of the compiler. The compiler may call them for some calculations, for example, for 64-bit division on a 32-bit platform.
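To make the libgcc point concrete, here is an innocent-looking function that needs it. 32-bit x86 has no single instruction for dividing one 64-bit value by another, so for code like this GCC typically emits a call to a libgcc helper (__udivdi3 for unsigned division), which is why the kernel has to be linked against libgcc. The function name ticks_to_ms is made up for this sketch.

```c
#include <stdint.h>

/* Converts a timer tick count to milliseconds.
   The 64-bit division below is the part that, on a 32-bit target,
   usually becomes a call into libgcc rather than inline instructions. */
uint64_t ticks_to_ms(uint64_t ticks, uint64_t ticks_per_second) {
    return ticks * 1000u / ticks_per_second;
}
```

Without libgcc on the link line, a kernel using such a function would most likely fail to link with an undefined reference to the helper.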

GRUB

On most Linux distributions you can skip this section, since the utilities for working with GRUB are already available. On other OSes you will need to download and build it. You will also need the small objconv utility:

 $ git clone https://github.com/vertis/objconv.git
 $ cd objconv
 $ g++ -o objconv -O2 src/*.cpp

While building GRUB, the newly compiled objconv and the cross-tools (i686-elf-*) must be in your PATH.

 $ cd ../grub
 $ ./autogen.sh
 $ mkdir ../build-grub
 $ cd ../build-grub
 $ ../grub-2.02/configure --disable-werror TARGET_CC=$TARGET-gcc TARGET_OBJCOPY=$TARGET-objcopy TARGET_STRIP=$TARGET-strip TARGET_NM=$TARGET-nm TARGET_RANLIB=$TARGET-ranlib --target=$TARGET
 $ make
 $ make install

GDB (for macOS)

The stock version of GDB does not understand ELF files, so to use GDB you will need to rebuild it with ELF support. Download, unpack, build:

 $ mkdir build-gdb
 $ cd build-gdb
 $ ../gdb-8.0.1/configure --target=$TARGET --prefix="$PREFIX"
 $ make
 $ make install

Disk image

The procedure differs between operating systems, so separate instructions are given for each.

For Linux (from the command line)

Create an empty file:

 $ dd if=/dev/zero of=disk.img bs=1048576 count=<size in megabytes>

Create a partition table:

 $ fdisk disk.img

 Welcome to fdisk (util-linux 2.27.1).
 Changes will remain in memory only, until you decide to write them.
 Be careful before using the write command.

 Device does not contain a recognized partition table.
 Created a new DOS disklabel with disk identifier 0x########.

 Command (m for help): n
 Partition type
    p   primary (0 primary, 0 extended, 4 free)
    e   extended (container for logical partitions)
 Select (default p): <Enter>

 Using default response p.
 Partition number (1-4, default 1): <Enter>
 First sector (2048-N, default 2048): <Enter>
 Last sector, +sectors or +size{K,M,G,T,P} (2048-N, default N): <Enter>

 Created a new partition 1 of type 'Linux' and of size N MiB.

 Command (m for help): t
 Selected partition 1
 Partition type (type L to list all types): 0B
 Changed type of partition 'Linux' to 'W95 FAT32'.

 Command (m for help): a
 Selected partition 1
 The bootable flag on partition 1 is enabled now.

 Command (m for help): w
 The partition table has been altered.
 Syncing disks.

Create a file system:

 $ losetup disk.img --show -f -o 1048576 # prints <loop device>
 $ mkfs.fat -F 32 <loop device>
 $ mount <loop device> <mount point>

Later, the image can be mounted with:

 $ mount -o loop,offset=1048576 disk.img <mount point>
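The offset of 1048576 used with losetup and mount is not arbitrary: it is the first sector of the partition chosen in fdisk (2048) multiplied by the sector size (512 bytes). If you pick a different first sector, the offset changes accordingly. A tiny sketch of the arithmetic; partition_offset_bytes and SECTOR_SIZE are illustrative names, and the 512-byte sector size is an assumption that matches the image created above.

```c
#include <stdint.h>

/* Sector size assumed by the disk image in this article. */
enum { SECTOR_SIZE = 512 };

/* Byte offset of a partition inside the image, given its first sector. */
uint64_t partition_offset_bytes(uint32_t first_sector) {
    return (uint64_t)first_sector * SECTOR_SIZE;
}
```

For the default first sector this gives 2048 * 512 = 1048576, the value passed to -o above.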

Install the bootloader (here GRUB):

 $ grub-install --modules="part_msdos biosdisk fat multiboot configfile" --root-directory="<mount point>" ./disk.img
 $ sync

For macOS (from the command line)

Create an empty file:

 $ dd if=/dev/zero of=disk.img bs=1048576 count=<size in megabytes>

Create a partition table:

 $ fdisk -e disk.img
 Would you like to initialize the partition table? [y] y
 fdisk:*1> edit 1
 Partition id ('0' to disable) [0 - FF]: [0] (? for help) 0B
 Do you wish to edit in CHS mode? [n] n
 Partition offset [0 - n]: [63] 2047
 Partition size [1 - n]: [n] <Enter>
 fdisk:*1> write
 fdisk: 1> quit

Split the image into the partition table and the single partition:

 $ dd if=disk.img of=mbr.img bs=512 count=2047
 $ dd if=disk.img of=fs.img bs=512 skip=2047

Attach the partition as a disk:

 $ hdiutil attach -nomount fs.img # prints <device>

Create a file system, FAT32 here:

 $ newfs_msdos -F 32 <device>

Detach:

 $ hdiutil detach <device>

Glue the MBR and the file system back together:

 $ cat mbr.img fs.img > disk.img

Attach the image and note the mount point (usually “/Volumes/NO NAME”):

 $ hdiutil attach disk.img

Install the bootloader:

 $ /usr/local/sbin/grub-install --modules="part_msdos biosdisk fat multiboot configfile" --root-directory="<mount point>" ./disk.img

After that, the disk image can be mounted with the standard system tools. You can then create whatever directory hierarchy you like and configure the bootloader. For GRUB, for example, you can create a grub.cfg like this in /boot/grub:

 set default=0
 set timeout=0
 menuentry "BetterThanLinux" {
     multiboot /<path to the kernel>.elf
     boot
 }

Setting up the build system

There are many popular build systems, so there will be no instructions for each one here, only the general points.

Assembly files are assembled into ELF object files (32-bit):

 $ nasm -f elf -o file.o file.s

C files are compiled with the cross-compiler and the -ffreestanding flag:

 $ i686-elf-gcc -c -ffreestanding -o file.o file.c

The same cross-compiler is used for linking, but with a bit more information:

 $ i686-elf-gcc -T linker.ld -o file.elf -ffreestanding -nostdlib file1.o file2.o -lgcc

-ffreestanding - generate freestanding code;
-nostdlib - do not link the standard library, since its implementation is hosted code and would be completely useless;
-lgcc - link the libgcc described above. It must always come after the other object files, otherwise the linker will complain about unresolved references;
-T - since the Multiboot header has to be placed somewhere specific, the usual ELF file layout will not do. The layout can be changed with a linker script, which this flag specifies. Here is a ready-made one:

 /* The bootloader will start execution at this symbol */
 ENTRY(_start)

 /* Where the sections are placed in the final image */
 SECTIONS
 {
     /* Start placing sections at 1 MiB */
     . = 1M;

     /* The Multiboot header goes first, so that it lands early in the file, then the code */
     .text BLOCK(4K) : ALIGN(4K)
     {
         *(.multiboot)
         *(.text)
     }

     /* Read-only data */
     .rodata BLOCK(4K) : ALIGN(4K)
     {
         *(.rodata)
     }

     /* Read-write data (initialized) */
     .data BLOCK(4K) : ALIGN(4K)
     {
         *(.data)
     }

     /* Read-write data (uninitialized) and the stack */
     .bss BLOCK(4K) : ALIGN(4K)
     {
         *(COMMON)
         *(.bss)
     }

     /* Other sections can be added here as needed */
 }

Minimal kernel

Getting control

Control is received from the bootloader in a small assembly file:

 FLAGS    equ 0                 ; Multiboot flags (none requested)
 MAGIC    equ 0x1BADB002        ; 'magic number' lets bootloader find the header
 CHECKSUM equ -(MAGIC + FLAGS)  ; checksum of above, to prove we are multiboot

 ; Multiboot header
 section .multiboot
 align 4
     dd MAGIC
     dd FLAGS
     dd CHECKSUM

 section .bss
 align 16
 stack_bottom:
     resb 16384                 ; 16 KiB
 stack_top:

 section .text
 global _start:function (_start.end - _start)
 _start:
     mov esp, stack_top         ; set up the stack
     push ebx                   ; pass the Multiboot structure pointer as the argument
     extern kernel_main
     call kernel_main
     cli                        ; if kernel_main returns, disable interrupts
 .hang:
     hlt                        ; halt the processor
     jmp .hang                  ; if it wakes up anyway, halt again
 .end:
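The CHECKSUM line deserves a word: the Multiboot specification requires that the magic number, the flags and the checksum add up to zero as a 32-bit unsigned sum, which is how the bootloader tells a real header from random bytes. Below is a sketch of the same rule in C, handy for sanity-checking a header from a host-side tool. The struct covers only the three mandatory fields, and the names multiboot_header, multiboot_checksum and multiboot_header_is_valid are my own rather than taken from any official header file.

```c
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u

/* The three mandatory fields at the start of a Multiboot (version 1) header. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* Same computation as in the assembly: -(MAGIC + FLAGS), modulo 2^32. */
static uint32_t multiboot_checksum(uint32_t flags) {
    return 0u - (MULTIBOOT_HEADER_MAGIC + flags);
}

/* A header is accepted when the magic matches and the three fields
   sum to zero in 32-bit unsigned arithmetic. */
static int multiboot_header_is_valid(const struct multiboot_header *h) {
    return h->magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(h->magic + h->flags + h->checksum) == 0;
}
```

With FLAGS equal to 0, as in the assembly above, the checksum works out to 0xE4524FFE.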

Proof of work

To see that the code is actually running, you can put something on the screen. A full-fledged terminal driver is a big topic, but in short: at address 0xB8000 there is a buffer of 2000 entries, each consisting of a character and its attributes. White text on a black background corresponds to the attribute byte 0x0F. Let's try to display something using a string prepared in advance:

 #include <stddef.h>

 void kernel_main(void* multiboot_structure) {
     (void) multiboot_structure; /* not used yet */

     /* Each character is followed by its attribute byte (0x0F: white on black) */
     const char str[] = "H\x0F""e\x0Fl\x0Fl\x0Fo\x0F \x0Fw\x0Fo\x0Fr\x0Fl\x0F""d\x0F";
     volatile char* buf = (volatile char*) 0xB8000; /* VGA text buffer */

     /* Copy the string until its terminating zero byte */
     for (size_t i = 0; str[i] != '\0'; i++) {
         buf[i] = str[i];
     }

     while (1); /* nothing else to do yet */
 }
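Hand-writing the attribute after every character gets old quickly. A slightly more convenient approach is to treat the buffer as 2000 16-bit cells (80 columns by 25 rows), with the character in the low byte and the attribute in the high byte. Here is a sketch of that idea; vga_cell and vga_write are hypothetical helper names, and the buffer is passed in as a parameter so the code can be tried on an ordinary array (in the kernel you would pass (volatile uint16_t*) 0xB8000).

```c
#include <stddef.h>
#include <stdint.h>

enum { VGA_WIDTH = 80, VGA_HEIGHT = 25 }; /* 80 * 25 = 2000 cells */

/* Packs a character (low byte) and an attribute (high byte) into one cell. */
static uint16_t vga_cell(char ch, uint8_t attribute) {
    return (uint16_t)((uint16_t)(unsigned char)ch | ((uint16_t)attribute << 8));
}

/* Writes text starting at (row, column) and stops at the end of the buffer.
   Returns the number of characters actually written. */
static size_t vga_write(volatile uint16_t *cells, size_t row, size_t column,
                        const char *text, uint8_t attribute) {
    size_t index = row * VGA_WIDTH + column;
    size_t written = 0;

    while (text[written] != '\0' && index < (size_t)VGA_WIDTH * VGA_HEIGHT) {
        cells[index++] = vga_cell(text[written++], attribute);
    }
    return written;
}
```

In kernel_main this would look like vga_write((volatile uint16_t*) 0xB8000, 0, 0, "Hello world", 0x0F). Scrolling, cursor handling and newlines are deliberately left out.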

Launch

Copy the kernel into the disk image at the desired path, after which any virtual machine should boot it successfully.

Debugging

For debugging in QEMU, pass the -s -S flags: -s makes QEMU listen for a debugger on TCP port 1234, and -S makes it wait for the debugger before starting the machine. Note also that debugging will not work with an accelerator, so the --enable-kvm flag has to be removed if it is used.

Bochs needs to be built with --enable-gdb-stub, and its config must include a line like gdbstub: enabled=1, port=1234, text_base=0, data_base=0, bss_base=0.
In GDB, you can connect and start the machine like this (kernel.elf is the kernel file):

 (gdb) file kernel.elf
 (gdb) target remote localhost:1234
 (gdb) c

Everything else works as usual: breakpoints, reading memory, and so on. You can also build a debugger into the kernel itself, which makes debugging on a real machine possible. You can write one yourself, but debugging bugs in the debugger will bring a lot of joy. GNU distributes nearly complete debugger stubs that require only a few functions from the kernel; there is one for i386, for example. However, it is too early for that, since the necessary functions, such as installing an interrupt handler or sending and receiving data over the serial port, do not exist yet.

Conclusion

At this point, the following still separates us from a minimal working operating system:

A primitive debugging terminal;
Global descriptor table and interrupt descriptor table;
A PCI driver;
A driver for the IDE controller (SATA disks can work in IDE mode) and at least one file system;
Paging (unless a single-tasking OS without memory protection, like DOS, is planned);
Running user code;
System calls;
A standard library;
A compiler for the new OS.

Useful resources

OSDev wiki - the essential theory;
OSDev forum - where you will (probably) get help with rarer problems;
The little book about OS development - a pretty good digest of information on the topic;
JamesM's kernel development tutorials - a series of lessons on writing a kernel. Not without flaws;
We write our own operating system - light on theory, but you can look at a finished implementation of some of the murkier things.

Source: https://habr.com/ru/post/343690/