KolibriOS in Practice. Part 2: An Exokernel for Hardware Developers

Summer has already arrived outside, and we present the next article in our series on the practical use of KolibriOS. In the first part we gave a theoretical overview of possible applications; now, as promised, we move on to something more practical: an exokernel for hardware developers.

I would like to thank art_zh for taking the time to prepare the material for this article.

Probably few of our readers remember those distant times "when computers were big" and were hardly ever used for office work or entertainment. The main applications of those expensive monsters were intensive computation, automated data acquisition, design, and the control of complex technological processes.

Decades have passed and computers have become much cheaper and faster, but the old tasks have not become any easier. The requirements for efficient processing of broadband data streams, for the speed and reliability of control circuits, and for a simple and clear user interface keep growing.

One of our developers, art_zh, an electronics engineer from England, came to the project after a long and unsuccessful search for the optimal OS for "fast" machine vision on the x86 platform. His task required processing a wideband video stream (500 SXGA frames, or about 660 million pixels, per second). From this avalanche of data it was necessary to select several characteristic zones, find the average brightness in each zone, and track how that brightness changed over 10-12 hours. A full-size image also had to be displayed on the screen periodically.

The task was complicated by the non-standard hardware involved. The frame grabber and the DMA controller were implemented on an experimental PCI Express card built around a Xilinx Virtex-5 FPGA. The unforgiving Verilog code required careful debugging of every packet transaction at the PCIe TLP level. We will discuss this topic in more detail in one of the following articles of this series; here we only note that the Windows environment proved completely unsuitable for such debugging, and in Linux the cycle of code modification, driver compilation, reboot, and analysis of printk messages took unacceptably long.

So we needed a fast, graphical, 32-bit operating system with an exokernel that gives applications transparent access to the lowest-level hardware resources: the PCI configuration space, PCIe Root Complex registers, MMIO addresses (memory-mapped I/O), and so on.

KolibriOS was attractive for its speed and minimal latency, its simple and convenient graphical user interface, and its very compact, open kernel code. The problem was its limited set of functions for working with PC hardware.

In the standard KolibriOS kernel, an application was only allowed access to a limited range of I/O ports, system messages about some hardware interrupts, and a very clumsy PCI service. To speed up the interaction between programs and computer hardware as much as possible, exokernel-style access had to be carved into the monolithic kernel.

This is how Kolibri-A appeared, the exokernel version of KolibriOS for AMD Fusion based platforms. Any exokernel carries the risk of fatal damage to data and PC hardware through incorrect access to critical system resources, and the vast majority of our users are ~~dummies,~~ inexperienced users trying new operating systems out of pure curiosity. Kolibri-A was therefore positioned from the start as a completely separate branch of KolibriOS, intended only for qualified users who understand which devices can be handled and how (and how not), and who bear full responsibility for the consequences of their actions.

First of all, applications were given direct access to all I/O ports (in the main kernel branch an application still has to "order" the required range of I/O ports through an ugly piece of MenuetOS legacy, the so-called "forties" functions).
This was not enough: today it is hard to find a device that communicates with the processor exclusively through ports. So convenient access to MMIO (memory-mapped I/O space) was added through system function 62:12. The service works like the Linux function ioremap(), but is simpler and more flexible: the application specifies the BAR register number in the configuration space of the selected device and the MMIO range it will use, and the kernel maps that range into the application's linear address space.
But that is not all. The application must have full access to the configuration space of the selected device. The PCI service of the KolibriOS kernel was implemented through the slow and awkward CF8/CFC ports. The A version adds fast access to the PCI configuration space (including the extended PCI Express configuration space), which is mapped into the application's virtual address space starting at address 0xF0000000. An example of reading the header of PCI device 1:18:2:
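mov eax, (1 shl 20) + (18 shl 15) + (2 shl 12) ; bus 1, device 18, function 2
or eax, 0xF0000000 ; base of the mapped configuration space
mov ebx, [eax] ; first header dword: vendor ID and device ID
mov [device_vendor], ebx ; x86 has no memory-to-memory mov, so go through a register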

Note that mapping the configuration space into memory is implemented only for platforms based on AMD processors (this, by the way, is what the name "Kolibri-A" hints at). No exo-service has been developed for Intel yet: art_zh works only with AMD hardware, and the other developers, unfortunately, show little interest in the exokernel.
Okay, the configuration space is open. Now we can identify devices, enable interrupts, monitor the state of the PCI link, and access any internal resources, controlling data flows and setting up direct memory access... There is one problem, though: the device knows only the physical addresses of the DMA region, while the application lives in its own linear address space. For this purpose a static DMA area is allocated. Using the special system call 62:12-DA, the application can request its linear address and work with it directly:
mcall 62, 11, 0x0500 ; init MMIO/DMA: bus = 5, device = 0, fn = 0
mov [dma], eax ; store the physical address of the DMA buffer
mcall 62, 0 + 12, 4096, 0 ; map MMIO access to BAR0 space
mov [mio], eax ; store the MMIO linear address
mcall 62, 0xDA0C, 4096, 0 ; 0xDA0C = create user DMA channel
mov [mem], eax ; store the linear address of the DMA buffer
mov eax, [dma]
or eax, DMA_FLAGS
mov ebx, [mio] ; ebx = linear base of the MMIO window
mov dword [ebx + 4], eax ; program the device-specific DMA register
; ....
mov ecx, [mem]
mov eax, [index]
mov edx, dword [ecx + eax * 4] ; load fresh data from the DMA buffer

Now you can work with the device directly from a user application. The result is a flexible and versatile tool that art_zh has been using in everyday work for 5 years to develop new hardware and to debug low-level code for a wide variety of embedded x86 devices.
Unfortunately, having developed such a tool, the author switched to other tasks and rarely publishes new versions of his code (he remains in the project as an "advanced user" and a permanent fixture on the forum). But despite the long breaks, the work slowly continues. In addition to the above, Kolibri-A today has other exo-features:
MSI interrupts can be set up through the LAPIC address space, which is accessible to the user.
Physical memory can also be read from an application. This service is very useful when debugging the kernel.
Diskless OS booting has been tested through coreboot (with help from Xvilka) and via a BIOS Extension ROM from the onboard memory of PCIe devices.
The GPU address space is open, for now only for setting up hardware cursors and for reverse engineering the resources of AMD/ATI graphics processors. But who knows, maybe one day KolibriOS will have programs written in GPU assembly...

(to be continued)

Source: https://habr.com/ru/post/259215/
