
DMA for beginners, or what you need to know

Hello everyone. Today we will talk about DMA: the technology that lets your computer play music, display an image on the screen, and write data to the hard disk, all while placing almost no load on the central processor.

DMA, what is it and what is it for?

DMA, or Direct Memory Access, is a technology that lets devices access system memory directly, bypassing the central processor. In the era of the 486 and the first Pentiums, the ISA bus reigned supreme, along with the PIO (Programmed Input/Output) method of data exchange between devices.

PIO is simple in essence: to get data from a device, the operating-system driver (or the firmware of another device) has to read that data out of the device's registers, one word at a time. Let's look at an example:
As a result, if reading one byte takes about 1 ms of processor time, then reading 1,500 bytes takes 1,500 ms. And that is just a single Ethernet packet; imagine how many packets a network card receives while you read your favorite habrahabr. Of course, in reality PIO reads can be done 2 or 4 bytes at a time, but the performance loss is still catastrophic.
As the volumes of data the processor had to handle grew, it became clear that the processor's involvement in the data-exchange path had to be minimized, or things would get difficult. And this is where direct memory access found active use.

By the way, DMA is used not only for data exchange between a device and RAM, but also between devices in the system; a DMA transfer between two regions of RAM is possible as well (although this maneuver is not applicable to the x86 architecture). Also, in its Cell processor, IBM uses DMA as the primary mechanism for exchanging data between the Synergistic Processor Elements (SPEs) and the Power Processor Element (PPE); each SPE and the PPE can likewise exchange data with RAM via DMA. This technique is in fact a great advantage of Cell, because it sidesteps cache-coherence problems in multiprocessor data processing.

And again the theory

Before we move on to practice, I would like to highlight several important aspects of programming PCI and PCI-E devices.

I mentioned device registers in passing, but how does the CPU actually access them? As many of you know, computer engineering has such a thing as I/O ports (Input/Output ports). They are designed for exchanging information between the central processor and peripheral devices, and they are accessed with special assembler instructions, in/out. During the early stages of initialization, the BIOS (or OpenFirmware on PPC-based systems) assigns a range of I/O ports to each PCI device, as well as to some others (the Super I/O controller, the PS/2 controller, the ACPI timer, etc.), and the device registers are mapped into that range.

Device registers can also be mapped into RAM (Memory-Mapped Registers), i.e. into the physical address space. This method has several advantages: the registers are accessed with ordinary memory load/store instructions rather than special in/out ones, and the physical address space is far larger than the 64 KB of I/O-port space.
Information about which range of I/O ports or RAM is assigned to a device is stored in its PCI configuration space, namely in the registers BAR0 through BAR5 [1].

So, there are two ways of employing DMA: contiguous DMA and scatter/gather DMA.

Contiguous DMA

This method is very simple and is almost obsolete by now; however, it is still used to program sound controllers (for example, the Envy24HT). Its principle is as follows: the driver allocates one large, physically contiguous buffer, writes its address and size into the device's registers, and the device then transfers data to or from that buffer on its own, raising an interrupt when it is done.
As you can see, everything is quite simple, and as soon as the ISA bus gained DMA support, this method came into wide use. For example, network card drivers kept two such DMA buffers: one for receiving data (rx) and one for transmitting (tx).

Scatter / gather DMA

With the growing speed of Ethernet adapters, contiguous DMA showed its limitations, mainly because the required memory regions are fairly large, and sometimes one simply cannot be found: in modern systems the fragmentation of physical memory is quite high. The virtual-memory mechanism, without which we can't get anywhere these days, is to blame for all of this :)

The solution suggests itself: instead of one large chunk of memory, use several smaller ones located in different regions of that memory. But then the question arises: how do we tell the device controller at which addresses to write the data, and how do we initiate the DMA transfer? The answer was to use descriptors to describe each such region of RAM.

A typical DMA buffer descriptor contains the following fields:
  1. The address (bus address) of the region of RAM intended for the DMA transfer.
  2. The size of the described region of RAM.
  3. Optional flags and other device-specific arguments.
  4. The address of the next descriptor in memory.

The descriptor layout is defined by the specific manufacturer of the device controller and may contain any other fields. The descriptors, like the DMA buffers themselves, reside in RAM.
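As a concrete illustration, a hypothetical descriptor mirroring the four fields listed above might look like this in C (real layouts are vendor-specific, and the field names here are invented):

```c
#include <stdint.h>

struct dma_descriptor {
    uint64_t buf_addr;   /* 1. bus address of the DMA buffer          */
    uint32_t buf_len;    /* 2. size of the described region of RAM    */
    uint32_t flags;      /* 3. optional flags (own, end-of-ring, ...) */
    uint64_t next_desc;  /* 4. bus address of the next descriptor     */
};
```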

The scatter/gather DMA algorithm is as follows: the driver allocates a set of buffers and a chain of descriptors, fills each descriptor with the address and size of its buffer, links the descriptors together, and writes the address of the first descriptor into a device register; the controller then walks the chain on its own, transferring data into each buffer in turn.
The order in which the device controller fills the DMA buffers is manufacturer-defined. The controller may write into the first free DMA buffer, or simply fill all the buffers in a row (in this case the DMA buffer descriptors form a singly linked ring), and so on.

Stop...

That's probably all for today; otherwise there would be too much information at once. In the next article I will show how IOKit works with all this street magic. Looking forward to your feedback and additions ;)

Links

[1] PCI Local Bus Specification

Source: https://habr.com/ru/post/37455/

