📜 ⬆️ ⬇️

I / O tracer in the Linux kernel

Few people know that in the Linux kernel there are unusual and very useful tools for debugging and testing. In this small article I want to share a description of the I / O tracer.

Introduction


The I / O tracer was developed at one time to test driver reliability ( https://landley.net/kdocs/ols/2003/ols2003-pages-524-530.pdf ), and later developed into the reverse engineering project of the nVidia video card programming protocol for the Nouveau project. However, in addition to this goal in the development process, a handy tool appeared not so much for testing as for debugging drivers.

The advantages of this method are obvious - debug printing is slow, moreover introducing undesirable delays in execution time, which can often lead to incorrect program operation (I am sure that you have come across such a thing that the debug version works, and the release does not, or vice versa). There are drawbacks too, like this:

')

How it works


Access to the iron registers is achieved by allocating windows in the address space, in which all write or read operations are translated into corresponding commands on the bus. The tracer is embedded in the __ioremap() function and will be called accordingly at any time when the driver requests a window selection.

As a result, the window request event and all subsequent calls to this window will be recorded in the internal log of the tracer, since any access will cause a Page Fault exception, and the kernel is forced to process it. Before writing, the request is decoded in such a way as to distinguish the operation (read or write), the length of the operand and its value.

Since a race condition is possible during manipulations with window page flags, event loss is possible. Therefore, the tracer forcibly turns off all the processor cores, except for one. Turning them back on is not recommended.

Usage example


First you need to enable the necessary module in the kernel configuration file using the option:
CONFIG_MMIOTRACE = y

By default, the tracer is not active. To activate it, you must run the following commands in the console:
 #  debugfs mount -t debugfs debugfs /sys/kernel/debug #   echo mmiotrace > /sys/kernel/debug/tracing/current_tracer #     cat /sys/kernel/debug/tracing/trace_pipe > mylog.txt & #    , -    ... #   echo nop > /sys/kernel/debug/tracing/current_tracer 


Practical guide


We take the High Socket UART driver on Intel's SoC as a basis. Run the following commands:
 mount -t debugfs none /sys/kernel/debug/ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer echo 1 > /sys/kernel/debug/tracing/tracing_on echo "0000:00:18.1" > "/sys/bus/pci/drivers/intel-lpss/unbind" echo "0000:00:18.1" > "/sys/bus/pci/drivers/intel-lpss/bind" 

The last two commands are needed to call again ->probe() driver.

The output to the log will look something like this:
[178.024544] in mmio_trace_init
[178.065507] mmiotrace: Disabling non-boot CPUs ...
[178.080558] Cannot set affinity for irq 169
[178.086384] smpboot: CPU 1 is now offline
[178.100507] mmiotrace: CPU1 is down.
[178.110047] smpboot: CPU 2 is now offline
[178.119028] mmiotrace: CPU2 is down.
[178.127165] smpboot: CPU 3 is now offline
[178.134891] mmiotrace: CPU3 is down.
[178.138912] mmiotrace: enabled.
[178.147690] in mmio_trace_start
[178.151974] mmiotrace: Unmapping ffffc900007ae000.
[178.160092] mmiotrace: Unmapping ffffc900007ac000.
[178.170893] mmiotrace: Unmapping ffffc900007aa000.
[178.178683] mmiotrace: ioremap _ * (0x9242e200, 0x100) = ffffc900007dc200
[178.186279] mmiotrace: ioremap _ * (0x9242e800, 0x800) = ffffc900007de800
[178.195682] idma64 idma64.9: Found Intel integrated DMA 64-bit
[178.207663] mmiotrace: ioremap _ * (0x9242e000, 0x200) = ffffc900007fa000
[178.217141] dw-apb-uart.9: ttyS1 at MMIO 0x9242e000 (irq = 5, base_baud = 115200) is a 16550A

Redirect output to file:
 cat /sys/kernel/debug/tracing/trace_pipe >/tmp/t & 

The file will have a non-zero length, let's take a look at its contents:
VERSION 20070824
UNMAP 178.157353 -1 0x0 0
UNMAP 178.165484 -1 0x0 0
UNMAP 178.176279 -1 0x0 0
MAP 178.186036 1 0x9242e200 0xffffc900007dc200 0x100 0x0 0
R 4 178.186064 1 0x9242e2fc 0x11 0x0 0
W 4 178.186067 1 0x9242e204 0x7 0x0 0
W 4 178.186069 1 0x9242e240 0x9242e000 0x0 0
W 4 178.186071 1 0x9242e244 0x0 0x0 0
R 4 178.186125 1 0x9242e200 0x3d090240 0x0 0
R 4 178.186160 1 0x9242e210 0x800 0x0 0
R 4 178.186164 1 0x9242e214 0x800 0x0 0
MAP 178.195481 2 0x9242e800 0xffffc900007de800 0x800 0x0 0
W 4 178.195510 2 0x9242eb98 0x0 0x0 0
W 4 178.195512 2 0x9242eb10 0x300 0x0 0
W 4 178.195515 2 0x9242eb18 0x300 0x0 0
W 4 178.195517 2 0x9242eb20 0x300 0x0 0
W 4 178.195519 2 0x9242eb28 0x300 0x0 0
W 4 178.195521 2 0x9242eb30 0x300 0x0 0
R 4 178.195525 2 0x9242eb98 0x0 0x0 0
MAP 178.216881 3 0x9242e000 0xffffc900007fa000 0x200 0x0 0
R 4 178.216924 1 0x9242e200 0x3d090240 0x0 0
W 4 178.216926 1 0x9242e200 0x3d090241 0x0 0
R 4 178.216931 1 0x9242e200 0x3d090241 0x0 0
W 4 178.216933 1 0x9242e200 0xbd090241 0x0 0
R 4 178.216940 3 0x9242e0f8 0x3331342a 0x0 0
R 4 178.216943 3 0x9242e0f4 0x43f32 0x0 0
W 4 178.229863 3 0x9242e010 0x0 0x0 0

The format is described in the documentation, and in principle is intuitive. We are interested in the MAP [ping], R [ead], W [rite] strings, which respectively determine the addresses of the I / O windows, the data of the read from the register and the record. Since it is not very convenient to track only numbers, I wrote a simple parser that converts addresses into register names.
R 178.216940 UCV 0x3331342a
R 178.216943 CPR 0x00043f32
W 178.229863 MCR 0x00

It is much more convenient to read this way (registers that are not directly related to UART, such as DMA and extensions, are removed from the output, in fact they are present as addresses for further processing)!

For verification, run the simple command stty -F /dev/ttyS1 921600 .
W 335.706380 FCR 0x01
W 335.706391 FCR 0x07
W 335.706393 FCR 0x00
R 335.706397 LSR 0x60
R 335.706401 RX 0x00
R 335.706405 IIR 0x01
R 335.706408 MSR 0x00
R 335.706413 LSR 0x60
R 335.706417 LSR 0x60
W 335.706419 IER 0x02
R 335.706422 LCR 0x00
R 335.706426 IIR 0x02
W 335.706428 IER 0x00
W 335.706431 IER 0x02
R 335.706435 LCR 0x00
R 335.706440 IIR 0x02
W 335.706442 IER 0x00
W 335.706535 LCR 0x03
W 335.706541 MCR 0x08
W 335.706543 IER 0x02
R 335.706547 LSR 0x60
R 335.706550 IIR 0x02
W 335.706552 IER 0x00
R 335.706561 IIR 0x01
R 335.706565 LSR 0x60
R 335.706568 RX 0x00
R 335.706572 IIR 0x01
R 335.706575 MSR 0x00
W 335.706754 IER 0x05
W 335.706757 LCR 0x93
W 335.706760 DLL 0x0c
W 335.706762 DLM 0x00
W 335.706764 LCR 0x13
W 335.706766 FCR 0x01
W 335.706768 FCR 0x81
W 335.706771 MCR 0x08
W 335.706778 MCR 0x0b
R 335.706850 LSR 0x60
W 335.706918 IER 0x05
W 335.706920 LCR 0x93
W 335.706923 DLL 0x01
W 335.706925 DLM 0x00
W 335.706927 LCR 0x13
W 335.706929 FCR 0x01
W 335.706931 FCR 0x81
W 335.706934 MCR 0x0b
R 335.707047 LSR 0x60
W 335.707071 IER 0x00
R 335.707078 LSR 0x60
W 335.707080 MCR 0x08
W 335.707083 IER 0x00
W 335.707189 MCR 0x00
R 335.707196 LCR 0x13
W 335.707198 LCR 0x13
W 335.707200 FCR 0x01
W 335.707203 FCR 0x07
W 335.707205 FCR 0x00
R 335.707221 RX 0x00
R 335.707268 IIR 0x01

So we clearly saw how the UART 8250 is programmed in the Linux kernel. Accordingly, it can be used in debugging the driver, when it is not entirely clear which way the algorithm is going, since it depends on the read data from the device.

If it fails


Everything is described in more detail in the documentation to which it makes sense to refer. Also note that due to its implementation, the tracer may not work. So, in particular, I discovered regression in new kernels.

Source: https://habr.com/ru/post/306856/


All Articles