
Algorithm: How to file a bug against the Linux kernel

My experience developing and debugging Parallels Virtuozzo Containers let me put together a wish list for how users should describe a problem. Following it can significantly reduce the time needed to diagnose and fix a problem in the Linux kernel. Some of these recommendations may seem obvious, yet many members of the open-source community still neglect them. The algorithm follows.

  1. Description of the problem.
    The most important thing is to describe the problem accurately, along with the actions that lead to it. Any detail, even the smallest, can matter. This also covers the question "How often does the bug reproduce?".
  2. Kernel version
    It is best to attach the output of this command to the bug report:
    # uname -a
  3. Where did you get the kernel?
    The Linux kernel is distributed both in binary form and as source code. The bug report should state the exact kernel version and who built it. If you built it yourself, attach the config; you may also need to provide the vmlinux or the module. When analyzing a crash, developers often match the disassembled code against the C source to find the exact place where the bug occurred.
  4. Kernel logs.
    For individual users this is perhaps the most difficult topic. The most reliable way to collect logs is a serial port and a second machine. Not everyone has that option at home, and serial ports have become rare.
    The next step was netconsole. With it, you only need to configure rsyslogd on one machine to collect logs over the network and register its address on all servers. This method may fail, especially if the bug is related to the network.
    Netconsole simplified log collection from servers, but the problem remained for ordinary users, so kdump was invented. At the moment it is the most powerful and, at the same time, the most unreliable mechanism. When the system crashes, a memory image is saved to disk. Later you can open it with the crash utility and examine the state of all processes, the kernel logs, the memory contents at a given address, and so on. It is somewhat similar to the core file that applications leave behind when they crash.
  5. The name and version of the distribution and of the applications directly related to the problem.
    It only seems that the kernel sits below user applications and that their operation depends on the kernel's behavior. Often it is the other way around: the kernel's behavior depends on the user applications.
  6. Network problems.
    Some packets do not reach their recipients or, conversely, something extra leaks through the filters. In this case it is useful to attach a tcpdump capture from both sides.
  7. Sysrq
    This is a magic key combination. Hold Alt and Print Screen (SysRq) and press the letter keys in sequence. Pressing H prints a short help message to the kernel log.
    The most common use case for sysrq is a freeze of any kind. First press L, which prints what each processor is doing at the moment. T prints the state of each process. These keys can also kill all processes except init, flush data from memory to disk, reboot the machine, make the machine panic, show the state of timers and memory, and so on.
  8. Boot options
    You can read about kernel parameters in Documentation/kernel-parameters.txt. When chasing a bug, I would recommend enabling the following:
    sysrq_always_enabled - the sysrq magic keys are always enabled
    debug - makes kernel logging more verbose
    earlyprintk - prints the kernel log at an early stage of boot. Enable this option if the machine prints nothing during boot and hangs.
    I would also recommend removing the quiet option, which some distributions enable by default. With it, the kernel does not print part of its log.
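
Most of the items above can be collected in one pass. The following is only a sketch: the function name collect_report and the output file names are made up for this example, and the location /boot/config-<version> for the kernel config is an assumption that holds on many distributions but not all. Files that are missing or unreadable are simply skipped.

```shell
#!/bin/sh
# Sketch: gather the basic facts a kernel bug report needs into one directory.
# collect_report and all output file names are hypothetical, chosen for this example.
collect_report() {
    out=${1:-kernel-bug-report}
    mkdir -p "$out" || return 1

    # Item 2: kernel version.
    uname -a > "$out/uname.txt"

    # Item 8: boot options the running kernel was started with.
    if [ -r /proc/cmdline ]; then
        cat /proc/cmdline > "$out/cmdline.txt"
    fi

    # Item 5: distribution name and version.
    if [ -r /etc/os-release ]; then
        cat /etc/os-release > "$out/os-release.txt"
    fi

    # Item 3: kernel config (assumed location, not present on every system).
    cfg="/boot/config-$(uname -r)"
    if [ -r "$cfg" ]; then
        cp "$cfg" "$out/kernel-config.txt"
    fi

    # Item 4: current kernel log; may require root on some systems.
    dmesg > "$out/dmesg.txt" 2>/dev/null || rm -f "$out/dmesg.txt"
}

# Usage: collect_report ./kernel-bug-report
```

The resulting directory can be archived and attached to the bug report. It does not replace the problem description from item 1.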

Additional debugging tools.


  1. Debug kernel
    You have probably noticed that distributions often ship two kernels of the same version, one of which has debug in its name. This kernel is a bit slower and contains a number of additional checks. For example, it fills freed memory with a special pattern, checks that the kernel does not sleep with interrupts disabled, tracks the order in which locks are taken, and so on.
    If your bug reproduces reliably, it is useful to try to reproduce it on the debug kernel and provide the logs from it.
  2. Tracer
    This item is mostly for very advanced users who want to investigate the problem themselves.
    The interface to this mechanism is implemented entirely in debugfs:
    mount -t debugfs debugfs /debug
    cd /debug/tracing

    There are two types of tracers. The first is events, which developers report explicitly in the code. The second is ftrace, a tracer that shows the order in which kernel functions were called. All tracer events can be filtered and turned on and off individually. While events are disabled, they have practically no effect on kernel performance.
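
As an illustration of the tracer interface, here is a minimal sketch that records a few seconds of the function tracer. It assumes debugfs is mounted at /debug as shown above and that it runs as root. The control files current_tracer, tracing_on and trace belong to ftrace; the function name capture_function_trace and the output file name are invented for this example.

```shell
#!/bin/sh
# Sketch: capture kernel function calls for a short interval with ftrace.
# capture_function_trace is a hypothetical helper, not part of any tool.
# Arguments: tracing directory, duration in seconds, output file.
capture_function_trace() {
    tracing=${1:-/debug/tracing}
    seconds=${2:-5}
    out=${3:-trace.txt}

    if [ ! -w "$tracing/current_tracer" ]; then
        echo "cannot write $tracing/current_tracer (not root, or debugfs not mounted?)" >&2
        return 1
    fi

    echo function > "$tracing/current_tracer"   # select the function tracer
    echo 1 > "$tracing/tracing_on"              # start recording
    sleep "$seconds"
    echo 0 > "$tracing/tracing_on"              # stop recording
    cat "$tracing/trace" > "$out"               # save what was recorded
    echo nop > "$tracing/current_tracer"        # switch the tracer back off
}

# Usage (as root): capture_function_trace /debug/tracing 5 trace.txt
```

The function tracer produces a lot of output, so it is worth keeping the interval short and reproducing the problem while the trace is running.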

And in conclusion, some useful tips.


  1. The application does not work correctly or freezes. Try the strace utility; it shows all system calls with their arguments and return codes.
  2. The process is in state D, "uninterruptible sleep". Most likely the process is stuck in the kernel. Try looking at /proc/[PID]/stack and /proc/[PID]/status.
  3. If you have a panic, read the log from the beginning. The error messages at the end of the log are often a consequence of earlier ones. The bug should be filed for the first error.
  4. For the kernel to save memory dumps to disk, boot the kernel with the crashkernel=128M option, install kexec-tools on the machine, and start the kdump service. Dumps will be saved in /var/crash.
  5. Kernel logs can be extracted from the memory dump with the makedumpfile utility:
    makedumpfile --dump-dmesg vmcore log.txt
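
The tip about processes in state D can be scripted. The sketch below finds every process whose state starts with D and prints its kernel stack. The helper names filter_d_state and dump_d_state_stacks are invented for this example, and it assumes a ps that accepts the -eo pid=,stat= format (as procps does). Reading /proc/[PID]/stack normally requires root.

```shell
#!/bin/sh
# Sketch: show the kernel stack of every process in uninterruptible sleep.
# Both function names are hypothetical helpers for this example.

# Read "PID STAT" pairs on stdin and print the PIDs whose state starts with D.
filter_d_state() {
    awk '$2 ~ /^D/ { print $1 }'
}

# List all processes, keep those in state D, and dump their kernel stacks.
dump_d_state_stacks() {
    ps -eo pid=,stat= | filter_d_state | while read -r pid; do
        echo "== PID $pid =="
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable; try as root)"
    done
}

# Usage (as root): dump_d_state_stacks > d-state-stacks.txt
```

A process can enter state D briefly during normal I/O, so a process is only suspicious if it shows up in this output repeatedly.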

Links


Documentation/kdump/kdump.txt
Documentation/kernel-parameters.txt
Documentation/networking/netconsole.txt
Documentation/trace/ftrace.txt
tcpdump on Wikipedia
strace on Wikipedia


Source: https://habr.com/ru/post/125017/

