I spent more than six months investigating the watchdog timer on IBM / Lenovo servers running Linux under IBM hardware and software support. The beginning of this detective story was described in my article "SLES 12, the watchdog timer and IBM / Lenovo servers". Now the situation seems to have been clarified, and constructive recommendations can be given to the happy owners of IBM / Lenovo xSeries hardware.
First, let us repeat the brief primer from the previous article. Server and industrial platforms contain a special circuit, the watchdog timer. Once activated, it starts counting down a specified interval (for example, one minute). If nothing accesses it during that time, a hardware reset is performed when the interval expires. If it is accessed, the countdown starts over. This makes it possible to recover the computer automatically when the operating system, or some important software service, hangs. Such a mechanism is mandatory in high availability (HA) clusters and in other applications that require constant system availability.
On computers with Intel architecture, several watchdog timer hardware interfaces are in use, depending on the system manufacturer, and Intel TCO (iTCO) is the most common of them. In Linux, watchdog drivers are implemented as kernel modules that expose the timer to programs as the /dev/watchdog device.
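The access cycle described above can be sketched in a few lines of shell. This is only an illustration, not what a cluster stack actually runs: the function name pet_watchdog and its three arguments are hypothetical, and the closing 'V' write is the "magic close" convention of the Linux watchdog API, which disarms the timer only on drivers that support it and only when the driver was not loaded with nowayout. Pointing it at a real /dev/watchdog arms the timer, so a mistake here can reboot the machine.

```shell
#!/bin/sh
# pet_watchdog DEVICE INTERVAL COUNT   (hypothetical helper, for illustration)
# Opens DEVICE once, writes to it COUNT times with INTERVAL seconds between
# writes, then attempts a "magic close" so the timer is disarmed on exit.
pet_watchdog() {
    dev=$1
    interval=$2
    count=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.' >&3        # any write restarts the countdown
            sleep "$interval"
            i=$((i + 1))
        done
        # Magic close: honoured only by drivers that implement it.
        printf 'V' >&3
    } 3>"$dev"                    # opening the device is what arms the timer
}
```

For example, pet_watchdog /dev/watchdog 10 6 would keep the timer alive for about a minute, provided the configured timeout is longer than 10 seconds. If the loop stalls for longer than the timeout, the hardware resets the machine, which is exactly the behaviour an HA cluster relies on.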
That completes the description of well-known things. The facts that follow are poorly covered on the Internet and are not well known even to the technical support staff of hardware and software vendors.
It is generally accepted that on hardware with Intel chipsets, including IBM's Intel-based servers, which are now manufactured by Lenovo, the watchdog interface is provided by the Intel TCO hardware and by the iTCO_wdt Linux kernel module that supports it. It should be noted that, on closer inspection, the Intel TCO architecture itself has a rather significant drawback: the processor turns out to be monitoring itself. In theory nothing should prevent code running in SMM mode from always doing its job, but then, in theory, the operating system should not hang either, right? A single point of hardware failure shared by the processor as the executor of programs and by its own watchdog therefore does not look very good if you are going to build a system with increased reliability.
However, I would probably never have gone into these details, or even learned about them, had the iTCO_wdt driver not turned out to be completely non-functional on IBM servers under SLES 12. The driver is loaded into memory, but the /dev/watchdog device is not created, and only a small, inconspicuous message remains in the system log: “iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS”.
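Since the only symptom is that one log line, it is worth checking for it explicitly. The following is a minimal sketch: the function name itco_disabled is hypothetical, and it assumes the kernel log still contains the message text quoted above.

```shell
#!/bin/sh
# itco_disabled  (hypothetical helper)
# Reads kernel log text on stdin and succeeds if iTCO_wdt reported that it
# could not clear the NO_REBOOT flag, i.e. the timer is unusable.
itco_disabled() {
    grep -q 'iTCO_wdt: unable to reset NO_REBOOT flag'
}
```

It can be fed from the kernel ring buffer, for example: dmesg | itco_disabled && echo "iTCO watchdog is disabled by hardware/BIOS". Reading the log through a pipe keeps the check usable with journalctl -k or a saved log file as well.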
At first I thought this was a regression in SLES 12 compared to SLES 11, since the /dev/watchdog device was available in SLES 11. However, after working with IBM and SUSE, it turned out that things are much worse. In SLES 11, unlike SLES 12, the kernel itself creates the /dev/watchdog entry at boot, and the watchdog driver simply attaches to that entry. Therefore the iTCO watchdog timer is just as inoperable in SLES 11 as in SLES 12, but this is much harder to notice, because the failure is masked by the presence of a non-functional /dev/watchdog.
I think it is unnecessary to add that no manipulation of the BIOS settings, IMM, AMM and the other wonderful things that the xSeries has in abundance has any effect on whether Intel TCO works.
Fortunately, after more than six months of active work with IBM hardware and software technical support, the IBMers managed to unearth an ancient manuscript dated 2008. It turns out that Intel has another architecture for working with a watchdog timer, the IPMI watchdog, and that it is supported on the xSeries platform.
The idea behind IPMI (Intelligent Platform Management Interface) is completely different from that of iTCO. In the IPMI architecture, somewhere on the motherboard there is a special controller, in effect a separate computer with its own processor, software, network interface and other gadgets, designed to monitor the operating parameters of the main computer's hardware and to react to changes in them in a configured way. In IPMI terminology this controller is called the BMC (Baseboard Management Controller) or simply MC. In IBM / Lenovo terminology, the device that implements this function is called the IMM (Integrated Management Module) or IMM2. The BMC can do many different things, which are described in the manuscript mentioned above, but what matters to us now is that one of its functions is a watchdog timer. Clearly, the IPMI watchdog timer is an honest device, separate from the Intel processor, that in general keeps working independently until the motherboard as a whole fails.
The manuscript describes working with the watchdog timer in the genre of an authors' commentary on instruction MIGR-5069505, which has not survived to our day, and it is based on outdated software versions and features that are not always still relevant. It is nevertheless quite possible to understand what it is about, and a brief, updated summary of this secret knowledge is given below.
A pleasant surprise is that IPMI support is integrated into modern Linux distributions. This support consists of several components, three of which are of interest to us.
First, there is the ipmi.service service, which lets programs communicate with the BMC. In SLES 12 this service is installed and started automatically. This can be verified as follows:
systemctl status ipmi
and, if necessary, further, as usual:
systemctl start ipmi
systemctl enable ipmi
Second, there is the IPMI watchdog driver itself, which is called ipmi_watchdog. It is installed automatically but is not loaded automatically (apparently on the assumption that the administrator must be sure of the hardware settings before allowing a hardware reboot on timeout). You can load this driver manually with the command:
modprobe ipmi_watchdog
You can enable its automatic loading at system startup by creating the ipmi_watchdog.conf file in the /etc/modules-load.d directory, consisting of one line “ipmi_watchdog”:
echo ipmi_watchdog > /etc/modules-load.d/ipmi_watchdog.conf
Third, there is the ipmitool utility, which is installed automatically and lets you run various BMC commands, including, for example, checking the status of the watchdog timer:
ipmitool mc watchdog get
If your system has a BMC, this command will return something like:
Watchdog Timer Use: SMS/OS (0x04)
Watchdog Timer Is: Stopped
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x00
Initial Countdown: 300 sec
Present Countdown: 300 sec
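For scripted checks, the fields of this report can be picked out with a small filter. The sketch below assumes the "Name: value" layout shown above. The helper names watchdog_field and watchdog_is_armed are hypothetical, and the "Started" prefix used to recognise a running timer is an assumption about the ipmitool wording that should be checked against the installed version.

```shell
#!/bin/sh
# watchdog_field NAME  (hypothetical helper)
# Prints the value of the "NAME: value" line from the report on stdin.
# NAME must not contain '/' or regex metacharacters.
watchdog_field() {
    sed -n "s/^$1:[[:space:]]*//p"
}

# watchdog_is_armed  (hypothetical helper)
# Succeeds only if the report on stdin shows a running timer
# with an action other than "No action".
watchdog_is_armed() {
    report=$(cat)
    state=$(printf '%s\n' "$report" | watchdog_field 'Watchdog Timer Is')
    action=$(printf '%s\n' "$report" | watchdog_field 'Watchdog Timer Actions')
    case $state in
        Started*) ;;              # assumed wording for a running timer
        *) return 1 ;;
    esac
    case $action in
        'No action'*|'') return 1 ;;
    esac
    return 0
}
```

A typical call would be ipmitool mc watchdog get | watchdog_is_armed || echo "watchdog is not armed". For the sample report above the check fails, since the timer is stopped and no action is configured.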
If you run, for example, a high availability cluster, it will configure the appropriate watchdog timer parameters itself (on my system, for example, these are a period of 5 seconds and the Hard reset action).
Unfortunately, even a correctly installed ipmi service, a loaded ipmi_watchdog driver and the presence of the /dev/watchdog file still do not guarantee that everything works as it should. What is the matter? It turns out that some versions of SLES 12 have the ugly habit of loading the softdog driver on their own initiative, trying to emulate the watchdog timer in software (an absolutely pointless and harmful exercise). And since softdog is loaded before ipmi_watchdog, the latter, unable to create the already existing /dev/watchdog file, traditionally does nothing, modestly mumbling something into the depths of the system log. Therefore our last task is to look for the dog by giving the command
lsmod | grep dog
and analyzing its output. If we see ipmi_watchdog there and do not see softdog, then most likely everything is working correctly. If softdog is there, then it must somehow be removed from the system, which in some versions of SLES 12 may not be an entirely trivial matter.
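The same analysis can be automated. The sketch below reads lsmod output and reports which case applies. The function name watchdog_module_status and its return codes are hypothetical choices made for this illustration.

```shell
#!/bin/sh
# watchdog_module_status  (hypothetical helper)
# Reads `lsmod` output on stdin.
# Returns 0 if ipmi_watchdog is loaded and softdog is not,
#         1 if softdog is loaded, 2 if neither module is present.
watchdog_module_status() {
    mods=$(awk '{ print $1 }')    # first column of lsmod is the module name
    if printf '%s\n' "$mods" | grep -qx 'softdog'; then
        echo "softdog loaded"
        return 1
    fi
    if printf '%s\n' "$mods" | grep -qx 'ipmi_watchdog'; then
        echo "ipmi_watchdog loaded"
        return 0
    fi
    echo "no watchdog module"
    return 2
}
```

It is used as lsmod | watchdog_module_status. If softdog shows up, one commonly used mechanism for keeping a module from being loaded automatically is a "blacklist softdog" line in a file under /etc/modprobe.d. Whether that alone is enough on a particular SLES 12 release is an assumption that has to be verified after a reboot, since something may still load the module explicitly.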
I assume that the IPMI watchdog timer functionality on IBM / Lenovo hardware may be related to the value of the OSWatchdog parameter, which is set in the IMM module through the web interface or with the asu utility (asu64). This parameter can be set to a certain number of minutes or turned off. On my system it is enabled and set to 2.5 minutes (the minimum value), but this does not affect the watchdog interval programmed in the BMC.
So, to summarize. Softdog, Intel TCO and IPMI may all look like valid ways to use the watchdog timer on the IBM / Lenovo platform, but in fact only IPMI works. The IPMI watchdog driver is installed in SLES automatically, but its loading has to be configured manually. The softdog driver is installed automatically, and its loading sometimes has to be disabled manually. The Intel TCO driver is installed and loaded automatically, but it has absolutely no effect, because it is completely inoperative on this platform.
I hope this article helps someone understand a little better the difficult task of building high availability systems under Linux.