
Monitoring the health of servers running the VMware vSphere ESXi v5 hypervisor

I needed to set up monitoring of LSI MegaRAID controllers on servers running the VMware vSphere ESXi v5.5 hypervisor, so that a notification arrives automatically on any failure, for example when one of the HDDs fails. As the work progressed, it turned out that the solution is not limited to the hypervisor's datastores.

At my disposal was a test server based on a Supermicro X9DR3-F/iF motherboard with an LSI MegaRAID SAS 9260-4i controller, with two HDDs attached and configured as RAID1.
Although the LSI MegaRAID SAS 9260-4i is officially supported by ESXi, the "Health Status" section of the VMware vSphere Client shows no information about the RAID status:


Fortunately, this can be fixed. We go to lsi.com and find the archive with the "SMIS Provider" for our controller:


We download and unpack it and find the file with the .vib extension. This is a package that lets ESXi monitor the controller's state through its built-in sensor mechanism. We copy the .vib to the server, connect to it via SSH and install it:
esxcli software vib install -v /vmfs/volumes/datastore1/500.04.V0.53-0003.vib

We reboot the server, reconnect to it via SSH and make sure that the package is installed:
esxcli software vib list | grep -i lsi
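If the same check has to be repeated on several hosts, the `grep -i lsi` step can be done from a script. The following is a minimal sketch, assuming the listing text has already been fetched (for example over SSH, as in the comment); `find_vibs` is a hypothetical helper, not part of esxcli, and it simply mirrors `grep -i` on the output.

```python
import re
from typing import List


def find_vibs(listing: str, pattern: str = "lsi") -> List[str]:
    """Hypothetical helper: return the lines of `esxcli software vib list`
    output that contain `pattern`, ignoring case (like `grep -i`)."""
    rx = re.compile(re.escape(pattern), re.IGNORECASE)
    return [line for line in listing.splitlines() if rx.search(line)]


# Fetching the listing remotely (host is a placeholder, as in the text):
# import subprocess
# listing = subprocess.run(
#     ["ssh", "root@XXX.YYY.WWW.ZZZ", "esxcli software vib list"],
#     capture_output=True, text=True, check=True).stdout
# print(find_vibs(listing))
```

An empty result means the provider package is not installed on that host.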


Now the "Health Status" section shows the status of the LSI MegaRAID controller:


But this alone is not enough for automatic monitoring: we won't learn about a failure until we open the VMware vSphere Client. Sensor polling has to be automated. For this we use the "check_esxi_hardware.py" script, available at http://www.claudiokuenzler.com/nagios-plugins/check_esxi_hardware.php. It was written as a Nagios plugin, but it is quite universal and easy to hook up to any other monitoring system.
The script is written in Python and requires the PyWBEM library. On Debian and Ubuntu, the library is installed from the standard repositories:
apt-get install python-pywbem
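Because the goal is to learn about a failure without opening the vSphere Client, the polling has to run on a schedule and raise an alert by itself. Below is a minimal sketch of a notify-on-change loop, assuming `poll` is any callable returning a status word such as "OK" or "CRITICAL" (for instance a wrapper around check_esxi_hardware.py) and `notify` is whatever delivers the alert (mail, chat). `watch`, `poll` and `notify` are hypothetical names, not part of the script; in practice Nagios or cron usually plays this role.

```python
import time
from typing import Callable, Optional


def watch(poll: Callable[[], str],
          notify: Callable[[str, str], None],
          interval: float = 300.0,
          rounds: Optional[int] = None) -> None:
    """Hypothetical helper: call poll() every `interval` seconds and call
    notify(old, new) only when the status word changes, so a failure
    produces one alert and the recovery produces another."""
    previous = "OK"  # assume the server is healthy when watching starts
    done = 0
    while rounds is None or done < rounds:
        current = poll()
        if current != previous:
            notify(previous, current)
            previous = current
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)  # wait before the next poll
```

Alerting only on transitions avoids a fresh message every polling interval while a disk stays failed; `rounds` exists mainly so the loop can be stopped in tests.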

The syntax for running “check_esxi_hardware.py” is quite simple:
check_esxi_hardware.py -H XXX.YYY.WWW.ZZZ -U root -P XXXXXXXX
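As a Nagios plugin, the script reports its result through the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which is what makes it easy to plug into other systems. A minimal sketch of calling it from Python and translating the exit code, assuming the script path and placeholders from the command above; `run_check` is a hypothetical wrapper, and anything that fails to run or times out is treated as UNKNOWN.

```python
import subprocess
from typing import List, Tuple

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(cmd: List[str], timeout: float = 120.0) -> Tuple[str, str]:
    """Hypothetical wrapper: run a Nagios-style plugin and return
    (state name, first line of its standard output)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "UNKNOWN", f"check failed to run: {exc}"
    state = NAGIOS_STATES.get(proc.returncode, "UNKNOWN")
    lines = proc.stdout.strip().splitlines()
    return state, lines[0] if lines else ""


# Example (placeholders as in the text above):
# state, summary = run_check(["./check_esxi_hardware.py", "-H", "XXX.YYY.WWW.ZZZ",
#                             "-U", "root", "-P", "XXXXXXXX"])
```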

In response, you will receive a brief report on the health status of the server:
OK - Server: Supermicro X9DR3-F s/n: 0123456789 System BIOS: 3.0a 2013-07-31
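If another monitoring system needs the individual fields of this report, the line can be split with a regular expression. The sketch below is built only from the single OK line shown above; the layout of WARNING or CRITICAL reports is an assumption, so `parse_summary` (a hypothetical helper) allows arbitrary text between the state word and "Server:" and raises ValueError on anything it does not recognise.

```python
import re
from typing import Dict

# Pattern derived from the OK example above; other states may differ.
SUMMARY_RE = re.compile(
    r"^(?P<state>OK|WARNING|CRITICAL|UNKNOWN)\b.*?"
    r"Server: (?P<server>.+?) s/n: (?P<serial>\S+) "
    r"System BIOS: (?P<bios>.+)$"
)


def parse_summary(line: str) -> Dict[str, str]:
    """Hypothetical helper: split the one-line report into state, server
    model, serial number and BIOS version."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    return m.groupdict()
```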

To confirm that the script polls all sensors, including the LSI MegaRAID controller, enable verbose output with -v:
check_esxi_hardware.py -H XXX.YYY.WWW.ZZZ -U root -P XXXXXXXX -v


The drawback of the script is that, to poll the sensors, it needs the credentials of a hypervisor administrator account. This does not have to be root, but the account must have the appropriate privileges, otherwise sensor polling will fail.
Now let's simulate a failure of one of the HDDs. We reboot the server and enter the controller's WebBIOS. We select one of the hard drives:
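One way to soften this drawback is to poll with a dedicated non-root account that has just the required privileges, and to keep its credentials out of the monitoring scripts themselves. A minimal sketch, assuming the hypothetical environment variables ESXI_USER and ESXI_PASSWORD hold that account; `build_check_cmd` only assembles the same -H/-U/-P (and optional -v) command line shown earlier. Note that the password still ends up in the process's argument list, where other local users may be able to see it.

```python
import os
from typing import List, Mapping, Optional


def build_check_cmd(host: str,
                    script: str = "check_esxi_hardware.py",
                    env: Optional[Mapping[str, str]] = None,
                    verbose: bool = False) -> List[str]:
    """Hypothetical helper: build the check_esxi_hardware.py command line,
    reading the account from ESXI_USER / ESXI_PASSWORD (assumed names)."""
    env = os.environ if env is None else env
    user = env.get("ESXI_USER")
    password = env.get("ESXI_PASSWORD")
    if not user or not password:
        raise KeyError("ESXI_USER and ESXI_PASSWORD must be set")
    cmd = [script, "-H", host, "-U", user, "-P", password]
    if verbose:
        cmd.append("-v")  # same verbose switch as in the example above
    return cmd
```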


We open its properties:


And disable it:


We boot the hypervisor, and the VMware vSphere Client indeed shows the failure:


And here is what "check_esxi_hardware.py" reports:

Source: https://habr.com/ru/post/241605/

