
Cacti: Monitoring SMART



Recovering data after the "physical death" of a storage device is not the most pleasant pastime in a system administrator's life. The more data there is, the longer it takes to fix the situation, so it is better to prevent it by setting up monitoring.

Hard disk drives and solid-state drives have a built-in self-diagnostics technology, SMART, and that is what we will work with.
SMART (Self-Monitoring, Analysis and Reporting Technology) evaluates the state of a drive using built-in self-diagnostic hardware and also provides a mechanism for predicting when the drive is likely to fail.
SMART is not a cure-all, but the information it provides lets you notice some problems and plan for them in time, which makes life a little easier.

The source server runs Linux CentOS 6 and serves as file storage.

Linux has a package called smartmontools, a set of utilities for checking the status of disks. Configuring the smartd daemon for regular SMART testing and monitoring is beyond the scope of this article; configuration examples are easy to find online.

We also want graphs of the key parameters, because an anomalous change can sometimes only be spotted visually. Cacti will help us with this.

1) To begin with, install the smartmontools package:
    # yum provides "smartctl*"
    1:smartmontools-5.43-1.el6.x86_64 : Tools for monitoring SMART capable hard disks
    # yum install smartmontools

After installation, you can check the disk parameters:
    smartctl -a DEVICE

DEVICE is usually something like /dev/sda (the first disk) or /dev/sdb (the second).
Sometimes SMART has to be enabled on the disk explicitly:
    smartctl -s on DEVICE
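"smartctl -i DEVICE" shows whether SMART is currently on; for ATA disks it prints a line such as "SMART support is: Enabled". The sketch below is a hypothetical helper (smart_enabled is not part of smartmontools) that assumes this ATA output format, so that SMART is switched on only when it is reported as off:

```shell
#!/bin/bash
# Hypothetical helper (not part of smartmontools): succeeds if the
# "smartctl -i" text read from stdin reports that SMART is enabled.
# Assumes the ATA-style line "SMART support is: Enabled"; other
# device types may word it differently.
smart_enabled() {
    grep -q '^SMART support is:[[:space:]]*Enabled'
}

# Usage (as root): enable SMART only if it is currently off.
#   smartctl -i /dev/sda | smart_enabled || smartctl -s on /dev/sda
```

The check reads text from stdin rather than calling smartctl itself, so it can be tried on saved output without root privileges.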

SMART data is often unavailable for USB drives, even though the drive inside the enclosure can report it: many USB bridges do not pass SMART commands through (for some bridges, "smartctl -d sat" helps).

Reading SMART data requires privileges that the snmpd daemon usually lacks, so the main work of collecting intermediate data is done by a cron script:
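    # cat /etc/snmp/smart-cronjob
    #!/bin/sh
    # where to keep the files
    STORE=/tmp/snmp
    mkdir -p "$STORE"
    # update SMART data for every /dev/sd? disk
    for devfull in /dev/sd?; do
        dev=`basename "$devfull"`
        # -n idle: do not wake up disks that are in idle, standby or sleep mode
        /usr/sbin/smartctl -n idle -a "$devfull" >"$STORE/smart-$dev.TMP"
        rc=$?
        # exit status bit 1 (value 2): open failed or disk in low-power mode;
        # keep the previous data instead of overwriting it
        if [ $(( rc & 2 )) -eq 0 ]; then
            mv "$STORE/smart-$dev.TMP" "$STORE/smart-$dev"
        fi
    done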

Make sure the directory /tmp/snmp exists, make the script executable (chmod +x) and add it to cron:
    # cat /etc/crontab | grep smart
    */5 * * * * root /etc/snmp/smart-cronjob
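If cron stops running the script, or a disk stays in a low-power mode, snmpd keeps serving the old files without any warning. A minimal sketch of a staleness check, assuming GNU find (-mmin) and the /tmp/snmp directory used above; the function name and the 15-minute limit (three missed runs of the */5 job) are hypothetical choices:

```shell
#!/bin/bash
# Hypothetical check: list cached SMART files in DIR that have not been
# updated for more than MAX_MIN minutes (default 15 = three missed runs
# of the */5 cron job). Prints one path per line, nothing if all are fresh.
# The name pattern skips the temporary *.TMP files.
smart_stale_files() {
    local dir=${1:-/tmp/snmp} max_min=${2:-15}
    find "$dir" -maxdepth 1 -type f -name 'smart-sd?' -mmin +"$max_min"
}

# Usage: warn from cron or a login script.
#   stale=$(smart_stale_files /tmp/snmp 15)
#   [ -z "$stale" ] || echo "stale SMART data: $stale" >&2
```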

The script looks for /dev/sd? devices (/dev/sda, /dev/sdb, etc.) and tries to write their SMART data to a file.
    # ll /dev/sd?
    brw-rw---- 1 root disk 8,  0 Oct 25 14:42 /dev/sda
    brw-rw---- 1 root disk 8, 16 Oct 25 14:42 /dev/sdb
    brw-rw---- 1 root disk 8, 32 Oct 25 14:42 /dev/sdc
    brw-rw---- 1 root disk 8, 48 Oct 25 14:42 /dev/sdd
    brw-rw---- 1 root disk 8, 64 Oct 25 14:42 /dev/sde
    brw-rw---- 1 root disk 8, 80 Oct 25 14:42 /dev/sdf

Every SMART attribute has an ID number. The Perl script uses these numbers instead of handling each attribute individually.


Perl script to retrieve SMART parameters for SNMP (smart-generic)

Make the script executable:
    chmod +x /etc/snmp/smart-generic

The script takes a SMART attribute number and prints the difference between the attribute's current value and its threshold.

Note that thresholds may differ depending on the manufacturer, model, or firmware version. The value itself is therefore of little interest, but unusual fluctuations or downward trends deserve attention. For temperature, you usually need the instantaneous value instead.
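The original smart-generic is a Perl script; what follows is not that script but a hypothetical shell sketch of the same idea, to show how the cached files can be turned into one line per disk. It assumes the files /tmp/snmp/smart-sdX written by the cron job, smartctl's ATA attribute table layout (ID in column 1, VALUE in 4, THRESH in 6, RAW_VALUE starting in 10), and, presumably matching the R190/R194 entries in the snmpd.conf lines below, that an "R" prefix asks for the raw value. The "description" mode is simplified to the Device Model line. A missing attribute prints U, which RRDtool treats as unknown.

```shell
#!/bin/bash
# Hypothetical shell sketch of what smart-generic does; NOT the author's Perl script.
# Reads the files written by the cron job and prints one line per disk.
STORE=${STORE:-/tmp/snmp}

# Print VALUE - THRESH for attribute ID in FILE, or the raw value if ID is
# prefixed with "R" (e.g. R194 = temperature). Prints "U" if the ID is absent.
smart_value() {
    local id=$1 file=$2 raw=0
    if [[ $id == R* ]]; then
        raw=1
        id=${id#R}
    fi
    # ATA attribute table columns (smartctl -a):
    # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    # NF >= 10 skips other tables whose first column can also be a small number.
    awk -v id="$id" -v raw="$raw" '
        $1 == id && NF >= 10 { print (raw ? $10 : $4 - $6); found = 1; exit }
        END { if (!found) print "U" }
    ' "$file"
}

# Dispatch on the first argument, as in the snmpd.conf "extend" lines.
smart_generic() {
    local f
    for f in "$STORE"/smart-sd?; do
        [ -e "$f" ] || continue        # no cached files yet
        case $1 in
            devices)     echo "${f##*/smart-}" ;;
            description) sed -n 's/^Device Model:[[:space:]]*//p' "$f" | head -n 1 ;;
            *)           smart_value "$1" "$f" ;;
        esac
    done
}

# Run only when called with an argument, so the functions can also be sourced.
if [ -n "${1:-}" ]; then
    smart_generic "$1"
fi
```

For example, "smart-generic 194" would print VALUE minus THRESH for each disk, while "smart-generic R194" would print the temperature in degrees.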

Add the following lines to snmpd.conf (or others, if you want to track "your own" attributes; Wikipedia gives a general overview of the attributes):
    extend smartdevices /etc/snmp/smart-generic devices
    extend smartdescriptions /etc/snmp/smart-generic description
    extend smart1 /etc/snmp/smart-generic 1
    extend smart3 /etc/snmp/smart-generic 3
    extend smart4 /etc/snmp/smart-generic 4
    extend smart5 /etc/snmp/smart-generic 5
    extend smart7 /etc/snmp/smart-generic 7
    extend smart9 /etc/snmp/smart-generic 9
    extend smart10 /etc/snmp/smart-generic 10
    extend smart12 /etc/snmp/smart-generic 12
    extend smart170 /etc/snmp/smart-generic 170
    extend smart171 /etc/snmp/smart-generic 171
    extend smart172 /etc/snmp/smart-generic 172
    extend smart177 /etc/snmp/smart-generic 177
    extend smart178 /etc/snmp/smart-generic 178
    extend smart179 /etc/snmp/smart-generic 179
    extend smart180 /etc/snmp/smart-generic 180
    extend smart181 /etc/snmp/smart-generic 181
    extend smart182 /etc/snmp/smart-generic 182
    extend smart183 /etc/snmp/smart-generic 183
    extend smart184 /etc/snmp/smart-generic 184
    extend smart187 /etc/snmp/smart-generic 187
    extend smart189 /etc/snmp/smart-generic 189
    extend smartR190 /etc/snmp/smart-generic R190
    extend smartR194 /etc/snmp/smart-generic R194
    extend smart195 /etc/snmp/smart-generic 195
    extend smart196 /etc/snmp/smart-generic 196
    extend smart199 /etc/snmp/smart-generic 199
    extend smart203 /etc/snmp/smart-generic 203
    extend smart226 /etc/snmp/smart-generic 226
    extend smart230 /etc/snmp/smart-generic 230
    extend smart231 /etc/snmp/smart-generic 231
    extend smart232 /etc/snmp/smart-generic 232
    extend smart233 /etc/snmp/smart-generic 233
    extend smart235 /etc/snmp/smart-generic 235
    extend smart241 /etc/snmp/smart-generic 241
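Typing these nearly identical lines by hand is error-prone. A small sketch that prints them from a list of attribute IDs, following the naming scheme of the lines above (smartID, with an "R" prefix kept as part of the name); the function name and the SMART_SCRIPT override are hypothetical:

```shell
#!/bin/bash
# Print snmpd.conf "extend" lines for smart-generic, one per attribute ID.
# IDs prefixed with "R" (raw values) keep the prefix in the extend name.
# SMART_SCRIPT is a hypothetical override for the script path.
smart_extend_lines() {
    local script=${SMART_SCRIPT:-/etc/snmp/smart-generic} id
    echo "extend smartdevices $script devices"
    echo "extend smartdescriptions $script description"
    for id in "$@"; do
        echo "extend smart$id $script $id"
    done
}

# Usage: review the output, then add it to snmpd.conf and restart snmpd.
#   smart_extend_lines 1 3 4 5 9 R190 R194 >> /etc/snmp/snmpd.conf
```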


Restart the daemon and check the output (substitute your own SNMP_Community and host_name):
    snmpwalk -v2c -c SNMP_Community host_name NET-SNMP-EXTEND-MIB::nsExtendOutLine
    NET-SNMP-EXTEND-MIB::nsExtendOutLine."smart1".1 = STRING: 101
    NET-SNMP-EXTEND-MIB::nsExtendOutLine."smart1".2 = STRING: 101
    ...
    NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".1 = STRING: sda
    NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdevices".2 = STRING: sdb
    NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdescriptions".1 = STRING: WDC WD2003FYYS-... [01.01D01] 2 TB
    NET-SNMP-EXTEND-MIB::nsExtendOutLine."smartdescriptions".2 = STRING: WDC WD2003FYYS-... [01.01D01] 2 TB
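To check one attribute from a script, for example the temperature of every disk, the snmpwalk output can be filtered by extend name. A minimal sketch, assuming the nsExtendOutLine."NAME".N = STRING: VALUE line format shown above; the function name extend_values is hypothetical:

```shell
#!/bin/bash
# Print the values of one extend (e.g. smartR194) from snmpwalk output on
# stdin, one per line, in the order snmpwalk returned them (.1 = first disk).
# The closing quote in the key keeps "smart1" from matching "smart10".
extend_values() {
    local name=$1
    awk -v key="nsExtendOutLine.\"$name\"." '
        index($0, key) { sub(/^.*= STRING: /, ""); print }
    '
}

# Usage:
#   snmpwalk -v2c -c SNMP_Community host_name NET-SNMP-EXTEND-MIB::nsExtendOutLine \
#     | extend_values smartR194
```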

2) Cacti templates

The archive contains the main template, which shows all the parameters on a single graph; temperature is displayed on a separate graph.

To use it, copy the data query file disk_smart.xml to the resource/snmp_queries/ directory of your Cacti installation
and import cacti_host_template_smart_parameters.xml into Cacti.



If everything is configured correctly, the status of the polled objects should appear once "Disk SMART Parameters" is added to the device's Associated Data Queries.

Remember to choose the right graph template when creating graphs: by default, graphs are built with the SSD template (AGILITY3), which is not quite correct for other drives.


3) Examples of graphs:





This article is based on "SMART stats on Cacti (via SNMP)".

Source: https://habr.com/ru/post/173443/

