⬆️ ⬇️

Zabbix + Iostat: monitoring the disk subsystem

Zabbix + Iostat: monitoring the disk subsystem.

image

What for?

The disk subsystem is one of the important server subsystems and very much depends on the level of the load on the disk subsystem, for example, the speed of content delivery or how quickly the database will respond. This largely applies to mail or file servers, database servers. In general, disk performance indicators need to be monitored. Based on the performance graphs of the disk subsystem, we can decide on the need to increase capacity long before the rooster bites. And in general, it is useful to glance from release to release as the work of developers affects the level of workload.



Under the cut, on monitoring and how to configure.



Dependencies:

Monitoring is implemented through a zabbix agent and two utilities: awk and iostat (sysstat package). If awk comes in distributions by default, then iostat needs to be installed with the sysstat package (thanks to Sebastien Godard and co-workers).



Known limitations:

For monitoring, you need sysstat starting from version 9.1.2, since there is a very important change: "Added r_await and w_await fields to iostat's extended statistics". So you should be careful, in some distros, for example, CentOS has a slightly “stable” and less feature-free version of sysstat.

If you start from the version of zabbix (2.0 or 2.2), then the question is not fundamental, it works on both versions. At 1.8 it does not work. Low level discovery is used.

')

Opportunities (purely subjective, as the utility decreases):



In general, as you can see, all the metrics that are available in iostat are available here (those who are unfamiliar with this utility, I strongly recommend that you look into man iostat).



Available graphics:

Graphs are drawn per-device, LLD detects devices that fall under the regular expression "(xvd | sd | hd | vd) [az]", so if your disks have different names, you can easily make the appropriate changes. Such a regular schedule is made to detect devices that will be parental to other partitions, LVM volumes, MDRAID arrays, etc. In general so as not to collect too much. A little distracted, so the list of graphs:



Analogs and differences:

Zabbix has boxed options for similar monitoring, these are the keys vfs.dev.read and vfs.dev.write . They are good and work fine, but less informative than iostat. For example, iostat has metrics like latency and utilization.

There is also a similar pattern from Michael Noman. In my opinion, the only difference is one, it is sharpened on the old versions of iostat, well, and + small syntactic changes.



Where to get:

So, monitoring consists of a configuration file for the agent, two scripts for collecting / receiving data and a template for the web interface. All this is available in the repository on Github , so by any available means (git clone, wget, curl, etc ...) we download them to the machines we want to monitor and proceed to the next item.



How to setup:



Now everything is ready, we start the agent and go to the monitoring server and execute the command (do not forget to change the agent_ip):

# zabbix_get -s agent_ip -k iostat.discovery 


Thus, we check from the monitoring server that iostat.conf is loaded and gives information, at the same time we are watching that LLD works. JSON with the names of detected devices will be returned as an answer. If the answer has not come, then something did wrong.



There is also such a moment that the zabbix server does not wait for the execution of some item'ov by agents (iostat.collect). To do this, increase the timeout values.



How to configure the web interface:

Now the iostat-disk-utilization-template.xml template remains. Via the web interface, we import it into the templates section and assign it to our host. It's simple. Now it remains to wait about one hour, this time is set in the LLD rule (also configurable). Or you can glance in the Latest Data of the monitored host, in the Iostat section. As soon as the values ​​appeared there, you can go to the graphs section and observe the first data.



And finally, the top three screenshots of graphs c localhost))):

Directly data in Latest Data:

image



Responsiveness Graphics (Latency):

image



Recycling schedule and IOPS:

image



That's all, thank you for your attention.

Well, according to tradition, I take this opportunity to convey greetings to Fedorov Sergey (Alekseevich) :)

Source: https://habr.com/ru/post/220073/



All Articles