Monitoring
(from Lat. monitor, "the one who reminds or warns"): a comprehensive system of regulated periodic observations, assessment, and forecasting of changes in the state of the environment, aimed at detecting negative changes and developing recommendations for their elimination or mitigation.

Introduction and general remarks
On Habr there are surprisingly few articles about Zenoss, even though in terms of functionality this monitoring system is well ahead of most of its competitors. A description of the system itself can always be found on the official Zenoss website.
Why do I think Zenoss is ahead of the competition? Because it makes it easy to build exactly the observation, alerting, and alert-response setup you need. Zenoss strikes a sensible balance between flexibility, functionality, and complexity, and the latter grows in proportion to your demands: keeping an eye on free disk space is simple, while monitoring hardware and software systems with your own custom metrics is considerably harder.
The web has plenty of guides on the initial setup of the system. By following them you can easily start monitoring the basic parameters of your servers or network equipment.
I prefer to use Zenoss with snmp servers on the hardware. An SNMP server is easy to configure ("simple" in the name refers to more than just the protocol itself), it is reliable, it lets you define basic access rights to the information it exposes, and, in general, it is widespread software. Moreover, snmp in one form or another is supported by most hardware.
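For instance, basic read access control in net-snmp's snmpd is a one-line affair. An illustrative fragment (the community name, subnet, and restriction to the LSI enterprise subtree are my examples, not part of any setup described here):

```
# /etc/snmp/snmpd.conf -- illustrative fragment; community, subnet and OID are examples
# read-only access for one subnet, restricted to the LSI enterprise subtree
rocommunity monitoring 10.0.0.0/24 .1.3.6.1.4.1.3582
```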
Monitoring systems that ship their own agents make me somewhat nervous: you never know exactly what such an agent does, or how well.
If there are no standard snmp agents for the parameters you need, you can implement your own on top of the agentx mechanism built into net-snmp. You can learn more about agentx agents in the guide at www.net-snmp.org/tutorial/tutorial-5/toolkit. (A colleague suggests a better option: look at code.google.com/p/linux-administrator-tools/source/browse/#svn%2Fsnmpd-agent%2Ftrunk: the documentation there is lame, but all the code works.) With a modest amount of effort you can monitor everything that can be obtained programmatically from the system, and serve that information over snmp through a standard snmp server.
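If a full agentx subagent feels heavy, net-snmp also offers the simpler pass_persist mechanism, where snmpd talks to a long-running script over stdin/stdout. A minimal Python sketch of such a handler (the enterprise number 99999, the script path, and the measured value are invented for this illustration):

```python
#!/usr/bin/env python
"""Toy pass_persist handler: serves a single integer OID to snmpd.

Hook it up in snmpd.conf with something like
    pass_persist .1.3.6.1.4.1.99999 /usr/local/bin/myagent.py
(the enterprise number 99999 and the path are made up for this sketch).
"""
import sys

BASE_OID = ".1.3.6.1.4.1.99999.1.0"  # hypothetical OID we answer for


def reading():
    # replace with a real measurement (parse a file, call a tool, etc.)
    return 42


def handle(command, oid=None):
    """Build the response lines for one snmpd request."""
    if command == "PING":
        return ["PONG"]
    if command == "get" and oid == BASE_OID:
        return [BASE_OID, "integer", str(reading())]
    return ["NONE"]  # unknown OID or unsupported command


def main():
    while True:
        line = sys.stdin.readline()
        if not line:  # snmpd closed the pipe
            break
        cmd = line.strip()
        oid = sys.stdin.readline().strip() if cmd in ("get", "getnext") else None
        for reply in handle(cmd, oid):
            sys.stdout.write(reply + "\n")
        sys.stdout.flush()


if __name__ == "__main__":
    main()
```

The protocol is tiny: snmpd sends "PING" expecting "PONG", and for a GET it sends "get" plus an OID, expecting either the OID/type/value triple or "NONE".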
With that, the introduction can be considered complete. Summary: I like Zenoss and snmp. Now let's look at Zenoss more closely.
Configuring zenoss for yourself
The simplest way to configure Zenoss mirrors the way you configure any other monitoring system:
- add a device
- define the key parameters to monitor
- repeat for every device

If you have 2 switches and 5 servers, this is done easily, once and for all. If you have dozens of servers and some of them come and go (for example, virtual machines that appear on the network and are then shut down when no longer needed), then adding and removing devices and their monitored parameters turns into hard labor.
That is why modern monitoring systems such as Zabbix and Zenoss offer device autodiscovery: set the range of IP addresses to scan and the access parameters, and voilà, the discovered equipment is added to the system. All that remains is to attach monitoring templates to it.
Attaching templates by hand is also a dreary affair. Instead, sort the hosts into groups or classes and attach the templates to those. Then it is enough to determine a device's class, and the corresponding monitoring rules will be applied to it.
But what if we have a dozen rules attached to hosts "chaotically", so that no clean device classes can be drawn? You can go one step further and describe, for the system, the rules by which monitoring templates are assigned. In Zenoss, this process of determining what to monitor is called "device modeling": Zenoss queries the snmp server, determines which parameters are available, and monitors those for which monitoring rules exist.
The monitoring rules themselves come packaged as ZenPacks. A ZenPack is the native way of distributing extensions to the standard functionality. It can contain event classes, monitoring templates, modeler plugins, interface button definitions, and more.
The correct way to extend monitoring is to detect the presence of particular components on a host and bind graphs and events to them. A proper zenpack is not hard to write, but it is a sizable piece of work, and not everyone is willing to spend a long time reinventing things. I will demonstrate a "quick & dirty" way to get what we want.
Basic concept
Writing your own components and modeler plugins is hard. Attaching monitoring templates everywhere by hand is slow. Let's combine the two approaches so that their disadvantages cancel out: the modeler will be simple (its code is given below), and the templates, created once, will be attached to devices automatically. If the modeler finds the required oids in the snmp server's responses, it will bind the monitoring template.
Why it is quick: we only need to create 2 templates and 1 modeler, and enable the modeler plugin for 1 device class (Device -> Server). Fast, simple, and portable (thanks to ZenPacks).
Why it is "dirty": we will monitor a single LSI RAID controller, the first and only one. If a host has several such controllers, the rest will not be covered by the monitoring system; they would require separate templates and modelers. The component-based approach is more flexible: it would let us describe an "lsi disk controller" component and detect and monitor every such controller.
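The essence of this compromise fits in a few lines. Here is a distilled sketch of the decision our modeler will make (the function and its names are mine, for illustration; the real plugin code appears later in the article):

```python
def pick_templates(managed, getdata, current):
    """Decide which monitoring templates a device should carry.

    managed: template names this modeler is responsible for
    getdata: snmp query results keyed by template name; None means
             the corresponding oid was absent on the device
    current: templates already bound to the device
    """
    # bind a managed template only if its oid actually answered
    new = [t for t in managed if getdata.get(t) is not None]
    # keep templates that were bound by hand or by other modelers
    keep = [t for t in current if t not in managed]
    return sorted(keep + new)
```

So a host that answers the megaraid oid gets the megaraid template, a host that answers neither keeps only its other templates, and nothing a human attached by hand is lost.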
Implementation
SNMP part
So, I had several Linux servers with LSI RAID controllers. Some ran on the megaraid driver, some on mptsas. After some suffering I found snmp agents that expose the state of the arrays: sas_snmp-3.17-1123.i386.rpm and sas_ir_snmp-3.17-1126.i386.rpm (yes, 32-bit; there were no others). Once they are installed, the snmp server can serve oids from the trees .1.3.6.1.4.1.3582.4 and .1.3.6.1.4.1.3582.5 respectively.
Among these trees you can find:
.1.3.6.1.4.1.3582.4.1.4.1.2.1.19.0 - the number of degraded logical volumes for the megaraid adapter
.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.20.0 - the number of degraded logical volumes for the mpt adapter.
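Before touching Zenoss at all, it is worth verifying that the agents really answer these oids. A small helper for that, assuming the standard net-snmp snmpget CLI is installed (the host name and community string are placeholders):

```python
import subprocess

MEGARAID_DEGRADED = ".1.3.6.1.4.1.3582.4.1.4.1.2.1.19.0"
MPT_DEGRADED = ".1.3.6.1.4.1.3582.5.1.4.1.1.3.1.20.0"


def snmpget_cmd(host, oid, community="public"):
    # -Oqv makes snmpget print only the bare value
    return ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid]


def degraded_count(host, oid, community="public"):
    """Query one oid and return it as an integer (0 = all volumes healthy)."""
    out = subprocess.check_output(snmpget_cmd(host, oid, community))
    return int(out.strip())
```

On a healthy server, degraded_count("server1", MEGARAID_DEGRADED) should come back as 0.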
These are good "integral" metrics that will draw the system administrator's attention to a server when trouble strikes: if a disk fails, the logical volumes it belongs to become degraded and the metric rises above zero.
I will not describe the package installation; it is trivial once you know the package names and how sas_snmp differs from sas_ir_snmp.
Zenoss part
In Zenoss we have to create two templates, one for the megaraid driver and one for mpt, and configure a threshold for each.
Go to Advanced -> Monitoring Templates and click the "+" button at the bottom. In the window that appears, enter the name "lsiArray" and select "Server" in Devices as the path.
After that a window appears, divided into three parts: data sources, thresholds, and graph definitions. We will need the first two:
- Create a data source. Name it "vdDegradedCount", as in the MIB (though you may choose otherwise). Source type: SNMP
- Edit the newly created data source: enter our first OID. In the same window we can test it against any server
- Create a threshold, name it "LSI Degraded virtual drive count", and give it the type MinMaxThreshold
- Edit the created threshold: select the single data point, set both the minimum and maximum values to 0, and specify the event class (I chose /HW/Store). An event class is required when setting up an alert.
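For intuition, the MinMaxThreshold we just configured behaves roughly like this sketch (my simplification of Zenoss's actual logic, not its real code):

```python
def breaches(value, min_value=0, max_value=0):
    """Rough model of a MinMaxThreshold: any value outside
    [min_value, max_value] breaches the threshold and would raise an
    event in the configured event class (/HW/Store in our case).
    A bound of None means "no limit on that side"."""
    if max_value is not None and value > max_value:
        return True
    if min_value is not None and value < min_value:
        return True
    return False
```

With both bounds at 0, a single degraded volume (value 1) already triggers an event, and a return to 0 clears it.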
After all these actions, we have a template that monitors the megaraid controller and generates an event in case of problems with it.
I will leave the configuration of the second template as an exercise for the reader: there is nothing complicated about it.
We can configure sending a message for this event. Go to Advanced -> Settings -> Users -> $your-user -> Alerting rules and add a new rule there. Edit it and end up with something like this:

Pay attention to the severity of the event and to the beginning of the event class chain for which the alert is generated.
At this stage you could attach the resulting template to some device class and call it a day. But that is not our way. Now for the most interesting part: our own simplest modeler and zenpack.
Creating a modeler plugin.
This is the most interesting part. Create your own ZenPack! The ZenPack creates the directory structure we need, into which we place the source code of the modeler plugin. In addition, it lets us later export our plugin from the system as part of that ZenPack.
So, go to Advanced -> Settings -> ZenPacks and click the gear icon. In the menu that appears, select "Create a ZenPack". A window for entering the name will appear. I chose ZenPacks.HW.Store.LSIArray, although, as it turned out later, this is not quite the right name: at the very least, the word "community" should have come before "HW".
After the zenpack is created, a standard file tree appears on disk. We need to create the file /opt/zenoss/ZenPacks/ZenPacks.HW.Store.LSIArray-*egg/ZenPacks/HW/Store/LSIArray/modeler/plugins/LSIArray.py (adjust the asterisked part to match your created zenpack) with the following content:
__doc__ = """LSIArray
LSIArray modeler maps sas and sasir (mptsas) specific templates
$Id: LSIArray.py,v 2.00 2012/04/15 16:01 Exp $"""

__version__ = '$Revision: 2.15 $'

from Products.DataCollector.plugins.CollectorPlugin import SnmpPlugin, GetTableMap, GetMap
from Products.DataCollector.plugins.DataMaps import ObjectMap


class LSIArray(SnmpPlugin):

    deviceProperties = SnmpPlugin.deviceProperties + ('zDeviceTemplates',)

    # oid -> name of the monitoring template to bind when the oid answers
    mibDesc = {
        '.1.3.6.1.4.1.3582.4.1.4.1.2.1.19.0': 'lsiArray',
        '.1.3.6.1.4.1.3582.5.1.4.1.1.3.1.20.0': 'lsiirArray',
    }

    snmpGetMap = GetMap(mibDesc)

    def process(self, device, results, log):
        """Collect snmp information from this device."""
        log.debug(str(self.deviceProperties))
        log.info('processing %s for device %s', self.name(), device.id)
        getdata, tabledata = results
        log.debug(str(device.zDeviceTemplates))
        newTemplates = []
        rmTemplates = []
        log.debug('getdata %s mibDesc %s', str(getdata), str(LSIArray.mibDesc))
        # nothing answered at all: leave the device untouched
        if len(getdata.keys()) == getdata.values().count(None):
            log.info('no data')
            return
        for each in LSIArray.mibDesc.values():
            if each in getdata and getdata[each] is not None:
                newTemplates.append(each)
                log.debug('newTemplates append: %s' % each)
            else:
                rmTemplates.append(each)
                log.debug('rmTemplates append: %s' % each)
        log.info('Current zDeviceTemplates: %s' % str(device.zDeviceTemplates))
        # preserve templates this modeler does not manage
        for each in device.zDeviceTemplates:
            if each not in newTemplates and each not in rmTemplates:
                newTemplates.insert(0, each)
                log.debug('adding to newTemplates: %s' % str(each))
        device.zDeviceTemplates = sorted(newTemplates)
        log.info('New zDeviceTemplates: %s' % str(device.zDeviceTemplates))
        om = self.objectMap({'bindTemplates': newTemplates})
        return om
In the source code, everything is quite simple.
At the top is a comment describing what the class does.
Then we import a few Zenoss classes used to describe the device model.
The most interesting object is "mibDesc". It maps oids to their corresponding monitoring templates. If a required oid is found when polling the device, the matching monitoring template is bound to that device.
When adapting the example for yourself, be careful not to lose any punctuation.
Next comes the method that performs all the checks and comparisons, preserves all previously bound monitoring templates, and adds its own to the list. The result of this method is an object map (om).
Plugin check
After you have copied, adjusted, and saved the modeler's code, you need to restart Zenoss (the zopectl process in Advanced -> Settings -> Daemons) and check that a new entry named LSIArray has appeared in the list of modeler plugins (in the plugin settings of any device class or of any individual device). Move the plugin to the active list and run modeling for the device.
In the window that appears you should check that the modeler is in the list used for the device. In addition, a compiled LSIArray.pyc file will appear on disk next to the source, which indicates that the script compiled successfully.
If the script does not compile, check that the Python code is valid.
If the script does not appear in the list of active or available modelers, restart Zenoss again.
Once modeling completes, the new monitoring template should be bound to the device. Now you can go and pull a hard drive out of a test server, which will move the array from the "normal" state to "degraded" (provided it is not RAID0, of course). The number of degraded logical volumes will increase, the threshold will be exceeded, and the system will generate an event for us. If you configured alerts, one will arrive.
Conclusion
The method described above lets you achieve several useful results:
- start using automatic monitoring configuration in Zenoss
- start learning how to write your own modeler plugins
- popularize Zenoss as a more flexible and modern monitoring system
I know perfectly well that there are more complete and more "correct" guides to writing your own zenpacks and modelers. Still, I hope this text will nudge many system administrators to start learning Zenoss and using it on their systems.