
Inperfo - minimal network monitoring


Inperfo is intended primarily for monitoring network interfaces on switches and routers. You can also monitor interfaces directly on servers, for example when you have no access to the network equipment but do have dozens or hundreds of leased servers (physical or virtual).


The service is designed to monitor physical Ethernet interfaces. This, by the way, lets you see traffic imbalances between Port Channel members caused by misconfiguration or other problems. If you have other tasks and need to monitor other types of interfaces, look at Observium and the like (LibreNMS, NetXMS, etc) or at SolarWinds NPM.
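As an illustration of the Port Channel imbalance mentioned above, here is a minimal Python sketch. The function name, the member names and the traffic figures are hypothetical; the article does not show how Inperfo itself presents this, so treat it only as one way to put a number on "imbalance".

```python
def imbalance_ratio(member_bps):
    """Return max/mean load across Port Channel members (1.0 means perfectly even)."""
    loads = list(member_bps.values())
    if not loads:
        raise ValueError("no member interfaces given")
    mean = sum(loads) / len(loads)
    if mean == 0:
        return 1.0  # idle bundle: treat as balanced
    return max(loads) / mean


# Hypothetical peak loads in bit/s for two members of one Port Channel
members = {"Gi1/0/1": 900_000_000, "Gi1/0/2": 100_000_000}
ratio = imbalance_ratio(members)
print(f"imbalance ratio: {ratio:.2f}")
```

A ratio well above 1.0 means one physical link carries most of the traffic, which is the kind of problem that only becomes visible when the physical members are monitored individually.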


The main purpose of creating the service was to see tops by interface load and errors, so that all problematic or potentially problematic spots are in plain view. Building such a dynamic screen in Zabbix or Cacti is problematic, to put it mildly. In addition, I wanted the tops to cover a week of history rather than the current moment. A week is the best window for many networks, where the load rises after the weekend and falls towards the end of the week or, vice versa, peaks at the weekend and is lower on weekdays.


The second, no less important goal is a minimalistic interface without unnecessary frills, complexity and decoration. It will not be to everyone's taste, admittedly.


The service solves three problems


  1. Shows network bottlenecks. This could be an access switch's uplink to the distribution layer that is overloaded at peak times, or an overloaded port to a backup or memcached server. You can mark individual interfaces as internal or external uplinks to view them in a separate top. All graphs are based on maximums: the weekly, monthly and yearly graphs are not averaged.
  2. Shows errors on interfaces, grouping the data by host and data center for convenience and displaying a top of errors across all devices. Errors in the host and interface tables are totals per week; on the graphs they are errors per second.
  3. Tracks the number of free and busy ports on the switches, which helps you plan network expansion in time.

And of course, it automatically tracks changes: renamed interfaces, descriptions (ifDescr), interface status changes, and so on. The only manual work is adding new devices or servers to the agent's config. Auto discovery will be added eventually, but not yet.


Inperfo is definitely not suitable for you if you need:


- CPU / Memory monitoring
- Monitoring of HDDs, temperature and other non-Ethernet things
- Ability to draw maps and network diagrams


System components


The service consists of two components: a server and an agent. The agent collects SNMP data about the interfaces and sends it to the server. The server processes the received data (sorts it, updates the RRD files, and so on) and displays it in the web interface.


How the server works


The server is a Docker container with a "standard" set of software: nginx / php-fpm / memcached / mysql / rrdtool. The server expects agents to send data every 5 minutes. The data is stored in the database: a weekly history of interface load and errors is kept, from which the 95th percentile and the tops by max / avg are calculated. This lets you "see" the network in different "slices": without rare bursts or, conversely, when the bursts are exactly what you need to look at.
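To make the difference between these "slices" concrete, here is a minimal Python sketch that computes max, avg and the 95th percentile over one week of 5-minute samples. It assumes the nearest-rank percentile method and uses made-up load figures; the article does not say which percentile method Inperfo uses, and the function names are hypothetical.

```python
import math


def percentile_95(samples):
    """Nearest-rank 95th percentile: the value below which 95% of samples fall."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def summarize(samples):
    """Return the three figures a weekly top could be sorted by."""
    return {
        "max": max(samples),
        "avg": sum(samples) / len(samples),
        "p95": percentile_95(samples),
    }


SAMPLES_PER_WEEK = 7 * 24 * 12  # one sample every 5 minutes

# Made-up week: steady 100 Mbit/s with ten short bursts to 950 Mbit/s
week = [100.0] * (SAMPLES_PER_WEEK - 10) + [950.0] * 10
stats = summarize(week)
print(stats)
```

With this data the maximum reports the bursts, while the 95th percentile ignores them because they make up less than 5% of the week. That is the "with bursts" versus "without rare bursts" view described above.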


Container data is stored on the host system, which simplifies updates, backups and moving the server to other hosts. You can upgrade with a single command (the idea is borrowed from Docker, see https://get.docker.com )


How the agent works


The agent is also a Docker container, in which cron launches the agent every 5 minutes to collect SNMP data about interfaces from network devices or servers. For now, the polling interval cannot be changed.


The agent supports two versions of SNMP: v2 and v3.


The agent's configuration, logs and sent data are stored on the host system. This makes it easy to edit the configs and to move the agent to other hosts if necessary.
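SNMP interface counters are cumulative, so per-second figures such as the errors per second on the graphs come from the difference between two consecutive polls divided by the polling interval. The article does not say where Inperfo does this conversion (rrdtool can do it on its own), so the Python sketch below only shows the arithmetic; the function name and the sample readings are hypothetical.

```python
def per_second_rate(prev, curr, interval_s=300, counter_bits=64):
    """Rate from two readings of a cumulative counter, allowing one wrap-around."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = curr - prev
    if delta < 0:
        # The counter overflowed between the two polls and restarted from zero
        delta += 2 ** counter_bits
    return delta / interval_s


# Two polls 5 minutes apart: 600 new errors gives 2 errors per second
print(per_second_rate(1000, 1600))

# A 32-bit counter that wrapped between polls
print(per_second_rate(2 ** 32 - 100, 200, counter_bits=32))
```

A single wrap is all that can be recovered from two readings; a device reboot that resets the counter looks the same as a wrap, so a real implementation would also need to check the device uptime.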


Usage scenarios


Monitor network equipment


Ideally, you install and configure one agent in each data center or remote office, so that the agent polls the devices inside the data center locally via SNMP and sends the collected data to a central server, which can be located in one of the data centers or anywhere in the cloud (Amazon, DigitalOcean, Azure, etc).


If you have a single data center, or “fast” links to your other DCs, it is enough to install the server and the agent on the same Linux machine and poll all network devices from it. For example, on the machine where you already run Cacti (if you have one :), so that you do not need to configure SNMP access on the network equipment again.


The main "minus" of this scheme: you need SNMP access to the network equipment.


Monitor network interfaces on servers


To monitor network interfaces on servers, you need to install an SNMP daemon on each of them, for example via an Ansible playbook. To the agent, each Linux server then looks like a separate network device with one or more network connections.


Pros:



Mixed mode


Everything is straightforward here: you can monitor switches / routers and servers together. The agent does not distinguish between device types and takes interface information from the standard MIB-2 tree. This, by the way, is another drawback: if a device exposes its interface information through "non-standard" MIBs (for example, BTI 7000), then Inperfo will not work for you at the moment.


Performance


It runs comfortably on "mid-range" hardware (16 CPU / 16 GB) with up to 100 devices (6000+ ports); there has been no opportunity to run it and watch it work at a larger scale. But since the agent creates a separate process (fork) to poll each device, this piece of code is practically begging to be rewritten in Go with goroutines. The server works the same way when receiving data.
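The idea behind replacing one forked process per device with lightweight concurrent tasks can be sketched as follows. The planned rewrite is in Go with goroutines; this sketch uses a Python thread pool instead, purely as an analogy. `poll_device`, the host names and `max_workers` are hypothetical stand-ins, and no real SNMP query is made.

```python
from concurrent.futures import ThreadPoolExecutor


def poll_device(host):
    """Hypothetical stand-in for an SNMP poll; a real agent would query the device here."""
    return {"host": host, "interfaces": []}


def poll_all(hosts, max_workers=32):
    """Poll every host concurrently with threads instead of one forked process per device."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the input hosts
        return list(pool.map(poll_device, hosts))


results = poll_all(["sw1.example", "sw2.example", "rt1.example"])
print([r["host"] for r in results])
```

Polling is mostly waiting on the network, so a bounded pool of cheap workers keeps memory use flat as the device count grows, whereas a full process per device does not.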


What will be added in the next versions


- Specifying the maximum speed for an interface. This is needed when you are connected to a provider via a 1 Gbit/s link, but the paid channel is actually smaller, for example limited to 500 Mbit/s.
- Weekly reports on tops by mail.
- Notifications by mail.
- A separate build: a single Docker container with both the server and the agent. For smaller networks, this is ideal. Plus, it will be possible to add hosts via the web interface.
- Tops / graphs by packets.
- Search by device name, by interface, by description and by alias.
- Rewriting the agent and part of the server in Go.
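The first item in the list above, a maximum speed per interface, matters because utilization is calculated against capacity. The feature is not implemented yet, so the following Python sketch is only an assumption about how such an override could work; the function and parameter names are hypothetical.

```python
def utilization_percent(rate_bps, if_speed_bps, paid_speed_bps=None):
    """Utilization against the paid speed if one is set, else against the link speed."""
    capacity = paid_speed_bps if paid_speed_bps is not None else if_speed_bps
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * rate_bps / capacity


GBIT = 1_000_000_000
MBIT = 1_000_000

# 400 Mbit/s on a 1 Gbit/s port looks comfortable...
print(utilization_percent(400 * MBIT, 1 * GBIT))

# ...but is already 80% of a channel limited to 500 Mbit/s
print(utilization_percent(400 * MBIT, 1 * GBIT, paid_speed_bps=500 * MBIT))
```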


At the moment the service does not send any alerts, but you can use the /export/ URI to import the data into, say, Zabbix and receive notifications from there. The service is still a little raw, but it does its job.


Install & enjoy.



Source: https://habr.com/ru/post/330188/

