NOC: Integrated Network Management

Complex networks require an integrated approach to management. If the whole network is a dozen switches looked after by one engineer, then a few simple scripts, several spreadsheets, and any basic monitoring system are enough to keep it running. In larger networks, built from a mix of equipment from different vendors and supported by dozens of engineers scattered across cities and countries, very specific problems begin to emerge: a pile of home-grown scripts becomes unmanageable and unpredictable, integrating the various management systems with each other takes more effort than developing and deploying one from scratch, and so on. It quickly becomes clear that a complex network can only be managed comprehensively.

Back in the early 80s, an ISO committee identified the main components of a network management system. The model was named FCAPS. According to ISO, to manage a network successfully you need to be able to handle faults (F), manage equipment and service configuration (C), collect and process statistics on service consumption (A, accounting), evaluate performance (P), and manage security centrally (S). The past three decades have added nothing fundamentally new, and every network management task still revolves around these main components in one way or another.
Commercial suites of this kind are very expensive and far from flawless, and among open-source systems there was a plain gap, which pushed us toward building our own. The NOC system grew out of our experience building and operating networks, after much trial and error.

It should be noted up front that NOC is not a monitoring system and is not an alternative to Zabbix, Nagios, Cacti, and the like.
Its main task is to automate the daily work of a network operations center.

When developing the system, we proceeded from several prerequisites:

The first prerequisite - there should be a single source of information, and it should be convenient to use.

The second prerequisite is delegation of authority. Data is entered into the system by different people from different departments.

The third prerequisite - there is no point in fussing over each individual piece of hardware, because hardware is expendable. Today it may be a Cisco Catalyst 6500, tomorrow it will be a Force10. To manage the network you need a higher-level interface that is abstracted as far as possible from any specific vendor and model.

The fourth - people make mistakes. A significant share of outages is caused by human error. You need to be able to understand quickly what is going on and what led to the outage. To do this, you must correlate configuration changes, syslog / SNMP events, and more.

Currently, the NOC consists of several modules:

Address Space Management (IPAM) - address space management. The main differences from other solutions are support for independent address spaces in individual VRFs, hierarchical allocation of address blocks, and delegation of authority. For example, you can allocate a block to a city and give the city branch the right to manage it, and then let them do whatever they want within its limits. At the same time, reports let you track how well a particular city's activity complies with the general policies. The subsystem works comfortably with tens of thousands of allocated blocks and hundreds of thousands of addresses, and supports both IPv4 and IPv6.
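To make the idea of hierarchical allocation inside independent VRFs more concrete, here is a minimal sketch using Python's standard ipaddress module. It is not NOC's actual code: the allocate function, the VRF names, and the prefixes are all assumptions made up for illustration.

```python
import ipaddress

def allocate(parent, new_prefix, used):
    """Return the first free subnet of length new_prefix inside parent, or None."""
    parent = ipaddress.ip_network(parent)
    taken = [ipaddress.ip_network(u) for u in used]
    for candidate in parent.subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(t) for t in taken):
            return candidate
    return None

# Hypothetical data: each VRF is an independent address space,
# so the same prefix may be in use in several VRFs at once.
vrfs = {
    "default": ["10.0.0.0/24", "10.0.1.0/24"],
    "customer-a": ["10.0.0.0/24"],
}
city_block = "10.0.0.0/16"  # block delegated to a city branch (assumed)

print(allocate(city_block, 24, vrfs["default"]))     # 10.0.2.0/24
print(allocate(city_block, 24, vrfs["customer-a"]))  # 10.0.1.0/24
```

The same function works for IPv6 blocks, since ipaddress handles both address families.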

DNS Management - once the addresses have been put in order, why not synchronize everything with DNS. The result is a single interface for managing zones and provisioning them to different DNS servers according to the described logic. For example, zone data can be generated automatically, and the zones of a city and of that city's customers are delivered to the servers located in that city. There is no need for slave zones, and you can easily migrate to other DNS servers. Along the way, registrar databases are monitored to see when a specific domain is about to expire.
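Automatic generation of zone data from address records can be sketched as follows. This is only an illustration under assumptions: the ptr_records function and the host names are invented here, and the output is a simplified zone-file fragment rather than whatever NOC actually produces.

```python
import ipaddress

def ptr_records(hosts):
    """Build PTR lines from an {address: fqdn} mapping (IPv4 or IPv6)."""
    lines = []
    for addr, fqdn in sorted(hosts.items()):
        # reverse_pointer yields the in-addr.arpa / ip6.arpa owner name
        name = ipaddress.ip_address(addr).reverse_pointer
        lines.append(f"{name}. IN PTR {fqdn}.")
    return lines

# Hypothetical address data, as it might come out of an IPAM database
hosts = {
    "10.0.2.1": "gw.city1.example.net",
    "10.0.2.10": "ns1.city1.example.net",
}

for line in ptr_records(hosts):
    print(line)
```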

Service Activation - an interface for working with equipment. A wide range of equipment is supported. The basic idea is a set of building-block scripts that share a common interface, each performing one action and completely hiding the peculiarities of a particular piece of hardware. Examples: get the config, get the software version, create a VLAN, and so on. The resulting blocks can be combined in different ways to solve a very wide range of tasks. There is also a map / reduce task mechanism, which lets you run the same action on a large number of devices and analyze the results.
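The combination of vendor-neutral scripts and map / reduce tasks might look roughly like this. The sketch is an assumption from start to finish: the profile classes, the get_version method, and the canned CLI output are hypothetical and stand in for real device sessions; NOC's own script interface is not shown here.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

class VendorAProfile:
    """Hypothetical profile: parses 'Software, Version X' style output."""
    def get_version(self, raw):
        return raw.split("Version ")[1].split()[0]

class VendorBProfile:
    """Hypothetical profile: parses 'version=X' style output."""
    def get_version(self, raw):
        return raw.split("=")[1].strip()

# Hypothetical inventory: name -> (profile, canned output instead of a live session)
DEVICES = {
    "sw1": (VendorAProfile(), "Software, Version 12.2(55)SE"),
    "sw2": (VendorAProfile(), "Software, Version 12.2(55)SE"),
    "sw3": (VendorBProfile(), "version=8.4.2"),
}

def map_task(name):
    """Map step: run the same abstract action on one device."""
    profile, raw = DEVICES[name]
    return profile.get_version(raw)

def reduce_task(results):
    """Reduce step: summarize how many devices run each version."""
    return Counter(results)

with ThreadPoolExecutor(max_workers=4) as pool:
    versions = list(pool.map(map_task, DEVICES))

summary = reduce_task(versions)
print(summary)
```

The caller never touches vendor-specific parsing; adding a new vendor means adding one more profile with the same method names.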

Configuration Management - tracks what changed, where, and when. It began as an interface to Mercurial, but the module's functionality has since grown well beyond that. In particular, if a switch's configuration changes in some city, a message is sent to the engineer responsible for that city. In many cases he will have time to react quickly to local improvisation, reprimand whoever is responsible, and prevent an outage. The system can check collected configs for compliance with established policies and can take active measures in suspicious situations.
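As a rough illustration of change tracking and policy checks, the sketch below diffs two config snapshots and flags lines that violate a policy. The function names and the single forbidden-line rule are assumptions for the example; they do not describe NOC's real policy engine.

```python
import difflib

def config_diff(old, new):
    """Return a unified diff between two config snapshots as a list of lines."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(), "previous", "current", lineterm=""
    ))

# Hypothetical policy: config lines that must never appear
FORBIDDEN = ("snmp-server community public",)

def policy_violations(config):
    """Return config lines that start with any forbidden prefix."""
    return [line for line in config.splitlines()
            if line.strip().startswith(FORBIDDEN)]

old = "hostname sw1\n"
new = "hostname sw1\nsnmp-server community public RO\n"

for line in config_diff(old, new):
    print(line)
print(policy_violations(new))
```

In a real deployment the diff would be what goes into the notification to the responsible engineer, and a non-empty violation list would trigger further action.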

VC Management - VLAN management. When fully deployed, it is enough to add and delete VLANs in the database, and they will automatically appear on the necessary switches, regardless of the vendor. For example, one installation had to manage, all at once, a pile of Cisco Catalyst 6500 and 4500, Nexus, 3750 / CBS3120, Force10 E-, C-, and S-series, HP ProCurve and GbE2c, and small Alcatel switches.
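The "edit the database, let the switches follow" approach amounts to comparing the desired VLAN set with what each switch actually has. A minimal sketch, assuming hypothetical vendor names and deliberately simplified command strings (real syntax differs per vendor and software release):

```python
def plan_changes(desired, actual):
    """Return (to_add, to_remove) sorted VLAN id lists for one switch."""
    return sorted(desired - actual), sorted(actual - desired)

# Hypothetical renderers that turn a plan into vendor-specific commands
RENDERERS = {
    "vendor_a": lambda add, rm: [f"vlan {v}" for v in add]
                                + [f"no vlan {v}" for v in rm],
    "vendor_b": lambda add, rm: [f"create vlan {v}" for v in add]
                                + [f"delete vlan {v}" for v in rm],
}

desired = {10, 20, 30}  # VLANs recorded in the database (assumed)
actual = {10, 40}       # VLANs found on the switch (assumed)

to_add, to_remove = plan_changes(desired, actual)
print(RENDERERS["vendor_a"](to_add, to_remove))
print(RENDERERS["vendor_b"](to_add, to_remove))
```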

Fault Management - collecting, analyzing, and correlating syslog / SNMP trap events from hardware. NOC takes an original and flexible approach to event handling. FM is a topic for a separate conversation; suffice it to say that there are simply no sane open-source implementations, and the sane commercial ones can be counted on the fingers of one hand. The current FM implementation in NOC can process hundreds of events per second and pick out anomalies and emergencies among them. The correlator finds the connections between alarms and tries to establish the root cause. For example, a single dropped link can generate hundreds of alarms of various types in different parts of the network. Using its knowledge of the network topology and a built-in set of rules, the correlator can establish that the true cause of all those alarms is the failed link and clearly indicate where to look.
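Topology-based root cause analysis can be illustrated with a deliberately simple rule: an alarm is a root cause only if nothing upstream of the device is alarmed too. The tree topology and the root_causes function below are assumptions for the example; NOC's correlator uses its own rule set, which is not reproduced here.

```python
# Hypothetical topology: each device -> its upstream neighbor (None for the core)
UPLINK = {
    "core": None,
    "agg1": "core",
    "acc1": "agg1",
    "acc2": "agg1",
    "acc3": "core",
}

def root_causes(alarmed):
    """Keep only alarmed devices with no other alarmed device upstream."""
    roots = set()
    for dev in alarmed:
        parent = UPLINK[dev]
        # Walk toward the core until we meet another alarmed device
        while parent is not None and parent not in alarmed:
            parent = UPLINK[parent]
        if parent is None:
            roots.add(dev)
    return roots

# agg1 lost its link, so acc1 and acc2 behind it raise alarms as well
print(root_causes({"agg1", "acc1", "acc2"}))  # {'agg1'}
```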

Peering Management - everything related to peering and BGP. It lets you store a peer database, generate filters for BGP, update the RIPE database, and much more. Once the number of peers reaches the dozens or hundreds, it becomes indispensable.

Knowledge Base - an ordinary built-in wiki with a set of additional, interesting macros. For example, with the rack macro you can draw a row of racks. In the KB you can store instructions, certificates, agreements, rules and policies, useful recipes, and so on.

Performance Management - active collection of performance parameters (including via SNMP). The module is quite interesting and will be actively refined.

Inventory - a common database for physical hardware. It lets you work with objects at different levels, from a city and a communication center down to a rack and a switch's power cord. The module is in active development.

To sum up, NOC is first and foremost a highly specialized tool for managing complex networks. Viewed outside of that context, it is easy to end up like the three blind men who felt an elephant: the first took it for a hose, the second for a tree, and the third for a rope.

NOC is open-source, distributed under the BSD license, and has been used successfully for several years in a number of large Russian and foreign networks. The main programming language is Python. A combination of PostgreSQL and MongoDB serves as the database. The web interface is built on Django. We invite competent specialists to take part in the project; there are plenty of interesting areas of work for bright minds.

Project website: http://redmine.nocproject.org
IRC: #nocproject.org on irc.freenode.net

Source: https://habr.com/ru/post/125034/
