My job begins when a report of a communication node failure "lands" on me ...

A word of warning, so that you do not waste your time: this is a continuation of my thoughts on maintaining a city-level L2 network. Read on to decide whether it is relevant to you.

Network Administrator:

My work begins when I receive a report that a communication node has failed.

To understand what actions are required of me, I first need to figure out what I am working with.

For the physical organization of a communication network, the TIA/EIA-606 standard, "Administration Standard for the Telecommunications Infrastructure of Commercial Buildings", has long existed (GOST R 53246-2008 is an analogue of the American standard). Architecturally, the network comprises three levels:

access level;
aggregation level (distribution layer);
core level.

The logical organization of what we have:

[Figure: graphical representation of the existing network]

Now consider what data we have:

A management network, placed in a separate VLAN, through which a managed switch can be reached over IP (VLAN 100 in the figure, untagged on the uplink);
The ARP protocol, which lets us go down to L2 and see the MAC addresses;
Switching tables, which show which MAC addresses live on which ports.

Moreover, the second and third items are not documented, because this information changes constantly and is spread across the whole network. In other words, we have a problem: when a node fails, or a switching table is corrupted during a flood, finding the point (port) of failure becomes harder. The search itself consists of working with the documentation and polling the switches sequentially, along the chain.
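The sequential poll along the chain can be sketched as a walk over switching tables. This is only an illustration, assuming the tables have already been collected into plain dictionaries; the switch names, port numbers, the `links` map, and the `trace_mac` helper are all hypothetical and not part of the real network.

```python
# Hypothetical, already collected data: switch name -> {MAC: port}.
tables = {
    "core": {"aa:bb:cc:00:00:01": 1, "aa:bb:cc:00:00:02": 2},
    "agg-1": {"aa:bb:cc:00:00:01": 5},
    "acc-7": {"aa:bb:cc:00:00:01": 13},
}
# Hypothetical topology: (switch, port) -> switch connected behind that port.
links = {("core", 1): "agg-1", ("agg-1", 5): "acc-7"}


def trace_mac(mac, start, tables, links):
    """Follow a MAC from `start` down the chain; return the (switch, port) hops."""
    path = []
    seen = set()
    switch = start
    while switch is not None and switch not in seen:
        seen.add(switch)  # guard against loops in the assumed topology
        port = tables.get(switch, {}).get(mac)
        if port is None:
            break  # the MAC is unknown here: the chain breaks at this switch
        path.append((switch, port))
        switch = links.get((switch, port))  # None means an edge (access) port
    return path


path = trace_mac("aa:bb:cc:00:00:01", "core", tables, links)
print(path)
```

The last hop in the returned path is the deepest switch that still knows the MAC, which is where the manual search would otherwise end up.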

Maintaining per-port documentation is laborious, because the technical service "forgets" to report the port switching work it has done by entering the information into the monitoring system. In other words, the human factor shows its negative side.

Based on the above, we should think about how to automate the collection and storage of information about the structure of a working network, as well as to understand how to perform a quick comparison of the “broken” branch and the last working snapshot.
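A quick comparison of the "broken" branch with the last working snapshot could look roughly like this. The sketch assumes a snapshot is a dictionary mapping a MAC address to a `(switch, port)` pair; that layout, the sample data, and the `diff_snapshots` name are assumptions made for illustration only.

```python
def diff_snapshots(last_good, current):
    """Compare two {mac: (switch, port)} snapshots.

    Returns three dicts: MACs that disappeared, MACs that appeared,
    and MACs that are now seen on a different switch or port.
    """
    missing = {m: last_good[m] for m in last_good.keys() - current.keys()}
    appeared = {m: current[m] for m in current.keys() - last_good.keys()}
    moved = {
        m: (last_good[m], current[m])
        for m in last_good.keys() & current.keys()
        if last_good[m] != current[m]
    }
    return missing, appeared, moved


# Hypothetical snapshots.
last_good = {
    "aa:bb:cc:00:00:01": ("acc-7", 13),
    "aa:bb:cc:00:00:02": ("acc-7", 14),
    "aa:bb:cc:00:00:03": ("acc-8", 2),
}
current = {
    "aa:bb:cc:00:00:01": ("acc-7", 13),
    "aa:bb:cc:00:00:03": ("acc-8", 5),
    "aa:bb:cc:00:00:04": ("acc-8", 6),
}

missing, appeared, moved = diff_snapshots(last_good, current)
print(missing, appeared, moved)
```

If all the missing MACs sit behind one switch in the last working snapshot, that switch or its uplink is the first place to look.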

If both tasks are accomplished, the total time needed to find the cause of a failure will decrease, and so will the load on the network administrator at critical moments.

As the automation tool I plan to use Python 3, with the netaddr library for working with MAC and IP addresses (I originally intended to use ipaddress, but netaddr gives more freedom of action). The scripts will run on the server where zabbix-proxy is deployed; the main reason for choosing this already working server is that one of its interfaces sits in the management VLAN.

Step one: identifying the "alive" hosts

The code is inside, CAUTION! May cause eye irritation.

import datetime
import json
import re
import subprocess

from netaddr import EUI, IPAddress, IPNetwork
from netaddr.core import AddrFormatError

# MAC address written as six colon-separated hex pairs, as printed by `arp -n`.
MAC_RE = re.compile(r'(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}')


class AliveHost:
    """A host that answered `ping ip` and has an entry in `arp -n ip`."""

    def __init__(self, address, mac):
        self.ip = IPAddress(address)
        self.mac = EUI(mac)
        self.current_time = datetime.datetime.now()

    def __repr__(self):
        return "{} - {}".format(self.ip, self.mac)

    def __hash__(self):
        # TODO: is ip+mac the right basis for the hash?
        return hash((self.ip, self.mac))

    @staticmethod
    def _str_class(o):
        # TODO: helper for to_json; turns every attribute into a string
        return {attr: str(value) for attr, value in o.__dict__.items()}

    def to_json(self):
        return json.dumps(self, default=self._str_class, sort_keys=True, indent=4)


def ping(host):
    """Return (ip, mac) if the host answers one ping and is in the ARP cache, else None."""
    h = str(host)
    # The subprocess module is described in PEP 324.
    # ping exits with status 0 only if a reply was received.
    reply = subprocess.run(["ping", "-c", "1", h],
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if reply.returncode != 0:
        return None
    arp = subprocess.run(["arp", "-n", h], stdout=subprocess.PIPE, text=True)
    found = MAC_RE.search(arp.stdout)
    if found is None:
        return None
    return h, found.group(0)


if __name__ == '__main__':
    alive = []
    for ip in IPNetwork('10.0.0.0/22').iter_hosts():
        result = ping(ip)
        if result is None:
            continue  # no reply, or no MAC in the ARP cache
        try:
            alive.append(AliveHost(*result))
        except AddrFormatError:
            continue

By executing this code, we get a list of objects that already contain a part of the information necessary for building a tree.

The next step is to collect information using the telnetlib library and to form records roughly like these: (alive[0], port_13), (alive[1], alive[2], alive[3]).
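How such records might be formed from a switching table can be sketched without touching the network. The dump format below is hypothetical (the real column layout depends on the switch vendor), and the `macs_by_port` helper is an illustration, not the final collector. Note also that telnetlib was deprecated in Python 3.11 and removed in 3.13, so the collection step itself may need a different transport on newer interpreters.

```python
import re
from collections import defaultdict

# Hypothetical MAC-table dump: VLAN, MAC, entry type, port.
SAMPLE = """\
100  aa:bb:cc:00:00:01  dynamic  13
100  AA:BB:CC:00:00:02  dynamic  24
100  aa:bb:cc:00:00:03  dynamic  24
Total entries: 3
"""

ROW = re.compile(
    r"^\s*(\d+)\s+((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})\s+\S+\s+(\d+)\s*$"
)


def macs_by_port(dump):
    """Group the MAC addresses found in a table dump by port number."""
    ports = defaultdict(list)
    for line in dump.splitlines():
        row = ROW.match(line)
        if row is None:
            continue  # header, footer, or a line in an unexpected format
        _vlan, mac, port = row.groups()
        ports[int(port)].append(mac.lower())
    return dict(ports)


ports = macs_by_port(SAMPLE)
print(ports)
```

A port with a single MAC most likely leads to an end host, while a port with several MACs leads to another switch further down the chain.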

I am deliberately not discussing data serialization for now, since the amount of information collected will be small. In particular, the data for a /24 network took up 24 KB of memory, so there is no point in rushing.

But I would like to hear your opinion right now: is this reasoning sound, and are posts like this worth publishing?

Source: https://habr.com/ru/post/426403/
