Centralized logging from network equipment consoles via ssh

There are standard, generally accepted ways to collect logs from network equipment: SNMP, syslog and the like. They usually work well, but sometimes something more is needed.

Imagine the following scenario: a network device drops off the network out of the blue and comes back a few minutes later. “show version” reports a reload caused by a crash, which could be due to any of thousands of causes in hundreds of OS components. There is no crashinfo file. The syslog server received no messages from the device just before the crash. The device is covered by a service contract, but TAC cannot reproduce the problem in its lab, and the information the customer can provide is too scant to pin down the cause. It is not even clear whether the crash was caused by a software or a hardware failure. You can replace the device, but that does not help if the cause is in software. Switch to another OS version? Which one? Nobody knows which bug caused the crash or whether it is fixed in the new version, and the new version may bring new bugs. In the course of the discussion, the TAC engineer mentions that just before the crash, when the device had already lost network connectivity, it probably printed a message to the console saying which subsystem had failed and why. Of course, you already have a terminal server, but it is used only for emergency access to devices, and it ignores everything that arrives from the console ports of the monitored hardware. We need to collect those messages somehow. That is what we will do.

A short note. Everything proposed below is meant only as a supplement to traditional monitoring tools (primarily the usual SNMP/syslog on the devices) and is intended to make it easier to investigate the causes of failures (and, as a side benefit, to collect reload logs automatically). And I hope the data collected this way will never be needed.

As a starting point we take the article “Terminal server based on Cisco router”. The first steps described there are correct. However, it does not cover actually logging the messages, and it connects over telnet, which is of course unacceptable. You can consider this article a continuation of that one.
The SSH client, which doubles as the log collector, will be a separate *NIX server; in my case, CentOS 6.
First, let's set up access to the terminal server itself. The standard Linux ssh client cannot enter a password automatically when called from a script (it can be forced to, but only in very ugly ways), and we do not need that anyway: we will use RSA key authentication instead.

1) Upgrade the router acting as the terminal server to IOS 15 (with support for the “SSHv2 Enhancements for RSA Keys” feature; at first glance, on IOS router platforms every k9 image of the 15.x train supports it).
2) Generate an RSA key pair for SSH sessions on the server. Leave the passphrase empty: just press Enter at the “Enter passphrase” prompts.

    [root@centos ~]# ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    1d:60:fe:72:b5:8c:1e:b5:5e:d1:3c:9c:67:15:9c:59 root@centos
    The key's randomart image is:
    +--[ RSA 2048]----+
    | o ..E|
    | o . .*o|
    | . . o .+=|
    | o * o oo|
    | S * + . |
    | + o . |
    | . . |
    | |
    | |
    +-----------------+

3) Display the RSA public key:

    [root@centos ~]# cat /root/.ssh/id_rsa.pub
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA0w2L4YVD/V303ccFatgtJxcS+JMYlPkmyufW36fUCogGjzWLbtMZGYoAW8vgy
    bVgN6r7lcbrbpF6oW9beGfHIWTBfUT898sUQL9jOOki0qvUWzkbej/po6agAK3KK/Z7QCtnAkbDQDb1SzHEmTx9rmboY
    EZosHOchQy+dvHEoBKCOMBrGKpYgdHfImjctKS3Q02TrkTO0+BoIFc2V32R9AukWFp7+ppGy2ZdoxLv5eEjlhcHukbM
    yKg9Kjc72/dPNbNkvLXcWKVnkebTmTJIQQyGU2qsAy2asgPC6D02gy6tZAdqp+0umEF2gLXlq2G1p3kn+AojH8bWvYBwyL2s6Q== root@centos

4) If the router is not yet configured as an SSHv2 server, fix that annoying oversight in the usual way.
5) Copy the server's public key to the router. Important: the line is long, so paste it in two or three parts, pressing Enter between them, and check each time that no characters were lost.

    termserver(config)#ip ssh pubkey-chain
    ! the user name the log collector will log in as
    termserver(conf-ssh-pubkey)#username consoleuser
    termserver(conf-ssh-pubkey-user)#key-string
    ! paste the contents of id_rsa.pub in several pieces
    termserver(conf-ssh-pubkey-data)# ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQE…
    termserver(conf-ssh-pubkey-data)#...BwyL2s6Q== root@centos
    ! leaving key-string mode stores the entered key
    termserver(conf-ssh-pubkey-data)#exit
    termserver(conf-ssh-pubkey-user)#exit
    termserver(conf-ssh-pubkey)#
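Splitting the key by hand invites typos. As a sketch, assuming bash and GNU coreutils on the collector, the key file can be printed in fixed-width pieces that are pasted one after another at the key-string prompt. The function name print_key_chunks and the default width of 72 characters are my own choices, not anything IOS requires. (If you regenerate the key, ssh-keygen -N "" -f /root/.ssh/id_rsa also skips the interactive prompts.)

```shell
#!/bin/bash
# Hypothetical helper: print a public key file in fixed-width chunks so each
# piece can be pasted separately in key-string mode on the router.
# Assumption: 72 characters per chunk; adjust to what your terminal pastes reliably.
print_key_chunks() {
    local keyfile=${1:-/root/.ssh/id_rsa.pub}
    local width=${2:-72}
    # tr removes the trailing newline so the key is treated as one string;
    # fold -w then cuts it into pieces of at most $width characters.
    tr -d '\n' < "$keyfile" | fold -w "$width"
    echo
}
```

For example, print_key_chunks /root/.ssh/id_rsa.pub 100 prints the key in lines of at most 100 characters; joined back together they are exactly the original key line.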

Check if RSA authentication works:

    [root@centos ~]# ssh consoleuser@10.10.0.10
    termserver>

Next, check that you can connect to a console port. First, allow SSH on all lines (this also blocks telnet on them, if it was allowed). On my terminal server the lines are numbered from 1/0 to 1/15:

    termserver(config)#line 1/0 1/15
    termserver(config-line)#transport input ssh

Now the connection itself. Look up the line number in the second column of “show line” (say it is 75) and run:

    [root@centos ~]# ssh consoleuser:75@10.10.0.10

Press Enter once more and you will see:

    Username:

This prompt already comes from the device whose console is connected to line 75 of the terminal server. Log in to check:

    Username: admin
    Password:
    router1>exit
    router1 con0 is now available
    Press RETURN to get started.
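With dozens of lines, reading the line number off “show line” by eye gets tedious. The function below is a hypothetical sketch (the name line_for_port is made up) that takes saved “show line” output on stdin and prints the absolute line number for a port such as 1/9. It assumes the column layout of my terminal server, where busy lines are prefixed with an asterisk; other platforms may format the output differently.

```shell
#!/bin/bash
# Hypothetical helper: print the absolute line number (the column after the
# port) for a port such as 1/9, reading "show line" output on stdin.
# Busy lines start with "*", which shifts the columns by one, so skip it first.
line_for_port() {
    awk -v port="$1" '{
        f = ($1 == "*") ? 2 : 1
        if ($f == port) { print $(f + 1); exit }
    }'
}
```

For example, with the “show line” output saved to showline.txt, line_for_port 1/9 < showline.txt prints 75 for the terminal server shown in this article.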

Great, SSH access to the console works. What remains is to make the monitored equipment send log messages to the console. There is a catch: many people configure “no logging console”, and in general that makes sense, because the console must not be flooded with messages, which can also get in the way of work. For our purposes, however, it will not do. So, first of all, raise the line speed on both sides:

    router1(config)#line console 0
    router1(config-line)#speed 115200

    termserver(config)#line 1/0 1/15
    termserver(config-line)#speed 115200

Here the speed is set to 115200. In my experience this is a fairly reliable value (and much faster than the default 9600), but you still need to check that large blocks of text arrive without garbled characters.

Next, decide which severity levels to send to the console with the “logging console X” command, where X is a level from 0 to 7. Levels 6 and 7 must not be enabled under any circumstances: they add informational messages (mostly useless), and there can be a great many of them, especially at level 7, the debugging level, which should go only to the buffer (logging buffered). Levels 5 and 4 usually work, but you need to check how many messages of those levels end up in the buffer. For example, “%ASA-4-106023” messages report packets denied by ASA firewalls; there can be a huge number of them, and we do not need to push them to the console. It may make sense to change the severity of individual syslog messages on the device itself. We definitely want every message of severity 0 to 3, and interface up/down events do not hurt (although for a switch with hundreds of ports this is debatable). In general, there is room for thought.
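To judge whether levels 4 and 5 are acceptable, one option is to count how many messages of each severity a device already produces, for example in a saved copy of its “show logging” buffer. A sketch, assuming the usual %FACILITY-SEVERITY-MNEMONIC format (as in %ASA-4-106023); the function name count_by_severity is hypothetical, and messages that carry an extra sub-facility field are not counted.

```shell
#!/bin/bash
# Hypothetical helper: count Cisco-style messages (%FACILITY-SEVERITY-MNEMONIC)
# per severity level in the given files, or on stdin if no file is given.
# Output: one "severity count" pair per line, sorted by severity.
count_by_severity() {
    grep -ohE '%[A-Z0-9_]+-[0-7]-[A-Z0-9_]+' "$@" \
        | cut -d- -f2 \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}
```

For example, count_by_severity asa-buffer.txt shows at a glance whether level-4 messages such as 106023 would dominate the console.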

There may be several terminal servers, each with dozens of lines. Moreover, any router, even one without an HWIC-16A, is a one-port terminal server (via its AUX port). At this point we have configured SSH access to the consoles and logging to the console, but nothing is being recorded yet. So we start writing scripts, designed so that adding a new console is quick and painless.

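First, a script that parses the list of hosts and lines and starts the connections. Let's call it startcon.sh:

    #!/bin/bash
    # Directory where the per-device console logs are written
    LOGFOLDER="/root/logs/"
    # List of consoles to monitor, one "line,terminal-server-ip,device-name" entry per line
    LIST="/root/collectconsole/consolelist.txt"
    # Directory containing connectcon.sh
    LOCATION="/root/collectconsole"
    # Pick out entries such as "75,10.10.0.10,router7": the first field is the line
    # number on the terminal server, the second is the terminal server address,
    # the third is the device name, which also becomes the log file name
    for i in $(grep -Eo "[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3},[a-zA-Z0-9-]+" "$LIST"); do
        # Arguments for connectcon.sh: line number and terminal server address
        ARGS="$(echo "$i" | cut -f 1 -d ",") $(echo "$i" | cut -f 2 -d ",")"
        # Start a collector only if none is running for this line yet
        # (anchored at the end so that 10.10.0.1 does not match 10.10.0.10)
        if ! ps ax | grep -v grep | grep -q "connectcon.sh $ARGS\$"; then
            "$LOCATION"/connectcon.sh $ARGS >> "$LOGFOLDER$(echo "$i" | cut -f 3 -d ",").log" &
        fi
    done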

Create consolelist.txt with the line numbers (the second column of “show line” on the terminal server), the addresses of the terminal servers and the names of the devices connected to those lines:

    nano consolelist.txt

    66,10.10.0.10,router1
    70,10.10.0.10,router2
    71,10.10.0.10,router3
    74,10.10.0.10,router4
    67,10.10.0.10,router5
    72,10.10.0.10,router6
    75,10.10.0.10,router7
    76,10.10.0.10,router8
    79,10.10.0.10,router9
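Because startcon.sh picks entries out with a regular expression, a malformed line (a typo in the address, a semicolon instead of a comma) is silently skipped or cut short. A small hypothetical check, check_consolelist, reuses the same pattern but anchors it to whole lines; it prints offending lines with their line numbers and returns non-zero if it finds any. Blank lines are ignored.

```shell
#!/bin/bash
# Hypothetical helper: report consolelist.txt lines that do not have the exact
# "line,ip,name" form expected by startcon.sh.
# Returns 1 (after printing "lineno:content" for each bad line) or 0 if all is well.
check_consolelist() {
    local list=${1:-/root/collectconsole/consolelist.txt}
    # -n adds line numbers, -v prints non-matching lines; the optional group
    # lets empty lines match, so they are not reported.
    if grep -nvE '^([0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3},[a-zA-Z0-9-]+)?$' "$list"; then
        return 1
    fi
    return 0
}
```

Running check_consolelist before startcon.sh (or from the same cron job) catches entries that would otherwise never be logged.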

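Next, create the connectcon.sh script. This one is less straightforward. At first I tried to make it an ordinary bash script that calls ssh, but it turned out that ssh running in the background refuses to redirect everything it receives to a file. A solution was found: first install the expect interpreter (on CentOS, “yum install expect”), then create the script:

    #!/usr/bin/expect -f
    # Never time out: the console may stay silent for days
    set timeout -1
    # Arguments: line number on the terminal server and terminal server address
    set line [lindex $argv 0]
    set ipaddr [lindex $argv 1]
    # Connect to the console line and wait for as long as the session lasts.
    # Everything the console prints goes to stdout, which startcon.sh appends to the log.
    # When the session ends (for example, after "clear line"), wait 2 minutes and reconnect.
    while { true } {
        spawn ssh consoleuser:$line@$ipaddr
        expect timeout
        # reap the finished ssh process before spawning a new one
        catch wait
        sleep 120
    }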

As you know, two connections cannot use the same console at once. So what if you need to get onto the console port yourself and do something? The solution is simple: open a normal session to the terminal server and clear the line in question:

    termserver#clear line 75
    [confirm]
     [OK]

This kicks the log collector off the terminal server. The “sleep 120” command in the script gives us 2 minutes to log in ourselves, and the collector will keep knocking on the door every 2 minutes until we leave.

That's it. Run startcon.sh:

    [root@centos collectconsole]# ./startcon.sh
    [root@centos collectconsole]#

Check the processes:

    [root@centos collectconsole]# ps -ef | grep -E "connectcon|ssh"
    …
    root 23151 1 0 14:54 pts/4 00:00:00 /usr/bin/expect -f /root/collectconsole/connectcon.sh 66 10.10.0.10
    root 23152 1 0 14:54 pts/4 00:00:00 /usr/bin/expect -f /root/collectconsole/connectcon.sh 70 10.10.0.10
    root 23153 1 0 14:54 pts/4 00:00:00 /usr/bin/expect -f /root/collectconsole/connectcon.sh 71 10.10.0.10
    root 23154 1 0 14:54 pts/4 00:00:00 /usr/bin/expect -f /root/collectconsole/connectcon.sh 74 10.10.0.10
    root 23155 1 0 14:54 pts/4 00:00:00 /usr/bin/expect -f /root/collectconsole/connectcon.sh 67 10.10.0.10
    root 23156 1 0 14:54 pts/4 00:00:00 /usr/bin/expect -f /root/collectconsole/connectcon.sh 72 10.10.0.10
    root 23157 1 0 14:54 pts/4 00:00:00 /usr/bin/expect -f /root/collectconsole/connectcon.sh 75 10.10.0.10
    root 23158 1 0 14:54 pts/4 00:00:00 /usr/bin/expect -f /root/collectconsole/connectcon.sh 76 10.10.0.10
    root 23159 1 0 14:54 pts/4 00:00:00 /usr/bin/expect -f /root/collectconsole/connectcon.sh 79 10.10.0.10
    root 23239 23155 0 14:54 pts/2 00:00:00 ssh consoleuser:67@10.10.0.10
    root 23240 23156 0 14:54 pts/3 00:00:00 ssh consoleuser:72@10.10.0.10
    root 23242 23158 0 14:54 pts/6 00:00:00 ssh consoleuser:76@10.10.0.10
    root 23243 23159 0 14:54 pts/7 00:00:00 ssh consoleuser:79@10.10.0.10
    root 23244 23153 0 14:54 pts/8 00:00:00 ssh consoleuser:71@10.10.0.10
    root 23247 23152 0 14:54 pts/9 00:00:00 ssh consoleuser:70@10.10.0.10
    root 23248 23154 0 14:54 pts/1 00:00:00 ssh consoleuser:74@10.10.0.10
    root 23255 23151 0 14:54 pts/10 00:00:00 ssh consoleuser:66@10.10.0.10
    root 23341 23157 0 15:09 pts/5 00:00:00 ssh consoleuser:75@10.10.0.10
    …

Check the log directory (router4 has already recorded something; the rest are still silent):

    [root@centos collectconsole]# ls -l /root/logs/
    -rw-r--r-- 1 root root  58 Sep 22 14:54 router1.log
    -rw-r--r-- 1 root root  58 Sep 22 14:54 router2.log
    -rw-r--r-- 1 root root  58 Sep 22 14:54 router3.log
    -rw-r--r-- 1 root root 115 Sep 22 14:57 router4.log
    -rw-r--r-- 1 root root  58 Sep 22 14:54 router5.log
    -rw-r--r-- 1 root root  58 Sep 22 14:54 router6.log
    -rw-r--r-- 1 root root  58 Sep 22 14:54 router7.log
    -rw-r--r-- 1 root root  58 Sep 22 14:54 router8.log
    -rw-r--r-- 1 root root  58 Sep 22 14:54 router9.log

Check the terminal server:

    termserver#sh line
       Tty Line Typ Tx/Rx A Modem Roty AccO AccI Uses Noise Overruns Int
         0    0 CTY - - - - - 0 0 0/0 -
         1    1 AUX 115200/115200- inout - - - 0 0 0/0 -
    *  1/0   66 TTY 115200/115200- - - - - 0 0 0/0 -
    *  1/1   67 TTY 115200/115200- - - - - 0 0 0/0 -
       1/2   68 TTY 115200/115200- - - - - 0 0 0/0 -
       1/3   69 TTY 115200/115200- - - - - 0 0 0/0 -
    *  1/4   70 TTY 115200/115200- - - - - 0 0 0/0 -
    *  1/5   71 TTY 115200/115200- - - - - 0 0 0/0 -
    *  1/6   72 TTY 115200/115200- - - - - 0 1 0/0 -
       1/7   73 TTY 115200/115200- - - - - 0 0 0/0 -
    *  1/8   74 TTY 115200/115200- - - - - 0 0 0/0 -
    *  1/9   75 TTY 115200/115200- - 1 - - 2 0 2/4 -
    *  1/10  76 TTY 115200/115200- - - - - 0 0 0/0 -
       1/11  77 TTY 115200/115200- - - - - 0 0 0/0 -
       1/12  78 TTY 115200/115200- - - - - 0 0 0/0 -
    *  1/13  79 TTY 115200/115200- - - - - 0 0 0/0 -
       1/14  80 TTY 115200/115200- - - - - 0 0 0/0 -
       1/15  81 TTY 115200/115200- - - - - 0 0 0/0 -

The lines listed in consolelist.txt are occupied (marked with asterisks). Great, the scripts work.
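The same check can be scripted instead of reading the process list by eye. A hypothetical helper, missing_collectors, assuming the consolelist.txt format above, reads a process listing on stdin and prints the devices that have no running connectcon.sh, using essentially the same “connectcon.sh LINE IP” match as startcon.sh.

```shell
#!/bin/bash
# Hypothetical helper: list entries of consolelist.txt without a running collector.
# Reads a process listing (e.g. "ps ax" output) on stdin.
missing_collectors() {
    local list=${1:-/root/collectconsole/consolelist.txt}
    local procs line ip name
    procs=$(cat)
    # "|| [ -n "$line" ]" also handles a last entry without a trailing newline
    while IFS=, read -r line ip name || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        # anchored at the end so that 10.10.0.1 does not match 10.10.0.10
        if ! printf '%s\n' "$procs" | grep -q -- "connectcon.sh $line $ip\$"; then
            echo "$name (line $line on $ip)"
        fi
    done < "$list"
}
```

Usage: ps ax | missing_collectors /root/collectconsole/consolelist.txt prints nothing when every console listed is being recorded.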

All that remains is to have startcon.sh started automatically. You can put it in init.d and/or cron.daily (then devices added to the list will start being recorded the next day or after a system restart). And the problem is solved.
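Since startcon.sh only starts collectors that are not already running, it can also be run far more often than daily, which picks up new list entries within minutes and restarts collectors that have died. A minimal sketch, assuming cronie's /etc/cron.d format (which includes a user field) and the paths used in this article; the file name collectconsole and the 5-minute interval are arbitrary choices.

```shell
#!/bin/bash
# Hypothetical cron.d entry: run startcon.sh every 5 minutes as root.
# Already-running collectors are skipped by startcon.sh itself.
cat > /etc/cron.d/collectconsole <<'EOF'
*/5 * * * * root /root/collectconsole/startcon.sh
EOF
```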

It remains to note a few important points.
1) It makes sense to restrict the privileges of the account used for this purpose via the Cisco Secure ACS server: it should be able to log in and do nothing else, and only on a limited list of devices.
2) It is better to run the scripts under a separate account on the Linux machine, generate the RSA keys under that account and upload its public key to the terminal server. Yes, in this article I worked as root the whole time, but everyone knows that is not good practice. The location of the log directory is also worth changing.
3) Logrotate will keep the log files from growing to astronomical sizes, so it is worth enabling it for these files. The rotation and archiving policy depends on how quickly the files fill up.
4) Access to the log files must be strictly restricted.
5) Place the log collection server relative to the terminal server so that the failure of any monitored device does not break the connection between the collector and the terminal server for more than a few seconds. And in general, build networks so that the failure of a single device causes an outage of no more than two or three seconds.
6) Nothing prevents you from using these scripts to record events from any device connected to the terminal server, not only Cisco ones.
7) The terminal server can be any Cisco router with an AUX port. The console port of the target device is connected to that port with a rollover cable, and it is put under recording the same way, by adding one line to the configuration file. This is useful for small offices with one router and one switch: the router can monitor the switch.
8) I do not have much experience writing shell scripts, and a lot in the proposed setup can surely be improved. Any suggestions are welcome.
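For point 3, here is a possible logrotate configuration, assuming the log directory used in this article and a weekly schedule keeping 12 old copies (both arbitrary). The copytruncate directive matters here: each collector keeps its log file open (startcon.sh appends with >>), so after an ordinary rename the collector would keep writing into the rotated file. Because the file is opened in append mode, writes after truncation land at the new end of the file; the trade-off is that lines arriving between the copy and the truncation can be lost.

```shell
#!/bin/bash
# Hypothetical logrotate entry for the console logs.
# copytruncate: copy the file, then truncate it in place, so the collectors'
# open file descriptors stay valid.
cat > /etc/logrotate.d/collectconsole <<'EOF'
/root/logs/*.log {
    weekly
    rotate 12
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
```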

Source: https://habr.com/ru/post/152024/
