Monitoring channel status via jitter / packet loss

Good afternoon, colleagues.

Having gathered my thoughts, I decided to write up a solution I came up with.

So, the problem statement:
Two channels run between points A and B, usually from different providers. The quality of service on each channel must be taken into account, namely:
1. If packet loss on a channel exceeds 0.5%, the channel should not be used.
2. If jitter on a channel exceeds 10 ms, the channel should not be used.
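
As a rough illustration of the two rules above, here is a minimal Python sketch. The function and constant names are my own and are not part of any Cisco tooling; the sketch only restates the decision rule.

```python
# Thresholds taken from the problem statement above.
MAX_LOSS_PERCENT = 0.5   # a channel with more loss than this must not be used
MAX_JITTER_MS = 10.0     # a channel with more jitter than this must not be used


def channel_usable(loss_percent: float, jitter_ms: float) -> bool:
    """Return True only if BOTH loss and jitter are within the limits."""
    return loss_percent <= MAX_LOSS_PERCENT and jitter_ms <= MAX_JITTER_MS


print(channel_usable(0.2, 4.0))   # True: both values within limits
print(channel_usable(0.0, 15.0))  # False: jitter too high even with no loss
print(channel_usable(0.8, 3.0))   # False: loss too high
```

The second case is the one a simple ping check cannot catch: no packets are lost, yet the channel is unfit for voice.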

This problem came up at work: two cities are connected by two channels that carry a large volume of voice traffic, which, as is well known, is quite sensitive to the metrics described above. If you are interested, read on.

Initial solution.

Initially there were two clumsy options. The first was to set up a simple ping check on the Cisco router that verified the channel was alive and switched over when it died. This worked until we ran into jitter problems without any packet loss.

The second solution was to create a monitor based on UDP packets that mimic the G729 codec. The monitor reported loss and jitter; when communication problems arose, the administrator logged in to the router, looked at the current jitter and loss values, and decided, depending on the circumstances, whether to take the channel out of service. It worked, of course, but that is only a semi-automatic system. So I pulled myself together and brought it to a finished solution.

The current solution.
So, as in the second case, we create a UDP channel-quality monitor that simulates the G729a codec (a so-called SLA monitor).

ip sla 33
udp-jitter 172.16.1.66 49333 source-ip 172.16.1.65 codec g729a codec-size 20
tos 70
threshold 10

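This monitor sends 1000 packets once a minute to port 49333 at the destination, marked with tos 70 (0x46). Note that if IOS reads this value as a decimal ToS byte, 70 maps to DSCP 17 rather than EF; EF is DSCP 46, i.e. a ToS byte of 184 (0xB8). The responder must be enabled on the destination: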
ip sla responder
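
The relationship between the ToS byte and the DSCP code point is easy to check by hand. Below is a small Python sketch of that arithmetic. It assumes the tos value is entered in decimal, and the helper names are mine.

```python
def dscp_from_tos(tos: int) -> int:
    """DSCP occupies the upper six bits of the 8-bit ToS byte."""
    return (tos & 0xFF) >> 2


def tos_from_dscp(dscp: int) -> int:
    """Shift the 6-bit DSCP into place; the two low (ECN) bits stay zero."""
    return (dscp & 0x3F) << 2


EF_DSCP = 46  # Expedited Forwarding code point

print(hex(70))                      # 0x46
print(dscp_from_tos(70))            # 17
print(tos_from_dscp(EF_DSCP))       # 184
print(hex(tos_from_dscp(EF_DSCP)))  # 0xb8
```

In other words, decimal 70 and hexadecimal 0x46 are the same byte, but DSCP 46 (EF) corresponds to a ToS byte of 184.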
Next, create a stub track (created specifically so that applets can control it, rather than tying it rigidly to the SLA monitor):

track 20 stub-object
default-state up

Now the task is to read the results from the SLA monitor and, depending on their values, leave track 20 in the Up state or bring it down. This can be done, for example, with Cisco EEM (Embedded Event Manager), which lets you monitor the current state of the device and perform actions in response.
To do this, create two applets. The first puts the track into the Down state if at least one of the parameters (jitter or the number of lost packets) is unacceptable. The second brings it back up if BOTH parameters are normal.

Configuration
1. Create the first applet:
event manager applet LB trap
Create two events based on the SNMP OIDs for the number of returned packets (RTT count) and the jitter from our SLA monitor:

event tag jitter snmp oid 1.3.6.1.4.1.9.9.42.1.5.2.1.46.33 get-type exact entry-op ge entry-val "10" entry-type value poll-interval 4
event tag loss snmp oid 1.3.6.1.4.1.9.9.42.1.5.2.1.1.33 get-type exact entry-op le entry-val "994" entry-type value poll-interval 4

Here, the trailing 33 in the SNMP OID is the number of the SLA instance. 10 is the jitter threshold (in ms), and 994 is the number of packets returned out of the thousand sent (1000 minus the lost packets) at or below which loss is considered too high. poll-interval is the interval at which the router polls the values, here 4 s.
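
To see where 994 comes from, the arithmetic can be sketched in Python. This is only an illustration under the numbers given in this article (1000 packets per run, 0.5% loss limit); the function names are mine.

```python
import math

PACKETS_SENT = 1000      # packets per probe run, as stated above
MAX_LOSS_PERCENT = 0.5   # loss limit from the problem statement


def max_lost_packets(sent: int = PACKETS_SENT,
                     max_loss_percent: float = MAX_LOSS_PERCENT) -> int:
    """Largest number of lost packets that still satisfies the loss limit."""
    return math.floor(sent * max_loss_percent / 100)


def down_entry_val(sent: int = PACKETS_SENT,
                   max_loss_percent: float = MAX_LOSS_PERCENT) -> int:
    """Largest returned-packet count that already violates the loss limit."""
    return sent - max_lost_packets(sent, max_loss_percent) - 1


def loss_percent(received: int, sent: int = PACKETS_SENT) -> float:
    """Loss in percent for a given number of returned packets."""
    return 100 * (sent - received) / sent


print(max_lost_packets())  # 5
print(down_entry_val())    # 994
print(loss_percent(995))   # 0.5
print(loss_percent(994))   # 0.6
```

So 995 returned packets is exactly 0.5% loss and still acceptable, while 994 or fewer means more than 0.5% loss, which is why the event uses entry-op le with entry-val 994.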
We specify that the applet should fire on ANY of the events, i.e. a logical OR is used.

trigger
correlate event loss or event jitter

Next action:
action 20 track set 20 state down
That is, our track goes into the Down state.

The second applet is similar:

event manager applet LB2 trap
event tag jitter snmp oid 1.3.6.1.4.1.9.9.42.1.5.2.1.46.33 get-type exact entry-op lt entry-val "10" entry-type value poll-interval 4
event tag loss snmp oid 1.3.6.1.4.1.9.9.42.1.5.2.1.1.33 get-type exact entry-op gt entry-val "994" entry-type value poll-interval 4
trigger
correlate event loss and event jitter
action 20 track set 20 state up

The only differences are that this applet is triggered by a logical AND of the events and that it brings the track up.

Note that polling happens every 4 s and the current state of the track is not taken into account, i.e. the applets fire constantly, every 4 s. I tried to add a check on the state of the track itself, but it was buggy and did not always fire. So I sacrificed a couple of percent of CPU and left it as is.
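
The combined behavior of the two applets can be modeled with a short Python sketch. It is a simplified simulation, not EEM itself: the sample values are invented for illustration and all names are mine. It shows that exactly one applet fires on every poll, while the track actually changes state much less often.

```python
JITTER_LIMIT_MS = 10   # entry-val for the jitter events
RECEIVED_LIMIT = 994   # entry-val for the loss events


def applet_down_fires(jitter_ms: int, received: int) -> bool:
    """First applet: jitter >= 10 OR returned packets <= 994."""
    return jitter_ms >= JITTER_LIMIT_MS or received <= RECEIVED_LIMIT


def applet_up_fires(jitter_ms: int, received: int) -> bool:
    """Second applet: jitter < 10 AND returned packets > 994."""
    return jitter_ms < JITTER_LIMIT_MS and received > RECEIVED_LIMIT


def run_polls(samples, initial_state="up"):
    """Replay (jitter_ms, received) samples; return track history and change count."""
    state = initial_state
    history = []
    changes = 0
    for jitter_ms, received in samples:
        down = applet_down_fires(jitter_ms, received)
        up = applet_up_fires(jitter_ms, received)
        assert down != up  # the two conditions are exact complements
        new_state = "down" if down else "up"
        if new_state != state:
            changes += 1
        state = new_state
        history.append(state)
    return history, changes


# Invented sample polls: (average jitter in ms, packets returned out of 1000).
samples = [(3, 1000), (12, 1000), (4, 990), (2, 999), (2, 999)]
history, changes = run_polls(samples)
print(history)   # ['up', 'down', 'down', 'up', 'up']
print(changes)   # 2
```

Five polls trigger five actions, but only two of them change the track. The rest just reassert the state it already has, which is the small overhead mentioned above.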

Additionally, there are applets that notify me when problems appear on the channel and when they clear:

event manager applet LB_info
event syslog pattern "20 stub Up->Down"
action 10 syslog msg "applet works!"
action 11 cli command "enable"
action 12 cli command "show ip sla stat 33"
action 13 mail server "192.168.6.20" to "ilya@tut_domen.ru" from "alert@tut_domen.ru" subject "Frame loss or high jitter on NiS channel" body "$_cli_result"

The $_cli_result variable contains the output of the last command, in our case show ip sla stat 33. The second applet reports recovery:

event manager applet LB2_info
event syslog pattern "20 stub Down->Up"
action 10 syslog msg "applet 2 works!"
action 11 cli command "enable"
action 12 cli command "show ip sla stat 33"
action 13 mail server "192.168.6.20" to "ilya@tut_domen.ru" from "alert@tut_domen.ru" subject "NiS channel is correct" body "$_cli_result"

In other words, we send ourselves an email whose body is the output of the latest SLA monitor run, the one that actually caused the track to switch.

So, how do we actually put this to use? I see two ways:

1. A line in a route-map (this is what I use, though my setup is rather intricate):
set ip next-hop verify-availability 172.16.1.66 1 track 20
2. A tracked static route of the form:
ip route 172.16.0.0 255.255.255.0 192.168.1.1 track 20
In case of loss or jitter on the channel, this route is simply removed from the routing table and traffic follows an alternate path.
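
The effect of the tracked static route can be sketched with a toy Python model. This is a simplification and not how IOS is implemented. The backup route via 192.168.2.1 with distance 250 is hypothetical and serves only to show the alternate path; all class and function names are mine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StaticRoute:
    prefix: str
    next_hop: str
    distance: int = 1            # administrative distance
    track: Optional[int] = None  # tracked object number, if any


def active_route(routes: List[StaticRoute],
                 track_state: Dict[int, bool]) -> Optional[StaticRoute]:
    """Pick the lowest-distance route whose track (if any) is up."""
    candidates = [
        r for r in routes
        if r.track is None or track_state.get(r.track, False)
    ]
    return min(candidates, key=lambda r: r.distance) if candidates else None


routes = [
    # Primary route from the article, tied to track 20.
    StaticRoute("172.16.0.0/24", "192.168.1.1", distance=1, track=20),
    # Hypothetical backup route, not from the article.
    StaticRoute("172.16.0.0/24", "192.168.2.1", distance=250),
]

print(active_route(routes, {20: True}).next_hop)   # 192.168.1.1
print(active_route(routes, {20: False}).next_hop)  # 192.168.2.1
```

When the applets put track 20 down, the primary route drops out of consideration and the backup takes over. When the track comes back up, the primary route wins again on distance.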

It may be clumsy, but after a week of struggling I could not come up with anything better. We share what we have, as they say.

P.S. I assume the reader is somewhat familiar with the basics of the Cisco console.
P.P.S. The ip sla command syntax differs between 12.4T and 12.4, but the meaning is the same.
P.P.P.S. If you need explanations that fit in a few lines, write to me and I will add them.

Respectfully,
Podkopaev Ilya

_________
UPD: about the CPU. Under load, my router (ISR 3845) averages about 42% CPU utilization without the applets and 43-44% with them.

Source: https://habr.com/ru/post/108519/
