Get comfortable, my reader. Today I will tell you a fascinating story.
It all started recently. A newly installed Cisco router (in service for less than a month) suddenly took offense and withdrew into itself. It withdrew so thoroughly that it stopped reacting to external stimuli altogether, decided that passing traffic was thankless work beneath its dignity, and generally made it clear that it was not to be disturbed.
The first thought after rebooting the router: some joker had launched yet another uber-program. Or someone's laptop had gone crazy, which also happens. However, a close study of the traffic (NetFlow, tcpdump captures in search of some cunning broadcast) turned up nothing. Moreover, storm control on the client ports never triggered.
Meanwhile, the router, having worked for barely five minutes after the reboot, hung again. In the middle of the working day, mind you. “Fortunately,” telephony went through the same router, and only that saved us from the screams of distressed colleagues :).
“Hmm...” said the harsh Siberian men. We disconnect all the clients and reconnect them one by one: all seems quiet. We begin the interrogation under duress: who, what, where... Silence in response. Wasn't there, don't know anything.
Everyone was just fixing their primus stove: reading mail, working.
We could have written it off as a fluke, had the same problem not recurred several times in other branches where the routers had recently been replaced, at different times and under different loads...
Naturally, as soon as it became clear that this was not an isolated incident, we opened a Cisco TAC ticket, and the standard story began: changing IOS versions, a proposal to replace the router, checking the settings, and so on.
In parallel with the TAC correspondence, we assembled a test bench and tried to reproduce the situation “in the lab”. After analyzing tons of proxy logs, we found that the hang occurs when someone opens a mailbox on outlook.com.
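For the curious, here is a minimal sketch of the kind of log correlation described above: count which hosts were requested shortly before each hang and see which one keeps showing up. The log format (ISO timestamp, client address, host, separated by spaces), the function name, and the 60-second window are all assumptions made for illustration, not what we actually ran.

```python
from collections import Counter
from datetime import datetime, timedelta


def suspect_hosts(log_lines, hang_times, window_seconds=60):
    """Rank hosts by the number of hangs they were requested shortly before.

    log_lines: iterable of "ISO-timestamp client host" strings (assumed format).
    hang_times: list of datetime objects, one per observed router hang.
    """
    window = timedelta(seconds=window_seconds)

    # Parse the log once, skipping lines that do not have three fields.
    entries = []
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        entries.append((datetime.fromisoformat(parts[0]), parts[2]))

    hits = Counter()
    for hang in hang_times:
        # A host counts at most once per hang, however often it was requested.
        seen = {host for ts, host in entries if hang - window <= ts <= hang}
        hits.update(seen)

    # Hosts that precede the most hangs come first.
    return hits.most_common()
```

A host that appears before every single hang is a good candidate to try on the test bench; a host that appears before only one is probably background noise.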
Damn it, Holmes, but how?!
On the test bench the problem reproduces 100% of the time. When you log into an outlook.com account, the router dies without a sound. It leaves no crash dump behind, however nicely you ask, and the watchdog does not save the day: the router hangs hard, and only a cold power cycle revives it. We start turning off the enabled features one by one and find that with traffic inspection disabled, everything returns to normal. We try several IOS versions: identical behavior, even on the one “recommended by TAC specialists for this model”. We dig deeper, and the culprit turns out to be NBAR (Network Based Application Recognition), a module that recognizes the type of traffic (voice, video, data, etc.) so that it can be marked correctly and have the various QoS policies applied to it.
The TAC engineer handling our ticket was somewhat shocked by the news, but collected all the necessary information and passed it on to the developers. Their answer was magnificent:
"It is a rather general behavior of the ISR routers in the case of loops at the interrupt level.
If it is not interrupted, [the router requires] a manual reload to restore service.
[Configure] scheduler isr-watchdog (as shown in example below)
in order to activate a mechanism for detecting such cases.
It will also trigger a router reload if it’s identified."
In other words, everything is fine, this is how it is supposed to be. Proud cats die silently.
The TAC engineer also told us that we would not find this problem in the public bug list, since the bug is internal and there is no point in alarming the public yet again; better to just quietly update the protocol pack (the set of signatures and rules for that very NBAR). That may well be the right call, but on the other hand, the absence of the problem from the known bugs turns the search for a solution into, essentially, reading coffee grounds and switching off the router's features one at a time (and given what the hardware costs, a suitable lab box may well not be on hand).
After we updated the protocol pack, everything returned to normal, and for a month now (knock on wood) there have been no problems with the routers.
Such is this Friday story. So if you use NBAR and your fleet includes the routers in question (ISR G2, for the record), I strongly advise you to update the protocol pack before your users decide to open some interesting site.
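For reference, a rough sketch of what the two remedies from this story look like in IOS configuration: the ISR watchdog mentioned in the developers' reply, and loading a newer protocol pack. The file name is a placeholder (an assumption, substitute the pack you actually downloaded for your IOS release), and the exact syntax may differ between releases, so check it against the documentation for your version before pasting anything into a production box.

```
! Assumption: <protocol-pack-file> stands for a pack already copied to flash.
configure terminal
 ! Detect loops at the interrupt level and reload instead of hanging silently
 scheduler isr-watchdog
 ! Load the updated NBAR protocol pack
 ip nbar protocol-pack flash:<protocol-pack-file>
end
! Check which protocol pack is now active
show ip nbar protocol-pack active
```

The watchdog does not cure the hang itself; by the developers' own description it only turns a silent death into an automatic reload, so the protocol pack update is still the actual fix.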