Cisco VSS: a bug that was not fixed

Today I will continue the story about the non-obvious nuances of running Cisco Catalyst 6509 core switches in VSS mode. Many people use this platform in their infrastructure, so I hope this story will be useful to someone.

The fascinating VSS stories began a year ago and are described in this post.

So, exactly one year later, the item “vacuum the core” was, as usual, included in the work plan for this year's January quarterly maintenance. Let me remind you that the core of our network is a VSS pair of Cisco Catalyst 6509 switches. Here is some brief information for reference:

Each chassis has one Supervisor Engine 720 10GE on board.
We decided to start the dust removal with the standby chassis. We powered it off, vacuumed it, and powered it back on. And what a picture: the standby chassis went into a reboot loop because of a configuration sync error:


This time we decided not to show heroism or initiative and simply powered off the standby chassis. And so we did, and were left flying on one wing. The reboot loop of the standby chassis did not affect network performance. In the morning we sent all the necessary information to our integrator's technical support, who in turn opened a case with Cisco TAC, and we waited. The response from CTAC came quickly. We were asked to reproduce the reboot loop and capture the following debugs with the standby chassis powered on:

"debug redundancy config-sync bulk"
"debug redundancy progression"

That night we captured the debug output and sent it to CTAC. I am not publishing it here: there is a lot of text and little of it is clear.
CTAC reported that this behavior is described in DDTS:
CSCtx12231
Config Sync: Bulk-sync failure due to PRC mismatch in ACL

tools.cisco.com/bugsearch/bug/CSCtx12231/?reffering_site=dumpcr

Since viewing it requires a cisco.com account, here is a screenshot:

However, our release, 12.2(33)SXJ6, is listed under “Known Fixed Releases”. Why is unclear. We were asked to remove the duplicate lines (ACEs) from the ACLs listed in the “show redundancy config-sync failures prc” output and then try to boot the standby chassis.

We immediately had questions. They are listed below, and the answers from CTAC are in the screenshot:

1. Can the removal of the duplicate ACEs be verified from the “show redundancy config-sync failures prc” output or by some other means, or do we have to boot the standby chassis to check?

2. Would this bug have prevented a switchover to the standby chassis if the active chassis had been reset?

3. We have had situations where IOS did not allow us to add duplicate ACEs. I would like to clearly understand in which scenarios this check is performed and in which it is not (presumably this is related to object groups). We need to know where to be especially careful and what to double-check.

In the end, we removed the duplicate ACEs from the active chassis configuration while the standby was powered off, but the “show redundancy config-sync failures prc” output did not change afterwards, which suggested that the check only runs when the standby chassis tries to boot. We scheduled the next maintenance window and powered on the standby chassis during it. Everything came up, and the duplicate-ACE messages disappeared from the “show redundancy config-sync failures prc” output.

Now everything works, and we take special care when editing ACLs so that the situation does not repeat. We are still waiting for answers from Cisco TAC on how our IOS release came to be listed as fixed for this bug and why IOS allowed us to add duplicate ACEs in the first place.
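Since the check apparently only runs when the standby boots, one way to be careful when editing ACLs is to look for duplicate ACEs offline, in a saved copy of the running config. Below is a minimal sketch of that idea, not something we got from CTAC. It assumes named ACLs in the usual "ip access-list extended NAME" form with indented entries, treats two ACEs as duplicates only when their text matches after dropping sequence numbers and extra spaces, and does not expand object groups. The function name, the ACL names, and the sample config are all made up for illustration. Whether a textual duplicate is exactly what triggers the PRC mismatch is also an assumption, so this complements the “show redundancy config-sync failures prc” output and does not replace it.

```python
import re
from collections import defaultdict

# Header of a named ACL as it appears in a saved config (assumed format).
ACL_HEADER = re.compile(r"^ip access-list (standard|extended) (\S+)\s*$")
# Optional sequence number in front of an ACE, e.g. "10 permit ...".
SEQ_PREFIX = re.compile(r"^\d+\s+")


def find_duplicate_aces(config_text):
    """Return {acl_name: [ace, ...]} for ACEs that repeat inside one ACL.

    Hypothetical helper: compares ACEs as normalized text only and
    does not expand object groups.
    """
    duplicates = defaultdict(list)
    current_acl = None
    seen = set()
    for raw in config_text.splitlines():
        line = raw.rstrip()
        header = ACL_HEADER.match(line)
        if header:
            # A new ACL starts: remember its name and reset the seen set.
            current_acl = header.group(2)
            seen = set()
            continue
        if current_acl is None or not line.startswith(" "):
            # Any non-indented line ends the current ACL block.
            current_acl = None
            continue
        # Drop the sequence number and collapse repeated spaces.
        ace = " ".join(SEQ_PREFIX.sub("", line.strip()).split())
        if not ace or ace.startswith("remark"):
            continue
        if ace in seen:
            if ace not in duplicates[current_acl]:
                duplicates[current_acl].append(ace)
        else:
            seen.add(ace)
    return dict(duplicates)


# Made-up sample config using documentation addresses.
SAMPLE = """\
ip access-list extended EXAMPLE-IN
 permit tcp any host 192.0.2.10 eq 443
 permit udp any host 192.0.2.53 eq domain
 permit tcp any host 192.0.2.10 eq 443
 deny   ip any any
!
ip access-list extended EXAMPLE-OUT
 permit ip any any
"""

if __name__ == "__main__":
    for acl, aces in find_duplicate_aces(SAMPLE).items():
        for ace in aces:
            print(f"{acl}: duplicate ACE: {ace}")
```

For the sample above, the only entry reported is the repeated "permit tcp any host 192.0.2.10 eq 443" line in EXAMPLE-IN. Running something like this against the config before each maintenance window costs nothing and does not require booting the standby chassis.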

When new information from CTAC arrives, I will update the post or add it in the comments.

Good luck to everyone on the battlefield!

Source: https://habr.com/ru/post/249317/
