
EXTREME'AL LACP

The devil is in the details: I always remember this when I deal with something new. New software or a new piece of hardware can be great both technically and economically, but there is bound to be some trifle that looks unimportant yet drinks a lot of blood. Below I want to tell you about a few of these insatiable little things in Extreme Networks equipment.



Appetizer


Extreme Summit X670V-48x switches serve as aggregation switches in our Moscow network. For redundancy they are stacked via VIM4-40G4X modules (four 40G Ethernet ports each). Right now each stack has 2 chassis, but there are already tickets to add a third.


Accordingly, all links are split evenly (i.e., in halves) between the two chassis and aggregated with LACP. If something happens to one chassis, we lose bandwidth across the board, but the service keeps running.

First course


The first thing we ran into when moving traffic to Extreme was this. We switch over a provider link: the physical layer comes up, but LACP refuses to. Everything looks right, yet it just won't come up. Exactly the same settings toward another provider work fine. Whatever we tried, nothing helped. We could have lost a lot of time digging into it with support, but colleagues from AS8359, Andrei and Pavel, helped out and quickly recommended this line:

configure sharing 1 lacp system-priority 1 

It helped. The strangest part is that the problem does not occur with every peer. With our own Cisco boxes there is no problem, but if there is an ASR9000 on the other side, there most likely will be one. But not always. We were too lazy to get to the bottom of it; we memorized the line and now repeat it like a mantra.
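We adopted the line above without fully understanding it. For background: in IEEE 802.1AX (LACP), each partner advertises a System Identifier made of a 16-bit system priority followed by the system MAC address. The partner with the numerically lower identifier is the one whose port priorities decide which links go into the aggregate, so `system-priority 1` makes the Extreme side that partner. Below is a minimal Python sketch of the comparison. The MAC addresses are made up, and the idea that this tie-break is what fixed the ASR9000 interop is our assumption, not something the vendor confirmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacpSystemId:
    """LACP System Identifier: 16-bit priority followed by the system MAC (IEEE 802.1AX)."""
    priority: int  # 0..65535, lower is preferred; 32768 is a common default (e.g. Cisco IOS)
    mac: str       # colon-separated MAC; the addresses used below are hypothetical

    def key(self) -> tuple[int, int]:
        # Compare priority first, then the MAC as a 48-bit number.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def deciding_system(a: LacpSystemId, b: LacpSystemId) -> LacpSystemId:
    """Return the partner with the numerically lower System Identifier."""
    return a if a.key() < b.key() else b


if __name__ == "__main__":
    peer = LacpSystemId(32768, "00:00:0c:11:22:33")
    extreme_default = LacpSystemId(32768, "00:04:96:aa:bb:cc")
    extreme_tuned = LacpSystemId(1, "00:04:96:aa:bb:cc")
    print(deciding_system(extreme_default, peer))  # equal priorities: the lower MAC (peer) wins
    print(deciding_system(extreme_tuned, peer))    # priority 1: the Extreme side wins
```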

Second course


The switches are 10-gigabit, and there is far more traffic than that, so we have many aggregated links (LAGs). Inside a LAG the traffic has to be balanced so that all members ("legs") are loaded as evenly as possible, because it is very unpleasant when one leg hits its ceiling. While a LAG had 8 legs, everything was fine. Then one lambda went down. We always keep headroom on the aggregate, so losing one 10G link is not a problem. But strangely, the evenly loaded legs drifted apart: some went up (that was expected), others went down. Oops!

The first time we had no chance to investigate: the failed link was repaired and everything returned to normal. But an aftertaste remained. We came back to the question when we added two more lambdas. Look: the load diverged again. Switch off the "extra" two links and everything is even. Switch them back on: help! We verified experimentally that the imbalance appears whenever the number of active legs in a LAG is not a power of 2.
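Why would a power of 2 matter? One simple model produces exactly this symptom. Suppose the switch reduces each flow's hash to a few bits and maps the result to a leg by modulo. If the hash space has 2^k values and the leg count divides it, every leg gets the same share; otherwise some legs get one extra hash value. The sketch below assumes a 4-bit hash and uniformly distributed flows. It is a hypothetical illustration, not Extreme's documented hardware algorithm.

```python
from collections import Counter


def leg_shares(num_legs: int, hash_bits: int = 4) -> list[float]:
    """Fraction of flows per leg when a hash_bits-wide hash is mapped to a leg by modulo.

    Hypothetical model: every hash value is assumed to be equally likely.
    """
    buckets = 2 ** hash_bits
    counts = Counter(h % num_legs for h in range(buckets))
    return [counts[leg] / buckets for leg in range(num_legs)]


if __name__ == "__main__":
    for legs in (8, 7, 10):
        shares = leg_shares(legs)
        print(f"{legs:2d} legs: min {min(shares):.2%}, max {max(shares):.2%}, ideal {1 / legs:.2%}")
```

With 10 legs this model gives six legs 12.5% and four legs 6.25%, instead of 10% each. Widening the hash (a larger `hash_bits`) shrinks the gap. That is one possible reason a different hashing mode behaves better, but we have not confirmed that this is what `custom` actually does.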

We thought it over and talked to support. Their answer: use custom. Frankly, I did not believe it, because according to the documentation, custom with default settings is exactly equivalent to L3_L4. I was convinced by colleagues from MSK-IX, who had already fought this problem through.

We reconfigured our LAGs to use the custom algorithm, and now balancing is uniform regardless of the number of active legs. To do that, though, we had to delete the old settings and recreate each LAG, because the balancing algorithm can only be set when the LAG is created. But when has that ever scared us?

 enable sharing 1:1 grouping 1:1-5, 2:1-5 algorithm address-based custom lacp 

Dessert


For dessert, I saved my favorite :) Port-channels. How is port aggregation done on any Cisco? You create a special interface, the Port-channel, then configure and enable it. Then you add physical ports to it (the configuration is not exactly like that, but it does not matter right now). Need to add a port? Add it! Have extras? Remove them, any of them. Damn convenient!

In XOS the story is different: the LAG configuration is tied to one of the physical ports, which becomes the config master. That port must be a member of the LAG. You can add and remove as many ports as you like, as long as the config master stays put. But if you need to move the last (master) port, that's it: the only option is to delete the sharing group and create a new one. Oopsie...
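To make the difference concrete, here is a toy Python model of the XOS approach. The group is anchored to its config master port; other ports can be added and removed freely, but removing the master is refused. The class and method names are hypothetical and model only the constraint described above, not XOS internals.

```python
class SharingGroup:
    """Toy model of an XOS-style LAG whose configuration is anchored to a master port."""

    def __init__(self, master: str, members: list[str]) -> None:
        if master not in members:
            raise ValueError("the config master must be a member of the LAG")
        self.master = master
        self.members = set(members)

    def add_port(self, port: str) -> None:
        self.members.add(port)

    def delete_port(self, port: str) -> None:
        if port == self.master:
            # The only way out is to delete the whole group and create a new one.
            raise ValueError(f"{port} is the config master; delete and recreate the LAG")
        self.members.discard(port)


if __name__ == "__main__":
    lag = SharingGroup("1:1", ["1:1", "1:2", "2:1", "2:2"])
    lag.add_port("1:48")
    lag.delete_port("2:2")      # allowed: not the master
    try:
        lag.delete_port("1:1")  # refused: this is the master
    except ValueError as err:
        print(err)
```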

Like any decent company, we love to rebuild things for ourselves. There are various reasons, and our network is big enough to give us plenty of them. Accordingly, we have stepped on this rake several times. I won't speak for the rest of the team, but I personally did not enjoy it.

Judge for yourself: the port has to be configured with VLANs, STP (Lieutenant, not a word!) and other abominations. By the way, in the Extreme ideology VLANs are not attached to ports; ports are attached to VLANs, so you cannot assign a port its whole VLAN list in one line. Obviously, with that many settings, making a mistake is as easy as sending a couple of bytes. So where are you, my beloved Cisco-style port-channels? That said, not everything is so bad. If the config master port goes down, the LAG itself stays alive (otherwise it would be really b... bad). Even when we powered off the chassis holding the config master, everything kept working. Well, at least there's that...
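Because XOS attaches ports to VLANs rather than VLANs to ports, recreating a LAG on a new master means re-adding that port to every VLAN the old master belonged to, one command per VLAN. A small Python helper can generate those lines from a VLAN-to-ports inventory. The inventory format and VLAN names are hypothetical, and all memberships are assumed to be tagged. Check the `configure vlan ... add ports ... tagged` syntax against your XOS version before pasting anything.

```python
def vlans_of_port(vlan_ports: dict[str, set[str]], port: str) -> list[str]:
    """VLANs (sorted by name) that currently contain the given port."""
    return sorted(vlan for vlan, ports in vlan_ports.items() if port in ports)


def readd_commands(vlan_ports: dict[str, set[str]], old_port: str, new_port: str) -> list[str]:
    """One command per VLAN to give new_port the same (assumed tagged) memberships as old_port."""
    return [
        f"configure vlan {vlan} add ports {new_port} tagged"
        for vlan in vlans_of_port(vlan_ports, old_port)
    ]


if __name__ == "__main__":
    # Hypothetical inventory: VLAN name -> member ports.
    inventory = {"cust100": {"1:1", "2:1"}, "cust200": {"1:1"}, "mgmt": {"1:10"}}
    for line in readd_commands(inventory, "1:1", "1:48"):
        print(line)
```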

Out of desperation, I filed feature request FR4-4584728621 last year. As you can guess, it has not been implemented, and there is no certainty that the vendor has even considered it.

Saving the drowning is the job of the drowning themselves. Once I was looking at a port-channel on a Cisco:

 #sh int po1 eth
 Port-channel1 (Primary aggregator)
 Age of the Port-channel = 174d:06h:35m:37s
 Logical slot/port = 2/1 Number of ports = 4
 HotStandBy port = null
 Port state = Port-channel Ag-Inuse
 Protocol = LACP
 Port security = Disabled

And I noticed a nice line: Logical slot/port. "Oh," I said, "where does a second slot come from on this non-stackable, non-modular Cisco?!" When I realized it was a logical slot, a PLAN was born!

What if we used as the config master a port that will never be used? No, we paid for our ports, so every one of them will be used. What about a port that does not exist, one on a slot (a stack member) that is not there? Then we would not care which physical ports belong to a given LAG. All the settings would be applied on this virtual slot, and ports would be added and removed as needed. Yes, this reduces the number of chassis a stack can hold by one, but have you seen many stacks of 8 chassis? I have never seen one with more than 3.

Then everything becomes simple. We declare a virtual slot, take as many of its ports as we need (every LAG gets its master on one of them), and off we go!

 configure slot 8 module X670V-48x
 enable sharing 8:1 grouping 8:1, 1:1-5, 2:1-5 algorithm address-based custom lacp

To add new ports:

 configure sharing 8:1 add ports 1:48, 2:48
 enable ports 1:48, 2:48

Now you can safely move traffic to the new ports; LACP stays up the whole time. Then we remove the old ports so they can be used for something else:

 disable ports 1:1, 2:1
 configure sharing 8:1 delete ports 1:1, 2:1
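The make-before-break sequence can be checked with a toy Python model. The master 8:1 sits on the non-existent slot and never comes up. New ports are added and enabled before the old ones are disabled and removed, so the number of active members never drops to zero and the master never changes. This is a hypothetical model of the procedure, not of XOS itself; the claim that LACP stays up throughout rests on our lab test.

```python
from dataclasses import dataclass, field


@dataclass
class Lag:
    master: str                                # config master, here on the virtual slot 8
    members: set[str]
    up: set[str] = field(default_factory=set)  # ports that are enabled and have link

    def active(self) -> int:
        return len(self.members & self.up)


def swap_ports(lag: Lag, old: list[str], new: list[str]) -> list[int]:
    """Make-before-break swap; returns the active member count after each step."""
    if lag.master in old:
        raise ValueError("the config master cannot be removed")
    steps = []
    lag.members |= set(new)   # configure sharing 8:1 add ports ...
    steps.append(lag.active())
    lag.up |= set(new)        # enable ports ...
    steps.append(lag.active())
    lag.up -= set(old)        # disable ports ...
    steps.append(lag.active())
    lag.members -= set(old)   # configure sharing 8:1 delete ports ...
    steps.append(lag.active())
    return steps


if __name__ == "__main__":
    physical = [f"{slot}:{port}" for slot in (1, 2) for port in range(1, 6)]
    lag = Lag(master="8:1", members={"8:1", *physical}, up=set(physical))  # 8:1 is never up
    print(swap_ports(lag, old=["1:1", "2:1"], new=["1:48", "2:48"]))  # [10, 12, 10, 10]
```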

A quick test in the lab showed that the idea works. But I warn you up front: we have not used it in production yet, although there is already a rollout ticket.

I tried to reach the developers and started a thread on the Extreme forum. You can read the replies yourself.

It seems to me that a command-line interface for convenient LAG management should be relatively easy to build, given that the underlying mechanisms already exist. I suspect every network engineer suffers from this problem in splendid isolation, so I propose raising its profile. If you are an Extreme user, contact support, refer to the FR above and demand a full pint once the foam has settled. Post everything you think about the vendor in that forum thread. Let it be a kind of network flash mob.




Source: https://habr.com/ru/post/264527/

