Why doesn’t a standard vSwitch need a Spanning Tree protocol?

Today I would like to step away from the vSphere 5 fever for a bit and recall the basics of the standard vSwitch, and in particular how it manages without the Spanning Tree Protocol.

I assume that you already have basic knowledge of switching and know what a vlan, a switching loop, the Spanning Tree Protocol and some types of link aggregation protocols are. I will try to briefly go over the main features of the standard vSwitch, focusing on facts that seemed interesting to me or that were not very obvious in the official documentation, at least for me. As a result, what follows may be somewhat unstructured.

The main purpose of a standard vSwitch (or vNetwork Standard Switch, aka vSS) is to provide communication between virtual machines and the physical network infrastructure. In addition, it provides logical separation of virtual machines using Port Groups, offers several balancing algorithms for hosts with more than one uplink, shapes outgoing traffic from virtual machines to vSS, and finally, detects uplink failure and automatically switches traffic to the remaining uplinks.

So what are the main differences from a physical switch?

Unlike physical switches, vSS does not need to learn the MAC addresses of all devices in the same broadcast domain. Since ESXi itself assigns MAC addresses to all virtual machines, vSS knows every address behind its internal ports by default. Another distinctive feature is that vSS clearly divides its ports into two types - internal ports and uplinks - and applies different switching rules to them.

Port Group and Vlan

A Port Group defines a configuration template (for example, Vlan number, shaping, traffic balancing) for internal ports. When you connect a virtual machine to vSS, you simply specify which Port Group it should use, and the pre-configured parameters are applied to it. For example, you can specify that virtual machines of a certain Port Group should use only one specific interface as their uplink, with the remaining interfaces kept as backup. Another good example is running MS Network Load Balancing cluster virtual machines in Unicast mode. In this case, you need to create a separate Port Group for these virtual machines and disable the Notify Switches option in it.

Very often, a Port Group is compared to a vlan on physical switches, although this comparison is misleading. I think this happens because in most cases vSphere admins use Port Groups to separate their virtual machines into different vlans. However, there is not always a one-to-one correspondence between them. The previous example of MS Network Load Balancing is excellent proof of this - in our vSphere there are two Port Groups: one for the MS Network Load Balancing servers and a second one for the other servers. At the same time, the virtual machines of both Port Groups belong to the same Vlan.

Another difference between a Port Group and a Vlan follows from this. Computers in different vlans cannot communicate directly. They need either an L3 device to route the traffic, or bridging enabled between the vlans. Between Port Groups that share the same Vlan, on the other hand, unicast traffic passes freely.

A big surprise for me, just yesterday, was discovering vlan 4095, or rather the ability to listen to absolutely all traffic on a vSS with it. In the official VMware documentation, vlan 4095 is usually described as VGT - Virtual Guest Tagging. With it, we can pass vlan tags through to the virtual machine itself and let it decide what to do with the traffic. In practice, this is not used very often - the usual choice is VST (Virtual Switch Tagging), where vSS removes the vlan tags and sends traffic to the appropriate Port Group. We do not use VGT at all, so I had only skimmed over it. It turns out that if you create a Port Group, assign it vlan 4095, enable Promiscuous mode on it and then place a virtual machine there, you can launch Wireshark and calmly collect and analyze traffic from all the machines connected to this vSS. Another useful practical application is to place a virtual IDS in such a Port Group for traffic inspection.
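To make the difference between VST and VGT more concrete, here is a minimal Python sketch of how the Vlan number of a Port Group decides what a guest sees. It is a simplified model and not VMware code: the function name `deliver_to_guest` and the tuple it returns are made up for illustration, and the handling of vlan 0 (no tagging done by the vSwitch) is my addition to what is described above.

```python
from typing import Optional, Tuple

VGT_VLAN = 4095  # special Port Group vlan number that passes all tags to the guest


def deliver_to_guest(frame_vlan: Optional[int], pg_vlan: int) -> Tuple[bool, Optional[int]]:
    """Return (delivered, tag_seen_by_guest) for a frame arriving at a Port Group.

    frame_vlan is the 802.1Q tag on the incoming frame, or None if untagged.
    This is an illustrative model, not the real vSwitch implementation.
    """
    if pg_vlan == VGT_VLAN:
        # VGT: every vlan is accepted and the tag is left intact for the guest.
        return True, frame_vlan
    if pg_vlan == 0:
        # Assumption: vlan 0 means the vSwitch does no tagging at all,
        # so only untagged frames reach the guest.
        return frame_vlan is None, None
    # VST: only the matching vlan is accepted, and the tag is stripped.
    return frame_vlan == pg_vlan, None


print(deliver_to_guest(10, 10))    # VST, matching vlan
print(deliver_to_guest(20, 10))    # VST, foreign vlan
print(deliver_to_guest(20, 4095))  # VGT, tag preserved
```

With Promiscuous mode enabled on top of vlan 4095, the virtual machine receives frames for every vlan and every destination address on the vSS, which is what makes the Wireshark and IDS scenarios possible.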

Uplink - traffic balancing

In vSphere 4.1, you can have a maximum of 32 uplinks per virtual switch. An uplink can belong to only one vSS, that is, you cannot attach the same uplink to a second vSS. It also means that traffic is never transmitted directly from one vSS to another. The traffic between them must either leave through an uplink to an L3 device, or pass through internal virtual ports to a virtual machine that plays the role of the L3 device.

In almost every deployment, your ESXi host will have more than one uplink, and you will have to decide how to balance traffic over two or more uplinks. By default, these rules are set at the vSS level, but you can override them on individual Port Groups.

There are three basic traffic balancing policies in vSS:

Route based on the originating virtual switch port ID - let's say you have 6 virtual machines and a single vSS with two uplinks. When you power on the first virtual machine, vSS assigns it the first uplink, and this virtual machine keeps using it until you power it off and on again. The second machine is assigned the second uplink, the third - the first one again, and so on. A virtual machine moves to another uplink only if its current link goes down. The traffic from the physical switch to the virtual machine always returns via the same uplink, because that is the only uplink on which the physical switch has learned the MAC address of the virtual machine.
Route based on source MAC hash - essentially the same method as the previous one, except that a hash of the virtual machine's MAC address is used to select the uplink.
Route based on IP hash - in this case, vSS uses a hash of the source and destination IP addresses to select the uplink. That is, a session between two IP addresses always goes over the same uplink, which guarantees the correct order of packet delivery. On the physical switch, the corresponding ports must also be combined into a static 802.3ad link aggregation group (vSS does not negotiate LACP), so that the switch does the same thing - hashes the source and destination addresses and uses only one link for each pair, again ensuring the correct order of arrival of the packets. By the way, it is not a problem at all if the traffic from a virtual machine leaves over one uplink but returns over another.
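The three policies above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, the port ID policy is modeled as a plain modulo over the number of uplinks, and the hash functions are stand-ins that I chose for readability rather than the exact calculations ESXi performs.

```python
import ipaddress
from typing import List


def by_port_id(port_id: int, uplinks: List[str]) -> str:
    """Pin a virtual port to an uplink; ports are spread over uplinks in turn."""
    return uplinks[port_id % len(uplinks)]


def by_source_mac(mac: str, uplinks: List[str]) -> str:
    """Pin a VM to an uplink using a (simplified) hash of its MAC address."""
    last_byte = int(mac.split(":")[-1], 16)  # stand-in hash: last byte of the MAC
    return uplinks[last_byte % len(uplinks)]


def by_ip_hash(src_ip: str, dst_ip: str, uplinks: List[str]) -> str:
    """Choose an uplink per source/destination IP pair (simplified XOR hash)."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]


uplinks = ["vmnic0", "vmnic1"]

# Six virtual ports alternate between the two uplinks.
print([by_port_id(port, uplinks) for port in range(6)])

# One VM talking to two different peers may use two different uplinks.
print(by_ip_hash("10.0.0.5", "10.0.0.9", uplinks))
print(by_ip_hash("10.0.0.5", "10.0.0.8", uplinks))
```

Note the practical difference: with the first two policies a single virtual machine never uses more than one uplink at a time, while with IP hash the choice is made per address pair, so one virtual machine can spread its sessions across several uplinks.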

One of the most common traffic distribution scenarios is very similar to the one used on physical switches running the Spanning Tree Protocol, where different vlans go through different uplinks until one of the uplinks goes down, and then all the traffic of its vlans switches to the remaining uplinks. Similarly, in ESXi you create 2 Port Groups on a single vSS. For the first Port Group, the first uplink is set as active and the second as backup. The second Port Group is configured exactly the opposite way. Thus, you achieve more efficient use of bandwidth to the physical switch without losing redundancy.
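The active/backup scheme with two Port Groups can be illustrated with a small Python sketch. The Port Group names, the `vmnic` names and the `pick_uplink` helper are hypothetical and exist only for this example; real failover detection involves more than a simple link-state table.

```python
from typing import Dict, List, Optional


def pick_uplink(active: List[str], standby: List[str], link_up: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy uplink, preferring active ones over standby ones."""
    for nic in active + standby:
        if link_up.get(nic, False):
            return nic
    return None  # no uplink left: the Port Group loses external connectivity


# Two Port Groups on the same vSS with mirrored active/backup settings.
port_groups = {
    "PG-A": {"active": ["vmnic0"], "standby": ["vmnic1"]},
    "PG-B": {"active": ["vmnic1"], "standby": ["vmnic0"]},
}

link_up = {"vmnic0": True, "vmnic1": True}
normal = {name: pick_uplink(pg["active"], pg["standby"], link_up)
          for name, pg in port_groups.items()}
print(normal)  # each Port Group uses its own uplink

link_up["vmnic0"] = False  # the first uplink goes down
degraded = {name: pick_uplink(pg["active"], pg["standby"], link_up)
            for name, pg in port_groups.items()}
print(degraded)  # both Port Groups now share the remaining uplink
```

While both links are healthy, the load is split between them; after a failure, all traffic converges on the surviving uplink, which is the same outcome as the per-vlan Spanning Tree design, only without any protocol exchange.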

Sometimes vSphere admins dedicate a separate pair of uplinks to vMotion or FT traffic only, which seems to me a waste of interfaces, because vMotion or FT traffic is balanced over more than one interface only in very rare cases. Most of the time, one of the interfaces will simply sit idle.

Shaping outgoing traffic

With vSS, we can shape only outgoing traffic, that is, traffic from virtual machines to vSS. This is enough, for example, to shape vMotion traffic. But, alas, we cannot shape incoming traffic from physical machines.

So, why doesn't vSS need a spanning tree protocol?

All STP BPDU packets sent by the physical switch to the virtual switch are simply ignored.
Any packet that vSS receives through one of the uplinks is never sent out via another uplink. It can only go to one of the virtual machines or to the VMK interface. That is, a loop through vSS cannot form.
When vSS sends a broadcast, a multicast, or a packet with a still unknown unicast address, the physical switch floods it to all of its ports, so a copy can come back through another uplink of the same vSS. Therefore, vSS checks the source address of all incoming packets, and if it finds the MAC address of one of its own virtual machines or of the VMK interface there, it ignores such traffic. Thus, the problem of a loop through the physical switch is solved.
Broadcast and multicast traffic sent by a virtual machine is delivered to all ports of the same Port Group. A copy of this traffic is sent to the physical switch over only one uplink, selected in accordance with the traffic balancing policy.
Since vSS already knows all of its MAC addresses, it drops all packets arriving from an uplink that are addressed to an unknown MAC address. If a virtual machine sends a packet to an unknown recipient, it is sent in the usual way through the uplink already selected by the traffic balancing mechanism.
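The rules above fit into a short Python model. It is a sketch under stated assumptions and not how ESXi is implemented: the `Frame` and `VSwitch` names are invented for this example, vlans and Port Groups are left out (all internal ports are treated as one Port Group), multicast is not modeled separately from broadcast, and the uplink pinning is a trivial stand-in for the real balancing policy.

```python
from dataclasses import dataclass
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    is_bpdu: bool = False


class VSwitch:
    def __init__(self, local_macs: Dict[str, str], uplinks: List[str]):
        self.local = dict(local_macs)  # MAC -> internal port, known in advance
        self.uplinks = list(uplinks)

    def _pinned_uplink(self, src_mac: str) -> str:
        # Stand-in for the balancing policy: one fixed uplink per local MAC.
        index = sorted(self.local).index(src_mac)
        return self.uplinks[index % len(self.uplinks)]

    def from_uplink(self, frame: Frame) -> List[str]:
        """Egress ports for a frame received on an uplink (never another uplink)."""
        if frame.is_bpdu:
            return []  # BPDUs are ignored
        if frame.src in self.local:
            return []  # our own frame flooded back by the physical switch
        if frame.dst == BROADCAST:
            return sorted(self.local.values())
        if frame.dst in self.local:
            return [self.local[frame.dst]]
        return []  # unknown unicast from outside is dropped

    def from_vm(self, frame: Frame) -> List[str]:
        """Egress ports for a frame sent by a local virtual machine."""
        if frame.dst in self.local:
            return [self.local[frame.dst]]
        uplink = self._pinned_uplink(frame.src)
        if frame.dst == BROADCAST:
            others = [port for mac, port in sorted(self.local.items()) if mac != frame.src]
            return others + [uplink]  # local ports plus exactly one uplink
        return [uplink]  # unknown destination leaves through the pinned uplink


sw = VSwitch({"aa": "vm1", "bb": "vm2"}, ["vmnic0", "vmnic1"])
print(sw.from_vm(Frame("aa", BROADCAST)))      # vm2 plus a single uplink
print(sw.from_uplink(Frame("aa", BROADCAST)))  # reflected copy is dropped
print(sw.from_uplink(Frame("zz", "yy")))       # unknown unicast is dropped
```

The key property is visible in `from_uplink`: whatever arrives on an uplink, the returned list never contains an uplink, so the vSS cannot become a transit path between two physical switch ports.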

All of this shows that you can have many uplinks between vSS and the physical switch and still not need the Spanning Tree Protocol, which is difficult to troubleshoot.

I compiled most of this information from Ivan Pepelnjak's great blog and the official VMware documentation.

Source: https://habr.com/ru/post/124317/
