
CloudEngine. Huawei's bid for leadership. Part 1

This article opens a series of publications devoted to a completely new line of Huawei equipment under the general name CloudEngine, as well as to the technologies and solutions used in this equipment.

Introduction


Until recently, Huawei had no offering in the data center equipment segment. In the third quarter of last year, Huawei launched a promotional program, aimed primarily at partners, dedicated to a new product - a series of data center switches.
In creating the new line of equipment, Huawei clearly set itself several goals at once:
1. Fill the gaps in the range of equipment the company offers on the market
2. Catch up with and overtake its main competitors
And it is fair to say that the company has achieved these goals.

Among the many needs of modern data centers are ever-increasing bandwidth requirements. These requirements are dictated, first of all, by the tens of thousands of servers joined together by the network subsystem. There is a forecast of which method of connecting a server to the data network will dominate (Fig. 1). From it, one can predict which port types on line cards, and in what quantities, will be in demand.


Fig. 1 Forecast of the server interface types used in the data center

In addition to bandwidth, there are pressing administrative and technical problems:
- Migration of virtual machines within a data center or between data centers of one company;
- Traffic control in networks segmented at L2 and L3 (OSI);
- Carrying storage network traffic to parts of the data center where the switching equipment does not support the storage network standard, and others.
Of course, all of these problems can be solved with a combination of solutions, for example from VMware and Cisco. Huawei, for its part, offers a single-vendor solution, ranging from nCentre software products with an open API and the ability to integrate with third-party hypervisors to a data network subsystem for data centers of any scale.
So, CloudEngine. Huawei has divided the entire line into two large segments: CORE and TOR devices. CORE devices are positioned as the "heart" of the data center, through which all data streams pass, and provide the highest fault tolerance. TOR (Top-Of-Rack) devices act as in-rack aggregators, collecting traffic from servers, disk arrays or lower-performance network devices. This functional division leads to fundamental differences in physical design. All TOR devices have a 1U form factor, while CORE devices are built on different chassis; there are four variants today, including the CE12816, which is not shown in Fig. 2.


Figure 2 - CloudEngine switch family

Any large enterprise-level network, which, by the way, includes data centers, can be idealized and described by a multi-level scheme (Fig. 3). The figure below shows the positioning of all types of devices in the CloudEngine family.


Fig.3 - Idealized network structure of a large Enterprise-level LAN

Next, let us look at what Huawei offers to potential owners of this equipment, considering the hardware and software advantages of the new switches.

Hardware structure and features


Huawei's CORE devices show strong competitive characteristics:
1. Internal switching capacity of up to 2 Tbit/s per slot, with the possibility of a future software upgrade to 4 Tbit/s
2. High port density - up to 96x10G or 24x40G ports per slot. The total port density is 1536x10G ports on the CE12816 chassis.
3. Ultra-low forwarding latency - 2-5 microseconds, regardless of frame length
4. High availability thanks to hot-swapping of any chassis element.
5. Energy efficiency 50% above the market average.
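
The density figures in the list above can be cross-checked with simple arithmetic. The sketch below is only an illustration: the slot count of 16 for the CE12816 is an assumption inferred from the model name and from 1536 / 96 = 16, and the function names are made up for this example.

```python
# Figures quoted in the article.
PORTS_10G_PER_SLOT = 96
PORTS_40G_PER_SLOT = 24
SLOT_CAPACITY_GBPS = 2000  # "up to 2 Tbit/s per slot"

# Assumption: the CE12816 has 16 line-card slots (not stated in the article).
ASSUMED_LPU_SLOTS_CE12816 = 16


def chassis_port_count(ports_per_slot: int, slots: int) -> int:
    """Total number of ports when every slot holds the same line card."""
    return ports_per_slot * slots


def slot_front_panel_gbps(port_count: int, port_speed_gbps: int) -> int:
    """Sum of the nominal port speeds on one line card."""
    return port_count * port_speed_gbps


if __name__ == "__main__":
    total_10g = chassis_port_count(PORTS_10G_PER_SLOT, ASSUMED_LPU_SLOTS_CE12816)
    per_slot = slot_front_panel_gbps(PORTS_40G_PER_SLOT, 40)
    print(f"10G ports per chassis: {total_10g}")
    print(f"Front-panel rate per slot: {per_slot} Gbit/s "
          f"(quoted slot capacity: {SLOT_CAPACITY_GBPS} Gbit/s)")
```

Under these assumptions the nominal front-panel rate of a fully populated slot stays below the quoted per-slot switching capacity.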

Competitive offer analysis

The main advantage of the CloudEngine switches is, first of all, port density and data transfer rate figures that far exceed the corresponding figures for other vendors' devices. A brief competitive comparison is given in Fig. 4.


Fig. 4 Competitive comparison

What makes such a high switching speed and low latency possible:

- In the CloudEngine 12800 series switches, every line card (LPU - Line Processing Unit) is physically connected to every switch fabric (SFU - Switch Fabric Unit), i.e. the result is effectively an MxN matrix (Fig. 5).


Fig.5. Method of internal connection of LPU and SFU
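
The MxN interconnect described above can be modelled as a complete set of LPU-SFU pairs. This is a minimal sketch: the values of M and N and the naming of the boards are hypothetical and chosen only for illustration, since the article does not give them.

```python
from itertools import product


def full_mesh_links(num_lpus: int, num_sfus: int) -> list[tuple[str, str]]:
    """Return one (LPU, SFU) link for every pair, i.e. an M x N matrix."""
    lpus = [f"LPU{i}" for i in range(1, num_lpus + 1)]
    sfus = [f"SFU{j}" for j in range(1, num_sfus + 1)]
    return list(product(lpus, sfus))


def sfus_reachable_from(links: list[tuple[str, str]], lpu: str) -> set[str]:
    """Switch fabrics that a given line card is directly wired to."""
    return {sfu for source, sfu in links if source == lpu}


if __name__ == "__main__":
    # Hypothetical chassis: M = 8 line cards, N = 6 switch fabrics.
    links = full_mesh_links(8, 6)
    print(len(links), "physical links")
    print(sorted(sfus_reachable_from(links, "LPU3")))
```

Because every line card is wired to every fabric, any LPU can reach any other LPU through any SFU in exactly two hops.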

- Traffic between LPUs is carried over a dynamic non-blocking CLOS architecture (Fig. 6). A link to materials describing the CLOS network is given in the appendix.


Fig. 6 Non-blocking CLOS architecture

This technology, together with the way the SFU and LPU boards are interconnected, makes it possible to use all of the device's switch fabrics for data transfer at any time. The static CLOS architecture used in many similar devices has several disadvantages. A comparison of the two implementations is given in Table 1.
Table 1
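
One way to see the difference between the two approaches is a toy load model. The sketch below is not Huawei's algorithm: it assumes that a "static" design pins each flow to one fabric chosen from its source and destination card numbers, and that a "dynamic" design sprays the cells of every flow across all fabrics in turn. The flow sizes are invented.

```python
def static_fabric_load(flows, num_fabrics):
    """Pin each (src, dst, cells) flow to a single fabric (assumed policy)."""
    load = [0] * num_fabrics
    for src, dst, cells in flows:
        fabric = (src + dst) % num_fabrics  # fixed choice for the whole flow
        load[fabric] += cells
    return load


def dynamic_fabric_load(flows, num_fabrics):
    """Spray the cells of every flow round-robin over all fabrics."""
    load = [0] * num_fabrics
    next_fabric = 0
    for _src, _dst, cells in flows:
        for _ in range(cells):
            load[next_fabric] += 1
            next_fabric = (next_fabric + 1) % num_fabrics
    return load


if __name__ == "__main__":
    # Hypothetical flows: (source LPU, destination LPU, number of cells).
    flows = [(0, 4, 600), (1, 3, 300), (2, 2, 100)]
    print("static :", static_fabric_load(flows, 4))
    print("dynamic:", dynamic_fabric_load(flows, 4))
```

In this deliberately unlucky example all three flows are pinned to the same fabric in the static model, while the dynamic model spreads the same 1000 cells evenly over the four fabrics.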


High port density only makes sense, of course, if all of the traffic can be sent across the switch fabric. As shown above, this is not a problem for CloudEngine. The rest is simple: choose a high-performance platform for the LPU and apply a few small tricks.
Figure 7 shows the highest-performance line card for the CE12800 platform, with 24 40G ports on board. This gives a total throughput of 24 x 40 = 960 Gbit/s.


Fig. 7 - LPU line card CE-L24LQ-EA

However, if the number of ports needs to be increased drastically, an ingenious device can be used - a “splitter” (Fig. 8).


Fig. 8 - Splitter

In fact, inside a special QSFP module there are 4 separate modules, each operating at its own wavelength. Instead of one 40G interface, the switch sees 4x10G, so the CE-L24LQ-EA board provides a maximum of 24 x 4 = 96 10G interfaces. Incidentally, 40G and 4x10G interfaces can be mixed on the same board without restrictions.
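
The effect of the splitter on a 24-port card can be sketched as follows. The class and function names are hypothetical; the only inputs taken from the text are 24 QSFP ports per card and the choice between one 40G and four 10G interfaces per port.

```python
from dataclasses import dataclass


@dataclass
class QsfpPort:
    """One physical QSFP cage on the line card."""
    breakout: bool = False  # True when a 4x10G splitter is plugged in

    def interface_speeds(self) -> list[int]:
        """Speeds (Gbit/s) of the logical interfaces the switch sees."""
        return [10, 10, 10, 10] if self.breakout else [40]


def card_summary(ports: list[QsfpPort]) -> dict[str, int]:
    """Count logical interfaces by speed and sum the nominal bandwidth."""
    speeds = [s for port in ports for s in port.interface_speeds()]
    return {
        "10G": speeds.count(10),
        "40G": speeds.count(40),
        "total_gbps": sum(speeds),
    }


if __name__ == "__main__":
    all_breakout = [QsfpPort(breakout=True) for _ in range(24)]
    mixed = [QsfpPort(breakout=(i < 12)) for i in range(24)]
    print(card_summary(all_breakout))
    print(card_summary(mixed))
```

The splitter changes only the number of interfaces: the nominal bandwidth of the card stays at 960 Gbit/s for any mix of 40G and 4x10G ports.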

High availability of the device is achieved, as noted above, by hot-swapping of any chassis component: CMU (Control Management Unit), LPU, MPU (Main Processing Unit), SFU, PMU (Power Management Unit), PSU (Power Supply Unit) and FAN. Moreover, across all devices of the CE12800 family, all boards except the switch fabrics are identical. Figure 9 shows how chassis components are made redundant.


Fig. 9 Redundancy of CE12800 chassis components

The energy efficiency of the CE12800 series is achieved through the following solutions:
- Use of highly integrated chips. That is, instead of dozens of chips on a board, only a few need to be cooled.
- Almost all processors and ASICs are manufactured on a 45 nm process, which significantly reduces the power consumption of these components.
- A patented cooling system, namely the internal layout of the chassis. All of this allows CE12800 devices to be placed in the data center without worrying about thermal effects on neighboring racks, and allows hot and cold aisles to be organized (Fig. 10-11).

a) Right side view   b) Top view

Fig. 10 Patented Cooling System


Fig. 11 Heat distribution in hot and cold aisles

Software structure and features


It should be noted right away that Huawei is very actively developing its own modular operating system, VRP (Versatile Routing Platform), which, contrary to all the rumors and stories, has nothing to do with competitors' code. Over its history, VRP has undergone many changes from VRP1 to VRP5, and the developers have released VRP8 specifically for CloudEngine.
Key features of VRP8:
1. Support for virtualization and resource management of virtual subsystems
2. Support for CSS (Cluster Switch System) clustering and DCB (Data Center Bridging)
3. Support for the TRILL (Transparent Interconnection of Lots of Links) network technology
4. Support for FCoE (Fibre Channel over Ethernet)
5. Improved configuration management system

Now in more detail.

Virtualization. As an additional service for data center customers, Huawei suggests virtualizing not only the server side but also the network. For those who want to isolate themselves completely and manage their entire infrastructure inside the data center, renting a part of a CloudEngine is an excellent option (Fig. 12).


Fig. 12 - Usage model for virtual systems based on CE12800

Virtualization is made possible by:
- The multi-core, multiprocessor design of the device,
- The modular operating system

A virtual system has the following features:
- Its own Control, Forwarding, Management and Service planes,
- Line cards or individual card ports assigned exclusively to the virtual system,
- Individually assigned system resources (I/O, CPU, memory) and configuration,
- Complete isolation from other virtual systems
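
The exclusive assignment of ports to virtual systems listed above can be illustrated with a small bookkeeping model. This is only a sketch: the class names, the port naming scheme and the tenant names are hypothetical and do not correspond to VRP8 commands or APIs.

```python
class PortAlreadyAssigned(Exception):
    """Raised when a port already belongs to a different virtual system."""


class Chassis:
    """Tracks which virtual system owns each physical port."""

    def __init__(self, ports):
        self._owner = {port: None for port in ports}

    def assign(self, port: str, virtual_system: str) -> None:
        """Give a port to one virtual system; refuse to share it."""
        current = self._owner[port]
        if current is not None and current != virtual_system:
            raise PortAlreadyAssigned(f"{port} belongs to {current}")
        self._owner[port] = virtual_system

    def ports_of(self, virtual_system: str) -> list[str]:
        """Ports currently owned by the given virtual system."""
        return sorted(p for p, vs in self._owner.items() if vs == virtual_system)


if __name__ == "__main__":
    # Hypothetical port names in slot/subcard/port form.
    chassis = Chassis(["1/0/1", "1/0/2", "2/0/1"])
    chassis.assign("1/0/1", "tenant-a")
    chassis.assign("1/0/2", "tenant-a")
    chassis.assign("2/0/1", "tenant-b")
    print(chassis.ports_of("tenant-a"))
```

Refusing a second owner for the same port is what keeps the forwarding resources of different virtual systems from overlapping in this model.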

Clustering. Huawei's innovations here are the CSS technology for the CE12800 and iStack for TOR switches, which combine several physical switches into one logical switch in order to simplify network management and improve reliability.
CSS (Cluster Switch System) differs from a classic stack in the following ways:
- Members can be interconnected using both special cards and line card ports (up to 16 ports)
- Load is balanced intelligently between ports and switch chassis in the cluster
- Cluster members can be separated by a considerable distance (up to 80 km)
- Any type of 12800 chassis can be combined into a cluster.
The maximum number of switches in a CSS is 4. The CE6800/5800 series supports a stack of up to 16 switches. A simple example of using CSS is shown in Fig. 13.


Fig. 13 Back-to-back model: an example of traffic handling for two VLANs.
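
The limits quoted above can be captured in a short planning check. The sketch assumes that a cluster needs at least two members and uses made-up function names; the maximums of 4 (CSS), 16 (iStack), 16 interconnect ports and 80 km are the figures given in the text.

```python
# Maximum number of member switches, as quoted in the article.
MAX_MEMBERS = {"CSS": 4, "iStack": 16}

# CSS-specific limits quoted in the article.
CSS_MAX_LINK_PORTS = 16
CSS_MAX_DISTANCE_KM = 80


def members_ok(kind: str, member_count: int) -> bool:
    """Check the member count (assumption: at least two members)."""
    if kind not in MAX_MEMBERS:
        raise ValueError(f"unknown cluster type: {kind}")
    return 2 <= member_count <= MAX_MEMBERS[kind]


def css_plan_ok(member_count: int, link_ports: int, distance_km: float) -> bool:
    """Check a CSS plan against all three quoted limits."""
    return (
        members_ok("CSS", member_count)
        and 1 <= link_ports <= CSS_MAX_LINK_PORTS
        and 0 <= distance_km <= CSS_MAX_DISTANCE_KM
    )


if __name__ == "__main__":
    print(css_plan_ok(member_count=2, link_ports=4, distance_km=10))
    print(members_ok("iStack", 20))
```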

TRILL. Huawei implemented this fashionable IETF protocol for the first time in the CloudEngine equipment line. TRILL (Transparent Interconnection of Lots of Links) forwards data at the data link layer (L2) using extensions to the IS-IS protocol and has the following advantages over traditional xSTP protocols:
- Support for ECMP (Equal Cost Multi Path), i.e. the ability to use both primary and backup links at the same time
- Route selection based on the SPF algorithm, which matches the traffic model of the CLOS architecture
- A unified control plane for unicast and multicast traffic
- Fast convergence
- Support for a large number of devices in one TRILL domain (more than 500 switches)
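
The ECMP and SPF points above can be illustrated on a tiny two-spine topology. This is a simplified sketch, not TRILL itself: it assumes every link has the same cost, so shortest paths are found by breadth-first search, whereas a real IS-IS SPF calculation works with configurable link metrics. The switch names are hypothetical.

```python
from collections import deque


def equal_cost_paths(graph, src, dst):
    """Return every shortest path (by hop count) from src to dst."""
    dist = {src: 0}
    parents = {src: []}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                # First time we reach this switch: record the distance.
                dist[neighbor] = dist[node] + 1
                parents[neighbor] = [node]
                queue.append(neighbor)
            elif dist[neighbor] == dist[node] + 1:
                # Another predecessor at the same cost: an ECMP alternative.
                parents[neighbor].append(node)
    if dst not in dist:
        return []

    def build(node):
        if node == src:
            return [[src]]
        return [path + [node] for parent in parents[node] for path in build(parent)]

    return build(dst)


if __name__ == "__main__":
    # Hypothetical topology: two TOR switches, each wired to both spines.
    graph = {
        "tor1": ["spine1", "spine2"],
        "tor2": ["spine1", "spine2"],
        "spine1": ["tor1", "tor2"],
        "spine2": ["tor1", "tor2"],
    }
    for path in equal_cost_paths(graph, "tor1", "tor2"):
        print(" -> ".join(path))
```

A spanning tree protocol would block one of the two uplinks in this topology, while a shortest-path calculation with ECMP keeps both in use.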

All these features of TRILL make the creation and movement of virtual servers inside the cloud, and of physical ones inside the data center, completely transparent.
TRILL will be discussed in more detail in the following articles.

FCoE. As already noted, CloudEngine devices are the first in which Huawei has implemented FCF (FCoE Forwarder) functionality. This makes it possible to bring an existing network into line with modern needs, or to design a new one accordingly. Fig. 14 shows the result of combining data network traffic and storage system traffic in one network.


Fig. 14 Transition to Converged Networks



Conclusion


This article was an attempt to tell the Internet public about Huawei's new equipment line, as well as the latest software and hardware technologies implemented in the CloudEngine series. I will try to describe some points in more detail in the following articles.

The article draws in part on materials marked "For official use", which are not publicly available.

Source: https://habr.com/ru/post/186792/

