
Cisco ACI, a network fabric for the data center: a helping hand for the admin


With the help of this magic little piece of Cisco ACI script, you can quickly set up a network.

The Cisco ACI data center network fabric has been around for five years, but almost nothing has been said about it on Habr, so I decided to fix that. Drawing on my own experience, I'll explain what it is, what it's good for, and where its pitfalls lie.

What is it and where did it come from?


By the time ACI (Application Centric Infrastructure) was announced in 2013, the traditional approach to data center networking was under attack from three directions at once.
On the one hand, first-generation SDN solutions based on OpenFlow promised to make networks both more flexible and cheaper. The idea was to move the decision making, traditionally performed by proprietary switch software, onto a central controller.

That controller would have a single view of everything going on and, based on it, would program the hardware of all the switches down to the level of rules for handling individual flows.
On the other hand, overlay network solutions made it possible to implement the required connectivity and security policies without any changes to the physical network, by building software tunnels between virtualized hosts. The best-known example of this approach was Nicira, which by that time had already been acquired by VMware for $1.26 billion and gave rise to today's VMware NSX. A certain piquancy was added by the fact that the co-founders of Nicira were the same people who had earlier stood at the origins of OpenFlow and who were now saying that OpenFlow was not suitable for building a data center fabric.

And finally, the switching chips available on the open market (so-called merchant silicon) had reached a level of maturity at which they became a real threat to traditional switch vendors. Whereas previously each vendor developed chips for its switches in-house, over time chips from third-party manufacturers, above all Broadcom, began to close the feature gap with vendor silicon while beating it on price/performance. Many therefore believed that the days of switches built on in-house chips were numbered.

ACI became the "asymmetric response" from Cisco (more precisely, from Insieme, a company founded by former Cisco employees and later absorbed back into it) to all of the above.

How is it different from OpenFlow?


In terms of the distribution of functions, ACI is essentially the opposite of OpenFlow. In the OpenFlow architecture, the controller is responsible for programming detailed rules (flows) into the hardware of all switches, which means that in a large network it may have to maintain, and more importantly constantly update, tens of millions of entries at hundreds of points in the network, so its performance and reliability become a bottleneck in large-scale deployments.

ACI takes the opposite approach: there is still a controller, of course, but the switches receive high-level declarative policies from it, and each switch itself renders them into specific hardware settings. The controller can be rebooted or switched off entirely and nothing bad will happen to the network, apart from, of course, the lack of management during that time. Interestingly, there are situations in ACI where OpenFlow is still used, but locally inside a host, to program Open vSwitch.

ACI is built entirely on VXLAN-based overlay transport, but it includes the underlying IP transport as part of a single solution. Cisco coined the term "integrated overlay" for this. In most cases the fabric switches act as the overlay termination points in ACI (and they do it at line rate). Hosts are not required to know anything about the fabric, the encapsulations and so on; however, in some cases (for example, to connect OpenStack hosts) VXLAN traffic can be carried all the way to them.

Overlays are used in ACI not only to provide flexible connectivity across the transport network, but also to carry metadata (which is used, for example, to apply security policies).

Cisco had previously used Broadcom chips in its Nexus 3000 series switches. The Nexus 9000 family, released specifically to support ACI, originally followed a hybrid model nicknamed Merchant+: each switch carried both the new Broadcom Trident 2 chip and a companion chip developed by Cisco that implemented all the ACI magic. Apparently, this made it possible to speed up the product launch and keep the switch price close to that of models based on Trident 2 alone. That approach was enough for the first two or three years of ACI shipments. In that time Cisco developed and launched the next generation of the Nexus 9000 on its own silicon, with higher performance and a richer feature set, but at the same price level. The external behavior within the fabric is fully preserved, while the internals changed completely: something like refactoring, but for hardware.

How the Cisco ACI Architecture Works


In the simplest case, ACI is built on a Clos topology, or, as it is often called, Spine-Leaf. There can be anywhere from two (or one, if we do not care about fault tolerance) to six spine-level switches. Accordingly, the more there are, the higher the fault tolerance (the smaller the drop in bandwidth and reliability when one spine fails or is taken down for maintenance) and the higher the overall performance. All external connections terminate on leaf-level switches: servers, L2 or L3 hand-offs to external networks, and the APIC controllers themselves. In general, with ACI not just configuration but also statistics collection, failure monitoring and so on are all done through the controllers, of which there are three in standard-sized deployments.

You never have to connect to the switches over the console, not even to bring the network up: the controller itself discovers the switches and assembles a fabric out of them, including the settings of all the service protocols. This, by the way, is why it is very important to record the serial numbers of the equipment during installation, so that later you do not have to guess which switch sits in which rack. For troubleshooting you can still SSH into the switches if needed: the usual Cisco show commands are carefully reproduced there.
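
That inventory, serial numbers included, is also easy to pull out of the controller afterwards. Here is a minimal sketch using the APIC REST API from Python with the requests library; the controller address and credentials are placeholders, and the script only reads data:

```python
import requests

APIC = "https://10.0.0.1"   # APIC address and credentials are placeholders
AUTH = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}

s = requests.Session()
s.verify = False            # APIC often runs with a self-signed certificate

# Authenticate and attach the token as the standard APIC cookie
r = s.post(f"{APIC}/api/aaaLogin.json", json=AUTH)
r.raise_for_status()
token = r.json()["imdata"][0]["aaaLogin"]["attributes"]["token"]
s.cookies.set("APIC-cookie", token)

# fabricNode objects describe every node the fabric has discovered
nodes = s.get(f"{APIC}/api/class/fabricNode.json").json()["imdata"]
for n in nodes:
    attrs = n["fabricNode"]["attributes"]
    print(attrs["id"], attrs["role"], attrs["name"], attrs["serial"])
```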

Inside, the fabric uses IP transport, so there is no spanning tree or other horrors of the past: all links are active, and convergence on failure is very fast. Traffic in the fabric is carried in VXLAN-based tunnels. To be precise, Cisco calls the encapsulation iVXLAN; it differs from regular VXLAN in that reserved fields in the network header are used to carry service information, primarily the EPG membership of the traffic. This lets the hardware enforce the rules of interaction between groups, using the group numbers the same way ordinary access lists use addresses.

The tunnels make it possible to stretch both L2 segments and L3 (that is, VRFs) across the internal IP transport. At the same time, the default gateway is distributed: every switch routes the traffic that enters the fabric. In terms of traffic-forwarding logic, ACI is similar to a VXLAN/EVPN-based fabric.

If that's the case, then what's the difference? Everything else!


Difference number one, which you run into right away in ACI, is how servers are connected to the network. In traditional networks, both physical servers and virtual machines land in VLANs, and everything else dances around them: connectivity, security, and so on. In ACI, the building block is what Cisco calls an EPG (End-point Group), and there is no getting away from it. Can you equate it with a VLAN? You can, but then you risk losing most of what ACI gives you.

All access rules are formulated in terms of EPGs, and ACI by default follows a whitelist principle: only traffic that has been explicitly permitted is allowed through. For example, we can create EPGs "Web" and "MySQL" and define a rule that lets them talk to each other only on port 3306. This works without any binding to network addresses, and even within the same subnet!

We have customers who chose ACI precisely for this feature, because it lets you restrict access between servers (virtual or physical, it does not matter) without dragging them across subnets, which means without touching the addressing. And yes, we know: nobody ever hard-codes IP addresses in application configs, right?

The rules for passing traffic in ACI are called contracts. In a contract, one or more groups or tiers of a multi-tier application become the provider of a service (say, a database service), while others become its consumers. A contract can simply pass traffic, or it can do something trickier, such as steering it to a firewall or a load balancer, or rewriting the QoS value.
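
To make this a bit more concrete, here is roughly what the Web/MySQL example above could look like when pushed to the controller through the REST API. This is a minimal sketch rather than a complete configuration: the tenant and all the names are invented for illustration, and the bridge-domain/VRF bindings of the EPGs are left out for brevity.

```python
import requests

APIC = "https://10.0.0.1"   # controller address and credentials are placeholders
AUTH = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}

# A filter matching TCP/3306, a contract that uses it, and two EPGs:
# "MySQL" provides the contract, "Web" consumes it.
tenant = {"fvTenant": {"attributes": {"name": "demo"}, "children": [
    {"vzFilter": {"attributes": {"name": "mysql-3306"}, "children": [
        {"vzEntry": {"attributes": {
            "name": "tcp-3306", "etherT": "ip", "prot": "tcp",
            "dFromPort": "3306", "dToPort": "3306"}}}]}},
    {"vzBrCP": {"attributes": {"name": "web-to-db"}, "children": [
        {"vzSubj": {"attributes": {"name": "mysql"}, "children": [
            {"vzRsSubjFiltAtt": {"attributes": {"tnVzFilterName": "mysql-3306"}}}]}}]}},
    {"fvAp": {"attributes": {"name": "shop"}, "children": [
        {"fvAEPg": {"attributes": {"name": "MySQL"}, "children": [
            {"fvRsProv": {"attributes": {"tnVzBrCPName": "web-to-db"}}}]}},
        {"fvAEPg": {"attributes": {"name": "Web"}, "children": [
            {"fvRsCons": {"attributes": {"tnVzBrCPName": "web-to-db"}}}]}}]}}]}}

s = requests.Session()
s.verify = False            # APIC often runs with a self-signed certificate
s.post(f"{APIC}/api/aaaLogin.json", json=AUTH).raise_for_status()

# The whole policy is one object tree posted to the root of the model
s.post(f"{APIC}/api/mo/uni.json", json=tenant).raise_for_status()
```

The fact that the entire policy is a single object tree is exactly what makes it easy to template, version and reuse.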

How do servers end up in these groups? If they are physical servers, or something plugged into an existing network into which we have extended a VLAN trunk, then to place them in an EPG you point at the switch port and the VLAN used on it. As you can see, VLANs do appear where you really cannot do without them.
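
That pointing is itself just another object in the model: a static path binding on the EPG that names the leaf port and the VLAN used on the wire. A rough sketch, with the same caveats as above (tenant, application profile, EPG, pod/leaf/port and VLAN number are all placeholders):

```python
import requests

APIC = "https://10.0.0.1"   # controller address and credentials are placeholders
AUTH = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}

# Bind port eth1/10 on leaf 101 to the EPG "Web" of tenant "demo",
# tagging traffic on that port with VLAN 100
binding = {"fvRsPathAtt": {"attributes": {
    "tDn": "topology/pod-1/paths-101/pathep-[eth1/10]",
    "encap": "vlan-100"}}}

s = requests.Session()
s.verify = False            # self-signed certificate on the APIC
s.post(f"{APIC}/api/aaaLogin.json", json=AUTH).raise_for_status()

# The target URL is simply the distinguished name of the EPG
s.post(f"{APIC}/api/mo/uni/tn-demo/ap-shop/epg-Web.json", json=binding).raise_for_status()
```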

If the servers are virtual machines, it is enough to point at the connected virtualization environment, and everything happens by itself: a port group is created (in VMware terms) for attaching the VMs, the necessary VLANs or VXLANs are assigned, the required switch ports are configured, and so on. So although ACI is built around a physical network, connecting virtual servers looks much simpler than connecting physical ones. ACI already has integrations with VMware and MS Hyper-V, as well as support for OpenStack and RedHat Virtualization. At some point, built-in support for container platforms also appeared: Kubernetes, OpenShift, Cloud Foundry. So far it covers policy enforcement and monitoring, meaning the network administrator can see right away which pods are running on which hosts and which groups they belong to.

Besides being placed into a particular port group, virtual servers have additional properties: a name, attributes, and so on, which can be used as criteria for moving them into another group, for example when a VM is renamed or an extra tag appears on it. Cisco calls such groups microsegmentation, although, by and large, the very design that lets you create many security segments in the form of EPGs on a single subnet is already microsegmentation of a sort. Well, the vendor knows best.

The EPGs themselves are purely logical constructs, not tied to specific switches, servers and so on, so with them and with the constructs built on top of them (application profiles and tenants) you can do things that are hard to do in ordinary networks, such as cloning. As a result it is very easy, say, to clone a production environment and get a test environment guaranteed to be identical to production. You can do it by hand, but it is better (and easier) to do it through the API.

In general, the management logic in ACI is not at all like what you usually see in traditional networks, even from Cisco itself: the programmatic interface is primary, while the GUI and CLI are secondary, because they work through the same API. So almost everyone who deals with ACI sooner or later starts navigating the object model used for management and automating bits of it for their own needs. The easiest way to do that is from Python: there are convenient ready-made tools for it.
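
As a small taste of that, here is a sketch that walks the object model over the plain REST API and prints every EPG together with the endpoints learned in it. Addresses and credentials are placeholders; dedicated SDKs and toolkits wrap the same calls in a friendlier way.

```python
import requests

APIC = "https://10.0.0.1"   # controller address and credentials are placeholders
AUTH = {"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}}

s = requests.Session()
s.verify = False            # self-signed certificate on the APIC
s.post(f"{APIC}/api/aaaLogin.json", json=AUTH).raise_for_status()

# Ask for every EPG together with the endpoints (fvCEp) learned in it.
# The dn of an EPG encodes its tenant and application profile,
# e.g. uni/tn-demo/ap-shop/epg-Web
url = f"{APIC}/api/class/fvAEPg.json?rsp-subtree=children&rsp-subtree-class=fvCEp"
for item in s.get(url).json()["imdata"]:
    epg = item["fvAEPg"]
    print(epg["attributes"]["dn"])
    for child in epg.get("children", []):
        ep = child.get("fvCEp")
        if ep:
            print("   ", ep["attributes"]["mac"], ep["attributes"]["ip"])
```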

The promised pitfalls


The main problem is that many things in ACI are done differently. To start working with it properly, you have to relearn. This is especially true of network operations teams at large customers, where engineers have spent years "provisioning a VLAN" on request. The fact that VLANs are no longer really VLANs, and that you no longer need to create VLANs by hand at all to lay out new networks towards virtualized hosts, completely blows the minds of traditional networkers and makes them cling to familiar approaches. It should be noted that Cisco has tried to sweeten the pill a little and added an "NX-OS-like" CLI to the controller, which lets you configure things from an interface that resembles traditional switches. Still, to start using ACI properly, you will have to understand how it works.

Price-wise, on large and medium-sized networks ACI does not really differ from traditional networks built on Cisco equipment, since the same switches are used for both (the Nexus 9000 can run in either ACI or traditional mode and has become the main workhorse for new data center projects). But for a data center of just two switches, the controllers and the Spine-Leaf architecture do, of course, make themselves felt. A Mini ACI fabric recently appeared, in which two of the three controllers are replaced by virtual machines. This reduces the cost gap, but it does not remove it. So for the customer, the choice comes down to how much they care about the security features, the virtualization integration, the single point of management, and so on.

Source: https://habr.com/ru/post/455882/

