
Cumulus Linux for the data center network

The modern data center is markedly different from the traditional corporate network. Applications increasingly rely on L3 routing rather than assuming that all hosts sit on the same subnet, far more traffic flows "east-west" than "north-south", and so on. But the biggest difference is sheer scale.

Thanks to virtualization, the number of network endpoints in a data center ranges from tens of thousands to millions, whereas in the good old days there were only a few thousand. The scale of virtual networks has long outgrown traditional VLANs, and the pace of reconfiguration has increased by orders of magnitude as well. On top of that, the number of servers in a modern data center means far more network equipment is needed than in a traditional corporate network.

Network operating systems evolved along the expected path: from monolithic pseudo-OSes, through embedded systems such as QNX and VxWorks, to the current stage where the OS is based on Linux or BSD, sometimes running as an image inside a virtual machine on an ordinary Linux distribution. The management interface, however, refused to evolve: it is still a command line with a multi-level structure. Matters are made worse by syntax that differs between vendors, and often even between models from the same vendor.
To summarize, the problem lies not in the hardware but in the network operating systems. Approaches to solving it split in two: the first path led to northbound and southbound APIs (OpenFlow being the most popular), the second to using Linux and its ecosystem. In other words, the first option tries to bolt an advanced management API onto existing OSes, while the second suggests switching to an OS that already has all the necessary features.

In the previous article, we talked about switches without a pre-installed OS and about the ONIE deployment environment.

It is time to introduce one representative of the second approach that can be installed on our platforms: Cumulus Linux.





Cumulus Linux is not merely "based on Linux", as many systems are; it is Linux, so working with it is almost the same as working with an ordinary Linux server. Completely standard applications are installed directly on the switch, new utilities can be developed without targeting any vendor-specific API, and the usual networking tools are supplemented with tools for building CLOS fabrics and for automation.


Architecture

Installation and Monitoring

What does deployment look like? Everyone is used to the "boot via PXE, deploy the OS image" approach, and here it is similar.


The familiar approach

  1. The first boot goes into ONIE, which searches for an OS image source, identifies the required image, and deploys it to the switch.


    Preparation
  2. On the first boot into Cumulus Linux, the switch checks DHCP option 239 for a provisioning script. If the response is a URL,
    the script is fetched from it, searched for the CUMULUS-AUTOPROVISIONING marker, and executed as root. Bash, Ruby, Perl, and Python are supported (see the sketch after this list).


    Script search
  3. The third step is a normal boot into Cumulus Linux.
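
As an illustration of step 2, here is a minimal sketch of what such an auto-provisioning script could look like in Python. The marker comment is the string the switch looks for before running the script as root; the package name, URL, and paths are assumptions for illustration, not taken from the article.

```python
#!/usr/bin/env python3
# CUMULUS-AUTOPROVISIONING   <- marker the switch searches for before executing the script
"""Sketch of a provisioning script fetched from the URL handed out in DHCP option 239."""
import os
import subprocess
import urllib.request

# Install an ordinary Debian package directly on the switch, as on any Linux server.
subprocess.run(["apt-get", "update"], check=True)
subprocess.run(["apt-get", "install", "-y", "collectd"], check=True)

# Drop an SSH key for the automation user (the URL is a placeholder).
key = urllib.request.urlopen("http://deploy.example.com/keys/automation.pub").read()
os.makedirs("/root/.ssh", exist_ok=True)
with open("/root/.ssh/authorized_keys", "ab") as f:
    f.write(key)
```

After the script finishes, the switch continues with its normal boot, as in step 3.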



Normal operation

Initial setup is complete. What next?

From here on, you can use the same tools you already use for servers.


Orchestration

Besides off-the-shelf tools, you can integrate any in-house tooling your company already uses.

Monitoring? Ganglia, Graphite, collectd, net-SNMP, Icinga: install them right on the switch and collect data in real time.
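
As a quick illustration that this really is just a Linux box, here is a minimal sketch that pushes an interface counter to Graphite over carbon's plaintext protocol ("metric value timestamp" on TCP port 2003). The collector hostname, metric path, and port name are assumptions for illustration.

```python
#!/usr/bin/env python3
"""Push one interface counter from the switch to a Graphite/carbon collector."""
import socket
import time

CARBON_HOST = "graphite.example.com"   # hypothetical collector address
CARBON_PORT = 2003                     # carbon's default plaintext listener

def rx_bytes(iface):
    # Standard Linux sysfs counter; on Cumulus the front-panel ports appear as swpN.
    with open(f"/sys/class/net/{iface}/statistics/rx_bytes") as f:
        return int(f.read())

def send_metric(path, value):
    line = f"{path} {value} {int(time.time())}\n"
    with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5) as sock:
        sock.sendall(line.encode())

if __name__ == "__main__":
    send_metric("dc.switch01.swp1.rx_bytes", rx_bytes("swp1"))
```

Run it from cron or a systemd timer and the switch feeds the same monitoring stack as your servers.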

The delights of automation

Linux not only offers excellent management capabilities and clear APIs, it also maps neatly onto the network model of a modern data center. The control plane lives in user space, separate from the packet-forwarding plane in the kernel. The FIB (Forwarding Information Base) sits in the kernel, while the RIB (Routing Information Base) is managed from user space by the corresponding daemons, covering the basic capabilities of network equipment plus a number of advanced BGP-related features. The standard toolkit gives you direct access to interfaces and routing tables, change-notification mechanisms, and so on.
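
To make the point concrete, here is a minimal sketch that reads interface counters and the kernel IPv4 routing table using nothing but standard Linux interfaces (/sys and /proc); no vendor API is involved. The interface name is illustrative.

```python
#!/usr/bin/env python3
"""Inspect switch state with stock Linux interfaces only (sysfs and procfs)."""
import socket
import struct

def interface_counters(iface):
    # RX/TX byte counters straight from sysfs, the same as on any Linux host.
    stats = {}
    for key in ("rx_bytes", "tx_bytes"):
        with open(f"/sys/class/net/{iface}/statistics/{key}") as f:
            stats[key] = int(f.read())
    return stats

def kernel_routes():
    # The IPv4 FIB as exposed in /proc/net/route (hex, little-endian addresses).
    routes = []
    with open("/proc/net/route") as f:
        next(f)                                   # skip the header line
        for line in f:
            iface, dest_hex, gw_hex = line.split()[:3]
            dest = socket.inet_ntoa(struct.pack("<I", int(dest_hex, 16)))
            gw = socket.inet_ntoa(struct.pack("<I", int(gw_hex, 16)))
            routes.append((dest, gw, iface))
    return routes

if __name__ == "__main__":
    print(interface_counters("swp1"))             # front-panel ports show up as swpN
    for dest, gw, iface in kernel_routes():
        print(f"{dest:>15} via {gw:>15} dev {iface}")
```

The change-notification mechanisms mentioned above are the usual netlink ones, so a daemon can subscribe to link and route events just as it would on any server.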

This foundation lets you use the native Linux shell instead of the specialized shells of JunOS, NX-OS, and others. The shell is by nature well suited to chaining independent commands and to scripting.

And the switch lends itself perfectly to automation:

  1. Package and Process Management
  2. Configuration templates
  3. Monitoring automation


All of this helps configure L2/L3 and makes the administrator's life easier; for example, interface configuration can be rendered from a template, as sketched below.
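
Here is a minimal sketch of the "configuration templates" idea: rendering an /etc/network/interfaces stanza for a range of front-panel ports with Jinja2 (an ordinary Debian package). The port range, addressing scheme, and output path are assumptions for illustration; the interfaces.d path presumes the main interfaces file sources that directory.

```python
#!/usr/bin/env python3
"""Render interface configuration for a range of front-panel ports from a template."""
from jinja2 import Template

TEMPLATE = Template("""\
{% for port in ports -%}
auto swp{{ port }}
iface swp{{ port }}
    address 10.1.{{ port }}.1/31

{% endfor -%}
""")

if __name__ == "__main__":
    rendered = TEMPLATE.render(ports=range(1, 5))
    with open("/etc/network/interfaces.d/uplinks.intf", "w") as f:
        f.write(rendered)
    print(rendered)
```

The same template could just as easily be driven by Ansible, Puppet, or Chef; the point is that the target is an ordinary text file, not a proprietary configuration database.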

Another handy tool is the Prescriptive Topology Manager (PTM). Once the logical topology of the data center is defined, turning it into correctly connected cabling is no easy task and consumes a lot of time and effort. PTM lets you verify the cabling in real time and pinpoints the exact location of an error. It takes a cabling plan in graphviz DOT format (some companies already use DOT to draw their plans) and compares it with the information obtained via LLDP to confirm that the cables are connected correctly.
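
For a sense of what PTM consumes, here is a minimal sketch that generates such a graphviz-DOT cabling plan for a small leaf-spine setup. Hostnames, port numbering, and the /etc/ptm.d/topology.dot path are illustrative assumptions; check the PTM documentation for the exact file location.

```python
#!/usr/bin/env python3
"""Generate a graphviz-DOT cabling plan (one edge per physical cable) for PTM."""

SPINES = ["spine01", "spine02"]
LEAVES = ["leaf01", "leaf02", "leaf03", "leaf04"]

def build_plan():
    lines = ["graph datacenter {"]
    for s_idx, spine in enumerate(SPINES):
        for l_idx, leaf in enumerate(LEAVES):
            # "device":"port" -- "device":"port" describes one expected cable.
            lines.append(
                f'    "{spine}":"swp{l_idx + 1}" -- "{leaf}":"swp{49 + s_idx}";'
            )
    lines.append("}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    with open("/etc/ptm.d/topology.dot", "w") as f:
        f.write(build_plan())
```

PTM then compares each declared edge against what LLDP actually reports on the corresponding port and flags any mismatch.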


Topology check


Comparison with LLDP data

Integration examples

Software-only overlay solutions limit control over the hardware side of the network, make it hard to automate configuration and management, and complicate troubleshooting. Linux supports VXLAN, so developing agents for various network virtualization solutions is not a problem. Cumulus Linux can act as a hardware L2 gateway, letting you bypass the limitations of software solutions and keep switching performance high (the VXLAN tunnel endpoint, or VTEP, is processed at line rate) while enjoying all the advantages of VXLAN.
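
Since the VXLAN building blocks are standard Linux, a VTEP can be sketched with plain iproute2 commands, here wrapped in Python. The VNI, loopback address, and port/bridge names are assumptions for illustration; on Cumulus hardware the actual forwarding is offloaded to the ASIC rather than done in the kernel.

```python
#!/usr/bin/env python3
"""Stand up a VXLAN tunnel endpoint and bridge it with an access port via iproute2."""
import subprocess

VNI = 104001            # illustrative VXLAN network identifier
LOCAL_IP = "10.0.0.11"  # loopback address used as the VTEP source

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create the VXLAN device, a bridge, and attach both the tunnel and a front-panel port.
run(["ip", "link", "add", f"vni{VNI}", "type", "vxlan",
     "id", str(VNI), "local", LOCAL_IP, "dstport", "4789", "nolearning"])
run(["ip", "link", "add", "br0", "type", "bridge"])
run(["ip", "link", "set", f"vni{VNI}", "master", "br0"])
run(["ip", "link", "set", "swp1", "master", "br0"])
for dev in (f"vni{VNI}", "br0", "swp1"):
    run(["ip", "link", "set", dev, "up"])
```

The controller agents mentioned below do essentially this kind of plumbing, plus the remote-VTEP and MAC population that the particular virtualization solution dictates.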

Agents have been developed for PLUMgrid, Nuage Networks, Midokura, and VMware NSX.


VMware Integration


Integration with Midokura

Network architecture


Traditional network

The traditional network architecture consists of core, aggregation, and access layers. Three tiers mean higher latency and a number of legacy restrictions. Such designs are a poor fit for modern data centers with high server density and heavy server-to-server traffic. The protocols in use are STP/RSTP/PVSTP, VTP, HSRP, MLAG, LACP. In short, "that is how we were taught."

  1. Oriented toward L2.
  2. Static by nature: VLANs scale poorly under virtualization.
  3. Leans on crutches such as MLAG, TRILL, etc.
  4. Optimized for north-south traffic.
  5. Low scalability: hundreds to thousands of connections.



Modern network

  1. It is simpler.
  2. Fewer proprietary protocols.
  3. Predictable latency.
  4. Horizontal scalability.
  5. Oriented toward L3.
  6. Well suited to virtualization and multi-tenant cloud environments.
  7. Scales to millions of connections.


Weak spots

There is no ideal option: because of its L3 orientation, Cumulus Linux has several weak points.

  1. The desire to build an L2 fabric using MLAG, TRILL, Virtual Chassis, VLT, chassis stacking, etc.
  2. Working with NetFlow.
  3. Provider access switches with VRF, MPLS, VPLS, etc.
  4. QinQ.
  5. Access switches with port security, complex QoS rules, 802.1X, PoE.
  6. IS-IS, RIP, EIGRP routing protocols.
  7. AAA authorization via TACACS+ or RADIUS (LDAP is available as an alternative).


Everything tied to the traditional approach to network building, and to service-provider requirements, fits poorly with the way networks are built in a modern data center.

Summing up.

Decoupling the hardware switch from the operating system lets you take the next step toward the software-defined data center. Being able to choose the OS lets you focus on the features you actually need and pick the best of the available options.

The system becomes far more transparent: everything can be inspected with standard Linux tools, without being tied to vendor-specific commands. It also becomes possible to treat a switch or router as a server with many network ports, much as a UNIX server with several NICs served in the early days of network equipment. The difference today is that all forwarding is done in the ASIC, while the server part only manages it.

As usual, we have a lab with Cumulus Linux on an Eos 220 switch available for merciless experiments and preparation.

Source: https://habr.com/ru/post/231533/

