
How we moved the cloud from 10G Ethernet to 56G InfiniBand


Mellanox MC2609125-005 cable

In our case, InfiniBand would run five times faster than Ethernet and would cost about the same. There was only one difficulty: all of this had to be done without interrupting cloud services in the data center. Roughly like rebuilding a car engine while driving.

There were simply no such projects in Russia. Everyone who had tried to switch from Ethernet to InfiniBand before had, one way or another, stopped their infrastructure for a day or two. The cloud arm located in the Volochaevskaya-1 data center serves about 60 large customers (including banks, retail, insurance and critical-infrastructure companies) on almost 500 virtual machines running on roughly a hundred physical servers. We were the first in the country to rebuild the network infrastructure without downtime, and we are a little proud of it.

InfiniBand cable entering the server

As a result, the bandwidth of the links between the cloud servers increased from 10 Gbit/s to 56 Gbit/s.

What we started with


The Volochaevskaya-1 data center had a 10 Gbit/s Ethernet network connecting the servers. We began looking seriously at InfiniBand while designing our next data center, Compressor (Tier III). Comparing the total cost of ownership - cards, switches, cabling, maintenance - InfiniBand turned out to cost almost the same as Ethernet, while providing much lower network latency and being at least five times faster outright.
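
A rough back-of-the-envelope comparison of the link rates shows where "at least five times" comes from. This is a minimal sketch; the line rates and encoding overheads are the standard published figures for 10GbE and FDR, not measurements from our deployment:

    # Effective throughput: 10G Ethernet vs. FDR InfiniBand.
    # Both use 64b/66b encoding; the figures below are the standard
    # published rates, not measurements from this deployment.

    ethernet_line_gbps = 10.3125        # 10GbE signalling rate
    fdr_line_gbps = 4 * 14.0625         # FDR: four lanes at 14.0625 Gbit/s each
    encoding = 64 / 66                  # 64b/66b encoding overhead

    eth_effective = ethernet_line_gbps * encoding   # ~10.0 Gbit/s
    fdr_effective = fdr_line_gbps * encoding        # ~54.5 Gbit/s

    print(f"10GbE effective bandwidth: {eth_effective:.1f} Gbit/s")
    print(f"FDR effective bandwidth:   {fdr_effective:.1f} Gbit/s")
    print(f"Ratio:                     {fdr_effective / eth_effective:.1f}x")
    # -> roughly 5.5x more bandwidth per link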

On top of that, InfiniBand at facilities like this is an order of magnitude easier to set up and maintain. The network is designed once and then quietly grows simply by plugging in new hardware: no shamanic rituals as with complex Ethernet architectures, no workaround switches or one-off tweaks for every situation. If a switch has a problem, only that switch goes down, not the whole segment. Inside the infrastructure it is genuine plug-and-play.
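
As an illustration of what that plug-and-play feels like in practice, here is a minimal sketch of the kind of check that can be run after cabling in a new node. It relies only on the standard infiniband-diags utilities (ibstat, iblinkinfo); the script itself is an illustration, not a tool from the actual migration:

    # After a new HCA is cabled in, the subnet manager discovers it on its
    # own; this only confirms that the ports came up at the expected rate.
    # Uses the standard infiniband-diags tools (ibstat, iblinkinfo).
    import subprocess

    def run(cmd):
        """Run a command and return its stdout as text."""
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    # Local HCA state: for FDR we expect "State: Active" and "Rate: 56".
    print(run(["ibstat"]))

    # Fabric-wide link report: every link with its speed, width and peer port.
    print(run(["iblinkinfo"]))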

So, having built the cloud arm in the Compressor data center on InfiniBand and seen how well the technology works, we started thinking about rebuilding the cloud arm of the Volochaevskaya-1 data center with FDR InfiniBand.

Vendor


There are four large InfiniBand suppliers, but in practice it pays to build the whole network on homogeneous equipment from a single vendor, and for us that vendor was Mellanox. Besides, at the time of the design it was the only vendor that supported the most modern InfiniBand standard, FDR. We also already had good experience reducing network latency for several customers (including a large financial company) with Mellanox gear - their technology had proven itself well.

FDR InfiniBand latency is on the order of 1-1.5 microseconds, versus roughly 30-100 microseconds for Ethernet - a difference of up to two orders of magnitude, and in practice, in this particular case, we saw roughly the same.

As for topology and architecture, we decided to make our lives much easier. It turned out to be neither particularly difficult nor expensive to build exactly the same scheme as in the second cloud arm (our cloud spans two data centers), the one in the Compressor data center - which gave us two identical sites. Why? It is easier to stock spare equipment, easier to maintain, and there is no zoo of hardware where every box needs its own approach. On top of that, some of our equipment was close to falling out of vendor support, so we replaced the old servers as well. To be clear, that did not drive the decision, but it turned out to be a nice bonus.

The most important reason was reducing network latency. Many of our cloud customers use synchronous replication, so lower network latency can indirectly speed up, for example, their storage systems. In addition, InfiniBand lets virtual machines migrate much faster. Looking beyond the present, since InfiniBand supports RDMA, in the future we will be able to make virtual machine migration several times faster still by replacing TCP with RDMA.
Of course, technologies such as RDMA, TRILL and ECMP are now appearing in the Ethernet world and together provide roughly the same capabilities as InfiniBand - but in InfiniBand, leaf-spine topologies, RDMA and auto-tuning have been around for a long time and were designed in at the protocol level rather than bolted on as ad-hoc solutions, as in Ethernet.
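
To give a feel for why latency matters so much for synchronous replication, here is a small illustrative model. A synchronous write is acknowledged only after the remote side confirms it, so every write pays at least one network round trip; the latency figures are the ones above, while the single-round-trip assumption is ours:

    # Upper bound that network round trips place on synchronous writes.
    # Illustrative model only: one round trip per write, no storage or
    # protocol overhead included.

    def max_sync_writes_per_sec(one_way_latency_us, round_trips=1):
        """Ceiling on per-stream write rate imposed purely by the network."""
        rtt_seconds = 2 * one_way_latency_us * 1e-6 * round_trips
        return 1.0 / rtt_seconds

    for name, latency_us in [("Ethernet, ~50 us one way", 50),
                             ("FDR InfiniBand, ~1.5 us one way", 1.5)]:
        ceiling = max_sync_writes_per_sec(latency_us)
        print(f"{name:32s} -> at most {ceiling:,.0f} synchronous writes/s per stream")
    # Lower latency raises the per-stream ceiling by the same factor.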

The move


In the cloud arm at Volochaevskaya-1 we had the network and storage segments running on a 10-gigabit network. We connected the new infrastructure segment built on Mellanox to it so that the two formed a single L2 segment, and everything that lived on the old stand migrated to the new one online. Customers noticed nothing except their services getting faster: no machines were powered off and no critical problems occurred during the move.

Here is the migration scheme, simplified:





And here is the more detailed version:



In essence, we created a single data transmission medium between the two stands, even though the technologies are not really compatible with each other, migrated the customers' virtual infrastructure, and then disconnected the stands.

To join the two environments we used managed Mellanox SX6036 switches. Before we designed the test stand for the cloud migration, a Mellanox engineer visited us for consultation and technical assistance. Once we had agreed on our plans and on what the Mellanox hardware could do, he sent us GW licenses that allow the switch to act as a Proxy-ARP gateway, forwarding traffic between the Ethernet network and the InfiniBand segment. In total the move took about a month (not counting the design work). It was split into several stages: first we assembled a test stand - a mini-model of the cloud - to verify that the idea was viable (we had to migrate not only the regular network but also the storage network). We rehearsed the transfers on the test stand several times and then started writing the plan for the production migration.
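
During the production move itself, the main thing worth watching continuously is that every migrated host stays reachable across the combined L2 segment. A sketch of such a watchdog is below; the addresses are purely hypothetical - the real plan used the addressing from the scheme above:

    # Keep pinging the hosts being moved through the Proxy-ARP gateway and
    # flag any that stop answering. Hypothetical addresses for illustration.
    import subprocess
    import time

    HOSTS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # placeholder host IPs

    def reachable(host, timeout_s=1):
        """Send a single ICMP echo; True if the host answered."""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    while True:
        down = [h for h in HOSTS if not reachable(h)]
        print(time.strftime("%H:%M:%S"),
              "all reachable" if not down else "NO REPLY: " + ", ".join(down))
        time.sleep(5)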


Testbed layout

Next, the operations team switched to a night schedule and gradually installed everything that was needed. You know the story about the wife whose husband disappears somewhere every night and claims he is going to work - it was written about moves like this one.


The detailed scheme (IP addresses, VLANs, pkeys and the equipment settings needed on the day) for the transition from the 10G network to InfiniBand


During installation


QSFP to 4xSFP+ hybrid cable connected to the Mellanox SX6036


The same cable connected to the Ethernet module of the 10G switch


Optics entering the 10G Ethernet part

The old equipment now stands idle, and we have already started taking it apart - slowly and quietly pulling it out of the racks and dismantling it. Most likely it will end up in test environments, helping us work through even more complex technical cases that we will write about later.

That is about it. Formally the move is not quite finished yet (we still need to remove the old stands), but the result already feels very good. I will be glad to answer questions about the move in the comments or by email at mberezin@croc.ru.

Source: https://habr.com/ru/post/232509/

