
DWDM links between data centers: how the approach changes when we are talking about banks and other critical facilities


This is 8 Tbps (80 wavelengths at 100G each).

Since 2006, I have commissioned switching equipment for a dozen and a half banks, plus a number of other facilities I cannot name. These are the very links where the speed of light in optical fiber becomes the limiting factor for synchronous replication.
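To get a feel for the numbers, here is a back-of-the-envelope latency estimate; the group refractive index is a typical value for standard single-mode fiber that I am assuming, not a measured one.

```python
# Rough fiber latency estimate. Assumption: group refractive index of
# standard single-mode fiber ~1.47, so light in the glass travels at
# roughly c / 1.47 ≈ 204,000 km/s.

C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum, km/s
GROUP_INDEX = 1.47            # typical SMF value (assumption)

def fiber_rtt_ms(route_km: float) -> float:
    """Round-trip time over a fiber route, in milliseconds."""
    speed = C_VACUUM_KM_S / GROUP_INDEX   # propagation speed in fiber
    return 2 * route_km / speed * 1000

for km in (40, 80, 200):
    print(f"{km:>4} km route: RTT ≈ {fiber_rtt_ms(km):.2f} ms")

# Every millisecond of RTT is added to each synchronously acknowledged
# write, which is why route length caps synchronous replication speed.
```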

It happens that customers first build data centers and only then think about how to connect them with WDM. An analogy can be made with traffic interchanges in Moscow: first they put up high-rise buildings, then realize that two lanes cannot handle the traffic and build expensive three-level junctions on a tiny patch of land, although it would have been far more logical to reserve space for future roads and interchanges first and build the houses afterwards.
Below I describe a few typical architectures where it is very easy to make a scaling mistake or get redundancy wrong, and touch on the magic of "it works, don't touch it."

I should say right away that any specialist with experience in designing optical networks can produce a decent DWDM design. The devil, of course, is in the details, namely in finding the compromise between price and functionality. You can surely imagine how fast the bandwidth requirements of your data center interconnect grow. With optics the story is the same as with servers: you can buy exactly for current needs and replace everything in six months when a new ERP version comes out, or you can buy with room to grow, understanding clearly how you will grow.

What is it all about?



A multiplexer-demultiplexer on each side, fiber in the middle. Compared to dark fiber, where transmitting forty 10G channels would take 40 optical pairs, with DWDM technology only one fiber is needed.

Besides carrying traffic, a WDM system can also solve redundancy tasks. In some cases installing just a few additional cards is enough to get a system with protection "on the line". Devices on the receiving and transmitting sides carry all traffic over one pair of fibers in the primary direction; on a break they switch to the backup direction within no more than 50 ms (the average in our practice is 23 ms).
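To illustrate the switching logic (and only the logic; real systems do this in dedicated hardware, not software), here is a toy model of line protection; the class and method names are mine, purely for illustration.

```python
# Toy model of protection "on the line": all traffic runs over the
# primary fiber pair, and on loss of signal the receive side flips to
# the backup direction. Real gear does this within 50 ms (23 ms on
# average in our practice).

class ProtectedLine:
    def __init__(self) -> None:
        self.active = "primary"
        self.signal_ok = {"primary": True, "backup": True}

    def fiber_cut(self, path: str) -> None:
        self.signal_ok[path] = False

    def poll(self) -> None:
        """Switch to the other direction if the active path has gone dark."""
        if not self.signal_ok[self.active]:
            standby = "backup" if self.active == "primary" else "primary"
            if self.signal_ok[standby]:
                self.active = standby
                print(f"protection switch -> {self.active}")
            else:
                print("both directions down: traffic lost")

line = ProtectedLine()
line.fiber_cut("primary")   # the excavator strikes the main route
line.poll()                 # -> protection switch -> backup
```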

A very important point: if you design the system from the start as a transport network with the ability to reroute optical links using ROADM, rather than piling "dark fiber" onto the existing equipment, you can avoid many of the problems our customers are facing now. That is my point about proper scaling planning.

The usual situation: a large company announces a tender to build the infrastructure between its data centers (or its data centers and partner data centers, or critical backbone uplink nodes). And then the grim story begins, with no clear understanding of how to do it. Five or six companies bid, of which two or three consistently offer much lower prices. With them it is fairly simple: most likely their project either will not work as specified or simply will not meet the customer's requirements at acceptance. Experienced IT executives step around this rake, but immediately face another dilemma: how to choose among the three remaining offers?

Here you can only dig deep into the parameters of the project. For banks, for example, each such case is a balance between budget, reliability, and system performance. The question is how well everything is designed and how well the equipment is selected. Explaining this in simple terms is very difficult, but I will try to give examples.

Typical situation


When two points are connected, two independent channels are simply laid. What happens if an excavator shows up and winds one of them onto its bucket? Will the equipment react within milliseconds and build a new route? What happens to the data already in flight (stuck "right in the bucket")? What happens if a multiplexer fails? Suppose an entire site is flooded or catches fire. The system should automatically switch to whatever channels remain, with minimal delay, so the connection is not lost. And the time scale is nothing like human reaction time: in banking transactions, the clock runs in milliseconds.

The excavator operator has not yet realized what he has done, and the data is already taking a 200-kilometer detour around our hero.

Projects


Over the past year, the number of projects with distributed data centers has increased dramatically. Infrastructure is growing, data volumes are growing, data centers are growing in scale. Keeping all business-critical data and processing concentrated in a single data center is simply not reasonable: it is a single point of failure, and there have already been enough examples of that, including in the banking sector.

And at the moment the decision is made to build a distributed data center, the question of interconnection arises. How to build the links inside a data center is clear to everyone: Ethernet is a non-issue, FC likewise; InfiniBand is used more rarely (it is the youngest of the technologies, but very promising in the long term). But how to properly build the infrastructure that ties the data centers together is where the rake-stepping begins.

A simple example: dark fiber and WDM


My team at KROK was building a complex, disaster-tolerant DWDM system: it had to link three data centers and the customer's test site. For fault tolerance, we decided to build two independent rings.


DWDM topology with two independent rings

Initially the customer considered dark fiber, since the solution is architecturally simple and looked cheap. However, carrying the required traffic would have taken about 30 fiber pairs per ring, and since almost all sections of the two rings run in the same cable, about 60 pairs in total. The distance the dark fiber would have to cover was about eighty kilometers, which cannot be bridged without amplification, so we would also have had to add two extra sites serving purely as repeater stations.


Topology without DWDM
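To make the comparison concrete, here is a rough feasibility check of the dark fiber option; the fiber counts come from the text above, while the optical parameters (attenuation, margin, optics budget) are typical values I am assuming rather than the project's actual measurements.

```python
# Rough dark-fiber feasibility check for the example above.

pairs_per_ring = 30           # fiber pairs needed per ring (from the text)
rings = 2
print(f"Pairs in the shared cable sections: ~{pairs_per_ring * rings}")

span_km = 80.0                # the longest stretch to cover (from the text)
attenuation_db_per_km = 0.22  # typical SMF loss at 1550 nm (assumption)
margin_db = 3.0               # connectors, splices, aging (assumption)
optics_budget_db = 16.0       # assumed budget of long-reach 10G optics

loss_db = span_km * attenuation_db_per_km + margin_db
verdict = ("fits without amplification" if loss_db <= optics_budget_db
           else "needs in-line amplification, i.e. extra repeater sites")
print(f"{span_km:.0f} km span loss ≈ {loss_db:.1f} dB -> {verdict}")
```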

Thus, a competent statement of the problem (more precisely, an understanding of the architecture) made the choice of technology obvious to the customer.

A little more complicated: choosing the node equipment


Next comes the choice of equipment and the architecture of the DWDM network. Initially it was unclear exactly what traffic would be carried and how much. The network topology was not fully settled either (it kept evolving), and customer requirements sometimes changed within two weeks as new analytics and new development plans arrived. Naturally, designing a system that covers every conceivable future requirement up front is insanely expensive.

The customer was scaling actively but could not forecast beyond two years. We agreed to build the network with nodes that carry a reserve within that planning horizon. As traffic grew, the network could then be expanded by half again without replacing chassis, adopting new technologies, or fundamentally changing the architecture. More than 200 Gbps of traffic was carried on the line between the sites.
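As a sketch of this kind of headroom planning (the growth rate here is my own assumption for illustration, not the customer's figure):

```python
# How long does a 1.5x capacity reserve last? Illustrative numbers only.

current_gbps = 200        # traffic on the line today (from the text)
headroom_factor = 1.5     # expansion possible without replacing chassis
annual_growth = 1.25      # assumed 25% traffic growth per year

ceiling_gbps = current_gbps * headroom_factor
load, years = float(current_gbps), 0
while load * annual_growth <= ceiling_gbps:
    load *= annual_growth
    years += 1

print(f"Capacity ceiling: {ceiling_gbps:.0f} Gbps")
print(f"The reserve lasts about {years} year(s) at 25% annual growth")
```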

The architecture: 3 flat rings, 5 multiplexers, linear protection. The odd number of multiplexers comes from one multiplexer terminating two lines and doing the work of two devices. This architecture let us organize redundancy without a cross-connect switching matrix, using cheaper Optical Line Protection modules instead. The system only benefited from this decision, since no traffic crossed the backplane.

In simpler terms, we deliberately made the multiplexers less flexible, but in return increased reliability and reduced the cost of the nodes. Of course, an accurate calculation required checking hundreds of parameters and recalculating the design with the engineering team more than a dozen times.



The third example: there is no such thing as too much reliability


In this build, the main criterion from the start was fault tolerance. The redundancy may look redundant, but it is not. Full 1+1 duplication was chosen, and line protection was added on top. What for? With plain 1+1 redundancy, a cable break takes traffic in one of the systems down until the cable is repaired. With combined redundancy, a break interrupts traffic in one of the systems for no more than 50 ms (less in our case), after which switching occurs and both systems again run at full capacity, letting the customer keep pushing extra traffic through either of them. Such a system also survives both a single cable break and the simultaneous failure of any node, in case of that same fire.

An example of a very large bank


We built an interconnect for three of the bank's data centers and two of our own, where they run a number of critical services; in effect, we linked two infrastructures, ours and the customer's. The transport is DWDM over fiber. We searched for the optimal equipment set matching the specific topology and the specific tasks, then designed and tuned the operating algorithms of this network structure (essentially rings with two cross-connections). At each site there is a complete catalog of failure scenarios for the site, each individual node, channel, physical line, and combinations of these factors: a sort of large table of standard reactions. There is even a scenario for "the multiplexer fails and, at the same moment, a line breaks in a completely different section." In theory this is unlikely, but I know of at least two cases, one with an operator and one with a bank, when it happened within hours. Murphy's laws work in the backbone world like nowhere else. Malicious intent was not excluded from the scenarios either.
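For a sense of how such a catalog is put together, here is a sketch that enumerates single failures and all pairwise combinations; the element names are invented for illustration and have nothing to do with the bank's real topology.

```python
# Enumerating a failure-scenario catalog: every single failure plus every
# pair, e.g. "a multiplexer fails while a line breaks elsewhere".
from itertools import combinations

sites = ["DC-A", "DC-B", "DC-C"]
nodes = ["mux-1", "mux-2", "mux-3", "mux-4"]
lines = ["line-AB", "line-BC", "line-CA"]

elements = ([("site", s) for s in sites]
            + [("node", n) for n in nodes]
            + [("line", l) for l in lines])

catalog = [combo for size in (1, 2)
           for combo in combinations(elements, size)]

print(f"{len(catalog)} scenarios need a standard documented reaction")
for combo in catalog[:5]:
    print("  ", " + ".join(f"{kind} {name}" for kind, name in combo))
```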



Here is the project card of another bank, also large, though not as large:
• Equipment: Cisco Systems 15454E MSTP
• Three sites (main data center, backup data center, operator site), distances of 5-20 km
• Network topology: full ring
• Client interfaces between data centers: 8 × 10GE, 8 × FC-800, 4 × FC-400, 16 × GE (tallied in the sketch after this list)
• Client interfaces from each data center to the operator site: 8 × FE/GE
• Client signal protection: on a single ring break, the signal switches to the other direction within 50 ms
• Multiplexers for 40 channels (wavelengths)
• Transponder cards: clients connect over multimode fiber or copper
• 220 V power from two independent feeds
• Data center sites use 5 M6 chassis each (6 slots for line cards); the operator site uses 2 chassis
• A typical data center equipment set occupies 34 RU of rack space
• Deployment and launch of the system took two people a month
• Fiber for DWDM was allocated in stages, as functionality of the existing network was migrated onto already-running sections of the new transport network
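As promised in the interface line above, here is a rough tally of client bandwidth between the data centers; the line rates are nominal values I am assuming (FC-800 taken as 8GFC ≈ 8.5 Gbps, FC-400 as 4GFC ≈ 4.25 Gbps).

```python
# Aggregate client bandwidth between the data centers in the card above.

interfaces = {           # name: (count, assumed nominal rate in Gbps)
    "10GE":   (8, 10.0),
    "FC-800": (8, 8.5),
    "FC-400": (4, 4.25),
    "GE":     (16, 1.0),
}

total = sum(count * rate for count, rate in interfaces.values())
print(f"Aggregate client bandwidth: ~{total:.0f} Gbps")
# ~181 Gbps against 40 available wavelengths: comfortable headroom,
# which is exactly the point of taking DWDM "to grow".
```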

Here is another similar example:



This is what the hardware itself looks like:



The management interface (one of the options):



Result


As a rule, what we get at the start is a bank or similar customer with its own optical line, who needs a new data transmission system (more precisely, a deep modernization of the old one). The specifics of such channels in Russia are that as long as it works, it is better not to touch it. Modernization happens if and only if the customer needs more speed, not simply because new technology has been released.

In the course of the project we build a reliable DWDM network, and the DWDM installation opens up room for growth without changing the fiber plant.

A few general educational tips:


Summary


Over 9 years, our team has gained very interesting experience with the former Nortel (now Ciena), Cisco, Huawei, MRV, X-Terra, and other vendors. There have also been deployments of domestic manufacturers' equipment. The result is a precise understanding of the equipment's specifics (I repeat: in pure carrier backbone work, the operators themselves are stronger), but I believe we know almost every possible rake when it comes to building reliable networks. If you want to dig into a particular nuance or understand how to design and size such a system, ask in the comments or by email at AFrolov@croc.ru.

And, taking this opportunity, I send warm greetings to everyone who digs within city limits without a construction permit.

Source: https://habr.com/ru/post/247905/

