
How it works: the Berzarina data center

image

Our clients, even those who have physical access to their servers, cannot see everything that happens in the data center. Many of them have only a rough idea of how uninterrupted 24/7/365 operation is actually maintained. Today we will explain how this is done in our Berzarina data center in Moscow.

image

Fault tolerance in our data centers is achieved with redundant equipment that duplicates the vital devices supplying the two resources the hosted equipment cannot run without: power and cooling.

Cooling


Let's start with the cooling and air conditioning systems. The Berzarina data center uses UNIFLAIR precision air conditioners, which follow the traditional “chiller and fan coil” scheme and cool the server room continuously.

image

This is an air conditioning system in which a liquid coolant, circulating at relatively low pressure, transfers heat between the central refrigerating machine (the chiller) and the air cooling units (the fan coils).

image
Chiller

image
Fan coil

In addition to chillers and fan coils, the system includes a pumping station (hydraulic module), an automatic control subsystem, and the piping that connects them. The load peaks in summer, when the difference between the outdoor temperature and the temperature inside the server room is greatest. For the rest of the year the system uses “free cooling”, which exploits the low outdoor temperature to cool naturally with minimal load on the chillers. The largest corporations actively use this technology in their data centers. Microsoft, for example, relies on it heavily in its data center in the cool climate of Dublin, Ireland. A very interesting photo report can be viewed at the link.

The pumping station is an important component of the system. Its pumps run around the clock, continuously moving coolant from the chillers to the fan coils.

image

image

The system needs at least two running pumps to operate. For redundancy we have three pumps installed, and they work in shifts: every 10 hours a running pump is switched off and the idle pump starts in its place.

image

This evens out the operating hours of the pumps, and the failure of any one pump does not affect the operation of the system. During their daily rounds, our system engineers always check the state of the pumps and record their readings. The chillers are controlled from a separate hardware control panel for the cooling system, which is monitored around the clock.
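The shift rotation described above can be sketched in a few lines of Python. This is only an illustration, not the actual controller logic: the pump labels `P1` to `P3` are hypothetical, and the sketch assumes that two of the three pumps run at any moment and that the pump that has been running longest is the one swapped out.

```python
from typing import Dict, List

PUMPS = ("P1", "P2", "P3")  # hypothetical labels for the three installed pumps
SHIFT_HOURS = 10            # rotation interval mentioned in the article


def idle_pump(hour: int) -> str:
    """Return the pump that rests during the shift containing `hour`."""
    shift = hour // SHIFT_HOURS
    # Assumption: P3 rests first, then P1, then P2, and so on in a cycle.
    return PUMPS[(shift + len(PUMPS) - 1) % len(PUMPS)]


def running_pumps(hour: int) -> List[str]:
    """Return the pumps that are running at `hour` (all except the idle one)."""
    idle = idle_pump(hour)
    return [pump for pump in PUMPS if pump != idle]


def runtime_hours(total_hours: int) -> Dict[str, int]:
    """Count how many hours each pump has run after `total_hours` hours."""
    totals = {pump: 0 for pump in PUMPS}
    for hour in range(total_hours):
        for pump in running_pumps(hour):
            totals[pump] += 1
    return totals


if __name__ == "__main__":
    print(running_pumps(0))
    print(running_pumps(10))
    print(runtime_hours(30))
```

Over one full cycle of three shifts each pump accumulates the same running time, which is the uniform wear the rotation is meant to achieve.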

We use the classic layout of server cabinets, which forms two climate zones inside the server room. Two rows of racks stand facing each other. Cold air is supplied from under the raised floor, and the servers draw it in from there. This climate zone is called the “cold corridor”. The temperature in this zone is kept at +20 ± 2 °C.

image

image

The air heated by the servers is discharged into the space behind the racks, the so-called “hot corridor”. The fan coil units are located there and take in the hot air for cooling.

image

Current temperature readings from the “hot” and “cold” corridors are sent around the clock to the system engineer on duty and are refreshed every 30 seconds.

image

If the temperature goes out of range, an alarm sounds. During their rounds, engineers also measure the temperature of the equipment with non-contact infrared pyrometers. If we find that a client's equipment is overheating, we notify the client immediately and report the recorded temperature.
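The range check behind such an alarm can be illustrated with a short Python sketch. It assumes the cold-corridor limits of +20 ± 2 °C quoted above; the sensor names and function names are hypothetical, and the real monitoring system is not described in this article.

```python
from typing import Dict, List

COLD_CORRIDOR_TARGET_C = 20.0    # target temperature from the article
COLD_CORRIDOR_TOLERANCE_C = 2.0  # allowed deviation from the article
POLL_INTERVAL_S = 30             # update interval from the article


def cold_corridor_in_range(temp_c: float) -> bool:
    """True if a cold-corridor reading lies within 20 +/- 2 degrees Celsius."""
    return abs(temp_c - COLD_CORRIDOR_TARGET_C) <= COLD_CORRIDOR_TOLERANCE_C


def sensors_in_alarm(readings: Dict[str, float]) -> List[str]:
    """Return the (hypothetical) sensor names whose readings are out of range."""
    return sorted(
        name for name, temp_c in readings.items()
        if not cold_corridor_in_range(temp_c)
    )


if __name__ == "__main__":
    # One polling cycle with made-up readings, for illustration only.
    print(sensors_in_alarm({"row-a": 19.5, "row-b": 23.1}))
```

In a real deployment a check like this would run once per polling interval and raise the alarm for every sensor it returns.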

Power supply


Continuous power supply to the racks is a priority. Three independent feeds enter our main switchboard: two from different transformer substations and one from the diesel generator sets.

image

Both utility feeds are active at the same time, and the load is shared evenly between them. If power is lost on one of them, the ATS (automatic transfer switch) instantly moves the load to the second feed, so there is no downtime.

If utility power is lost completely (for example, after a serious accident in the city power grid), our clusters of General Electric UPSs (uninterruptible power supplies) take over automatically.

image

Three seconds after power is lost, a command to start the diesel generator set (DGS) is issued automatically. After 2 minutes it comes online and the entire load is switched to it. We use high-performance Gesan diesel generator sets with Volvo Penta engines. In peak mode they can deliver up to 504 kW, so the data center can keep running for as long as needed: the standard fuel reserve lasts 10 hours, and the tanks can be refilled at any time.
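The failover sequence can be summarized as a small timeline model. The sketch below is an illustration under stated assumptions, not a description of the actual automation: it assumes the 2-minute start-up is counted from the start command (so the UPSs carry the load for about 123 seconds), and the function names are hypothetical.

```python
DGS_START_COMMAND_S = 3  # delay before the start command, from the article
DGS_STARTUP_S = 120      # time until the DGS takes the load, from the article
FUEL_RESERVE_H = 10      # standard fuel reserve, from the article


def ups_bridge_seconds() -> int:
    """Seconds the UPSs must carry the load (assumes start-up follows the command)."""
    return DGS_START_COMMAND_S + DGS_STARTUP_S


def power_source(seconds_since_outage: float) -> str:
    """Return which source feeds the racks at a given moment of a full outage."""
    if seconds_since_outage < 0:
        return "utility"  # before the outage both utility feeds are live
    if seconds_since_outage < ups_bridge_seconds():
        return "ups"      # batteries bridge the gap while the DGS starts
    return "dgs"          # generator carries the whole load


def hours_until_refuel(hours_on_dgs: float) -> float:
    """Remaining hours of the standard fuel reserve, never below zero."""
    return max(0.0, FUEL_RESERVE_H - hours_on_dgs)


if __name__ == "__main__":
    for t in (-1, 0, 60, 123):
        print(t, power_source(t))
```

The point of the model is that the UPSs only need to cover a short, bounded gap, while long outages are limited by fuel logistics rather than by battery capacity.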

image

Every month we run test starts of the diesel generator set and check the fuel, oil, and antifreeze levels. Periodically we also run tests that simulate a complete loss of power: the DGS starts automatically and the load is then switched to it. Diesel engines can be harder to start in winter than in summer, so ours are equipped with preheaters and are designed to start reliably even at −30 °C.

Firefighting


With any equipment, even the most reliable, there is always a risk of a short circuit and fire, for example when a component fails. That is why all our data centers are equipped with an automatic fire extinguishing system. It is designed to put out a fire reliably without damaging the equipment. We use a gas fire extinguishing system for this.

image

It works by chemically inhibiting the combustion reaction. The system releases a gaseous extinguishing agent (Freon-125) into the room. Once in the combustion zone, the gas rapidly decomposes into free radicals, which react with the primary products of combustion. As a result, the burning rate drops until the fire dies out completely.

The automatic fire alarm system detects a fire quickly. The release of the extinguishing agent is delayed so that people have time to leave the room.

image

In our case, the system allows 30 seconds for evacuation before it discharges. There is also protection against accidental discharge: the system activates fire suppression only if at least two fire detectors have been triggered.
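The two-detector rule and the evacuation delay can be expressed as a short Python sketch. It is a simplified model, not the logic of the installed system: the detector identifiers are hypothetical, and the sketch assumes the 30-second countdown starts at the moment the second detector triggers.

```python
from typing import Dict, Optional

EVACUATION_DELAY_S = 30      # evacuation time from the article
MIN_TRIGGERED_DETECTORS = 2  # confirmation rule from the article


def gas_release_time(trigger_times: Dict[str, float]) -> Optional[float]:
    """Return the time (in seconds) at which gas would be released.

    `trigger_times` maps a hypothetical detector id to the moment it fired.
    Returns None while fewer than two detectors have triggered.
    """
    times = sorted(trigger_times.values())
    if len(times) < MIN_TRIGGERED_DETECTORS:
        return None  # a single detector is treated as a possible false alarm
    # Assumption: the countdown starts when the confirming detector fires.
    confirmed_at = times[MIN_TRIGGERED_DETECTORS - 1]
    return confirmed_at + EVACUATION_DELAY_S


if __name__ == "__main__":
    print(gas_release_time({"det-1": 5.0}))
    print(gas_release_time({"det-1": 5.0, "det-2": 8.0}))
```

Requiring a second, independent detector trades a few seconds of reaction time for a much lower chance of discharging the gas because of one faulty sensor.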

Evacuation is mandatory: the gas displaces most of the oxygen from the room, and visibility drops to a few tens of centimeters. Our engineers are trained for this scenario and know what to do when the system is triggered.

Monitoring and response


All equipment is monitored around the clock, and system engineers can quickly check the status of any device. This allows us to respond immediately to faults and emergencies.

image

Several times a day, engineers make rounds of all the data center premises. During these rounds we identify any problems and inform the people responsible. Thanks in large part to this, we can say that our data centers are prepared for surprises and can run autonomously for as long as necessary.

Conclusion


Keeping a data center running smoothly is far from a trivial task. To solve it, every failure-prone “bottleneck” is backed up with redundant equipment. Regular rounds and monitoring make it possible to diagnose and prevent potential causes of failure in time. Replacing aging equipment on schedule, developing more sophisticated monitoring systems, and managing all of it flexibly is the work we do every day, so that our customers can be confident that their data and projects are reliably protected and available 24/7/365.

Readers who cannot leave comments here are invited to our blog.

Source: https://habr.com/ru/post/225035/

