
Monitoring of engineering infrastructure in the data center. Part 3. Cooling system


Cooling system NORD-4.

Part 1. Monitoring of engineering infrastructure in the data center. Highlights.
Part 2. How power supply monitoring works in the data center.
Part 3. Cooling monitoring, using the NORD-4 data center as an example.
Part 4. Network infrastructure: physical equipment.

We have already talked about what a monitoring system is and why it needs to be designed during the construction phase. In the article “Monitoring of the engineering infrastructure in the data center,” we looked at the general situations in which comprehensive monitoring is necessary, and also talked about the features of our systems.
Today we will look at how the cooling monitoring system is arranged, using the NORD-4 data center as an example. Before you start, we recommend reading the article on how the cooling system was built.

For the NORD-4 project, we chose a water-glycol cooling system. In the diagram above, it is shown by the yellow and blue lines. Since the coolant in the system is a liquid, it is extremely important to monitor:


All air conditioners and chillers are connected to a common monitoring system. We monitor the performance of each device.


Dashboard with parameters.

The monitoring system also displays the operating status of the equipment: disabled, normal operation, fault, under repair.


Green marks chillers in normal operation, white marks disabled ones. If something goes wrong, the indicator turns red.

Temperature sensors


The key parameter in the monitoring system is the temperature in the cold aisles of the machine rooms. The average temperature in the halls ranges from 23 to 27 °C. At this temperature, the equipment does not yet overheat, but it no longer frosts over :). This parameter is written into the SLA, and if we fail to meet it, we pay a penalty to the customer. It is our starting point when we tune the entire cooling system in the data center.

The temperature stability in the halls is not a discrete parameter, but a process that is ensured by the equipment of the cooling system.


The temperature in the machine room on the monitoring system dashboard. The temperature indicators reflect where the sensors are located in the room.

In every cold aisle we placed three temperature sensors. We foresee a dispute in the comments about their optimal number and location. Our practice shows that three are enough, and this is why:


In each hot aisle we installed one sensor. Its readings are for information only.

In a nutshell, use sensors rationally and without paranoia. Too many sensors create “noise” on the monitoring panel and keep the engineers from concentrating.
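As a minimal sketch of the cold-aisle check, the snippet below classifies one aisle from its three sensor readings against the 23 to 27 °C band. The function name, the status labels, and the rule that a single out-of-band sensor flags the whole aisle are illustrative assumptions, not a description of the production system.

```python
from statistics import mean

SLA_MIN_C = 23.0  # lower bound of the cold-aisle band quoted in the article
SLA_MAX_C = 27.0  # upper bound of the cold-aisle band quoted in the article


def check_cold_aisle(readings_c):
    """Return (status, average) for one cold aisle.

    readings_c: temperatures in °C from the aisle's sensors (three in the article).
    Assumption: one out-of-band sensor is enough to flag the aisle.
    """
    if not readings_c:
        raise ValueError("no sensor readings")
    avg = mean(readings_c)
    if max(readings_c) > SLA_MAX_C:
        status = "too_hot"
    elif min(readings_c) < SLA_MIN_C:
        status = "too_cold"
    else:
        status = "ok"
    return status, avg


if __name__ == "__main__":
    print(check_cold_aisle([24.1, 24.8, 25.3]))
```

Checking the extremes rather than only the average keeps a local hot spot from being hidden by two cooler sensors.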


Temperature sensor mounted on the rack.

Fluid temperature. The water is cooled as follows. The chiller cools the glycol, which then enters the heat exchanger. There the cold glycol cools the warm water. After the water and glycol have exchanged heat, the water goes to the air conditioners in the halls, and the glycol returns to the chiller.


The interaction of external and internal circuits in the chiller scheme.

First of all, the system depends on the temperature of the water entering the air conditioners. It must hold steady at a set level. For our system, this is 18 °C.

To adjust the temperature, we use a three-way valve. It regulates the amount of water leaving the heat exchanger. If the temperature rises, the valve opens wider and supplies more water to the heat exchanger. The valve's current opening percentage is displayed in the monitoring system interface.
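The valve behaviour can be pictured with a simple proportional step: the warmer the supply water is relative to the 18 °C setpoint, the wider the valve opens. This is only a sketch under stated assumptions. The function name and the gain value are hypothetical, and a real controller would more likely use a PI or PID loop tuned on site.

```python
SETPOINT_C = 18.0  # target supply-water temperature from the article


def valve_opening(current_pct, water_temp_c, gain_pct_per_deg=5.0):
    """Return the new valve opening in percent, clamped to 0..100.

    gain_pct_per_deg is a hypothetical tuning value: how many percentage
    points the opening changes per degree of deviation from the setpoint.
    """
    error = water_temp_c - SETPOINT_C  # positive when the water is too warm
    new_pct = current_pct + gain_pct_per_deg * error
    return max(0.0, min(100.0, new_pct))  # the valve cannot go past its end stops


if __name__ == "__main__":
    print(valve_opening(50.0, 19.0))  # water is 1 °C too warm, so the valve opens wider
```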

It remains to cover how we monitor the temperature of the glycol and water in the circuits themselves. The temperature is monitored along the entire length of the pipeline; we placed the sensors as follows:


Outdoor temperature. This indicator does not relate to the data center directly, but it matters for monitoring too. We do not use average figures for Moscow, since the air temperature at Borovaya and Korovinsky can differ by several degrees. We care about the weather exactly where the equipment is.

An independent weather station is installed at each site and reads the temperature, humidity and wind speed. This data shows how the air conditioning system behaves in the real weather conditions at a particular facility. Since the annual range in Moscow can be from -35 °C to +35 °C, we have to monitor the weather and prepare for its quirks in advance.


This is what an independent weather station installed on the site looks like.

For example, every summer evening the duty engineer receives a weather forecast from three sources. If temperatures above 30 °C are expected, the responsible specialists receive an SMS alert asking them to be more vigilant.
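The forecast rule is easy to express in code. The sketch below assumes that a single source forecasting more than 30 °C is enough to trigger the message; the source names, function names and message wording are hypothetical, and actually sending the SMS is left out.

```python
HEAT_ALERT_THRESHOLD_C = 30.0  # threshold from the article


def needs_heat_alert(forecasts_c):
    """forecasts_c maps a source name to its forecast maximum in °C.

    Assumption: one source above the threshold is enough to alert.
    """
    return any(t > HEAT_ALERT_THRESHOLD_C for t in forecasts_c.values())


def build_alert_text(forecasts_c):
    """Compose the message text from the hottest forecast."""
    hottest_source = max(forecasts_c, key=forecasts_c.get)
    return (f"Heat warning: up to {forecasts_c[hottest_source]:.0f} °C expected "
            f"({hottest_source}). Please stay vigilant.")


if __name__ == "__main__":
    forecasts = {"source_a": 28.0, "source_b": 31.0, "source_c": 29.5}
    if needs_heat_alert(forecasts):
        print(build_alert_text(forecasts))
```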


Temperature and humidity data from the data center's weather station on the monitoring system dashboard.

In general, monitoring does not have a seasonal division, unlike equipment that needs to be prepared for winter / summer.

Other sensors


Leakage sensors. In each NORD-4 machine room, 14 Stulz air conditioners are installed. They are equipped with factory leakage sensors, but monitoring those alone is not enough. At valves, pipe joints, on the heat exchanger, under the air conditioners and at other critical points, we installed an independent sensor network. Data from these sensors is collected and fed into the overall system.

The entire fourth floor of the data center is allocated for air conditioning: it houses heat exchangers, pumps and tanks. We do not install leakage sensors under each unit of cooling equipment, as the floor itself is fitted with drains. If a leak occurs, the water flows down the drain into storage tanks. Leakage sensors are installed in front of each drain funnel. In other words, we monitor not the device but the zone from which water can flow.


This is how the leakage sensors are displayed on the dashboards.

Fluid pressure. In addition to temperature and humidity, we monitor the fluid pressure in the cooling circuit. Since the system is closed, a pressure drop can mean depressurization, that is, a leak. A sharp drop is already a serious problem and is covered by the emergency instructions.

Pressure is monitored at different points on all floors of the data center. The normal pressure range is set with a small tolerance: this protects the system against false alarms caused by height differences.
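To see why height matters, consider the static pressure in a liquid column: a sensor mounted higher in the building reads less by roughly ρ·g·h. The sketch below compares a measured value with that expectation. The density of the water-glycol mix, the tolerance and the function names are assumptions for illustration, and the formula ignores pump head and flow losses.

```python
G = 9.81                # gravitational acceleration, m/s^2
DENSITY_KG_M3 = 1050.0  # assumed density of the water-glycol mix; depends on concentration
PA_PER_BAR = 100_000.0


def expected_pressure_bar(reference_bar, height_above_reference_m):
    """Static pressure expected at a sensor mounted above (+) or below (-) the reference point."""
    delta_pa = DENSITY_KG_M3 * G * height_above_reference_m
    return reference_bar - delta_pa / PA_PER_BAR


def pressure_ok(measured_bar, reference_bar, height_above_reference_m, tolerance_bar=0.2):
    """True if the reading is within tolerance_bar (a hypothetical allowance) of the expectation."""
    expected = expected_pressure_bar(reference_bar, height_above_reference_m)
    return abs(measured_bar - expected) <= tolerance_bar


if __name__ == "__main__":
    # A sensor 10 m above the reference point reads about 1 bar less.
    print(round(expected_pressure_bar(3.0, 10.0), 3))
```

With these assumed numbers, ten metres of height accounts for about one bar, so a single fixed threshold for every floor would either miss leaks upstairs or raise false alarms downstairs.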

Difficulties in building a monitoring system


At first glance, launching a monitoring system looks like a linear process: install the sensors, connect them to the network, seat people at the console, and enter the threshold values into the system. But NORD-4 had its nuances: the halls are filled gradually, and we do not know in advance what equipment will be installed or in which racks.

When we launched the monitoring system, we set the thresholds based on the design documentation. For example, the WATER OUT indicator (the water that enters the air conditioners in the halls) should be a stable 18 °C. Based on this, we calculate the remaining values and build a table of “ideal” parameters.

As the data center fills up, the control panel may start issuing false warnings. This is extremely dangerous, since it scatters the specialist's attention and he may miss a real problem. We call this “over-monitoring”: on new equipment, indicators may drift slightly around the threshold values and generate warnings.

Therefore, fine tuning takes place during operation. All changes to threshold values and monitoring parameters are first agreed with the technical director and the operations manager, and only then entered into the system.
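One common way to keep a drifting indicator from flooding the panel is to raise a warning only after several consecutive out-of-range readings. The sketch below shows that idea; it is a general technique, not necessarily the one used at NORD-4, and the class name, limits and count are hypothetical.

```python
class DebouncedAlarm:
    """Raise a warning only after `required` consecutive out-of-range readings."""

    def __init__(self, low, high, required=3):
        self.low = low
        self.high = high
        self.required = required
        self._streak = 0  # number of out-of-range readings in a row

    def update(self, value):
        """Feed one reading; return True while the warning is active."""
        if self.low <= value <= self.high:
            self._streak = 0
        else:
            self._streak += 1
        return self._streak >= self.required


if __name__ == "__main__":
    alarm = DebouncedAlarm(low=17.0, high=19.0, required=3)
    for reading in [18.0, 19.5, 18.2, 19.6, 19.7, 19.8]:
        print(reading, alarm.update(reading))
```

A single spike is ignored, while a sustained excursion still triggers the warning after a short delay. The trade-off is that delay, so the count has to stay small for critical parameters.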

A couple of tips


Alerts. To learn about problems in time, configure different alerts in the monitoring system. We have three types of automatic alerts:


In the comments to previous articles we were asked how we deal with the human factor: absent-mindedness, natural needs and so on. The answer is simple: we do not skimp on staff, training and drills. Each duty shift, including the night shift, consists of four engineers. So if someone wants a coffee or needs the toilet, the panel is not left unattended. We described how we select and train duty engineers in this article.

According to the instructions, as soon as a “red code” appears, the specialist has just a few minutes to notify everyone and bring the equipment back into operation. We talked about our technical support service in May.


Duty shift at work.

N + 1 monitoring. Consider making the monitoring system itself redundant so that you never lose control over the data center. Most of our devices are connected in series over RS-485 using the Modbus protocol, and at the design stage of the data center we thought through how the monitoring lines would run and laid backup routes.

Labeling. Another mandatory practice. Label the sensors and map their locations so that engineers can easily understand where to look for them.

Collect statistics. Collect as much data on equipment and systems as possible. Even if this data is not needed for day-to-day monitoring, it will be useful in the future. By analyzing the statistics, you can determine which indicators, apart from the main ones, deviated from the norm before a breakdown. For example, the equipment vibrated or became unusually loud. This helps to plan diagnostics more accurately, and sometimes even to predict a possible failure.
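On the software side, a backup route only helps if the poller actually falls back to it. Below is a minimal sketch of that fallback, assuming each route is wrapped in a callable that returns a reading or raises `OSError` when the link is dead. The function names are hypothetical, and the Modbus exchange itself is deliberately left out.

```python
def read_with_failover(read_fns):
    """Try each route in order; return (route_index, value) from the first that answers.

    read_fns: callables that return a reading or raise OSError on a dead link.
    """
    errors = []
    for index, read in enumerate(read_fns):
        try:
            return index, read()
        except OSError as exc:  # TimeoutError is a subclass of OSError
            errors.append(exc)
    raise ConnectionError(f"all {len(read_fns)} monitoring routes failed: {errors}")


if __name__ == "__main__":
    def primary_route():  # hypothetical: stands in for a broken primary line
        raise TimeoutError("no reply on primary line")

    def backup_route():  # hypothetical: stands in for a healthy backup line
        return 18.1

    print(read_with_failover([primary_route, backup_route]))
```

Returning the index of the route that answered lets the dashboard show that the system is running on the backup line, which is itself worth an alert.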
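As a rough illustration of what such an analysis might look like, the sketch below compares recent readings of each metric with its historical baseline and flags those that moved by more than a few standard deviations. The metric names, the cutoff and the function name are hypothetical, and a simple z-score is just one of many possible methods.

```python
from statistics import mean, stdev


def unusual_metrics(baseline, recent, z_limit=3.0):
    """Return names of metrics whose recent average deviates from the baseline.

    baseline, recent: dicts mapping metric name -> list of readings.
    z_limit is a hypothetical cutoff in baseline standard deviations.
    """
    flagged = []
    for name, history in baseline.items():
        if len(history) < 2 or not recent.get(name):
            continue  # not enough data to judge this metric
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat baseline: a z-score is undefined, skip
        z = abs(mean(recent[name]) - mean(history)) / sigma
        if z > z_limit:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    baseline = {"vibration": [1.0, 1.1, 0.9, 1.0, 1.0],
                "supply_temp": [18.0, 18.1, 17.9, 18.0, 18.0]}
    recent = {"vibration": [1.8, 1.9], "supply_temp": [18.0, 18.05]}
    print(unusual_metrics(baseline, recent))
```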

That's all. In the next article in this series, we will talk about monitoring network infrastructure. We are waiting for your questions.

Source: https://habr.com/ru/post/338966/

