We have delivered exactly half of the course “Managing the engineering infrastructure of the data center”. It turned out that certain topics came up at almost every seminar, no matter whether we were telling our listeners about preparing a data center for summer, working with contractors, or building an in-house operations team. We decided to put together a small guide to the most pressing topics, with recommendations from our own experience.
Equipment labeling
This topic holds the record for mentions at our seminars. Here is what you need to know about labeling:
- The labeling system should be thought through and agreed with the operations team at the design stage of the data center or server room. If the designers and builders use their own labeling principle that the operations team cannot follow, engineers will have to decipher this legacy or relabel everything in a coordinate system they understand.
- Every element of every system must be labeled. Not only each air conditioner, chiller and UPS should have its own number, but also each circuit breaker, valve and video camera. Cable routes and cross-connects are often left unlabeled. Pay special attention to temporary setups (“temporary houses”): they must be labeled too.
- The labeling principle should be transparent and clear to every engineer. For example, the first field of the code may denote the data center, the second the hall, the third the row, and the fourth the rack number. The rack 5H3C030 is then located in data center 5, machine hall H3, row C, position 030.
For switchboards, the principle may be: board type, hall, power feed, board number. The engineer then understands that SchR2.2.1 is switchboard number 1, powered from the second feed, in the second hall.
- Labels must be visible and legible so that the engineer can easily identify the equipment. Use color for greater clarity. At our sites, for example, “color differentiation of pants” is used to mark power supply lines and the pipes of the cooling system.
- Labels must be kept up to date. If equipment moves to another room or the layout changes, do not forget to reflect this in the labeling.
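A transparent labeling scheme can also be read by software, which is useful for checking that labels in an inventory follow the agreed format. The sketch below is only an illustration: the field widths are inferred from the two example labels above (5H3C030 and SchR2.2.1), and the names `RackLabel`, `SwitchboardLabel`, `parse_rack_label` and `parse_switchboard_label` are hypothetical.

```python
import re
from typing import NamedTuple


class RackLabel(NamedTuple):
    data_center: str
    hall: str
    row: str
    rack: str


class SwitchboardLabel(NamedTuple):
    board_type: str
    hall: int
    feed: int
    number: int


# Assumed layout, inferred from the single example 5H3C030:
# one digit (data center), letter + digit (hall), one letter (row), three digits (rack).
RACK_PATTERN = re.compile(r"^(\d)([A-Z]\d)([A-Z])(\d{3})$")

# Assumed layout, inferred from SchR2.2.1: letters (type), then hall.feed.number.
SWITCHBOARD_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)\.(\d+)\.(\d+)$")


def parse_rack_label(label: str) -> RackLabel:
    """Split a rack label into its fields or raise ValueError."""
    match = RACK_PATTERN.match(label)
    if match is None:
        raise ValueError(f"label does not follow the rack scheme: {label!r}")
    return RackLabel(*match.groups())


def parse_switchboard_label(label: str) -> SwitchboardLabel:
    """Split a switchboard label into type, hall, feed and number."""
    match = SWITCHBOARD_PATTERN.match(label)
    if match is None:
        raise ValueError(f"label does not follow the switchboard scheme: {label!r}")
    board_type, hall, feed, number = match.groups()
    return SwitchboardLabel(board_type, int(hall), int(feed), int(number))


if __name__ == "__main__":
    print(parse_rack_label("5H3C030"))
    print(parse_switchboard_label("SchR2.2.1"))
```

A check like this can be run over an equipment list to catch labels that drifted from the scheme after a move or a layout change.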
Labeling of the NORD-4 cold water supply pipelines. The number, flow direction and temperature of the water (warm or cold) are indicated on each of the pipes.
Monitoring
A data center or server room of any size is difficult to operate and maintain without a monitoring system. No monitoring means no information, which means the data center or server room has to be run blind.
Our recommendation is to monitor at least the following:
- The state of engineering equipment (on, off, any errors): utility feeds, diesel generator sets, UPSs, battery charge level, UPS battery runtime, fuel level of the diesel generator sets.
- Indicators that appear in your SLA: temperature and humidity at the air conditioners, voltage and current at the UPSs.
Once this bare minimum is in place, you can optionally add:
- autonomous leakage sensors under pipeline elements;
- autonomous temperature sensors in machine rooms;
- current analyzers in switchboards;
- pyrometers in transformer substations.
At level 80 of monitoring, you also track how components inside the equipment are working. For example: how fast is the fan of the outdoor unit spinning, what is the pressure in the freon circuit, or what percentage load is the chiller compressor running at right now? Later this helps you understand whether the equipment has headroom (that is, it is not running at 100% of its capacity), spot a potential problem early, and analyze how systems behave under different conditions.
A large number of monitored parameters does not by itself mean the monitoring is good. Everything has to be configured properly. Here are the main commandments:
- Set different polling intervals for different systems. For air conditioning, polling once a minute is enough; for power supply, once a minute is too infrequent. In that time a feed can drop out, the switchover to batteries can fail, and the diesel generator set can fail to start. So poll power supply equipment as often as possible. We, for example, take readings every second.
- Visualize the main monitoring indicators on screens so that they are always in view. It is easier to read information from charts and graphs than from tables of numbers. But do not overload the screen, or clarity will be lost.
- Define the threshold values at which alerts fire. It is better to have two levels: warning and alarm (critical error).
- Keep the data current. “Stale” alarms should not linger on the alert screen. This can happen after an alarm has fired in the monitoring system and nobody updates it. Once work on the incident has started, do not forget to change its status to “in progress”, so that a new alarm does not get lost among the old ones. If necessary, you can configure email and SMS notifications for warnings and alarms.
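The configuration rules above can be sketched in a few lines. This is a minimal illustration, not the configuration of any particular monitoring product: the polling intervals come from the text, while the threshold numbers, the `Threshold`, `Alert` and `Status` types and the function names are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

# Polling intervals in seconds: once a second for power, once a minute for cooling.
POLL_INTERVAL_S = {"power": 1, "cooling": 60}


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    ALARM = "alarm"


class Status(Enum):
    NEW = "new"
    IN_PROGRESS = "in progress"


@dataclass(frozen=True)
class Threshold:
    """Two-level upper limit: warning first, then alarm."""
    warning: float
    alarm: float


@dataclass
class Alert:
    source: str
    severity: Severity
    status: Status = Status.NEW


def classify(value: float, threshold: Threshold) -> Severity:
    """Map a reading to OK, WARNING or ALARM (upper-limit check only)."""
    if value >= threshold.alarm:
        return Severity.ALARM
    if value >= threshold.warning:
        return Severity.WARNING
    return Severity.OK


def unhandled(alerts: List[Alert]) -> List[Alert]:
    """Alerts nobody has picked up yet; these belong at the top of the screen."""
    return [a for a in alerts if a.status is Status.NEW]


if __name__ == "__main__":
    # Hypothetical cold-aisle temperature limits in degrees Celsius.
    cold_aisle = Threshold(warning=25.0, alarm=27.0)
    alerts = [
        Alert("UPS-1", Severity.ALARM, Status.IN_PROGRESS),
        Alert("hall-2 cold aisle", classify(27.5, cold_aisle)),
    ]
    for alert in unhandled(alerts):
        print(alert.source, alert.severity.value)
```

Separating severity from handling status is what keeps a fresh alarm visible: an incident that someone is already working on stays in the history but drops out of the unhandled view.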
The schematic diagram of the NORD-3 data center in the monitoring system lets you quickly assess the state of the power center and the temperature in the cold aisles of the machine hall.
The engineers on duty monitor the operation of NORD-3 in the control and monitoring center.
This is entirely optional, but since we are a commercial data center, we also stream all the main indicators to our customers through the Personal Account and the DL Monitor mobile app.
Statistics collection
All parameters should not only be monitored in real time; you should also collect statistics on them. Over time this helps you understand how equipment behaves over its life cycle, how often it needs repair, and whether it has capacity in reserve. It also helps to plan maintenance intervals, estimate the required number of spare parts, and form a budget for the purchase and maintenance of equipment.
How does it work? For example, we have long-term statistics on air conditioning and on weather conditions (there is a weather station at each of our sites). We can trace how the cooling system performed last summer at +32 °C. If a hot summer is expected, we can assess whether the cooling system has enough capacity in reserve or needs to be reinforced somehow. Likewise, from the history of breakdowns and repairs we can predict which spare parts are most likely to be needed.
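As a rough illustration of this kind of analysis, the sketch below takes historical samples that pair outdoor temperature with cooling load and estimates the capacity reserve on hot days. The `Sample` structure, the field names and all the numbers are made up for the example; real data would come from the monitoring system and the weather station.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    outdoor_temp_c: float   # reading from the site weather station
    cooling_load_kw: float  # load on the cooling system at the same moment


def peak_load_at_or_above(samples: List[Sample], temp_c: float) -> Optional[float]:
    """Highest cooling load seen when it was at least temp_c outside."""
    loads = [s.cooling_load_kw for s in samples if s.outdoor_temp_c >= temp_c]
    return max(loads) if loads else None


def reserve_fraction(peak_load_kw: float, installed_capacity_kw: float) -> float:
    """Share of installed capacity left unused at the observed peak."""
    return 1.0 - peak_load_kw / installed_capacity_kw


if __name__ == "__main__":
    # Hypothetical history from last summer.
    history = [Sample(25.0, 600.0), Sample(32.0, 800.0), Sample(33.0, 850.0)]
    peak = peak_load_at_or_above(history, 32.0)
    if peak is not None:
        print(f"reserve at +32 °C and above: {reserve_fraction(peak, 1000.0):.0%}")
```

If the reserve on the hottest recorded days is already small, that is the signal to plan reinforcement before the next hot summer rather than during it.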
You do not need specialized software to keep such statistics. The only advice: it is more convenient to work with a monitoring system that can plot graphs. Where the information does not lend itself to graphs (for example, records of incidents, repairs, accidents and maintenance), you can keep the data in ordinary Excel.
Here is what a summary table for a diesel generator set (DGU) can look like. Add active links to the contract, warranty service, maintenance schedule, repairs, test runs and instructions, and all the information on a particular diesel generator set is always at hand.

Spare parts and consumables
Spare parts and consumables should always be at hand. Keeping them next to the server room is ideal. If storage space is tight, you can make the contractor responsible for storing spare parts and supplying them on request.
What spare parts and consumables to keep in stock for urgent repairs:
- for freon air conditioners - oil and freon, fans for outdoor units (yes, for us these are consumables, since we have more than 1000 of them);
- for power supply - circuit breakers, fuse links, cables of various types;
- for monitoring - sensors;
- for security systems - several access control kits (controller, reader, magnetic lock);
- for telecom infrastructure - switches, line cards, chassis, routers.
Spare parts with long lead times (compressors, controllers, input switchboards) should be kept in stock so that the data center is never left without the necessary reserve.
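Keeping the reserve intact is easier if minimum stock levels are written down and checked regularly. The sketch below shows one simple way to do that; the part names and quantities are hypothetical, and `items_to_reorder` is an illustrative helper, not part of any inventory system.

```python
from typing import Dict


def items_to_reorder(stock: Dict[str, int], minimum: Dict[str, int]) -> Dict[str, int]:
    """Return how many units of each part are missing to reach the minimum level."""
    shortfall = {}
    for part, required in minimum.items():
        on_hand = stock.get(part, 0)  # a part absent from stock counts as zero
        if on_hand < required:
            shortfall[part] = required - on_hand
    return shortfall


if __name__ == "__main__":
    # Hypothetical minimum levels and current stock.
    minimum = {"outdoor unit fan": 10, "fuse link": 20, "compressor": 1}
    stock = {"outdoor unit fan": 12, "fuse link": 5}
    print(items_to_reorder(stock, minimum))
```

For long-lead-time items the minimum should account for delivery time, since a compressor ordered after a failure may arrive weeks later.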
Warehouse of spare parts for air conditioning systems.
Installing equipment in racks
We had a separate lesson on installing equipment correctly, but we come back to common mistakes at almost every seminar. Why? It's simple. A server installed incorrectly in a rack can cause local problems even in a well-designed data center with a competent operations team.
Here are the main mistakes:
- IT equipment with two power supplies is connected to a single PDU.
- Equipment with one power supply is connected without an automatic transfer switch (ATS).
- Equipment is plugged into PDUs in adjacent racks.
- PDU sections are overloaded.
- Equipment is installed with its front facing the hot aisle.
- Empty rack units have no blanking panels to prevent parasitic heat exchange.
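Several of these mistakes can be caught automatically if rack connections are recorded in an inventory. The following sketch checks four of them for a single device; the `Device` and `PowerSupply` structures and their fields are assumptions for the example, and “automatic transfer switch” is abbreviated ATS (also written AVR).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PowerSupply:
    pdu: str       # identifier of the PDU this power supply is plugged into
    pdu_rack: str  # rack in which that PDU is mounted


@dataclass(frozen=True)
class Device:
    name: str
    rack: str
    supplies: Tuple[PowerSupply, ...]
    has_ats: bool = False                # fed through an automatic transfer switch
    front_faces_cold_aisle: bool = True


def check_device(device: Device) -> List[str]:
    """Return the list of installation mistakes found for one device."""
    problems = []
    pdus = {s.pdu for s in device.supplies}
    if len(device.supplies) >= 2 and len(pdus) < 2:
        problems.append("both power supplies share one PDU")
    if len(device.supplies) == 1 and not device.has_ats:
        problems.append("single power supply without ATS")
    if any(s.pdu_rack != device.rack for s in device.supplies):
        problems.append("powered from a PDU in another rack")
    if not device.front_faces_cold_aisle:
        problems.append("front faces the hot aisle")
    return problems


if __name__ == "__main__":
    server = Device("srv-01", "R1", (PowerSupply("A", "R1"), PowerSupply("A", "R1")))
    for problem in check_device(server):
        print(f"{server.name}: {problem}")
```

PDU section overload and missing blanking panels are not covered here: the first needs measured load per section, the second a record of which rack units are empty.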
The correct scheme for connecting servers with one and two power supplies.
This concludes our hit parade of hot topics for the operations engineer. Share your thoughts in the comments and ask questions. At the next seminar we will tell you how to test the engineering systems of the data center and how to build a monitoring system.
More articles about the design and operation of data centers:
» How the cooling system of the NORD-4 data center was created
» Errors in the data center design that you only feel during the operation phase
» The path of electricity in the data center
» Excursion to the largest data center in Russia