The life of the data center after commissioning

Many believe that once a data center is built, the job is done. In fact, only then does the daily work on a large and complex task begin: operating the data center.

The two main challenges in managing data center operation are ensuring uptime and reducing operating costs without compromising reliability.

How well both tasks are solved is determined by the data center's operation program. From time to time the industry discusses adopting a single operation program to guarantee proper operation. A program there certainly should be. But as for a single one... Different sites have different equipment, so the programs for different data centers will differ. In any case, when the Uptime Institute certifies a site for operational sustainability, each facility is considered individually. Instructions for IT staff will be the same everywhere, but for the personnel servicing the engineering equipment, both the routine technical procedures and the emergency procedures will differ.

Incidentally, having a maintenance program also matters for the company's reputation: customers, at least, look at it. And with good reason: if the instructions have been worked out, there is some hope that they will be followed. If there are no instructions, there is certainly nothing to follow.

The number of problems at the operational stage can be significantly reduced if the concept of such a program is developed in parallel with the design phase, with the design and operation teams working side by side. At the very least, this avoids errors that make individual components of the data center inconvenient to service.

In theory, an independent review helps reduce the number of errors, but for that the independent expert must actually be an expert. Unfortunately, today anyone can call themselves an expert... In fact, only someone with first-hand experience of building data centers, and preferably not one but at least ten, can be a real expert. There are no such specialists in Russia yet.

Many people rely on certification. Uptime Institute certification does filter out design mistakes, but mostly only the gross ones. So the operating rules are developed first and then supplemented to account for the errors identified and corrected during operation. The maintenance program must therefore be constantly updated. These documents are not born from scratch: they accumulate the experience of the on-duty staff at a real facility.

We put the data center "on the wing"

The data center industry is now actively looking for approaches and discussing options for setting operating standards, including developing them from scratch, partially borrowing from other fields, and adapting foreign practices.

At key data center conferences, speakers have given examples of solving operational problems by borrowing practices from other industries. In particular, the aviation industry's well-developed programs for aircraft operation and maintenance can serve as a model for the data center industry. They are a good example of standardizing processes where operation and maintenance cannot be handled entirely in-house and interaction has to be worked out with a huge list of suppliers...

Of course, the analogy is not 100 percent: for a data center, aviation's operations management program would be excessive. Aviation has several thousand industry standards and a variety of procedures, including operating manuals for specific types of equipment. In a data center, the scale of operational processes is not as great.

But in general, the task of managing operation is similar across industries. So rather than inventing methods for regulating data center activities, one can try to adapt approaches used in other industries to the data center's needs.

Maintenance by actual condition

One of the most interesting current trends in data center operation is the use of condition prediction systems. The reason is that traditional maintenance of production facilities on a planned schedule becomes very inefficient because of its high cost. So there has recently been a trend toward managing reliability and maintenance according to actual condition, where all repair and maintenance work is performed depending on the state of the system.

In traditional practice, maintenance work is carried out regardless of the condition of the equipment. Under condition-based management, if a system's scheduled maintenance date has arrived but the system is in perfect order, a reasoned decision is made to continue operating it.
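
The difference between the two rules can be sketched in a few lines of Python. This is only an illustration, not a description of any real monitoring product: the Unit fields, the 0-to-1 health score and the 0.8 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """A piece of engineering equipment (all fields are hypothetical)."""
    name: str
    hours_since_service: float  # operating hours since the last maintenance
    service_interval: float     # planned interval, in operating hours
    health: float               # assumed score from monitoring: 0.0 (failed) to 1.0 (perfect)


def scheduled_due(unit: Unit) -> bool:
    """Traditional rule: service once the planned interval has elapsed,
    whatever the condition of the equipment."""
    return unit.hours_since_service >= unit.service_interval


def condition_based_due(unit: Unit, health_threshold: float = 0.8) -> bool:
    """Condition-based rule: service only when the monitored health score
    falls below the (assumed) threshold, whatever the schedule says."""
    return unit.health < health_threshold


if __name__ == "__main__":
    units = [
        Unit("chiller-1", hours_since_service=2100, service_interval=2000, health=0.95),
        Unit("ups-2", hours_since_service=500, service_interval=2000, health=0.60),
    ]
    for u in units:
        print(u.name, "scheduled:", scheduled_due(u),
              "condition-based:", condition_based_due(u))
```

In this sketch the chiller is past its interval but healthy, so the condition-based rule keeps it running, while the UPS is well within its interval but degraded, so it is sent for service early.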

When Rolls-Royce installs its turbines on aircraft, a huge amount of information is collected from their sensors. With this volume of data, the probability of engine failure can be predicted with high accuracy. This method makes it possible to anticipate an undesirable situation tens to hundreds of hours in advance and to send the problematic components for maintenance.
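
How such a forecast might look in its simplest form is sketched below. The actual methods used by engine manufacturers are not described here, so this is a stand-in technique: an ordinary least-squares line fitted to one sensor reading and extrapolated to an alarm limit. The function name, the vibration readings and the limit are hypothetical.

```python
from typing import Optional, Sequence


def hours_until_limit(hours: Sequence[float],
                      readings: Sequence[float],
                      limit: float) -> Optional[float]:
    """Fit a least-squares line to (hours, readings) and estimate how many
    operating hours remain until the line reaches `limit`.

    Returns None when there is no upward trend, and 0.0 when the fitted
    line has already reached the limit at the last sample.
    """
    n = len(hours)
    if n < 2 or n != len(readings):
        raise ValueError("need at least two paired samples")
    mean_h = sum(hours) / n
    mean_r = sum(readings) / n
    var_h = sum((h - mean_h) ** 2 for h in hours)
    if var_h == 0:
        raise ValueError("sample times must not all be equal")
    slope = sum((h - mean_h) * (r - mean_r)
                for h, r in zip(hours, readings)) / var_h
    if slope <= 0:
        return None  # reading is flat or improving: no failure forecast
    intercept = mean_r - slope * mean_h
    crossing = (limit - intercept) / slope  # hour at which the line hits the limit
    return max(0.0, crossing - hours[-1])


if __name__ == "__main__":
    # Hypothetical vibration readings (mm/s) sampled every 10 operating hours.
    sample_hours = [0, 10, 20, 30]
    vibration = [1.0, 1.5, 2.0, 2.5]
    print(hours_until_limit(sample_hours, vibration, limit=4.0))
```

With these made-up readings the trend reaches the limit roughly 30 operating hours after the last sample, which is the kind of lead time that lets a component be scheduled for maintenance before it fails. A real system would combine many sensors and a far richer model.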

Of course, comprehensive, high-quality monitoring is essential for this approach. When it is available, maintenance is done not when it is scheduled but when it is really needed. Otherwise any production engineer is inclined to play it safe and repair as much as possible, just so that nothing fails. The same problem exists, for example, in the electric power industry: a lot of money is “buried” in new construction and maintenance, which does increase reliability, but much of that money is essentially thrown to the wind.

Operating by actual condition implies a large number of monitoring systems, depending on how long ago the facility was built. Scheduling repairs according to actual condition makes it possible to reduce the number of repairs several-fold. This is a huge potential for savings, especially in large data centers.

Alexey Soldatov, General Director of DataPro

Source: https://habr.com/ru/post/231751/
