
Human error, design and downtime

If human error is one of the leading causes of data center outages, why do we spend so much time on tier classification and data center ratings? Design certainly matters for reliability, but its contribution is small compared to people, processes, operations and maintenance, equipment life cycle, and risk reduction strategies.



Human error puts every data center at risk of failure, so why do we keep inventing new safeguards when we know the outcome in advance? Has this still-young industry not matured enough to recognize that human errors are inevitable, and that attempts to correct their root causes are too complicated? This line of thinking has led to excess redundancy and wasted resources, and it has sparked disputes among market leaders about what really lies behind the ratings.

Ranking system


The industry's tier classification has traditionally been treated as a benchmark for design standards and facility reliability. CIOs looking for a data center for their organizations turn to this classification to form an idea of how reliable a facility is supposed to be. However, it is rash to rely too heavily on this method. Results drawn from real operating data from data centers matter much more.

Perhaps we should not ignore the "most likely" causes of failure while trying to design systems around the human factor. It is safe to say that management and operations affect reliability far more than a data center's tier does. A tier rating, and any design associated with it, is not a guarantee of reliability.

Until recently, data center reliability was equated with excess capacity and redundancy, which forced customers to provision surplus reserves and waste resources in the process. Today the economics work against that approach, and the industry is visibly shifting toward a more progressive view of reliability: design for specific tasks, combined with world-class management systems.

How to minimize the influence of the human factor in the data center


Human error is the largest contributor to data center downtime, and that downtime can be costly. On average, a large company can lose about 100,000 pounds in just one minute of downtime. At that rate, an hour of downtime can actually push a company into bankruptcy. So how do you deal with human error in the data center? Here are some tips for keeping it to a minimum:

Clear labeling

Ensure that all equipment and machines are properly labeled, so that the labels show the correct sequence of actions to perform on them.

Strict operating instructions

It is important to have an operating manual for the facility as a whole. Make sure this manual is accessible to all personnel, so that they can refer to it in an emergency and act in the best possible way.

Regular drills

Staff response should be analyzed and assessed periodically by running drills that rehearse emergencies that may occur. This leaves staff better prepared when such situations arise.

Monitoring and surveillance

Ensure that the movements and location of every employee are carefully monitored, to prevent unauthorized access to confidential information.
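To put the downtime figure quoted at the start of this section in perspective, here is a minimal sketch of the arithmetic. It assumes the loss rate stays constant for the whole outage, which is a simplification; the function name `downtime_cost` is hypothetical and only the 100,000 pounds per minute figure comes from the article.

```python
# Rough downtime cost estimate.
# Assumption: losses accrue at a constant rate for the whole outage.
LOSS_PER_MINUTE_GBP = 100_000  # per-minute figure quoted in the article


def downtime_cost(minutes: float, loss_per_minute: float = LOSS_PER_MINUTE_GBP) -> float:
    """Return the estimated loss in pounds for an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * loss_per_minute


if __name__ == "__main__":
    for label, minutes in [("1 minute", 1), ("15 minutes", 15), ("1 hour", 60)]:
        print(f"{label}: {downtime_cost(minutes):,.0f} GBP")
```

Under this assumption, a one-hour outage comes to 6,000,000 pounds, which is why an hour of downtime can threaten a company's survival.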

From design to operation


Despite these changes, data center design remains very important, even though it is only a small part of ensuring high availability. More attention should go to programs such as the Uptime Institute's Management and Operations Stamp of Approval, and research in this area should continue. Why? Because once a data center is designed and built, it is inevitably run by people. No design known today can prevent human intervention.

This problem cannot simply be solved by throwing resources or design at it. It can be solved by building an organizational structure that reduces or completely eliminates the impact of the human factor. That is not so simple, however. Fostering a sense of ownership, technical discipline, adherence to procedures, training, and good working conditions will build a mindset in your team that lets your data center reach its full potential regardless of its tier.

In the end, the measure of your data center's success is extremely simple: the years of uninterrupted operation you have delivered, set against the number of unplanned outages that occurred. By focusing on operational thinking and organizational strategy, you can achieve a very long period of uninterrupted operation. This is not a matter of luck or of design; it is a matter of strategy.

Source: https://habr.com/ru/post/234153/

