
How a data center failure can lead to the cancellation of thousands of flights at major airlines

Last Monday, many online media outlets reported that thousands of flights of the world's second-largest airline had been canceled. The airline in question is Delta Air Lines. Thousands of its passengers could not fly anywhere, because the flights they had bought tickets for simply ceased to exist. As it turned out, the cause was a failure of the company's computer system, and not at a regional site: the problem occurred in Delta Air Lines' main data center, located in Atlanta, USA.


A Delta Air Lines employee helps a passenger whose flight has been canceled understand the situation.

The company also has backup systems that were supposed to take over from the affected servers if the main data center ran into trouble. That did not happen: the secondary, duplicate system did not work either. Notably, according to management, the company has invested tens of millions of US dollars in these duplicate systems. Delta Air Lines specialists managed to restore everything in just six hours, but in that time the company lost millions of dollars to flight cancellations and the costs that came with them. The cause was a failure in the power system combined with problems in a backup generator.

As it turned out, a routine switchover from the main power system to the auxiliary generator caused the generator to fail. A fire broke out and was quickly extinguished, but the entire infrastructure of the Delta Air Lines data center was left without power. It took several hours to bring 400 of the 500 servers back into operation, and the remaining 100 servers were restored some time later. All this time, almost the entire fleet of 800 aircraft remained on the ground. The cancellation of a single flight cost the company $17,000, and on top of that it had to compensate passengers for transportation, food, additional expenses and so on.
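To get a feel for how quickly the per-flight figure adds up, here is a minimal sketch of the arithmetic. Only the $17,000 per canceled flight comes from the text above; the function name and the flight count in the example are hypothetical, and passenger compensation is deliberately left out because the article gives no figure for it.

```python
# Direct cost per canceled flight, as quoted in the article.
COST_PER_CANCELED_FLIGHT_USD = 17_000


def direct_cancellation_cost(canceled_flights: int,
                             cost_per_flight: int = COST_PER_CANCELED_FLIGHT_USD) -> int:
    """Return the direct cost of cancellations in US dollars.

    Passenger compensation (transport, food, etc.) is not included.
    """
    if canceled_flights < 0:
        raise ValueError("canceled_flights must be non-negative")
    return canceled_flights * cost_per_flight


if __name__ == "__main__":
    # Hypothetical flight count, chosen only to illustrate the scale.
    example_flights = 1_000
    print(f"{example_flights} canceled flights: "
          f"${direct_cancellation_cost(example_flights):,}")
```

Even at this assumed volume the direct cost alone reaches eight figures, which is consistent with the "millions of dollars" mentioned above.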
Another problem is the data center's outdated infrastructure. The electronic ticket booking system was created in 1960. It has been rebuilt and updated many times since then, but the company's IT infrastructure still does not meet modern requirements. The company's servers work with a large volume of data, yet backups are made only several times a day rather than continuously, and no shadow copies are created. As a result, restoring normal operation of the data center after an emergency takes longer than it would if the data were backed up continuously.
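The difference between periodic and continuous backups can be made concrete with a small calculation. The sketch below assumes backups are evenly spaced over the day, which the article does not state, and the function name is hypothetical. It computes the longest stretch of changes that could be missing from the most recent backup if a failure hits just before the next one runs.

```python
from datetime import timedelta


def worst_case_data_loss(backups_per_day: int) -> timedelta:
    """Longest window of changes not covered by the latest backup.

    Assumes backups are evenly spaced across 24 hours (an illustrative
    assumption, not a detail from the article).
    """
    if backups_per_day <= 0:
        raise ValueError("backups_per_day must be positive")
    return timedelta(days=1) / backups_per_day


if __name__ == "__main__":
    for n in (2, 4, 24):
        print(f"{n} backups per day -> up to {worst_case_data_loss(n)} of changes to replay or re-enter")
```

With four backups a day, for example, up to six hours of changes have to be recovered by other means after a failure. Continuous replication shrinks that window toward zero, which is why recovery is faster.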

And what about Southwest Airlines?




This is another major airline that lost millions of US dollars because of a data center failure, in this case on July 20th. The cause was the partial failure of a single router, one of hundreds in the company's data center. The data center's support staff did not notice the problem, and within minutes the entire system collapsed like a house of cards. The head of the company compared the incident to a flood that happens once every 1,000 years.

Over the next four days, 2,300 flights were canceled, hundreds of thousands of passengers could not fly anywhere, and it was impossible to book tickets. All this cost the company tens of millions of dollars in direct and indirect losses. Shares of Southwest Airlines fell by 11% and have so far been slow to recover.

A detailed investigation of the incident showed that the failure had occurred on its own, with no outside interference. According to experts, the backup and data storage system in the company's data center was configured incorrectly, so the backed-up data could not be used when the backup system was brought into operation.

In the near future, the company plans to deploy a new backup system on new equipment, which should minimize the chance of a repeat. The company has nonetheless already lost its 10-15 million US dollars.

And that is not all


Southwest Airlines and Delta Air Lines are not the only airlines that have lost money because of data center equipment failures. In May, JetBlue asked its passengers to check in at the airport "manually" rather than automatically. The reason was the same: a computer system failure. In addition, United Airlines canceled hundreds of flights last year because of failures in its own data center.

The main sources of these problems are too few backup servers, incorrectly configured data backup systems, problems with power infrastructure, and cost-cutting. There is also a lack of standardization in equipment and services: each company has its own technical systems, sometimes unique, that have been developed over decades. As a result, the general solutions recommended for data center failures simply cannot be applied at a number of these companies. And that, as we can see, leads to millions in losses.

According to a recent study by the Ponemon Institute, a data center failure cost its owner an average of about $740,000 in 2015. The most expensive was one of last year's incidents, with a total loss of $2.4 million for the data center's owner.


Source: https://habr.com/ru/post/307660/

