
Accidents in data centers that were (almost) impossible to foresee



Despite all the efforts of designers to equip data centers with reliable protection against failures and emergencies, accidents still happen, and a whole chain of events may lead up to them. Sometimes even the most reliable safeguards give way, and the DC grinds to a halt.

Below are a few cases, some long past and some recent, which show that certain situations are simply impossible to foresee.

Hurricane Sandy: Generator Failure



This is how fuel was delivered
When Hurricane Sandy battered the East Coast of the USA in October 2012, power was cut off almost everywhere in the region. On top of the rain and hurricane-force winds, the situation was made worse by masses of salt water flooding Manhattan and many other areas and cities.

Emergency generators stood on the 18th floor of a high-rise at 75 Broad Street in Lower Manhattan, ready to supply the Peer 1 data center with electricity if the main power system failed. The generators kicked in as soon as salt water flooded the building's basements and lobby.

Unfortunately, the water also took out another element critical to the entire system: the fuel pump that fed fuel upward. After 9/11, New York introduced new rules for storing fuel in buildings; diesel, gasoline, or any other fuel could be kept on upper floors only in limited quantities. So as soon as the generators on the 18th floor burned through their fuel, the whole system would stop, since no new fuel could be pumped up.



Rather than let the system shut down, the Peer 1 team began delivering fuel by hand, carrying it up to the 17th floor, where the fuel tank was located. From there, the fuel was fed up to the generators on the 18th floor.

This work went on for several days until the main power supply could be restored. It is worth noting that Peer 1's clients included companies such as SquareSpace and Fog Creek Software.

The data center was kept running only by the team's ingenuity; manual fuel delivery, of course, was not part of any plan.


Everything is working, and that's great.

Flying SUV and Rackspace




An even more unusual case occurred on November 13, 2007, when an SUV crashed into a Rackspace data center facility. The driver, a diabetic, lost consciousness and, with it, control of the vehicle. The SUV accelerated (apparently the unconscious driver's foot was pressing the gas pedal), left the road, and, airborne, slammed into the building housing the power infrastructure of the Rackspace data center.

The data center's cooling system switched to auxiliary power and kept running without problems, and the main equipment also failed over to emergency power smoothly. But then the trouble began: as it turned out, the massive chillers, which had stopped during the initial power failure, did not restart. Two chillers stayed down, and the data center staff could not bring them back online in time.

As a result, the equipment began to overheat, and the engineering team decided to shut down the DC so the hardware would not fail.

The equipment was down for five hours, during which the sites and services of the data center's clients were unavailable. In the end, Rackspace had to compensate its customers for losses of $3.5 million.

Problems at Samsung




On April 20, 2014, a fire broke out in an office building in Gwacheon, South Korea. The flames quickly spread through the building, and the Samsung SDS data center was not spared. Fire and smoke poured out of the building and were clearly visible from far away.

All Samsung employees, as well as staff of other companies working in the building, were evacuated. The fire did not destroy the DC completely, but the damage was enough that users of Samsung devices could not access their data.

Users regained access to their data only after a backup data center in the same city was brought online. A formal apology from the company's management followed.

Cable duct fire




A short circuit that ignited the sheathing of a cable duct at Fisher Plaza in Seattle knocked out a number of services, including Authorize.net, Bing Travel, Geocaching.com, AdHost, and several other resources. The problem was not contained until the following morning (it all happened on July 3, 2009).

Some services were back up by 10 am, while others stayed down for several more hours. Fisher Communications, which owned the affected data center, spent more than $10 million repairing and replacing equipment.

Fire in Iowa




On the afternoon of February 18, 2014, the data center serving the state's public services was operating normally. That day, the state was due to make a series of payments to state employees totaling $162 million. Ironically, that very day a short circuit occurred in the data center.

At the time, the engineering team had spent several days preparing the facility for a completely different threat: the blizzard forecast for the evening of February 18.

After the short circuit, smoke spread through the building and employees were evacuated. The FM-200 fire suppression system triggered and contained the fire, but the system responsible for power management in the DC overheated and melted.

Staff quickly arranged power through another feed, and supply was restored after a few hours. However, without physical access to the DC's infrastructure, it was impossible to bring it back up, and firefighters and police would not let support staff into the building because of the heavy smoke inside. Only 3.5 hours later were employees allowed into the DC. All that time, nothing worked, and the payments did not go out.

The DC was brought back online only at 9 pm (the fire itself had started at 3 pm), after which the payments could finally be processed.

Amazon and Welding




On January 9, 2015, a fire broke out in a large building where an Amazon data center was being built. The problem started with a welder who accidentally set fire to nearby building materials. A small flame quickly grew into a three-alarm blaze that took a long time to extinguish. The column of smoke was visible from many kilometers away. The total damage to the company came to $100,000.

Fortunately, Amazon's customers were not affected, since the facility had not yet been put into operation.

In place of a conclusion


In most of these cases, the emergency arose completely unexpectedly and went beyond anything covered by plans and procedures. In some cases the problem was brought under control; in others, both DC owners and their clients suffered significant losses.

What problems and emergencies have you run into, and how did you deal with them?

Source: https://habr.com/ru/post/260357/
