
WMF: Global outage (cooling and DNS issues)

They say there is still a little Wikipedia left in Florida... ©

Because of an overheating problem in our European data center (in Amsterdam), many of our servers shut down to protect themselves. This cut off European users' access to Wikipedia and all our other projects, so we had to move all user traffic to our cluster in Florida (in St. Petersburg). We have a standard fast failover procedure for this, and it works by changing our DNS records.
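The post does not describe Wikimedia's actual tooling. What follows is a minimal sketch of DNS-based failover under an assumed toy model: each client region maps to one data center, and the DNS answer is just that data center's address. The class `GeoDNS`, the region and data-center names, and the IP addresses (taken from the documentation ranges) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GeoDNS:
    """Toy geo-DNS: answers with the address of the data center serving the client's region."""

    datacenters: dict[str, str]  # data center name -> service IP (hypothetical addresses)
    routing: dict[str, str]      # client region -> data center name

    def resolve(self, region: str) -> str:
        """Return the address a client in `region` would receive."""
        return self.datacenters[self.routing[region]]

    def fail_over(self, failed_dc: str, target_dc: str) -> None:
        """Re-point every region served by `failed_dc` at `target_dc`."""
        if target_dc not in self.datacenters:
            raise ValueError(f"unknown data center: {target_dc}")
        for region, dc in list(self.routing.items()):
            if dc == failed_dc:
                self.routing[region] = target_dc


dns = GeoDNS(
    datacenters={"amsterdam": "192.0.2.10", "florida": "198.51.100.10"},
    routing={"europe": "amsterdam", "americas": "florida"},
)
print(dns.resolve("europe"))  # 192.0.2.10
dns.fail_over("amsterdam", "florida")
print(dns.resolve("europe"))  # 198.51.100.10
```

In this model the switch is purely a change to DNS data on the authoritative side. Users only see it once their resolvers stop serving the previously cached answer.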

However, shortly after we carried out this failover, it turned out that the failover mechanism itself was broken. As a result, DNS resolution of Wikimedia sites failed worldwide. We fixed the problem quickly, but unfortunately, because of caching effects, it may take up to an hour before access is restored for everyone.
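This is a minimal sketch of why a quick fix can still take up to an hour to reach everyone. It assumes a resolver that caches each answer for its TTL and does not consult the authoritative servers again until that TTL expires. The one-hour TTL, the name `www.example.org` and the `CachingResolver` class are illustrative assumptions. For simplicity, the toy treats a broken "no address" answer like any other cached answer, and it takes a simulated clock (`now`) as a parameter.

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple


class CachingResolver:
    """Minimal TTL cache in front of an authoritative lookup (simulated clock)."""

    def __init__(self, authoritative: Callable[[str], Tuple[Optional[str], int]]) -> None:
        # authoritative(name) returns (answer, ttl_seconds); None means "no usable record"
        self.authoritative = authoritative
        self.cache: dict[str, Tuple[Optional[str], float]] = {}  # name -> (answer, expires_at)

    def resolve(self, name: str, now: float) -> Optional[str]:
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: the authoritative servers are not asked again
        answer, ttl = self.authoritative(name)
        self.cache[name] = (answer, now + ttl)
        return answer


zone = {"www.example.org": None}  # hypothetical broken record: no address


def authoritative(name: str) -> Tuple[Optional[str], int]:
    return zone[name], 3600  # illustrative one-hour TTL


resolver = CachingResolver(authoritative)
print(resolver.resolve("www.example.org", now=0))     # None: the broken answer is cached
zone["www.example.org"] = "198.51.100.10"             # fix deployed at t = 60 s
print(resolver.resolve("www.example.org", now=120))   # None: still served from cache
print(resolver.resolve("www.example.org", now=3600))  # 198.51.100.10: cache entry expired
```

The delay a given user experiences depends on when their resolver last cached the answer. In the worst case it equals the full TTL.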
We apologize for the inconvenience.

Update: Unfortunately, for many people this outage seems to have lasted longer than an hour. It appears that many DNS providers' resolvers do not honor the so-called negative cache TTL that we sent and use larger values instead. We worked around this problem by renaming the faulty DNS record.
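Below is a minimal sketch of negative caching, assuming the rule from RFC 2308: a resolver should cache a "no such name" answer for the lesser of the SOA record's TTL and the SOA MINIMUM field. The `override_ttl` parameter is a hypothetical stand-in for resolvers that substitute a larger value of their own. The names, TTL values and the idea that renaming helps because the new name was never negatively cached are illustrative assumptions, not details from the post.

```python
from __future__ import annotations

from typing import Optional


def rfc2308_negative_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """Negative-cache TTL per RFC 2308: the lesser of the SOA record's TTL and its MINIMUM field."""
    return min(soa_ttl, soa_minimum)


class NegativeCache:
    """Remembers names that recently had no answer (e.g. NXDOMAIN), using a simulated clock."""

    def __init__(self, override_ttl: Optional[int] = None) -> None:
        # override_ttl models a hypothetical non-compliant resolver that uses
        # its own value whenever that value is larger than the zone's.
        self.override_ttl = override_ttl
        self.expires: dict[str, float] = {}  # name -> time the negative entry expires

    def store(self, name: str, now: float, soa_ttl: int, soa_minimum: int) -> None:
        ttl = rfc2308_negative_ttl(soa_ttl, soa_minimum)
        if self.override_ttl is not None:
            ttl = max(ttl, self.override_ttl)
        self.expires[name] = now + ttl

    def is_cached(self, name: str, now: float) -> bool:
        expiry = self.expires.get(name)
        return expiry is not None and now < expiry


compliant = NegativeCache()
noncompliant = NegativeCache(override_ttl=86400)  # hypothetical: one full day
for cache in (compliant, noncompliant):
    cache.store("text.example.org", now=0, soa_ttl=3600, soa_minimum=600)

print(compliant.is_cached("text.example.org", now=1000))      # False: 600 s have passed
print(noncompliant.is_cached("text.example.org", now=1000))   # True: still "does not exist"
print(noncompliant.is_cached("text2.example.org", now=1000))  # False: the new name was never cached
```

Under these assumptions, pointing clients at a record with a new name sidesteps the stale negative entries entirely, instead of waiting for every resolver's inflated TTL to run out.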

Update, 21:32 UTC: Our SSL access point, secure.wikimedia.org, was disabled because of overload, but it is working again now.

Wikimedia Technical Blog, techblog.wikimedia.org

Mirrors of Wikipedia: English (legal), Russian ("pirated" =)). The Coral Content Distribution Network may also help you.

By the way, in 2005 Wikipedia suffered a much larger failure after a power outage.

Source: https://habr.com/ru/post/88829/

