
DNS failure at registrar R01 and a few fatal coincidences

Today R01, one of the oldest registrars, announced a failure in its DNS.
On that occasion, I want to tell a short cautionary story about how this failure almost killed our company.

We are a SaaS analytics service for the web. Our main tool is a JavaScript file that collects statistics. The file is embedded in many of our users' sites, so we must guarantee its flawless stability: an outage on our side must not affect our clients' sites. We put a lot of effort into this: we placed the script on a large, powerful CDN and put our own domain in front of that CDN (so that we can switch CDNs at any time if one fails or becomes too expensive). But we overlooked one small thing: our DNS was hosted at the registrar.

The DNS failure at R01 meant that all domains resolved to one specific IP address, which answered HTTP requests with the usual domain "parking" page with advertising. Or it showed nothing at all, because that IP was under a spontaneous DoS attack from everyone trying to reach the ordinary sites that kept their DNS at R01. But not in our case. For requests to /somescript.js, the server responded with a script. And not just a script, but a dynamically generated one, like this:

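    // Placeholder: the server substituted the requested domain here.
    var redir_url = 'http://<domain that resolved to the fatal IP>/';
    if (window != top) {
        // Loaded inside a frame: redirect the top-level page.
        top.location.href = redir_url;
    } else {
        window.location = redir_url;
    }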
The server stripped the request parameters and generated a redirect to the base domain. All our users who had installed our analytics received, instead of our script, someone else's script that redirected visitors from their site to an unrelated page.

The only thing that partly saved us was that the server behind that IP was under DDoS almost the entire time because of the volume of requests it received. For the overwhelming majority of users, requests for our script therefore simply timed out (and an unavailable script, of course, does not affect other people's sites). But the fraction of a percent who were "lucky" enough to get a response from the server got the redirect.
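To illustrate why a timed-out script does no harm to the host page, here is a minimal sketch of an asynchronous loader of the kind analytics snippets commonly use. It is not our actual embed code: the function name `loadAnalytics`, the `onFail` callback and the URL `https://static.example.com/somescript.js` are hypothetical and chosen only for the example.

```javascript
// Hypothetical loader sketch; names and URL are illustrative assumptions.
// `doc` is passed in explicitly so the function can be exercised outside a browser.
function loadAnalytics(doc, src, onFail) {
  var script = doc.createElement('script');
  script.async = true; // fetch in parallel, do not block parsing of the host page
  script.src = src;
  script.onerror = function () {
    // A network failure or timeout ends up here; the host page keeps working.
    if (typeof onFail === 'function') {
      onFail(src);
    }
  };
  doc.head.appendChild(script);
  return script;
}

// Usage in a browser:
// loadAnalytics(document, 'https://static.example.com/somescript.js');
```

Note that this kind of loading only protects against an unreachable server. If the domain resolves to a foreign server that returns valid JavaScript, that script runs with the full privileges of the host page, which is exactly what happened to the unlucky fraction of a percent.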

This fatal combination of circumstances, as often happens, nearly cost us two years of work on the project. Any one of the following would have averted a catastrophic effect on our users' sites: DNS that was simply unavailable; working DNS pointing to an unavailable server; working DNS and a working server that did not answer script requests with a script; or all of the above, but with a script that did not redirect. I don't know whether our service will recover from such a blow; many clients, of course, noticed complaints from their users about baffling redirects. We did not hide anything either: we immediately sent an email to our customers asking them to disable our service until DNS caches around the world had been refreshed.

The moral of the story is simple: you cannot foresee everything, and even things as reliable and familiar as a DNS server cannot be trusted.

It was extremely hard to foresee this situation: too many things lined up against us. Of course, we will change our DNS provider, and may even run our own servers if that proves more reliable. But anything can break: DNS, the CDN, anything. The only thing you can do to minimize the chance of crashing as badly as we did is to make sure that every point of your system uses the best, most reliable solution available. Even when it comes to choosing a DNS server or a registrar.
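One cheap extra precaution, which we did not have and which is offered here only as a sketch, is to periodically check that the script domain still resolves to where you expect. The example below is for Node.js and assumes the script domain is a CNAME pointing at the CDN; the host names `static.example.com` and `example-cdn.net` and both function names are hypothetical.

```javascript
const dns = require('dns').promises;

// Lowercase a DNS name and drop a trailing dot so comparisons are stable.
function normalize(name) {
  return name.toLowerCase().replace(/\.$/, '');
}

// Pure helper: return the resolved records that are not in the expected list.
function findUnexpectedRecords(resolved, expected) {
  const allowed = new Set(expected.map(normalize));
  return resolved.filter((record) => !allowed.has(normalize(record)));
}

// Resolve the CNAME of `hostname` and report anything unexpected.
// `resolveCname` is injectable so the check can be tested without network access.
async function checkScriptDomain(hostname, expected, resolveCname = (h) => dns.resolveCname(h)) {
  const resolved = await resolveCname(hostname);
  return findUnexpectedRecords(resolved, expected);
}

// Usage (hypothetical names):
// checkScriptDomain('static.example.com', ['example-cdn.net'])
//   .then((bad) => { if (bad.length > 0) console.error('Unexpected DNS records:', bad); })
//   .catch((err) => console.error('DNS lookup failed:', err.code));
```

Such a check would not have prevented the failure, but it would have shown within minutes that the domain had started pointing somewhere else, instead of leaving us to learn it from customers.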

Source: https://habr.com/ru/post/232363/

