
Straight DNS: doing the right thing

Here is a very emotional talk by Lev Nikolaev (@maniaque) about how to configure DNS and, especially, how not to. After each point you can mentally add: “Please do not do this!” In his talk, Lev says exactly that.

The article will consist of three parts:

1. How to make a resolver (unbound, bind)
A resolver is the thing you enter in your operating system's network settings so that human-readable names like ya.ru turn into an incomprehensible 87.250.250.242.

2. How to host zones (PowerDNS)

If you have grown to that point, we will explain how to host zones yourself, how to do it well and fault-tolerantly, and how to manage it when you have several hundred domains.

3. Shaken, not stirred (PowerDNS + unbound)



About the speaker: three years ago Lev Nikolaev joined the company Maxnet, where DNS had been set up not quite right. There was bind with zones in text files, and his hands itched to clean up the mess. The article is based on a talk at RootConf 2017, in which Lev shares his collection of rakes with the community.

Making a resolver


Your own resolver is needed by organizations that either have no resolver from their provider, or for which the provider's resolver has started to get in the way (you send so many requests that it cannot keep up or starts rate-limiting you). A data center or an ISP, on the other hand, is simply expected to have its own well-performing resolver.

An interesting question to ponder is why the default install of almost any operating system does not include a resolver that can resolve names on its own, starting from the root servers. It is completely unclear why my machine must always go to someone else and ask them to perform a DNS query. Oddly enough, colleagues from FreeBSD replied that they do ship unbound out of the box, but the other operating systems do not.

Choosing resolver software


By and large, there are only 2 options:

  1. You can use bind;
  2. You can use unbound.

In fact, these are almost the only options you can really use for serious production and heavy load. I deliberately do not name other options, because not every server meets the requirements. If you are building a resolver, assume from the start that it will carry a heavy load; otherwise you would not be building one.


As for the resolver, I am not trying to talk you into any particular software, because today a resolver is first of all a set of implemented features rather than a specific product. You can use bind, unbound, or anything else, as long as it supports the things I am about to describe.

Hello to Ubuntu users!


If you like Ubuntu as much as we do and use it in production, you will have to tinker a little. Obviously, unbound should start together with the system. With 16.04 we moved to systemd, but, naturally, nobody had written a unit for unbound. And when the generator tries to build one automatically from the SysV script, the result is a complete disgrace. Do not try to roll that out in production: I did, and left 30,000 subscribers without DNS for half an hour. Fortunately, it was at night.

Write a unit! Or take mine; the repository address is at the end. This applies only to Ubuntu users, although Debian probably has something similar.

What should a resolver have?


Five things to set up in your resolver.

1. No forwards to Yandex or Google

This is pretty obvious, so please stop doing it. Forwarding requests to Google is bad for two reasons. First, sooner or later you will be rate-limited, which they warn about in advance. Second, and more annoying, connectivity is never ideal.

Sometimes even the cherished four eights (8.8.8.8) are unreachable, and then you have problems too. Exactly the same applies to Yandex.

No, this is not Google's or Yandex's fault: they are usually unreachable because you (or your uplink) lost the link to them.

2. SO_REUSEPORT

This option really speeds things up, but it requires at least kernel 3.9. If there are fans of 2.6 kernels among you (grandma RedHat's), please bury them. SO_REUSEPORT lets several processes bind the same port at the same time (they must share the same UID so that nobody can hijack the port), but its real charm is elsewhere: the load is distributed evenly across those listeners. For DNS this is perfect, and you will see a real performance gain simply by moving to a modern kernel.

SO_REUSEPORT is available in both bind and unbound. In bind it is enabled out of the box; in unbound it has to be enabled explicitly (the so-reuseport option), because unbound tries to be as compatible as possible, sometimes at the expense of performance.
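To illustrate what the option does at the socket level, here is a minimal Python sketch. It is not taken from bind or unbound; the helper name `open_reuseport_udp` and the loopback address are assumptions made for the demo. It binds two UDP sockets to the same port, which works only because both set SO_REUSEPORT before bind(). On Linux 3.9 and later the kernel then spreads incoming datagrams across such sockets.

```python
import socket


def open_reuseport_udp(host: str, port: int) -> socket.socket:
    """Open a UDP socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The option must be set on every socket sharing the port, and before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    return sock


# Port 0 lets the kernel pick a free port for the first listener.
first = open_reuseport_udp("127.0.0.1", 0)
port = first.getsockname()[1]

# A second listener on the very same port. Without SO_REUSEPORT on both
# sockets this bind() would fail with "Address already in use".
second = open_reuseport_udp("127.0.0.1", port)

print(first.getsockname(), second.getsockname())
```

In a real server each socket would belong to a separate worker thread or process, which is what lets the load spread without a shared accept lock.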

3. Prefetch

This is a strange option. It really helps, in the sense that it slightly disrespects TTL. When we ask an authoritative server for a record, the record comes with a TTL: the time during which we need not come back for an update. With prefetch, unbound and bind go and fetch fresh contents of the record before the TTL has actually expired.

There are two caveats. First, you will get roughly a 10 percent increase in outgoing traffic, although a resolver can hardly generate much traffic at all, so this is unlikely to bother you. Second, with a short TTL (a minute, for example) prefetch brings no particular benefit; but since short TTLs are still science fiction in the .ru segment, it generally works very well.
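The prefetch decision can be sketched in a few lines of Python. This is an illustration, not resolver code: the function name is hypothetical, and the rule "refresh when a cache hit lands in the last 10% of the original TTL" is how unbound's behaviour is usually described; bind uses its own tunable thresholds.

```python
def should_prefetch(original_ttl: int, age: int) -> bool:
    """Return True if a cache hit should trigger a background refresh.

    original_ttl: TTL in seconds as received from the authoritative server.
    age: seconds the record has already spent in the cache.
    """
    remaining = original_ttl - age
    if remaining <= 0:
        return False  # already expired: an ordinary cache miss, not a prefetch
    # Integer form of "remaining <= 10% of the original TTL".
    return remaining * 10 <= original_ttl


# A record with a one-hour TTL:
print(should_prefetch(3600, 600))   # 3000 s left, too early
print(should_prefetch(3600, 3300))  # 300 s left, inside the last 10%

# A record with a one-minute TTL qualifies only in its last 6 seconds,
# so few queries ever land in the window.
print(should_prefetch(60, 50))
print(should_prefetch(60, 55))
```

The last two calls show why short TTLs gain little: the window in which a query can trigger the refresh shrinks together with the TTL.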

4. Expired

I did not find this option in bind, but unbound has it (serve-expired): it lets the resolver return an expired record with a zero TTL while trying, in the background, to get a fresh one from the authoritative servers. This helps a lot during major outages; for example, users might not even have noticed the recent Dyn outage if this option had been widely enabled. But not everyone knows about it, not everyone likes it, and not everyone turns it on.

5. DNSSEC

DNSSEC comes in two guises, simple and complex.


In the simple variant, everything works automatically: if a domain is signed with DNSSEC, the resolver must validate it and, if validation fails, refuse to return the result. This matters today because record TTLs tend towards zero, which leaves room for attempts to poison the DNS cache.

As you understand, if someone poisons the resolver's cache, all of its clients get the poisoned record, and many interesting things can be done with that. So please always validate DNSSEC when a domain has it; there is nothing complicated about it, and it is done automatically and well.

Foreign domains have adopted DNSSEC fairly well, although the share there is still small. At the same time, many providers do not validate DNSSEC at all; Rostelecom, for example.

During a year and a half of running this validation, I kept expecting one of our large companies to botch its DNSSEC and thereby tear our technical support to shreds, but so far the only failures have been isolated hits on low-traffic foreign sites, so everything is fine.

The little things


1. Stop responding to ANY

If you run a resolver, stop responding to ANY queries: this query type is not really standardized, and 99% of its use comes from all sorts of wonderful viruses.

To avoid dealing with this at the software level, you can use iptables, which is fairly fast: if it sees an ANY query, it simply drops it.

iptables -A INPUT -p udp --dport 53 -m string --hex-string "|0000ff0001|" --algo bm -j DROP
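Where the magic hex string comes from can be shown with a short Python sketch that encodes the question section of a DNS query by hand. The function name and the example domain are assumptions for illustration; the numeric codes (255 for ANY, 1 for class IN) are the standard ones.

```python
import struct

QTYPE_ANY = 255  # query type "*", commonly called ANY
QCLASS_IN = 1    # Internet class


def encode_question(name: str, qtype: int, qclass: int = QCLASS_IN) -> bytes:
    """Encode the question section of a DNS query (no name compression)."""
    labels = [label.encode("ascii") for label in name.rstrip(".").split(".") if label]
    # Each label is prefixed with its length; a zero byte terminates the name.
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    # QTYPE and QCLASS are 16-bit big-endian integers.
    return qname + struct.pack("!HH", qtype, qclass)


question = encode_question("example.com", QTYPE_ANY)
print(question.hex())
print(question[-5:].hex())  # -> 0000ff0001
```

The last five bytes are the name terminator (00), the type ANY (00ff) and the class IN (0001), which is exactly the pattern the iptables rule searches for. Note that the string match looks at the whole packet, so it is a fast heuristic rather than a real DNS parser.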

2. Rate-limit

Limit the number of requests per second if your resolver is not closed off from the outside. For various reasons my resolver had to be open to the whole world, so practically the first thing I did was limit the number of requests per second from outside. If you cannot close your resolver to outside clients, at least rate-limit them. Set the limit to taste; mine, for example, is 70 requests per second.

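# [rules with -j ACCEPT go first]
# Remember every new source address in the DNSQF list...
iptables -A INPUT -p udp --dport 53 -m state --state NEW -m recent --set --name DNSQF --rsource
# ...and drop sources seen 70 or more times within the last second.
iptables -A INPUT -p udp --dport 53 -m state --state NEW -m recent --update --seconds 1 --hitcount 70 --name DNSQF --rsource -j DROP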

3. For unbound, a nice one: interface-automatic: yes

The third is a nice little thing. Usually there is more than one resolver on the network, and sometimes you want failover between them, though naturally not by means of the resolver itself. Failover should be done by routing, and unbound has a great option for that: interface-automatic: yes. It says: “If a request reached me at all, I do not care which address it was meant for; I will answer it.” With this option it is very convenient, when necessary, to redirect a neighboring resolver's traffic to unbound.

What to monitor?


This is a typical picture from production. Here the number of requests reaches 300,000, and that is neither per minute nor per second: my statistics are collected every 5 minutes, so it is actually a trifle.



So it is absolutely necessary to monitor the number of requests: if you cannot measure them, you cannot control them. You also need to monitor the types of requests. In the example below you can clearly see that the turquoise band is PTR requests, and you need to understand why there are so many of them and where they come from.

Most often the cause is misconfigured software, or some router that thinks it urgently needs to make the same PTR request a couple of thousand times.



This is done quite simply: a cron job pulls the unbound statistics, and from there you take what you need (we do it with a Zabbix UserParameter).
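A minimal Python sketch of the parsing half of such a job is shown below. The function names and the sample numbers are made up for illustration; the key=value layout and counter names such as num.query.type.PTR follow the output of unbound-control stats when extended statistics are enabled.

```python
from typing import Dict


def parse_unbound_stats(text: str) -> Dict[str, float]:
    """Parse 'key=value' lines as printed by unbound-control stats."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a key=value line
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # skip anything that is not a plain number
    return stats


def by_prefix(stats: Dict[str, float], prefix: str) -> Dict[str, float]:
    """Select counters under a prefix, e.g. per query type or per rcode."""
    return {k[len(prefix):]: v for k, v in stats.items() if k.startswith(prefix)}


# Invented sample. In a cron job the text would come from something like
# subprocess.run(["unbound-control", "stats_noreset"], capture_output=True, text=True).stdout
SAMPLE = """\
total.num.queries=300000
num.query.type.A=180000
num.query.type.AAAA=60000
num.query.type.PTR=45000
num.answer.rcode.NOERROR=250000
num.answer.rcode.NXDOMAIN=25000
total.requestlist.avg=1.5
"""

stats = parse_unbound_stats(SAMPLE)
print(by_prefix(stats, "num.query.type."))
print(by_prefix(stats, "num.answer.rcode."))
```

Each selected value can then be exposed to the monitoring system as a separate item, which is what makes per-type and per-rcode graphs possible.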



Response types need to be monitored too. The picture above shows a typical example of DNS Water Torture: a botnet inside your network asks for non-existent subdomains of the attacked domain. As a result it receives Nodata answers (shown in red); in this example they reached 25,000 at the peak.

The goal is to hammer the attacked domain's name server to death, so that it becomes too exhausted to answer legitimate requests as well. As soon as you start monitoring response types, you begin to see how active botnets are inside your network.

Note that the request count jumps just as noticeably when the red band appears: here the botnets double the number of requests.

What to do about it afterwards is a separate problem, but not today's topic.

Your best friend is dnstop


This is a very handy command for understanding who is requesting what when an attack has happened and something has gone wrong. It is usually run without parameters, but that is wrong.

dnstop <interface> -i <our ip>
then press: s, 2

I specifically point out that you need to specify your own IP address so that the resolver's own requests, of which there will be many, are excluded from the statistics. Then come the magic keys: s gives a breakdown by IP address, 2 gives a breakdown down to second-level domains.

That way you can easily see who is going where and what the current attack looks like.



Rakes (sharing my own)


There are a lot of them. As soon as you bring up a resolver, you will run into misconfigured routers and software. You can get tons of PTR requests. This happens quite often, and now you will see it, be able to fix it, understand it, and make your network better. A special case is those wonderful Chinese DVRs. I have a special love for them, as many people probably do.

The main problem in your network is that your users care very little about your problems. The fact that parasitic DNS Water Torture traffic is coming from a user's connection hardly bothers him as long as World of Tanks works.

So far we have been talking about the simplest thing, the resolver; now let's make the task harder.

Hosting zones


So, we have grown to the point where we have domains and want to host them ourselves: to be the authoritative server for them and to give the answers.

As a rule, zones are hosted either by organizations with special ambitions (“We are cool! We can host zones better than any Dyn!”) or by data centers and providers, from whom this is, again, expected by default.

Do not combine


The first rule is not to combine. Please never do this! (Although we will find an exception to this rule later.) Combining a resolver and an authoritative server on one machine is bad. I think you understand the problem. If a client “steals” a domain from you (that is, changes the NS servers at the registrar) or the domain goes stale (its paid period expires), you will not find out about it at all, because locally nothing changes for you.

We had clients who sincerely believed that if the site opens from inside our network, they do not need to renew the domain. They have our Internet at home and our Internet at work: it opens everywhere, so everything is fine.

Choosing software


The main point: do not think that you are choosing software here. This is a key mistake many people make. DNS is a database; do not stuff it into a text file. It is very, very bad to use Ansible or Chef to generate a text file that you then feed to bind. But I know you do it, and then you complain that it works badly.

So the answer is: PowerDNS

You know that bind has a patch for working with MySQL, right? Have you tried it? Many people still do not know about PowerDNS. Most of them firmly believe that this patch can somehow be used, at least on older versions, but its performance is terrible, because it is just a pile of kludges.

Hello again to Ubuntu users


If you use Ubuntu, the standard 16.04 repositories contain an alpha version of PowerDNS 4.x. I do not know whom to thank for that. It does work, but with problems. It has been a year since I opened issue #3824 against version 4.x. I asked the PowerDNS developers:

- Guys, is it okay that when I restart MySQL, PowerDNS does not reconnect to it?
- Wow, cool!

Remember this bug: in the 4.x branch it has already been closed three times and reopened three times. So there is the stable 3.x branch, which does not have these problems, but on Debian/Ubuntu you have to install it from a deb file. And today, in March 2018, it is no longer safe. So the only way out is to switch to the developers' PPA for your version of Ubuntu.

Thoughtful architecture


This is where the difficult part of the article begins: let's think about architecture. Once we arrive at PowerDNS, since it is a database, we want a convenient web editor. And there are no editors except PowerAdmin. It is a PHP web application, and it is immediately clear that anyone who deploys it on the same machine as the DNS server deserves to have something chopped off: you must not put them on the same machine. As a result, a problem arises.


Naturally, the first thing that comes to mind is the zone transfer mechanism *XFR; whether IXFR or AXFR does not matter. But if you keep this mechanism for transferring zones, you are stuck: you will keep building master/slave and never get away from these concepts.

Next, we have several DNS servers and need to deliver the database to them. So there is a machine with PowerAdmin holding the current database, and somehow this database has to be rolled out to a bunch of other machines.

- Let's take MySQL replication, for example. They say it works great!
- It will not help you either. Replication is not your best friend here.

Therefore, the scheme looks like this.



You have a server with PowerAdmin and MySQL. From the DNS server, you connect to it, run mysqldump with the --skip-extended-insert option (more on that shortly), and get an SQL file.

You will say: “Big deal! As if we had never done that before.”

And this is where it gets interesting. Naturally, you cannot take a dump of, say, 700 domains and load it straight into the live database. So it has to be loaded into a neighboring database, followed by RENAME TABLE. Why, you ask? Because it is 100% atomic. RENAME TABLE is an awesome thing that, like renaming a file in Linux, either happens or does not; it has no intermediate state. It is more convenient than a transaction because it is much faster. After the dump has loaded successfully, you commit the same file to git. Thanks to --skip-extended-insert the file is git-friendly: it has one line per INSERT, and you get a sane diff.

The main point is this: I want to be able to see a diff of what each database rollout changed.
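The swap step can be sketched in Python as a function that builds one multi-table RENAME TABLE statement. This is an illustration under assumptions, not the script from the talk: the database names pdns, pdns_new and pdns_old are hypothetical, and the table list is just the two best-known tables of the PowerDNS generic SQL schema.

```python
from typing import Iterable


def build_swap_statement(tables: Iterable[str], live_db: str,
                         staging_db: str, old_db: str) -> str:
    """Build a single RENAME TABLE statement that swaps staged tables in.

    MySQL executes all renames in one statement as a single atomic step,
    so readers see either the old tables or the new ones, never a mix.
    """
    pairs = []
    for table in tables:
        # Move the live table out of the way, then move the freshly loaded
        # one into its place. The old_db tables must not exist beforehand.
        pairs.append(f"`{live_db}`.`{table}` TO `{old_db}`.`{table}`")
        pairs.append(f"`{staging_db}`.`{table}` TO `{live_db}`.`{table}`")
    return "RENAME TABLE " + ", ".join(pairs) + ";"


statement = build_swap_statement(
    ["domains", "records"],
    live_db="pdns",          # hypothetical: database PowerDNS reads from
    staging_db="pdns_new",   # hypothetical: database the dump was loaded into
    old_db="pdns_old",       # hypothetical: parking place for the previous tables
)
print(statement)
```

After a successful swap the tables parked in the old database can be dropped, so the next run starts clean.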

What we get



Despite all this, it works very fast: the whole process of pulling in a new dump for 700 domains takes just 15 seconds. And the time does not grow proportionally: if tomorrow you have 1,400 domains, it will take about 18 seconds, which is fine.

And forget about the master/slave concept; in this context it does not matter.

Houston, we have a problem


Everything would be great and wonderful, but we have one problem. This cool approach works only if we are both master and slave for the domain in question. If that is not the case, difficulties begin, which we will now overcome.

Let us recall the master and slave roles once more. The master sends notifications as soon as the zone changes; the slave receives them and acts on them; both of them answer queries.

There are two options:


  1. The client wants us to be the slave. That is, the client keeps the master somewhere, and we have to fetch the data from it. This is the difficult option, which takes some extra effort.
  2. The client wants us to be the master, and he will be a slave, that is, he will take a copy from us. This is the simple option: we just allow zone transfers to the client.

The way out is to make one of the servers (it could be two, but that is a separate conversation) responsible for receiving *XFR. The PowerAdmin machine cannot do this, since it runs no DNS server and there is nothing there to accept the transfer.

The scheme looks like this:




We can have two roles: a plain DNS server that is simply synchronized, or a DNS server with the slave role, which accepts *XFR, writes it into its own database, and pushes the changes back to PowerAdmin by running another script.

I repeat: this scheme is quite simple, it has worked very well for quite a long time, and it lets you abandon the master and slave concepts almost entirely. We act as a slave only where we need to, and nothing more.

What to monitor?


PowerDNS is yet another component that needs monitoring. Below are graphs from Zabbix. We collect latency, that is, how long a response took in microseconds, and spikes are clearly visible whenever the machine was busy or the database was slow.



The protocol over which a request arrived needs to be monitored too. TCP is not always legitimate, so it should be watched carefully. Along the way you can see how popular IPv6 is; here it accounts for 10% of requests.



Query types need to be collected too; then you will understand what is going on and see, for example, that AAAA queries (IPv6 addresses) are, in our case, almost as numerous as queries for IPv4 addresses.



Be sure to monitor the number of SERVFAIL responses sent and the number of corrupt packets; it is convenient to put both on one graph. If the two numbers match, sleep well. If they do not, you will notice.



Shaken, not stirred


Alas, sometimes you have to use PowerDNS and unbound in combination. For example, you have a local domain with a tricky structure that is inconvenient to configure in unbound. By the way, this is how one of the site-blocking mechanisms in Russia works: your provider's resolver can return a stub for a “bad” domain and the normal record for a good one. In a corporate environment this is used, for example, to block social networks or to protect against viruses.

Architecture


The architecture here is painfully simple: it is just a mix of the two components we have just discussed. PowerDNS faces the outside world, accepts a request, and looks in the database; its config file has an option to pass the request on to another server (unbound, running on the same machine) if the database has no answer. The only peculiarity is monitoring: we attach two Zabbix templates to this machine and get twice as many graphs.

Contacts


Repository with code - https://github.com/maniaque/rootconf2017
Mail - nikolaev@kasatkina.org
Telegram - @maniaque_ru

Questions and answers
- I would like to hear your advice on how to filter requests of certain types to the server. For example, I want to completely cut off all requests for IPv4 so that they do not reach the server at all.

- Such options exist in both PowerDNS and unbound. There is also the iptables route: you can use a hex match to pick out a piece of the packet, see what kind of request it is, and simply drop it. Another option: there are various DNS proxies, and the PowerDNS authors also ship their own resolver, which supports Lua scripts. You can plug in your own script that does any custom magic. There are various tools for this; it all depends on what your task is.

- Tell me, have you implemented blocking of banned sites on your network using DNS? Are there any approximate statistics?

Let me put it this way: we use that too. How much gets blocked? Clearly, DNS blocking only stops housewives; nothing prevents a subscriber from simply entering Google's DNS. Honestly, we do not look at those statistics, but in principle something does get caught.

- And is there any change in the auditor's reports after DNS blocking was introduced?

Yes. For the auditor, this is a great method. Keep that in mind.

- You give your customers the IP addresses of your DNS servers. Do you ensure the availability of those addresses, and how?

Let me briefly explain how it works. You remember that we enter two addresses there. And you know how funny failover is. Windows and Linux behave differently. When the first address is unavailable, Windows switches to the second one, retries the first one every 15 minutes, and switches back to it if it can. Linux does not.

The first thing to understand is that failover by means of the operating system is inconsistent and bad. So your task is to make sure that both IPs you hand out as resolvers are always up and working. Since we do this with routing, each of our servers has additional IP addresses configured on its interface: the addresses of its peers. We use three servers, and so far that is enough.

We send traffic there by means of routing. Since we use the “answer on all interfaces” option in unbound, it responds perfectly, and no further manipulation is needed.

- You had a slide about transferring a MySQL dump between servers. You said you have neither master/slave nor master/master. So, roughly speaking, you always change the zone on one server and transfer it to the others, and split-brain cannot happen in this case?

No, split-brain is possible. Every 2 minutes each server makes a dump and loads it into itself. If that fails, we get something like split-brain: the server has an old version of the database.

But here the following helps us. If the server could not do this, it most likely has no connectivity. If it has no connectivity, clients will not reach it either, so the problem will not arise. As soon as connectivity returns, it will very quickly get a fresh copy.

- No, I mean split-brain in the sense that you changed a record on one of the servers but not on the other.

Look, nothing is ever changed on the DNS servers themselves. Changes are made in one place, where PowerAdmin is, and from there they roll out to all the others. So it cannot happen that we forgot to change the database somewhere. We built it this way precisely so that it never happens.

This was one of our problems back when we had bind with text files. It was great fun to change the zone in one place and forget to bump the serial, so that the change never reached the second server via XFR. It was our pain, and we eliminated it as well.

- And are there any statistics on when one should stop using XFR ...

XFR is a mechanism invented in the days of poor connectivity. Roughly speaking, XFR, especially incremental XFR, is designed to save bandwidth. But these days a DNS server uses about 5 Mbit/s and will not eat more than that. So in my opinion XFR is now a so-so mechanism, and I would generally not recommend looking in its direction. The PowerDNS folks write in their documentation that if you can replicate the backend some other way, without XFR, you should. In our case it worked out great.

- When you talked about setting up an authoritative server, you said we should forget about master/slave replication and give every server the same configuration. In that situation, the SOA record names some NS server. And how many NS records are there? There are various DNS checker services that test whether a zone is configured correctly, and they will complain: “You have 1 NS server, this is very bad!” and so on. Or do we make one NS record with several IPs?

Multiple records, to be correct.

Source: https://habr.com/ru/post/350550/

