This article is about IPv6 support in Docker and some other nuances of Docker networking.
To warm up, consider the usual IPv4-only system. The host machine has an eth0 interface with an external IP address attached to it, plus a loopback interface. When we install Docker on such a machine, it creates a default network called bridge. For this network, a docker0 interface is created on the host machine, and an IP address appears on it, for example 172.17.0.1. When we start a container, Docker allocates it an address from the selected network (bridge by default), for example 172.17.0.5. An eth0 interface appears inside the container, and the address 172.17.0.5 is assigned to it.

So, the basics are sorted out. Now let's try to understand how a process inside a container can reach external resources, and how to make the container reachable from outside. If a process in the container wants to reach the external Internet, it needs three things: an address, route settings, and a configured DNS resolver. We already have the address; Docker builds the route automatically (first to the container's internal eth0, and then out through iptables), and copies the DNS configuration from the host machine's /etc/resolv.conf (lines pointing to localhost are removed from it, for obvious reasons).

There is one nuance with DNS: if you use not the default network but one created with your own docker network create command, then when you start containers in this network, another DNS server, 127.0.0.11, is added to the /etc/resolv.conf config. This special DNS server is called the embedded DNS server, and it cannot be disabled in any way, except by manually removing it from the config in the container's entrypoint. This feature has little effect on IPv4-only environments, but with IPv6 it can be a problem; we will return to it when we discuss IPv6.

So, we have everything needed for traffic to successfully reach external networks. If we want a process in the container to listen on a port and be accessible from the outside, we need to configure the process to listen on the interface issued to it, and start the container with the -p flag. In this case, Docker launches a separate docker-proxy process on the host machine (the so-called userland proxy), which binds to all interfaces of the host machine and proxies traffic to the specified container.
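As a quick illustration of the flow above, here is a sketch (the image name nginx, the port numbers, and the inspected addresses are examples; your bridge subnet may differ):

```shell
# publish port 80 of a container on port 8080 of all host interfaces
docker run -d --name web -p 8080:80 nginx

# the container received an address from the bridge network, e.g. 172.17.0.x
docker inspect -f '{{ .NetworkSettings.IPAddress }}' web

# the userland proxy shows up as a separate docker-proxy process on the host
ps aux | grep '[d]ocker-proxy'
```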
By the way, the addresses given to containers are directly reachable by other containers (exotic topologies aside). For example, if one container has the address 172.17.0.5 and listens on port 80, then we can reach it at this address from the host machine or from any other container, without running it with the -p 80:80 flag. That flag is only needed to make the port available on the external interfaces of the host machine.
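To make the distinction concrete (a sketch; the address and image name are examples):

```shell
# reachable from the host or any neighbouring container without -p,
# using the address Docker gave the container on the bridge network:
curl http://172.17.0.5:80/

# -p is only about exposure on the host's external interfaces:
docker run -d -p 80:80 nginx
```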
Now imagine that we need to teach our application to access IPv6 hosts. Suppose that our host machine now, in addition to the IPv4 address on eth0, has also received an IPv6 address (a globally routable one; link-local addresses do not count). Also, instead of an IPv4 DNS server, we now have an IPv6 one (it would be strange to have a DNS server reachable only over IPv4 that answers AAAA queries). If we want our containers to be able to speak IPv6 too, we need:

- to give each container a routable IPv6 address (and the corresponding routes);
- a DNS server in /etc/resolv.conf that can respond to AAAA queries.

DNS is the last point on purpose: first you need to learn how to reach hosts by IP address, and only then add DNS support.
Basically, there are the following options for distributing IP addresses (apart from --net=host, which we will not consider; everything should work there anyway):
If your provider gives you a real /64 subnet (or at least a /80), this is the easiest option. Most likely, everything will work immediately after you add the options --ipv6 --fixed-cidr-v6=<your subnet> to the config. You may also need to set several sysctl options:

net.ipv6.conf.all.forwarding=1
net.ipv6.conf.default.forwarding=1
net.ipv6.conf.eth0.accept_ra=2
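In config terms, a sketch of this option via /etc/docker/daemon.json (the subnet 2001:db8:1::/64 is a documentation-range placeholder; substitute the prefix your provider actually delegated to you):

```shell
# daemon.json equivalent of --ipv6 --fixed-cidr-v6=...
cat > /etc/docker/daemon.json <<'EOF'
{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}
EOF

# the kernel must forward IPv6 and still accept router advertisements on eth0
sysctl -w net.ipv6.conf.all.forwarding=1
sysctl -w net.ipv6.conf.default.forwarding=1
sysctl -w net.ipv6.conf.eth0.accept_ra=2

systemctl restart docker
```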
If you do not have a large enough subnet, but only, say, 16 addresses (a /124), then you can use an NDP proxy. In this case you will have to assign container addresses yourself (either explicitly, using a custom network, or implicitly in the default one, exploiting the fact that Docker derives addresses predictably from the MAC address, and the MAC address can be specified manually). You will also have to add each such address to the neighbour list and enable several kernel flags. This option is inconvenient because of the large amount of manual work. You can read more about this method in the Docker documentation.
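A sketch of the manual work involved (the addresses are documentation-range placeholders, and the network name v6net is invented for the example; the authoritative steps are in the Docker documentation):

```shell
# let the host answer neighbour solicitations on behalf of containers
sysctl -w net.ipv6.conf.eth0.proxy_ndp=1

# advertise one container address from your /124 on the external interface;
# this has to be repeated for every container address
ip -6 neigh add proxy 2001:db8::c001 dev eth0

# pin the container to that address via a custom network
docker network create --ipv6 --subnet=2001:db8::c000/124 v6net
docker run -d --network v6net --ip6 2001:db8::c001 nginx
```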
If the provider allocates only one address (or you do not want to bother with NDP), then you can simply set up NAT. You can take any ULA subnet (https://en.wikipedia.org/wiki/Unique_local_address, the analogue of 10.0.0.0/8 in IPv4) and configure ip6tables. Good people have made a container (https://github.com/robbertkl/docker-ipv6nat) which is simple to run and just works: it automatically edits the ip6tables rules as new containers are created. In principle, this option seems preferable, because you do not depend on the behaviour of the provider (or IaaS), which mitigates the risk of switching to another service provider. For IPv6 to function in this mode, it is enough to put the subnet in the config (or add your own network) and run one small container, which will do the rest.
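A sketch of the NAT option (the ULA prefix fd00:dead:beef::/64 is an arbitrary example; generate your own per RFC 4193):

```shell
# give the default bridge a ULA subnet (or create a custom --ipv6 network)
cat > /etc/docker/daemon.json <<'EOF'
{
  "ipv6": true,
  "fixed-cidr-v6": "fd00:dead:beef::/64"
}
EOF
systemctl restart docker

# run the helper that maintains ip6tables NAT rules for new containers
docker run -d --restart=always --name ipv6nat \
  --privileged --network host \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  robbertkl/ipv6nat
```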
Next, configure the network in the chosen way and check that the assembled scheme actually works.
So, DNS remains. If we decided to use the default network, there should be no problems with DNS. But if we took a custom network, there is one peculiarity. The fact is that when a container is created in a custom network, Docker always adds its built-in DNS server, the 127.0.0.11 one mentioned earlier. And it is not able to resolve IPv6-only domains. Most likely, this is a bug. And the catch is that it cannot be disabled! The only way to get rid of it is to rip the line out of resolv.conf before starting the application:
RESOLV_CONF=$(sed 's/nameserver 127.0.0.11//g' /etc/resolv.conf)
echo "$RESOLV_CONF" > /etc/resolv.conf
(the in-place option was intentionally not used here: try doing the same with sed -i and you will understand why)
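Wrapped into an entrypoint, the trick might look like this (a sketch; the file path is parameterized only so the logic can be exercised outside a container):

```shell
#!/bin/sh
# entrypoint.sh (sketch): remove the embedded DNS server line from
# resolv.conf before starting the application.
#
# sed -i is deliberately avoided: it replaces the file via rename(), and
# /etc/resolv.conf inside a container is bind-mounted by Docker, so the
# rename fails. Reading and rewriting the file in place works.

strip_embedded_dns() {
    conf="$1"   # parameterized so it can also be tried on an ordinary file
    filtered=$(sed 's/nameserver 127.0.0.11//g' "$conf")
    echo "$filtered" > "$conf"
}

# in a real entrypoint:
#   strip_embedded_dns /etc/resolv.conf
#   exec "$@"
```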
After this, our container can successfully resolve domain names again, using the configuration copied from the host machine.
Next, we need to figure out how to properly listen on sockets in the application if we want to accept connections from the outside, and what to do if the application is written poorly and does not support IPv6.
If everything (both the host machine and Docker) is IPv4-only, there seems to be no problem: the application just has to be able to listen on 0.0.0.0. If the application can only listen on localhost, the only thing that will save it is a small TCP proxy running in the same container, which listens on all interfaces and relays data to localhost. socat is quite suitable for this task.
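For example (a sketch; the port numbers are arbitrary, and 3000 stands for whatever port the application bound on 127.0.0.1):

```shell
# run alongside the application in the same container: listen on all
# interfaces on port 8000 and relay each connection to the app, which
# only listens on 127.0.0.1:3000
socat TCP-LISTEN:8000,fork,reuseaddr TCP:127.0.0.1:3000
```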
If we are dealing with IPv6, then ideally the program should create its listening socket bound to :: and without the IPV6_V6ONLY socket option set. Such a socket listens on both :: and 0.0.0.0 at the same time. Beyond that, it all depends on many factors:
With --userland-proxy=false, there is, in principle, not much difference between a dual stack and an IPv6-only stack. Inside there will still be a dual stack, since Docker is fundamentally IPv4-based and treats IPv6 only as an add-on. The only extra thing to do is to check that the port is reachable from the outside over both protocols.
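Checking reachability over both protocols can be sketched like this (the host name and port are placeholders):

```shell
# force IPv4, then IPv6, against the published port
curl -4 http://example.com:8080/
curl -6 http://example.com:8080/
```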
With the userland proxy it is a little more complicated. First you need to decide whether to use the default settings (userland proxy enabled) or turn it off completely. Indeed, why is it needed at all, if port publishing and traffic forwarding can be configured entirely in iptables? And indeed, there is such an option: you can add --userland-proxy=false to the Docker engine configuration. But there are a few nuances.

First, pure iptables forwarding can silently steal ports that are already in use. For example, some application was listening on port 80; then Docker came along and asked iptables to add a rule forwarding packets for this port to a container. The firewall answered "OK", and after that traffic stopped reaching the first application. docker-proxy in the same situation would simply fail to start, saying that the port is busy.

Second, iptables does not forward traffic originating from the loopback interface, so your container will not be reachable from the host machine at 127.0.0.1 or ::1. The first problem could probably be handled with additional checks, and loopback access is rarely essential anyway.

But the biggest problem is that --userland-proxy=false is not production-ready yet: it has been described how enabling this option leads to serious problems, and apparently that is unlikely to be fixed in the near future. So for now it is better not to turn off the userland proxy, even though using it costs additional memory.
When using the userland proxy on a dual stack, remember that even though it listens on a port on all interfaces, it can forward traffic to only one IP address (it is not a balancer). Which one will be chosen if the container has two, IPv4 and IPv6? In the source you can see that a single IP address is picked:
https://github.com/docker/libnetwork/blob/master/portmapper/mapper.go#L96
But which one is actually picked? Experimentally, it turns out that the IPv4 address is always selected; for a definitive answer you would have to dig through the code:
https://github.com/docker/libnetwork/blob/master/drivers/bridge/bridge.go
In any case (returning to the basic recommendation), ideally the application should be taught to listen on all interfaces; anything else carries risks. For example, if IPv6 NAT plus the userland proxy is used, the application listens only on IPv4, and the host machine has a dual stack, then connecting to it from outside over IPv4 will work fine, but over IPv6 it will not. Why? Because incoming IPv4 packets are proxied by docker-proxy to the container's IPv4 address, while IPv6 packets are translated to the container's IPv6 address by the ip6tables rules created by IPv6 NAT. That is, IPv4 connections legitimately go through the userland proxy, while IPv6 traffic goes directly to the container! This is easy to check by running python -m SimpleHTTPServer 8000 and something like nc6 -6 -l -p 8000 --continuous --exec cat in the container at the same time. The two processes listen on the same port but are bound to different addresses, 0.0.0.0 and ::. telnet to them from outside works independently: IPv6 lands on the IPv6 listener, IPv4 on the IPv4 one. If you then find the docker-proxy process serving this container and kill it, IPv4 connections stop being accepted, while IPv6 continues to work as if nothing had happened.
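The experiment from the paragraph above, as a sequence of commands (a sketch; port 8000 is arbitrary, nc6 must be installed in the container, and the host addresses are placeholders):

```shell
# inside the container: two listeners on the same port, different stacks
python -m SimpleHTTPServer 8000 &              # binds 0.0.0.0:8000
nc6 -6 -l -p 8000 --continuous --exec cat &    # binds [::]:8000

# from outside: each protocol lands on its own listener
telnet <host-ipv4-address> 8000    # answered by SimpleHTTPServer, via docker-proxy
telnet <host-ipv6-address> 8000    # answered by nc6, via ip6tables (IPv6 NAT)
```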
In general, IPv6 support in Docker does not look completely finished yet. However, there is already enough groundwork for quite tolerable living. So take it and use it: the new era is not far off.
Source: https://habr.com/ru/post/334418/