
Setting up a primary and two backup ISPs on a Linux router with NetGWM

Providing a backup for the default gateway is one of the most common tasks in network administration. There are a number of solutions that implement prioritization or balancing of outgoing channels for the vast majority of modern routers, including Linux-based ones.



In the article on a fault-tolerant router, we briefly mentioned our corporate standard for solving this problem, the Open Source product NetGWM, and promised to cover this utility in more detail. In this article, you will learn how the utility works, which features you can use with it, and why we decided to stop using alternative solutions.

Why NetGWM?


The classic scheme for backing up the default gateway in Linux with iproute2 looks nearly identical in almost every source.


The details of this scheme are easy to find by searching for "linux policy routing". In our opinion, the scheme has a number of obvious flaws, which became the main motivation for creating NetGWM:

  1. The scheme is hard to change and hard to manage.
  2. With 3 or more gateways, the script logic becomes more complex, and so does gateway selection based on metrics.
  3. Detecting that a channel has gone down is a problem. Often the physical link and even the operator's gateway are available, yet the network is unreachable because of problems inside the operator's infrastructure or at its upstream provider. Solving this requires extra logic in the ifupdown scripts, and with metric-based routing it cannot be solved at all.
  4. The "Humpty Dumpty" problem. It shows up when a high-priority channel suffers frequent short interruptions. Each time, the gateway successfully switches to the backup and back. So where is the problem? A number of services, such as telephony, video calls, and VPN tunnels, need a timeout to detect a break and establish a new session. Depending on how often the breaks occur, this leads to a sharp drop in service quality or to complete unavailability. Solving this also requires more complex script logic and is likewise impossible with metrics alone.

We looked for something that would solve all 4 problems: a simple and manageable tool that supports 2 or more default gateways, can diagnose channel availability, and can test a channel for stability. We did not find one. This is exactly how NetGWM came about.

Installing from GitHub and the "Flant" repository


NetGWM (Network GateWay Manager) is a small utility for prioritizing default gateways, written in Python and distributed under the free GNU GPL v3 license. The author of the original version is driusha (Andrey Polovov).

The source code and English documentation are available on GitHub; brief documentation and a description in Russian are available as well.

Installing from GitHub:

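    ## Install the dependencies first:
    ## iproute2, conntrack, python-yaml
    ## Clone the repository:
    $ git clone git://github.com/flant/netgwm.git netgwm
    ## Install (requires root privileges):
    $ cd netgwm && sudo make install
    ## Add the service routing table that NetGWM uses
    $ sudo sh -c "echo '100 netgwm_check' >> /etc/iproute2/rt_tables"
    ## Add a cron job for root that runs netgwm every minute
    ## (adjust the schedule to your needs):
    $ sudo crontab -e
    */1 * * * * /usr/lib/netgwm/netgwm.py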

In addition, a ready-made DEB package with NetGWM can be installed from the Ubuntu repository maintained by Flant. The installation for Ubuntu 14.04 LTS looks like this:

    ## Add the repository:
    $ sudo wget https://apt.flant.ru/apt/flant.trusty.common.list \
        -O /etc/apt/sources.list.d/flant.common.list
    ## Import the repository key:
    $ wget https://apt.flant.ru/apt/archive.key -O- | sudo apt-key add -
    ## Install the HTTPS transport for apt, if it is not installed yet:
    $ sudo apt-get install apt-transport-https
    ## Update the package index and install netgwm:
    $ sudo apt-get update && sudo apt-get install netgwm

With the Ubuntu package there is no need to add the service routing table or configure cron. The table is added automatically when the package is installed. The installation also registers the netgwm service, whose init script starts a small shell script, /usr/bin/netgwm, as a daemon. That script reads the INTERVAL parameter (in seconds) from /etc/default/netgwm and calls netgwm.py at that interval.
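
To make the wrapper's behavior concrete, here is a minimal Python sketch of the same idea: read INTERVAL from a shell-style defaults file, run the checker, sleep, repeat. It is an illustration only, not the packaged /usr/bin/netgwm script (which is a shell script). The fallback of 60 seconds and the function names are assumptions.

```python
import re
import subprocess
import time

# Hypothetical fallback used when the file does not define INTERVAL.
DEFAULT_INTERVAL = 60


def read_interval(defaults_text, default=DEFAULT_INTERVAL):
    """Extract INTERVAL=<seconds> from the text of a shell-style defaults file."""
    for line in defaults_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.fullmatch(r"INTERVAL=[\"']?(\d+)[\"']?", line)
        if match:
            return int(match.group(1))
    return default


def run_forever(defaults_path="/etc/default/netgwm",
                command=("/usr/lib/netgwm/netgwm.py",)):
    """Call the checker, then sleep INTERVAL seconds, forever."""
    while True:
        with open(defaults_path) as handle:
            interval = read_interval(handle.read())
        subprocess.call(list(command))
        time.sleep(interval)
```

Re-reading the file on every iteration means a changed INTERVAL takes effect without restarting the loop; whether the real wrapper does this is not stated in the article.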

Configuration


NetGWM is also based on policy routing, so we have to pre-configure a routing table for each operator.

Suppose there are 3 operators. Operator 1 must be the primary one; if it fails, operator 2 is used, and if both fail, operator 3.

Let the first operator be connected to interface eth1, the second to eth2, and the third to eth3. The first operator's gateway is 88.88.88.88, the second's is 99.99.99.99, and the third's is 100.100.100.100.

When setting up a network with several default gateways, we almost always use packet marking and the conntrack module from NetFilter. This is a good practice that helps distribute packets across routing tables based on connection state, but it is not mandatory.

1. Configure packet marking and conntrack:

    iptables -t mangle -A PREROUTING -i eth1 -m conntrack --ctstate NEW,RELATED -j CONNMARK --set-xmark 0x1/0x3
    iptables -t mangle -A PREROUTING -i eth2 -m conntrack --ctstate NEW,RELATED -j CONNMARK --set-xmark 0x2/0x3
    iptables -t mangle -A PREROUTING -i eth3 -m conntrack --ctstate NEW,RELATED -j CONNMARK --set-xmark 0x3/0x3
    iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff

    iptables -t mangle -A OUTPUT -o eth1 -m conntrack --ctstate NEW,RELATED -j CONNMARK --set-xmark 0x1/0x3
    iptables -t mangle -A OUTPUT -o eth2 -m conntrack --ctstate NEW,RELATED -j CONNMARK --set-xmark 0x2/0x3
    iptables -t mangle -A OUTPUT -o eth3 -m conntrack --ctstate NEW,RELATED -j CONNMARK --set-xmark 0x3/0x3
    iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff

    iptables -t mangle -A POSTROUTING -o eth1 -m conntrack --ctstate NEW,RELATED -j CONNMARK --set-xmark 0x1/0x3
    iptables -t mangle -A POSTROUTING -o eth2 -m conntrack --ctstate NEW,RELATED -j CONNMARK --set-xmark 0x2/0x3
    iptables -t mangle -A POSTROUTING -o eth3 -m conntrack --ctstate NEW,RELATED -j CONNMARK --set-xmark 0x3/0x3
    iptables -t mangle -A POSTROUTING -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff
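
The twelve rules follow one pattern: per chain, one marking rule per uplink plus one restore rule. The sketch below generates the same command lines from a list of interfaces, which may help when the number of operators changes. It only builds strings and does not call iptables. The function name `connmark_rules` is our own, and the check that a mark fits the mask is an extra safeguard that the manual rules do not have.

```python
def connmark_rules(interfaces, mask=0x3):
    """Return iptables command lines that mark connections per uplink."""
    # PREROUTING matches the input interface; OUTPUT and POSTROUTING
    # match the output interface, as in the rules listed above.
    chains = (("PREROUTING", "-i"), ("OUTPUT", "-o"), ("POSTROUTING", "-o"))
    rules = []
    for chain, flag in chains:
        for mark, iface in enumerate(interfaces, start=1):
            if mark & ~mask:
                raise ValueError(
                    f"mark {mark:#x} does not fit into mask {mask:#x}"
                )
            rules.append(
                f"iptables -t mangle -A {chain} {flag} {iface} "
                "-m conntrack --ctstate NEW,RELATED "
                f"-j CONNMARK --set-xmark {mark:#x}/{mask:#x}"
            )
        rules.append(
            f"iptables -t mangle -A {chain} -j CONNMARK --restore-mark "
            "--nfmask 0xffffffff --ctmask 0xffffffff"
        )
    return rules


if __name__ == "__main__":
    print("\n".join(connmark_rules(["eth1", "eth2", "eth3"])))
```

With the mask 0x3 only marks 1 to 3 are available, so a fourth operator would need a wider mask; the function raises `ValueError` in that case instead of silently producing a clashing mark.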

2. Add routing rules for marked packets. We do this with a script that is called from /etc/network/interfaces on the post-up event of the lo interface:

    #!/bin/bash

    /sbin/ip rule flush

    # operator 1
    /sbin/ip rule add priority 8001 iif eth1 lookup main
    /sbin/ip rule add priority 10001 fwmark 0x1/0x3 lookup operator1
    /sbin/ip rule add from 88.88.88.88 lookup operator1

    # operator 2
    /sbin/ip rule add priority 8002 iif eth2 lookup main
    /sbin/ip rule add priority 10002 fwmark 0x2/0x3 lookup operator2
    /sbin/ip rule add from 99.99.99.99 lookup operator2

    # operator 3
    /sbin/ip rule add priority 8003 iif eth3 lookup main
    /sbin/ip rule add priority 10003 fwmark 0x3/0x3 lookup operator3
    /sbin/ip rule add from 100.100.100.100 lookup operator3

3. Declare the routing tables in /etc/iproute2/rt_tables :

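    # Reserved values:
    255 local
    254 main
    253 default
    0 unspec

    # Service table, added manually or by dpkg during package installation:
    100 netgwm_check

    # Operator tables that we add ourselves:
    101 operator1
    102 operator2
    103 operator3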
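
A missing table name is an easy mistake to make at this step. As a hedged illustration, the following sketch parses rt_tables-style text and reports which of the tables used in this walkthrough are not declared. The helper names are hypothetical, and only decimal table ids are recognized.

```python
def parse_rt_tables(text):
    """Map table name -> numeric id from rt_tables-style text."""
    tables = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        fields = line.split()
        # Expect "<decimal id> <name>"; skip anything else.
        if len(fields) >= 2 and fields[0].isdigit():
            tables[fields[1]] = int(fields[0])
    return tables


def missing_tables(required_names, rt_tables_text):
    """Return the required table names that are not declared."""
    declared = parse_rt_tables(rt_tables_text)
    return [name for name in required_names if name not in declared]


if __name__ == "__main__":
    sample = """
    255 local
    254 main
    100 netgwm_check
    101 operator1
    102 operator2
    """
    required = ["netgwm_check", "operator1", "operator2", "operator3"]
    print(missing_tables(required, sample))
```

On a real router the text would come from reading /etc/iproute2/rt_tables instead of the inline sample.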

4. Configure NetGWM. By default, netgwm.py looks for its configuration file at /etc/netgwm/netgwm.yml, but you can override this with the -c option. Configure the utility:

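    # Default gateways of each operator and their priorities.
    # A lower number means a higher priority: 1 is the highest.
    # The gateway name must match the name of the corresponding
    # routing table in /etc/iproute2/rt_tables
    gateways:
      operator1: {ip: 88.88.88.88, priority: 1}
      operator2: {ip: 99.99.99.99, priority: 2}
      operator3: {ip: 100.100.100.100, priority: 3}

    # Addresses the "Humpty Dumpty" problem of an unstable link.
    # Time in seconds that a higher-priority gateway must stay
    # continuously reachable before netgwm switches back to it
    min_uptime: 900

    # Remote hosts that netgwm uses to check each operator.
    # List hosts that matter to you or stable public resources,
    # as in this example. A gateway is considered unavailable only
    # if netgwm fails to reach ALL (logical AND) of these hosts
    check_sites:
      - 8.8.8.8   # Google public DNS
      - 4.2.2.2   # Level 3 public DNS

    # By default netgwm checks the highest-priority gateway first and
    # moves on to the next one only if that check fails.
    # Set this to true to make netgwm check every gateway on each run
    check_all_gateways: false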
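
The interplay of `priority` and `min_uptime` is easier to see in code. The function below is a simplified model of the selection rule, not NetGWM's actual implementation: prefer the highest-priority reachable gateway, but switch to a gateway other than the current one only after it has been reachable for at least `min_uptime` seconds. The `status` structure and the fallback when nothing is stable yet are assumptions made for this sketch.

```python
def choose_gateway(gateways, status, current, min_uptime, now):
    """Pick the gateway to use.

    gateways: name -> {"ip": str, "priority": int} (1 is the highest)
    status:   name -> {"up": bool, "up_since": timestamp or None}
    current:  name of the gateway in use, or None
    """
    ordered = sorted(gateways, key=lambda name: gateways[name]["priority"])
    reachable = [name for name in ordered if status[name]["up"]]
    for name in reachable:
        stable = now - status[name]["up_since"] >= min_uptime
        # Stay on the current gateway, or move to one that has proven stable.
        if name == current or stable:
            return name
    # Nothing is stable yet: any reachable gateway beats an unreachable one.
    return reachable[0] if reachable else current


if __name__ == "__main__":
    gateways = {
        "operator1": {"ip": "88.88.88.88", "priority": 1},
        "operator2": {"ip": "99.99.99.99", "priority": 2},
        "operator3": {"ip": "100.100.100.100", "priority": 3},
    }
    now = 10_000
    status = {
        "operator1": {"up": True, "up_since": now - 60},    # just came back
        "operator2": {"up": True, "up_since": now - 5000},
        "operator3": {"up": True, "up_since": now - 5000},
    }
    print(choose_gateway(gateways, status, "operator2", 900, now))
```

In this example operator1 has been back for only 60 seconds, so the model keeps operator2; once operator1 has stayed up for 900 seconds, the function returns operator1. This is the behavior that keeps a flapping primary link from dragging services back and forth.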

5. Set up actions when switching

After the default gateway has been switched, NetGWM runs every executable file in the /etc/netgwm/post-replace.d/ directory and passes 6 command-line parameters to each of them. These include the name and IP address of the new gateway ($1, $2) and of the previous gateway ($4, $5; NaN if there was none).


Based on these parameters, a script can implement any logic that depends on the connected operator (add or remove routes, send notifications, configure shaping, and so on). For example, here is a shell script that sends notifications:

    #!/bin/bash

    # Determine what happened: a gateway switch or a netgwm start
    if [ "$4" = 'NaN' ] && [ "$5" = 'NaN' ]
    then
        STATE='start'
    else
        STATE='switch'
    fi

    # Send a notification that matches the event
    case $STATE in
        'start')
            /usr/bin/flant-integration --sms-send="NetGWM on ${HOSTNAME} has been started and now uses gw: $1 - $2"
            ;;
        'switch')
            /usr/bin/flant-integration --sms-send="NetGWM on ${HOSTNAME} has switched to new gw: $1 - $2 from gw: $4 - $5"
            ;;
        *)
            /usr/bin/logger -t netgwm "Unknown NetGWM state. Try restarting service to fix it."
            ;;
    esac

    exit
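
Since any executable file in the directory is run, a hook does not have to be a shell script. Below is a minimal Python counterpart that only builds and prints the message. The meaning of the arguments is inferred from the shell example above (positions 1 and 2 for the new gateway, 4 and 5 for the previous one, `NaN` when absent); positions 3 and 6 are deliberately left unused because this article does not describe them. The function name is our own.

```python
#!/usr/bin/env python3
import sys


def describe_event(argv):
    """Build a notification text from the hook's positional arguments."""
    # Assumption based on the shell example: argv[1]/argv[2] are the new
    # gateway's name and IP, argv[4]/argv[5] the previous gateway's.
    new_name, new_ip = argv[1], argv[2]
    old_name, old_ip = argv[4], argv[5]
    if old_name == "NaN" and old_ip == "NaN":
        return f"NetGWM started, using gw: {new_name} - {new_ip}"
    return (
        f"NetGWM switched to gw: {new_name} - {new_ip} "
        f"from gw: {old_name} - {old_ip}"
    )


# Run only when called with the 6 parameters a hook receives.
if __name__ == "__main__" and len(sys.argv) == 7:
    print(describe_event(sys.argv))
```

To use something like this as a real hook, the print call would be replaced with whatever delivery mechanism you have (mail, chat webhook, SMS gateway), and the file would need the executable bit set.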

6. Start the netgwm service in Ubuntu, if you installed the DEB package:

 $ sudo service netgwm start 

If you installed NetGWM from GitHub, the cron job set up earlier already checks the availability of your default gateway, so no additional steps are required.

Logging


NetGWM logs switching events to /var/log/netgwm.log:

    $ tail -n 3 /var/log/netgwm.log
    2017-07-14 06:25:41,554 route replaced to: via 88.88.88.88
    2017-07-14 06:27:09,551 route replaced to: via 99.99.99.99
    2017-07-14 07:28:48,573 route replaced to: via 88.88.88.88

The stored switching history helps to analyze the incident and determine the reasons for a break in communication.
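
As an illustration of such an analysis, the sketch below parses lines in the format shown above and sums up how long each gateway stayed active between consecutive switches. It assumes that every relevant line looks exactly like the sample; other log lines are skipped, milliseconds are ignored, and the interval after the last switch is not counted because its end is unknown. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. "2017-07-14 06:25:41,554 route replaced to: via 88.88.88.88"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r"route replaced to: via (\S+)$"
)


def parse_switches(log_text):
    """Return a list of (timestamp, gateway) tuples in log order."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2)))
    return events


def time_on_gateway(events):
    """Sum the time each gateway was active between consecutive switches."""
    totals = {}
    for (start, gateway), (end, _next_gateway) in zip(events, events[1:]):
        totals[gateway] = totals.get(gateway, timedelta()) + (end - start)
    return totals


if __name__ == "__main__":
    sample = """\
2017-07-14 06:25:41,554 route replaced to: via 88.88.88.88
2017-07-14 06:27:09,551 route replaced to: via 99.99.99.99
2017-07-14 07:28:48,573 route replaced to: via 88.88.88.88
"""
    for gateway, duration in time_on_gateway(parse_switches(sample)).items():
        print(gateway, duration)
```

For the three sample lines, this shows that the backup gateway 99.99.99.99 carried traffic for a little over an hour before the primary took over again.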

Tested in production


For about 4 years, NetGWM has been used in our company on 30+ Linux routers of various sizes. The utility's reliability has been proven repeatedly in day-to-day operation. For example, on one installation NetGWM has handled 137 operator switchovers since May 2014 without any problems.

Because the utility is stable, covers all our needs, and has run without problems for a long time, we hardly do any development on the project. NetGWM is written in Python, so there is no need to adapt it to new versions of operating systems. Nevertheless, we will be very happy if you decide to take part in the development of NetGWM by sending patches on GitHub or simply by leaving a feature request in the comments.

Conclusion


With NetGWM, we have a stable, flexible, and extensible (using scripts) utility that completely covers our needs for managing the priority of the main gateway.

Questions about using NetGWM are welcome too: you can ask them right here in the comments.

P.S. See also on our blog: "Our recipe for a fail-safe Linux router". Subscribe so that you do not miss new articles!

Source: https://habr.com/ru/post/335030/

