ToFoIn - Toggle Failover of Internet or switching between two external channels in FreeBSD

Abstract

One way to make an Internet connection more reliable is to use two external communication channels and switch between them automatically. This article briefly reviews some existing solutions to this problem, then presents the author's own approach based on bash scripts in FreeBSD, with instructions for building the final system and the source of the necessary scripts.

Introduction

To improve the stability of the Internet connection, corporate solutions use two or more external network channels. Using them simultaneously (for example, with load balancing) or sequentially (switching between channels) is not entirely trivial, but the problem has already been solved in many ways. Here are some of them:

SOHO class routers with two exits to the external network (hereinafter, the external network means the Internet, the internal network means the local area network of the enterprise);
Layer 3 switches, usually carrier class, with a large number of configurable parameters that, among other things, make it possible to solve the problem described above;
Numerous home-made scripts in various languages for various Unix-like and Linux systems, most often of dubious quality;
Channel balancing with NAT rules;
Balancing or switching using a proxy server.

Each of the above approaches has its advantages and disadvantages. Option One, SOHO routers:

Advantages:

low price;
easy installation and configuration.

Disadvantages:

lack of reliability for the corporate segment due to lack of redundancy;
lack of configuration flexibility and low functionality. (Typically, such devices can solve only a very limited range of tasks; any step outside that range is either impossible or fraught with difficulties.)

The second option is Layer 3 switches:
Advantages:

reliability;
customization flexibility;

Disadvantages:

price (typically, such devices cost more than 50 thousand rubles);
complexity of configuration (a professional-grade device requires a correspondingly professional approach).

The third option is switching scripts:

Advantages:

price (free, not counting the working time to set up).

Disadvantages:

unpredictable reliability (the skill level of the authors of such scripts is often unknown, so it is hard to judge the quality of the product without studying it in detail);
lack of flexibility and complexity of customization (such scripts are usually written for specific conditions, and it is sometimes easier to write your own version than to understand someone else's, which explains their variety).

The fourth option, balancing with NAT rules:

Advantages:

price (free, not counting the working time to set up);
relative ease of setup.

Disadvantages:

the channels need to have approximately equal bandwidth;
it is doubtful how well the scheme works when one of the external channels goes down.

And finally, the fifth option, using a proxy server:

Advantages:

price (free, not counting the working time to set up);
customization flexibility.

Disadvantages:

slowing down the data flow;
the need for additional configuration on user machines;
the complexity of setting in non-standard situations.

When development began a few years ago, I chose to write my own script, for the following reasons. First, price: by this criterion the Layer 3 switches of option two drop out, since for a local network of 10 machines corporate-level solutions are an unaffordable luxury. Unfortunately, I did not know about the devices of option one at the time; in any case, they would not pass today on the stability criterion. Option four does not fit because the available Internet channels differ tenfold in speed, so in my opinion such a scheme is not justified; there are also doubts about the quality of the external connection when one of the channels goes down. Option five is unsuitable, first, because it slows down the data flow, and second, because I wanted a solution that does not depend on optional components. That left option three: after studying other people's scripts and trying to adapt them, I gave up on that idea and wrote my own.

Over time, a backup machine was installed next to the main FreeBSD “router”, and the DNS, DHCP, NAT and ipfw settings changed repeatedly. Everything gradually evolved and improved except for the script mentioned above, so I finally decided to rewrite it on the following principles: modularity, a single configuration file, flexibility and ease of configuration on any Unix-like system, and simplicity of adding new modules.

Goals and objectives

What is the ultimate goal of this project? To create a universal and easily scalable software package built on a client-server model (agent-server would be a better name), focused on detecting problems with external and internal connections and automatically switching to working ones. The agent here is the “collector” of information about the current state of external and internal connections; the server is the part of the program that decides which connection has priority and, if necessary, issues the commands to switch to it. A machine acting as the server (in this sense) does not have to run an agent.

So:

We have n “routers” with m external channels on each. In this case, all n "routers" are in a strict hierarchy.
On all machines an agent runs independently of the others. Its task is to collect the results of testing the external channels and deliver them to the server, that is, to the “router” with the highest priority at the moment, and also to determine whether that server is reachable. (It is assumed that the server part is always installed alongside an agent, while an agent is not required to perform server functions.)
The server, in turn, analyzes the received data and determines which channel and which “router” currently has priority. This is why the article covers the DHCP server settings: to change the gateway, the dhcpd settings are changed.
If the server fails, a program is activated on all agents that selects a new server from among the agents according to preset priorities and hands over to it the functions of collecting information on the state of external connections and making switching decisions. Once the original server is restored, the reverse process takes place: control switches back to it automatically.

The details of the algorithm could be described at great length; only the general idea is given here. I do not dispute that n and m (from the example above) very rarely exceed 2, but such cases do occur, so why not build a universal tool?

While writing the scripts I ran into some limitations of the bash language, so at the moment a more elegant solution to the task above is still very vague. For now there is a solution for a stand-alone “router”, designed so that its capabilities can be extended later.

Solution

For many reasons, I decided to use an old machine (Pentium III, 512 MB of RAM) running FreeBSD, currently version 9.2, as the core of the local network and as the gateway to the Internet. Later, to improve reliability, a second similar machine was installed to work in tandem with the first. Incidentally, over the past two years there have been exactly two hardware failures: the first time a power supply died, the second time one of the network cards. In both cases the local network kept working without complaints, because the backup machine took over. So using old hardware in this scheme has almost no effect on the stability of the network. There are also two external channels from different Internet providers. The general scheme is shown below; in it:

Blue and red arrows - external communication channels.
Black arrows - internal communication channels.

This system looks like this:

The switch separates the providers' traffic using VLANs. In this particular case it is a Cisco SF300-08.
In more detail, the services running on the machines themselves:
Firewall - IPFW
NAT - in-kernel NAT from IPFW
DNS - BIND 9 (the latest version available for FreeBSD)
DHCP - isc-dhcpd
ToFoIn - the subject of this article.

The article does not describe the subtleties of configuring DNS and DHCP, since it is assumed that the reader is already familiar with such systems. Besides, there is plenty of material on this topic, and some links are given at the end of the article. The technical part contains the complete firewall and NAT rules for ipfw currently in use, almost without comments (again, plenty of material exists on this topic), as well as the kernel parameters and rc.conf.

Now let us look in detail at how the script works. First, the modules and their functions:

Daemon - as the name implies, is the main process that starts testing and switching modules on a timer.
Tester - tests the availability of communication via external channels using the ping command.
Judge - based on the test results, determines which external channel is working and whether switching is necessary.
Logger - is responsible for logging events. It is needed so that information about events is not duplicated and the log is easier to read.
Watchdog - runs on a schedule from crontab. It detects “frozen” modules and, where possible, tries to resolve the problems that have arisen.
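
The article does not show the Logger source at this point, so the following is only a minimal sketch of the "no duplicated events" idea, under stated assumptions: the function name log_event, the LOGFILE variable and the timestamp format are hypothetical and are not taken from ToFoIn. A message is appended only if it differs from the text of the last logged line.

```shell
#!/bin/sh
# Hypothetical log location; the real tofoin.log path is set elsewhere.
LOGFILE=${LOGFILE:-/tmp/tofoin.log}

# Append "DATE TIME message" unless the previous entry has the same message.
log_event() {
    msg=$1
    # Fields 1-2 of a log line are the date and time; the rest is the message.
    last=$(tail -n 1 "$LOGFILE" 2>/dev/null | cut -d' ' -f3-)
    if [ "$msg" = "$last" ]; then
        return 0    # same event as last time: skip the duplicate
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') $msg" >> "$LOGFILE"
}
```

Calling log_event twice in a row with the same text leaves a single line in the file; a different message is written as usual.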

In addition to the scripts themselves, it is worth considering some more important files:

tofoin.conf - the single configuration file.
tofoin.log - the single event log file.
result_<internal channel number> - a working file to which the test results are appended.

A few more working files are also used and, of course, each script creates a pid file at startup and deletes it on exit.
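
The pid-file convention can be sketched roughly as follows. This is an illustration, not the ToFoIn code: the function names acquire_pidfile and release_pidfile are hypothetical, and treating a pid file whose process no longer exists as stale is an assumption about how a frozen or crashed module might be detected.

```shell
#!/bin/sh
# Create a pid file for the current shell.
# Returns 1 if the file already points at a live process.
acquire_pidfile() {
    pidfile=$1
    if [ -f "$pidfile" ]; then
        oldpid=$(cat "$pidfile")
        # kill -0 sends no signal; it only checks that the process exists
        if [ -n "$oldpid" ] && kill -0 "$oldpid" 2>/dev/null; then
            return 1
        fi
        rm -f "$pidfile"    # stale file left behind by a crashed run
    fi
    echo $$ > "$pidfile"
}

# Remove the pid file on exit.
release_pidfile() {
    rm -f "$1"
}
```

A module would typically call acquire_pidfile with its own pid file path right after startup and register release_pidfile with trap for the EXIT signal, so the file disappears even when the script terminates early.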

The work of Logger and Watchdog will not be described in detail; anyone interested can study them on their own. Let us look more closely at the main modules: Daemon, Tester and Judge. Daemon runs Tester and Judge on timers whose periods are stored in the configuration file. It works like this: at startup the tests are run and the timestamp is remembered; then, every n seconds (n being the sensitivity), Daemon checks whether it is time to run the next test or to evaluate the current state of the connection. In other words, Daemon keeps the timestamps of the latest test and of the latest evaluation and compares them with the current time. If the difference exceeds the period given in the configuration file, the test or the evaluation, respectively, is launched and its timestamp is replaced with the current one. And so on.
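
The timer logic described above can be sketched like this. It is a simplified illustration rather than the actual Daemon source: the names is_due, daemon_loop, run_tester and run_judge are hypothetical, and the two run_* functions are placeholders for starting the real modules. TESTERPERIOD, JUDGEPERIOD and SENSITIVITY are the parameters from tofoin.conf.

```shell
#!/bin/sh
# Periods from tofoin.conf; the defaults repeat the values used in the article.
: "${TESTERPERIOD:=240}" "${JUDGEPERIOD:=300}" "${SENSITIVITY:=60}"

# Succeeds when more than $3 seconds lie between $1 (last run) and $2 (now).
is_due() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

# Placeholders (hypothetical): the real Daemon starts Tester and Judge here.
run_tester() { echo "tester started"; }
run_judge()  { echo "judge started"; }

daemon_loop() {
    last_test=0                 # zero forces a test run on the first pass
    last_judge=$(date +%s)      # the first evaluation waits one full period
    while :; do
        now=$(date +%s)
        if is_due "$last_test" "$now" "$TESTERPERIOD"; then
            run_tester
            last_test=$now
        fi
        if is_due "$last_judge" "$now" "$JUDGEPERIOD"; then
            run_judge
            last_judge=$now
        fi
        sleep "$SENSITIVITY"    # check the timers every SENSITIVITY seconds
    done
}
```

Because the timers are only checked every SENSITIVITY seconds, the real interval between two runs can exceed the configured period by up to that amount.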

Tester is the simplest module so far. It takes two arguments:

 ./tester.sh a b

where a is the number of the routing table and b is the task (normally b = 10, which means full testing with the result recorded).

Test modes are also provided for the Tester module: b = 0 pings only the first target from the configuration file, b = 1 pings only the second target from the configuration file, and b = <destination> (for example, b = habrahabr.ru) pings an arbitrary target. In the last case, for routing table 0, the command looks like this:

 ./tester.sh 0 habrahabr.ru
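
A minimal sketch of how the second argument could be mapped to ping targets is given below. The function names resolve_targets and ping_command are hypothetical; PTARGET_0, PTARGET_1 and PNUMBER are the parameters from tofoin.conf. Running ping under setfib to select the routing table is an assumption about how the module is implemented, so the sketch only prints the command instead of executing it.

```shell
#!/bin/sh
# Targets and packet count from tofoin.conf (defaults as in the article).
PTARGET_0=${PTARGET_0:-ya.ru}
PTARGET_1=${PTARGET_1:-8.8.8.8}
PNUMBER=${PNUMBER:-2}

# Translate the task argument into the list of hosts to ping.
resolve_targets() {
    case "$1" in
        10) echo "$PTARGET_0 $PTARGET_1" ;;   # full test: both targets
        0)  echo "$PTARGET_0" ;;              # first target only
        1)  echo "$PTARGET_1" ;;              # second target only
        *)  echo "$1" ;;                      # arbitrary destination
    esac
}

# Print the command that would test host $2 through routing table $1.
ping_command() {
    echo "setfib $1 ping -c $PNUMBER $2"
}
```

For example, ping_command 1 8.8.8.8 prints the ping invocation for the backup routing table without sending any packets.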

The main component of the program is obviously the Judge module. In general terms, its algorithm is as follows:

The current external channel is determined from the active ipfw rules.
In a loop, an array with the current status of the external channels is built.
A second loop determines the preferred external channel.
Next, a function decides whether the channel needs to be switched and, if so, calls the switch function, passing it the number of the channel to switch to. (The return to the main channel does not happen immediately. This prevents jumping back and forth when the main channel is unstable: switching back occurs only once the main external channel has started to work stably.)
Finally, when called, the switch function installs the required ipfw rules, restarts ipfw, and also restarts BIND under the required routing table.
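
The switching decision with a delayed return to the main channel can be sketched as follows. This is an illustration under assumptions, not the Judge source: the function name decide_channel and its argument order are hypothetical, channel 0 is taken to be the main channel and channel 1 the backup, and the rule "leave a dead backup channel for a working main channel immediately" is an assumption. WNUMBER is the tofoin.conf parameter that sets how many successful tests are required before returning to the main channel.

```shell
#!/bin/sh
# decide_channel CURRENT MAIN_OK BACKUP_OK STREAK WNUMBER
#   CURRENT   - channel in use now (0 = main, 1 = backup)
#   MAIN_OK   - 1 if the latest test of the main channel passed, else 0
#   BACKUP_OK - 1 if the latest test of the backup channel passed, else 0
#   STREAK    - number of consecutive successful tests of the main channel
#   WNUMBER   - successful tests required before returning to the main channel
# Prints the channel that should be active after this evaluation.
decide_channel() {
    current=$1 main_ok=$2 backup_ok=$3 streak=$4 wnumber=$5
    if [ "$current" -eq 0 ]; then
        # Leave the main channel only if it failed and the backup works.
        if [ "$main_ok" -eq 1 ] || [ "$backup_ok" -eq 0 ]; then
            echo 0
        else
            echo 1
        fi
    else
        # Return to the main channel only after a stable streak,
        # or at once if the backup itself has failed (assumption).
        if [ "$main_ok" -eq 1 ] &&
           { [ "$streak" -ge "$wnumber" ] || [ "$backup_ok" -eq 0 ]; }; then
            echo 0
        else
            echo 1
        fi
    fi
}
```

With WNUMBER=3, a main channel that has passed only two tests in a row keeps the traffic on the backup channel; the third successful test triggers the return.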

Of course, all key actions are recorded in the event log, and if something unexpected happens, the cause of the error is recorded as well and Watchdog is called.

With the basic principles covered, let us see how all this is implemented in practice.

Technical part

Equipment

The equipment has already been mentioned; this section gives more detail. For DNS, DHCP, NAT and IPFW (in my case, an internal network of about 30 machines), a Pentium III-based Celeron, 512 MB of RAM, a 40 GB HDD and a 350 W power supply with the connectors the motherboard needs are enough. Two additional PCI network cards are also installed. The two routers are roughly equal in performance.

One could argue that this is more power than necessary in places, but these machines were not purchased specifically: they were assembled from what was left after the users' workstations were upgraded. Most likely, the minimum required set of services can run on much weaker hardware. It would also be a good idea to set up a mirrored RAID. Unfortunately, I did not think about it beforehand, and doing it now involves some difficulties, but that is a completely different story.

In my opinion, this is a perfectly worthy use for old working hardware, which otherwise often gathers dust in a storeroom, gets thrown out or is given away.

Preliminary setup

For this system to work, some preliminary setup is of course required.

First, configure the primary and secondary DNS servers. If you have only one “router”, a primary DNS server alone is enough to start with. As mentioned above, BIND 9 was used here. Some configuration links are given at the end of the article. The textbook "DNS and BIND" by Cricket Liu and Paul Albitz is a great help here.

Second, you need to configure a DHCP failover peer. If you have only one “router”, the usual settings for a standalone DHCP server are enough. Again, links are provided at the end of the article. In case the linked article on setting up a DHCP failover peer is unavailable for any reason (and for the past few months that has been exactly the situation), I give here the script for synchronizing the settings, as well as the key points of the setup.

Failover dhcpd setup

To configure a DHCP failover peer you need to:

Create the main settings file dhcpd.conf in /usr/local/etc, which is referenced in rc.conf. Mine looks like this:

/usr/local/etc/dhcpd.conf

    # dhcpd.conf
    #
    # option definitions common to all supported networks...
    option domain-name "companyname.local";
    option domain-name-servers 10.0.0.2, 10.0.0.1;
    option ntp-servers 10.0.0.2, 10.0.0.1;
    option log-servers 10.0.0.1;
    update-static-leases on;

    # 1 hour
    default-lease-time 3600;
    # 1 day
    max-lease-time 86400;

    # Use this to enable / disable dynamic dns updates globally.
    ddns-update-style interim;

    # If this DHCP server is the official DHCP server for the local
    # network, the authoritative directive should be uncommented.
    authoritative;

    # Use this to send dhcp log messages to a different log file (you also
    # have to hack syslog.conf to complete the redirection).
    log-facility local7;

    set vendorclass = option vendor-class-identifier;

    # DNS key
    include "/usr/local/etc/dhcpd/dns.key";

    zone companyname.local. {
        primary 127.0.0.1;
        key DHCP_UPDATER;
    }

    zone 0.0.10.in-addr.arpa. {
        primary 127.0.0.1;
        key DHCP_UPDATER;
    }

    # DHCP Failover, Primary
    include "/usr/local/etc/dhcpd/dhcpd.conf_primary";

    # Subnet declaration
    include "/usr/local/etc/dhcpd/dhcpd.subnet";

    # Static IP addresses
    include "/usr/local/etc/dhcpd/dhcpd.static";

Here dns.key is the key for communicating with the DNS server; these issues are discussed in detail in articles on configuring DNS together with DHCP.

Create the directory /usr/local/etc/dhcpd. In it, create the following files with approximately the following contents:

/usr/local/etc/dhcpd/dhcpd.conf_primary

    ##########################
    # DHCP Failover, Primary #
    ##########################
    failover peer "dhcpdpeer" {
        # Failover configuration
        primary;                    # I am the primary
        address 10.0.0.1;           # My IP address
        port 1111;
        peer address 10.0.0.2;      # Peer's IP address
        peer port 2222;
        max-response-delay 60;
        max-unacked-updates 10;
        mclt 3600;
        split 128;                  # Leave this at 128, only defined on Primary
        load balance max seconds 3;
    }

/usr/local/etc/dhcpd/dhcpd.subnet

    subnet 10.0.0.0 netmask 255.255.255.0 {
        pool {
            failover peer "dhcpdpeer";
            range 10.0.0.15 10.0.0.240;
        }
        option subnet-mask 255.255.255.0;
        option routers 10.0.0.2, 10.0.0.1;
        option broadcast-address 10.0.0.255;
        option netbios-name-servers 10.0.0.3;
        option netbios-dd-server 10.0.0.3;
        option netbios-node-type 8;
    }

In this case, the NetBIOS name server is a Windows server running the WINS service; Samba can also play this role.

/usr/local/etc/dhcpd/dhcpd.static

    host SERVER3 {
        hardware ethernet 11:11:11:11:11:11;
        fixed-address 10.0.0.3;
    }

    host SERVER4 {
        hardware ethernet 22:22:22:22:22:22;
        fixed-address 10.0.0.4;
    }

This file, as you might guess, is for static addresses.

On the second “router”, the files look like this:

/usr/local/etc/dhcpd.conf

    # dhcpd.conf
    #
    # option definitions common to all supported networks...
    option domain-name "companyname.local";
    option domain-name-servers 10.0.0.2, 10.0.0.1;
    option ntp-servers 10.0.0.2, 10.0.0.1;
    option log-servers 10.0.0.1;
    update-static-leases on;

    # 1 hour
    default-lease-time 3600;
    # 1 day
    max-lease-time 86400;

    # Use this to enable / disable dynamic dns updates globally.
    ddns-update-style interim;

    # If this DHCP server is the official DHCP server for the local
    # network, the authoritative directive should be uncommented.
    authoritative;

    # Use this to send dhcp log messages to a different log file (you also
    # have to hack syslog.conf to complete the redirection).
    log-facility local7;

    set vendorclass = option vendor-class-identifier;

    # DNS key
    include "/usr/local/etc/dhcpd/dns.key";

    zone companyname.local. {
        secondary 127.0.0.1;
        key DHCP_UPDATER;
    }

    zone 0.0.10.in-addr.arpa. {
        secondary 127.0.0.1;
        key DHCP_UPDATER;
    }

    # DHCP Failover, Secondary
    include "/usr/local/etc/dhcpd/dhcpd.conf_secondary";

    # Subnet declaration
    include "/usr/local/etc/dhcpd/dhcpd.subnet.DONOTEDIT";

    # Static IP addresses
    include "/usr/local/etc/dhcpd/dhcpd.static.DONOTEDIT";

/usr/local/etc/dhcpd/dhcpd.conf_secondary

    ############################
    # DHCP Failover, Secondary #
    ############################
    failover peer "dhcpdpeer" {
        # Failover configuration
        secondary;                  # I am the secondary
        address 10.0.0.2;           # My IP address
        port 2222;
        peer address 10.0.0.1;      # Peer's IP address
        peer port 1111;
        max-response-delay 60;
        max-unacked-updates 10;
        mclt 3600;
        load balance max seconds 3;
    }

The remaining files can be copied from the first “router” with the names changed accordingly, or you can finish the setup and the files will be transferred automatically when isc-dhcpd is restarted (how exactly is shown below).

Create an executable file with the following contents:

/usr/local/bin/dhcpd-sync

    #!/bin/sh

    # backup generation
    date=`date -v-1d '+%Y%m%d-%H%M%s'`
    month=`date '+%m%Y'`
    sudo -u dhcp-updater cp -f /usr/local/etc/dhcpd/dhcpd.subnet /var/dhcp-backup/dhcpd.subnet.$date
    sudo -u dhcp-updater bzip2 -f -k -z /var/dhcp-backup/dhcpd.subnet.$date
    sudo -u dhcp-updater tar -r -f /var/dhcp-backup/dhcpd.subnet.$month.tar -C /var/dhcp-backup dhcpd.subnet.$date.bz2
    sudo -u dhcp-updater cp -f /usr/local/etc/dhcpd/dhcpd.static /var/dhcp-backup/dhcpd.static.$date
    sudo -u dhcp-updater bzip2 -f -k -z /var/dhcp-backup/dhcpd.static.$date
    sudo -u dhcp-updater tar -r -f /var/dhcp-backup/dhcpd.static.$month.tar -C /var/dhcp-backup dhcpd.static.$date.bz2
    sudo -u dhcp-updater scp -P 22 -q /var/dhcp-backup/dhcpd.subnet.$date.bz2 dhcp-updater@10.0.0.2:/var/dhcp-backup
    sudo -u dhcp-updater ssh -p 22 10.0.0.2 tar -r -f /var/dhcp-backup/dhcpd.subnet.$month.tar -C /var/dhcp-backup dhcpd.subnet.$date.bz2
    sudo -u dhcp-updater scp -P 22 -q /var/dhcp-backup/dhcpd.static.$date.bz2 dhcp-updater@10.0.0.2:/var/dhcp-backup
    sudo -u dhcp-updater ssh -p 22 10.0.0.2 tar -r -f /var/dhcp-backup/dhcpd.static.$month.tar -C /var/dhcp-backup dhcpd.static.$date.bz2
    sudo -u dhcp-updater ssh -p 22 10.0.0.2 rm /var/dhcp-backup/dhcpd.subnet.$date.bz2
    sudo -u dhcp-updater ssh -p 22 10.0.0.2 rm /var/dhcp-backup/dhcpd.static.$date.bz2
    sudo -u dhcp-updater rm /var/dhcp-backup/dhcpd.subnet.$date
    sudo -u dhcp-updater rm /var/dhcp-backup/dhcpd.static.$date
    sudo -u dhcp-updater rm /var/dhcp-backup/dhcpd.subnet.$date.bz2
    sudo -u dhcp-updater rm /var/dhcp-backup/dhcpd.static.$date.bz2

    # sync and restart secondary DHCP
    sudo -u dhcp-updater scp -P 22 -q /usr/local/etc/dhcpd/dhcpd.subnet dhcp-updater@10.0.0.2:/usr/local/etc/dhcpd/dhcpd.subnet.DONOTEDIT
    sudo -u dhcp-updater scp -P 22 -q /usr/local/etc/dhcpd/dhcpd.static dhcp-updater@10.0.0.2:/usr/local/etc/dhcpd/dhcpd.static.DONOTEDIT
    sudo -u dhcp-updater ssh -p 22 10.0.0.2 sudo /usr/local/etc/rc.d/isc-dhcpd restart

Create a dhcp-updater user with the appropriate rights on both servers, add it to the sudo settings, configure key-based ssh access from the primary to the secondary “router”, and remove the user's password. You may also need to create the /var/dhcp-backup/ directory on both machines.

Modify the /usr/local/etc/rc.d/isc-dhcpd file as follows:

Before:

    dhcpd_checkconfig ()
    {
        local rc_flags_mod setup_flags
        rc_flags_mod="$rc_flags"
        # Eliminate '-q' flag if it is present
        case "$rc_flags" in
        *-q*)
            rc_flags_mod=`echo "${rc_flags}" | sed -Ee 's/(^-q | -q | -q$)//'`
            ;;
        esac
        if ! ${command} -t -q ${rc_flags_mod}; then
            err 1 "`${command} -t ${rc_flags_mod}` Configuration file sanity check failed"
        fi
    }

After:

    dhcpd_checkconfig ()
    {
        local rc_flags_mod setup_flags
        rc_flags_mod="$rc_flags"
        # Eliminate '-q' flag if it is present
        case "$rc_flags" in
        *-q*)
            rc_flags_mod=`echo "${rc_flags}" | sed -Ee 's/(^-q | -q | -q$)//'`
            ;;
        esac
        if ! ${command} -t -q ${rc_flags_mod}; then
            err 1 "`${command} -t ${rc_flags_mod}` Configuration file sanity check failed"
        else
            sh /usr/local/bin/dhcpd-sync
        fi
    }

If everything is set up correctly, restarting the DHCP server on the main machine archives the current configuration, synchronizes it with the second server, and restarts the service on both machines.

It would be useful to add the following task to crontab:

 0 0 * * * root /usr/local/etc/rc.d/isc-dhcpd restart

This completes the DHCP failover setup.

Third, for routing tables other than table zero to be available, and for in-kernel NAT and ipfw to work, you need to rebuild the kernel with the following options (variations are of course possible; for those, again, see the links at the end):

    options IPFIREWALL
    options IPFIREWALL_VERBOSE
    options IPFIREWALL_VERBOSE_LIMIT=50
    options IPFIREWALL_NAT
    options LIBALIAS
    options DUMMYNET
    options HZ=1000
    options ROUTETABLES=2

For the second routing table (number “1”, since the first one has number “0”) to work after a reboot, create a file with the following contents in rc.d (mine is located in /usr/local/etc/rc.d/):

/usr/local/etc/rc.d/setfib1

    #!/bin/sh
    #
    # PROVIDE: SETFIB1
    # REQUIRE: NETWORKING
    # BEFORE: DAEMON
    #
    # Add the following lines to /etc/rc.conf to enable setfib -1 at startup
    # setfib1_enable (bool):        Set to "NO" by default.
    #                               Set it to "YES" to enable setfib1
    # setfib1_defaultrouter (str):  Set to "" by default
    #                               Set it to ip address of default gateway for use in fib 1

    . /etc/rc.subr

    name="setfib1"
    rcvar=`set_rcvar`

    load_rc_config $name

    [ -z "$setfib1_enable" ] && setfib1_enable="NO"
    [ -z "$setfib1_defaultrouter" ] && setfib1_defaultrouter=""

    start_cmd="${name}_start"
    stop_cmd="${name}_stop"

    setfib1_start()
    {
        # Quoted test so that an empty value is handled reliably
        if [ -n "${setfib1_defaultrouter}" ]
        then
            setfib 1 route add -net default ${setfib1_defaultrouter}
        else
            echo "Can not set default route for fib 1 - setfib1_defaultrouter is not assigned in rc.conf!"
        fi
    }

    setfib1_stop()
    {
        setfib 1 route del -net default
    }

    run_rc_command "$1"

Also add a few lines to rc.conf, for example, for the primary “router”:

    setfib1_enable="YES"
    setfib1_defaultrouter="2.2.2.1"

In fact, this boot script adds the default route to the second table. If necessary, you can run up to 65536 routing tables (in version 10 of FreeBSD) by copying the above script with minor changes and adding parameters to rc.conf. (Of course, in the kernel parameters you must first enable these 65536 tables.)

Below is my rc.conf configuration on the main “router”. But first, some comments:

eth0 is the physical interface of the main external channel.
eth1 is the physical interface of the backup external channel.
eth2 is the physical interface of the internal channel.
vlan1 is the interface of the main external channel.
vlan2 is the interface of the backup external channel.
vlan3 and vlan4 are reserved for future functionality; more on this at the end of the article.
10.0.0.1 is the address of the “router” in the internal network; the backup “router” accordingly has, for example, 10.0.0.2.
1.1.1.2 and 1.1.1.1 are the IP address and default gateway of the main external channel.
2.2.2.2 and 2.2.2.1 are the IP address and default gateway of the backup external channel.
## ATTENTION! The interface names and IP addresses are examples only; in each case you will have your own! ##

/etc/rc.conf

    hostname="SERVER1.companyname.local"
    keymap="ru.koi8-r"
    font8x8="cp866-8x8"
    font8x14="cp866-8x14"
    font8x16="cp866-8x16"
    scrnmap="koi8-r2cp866"
    cursor="destructive"

    ifconfig_eth0="up"
    vlans_eth0="vlan1 vlan3"
    create_args_vlan1="vlan 1"
    create_args_vlan3="vlan 3"
    ifconfig_eth1="up"
    vlans_eth1="vlan2 vlan4"
    create_args_vlan2="vlan 2"
    create_args_vlan4="vlan 4"
    ifconfig_eth2="inet 10.0.0.1 netmask 255.255.255.0"
    ifconfig_vlan1="inet 1.1.1.2/24"
    ifconfig_vlan3="inet 10.0.1.1/30"
    ifconfig_vlan2="inet 2.2.2.2/24"
    ifconfig_vlan4="inet 10.0.2.1/30"
    defaultrouter="1.1.1.1"
    setfib1_enable="YES"
    setfib1_defaultrouter="2.2.2.1"
    gateway_enable="YES"

    sshd_enable="YES"
    moused_enable="YES"
    ntpd_enable="YES"
    powerd_enable="YES"
    hald_enable="YES"
    dbus_enable="YES"
    dumpdev="AUTO"

    firewall_enable="YES"
    firewall_logging="YES"
    firewall_script="/etc/firewall.sh"

    named_enable="YES"
    named_program="/usr/sbin/named"
    named_flags="-u bind -c /etc/namedb/named.conf"

    dhcpd_enable="YES"
    dhcpd_conf="/usr/local/etc/dhcpd.conf"
    dhcpd_ifaces="eth2"

Below are the NAT and Firewall settings that work for me:

When working through the main external channel:

/etc/rules.firewall0

    #!/bin/sh

    # Delete all rules
    /sbin/ipfw -q -f flush
    /sbin/ipfw -q -f pipe flush
    /sbin/ipfw -q -f queue flush
    /sbin/ipfw -q -f nat 1 delete
    /sbin/ipfw -q -f table all flush

    # Parameters
    ipfw="/sbin/ipfw -q add"
    extM_if="vlan1"
    extM_ip="1.1.1.2"
    extS_if="vlan2"
    extS_ip="2.2.2.2"
    int_if="eth2"
    int_ip="10.0.0.1"
    lan_net="10.0.0.0/24"
    odmin="10.0.0.111"

    # Tables
    # Table 1 - non-routes networks
    /sbin/ipfw table 1 add 192.168.0.0/16
    /sbin/ipfw table 1 add 172.16.0.0/12
    /sbin/ipfw table 1 add 10.0.0.0/8
    /sbin/ipfw table 1 add 127.0.0.0/8
    /sbin/ipfw table 1 add 0.0.0.0/8
    /sbin/ipfw table 1 add 169.254.0.0/16
    /sbin/ipfw table 1 add 192.0.2.0/24
    /sbin/ipfw table 1 add 204.152.64.0/23
    /sbin/ipfw table 1 add 224.0.0.0/3

    # Choose route table
    $ipfw setfib 0 all from any to any via $int_if

    # Allow all traffic on loopback
    $ipfw allow all from any to any via lo0
    # Deny access to lo0 from out
    $ipfw deny log all from any to 127.0.0.0/8
    # Deny outcome packets from lo0
    $ipfw deny log all from 127.0.0.0/8 to any
    # Allow returning
    $ipfw check-state
    # Deny IPv6
    $ipfw deny log ipv6 from any to any
    # Antispoofing
    $ipfw deny log all from any to any not antispoof in
    # Block any delayed packets (fragments)
    $ipfw deny all from any to any frag

    #########################################
    # Internal interface, outcoming traffic #
    #########################################
    # Allow all traffic from gateway to lan
    $ipfw allow all from any to $lan_net out via $int_if
    # Deny and log other
    $ipfw deny log all from any to any out via $int_if

    ########################################
    # Internal interface, incoming traffic #
    ########################################
    # Deny all Netbios
    $ipfw deny tcp from any to any 81,137,138,139 in via $int_if
    # Allow traffic on internal interface
    # DHCP
    $ipfw allow udp from any to me 67,68,1515,1516 in via $int_if
    # Mail
    $ipfw allow tcp from $lan_net to any 25,110,143,465,993,995 in via $int_if
    # Time
    $ipfw allow tcp from $lan_net to any 37 in via $int_if
    $ipfw allow udp from $lan_net to any 123 in via $int_if
    # ICQ
    $ipfw allow tcp from $lan_net to any 443,5190,5222 in via $int_if
    # FTP and some other
    $ipfw allow tcp from $lan_net to any 21,22,49152-65535 in via $int_if
    # HTTP
    $ipfw allow tcp from $lan_net to any 80 in via $int_if
    # Output whois
    $ipfw allow tcp from $lan_net to any 43 in via $int_if
    # DNS
    $ipfw allow udp from $lan_net to any 53 in via $int_if
    $ipfw allow tcp from $lan_net 53 to $int_ip in via $int_if
    $ipfw allow tcp from $lan_net to $int_ip 53 in via $int_if
    # Ping
    $ipfw allow icmp from $lan_net to any icmptypes 0,3,8,11 in via $int_if
    # For admin
    $ipfw allow all from $odmin 1025-6000,11111,22222,50000-60000 to any in via $int_if
    $ipfw allow all from 10.0.0.2 22 to $int_ip in via $int_if
    $ipfw 55100 allow all from any to $int_ip 22 in via $int_if
    # Deny and log other
    $ipfw deny log all from any to any in via $int_if

    #########################################
    # External interface, outcoming traffic #
    #########################################
    # Deny all outcoming traffic to non-route networks
    $ipfw deny log all from any to 'table(1)' out via $extM_if
    $ipfw deny log all from any to 'table(1)' out via $extS_if
    # Deny broadcast ICMP on ext interface
    $ipfw deny icmp from any to 255.255.255.255 out via $extM_if
    $ipfw deny icmp from any to 255.255.255.255 out via $extS_if
    # Deny multicast on ext interface
    $ipfw deny all from 224.0.0.0/4 to any out via $extM_if
    $ipfw deny all from 224.0.0.0/4 to any out via $extS_if
    # Allow me go to internet
    $ipfw allow all from $extM_ip to any out via $extM_if setup keep-state
    $ipfw allow all from $extS_ip to any out via $extS_if setup keep-state
    # DNS BIND
    $ipfw allow udp from $extM_ip to any 53 out via $extM_if keep-state
    $ipfw allow udp from $extS_ip to any 53 out via $extS_if keep-state
    # Time
    $ipfw allow udp from $extM_ip to any 123 out via $extM_if keep-state
    $ipfw allow tcp from $extM_ip to any 37 out via $extM_if setup keep-state
    # Output whois
    $ipfw allow tcp from $extM_ip to any 43 out via $extM_if setup keep-state
    # NAT
    /sbin/ipfw -q nat 1 config log if $extM_if reset same_ports deny_in unreg_only \
        redirect_port tcp 10.0.0.111:33333 33333 \
        redirect_port udp 10.0.0.111:11111 11111 \
        redirect_port tcp 10.0.0.111:22222 22222 \
        redirect_port udp 10.0.0.111:22222 22222
    # NAT outcoming traffic
    $ipfw nat 1 ip from any to any out via $extM_if
    # Allow traffic on outcoming interface
    # Mail
    $ipfw allow tcp from any to any 25,110,143,465,993,995 out via $extM_if
    # ICQ
    $ipfw allow tcp from any to any 443,5190,5222 out via $extM_if
    # FTP and some other
    $ipfw allow tcp from any to any 21,22,49152-65535 out via $extM_if
    # HTTP
    $ipfw allow tcp from any to any 80 out via $extM_if
    # Ping
    $ipfw allow icmp from any to any icmptypes 0,3,8,11 out via $extM_if
    $ipfw allow icmp from any to any icmptypes 0,3,8,11 out via $extS_if
    # For admin
    $ipfw allow tcp from any 1025-6000 to any out via $extM_if
    $ipfw allow all from any 11111,22222,50000-60000 to any out via $extM_if
    # Deny and log other
    $ipfw deny log all from any to any out via $extM_if
    $ipfw deny log all from any to any out via $extS_if

    ########################################
    # External interface, incoming traffic #
    ########################################
    # Deny all incoming traffic from non-route networks
    $ipfw deny log all from 'table(1)' to any in via $extM_if
    $ipfw deny log all from 'table(1)' to any in via $extS_if
    # Deny ident
    $ipfw deny tcp from any to any 113 in via $extM_if
    $ipfw deny tcp from any to any 113 in via $extS_if
    # Deny all Netbios
    $ipfw deny tcp from any to any 81,137,138,139 in via $extM_if
    $ipfw deny tcp from any to any 81,137,138,139 in via $extS_if
    # SSH (also for internal network)
    $ipfw allow all from any to me 22 in via $extM_if
    $ipfw allow all from any to me 22 in via $extS_if
    # NAT incoming traffic
    $ipfw nat 1 ip from any to any in via $extM_if
    # Allow traffic on outcoming interface
    # Mail
    $ipfw allow tcp from any 25,110,143,465,993,995 to any in via $extM_if
    # ICQ
    $ipfw allow tcp from any 443,5190,5222 to any in via $extM_if
    # FTP and some other
    $ipfw allow tcp from any 21,22,49152-65535 to any in via $extM_if
    # HTTP
    $ipfw allow tcp from any 80 to any in via $extM_if
    # Ping
    $ipfw allow icmp from any to any icmptypes 0,3,8,11 in via $extM_if
    $ipfw allow icmp from any to any icmptypes 0,3,8,11 in via $extS_if
    # For admin
    $ipfw allow tcp from any to $odmin 1025-6000 in via $extM_if
    $ipfw allow all from any to $odmin 11111,22222,50000-60000 in via $extM_if
    # Deny and log other
    $ipfw deny log all from any to any in via $extM_if
    $ipfw deny log all from any to any in via $extS_if

    $ipfw deny log all from any to any

When working through the backup external channel, all the settings are the same; only the header changes:

/etc/rules.firewall1 header

# Parameters
ipfw="/sbin/ipfw -q add"
extM_if="vlan2"
extM_ip="2.2.2.2"
extS_if="vlan1"
extS_ip="1.1.1.1"
int_if="eth2"
int_ip="10.0.0.1"
lan_net="10.0.0.0/24"
odmin="10.0.0.111"
serv="10.0.0.4

sshguard is also configured on the “routers”; an experienced reader will be able to find and install this program without further instructions.

Script source

ToFoIn stands for Toggle Failover of Internet. The name is probably overly ambitious, but I could not come up with one that describes the product more precisely. Below are the scripts and related files with brief explanations.

tofoin.conf

## tofoin.conf ##
## by LordNicky v0.6 20140719 ##

## A little about the modules and the functions they perform.
## Tester - Tests the availability of the Internet on the selected channel.
## Judge - Analyzes the test results and decides whether to switch
## from one channel to another.
## Logger - Event logging.
## Watchdog - Testing and debugging of the scripts.

## Configuration.

## Number of Internet channels.
CNUMBER=2

## Main Internet channel properties.
## Interface name.
EXT_0_IF=vlan10
## Id number of the routing table.
RTABLE_0=0

## Reserve Internet channel properties.
## Interface name.
EXT_1_IF=vlan20
## Id number of the routing table.
RTABLE_1=1

## Hosts used to check the availability of the Internet channel.
## PTARGET_0 should be a domain name, and PTARGET_1 should be
## an IP address.
## Attention: the resources should be different.
PTARGET_0=ya.ru
PTARGET_1=8.8.8.8

## Number of ICMP packets used to test one resource.
PNUMBER=2

## Launch period of the Tester module (in seconds).
## Setting a value below 60 is strongly discouraged.
TESTERPERIOD=240

## Launch period of the Judge module (in seconds).
## Setting a value below TESTERPERIOD is strongly discouraged.
## TESTERPERIOD + 60 is usually enough.
JUDGEPERIOD=300

## Launch sensitivity for the Tester and Judge modules (in seconds).
## 60 is usually enough.
SENSITIVITY=60

## Maximum operating time for the Tester module.
TESTERMAXDELAY=40

## Maximum operating time for the Judge module.
JUDGEMAXDELAY=30

## Maximum operating time for the Logger module.
LOGGERMAXDELAY=20

## Number of tests that must pass successfully before returning
## to the main channel. Thus, the time between the main channel
## being restored and the switch back to it is approximately
## (WNUMBER+1)*JUDGEPERIOD seconds.
WNUMBER=3

## How often a repeated error message is written to the log file.
## The main idea is the following. The first time, the message
## is written in full. After LOGFREQ1 repetitions, the logger
## writes a single line reporting LOGFREQ1 identical messages.
## After that, for every LOGFREQ2 repetitions the logger writes
## a single line reporting LOGFREQ2 identical messages. This
## algorithm only applies when identical messages follow one another.
LOGFREQ1=5
LOGFREQ2=20

## File paths.

## Paths to the IPFW configuration scripts.
## Default file. (It is specified in rc.conf.)
FIRESETDEF=/etc/firewall.sh
## Settings for the main Internet channel.
FIRESET_0=/etc/rules.firewall0
## Settings for the reserve Internet channel.
FIRESET_1=/etc/rules.firewall1

## Paths to all ToFoIn files.
## Daemon.
DAEMON=/path/to/file/tofoin_daemon.sh
## Tester.
TESTER=/path/to/file/tofoin_tester.sh
## Judge.
JUDGE=/path/to/file/tofoin_judge.sh
## Logger.
LOGGER=/path/to/file/tofoin_logger.sh
## Watchdog.
WATCHDOG=/path/to/file/tofoin_watchdog.sh

## Log file. It is recommended to place it in /var/log.
LOGFILE=/path/to/file/tofoin.log

## Directory for test results. It is recommended
## to place it in /tmp.
TESTER_RESULT=/path/to/directory

## Auxiliary file of the Judge module. It is recommended
## to place it in /tmp.
JUDGEMETER=/path/to/file/judgemeter

## Auxiliary files of the Logger module. It is recommended
## to place them in /tmp.
LOGTMP=/path/to/file/logger.tmp
LOGMETER=/path/to/file/logmeter

## PID files for all executable modules. It is recommended
## to place them in /var/run.
DAEMON_PID=/path/to/file/tofoin_daemon.pid
TESTER_PID=/path/to/directory
JUDGE_PID=/path/to/file/tofoin_judge.pid
LOGGER_PID=/path/to/file/tofoin_logger.pid
WATCHDOG_PID=/path/to/file/tofoin_watchdog.pid
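To make the WNUMBER comment concrete, the short sketch below evaluates the (WNUMBER+1)*JUDGEPERIOD formula for the sample values from tofoin.conf. The function name return_delay is hypothetical and is not part of ToFoIn.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ToFoIn): approximate delay, in seconds,
# between the main channel recovering and the switch back to it,
# using the formula from tofoin.conf: (WNUMBER + 1) * JUDGEPERIOD.
return_delay () {
    local wnumber=$1 judgeperiod=$2
    echo $(( (wnumber + 1) * judgeperiod ))
}

# Sample values from tofoin.conf.
WNUMBER=3
JUDGEPERIOD=300
echo "Switch back after about $(return_delay "$WNUMBER" "$JUDGEPERIOD") seconds"
```

With the sample values this gives 1200 seconds, i.e. about 20 minutes of stable operation of the main channel before traffic returns to it.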

tofoin_daemon.sh

#!/usr/local/bin/bash
# by LordNicky v0.5 20140717

. /root/ToFoIn/tofoin.conf

test_time=`date +%s`;
judge_time=`date +%s`;
echo $$ > $DAEMON_PID;
$LOGGER "DAEMON: start successfully with pid $$" &
tester_0="$TESTER $RTABLE_0 10 0";
tester_1="$TESTER $RTABLE_1 10 1";
$tester_0 &
$tester_1 &
while true
do
    current_time=`date +%s`;
    if [ "`expr $current_time - $test_time`" -ge "$TESTERPERIOD" ]
    then
        $tester_0 &
        $tester_1 &
        test_time=`date +%s`;
    else
        :;
    fi
    if [ "`expr $current_time - $judge_time`" -ge "$JUDGEPERIOD" ]
    then
        $JUDGE &
        judge_time=`date +%s`;
    else
        :;
    fi
    sleep $SENSITIVITY;
done

tofoin_tester.sh

#!/usr/local/bin/bash
# by LordNicky v0.7 20140717

. /root/ToFoIn/tofoin.conf

exit_function ()
{
    rm $tester_pid;
    exit $exit_code;
}

tester_pid=$TESTER_PID/tofoin_test_$3\.pid;
if [ -e $tester_pid ]; then
    $WATCHDOG "tofoin_test" "$tester_pid" "$3" &
    exit 0;
else
    echo `date +%s` $$ > $tester_pid;
    if [ "$2" -eq 10 ]; then
        if setfib $1 ping -c $PNUMBER $PTARGET_0 > /dev/null; then
            echo `date +%s` "0 0" > $TESTER_RESULT/result_$3;
            exit_code=0; exit_function;
        else
            if setfib $1 ping -c $PNUMBER $PTARGET_1 > /dev/null; then
                echo `date +%s` "0 1" > $TESTER_RESULT/result_$3;
                exit_code=0; exit_function;
            else
                echo `date +%s` "1 1" > $TESTER_RESULT/result_$3;
                exit_code=0; exit_function;
            fi
        fi
    elif [ "$2" -eq 0 ]; then
        setfib $1 ping -c $PNUMBER $PTARGET_0;
        exit_code=0; exit_function;
    elif [ "$2" -eq 1 ]; then
        setfib $1 ping -c $PNUMBER $PTARGET_1;
        exit_code=0; exit_function;
    else
        setfib $1 ping -c $PNUMBER $2;
        exit_code=1; exit_function;
    fi
fi

As mentioned earlier, the tester module has slightly extended functionality for manual launches; the “solution” section describes how to use it. As the script shows, tester writes its results to a file only on a regular launch (when the second argument is 10).
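On a regular launch the tester writes a single line to its result file: a Unix timestamp followed by two flags. The judge and watchdog scripts read these fields by character position (cut -c 1-10 and cut -c 12). As a minimal sketch of the same idea, the helper below splits the line on whitespace with read. The name parse_result is hypothetical, and the meaning of the flags is inferred from the tester script rather than documented by the author.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split one tester result line into its fields.
# Line format written by tofoin_tester.sh: "<epoch seconds> <state> <target flag>"
#   state       0 = channel up, 1 = channel down
#   target flag 0 = PTARGET_0 answered, 1 = PTARGET_0 did not answer
parse_result () {
    local line=$1 stamp state target
    read -r stamp state target <<EOF
$line
EOF
    echo "time=$stamp state=$state target=$target"
}

# Example line: PTARGET_0 failed, PTARGET_1 answered, so the channel counts as up.
parse_result "1405600000 0 1"
```

Splitting on whitespace does not depend on the timestamp being exactly ten characters wide, which the cut-based approach assumes.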

tofoin_judge.sh

#!/usr/local/bin/bash
# by LordNicky v0.7 20140717

. /root/ToFoIn/tofoin.conf

exit_function ()
{
    rm $JUDGE_PID;
    exit $exit_code;
}

decision_function ()
{
    if [ "$actualchan" -eq "$prefchan" ]; then
        if [ "$actualchan" -eq 0 ]; then
            $LOGGER "JUDGE: No problems detected" &
            exit_code=0; exit_function;
        elif [ "$actualchan" -eq 1 ]; then
            echo -e "0" > $JUDGEMETER;
            $LOGGER "JUDGE: No problems detected at channel $actualchan" &
            exit_code=0; exit_function;
        else
            $LOGGER "JUDGE(decision): Invalid actualchan = $actualchan" &
            exit_code=1; exit_function;
        fi
    else
        if [ "$prefchan" -eq 1 ]; then
            switch_function;
            exit_code=0; exit_function;
        elif [ "$prefchan" -eq 0 ]; then
            if [ "$actualstate" -eq 0 ]
            then
                meter=`cat $JUDGEMETER`;
                if [ "$meter" -eq "$WNUMBER" ]; then
                    switch_function;
                    exit_code=0; exit_function;
                elif [ "$meter" -lt "$WNUMBER" ]; then
                    expr $meter + 1 > $JUDGEMETER;
                    exit_code=0; exit_function;
                else
                    echo -e "0" > $JUDGEMETER;
                    exit_code=0; exit_function;
                fi
            elif [ "$actualstate" -eq 1 ]
            then
                $LOGGER "JUDGE: Emergency switch to $prefchan";
                switch_function;
                exit_code=0; exit_function;
            else
                $LOGGER "JUDGE(decision): Invalid actualstate = $actualstate" &
                exit_code=1; exit_function;
            fi
        else
            $LOGGER "JUDGE(decision): Invalid prefchan = $prefchan" &
            exit_code=1; exit_function;
        fi
    fi
}

switch_function ()
{
    echo -e "0" > $JUDGEMETER;
    if [ "$prefchan" -eq 0 ]; then
        /etc/rc.d/named stop;
        cp $FIRESET_0 $FIRESETDEF;
        /etc/rc.d/ipfw restart;
        setfib $RTABLE_0 /etc/rc.d/named start;
        $LOGGER "JUDGE: Now switching on channel $RTABLE_0" &
        exit_code=0; exit_function;
    elif [ "$prefchan" -eq 1 ]
    then
        /etc/rc.d/named stop;
        cp $FIRESET_1 $FIRESETDEF;
        /etc/rc.d/ipfw restart;
        setfib $RTABLE_1 /etc/rc.d/named start;
        $LOGGER "JUDGE: Now switching on channel $RTABLE_1" &
        exit_code=0; exit_function;
    else
        $LOGGER "JUDGE(switch): Invalid prefchan = $prefchan" &
        exit_code=1; exit_function;
    fi
}

createarea_function ()
{
    for ((a=0; a < CNUMBER ; a++))
    do
        current_time=`date +%s`
        timearea[$a]=`cut -c 1-10 $TESTER_RESULT/result_$a`;
        if [ "`expr $current_time - ${timearea[$a]}`" -ge 0 ]; then
            if [ "`expr $current_time - ${timearea[$a]}`" -lt "`expr $TESTERPERIOD + 120`" ]; then
                :;
            else
                $LOGGER "JUDGE: MAX period" &
                $WATCHDOG &
                exit_code=1; exit_function;
            fi
        else
            $LOGGER "JUDGE: testmodule $a in future" &
            $WATCHDOG &
            exit_code=1; exit_function;
        fi
        statearea[$a]=`cut -c 12 $TESTER_RESULT/result_$a`;
        if [ "$actualchan" -eq "$a" ]
        then
            actualstate=${statearea[$a]};
        else
            :;
        fi
    done
}

findarea_function ()
{
    for ((a=0; a < CNUMBER ; a++))
    do
        if [ "${statearea[$a]}" -eq 0 ]
        then
            prefchan=$a;
            decision_function;
            exit_code=0; exit_function;
        else
            if [ "${statearea[$a]}" -eq 1 ]
            then
                continue
            else
                $LOGGER "JUDGE: Invalid channel state" &
                exit_code=1; exit_function;
            fi
        fi
    done
}

if [ -e $JUDGE_PID ]
then
    $WATCHDOG "tofoin_judge" "$JUDGE_PID" &
    exit 0;
else
    echo `date +%s` $$ > $JUDGE_PID;
    if ipfw list | grep nat | egrep -q $EXT_0_IF; then
        actualchan=0;
    elif ipfw list | grep nat | egrep -q $EXT_1_IF; then
        actualchan=1;
    else
        $LOGGER "JUDGE: NAT error" &
        prefchan=0;
        switch_function;
        exit_code=1; exit_function;
    fi
    createarea_function;
    findarea_function;
    $LOGGER "JUDGE: All channels down" &
    exit_code=1; exit_function;
fi

The judge module leaves room for further improvement, but overall there is nothing fancy in it.

tofoin_logger.sh

#!/usr/local/bin/bash
# by LordNicky v0.5 20140713

. /root/ToFoIn/tofoin.conf

exit_function ()
{
    rm $LOGGER_PID;
    exit $exit_code;
}

main_function ()
{
    if [[ `tail -n 1 $LOGFILE | grep -o "$1" | grep -o "JUDGE: No problems detected"` = "JUDGE: No problems detected" ]]; then
        exit_code=0; exit_function;
    else
        if [[ `cat $LOGTMP` = $1 ]]; then
            meter=`cat $LOGMETER`;
            if [ "$meter" -ge "$LOGFREQ2" ]; then
                echo -e "0" > $LOGMETER;
                echo -e "`date -j +%Y%m%d%H%M` last message repeat $LOGFREQ2 times" >> $LOGFILE;
                exit_code=0; exit_function;
            elif [ "$meter" -ge "$LOGFREQ1" ]; then
                if [[ `tail -n 1 $LOGFILE | grep -o "last message repeat $LOGFREQ1 times"` = "last message repeat $LOGFREQ1 times" ]]; then
                    expr $meter + 1 > $LOGMETER;
                    exit_code=0; exit_function;
                elif [[ `tail -n 1 $LOGFILE | grep -o "last message repeat $LOGFREQ2 times"` = "last message repeat $LOGFREQ2 times" ]]; then
                    expr $meter + 1 > $LOGMETER;
                    exit_code=0; exit_function;
                else
                    echo -e "`date -j +%Y%m%d%H%M` last message repeat $LOGFREQ1 times" >> $LOGFILE;
                    exit_code=0; exit_function;
                fi
            elif [ "$meter" -ge 0 ]; then
                expr $meter + 1 > $LOGMETER;
                exit_code=0; exit_function;
            else
                echo -e "0" > $LOGMETER;
                echo -e "`date -j +%Y%m%d%H%M` LOGGER: logmeter index error, write 0" >> $LOGFILE;
                exit_code=1; exit_function;
            fi
        else
            if [ `cat $LOGMETER` -eq 0 ]; then
                echo -e "$1" > $LOGTMP;
                echo -e "`date -j +%Y%m%d%H%M` $1" >> $LOGFILE;
                exit_code=0; exit_function;
            else
                echo -e "0" > $LOGMETER;
                echo -e "$1" > $LOGTMP;
                echo -e "`date -j +%Y%m%d%H%M` $1 ; LOGMETER now zero" >> $LOGFILE;
                exit_code=0; exit_function;
            fi
        fi
    fi
}

if [ -e $LOGGER_PID ]; then
    sleep $((RANDOM%5+1));
    if [ -e $LOGGER_PID ]; then
        $WATCHDOG "tofoin_logger" "$LOGGER_PID" &
        exit 0;
    else
        echo `date +%s` $$ > $LOGGER_PID;
        main_function "$1";
    fi
else
    echo `date +%s` $$ > $LOGGER_PID;
    main_function "$1";
fi

In my opinion, the logger is the hardest module to read. Unfortunately, I could not write it any more simply. Most of the script is devoted to suppressing duplicate messages, hence the apparent complexity.
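The core idea of duplicate suppression can be shown in a much smaller form. The sketch below is a simplified illustration, not the author's algorithm: it uses a single threshold instead of LOGFREQ1 and LOGFREQ2, and it keeps its state in shell variables, whereas the real logger keeps it in the LOGTMP and LOGMETER files because every call is a separate process. The names log_once, LOG_OUT, LOGFREQ, last_msg and repeats are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified sketch of repeat suppression; not the author's exact algorithm.
# All names below are hypothetical.
LOG_OUT=$(mktemp)   # stand-in for the real log file
LOGFREQ=5           # write one summary line per LOGFREQ repeats
last_msg=""
repeats=0

log_once () {
    local msg=$1
    if [ "$msg" = "$last_msg" ]; then
        # Same message as last time: count it instead of writing it.
        repeats=$((repeats + 1))
        if [ "$repeats" -ge "$LOGFREQ" ]; then
            echo "$(date +%Y%m%d%H%M) last message repeated $LOGFREQ times" >> "$LOG_OUT"
            repeats=0
        fi
    else
        # New message: write it in full and restart the counter.
        last_msg=$msg
        repeats=0
        echo "$(date +%Y%m%d%H%M) $msg" >> "$LOG_OUT"
    fi
}

# Six identical messages followed by a different one.
for i in 1 2 3 4 5 6; do
    log_once "JUDGE: All channels down"
done
log_once "JUDGE: Now switching on channel 1"
cat "$LOG_OUT"
```

In this run the first message is written once, the five repeats collapse into one summary line, and the new message is written in full.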

tofoin_watchdog.sh

#!/usr/local/bin/bash
# by LordNicky v0.5 20140713

. /root/ToFoIn/tofoin.conf

exit_function ()
{
    rm $WATCHDOG_PID;
    exit $exit_code;
}

kill_function ()
{
    if [[ "`ps -o command -p $proc_pid | grep -o "$proc_name"`" = "$proc_name" ]]; then
        $LOGGER "WATCHDOG: Other $proc_s_name working during $diff, kill him" &
        kill $proc_pid;
    else
        $LOGGER "WATCHDOG: None or other process on $proc_s_name pid, cleaning pid file" &
    fi
    if [[ "$proc_name" = "tofoin_watchdog" ]]; then
        main_function;
    else
        rm $proc_pid_file;
    fi
}

main_function ()
{
    echo `date +%s` $$ > $WATCHDOG_PID;
    proc_name=${one:-all};
    return_wait=10
    # The counters are initialised before the test so that the condition
    # of the if statement is the [[ ]] check itself.
    b=0; c=0
    if [[ "$proc_name" = "all" ]]
    then
        for ((a=0; a < CNUMBER ; a++))
        do
            current_time=`date +%s`;
            tester_result=$TESTER_RESULT/result_$a;
            tester_time=`cut -c 1-10 $tester_result`;
            diff=`expr $current_time - $tester_time`;
            if [ "$diff" -ge 0 ]
            then
                if [ "$diff" -lt "`expr $TESTERPERIOD + 120`" ]; then
                    :;
                else
                    proc_name=tofoin_daemon;
                    proc_pid=`cat $DAEMON_PID`;
                    if [[ "`ps -o command -p $proc_pid | grep -o "$proc_name"`" = "$proc_name" ]]; then
                        $LOGGER "WATCHDOG: Restart daemon" &
                        kill $proc_pid;
                        $DAEMON &
                    else
                        $LOGGER "WATCHDOG: None daemon process, start" &
                        $DAEMON &
                    fi
                    exit_code=0; exit_function;
                fi
            else
                $LOGGER "WATCHDOG: Check date" &
            fi
        done
    elif [[ "$proc_name" = "tofoin_test" ]]; then
        proc_pid_file=$two;
        cnumber=$three;
        test_function;
        return_val=$?;
        if [[ "$return_val" = "$return_wait" ]]; then
            sleep $TESTERMAXDELAY;
            test_function "nowait";
        else
            :;
        fi
    elif [[ "$proc_name" = "tofoin_judge" ]]; then
        proc_pid_file=$JUDGE_PID;
        judge_function;
        return_val=$?;
        if [[ "$return_val" = "$return_wait" ]]; then
            sleep $JUDGEMAXDELAY;
            judge_function "nowait";
        else
            :;
        fi
    elif [[ "$proc_name" = "tofoin_logger" ]]; then
        proc_pid_file=$LOGGER_PID;
        logger_function;
        return_val=$?;
        if [[ "$return_val" = "$return_wait" ]]; then
            sleep $LOGGERMAXDELAY;
            logger_function "nowait";
        else
            :;
        fi
    else
        $LOGGER "WATCHDOG: Incorrect process name";
    fi
    exit_code=0; exit_function;
}

test_function ()
{
    if [ -e $proc_pid_file ]; then
        proc_pid=`cut -c 12-18 $proc_pid_file`;
        proc_s_name="tester $cnumber";
        start_time=`cut -c 1-10 $proc_pid_file`;
        current_time=`date +%s`;
        diff=`expr $current_time - $start_time`;
        if [ "$diff" -ge 0 ]; then
            if [ "$diff" -lt "$TESTERMAXDELAY" ]; then
                if [[ "$1" = "nowait" ]]; then
                    if [ "$proc_pid" = "$proc_temp_pid" ]; then
                        kill_function;
                        return 0;
                    else
                        $LOGGER "WATCHDOG: Pid of $proc_s_name was changed, exit" &
                    fi
                else
                    $LOGGER "WATCHDOG: $proc_s_name now working, try wait" &
                    proc_temp_pid=$proc_pid;
                    return $return_wait;
                fi
            else
                kill_function;
                return 0;
            fi
        else
            $LOGGER "WATCHDOG: Time error in $proc_s_name = $diff" &
            kill_function;
            return 0;
        fi
    else
        return 0;
    fi
}

judge_function ()
{
    if [ -e $proc_pid_file ]; then
        proc_pid=`cut -c 12-18 $proc_pid_file`;
        proc_s_name="judge";
        start_time=`cut -c 1-10 $proc_pid_file`;
        current_time=`date +%s`;
        diff=`expr $current_time - $start_time`;
        if [ "$diff" -ge 0 ]; then
            if [ "$diff" -lt "$JUDGEMAXDELAY" ]; then
                if [[ "$1" = "nowait" ]]; then
                    if [ "$proc_pid" = "$proc_temp_pid" ]; then
                        kill_function;
                        return 0;
                    else
                        $LOGGER "WATCHDOG: Pid of $proc_s_name was changed, exit" &
                    fi
                else
                    $LOGGER "WATCHDOG: $proc_s_name now working, try wait" &
                    proc_temp_pid=$proc_pid;
                    return $return_wait;
                fi
            else
                kill_function;
                return 0;
            fi
        else
            $LOGGER "WATCHDOG: Time error in $proc_s_name = $diff" &
            kill_function;
            return 0;
        fi
    else
        return 0;
    fi
}

logger_function ()
{
    if [ -e $proc_pid_file ]; then
        proc_pid=`cut -c 12-18 $proc_pid_file`;
        proc_s_name="logger";
        start_time=`cut -c 1-10 $proc_pid_file`;
        current_time=`date +%s`;
        diff=`expr $current_time - $start_time`;
        if [ "$diff" -ge 0 ]; then
            if [ "$diff" -lt "$LOGGERMAXDELAY" ]; then
                if [[ "$1" = "nowait" ]]; then
                    if [ "$proc_pid" = "$proc_temp_pid" ]; then
                        kill_function;
                        return 0;
                    else
                        echo -e "`date -j +%Y%m%d%H%M` WATCHDOG: Pid of $proc_s_name was changed, exit" >> $LOGFILE;
                    fi
                else
                    echo -e "`date -j +%Y%m%d%H%M` WATCHDOG: $proc_s_name now working, try wait" >> $LOGFILE;
                    proc_temp_pid=$proc_pid;
                    return $return_wait;
                fi
            else
                kill_function;
                return 0;
            fi
        else
            echo -e "`date -j +%Y%m%d%H%M` WATCHDOG: Time error in $proc_s_name = $diff" >> $LOGFILE;
            kill_function;
            return 0;
        fi
    else
        return 0;
    fi
}

one=$1;
two=$2;
three=$3;
if [ -e $WATCHDOG_PID ]; then
    proc_pid=`cut -c 12-18 $WATCHDOG_PID`;
    proc_name="tofoin_watchdog";
    proc_s_name="watchdog";
    start_time=`cut -c 1-10 $WATCHDOG_PID`;
    current_time=`date +%s`;
    diff=`expr $current_time - $start_time`;
    if [ "$diff" -ge 0 ]; then
        if [ "$diff" -lt "`expr $TESTERMAXDELAY + $JUDGEMAXDELAY + $LOGGERMAXDELAY + 30`" ]; then
            $LOGGER "WATCHDOG: Other $proc_s_name already working, exit" &
            exit 0;
        else
            kill_function;
        fi
    else
        $LOGGER "WATCHDOG: Time error in $proc_s_name = $diff" &
        kill_function;
    fi
else
    main_function;
fi

Watchdog is the largest and perhaps the most controversial script of all. It turned out this way because I tried to cover every possible failure scenario, and for now it stays as it is. Since this module is meant to be launched by cron, something like the following should be added to /etc/crontab:

 0 * * * * root /path/to/file/tofoin_watchdog.sh
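Most of the watchdog revolves around one question: is the PID file left by another module stale? The sketch below shows that check in isolation, under stated assumptions. It reads the same "<epoch> <pid>" format the modules write, but it tests for the process with kill -0 rather than the ps and grep combination used in the script, so it does not verify the command name and assumes it runs as root or as the owner of the process. The name pidfile_is_stale is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether a "<epoch> <pid>" PID file is stale.
# Returns 0 (stale) when the file is unreadable, the recorded process is
# gone, or it has been running for max_age seconds or longer; 1 otherwise.
pidfile_is_stale () {
    local file=$1 max_age=$2 started pid now
    [ -r "$file" ] || return 0                 # unreadable: treat as stale
    read -r started pid < "$file" || return 0  # empty file: treat as stale
    now=$(date +%s)
    if ! kill -0 "$pid" 2>/dev/null; then
        return 0                               # no such process
    fi
    [ $((now - started)) -ge "$max_age" ]
}

# Usage example: a PID file for the current shell, written just now.
pidfile=$(mktemp)
echo "$(date +%s) $$" > "$pidfile"
if pidfile_is_stale "$pidfile" 40; then
    echo "stale, safe to clean up"
else
    echo "still running"
fi
rm -f "$pidfile"
```

The 40-second limit in the usage example mirrors TESTERMAXDELAY from the sample configuration.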

Total

The script has been tested for six months. No critical errors were found, and the minor ones were fixed. All modules work according to the given algorithm, without deviations or unpredictable actions. The event log is quite informative and makes it possible to judge which problems occurred, when they occurred and when they were resolved. It can therefore be concluded that the initial goal has been achieved; plans for further development are set out below.

Plans

Plans for the further development of the script:

Place files in the appropriate system directories;
Consider whether a dedicated user, running certain tasks via sudo, is needed. If so, adapt the script accordingly;
Add a communication module with zabbix;
Build a client-server system. vlan3 and vlan4 were configured for this purpose: if the “routers” lose contact over the internal channel, they are supposed to try communicating over the VLANs configured on the external interfaces;
Possibly, in the distant bright future, rewrite the entire script in a more capable language. For now, I want to squeeze everything possible out of bash.

Questions

Of course, many questions arose while writing the script, and especially afterwards. The most important one is this. Suppose there are the following variables:

 a=<number>
 HI_1="123"
 HI_2="321"

The variables HI_1 and HI_2 need to be accessed by changing only a, i.e. the call would look something like this:

 ${HI_$a}

If a = 1 were set in advance, this expression would yield 123, and if a = 2, then 321. I studied the bash literature that, in my opinion, should answer this question but, unfortunately, did not find how to do it. Such a feature would greatly simplify the script and make it easy to extend.
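For what it is worth, bash's indirect expansion appears to cover this case: the variable name is built in a helper variable and then expanded with ${!name}. The sketch below uses the HI_1 and HI_2 variables from the question; the function hi_value and the array HI are illustrative names only.

```bash
#!/usr/bin/env bash
HI_1="123"
HI_2="321"

# Indirect expansion: ${!name} expands to the value of the variable
# whose name is stored in "name".
hi_value () {
    local name="HI_$1"
    echo "${!name}"
}

a=1
hi_value "$a"

a=2
hi_value "$a"

# An indexed array avoids building variable names altogether.
HI=([1]="123" [2]="321")
echo "${HI[$a]}"
```

The judge script already uses indexed arrays (timearea and statearea), so the same approach could probably be applied to per-channel settings such as RTABLE_0 and RTABLE_1.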

Beyond that, there are of course questions of a general nature: how relevant is this solution? What mistakes were made in the script? What is the best way to resolve the issues identified in the plans and in the text of the article? Your comments are welcome.

If you would like to help improve the script, send me a private message and we can discuss possible cooperation.

While setting up the system and writing the script, I used many materials from opennet.ru, lissyara.su, habrahabr.ru and other sites. Unfortunately, many of the links have been lost over time, so if you recognize fragments from somewhere else, I will be happy to add links to them. Special thanks to Alexei Eresko and Valery Drub for advice and assistance in solving the difficulties in preparing and writing the script, and to Oleg Matusevich for help in preparing the article.

P.S. When using the materials of this article, be sure to credit the source and the author with a link.

Source: https://habr.com/ru/post/241654/
