Stumbled upon a nasty pitfall. We have a system with several uplinks and policy routing that balances connections across them using:
ip route replace default scope global \
    nexthop via 11.22.33.1 dev eth0 weight 1 \
    nexthop via 55.66.77.1 dev eth1 weight 1
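Once applied, you can verify that the multipath default route took hold (exact output format varies between iproute2 versions):

ip route show | grep -A2 default   # both nexthop lines should appear under the default route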
The problem is this: connections drop periodically, with no discernible pattern. A session may stay up for several hours or fall apart within 5-10 minutes. HTTP and torrents are not bothered by this: in the first case sessions are usually quite short, and in the second the reconnect passes unnoticed and without consequences. But what if we are working over ssh?
An explanation for those who do not know how this routing scheme works.
For each connection going through the default gateway, one of the available uplinks is selected, with a probability proportional to its weight parameter.
The chosen route for this connection is then recorded in the route cache, and all further packets between these IP addresses follow it. If no packets pass along this route for some time, the entry is evicted from the cache; by default this takes about five minutes. Here lies the first problem: if your connection transmits no data for long enough, its route-cache entry is wiped out, even though the connection itself may not yet have been dropped from the connection tracking table. By default, the nf_conntrack module keeps established TCP connections for a very long time. What happens next? When the next packet passes over this connection, which is still considered alive, a new route is selected as if the connection had just been established. If you are lucky, it is the same one as before, and everything keeps working. If it is the other one, nothing gets anywhere: the packets now leave through the wrong uplink with a source address the remote side does not expect.
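You can observe both timescales on a live system; note that the conntrack sysctl path differs between kernel versions (net.ipv4.netfilter.* on older ones):

ip route show cache                                        # per-destination entries in the route cache
sysctl net.ipv4.route.gc_timeout                           # idle cache entries expire after ~300 seconds
sysctl net.netfilter.nf_conntrack_tcp_timeout_established  # established TCP is tracked for 432000 s (5 days) by default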
In practice, though, this is a minor problem: there are not many situations in which a connection sits idle for that long, and even if there is one, say an ssh session, you can enable keep-alive packets for it, as in most other practical cases. In theory the situation is possible even with ftp, but I have not used that in a long time and do not advise you to either; most ftp clients also know how to keep-alive anyway.
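For ssh specifically, two client-side options do the job; put them in ~/.ssh/config (60 seconds here is just a reasonable interval, not a magic number):

Host *
    ServerAliveInterval 60   # probe the server after 60 s of silence, keeping the route-cache entry fresh
    ServerAliveCountMax 3    # drop the session after 3 unanswered probes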
What is worse, in this scheme even sessions with a continuous stream of data were dropping. And that looked inexplicable.
A simple workaround, made in a hurry, was to prescribe static routes for the most needed remote hosts, so that the route to them ran strictly through one interface. Ugly, though: not universal, and it breaks the whole idea of connection failover.
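A pin of that sort looks like this; 203.0.113.10 stands in for whatever remote host matters to you:

ip route replace 203.0.113.10 via 11.22.33.1 dev eth0   # force this host through the first uplink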
The fact that it helped suggested the problem lay somewhere in the routing. Three hours of digging turned up the following:
In kernels up to 2.6.35 (and there are plenty of systems still running them), the routing settings include the net.ipv4.route.secret_interval parameter, measured in seconds, 600 by default. It forces a periodic flush of the route cache to keep it from overflowing. It was later decided to drop the parameter entirely:
https://github.com/torvalds/linux/commit/3ee943728fff536edaf8f59faa58aaa1aa7366e3

So once every 10 minutes your route cache is flushed, and routes are selected anew. And not always the way we would like.
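You can check whether your system is affected; on kernels where the parameter was removed, the sysctl simply does not exist:

uname -r                                # kernel version
sysctl net.ipv4.route.secret_interval   # prints 600 on an affected default install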
Therefore, for stable policy routing on systems with multiple uplinks, I recommend setting this parameter to 0:
sysctl -w net.ipv4.route.secret_interval=0
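To keep the setting across reboots, persist it in the usual way (assuming a standard sysctl setup):

echo "net.ipv4.route.secret_interval = 0" >> /etc/sysctl.conf
sysctl -p   # reload so the setting takes effect immediately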
You can, of course, apply a kernel patch that eliminates this behavior entirely, but that is not a solution for everyone.
It is quite safe to disable this reset: even in later kernels no additional mechanism was introduced to protect the cache from overflow; the existing ones were considered reliable enough.
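Those existing mechanisms are the ordinary route-cache garbage-collector knobs, which live in the same sysctl tree; names quoted from memory of pre-3.6 kernels, so check them on yours:

sysctl net.ipv4.route.gc_thresh    # cache size at which garbage collection becomes aggressive
sysctl net.ipv4.route.gc_timeout   # how long an idle cache entry may live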