
Large traffic flows and Linux: interrupts, router and NAT server

Written as a follow-up to the publication of “Large traffic flows and interrupt management in Linux”

Our city network has more than 30 thousand subscribers, and the total capacity of its external uplinks exceeds 3 gigabits. The advice given in that article is something we went through a few years ago, so I want to broaden the topic and share my own best practices on the subject with readers.

This note describes the nuances of configuring and tuning a Linux router and NAT server, along with some clarifications about interrupt distribution.


Interrupts



Spreading network card interrupts across different cores is the very first thing a sysadmin runs into as the load on a Linux router grows. In the article mentioned above the topic is covered in sufficient detail, so we will not dwell on it for long.
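Purely as an illustration (this example is mine, not from the original article): an interrupt can be pinned to a specific core by writing a CPU mask into its smp_affinity file, assuming you have already found the IRQ number in /proc/interrupts and that irqbalance is not running and overriding the setting:

# find the IRQ numbers used by the NIC queues (eth0, eth0-TxRx-0, etc.)
grep eth0 /proc/interrupts

# pin IRQ 45 (a hypothetical number) to core 1: mask 2 = binary 10
echo 2 > /proc/irq/45/smp_affinity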

I just want to note:



Router



The original article contains the phrase “if the server works only as a router, then tuning the TCP stack does not really matter”. This statement is fundamentally wrong. Of course, tuning does not play a big role with small flows. However, if you have a large network and a correspondingly high load, you will have to deal with tuning the network stack.

First of all, if gigabits of traffic are flowing through your network, it makes sense to pay attention to the MTU on your servers and switches. In a nutshell, the MTU is the maximum size of a packet that can be transmitted over the network without fragmentation, i.e. how much data your router can hand to the next one in a single packet. When the amount of data transmitted over the network grows significantly, it is much more efficient to send larger packets less frequently than to send small packets very often.

Increasing the MTU on Linux


/sbin/ifconfig eth0 mtu 9000
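On modern systems the same thing can be done with the ip utility from iproute2 (an equivalent command, not an additional step; the NIC and its driver must, of course, actually support jumbo frames):

ip link set dev eth0 mtu 9000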

Increasing the MTU on switches


On switching equipment this is usually called jumbo frames. For example, on a Cisco Catalyst 3750:

3750(config)# system mtu jumbo 9000
3750(config)# exit
3750# reload


Note: the switch must then be reloaded. By the way, the jumbo MTU setting applies only to gigabit links; the command does not affect 100-Mbit ports.

Increasing the transmit queue on Linux


/sbin/ifconfig eth0 txqueuelen 10000

The default value is 1000. For gigabit links it is recommended to set it to 10000. In a nutshell, this is the length of the interface transmit queue, i.e. how many outgoing packets can be held waiting to be put onto the wire.
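For those who prefer iproute2, the same setting can be applied like this (again an equivalent, not an extra step):

ip link set dev eth0 txqueuelen 10000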

Keep in mind that if you change the MTU on one device's interface, you must do the same on the interface of its “neighbor”. That is, if you raised the MTU to 9000 on the Linux router's interface, you must enable jumbo frames on the switch port the router is plugged into. Otherwise the network will still work, but very badly: packets will get through only intermittently.
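A quick way to check that jumbo frames really pass end-to-end is to send a large packet with the “don't fragment” flag set (the address below is just a placeholder; 8972 = 9000 minus 20 bytes of IP header and 8 bytes of ICMP header):

ping -M do -s 8972 -c 3 192.0.2.1

If the neighbor's port is not configured for jumbo frames, such a ping will fail while ordinary small pings keep working.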

Results


As a result of all these changes, “pings” across the network will go up slightly, but the total throughput will increase noticeably and the load on the active equipment will decrease.

NAT server



NAT (Network Address Translation) is one of the most expensive (that is, resource-intensive) operations. Therefore, if you have a large network, you cannot get by without tuning the NAT server.

Increasing the number of monitored connections


To do its job, the NAT server needs to “remember” every connection that passes through it. Whether it is a “ping” or someone's “ICQ”, the NAT server “remembers” all these sessions and keeps track of them in a special table in memory. When a session is closed, its entry is removed from the table. The size of this table is fixed. That is why, if a lot of traffic passes through the server and the table is too small, the NAT server starts dropping packets and tearing down sessions, the Internet starts working with terrible interruptions, and the NAT server itself may even become unreachable over SSH.

To avoid such horrors, you need to increase the size of the table adequately, in accordance with the traffic passing through the NAT:

/sbin/sysctl -w net.netfilter.nf_conntrack_max=524288

It is strongly NOT recommended to set such a large value if you have less than 1 gigabyte of RAM on your NAT server.
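As a rough back-of-the-envelope check (assuming on the order of 300 bytes per conntrack entry; the exact figure depends on kernel version and architecture), a full table of half a million entries costs roughly:

echo $(( 524288 * 300 / 1024 / 1024 )) MB    # ~150 MB for a completely full table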

You can see the current value like this:

/sbin/sysctl net.netfilter.nf_conntrack_max

You can see how full the connection tracking table currently is like this:

/sbin/sysctl net.netfilter.nf_conntrack_count
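To keep an eye on how full the table is relative to the limit, a simple one-liner like this can be used (just a convenience sketch):

awk -v cur="$(/sbin/sysctl -n net.netfilter.nf_conntrack_count)" \
    -v max="$(/sbin/sysctl -n net.netfilter.nf_conntrack_max)" \
    'BEGIN { printf "conntrack: %d / %d (%.1f%%)\n", cur, max, cur * 100 / max }'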

Increasing the size of the hash-table


The hash table in which the lists of conntrack entries are stored should be increased proportionally as well.

echo 65536 > /sys/module/nf_conntrack/parameters/hashsize

The rule is simple: hashsize = nf_conntrack_max / 8
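Note that hashsize is a module parameter, so the value written to /sys is lost whenever the nf_conntrack module is reloaded. One way to make it persistent (assuming your distribution reads /etc/modprobe.d/) is:

# /etc/modprobe.d/nf_conntrack.conf
options nf_conntrack hashsize=65536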

Decreasing timeout values


As you remember, the NAT server tracks only live sessions that pass through it. When a session is closed, its entry is deleted so that the table does not overflow. Session entries are also removed on timeout: if there is no traffic within a connection for a long time, it is considered closed and its entry is likewise deleted from the NAT server's memory.

However, the default timeout values are quite large. Therefore, with large traffic flows, even if you stretch nf_conntrack_max to the limit, you still risk quickly running into table overflow and broken connections.

To prevent this from happening, you need to correctly set connection timeouts on the NAT server.

Current values can be viewed, for example, like this:

sysctl -a | grep conntrack | grep timeout

As a result, you will see something like this:

net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15


These are timeout values in seconds. As you can see, net.netfilter.nf_conntrack_generic_timeout is 600 (10 minutes). That means the NAT server will keep an entry about a session in memory as long as at least one packet passes through it every 10 minutes.

At first glance that does not seem terrible, but in fact it is far too much.

If you look at net.netfilter.nf_conntrack_tcp_timeout_established, you will see the value 432000. In other words, your NAT server will keep tracking an ordinary established TCP session as long as a packet runs through it at least once every 5 days (!).

To put it even more simply, such a NAT server becomes trivially easy to DDoS: its NAT table (limited by the nf_conntrack_max parameter) overflows under a simple flood, so it starts breaking connections and, in the worst case, quickly turns into a black hole.

It is recommended to set the timeout values within 30 to 120 seconds. That is quite enough for subscribers to work normally, and quite enough for timely cleanup of the NAT table, preventing its overflow.
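As an example, a set of sysctl calls in the spirit of this recommendation might look like the following; the exact numbers are a matter of taste and of watching your own traffic:

/sbin/sysctl -w net.netfilter.nf_conntrack_generic_timeout=120
/sbin/sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=120
/sbin/sysctl -w net.netfilter.nf_conntrack_tcp_timeout_fin_wait=30
/sbin/sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
/sbin/sysctl -w net.netfilter.nf_conntrack_udp_timeout=30
/sbin/sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=60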

And do not forget to add the corresponding settings to /etc/rc.local and /etc/sysctl.conf.
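For instance, a fragment of /etc/sysctl.conf persisting the settings above might look like this (the hashsize value cannot be set through sysctl, so it goes into /etc/rc.local or a modprobe option as shown earlier; also keep in mind that the nf_conntrack module must already be loaded when sysctl.conf is applied, otherwise these keys will not exist yet):

# /etc/sysctl.conf (fragment)
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_generic_timeout = 120
net.netfilter.nf_conntrack_tcp_timeout_established = 120

# /etc/rc.local
echo 65536 > /sys/module/nf_conntrack/parameters/hashsize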

Results


After this tuning you will get a quite viable and productive NAT server. Of course, this is only “basic” tuning; we did not touch, for example, kernel tuning and similar things. However, in most cases even such simple actions are enough for a fairly large network to work normally. As I have already said, there are more than 30 thousand subscribers in our network, whose traffic is handled by 4 NAT servers.

In upcoming posts:

Source: https://habr.com/ru/post/108763/

