Habr has given me a lot of useful material, so I think it is time to "repay the debt."
I want to describe an algorithm that has been balancing channels on my gateway for more than a year (Gbit traffic, 8k clients, 2 providers, an AS with 1k addresses, most clients behind NAT). Perhaps it will come in handy for someone. In any case, I have not seen anything like it, and when I searched for it specifically, I found nothing. So it is entirely my own brainchild.
Everything I came across on the Internet only allowed one of the channels to be kept as a reserve. There are also many descriptions of how to regulate outgoing traffic. But I found nothing on regulating incoming traffic (i.e., ensuring a uniform load across several channels).
Of course, this algorithm is not universal; it is suitable only under the right conditions.
So, the initial conditions:
- The gateway runs Linux (Debian 6) with the quagga package (formerly zebra).
- Two providers (call them TTK and RTK). Each provides a channel of a fixed width and cuts off anything "extra".
- An AS with 1k addresses (say, 1.1.144.0/22), AS0000.
- Most clients have gray addresses (say, 192.168.0.0/16); the "client" networks 192.168.1-99.0/24 are NATed on the gateway.
- A small share of clients have white addresses from my AS's address space.
Task:
Load the TTK and RTK channels evenly with incoming traffic so that neither channel is overloaded.
Simplifications
I will not discuss the shaper settings here. We assume it is already configured and working, and that it shapes the common channel without distinguishing between TTK and RTK.
We will not balance outgoing traffic. In most cases this is not an issue (outgoing traffic is much smaller), and it is solved quite simply.
Theory
1. BGP lets you control preference, i.e., specify the preferred inbound route for a particular network. This is done by artificially "lengthening" the route: when announcing your route to a neighbor, you can repeat your AS number several times (AS-path prepending). The route can be "lengthened" differently for each neighbor. A shorter route is preferred.
2. BGP lets you announce individual parts of the AS. I.e., although our AS is registered in RIPE as 1.1.144.0/22, nothing prevents us from announcing 1.1.144.0/24, 1.1.145.0/24, 1.1.146.0/24 and 1.1.147.0/24 in BGP.
Advice: do not remove the announcement of the whole AS (1.1.144.0/22). Some gateways do not accept routes with a /24 mask. It is better to lengthen the general route slightly instead.
3. Let me recall, in simplified form, how a router chooses among several available routes in BGP routing.
- The route with the longer mask (the more specific prefix) is selected. If that does not decide it, go on.
- The shorter route (fewer intermediate ASes) is selected. If that does not decide it, go on.
- The route announced earlier is selected (it is considered more reliable). If that does not decide it, go on.
- A deterministic pseudo-random choice is made.
A fly in the ointment
Unfortunately, "preference management" does not mean "probability management" at all. In practice, almost all traffic starts to follow the preferred route. I.e., you cannot adjust the flow smoothly by lengthening routes. It is a "switch" rather than a "regulator".
Idea
Most of our clients have gray IPs, so they are NATed on our gateway, and this is the bulk of the traffic. How exactly to NAT them (i.e., which external IP to assign) is up to us.
Our whole AS can be divided into 4 parts (1.1.144.0/24, 1.1.145.0/24, 1.1.146.0/24 and 1.1.147.0/24) that are used differently. For example, the first two are for clients with "white" IPs, the third is announced with a preference for RTK and the fourth with a preference for TTK. That is what I did.
At the iptables level we decide which addresses to use for NAT.
If the client's address is in 192.168.1-N.0/24, then 1.1.146.0/24 is used for NAT. Otherwise, 1.1.147.0/24.
Thus, by changing N, you can smoothly balance the incoming traffic between the two channels.
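The selection rule can be illustrated with a minimal sketch. The function name nat_pool_for is hypothetical and is not part of my scripts; the pools are the example values from this article, and the real decision is made by iptables, not by a script like this.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print the NAT pool for the
# client network 192.168.<net>.0/24, given the current threshold N.
nat_pool_for() (
    net=$1
    n=$2
    if [ "$net" -ge 1 ] && [ "$net" -le "$n" ]; then
        echo "1.1.146.0/24"   # part announced with a preference for RTK
    else
        echo "1.1.147.0/24"   # part announced with a preference for TTK
    fi
)

# With N = 50, networks 1..50 return via RTK and the rest via TTK.
for net in 10 50 51 99; do
    echo "192.168.$net.0/24 -> $(nat_pool_for "$net" 50)"
done
```

Raising N moves one more client network to the RTK pool, lowering it moves one back to TTK, which is what makes the adjustment smooth.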
Implementation
Please treat this implementation only as an example. Not everything here is optimal; some will find it more convenient to write it in Perl or Python, others to package it as a single daemon. The main goal of this example is to show that the idea can be implemented and that it works.
1. To check whether a client IP belongs to 192.168.1-N.0/24, iptables uses the ipset module; in the rules below it is the "rtk" set.
The "NAT_AS" chain, where I do the NAT:
:NAT_AS - [0:0]
# For existing connections, so that they are not "thrown" from one external address to another
-A NAT_AS -m state -s 192.168.0.0/16 --state ESTABLISHED,RELATED -j SNAT --to-source 1.1.146.0-1.1.147.255 --persistent
# For new connections, choose how to NAT.
# RTK
-A NAT_AS -m state -m set --set rtk src --state NEW -j SNAT --to-source 1.1.146.0-1.1.146.255 --persistent
# TTK
-A NAT_AS -m state -m set ! --set rtk src --state NEW -j SNAT --to-source 1.1.147.0-1.1.147.255 --persistent
Note that -j SNAT is used with the --persistent option. It makes each client keep a constant external IP. Without it, clients may have problems with many Internet services.
And somewhere in the nat table's POSTROUTING chain:
# eth1, eth3 - the interfaces that "look" at TTK and RTK
-A POSTROUTING -s 192.168.0.0/16 -o eth1 -j NAT_AS
-A POSTROUTING -s 192.168.0.0/16 -o eth3 -j NAT_AS
2. Prepare the ipset set "rtk"
The file where I store all the parameters (it is used by several scripts):
cat param_rtk_set:
# Client subnets to be managed
export rtk_start=1
export rtk_min=1
export rtk_max=99
# Maximum traffic (what the provider gives, or rather what can be obtained from it without loss). Found by experiment.
# RTK
export shp_rtk_max=547
# TTK
export shp_ttk_max=535
# Relative regulation accuracy
export scale=50
# File where the current value of N is stored. Preferably somewhere on tmpfs.
export f_set_end=/lib/init/rw/rtk_set_end
Creating and updating the rtk set itself. This script can (and should) be run regularly.
cat create_rtk_set:
#!/bin/sh
# Create the rtk set if it does not exist yet
/usr/sbin/ipset -N rtk nethash -q
# Temporary set, so that the working set is not modified "hot"
/usr/sbin/ipset -N temp_rtk nethash -q
/usr/sbin/ipset -F temp_rtk -q
. ./param_rtk_set
# In some versions of sh a variable must be set before a loop or condition in order to be used afterwards.
rtk_set_end=0
# On the first run there is no saved N yet: take the average of min and max and save it.
if [ -f "$f_set_end" ]; then
    read rtk_set_end < "$f_set_end"
else
    rtk_set_end=$(($rtk_min + $rtk_max))
    rtk_set_end=$(($rtk_set_end / 2))
    echo $rtk_set_end > "$f_set_end"
fi
# Fill the temporary set
net=$rtk_start
while [ $net -lt $rtk_set_end ]; do
    /usr/sbin/ipset -A temp_rtk 192.168.${net}.0/24 -q
    net=$(($net + 1))
done
# Swap the temporary set with the working one
/usr/sbin/ipset -W temp_rtk rtk
# Remove the temporary set
/usr/sbin/ipset -X temp_rtk -q
OK, we now have a "control action": by changing N (in my case stored in /lib/init/rw/rtk_set_end), you can smoothly change the ratio of incoming traffic between TTK and RTK. What remains is to set up the automation.
Automation
cat rtk-ttk:
# Get the current values of the RX byte counters on the interfaces
ttk=$(/sbin/ifconfig eth1 | grep -Eo "RX bytes:[0-9]*" | grep -Eo "[0-9]*")
if [ "$ttk" = "" ]; then
    echo "No TTK ifconfig"
    exit
fi
rtk=$(/sbin/ifconfig eth3 | grep -Eo "RX bytes:[0-9]*" | grep -Eo "[0-9]*")
if [ "$rtk" = "" ]; then
    echo "No RTK ifconfig"
    exit
fi
# Directory where the previous values are stored
work_dir="/lib/init/rw/"
# Read the previous values and save the current ones. If a counter has not grown since the last run, exit.
read ttk_old < ${work_dir}shp_ttk_old
read rtk_old < ${work_dir}shp_rtk_old
echo $ttk > ${work_dir}shp_ttk_old
echo $rtk > ${work_dir}shp_rtk_old
if [ $ttk -le $ttk_old ]; then
    echo "TTK RX small"
    exit
fi
if [ $rtk -le $rtk_old ]; then
    echo "RTK RX small"
    exit
fi
# Bytes received since the previous run (+1 guards against division by zero)
ttk_cur=$(($ttk - $ttk_old + 1))
rtk_cur=$(($rtk - $rtk_old + 1))
# Read the parameters
. ./param_rtk_set
# Maximum change of N in one iteration. Found by experiment.
max_delta=5
# Find the deviation: p equals $scale when the channels are loaded in the desired ratio.
p=$(echo "scale=10; $scale * ($shp_rtk_max / $shp_ttk_max) / ($rtk_cur / $ttk_cur) + 100.5" | /usr/bin/bc)
p=${p%%.*}
p=$(($p - 100))
# Increase N
n_for=1
while [ $scale -lt $p ]; do
    #echo "$p add"
    p=$(($p - 1))
    n_for=$(($n_for + 1))
    if [ $max_delta -lt $n_for ]; then
        break
    fi
    ./add_rtk
done
# Decrease N
n_for=1
while [ $p -lt $scale ]; do
    #echo "$p del"
    p=$((1 + $p))
    n_for=$(($n_for + 1))
    if [ $max_delta -lt $n_for ]; then
        break
    fi
    ./del_rtk
done
# Apply the new value of N
./create_rtk_set
It remains to write the scripts add_rtk, which increases N, and del_rtk, which decreases it. They should read the current N from /lib/init/rw/rtk_set_end, increase or decrease it, check that it stays within the interval [min, max], and save it. I will not list them; they are simple.
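For readers who want a starting point, here is a minimal sketch of what both scripts could share. It is not my actual code: the function name change_n and its argument order are assumptions made for this illustration, and the demo uses a temporary file instead of the real state file.

```shell
#!/bin/sh
# Hypothetical helper behind add_rtk / del_rtk (illustration only):
# shift the stored N by a delta and clamp it to [min, max].
# Arguments: state file, delta (+1 or -1), minimum, maximum.
change_n() (
    file=$1
    delta=$2
    min=$3
    max=$4
    read -r n < "$file" || exit 1
    n=$((n + delta))
    if [ "$n" -lt "$min" ]; then n=$min; fi
    if [ "$n" -gt "$max" ]; then n=$max; fi
    echo "$n" > "$file"
)

# Demo on a temporary file: N is already close to the upper bound.
state=$(mktemp)
echo 98 > "$state"
change_n "$state" 1 1 99    # N becomes 99
change_n "$state" 1 1 99    # already at max, N stays 99
cat "$state"
rm -f "$state"
```

Under these assumptions, add_rtk would source param_rtk_set and call change_n "$f_set_end" 1 "$rtk_min" "$rtk_max", and del_rtk would do the same with -1.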
BGP setup
For all of this to actually control incoming traffic, BGP must be prepared.
An example of my bgp.conf (the real IPs and numbers are, of course, replaced with the example data above):
!
hostname AS0000
password ****
enable password ****
log file /var/log/quagga/bgpd.log
!
router bgp 0000
no synchronization
bgp router-id [our any external IP]
network 1.1.144.0/22
network 1.1.144.0/24
network 1.1.145.0/24
network 1.1.146.0/24
network 1.1.147.0/24
!
neighbor [RTK gateway IP] remote-as [AS RTK (only a number, for example 12345)]
neighbor [RTK gateway IP] update-source [our external IP RTK]
neighbor [RTK gateway IP] route-map MY-OUT-RTK out
neighbor [RTK gateway IP] route-map INTER_NET in
!
neighbor [TTK gateway IP] remote-as [AS TTK (only a number, for example 12345)]
neighbor [TTK gateway IP] update-source [our external IP TTK]
neighbor [TTK gateway IP] route-map MY-OUT-TTK out
neighbor [TTK gateway IP] route-map INTER_NET in
!
ip prefix-list upstream-out seq 10 permit 1.1.144.0/22
!
ip prefix-list up144 seq 10 permit 1.1.144.0/24
!
ip prefix-list up145 seq 10 permit 1.1.145.0/24
!
ip prefix-list up146 seq 10 permit 1.1.146.0/24
!
ip prefix-list up147 seq 10 permit 1.1.147.0/24
!
! =========================
! --- MY-OUT-TTK
route-map MY-OUT-TTK permit 10
match ip address prefix-list up144
! set as-path prepend 0000 0000
!
route-map MY-OUT-TTK permit 20
match ip address prefix-list up145
! set as-path prepend 0000 0000
!
route-map MY-OUT-TTK permit 30
match ip address prefix-list up146
set as-path prepend 0000
!
route-map MY-OUT-TTK permit 40
! match ip address prefix-list up147
! set as-path prepend 0000 0000
!
! route-map MY-OUT-TTK deny 200
! =========================
!
route-map MY-OUT-TTK permit 100
match ip address prefix-list upstream-out
! set as-path prepend 0000 0000 0000
!
route-map MY-OUT-TTK deny 200
!
! --- the end of MY-OUT-TTK
! =========================
! --- MY-OUT-RTK
route-map MY-OUT-RTK permit 10
match ip address prefix-list up144
set as-path prepend 0000
!
route-map MY-OUT-RTK permit 20
match ip address prefix-list up145
set as-path prepend 0000
!
route-map MY-OUT-RTK permit 30
match ip address prefix-list up146
! set as-path prepend 0000 0000 0000
!
route-map MY-OUT-RTK permit 40
match ip address prefix-list up147
set as-path prepend 0000
!
! =========================
!
route-map MY-OUT-RTK permit 100
match ip address prefix-list upstream-out
set as-path prepend 0000 0000
!
route-map MY-OUT-RTK deny 200
!
! --- the end of MY-OUT-RTK
! =========================
! ---- Local nets
ip prefix-list local_ seq 15 permit 192.168.0.0/16
ip prefix-list local_ seq 18 permit 0.0.0.0/8
ip prefix-list local_ seq 19 permit 127.0.0.0/8
ip prefix-list local_ seq 20 permit 10.0.0.0/8
ip prefix-list local_ seq 21 permit 172.16.0.0/12
ip prefix-list local_ seq 22 permit 169.254.0.0/16
ip prefix-list local_ seq 23 permit 224.0.0.0/4
ip prefix-list local_ seq 24 permit 240.0.0.0/4
! Here you need to add your "white" network
!
route-map INTER_NET deny 10
match ip address prefix-list local_
!
!
route-map INTER_NET permit 200
set local-preference 500
!
line vty
!
As a reminder, here "0000" is my AS number and "1.1." is the beginning of my IPs. Everything else is labeled.
Setup and tuning
First, you need to adjust all the "set as-path prepend" lines in bgp.conf.
The goal is that when all outgoing traffic is NATed to 1.1.146.0/24 (conditionally, of course), incoming traffic is strongly skewed toward RTK.
And vice versa: when all outgoing traffic is NATed to 1.1.147.0/24, incoming traffic is strongly skewed toward TTK.
Second, specify the maximum available traffic from TTK and RTK in the param_rtk_set file (shp_rtk_max and shp_ttk_max). I do not recommend using the value "from the contract"; use the maximum you have ever actually received. Keep in mind that this is where you set the desired ratio of incoming traffic.
Third, specify the desired control accuracy (the "scale" value in the parameters). The larger it is, the smaller the "dead zone" and the stronger the control action (the more N changes). Too large a value may cause "beating" (resonance).
Fourth, specify the maximum change of N per iteration. Too large a value can cause beating, i.e., resonance, because the response to the control action appears not immediately but with some delay. Too small a value will not keep up with real life.
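To make the effect of "scale" and the per-iteration limit more tangible, here is a minimal sketch of the arithmetic performed by rtk-ttk. The function name balance_step is hypothetical. It uses integer arithmetic instead of bc, assumes both byte counts are positive, and assumes the shell has 64-bit arithmetic. Note that, as the loops in rtk-ttk are written, at most max_delta - 1 changes are applied per run.

```shell
#!/bin/sh
# Example parameter values from param_rtk_set and rtk-ttk.
scale=50
shp_rtk_max=547
shp_ttk_max=535
max_delta=5

# Hypothetical helper (illustration only): print the signed change of N
# for one iteration. Arguments: bytes received via RTK and via TTK since
# the previous run (both assumed to be greater than zero).
balance_step() (
    rtk_cur=$1
    ttk_cur=$2
    # p = round(scale * (shp_rtk_max / shp_ttk_max) / (rtk_cur / ttk_cur))
    num=$((scale * shp_rtk_max * ttk_cur))
    den=$((shp_ttk_max * rtk_cur))
    p=$(( (2 * num + den) / (2 * den) ))
    step=$((p - scale))
    # The loops in rtk-ttk stop after max_delta - 1 changes.
    limit=$((max_delta - 1))
    if [ "$step" -gt "$limit" ]; then step=$limit; fi
    if [ "$step" -lt "-$limit" ]; then step=-$limit; fi
    echo "$step"
)

balance_step 547 535   # traffic already in the desired ratio: prints 0
balance_step 500 535   # RTK underloaded: prints 4 (p = 55, capped)
balance_step 560 535   # RTK slightly overloaded: prints -1 (p = 49)
```

With scale = 50, a deviation of about 2% in the traffic ratio already shifts p by one, so a larger scale reacts to smaller imbalances, while the cap limits how far N can move in a single run.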
That's all. It remains to add a cron job that runs rtk-ttk every minute, or to write some kind of daemon that runs rtk-ttk periodically.
I will add that this algorithm has been working for me for more than a year. Sometimes I have to intervene and correct the BGP settings (set as-path prepend): something changes on the Internet, and you have to react.
I welcome any remarks or advice and will answer in the comments.