When choosing between proprietary and open source software, one argument often cited in favor of the latter goes: if need be, you can always take the source code and fix it yourself, patching the bug right away instead of waiting months for the vendor to react. In practice this argument is rather speculative: really, who in their right mind would take on optimizing the SQL planner when it is easier to rewrite the query itself? Likewise, hardly anyone starts hunting for a bug in a driver when it is faster and simpler to swap out the hardware. Not everyone even bothers to file a bug report... Still, there are cases where having the source available lets you avoid real losses when something unforeseen happens. This is a story about one of them.
That Friday evening promised no trouble. A regular weekend lay ahead, no special plans, I was expecting to take it easy ;) But reality turned out to be far more interesting than expected...
The first alarm rang late on Saturday evening. I was asleep, but had to get up and, cursing, go figure out why one of the important servers had gone down. The cluster consisted of 3 + 3 machines, and any one of them could carry the entire load of its triple, so losing one server did not threaten the service. Still, it was extremely unpleasant to realize that servers which had until then calmly handled a combined incoming load of 10K+ HTTP requests per second, and which seemed to have several times that in performance headroom, suddenly turned out to be not so stable. While the RAID1 rebuilt and postgres caught up on replication, it was time to look at the other servers.
A few words about how this cluster is arranged. The servers sit in different locations: two in Europe and four in the USA. They are divided into triples, each serving its own group of IPs (i.e., in each triple one server is in Europe and the other two are in the USA). Traffic is distributed via anycast: all servers of a triple announce the same IPs, each holding a BGP session with its router. If a server goes down, its router stops announcing that network to the Internet and traffic automatically flows to the remaining servers.
At first there was nothing to see. According to monitoring, just before the crash there was a sharp surge of incoming and outgoing traffic on both European servers (one of which went down); while the bandwidth only doubled, the packet rate grew tenfold, in both directions. In other words, the packets were small and there were a lot of them (close to 200K per second). On a highload service, traffic does not change by itself, let alone on that scale... Looks a lot like a DDoS, doesn't it? I can't say I was very surprised; I had seen many kinds of DDoS, and so far, as long as the providers' network equipment could deliver the traffic to the servers without loss, we had managed to block it all successfully. What was surprising was that the surge hit only the European servers: if a botnet is distributed, its traffic should be spread across the whole cluster.
After bringing the server back into service, I launched `top` and `nload` and started watching the load. I did not have to wait long: traffic soon doubled again and the ssh session began to lag noticeably. That meant packet loss; `mtr -ni 0.1 8.8.8.8` immediately confirmed the hypothesis, and `top -SH` showed that the bottleneck was in the OS kernel: the handler of incoming network packets was starved for CPU. Now it was clear why the server had frozen; for it, packet loss is like death:
FreeBSD's network stack has one very unpleasant property: it scales poorly with the number of TCP sessions :(. Increasing the session count several times results in a disproportionately higher CPU consumption. While there are few sessions there is no problem, but starting from a few tens of thousands of active TCP sessions the incoming-packet handler runs out of CPU and has to drop packets, which triggers a chain reaction: because of the losses, active TCP sessions are serviced more slowly, their number immediately starts to grow, and with it grow the CPU shortage and the packet loss.
Before the server froze completely, I immediately shut down its BGP session and, in parallel, launched a packet-loss check on the server that had taken over the European traffic. Its hardware is a bit more powerful, so there was a chance nothing bad would happen in the States. But the problem server had to be dealt with urgently. First of all, turn off HTTP keep-alive: TCP sessions would then finish earlier and there would be fewer of them in total. Tuning the network card settings took a good dozen minutes, checking for packet loss each time by briefly raising the BGP session. I ended up keeping polling mode on but additionally enabling idle polling: one CPU core was now occupied exclusively by the network card, but the packet loss stopped.
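For the record, that night's tuning boils down to a couple of commands. This is only a rough sketch: the NIC name (em0) and the web server (nginx) are my assumptions, the article names neither, and polling requires a kernel built with DEVICE_POLLING.

```shell
# Sketch of the emergency tuning; em0 and nginx are assumed names.

# Poll the NIC from the idle loop too (burns one core on polling,
# but packets are drained even under interrupt overload):
sysctl kern.polling.idle_poll=1
ifconfig em0 polling

# Turn off HTTP keep-alive so TCP sessions finish sooner, e.g. in nginx:
#   keepalive_timeout 0;
service nginx reload
```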
Some things still made no sense. For example, the number of TCP sessions during the attack did not differ much from normal operation. And what was completely baffling was why the attack was not visible at all on the US servers! While the European servers were down, the US machines received only the regular working traffic, with no extra load! Yet after traffic returned to Europe, it stayed at the normal level for a while, and then another surge began.
It was past one in the morning, the packet loss seemed to have stopped, and the network oddities could wait for a fresh head. With those thoughts I went back to bed, though I was not destined to sleep that night. A couple of hours later I was woken again: this time both European servers were down ;(. That added another oddity to the pile: it was deep into the night, well past the traffic peak. (Then again, for a DDoS attack that is perfectly normal, since most of the staff are asleep and few people are around to fight it.) Both servers were soon brought back up, but watching the situation afterwards yielded nothing; the attack did not repeat that day.
On Sunday I had to do a bit of work). A simple script now monitored the number of TCP sessions and temporarily withdrew the traffic (i.e., shifted it to the States) when the load grew, which limited the resulting damage. The US servers still worked without problems, but the traffic had to be understood and blocked. There were no anomalies in the HTTP logs; netstat and similar utilities showed nothing suspicious either. But since the traffic growth was visible on the network card, there was something to work with, and the right tcpdump invocation would come to the rescue)
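The counting half of such a watchdog can be sketched as a tiny shell helper. This is a sketch under assumptions: only the idea of watching the session count comes from the story; the function name, the use of `netstat -an` and the threshold logic are mine.

```shell
# Count ESTABLISHED TCP sessions from `netstat -an` output on stdin.
# A real watchdog would run this in a loop and withdraw the BGP
# announcement once the count crosses some threshold.
count_established() {
  awk '$1 ~ /^tcp/ && $NF == "ESTABLISHED" { n++ } END { print n + 0 }'
}

# usage: netstat -an | count_established
```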
Scrolling through tons of packet dumps can be tedious, but this time the search did not take long: among the usual HTTP/HTTPS exchanges there were anomalously many empty TCP packets, i.e. valid packets with correct IP and TCP headers but no payload. With keep-alive off, a fair number of empty packets is normal: three empty ones to establish a connection, then a couple of data packets, then empty packets again to close it (plus, for HTTPS, the packets of the TLS handshake). But now the dump regularly showed intensive streams of empty packets:
13:48:20.229921 IP 103.248.114.6.49467 > 88.208.9.69.80: Flags [.], ack 1, win 0, length 0
13:48:20.229925 IP 88.208.9.69.80 > 103.248.114.6.49467: Flags [.], ack 4294966738, win 8400, length 0
13:48:20.229927 IP 103.248.114.6.49467 > 88.208.9.69.80: Flags [.], ack 1, win 0, length 0
13:48:20.229931 IP 88.208.9.69.80 > 103.248.114.6.49467: Flags [.], ack 4294966738, win 8400, length 0
13:48:20.229933 IP 103.248.114.6.49467 > 88.208.9.69.80: Flags [.], ack 1, win 0, length 0
13:48:20.229937 IP 88.208.9.69.80 > 103.248.114.6.49467: Flags [.], ack 4294966738, win 8400, length 0
13:48:20.229939 IP 103.248.114.6.49467 > 88.208.9.69.80: Flags [.], ack 1, win 0, length 0
A spot check of individual TCP sessions (`tcpdump -nc 1000 host 103.248.114.6 and tcp port 49467`) confirmed that yes, some sessions consisted of a very intensive exchange of empty TCP packets. And almost all of these sessions were from India, with a little from Saudi Arabia and Kuwait! Hard to say what kind of cunning botnet this was, and there was no time to wonder. I wrote a second simple script that ran tcpdump for 30 seconds at a time and looked for sessions in which the number of consecutive empty-packet exchanges exceeded a given limit; the IPs found were blocked immediately. The result came quickly: blocking just the first five IPs cut the traffic in half. Every minute another one or two new IPs were blocked. Victory! ))
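The detection logic of that script can be sketched with awk over raw tcpdump output. The limit value, the interface name and the idea of feeding the result into a pf table are assumptions; a real version would also exclude the server's own addresses and count per-session rather than per-IP.

```shell
# Print every source IP whose count of zero-length "Flags [.]" packets
# in the capture reaches the given limit.
find_loopers() {
  awk -v limit="$1" '
    /Flags \[\.\],/ && / length 0$/ {
      split($3, a, ".")                     # $3 is src as a.b.c.d.port
      ip = a[1] "." a[2] "." a[3] "." a[4]
      if (++n[ip] == limit) print ip
    }'
}

# usage (interface and table name assumed):
#   tcpdump -nlc 100000 -i em0 tcp | find_loopers 100 \
#     | while read ip; do pfctl -t loopers -T add "$ip"; done
```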
On Monday we discussed the problem with colleagues, and it turned out things were not so rosy (. First, the rate of newly blocked IPs kept growing, reaching several dozen per minute, and that not even at peak traffic. Second, not only these servers were affected but many others too, and, tellingly, all of them in Europe and all running FreeBSD. It became clear this was no DDoS attack at all. But then what was it?...
The cause was still a mystery, and in the meantime the blocked IPs had to be released. Instead of blocking, we now dropped the offending TCP sessions themselves (FreeBSD has the tcpdrop utility for exactly that). It controlled the load just as effectively, and keep-alive could be re-enabled.
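Turning a `netstat -an` line into a `tcpdrop` call needs a little address surgery, since FreeBSD's netstat separates the port from the address with a dot while tcpdrop(8) wants them as separate arguments. A sketch (the helper name is mine; it echoes the commands instead of executing them, so they can be inspected first):

```shell
# Convert ESTABLISHED `netstat -an` lines on stdin into tcpdrop(8)
# invocations: tcpdrop local-addr local-port foreign-addr foreign-port
to_tcpdrop() {
  awk '$1 ~ /^tcp/ && $NF == "ESTABLISHED" {
    n = split($4, l, "."); lport = l[n]; sub("\\." lport "$", "", $4)
    m = split($5, f, "."); fport = f[m]; sub("\\." fport "$", "", $5)
    print "tcpdrop", $4, lport, $5, fport
  }'
}

# usage: netstat -an | grep 103.248.114.6 | to_tcpdrop | sh
```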
I had to pick up tcpdump again and dig further into the traffic. I will not describe in detail the hours spent hunting for anomalies and patterns; the story is long enough already). The looping TCP sessions varied. Some were completely empty:
dump1:
06:07:58.753852 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [S], seq 3258188889, win 64240, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
06:07:58.753868 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [S.], seq 2165986257, ack 3258188890, win 8192, options [mss 1452,nop,wscale 6,sackOK,eol], length 0
06:07:58.906312 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [S], seq 3258188889, win 64240, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
06:07:58.906327 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [S.], seq 2165986257, ack 3258188890, win 8192, options [mss 1452,nop,wscale 6,sackOK,eol], length 0
06:07:59.059091 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [S], seq 3258188889, win 64240, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
06:07:59.059103 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [S.], seq 2165986257, ack 3258188890, win 8192, options [mss 1452,nop,wscale 6,sackOK,eol], length 0
06:07:59.112677 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 1, win 260, length 0
06:07:59.161950 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 1, win 260, options [nop,nop,sack 1 {0:1}], length 0
06:07:59.269749 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 1, win 260, length 0
06:07:59.313826 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 1, win 260, options [nop,nop,sack 1 {0:1}], length 0
06:08:09.313764 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [.], ack 1, win 136, length 0
06:08:09.569443 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 1, win 260, length 0
06:08:09.678113 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [F.], seq 1, ack 1, win 260, length 0
06:08:09.678132 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [.], ack 2, win 136, length 0
06:08:09.678206 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [F.], seq 1, ack 2, win 136, length 0
06:08:09.720977 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 1, win 260, length 0
06:08:09.872479 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 1, win 260, length 0
06:08:09.932997 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 2, win 260, length 0
06:08:10.024179 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 1, win 260, length 0
06:08:20.023725 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [.], ack 1, win 8712, length 0
06:08:20.279407 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 2, win 0, length 0
06:08:20.279412 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [.], ack 1, win 8712, length 0
06:08:20.430575 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 2, win 0, length 0
06:08:20.430581 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [.], ack 1, win 8712, length 0
06:08:20.534901 IP 122.167.126.199.56698 > 88.208.9.8.80: Flags [.], ack 2, win 0, length 0
06:08:20.534908 IP 88.208.9.8.80 > 122.167.126.199.56698: Flags [.], ack 1, win 8712, length 0
and some exchanged data first and only then fell into the cycle of empty packets:
dump2:
06:18:39.046506 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [S], seq 1608423399, win 14600, options [mss 1400,sackOK,TS val 2790685 ecr 0,nop,wscale 6], length 0
06:18:39.046525 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [S.], seq 3258835787, ack 1608423400, win 8192, options [mss 1400,nop,wscale 6,sackOK,TS val 2982841058 ecr 2790685], length 0
06:18:39.228192 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 1, win 229, options [nop,nop,TS val 2790704 ecr 2982841058], length 0
06:18:39.234683 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [P.], seq 1:512, ack 1, win 229, options [nop,nop,TS val 2790704 ecr 2982841058], length 511
06:18:39.235039 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [P.], seq 1:358, ack 512, win 130, options [nop,nop,TS val 2982841246 ecr 2790704], length 357
06:18:39.379057 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 1, win 229, options [nop,nop,TS val 2790704 ecr 2982841058], length 0
06:18:39.385527 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [P.], seq 1:512, ack 1, win 229, options [nop,nop,TS val 2790704 ecr 2982841058], length 511
06:18:39.408290 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 358, win 274, options [nop,nop,TS val 2790722 ecr 2982841246], length 0
06:18:39.408304 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [.], ack 512, win 130, options [nop,nop,TS val 2982841420 ecr 2790722], length 0
06:18:39.408305 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [F.], seq 512, ack 358, win 274, options [nop,nop,TS val 2790722 ecr 2982841246], length 0
06:18:39.408312 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [.], ack 513, win 130, options [nop,nop,TS val 2982841420 ecr 2790722], length 0
06:18:39.408319 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [F.], seq 358, ack 513, win 130, options [nop,nop,TS val 2982841420 ecr 2790722], length 0
06:18:39.536434 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [P.], seq 1:512, ack 1, win 229, options [nop,nop,TS val 2790704 ecr 2982841058], length 511
06:18:39.536442 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [F.], seq 358, ack 513, win 130, options [nop,nop,TS val 2982841548 ecr 2790722], length 0
06:18:39.580158 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790739 ecr 2982841420], length 0
06:18:39.580167 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790739 ecr 2982841420], length 0
06:18:39.687698 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [P.], seq 1:512, ack 1, win 229, options [nop,nop,TS val 2790704 ecr 2982841058], length 511
06:18:39.688031 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [P.], seq 1:358, ack 512, win 138, options [nop,nop,TS val 2982841058 ecr 2790704], length 357
06:18:39.712200 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790752 ecr 2982841420], length 0
06:18:39.712204 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [.], ack 512, win 138, options [nop,nop,TS val 2982841083 ecr 2790704], length 0
06:18:39.882468 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790769 ecr 2982841420], length 0
06:18:39.882476 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [.], ack 512, win 138, options [nop,nop,TS val 2982841253 ecr 2790704], length 0
06:18:39.884164 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790769 ecr 2982841420], length 0
06:18:39.884170 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [.], ack 512, win 138, options [nop,nop,TS val 2982841255 ecr 2790704], length 0
06:18:39.917773 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [P.], seq 1:358, ack 512, win 138, options [nop,nop,TS val 2982841289 ecr 2790704], length 357
06:18:40.033516 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790769 ecr 2982841420], length 0
06:18:40.033525 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [.], ack 512, win 138, options [nop,nop,TS val 2982841404 ecr 2790704], length 0
06:18:40.035244 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790769 ecr 2982841420], length 0
06:18:40.035248 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [.], ack 512, win 138, options [nop,nop,TS val 2982841406 ecr 2790704], length 0
06:18:40.082506 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790789 ecr 2982841420], length 0
06:18:40.082513 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [.], ack 512, win 138, options [nop,nop,TS val 2982841453 ecr 2790704], length 0
06:18:40.132575 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790794 ecr 2982841420], length 0
06:18:40.132583 IP 88.208.9.8.80 > 106.193.154.239.1223: Flags [.], ack 512, win 138, options [nop,nop,TS val 2982841503 ecr 2790704], length 0
06:18:40.142588 IP 106.193.154.239.1223 > 88.208.9.8.80: Flags [.], ack 359, win 274, options [nop,nop,TS val 2790795 ecr 2982841420], length 0
Still, there was a clue. Before a session fell into the empty-packet loop, a FIN arrived from the remote side (a packet with the FIN flag signals that there will be no more data and the session should be closed), sometimes more than one, or even an RST packet (the RST flag indicates the session is already closed and no longer valid). Interestingly, despite those FIN and RST packets, data packets sometimes still arrived at the server afterwards. Either some TCP stack out there is implemented that crookedly, which is unlikely, or someone is crudely interfering with the TCP session, which is quite plausible (mobile operators are especially fond of this, no fingers pointed). The second theory was also supported by the HTTP logs: almost all of the offending TCP sessions involved a mobile browser, on both Android and iOS.
It was logical to assume that the FIN or RST moved the TCP session into some closed state, in which the stack merely acknowledged incoming packets. The question was which TCP state exactly:
tcp_fsm.h:

#define TCP_NSTATES        11

#define TCPS_CLOSED        0
#define TCPS_LISTEN        1
#define TCPS_SYN_SENT      2
#define TCPS_SYN_RECEIVED  3
#define TCPS_ESTABLISHED   4
#define TCPS_CLOSE_WAIT    5
#define TCPS_FIN_WAIT_1    6
#define TCPS_CLOSING       7
#define TCPS_LAST_ACK      8
#define TCPS_FIN_WAIT_2    9
#define TCPS_TIME_WAIT     10
To find out, just before calling `tcpdrop` I added a lookup of the session being dropped in the `netstat -an` output. The result was somewhat discouraging: every one of them was in the ESTABLISHED state! That already looked a lot like a bug: a closed TCP session cannot go back to ESTABLISHED, no such transition exists. I immediately went to check the kernel sources and was discouraged a second time:
tp->t_state = TCPS_ESTABLISHED
This assignment occurs in exactly two places in the code, and in both the current t_state is checked immediately beforehand: in one case it must be TCPS_SYN_SENT (the server sent a SYN and received an acknowledgment), in the other TCPS_SYN_RECEIVED (the server received a SYN, replied with a SYN/ACK, and received the confirming ACK). The conclusion was quite specific: the server had simply ignored the FIN and RST packets, and there was no bug in the TCP stack (at least, no bug with an invalid state transition).
Still, it was unclear why the server was responding to every single TCP packet it received. Normally that is unnecessary, and the stack behaves differently: it accepts several packets and then acknowledges them all at once with a single packet, which is more economical). A careful study of the packet contents, specifically the 32-bit TCP sequence and acknowledgment counters, shed light on the situation. Here tcpdump's default behavior of showing relative seq/ack differences between packets instead of absolute values did us a disservice :).
Let's look carefully at the absolute values (tcpdump's `-S` flag prints absolute sequence numbers):
16:03:21.931367 IP (tos 0x28, ttl 47, id 44771, offset 0, flags [DF], proto TCP (6), length 60)
46.153.19.182.54645 > 88.208.9.111.80: Flags [S], cksum 0x181c (correct), seq 3834615051, win 65535, options [mss 1460,sackOK,TS val 932840 ecr 0,nop,wscale 6], length 0
16:03:21.931387 IP (tos 0x0, ttl 64, id 1432, offset 0, flags [DF], proto TCP (6), length 60)
88.208.9.111.80 > 46.153.19.182.54645: Flags [S.], cksum 0xa4bc (incorrect -> 0xf9a4), seq 1594895211, ack 3834615052, win 8192, options [mss 1460,nop,wscale 6,sackOK,TS val 2509954639 ecr 932840], length 0
16:03:22.049434 IP (tos 0x28, ttl 47, id 44772, offset 0, flags [DF], proto TCP (6), length 52)
46.153.19.182.54645 > 88.208.9.111.80: Flags [.], cksum 0x430b (correct), seq 3834615052, ack 1594895212, win 1369, options [nop,nop,TS val 932852 ecr 2509954639], length 0
16:03:22.053697 IP (tos 0x28, ttl 47, id 44773, offset 0, flags [DF], proto TCP (6), length 40)
46.153.19.182.54645 > 88.208.9.111.80: Flags [R], cksum 0x93ba (correct), seq 211128292, win 1369, length 0
16:03:22.059913 IP (tos 0x28, ttl 48, id 0, offset 0, flags [DF], proto TCP (6), length 40)
46.153.19.182.54645 > 88.208.9.111.80: Flags [R.], cksum 0xa03f (correct), seq 0, ack 1594897965, win 0, length 0
16:03:22.060700 IP (tos 0x28, ttl 47, id 44774, offset 0, flags [DF], proto TCP (6), length 52)
46.153.19.182.54645 > 88.208.9.111.80: Flags [.], cksum 0x3a48 (correct), seq 3834615953, ack 1594896512, win 1410, options [nop,nop,TS val 932853 ecr 2509954639], length 0
16:03:22.060706 IP (tos 0x0, ttl 64, id 3974, offset 0, flags [DF], proto TCP (6), length 52)
88.208.9.111.80 > 46.153.19.182.54645: Flags [.], cksum 0xa4b4 (incorrect -> 0x475c), seq 1594895212, ack 3834615052, win 135, options [nop,nop,TS val 2509954768 ecr 932852], length 0
The first packet carries seq 3834615051; in response, the server sent a packet with seq 1594895211, ack 3834615052 (the outgoing ack number equals the incoming seq + 1).
Then a couple of RST packets arrived; they are of no interest to us.
But the next packet is interesting: it carries seq 3834615953, ack 1594896512. Both numbers are significantly larger than the initial seq/ack, which means the remote side has already sent 3834615953 - 3834615052 = 901 bytes and even managed to receive 1594896512 - 1594895212 = 1300 bytes.
Of course, we do not and will not see those data packets: that exchange happened with the MITM system. But the server does not know that. It sees a packet with seq 3834615953, concludes that it has missed 901 bytes of data, and replies with a packet carrying the last valid seq/ack numbers known to it: seq 1594895212, ack 3834615052. The remote side receives this packet and in turn reports that everything is fine on its end, all 1300 bytes arrived successfully. And so the loop closes.
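The two deltas are easy to recheck with shell arithmetic, using the absolute numbers straight from the dump above:

```shell
# Initial sequence numbers from the SYN / SYN-ACK in the dump:
client_isn=3834615051
server_isn=1594895211

# The suspicious packet carried seq 3834615953, ack 1594896512:
echo $(( 3834615953 - (client_isn + 1) ))   # bytes the client claims to have sent: 901
echo $(( 1594896512 - (server_isn + 1) ))   # bytes it claims to have received: 1300
```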
It also became clear why the US servers did not see this traffic: it was there, just many times weaker, in proportion to how much larger the RTT from India to the States is than the RTT from India to Europe.
All that remained, in fact, was to fix the bug. Back to the sources; the code of interest lives in tcp_input.c. Finding the spot was not particularly hard: the tcp_input() function performs the initial processing of a TCP packet, and at its very end, if the packet passes all checks and the connection is in the ESTABLISHED state, the packet is handed to tcp_do_segment() for further processing.
All we need is one more check: if the remote side's ack counter claims it has received data the server never sent, the packet should be ignored. We cannot simply tear down the connection: that would hand attackers an easy way to kill other people's TCP sessions).
Testing the patch showed that TCP traffic also contains packets with an ack value of zero; those must not be dropped. The final patch came to three lines (not counting comments):
+ if (SEQ_GT(th->th_ack, tp->snd_max) && th->th_ack != 0) {
+     goto dropunlock;
+ }
A PR (problem report) was sent to the FreeBSD developers the same day.
P.S. What about this problem on Linux and Windows? Everything is fine there; such packets are ignored (tested on Windows 10 and Linux 3.10).