
TCP puzzles


They say you can't truly understand a system until you understand how it fails. As a student I wrote a TCP implementation for fun, and I then spent several years working in the industry, but I keep studying how TCP works, and how it fails, in more and more depth. The most surprising thing is that some of these failures show up in basic operations, and they are not obvious. In this article I present them as puzzles, in the style of Car Talk or the old Java Puzzlers. Like any good puzzle, they are easy to reproduce, but the solutions are often surprising. And rather than focusing on arcane details, these puzzles help explore some of the underlying principles of how TCP works.

Prerequisites


These puzzles assume basic knowledge of how TCP works on Unix-like systems, but you don't need to be an expert to dig into them. For example:


You can reproduce all of these examples yourself. I used two virtual machines running under VMware Fusion, and I saw the same results on production servers. For testing I used nc(1) on SmartOS, and I don't believe any of the behavior reproduced here is specific to a particular OS. To trace system calls and collect rough timing information, I used truss(1) from the illumos project. You can get similar information with dtruss(1m) on OS X or strace(1) on GNU/Linux.

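nc(1) is a very simple program. We will use it in two modes:

  1. as a server (nc -l -p PORT), it listens on the given port and accepts a single TCP connection;
  2. as a client (nc IP PORT), it connects to the given address and port.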

In both modes, once the connection is established, each side uses poll to wait until either standard input or the connected socket has data ready to read. Data that arrives on the socket is printed to the terminal, and data you type into the terminal is sent over the socket. When you press CTRL-C, the socket is closed and the process exits.
In the examples, my client host is called kang and the server host is kodos.
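The poll loop described above fits in a few lines of Python. This is a minimal sketch that assumes a Unix-like system where select.poll is available. It is not nc's actual source, and the function name relay and its return strings are hypothetical.

```python
import os
import select
import socket


def relay(sock: socket.socket, in_fd: int, out_fd: int) -> str:
    """Hypothetical sketch of an nc-style loop (not nc's real source).

    Blocks in poll() until either in_fd (e.g. stdin) or the connected
    socket is readable. Socket data is copied to out_fd; data read from
    in_fd is sent over the socket.
    """
    poller = select.poll()
    poller.register(in_fd, select.POLLIN)
    poller.register(sock.fileno(), select.POLLIN)
    while True:
        for fd, _events in poller.poll():  # no timeout: sleep until an event
            if fd == sock.fileno():
                data = sock.recv(1024)  # raises on ECONNRESET / ETIMEDOUT
                if not data:
                    # read returned 0: the peer sent a FIN (end-of-stream)
                    return "peer-closed"
                os.write(out_fd, data)
            else:
                data = os.read(in_fd, 1024)
                if not data:
                    return "input-closed"
                sock.sendall(data)
```

An nc-like caller would close the socket as soon as relay returns "peer-closed". Whether the caller closes the socket at that point turns out to matter later in the article.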

Warm-up: normal TCP teardown


Let's start with the basic case. Suppose we start a server on kodos:

Server
    [root@kodos ~]# truss -d -t bind,listen,accept,poll,read,write nc -l -p 8080
    Base time stamp:  1464310423.7650  [ Fri May 27 00:53:43 UTC 2016 ]
     0.0027 bind(3, 0x08065790, 32, SOV_SOCKBSD) = 0
     0.0028 listen(3, 1, SOV_DEFAULT) = 0
    accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) (sleeping...)

(Recall that in these examples I use truss to print the system calls that nc makes. The -d flag adds timestamps, and -t selects which system calls to show.)

Now I connect from kang:

Client
    [root@kang ~]# truss -d -t connect,pollsys,read,write,close nc 10.88.88.140 8080
    Base time stamp:  1464310447.6295  [ Fri May 27 00:54:07 UTC 2016 ]
    ...
     0.0062 connect(3, 0x08066DD8, 16, SOV_DEFAULT) = 0
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

On kodos we see:

Server
    23.8934 accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) = 4
    pollsys(0x08045680, 2, 0x00000000, 0x00000000) (sleeping...)
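The traced calls (bind, listen and accept on kodos, connect on kang) correspond directly to the sockets API. The following Python sketch is only a rough illustration of that sequence. It runs both ends on one machine over loopback and uses an OS-chosen port instead of 10.88.88.140:8080. The helper name connected_pair is hypothetical.

```python
import socket


def connected_pair() -> tuple[socket.socket, socket.socket, socket.socket]:
    """Hypothetical helper: returns (listener, server_conn, client)."""
    # "kodos": bind + listen with a backlog of 1, as in the nc trace
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    # "kang": connect() returns once the three-way handshake completes
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())

    # accept() returns a new descriptor for the ESTABLISHED connection
    server_conn, _peer = listener.accept()
    return listener, server_conn, client
```

connect() can return before accept() is called. The kernel completes the handshake itself and queues the new connection on the listen backlog, which is why this sketch works from a single thread.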

The TCP connection is now in the ESTABLISHED state, and both processes are blocked in poll. We can see this on each system with netstat:

Server
    [root@kodos ~]# netstat -f inet -P tcp -n
    TCP: IPv4
       Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
    -------------------- -------------------- ----- ------ ----- ------ -----------
    10.88.88.140.8080    10.88.88.139.33226   1049792      0 1049800      0 ESTABLISHED
    ...

Client
    [root@kang ~]# netstat -f inet -P tcp -n
    TCP: IPv4
       Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
    -------------------- -------------------- ----- ------ ----- ------ -----------
    10.88.88.139.33226   10.88.88.140.8080      32806      0 1049800      0 ESTABLISHED
    ...

Question: when we kill one of the processes, what happens to the other one? Will it find out? How? Let's try to predict the behavior of specific system calls and explain why each of them does what it does.

Press CTRL-C on kodos:

Server
    pollsys(0x08045680, 2, 0x00000000, 0x00000000) (sleeping...)
    ^C127.6307  Received signal #2, SIGINT, in pollsys() [default]

And this is what we see on kang:

Client
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
    126.1771 pollsys(0x08045670, 2, 0x00000000, 0x00000000) = 1
    126.1774 read(3, 0x08043670, 1024) = 0
    126.1776 close(3) = 0
    [root@kang ~]#

What happened? Let's see:

  1. By pressing CTRL-C, we sent SIGINT to the server process, which killed it. When the process exited, its file descriptors were closed.
  2. When the last file descriptor for the ESTABLISHED socket is closed, the TCP stack on kodos sends a FIN over the connection and enters the FIN_WAIT_1 state.
  3. The TCP stack on kang receives the FIN, moves its own connection to the CLOSE_WAIT state, and sends an ACK in response. Since the nc client is blocked in poll waiting for the socket to become readable, the kernel wakes that thread with POLLIN.
  4. The nc client sees POLLIN on the socket and calls read, which returns 0 immediately. This signals end-of-stream. nc decides it is done with the socket and closes it.
  5. Meanwhile, the TCP stack on kodos receives the ACK and moves to the FIN_WAIT_2 state.
  6. When the nc client on kang closes its socket, the TCP stack on kang sends a FIN to kodos, and the connection on kang moves to the LAST_ACK state.
  7. The TCP stack on kodos receives the FIN, its connection moves to the TIME_WAIT state, and kodos acknowledges the FIN.
  8. The TCP stack on kang receives the ACK for its FIN and removes the connection entirely.
  9. Two minutes later, the connection on kodos leaves TIME_WAIT, and the stack removes it entirely.

The order of some of these steps may vary slightly. Also, kodos may pass through the CLOSING state instead of FIN_WAIT_2.
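The teardown steps above follow the TCP state machine in RFC 793. The table below is a simplified sketch of only the closing transitions mentioned in this article. It ignores a FIN arriving together with its ACK, reordered segments and RSTs. The event names are made up for illustration.

```python
# Simplified subset of the RFC 793 state machine: connection teardown only.
# Keys are (state, event); values are (next_state, segment_sent_or_None).
CLOSE_TRANSITIONS = {
    # Active close (kodos in the warm-up)
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "recv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "recv_FIN"): ("CLOSING", "ACK"),  # FIN before our ACK
    ("FIN_WAIT_2", "recv_FIN"): ("TIME_WAIT", "ACK"),
    ("CLOSING", "recv_ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "timeout_2msl"): ("CLOSED", None),
    # Passive close (kang in the warm-up)
    ("ESTABLISHED", "recv_FIN"): ("CLOSE_WAIT", "ACK"),
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv_ACK"): ("CLOSED", None),
}


def run(events, state="ESTABLISHED"):
    """Apply events in order and return the list of states visited."""
    states = [state]
    for event in events:
        state, _sent = CLOSE_TRANSITIONS[(state, event)]
        states.append(state)
    return states
```

Walking both sides of the warm-up through this table reproduces steps 2 to 9. kodos goes ESTABLISHED → FIN_WAIT_1 → FIN_WAIT_2 → TIME_WAIT → CLOSED. kang goes ESTABLISHED → CLOSE_WAIT → LAST_ACK → CLOSED.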

So the final state, according to netstat, looks like this:

Server
    [root@kodos ~]# netstat -f inet -P tcp -n
    TCP: IPv4
       Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
    -------------------- -------------------- ----- ------ ----- ------ -----------
    10.88.88.140.8080    10.88.88.139.33226   1049792      0 1049800      0 TIME_WAIT

netstat on kang shows nothing at all for this connection.

The intermediate states are short-lived, but you can observe them with the DTrace TCP provider. You can watch the packet flow with snoop(1m) or tcpdump(1).

Conclusions: we have seen the normal path that system calls take when a connection is established and then torn down. Note that kang found out immediately that kodos had closed the connection: it was woken up from poll, and read returning 0 signaled end-of-stream. At that point nc on kang chose to close its socket, which completed the teardown of the connection with kodos. We will come back to this later and see what happens if kang does not close the socket in this situation.

Puzzle 1: Power cycle


What happens to an established, idle TCP connection when one of the systems is rebooted?

A planned reboot (with the "reboot" command) shuts processes down cleanly. So typing "reboot" on the kodos console gives the same result as killing the server with CTRL-C. But what happens if, in the previous example, we simply cut the power to kodos? Surely kang will find out eventually, right?

Let's find out. First, establish a connection:

Server
    [root@kodos ~]# truss -d -t bind,listen,accept,poll,read,write nc -l -p 8080
    Base time stamp:  1464312528.4308  [ Fri May 27 01:28:48 UTC 2016 ]
     0.0036 bind(3, 0x08065790, 32, SOV_SOCKBSD) = 0
     0.0036 listen(3, 1, SOV_DEFAULT) = 0
     0.2518 accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) = 4
    pollsys(0x08045680, 2, 0x00000000, 0x00000000) (sleeping...)

Client
    [root@kang ~]# truss -d -t open,connect,pollsys,read,write,close nc 10.88.88.140 8080
    Base time stamp:  1464312535.7634  [ Fri May 27 01:28:55 UTC 2016 ]
    ...
     0.0055 connect(3, 0x08066DD8, 16, SOV_DEFAULT) = 0
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

To emulate a power cycle, I use VMware's "reboot" function. Note that this is a real hard reset. Anything that leads to a graceful shutdown would look more like the first example.

Twenty minutes later, kang is still in the same state:

Client
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

We tend to think of TCP's job as maintaining an abstraction (the TCP connection) across multiple systems at all times, so cases where the abstraction breaks like this are surprising. And if you think this is some problem with nc(1), you are mistaken. netstat on kodos shows no connection to kang, yet kang shows a fully working connection to kodos:

Client
    [root@kang ~]# netstat -f inet -P tcp -n
    TCP: IPv4
       Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
    -------------------- -------------------- ----- ------ ----- ------ -----------
    10.88.88.139.50277   10.88.88.140.8080      32806      0 1049800      0 ESTABLISHED
    ...

If we leave things as they are, kang will never find out that kodos was reset.

Now suppose kang tries to send data to kodos. What happens?

Client
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
    kodos, are you there?
    3872.6918 pollsys(0x08045670, 2, 0x00000000, 0x00000000) = 1
    3872.6920 read(0, "kodos, are y".., 1024) = 22
    3872.6924 write(3, "kodos, are y".., 22) = 22
    3872.6932 pollsys(0x08045670, 2, 0x00000000, 0x00000000) = 1
    3872.6932 read(3, 0x08043670, 1024) Err#131 ECONNRESET
    3872.6933 close(3) = 0
    [root@kang ~]#

When I type the message and press Enter, nc on kang wakes up, reads the message from stdin, and sends it over the socket. The write call succeeds! nc goes back to poll to wait for the next event. It soon finds that the socket can be read without blocking, so it calls read. This time read fails with ECONNRESET. What does that mean? The read(2) man page says:

    [ECONNRESET]  A read was attempted on a socket and the connection was forcibly closed by its peer.

Another source gives a bit more detail:

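    ECONNRESET  The filedes argument refers to a connection-oriented socket, and the connection was forcibly closed by the peer. The connection is no longer valid. I/O can no longer be performed to filedes.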

This error does not indicate a problem specific to the read call. It only means that the socket has been disconnected. For that reason, most operations on the socket will now fail.

So what happened? When nc on kang tried to send data to kodos, the TCP stack on kang still did not know that the connection was dead. kang sent a data packet to kodos, which replied with an RST because it knew nothing about the connection. When kang received the RST, it tore down the connection. The kernel cannot close the socket's file descriptor on its own, because file descriptors don't work that way. Instead, subsequent operations on it fail with ECONNRESET until nc closes the descriptor.
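We can't pull the power cord in a unit test, but we can reproduce kang's symptom: the peer answers with an RST, and the next read fails with ECONNRESET. One common way to do this is to close the server socket with SO_LINGER enabled and a zero timeout. The kernel then aborts the connection with an RST instead of sending a FIN. This stands in for the power cycle; it is not the same event. The Python sketch below assumes Linux or a BSD-derived stack over loopback, and the function name is hypothetical.

```python
import socket
import struct


def read_after_reset() -> str:
    """Hypothetical demo: outcome of recv() after the peer sends an RST."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_conn, _ = listener.accept()

    # struct linger {l_onoff=1, l_linger=0}: close() aborts with an RST.
    server_conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                           struct.pack("ii", 1, 0))
    server_conn.close()
    listener.close()

    client.settimeout(5)  # avoid hanging forever if no RST ever arrives
    try:
        data = client.recv(1024)
        return "eof" if not data else "data"
    except ConnectionResetError:  # errno ECONNRESET
        return "reset"
    finally:
        client.close()
```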

Findings:

  1. A hard power cycle is very different from a graceful shutdown. When testing distributed systems, this scenario needs to be tested separately. Don't expect it to behave like an ordinary process kill.
  2. One side can believe that a TCP connection is established while the other side knows nothing about it, and this situation will never resolve itself automatically. You can handle such cases with keep-alives, either at the application level or with TCP keep-alive.
  3. The only reason kang found out that the remote side had gone away is that it sent data and received a response (the RST) indicating that there was no such connection.
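The second finding mentions TCP keep-alive. This sketch assumes Linux, where the per-socket options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT exist; other systems name or scale these differently, so the code skips any option the platform does not expose. The timing values are illustrative only, not recommendations.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keep-alive probes for an idle connection.

    idle:     seconds of inactivity before the first probe
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These tuning knobs are Linux-specific; skip them if unavailable.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

With keep-alive enabled, a connection like the one stuck on kang would not stay ESTABLISHED forever. If the probes go unanswered, it should eventually fail (typically with ETIMEDOUT). If the rebooted peer answers a probe with an RST, it should fail with ECONNRESET.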

Which raises the question: what if kodos does not respond to the data at all?

Puzzle 2: Power Off


What happens if one endpoint of a TCP connection disappears from the network for a while? Do the other nodes find out about it? If so, how, and when?

Set up the connection again with nc:

Server
    [root@kodos ~]# truss -d -t bind,listen,accept,poll,read,write nc -l -p 8080
    Base time stamp:  1464385399.1661  [ Fri May 27 21:43:19 UTC 2016 ]
     0.0030 bind(3, 0x08065790, 32, SOV_SOCKBSD) = 0
     0.0031 listen(3, 1, SOV_DEFAULT) = 0
    accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) (sleeping...)
     6.5491 accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) = 4
    pollsys(0x08045680, 2, 0x00000000, 0x00000000) (sleeping...)

Client
    [root@kang ~]# truss -d -t open,connect,pollsys,read,write,close nc 10.88.88.140 8080
    Base time stamp:  1464330881.0984  [ Fri May 27 06:34:41 UTC 2016 ]
    ...
     0.0057 connect(3, 0x08066DD8, 16, SOV_DEFAULT) = 0
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

Now abruptly power off kodos and try to send data from kang:

Client
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
    114.4971 pollsys(0x08045670, 2, 0x00000000, 0x00000000) = 1
    114.4974 read(0, "\n", 1024) = 1
    114.4975 write(3, "\n", 1) = 1
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

The write call completes normally, and then nothing happens for a long time. Only about five minutes later does this appear:

Client
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
    425.5664 pollsys(0x08045670, 2, 0x00000000, 0x00000000) = 1
    425.5665 read(3, 0x08043670, 1024) Err#145 ETIMEDOUT
    425.5666 close(3) = 0

This is very similar to the case where we power-cycled kodos instead of powering it off. There are two differences:

  1. it took about 5 minutes for kang to notice, and
  2. the error was ETIMEDOUT rather than ECONNRESET.

Note again that although it was read that reported the error, this is not a read timeout. We would see the same error from other socket operations, because the socket itself is now in a state where the connection has timed out. This happened because the remote side failed to acknowledge the data packet for too long: 5 minutes, according to this system's settings.

Findings:

  1. When the remote system is powered off rather than power-cycled, the first system can find out only by sending data. Otherwise it will never know that the connection is gone.
  2. When a system keeps trying to send data for too long without receiving an acknowledgment, TCP gives up and closes the connection, and all subsequent operations on the socket fail with ETIMEDOUT.
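The 5-minute limit in this puzzle comes from a system-wide setting. On Linux, an application can shorten it per socket with TCP_USER_TIMEOUT. That option does not exist on illumos or macOS, so treat Linux as an assumption here. It sets the maximum time, in milliseconds, that transmitted data may remain unacknowledged before the kernel closes the connection and reports ETIMEDOUT. A minimal sketch:

```python
import socket


def set_user_timeout(sock: socket.socket, timeout_ms: int) -> bool:
    """Cap how long sent data may stay unacknowledged (Linux only).

    Returns False when the platform does not expose TCP_USER_TIMEOUT.
    """
    if not hasattr(socket, "TCP_USER_TIMEOUT"):
        return False
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)
    return True
```

With a 30-second value, the failing read on kang in this puzzle would be expected to arrive roughly 30 seconds after the unacknowledged write, instead of about 5 minutes later. Python surfaces ETIMEDOUT as TimeoutError.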

Puzzle 3: A broken connection without a failure


This time, instead of describing a situation and asking what happens, I'll do the opposite. I'll describe an observation and see whether you can figure out how it could happen. We have discussed several cases in which kang believes it is connected to kodos while kodos knows nothing about the connection. Is it possible for kang to be connected to kodos indefinitely without kodos knowing about it (that is, the problem does not resolve itself)? And could that happen with no power-off or power cycle, and no other failure of the operating system on kodos or of the network equipment?

Hint: consider the case above, where the connection is stuck in ESTABLISHED. It's reasonable to expect the application to deal with that problem. It holds the socket open, so it can find out that the connection is broken by sending data. But what if the application no longer has the socket open?

In the warm-up we looked at the case where nc on kodos closed its socket. We said that nc on kang read 0 (the end-of-stream indicator) and closed its socket. Suppose instead that it left the socket open. Obviously, it could not read anything more from it. But nothing in TCP says you cannot send more data to a peer that has sent you a FIN. A FIN only closes the data stream in the direction in which it was sent.
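This half-close is easy to observe with shutdown(2), which sends a FIN without closing the descriptor. Here is a sketch over loopback TCP in Python; the function name is hypothetical.

```python
import socket


def half_close_demo() -> tuple[bytes, bytes]:
    """Show that data can still flow toward the side that sent the FIN."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_conn, _ = listener.accept()
    listener.close()

    # The "server" sends a FIN: only its outgoing direction is closed.
    server_conn.shutdown(socket.SHUT_WR)

    eof = client.recv(1024)          # b"": end-of-stream, client in CLOSE_WAIT
    client.sendall(b"still here")    # the other direction is still open
    reply = server_conn.recv(1024)   # the FIN sender can still read

    client.close()
    server_conn.close()
    return eof, reply
```

The difference from the puzzle is that here the server still holds its descriptor. On kodos, nc exited, so nothing is left to read the data kang sends.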

To demonstrate this, we can't use nc on kang, because it closes the socket automatically after reading 0. So I wrote a demo version of nc, called dnc, that skips that step. dnc also prints the system calls it makes, which lets us follow the TCP state.

First, set up the connection:

Server
    [root@kodos ~]# truss -d -t bind,listen,accept,poll,read,write nc -l -p 8080
    Base time stamp:  1464392924.7841  [ Fri May 27 23:48:44 UTC 2016 ]
     0.0028 bind(3, 0x08065790, 32, SOV_SOCKBSD) = 0
     0.0028 listen(3, 1, SOV_DEFAULT) = 0
    accept(3, 0x08047B2C, 0x08047C2C, SOV_DEFAULT, 0) (sleeping...)
     1.9356 accept(3, 0x08047B2C, 0x08047C2C, SOV_DEFAULT, 0) = 4
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)

Client
    [root@kang ~]# dnc 10.88.88.140 8080
    2016-05-27T08:40:02Z: establishing connection
    2016-05-27T08:40:02Z: connected
    2016-05-27T08:40:02Z: entering poll()

Now confirm that the connection is ESTABLISHED on both sides:

Server
    [root@kodos ~]# netstat -f inet -P tcp -n
    TCP: IPv4
       Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
    -------------------- -------------------- ----- ------ ----- ------ -----------
    10.88.88.140.8080    10.88.88.139.37259   1049792      0 1049800      0 ESTABLISHED

Client
    [root@kang ~]# netstat -f inet -P tcp -n
    TCP: IPv4
       Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
    -------------------- -------------------- ----- ------ ----- ------ -----------
    10.88.88.139.37259   10.88.88.140.8080      32806      0 1049800      0 ESTABLISHED

On kodos, kill the nc process with CTRL-C:

Server
    pollsys(0x08045670, 2, 0x00000000, 0x00000000) (sleeping...)
    ^C[root@kodos ~]#

On kang, we immediately see:

Client
    2016-05-27T08:40:12Z: poll returned events 0x0/0x1
    2016-05-27T08:40:12Z: reading from socket
    2016-05-27T08:40:12Z: read end-of-stream from socket
    2016-05-27T08:40:12Z: read 0 bytes from socket
    2016-05-27T08:40:12Z: entering poll()

Now let's look at the state of the TCP connections:

Server
    [root@kodos ~]# netstat -f inet -P tcp -n
    TCP: IPv4
       Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
    -------------------- -------------------- ----- ------ ----- ------ -----------
    10.88.88.140.8080    10.88.88.139.37259   1049792      0 1049800      0 FIN_WAIT_2

Client
    [root@kang ~]# netstat -f inet -P tcp -n
    TCP: IPv4
       Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
    -------------------- -------------------- ----- ------ ----- ------ -----------
    10.88.88.139.37259   10.88.88.140.8080    1049792      0 1049800      0 CLOSE_WAIT

This makes sense: kodos sent a FIN to kang. FIN_WAIT_2 indicates that kodos received an ACK from kang for the FIN it sent. CLOSE_WAIT indicates that kang received a FIN but has not sent a FIN of its own. This is a perfectly normal TCP connection state that can last indefinitely. Imagine that kodos sent kang a request and did not plan to send anything else; kang could happily spend hours sending data back. Except that in our case, kodos has actually closed the socket.
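netstat -P tcp is the illumos form of the command. On Linux the same state information is exposed in /proc/net/tcp. This does not apply to the SmartOS machines used here, so treat Linux as an assumption. In that file the st column is a hex code, for example 08 for CLOSE_WAIT and 05 for FIN_WAIT2. The sketch below looks up a socket's state by port pair; the helper name is hypothetical.

```python
import socket
from typing import Optional

# Hex state codes used in /proc/net/tcp (from the kernel's tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def tcp_state(local_port: int, remote_port: int) -> Optional[str]:
    """State of the IPv4 TCP socket with these ports, or None if not found."""
    try:
        f = open("/proc/net/tcp")
    except FileNotFoundError:
        return None  # not Linux
    with f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            local, remote, st = fields[1], fields[2], fields[3]
            if (int(local.split(":")[1], 16) == local_port
                    and int(remote.split(":")[1], 16) == remote_port):
                return TCP_STATES.get(st, st)
    return None
```

To reproduce the kang side on loopback: connect, let the server close its socket, read the empty end-of-stream, and keep the client socket open. tcp_state(client_port, server_port) should then report CLOSE_WAIT.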

Let's wait a minute and check the state of the TCP connections again. It turns out that the connection has disappeared entirely on kodos but still exists on kang:

Client
    [root@kang ~]# netstat -f inet -P tcp -n
    TCP: IPv4
       Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
    -------------------- -------------------- ----- ------ ----- ------ -----------
    10.88.88.139.37259   10.88.88.140.8080    1049792      0 1049800      0 CLOSE_WAIT

We have run into a less well-known behavior of the TCP stack. Suppose the application has closed the socket, the stack has sent a FIN, and the remote stack has acknowledged it. The local stack then waits for a fixed period of time and closes the connection. Why? Because the remote side may have rebooted. This case is similar to the one where the connection is ESTABLISHED on one side and the other side does not know about it. The difference is that the application has closed the socket, so no other component is left to deal with the problem. As a result, the TCP stack waits for a fixed period and then closes the connection (without sending anything to the other side).
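The length of this fixed period is a tunable of the TCP stack. The article does not say which setting applies on SmartOS. On Linux, assumed here, orphaned connections stay in FIN_WAIT_2 for net.ipv4.tcp_fin_timeout seconds (60 by default), and the value can be read like this:

```python
from typing import Optional

FIN_TIMEOUT_PATH = "/proc/sys/net/ipv4/tcp_fin_timeout"


def fin_wait2_timeout_seconds() -> Optional[int]:
    """Linux's FIN_WAIT_2 timeout for orphaned sockets, or None if unknown."""
    try:
        with open(FIN_TIMEOUT_PATH) as f:
            return int(f.read().strip())
    except (FileNotFoundError, PermissionError):
        return None  # not Linux, or /proc is not readable
```

Linux also lets an application override this per socket with the TCP_LINGER2 option described in tcp(7).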

Follow-up question: what happens if kang now sends data to kodos? Remember that kang still believes the connection is open, even though it has already been torn down on kodos.

Client
    2016-05-27T08:40:12Z: entering poll()
    kodos, are you there?
    2016-05-27T08:41:34Z: poll returned events 0x1/0x0
    2016-05-27T08:41:34Z: reading from stdin
    2016-05-27T08:41:34Z: writing 22 bytes read from stdin to socket
    2016-05-27T08:41:34Z: entering poll()
    2016-05-27T08:41:34Z: poll returned events 0x0/0x10
    2016-05-27T08:41:34Z: reading from socket
    dnc: read: Connection reset by peer


This is the same thing we saw in Puzzle 1: write() succeeds because the TCP stack does not yet know that the connection is closed. But then the RST arrives and wakes the thread in poll(), and the subsequent read() returns ECONNRESET.

Findings:


Conclusion


TCP is usually presented to us as a protocol that maintains an abstraction, the "TCP connection", between two systems. We know that some software or network problems will cause connections to fail.



Source: https://habr.com/ru/post/316128/

