"Address Already in Use" or how to avoid problems when terminating a TCP connection

Correct shutdown

For a network connection to terminate correctly, both parties must send a packet with a termination signal (FIN), indicating that the sender will transmit no more data, and each party must acknowledge (ACK) the other party's FIN. A FIN is sent when the application calls close(), shutdown() or exit(). After close() returns, the kernel waits for the other party to acknowledge the FIN. This means that the process that initiated the shutdown can finish before the kernel has released the resources associated with the connection and made the port available for binding by another process (if we try to bind the port during this time, we get the “Address already in use” error).
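
As an illustration, here is a minimal Python sketch of an orderly close from the initiating side. It assumes an already connected stream socket; the helper name `close_gracefully` and the default timeout are hypothetical and not part of the original article. shutdown(SHUT_WR) sends our FIN, an empty result from recv() means the peer's FIN has arrived, and close() then releases the descriptor.

```python
import socket


def close_gracefully(sock: socket.socket, timeout: float = 5.0) -> bytes:
    """Send FIN, collect whatever the peer still sends, then close.

    Returns the bytes that arrived after our side stopped sending.
    """
    leftover = bytearray()
    try:
        sock.shutdown(socket.SHUT_WR)  # our FIN: no more data from this side
        sock.settimeout(timeout)       # do not wait forever for the peer
        while True:
            chunk = sock.recv(4096)
            if not chunk:              # b"" means the peer's FIN has arrived
                break
            leftover += chunk
    except OSError:
        # Timeout or connection reset: give up waiting and close anyway.
        pass
    finally:
        sock.close()
    return bytes(leftover)
```

Note that the side calling this helper is the one that closes first, so it is also the side that ends up in TIME_WAIT.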

In the figure:

A connection is established; both sides are in the ESTABLISHED state
The client initiates the shutdown: it sends a FIN to the server and enters the state of waiting for the server's response (FIN_WAIT_1)
The server receives the FIN, sends an acknowledgment (ACK), and enters the CLOSE_WAIT state, where it stays until the server application calls close()
The server sends its own FIN to the client, waits for the client's acknowledgment (ACK), and then goes to CLOSED without passing through TIME_WAIT
Meanwhile, the client can receive the two signals in either order:

ACK - the client receives confirmation that the server has seen its request to close the connection
- The client enters the state of waiting for the server's FIN (FIN_WAIT_2)
- The client receives the server's FIN, sends an acknowledgment (ACK), waits for some time (TIME_WAIT), and then the connection is closed and the kernel frees its resources (CLOSED)
FIN - the client receives the server's FIN before it receives the server's acknowledgment (ACK) of its own FIN
1. The client acknowledges the server's FIN and enters the CLOSING state
2. The client then receives the server's ACK of its own FIN (the one the server sent right after receiving the client's FIN), waits for some time (TIME_WAIT), and the kernel releases the resources (CLOSED)

The figure shows all the states that can occur during a correct shutdown, depending on the order in which the FIN and ACK packets arrive from the remote side. Note that the side that initiates the shutdown (the left half of the figure) passes through a TIME_WAIT state that the other side (the right half of the figure) does not have. The TIME_WAIT state is needed in case the acknowledgment (ACK) you sent was not received by the other side, or in case stray packets show up for some other reason. I do not know why this state is not needed on the other side, but when the remote end initiates the shutdown, no such wait is required. The TIME_WAIT state can hold the port for several minutes after the process has finished. The hold time depends on the operating system and is dynamic on some systems; typical values range from 1 to 4 minutes.

If both parties send their FIN before receiving the FIN from the other side, both of them have to go through TIME_WAIT.
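
The state sequences described in this section can be written down as a small transition table. The Python sketch below is only an illustration: the event names ("close", "FIN", "ACK", "timeout") and the `walk` helper are made up for this example, and LAST_ACK is the standard TCP name for the state the passive side is in after it has sent its own FIN.

```python
# (current state, event) -> next state
# "close"   = the local application calls close() or shutdown()
# "FIN"     = a FIN arrives from the peer (and is acknowledged)
# "ACK"     = the peer acknowledges our FIN
# "timeout" = the TIME_WAIT timer expires
TRANSITIONS = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",  # active close: we send FIN
    ("FIN_WAIT_1", "ACK"): "FIN_WAIT_2",     # our FIN was acknowledged
    ("FIN_WAIT_1", "FIN"): "CLOSING",        # the peer's FIN arrived first
    ("FIN_WAIT_2", "FIN"): "TIME_WAIT",
    ("CLOSING", "ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timeout"): "CLOSED",
    ("ESTABLISHED", "FIN"): "CLOSE_WAIT",    # passive close: the peer sent FIN
    ("CLOSE_WAIT", "close"): "LAST_ACK",     # we send our own FIN
    ("LAST_ACK", "ACK"): "CLOSED",           # no TIME_WAIT on this side
}


def walk(events, state="ESTABLISHED"):
    """Return the list of states visited while processing the events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


active = walk(["close", "ACK", "FIN", "timeout"])
passive = walk(["FIN", "close", "ACK"])
simultaneous = walk(["close", "FIN", "ACK", "timeout"])
```

Here `active` and `simultaneous` both pass through TIME_WAIT, while `passive` goes from LAST_ACK straight to CLOSED.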

The correct shutdown of the listening party

A listening socket can be closed immediately. If there are no incoming connections, its state goes straight to CLOSED. If there are incoming connections, it goes to FIN_WAIT_1 and then to TIME_WAIT.

Note that a clean close cannot be guaranteed on the listening side. Although you can check for incoming connections with select() before closing, there is a tiny but real possibility that a connection arrives after the select() call and before the close() call.

Unexpected remote side shutdown

If the remote application (the server) terminates unexpectedly, the local side has to initiate the close of the connection, and in this case TIME_WAIT is inevitable. If the remote side disappears because of a network failure or a reboot of the machine (both are rare), the local port remains tied up until the relevant state timeouts expire. Worse, some older operating systems do not implement a timeout for the FIN_WAIT_2 state, so a connection can stay in it indefinitely; in that case only a reboot of the system helps.

If the local application (the client) dies while a connection is active, the port stays busy until the TIME_WAIT state ends. The same is true if the application dies while a connection to the remote side is still pending.

Ways to avoid problems

The SO_REUSEADDR option

You can use setsockopt() to set the SO_REUSEADDR option, which allows a process to bind to a port that is still in the TIME_WAIT state (only one process at a time is still allowed to be bound to the port). This is the simplest and most effective way to avoid the “address already in use” message.
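
A minimal sketch of this in Python, assuming a plain IPv4 TCP server; the function name `make_listener` and its parameters are chosen for the example. The important detail is that the option is set before bind().

```python
import socket


def make_listener(port: int, host: str = "", backlog: int = 5) -> socket.socket:
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Has to be set before bind(); it lets bind() succeed while old
        # connections on this port are still in TIME_WAIT.
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(backlog)
    except OSError:
        srv.close()
        raise
    return srv
```

A server restarted right after a shutdown would call something like `make_listener(8080)` and, on most systems, would not have to wait for the old connections to leave TIME_WAIT.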

Oddly enough, though, using SO_REUSEADDR can lead to “address already in use” errors that are harder to track down. SO_REUSEADDR lets you use a port that is stuck in TIME_WAIT, but you still cannot use that port to connect to the last place it was connected to.

WHAT?

Suppose I use local port 1010 to connect to port 300 of the foobar.com server and then close the connection locally, so that the port goes into the TIME_WAIT state. I can immediately reuse port 1010 for any connection except a connection to foobar.com on port 300.
Here is a situation in which this can cause a problem: my program tries to find a free reserved local port (<1024) to bind to, in order to connect to a service that requires a reserved port. If I use SO_REUSEADDR, then every time I start the program on my machine I get the same reserved port, even if it is stuck in TIME_WAIT, and I risk getting “Address already in use” when I connect to the place where that port was last used. In this case, the solution is to stop using SO_REUSEADDR.
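
A sketch of such a port search in Python, deliberately without SO_REUSEADDR. The helper name `bind_first_free` is hypothetical. On most systems bind() then fails with EADDRINUSE for ports that are bound or still held in TIME_WAIT, so the loop simply moves on to the next candidate.

```python
import errno
import socket


def bind_first_free(ports, host: str = "") -> socket.socket:
    """Return a TCP socket bound to the first port in `ports` that is free.

    SO_REUSEADDR is intentionally not set, so busy ports are skipped.
    """
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise  # e.g. a permission error: not something to skip
    raise OSError(errno.EADDRINUSE, "no free port among the candidates")
```

For reserved ports the call could look like `bind_first_free(range(1023, 511, -1))`, which normally requires root privileges; the returned socket is then used for connect().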

Some people do not like SO_REUSEADDR because the option has security issues. On some operating systems it allows different processes to use the same port at the same time. This is a problem because most servers bind to a port without specifying an address; instead they use INADDR_ANY (the netstat command displays them as *.8080). So if the server is bound to *.8080, another process belonging to another user of the local machine (whose intentions may not be good at all) can bind to local_machine.8080 and intercept all your connections, because it specified a more specific address. This problem exists only on multi-user systems that do not restrict accounts, and it is not a vulnerability that can be exploited from outside the local machine; it is easily avoided by binding to a specific machine address (without using INADDR_ANY).
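
In code, the difference is only in the address passed to bind(). A short Python sketch, where the function name `listen_on` is hypothetical and the address is whatever address your server is meant to be reached at:

```python
import socket


def listen_on(address: str, port: int) -> socket.socket:
    """Listen on one specific local address rather than on INADDR_ANY."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # An explicit address goes here; an empty string would mean
        # INADDR_ANY, which netstat shows as *.port.
        srv.bind((address, port))
        srv.listen(5)
    except OSError:
        srv.close()
        raise
    return srv
```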

Others do not like the fact that the kernel spends resources on hundreds or even thousands of connections in the TIME_WAIT state; this problem can also be avoided with the approach described below.

Client disconnects first

Looking at the figure above, we see that the TIME_WAIT state is avoided when the shutdown is initiated by the remote side, so the server can avoid the problem by letting the client initiate the shutdown first. To do this, design the application protocol so that the client knows when it should initiate the close. The server can safely close once it receives EOF from the client, but it still has to set a timeout while waiting for the client to shut down, in case the client never closes the connection. Waiting a few seconds is almost always enough for the connection to be closed correctly.
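
On the server side this could look roughly like the following Python sketch. The helper name `wait_for_peer_close` and the five-second default are assumptions made for the example; the server calls it once it has sent its last reply.

```python
import socket
import time


def wait_for_peer_close(conn: socket.socket, timeout: float = 5.0) -> bool:
    """Wait for EOF from the peer, then close the connection.

    Returns True if the peer closed first, False if the deadline passed
    (or the connection failed) and we had to close on our own.
    """
    deadline = time.monotonic() + timeout
    peer_closed_first = False
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                   # the peer never closed: give up
            conn.settimeout(remaining)
            if conn.recv(4096) == b"":  # EOF: the peer's FIN has arrived
                peer_closed_first = True
                break
            # Any late data from the peer is read and discarded here.
    except OSError:
        pass                            # timeout or reset
    finally:
        conn.close()
    return peer_closed_first
```

When the function returns True, the remote side initiated the shutdown, so TIME_WAIT ends up on the remote machine rather than on the server.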

It probably makes sense to call this concept “the remote side disconnects first”; otherwise it depends on what we call the client and what we call the server. If you are developing a system in which several client programs run on the same machine and access different servers, you will want to shift the responsibility for disconnecting to the servers in order to save the resources of the client machine.

For example, I wrote a script that uses the remote shell (rsh) to communicate with all the machines on my network. It works in parallel and keeps several connections open at all times. rsh uses free source ports below 1024. At first I used the rsh -n command, which causes the local side to close first. After several tests, every free port below 1024 was stuck in TIME_WAIT and the script could not proceed. Removing the -n option makes the remote side close first, which resolves the TIME_WAIT problem; however, rsh may then hang waiting for input. And if you close the input locally, the port again ends up in the TIME_WAIT state. In the end I simply gave up on rsh and wrote my own implementation in Perl (the current version can be downloaded here).

Timeout reduction

If for some reason none of the above options suits you, you can shorten the TIME_WAIT timeout. Whether this is possible, and how it is done, depends on the operating system you are using. Keep in mind that a timeout that is too short can have negative consequences, particularly with packet loss or on congested networks.

Source: https://habr.com/ru/post/173415/
