Non-blocking TCP server without undocumented features

Introduction

A great article on trapexit, “Building a Non-blocking TCP server using OTP principles”, explains how to build a non-blocking TCP server using OTP principles. I think everyone who has started learning Erlang comes across this article sooner or later. To build a non-blocking TCP server, that article relies on undocumented functionality from the prim_inet module.

I will not argue about whether using undocumented features is good or bad. In some stopgap solutions it really is necessary, but in production I would prefer to use proven tools. Note that even in the article itself the author warns: "This is a non-documented property. OTP team is free to change this implementation, we will exploit this functionality in the construction of our server [1]."

By a non-blocking server, we mean that the listening process and the FSM should not make any blocking calls and quickly respond to incoming messages (for example, changes in configuration, restart, etc.) without causing timeouts [2].

Regarding the quotation above: problems can arise with the listening process if it carries additional functional load (for example, if it exposes user-facing APIs that have to be called while it is running), and the FSM, for architectural reasons, must not contain blocking calls. If, however, the listener's only job is to listen, there is nothing wrong with it being blocked while it waits for a connection. If this part of the system has to be restarted, the supervisor will forcibly stop it after a predetermined timeout and then restart it (correct me if I am wrong). Problems may also arise during a hot code upgrade (the author has not checked what pitfalls can be hit in that case; if you have tried it, please share your experience).
Our task is to implement a non-blocking TCP server using only documented methods.

Server structure

The first thing that comes to mind about the task at hand is to implement the connection wait in a separate process. Thus, the server structure can be represented as follows.

Picture 1

1. application_master:main_loop/2
2. application_master:loop_it/4

When an application is started, an application_master is created. Logically it is a single process, but physically two processes are created. The application master is the group leader for all processes in the application.

3. Supervisor of our TCP server (supervisor)
4. A listener (gen_server) that spawns the listening process (plain process)
5. Supervisor of client processes (supervisor)
6. The listening process (plain process)
7. Client processes (gen_fsm)
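To make items 3 to 5 more concrete, here is a minimal sketch of what the top-level supervisor might look like. It is not the author's source. The module name tcp_server_sup appears in the build output later in the article, but the port argument, the restart intensity and the shutdown values are assumptions chosen for illustration. The shutdown value is the timeout after which the supervisor kills a child that has not stopped on its own.

```erlang
-module(tcp_server_sup).
-behaviour(supervisor).

-export([start_link/1]).
-export([init/1]).

%% Port is passed in by the caller; the article does not say which port is used.
start_link(Port) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, [Port]).

init([Port]) ->
    %% Assumed restart limits: at most 5 restarts within 60 seconds.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 60},
    %% The client supervisor is started first so that it already exists
    %% when the listener accepts its first connection.
    ClientSup = #{id => tcp_client_sup,
                  start => {tcp_client_sup, start_link, []},
                  restart => permanent,
                  shutdown => infinity,
                  type => supervisor},
    %% shutdown => 2000: on stop, the supervisor sends exit(Pid, shutdown),
    %% waits up to 2000 ms, then kills the child unconditionally.
    Listener = #{id => tcp_listener,
                 start => {tcp_listener, start_link, [Port]},
                 restart => permanent,
                 shutdown => 2000,
                 type => worker},
    {ok, {SupFlags, [ClientSup, Listener]}}.
```

Map-based child specifications require OTP 18 or later; on older releases the same information is written as tuples.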

Source

I see no point in listing the source code for every part of the system, so we will look only at the tcp_listener module and the process it starts.

    -module(tcp_listener).
    -behaviour(gen_server).

    -export([start_link/1]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
             terminate/2, code_change/3]).
    -export([accept_func/1]).

    -define(SERVER, ?MODULE).
    %% 1.
    -define(LOGIC_MODULE, tcp_fsm).

    %% 2.
    -record(state, {
        listener, %% Listening socket
        module    %% FSM handling module
    }).

    start_link(Port) ->
        gen_server:start_link({local, ?SERVER}, ?MODULE, [Port], []).

    init([Port]) ->
        Options = [{packet, raw}, {active, once}, {reuseaddr, true}],
        case gen_tcp:listen(Port, Options) of
            {ok, LSocket} ->
                %% Create first accepting process
                %% 3.
                spawn_link(?MODULE, accept_func, [LSocket]),
                {ok, #state{listener = LSocket, module = ?LOGIC_MODULE}};
            {error, Reason} ->
                error_logger:error_msg("Error: ~p~n", [Reason]),
                {stop, Reason}
        end.

    handle_call(_Request, _From, State) ->
        Reply = ok,
        {reply, Reply, State}.

    handle_cast(_Msg, State) ->
        {noreply, State}.

    handle_info(_Info, State) ->
        {noreply, State}.

    terminate(_Reason, #state{listener = LSocket} = _State) ->
        gen_tcp:close(LSocket),
        ok.

    code_change(_OldVsn, State, _Extra) ->
        {ok, State}.

    accept_func(LSocket) ->
        %% 4.
        {ok, Socket} = gen_tcp:accept(LSocket),
        error_logger:info_msg("Accept connection: ~p.\n", [Socket]),
        %% 5.
        {ok, Pid} = tcp_client_sup:start_child(),
        %% 6.
        ok = gen_tcp:controlling_process(Socket, Pid),
        %% 7.
        tcp_fsm:set_socket(Pid, Socket),
        %% 8.
        accept_func(LSocket).

1. A macro naming the module that handles client connections.
2. A record for storing the state of the gen_server.
3. Spawn an additional process that will “listen”.
4. Wait for a connection.
5. Create a gen_fsm (the tcp_fsm module) to handle the connection with the client.
6. Make the process created in step 5 the controlling process of the socket.
7. Pass the socket to the tcp_fsm module.
8. Start “listening” again.
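Step 5 calls tcp_client_sup:start_child/0, whose source is not shown in the article. A minimal sketch of such a supervisor, assuming a simple_one_for_one strategy and a hypothetical tcp_fsm:start_link/0 that takes no arguments, could look like this. The restart limits and the temporary restart type are assumptions as well.

```erlang
-module(tcp_client_sup).
-behaviour(supervisor).

-export([start_link/0, start_child/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Called by the accepting process once per new connection.
%% Returns {ok, Pid} of the freshly started client handler.
start_child() ->
    supervisor:start_child(?MODULE, []).

init([]) ->
    %% simple_one_for_one: every child is started from the same spec, on demand.
    SupFlags = #{strategy => simple_one_for_one, intensity => 10, period => 60},
    %% temporary: a handler that dies is not restarted, because its
    %% client connection is gone anyway.
    ChildSpec = #{id => tcp_fsm,
                  start => {tcp_fsm, start_link, []},   % assumed entry point
                  restart => temporary,
                  shutdown => 2000,
                  type => worker,
                  modules => [tcp_fsm]},
    {ok, {SupFlags, [ChildSpec]}}.
```

With simple_one_for_one, the empty list passed to supervisor:start_child/2 is appended to the argument list in the child spec, so each call starts one more tcp_fsm process.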

Testing

    (emacs@host)2> make:all().
    Recompile: tcp_server_sup
    Recompile: tcp_listener
    Recompile: tcp_fsm
    Recompile: tcp_client_sup
    Recompile: erltcps
    up_to_date
    (emacs@host)3> code:add_path("../ebin").
    true
    (emacs@host)4> application:load(erltcps).
    ok
    (emacs@host)5> application:start(erltcps).
    ok
    (emacs@host)6>
    =INFO REPORT==== 22-Jun-2011::13:10:07 ===
    Accept connection: #Port<0.2353>.
    (emacs@host)6>
    =INFO REPORT==== 22-Jun-2011::13:10:07 ===
    IP: {127,0,0,1}
    (emacs@host)6>
    =INFO REPORT==== 22-Jun-2011::13:10:15 ===
    <<"hello\r\n">>
    (emacs@host)6>
    =INFO REPORT==== 22-Jun-2011::13:10:23 ===
    {127,0,0,1} Client disconnected.
    (emacs@host)6>
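The article does not say which client was used to produce the reports above. As a rough sketch, a connection that sends the same "hello\r\n" line could be made from a second Erlang shell with a small helper like the one below. The module name tcp_test_client is hypothetical, and the port has to be supplied by the caller because the article does not state it.

```erlang
-module(tcp_test_client).
-export([hello/1]).

%% Connect to a server on localhost, send one line and disconnect.
%% Port is whatever port the server was started on.
hello(Port) ->
    Options = [binary, {packet, raw}, {active, false}],
    {ok, Socket} = gen_tcp:connect({127,0,0,1}, Port, Options),
    ok = gen_tcp:send(Socket, <<"hello\r\n">>),
    gen_tcp:close(Socket).
```

Calling tcp_test_client:hello(Port) with the server's port should make the server log an accepted connection, the received data and a disconnect, in that order.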

Conclusions

As a result, we have built the skeleton of a TCP server that is, so to speak, non-blocking. In our implementation one dedicated process remains blocked; its only job is to wait for a connection and create a process to handle it. Additional logic can be added to the tcp_listener module itself (for example, starting and stopping the acceptance of connections by stopping the listening process).
Pros:

We did not use undocumented features, which can cost us dearly in production.
Only a specially created process remains blocked.

Cons:

Our OTP application contains a process that is not built according to OTP principles.
If the listening process (accept_func/1 in the tcp_listener module) crashes, the exit signal propagates over the link and tcp_listener goes down as well. The supervisor then restarts tcp_listener, which in turn spawns a new listening process running accept_func/1.

These two cons are related, and each of them has a solution. Here are a couple of tasks for readers:
1. What should be done to keep tcp_listener from crashing if the listening process (accept_func/1) crashes?
2. What needs to be added to make plain processes safer to use in an OTP application?
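For the first task, one possible direction (not necessarily the one the author has in mind) is to have the gen_server trap exits and respawn the listening process itself. The sketch below is a stripped-down variant of tcp_listener under that assumption. The module name tcp_listener_sketch, the acceptor field and the acceptor call are hypothetical additions, and the listen socket uses {active, false} for simplicity.

```erlang
-module(tcp_listener_sketch).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).
-export([accept_loop/1]).

-record(state, {listener, acceptor}).

start_link(Port) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [Port], []).

init([Port]) ->
    %% Exit signals from linked processes now arrive as
    %% {'EXIT', Pid, Reason} messages instead of killing this process.
    process_flag(trap_exit, true),
    case gen_tcp:listen(Port, [{packet, raw}, {active, false}, {reuseaddr, true}]) of
        {ok, LSocket} ->
            {ok, #state{listener = LSocket, acceptor = spawn_acceptor(LSocket)}};
        {error, Reason} ->
            {stop, Reason}
    end.

%% Hypothetical helper call: returns the pid of the current acceptor.
handle_call(acceptor, _From, State) ->
    {reply, State#state.acceptor, State};
handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The acceptor died: log it and start a replacement on the same socket.
handle_info({'EXIT', Pid, Reason}, #state{listener = LSocket, acceptor = Pid} = State) ->
    error_logger:error_msg("Acceptor ~p exited: ~p~n", [Pid, Reason]),
    {noreply, State#state{acceptor = spawn_acceptor(LSocket)}};
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, #state{listener = LSocket}) ->
    gen_tcp:close(LSocket),
    ok.

spawn_acceptor(LSocket) ->
    spawn_link(?MODULE, accept_loop, [LSocket]).

%% Same accept loop as accept_func/1 in the article.
accept_loop(LSocket) ->
    {ok, Socket} = gen_tcp:accept(LSocket),
    {ok, Pid} = tcp_client_sup:start_child(),
    ok = gen_tcp:controlling_process(Socket, Pid),
    tcp_fsm:set_socket(Pid, Socket),
    accept_loop(LSocket).
```

This sketch respawns unconditionally, so an acceptor that crashes immediately every time would be restarted in a tight loop. A real implementation would need some limit on restarts, which is roughly what a supervisor provides and hints at the second task.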

Download

The source code for the article can be downloaded from GitHub.

What to read?

1. Building a Non-blocking TCP server using OTP principles
2. Creating a non-blocking TCP server using OTP principles
3. Erlang questions mailing list ~ prim_inet
4. Excellent documentation
5. Erlang applications

Source: https://habr.com/ru/post/120815/
