Basic API sets for implementing transparent proxying services

One of the most important parts of any corporate data leak prevention system is the module that analyzes outgoing network traffic. Most often this module is implemented as a transparent proxy service, i.e. a service that sits "transparently" between the network application and the target server and intercepts the data flowing between them.

This article describes the transparent proxy service itself and how to implement traffic proxying. It does not cover how network traffic is redirected to the service, although that is also an interesting technical problem.

Since target applications can use any port, including non-standard ones, all traffic has to be processed. "High-performance" network applications can create more than 100 connections per second, so the transparent proxy service must be as efficient as possible. The general algorithm of the service is as follows:

1. Accept the redirected connection.
2. Get information about where the "proxied" connection needs to be established.
3. Create a connection to the server (from step 2).
4. Get data from the application and transfer it to the server.
5. Get data from the server and transfer it to the application.
6. Repeat steps 4 and 5 until either the server or the application closes the connection.
7. Close the "pair" connection.
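
To make steps 4 to 7 concrete, here is a minimal, single-threaded sketch of the relay loop. It does not use real sockets: `Endpoint` is a hypothetical in-memory stand-in for one side of a connection, and it assumes that a side with nothing left to send closes the connection. A real service would also handle both directions concurrently rather than strictly in turn.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for one side of a proxied connection.
struct Endpoint {
    std::deque<std::string> to_send;     // chunks this side still wants to send
    std::vector<std::string> delivered;  // chunks the proxy forwarded to this side
    bool open = true;

    // Fetches the next chunk. Assumption of this model: a side that has
    // nothing left to send closes the connection.
    bool Receive(std::string& chunk) {
        if (to_send.empty()) {
            open = false;
            return false;
        }
        chunk = std::move(to_send.front());
        to_send.pop_front();
        return true;
    }

    void Send(const std::string& chunk) { delivered.push_back(chunk); }
};

// Steps 4-7 of the algorithm for one accepted connection.
inline void Relay(Endpoint& app, Endpoint& server) {
    std::string chunk;
    while (true) {
        if (!app.Receive(chunk)) break;     // the application closed the connection
        server.Send(chunk);                 // step 4
        if (!server.Receive(chunk)) break;  // the server closed the connection
        app.Send(chunk);                    // step 5
    }
    // Step 7: whichever side closed first, close the "pair" as well.
    app.open = false;
    server.open = false;
}
```

The rest of the article is about replacing the blocking `Receive`/`Send` pair of this model with efficient asynchronous Windows APIs.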

What APIs in the Microsoft Windows operating system can help solve this problem?

Sockets + WSA events

To organize proxying using this API, do the following:

1. Create a socket

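 // "::" selects the Winsock function; the local variable of the same name would otherwise hide it
 SOCKET socket = ::socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);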

2. Create an event that will monitor the change of the socket state

 WSAEVENT sock_event = WSACreateEvent();

3. Associate the socket with the event and specify which socket state changes we are interested in. When relaying traffic, we need to know when data is available for reading (FD_READ) and when the socket is ready to send more data (FD_WRITE). We also need to know when the connection is closed (FD_CLOSE), since that is the signal to stop processing traffic

 WSAEventSelect(socket, sock_event, FD_READ|FD_WRITE|FD_CLOSE);

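4. Initiate data reading from the socket. WSAEventSelect switches the socket to non-blocking mode, so recv returns immediately; if no data has arrived yet, it fails with WSAEWOULDBLOCK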

 int res = recv(socket, buf, buf_len, 0);

5. Wait for socket state changes. Most often a separate thread is started for this, and the wait function is called in it

 // The second argument is a pointer to an array of events
 DWORD res = WSAWaitForMultipleEvents(1, &sock_event, FALSE, WSA_INFINITE, FALSE);

The event associated with the socket becomes signaled when data is received, when data has been sent, or when the connection is closed. I/O errors also cause the event to be signaled.

6. Check exactly how the socket state has changed and handle it accordingly

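 WSANETWORKEVENTS wsaNetworkEvents;
 WSAEnumNetworkEvents(socket, sock_event, &wsaNetworkEvents);
 if (wsaNetworkEvents.lNetworkEvents & FD_READ) {
     // Data has arrived: process it and pass it on to the "pair"
     ProcessReceivedData();
 }
 if (wsaNetworkEvents.lNetworkEvents & FD_WRITE) {
     // The socket can send again: continue the exchange with the "pair"
     IssuerRead();
 }
 if (wsaNetworkEvents.lNetworkEvents & FD_CLOSE) {
     // The connection was closed (by the application or by the server):
     // close the "pair" connection as well
     ClosePeer();
 }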

Advantages and disadvantages

What pitfalls await us when using this API:

Programs can open dozens of connections at the same time, and the transparent proxy service has to create twice as many sockets, since it needs two sockets for each connection of the program. The WSAWaitForMultipleEvents function cannot wait on more than 64 objects (WSA_MAXIMUM_WAIT_EVENTS) at a time, so you need to run several wait threads and somehow distribute the sockets between them.
Long-running data processing in one of the wait threads can keep events from the other sockets served by that thread from being handled. To solve this problem, you need to run separate data processing threads and monitor their load.
Retrieving data from a socket requires calling three functions: recv, WSAWaitForMultipleEvents, and WSAEnumNetworkEvents. Each of them potentially switches to kernel mode, which is a rather costly operation.
If the pool of threads that wait for socket events and process data is implemented inefficiently, adding computing resources (processor cores) will not make connections proxy any faster, and for terminal servers this scalability is very important.
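
The arithmetic behind the 64-object limit can be sketched as follows. This is only an illustration in portable C++: `WaitThreadsNeeded` and `PartitionEvents` are hypothetical helper names, and the sketch assumes one event per socket and no slots reserved for other purposes (such as a wake-up event per thread).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same value as WSA_MAXIMUM_WAIT_EVENTS in Winsock.
constexpr std::size_t kMaxWaitEvents = 64;

// Number of wait threads needed for a given number of proxied connections,
// assuming two sockets (and therefore two events) per connection.
inline std::size_t WaitThreadsNeeded(std::size_t connections) {
    const std::size_t events = connections * 2;
    return (events + kMaxWaitEvents - 1) / kMaxWaitEvents;  // round up
}

// Splits event indices 0..event_count-1 into consecutive groups,
// one group per wait thread, none larger than kMaxWaitEvents.
inline std::vector<std::vector<std::size_t>> PartitionEvents(std::size_t event_count) {
    std::vector<std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < event_count; ++i) {
        if (groups.empty() || groups.back().size() == kMaxWaitEvents) {
            groups.emplace_back();  // current group is full: start a new one
        }
        groups.back().push_back(i);
    }
    return groups;
}
```

For example, 100 simultaneous connections mean 200 sockets, which already require four wait threads under these assumptions.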

Thus, this API is not well suited to implementing an efficient transparent proxy. Consider another set of APIs.

Overlapped I/O + Thread Pool + Completion Ports

1. Create a socket. To perform asynchronous operations, we now also need a context structure that describes an asynchronous operation. Its first member must be the standard OVERLAPPED structure: the completion callback receives only a pointer to OVERLAPPED, and with this layout that pointer is also a pointer to our context.

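 struct AsyncOperationContext {
     // Must be the first member, so that a pointer to it is also a pointer to the whole structure
     OVERLAPPED ov;
     // Function that handles completion of the operation
     CALLBACK_FUNC pfFunc;
     // Context passed to that function
     PVOID pContext;
 };

 SOCKET sock = ::WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, WSA_FLAG_OVERLAPPED);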

2. Bind the socket to an I/O completion port whose events are processed by the system thread pool. Since an asynchronous operation is started with a pointer to an OVERLAPPED structure, nothing prevents us from allocating extra memory for our own needs along with that structure. The I/O completion callback receives the address of exactly this structure.

 BindIoCompletionCallback((HANDLE)sock, IoCompletionRoutine, 0);

 VOID CALLBACK IoCompletionRoutine(DWORD error, DWORD bytes, LPOVERLAPPED ov)
 {
     // OVERLAPPED is the first member of AsyncOperationContext, so the pointer can be cast back
     AsyncOperationContext* actx = reinterpret_cast<AsyncOperationContext*>(ov);
     actx->pfFunc(actx->pContext, error, bytes);
 }
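
The "OVERLAPPED as the first member" trick can be tried out without the Windows headers. The sketch below is an illustration only: `FakeOverlapped` is a hypothetical stand-in for the real OVERLAPPED structure, and `CompletionRoutine` and `RecordBytes` are made-up names that mimic what the completion callback does. The two `static_assert` checks state the layout assumptions that make the cast valid.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the Windows OVERLAPPED structure.
struct FakeOverlapped {
    unsigned long internal;
    unsigned long internal_high;
    void* event;
};

using CallbackFunc = void (*)(void* context, unsigned long error, unsigned long bytes);

struct AsyncOperationContext {
    FakeOverlapped ov;  // must stay the first member
    CallbackFunc func;  // function that handles completion
    void* context;      // argument passed to that function
};

// The cast below is only valid if the structure is standard-layout
// and the OVERLAPPED part sits at offset 0.
static_assert(std::is_standard_layout<AsyncOperationContext>::value,
              "context must be a standard-layout type");
static_assert(offsetof(AsyncOperationContext, ov) == 0,
              "the OVERLAPPED part must be the first member");

// Mimics the completion routine: it is handed only the OVERLAPPED pointer
// and recovers the full context from it.
inline void CompletionRoutine(unsigned long error, unsigned long bytes, FakeOverlapped* ov) {
    auto* actx = reinterpret_cast<AsyncOperationContext*>(ov);
    actx->func(actx->context, error, bytes);
}

// Example callback: stores the byte count in the unsigned long `context` points to.
inline void RecordBytes(void* context, unsigned long /*error*/, unsigned long bytes) {
    *static_cast<unsigned long*>(context) = bytes;
}
```

If another member were placed before `ov`, the second `static_assert` would fail at compile time instead of the cast silently producing a wrong pointer.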

3. Initiate an asynchronous read from the socket. If the call fails with an error other than ERROR_IO_PENDING, the operation was not started and the completion callback will not be called, so processing must be finished in the thread that initiated the read. Immediate success needs care: by default the system still queues a completion notification for an operation that completed synchronously, so the data must be handled either in the initiating thread or in the callback, but not in both. The context of the asynchronous operation should be stored in the structure that describes the intercepted connection, so that it lives as long as the connection does. The same context can be reused for subsequent reads from the socket.

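 // Context of the read operation (kept in the connection object)
 AsyncOperationContext receive_ov;
 // Zero the OVERLAPPED part before each use
 memset(&receive_ov, 0, sizeof(OVERLAPPED));
 // Function and argument to call when the operation completes asynchronously
 receive_ov.pfFunc = ReceiveDoneCallback;
 receive_ov.pContext = this;
 // Start the read
 DWORD received = 0;
 BOOL res = ReadFile((HANDLE)sock, buf, buf_len, &received, (LPOVERLAPPED)&receive_ov);
 if (res) {
     // The read completed immediately. By default a completion callback is
     // still queued for it, so make sure the data is not processed twice
     if (received > 0) {
         // Data received: process it and start the next read
         ProcessReceivedData();
         InitiateRead();
     } else {
         // Zero bytes read: the other side closed the connection
         ProcessConnectionClose();
     }
 } else {
     DWORD error = GetLastError();
     if (error != ERROR_IO_PENDING) {
         // The read failed: close the connection
         ProcessConnectionClose();
     }
 }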

The implementation of ReceiveDoneCallback is similar to the synchronous case.

4. Process the received data. Since the system thread pool is already used for I/O, it makes sense to use it for data processing as well. Keep in mind that data must be processed and forwarded to the paired socket in the same order in which it was received, so a queue of data to process and forward is required. A function running in the system pool works through the queue, and it is important that only one pool thread processes the queue at any time. The queue itself can be implemented in any way.

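 // Add the received data to the processing queue
 AddReceivedDataToQueue(buf, buf_len);
 // Check whether the queue is already being processed and,
 // if it is not, mark it as being processed
 if (!IsQueueProcessingAndMark()) {
     QueueUserWorkItem(WorkRoutine, this, WT_EXECUTEDEFAULT);
 }

 DWORD WINAPI WorkRoutine(LPVOID param)
 {
     DataItem* dataItem;
     while ((dataItem = GetQueueProcessingItem()) != NULL) {
         ProcessDataItem(dataItem);
         // Forward the processed data
         InitiateWrite();
     }
     MarkQueueProcessing(FALSE);
     return 0;
 }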

Access to the queue and to the processing-status flag must be synchronized. In particular, the "queue is empty" check and the clearing of the flag must happen under the same lock, otherwise an item added in between may never be processed. Asynchronous sending to our "pair" is organized the same way, except that WriteFile is used instead of ReadFile.
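
One possible way to organize such a queue is sketched below in portable C++. It is not the implementation used in the product: `ProcessingQueue`, `Push`, `Pop` and `Drain` are hypothetical names, and the sketch folds the "is the queue being processed" flag into the queue itself so that the flag and the items are always updated under one lock.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Ordered queue that lets at most one worker process it at a time.
class ProcessingQueue {
public:
    // Adds an item. Returns true if the caller must schedule a worker,
    // i.e. no worker was processing the queue at that moment.
    bool Push(std::string item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(item));
        if (processing_) {
            return false;  // the running worker will pick the item up
        }
        processing_ = true;
        return true;
    }

    // Takes the next item in arrival order. When the queue is empty, clears
    // the processing flag under the same lock and returns false.
    bool Pop(std::string& item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) {
            processing_ = false;
            return false;
        }
        item = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<std::string> items_;
    bool processing_ = false;
};

// Body of the worker: handles items until the queue is empty.
// Here "handling" just appends the item to `sink`.
inline void Drain(ProcessingQueue& queue, std::vector<std::string>& sink) {
    std::string item;
    while (queue.Pop(item)) {
        sink.push_back(item);
    }
}
```

In the service, the thread that receives data would call `Push` and submit a work item to the pool only when it returns true, and the work item would run a loop like `Drain`.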

Advantages and disadvantages

What we gained by using this API set:

We no longer need our own thread pool implementation: the pool implemented by the operating system is used.
There is no limit on the number of connections handled.
Data received on the socket is passed straight to the callback function, so you only need to initiate the operation and process the result. No additional API calls are required.

This API set lets you handle more connections by adding processor cores, i.e. this scheme will work on a terminal server.

But this API still has flaws:

The API does not let us manage the pool, i.e. we cannot limit the number of threads in it.
We cannot guarantee that the threads that handle I/O are separate from the threads that process the intercepted data.
Waiting for "stuck" I/O operations has to be organized in a special way.

These problems can be solved using a different set of APIs.

Using the Vista Thread Pool API

This set of functions allows you to create separate thread pools and configure each of them. Consider the steps you need to take to organize proxying network connections using this API.

1. Create and configure the environment in which the thread pool will work. The cleanup group associated with the environment lets you correctly wait for the completion of all tasks that were submitted to the pool

 // The environment is a structure; the API functions take a pointer to it
 TP_CALLBACK_ENVIRON io_pool_env;
 InitializeThreadpoolEnvironment(&io_pool_env);
 PTP_CLEANUP_GROUP io_pool_cleanup = CreateThreadpoolCleanupGroup();
 SetThreadpoolCallbackCleanupGroup(&io_pool_env, io_pool_cleanup, NULL);

2. Create and configure thread pool

 PTP_POOL io_pool = CreateThreadpool(NULL);
 SetThreadpoolThreadMaximum(io_pool, 10);
 SetThreadpoolThreadMinimum(io_pool, 2);
 SetThreadpoolCallbackPool(&io_pool_env, io_pool);

Now we have a dedicated thread pool that always contains at least two and at most ten threads. In addition, the io_pool_cleanup variable can be used to wait for the completion of all operations initiated in this pool (for example, with CloseThreadpoolCleanupGroupMembers). A thread pool for processing the captured data (processing_pool) is configured in the same way.

3. Create a socket and structures that are required to initiate asynchronous operations

 SOCKET sock = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, WSA_FLAG_OVERLAPPED);
 // The last argument is a pointer to the callback environment, not the pool itself
 PTP_IO io_item = CreateThreadpoolIo((HANDLE)sock, IoDoneCallback, this, &io_pool_env);
 PTP_WORK processing_item = CreateThreadpoolWork(WorkRoutine, this, &processing_env);

The implementations of IoDoneCallback (ReceiveDoneCallback) and WorkRoutine are similar to those given for the previous API set; only the callback signatures differ, since the new pool expects PTP_WIN32_IO_CALLBACK and PTP_WORK_CALLBACK functions. That is, you can reuse the existing business logic for processing the intercepted data.

4. Initiate an asynchronous data read operation from the socket

 // Tell the pool that an I/O operation is about to be started on this socket
 StartThreadpoolIo(io_item);
 // Start the read
 BOOL res = ReadFile((HANDLE)sock, buf, buf_len, &received, &ov);

Processing the result of the operation is similar to the version with the I/O completion port, with one peculiarity. If we do not want the pool callback to be invoked when an operation completes synchronously (by default it is invoked), the socket must be marked in a special way after it is created:

 SetFileCompletionNotificationModes((HANDLE)sock, FILE_SKIP_COMPLETION_PORT_ON_SUCCESS);

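In addition, it is important to remember that every I/O operation announced with StartThreadpoolIo must either complete through the callback or be canceled. That is, if the call fails with an error other than ERROR_IO_PENDING, or succeeds synchronously on a socket marked with FILE_SKIP_COMPLETION_PORT_ON_SUCCESS, no callback will follow and you need to call: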

 CancelThreadpoolIo(io_item);
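
The rule for when CancelThreadpoolIo is needed can be written down as a small decision function. This is a sketch in portable C++ rather than code from the service: `MustCancelThreadpoolIo` is a hypothetical helper, and its arguments stand for the return value of ReadFile or WriteFile, the value of GetLastError after a failed call, and whether FILE_SKIP_COMPLETION_PORT_ON_SUCCESS was set on the socket.

```cpp
#include <cassert>

// Same value as ERROR_IO_PENDING in <winerror.h>.
constexpr unsigned long kErrorIoPending = 997;

// Returns true if no completion callback will follow the operation, which
// means the notification announced by StartThreadpoolIo must be canceled.
inline bool MustCancelThreadpoolIo(bool call_succeeded,
                                   unsigned long last_error,
                                   bool skip_on_success) {
    if (call_succeeded) {
        // Completed synchronously: a callback still follows unless
        // notifications for synchronous success are skipped on this handle.
        return skip_on_success;
    }
    // A pending operation completes through the callback; any other error
    // means the operation was never started and no callback will come.
    return last_error != kErrorIoPending;
}
```

In the calling code, a true result would be followed by `CancelThreadpoolIo(io_item)` and by handling the result in the initiating thread.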

5. Initiate processing of the received data. The processing function is similar to the QueueUserWorkItem variant.

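 // Add the received data to the processing queue
 AddReceivedDataToQueue(buf, buf_len);
 // Check whether the queue is already being processed and,
 // if it is not, mark it as being processed
 if (!IsQueueProcessingAndMark()) {
     SubmitThreadpoolWork(processing_item);
 }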

Advantages and disadvantages

The described API set is good in every respect, except that it is available only starting with Windows Vista. For Windows XP and Windows Server 2003 you have to use I/O completion ports and the old system pool. However, both variants let you process the intercepted data in the same way, so the code base stays the same even though it is built for different operating systems.

Conclusions

Any high-quality software product should use the most efficient means the operating system offers for solving its technical problems. The transparent proxy service in our product has come a long way, and at this point it is implemented, as it seems to me, as efficiently as possible. Hopefully the conclusions drawn from that path will help others understand these technologies faster and make the right decision.

Source: https://habr.com/ru/post/346800/
