
Optimizing a web service that suggests postal addresses and names

In this article I would like to share our experience of developing a web service in C++. In my opinion, this is quite an interesting topic, since using C++ for web development is a rare thing and often causes bewilderment in IT circles. On the Internet you can find many arguments against this approach: the use of pointers, memory leaks, segfaults, the lack of out-of-the-box support for web standards, and this is an incomplete list of what we had to read before deciding in favor of this technology.

The development discussed in this article was completed in 2015, but the prerequisites for it appeared much earlier. It all started in 2008, when we had the idea of developing a web service for standardizing and correcting user contact information, such as postal addresses and telephone numbers. The web service was supposed to receive, through a REST API, contact information that a user had entered in arbitrary text form and to put this data in order. In essence, the service had to solve the problem of recognizing user contact data in an arbitrary text string. In addition, during such processing the service had to correct typos in addresses, restore missing address components, and reduce the processed data to a structured form. The service was developed for the needs of business users, for whom the correctness of customer contact information is a critical factor: first of all, online stores and delivery services, as well as CRM and MDM systems of large organizations.

In computational terms the task was rather difficult, since unstructured text data had to be processed. Therefore all the processing was implemented in C++, while the application business logic was written in Perl and packaged as a FastCGI server.

For six years this service worked successfully, until we were faced with a new task that forced us to reconsider the architecture of the solution. The new task was to generate real-time suggestions for the postal addresses, last names, first names and middle names that users enter.

Real-time processing


Generating suggestions in real time means that the service receives a new HTTP request every time the user enters the next character of a postal address or name while filling out some form with contact information. With each request the service receives the text string entered so far, analyzes it and generates several of the most likely ways to complete it. The user sees the suggestions returned by the service and either selects a suitable option or continues typing. In reality it should look something like this.



This task differs from the standardization of already entered contact information, for which the service was originally designed, in that a single user generates an order of magnitude more requests while filling out the form. At the same time, these requests must be processed faster than the user types the input on the keyboard; otherwise the user will have entered all the data manually before any suggestions become useful.

To evaluate the acceptable response time, we conducted a series of experiments with an adjustable delay. As a result, we came to the conclusion that suggestions stop being useful once the response time exceeds 150 ms. Our initial service architecture allowed us to stay within this limit while 40 users were working simultaneously (these figures were obtained for a server with two cores and 8 GB of RAM). To increase this number, more processors would have to be added to the server hardware. And since the suggestion features for postal addresses and names were meant to be available to everyone free of charge, we understood that considerably more processors and servers might be required. Therefore the question arose whether request processing could be optimized by changing the architecture of the service.

Original architecture


The service architecture that needed to be improved looked as follows.



According to this scheme, a user application (for example, a web browser) generates HTTP requests that are received by the web server (in our case, the lightweight web server lighttpd). If a request is not for static content, it is forwarded to an application server connected to the web server through the FastCGI interface (in our case, the application server is written in Perl). Requests that concern contact data processing are then passed on to the processing server, which the application communicates with over plain sockets.

It can be noted that if we replace the processing server in this scheme with a database server, we get a fairly common layout used in traditional web applications built with popular frameworks for Python or Ruby, as well as for PHP running under php_fpm.

This architecture seemed very convenient, because it makes it easy to scale the service as the load grows: new processing servers are simply added. But since the performance left much to be desired, we decided to measure how much time the service spends at different stages of processing a request. The result is the following chart.



This illustration shows what percentage of the time, from the moment a request is sent until the web client receives the answer, is spent when the request passes through the entire processing chain or only some fragment of it. In this experiment, the client and the server were located on the same local network.

For example, the first number in the diagram indicates that 25% of the time is spent sending the request from the web client, passing it through the web server, and returning the response to the web client. Similarly, all other stages account for both the passage of the request in the forward direction and the return of the answer along the same chain in the opposite direction. Specifically, as the request moves further, it enters the application server through the FastCGI interface; passing through this interface takes another 25% of the time.

Next, the request passes through the application server, which takes another 20% of the time. In our case, the application server performs no real processing of the request: it only parses the HTTP request, passes it on to the processing server, receives the response and sends it back over the FastCGI interface. In fact, this 20% of the time is spent on parsing the request and on interpreter overhead, since the application is implemented in a scripting language.

Another 20% of the time is spent passing data through the socket interface that connects the application to the processing server. This interface is somewhat faster than FastCGI (20% vs. 25%), since the corresponding protocol and its implementation are much simpler. Processing the request itself, that is, generating suggestions for the user's input, takes only 10% of the total time (the tests used one of the most processing-intensive kinds of requests).

I would like to emphasize that all the specifics of our task show up only at the last stage of these experiments, and it is this stage that raises the fewest performance questions. The remaining stages are entirely standard. We use an event-based web server, which simply reads the received request from the socket associated with the listening HTTP port and writes the data into a FastCGI socket. The application server acts similarly: it reads data from the FastCGI socket and transfers it to the socket of the processing server. There is nothing to optimize in the application itself.

The depressing picture, in which only 10% of the response time is spent on useful work, made us think about changing the architecture.

New service architecture


To eliminate the overhead of the original architecture, we would ideally have to get rid of the application written in an interpreted language and eliminate the socket interfaces, while preserving the ability to scale the service. We considered the following options.

Event-based application server


Within this option we considered implementing an event-based application server, for example on Node.js or Twisted. In such an implementation, the number of socket interfaces a request passes through remains the same: each request goes to the balancing web server, which forwards it to one of the application server instances, which in turn passes it to the processing server. The total processing time of a request therefore remains the same, but the number of simultaneously processed requests grows thanks to the asynchronous use of the sockets. Roughly speaking, while one request is passing through a socket interface, another request can pass through the business logic of the application within the same instance.

We abandoned this option because we considered it unreasonable to implement a fully asynchronous application just to remove one bottleneck of the old architecture, the socket interface between the application and the processing server. The remaining I/O operations in the old application, such as logging, collecting user statistics, sending mail and interacting with other services, were already deferred to separate threads, so they did not require asynchrony. In addition, this architecture does not reduce the processing time of a single request, so user applications that work with the service through the API would not gain any performance.

Integrating the application with the web server


Here we considered implementing the application as a Java servlet or a .NET application called directly by the web server. In this case you can get rid of the FastCGI interface and, at the same time, of the interpreted language. The socket interface to the processing server is preserved.

The decision against this approach was driven by the fact that the whole solution would be tied to a specific web server that supports the chosen technology, for example Tomcat for Java servlets or Microsoft IIS for .NET. We wanted to keep the application compatible with the lightweight servers lighttpd and nginx.

Integrating the application with the processing server


In this case there is no binding to a specific web server, since the FastCGI interface is preserved. The application is implemented in C++ and combined with the processing server. Thus we move away from an interpreted language and also eliminate the socket interface between the application and the processing server.

The disadvantage of this approach is the lack of a popular, well-proven framework for large projects. Among the candidates we considered CppCMS, TreeFrog and Wt. Regarding the first, we had concerns about future support of the project by its developers, since there were no recent updates on the project website. TreeFrog is based on Qt; we actively use this library in offline projects, but we considered it redundant and not reliable enough for this task. As for Wt, this framework puts a strong emphasis on the GUI, whereas in our case the GUI is a minor matter. An additional reason for declining these frameworks was the desire to minimize the risks associated with third-party libraries that we could, in principle, do without: we were reworking an existing working service and did not want to break it because of an insufficiently mature third-party library.

At the same time, the very existence of such projects suggested that developing web applications in C++ is not so hopeless. Therefore it was decided to study the existing libraries that could be used to build a web application in C++.

Available libraries


To interact with the web server, the application must implement one of the protocols the web server supports: HTTP, FastCGI or SCGI. We settled on FastCGI and its implementation in the form of libfcgi.

For parsing HTTP requests and generating HTTP responses, the cgicc library suited us. It takes over all the work of parsing HTTP headers, retrieving request parameters, decoding the body of the received message and generating an HTTP response.
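
To give a feel for the FastCGI side of such an application, below is a minimal single-threaded accept loop using libfcgi. It is only a sketch: the echoed JSON body and the choice of parameters are illustrative, and the real service parses requests with cgicc and hands them over to the processing logic.

    #include <fcgiapp.h>   // libfcgi

    int main() {
        FCGX_Init();                       // initialize the FastCGI library
        FCGX_Request request;
        FCGX_InitRequest(&request, 0, 0);  // 0 = socket inherited from the web server

        // Each accepted request carries its own streams and CGI environment.
        while (FCGX_Accept_r(&request) == 0) {
            const char* query = FCGX_GetParam("QUERY_STRING", request.envp);

            FCGX_FPrintF(request.out,
                         "Content-Type: application/json; charset=utf-8\r\n\r\n");
            // Echo the raw query string back (no escaping, illustration only).
            FCGX_FPrintF(request.out, "{\"query\": \"%s\"}", query ? query : "");

            FCGX_Finish_r(&request);       // release the request before the next accept
        }
        return 0;
    }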

To parse the XML requests that may come from service users through the REST API, Xerces was chosen.

C++ has no Unicode support out of the box, so it was decided to use standard STL strings for working with text, subject to the strict internal convention that all string data is always represented in UTF-8.

To interact with external services and mail servers, it was decided to use libcurl, and OpenSSL for generating hashes.
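
As a small illustration of how these two conventions meet in practice (UTF-8 text kept in plain std::string, OpenSSL used for hashing), a helper that hashes a string might look like the sketch below. The function name and its use are hypothetical and not taken from the service code.

    #include <openssl/sha.h>   // OpenSSL one-shot SHA-256
    #include <cstdio>
    #include <string>

    // Hash a UTF-8 encoded std::string; by the internal convention every string
    // in the service is already UTF-8, so no conversion is needed here.
    std::string sha256_hex(const std::string& utf8_text) {
        unsigned char digest[SHA256_DIGEST_LENGTH];
        SHA256(reinterpret_cast<const unsigned char*>(utf8_text.data()),
               utf8_text.size(), digest);

        char hex[2 * SHA256_DIGEST_LENGTH + 1];
        for (int i = 0; i < SHA256_DIGEST_LENGTH; ++i)
            std::snprintf(hex + 2 * i, 3, "%02x", digest[i]);
        return std::string(hex, 2 * SHA256_DIGEST_LENGTH);
    }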

Self-written components


To generate HTML views we needed a simple template engine. The old implementation of the service used HTML::Template for this purpose, so when switching to C++ we needed a template engine with similar syntax and similar features. We tried CTPP, Clearsilver and Google-ctemplate.

CTPP turned out to be inconvenient to use, because before a template can be used it has to be compiled into bytecode, and then a virtual machine has to be created to execute it. All these complications make the code unnecessarily cumbersome.

In Clearsilver the entire interface is implemented in pure C, and to use it we would have had to write a substantial object-oriented wrapper. Google-ctemplate, in turn, did not cover all the features of HTML::Template that were used in the old version of the service; to make full use of it we would have had to change the logic responsible for generating the views. Therefore, in the case of the template engine, we had to reinvent the wheel, which is what we did.

Developing our own C++ template engine took about three days, whereas we had spent twice as long searching for and studying the ready-made solutions mentioned above. In addition, our own template engine allowed us to extend the HTML::Template syntax by adding an "else if" construct, as well as operators comparing variables with values predefined in the template.
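
For reference, a fragment in this extended syntax might look roughly as shown below. The standard HTML::Template tags (TMPL_VAR, TMPL_IF, TMPL_ELSE) are real; the exact spelling of our additions (the else-if tag and the comparison expressions) is a hypothetical illustration, not a quote from the service templates.

    <TMPL_IF EXPR="status == 'ok'">
        <p>Address: <TMPL_VAR NAME="address"></p>
    <TMPL_ELSIF EXPR="status == 'partial'">
        <p>The address was recognized only partially.</p>
    <TMPL_ELSE>
        <p>The address could not be recognized.</p>
    </TMPL_IF>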

Session management also had to be implemented independently. This is due to the specifics of the service being developed, since in our case the session stores quite a lot of information reflecting the user's behavior in real time. The fact is that, in addition to processing data through the REST API, ordinary users often turn to the service as a reference tool, for example when they need to find out the postal code for a given address. From time to time there are users who decide to automate the standardization of their contact information by writing a web bot that imitates a person working in the browser instead of using the REST API intended for this purpose. Such bots create a useless load on the service, which affects the work of other users. To combat bots, the service accumulates information about user behavior within the sessions; this information is later used by a separate service module responsible for recognizing and blocking bots.

Perhaps the key standard that we had to implement ourselves is JSON. There are quite a few open C++ implementations of it, which we analyzed before creating yet another one. The main reason for writing our own implementation was the need to use JSON together with a non-standard memory allocator, which is used on the processing server to speed up dynamic memory allocation and deallocation. This allocator works two to three times faster than the standard one for mass allocation and release of small blocks. Since working with JSON fits this pattern, we wanted to get a free performance boost on all operations related to parsing and building JSON objects.
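
The service uses its own allocator and its own JSON classes, so the sketch below only illustrates the general idea with standard facilities that appeared later in C++17 (std::pmr): all nodes of a JSON document draw memory from one pool, the many small allocations made during parsing become cheap, and the whole document is released in one step when the pool is destroyed.

    #include <map>
    #include <memory_resource>
    #include <string>
    #include <vector>

    int main() {
        // One arena per request; not the service's allocator, just the same pattern.
        std::pmr::monotonic_buffer_resource pool(64 * 1024);

        // The pool propagates automatically to keys, values and vector elements.
        std::pmr::map<std::pmr::string, std::pmr::string> object(&pool);
        object.emplace("city", "Moscow");

        std::pmr::vector<std::pmr::string> suggestions(&pool);
        suggestions.emplace_back("Moscow, Tverskaya st");

        return 0;   // the pool releases all allocations at once
    }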

Final result


The architecture of the final solution is shown in the following diagram.



Within a single monolithic server, both the application logic and the contact data processing are combined. Incoming requests are handled by a pool of threads. All I/O operations that have to be performed while processing API requests are deferred: for this purpose a separate pool of threads responsible for asynchronous I/O is created on the server. Such operations include, for example, updating user statistics and charging money when paid API functions are used. In both cases a write to the database is required, and performing it in the main thread would block that thread.
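
The deferred I/O mechanism can be sketched as a task queue served by its own pool of worker threads. This is a minimal illustration under the assumption of a simple mutex-protected queue; the real server may organize this differently, and all the names here are ours.

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Request threads only enqueue a task; the workers perform the blocking
    // operation (writing statistics, charging for a paid call, etc.).
    class DeferredIoPool {
    public:
        explicit DeferredIoPool(unsigned workers) {
            for (unsigned i = 0; i < workers; ++i)
                threads_.emplace_back([this] { Run(); });
        }

        ~DeferredIoPool() {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                stop_ = true;
            }
            cv_.notify_all();
            for (auto& t : threads_) t.join();
        }

        // Called from a request thread: never blocks on I/O, only on the queue lock.
        void Post(std::function<void()> task) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                tasks_.push(std::move(task));
            }
            cv_.notify_one();
        }

    private:
        void Run() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(mutex_);
                    cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                    if (stop_ && tasks_.empty()) return;
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }
                task();   // the blocking database write happens here
            }
        }

        std::mutex mutex_;
        std::condition_variable cv_;
        std::queue<std::function<void()>> tasks_;
        std::vector<std::thread> threads_;
        bool stop_ = false;
    };

    // Usage from a request handler (names are illustrative):
    //   io_pool.Post([=] { WriteUserStatistics(user_id, request_info); });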

This architecture allows the service to be scaled by running additional instances of the monolithic server; in this case, the web server takes on the additional role of a load balancer.

According to the diagram given earlier, switching to the new architecture should have reduced the response time for a single request by about 40%. Real experiments showed a reduction of 43%; the extra gain can be explained by the monolithic solution using RAM more efficiently.

We also carried out load testing to determine how many users the new service can serve simultaneously while using suggestions, with the response time kept under 150 ms. In this mode, the service was able to handle 120 concurrent users. Recall that for the old implementation this value was 40. Such a threefold increase in throughput is explained by the reduction in the total number of processes involved in servicing the request flow. Previously, requests were handled by multiple instances of the application (in the experiments, the number of instances ranged from 5 to 20), whereas in the new version of the service all requests are processed within a single multi-threaded process. While each instance works with its own separate memory, together they compete for the shared processor cache, which is used less efficiently as a result. With a single monolithic process there is no such competition.

Conclusion


This article has considered a non-standard approach to developing web services for cases where requests have to be processed in real time. Using the task of generating suggestions as an example, we demonstrated a situation unusual for web services, in which an increase in response time makes the service functionality practically useless to the user. The example shows that the emergence of such requirements can lead to significant changes in the architecture.

To improve performance, we had to combine the application server and the data processing server into a single monolithic server implemented in C++. This solution almost halved the response time for single requests and tripled the service throughput under mass use.

In addition to solving the main task, the work done brought pleasant bonuses: refactoring became simpler, since strict typing means you do not have to worry about renames in the code (in case of errors the project simply does not compile). The resulting project also became easier to maintain as a whole, since we now have a single server whose business logic and data processing logic are written in one language.

Source: https://habr.com/ru/post/304590/

