
Benchmark of HTTP servers (C/C++) on FreeBSD





This is a comparison of the performance of the cores of HTTP servers built with seven C/C++ libraries, as well as (for reference) other ready-made solutions in the same area (nginx and node.js).



An HTTP server is a complex and interesting mechanism. There is an opinion that a programmer who has never written his own compiler is a poor one; I would replace "compiler" with "HTTP server": it involves parsing, networking, asynchrony, multithreading, and much more.


Testing every possible parameter (serving static files, dynamic content, various encryption modules, proxying, etc.) would take more than a month of hard work, so the task was simplified: we compare the performance of the cores. The core of an HTTP server (as of any network application) is a socket event dispatcher plus some primary mechanism for processing those events (implemented as a pool of threads, processes, etc.). It also includes the HTTP parser and the response generator. At first glance everything should come down to testing the capabilities of one or another system mechanism for handling asynchronous events (select, epoll, etc.), their wrappers (libev, Boost.Asio, etc.) and the OS kernel, but each concrete ready-made implementation shows a significant difference in performance.



A hand-made version of the HTTP server was implemented on top of libev. Of course, it supports only a small subset of the requirements of the notorious RFC 2616 (it is unlikely that any HTTP server implements it in full), just the minimum needed to meet the requirements for the participants of this test:



  1. Listen for requests on port 8000;
  2. Check the method (GET);
  3. Check the path in the request (/answer);
  4. The response must be:

                 HTTP/1.1 200 OK
                 Server: bench
                 Connection: keep-alive
                 Content-Type: text/plain
                 Content-Length: 2

                 42


  5. For any other method/path, a response with error code 404 (Not Found) must be returned.


As you can see, there are no extensions, no access to files on disk, no gateway interfaces, etc.: everything is as simple as possible.

In cases where a server does not support keep-alive connections (incidentally, cpp-netlib was the only one), testing was done in the corresponding connection-per-request mode.



Background



Initially, the task was to implement an HTTP server handling hundreds of millions of hits per day. It was assumed that there would be a relatively small number of clients generating 90% of the requests, and a large number of clients generating the remaining 10%. Each request had to be forwarded to several other servers, the responses collected, and the result returned to the client. The success of the project depended on the speed and quality of the response, so it was not possible to simply take the first available ready-made solution. The following questions had to be answered:

  1. Should I reinvent the wheel, or use an existing solution?
  2. Is node.js suitable for high-load projects? If yes, then the thickets of C++ code could be thrown out and everything rewritten in 30 lines of JS.


There were also less significant questions, for example: does HTTP keep-alive affect performance? (A year later the answer was voiced here: it does, and quite significantly.)



Of course, my own wheel was reinvented first, then node.js appeared (I learned about it two years ago), and then I wanted to know how much more efficient the existing solutions were than my own: was the time wasted? That is how this post came about.



Preparation



Hardware



Software



Tuning

In load testing of network applications, it is customary to change the following standard set of settings:

/etc/sysctl.conf
kern.ipc.somaxconn = 65535
net.inet.tcp.blackhole = 2
net.inet.udp.blackhole = 1
net.inet.ip.portrange.randomized = 0
net.inet.ip.portrange.first = 1024
net.inet.ip.portrange.last = 65535
net.inet.icmp.icmplim = 1000


/boot/loader.conf
kern.ipc.semmni = 256
kern.ipc.semmns = 512
kern.ipc.semmnu = 256
kern.ipc.maxsockets = 999999
kern.ipc.nmbclusters = 65535
kern.ipc.somaxconn = 65535
kern.maxfiles = 999999
kern.maxfilesperproc = 999999
kern.maxvnodes = 999999
net.inet.tcp.fast_finwait2_recycle = 1



However, in my testing they did not improve performance, and in some cases even caused a significant slowdown, so in the final tests no changes were made to the system settings (i.e. all defaults, the GENERIC kernel).



Participants



Libraries

Name                 | Version | Based on   | Keep-alive | Mechanism
cpp-netlib           | 0.10.1  | Boost.Asio | no         | multithreaded
hand-made            | 1.11.30 | libev      | yes        | multiprocess (one thread per process), asynchronous
libevent             | 2.0.21  | libevent   | yes        | single-threaded *, asynchronous
mongoose             | 5.0     | select     | yes        | single-threaded, asynchronous, with a list (more)
onion                | 0.5     | libev      | yes        | multithreaded
Pion network library | 0.5.4   | Boost.Asio | yes        | multithreaded
POCO C++ Libraries   | 1.4.3   | select     | yes        | multithreaded (separate thread for incoming connections), with a queue (more)


Ready-made solutions

Name    | Version | Based on              | Keep-alive | Mechanism
Node.js | 0.10.17 | libuv                 | yes        | cluster module (multiprocess processing)
nginx   | 1.4.4   | epoll, select, kqueue | yes        | multiprocess processing


* reworked for the tests into a "multiprocess, one thread per process" scheme



Disqualified

Name          | Reason
nxweb         | Linux only
g-wan         | Linux only (and in general ...)
libmicrohttpd | constant crashes under load
yield         | compilation errors
EHS           | compilation errors
libhttpd      | synchronous, HTTP/1.0, does not allow changing headers
libebb        | compilation errors


The client was weighttp, a tool from the lighttpd developers. Originally httperf was planned as a more flexible tool, but it constantly crashes. Besides, weighttp is based on libev, which suits FreeBSD much better than httperf with its select. As the main test script (a wrapper around weighttp that also measures resource consumption, etc.), G-WAN's ab.c ported to FreeBSD was considered, but it was later rewritten from scratch in Python (bench.py in the appendix).



The client and server were running on the same physical machine.

The following values were varied:



In each configuration, 20-30 iterations were performed, 2 million requests per iteration.



Results



The first version of this article contained gross violations of testing methodology, which were pointed out in the comments by users VBart and wentout. In particular, tasks were not strictly pinned to CPU cores, and the total number of server/client threads exceeded reasonable limits. Options affecting the measurements (AMD Turbo Core) were not disabled, and measurement errors were not reported. The current version of the article uses the approach described here.



For servers running in single-threaded mode, the following results were obtained (the maximum of the medians over all server/client thread combinations was taken):

Place | Name       | Client threads | CPU user, % | CPU syst., % | Successful requests (per sec.) | Unsuccessful (%)
1     | nginx      | 400            | 10          | 10           | 101210                         | 0
2     | mongoose   | 200            | 12          | 15           | 53255                          | 0
3     | libevent   | 200            | 16          | 33           | 39882                          | 0
4     | hand-made  | 100            | 20          | 32           | 38550                          | 0
5     | onion      | 10             | 22          | 33           | 29230                          | 0
6     | Poco       | 10             | 25          | 50           | 20943                          | 0
7     | pion       | 10             | 24          | 83           | 16526                          | 0
8     | node.js    | 10             | 23          | 17           | 3937                           | 40
9     | cpp-netlib | 10             | 100         | 18           | 3536                           | 20


Scalability:



In theory, with more cores we would observe a linear increase in throughput. Unfortunately, there is no way to verify this: there are not enough cores.



nginx, frankly, was a surprise: it is a ready-made, multifunctional, modular solution, yet its results far exceeded those of the highly specialized libraries. Respect.



mongoose is still raw: version 5.0 is not yet polished, and the branch is under active development.



cpp-netlib showed the worst result. Not only does it not support HTTP keep-alive connections, it also kept crashing somewhere in the bowels of Boost, so completing all the iterations in a row was problematic. The solution is definitely raw and the documentation is outdated. A deserved last place.



node.js has already been scolded here; I will not be as categorical, but V8 still needs a lot of polishing. What kind of high-load solution is it that, even without any payload, consumes resources so greedily and delivers only 10-20% of the performance of the top participants?



HTTP keep-alive on/off: whereas in that post the difference reached a factor of 2, in my tests it was up to a factor of 10.



Measurement error (ministat): no difference proven at 95.0% confidence.



Todo







Links



Stress-testing with httperf, siege, apache benchmark, and pronk: HTTP clients for load testing servers.

Performance Testing with Httperf - tips and tricks for benchmarking.

ApacheBench & HTTPerf - description of the benchmark process from G-WAN.

Warp is another high-performance HTTP server, written in Haskell.



Appendix



In the appendix you will find the source code and the results of all test iterations, as well as detailed information on building and installing the HTTP servers.

Source: https://habr.com/ru/post/207460/


