This post compares the performance of the cores of HTTP servers built with seven C/C++ libraries, as well as (for general context) ready-made solutions in this area: nginx and node.js.
An HTTP server is a complex and interesting mechanism. There is an opinion that a programmer who has not written his own compiler is no good; I would replace “compiler” with “HTTP server”: it is a parser, plus networking, plus asynchrony with multithreading, and much more.
Testing across all possible parameters (static file serving, dynamic content, various encryption modules, proxying, etc.) would take far more than a month of hard work, so the task is simplified: we will compare the performance of the cores. The core of an HTTP server (as of any network application) is the socket event dispatcher plus some primary mechanism for processing those events (implemented as a pool of threads, processes, etc.). This also includes the HTTP parser and the response generator. At first glance everything should come down to testing the capabilities of one or another system mechanism for handling asynchronous events (select, epoll, etc.), of their meta-wrappers (libev, Boost.Asio, etc.), and of the OS kernel, but a specific implementation in the form of a ready-made solution can show a significant difference in performance.
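To make that concrete, below is a minimal sketch of such a core on libev. This is my own illustration, not the benchmarked code: error handling is trimmed and request parsing is stubbed out.

```c
/*
 * Minimal sketch of an HTTP-server core on libev: one listening
 * socket, one event loop dispatching readiness events to callbacks.
 * Not the benchmarked implementation; error handling is trimmed.
 */
#include <ev.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

static void on_readable(struct ev_loop *loop, ev_io *w, int revents)
{
    char buf[4096];
    ssize_t n = read(w->fd, buf, sizeof(buf));
    if (n <= 0) {               /* peer closed the connection or error */
        ev_io_stop(loop, w);
        close(w->fd);
        free(w);
        return;
    }
    /* ... parse the request and write the response here ... */
}

static void on_accept(struct ev_loop *loop, ev_io *w, int revents)
{
    int fd = accept(w->fd, NULL, NULL);
    if (fd < 0)
        return;
    ev_io *client = malloc(sizeof(*client));
    ev_io_init(client, on_readable, fd, EV_READ);
    ev_io_start(loop, client);  /* hand the new socket to the dispatcher */
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8000);       /* the port required by the test */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, SOMAXCONN);

    struct ev_loop *loop = EV_DEFAULT; /* select/kqueue/epoll under the hood */
    ev_io accept_watcher;
    ev_io_init(&accept_watcher, on_accept, fd, EV_READ);
    ev_io_start(loop, &accept_watcher);
    ev_run(loop, 0);                   /* dispatch events until stopped */
    return 0;
}
```

Which backend (select, kqueue, epoll) the loop ends up on is libev's choice at runtime; the point of the benchmark is precisely that cores which look identical on paper still differ noticeably in throughput.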
The hand-made version of the HTTP server was implemented on top of libev. It supports, of course, only a small subset of the notorious RFC 2616 (it is unlikely that any HTTP server implements it in full): just the minimum needed to satisfy the requirements for the participants of this test:
- Listen for requests on port 8000;
- Check the method (GET);
- Check the path in the request (/answer);
- The response must contain:
HTTP/1.1 200 OK
Server: bench
Connection: keep-alive
Content-Type: text/plain
Content-Length: 2

42
- For any other method/path, return a response with error code 404 (page not found).
As you can see, there are no extensions, no file access on disk, no gateway interfaces, etc.: everything is as simple as possible.
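For illustration, the matching logic fits in a few lines. The sketch below assumes the whole request line arrives in a single read (the real implementations buffer properly), and the exact 404 headers are my own assumption, since the spec above fixes only the status code.

```c
/*
 * Sketch of the required matching logic, assuming the whole request
 * line arrived in a single read (real servers must buffer input).
 * The 404 headers are an assumption; the spec fixes only the code.
 */
#include <string.h>
#include <unistd.h>

static const char RESP_200[] =
    "HTTP/1.1 200 OK\r\n"
    "Server: bench\r\n"
    "Connection: keep-alive\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 2\r\n"
    "\r\n"
    "42";

static const char RESP_404[] =
    "HTTP/1.1 404 Not Found\r\n"
    "Server: bench\r\n"
    "Content-Length: 0\r\n"
    "\r\n";

static void handle_request(int fd, const char *req)
{
    /* Only "GET /answer" earns a 200; any other method/path gets 404. */
    if (strncmp(req, "GET /answer ", 12) == 0)
        write(fd, RESP_200, sizeof(RESP_200) - 1);
    else
        write(fd, RESP_404, sizeof(RESP_404) - 1);
}
```

In the libev sketch above, something like this would be called from the read callback once a full request has been buffered.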
In cases where a server does not support keep-alive connections (cpp-netlib, by the way, was the only one with this distinction), testing was done in the corresponding non-keep-alive mode.
Background
Initially, the task was to implement an HTTP server handling a load of hundreds of millions of hits per day. It was assumed there would be a relatively small number of clients generating 90% of the requests and a large number of clients generating the remaining 10%. Each request had to be forwarded to several other servers, the responses collected, and the result returned to the client. The success of the project depended on the speed and quality of the response, so it was not possible to simply grab the first ready-made solution available. Answers to the following questions were needed:
- Should I reinvent my own wheel or use existing solutions?
- Is node.js suitable for high-load projects? If yes, then throw out the thickets of C++ code and rewrite everything in 30 lines of JS.
There were also less significant questions; for example, does HTTP keep-alive affect performance? (A year later the answer was voiced here: it does, and quite significantly.)
Of course, my own wheel was invented first; then node.js appeared (I learned about it two years ago), and then I wanted to know how much more efficient the existing solutions were than my own, and whether the time had been wasted. That is how this post came about.
Preparation
Hardware
- CPU: AMD FX(tm)-8120 Eight-Core Processor
- Network: localhost (see the TODO section for why)
Software
- OS: FreeBSD 9.1-RELEASE-p7
Tuning
When load-testing network applications, it is customary to change the following standard set of settings:
/etc/sysctl.conf:
kern.ipc.somaxconn = 65535
net.inet.tcp.blackhole = 2
net.inet.udp.blackhole = 1
net.inet.ip.portrange.randomized = 0
net.inet.ip.portrange.first = 1024
net.inet.ip.portrange.last = 65535
net.inet.icmp.icmplim = 1000
/boot/loader.conf:
kern.ipc.semmni = 256
kern.ipc.semmns = 512
kern.ipc.semmnu = 256
kern.ipc.maxsockets = 999999
kern.ipc.nmbclusters = 65535
kern.ipc.somaxconn = 65535
kern.maxfiles = 999999
kern.maxfilesperproc = 999999
kern.maxvnodes = 999999
net.inet.tcp.fast_finwait2_recycle = 1
However, in my testing they did not lead to any performance gain, and in some cases even caused a significant slowdown, so no changes were made to the system settings in the final tests (i.e. all settings at their defaults, GENERIC kernel).
Participants
Libraries
| Name | Version | Events | Keep-alive support | Mechanism |
|---|---|---|---|---|
| cpp-netlib | 0.10.1 | Boost.Asio | no | multithreaded |
| hand-made | 1.11.30 | libev | yes | multiprocess (one thread per process), asynchronous |
| libevent | 2.0.21 | libevent | yes | single-threaded*, asynchronous |
| mongoose | 5.0 | select | yes | single-threaded, asynchronous, with a list (more) |
| onion | 0.5 | libev | yes | multithreaded |
| Pion Network Library | 0.5.4 | Boost.Asio | yes | multithreaded |
| POCO C++ Libraries | 1.4.3 | select | yes | multithreaded (separate thread for incoming connections), with a queue (more) |
Ready-made solutions
| Name | Version | Events | Keep-alive support | Mechanism |
|---|---|---|---|---|
| node.js | 0.10.17 | libuv | yes | cluster module (multiprocess processing) |
| nginx | 1.4.4 | epoll, select, kqueue | yes | multiprocess processing |
* For the tests, reworked according to the “multiprocess, one thread per process” scheme.
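This “one single-threaded process per core” scheme is essentially the classic prefork pattern: fork after the listening socket exists, so that every worker accepts on the shared socket and runs its own event loop. A rough sketch of my own, where run_event_loop is a placeholder (e.g. the libev loop sketched earlier):

```c
/*
 * Rough sketch of the "multiprocess, one thread per process" scheme:
 * fork after the listening socket is created, so every single-threaded
 * worker accepts on the shared socket and runs its own event loop.
 */
#include <sys/wait.h>
#include <unistd.h>

void run_event_loop(int listen_fd);   /* assumed worker body, not shown */

void serve(int listen_fd, int nprocs)
{
    for (int i = 0; i < nprocs; i++) {
        if (fork() == 0) {            /* child: single-threaded worker */
            run_event_loop(listen_fd);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)            /* parent: just reap the workers */
        ;
}
```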
Disqualified
As the client, weighttp, a tool from the lighttpd developers, was used. Originally httperf was planned as a more flexible tool, but it crashes constantly. Besides, weighttp is based on libev, which suits FreeBSD much better than httperf with its select. As the main test script (a wrapper over weighttp that also accounts for resource consumption, etc.), G-WAN's ab.c ported to FreeBSD was considered, but it was later rewritten from scratch in Python (bench.py in the appendix).
The client and server were running on the same physical machine.
The variable parameters were:
- Number of server threads (1, 2, and 3);
- Number of parallel open client requests (10, 100, 200, 400, 800).
In each configuration, 20-30 iterations were performed, 2 million requests per iteration.
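For reference, a single weighttp run for one of these configurations might look like this (an illustration rather than a transcript of the actual runs; the real commands are generated by bench.py from the appendix):

```
weighttp -n 2000000 -c 400 -t 3 -k "http://127.0.0.1:8000/answer"
```

Here -n is the total number of requests, -c the number of concurrent connections, -t the number of client threads, and -k enables HTTP keep-alive.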
Results
The first version of this article contained gross violations of testing methodology, which users VBart and wentout pointed out in the comments. In particular, tasks were not strictly pinned to processor cores, and the total number of server/client threads exceeded reasonable limits. Features affecting the measurements (AMD Turbo Core) were not disabled, and measurement errors were not reported. The current version of the article uses the approach described here.
For servers running in single-threaded mode, the following results were obtained (the maximum of the medians over the server/client thread combinations was taken):
| Place | Name | Client threads | % time, user | % time, system | Successful requests (per sec) | Unsuccessful (%) |
|---|---|---|---|---|---|---|
| 1 | nginx | 400 | 10 | 10 | 101210 | 0 |
| 2 | mongoose | 200 | 12 | 15 | 53255 | 0 |
| 3 | libevent | 200 | 16 | 33 | 39882 | 0 |
| 4 | hand-made | 100 | 20 | 32 | 38550 | 0 |
| 5 | onion | 10 | 22 | 33 | 29230 | 0 |
| 6 | POCO | 10 | 25 | 50 | 20943 | 0 |
| 7 | pion | 10 | 24 | 83 | 16526 | 0 |
| 8 | node.js | 10 | 23 | 173 | 9374 | 0 |
| 9 | cpp-netlib | 10 | 100 | 183 | 5362 | 0 |
Scalability:
In theory, with more cores we would observe a linear increase in performance. Unfortunately, this theory cannot be verified here: there are not enough cores.
nginx, frankly, surprised me: it is, after all, a ready-made, multifunctional, modular solution, yet its results exceeded those of the highly specialized libraries by an order of magnitude. Respect.
mongoose is still raw: version 5.0 has not been run in yet, and the branch is under active development.
cpp-netlib showed the worst result. Not only does it lack support for HTTP keep-alive connections, it also crashed somewhere in the bowels of Boost, and completing all the iterations in a row was problematic. The solution is definitely raw, and the documentation is outdated. A well-deserved last place.
node.js has already been scolded here; I will not be so categorical, but V8 still has a long way to go. What kind of high-load solution is it if, even with no payload, it consumes resources so greedily and delivers 10-20% of the performance of the top participants?
HTTP keep-alive on/off: where the post mentioned above saw a difference of up to 2x, in my tests the difference reached up to 10x.
Measurement error according to ministat: “No difference proven at 95.0% confidence.”
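For context, ministat is the stock FreeBSD utility for comparing two sets of measurements at a given confidence level; a typical invocation (file names here are hypothetical) is:

```
ministat -c 95 baseline.txt candidate.txt
```

where each file holds one measurement per line and -c selects the confidence level.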
TODO
- a benchmark in the “client and server on different machines” mode. Care is needed here: everything can bottleneck on the network hardware, and not just the network card models but the switches, routers, etc., the entire infrastructure between the real machines. For a start, a direct connection could be tried;
- testing client HTTP APIs (organized as a server plus a proxy). The problem is that not all libraries provide an API for implementing an HTTP client. On the other hand, some popular libraries (libcurl, for example) provide an exclusively client-side API;
- using other HTTP clients. httperf was not used for the reasons given above; ab, according to many reviews, is outdated and cannot sustain real loads. Many others were recommended; there are a couple dozen solutions here, some of which would be worth comparing;
- a similar benchmark in a Linux environment. That should be an interesting topic (at the very least, a new wave of flame-war discussions);
- running the tests on a top-end Intel Xeon with plenty of cores.
Links
Stress-testing httperf, siege, apache benchmark, and pronk: HTTP clients for load testing servers.
Performance Testing with Httperf: tips and tricks for benchmarking.
ApacheBench & HTTPerf: a description of the benchmarking process from G-WAN.
Warp: another high-performance HTTP server, written in Haskell.
Appendix
In the appendix you will find the source code and the results of all testing iterations, as well as detailed information on building and installing the HTTP servers.