For those who don't know: LAppS (Lua Application Server) is something like nginx or Apache, but for the WebSocket protocol rather than HTTP.
HTTP is supported only as far as the Upgrade request.
LAppS was designed from the start for high load and vertical scalability, and today it has reached the peak of what it can do on my hardware (well, almost: further optimization is possible, but it would be long and hard work).
Most importantly, LAppS has outperformed the uWebSockets library, which is positioned as the fastest WebSocket implementation, in WebSocket stack performance.
If you are interested, read on.
A couple of months have passed since my last article about LAppS, and that article attracted no interest. I hope this one will seem more interesting to Habr readers. In that time LAppS has come a rather hard way to version 0.7.0, gained functionality, and improved in performance (as promised earlier).
One of the new features is cws, a loadable module that implements the client side of the WebSocket protocol.
Thanks to this module, I was finally able to squeeze everything out of my home computer and load LAppS for real.
Previously, testing was done with the echo client from the websocketpp library (see the project's GitHub page for details), which is not only slow but also hard to parallelize. The tests were simple: a bunch of clients were started, the results from each client were collected with awk, and simple arithmetic gave the performance figures. The results were as follows:
Server | Number of clients | Server RPS | RPS per client | Payload (bytes) |
---|---|---|---|---|
LAppS 0.7.0 | 240 | 84997 | 354.154 | 128 |
uWebSockets (latest) | 240 | 74172.7 | 309.053 | 128 |
LAppS 0.7.0 | 240 | 83627.4 | 348.447 | 512 |
uWebSockets (latest) | 240 | 71024.4 | 295.935 | 512 |
LAppS 0.7.0 | 240 | 79270.1 | 330.292 | 1024 |
uWebSockets (latest) | 240 | 66499.8 | 277.083 | 1024 |
LAppS 0.7.0 | 240 | 51621 | 215.087 | 8192 |
uWebSockets (latest) | 240 | 45341.6 | 188.924 | 8192 |
In this test, as in the ones that follow, the actual number of messages is twice as high, because the measurement is taken in on_message, and in the client's on_message method a new message of the same size is sent. That is, the client's request and the server's response are the same size. To estimate the traffic processed by the server, multiply the RPS figure by the payload size and double it; neglecting the headers, this gives the approximate traffic in bytes per second.
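As a rough illustration of that estimate, here is a minimal Lua sketch. The function name `traffic_bytes_per_sec` is made up for this example, and the numbers are the LAppS row with a 128-byte payload from the table above; frame headers are neglected, as in the text.

```lua
-- Approximate server traffic from a measured RPS figure (headers neglected).
-- rps:     messages per second counted in the client's on_message
-- payload: message size in bytes
function traffic_bytes_per_sec(rps, payload)
  -- every counted message is one request plus one response of equal size
  return 2 * rps * payload
end

-- LAppS 0.7.0, 240 clients, 128-byte payload (first table)
local bps = traffic_bytes_per_sec(84997, 128)
print(string.format("%.0f bytes/s (about %.1f MB/s)", bps, bps / 1e6))
-- prints: 21759232 bytes/s (about 21.8 MB/s)
```

The same call with the 8192-byte row shows how quickly the byte volume grows even though the RPS figure drops.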
Obviously, with 240 client processes running at the same time, not much CPU is left for LAppS itself (or for uWebSockets).
I looked through several WebSocket client implementations for Lua and, unfortunately, did not find a simple and sufficiently fast module with which I could load LAppS properly. So, as usual, I reinvented the wheel and wrote my own.
The module has a fairly simple interface and imitates the behavior of the browser WebSocket API.
A simple example of working with this module (a service that receives trading data from BitMEX):
```lua
bitmex = {}
bitmex.__index = bitmex

bitmex.init = function()
end

bitmex.run = function()
  -- connect to BitMEX
  local websocket, errmsg = cws:new(
    "wss://www.bitmex.com/realtime",
    {
      ["onopen"] = function(handler)
        -- the WebSocket is open: subscribe to the order book
        local result, errstr = cws:send(
          handler,
          [[{"op": "subscribe", "args": ["orderBookL2:XBTUSD"]}]],
          1 -- OpCode 1: text frame
        );
        if (not result) then
          -- the send failed: report why
          print("Error on websocket send at handler "..handler..": "..errstr);
        end
      end,
      ["onmessage"] = function(handler, message, opcode)
        print(message) -- print whatever BitMEX sends
      end,
      ["onerror"] = function(handler, message)
        -- report the error and the socket it occurred on
        print(message..". Socket FD: "..handler);
      end,
      ["onclose"] = function(handler)
        -- the peer closed the connection
        print("WebSocket "..handler.." is closed by peer.");
      end
    });

  if (websocket == nil) then
    -- the connection could not be established
    print(errmsg)
  else
    while not must_stop() do
      cws:eventLoop(); -- poll for events
    end
  end
end

return bitmex;
```
A word of warning: the module appeared only today and is poorly tested.
For testing, I wrote a simple service for LAppS and gave it an equally simple name: benchmark.
At startup, this service creates 100 connections to an echo WebSocket server (it does not matter which one) and, once a connection succeeds, sends a 1 KB message. Whenever it receives a message from the server, it sends it straight back.
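The logic just described can be sketched with the same calls the BitMEX example uses (`cws:new`, `cws:send`, `cws:eventLoop`, `must_stop`). This is not the actual benchmark service, only an illustration under assumptions: the names `benchmark.handlers`, `CONNECTIONS` and `PAYLOAD` are made up, the payload content is arbitrary, and the module, URL and stop function are passed in as arguments (`ws`, `url`, `must_stop`) so the sketch can also be exercised outside LAppS with a stand-in object.

```lua
-- Sketch of the benchmark logic; only the cws calls mirror the example above.
local CONNECTIONS = 100                    -- connections opened at startup
local PAYLOAD     = string.rep("x", 1024)  -- 1 KB message (content is arbitrary)
local OPCODE_TEXT = 1                      -- OpCode 1: text frame

benchmark = {}

-- Build the callback table for one connection.
-- `ws` is the WebSocket client module (cws inside LAppS).
benchmark.handlers = function(ws)
  return {
    ["onopen"] = function(handler)
      -- connection established: send the first 1 KB message
      ws:send(handler, PAYLOAD, OPCODE_TEXT)
    end,
    ["onmessage"] = function(handler, message, opcode)
      -- the echo came back: send it to the server again
      ws:send(handler, message, opcode)
    end,
    ["onerror"] = function(handler, message)
      print(message..". Socket FD: "..handler)
    end,
    ["onclose"] = function(handler)
      print("WebSocket "..handler.." is closed by peer.")
    end
  }
end

-- Open all connections, then poll until asked to stop.
benchmark.run = function(ws, url, must_stop)
  for i = 1, CONNECTIONS do
    local websocket, errmsg = ws:new(url, benchmark.handlers(ws))
    if (websocket == nil) then
      print(errmsg)
    end
  end
  while not must_stop() do
    ws:eventLoop()
  end
end
```

Inside a LAppS service this would presumably be called as `benchmark.run(cws, url, must_stop)`, with `url` pointing at the echo service. Measuring RPS would additionally need a message counter and a timer in `onmessage`, which the sketch leaves out.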
My home computer: Intel® Core(TM) i7-7700 CPU @ 3.60GHz, microcode 0x5e
Memory: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns), Kingston KHX2400C15/16G
All tests were run locally on this machine, over localhost.
Echo service configuration in LAppS:
```json
"echo": {
  "auto_start": true,
  "instances": 2,
  "internal": false,
  "max_inbound_message_size": 16777216,
  "preload": null,
  "protocol": "raw",
  "request_target": "/echo"
}
```
The instances parameter tells LAppS to start two echo service instances in parallel.
Benchmark service (client) configuration:
```json
"benchmark": {
  "auto_start": true,
  "instances": 4,
  "internal": true,
  "preload": [ "cws", "time" ]
}
```
That is, four instances of the benchmark service are created at startup.
Server | Number of clients | Server RPS | RPS per client | Payload (bytes) |
---|---|---|---|---|
LAppS 0.7.0-Upstream | 400 | 257828 | 644.57 | 1024 |
nginx & lua-resty-websocket 4 workers | 400 | 33788 | 84.47 | 1024 |
websocketpp | 400 | 9789.52 | 24.47 | 1024 |
Testing uWebSockets here has not succeeded yet: the TLS handshake complains about SSLv3 (my client uses TLSv1.2, and the LibreSSL build I use has SSLv3 removed).
Server | Number of clients | Server RPS | RPS per client | Payload (bytes) |
---|---|---|---|---|
LAppS 0.7.0-upstream | 400 | 439700 | 1099.25 | 1024 |
uWebSockets-upstream | 400 | 247549 | 618.87 | 1024 |
Why does the title say "half a million" messages when the test shows 257828? Because the actual number of messages is twice as high (as explained above).
uWebSockets shows unimpressive results in this test only because it runs on a single core: the multi-threaded version of uWebSockets from the project repository does not actually work, and with TLS enabled it has a data race in the OpenSSL stack.
If we imagine that uWebSockets works fine on 2 cores (like the 2 LAppS echo services), it can be nominally credited with 495098 RPS (simply double the result from the table).
But keep in mind that the uWebSockets echo server does nothing with the received data and sends it back immediately, whereas LAppS passes the data to the Lua stack of the corresponding service.
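The arithmetic behind these figures is easy to check. The sketch below uses only numbers from the two 400-client tables; the helper names `total_messages` and `scaled_rps` are made up for the example, and the linear scaling to two cores is the same optimistic assumption as in the text, not a measurement.

```lua
-- Server RPS figures from the tables above (400 clients, 1024-byte payload)
local lapps_first_table  = 257828
local lapps_second_table = 439700
local uws_single_core    = 247549

-- Each counted round trip is two messages: one received and one sent by the server
function total_messages(rps)
  return 2 * rps
end

-- Hypothetical linear scaling of a single-core result to `cores` cores
function scaled_rps(rps, cores)
  return rps * cores
end

print(total_messages(lapps_first_table))   -- 515656, the "half a million"
print(scaled_rps(uws_single_core, 2))      -- 495098
-- measured ratio of LAppS (2 echo instances) to uWebSockets (1 core)
print(string.format("%.2f", lapps_second_table / uws_single_core))  -- 1.78
```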
All this can be found on the project wiki page.
And finally, for dessert, for the connoisseurs: what LAppS actually does during this test.
Without TLS
```
# iptables
4.98%  lapps  [ip_tables]  [k] ipt_do_table
3.80%  lapps  [kernel.vmlinux]  [.] syscall_return_via_sysret
# Lua
3.52%  lapps  libluajit-5.1.so.2.0.5  [.] lj_str_new
# WebSocket
1.96%  lapps  lapps  [.] WSStreamProcessing::WSStreamServerParser::parse
1.88%  lapps  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
1.81%  lapps  [kernel.vmlinux]  [k] __fget
1.61%  lapps  [kernel.vmlinux]  [k] tcp_ack
1.49%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
1.48%  lapps  [kernel.vmlinux]  [k] sys_epoll_ctl
1.45%  lapps  [xt_tcpudp]  [k] tcp_mt
# LAppS
1.35%  lapps  lapps  [.] LAppS::IOWorker<false, true>::execute
1.28%  lapps  lapps  [.] cws_eventloop ...
1.27%  lapps  [nf_conntrack]  [k] __nf_conntrack_find_get.isra.11
1.14%  lapps  [kernel.vmlinux]  [k] __inet_lookup_established
1.14%  lapps  libluajit-5.1.so.2.0.5  [.] lj_BC_TGETS
# C++
1.01%  lapps  lapps  [.] LAppS::Application<false, true, (abstract::Application::Protocol)0>::execute ...
0.98%  lapps  [kernel.vmlinux]  [k] ep_send_events_proc
0.98%  lapps  [kernel.vmlinux]  [k] tcp_recvmsg
0.96%  lapps  libc-2.26.so  [.] __memmove_avx_unaligned_erms
0.93%  lapps  libc-2.26.so  [.] malloc
0.92%  lapps  [kernel.vmlinux]  [k] tcp_transmit_skb
0.88%  lapps  [kernel.vmlinux]  [k] sock_poll
0.85%  lapps  [nf_conntrack]  [k] nf_conntrack_in
0.83%  lapps  [nf_conntrack]  [k] tcp_packet
0.79%  lapps  [kernel.vmlinux]  [k] do_syscall_64
0.78%  lapps  [kernel.vmlinux]  [k] ___slab_alloc
0.78%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock_bh
0.73%  lapps  libc-2.26.so  [.] _int_free
0.69%  lapps  [kernel.vmlinux]  [k] __slab_free
0.66%  lapps  libcryptopp.so.5.6.5  [.] CryptoPP::Rijndael::Base::UncheckedSetKey
0.66%  lapps  [kernel.vmlinux]  [k] tcp_write_xmit
0.65%  lapps  [kernel.vmlinux]  [k] sock_def_readable
0.65%  lapps  [kernel.vmlinux]  [k] tcp_sendmsg_locked
0.64%  lapps  libc-2.26.so  [.] vfprintf    # (benchmark)
0.64%  lapps  lapps  [.] LAppS::ClientWebSocket::send ...
0.64%  lapps  [kernel.vmlinux]  [k] tcp_v4_rcv
0.63%  lapps  [kernel.vmlinux]  [k] __alloc_skb
0.61%  lapps  lapps  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
0.61%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock
0.60%  lapps  libc-2.26.so  [.] __memset_avx2_unaligned_erms
0.60%  lapps  [kernel.vmlinux]  [k] kmem_cache_alloc_node
0.59%  lapps  libluajit-5.1.so.2.0.5  [.] lj_tab_get
0.59%  lapps  [kernel.vmlinux]  [k] __local_bh_enable_ip
0.58%  lapps  [kernel.vmlinux]  [k] __dev_queue_xmit
0.57%  lapps  [kernel.vmlinux]  [k] nf_hook_slow
0.55%  lapps  [kernel.vmlinux]  [k] ep_poll_callback
0.55%  lapps  [kernel.vmlinux]  [k] skb_release_data
0.54%  lapps  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
0.54%  lapps  libc-2.26.so  [.] cfree@GLIBC_2.2.5
0.53%  lapps  [kernel.vmlinux]  [k] ip_finish_output2
0.49%  lapps  libluajit-5.1.so.2.0.5  [.] lj_BC_RET
0.49%  lapps  libc-2.26.so  [.] __strlen_avx2
0.48%  lapps  [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
```
With TLS (find the 10 differences)
```
3.73%  lapps  [kernel.vmlinux]  [k] syscall_return_via_sysret
3.49%  lapps  libcrypto.so.43.0.1  [.] gcm_ghash_clmul
3.42%  lapps  libcrypto.so.43.0.1  [.] aesni_ctr32_encrypt_blocks
2.74%  lapps  [ip_tables]  [k] ipt_do_table
2.17%  lapps  libluajit-5.1.so.2.0.5  [.] lj_str_new
1.41%  lapps  libpthread-2.26.so  [.] __pthread_mutex_lock
1.34%  lapps  libssl.so.45.0.1  [.] tls1_enc
1.32%  lapps  [kernel.vmlinux]  [k] __fget
1.16%  lapps  libcrypto.so.43.0.1  [.] getrn
1.06%  lapps  libc-2.26.so  [.] __memmove_avx_unaligned_erms
1.06%  lapps  lapps  [.] WSStreamProcessing::WSStreamServerParser::parse
1.05%  lapps  [kernel.vmlinux]  [k] tcp_ack
1.02%  lapps  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
1.02%  lapps  [nf_conntrack]  [k] __nf_conntrack_find_get.isra.11
0.98%  lapps  lapps  [.] cws_eventloop
0.98%  lapps  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
0.93%  lapps  libcrypto.so.43.0.1  [.] aead_aes_gcm_open
0.92%  lapps  lapps  [.] LAppS::IOWorker<true, true>::execute
0.91%  lapps  [kernel.vmlinux]  [k] tcp_recvmsg
0.89%  lapps  [kernel.vmlinux]  [k] sys_epoll_ctl
0.88%  lapps  libcrypto.so.43.0.1  [.] aead_aes_gcm_seal
0.84%  lapps  [kernel.vmlinux]  [k] do_syscall_64
0.82%  lapps  [kernel.vmlinux]  [k] __inet_lookup_established
0.82%  lapps  [kernel.vmlinux]  [k] tcp_transmit_skb
0.79%  lapps  libpthread-2.26.so  [.] __pthread_mutex_unlock_usercnt
0.77%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
0.76%  lapps  [xt_tcpudp]  [k] tcp_mt
0.71%  lapps  libcrypto.so.43.0.1  [.] aesni_encrypt
0.70%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock
0.67%  lapps  [kernel.vmlinux]  [k] ep_send_events_proc
0.66%  lapps  libcrypto.so.43.0.1  [.] ERR_clear_error
0.63%  lapps  [kernel.vmlinux]  [k] sock_def_readable
0.62%  lapps  lapps  [.] LAppS::Application<true, true, (abstract::Application::Protocol)0>::execute
0.61%  lapps  libc-2.26.so  [.] malloc
0.61%  lapps  [nf_conntrack]  [k] nf_conntrack_in
0.58%  lapps  libssl.so.45.0.1  [.] ssl3_read_bytes
0.58%  lapps  libluajit-5.1.so.2.0.5  [.] lj_BC_TGETS
0.57%  lapps  [kernel.vmlinux]  [k] tcp_write_xmit
0.56%  lapps  libssl.so.45.0.1  [.] do_ssl3_write
0.55%  lapps  [kernel.vmlinux]  [k] __netif_receive_skb_core
0.54%  lapps  [kernel.vmlinux]  [k] ___slab_alloc
0.54%  lapps  libc-2.26.so  [.] __memset_avx2_unaligned_erms
0.51%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock_bh
0.51%  lapps  libcrypto.so.43.0.1  [.] gcm_gmult_clmul
0.51%  lapps  [kernel.vmlinux]  [k] sock_poll
0.48%  lapps  [nf_conntrack]  [k] tcp_packet
0.48%  lapps  libc-2.26.so  [.] cfree@GLIBC_2.2.5
0.48%  lapps  libssl.so.45.0.1  [.] SSL_read
0.46%  lapps  [kernel.vmlinux]  [k] copy_user_generic_unrolled
0.45%  lapps  [kernel.vmlinux]  [k] tcp_sendmsg_locked
0.45%  lapps  lapps  [.] LAppS::ClientWebSocket::send
0.44%  lapps  libc-2.26.so  [.] _int_free
0.44%  lapps  libssl.so.45.0.1  [.] ssl3_read_internal
0.43%  lapps  [kernel.vmlinux]  [k] futex_wake
0.42%  lapps  libluajit-5.1.so.2.0.5  [.] lj_tab_get
0.42%  lapps  libc-2.26.so  [.] vfprintf
0.41%  lapps  [kernel.vmlinux]  [k] tcp_v4_rcv
```
Source: https://habr.com/ru/post/421421/