LAppS: Half a million 1KB-WebSocket messages per second with TLS on one CPU

For those who don't know: LAppS is the Lua Application Server. It is much like nginx or Apache, but for the WebSocket protocol instead of HTTP.


HTTP is supported only as far as the Upgrade request.


LAppS was designed from the start for high load and vertical scalability, and today I reached the limit of what I can get out of my hardware (well, almost: further optimization is possible, but it would be long and hard work).


Most importantly, the LAppS WebSocket stack now outperforms the uWebSockets library, which is positioned as the fastest WebSocket implementation.


If you are interested, read on.


A couple of months have passed since my last article about LAppS, and that article attracted no interest. I hope this one will be more interesting to Habr readers. In that time LAppS has travelled a rather difficult road to version 0.7.0, gained functionality, and improved in performance (as promised earlier).


One of the new features is cws, a loadable module that implements the client side of the WebSocket protocol.


Thanks to this module, I was finally able to squeeze everything out of my home computer and load LAppS for real.


Previously, testing was done with the echo client from the websocketpp library (see the project's GitHub page for details), which is not only slow but also hard to parallelize. The tests were simple: a bunch of clients were started, the results from each client were collected with awk, and simple arithmetic gave the overall performance figures. The results were as follows:


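| Server               | Number of clients | Server RPS | RPS per client | Payload (bytes) |
|----------------------|-------------------|------------|----------------|-----------------|
| LAppS 0.7.0          | 240               | 84997      | 354.154        | 128             |
| uWebSockets (latest) | 240               | 74172.7    | 309.053        | 128             |
| LAppS 0.7.0          | 240               | 83627.4    | 348.447        | 512             |
| uWebSockets (latest) | 240               | 71024.4    | 295.935        | 512             |
| LAppS 0.7.0          | 240               | 79270.1    | 330.292        | 1024            |
| uWebSockets (latest) | 240               | 66499.8    | 277.083        | 1024            |
| LAppS 0.7.0          | 240               | 51621      | 215.087        | 8192            |
| uWebSockets (latest) | 240               | 45341.6    | 188.924        | 8192            |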

In this test, as in the ones that follow, the actual number of packets per second is twice the figure shown, because the measurement is taken in on_message, and the client's on_message method sends a new packet of the same size. That is, the client's request and the server's response are the same size. To estimate the traffic handled by the server, multiply the RPS by the payload size and double the result; neglecting headers, this gives the approximate traffic in bytes per second.
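The traffic estimate described above comes down to one multiplication. Here is a minimal sketch in plain Lua; `approx_traffic_bytes` is a hypothetical helper name used only for illustration (it is not part of LAppS), and frame headers are ignored, as in the text.

```lua
-- Hypothetical helper for illustration, not part of LAppS.
-- rps:     server RPS as reported in the table (counted in on_message)
-- payload: message size in bytes
-- The factor 2 accounts for the request and the equally sized response.
local function approx_traffic_bytes(rps, payload)
  return 2 * rps * payload
end

-- First row of the table: LAppS 0.7.0, 84997 RPS, 128-byte payload
print(approx_traffic_bytes(84997, 128))
```

For the first row this gives roughly 21.8 million bytes per second of payload passing through the server.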


Obviously, with 240 client processes running at the same time, not much CPU is left for LAppS itself (or for uWebSockets).


I looked through several WebSocket client implementations for Lua and, unfortunately, did not find a simple and sufficiently fast module with which I could load LAppS properly. So, as usual, I reinvented the wheel and wrote my own.


The module has a fairly simple interface and imitates the behavior of the browser WebSocket API.


A simple example of working with this module (a service that receives trades from BitMEX):


bitmex={}
bitmex.__index=bitmex

bitmex.init=function()
end

bitmex.run=function()
  -- connect to BitMEX
  local websocket,errmsg=cws:new(
    "wss://www.bitmex.com/realtime",
    {
      ["onopen"]=function(handler)
        -- the WebSocket is open: subscribe to the order book,
        -- sending the request as a text frame (OpCode 1 - text)
        local result, errstr=cws:send(handler,[[{"op": "subscribe", "args": ["orderBookL2:XBTUSD"]}]],1);
        if(not result) -- the send failed, report why
        then
          print("Error on websocket send at handler "..handler..": "..errstr);
        end
      end,
      ["onmessage"]=function(handler,message,opcode)
        print(message) -- just print whatever BitMEX sends
      end,
      ["onerror"]=function(handler, message) -- an error occurred
        print(message..". Socket FD: "..handler);
      end,
      ["onclose"]=function(handler) -- the connection was closed
        print("WebSocket "..handler.." is closed by peer.");
      end
    });

  if(websocket == nil) -- the connection could not be established
  then
    print(errmsg)
  else
    while not must_stop()
    do
      cws:eventLoop(); -- poll for events
    end
  end
end

return bitmex;

A word of warning: the module appeared only today and is poorly tested.


For testing, I wrote a simple service for LAppS and gave it an equally simple name: benchmark.


On start, this service opens 100 connections to an echo WebSocket server (it does not matter which one) and, once a connection is established, sends a 1 KB message. Whenever it receives a message from the server, it sends it back.
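The logic just described can be sketched as follows. This is not the actual benchmark service from the LAppS repository, only an illustration that assumes the cws interface shown in the BitMEX example (cws:new, cws:send, cws:eventLoop) and the must_stop function. The name `run_benchmark` and its parameters are hypothetical, and cws and must_stop are passed in as arguments so that the sketch can also be read and exercised outside LAppS.

```lua
-- Illustrative sketch only, not the real LAppS benchmark service.
-- cws and must_stop are injected; inside LAppS they would be the
-- preloaded cws module and the must_stop function used in the example above.
local function run_benchmark(cws, must_stop, url, connections, payload_size)
  local payload = string.rep("x", payload_size) -- e.g. 1024 bytes
  local opened = 0

  local callbacks = {
    ["onopen"] = function(handler)
      -- start the ping-pong with one text frame (OpCode 1)
      cws:send(handler, payload, 1)
    end,
    ["onmessage"] = function(handler, message, opcode)
      -- send back whatever the echo server returned
      cws:send(handler, message, opcode)
    end,
    ["onerror"] = function(handler, message)
      print(message .. ". Socket FD: " .. handler)
    end,
    ["onclose"] = function(handler)
      print("WebSocket " .. handler .. " is closed by peer.")
    end
  }

  for _ = 1, connections do
    local websocket, errmsg = cws:new(url, callbacks)
    if websocket == nil then
      print(errmsg)
    else
      opened = opened + 1
    end
  end

  while not must_stop() do
    cws:eventLoop() -- poll for events on all connections
  end

  return opened
end
```

Because every received message immediately triggers a new send, each connection keeps exactly one message in flight, which is why the server-side message count is twice the measured RPS.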


My home computer: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz, microcode 0x5e
Memory: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns), Kingston KHX2400C15/16G


All testing was done on this one machine, over localhost.


Echo service configuration in LAppS:


  "echo": {
    "auto_start": true,
    "instances": 2,
    "internal": false,
    "max_inbound_message_size": 16777216,
    "preload": null,
    "protocol": "raw",
    "request_target": "/echo"
  }

The instances parameter tells LAppS to start two parallel instances of the echo service.


Benchmark service (client) configuration:


  "benchmark": {
    "auto_start": true,
    "instances": 4,
    "internal": true,
    "preload": [ "cws", "time" ]
  }

That is, 4 instances of the benchmark service are created at startup, which at 100 connections each gives the 400 clients shown in the tables below.


Result with TLS enabled


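| Server                                  | Number of clients | Server RPS | RPS per client | Payload (bytes) |
|-----------------------------------------|-------------------|------------|----------------|-----------------|
| LAppS 0.7.0-Upstream                    | 400               | 257828     | 644.57         | 1024            |
| nginx & lua-resty-websocket (4 workers) | 400               | 33788      | 84.47          | 1024            |
| websocketpp                             | 400               | 9789.52    | 24.47          | 1024            |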

I have not yet managed to test uWebSockets with TLS: the TLS handshake fails with a complaint about SSLv3 (my client uses TLSv1.2, and SSLv3 has been removed from the LibreSSL build I use).


Result without TLS


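| Server               | Number of clients | Server RPS | RPS per client | Payload (bytes) |
|----------------------|-------------------|------------|----------------|-----------------|
| LAppS 0.7.0-upstream | 400               | 439700     | 1099.25        | 1024            |
| uWebSockets-upstream | 400               | 247549     | 618.87         | 1024            |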

Why does the title say "half a million" messages when the test shows 257828? Because there are actually twice as many messages (as explained above).


uWebSockets shows unimpressive results in this test only because it runs on a single core. The multi-threaded version of uWebSockets from the project repository does not actually work, and with TLS enabled it has a data race in the OpenSSL stack.


If we imagine that uWebSockets works fine on 2 cores (like the 2 LAppS echo services), then it can be tentatively credited with 495098 RPS (simply double the result from the table).
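The arithmetic behind the title and behind this two-core projection can be checked with a few lines of plain Lua. The variable and function names below are illustrative only; the numbers are copied from the two result tables (400 clients, 1024-byte payload).

```lua
-- Server RPS values copied from the result tables
local lapps_tls_rps = 257828 -- LAppS with TLS
local uws_plain_rps = 247549 -- uWebSockets without TLS, single core
local clients       = 400

-- Every measured round trip is one inbound and one outbound message
local function messages_per_second(rps)
  return 2 * rps
end

-- Hypothetical projection: assumes perfect scaling from 1 core to 2
local function projected_two_core_rps(rps)
  return 2 * rps
end

print(messages_per_second(lapps_tls_rps))               -- 515656
print(projected_two_core_rps(uws_plain_rps))            -- 495098
print(string.format("%.2f", lapps_tls_rps / clients))   -- 644.57
```

The first figure is the "half a million" of the title, the second is the generous estimate for uWebSockets, and the third reproduces the per-client column of the TLS table.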


But keep in mind that the uWebSockets echo server does nothing with the received data and simply sends it straight back, whereas LAppS passes the data to the Lua stack of the corresponding service.


What else is new in LAppS



All this can be found on the project wiki page.


And for dessert, for the connoisseurs: what LAppS actually spends its time on during this test.


Without TLS


# iptables
 4.98%  lapps  [ip_tables]  [k] ipt_do_table
 3.80%  lapps  [kernel.vmlinux]  [.] syscall_return_via_sysret
# Lua
 3.52%  lapps  libluajit-5.1.so.2.0.5  [.] lj_str_new
# WebSocket
 1.96%  lapps  lapps  [.] WSStreamProcessing::WSStreamServerParser::parse
 1.88%  lapps  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
 1.81%  lapps  [kernel.vmlinux]  [k] __fget
 1.61%  lapps  [kernel.vmlinux]  [k] tcp_ack
 1.49%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
 1.48%  lapps  [kernel.vmlinux]  [k] sys_epoll_ctl
 1.45%  lapps  [xt_tcpudp]  [k] tcp_mt
# LAppS
 1.35%  lapps  lapps  [.] LAppS::IOWorker<false, true>::execute
 1.28%  lapps  lapps  [.] cws_eventloop ...
 1.27%  lapps  [nf_conntrack]  [k] __nf_conntrack_find_get.isra.11
 1.14%  lapps  [kernel.vmlinux]  [k] __inet_lookup_established
 1.14%  lapps  libluajit-5.1.so.2.0.5  [.] lj_BC_TGETS
# C++
 1.01%  lapps  lapps  [.] LAppS::Application<false, true, (abstract::Application::Protocol)0>::execute ...
 0.98%  lapps  [kernel.vmlinux]  [k] ep_send_events_proc
 0.98%  lapps  [kernel.vmlinux]  [k] tcp_recvmsg
 0.96%  lapps  libc-2.26.so  [.] __memmove_avx_unaligned_erms
 0.93%  lapps  libc-2.26.so  [.] malloc
 0.92%  lapps  [kernel.vmlinux]  [k] tcp_transmit_skb
 0.88%  lapps  [kernel.vmlinux]  [k] sock_poll
 0.85%  lapps  [nf_conntrack]  [k] nf_conntrack_in
 0.83%  lapps  [nf_conntrack]  [k] tcp_packet
 0.79%  lapps  [kernel.vmlinux]  [k] do_syscall_64
 0.78%  lapps  [kernel.vmlinux]  [k] ___slab_alloc
 0.78%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock_bh
 0.73%  lapps  libc-2.26.so  [.] _int_free
 0.69%  lapps  [kernel.vmlinux]  [k] __slab_free
 0.66%  lapps  libcryptopp.so.5.6.5  [.] CryptoPP::Rijndael::Base::UncheckedSetKey
 0.66%  lapps  [kernel.vmlinux]  [k] tcp_write_xmit
 0.65%  lapps  [kernel.vmlinux]  [k] sock_def_readable
 0.65%  lapps  [kernel.vmlinux]  [k] tcp_sendmsg_locked
 0.64%  lapps  libc-2.26.so  [.] vfprintf
# (benchmark)
 0.64%  lapps  lapps  [.] LAppS::ClientWebSocket::send ...
 0.64%  lapps  [kernel.vmlinux]  [k] tcp_v4_rcv
 0.63%  lapps  [kernel.vmlinux]  [k] __alloc_skb
 0.61%  lapps  lapps  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
 0.61%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock
 0.60%  lapps  libc-2.26.so  [.] __memset_avx2_unaligned_erms
 0.60%  lapps  [kernel.vmlinux]  [k] kmem_cache_alloc_node
 0.59%  lapps  libluajit-5.1.so.2.0.5  [.] lj_tab_get
 0.59%  lapps  [kernel.vmlinux]  [k] __local_bh_enable_ip
 0.58%  lapps  [kernel.vmlinux]  [k] __dev_queue_xmit
 0.57%  lapps  [kernel.vmlinux]  [k] nf_hook_slow
 0.55%  lapps  [kernel.vmlinux]  [k] ep_poll_callback
 0.55%  lapps  [kernel.vmlinux]  [k] skb_release_data
 0.54%  lapps  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
 0.54%  lapps  libc-2.26.so  [.] cfree@GLIBC_2.2.5
 0.53%  lapps  [kernel.vmlinux]  [k] ip_finish_output2
 0.49%  lapps  libluajit-5.1.so.2.0.5  [.] lj_BC_RET
 0.49%  lapps  libc-2.26.so  [.] __strlen_avx2
 0.48%  lapps  [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore

Spot the 10 differences when running with TLS


 3.73%  lapps  [kernel.vmlinux]  [k] syscall_return_via_sysret
 3.49%  lapps  libcrypto.so.43.0.1  [.] gcm_ghash_clmul
 3.42%  lapps  libcrypto.so.43.0.1  [.] aesni_ctr32_encrypt_blocks
 2.74%  lapps  [ip_tables]  [k] ipt_do_table
 2.17%  lapps  libluajit-5.1.so.2.0.5  [.] lj_str_new
 1.41%  lapps  libpthread-2.26.so  [.] __pthread_mutex_lock
 1.34%  lapps  libssl.so.45.0.1  [.] tls1_enc
 1.32%  lapps  [kernel.vmlinux]  [k] __fget
 1.16%  lapps  libcrypto.so.43.0.1  [.] getrn
 1.06%  lapps  libc-2.26.so  [.] __memmove_avx_unaligned_erms
 1.06%  lapps  lapps  [.] WSStreamProcessing::WSStreamServerParser::parse
 1.05%  lapps  [kernel.vmlinux]  [k] tcp_ack
 1.02%  lapps  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
 1.02%  lapps  [nf_conntrack]  [k] __nf_conntrack_find_get.isra.11
 0.98%  lapps  lapps  [.] cws_eventloop
 0.98%  lapps  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
 0.93%  lapps  libcrypto.so.43.0.1  [.] aead_aes_gcm_open
 0.92%  lapps  lapps  [.] LAppS::IOWorker<true, true>::execute
 0.91%  lapps  [kernel.vmlinux]  [k] tcp_recvmsg
 0.89%  lapps  [kernel.vmlinux]  [k] sys_epoll_ctl
 0.88%  lapps  libcrypto.so.43.0.1  [.] aead_aes_gcm_seal
 0.84%  lapps  [kernel.vmlinux]  [k] do_syscall_64
 0.82%  lapps  [kernel.vmlinux]  [k] __inet_lookup_established
 0.82%  lapps  [kernel.vmlinux]  [k] tcp_transmit_skb
 0.79%  lapps  libpthread-2.26.so  [.] __pthread_mutex_unlock_usercnt
 0.77%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
 0.76%  lapps  [xt_tcpudp]  [k] tcp_mt
 0.71%  lapps  libcrypto.so.43.0.1  [.] aesni_encrypt
 0.70%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock
 0.67%  lapps  [kernel.vmlinux]  [k] ep_send_events_proc
 0.66%  lapps  libcrypto.so.43.0.1  [.] ERR_clear_error
 0.63%  lapps  [kernel.vmlinux]  [k] sock_def_readable
 0.62%  lapps  lapps  [.] LAppS::Application<true, true, (abstract::Application::Protocol)0>::execute
 0.61%  lapps  libc-2.26.so  [.] malloc
 0.61%  lapps  [nf_conntrack]  [k] nf_conntrack_in
 0.58%  lapps  libssl.so.45.0.1  [.] ssl3_read_bytes
 0.58%  lapps  libluajit-5.1.so.2.0.5  [.] lj_BC_TGETS
 0.57%  lapps  [kernel.vmlinux]  [k] tcp_write_xmit
 0.56%  lapps  libssl.so.45.0.1  [.] do_ssl3_write
 0.55%  lapps  [kernel.vmlinux]  [k] __netif_receive_skb_core
 0.54%  lapps  [kernel.vmlinux]  [k] ___slab_alloc
 0.54%  lapps  libc-2.26.so  [.] __memset_avx2_unaligned_erms
 0.51%  lapps  [kernel.vmlinux]  [k] _raw_spin_lock_bh
 0.51%  lapps  libcrypto.so.43.0.1  [.] gcm_gmult_clmul
 0.51%  lapps  [kernel.vmlinux]  [k] sock_poll
 0.48%  lapps  [nf_conntrack]  [k] tcp_packet
 0.48%  lapps  libc-2.26.so  [.] cfree@GLIBC_2.2.5
 0.48%  lapps  libssl.so.45.0.1  [.] SSL_read
 0.46%  lapps  [kernel.vmlinux]  [k] copy_user_generic_unrolled
 0.45%  lapps  [kernel.vmlinux]  [k] tcp_sendmsg_locked
 0.45%  lapps  lapps  [.] LAppS::ClientWebSocket::send
 0.44%  lapps  libc-2.26.so  [.] _int_free
 0.44%  lapps  libssl.so.45.0.1  [.] ssl3_read_internal
 0.43%  lapps  [kernel.vmlinux]  [k] futex_wake
 0.42%  lapps  libluajit-5.1.so.2.0.5  [.] lj_tab_get
 0.42%  lapps  libc-2.26.so  [.] vfprintf
 0.41%  lapps  [kernel.vmlinux]  [k] tcp_v4_rcv

Source: https://habr.com/ru/post/421421/
