
Top 6 netty optimizations

Hello. This article is a continuation of 10k per core, with concrete examples of the optimizations that were made to improve server performance. Five months have passed since the first part was written, and during that time the load on our production server has grown from 500 req/sec to 2000, with peaks of up to 5000 req/sec. Thanks to netty, we barely noticed this growth (except that disk space fills up faster).

Blynk load
(Ignore the spikes; those are bugs related to delays)

This article will be useful to everyone who works with netty or is just getting started. So let's go.

Native Epoll transport for Linux


One of the key optimizations that everyone should use is switching to the native Epoll transport instead of the Java implementation. With netty this means adding just one dependency:

    <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty-transport-native-epoll</artifactId>
        <version>${netty.version}</version>
        <classifier>linux-x86_64</classifier>
    </dependency>

and doing a search-and-replace in the code to swap the Nio* classes for their Epoll* counterparts:

NioEventLoopGroup → EpollEventLoopGroup
NioServerSocketChannel → EpollServerSocketChannel
NioSocketChannel → EpollSocketChannel

The point is that the Java implementation of non-blocking sockets is built around the Selector class, which does let you work with many connections efficiently, but its Java implementation is not the most optimal, for three reasons:


In my particular case, I got a performance boost of about 30%. Of course, this optimization is only possible for Linux servers.
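To keep the same binary runnable on non-Linux machines during development, the transport can be picked at startup. A minimal sketch, assuming netty and the epoll artifact are on the classpath; `Epoll.isAvailable()` and the transport classes are the real netty API, while the `TransportSelector` helper itself is mine:

```java
import io.netty.channel.EventLoopGroup;
import io.netty.channel.ServerChannel;
import io.netty.channel.epoll.Epoll;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class TransportSelector {

    // True only when the native epoll transport can actually be loaded
    // (i.e. we are on Linux and the native library is present).
    public static boolean useEpoll() {
        return Epoll.isAvailable();
    }

    public static EventLoopGroup newEventLoopGroup(int threads) {
        return useEpoll() ? new EpollEventLoopGroup(threads)
                          : new NioEventLoopGroup(threads);
    }

    public static Class<? extends ServerChannel> serverChannelClass() {
        return useEpoll() ? EpollServerSocketChannel.class
                          : NioServerSocketChannel.class;
    }
}
```

The two helpers then plug straight into `ServerBootstrap.group(...)` and `.channel(...)`, so the rest of the bootstrap code stays transport-agnostic.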

Native OpenSSL


I don't know how things are in the CIS, but over there security is a key factor for any project. "What about security?" is an inevitable question you will be asked by anyone interested in your project, system, service, or product.

In the outsourcing world I came from, the team always had one or two DevOps engineers onto whom I could offload such issues. For example, instead of adding https, SSL/TLS support at the application level, you could always ask the admins to set up nginx and have it proxy plain http to your server. Quick and efficient. Today, when I am a jack of all trades, I have to do everything myself: develop, deploy, monitor. So enabling https at the application level is much faster and easier for me than deploying nginx.

Getting openSSL to work with netty is a bit more difficult than enabling the native epoll transport. You will need to add a new dependency to the project:

    <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty-tcnative</artifactId>
        <version>${netty.tcnative.version}</version>
        <classifier>linux-x86_64</classifier>
    </dependency>

Specify openSSL as an SSL provider:

    return SslContextBuilder.forServer(serverCert, serverKey, serverPass)
            .sslProvider(SslProvider.OPENSSL)
            .build();

Add another handler to the pipeline:

 new SslHandler(engine) 
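Putting these pieces together, a channel initializer might look like the sketch below. `serverCert`, `serverKey`, and `serverPass` are placeholders for your certificate chain file, private key file, and key password; the class itself is a hypothetical example, but `SslContextBuilder` and `SslContext.newHandler` are the real netty API:

```java
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;

import java.io.File;

public class SslServerInitializer extends ChannelInitializer<SocketChannel> {

    private final SslContext sslCtx;

    public SslServerInitializer(File serverCert, File serverKey, String serverPass)
            throws Exception {
        // SslProvider.OPENSSL requires netty-tcnative and the native
        // openSSL bindings; without them build() will fail.
        this.sslCtx = SslContextBuilder.forServer(serverCert, serverKey, serverPass)
                                       .sslProvider(SslProvider.OPENSSL)
                                       .build();
    }

    @Override
    protected void initChannel(SocketChannel ch) {
        // The SSL handler goes first so traffic is decrypted
        // before the rest of the pipeline sees it.
        ch.pipeline().addLast("ssl", sslCtx.newHandler(ch.alloc()));
        // ... your protocol handlers follow here
    }
}
```

Note that `sslCtx.newHandler(ch.alloc())` creates a fresh `SslHandler` (with its own engine) per connection, while the `SslContext` itself is built once and shared.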

Finally, compile the native code for openSSL on the server. The instructions are here. Essentially, the whole process comes down to:

For me, the performance gain was ~15%.
A full example can be found here and here.

Saving on system calls


Very often you have to send several messages to the same socket. It might look like this:

    for (Message msg : messages) {
        ctx.writeAndFlush(msg);
    }

This code can be optimized.

    for (Message msg : messages) {
        ctx.write(msg);
    }
    ctx.flush();

In the second case, write() does not immediately send the message over the network; after processing, netty puts it into a buffer (provided the message is smaller than the buffer), and the final flush() sends everything at once. This reduces the number of system calls needed to send data over the network.

The best synchronization is no synchronization


As I already wrote in the previous article, netty is an asynchronous framework with a small number of handler threads (usually cores * 2). That is why each handler thread must execute as quickly as possible. Any kind of synchronization can get in the way, especially under loads of tens of thousands of requests per second.

To this end, netty binds each new connection to the same handler thread, to reduce the need for synchronization in your code. For example, if a user connects to the server and performs actions that change only state associated with that user, then no synchronization or volatile is needed: all messages from this user will be processed by the same thread. This is great and works well for some projects.
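This can be made concrete with a handler that keeps mutable per-connection state in a plain field. A sketch (the `CountingHandler` class and its counter are hypothetical examples of mine): because the handler is not @Sharable and one instance serves one channel, every event arrives on the same event-loop thread, so no volatile or locking is needed:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// One instance per connection (not @Sharable), so this field is only
// ever touched by the single event-loop thread bound to the channel.
public class CountingHandler extends ChannelInboundHandlerAdapter {

    private int messagesSeen; // plain field: no volatile, no synchronization

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        messagesSeen++;               // safe: always the same thread
        ctx.fireChannelRead(msg);     // pass the message down the pipeline
    }

    public int messagesSeen() {
        return messagesSeen;
    }
}
```

The same reasoning is why netty handler code should never block: the one thread serving this channel also serves many others.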

But what if state can be changed from several connections, which are most likely bound to different threads? For example, when we implement a game room and a command from one user has to change the world shared with others?

To do this, netty has a register method that allows you to rebind the connection from one handler to another.

    ChannelFuture cf = ctx.deregister();
    cf.addListener(new ChannelFutureListener() {
        @Override
        public void operationComplete(ChannelFuture channelFuture) throws Exception {
            targetEventLoop.register(channelFuture.channel()).addListener(completeHandler);
        }
    });

This approach makes it possible to process all events for one game room in one thread and completely get rid of synchronization and volatile for changing the state of that room.
An example of rebinding on login in my code is here and here.

Reuse EventLoop


Netty is often chosen for server solutions, since servers have to support several protocols. For example, my humble IoT cloud supports HTTP/S, WebSockets, SSL/TCP sockets for different hardware, and its own binary protocol. This means that each of these protocols needs an IO thread (boss group) and handler threads (worker group). Usually, creating several such servers looks like this:

    //http server
    new ServerBootstrap()
            .group(new EpollEventLoopGroup(1), new EpollEventLoopGroup(workerThreads))
            .channel(channelClass)
            .childHandler(getHTTPChannelInitializer())
            .bind(80);

    //https server
    new ServerBootstrap()
            .group(new EpollEventLoopGroup(1), new EpollEventLoopGroup(workerThreads))
            .channel(channelClass)
            .childHandler(getHTTPSChannelInitializer())
            .bind(443);

But with netty, the fewer extra threads you create, the more likely you are to end up with a more performant application. Fortunately, in netty an EventLoop can be reused:

    EventLoopGroup boss = new EpollEventLoopGroup(1);
    EventLoopGroup workers = new EpollEventLoopGroup(workerThreads);

    //http server
    new ServerBootstrap()
            .group(boss, workers)
            .channel(channelClass)
            .childHandler(getHTTPChannelInitializer())
            .bind(80);

    //https server
    new ServerBootstrap()
            .group(boss, workers)
            .channel(channelClass)
            .childHandler(getHTTPSChannelInitializer())
            .bind(443);


Off-heap messages


It is no secret that for high-load applications one of the bottlenecks is the garbage collector. Netty is fast in no small part because of its pervasive use of memory outside the Java heap. Netty even has its own ecosystem around off-heap buffers and a memory leak detection system. You can do the same. For example:

 ctx.writeAndFlush(new ResponseMessage(messageId, OK, 0)); 

change to

    ByteBuf buf = ctx.alloc().directBuffer(5);
    buf.writeByte(messageId);
    buf.writeShort(OK);
    buf.writeShort(0);
    ctx.writeAndFlush(buf);
    //buf.release();

In this case, however, you must be sure that one of the handlers in the pipeline will release this buffer. That does not mean you should run off and rewrite your code right now, but you should know that this optimization exists, despite the more complex code and the risk of a memory leak. For hot methods it can be the perfect solution.
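A common safety pattern for the manual-buffer case is to release in a finally block if the write was never submitted; once writeAndFlush accepts the buffer, netty takes over ownership and releases it after writing it to the socket. A sketch under those assumptions (the `Responses.sendOk` helper and the 1+2+2 byte layout are taken from the example above; the class name is mine):

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;

public final class Responses {

    // Layout from the example above: 1 byte id + 2 bytes code + 2 bytes length = 5 bytes
    public static void sendOk(ChannelHandlerContext ctx, int messageId, short code) {
        ByteBuf buf = ctx.alloc().directBuffer(5);
        boolean submitted = false;
        try {
            buf.writeByte(messageId);
            buf.writeShort(code);
            buf.writeShort(0);
            ctx.writeAndFlush(buf); // netty releases the buffer after the write completes
            submitted = true;
        } finally {
            if (!submitted) {
                buf.release();      // we still own the buffer if the write never happened
            }
        }
    }
}
```

If you consume such a buffer yourself in an inbound handler instead, `io.netty.util.ReferenceCountUtil.release(msg)` in that handler plays the same role.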

I hope these simple tips will help you speed up your application.
Let me remind you that my project is open-source, so if you are interested in how these optimizations look in existing code, see here.

Source: https://habr.com/ru/post/277695/

