
It all started when I was optimizing how the Impress application server, running on Node.js, returns the HTTP 408 Request Timeout error. As you may know, http.Server has a timeout event that must fire for every open socket that has not closed within the specified time. Note: for every socket, not for every request (the request event, whose handler takes the two arguments (req, res)). A single socket can serve many requests in sequence in keep-alive mode. If we install our own handler via server.setTimeout(2 * 60 * 1000, function (socket) {...}), we must call socket.destroy() ourselves. If we install no handler, http.Server has a built-in one that destroys the socket after 2 minutes automatically. On this timeout we can return a 408 error and consider the matter settled. If it were not for one thing...
I was surprised to discover that the timeout event also fires for sockets that have already received their response, and for sockets already closed by the client side; in short, for every socket in keep-alive mode. Unraveling this strange behavior turned out to be quite involved, and I describe it below. I could have added a single check to the timeout handler, but my idealism would not let me stop there, and it proved worthwhile to fix the bug one level deeper. It turned out that keep-alive in http.Server is implemented not just out of line with the RFC, but frankly hardly implemented at all. Instead of a separate connection timeout and a separate keep-alive timeout, everything hangs on one timeout, built on fast pseudo-timers (enroll/unenroll) and set to 2 minutes by default. That would not be so bad if browsers handled keep-alive well, reusing connections efficiently or closing unused ones.
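For illustration, here is a minimal sketch of my own (not code from the node sources) of overriding the built-in handler; once a custom handler is installed, closing the socket is our responsibility:

var http = require('http');

var server = http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
});

// Custom timeout handler: fires per socket, not per request.
server.setTimeout(2 * 60 * 1000, function (socket) {
  // Reply at the raw socket level; end() flushes and closes the connection.
  // With a custom handler installed, node no longer destroys the socket for us.
  socket.end('HTTP/1.1 408 Request Timeout\r\nConnection: close\r\n\r\n');
});

server.listen(80, '127.0.0.1');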
First results
After 12 lines of changes, the timeout event started firing only when the server has not yet responded and the client is still waiting. The connection timeout keeps its default value of 2 minutes, but http.Server.keepAliveTimeout appeared, with a default value of 5 seconds (as in Apache). Repositories with the fixes:
tshemsedinov / node (for Node.js 0.12) and
tshemsedinov / io.js (for io.js). Soon I will send pull requests, respectively, to
joyent / node and
nodejs / node (the latter is the former io.js; the projects have since been merged into it).
The essence of the fix: the connection timeout should apply when the connection has stalled with a request left unanswered; but if the socket is open and every request has been answered, we should wait much less, while still giving the client a chance to send another request in keep-alive mode.
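On the patched build this looks as follows (a sketch; server.timeout is the standard property, while keepAliveTimeout is the property added by the patch):

var http = require('http');

var server = http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
});

server.timeout = 2 * 60 * 1000;     // connection timeout: request still unanswered
server.keepAliveTimeout = 5 * 1000; // added by the patch: idle keep-alive socket
server.listen(80, '127.0.0.1');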
The side effect is easy to guess: a lot of memory and socket descriptors were freed, which immediately gave my current high-load projects an overall performance increase of more than 2x. Below is a small test with code; its results can be seen on the graphs and give an idea of what is happening.

The essence of the test: create 15 thousand HTTP/1.1 connections (which are keep-alive by default, even without special headers) and measure the rate at which sockets are created and closed, along with memory consumption. The test ran for 200 seconds, with data recorded every 10 seconds. The graphs on the left are Node.js 0.12.7 without the fixes; on the right, the patched and rebuilt Node.js. The blue line is the number of open sockets, the red one is closed sockets. For this I naturally had to store all the sockets in an array, which prevented memory from being freed completely. Therefore, there are two variants of the server side of the test, with an array of sockets and without it, to measure memory. As expected, sockets are released twice as fast, which means they do not occupy descriptors and do not load the operating system's TCP/IP stack, which, in addition to node, holds data structures and buffers for each descriptor.

The blue line is RSS (resident set size), how much memory the whole process occupies; red is heapTotal, memory allocated for the application; green is heapUsed, memory actually in use. Naturally, all freed memory can be reused for other sockets, even faster than on first allocation.
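All three metrics come from process.memoryUsage(); a quick way to see them (values are in bytes):

var m = process.memoryUsage();
// m.rss       - resident set size: memory occupied by the whole process
// m.heapTotal - memory allocated for the V8 heap
// m.heapUsed  - heap memory actually in use
console.log(m.rss, m.heapTotal, m.heapUsed);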
Test code:
Client part of the test:

// Open 15 thousand keep-alive connections, sending a single request through each
var net = require('net');
var count = 0;

keepAliveConnect();

function keepAliveConnect() {
  var c = net.connect({ port: 80, allowHalfOpen: true }, function () {
    c.write('GET / HTTP/1.1\r\nHost: localhost\r\n\r\n');
    if (count++ < 15000) keepAliveConnect();
  });
}
Server side with socket counters:

var http = require('http');

// Response padding to make the response body non-trivial in size
var pad = '';
for (var i = 0; i < 10; i++) pad += '- - - - - - - - - - - - - - - - - ';

var sockets = [];

var server = http.createServer(function (req, res) {
  var socket = req.socket;
  sockets.push(socket); // keep a reference so destroyed sockets can be counted
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end(pad + 'Hello World\n');
});

// Every 10 seconds log: rss, heapTotal, heapUsed, sockets seen, sockets destroyed
setInterval(function () {
  var destroyedSockets = 0;
  for (var i = 0; i < sockets.length; i++) {
    if (sockets[i].destroyed) destroyedSockets++;
  }
  var m = process.memoryUsage(),
      a = [m.rss, m.heapTotal, m.heapUsed, sockets.length, destroyedSockets];
  console.log(a.join(','));
}, 10000);

server.listen(80, '127.0.0.1');
Server side without socket counters:

var http = require('http');

var pad = '';
for (var i = 0; i < 10; i++) pad += '- - - - - - - - - - - - - - - - - ';

var server = http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end(pad + 'Hello World\n');
});

// Every 10 seconds log memory only: rss, heapTotal, heapUsed
setInterval(function () {
  var m = process.memoryUsage();
  console.log([m.rss, m.heapTotal, m.heapUsed].join(','));
}, 10000);

server.listen(80, '127.0.0.1');
Details of the problem
If the client side does not request keep-alive, Node.js closes the socket immediately after res.end() is called, and no resources leak. That is why all the tests where we run http.get('/').on('error', function () {}) en masse, or curl http://domain.com/, or ab (Apache Benchmark), show that everything is fine. Browsers, however, always want keep-alive, and they handle it badly, much like node does. The trouble with keep-alive is that requests through it can only be sent sequentially: there is no multiplexing mechanism that would indicate which of several concurrent requests a given response belongs to. Admittedly, this is wildly inconvenient. SPDY and HTTP/2 do not have this problem. When browsers load a page with many resources, they sometimes make use of keep-alive, but more often they merely send the right headers, telling the server to keep connections open, while using them very little or ignoring them entirely, guided by some inscrutable logic. Firebug and DevTools show the requests as completed, yet the sockets keep hanging. Even when the page has fully loaded, several sockets were created, they are not closed, and if we then need to make one poor request to the API, my observations show that browsers always create a new connection, and keep the old sockets until the server closes them. Such suspended sockets do not count as parallel requests, so they do not hit the browser's connection limits (as I understand it, they are marked as half-open, unused, and excluded from the counter). You can verify this: close the browser, and a whole bundle of sockets that never lived out their 2-minute timeout closes on the node server at once.
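To make the sequential nature of keep-alive concrete, here is a sketch of my own (assuming a local server on port 80): the second request may only be written after the first response has been read, because HTTP/1.1 gives us no way to tell interleaved responses apart:

var net = require('net');

var socket = net.connect({ port: 80 }, function () {
  socket.write('GET /first HTTP/1.1\r\nHost: localhost\r\n\r\n');
});

var responses = 0;
// Simplification: we assume each response arrives as a single 'data' chunk
socket.on('data', function (chunk) {
  responses++;
  if (responses === 1) {
    // Only now may the same socket be reused for the next request
    socket.write('GET /second HTTP/1.1\r\nHost: localhost\r\n\r\n');
  } else {
    socket.end();
  }
});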
On the node side, the timeout is set to 2 minutes regardless of whether a response has been sent to the client or not. Simply lowering this timeout, say to 5 seconds, is not an option: requests that legitimately take longer than 5 seconds to process would start failing. We need a separate keep-alive timeout whose countdown starts not immediately but after the last activity on the socket, i.e. the actual time spent waiting for the next request from the client.
In general, a full keep-alive implementation requires much more: taking the desired timeout from the HTTP headers sent by the client, sending the client the actually applied timeout in the response headers, handling the max parameter and Keep-Alive extensions. But modern browsers do not use any of this; at least in my experiments they ignored these HTTP headers. So I settled for minor edits that produced big results.
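For the record, honoring those headers would look roughly like this (a hypothetical sketch of my own; parseKeepAlive is not part of the node API, and as noted, browsers ignored these headers in my experiments):

var http = require('http');

// Hypothetical helper: extract the timeout the client asks for,
// e.g. from "Keep-Alive: timeout=5, max=100"
function parseKeepAlive(req) {
  var header = req.headers['keep-alive'];
  if (!header) return null;
  var match = /timeout=(\d+)/.exec(header);
  return match ? parseInt(match[1], 10) * 1000 : null;
}

http.createServer(function (req, res) {
  var requested = parseKeepAlive(req);
  // Never keep a socket longer than our own 5-second limit
  var timeout = Math.min(requested || 5000, 5000);
  // Tell the client which timeout was actually applied
  res.setHeader('Keep-Alive', 'timeout=' + Math.floor(timeout / 1000));
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
}).listen(80, '127.0.0.1');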
Node.js fixes
First, I decided to patch the problem of spurious timeouts in a simple way, by suppressing the emitted events:
ae9a1a5 . But for that I had to study the code, and I did not like how it is written. In places there are comments that one must not write like this, that the large closures need decomposing and the nesting of functions removed; yet nobody touches these libraries, because the tests would stop passing and you could break a lot of people's dependent code. Fine, a full rewrite was not going to happen, but the socket leak gave me no rest. Then I tried to solve the problem by destroying the socket in ServerResponse.prototype.detachSocket, once res.end() had been sent, but that broke a lot of useful keep-alive behavior:
9d9484b . After experimenting, reading RFCs and documentation for other servers, it became obvious that a keep-alive timeout had to be implemented, and that it is distinct from the plain connection timeout.
Corrections:
- Added the server.keepAliveTimeout parameter, which can be set manually: /lib/_http_server.js#L259
- Renamed the prefinish event handler function so it can be used elsewhere: /lib/_http_server.js#L455,L456
- Attached a finish event handler to catch the moment when every request has been answered. In it, I remove the handlers attached to the socket's timeout event from the EventEmitter and emit the event that destroys the socket.
- For the https server, added the keepAliveTimeout parameter separately, because everything else it inherits from the prototype: /lib/https.js#L51
For the
Impress Application Server, all these changes are implemented internally, as a neat patch, so the effect is available even without patching Node.js; you can see in its source code how simply it is done. Besides that, on recent projects we have achieved other impressive results, for example, 10 million persistent connections on 4 servers joined in a cluster (2.5 million per server), based on SSE (Server-Sent Events), and we are now preparing to do the same for websockets. We implemented application-level balancing for the Impress cluster and linked the cluster nodes with our own TCP-based protocol instead of the previously used
ZMQ, which gave a significant speedup. I intend to publish some of the results of this work in the following articles. Many people tell me that nobody needs this optimization and performance, that nobody cares. But on at least four live, highly loaded systems, for my customers from the PRC and for the interactive TV format "The Seventh Sense", I observe an overall performance gain from 2-3x up to an order of magnitude, and that is already significant. To achieve it I had to abandon the middleware principle, rewrite interprocess communication, implement application-level balancing (hardware balancers cannot cope), and so on. That will be a separate article on the performance horrors of middleware: "What node gave, middleware took away." For it I have already gathered enough facts, statistics, and examples, and I have something to offer in return.
Do you want everything at once, right now?
Then let's test exactly this kind of patch, not on my build, but showing its effect on the official Node.js 0.12.7. We will check what happens if we add an extra 7 lines of code to the request handler. Sockets get closed as they should, and even the bug with the spurious timeout event disappears, which is understandable. Memory behaves considerably better too, though not as much as with the rebuilt Node.js.
var http = require('http');

http.createServer(function (req, res) {
  var socket = req.socket;
  res.on('finish', function () {
    // The response has been fully sent: drop the default 2-minute timeout handler
    socket.removeAllListeners('timeout');
    // and re-arm the socket with a short keep-alive timeout instead
    socket.setTimeout(5000, function () {
      socket.destroy();
    });
  });
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
}).listen(80, '127.0.0.1');
Let's compare the results on the graphs: on the left, the initial state of Node.js 0.12.7; in the middle, the 7 added lines in the request handler running on the official 0.12.7; on the right, the patched Node.js from my repository. The discrepancies have a clear reason: I forked not 0.12.7 itself but a slightly newer version, and based the patch on that. Of course, all the tests except the last were run on my repository, with and without the patch. The last test I compared against the official 0.12.7, to make clear how this would affect your code right now.

The V8 version in my repository is the same as in 0.12.7, but some optimization has evidently happened in node itself. You can use the workaround above right away, or wait just a little until the fixes land in node; the results of the two options are almost identical. In general, I intend to keep experimenting and optimizing in this direction, and if you have ideas, do not hesitate to suggest them and join in bringing the code of node's most critical built-in libraries into decent shape. Believe me, there is plenty of work for a specialist of any level. Besides, studying the source code is, for me, the best known way to master a platform.
Update: I found another problem there, with _last, which no one had figured out. I have now merged it with the neighboring fixes, tested it, and submitted a pull request:
https://github.com/nodejs/node/pull/2534