From the translator: This is the second article in a series about Node.js from the Mozilla Identity team, which works on the Persona project. The article is based on a talk given by Lloyd Hilaiel at the Node Philly 2012 conference in Philadelphia.
The Node.js process runs on a single processor core, so building a scalable server on Node requires special care. Thanks to the ability to write native extensions and a thoughtful set of APIs for managing processes, there are several different ways to get Node to execute code in parallel. We will look at them in this article.
In addition, we will introduce compute-cluster, a small library that simplifies managing a collection of processes for distributed computations.
The problem
For Persona, we needed to build a server that could handle multiple requests with mixed characteristics, and we chose Node.js for the job. There were two main types of requests: “interactive” ones, which required no heavy computation and had to complete quickly so that the application interface stayed responsive, and “batch” ones, which took about half a second of processor time and could be delayed for a while without hurting the user experience.
In search of the best application architecture, we thought long and hard about how to handle these types of requests, weighing convenience against scaling costs, and finally formulated four basic requirements:
- Saturation. The solution must be able to use all available processor cores.
- Responsiveness. The user interface must remain responsive. Always.
- Fault tolerance. When the load exceeds the limit, we should serve as many clients as we can at full quality, and show the rest an error message.
- Simplicity. The solution should integrate easily and gradually into an already running server.
Armed with these requirements, we can meaningfully compare different approaches.
Approach #1. Just doing everything in the main thread
When heavy calculations are done in the main thread, the result is terrible. There is no saturation, since only one core is loaded; no responsiveness and no fault tolerance, since while a calculation is in progress the application does not respond to any requests at all. The only advantage of this approach is simplicity:
function myRequestHandler(request, response) {
  // bring everything to a grinding halt for half a second
  var results = doComputationWorkSync(request.somesuch);
  response.end(JSON.stringify(results));
}
Synchronous computing in a Node.js application that needs to process more than one request at a time is a bad idea.
Approach #2. We do everything asynchronously
Asynchronous functions that run in the background will solve our problems, right?
Well, that depends on what “in the background” actually means. If the function that performs the calculations is implemented so that it actually runs on the main thread, performance will be no better than with the synchronous approach. Take a look:
function doComputationWork(input, callback) {
  // the "asynchronous" API hides a synchronous implementation:
  // the heavy work still runs on the main thread
  var output = doComputationWorkSync(input);
  process.nextTick(function() {
    callback(null, output);
  });
}
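For completeness, here is roughly what the calling side could look like (a sketch; request.somesuch and the response handling are illustrative):

function myRequestHandler(request, response) {
  // this *looks* better, but everything still grinds to a halt,
  // because doComputationWork blocks the main thread
  doComputationWork(request.somesuch, function(err, results) {
    response.end(JSON.stringify(results));
  });
}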
Merely using asynchronous APIs in Node does not guarantee that you will get an application that uses multiple cores.
Approach #3. We do everything asynchronously with multi-threaded libraries
With a library correctly written in native code, it is quite possible to use several threads from a Node.js application. There are many such libraries; one example is node.bcrypt.js, written by Nick Campbell.
On a machine with four cores, the result looks great: throughput is quadrupled, and all available resources are used. However, if you run the application on a server with 24 cores, the picture is no longer so magical: the same four cores do all the work while the remaining twenty sit idle.
The problem is that this library uses the internal thread pool of Node.js, which was never intended for this purpose and is strictly limited to four threads.
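A sketch of what such usage looks like (the eight rounds and the password literal are arbitrary):

var bcrypt = require('bcrypt');

// the hashing runs on Node's internal thread pool, so no matter how
// many cores the machine has, at most four hashes run concurrently
bcrypt.hash('secret-password', 8, function(err, hash) {
  if (err) throw err;
  console.log(hash);
});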
And this is not the only problem:
- Filling Node's internal thread pool with computational tasks can starve file and network operations, hurting responsiveness.
- There is no way to control the queue of tasks. If the server is already loaded with five minutes of work, do you really want to pile more on top?
Libraries that use multithreading this way cannot saturate a multitude of cores, hurt responsiveness, and limit the application's ability to respond correctly to overload, that is, its fault tolerance.
Approach #4. We use built-in clustering
Node.js versions 0.6.x and above ship with a built-in cluster module that allows you to create several processes listening on the same socket in order to balance the load between them. What if we combine this capability with one of the previous approaches?
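Here is a minimal sketch of the cluster module by itself (the port number is arbitrary):

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // fork one worker per core; all workers share the listening socket
  for (var i = 0; i < numCPUs; i++) cluster.fork();
} else {
  http.createServer(function(req, res) {
    res.end('handled by process ' + process.pid + '\n');
  }).listen(8000);
}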
Such an architecture inherits the disadvantages of the previous approaches: it is simply incapable of ensuring responsiveness and fault tolerance.
Simply running multiple additional instances of an application is not always the right option.
Approach #5. Introducing compute-cluster
For Persona, we solved the problem of parallelizing computations by creating a cluster of processes designed specifically for computational work. The result is the compute-cluster library.
compute-cluster spawns processes and manages them, providing you with a convenient means of distributing work to child processes. Here is how to use it:
const computecluster = require('compute-cluster');

// allocate a compute cluster
var cc = new computecluster({ module: './worker.js' });

// run work in parallel
cc.enqueue({ input: 'foo' }, function(error, result) {
  console.log('foo done', result);
});
The worker.js file must contain a handler for the message event that receives the input data:
process.on('message', function(m) {
  var output;
  // do lots of work here, and it will automatically be parallelized;
  // the input data arrives in m, and we signal completion of this
  // unit of work by sending the result back to the parent process
  process.send(output);
});
compute-cluster can be slotted in behind existing asynchronous APIs without rewriting the calling code, enabling truly parallel computations with minimal changes to the program.
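For instance, the fake-asynchronous doComputationWork from approach #2 could be reimplemented on top of the cc instance created above (a sketch):

// the same signature as before, but the work now actually runs
// in parallel in the worker processes
function doComputationWork(input, callback) {
  cc.enqueue({ input: input }, callback);
}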
How does this approach meet our four requirements?
Saturation: the pool of worker processes uses all available cores.
Responsiveness: since the control process does nothing but spawn child processes and pass messages to them, it can spend most of its time handling interactive requests. Even if the machine is 100% loaded, you can give the control process a higher priority in the operating system's scheduler.
Simplicity: this solution is easy to integrate into an existing project. By hiding everything behind a simple asynchronous API, compute-cluster leaves the calling code blissfully unaware of the implementation details.
And what about fault tolerance during heavy traffic surges? After all, our goal is to work as efficiently as possible while serving the maximum number of clients.
compute-cluster does more than spawn processes and pass messages. It keeps track of how many tasks are currently running and how long a task takes on average. Thanks to this information, it can reliably predict how long a request will take to complete even before it is queued.
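The idea behind that prediction can be illustrated with a back-of-the-envelope estimate (this is not compute-cluster's actual code; the names and formula are hypothetical):

// a new task must wait for the tasks queued ahead of it, spread
// across the pool of worker processes
function expectedWaitMs(queueLength, avgTaskMs, numWorkers) {
  return (queueLength * avgTaskMs) / numWorkers;
}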
The max_request_time parameter sets the maximum acceptable time for a request to complete. Attempting to queue a request will result in an error if its expected execution time exceeds the maximum allowed.
For example, a requirement of the form “the user should never wait more than 10 seconds for authorization to complete” can be expressed by setting max_request_time to 7 seconds (leaving a 3-second margin for possible network delays).
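In terms of the API shown earlier, the configuration might look like this (a sketch that assumes, as in the example above, that max_request_time is expressed in seconds):

var computecluster = require('compute-cluster');

// fail fast instead of queueing requests that are predicted
// to take longer than 7 seconds
var cc = new computecluster({
  module: './worker.js',
  max_request_time: 7
});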
Load testing of compute-cluster showed promising results: even under extreme load, already-authorized users could continue to use the system, while some of those who tried to sign in to the overloaded server immediately received an error message.
What's next?
Parallelization at the application level using processes works well only in a single-tier architecture, where there is just one type of node and scaling means simply adding more of them. But as an application grows more complex, the architecture evolves toward splitting into several tiers for performance or security reasons.
Beyond multiple tiers, high-load applications often have to be deployed in several geographically distant data centers. Finally, an application can be scaled by adding cloud resources on demand. A multi-tier architecture, geographic distribution and dynamically attached cloud resources noticeably change the parameters of the scaling task, while the goal remains the same.
Possible directions for the development of compute-cluster include distributing tasks across the tiers of a complex application, coordinating between data centers to handle local load spikes, and using cloud resources on demand.
If you have ideas and suggestions for improving compute-cluster, I will be glad to hear them. Join the Persona discussion on our mailing list. Thank you for reading!