On the structure and scaling of complex Node.js applications

The structure of a software project matters. Decisions made at the very beginning determine what working on the product will be like throughout its entire life cycle.

This material is based on answers to frequently asked questions about structuring complex Node.js applications. It is intended for anyone who wants to improve the structure of their own projects.

Here are the main topics we will cover:

Developing highly scalable applications that are easy to maintain.
High-quality separation of configuration data and the main application code.
Using different types of processes in Node.js applications.

To illustrate the concepts, we will use an example application whose full code can be found on GitHub.

Overview of the demonstration project

Our application receives data from Twitter by subscribing to updates for certain keywords. Matching tweets are sent to a RabbitMQ queue. The contents of the queue are processed and stored in a Redis database. In addition, the application has a REST API that provides access to the saved tweets.

The project file structure looks like this:

    .
    |-- config
    |   |-- components
    |   |   |-- common.js
    |   |   |-- logger.js
    |   |   |-- rabbitmq.js
    |   |   |-- redis.js
    |   |   |-- server.js
    |   |   `-- twitter.js
    |   |-- index.js
    |   |-- social-preprocessor-worker.js
    |   |-- twitter-stream-worker.js
    |   `-- web.js
    |-- models
    |   |-- redis
    |   |   |-- index.js
    |   |   `-- redis.js
    |   |-- tortoise
    |   |   |-- index.js
    |   |   `-- tortoise.js
    |   `-- twitter
    |       |-- index.js
    |       `-- twitter.js
    |-- scripts
    |-- test
    |   `-- setup.js
    |-- web
    |   |-- middleware
    |   |   |-- index.js
    |   |   `-- parseQuery.js
    |   |-- router
    |   |   |-- api
    |   |   |   |-- tweets
    |   |   |   |   |-- get.js
    |   |   |   |   |-- get.spec.js
    |   |   |   |   `-- index.js
    |   |   |   `-- index.js
    |   |   `-- index.js
    |   |-- index.js
    |   `-- server.js
    |-- worker
    |   |-- social-preprocessor
    |   |   |-- index.js
    |   |   `-- worker.js
    |   `-- twitter-stream
    |       |-- index.js
    |       `-- worker.js
    |-- index.js
    `-- package.json

There are 3 processes in the project:

The twitter-stream-worker process interacts with Twitter through the streaming API. It receives tweets containing certain keywords and sends them to the RabbitMQ queue.
The social-preprocessor-worker process consumes the RabbitMQ queue. It writes the tweets it reads to the Redis store and deletes old data.
The web process serves a REST API with one endpoint: GET /api/v1/tweets?limit&offset.
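
The limit and offset query parameters suggest simple pagination over the stored tweets. Below is a minimal sketch of how such parameters might be parsed and applied. The function names, the default values, and the maximum page size are assumptions made for illustration and are not taken from the example project.

```javascript
'use strict'

// Hypothetical defaults; the example project may use different values.
const DEFAULT_LIMIT = 10
const MAX_LIMIT = 100

// Query-string values always arrive as strings, so convert and clamp them.
function parsePagination (query) {
  const limit = Number.parseInt(query.limit, 10)
  const offset = Number.parseInt(query.offset, 10)
  return {
    limit: Number.isInteger(limit) && limit > 0 ? Math.min(limit, MAX_LIMIT) : DEFAULT_LIMIT,
    offset: Number.isInteger(offset) && offset >= 0 ? offset : 0
  }
}

// Apply the pagination window to an in-memory list of tweets.
function paginate (tweets, { limit, offset }) {
  return tweets.slice(offset, offset + limit)
}

const tweets = ['t1', 't2', 't3', 't4', 't5']
const page = paginate(tweets, parsePagination({ limit: '2', offset: '1' }))
console.log(page) // [ 't2', 't3' ]
```

Missing or malformed parameters fall back to the defaults instead of producing an error, which is one possible design choice for a read-only endpoint.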

We will come back to the differences between web and worker processes later. First, let's look at the configuration data of the solution.

Support for various runtime environments and application configurations

The configuration data for a specific instance of the application should be loaded from environment variables. It should not be hard-coded as constants.

We are talking about parameters that may differ between deployments and execution environments: for example, a development machine, a build server, a staging environment that mirrors production as closely as possible, and production itself. This approach lets you keep a single code base that can run under any of these conditions.

A good test of whether configuration data is properly separated from the application internals is this: if the project code could be made public at any moment, the logic and the settings are separated as they should be. This also means that secrets and account credentials never end up in the version control system.

Environment variables are available through the process.env object. It stores only string values, so type conversion may be needed.

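    // config/config.js
    'use strict'

    // Required environment variables. The list is assigned to a constant:
    // a bare `[` on the line after 'use strict' (no semicolon) would be
    // parsed as a property access on that string.
    const requiredEnvVars = ['NODE_ENV', 'PORT']

    requiredEnvVars.forEach((name) => {
      if (!process.env[name]) {
        throw new Error(`Environment variable ${name} is missing`)
      }
    })

    const config = {
      env: process.env.NODE_ENV,
      logger: {
        level: process.env.LOG_LEVEL || 'info',
        // Values are strings, so compare against 'true' explicitly
        enabled: process.env.BOOLEAN ? process.env.BOOLEAN.toLowerCase() === 'true' : false
      },
      server: {
        port: Number(process.env.PORT)
      }
      // ...
    }

    module.exports = config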
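
Because process.env holds only strings, naive conversions can give surprising results: Number('') is 0 and Boolean('false') is true. Below is a minimal sketch of stricter conversion helpers. The names toNumber and toBoolean are hypothetical and are not part of the example project.

```javascript
'use strict'

// Convert an environment string to a number, rejecting missing or malformed values.
function toNumber (name, value) {
  if (value === undefined || value.trim() === '') {
    throw new Error(`Environment variable ${name} is missing`)
  }
  const number = Number(value)
  if (Number.isNaN(number)) {
    throw new Error(`Environment variable ${name} is not a number: ${value}`)
  }
  return number
}

// Convert an environment string to a boolean, falling back to a default when unset.
function toBoolean (name, value, defaultValue) {
  if (value === undefined) return defaultValue
  const normalized = value.trim().toLowerCase()
  if (normalized === 'true') return true
  if (normalized === 'false') return false
  throw new Error(`Environment variable ${name} is not a boolean: ${value}`)
}

// A plain object stands in for process.env in this example.
const env = { PORT: '3000', LOGGER_ENABLED: 'FALSE' }
console.log(toNumber('PORT', env.PORT)) // 3000
console.log(toBoolean('LOGGER_ENABLED', env.LOGGER_ENABLED, true)) // false
```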

Configuration Check

Keeping settings out of the code is the right decision, but it is also very useful to check the environment variables before using them. This helps to detect configuration errors at startup and to avoid situations in which the application tries to work with incorrect or missing settings. You can read about the benefits of detecting configuration errors early here.

This is how we improved the config.js file by adding data validation using the joi validator.

    // config/config.js
    'use strict'

    const joi = require('joi')

    // valid() restricts a value to the listed options;
    // allow() would only add them to whatever is already accepted.
    const envVarsSchema = joi.object({
      NODE_ENV: joi.string()
        .valid('development', 'production', 'test', 'provision')
        .required(),
      PORT: joi.number()
        .required(),
      LOGGER_LEVEL: joi.string()
        .valid('error', 'warn', 'info', 'verbose', 'debug', 'silly')
        .default('info'),
      LOGGER_ENABLED: joi.boolean()
        .truthy('TRUE')
        .truthy('true')
        .falsy('FALSE')
        .falsy('false')
        .default(true)
    }).unknown()
      .required()

    // schema.validate() returns { error, value }; recent joi releases
    // no longer provide the top-level joi.validate() helper.
    const { error, value: envVars } = envVarsSchema.validate(process.env)
    if (error) {
      throw new Error(`Config validation error: ${error.message}`)
    }

    const config = {
      env: envVars.NODE_ENV,
      isTest: envVars.NODE_ENV === 'test',
      isDevelopment: envVars.NODE_ENV === 'development',
      logger: {
        level: envVars.LOGGER_LEVEL,
        enabled: envVars.LOGGER_ENABLED
      },
      server: {
        port: envVars.PORT
      }
      // ...
    }

    module.exports = config

Separation of configuration data

All configuration data can be kept in one file, but as the project grows, that file grows with it and becomes inconvenient to work with. To avoid this, it makes sense to split the settings, for example, by application component. In our example, it looks like this:

    // config/components/logger.js
    'use strict'

    const joi = require('joi')

    // valid() restricts a value to the listed options;
    // allow() would only add them to whatever is already accepted.
    const envVarsSchema = joi.object({
      LOGGER_LEVEL: joi.string()
        .valid('error', 'warn', 'info', 'verbose', 'debug', 'silly')
        .default('info'),
      LOGGER_ENABLED: joi.boolean()
        .truthy('TRUE')
        .truthy('true')
        .falsy('FALSE')
        .falsy('false')
        .default(true)
    }).unknown()
      .required()

    // schema.validate() returns { error, value }
    const { error, value: envVars } = envVarsSchema.validate(process.env)
    if (error) {
      throw new Error(`Config validation error: ${error.message}`)
    }

    const config = {
      logger: {
        level: envVars.LOGGER_LEVEL,
        enabled: envVars.LOGGER_ENABLED
      }
    }

    module.exports = config

After that, the main config.js file only needs to combine the parameters of the components.

    // config/config.js
    'use strict'

    const common = require('./components/common')
    const logger = require('./components/logger')
    const redis = require('./components/redis')
    const server = require('./components/server')

    module.exports = Object.assign({}, common, logger, redis, server)
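
Note that Object.assign performs a shallow merge: if two components exported the same top-level key, the later one would replace the earlier one entirely instead of being combined with it. The sketch below illustrates this with inline stand-ins for the component modules. The values are made up for illustration.

```javascript
'use strict'

// Inline stand-ins for config/components/*.js (values are illustrative only).
const logger = { logger: { level: 'info', enabled: true } }
const server = { server: { port: 3000 } }

// Each component owns a distinct top-level key, so nothing is lost.
const config = Object.assign({}, logger, server)
console.log(Object.keys(config)) // [ 'logger', 'server' ]

// A component that reuses the `logger` key replaces the whole nested object.
const override = { logger: { level: 'debug' } }
const clobbered = Object.assign({}, logger, override)
console.log(clobbered.logger) // { level: 'debug' }
console.log('enabled' in clobbered.logger) // false
```

Giving every component its own top-level key, as the example project does, avoids the problem without needing a deep merge.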

Please note that you should not group configuration data by environment, for example, by keeping the production settings in a config/production.js file. This approach does not scale well: over time, the same production build may have to be deployed to several different environments.
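
One way to see why per-environment files are unnecessary: when a config component is a pure function of the environment variables, the same code can describe any number of deployments. Here is a minimal sketch, assuming a hypothetical createServerConfig function and made-up values.

```javascript
'use strict'

// Build the server config from an environment-like object (hypothetical helper).
function createServerConfig (env) {
  const port = Number(env.PORT)
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`PORT must be a positive integer, got: ${env.PORT}`)
  }
  return { env: env.NODE_ENV, server: { port } }
}

// Two production deployments that differ only in their environment variables.
const productionA = createServerConfig({ NODE_ENV: 'production', PORT: '8080' })
const productionB = createServerConfig({ NODE_ENV: 'production', PORT: '9090' })

console.log(productionA.server.port) // 8080
console.log(productionB.server.port) // 9090
```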

Organization of a multiprocess application

The process is the main building block of modern applications. A software product may consist of many stateless processes, and our example uses exactly such processes. The web process handles HTTP requests, while worker processes run scheduled jobs or perform long-running operations. Any information that has to persist is written to the database. Thanks to this architecture, the solution scales well by running more processes in parallel. The decision to add processes can be based on different metrics, for example, the load on the application.

Above we talked about splitting configuration data into components. This approach is very useful when a project has several types of processes. Each process type can have its own configuration that requires only the components it needs, without expecting environment variables it never uses.
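
A self-contained sketch of this idea follows, with inline functions standing in for the component modules. Which components each process type requires in the example project is an assumption here, and TWITTER_CONSUMER_KEY is a hypothetical variable name.

```javascript
'use strict'

// Stand-ins for config/components/*.js; each one checks only its own variables.
const components = {
  logger: (env) => ({ logger: { level: env.LOGGER_LEVEL || 'info' } }),
  server: (env) => {
    if (!env.PORT) throw new Error('Environment variable PORT is missing')
    return { server: { port: Number(env.PORT) } }
  },
  twitter: (env) => {
    // TWITTER_CONSUMER_KEY is a hypothetical name used for illustration.
    if (!env.TWITTER_CONSUMER_KEY) {
      throw new Error('Environment variable TWITTER_CONSUMER_KEY is missing')
    }
    return { twitter: { consumerKey: env.TWITTER_CONSUMER_KEY } }
  }
}

// Assumed mapping from process type to the components it needs.
const componentsByProcessType = {
  web: ['logger', 'server'],
  'twitter-stream-worker': ['logger', 'twitter']
}

function loadConfig (processType, env) {
  const names = componentsByProcessType[processType]
  if (!names) throw new Error(`No config for process type: ${processType}`)
  return Object.assign({}, ...names.map((name) => components[name](env)))
}

// The web process starts even though no Twitter credentials are set.
console.log(Object.keys(loadConfig('web', { PORT: '3000' }))) // [ 'logger', 'server' ]
```

With this layout, a missing Twitter credential stops only the worker that needs it, not the web process.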

The config/index.js file loads the configuration for the current process type:

    // config/index.js
    'use strict'

    const processType = process.env.PROCESS_TYPE

    let config
    try {
      config = require(`./${processType}`)
    } catch (ex) {
      if (ex.code === 'MODULE_NOT_FOUND') {
        throw new Error(`No config for process type: ${processType}`)
      }
      throw ex
    }

    module.exports = config

The root index.js file starts the process selected by the PROCESS_TYPE environment variable:

    // index.js
    'use strict'

    const processType = process.env.PROCESS_TYPE

    if (processType === 'web') {
      require('./web')
    } else if (processType === 'twitter-stream-worker') {
      require('./worker/twitter-stream')
    } else if (processType === 'social-preprocessor-worker') {
      require('./worker/social-preprocessor')
    } else {
      throw new Error(`${processType} is an unsupported process type. Use one of: 'web', 'twitter-stream-worker', 'social-preprocessor-worker'!`)
    }
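
As the number of process types grows, the if/else chain could be replaced with a lookup table, which also keeps the error message in sync with the supported types. The sketch below shows one way to do this. The resolveEntryPoint helper is hypothetical, and the module paths are the ones used in index.js above.

```javascript
'use strict'

// Map each supported process type to the module that starts it.
const entryPoints = {
  web: './web',
  'twitter-stream-worker': './worker/twitter-stream',
  'social-preprocessor-worker': './worker/social-preprocessor'
}

// Return the module path for a process type, or fail fast with a helpful message.
function resolveEntryPoint (processType) {
  if (!Object.prototype.hasOwnProperty.call(entryPoints, processType)) {
    const supported = Object.keys(entryPoints).map((type) => `'${type}'`).join(', ')
    throw new Error(`${processType} is an unsupported process type. Use one of: ${supported}!`)
  }
  return entryPoints[processType]
}

console.log(resolveEntryPoint('web')) // ./web

// In index.js the dispatch would then become:
// require(resolveEntryPoint(process.env.PROCESS_TYPE))
```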

As a result, we have one application that is divided into several independent processes. Each of them can be launched individually, and, if necessary, several parallel processes of the same type can be started without affecting other parts of the application. At the same time, different processes can share parts of the code, such as the models, which helps to follow the DRY principle during development.

Organization of files with tests

Test files should be placed next to the modules they test, following a naming convention such as <module_name>.spec.js and <module_name>.e2e.spec.js. Tests must evolve together with the modules they cover. If test files are kept apart from the application logic, they are harder to find and to keep up to date.
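
A consistent naming convention also makes it easy to select a subset of tests by file name. Here is a minimal sketch, assuming a hypothetical classifyTestFile helper. The get.e2e.spec.js path is made up for illustration and does not appear in the project tree.

```javascript
'use strict'

const path = require('path')

// Classify a file by the naming convention described above.
function classifyTestFile (filePath) {
  const name = path.basename(filePath)
  if (name.endsWith('.e2e.spec.js')) return 'e2e'
  if (name.endsWith('.spec.js')) return 'unit'
  return null // not a test file
}

const files = [
  'web/router/api/tweets/get.js',
  'web/router/api/tweets/get.spec.js',
  'web/router/api/tweets/get.e2e.spec.js' // hypothetical file
]

const unitTests = files.filter((file) => classifyTestFile(file) === 'unit')
console.log(unitTests) // [ 'web/router/api/tweets/get.spec.js' ]
```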

A separate /test folder is a good place to store all additional tests and utilities that are not used by the application itself.

Placement of build and script files

We usually create a /scripts folder for bash and Node.js scripts that synchronize the database, build the front end, and so on. With this approach, the scripts are kept apart from the main application code, and the root directory of the project does not fill up with script files over time. To make them easier to use, you can register the scripts in the scripts section of the package.json file.

Conclusions

We hope that our ideas on structuring and scaling complex projects for Node.js will be useful to you. By the way, there is more material on this topic.

Node.js is a very flexible environment, so no single approach to application structure and scaling can be called the ultimate truth, and few can be called completely unacceptable. You may well have practices of your own, whether they differ fundamentally from the recommendations above or point in the same direction. If you do, it would be great if you shared them.

Source: https://habr.com/ru/post/322388/
