In the previous article I began describing my experience developing an experimental web project, "What to do?", on Node.JS. That first part was an overview: I tried to lay out the pros and cons of the technology and to warn about the problems you may run into during development. In this article I will focus on the technical details.
A few words about the Habr effect
Honestly, after periodically watching sites go down when a link to them lands on the Habr front page, I expected to see much more serious numbers. Both previous articles reached the front page. Although the first article was in the closed "I am PR" blog and visible only to its subscribers, while the second, in the dedicated "Node.JS" blog, sparked quite a lengthy discussion in the comments, the number of visitors each article brought to the site was about the same. Equally small.


These numbers are too small to speak of any serious load. At the very peak of traffic, htop showed roughly the following picture:

Load average occasionally reached 1, but then dropped back to 0.3-0.5. Pages were served quickly. The average generation time for a page whose data is already in memcached is 15-20 ms. If the data is missing from memcached, generation time rises to 40-100 ms, but that happens very rarely. Some visitors tested the site with the siege and ab utilities, as well as with the LoadImpact service. At the time I was sure that all pages were well cached by Nginx and that these requests never reached Node.JS. It turned out that this was not the case. Later I discovered incorrect behavior in one of the modules that prevented page caching (I will cover this in more detail below). In fact, all requests were being served by Node.JS, and the site still ran stably.
Unfortunately, I do not know how much the "Habr effect" varies with the subject of the article and of the site being linked to. But if a site falls over under the same (or even twice the) number of visitors, the problem is hardly the choice of technology.
Based on these tests and the traffic data, I concluded that the project is quite stable and will not go down during a sharp influx of visitors.
Architecture
Hardware and software
The site lives on a VPS with modest specifications:
- 1 CPU with a guaranteed frequency of 1200 MHz;
- 1024 MB RAM;
- 25 GB HDD (for this project this figure hardly matters).
The server runs Ubuntu Server. I am used to it and find it convenient to work with, which is why I chose it.
The set of additional software is minimal. In this case it is:
- Node.JS - a server-side JavaScript runtime;
- MongoDB - a NoSQL DBMS;
- Memcached - a caching daemon;
- Nginx - a frontend server.
I try to keep the software versions up to date.
Configuration
By default, Node.JS runs in a single thread, which is neither convenient nor optimal, especially on multi-core processors. Modules for conveniently launching several processes (various implementations of Web Workers) appeared almost immediately, and doing this with the standard Node.JS API was not difficult either. As of version 0.6.0, Node.JS ships with a new module, Cluster. It greatly simplifies starting multiple Node.JS processes. The module's API lets you fork node processes whose net/http servers share a common TCP port. The parent process can manage the child processes: stop them, start new ones, and react to unexpected terminations. Child processes can exchange messages with the parent.
Even though Cluster makes it convenient to start the required number of processes, I run two instances of node on different TCP ports. I do this to avoid application downtime during updates; otherwise, during a restart (which, by the way, takes only a few seconds), the site would be unavailable to users. The load is distributed between the node instances by Nginx's HttpUpstreamModule. When one instance becomes unavailable during a restart, the second takes over all requests.
Nginx is configured so that for non-authorized users all pages on the site are cached for a short period of time - 1 minute. This allows you to significantly remove the load from Node.JS and at the same time display the actual content rather quickly. For authorized users, the cache time is set to 3 seconds. This is completely invisible to ordinary users, but it will save from intruders who are trying to load the site with a large number of requests containing cookie authorizations.
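One plausible way to wire this up (the paths, ports, and the mechanism for the shorter authenticated-user cache are my assumptions, not the author's actual config):

```nginx
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=pages:10m max_size=256m;

upstream node_backend {
    # Two node instances on different ports; while one is restarting,
    # Nginx routes all requests to the other.
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;

    location / {
        proxy_pass http://node_backend;
        proxy_cache pages;
        proxy_cache_valid 200 1m;   # default: cache pages for 1 minute
        # For authorized users the application can shorten this to 3 s
        # by sending "X-Accel-Expires: 3" in its responses, which
        # overrides proxy_cache_valid for that particular response.
    }
}
```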
Modules
When developing applications on Node.JS, the question of which module to pick for a given task comes up very often. For some tasks there are already proven, popular modules; for others the choice is harder. When choosing a module, pay attention to the number of watchers, the number of forks, and the date of the latest commits (speaking of GitHub here). These indicators tell you whether the project is alive. The recently launched service The Node Toolbox makes this task much easier.
Now it is time to talk about the modules I chose for the project.
connect
github.com/senchalabs/connect
This module is a layer on top of the Node.JS http server and significantly extends its capabilities. It adds functionality such as routing, cookie support, session support, request-body parsing, and much more, without which developing a web application on Node.JS would likely turn into a nightmare. Most of connect's capabilities are implemented as plugins. There are also many equally useful plugins for connect that are not included in its standard distribution. Adding missing functionality by writing your own plugin is also quite simple.
Despite the popularity of this module and its rapid development, the problem that prevented Nginx from caching the response from Node.JS was precisely in it. By default, the proxy_cache directive in Nginx does not cache backend responses if at least one of the following headers is present in them:
- Set-Cookie;
- Cache-Control containing “no-cache”, “no-store”, “private”, or “max-age” values with a non-numeric or zero value;
- Expires with a past date;
- X-Accel-Expires: 0.
In connect, sessions were implemented so that a Set-Cookie header was sent with every response. This was done to support sessions that live longer than the browser session. In PHP, if you set the session lifetime to a specific value, the session ends after that time even if the user is active on the site. Connect uses a different policy: the cookie is refreshed on every request and its lifetime is counted from the current moment, i.e. while the user is active, the session never ends. The PHP approach seems more correct to me, because a session is not meant for long-term data storage. I made the corresponding changes to the code and sent a pull request. After a brief discussion (please don't beat me up over my English), a compromise was found: for sessions without an explicit expires, the cookie is now sent only once. For sessions with a hard-coded lifetime the problem has not yet been resolved.
connect-memcached
github.com/balor/connect-memcached
This module is a plugin for connect that adds the ability to store sessions in memcached. Out of the box, connect can only store sessions in the memory of a single process. This is clearly not enough for production use, so plugins already exist for all the popular storage backends.
async
github.com/caolan/async
Without this module, writing asynchronous code for Node.JS would be much harder. The library contains methods that let you "juggle" asynchronous calls without bloating the code with deeply nested functions. For example, it makes it much easier to start several asynchronous calls and perform some action once they have all completed. I highly recommend getting familiar with the library's full feature set to avoid reinventing the wheel later.
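To illustrate the kind of plumbing async takes care of, here is a simplified re-implementation of its parallel helper (a sketch of the idea, not the library's actual code):

```javascript
// Run several async tasks concurrently and call back once, when all of
// them have finished, collecting the results in order.
function parallel(tasks, callback) {
  var results = new Array(tasks.length);
  var pending = tasks.length;
  var done = false;
  if (pending === 0) return callback(undefined, results);
  tasks.forEach(function (task, i) {
    task(function (error, result) {
      if (done) return;                       // report only the first error
      if (error) { done = true; return callback(error); }
      results[i] = result;
      if (--pending === 0) { done = true; callback(undefined, results); }
    });
  });
}

// Both "queries" start immediately; the final callback fires once.
parallel([
  function (cb) { setImmediate(function () { cb(undefined, 'questions'); }); },
  function (cb) { setImmediate(function () { cb(undefined, 'answers'); }); }
], function (error, results) {
  console.log(results); // [ 'questions', 'answers' ]
});
```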
node-oauth
github.com/ciaranj/node-oauth
This module implements the OAuth and OAuth2 protocols, making it quite simple to let users sign in to the site through social networks that support them.
node-cron
github.com/ncb000gt/node-cron
The name of this module speaks for itself: it lets you run tasks on a schedule. The schedule syntax is very similar to the cron everyone knows from Linux, but unlike cron, node-cron supports one-second granularity, i.e. you can schedule a method to run every 10 seconds or even every second. Tasks such as selecting popular questions for the main page and posting them to Twitter are launched through this module.
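To show what one-second granularity means in practice, here is a tiny sketch of how a cron-style seconds field could be matched (an illustration of the concept, not node-cron's actual parser):

```javascript
// Decide whether a cron-style seconds field matches a given second (0-59).
// Supports "*", "*/N" (every N seconds), and plain numbers like "30".
function secondsFieldMatches(field, second) {
  if (field === '*') return true;
  var step = field.match(/^\*\/(\d+)$/);
  if (step) return second % parseInt(step[1], 10) === 0;
  return parseInt(field, 10) === second;
}

// node-cron's six-field syntax prepends a seconds field to the usual
// five; "*/10 * * * * *" means "every 10 seconds".
console.log(secondsFieldMatches('*/10', 30)); // true
console.log(secondsFieldMatches('*/10', 31)); // false
console.log(secondsFieldMatches('5', 5));     // true
```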
node-twitter
github.com/jdub/node-twitter
This module implements the application's interaction with the Twitter API. Under the hood it uses the node-oauth module mentioned above.
node-mongodb-native
github.com/christkv/node-mongodb-native
This module is an interface to the MongoDB NoSQL DBMS. Among its competitors it stands out for being better maintained and more actively developed. A pool of database connections is supported out of the box, which saves you from writing your own crutches. The rather convenient Mongoose ORM was built on top of this module.
node-memcached
github.com/3rd-Eden/node-memcached
This is, in my opinion, the best memcached interface for Node.JS. It supports multiple memcached servers with key distribution between them, as well as a connection pool.
http-get
github.com/SaltwaterC/http-get
This module is designed for fetching remote resources over HTTP/HTTPS. It is used to download the profile photos of users who sign in to the site through social networks.
sprintf
github.com/maritz/node-sprintf
A small but very useful module which, as the name suggests, implements the sprintf and vsprintf functions in JavaScript.
daemon.node
github.com/indexzero/daemon.node
This module makes it very easy to turn a Node.JS application into a daemon. It is a convenient way to detach the application from the console and redirect its output to log files.
My contribution
The following modules I developed myself while working on the project, since at the time of writing I could not find suitable ready-made solutions. These modules are published on GitHub and in the npm registry.
aop
github.com/baryshev/aop
This module does not yet claim to be a full implementation of the AOP pattern. For now it contains a single method that lets you wrap a function in an aspect, which can change the function's behavior when needed. This technique is very convenient for caching function results.
For example, suppose we have an asynchronous function:
var someAsyncFunction = function(num, callback) {
  var result = num * 2;
  callback(undefined, result);
};
This function is called often and its result needs to be cached. Usually that looks like this:
var someAsyncFunction = function(num, callback) {
  var key = 'someModule' + '_' + 'someAsyncFunction' + '_' + num;
  cache.get(key, function(error, cachedResult) {
    if (error || !cachedResult) {
      var result = num * 2;
      callback(undefined, result);
      cache.set(key, result);
    } else {
      callback(undefined, cachedResult);
    }
  });
};
There can be many such functions in a project. The code swells noticeably and becomes less readable, and a lot of copy-paste appears. Here is how the same thing is done with aop.wrap:
var someAsyncFunction = function(num, callback) {
  var result = num * 2;
  callback(undefined, result);
};
someAsyncFunction = aop.wrap(someAsyncFunction, someAsyncFunction, aspects.cache, 'someModule', 'someAsyncFunction');
Separately, we create an aspects library and define a cache function in it, which will be responsible for all the caching:
module.exports.cache = function(method, params, moduleName, functionName) {
  var that = this;
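The original listing breaks off at this point. To make the idea concrete, here is a self-contained sketch of what such a cache aspect might look like; this is my assumption about the continuation, with memcached replaced by a plain in-memory object so the example runs standalone:

```javascript
// In-memory stand-in for the memcached client used in the project.
var store = {};
var cache = {
  get: function (key, callback) { callback(undefined, store[key]); },
  set: function (key, value) { store[key] = value; }
};

var aspects = {};
// The aspect receives the wrapped method, its call arguments (with the
// user callback last), and naming info used to build the cache key.
aspects.cache = function (method, params, moduleName, functionName) {
  var args = params.slice(0, -1);               // all arguments except the callback
  var callback = params[params.length - 1];
  var key = moduleName + '_' + functionName + '_' + args.join('_');
  cache.get(key, function (error, cachedResult) {
    if (error || cachedResult === undefined) {
      // Cache miss: call the original function and store its result.
      method.apply(null, args.concat(function (err, result) {
        if (!err) cache.set(key, result);
        callback(err, result);
      }));
    } else {
      callback(undefined, cachedResult);        // cache hit
    }
  });
};
```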
If required, the aspect's functionality can be extended. In a large project this approach saves a lot of code and localizes all the cross-cutting concerns in one place.
In the future I plan to extend this library with the remaining features of the AOP pattern.
form
github.com/baryshev/form
The task of this module is to validate and filter input data. Most often that means forms, but it can also be data received from external APIs, etc. The module bundles the node-validator library and exposes its full capabilities.
The module works as follows: each form is described by a set of fields, to which filters (functions that transform the field value) and validators (functions that check the field value against a condition) are attached. Incoming data is passed to the form's process method. In the callback we receive either an error description (if some data did not meet the form's criteria) or an object containing the filtered set of fields, ready for further use. A small usage example:
var fields = {
  text: [
    form.filter(form.Filter.trim),
    form.validator(form.Validator.notEmpty, 'Empty text'),
    form.validator(form.Validator.len, 'Bad text length', 30, 1000)
  ],
  name: [
    form.filter(form.Filter.trim),
    form.validator(form.Validator.notEmpty, 'Empty name')
  ]
};
var textForm = form.create(fields);
textForm.process({'text': 'some short text', 'name': 'tester'}, function(error, data) {
  console.log(error);
  console.log(data);
});
In this case we get the error 'Bad text length' for the text field, since the length of the submitted text is less than 30 characters.
Filters and validators are executed sequentially, so even if we pad the string with plenty of trailing spaces, we still get the error, because the spaces are stripped by the trim filter before the length check.
How to create your own filters and validators is described on the node-validator page, or you can look at the source code. My future plans include porting this module for use in the browser and documenting its capabilities properly.
configjs
github.com/baryshev/configjs
This module is designed for simple application configuration. Configurations are stored in ordinary JS files, which makes it possible to use JavaScript in the configuration and requires no extra file parsing. You can create several additional configurations for different environments (development, production, testing, etc.) that extend and/or override the base configuration.
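The idea can be sketched in a few lines (a simplified illustration of the approach, not configjs's actual API; the key names are made up):

```javascript
// Base configuration: plain JS objects, so values can even be computed.
var baseConfig = {
  app: { host: '127.0.0.1', port: 8080 },
  cache: { lifetime: 60 }
};

// Production overrides: only the keys that differ from the base.
var productionConfig = {
  app: { port: 80 }
};

// Per-section merge: the environment config extends the base config.
function mergeConfig(base, overrides) {
  var result = {};
  Object.keys(base).forEach(function (section) {
    result[section] = Object.assign({}, base[section], overrides[section]);
  });
  return result;
}

var config = mergeConfig(baseConfig, productionConfig);
console.log(config.app.port);       // 80  (overridden)
console.log(config.cache.lifetime); // 60  (inherited from the base)
```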
localejs
github.com/baryshev/localejs
This module is very similar to configjs, but it is designed to store strings for different locales to support multiple languages. It is hardly suitable for applications with a large amount of text; in that case solutions like GetText will be more convenient. Besides loading the required locale, the module contains a pluralization function for numerals that supports Russian and English.
hub
github.com/baryshev/hub/blob/master/lib/index.js
This may well be the smallest module for Node.JS: it consists of a single line, module.exports = {};. Yet without it, development would be much harder. The module is a container for storing objects while the application is running. It relies on a feature of Node.JS: a module is initialized only once, when it is first required. Every call to require('moduleName'), no matter how many there are in the application, returns a reference to the same module instance. In effect, it replaces the use of the global namespace for sharing resources between parts of an application, a need that arises quite often. Examples: the database connection pool, the cache connection pool, references to the loaded configuration and locale. These resources are needed in many parts of the application, and access to them should be easy. When a resource is initialized, it is assigned to a property of the hub object; later it can be accessed from any other module simply by requiring hub.
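A single-file sketch of the pattern (in real code the hub object would come from require('hub'); the resource names here are illustrative):

```javascript
// hub.js in its entirety would be:  module.exports = {};
// Node caches the module after the first require(), so every file that
// does require('hub') receives a reference to the very same object.
var hub = {}; // stands in for require('hub') in this one-file sketch

// server.js attaches shared resources once at startup:
hub.config = { app: { host: '127.0.0.1', port: 8080 } };
hub.db = null; // a MongoDB connection pool would go here

// Any controller can then reach them without touching globals:
function questionController() {
  var sharedHub = hub; // in real code: var hub = require('hub');
  return sharedHub.config.app.port;
}

console.log(questionController()); // 8080
```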
connect-response
This plugin for connect adds convenient cookie handling and also includes a template engine for forming the response to the user. I developed the template engine myself, and it turned out rather well. The EJS template engine was taken as a starting point, but in the end it became a completely different product with its own functionality, albeit with a similar syntax. That, however, is a big topic for a separate article.
Unfortunately, this module has not been published yet, since it is not properly polished and not all bugs have been fixed. I intend to finish and publish it in the near future, as soon as I have some free time.
Application structure
Since the application does not use frameworks, its structure is not bound by any rules other than the general style of writing Node.JS applications and common sense. The application follows the MVC model.
The entry-point file, server.js, initializes the application's main resources: starting the http server, configuring connect, loading the configuration and locale, establishing connections to MongoDB and Memcached, wiring up the controllers, and placing references to shared resources into the hub. The required number of processes is forked here as well. node-cron runs in the master process to execute scheduled tasks, while the http servers run in the child processes.
The mapping from URL to handler method in each controller is set up via connect's router. Each request passes through the chain of middleware created when connect is initialized. An example:
var server = connect();
server.listen(port, hub.config.app.host);
if (hub.config.app.profiler) server.use(connect.profiler());
server.use(connect.cookieParser());
server.use(connect.bodyParser());
server.use(connect.session({
  store: new connectMemcached(hub.config.app.session.memcached),
  secret: hub.config.app.session.secret,
  key: hub.config.app.session.cookie_name,
  cookie: hub.config.app.session.cookie
}));
server.use(connect.query());
server.use(connect.router(function(router) {
  hub.router = router;
}));
This is a convenient mechanism that makes it very easy to add new behavior to request processing and response generation.
The controller, when necessary, calls model methods that fetch data from MongoDB or Memcached. When all the data for the response is ready, the controller tells the template engine to render the page and sends the generated HTML to the user.
Conclusion
The topic of developing web applications on Node.JS is large and interesting. It is impossible to cover it fully in two articles, and probably unnecessary. I have tried to describe the basic principles of development and to point out possible problems. This should be enough to get into the topic; beyond that, Google and GitHub will come to the rescue. All the GitHub module pages I linked in this article contain detailed installation instructions and usage examples.
Thanks to everyone who read this far. I would be very interested to hear feedback and questions in the comments.