
Tactoom.com inside: a social blogging platform on NodeJS / NoSQL

Tactoom.com under the hood. So, it's time to show some of our cards and talk about how Tactoom works on the inside.

In this article I will talk about building and running a production web service using:
NodeJS (fibers), MongoDB, Redis, ElasticSearch, Capistrano, Rackspace.


Introduction


Three weeks ago, David ( DMiloshev ) and I launched the info-social network Tactoom.com. You can read about what it is here .

Against the background of the noise raised around NodeJS lately, many are probably curious what this technology is like not in words but in deeds.

NodeJS is not a panacea at all. It's just another technology, in fact no better than others. To achieve good performance and scalability you have to sweat, just like everywhere else.

Application architecture


NodeJS application is divided into two types of processes:
1. Web process (http)
2. Cloud process (queues)

All processes are completely independent of each other, can be located on different servers and even in different parts of the globe. At the same time, the application is scaled just by multiplying these processes. Communication between them takes place exclusively through a centralized message server (redis).

Web processes serve http requests from users directly. Each process can handle multiple requests simultaneously. Given how the event loop works, the practical limit of concurrent requests per process rises or falls over time depending on the CPU/IO ratio of the requests it happens to be handling.

Cloud processes perform operations not directly tied to user requests, for example sending emails, data denormalization and search indexing. Like Web processes, a single Cloud process can handle many different types of tasks at the same time.
It should be noted that the "atomicity" of tasks/requests is very important here. A large task or computation should be split into many smaller parts, which are then evenly distributed across the available processes. This improves task throughput and fault tolerance, and reduces both memory consumption and the blocking factor of each process and of the whole server.

Web → Cloud
I try to organize Web processes so as to increase their overall IO-to-CPU time ratio, keeping them focused on serving http quickly with high request concurrency. This means the Web delegates CPU-heavy logic to the Cloud , waits for it to run, then receives the result of the computation. Thanks to the asynchronous architecture of nodejs, the Web process can serve other requests while it waits.

Clustering
The Web and Cloud architectures are very similar, except that instead of listening on an http socket, the Cloud listens on redis queues.

Clustering node processes occurs according to the following principles:
1. One supervisor process runs on each physical server ( node-cluster )
2. The supervisor's child processes are our Web s and Cloud s; their number always equals the number of server cores.
3. The supervisor monitors the memory consumption of each child process and, if it exceeds the configured limit, restarts it (after waiting for that process to finish its current requests).

Tactoom nodejs cluster

Fibers


The entire high-level application layer is written using node-sync (fibers) , without which I can hardly imagine developing it at all. The point is that complicated things like the static build are very hard, if not hopeless, to implement on the "official" callback-driven paradigm. To those who have not yet seen the code of, say, npm , I strongly recommend looking at it and trying to understand what is happening there and, most importantly, why. And the flame wars and trolling that spring up around the nodejs asynchronous paradigm almost daily leave me, to put it mildly, puzzled.

Learn more about node-sync in my article:
node-sync - pseudo-synchronous programming on nodejs using fibers
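The point of node-sync is to call callback-style APIs in a synchronous style inside a Fiber, without the callback pyramid. Fibers are a native extension, so as a self-contained illustration the sketch below uses a generator-based runner (the co/thunk style) to show the same "synchronous-looking" control flow; this is a stand-in for fibers, not node-sync's actual API, and the helper names are mine.

```javascript
// Generator-based stand-in for the fiber/node-sync style of control flow.
function run(genFn) {
  var it = genFn();
  (function step(err, value) {
    var r = err ? it.throw(err) : it.next(value);
    if (r.done) return;
    r.value(step); // r.value is a thunk: fn(callback)
  })(null);
}

// A callback-style operation (it calls back synchronously here so the
// example completes without a real event-loop delay).
function fetchUser(id, cb) { cb(null, { id: id, name: 'user' + id }); }

// Thunk wrapper: turns fetchUser(id, cb) into something yieldable.
function fetchUserT(id) { return function (cb) { fetchUser(id, cb); }; }

var result;
run(function* () {
  // Reads like synchronous code, no nested callbacks:
  var a = yield fetchUserT(1);
  var b = yield fetchUserT(2);
  result = [a.name, b.name];
});
console.log(result); // [ 'user1', 'user2' ]
```

With real fibers, node-sync achieves the same effect without generators: the fiber is suspended while the async operation runs and resumed with its result, and errors surface as ordinary exceptions.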

Web


The general logic of the Web application is implemented in expressjs style, except that each request is wrapped in a separate Fiber, within which all operations are performed in a synchronous style.

Because it was impossible to override some parts of expressjs functionality, routing in particular, it had to be pulled out of npm and vendored into the main project repository. The same applies to a number of other modules (especially those developed by LearnBoost ), because contributing to their projects is very hard and not always possible .

CSS is generated via stylus . It is really very convenient.
Template engine - ejs (both on the server and on the client).
File uploads - connect-form .

The Web is very fast because all modules and initialization data are loaded into process memory at startup. I try to keep the average Web process response time for any page under 300ms (excluding image uploads, registration, etc.). While profiling, I was surprised to find that 70% of this time is taken by mongoose (a mongodb ORM for nodejs); more on this below.

i18n
I had been looking for a suitable internationalization solution for nodejs for a long time, and my search converged on node-gettext with some minor patching. It works like clockwork; locale files are picked up "on the fly" by the server nodejs processes during an update.

Cache
The caching functionality, with all its logic, fits into two screens of code. Redis is used as the cache backend.
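A cache layer of that size typically amounts to a get-or-compute wrapper with a TTL. The sketch below is my guess at its shape, not Tactoom's code: a plain object stands in for the redis client so the example is self-contained (with node_redis you would use `setex`/`get` against a real server), and all names are mine.

```javascript
// Minimal get-or-compute cache with TTL; the backend object stands in for redis.
function Cache(backend) {
  this.backend = backend; // { data: {...} } standing in for a redis client
}

Cache.prototype.get = function (key) {
  var entry = this.backend.data[key];
  if (!entry || entry.expiresAt <= Date.now()) return undefined;
  return JSON.parse(entry.value); // redis stores strings, so serialize
};

Cache.prototype.set = function (key, value, ttlSeconds) {
  this.backend.data[key] = {
    value: JSON.stringify(value),
    expiresAt: Date.now() + ttlSeconds * 1000
  };
};

// get-or-compute: the typical call site in a request handler.
Cache.prototype.fetch = function (key, ttlSeconds, compute) {
  var hit = this.get(key);
  if (hit !== undefined) return hit;
  var value = compute();
  this.set(key, value, ttlSeconds);
  return value;
};

var cache = new Cache({ data: {} });
var calls = 0;
function expensive() { calls++; return { posts: 20 }; }
cache.fetch('user:1:feed', 60, expensive);          // miss: computes and stores
var again = cache.fetch('user:1:feed', 60, expensive); // hit: no recompute
console.log(again, calls); // { posts: 20 } 1
```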

Memory
Web processes leak memory like a sieve, as it later turned out, because of mongoose. One process (under average daytime load) eats up to 800MB in two hours, after which it is restarted by the supervisor.
Hunting memory leaks in nodejs is quite difficult; if you know good techniques, let me know.

Data


As it turned out, the schema-less mongodb paradigm suits the Tactoom model perfectly. The database itself behaves well (it weighs 376MB, of which 122MB is indexes), data is selected exclusively via indexes, so any query completes in at most 30ms even under high load (most queries run in under 1ms).

If there is interest, in the second part I can describe in more detail how mongodb was "tamed" for a number of non-trivial tasks (and where that failed).

mongoosejs (mongodb ORM for nodejs)
This one deserves a separate mention. When selecting a list of 20 users: querying and fetching the data from mongo takes 2ms, transferring the data takes 10ms, then mongoose spends another 200ms doing something (never mind the memory), and only then do I get objects. Rewritten against the lower-level node-mongodb-native , the whole operation takes 30ms.
Gradually I had to rewrite almost everything on mongodb-native, improving overall system speed roughly tenfold along the way.

Statics


All Tactoom statics are stored in Rackspace Cloud Storage . I use the static domains cdn X .infosocial.net , where X is 1..n. Each domain points via DNS to the internal domain of a Cloud Storage container, allowing browsers to load static files in parallel. Each static file is stored in two copies (plain and gzip) and has a unique name with the version baked in. When a file's version is updated its address changes, and browsers download the new file.

Application statics (client js and css, images) are compiled via a home-grown mechanism that detects modified files (via git-log), minifies them, makes a gzip copy and uploads them to the CDN. The build script also watches for modified images and updates their addresses in the corresponding css files.
The list (mapping) of static file addresses is stored in Redis. Each Web process loads this list into memory at startup or when static versions are updated.
In fact, deploying any static changes is done with a single command, which does everything itself. No reloading is required, since the nodejs applications pick up the changed static file addresses on the fly through redis pub/sub.

User statics are also stored on Rackspace but, unlike application statics, they are not versioned; instead they undergo a certain canonicalization that makes it possible, given the hash of an image, to derive the CDN addresses of all its sizes.

To determine which host (cdn X ) a particular static file is stored on, consistent hashing is used.

Server architecture




In fact, Tactoom is spread across three separate areas:
1. Rackspace - the platform for fast scaling and static storage
2. A platform in Europe - our physical servers live here
3. A secret location (here logs are rotated, background computations run and statistics are collected)

Only one server faces the world - nginx, with ports 80 and 4000 open. The latter is used for COMET connections.
The remaining servers communicate with each other over direct ip, closed off from the world via iptables.

: 80
nginx proxies requests through its upstream configuration to the Web servers. At the moment there are two upstreams: tac_main and tac_media . Each contains a list of Web servers running node-cluster on port 3000; each Web server has its own priority in request distribution.
tac_main is a cluster of Web servers located close to the database; it serves most web pages to registered Tactoom users.
tac_media is a cluster of Web servers located close to the CDN. All image upload and resize operations go through it.

The webN and cloudN servers are shown to indicate where I add servers for the habra-effect and other such nice events.
New servers come up in 10 minutes from an image stored on the CDN.

: 4000
Here is a plain proxy-pass to the comet server, where Beseda , our COMET application, runs; I will cover it in the second part.

tac1, tac2, data1
These are the main Tactoom servers: XEON X3440 4x2.53 GHz 16 GB 2x1500 GB Raid1.
A mongod process runs on each; they are combined into a ReplicaSet with automatic failover and distribution of read operations to the slaves.

On tac1 - the main Web cluster, on tac2 - Cloud cluster. Each cluster has 8 nodejs processes.

In the near future we will create another upstream, tac_search , to which only search queries will be routed. It will be a Web cluster that I will put next to the elasticsearch server (more on it in the second part).

Conclusions


Quoting the slogan of the NodeJS creators:
"Because nothing blocks, less-than-expert programmers are able to develop fast systems."

  1. This is a lie. I have been using nodejs for almost 2 years now, and I know from my own experience that to build "fast systems" on it you need no less experience (if not more) than with any other technology. In reality, with the callback-driven paradigm of nodejs and the quirks of javascript in general, it is easier to make a mistake (and then hunt for it for a very long time) than to win on performance.
  2. On the other hand, the trolling from Mr. Ted Dziuba is also complete nonsense, since his Fibonacci example is pulled out of thin air. Only a person who does not understand how the event loop works and why it is needed at all would do that (which, incidentally, point 1 confirms).


After my talk at DevConf this spring, I am often asked whether a new project should be built on NodeJS. My answer to everyone:
If you have plenty of time and are ready to invest it in mastering a new, raw and controversial technology - go ahead. But if you have deadlines, customers or investors, and not much server-side JS experience behind you - it's not worth it .

As practice has shown, shipping a project on NodeJS is doable. It works. But it cost me a lot. The node open-source community alone takes plenty of effort - I still try to find time to contribute to it.

Part 2


The second part of the article will follow in a few days. Here is a brief list of what I will cover in it:
1. Search (elasticsearch)
2. Mail (google app engine)
3. Deployment (capistrano, npm)
4. Queues (redis, kue)
5. COMET server (beseda)

Too much information for one article.
If I see interesting questions in the comments, I will answer them in the second part.

PS


  1. There will be no flame war . Any comments containing criticism unbacked by references to your own achievements will be ignored.
  2. We are looking for a front-end ninja; details here.
  3. Let me remind you that Tactoom is in closed beta testing. Registration is limited. Leave your email, and an invite may come your way soon.


UPD 19.10:
The second part is delayed because of a heavy workload.

Source: https://habr.com/ru/post/130345/

