
Node: scaling in the small versus scaling in the large

For the past few weeks I have been spending all the free time I can find thinking about which technologies we will use to implement the first version of BankSimple. Many people would probably assume that I would immediately reach for Scala, since I co-authored a book about the language, but I approach engineering problems quite differently. Every problem has a set of applicable technologies, and it is the engineer's job to justify the need for whichever ones are chosen.

(By the way, Scala may well be a good fit for BankSimple, not least because of the large amount of third-party Java code we have to integrate with, but that is a topic for another post, and most likely for another blog entirely.)

One of the most talked-about technologies on Hacker News is Node, an environment for developing and running event-driven JavaScript applications on the V8 virtual machine. As part of choosing technologies for the project, I evaluated Node. Yesterday I expressed some general skepticism about it, and the author of the environment, Ryan Dahl, asked me to lay out my thoughts in more detail. So, here goes.
My intent, of course, is not to discredit Ryan, a nice guy and a great programmer who knows more about low-level C than most of us ever will, and all without a neckbeard. Nor am I criticizing the community of enthusiasts that has quickly grown up around Node; if you have found a tool you enjoy working with and want to grow with, more power to you.

Rather, the purpose of this article is to examine how Node fares against the second of the goals the project has set for itself, a goal that seems important to me for a number of applications.

What is Node created for?


Section "About the project" Node home page reads:
“Node's goal is to provide an easy way to build scalable network programs.”

A few paragraphs down, it states:

"Since nothing is blocked, even non-programming experts are able to create fast systems [with a Node]."

So is the goal of Node to provide an easy way to build scalable network programs, or to let less-than-expert programmers develop “fast systems”?

Although these goals may seem related, they are very different in practice. To understand why, we need to distinguish between what I will call “scaling in the small” and “scaling in the large”.

Scaling in the small


In a small-scale system, pretty much anything works.

Modern hardware is so powerful that you can, for example, build a web application supporting thousands of users using one of the slowest programming languages available, with terribly inefficient access to the data store, awful data storage patterns, no caching whatsoever, no sensible distribution of work, no attention to the context of use, and so on. In principle, you can apply every available anti-pattern and still end up with a working system, simply because the hardware performs well even in the face of poor choices.

This is wonderful, actually. It means we can prototype thoughtlessly using whatever technologies we like, and those prototypes will often perform better than we expect. Better yet, when we do hit a bottleneck, it is usually trivial to route around it. Moving forward is simply a matter of spending a few minutes thinking about your problem and choosing implementation technologies with slightly better performance characteristics than the ones you used before.

Here, I think Node fits perfectly.

If you look at the people adopting Node, they are largely web developers working in dynamic languages with what we might politely call limited performance characteristics. Adding Node to their architectures means these developers went from having no concurrency and very limited runtime performance to the relatively good concurrency of Node's strictly enforced evented environment, running on a virtual machine with relatively good performance. These developers carved out the painful part of their application that was better suited to an asynchronous implementation, rewrote it with Node, and moved on.
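
To make the pattern concrete, here is a minimal sketch (mine, not from the original article) of the kind of component such developers carve out: a small evented Node service in which no handler ever blocks, so a single process can juggle many concurrent connections. The port and the simulated slow operation are arbitrary illustrations.

```javascript
var http = require('http');

http.createServer(function (req, res) {
  // Simulate a slow backend call with a timer; nothing blocks here.
  // While this callback waits, the event loop keeps accepting and
  // serving other connections.
  setTimeout(function () {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('hello from the event loop\n');
  }, 100);
}).listen(8000);
```

Point a load tester at port 8000 and thousands of mostly idle connections cost almost nothing, which is exactly the easy win described above.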

That is wonderful. Such a result squarely meets Node's stated secondary goal: less-than-expert programmers developing fast systems. But it has very little to do with scaling in the large, in the broader sense of the term.

Scaling in the large


In a system of significant scale, there is no magic bullet.

When your system is hit by a flood of work, no single technology can make it all better. At scale, you walk a razor's edge, choreographing a coherent dance of well-applied technologies, development practices, statistical analysis, internal communication, sound engineering management, fast and reliable hardware and software, vigilant monitoring, and so on. Scaling is hard. So hard, in fact, that the ability to scale is a deep competitive advantage of the sort that you cannot simply go out and download, copy, buy, or steal.

Hence my criticism of Node's primary stated goal: “to provide an easy way to build scalable network programs”. I fundamentally do not believe there is an easy way to build a scalable anything. People confuse easy problems with easy solutions.

If you have a problem that can be conveniently solved by moving code from an extremely limiting technology to a slightly less limiting one, consider yourself lucky, but that does not mean you are operating at scale. Twitter won an easy victory when one part of the service, a home-grown message queue written in Ruby, was rewritten in Scala. That was great, but it was scaling in the small. Twitter is still fighting the hard battle of scaling in the large, which involves much, much more than any single technology choice.

Growing with Node


For my part, I think Node will have a hard time growing with developers as they move from scaling in the small to scaling in the large (and no, my argument is not that “callbacks turn into a tangle of spaghetti code”, though I suspect you hear that one again and again because it genuinely is a pain point for developers of asynchronous systems).

A bold decision in Node's architecture is that every operation is asynchronous, right down to file I/O, and I admire Ryan's commitment to consistency and clarity in carrying that principle through his software. Engineers who deeply understand the load on their systems may find places where the Node model fits well and remains effective and efficient indefinitely; we simply do not know yet, because long-lived, mature Node deployments do not exist to observe. Most of the systems I have worked on change all the time. The workload changes. The data you work with changes along with the system. What was once a good fit for an asynchronous solution suddenly becomes better served by a multi-threaded solution, or vice versa, or you run into some other unpredictable, wholesale change.
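
For illustration, here is a minimal sketch of what “asynchronous down to file I/O” means in practice (the file path is arbitrary): the read call returns immediately, and the data arrives later in a callback.

```javascript
var fs = require('fs');

// The call returns before the file is read; the callback fires later,
// once the operating system delivers the data.
fs.readFile('/etc/hosts', function (err, data) {
  if (err) throw err;
  console.log('read ' + data.length + ' bytes');
});

console.log('this line runs before the file has been read');
```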

If you are deep into Node, you are locked into one way of achieving concurrency, one way of modeling your problems and solutions. If a solution does not fit the evented model, you are trapped. On the other hand, if you work on a platform that supports several different approaches to concurrency (the JVM, the CLR, C, C++, GHC, etc.), you have the option of changing your concurrency model as your system evolves.
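
A tiny sketch of what being “trapped” can look like: if a CPU-bound task sneaks into an evented Node server, it monopolizes the single event loop, and every other connection stalls until it finishes. (The port and the busy-loop below are hypothetical, purely for illustration; on a multi-threaded platform the same work could be handed off to a worker thread.)

```javascript
var http = require('http');

http.createServer(function (req, res) {
  // CPU-bound work never yields to the event loop: while this loop
  // runs, no other request on this process gets serviced.
  var sum = 0;
  for (var i = 0; i < 1e9; i++) sum += i;
  res.end('sum: ' + sum + '\n');
}).listen(8001);
```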

For now, Node's central premise, that evented systems necessarily mean high performance, remains questionable. Researchers at the University of California at Berkeley found that "threads can achieve all of the strengths of events, including support for high concurrency, low overhead, and a simple concurrency model." Follow-up research building on that work shows that evented and pipelined approaches perform equally well, and that blocking sockets can actually improve performance. In the working world of Java, it has been reported anecdotally that non-blocking I/O is not necessarily better suited than threads. Even one of the most-cited papers on the subject, with the provocative title "Why threads are a bad idea", ends by concluding that you should not abandon threads for high-end servers. There is simply no one-size-fits-all answer to concurrency.

In fact, a hybrid approach to concurrency seems to be the way forward, barring contraindications. Computer scientists at the University of Pennsylvania have found that a combination of threads and events offers the best of both worlds. The Scala team at EPFL argues that Actors unify thread-based and event-based programming into one neat, easy-to-understand abstraction. Russ Cox, a former Bell Labs employee now working on Google's Go programming language, goes even further, arguing that the "threads versus events" debate itself is beside the point (note that none of this even touches on distributed scaling: threads are a construct for a single machine and events a construct for a single processor, and neither gives you a straightforward way to distribute work across machines; Erlang, by the way, builds this in, and it is worth thinking about if you are nursing a fast-growing system).
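
As a rough illustration of why Actors are appealing, here is a toy sketch in JavaScript (my own, emphatically not Scala's Actors library): each actor owns a mailbox and handles one message at a time, so the programmer reasons sequentially, while the runtime underneath remains free to schedule deliveries however it likes, with events, threads, or both.

```javascript
// Toy actor: a mailbox plus a handler that processes one message at a time.
function makeActor(handler) {
  var mailbox = [];
  var busy = false;

  function pump() {
    if (busy || mailbox.length === 0) return;
    busy = true;
    var msg = mailbox.shift();
    // Yield to the event loop between messages so actors interleave.
    setImmediate(function () {
      handler(msg);
      busy = false;
      pump();
    });
  }

  return { send: function (msg) { mailbox.push(msg); pump(); } };
}

var logger = makeActor(function (msg) { console.log('got:', msg); });
logger.send('one');
logger.send('two'); // handled strictly after 'one'
```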

The bottom line: seasoned developers use a mix of threads and events, along with alternative approaches such as Actors and, experimentally, STM. To them, the idea that "non-blocking means fast" sounds at least a little silly; it belongs to the mythology of scalability. The people who are paid serious money to deliver scalable solutions are not feverishly rewriting their systems in Node overnight. They do what they have always done: measure, test, benchmark, think, and study the academic literature relevant to their problems. That is what scaling in the large requires.

Conclusion


As an investment of my working time, I would rather build on a system that lets me flexibly mix the asynchronous approach with other ways of modeling concurrency. A hybrid concurrency model may not be as simple and clean as Node's approach, but it will be more flexible. While BankSimple is in its infancy, we will face the happy challenges of scaling in the small, and Node could be a sensible choice for us at that early stage. But when we need to scale in the large, I would rather have an assortment of options open to me, and I would not like to face the prospect of a major rewrite under pressure.

Node is a nice piece of code with an enthusiastic community, plenty of accompanying hype, and a bright future. It makes sense as a "unifying technology" that offers an immediate solution to the problem of early scaling in a way that is particularly accessible to a generation of web developers who largely come from dynamic languages. Node more than seems to satisfy its secondary stated goal of bringing acceptable performance to less experienced developers who need to solve network-oriented problems. For a certain kind of programmer Node is comfortable and fun, and it is undoubtedly easy to get started with. The Node community is at the happy stage of reinventing wheels inspired by other well-known web frameworks, package managers, testing libraries, and so on, and I do not begrudge them that. Every programming community rethinks what came before it on the way to establishing its own norms.

Having worked out what Node is better and worse suited for, it is important to remember that there is no panacea for problems of significant scale. Node and its strictly asynchronous, evented approach should be seen as a very early point on the continuum of technologies and techniques that scaling in the large encompasses.

Approach popular solutions with skepticism. Anyone can talk up a hot new technology, but very few people actually operate at the scale where those technologies get put through their paces and their pitfalls discovered. Those who do tend to back their claims with numbers and research, and are busy working with tools and methods that have proven themselves over time. If you invest your time in new technologies, be prepared to learn and grow with them, and, perhaps, to jump ship when you find yourself constrained.

It is not easy.

Source: https://habr.com/ru/post/127696/

