In this and subsequent articles (part 2) I want to talk about the Erlang programming language, its use in our Risovaska project, and the applications and ready-made modules (most of them also written in Erlang) that we used on the server side.
Searching Habr for Erlang, I realized that the topic is barely covered: there are only a couple of really good articles about the language (for example, an excellent article by the creator of the language, translated by alex_blank). That is why I want to start with the language itself and how it differs from traditional languages.
Virtual machine and nodes
To begin with, programs written in Erlang run only inside a virtual machine, which in Erlang terms is called a node. There are builds of the virtual machine (Erlang/OTP) for most operating systems (Windows, Linux, Mac OS, FreeBSD; I have even read that Erlang has been run on the iPhone). Since its source code, written in C, is open, building it for any operating system is not a problem. Several nodes can run on the same computer, although this is rarely needed. Each node must have a unique name in order to communicate with nodes on other computers on the network; if a node does not need to talk to other nodes, it can do without a name. A node name has the format name@IP or name@hostname, for example test@192.168.0.101 on a local network.
Erlang/OTP also includes HiPE, a high-performance compiler from Erlang to native processor code. HiPE gives the biggest speed-up on binary data (almost tenfold) and on floating-point arithmetic; in other cases the gain is insignificant.
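A node gets its name from the command line, for example erl -name test@192.168.0.101 (or -sname for a short host name); a node started without a name reports nonode@nohost from node(). The small module below is a hypothetical illustration, not code from our project: it only splits a node name into its name and host parts and checks whether distribution is running.

```erlang
-module(node_name).
-export([split/1, is_distributed/0]).

%% Split a node name of the form Name@Host into its two parts.
split(Node) when is_atom(Node) ->
    [Name, Host] = string:split(atom_to_list(Node), "@"),
    {Name, Host}.

%% A node started without -name/-sname cannot talk to other nodes;
%% erlang:is_alive/0 reports whether distribution is running.
is_distributed() ->
    erlang:is_alive().
```

For example, node_name:split('test@192.168.0.101') returns {"test", "192.168.0.101"}.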
Variables that can be assigned only once
What is so special about Erlang that you won't find in other languages, and that breaks the brains of programmers used to traditional languages (C, Delphi, Basic, etc.)?
Erlang has variables, but once a variable has been given a value, that value cannot be changed. What kind of variables are these, you ask? They are still variables in the sense that they have two states: unbound (no value assigned yet) and bound (value assigned). This design is not accidental. First, it greatly simplifies garbage collection in the virtual machine (there is no need to keep track of pointers being updated). Second, it makes it easy to write algorithms split across many separate processes without fear that one process will corrupt another's data, since processes share nothing with each other.
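In Erlang the = sign is really pattern matching: the first match binds an unbound variable, and every later match only succeeds if the value is the same. A minimal sketch (the module name single_assign is hypothetical); the new value is passed as an argument so the compiler cannot see the failing match in advance:

```erlang
-module(single_assign).
-export([demo/1]).

%% '=' is pattern matching, not assignment. The first match binds X;
%% every later X = Value only succeeds if Value equals the bound value.
demo(New) ->
    X = 5,                  % X was unbound: now it is bound to 5
    Result = try X = New of % X is bound: this is a match against 5
                 _ -> matched
             catch
                 error:{badmatch, _} -> cannot_rebind
             end,
    {X, Result}.
```

demo(5) returns {5, matched}, while demo(6) returns {5, cannot_rebind}: X keeps its value either way.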
Recursion
But how do you implement, say, a loop if a variable's value cannot be changed? With recursion! Here is the simplest Erlang example, raising the number X to the power N:
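-module(power).
-export([raising/3]).

%% Computes X^N for N >= 1; Result must start equal to X, e.g. raising(3, 10, 10).
raising(1, _X, Result) ->
    Result;
raising(N, X, Result) ->
    raising(N-1, X, Result*X).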
Compact and beautiful. For example, raising(3, 10, 10) returns 1000. You can also call raising(2000, 2, 2): you will get a very large number, but the computation runs without problems, because Erlang integers have arbitrary precision. Another interesting Erlang property hides here as well: variables have no fixed type, but more on that later.
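Passing X twice is easy to get wrong, so a small wrapper can hide the accumulator. This is a sketch with a hypothetical module name (power_demo) that repeats raising/3 so it compiles on its own; digits/1 is only there to show how big 2^2000 gets.

```erlang
-module(power_demo).
-export([power/2, digits/1]).

%% power(X, N) starts the tail-recursive helper with Result = X,
%% so callers no longer pass X twice. Assumes N >= 1.
power(X, N) when is_integer(N), N >= 1 ->
    raising(N, X, X).

raising(1, _X, Result) ->
    Result;
raising(N, X, Result) ->
    raising(N - 1, X, Result * X).

%% Number of decimal digits of a non-negative integer.
digits(I) when is_integer(I), I >= 0 ->
    length(integer_to_list(I)).
```

power_demo:power(10, 3) returns 1000, and power_demo:digits(power_demo:power(2, 2000)) returns 603: the integer simply grows as needed.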
Recursion is bad, you say? Not in Erlang, as long as it is tail recursion, that is, the recursive call is the last thing the function does. The example above is exactly that: nothing is executed after raising(N-1, X, Result*X). In that case the virtual machine does not need to keep the calls on the stack; it discards each frame right away, so the code runs very fast and there is no limit on recursion depth.
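The difference is easiest to see side by side. In this sketch (hypothetical module sum_demo), the body-recursive version still has an addition to do after each call returns, so every call keeps a frame; the tail-recursive version carries the running total in an accumulator instead. Erlang stacks grow dynamically, so the body-recursive version does not crash on long lists the way it might in C, but its memory use grows with the recursion depth.

```erlang
-module(sum_demo).
-export([sum_body/1, sum_tail/1]).

%% Body-recursive: H + ... happens after the recursive call returns,
%% so one stack frame per element is kept until the end of the list.
sum_body([]) -> 0;
sum_body([H | T]) -> H + sum_body(T).

%% Tail-recursive: the recursive call is the last thing the clause does,
%% so the current frame is reused and stack use stays constant.
sum_tail(List) -> sum_tail(List, 0).

sum_tail([], Acc) -> Acc;
sum_tail([H | T], Acc) -> sum_tail(T, Acc + H).
```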
Processes
Everything that runs on a node is either a separate process or part of another process. Processes can spawn other processes and monitor each other. Each process has a unique identifier called a PID. Importantly, a PID is unique not only within one node but across all nodes on the network! A process can find out its own PID by calling self(). Here is how to run our function as a separate process:
Pid = spawn(?MODULE, raising, [3, 10, 10]).
Besides its PID, any process can be given a unique name:
register(unique_name, Pid).
After that, the process can conveniently be addressed by name without knowing its PID: unique_name ! Msg on the same node, or {unique_name, Node} ! Msg from another node on the network.
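A minimal sketch of registering a process and then talking to it by name (the module named and the ping/pong protocol are hypothetical; only register/2, whereis/1 and the send syntax are standard):

```erlang
-module(named).
-export([start/0, ping/0]).

%% Spawn a process and register it under the name unique_name.
start() ->
    Pid = spawn(fun loop/0),
    true = register(unique_name, Pid),
    Pid.

loop() ->
    receive
        {From, ping} ->
            From ! pong,
            loop()
    end.

%% Send to the registered name instead of the PID. From another node you
%% would write {unique_name, Node} ! Msg, e.g. Node = 'test@192.168.0.101'.
ping() ->
    unique_name ! {self(), ping},
    receive
        pong -> pong
    after 1000 ->
        timeout
    end.
```

After named:start(), whereis(unique_name) returns the same PID, and named:ping() returns pong.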
Again, you might suspect that all this is slow. Not at all! First, Erlang processes have nothing to do with operating system processes (Erlang has its own process scheduler). On an average machine, simple processes are spawned at about 350,000 per second. Maybe processes eat a lot of memory? Not at all: about 4 KB for the simplest process. Moreover, a process can be put to sleep (hibernated) right after it starts, which brings its memory down to about 1 KB.
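To check these figures on your own machine, something like the following sketch can help (the module spawn_many is hypothetical). run/1 times how long it takes to spawn N idle processes, and memory_demo/0 reports the memory, in bytes, of an idle process and of one that hibernates right away. The exact numbers depend on the hardware, OTP version and word size, so treat the ones quoted above as rough.

```erlang
-module(spawn_many).
-export([run/1, memory_demo/0, wait/0]).

%% Spawn N idle processes and time the spawning (in microseconds).
%% Returns {NumberAlive, Microseconds}; all processes are stopped afterwards.
run(N) ->
    {Micros, Pids} = timer:tc(fun() -> [spawn(fun wait/0) || _ <- lists:seq(1, N)] end),
    Alive = length([P || P <- Pids, is_process_alive(P)]),
    lists:foreach(fun(P) -> P ! stop end, Pids),
    {Alive, Micros}.

%% Each process simply blocks until it is told to stop.
wait() ->
    receive stop -> ok end.

%% Memory in bytes of an idle process vs. one that hibernates right away.
memory_demo() ->
    Idle = spawn(fun wait/0),
    Sleepy = spawn(fun() -> erlang:hibernate(?MODULE, wait, []) end),
    timer:sleep(50),   % give the second process time to hibernate
    Result = {mem(Idle), mem(Sleepy)},
    Idle ! stop,
    Sleepy ! stop,
    Result.

mem(Pid) ->
    {memory, Bytes} = erlang:process_info(Pid, memory),
    Bytes.
```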
Processes communicate by sending each other messages:
Pid ! {self(), hi}.
The syntax is the same whether you send a message to a process on the same node or on another computer on the network!
And this is how a process receives messages:
receive
    {From, hi} ->
        From ! hello;
    OtherMessage ->
        io:format("I received some strange message: ~p~n", [OtherMessage])
end.
Admittedly, the code for sending and receiving messages is very compact. In the example above, the process waits until a message arrives, forever if none does. You can avoid this by adding an after N clause to the receive expression, where N is the number of milliseconds to wait for a message.
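Putting sending, receiving and after together, here is a minimal sketch of a process that answers hi with hello, plus a client call that gives up after a timeout (the module greeter and its functions are hypothetical names):

```erlang
-module(greeter).
-export([start/0, ask/2]).

%% Start a process that answers {From, hi} with hello, over and over.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {From, hi} ->
            From ! hello,
            loop();
        OtherMessage ->
            io:format("I received some strange message: ~p~n", [OtherMessage]),
            loop()
    end.

%% Send hi to Pid and wait at most Timeout milliseconds for the reply.
ask(Pid, Timeout) ->
    Pid ! {self(), hi},
    receive
        hello -> got_hello
    after Timeout ->
        timeout
    end.
```

greeter:ask(greeter:start(), 1000) returns got_hello; asking a process that never replies returns timeout after the given number of milliseconds instead of blocking forever.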
Key features
In short, in Erlang:
- Processes are created and destroyed very quickly.
- Messages are passed between processes very quickly.
- You can run a huge number of processes (in practice, 20 million processes have been run on a single node; in theory, up to 120 million).
- Processes share nothing with each other and are completely independent.
- The main way processes interact is by sending messages.
Here is a good article (in three parts) describing how to build a Comet server in Erlang that serves a million simultaneous connections:
http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3/ In Erlang, this problem is solved quite easily.
Types
A variable in Erlang can hold a value of any type. You can, however, check what type of value a variable holds using built-in functions (is_integer, is_binary, etc.). The absence of static typing is usually considered a drawback, but in practice I have found that it is more often an advantage and makes programs much more flexible. To catch potential type errors, Erlang/OTP includes Dialyzer, a static analyzer that detects them.
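In practice, type checks usually appear as guards on function clauses, and optional -spec annotations give Dialyzer something to check against (the runtime ignores them). A minimal sketch with a hypothetical module describe:

```erlang
-module(describe).
-export([describe/1]).

%% The -spec is for Dialyzer and for readers; it has no effect at runtime.
-spec describe(term()) -> atom().
describe(V) when is_integer(V) -> integer;
describe(V) when is_float(V)   -> float;
describe(V) when is_binary(V)  -> binary;
describe(V) when is_list(V)    -> list;
describe(V) when is_atom(V)    -> atom;
describe(_)                    -> other.
```

The same variable can hold any of these at different times; describe(42) returns integer, describe(<<"hi">>) returns binary, and describe({1, 2}) returns other.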
Supervisors
Now we come to the really interesting properties of the Erlang/OTP development environment. Programs written in Erlang are usually considered not just reliable but super-reliable. How is that achieved? Simply. Roughly speaking, an Erlang application is divided into supervisors (a supervisor is a special process that monitors its child processes) and worker processes that do the actual work. You can read more about this in the OTP Design Principles. Erlang even has a concept that sounds absurd: "let it crash". And it really works that way: if a process runs under a supervisor and crashes, the supervisor restarts it. The supervisor's reaction to a crashed child can be configured flexibly. For example, it can stop all its child processes and start them all again. This is needed when the child processes depend on each other and the crash of one can make the others unusable.
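A minimal sketch of a supervisor with one worker, using the standard supervisor behaviour (the module demo_sup and its worker are hypothetical). With the one_for_one strategy only the crashed child is restarted; changing the strategy to one_for_all gives the "stop everything and start again" behaviour described above.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0]).

%% Start the supervisor and register it locally as demo_sup.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% one_for_one: restart only the child that died.
%% intensity/period: give up if there are more than 5 restarts in 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Worker = #{id => worker,
               start => {?MODULE, start_worker, []},
               restart => permanent},
    {ok, {SupFlags, [Worker]}}.

%% A plain worker process; the supervisor only needs {ok, Pid}
%% for a process linked to it.
start_worker() ->
    {ok, spawn_link(?MODULE, worker_loop, [])}.

%% The worker just waits; the message crash makes it die abnormally.
worker_loop() ->
    receive
        crash -> exit(boom);
        _Other -> worker_loop()
    end.
```

After sending crash to the worker, supervisor:which_children(demo_sup) shows a new worker PID: the supervisor has already restarted it.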
Thus, a program built on OTP principles looks like a process tree:
[Figure: supervision tree. Picture taken from OTP Design Principles]
Since supervisors perform no computation and only watch, the probability of one crashing is close to zero. Of course, a whole node can go down, for example because it runs out of memory, but there is a solution for that too: you can run a separate operating system process that monitors the node and restarts it after a certain time (Erlang/OTP ships such a program, heart). Another option is Distributed Applications: applications that can run on several nodes, though only on one node at any given time. If the node running such an application crashes, the application is automatically restarted on the next node in the list. The list of nodes on which a distributed application may run can be changed dynamically while the system is running.
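For illustration, a sketch of how a distributed application is declared in a node's configuration file, modeled on the example in the OTP Distributed Applications guide. The application name myapp and the node names are hypothetical; the kernel parameter distributed gives, per application, a failover timeout in milliseconds and the nodes it may run on, in order of priority.

```erlang
%% sys.config on a@host1 (sketch; myapp and the node names are hypothetical)
[{kernel,
  [%% Run myapp on a@host1 first; if that node goes down, wait 5000 ms and then
   %% restart it on b@host2 or c@host3 (nodes in a tuple have equal priority).
   {distributed, [{myapp, 5000, [a@host1, {b@host2, c@host3}]}]},
   %% At startup, wait up to sync_nodes_timeout ms for these nodes to connect.
   {sync_nodes_mandatory, [b@host2, c@host3]},
   {sync_nodes_timeout, 5000}]}].
```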
Work with binary data
Erlang works with binary data really fast (especially together with HiPE), so it is very natural to write code in it that processes incoming binary data, all the way down to individual bits. Here Erlang is several times faster than Java 6.
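The key tool here is the bit syntax, which lets you take a binary apart with a single pattern. A minimal sketch for a hypothetical packet format (1-byte type, 2-byte big-endian length, then the payload), plus a bit-level match on individual flag bits; the module packet and the format itself are illustrative assumptions:

```erlang
-module(packet).
-export([parse/1, encode/2, flags/1]).

%% Hypothetical wire format: 8-bit type, 16-bit big-endian length, then
%% Length bytes of payload. Whatever follows is returned as Rest.
parse(<<Type:8, Len:16, Payload:Len/binary, Rest/binary>>) ->
    {ok, Type, Payload, Rest};
parse(Bin) when is_binary(Bin) ->
    {more, Bin}.   % incomplete packet: wait for more bytes

encode(Type, Payload) when is_binary(Payload) ->
    <<Type:8, (byte_size(Payload)):16, Payload/binary>>.

%% Bit-level matching: two 1-bit flags followed by 6 reserved bits.
flags(<<Urgent:1, Ack:1, _Reserved:6>>) ->
    {Urgent, Ack}.
```

For example, packet:parse(packet:encode(1, <<"hi">>)) returns {ok, 1, <<"hi">>, <<>>}, and packet:flags(<<2#10000000>>) returns {1, 0}.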
Disadvantages
The article would not be complete without mentioning Erlang's main shortcomings:
- no support for Unicode strings (a well-worn complaint about the language, although support is promised for Erlang R13B this spring; we are waiting),
- slow math: writing serious numerical computations in it is inefficient,
- a small number of additional libraries (all the essential ones are there and their number keeps growing, but it is still far from the variety available for .NET or C),
- no mature, fast GUI library, so I would not write client applications in Erlang (there is wxErlang, for example, but that library is still far from complete).
But all these shortcomings, except perhaps the lack of Unicode string support, can either be worked around or are not really shortcomings, since they concern things the language was not created for. It was created for high-load, scalable, ultra-reliable systems. Originally the language was designed specifically for telecommunications, but, as it later turned out, it is perfectly suited for building web servers and distributed systems.
To be continued
I will stop the first article here; it has already grown quite long. In the following articles I will cover the distributed Mnesia database (which is part of Erlang/OTP) and the use of Amazon S3 and Amazon EC2.
Please write in the comments which features of the language you would like to learn about in more detail.
The series continues in part 2.