Nowadays the term “big data” is widely known. Ever since numerous publications on processing “big data” began appearing online and in the press, interest in the topic has been growing steadily, and database management systems based on NoSQL technology are increasingly in demand. It is clear to everyone that building “big data” systems requires impressive hardware resources. It is even more important, however, to use the system's computing resources optimally and to scale them effectively. This inevitably changes the approach to building data processing systems.
Whereas systems used to be built around a centralized data warehouse served by a set of powerful computing servers, this approach is gradually fading into the background. More and more systems are built on a cluster of many standard servers of average power. Such a system has no centralized storage: data within the cluster is kept in a distributed storage system assembled from the local disk resources of each server. Whereas scaling used to mean adding disks to the centralized storage system and upgrading the computing servers, the same issues are now solved simply by adding standard nodes to the cluster. This approach is becoming more and more common.
When operating such systems, one often runs into a shortage of computing resources on individual nodes. There are situations when the load is distributed unevenly across the cluster: some nodes sit idle while others are overloaded with tasks. These problems can be solved “extensively” by adding more and more nodes to the cluster, which many, incidentally, do in practice. One can, however, take an “intensive” approach and optimize the distribution of resources between different tasks and different cluster nodes.
One way or another, there is now a pressing need for a system capable of quickly and flexibly redistributing the available resources in response to changing load. In practice, such a system must meet three main conditions:
- Combine the computing resources of individual servers into a common pool.
- Run an arbitrary application on an arbitrary cluster node.
- Allocate to each task separately a fixed amount of computing resources from the common pool.
The idea of a cluster management system with pooled computing resources was implemented some time ago by the Apache Foundation in a product called MESOS. This product fulfills the first condition: it combines the computing resources of several hardware servers into one distributed pool of resources by organizing them into a cluster computing system.

In a nutshell, it works as follows: a MESOS service runs on each node of the cluster and can operate in two modes, mesos-master and mesos-slave. Each cluster node thus receives either the mesos-slave role, the mesos-master role, or both roles at once, which is also possible. Mesos-slave nodes launch applications on command from a mesos-master node. The mesos-master nodes control the application startup process on the cluster nodes, thereby fulfilling the second condition: they allow an arbitrary application to be started on an arbitrary cluster node. There are usually 2 or 3 mesos-master nodes for fault tolerance. Since by default both roles start on a node at the same time, it is advisable to disable the mesos-master role on most nodes. The mesos-slave and mesos-master nodes coordinate through Apache ZooKeeper. The functionality of Apache MESOS can be flexibly extended by embedding third-party applications into MESOS as frameworks.
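As a rough illustration, the resource-reporting loop described above can be modeled in a few lines of Python. All the names here are invented for the sketch; the real implementation lives in C++ inside MESOS and communicates over its own messaging protocol, coordinated via ZooKeeper.

```python
# Toy model: mesos-slave nodes advertise their resources and the
# mesos-master aggregates them into a single cluster-wide pool.

class Slave:
    def __init__(self, name, cpus, mem_mb):
        self.name = name
        self.cpus = cpus        # CPU cores on this node
        self.mem_mb = mem_mb    # RAM on this node, in MB

    def report(self):
        """What the slave advertises to the master."""
        return {"slave": self.name, "cpus": self.cpus, "mem_mb": self.mem_mb}

class Master:
    def __init__(self):
        self.offers = {}

    def receive(self, offer):
        # Latest report from each slave replaces the previous one.
        self.offers[offer["slave"]] = offer

    def total_resources(self):
        """The pooled view of all cluster resources."""
        return {
            "cpus": sum(o["cpus"] for o in self.offers.values()),
            "mem_mb": sum(o["mem_mb"] for o in self.offers.values()),
        }

master = Master()
for slave in (Slave("node1", 8, 16384), Slave("node2", 4, 8192)):
    master.receive(slave.report())

print(master.total_resources())  # {'cpus': 12, 'mem_mb': 24576}
```

The point of the sketch is only that the master ends up with a complete, continuously updated picture of what every slave has free.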
The key idea of Apache MESOS is that mesos-slave nodes keep track of the free hardware resources on them (CPU and RAM) and report these amounts to the mesos-master nodes. As a result, the mesos-master has complete information about the computing resources available on the mesos-slave nodes. It not only issues the command to a slave node to launch an application, but can also enforce a limit on the computing resources the application may use. This is done with an application containerization mechanism: the process runs inside a so-called container, a closed operating environment. The basis of the container is an image file in which a root filesystem is deployed along with all the necessary system libraries and so on. When the container starts, the application runs in this prepared environment on top of the host OS kernel, isolated from other processes. The result is a kind of virtual machine for a single process. The important point is that containerization makes it possible to allocate to each container a fixed amount of RAM and a fixed number of CPU cores, including fractional amounts of less than one core (0.5 cores, for example).
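To make the fractional CPU limits concrete: on Linux they map onto the cgroup CPU scheduler as a quota-per-period pair, where the container is granted a number of microseconds of CPU time within each scheduling period. A minimal sketch of the arithmetic, assuming the kernel's default 100 ms period (MESOS and docker set these values for us in practice):

```python
# Convert a fractional CPU share (e.g. 0.5 cores) into a cgroup
# CFS quota: microseconds of CPU time allowed per scheduling period.

CFS_PERIOD_US = 100_000  # default CFS scheduling period: 100 ms

def cfs_quota(cpu_cores: float, period_us: int = CFS_PERIOD_US) -> int:
    """Quota in microseconds granting `cpu_cores` worth of CPU per period."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return int(cpu_cores * period_us)

print(cfs_quota(0.5))  # 50000  -> half a core
print(cfs_quota(2.0))  # 200000 -> two full cores
```

RAM limits work analogously: the container's cgroup is given a byte ceiling, beyond which allocations fail or the process is killed.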

Thus we can fulfill the third condition: each task is allocated a fixed amount of computing resources from the common pool. It should be noted that Apache MESOS initially solved these tasks with its own containerization mechanism; however, after the docker product appeared on the market, it completely replaced the built-in Apache MESOS containerization tools. Today docker integration in Apache MESOS solutions is widely used and is the de facto standard. Docker's position in this area has been further strengthened by the docker hub service, a free distribution system for containerized applications that is ideologically similar to the well-known GitHub service. Using it, developers can publish their applications as ready-made docker containers, which many, incidentally, have been actively doing lately.
Now, if resources are exhausted on some mesos-slave node, the mesos-master learns of this immediately and will not start new applications on that slave node. It is forced to look for another, less loaded mesos-slave. As a result, tasks that are more demanding of computing resources are “displaced” toward less loaded cluster nodes. We thus get a full-fledged cluster with pooled resources, in which each application can be assigned a fixed share of resources to work with.
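The displacement behavior can be sketched as a toy placement function. This is illustrative only: real MESOS uses an offer-based protocol in which frameworks accept or decline resource offers rather than a single global picker, and the node names and numbers below are made up.

```python
# Toy placement: only slaves whose free resources cover the task's
# demand are considered, and the least-loaded one (most free CPU) wins.

def pick_slave(free, need_cpus, need_mem_mb):
    """free: {slave_name: (free_cpus, free_mem_mb)} -> slave name or None."""
    candidates = [
        (cpus, mem, name)
        for name, (cpus, mem) in free.items()
        if cpus >= need_cpus and mem >= need_mem_mb
    ]
    if not candidates:
        return None  # no node can host the task right now
    # Prefer the node with the most free CPU, i.e. the least loaded.
    return max(candidates)[2]

free = {"node1": (0.2, 512), "node2": (3.5, 4096), "node3": (1.0, 1024)}
print(pick_slave(free, need_cpus=1.0, need_mem_mb=1024))  # node2
print(pick_slave(free, need_cpus=8.0, need_mem_mb=1024))  # None
```

Even in this crude form, the effect described above falls out naturally: an overloaded node like node1 simply stops matching, and demanding tasks drift to node2.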
Unfortunately, an obvious disadvantage of Apache MESOS-based solutions is their focus on launching a single copy of an application on a single cluster node at a single moment in time, without monitoring the application's state or keeping it running over the long term. This problem, however, was recently solved by MESOSPHERE, which brought the MARATHON and CHRONOS products to market. These products control the launch of applications in the Apache MESOS environment. Interacting with the mesos-master through Apache ZooKeeper, they embed themselves into its structures as frameworks, giving the system new functionality. MARATHON is designed to run applications that must work continuously for a long time. It provides scaling and health monitoring for the applications it runs. MARATHON's standard functions include launching an application instance on all cluster nodes simultaneously, launching an application on a certain subset of cluster nodes, controlling the number of application copies running on a single cluster node, and so on. CHRONOS, in turn, has similar functionality but is focused on running one-off tasks on a schedule.

Both of these applications come with their own management interface, implemented over HTTP as a REST API. In effect, MARATHON sits between the user and the mesos-master, accepting application launch requests via the REST API in JSON format. In the request the user can, among other things, set the total number of application instances to launch across the cluster, the number of instances per node, the amount of system resources granted to each instance, and so on.
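A minimal sketch of such a request in Python. The field names (id, cmd, instances, cpus, mem) come from MARATHON's /v2/apps schema; the application id, command, and endpoint address below are placeholders invented for the example.

```python
import json
from urllib import request

# Build a minimal MARATHON application definition: 3 instances across
# the cluster, each limited to half a CPU core and 256 MB of RAM.

def app_definition(app_id, cmd, instances, cpus, mem_mb):
    return {
        "id": app_id,            # unique application id within MARATHON
        "cmd": cmd,              # shell command each instance runs
        "instances": instances,  # total copies across the cluster
        "cpus": cpus,            # CPU cores per instance (fractions allowed)
        "mem": mem_mb,           # RAM per instance, in MB
    }

app = app_definition("/demo/web", "python3 -m http.server 8000", 3, 0.5, 256)
payload = json.dumps(app).encode()

# Submitting it would look roughly like this (not executed here;
# the host name is a placeholder for a real MARATHON endpoint):
# req = request.Request("http://marathon:8080/v2/apps", data=payload,
#                       headers={"Content-Type": "application/json"})
# request.urlopen(req)
```

MARATHON then asks the mesos-master for matching resource offers and keeps the requested number of instances running, restarting them elsewhere if a node fails.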
As mentioned above, many developers are starting to distribute their applications as docker containers. MESOSPHERE in particular uses this approach successfully, so the MARATHON and CHRONOS applications discussed in this article are currently available as ready-made containers. Using them simplifies the maintenance of these subsystems, makes it easy to move them between cluster nodes, significantly speeds up version upgrades, and so on. Our own development experience, as well as that of third-party companies, suggests that this approach is the most likely direction for the industry's technology to take.
Summing up, we can say that we now have at our disposal a technology that can, in practice, solve the tasks formulated at the beginning of the article and meet the three main requirements: to allow transparent use of the pooled computing resources of an IT system, to allow arbitrary tasks to be run on an arbitrary node of the system, and to provide a mechanism for assigning each task a fixed amount of computing resources from the common cluster pool.
In conclusion, it is worth noting that the effectiveness of the approach described in this article has been confirmed by the successful experience of implementing and operating solutions built on these technologies at several St. Petersburg IT companies.
The technologies considered in this article are described in more detail here.