
Simply about the complex: high-performance computing for engineering and research tasks.

What do you need to dig up a garden? Given a vegetable garden, you need working tools and labor (workers). And what if you need to dig faster, or dig more? You can call in friends or hire other people, that is, increase the number of workers. This is an example of high-performance garden digging. It is not always possible to increase the productivity of digging a garden by looking for stronger workers, since the productivity of each individual worker is limited. So you have to resort to the services of a larger number of workers.

It is the same with high-performance computing. The individual computers and processor cores in computing clusters are called workers, if you follow the terminology of the MATLAB package. In the documentation of other clusters these cores and computers are called nodes, and that is what I will call them in this note.



Introduction

A lot has already been written on Habrahabr about high-performance, distributed, and parallel computing (HPC). freetonik has made a detailed and visual introduction to parallel computing and continued it here, HPC was reviewed by keleg here, the theory of distributed computing was covered in a note by mkosyakov, Melges described the experience of organizing parallel computing in C over a network, and XakepRU described how to parallelize processes in Linux. After reading them, I realized that there is no note that would help someone start using HPC to solve engineering and scientific problems. That is probably a common feature of many sources of information on this topic. Programmers write good programs that perform the tasks assigned to them. University lecturers explain how and why high-performance computing should be used. But as soon as researchers realize that it is time for them to use HPC, they find only a few 'bridges' linking an understanding of HPC with the direct use of HPC systems in their own work. At universities, students can find such a 'bridge' in laboratory and practical work. I will try to fill this gap here, in the hope that the material will be useful to those who have not studied it and will help them start using HPC. First there will be a brief introduction to HPC, after which the possibilities of using MATLAB clusters, HTCondor clusters, and supercomputers will be considered.

High-performance computing (HPC) comes to the rescue when you need to reduce computation time or gain access to more memory. For example, your program might need a week to perform the necessary calculations, but you need the results tomorrow. If you divide this program into parts and execute each of them on a separate node, then theoretically the calculations can be sped up in proportion to the number of nodes involved. But that is only in theory; in practice something always gets in the way (this was covered in detail here). Another case worth mentioning is when your program requires a large amount of RAM. For example, your computer has only 4 GB of RAM installed, but the calculations require at least 64 GB. In HPC systems each node has a certain amount of memory installed. So if each node has 2 GB of memory, you can again divide the program into 32 parts, each of which runs on a separate node, interacts with the other parts, and exchanges data with them; as a result the program as a whole gets access to 64 GB of memory.
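
Why things get in the way can be made concrete with Amdahl's law: if a fraction p of a program's run time can be parallelized, the speedup on N nodes is limited to 1/((1-p) + p/N). A tiny MATLAB sketch of this (the 90% figure is only an illustrative assumption, not a property of any real program):

 p = 0.9;                           % assume 90% of the run time parallelizes
 N = [2 4 8 16 32];                 % number of nodes
 speedup = 1 ./ ((1 - p) + p ./ N)  % Amdahl's law: only about 7.8x on 32 nodes, far from 32x
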
From these examples you have probably gathered that high-performance computing means computing performed on computer systems whose specifications are much higher than those of ordinary computers. The concept is loosely defined; perhaps there is a more precise definition, but I could not find one. HPC can be parallel or distributed, or a combination of the two.

Parallel computing involves developing programs that, during execution, consist of several parallel, interacting processes. For example, modeling the characteristics of a solar cell involves the interaction of models describing charge carrier transport, the propagation of the incident light inside the cell, thermal effects, and stretching and compression. Carrier transport, stretching and compression, and the refractive index of the material used in the optical model of the incident light all depend on temperature, so the models describing these effects must interact with one another during the calculation. To speed up the calculations, you can run the model code describing carrier transport on one node, the code responsible for light propagation on another, the thermal model on a third, and so on. That is, the nodes perform interacting calculations in parallel.
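
As a rough sketch of what such interacting calculations can look like in MATLAB (the thermal and transport expressions below are stand-ins for illustration, not the models from the example), one worker can compute a temperature field and send it to another worker that needs it:

 matlabpool open 2                      % two interacting workers
 spmd
     if labindex == 1
         T = 300 + 10*rand(100,1);      % stand-in for a thermal model
         labSend(T, 2);                 % pass the temperature field to worker 2
     elseif labindex == 2
         T = labReceive(1);             % the transport model waits for the temperature
         mobility = 1e-2 ./ T;          % stand-in for a temperature-dependent step
     end
 end
 matlabpool close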

Distributed computing involves using several nodes and processes that do not interact with each other. Very often the same code is executed on different nodes. For example, suppose we need to estimate the stretching and compression of the same solar cell as a function of temperature. In this case temperature is an input parameter of the model, and the same program code can be executed on different nodes for different temperatures.
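
A minimal MATLAB sketch of this kind of distributed run (the stress_strain function and the temperature range are assumptions made for illustration): the iterations are independent of one another, so they can be farmed out to the nodes with parfor:

 matlabpool open 4                     % four independent workers
 T = 250:10:400;                       % input temperatures, K
 sigma = zeros(size(T));
 parfor k = 1:length(T)
     sigma(k) = stress_strain(T(k));   % hypothetical model, one independent run per temperature
 end
 matlabpool close
 save sigma.dat sigma -ascii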

The choice between distributed and parallel computing depends on how the program code used for the calculations is organized, on the physical model itself, and on which HPC systems are available to the end user. The rest of this post covers how users interact with HPC systems and which systems they can use.


User interaction with high-performance computing

If the user works with an HPC system remotely, they need a computer with which they will:


Much depends on the personal preferences of the user, the availability of the necessary software, and other requirements. There is a choice of programming language, of operating system for both the computer and the cluster, and of the program libraries and cluster-management software used. Personally, I always try to write programs in C or C++ using MPI and OpenMP for Linux (here and here there are already good articles on these fathers and mothers of high-performance computing), but for various reasons it does not always work out. A typical situation: the boss comes in on Friday and says that results are needed urgently. It ends with the program for the necessary calculations being written in MATLAB and, to get the results faster, running on our organization's MATLAB cluster until Monday.

As for the operating system of the user's computer, in most cases it is most convenient to use the same operating system and distribution that is installed on the HPC system. At the moment most HPC systems run various Linux distributions. If our cluster runs Scientific Linux, it is easier to install the same system on the work computer so as not to get confused by the commands. If you plan to use a MATLAB-based cluster, the choice of operating system does not matter, since programs written in MATLAB run on computers with any OS (any OS on which MATLAB can be installed, of course).

If you choose a mixed scheme in which MS Windows is installed on your computer and the HPC system is built on Linux, then you need a client to connect to the remote system (for example, PuTTY), possibly an X server, or simply Cygwin, which includes all of this. The local administrators of the HPC system will always help you choose the software.

An important point: HPC systems usually either do not support programs that require interactive mode (programs that, during execution, request data input or expect other user actions such as keystrokes or mouse manipulation) or support them only in a limited way. The same applies to graphical interfaces: most often their use is not provided for, and HPC systems are used in text mode from the command line (the exception, again, being MATLAB). Before using your program on an HPC system, it must be debugged and then converted so that it can be launched on the HPC system and, without further human intervention, carry out the calculations and store the results in files or pass them to the user in some other way.
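
As a hedged illustration of what such a conversion can look like in MATLAB (the file names and the compute function here are assumptions, not a recipe for any particular cluster), a batch-ready script reads its inputs from a file, computes, saves the results, and exits without asking the user for anything:

 % batch_job.m - a minimal sketch of a non-interactive MATLAB script
 params = load('input_params.txt', '-ascii');  % input data prepared in advance
 results = zeros(size(params));
 for k = 1:length(params)
     results(k) = compute(params(k));          % hypothetical calculation routine
 end
 save results.dat results -ascii               % store the results in a file
 exit                                          % quit MATLAB so the node is released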

The performance requirements for the user's own computer are minimal, since the main calculations are still meant to be carried out on the HPC systems. You can monitor the status of the calculations, and start or interrupt them, from a minimum-configuration computer or even from a mobile phone.

Some HPC systems

General overview

Most often, supercomputers, computer clusters, and grids are used for HPC.
Supercomputers are computer systems that significantly exceed most existing computers in parameters such as performance, available RAM, and number of processors. For more information about them, see the list of the five hundred most powerful supercomputers in the world.

A computer cluster is a group of computers that can interact with each other to increase the available memory and the number of processors involved in the work. Most often such clusters are built within research groups or organizations.

Grids are groups of clusters and supercomputers scattered across different cities and countries. For example, you can submit your computing task to a server in Switzerland, but it will actually be executed on clusters in Germany, France, or Poland. The best-known example of a grid is the European grid system EGEE, which combines about forty thousand processors and several petabytes of disk space.

It is often difficult or even impossible for the end user to tell supercomputers and clusters apart. Here are three examples:

1. Not infrequently, a group of computers connected to each other through high-speed communication networks is also called a supercomputer, although it is essentially the same as a computer cluster;

2. At the same time, there are clusters built on HTCondor software, that is, a group of computers interacting with a server over a local (and often slow) network, and no one would risk calling such clusters supercomputers;

3. There are NVIDIA personal supercomputers (whose system unit is somewhat larger than that of an ordinary office computer), in which the entire computing system is not scattered across a network but fits inside that single system unit.

If you compare examples 2 and 3, the difference between a supercomputer and a cluster is obvious. Between examples 1 and 3, however, no particular difference is visible, and yet the systems in both of those examples are called supercomputers.

HTCondor clusters (Condor; renamed HTCondor in 2012)

The software for organizing such a cluster can be downloaded for free from the project page. Clusters of this type consist of work computers and a server. Dunordavind made an important clarification in the comments: such HPC systems are not clusters in the classical sense, but rather resource managers (but in order not to rewrite the whole text, I will still call them clusters). The advantage of such a cluster is that the work computers can be ordinary office and laboratory computers with the client software installed. During the day these computers can be used for regular work, but as soon as they become idle (depending on the settings), the server starts running on them the tasks that were previously submitted to it. A prerequisite for using this kind of cluster is installing the client software on the computer from which users submit tasks; that is, their computer must be part of the cluster. Supported operating systems: MS Windows, macOS, and Linux.

To execute a program, it must be compiled into executable code for the desired OS and transferred to the server along with the necessary libraries. This also applies to programs written for MATLAB: they, too, must be compiled using the C compiler that comes with MATLAB. To run such a program on the cluster, you need to write a simple configuration script that records the requirements for the execution environment of your program (amount of RAM, operating system, etc.) and the list of files transferred with the program. As an example, below is the text of one such file (let us call it cost_top.txt):

universe = vanilla
executable = cost_top.bat
transfer_input_files = cost_top.exe
output = dump.txt
error = errdump.txt
log = foo.log
requirements = (OpSys == "WINNT51")
rank = kflops
transfer_files = ALWAYS
queue


I am sure you have already guessed: this file 'explains' to the Condor software such important points as the name of the executable program, which files need to be transferred to the cluster, which file to write the program's output to, which file to write error messages to, which file to write the log to, what requirements are placed on the node's OS and performance, and whether files should be transferred.

The contents of the file cost_top.bat, which is executed on the node:

 path=c:\windows\system32;c:\windows;c:\windows\system;p:\matlab6\bin\win32
 cost_top.exe


You can probably see that the first line of this script adds the necessary paths to the environment variable, and the second launches the program we need.

To submit your task to the cluster server, type 'condor_submit cost_top.txt' at the command line. After that your task is placed in the queue, and after a while the server starts it on the client computers. The waiting time in the queue depends on each user's priority and on the load on the cluster, and is determined by the server's task-balancing system.

Clusters of this type have limitations:


MATLAB clusters

MATLAB itself is able to create a cluster. To do this you need the appropriate libraries and server: the Distributed Computing Toolbox and the Distributed Computing Server. Modern processors have more than one core, and MATLAB can deploy a local cluster right on your work computer. This cluster configuration is known as the local configuration. It is convenient when you want to speed up calculations a little without much effort, and also when you need to test a program before running it on a more serious HPC system such as a supercomputer or cluster.
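
A small sketch of the local configuration in use (the worker count is arbitrary, and the matlabpool syntax assumes the same MATLAB generation as in the examples below):

 matlabpool open local 2          % start a pool of 2 workers on this computer
 poolsize = matlabpool('size')    % check how many workers are actually available
 % ... run parfor loops here to test the parallel code ...
 matlabpool close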

Along with the local configuration there are others, for example configurations that unite a group of computers on a local network, or the computers of a cluster or a grid. If the administrators have the opportunity and are not too lazy, they usually set up MATLAB clusters and run training courses so that users can easily work with them.

Advantages of MATLAB clusters:


For example, code without using parfor:

 clear all;
 Na=4:50;
 Nc=4:30;
 for i1=1:length(Na),
     for i2=1:length(Nc),
         % collect both outputs so that they can be saved below
         [FigOM(i1,i2),dF(i1,i2)]=fom(Na(i1),Na(i1),Nc(i2),0);
     end
 end
 save FigOM.dat FigOM -ascii
 save dF.dat dF -ascii
 exit


And now the same thing using parfor and four nodes:

 clear all;
 matlabpool open 4
 Na=4:50;
 Nc=4:30;
 for i1=1:length(Na),
     parfor i2=1:length(Nc),
         % FigOM and dF are sliced output variables, so parfor can fill them in parallel
         [FigOM(i1,i2),dF(i1,i2)]=fom(Na(i1),Na(i1),Nc(i2),0);
     end
 end
 matlabpool close
 save FigOM.dat FigOM -ascii
 save dF.dat dF -ascii
 exit

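A side note for readers on newer MATLAB releases: matlabpool has since been replaced by parpool, so the pool management in the example above would look roughly like this (a sketch; the loop body stays the same):

 parpool(4);      % replaces 'matlabpool open 4'
 % ... the same parfor loop as above ...
 delete(gcp);     % replaces 'matlabpool close'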

Disadvantages:


Supercomputers and grids

As mentioned above, it is sometimes difficult to see the difference between a supercomputer, a computing cluster, and a grid. From this side of the terminal window they all look the same: all of them have a large number of processors and a large amount of memory. Among the installed software they have compilers and the MPI and OpenMP libraries. Sometimes MATLAB and other programs that can use a group of nodes and their memory are installed as well.

The most common workflow is: connect to the system, transfer your program and input data to it, describe the task in a configuration file, submit it to the queue, and, once the job has run, collect the results.


The configuration files are similar to those of HTCondor clusters. For example, to run the parfor example above, you can use the following file:

 #!/bin/sh
 #$ -l h_rt=10:00:00
 /usr/local/bin/matlab /home/el/calmap.m


The second line indicates the maximum time allotted for the task, and the third the command that must be executed on the system to launch the MATLAB code the user needs.
Another example: a file for running a program that uses the MPI libraries:

 #!/bin/bash
 #$ -l h_rt=4:00:00
 #$ -pe mvapich2-ib 12
 #
 LDFLAGS="-L$HOME/opt/lib -lm"; export LDFLAGS
 CPPFLAGS="-I$HOME/opt/include"; export CPPFLAGS
 LD_LIBRARY_PATH="$HOME/opt/lib:$LD_LIBRARY_PATH"; export LD_LIBRARY_PATH
 PATH=$PATH:$HOME/opt/bin; export PATH
 module add compilers/intel/12.1.15
 module add mpi/intel/mvapich2/1.8.1
 mpirun -np 12 m-mpi test7.ct


The second line gives the maximum time required for the calculation, the third specifies the name of the parallel environment (set by the administrators) and the number of requested nodes; then come four lines assigning the desired values to environment variables, followed by two lines that load the necessary modules, and at the end of the script the desired program is launched on 12 nodes.

Conclusion

You cannot embrace the immensity, but you can and should try. In this post I have tried to give an overview of high-performance computing systems, to help novice users see the range of possibilities and understand what is available and how it can be used. As you can see, even if you do not have access to supercomputers and grids, you can build your own cluster based on MATLAB or the free Condor software.

P.S. If you can add something to this note or find an error in it, please write about it below. In the end this will only benefit our knowledge and understanding of the subject and make it possible to improve the note.
P.P.S. It is also possible to use CUDA technology to speed up calculations in C/C++ and MATLAB by putting the cores of the graphics processor to work, but much has already been written about that.
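
Purely as an illustration of the MATLAB side of this (it assumes a CUDA-capable GPU and the Parallel Computing Toolbox), offloading an array operation to the graphics processor can be as simple as:

 A = rand(4000);        % ordinary array in host memory
 G = gpuArray(A);       % copy it to the graphics card
 F = fft(G);            % the FFT now runs on the GPU
 result = gather(F);    % bring the result back to host memory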

Source: https://habr.com/ru/post/240899/

