Supercomputer at NArFU: developing the Arctic with numerical methods

Technical and engineering universities today often solve serious computational problems that would take days or weeks on an ordinary computer. Dozens of Russian universities have already built powerful computing systems, so-called number crunchers, for this kind of work. One of them is the supercomputer recently built by Fujitsu and Softline at the Northern (Arctic) Federal University (NArFU) in Arkhangelsk.

Which computations need a supercomputer?

Many problems are much easier to solve numerically than analytically. Typically these are applied problems of mathematical modeling of industrial equipment, such as chemical reactors, heat exchangers, or the torch of a welding machine. A reliable model makes it possible to predict accurately how a real device will behave as its operating parameters change, and to improve the device accordingly. Building a reliable model usually means comparing computed results with experimental data several times, correcting the model, and recomputing. This is computationally very expensive, even when intermediate versions of the model are computed at reduced accuracy. A few days or weeks of computation on an ordinary computer is common.
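The compare, correct, and recompute loop described above can be illustrated with a deliberately tiny example. This is a sketch, not NArFU's workflow: the cooling model, its parameter `k`, the bisection search, and the synthetic "measurement" are all hypothetical stand-ins. The point is that every calibration step costs one full model run; with a real engineering model, each run could take hours or days.

```python
import math

# Hypothetical "device": a body cooling toward ambient temperature, dT/dt = -k (T - T_env).
# The unknown parameter k stands in for whatever a real model has to calibrate.


def simulate_end_temperature(k, t_start=90.0, t_env=20.0, duration=10.0, steps=10_000):
    """Integrate the model numerically (explicit Euler) and return the final temperature."""
    dt = duration / steps
    temp = t_start
    for _ in range(steps):
        temp -= k * (temp - t_env) * dt
    return temp


def calibrate(measured_end_temp, k_lo=0.0, k_hi=1.0, tol=1e-6):
    """Bisection on k: every iteration is one full model run compared with the measurement."""
    runs = 0
    while k_hi - k_lo > tol:
        k_mid = 0.5 * (k_lo + k_hi)
        runs += 1
        if simulate_end_temperature(k_mid) > measured_end_temp:
            k_lo = k_mid  # model cools too slowly: the true k is larger
        else:
            k_hi = k_mid  # model cools too fast: the true k is smaller
    return 0.5 * (k_lo + k_hi), runs


if __name__ == "__main__":
    # Synthetic "experiment" generated from the exact solution with k = 0.15
    measured = 20.0 + 70.0 * math.exp(-0.15 * 10.0)
    k, runs = calibrate(measured)
    print(f"calibrated k = {k:.5f} after {runs} model runs")
```

Here twenty model runs recover `k` to about six digits; replacing the one-line model with a multi-day simulation shows why calibration alone can justify a cluster.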

At NArFU, such resource-intensive calculations are used in several scientific and applied fields.

The first area is molecular dynamics: for example, modeling diffusion, absorption, and mass transfer in gas mixtures, computed in fine detail, down to the behavior of hundreds or thousands of individual molecules. In practice, this work addresses improving the properties of filter materials, the technology of separating mixtures, and the purification of chemicals.
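To give a feel for what such a calculation involves, here is a minimal molecular dynamics sketch: particles interacting through the Lennard-Jones potential, advanced in time with the velocity Verlet integrator. The potential, the reduced units, and the integrator are textbook choices assumed for illustration; the article does not say which force fields or codes NArFU uses. Production codes add cutoffs, neighbor lists, periodic boundaries, and parallel decomposition across nodes, which is where a cluster becomes necessary.

```python
# Minimal molecular dynamics sketch in reduced Lennard-Jones units (epsilon = sigma = mass = 1).
# No cutoff and no periodic box: fine for a handful of particles, not for production runs.


def lj_forces(pos):
    """Return the Lennard-Jones force on each particle and the total potential energy."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    potential = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            sr6 = (1.0 / r2) ** 3
            potential += 4.0 * (sr6 * sr6 - sr6)
            # |F| / r for V(r) = 4 (r^-12 - r^-6); a positive value means repulsion
            f_over_r = 24.0 * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += f_over_r * d[k]
                forces[j][k] -= f_over_r * d[k]  # Newton's third law
    return forces, potential


def total_energy(pos, vel):
    """Kinetic plus potential energy of the system."""
    kinetic = 0.5 * sum(v[0] ** 2 + v[1] ** 2 + v[2] ** 2 for v in vel)
    return kinetic + lj_forces(pos)[1]


def velocity_verlet(pos, vel, dt, steps):
    """Advance positions and velocities in place with the velocity Verlet integrator."""
    forces, _ = lj_forces(pos)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k]  # first half kick
                pos[i][k] += dt * vel[i][k]           # drift
        forces, _ = lj_forces(pos)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k]  # second half kick
```

For example, two atoms placed 1.5σ apart attract, collide, and rebound; with a time step of 0.001 the value returned by `total_energy` stays essentially constant, the standard sanity check for an MD integrator.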

The second area is fluid dynamics. These are also production-oriented applied problems, particularly in mechanical engineering. One example is numerical modeling of the flame in a gas burner. Computing velocities, pressures, and temperatures in different layers of the gas, including turbulence, ultimately helps improve welding technology and tools and raise the quality and speed of work. Similar problems are handled by the NArFU branch in Severodvinsk, a "forge of the fleet" where there is a great deal of work on improving production technologies.
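Real burner simulations solve the full Navier-Stokes equations with turbulence and combustion models, usually in packages such as Ansys. The sketch below shows only the smallest building block of such a solver, under that simplifying assumption: a temperature profile carried along by gas moving at constant speed, discretized with the first-order upwind finite-difference scheme on a periodic 1D grid. The function and its parameters are hypothetical.

```python
# Toy 1D transport of a scalar (e.g. temperature) by a gas moving at constant speed u > 0,
# using the first-order upwind finite-difference scheme with periodic boundaries.


def advect_upwind(field, velocity, dx, dt, steps):
    """Advance q_t + u q_x = 0 (u > 0) for the given number of time steps."""
    cfl = velocity * dt / dx
    if not 0.0 < cfl <= 1.0:
        raise ValueError(f"CFL number {cfl:.3f} must be in (0, 1] for stability")
    q = list(field)
    n = len(q)
    for _ in range(steps):
        # q[i - 1] with i = 0 wraps around to q[-1]: periodic boundary
        q = [q[i] - cfl * (q[i] - q[i - 1]) for i in range(n)]
    return q
```

The time step is limited by the CFL condition, so refining the grid requires both more cells and more time steps; this is one reason 3D flow problems quickly outgrow a single computer.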

The third area is heat engineering and thermodynamic calculations. The very first task run on the supercomputer came from the Department of Heat Engineering: in a bachelor's thesis, a student built a mathematical model of a heat exchanger that recovers waste heat from industrial furnaces in the form of hot gases.
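As an illustration of the numerical approach in heat engineering (not the student's actual model, which the article does not describe), the sketch below marches along a parallel-flow heat exchanger cell by cell and checks the result against the textbook effectiveness-NTU formula. The flow arrangement, heat capacity rates, temperatures, and overall conductance `UA` are hypothetical values chosen only for the example.

```python
import math


def parallel_flow_outlets(th_in, tc_in, c_hot, c_cold, ua, n_cells=10_000):
    """March along a parallel-flow exchanger split into n_cells equal cells (explicit Euler).

    c_hot, c_cold: heat capacity rates (mass flow x specific heat), W/K; ua: conductance, W/K.
    """
    th, tc = th_in, tc_in
    ua_cell = ua / n_cells
    for _ in range(n_cells):
        q = ua_cell * (th - tc)  # heat transferred in this cell, W
        th -= q / c_hot
        tc += q / c_cold
    return th, tc


def effectiveness_parallel_flow(ntu, cr):
    """Closed-form effectiveness of a parallel-flow exchanger (textbook epsilon-NTU relation)."""
    return (1.0 - math.exp(-ntu * (1.0 + cr))) / (1.0 + cr)


if __name__ == "__main__":
    # Hypothetical case: hot furnace gas at 500 C heating water at 20 C
    th_in, tc_in, c_hot, c_cold, ua = 500.0, 20.0, 2000.0, 4000.0, 3000.0
    th_out, tc_out = parallel_flow_outlets(th_in, tc_in, c_hot, c_cold, ua)
    q_numeric = c_hot * (th_in - th_out)
    c_min, c_max = min(c_hot, c_cold), max(c_hot, c_cold)
    q_closed = effectiveness_parallel_flow(ua / c_min, c_min / c_max) * c_min * (th_in - tc_in)
    print(f"numeric: {q_numeric / 1e3:.1f} kW, closed form: {q_closed / 1e3:.1f} kW")
```

With these values the marching solution and the closed form agree to well under 0.1%. A realistic model would replace the constant `UA` with temperature-dependent properties and real geometry, which is where the closed form no longer applies and the numerical approach earns its cost.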

In addition, the NArFU Institute of Mathematics, Information and Space Technologies actively uses the supercomputer for teaching and for hands-on work on designing and optimizing parallel algorithms.

What it consists of

The NArFU supercomputer is relatively small: it has 20 compute nodes, each a dual-processor server with 10 cores per processor, for a total of 40 processors and 400 cores. That is modest compared with machines that have thousands of processors, but it is very good for a university and quite sufficient for NArFU's computational problems.

Eight of the 20 nodes carry Intel Xeon Phi coprocessors, 60-core number crunchers similar in purpose to NVIDIA graphics processors. They are very fast on certain classes of problems, above all operations on large matrices and the numerical solution of systems of differential equations. On the tasks they are designed for, they give a tangible performance boost.
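Many of the matrix problems mentioned here reduce to solving large linear systems. The sketch below uses the Jacobi iteration, chosen purely because every unknown is updated independently from the previous iterate, which is the kind of data parallelism that wide many-core processors exploit. It is an assumed illustration in plain Python, not code written for Xeon Phi; real applications would call optimized numerical libraries instead.

```python
def jacobi(a, b, tol=1e-10, max_iter=10_000):
    """Solve a x = b by Jacobi iteration; a should be strictly diagonally dominant."""
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        # Each x_new[i] uses only the previous iterate x, so all rows could run in parallel
        x_new = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")
```

For example, `jacobi([[4, 1, 0], [1, 4, 1], [0, 1, 4]], [5, 6, 5])` returns a solution close to `[1, 1, 1]` together with the number of iterations it took.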

Although Intel Xeon Phi coprocessors are not cheap, they offer much better performance per unit of cost than running the same tasks on conventional compute nodes without coprocessors.

Besides the compute nodes, there are two head servers for job queuing and cluster administration, plus four servers that run the storage system, described below.

Communication in the supercomputer

In clusters, two factors are the most critical for performance:

1. the speed of communication between nodes;
2. the speed of access to large files.

The point is that the program should spend its time computing rather than waiting for I/O operations. These are the bottlenecks to eliminate first.

Data exchange between processes running on different nodes uses a separate, fastest network: InfiniBand, with very high bandwidth (up to 56 Gbit/s) and low latency. This network carries very heavy traffic; it is shown in pink in the diagram.
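A common back-of-the-envelope model for the cost of one message is t = latency + size / bandwidth. The sketch below plugs in the 56 Gbit/s peak from the text; the 1 µs latency is an assumed round number, not a measured figure for this cluster.

```python
# Latency-bandwidth ("alpha-beta") estimate of the time to send one message between nodes.
# 56 Gbit/s is the peak figure quoted in the text; the 1 microsecond latency is an assumption.


def transfer_time_s(message_bytes, bandwidth_gbit_s=56.0, latency_s=1e-6):
    bytes_per_s = bandwidth_gbit_s * 1e9 / 8  # 56 Gbit/s -> 7 GB/s
    return latency_s + message_bytes / bytes_per_s


if __name__ == "__main__":
    for size in (8, 64 * 1024, 1024 ** 3):
        print(f"{size:>12} bytes: {transfer_time_s(size) * 1e6:14.2f} us")
```

Under these assumptions an 8-byte message costs about 1 µs, almost all of it latency, while a 1 GiB transfer takes roughly 0.15 s, almost all of it bandwidth. Tightly coupled codes exchange many small messages, so low latency matters as much as raw throughput.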

A second, separate network (orange in the diagram) is used by the job management system to connect to the nodes and send commands and service messages. Its speed requirements are much lower than those of the first network.

The third network, shown in green in the diagram, is the maintenance network for servicing hardware components. Modern servers can be managed at the hardware level, independently of the installed operating system: powering on and off, checking the parameters of hardware components, running diagnostics, and rebooting are all done at the hardware level through this network.

Data storage

The 60 TB network storage system runs Fujitsu Exabyte File System (FEFS) and provides a bandwidth of 1.7 GB/s, far more than any single hard drive. Physically, it consists of two disk enclosures served by four servers.
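A quick calculation shows what 1.7 GB/s means in practice. The 60 TB capacity and 1.7 GB/s bandwidth come from the text; the 150 MB/s sustained rate for a single hard drive is an assumed typical figure used only for comparison.

```python
# Back-of-the-envelope: how long does one full pass over the storage take?

STORAGE_BYTES = 60e12          # 60 TB (decimal), from the text
FEFS_BANDWIDTH = 1.7e9         # bytes per second, from the text
ASSUMED_HDD_BANDWIDTH = 150e6  # bytes per second, hypothetical single hard drive


def full_scan_hours(data_bytes, bandwidth_bytes_per_s):
    """Hours needed to read data_bytes sequentially at the given bandwidth."""
    return data_bytes / bandwidth_bytes_per_s / 3600


if __name__ == "__main__":
    print(f"FEFS:        {full_scan_hours(STORAGE_BYTES, FEFS_BANDWIDTH):6.1f} h")
    print(f"single disk: {full_scan_hours(STORAGE_BYTES, ASSUMED_HDD_BANDWIDTH):6.1f} h")
```

Under these assumptions, reading all 60 TB takes about 9.8 hours through FEFS versus about 111 hours from a single drive.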

FEFS has a metadata server, which stores the namespace metadata, and several object storage servers, which hold the actual file data.
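FEFS is derived from Lustre. In Lustre-style file systems, a client asks the metadata server where a file lives and then reads or writes its data directly on several object servers in parallel. The sketch below assumes Lustre-style round-robin (RAID-0) striping and shows how a byte offset in a file maps to an object and to an offset inside that object; the 1 MiB stripe size and the stripe count of 4 are hypothetical parameters, not this cluster's configuration.

```python
def locate(offset, stripe_size=1 << 20, stripe_count=4):
    """Map a file byte offset to (object index, offset within that object).

    Assumes round-robin striping: stripe 0 goes to object 0, stripe 1 to object 1, ...,
    and stripe stripe_count wraps back to object 0.
    """
    stripe_number = offset // stripe_size
    obj = stripe_number % stripe_count
    # Each object stores every stripe_count-th stripe, packed contiguously
    obj_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return obj, obj_offset
```

Because consecutive stripes land on different object servers, a large sequential read is served by all of them at once, which is how a storage system reaches an aggregate bandwidth that no single disk could provide.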

Software

The compute nodes run Red Hat Linux.

The job management system is PBS Professional.

The cluster management system is Fujitsu HPC Gateway; it installs and reinstalls compute nodes, powers them on and off, and so on.

Among commercial engineering software, the university purchased Ansys, which performs the actual engineering calculations.

How it all looks from the user's point of view

Users log in to a head server, for example remotely over SSH, where they can upload their files, compile them, and submit the resulting job to the computation queue through PBS Pro. When the job has finished, they examine the results and repeat the cycle if necessary.
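The command-line workflow can be sketched as a small Python helper that writes a PBS Professional job script, which would then be submitted with `qsub` and monitored with `qstat`. The `#PBS -N`, `#PBS -l select=...:ncpus=...:mpiprocs=...`, and `#PBS -l walltime=` directives are standard PBS Pro syntax; the job name, the `mpirun ./solver` command, and the walltime are hypothetical, and site-specific settings such as queue names are omitted. `CORES_PER_NODE = 20` follows from the node description (two 10-core processors per node).

```python
CORES_PER_NODE = 20  # 2 processors x 10 cores per compute node


def pbs_script(job_name, total_cores, walltime="02:00:00", command="mpirun ./solver"):
    """Return the text of a PBS Professional job script requesting total_cores MPI ranks."""
    if total_cores < 1:
        raise ValueError("total_cores must be positive")
    if total_cores <= CORES_PER_NODE:
        chunks, per_chunk = 1, total_cores  # fits on a single node
    elif total_cores % CORES_PER_NODE == 0:
        chunks, per_chunk = total_cores // CORES_PER_NODE, CORES_PER_NODE  # whole nodes
    else:
        raise ValueError(f"multi-node jobs should use a multiple of {CORES_PER_NODE} cores")
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l select={chunks}:ncpus={per_chunk}:mpiprocs={per_chunk}",
        f"#PBS -l walltime={walltime}",
        'cd "$PBS_O_WORKDIR"  # start in the directory qsub was run from',
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pbs_script("heat_exchanger", 40))
```

For instance, `pbs_script("heat", 40)` requests two full nodes (`select=2:ncpus=20:mpiprocs=20`); the output would be saved as, say, `job.pbs` and submitted with `qsub job.pbs`.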

The second way is to send models to the supercomputer with a single click directly from an engineering desktop environment. This works from Ansys and from other engineering software, provided they are properly integrated with the job management system.

How does all this look physically?

The university's main building has a fairly large server room with several rows of racks. The supercomputer equipment occupies three racks, which are deliberately not fully loaded to improve cooling.

The compute nodes are dual-processor (Intel Xeon E5-2680 v) servers in a half-width 1U form factor. There are two models, Fujitsu PRIMERGY CX250 S2 and CX270 S2; they differ in that the second one also carries an Intel Xeon Phi coprocessor.

Fujitsu PRIMERGY RX300 and R200 rack servers run the storage system and serve as head nodes.

The supercomputer can draw up to 50 kW of electricity (including cooling and power redundancy), which is a lot for the city of Arkhangelsk. Fortunately, it was possible to connect it using the university's existing power reserve and infrastructure. In general, however, high power consumption can be a problem for a university.

Welcome to the club

Many Russian universities have already built their own supercomputers; they are united in the Supercomputer Consortium of Russian Universities ( http://hpc-russia.ru/ ), of which NArFU is a member. The consortium's main goals are popularizing parallel computing and mutual assistance among its members: if we need to compute something more resource-intensive, we can turn to our partners. One result of this cooperation is that the annual youth scientific and practical school "High-performance computing on GRID systems" ( http://itprojects.narfu.ru/grid/ ), held at NArFU, has been included in the consortium's list of events.

Before purchasing the supercomputer, NArFU staff ran their tasks on clusters at other universities, both in Russia and in the neighboring countries of the north-western region: Sweden, Norway, and Finland. Now colleagues from elsewhere use the NArFU cluster in turn.

We thank Alexander Vasilyevich Rudalev, leading software engineer at the Department of Applied Mathematics and High-Performance Computing at NArFU, for his help in preparing this article.

Source: https://habr.com/ru/post/250933/
