Parallel technology

Hello!

My name is Alexander, and I am an administrator in the shared-use cluster department at a university in Tomsk. This blog will cover what comes up in my work, both as a researcher (more on that later) and as a system administrator.

The first post is an introduction to parallel technologies.

So, the cluster. Everything that follows relates to our test cluster: a homogeneous system of 24 nodes with the following characteristics:

- Number of compute nodes: 24
- Number of processors: 48 (Intel Xeon 5150)
- Number of cores: 96 (2.66 GHz)
- Total RAM: 192 GB
- Total HDD: 2880 GB
- System network: InfiniBand 4x, 24 ports
- Auxiliary network: Gigabit Ethernet, 48 ports
- Service network: ServNet, 25 ports
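
Since the system is homogeneous, the totals in the list above can be broken down per node with simple division. The short Python sketch below does that, and also estimates a theoretical peak performance. The peak figure rests on an assumption that is not stated in this post: 4 double-precision floating-point operations per core per cycle. Treat it as a rough upper bound under that assumption, not a measured result.

```python
# Totals copied from the cluster description above.
NODES = 24
PROCESSORS = 48
CORES = 96
RAM_GB = 192
HDD_GB = 2880
CLOCK_GHZ = 2.66

# ASSUMPTION (not stated in the post): each core retires 4 double-precision
# floating-point operations per cycle.
FLOPS_PER_CYCLE = 4


def per_node(total, nodes=NODES):
    """Split a cluster-wide total evenly across nodes (homogeneous system)."""
    if total % nodes != 0:
        raise ValueError("total does not divide evenly across nodes")
    return total // nodes


summary = {
    "processors_per_node": per_node(PROCESSORS),
    "cores_per_node": per_node(CORES),
    "ram_gb_per_node": per_node(RAM_GB),
    "hdd_gb_per_node": per_node(HDD_GB),
    "cores_per_processor": CORES // PROCESSORS,
}

# Theoretical peak = cores * clock (GHz) * FLOPs per cycle, in GFLOPS.
peak_gflops = CORES * CLOCK_GHZ * FLOPS_PER_CYCLE

for name, value in summary.items():
    print(f"{name}: {value}")
print(f"theoretical peak: {peak_gflops:.2f} GFLOPS (under the assumption above)")
```

With the listed totals this works out to 2 dual-core processors, 8 GB of RAM and 120 GB of disk per node.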

Each node has SuSE 10.3 and Windows HPC Server 2008 installed on separate partitions.
The test cluster is divided into two “virtual” clusters, one under SuSE 10.3 and one under Windows HPC Server 2008. The head nodes for both run on separate machines that are left untouched, so that the infrastructure is not disrupted.
Naturally, when a node is rebooted into SuSE or into Windows, the corresponding head node automatically detects that a new compute node has appeared in its “domain”.

Now for a little theory.

Distributed architectures

Since multiprocessor and grid architectures will only be touched on superficially in later posts, I will limit myself to pointing to the relevant literature:
Grid Technologies Internet Portal
Multiprocessor technology

We will briefly consider the multicomputer architecture we have available (namely, a cluster of workstations).

Multicomputer architecture

According to Gregory Pfister, a pioneer in the field of cluster technologies,
“A cluster is a kind of parallel or distributed system that:
1. consists of several interconnected computers;
2. is used as a single, unified computing resource.”

It could not be put better. A cluster is a computer “farm” connected by a network, which the end user sees as a single resource. I say simply "a network" because the type of cable and the network topology are not essential to the concept of a cluster. That said, clusters with a low-speed network (UTP) are mostly found at home or where the budget is strictly limited. Otherwise, the de facto standard is InfiniBand.

Another de facto standard, this time for programming multicomputer systems, is the Message Passing Interface (MPI): a message-passing library interface, a collection of functions for C / C++ / Fortran that handles communication between the processes of a parallel program with distributed memory.

MPI is not a formal standard but a community one, and that community includes not only individual developers but also large hardware vendors. After its adoption MPI developed rapidly: at the moment there is a second version of the standard and more than a dozen implementations, both freely distributed (MPICH) and commercial ones with their own extras (for example, Intel MPI, optimized for Intel's own architecture, and so on).
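
To make the message-passing model a little more concrete, here is a toy sketch in Python. It is not MPI and does not use any MPI library: threads stand in for processes, and one queue per "rank" stands in for the network. The names run_ranks, inboxes and world_size are purely illustrative. What it does show is the pattern typical of distributed-memory programs: each rank works only on its own slice of the data and shares results solely by sending and receiving messages.

```python
import queue
import threading


def run_ranks(world_size, data):
    """Toy model of message passing: sum `data` across `world_size` ranks."""
    # One inbox per rank stands in for the interconnect (hypothetical model).
    inboxes = [queue.Queue() for _ in range(world_size)]
    # Used only to hand the final answer back to the caller, not for
    # communication between ranks.
    result = {}

    def worker(rank):
        # Each rank sees only its own slice of the data (distributed memory).
        local = data[rank::world_size]
        partial = sum(local)
        if rank == 0:
            total = partial
            # Rank 0 receives exactly one message from every other rank.
            for _ in range(world_size - 1):
                _sender, value = inboxes[0].get()
                total += value
            result["total"] = total
        else:
            # Every other rank sends its partial sum to rank 0.
            inboxes[0].put((rank, partial))

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


total = run_ranks(4, list(range(1, 101)))
print(total)  # sum of 1..100
```

In a real MPI program the same roles would be played by separate processes, possibly on different nodes, with the library's send and receive calls in place of the queues.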

Naturally, administering and programming such systems raises problems that are particularly acute for today's IT community (compared with administering a classical system). Some of the problems I ran into personally could not be found on Google at all, and the integrator’s technical support struggled to answer, so I had to work them out on my own. I will try to cover some of these in future posts, to help the Russian “parallel” community at least a little (I very much hope such a community exists).

Source: https://habr.com/ru/post/99515/
