Hello,
Today's post is dedicated to the delicate problem of cluster performance testing. Many will say (and will be right) that, in general, the results of such tests are intended solely for press releases and reporting to the TOP500 and have no practical use. However, testing tools can also be used to identify system bottlenecks. So, in the first post we will talk about Linpack & Lizard.
Table of contents:
1) Linpack Overview
2) Linpack basic parameters
3) Lizard. Implementing Linpack for Windows Systems
4) Lizard. Linpack Optimization for Windows Systems
5) Native-tools for cluster testing
Note: at some points we talk about the performance of the compute nodes, at others about the network. Together, these two indicators make up the total performance of the cluster.
1) Linpack Overview
Since the 1980s, the Linpack library has been considered the reference benchmark for testing the performance of supercomputers (not only clusters). It has since been superseded by the more capable LAPACK (Linear Algebra PACKage), which has interfaces for Fortran and C.
LAPACK analogs:
* Intel MKL
* AMD ACML
* Sun Performance Library
* NAG's LAPACK
* HP's MLIB
Each vendor, in the best traditions of IT, develops and tunes its own library for its own architecture. Naturally, on Intel hardware, the MKL library will give better performance than LAPACK.
The main task of Linpack and its analogs and modifications is to solve a system of linear algebraic equations of the form Ax = f by LU factorization with partial pivoting (selecting the leading element of each column), where A is a dense matrix of order N. The original matrix is divided into logical blocks of size NB × NB. These blocks, in turn, are distributed over a P × Q grid, and each cell of the grid is handled by a separate processor of the system.
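To make the method concrete, here is a minimal, unblocked sketch of LU factorization with partial pivoting in plain Python. It is not how Linpack or HPL implement it (they work on NB × NB blocks spread over the P × Q grid); the function name `lu_solve` is an illustrative assumption.

```python
def lu_solve(a, f):
    """Solve A x = f by LU factorization with partial (column) pivoting."""
    n = len(a)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    x = list(f)
    for k in range(n):
        # Choose the row with the largest absolute value in column k as pivot.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        x[k], x[p] = x[p], x[k]
        # Eliminate column k below the pivot and update the right-hand side.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            a[i][k] = m  # store the multiplier (an entry of L)
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
            x[i] -= m * x[k]
    # Back substitution with the upper-triangular factor U.
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / a[i][i]
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
    print(lu_solve(A, [8.0, -11.0, -3.0]))
```

The triple loop is what makes the cost grow roughly as the cube of N, which is why N dominates the run time of the benchmark.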
More information about the mathematical basis of the test can be read on the Intuit website: www.intuit.ru/department/supercomputing/tbucs/4/2.html
Performance in the Linpack test is measured in floating-point operations per second. The unit is 1 FLOPS (one such operation per second).
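As an illustration of how such a figure is obtained, the sketch below converts a matrix order and a wall-clock time into GFLOPS. It assumes the conventional operation count for an LU-based solve, 2/3·N³ + 3/2·N², which is the count I understand HPL to use; the function name `hpl_gflops` is hypothetical.

```python
def hpl_gflops(n, seconds):
    """Estimate GFLOPS for solving a dense n x n system in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Assumed operation count: (2/3) * n^3 for the factorization-dominated
    # part plus a lower-order (3/2) * n^2 term.
    flop_count = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flop_count / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical run: N = 26000 solved in 600 seconds.
    print(round(hpl_gflops(26000, 600.0), 2))
```

Because the cubic term dominates, doubling N multiplies the amount of work by about eight, while the memory needed grows only fourfold.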
2) Linpack basic parameters
• N is the order (dimension) of the matrix. The larger it is, the more floating-point arithmetic is performed. N is limited by the amount of memory the system can allocate to the HPL (High-Performance Linpack) process. Lizard can select what it considers the optimal parameters; for example, 26,000 is suitable for four nodes with 2 GB of RAM each. Still, it is better to choose the value empirically, starting with a small one. Performance drops once the system starts writing to the paging file; at that point, lower N slightly to reach the optimum. N must be equal to or greater than P * Q.
• P and Q are the dimensions of the process grid; their product must be matched against N (N must not be smaller than P * Q). P * Q = number of processes. Setting P to the number of processors and Q to the number of nodes is a reasonably good choice. Before tuning, take Hyper-Threading into account (or, better, turn it off altogether).
• NB is the block size: it determines how many pieces the problem is split into and how large a piece of data each node receives. In practice, the smaller the value, the more evenly the processors are loaded. You can still tune it as you see fit and watch the resulting performance (depending on the needs of the architecture). N must be divisible by NB without a remainder.
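As a rough starting point for the empirical search described above, N can be estimated from the memory available to the cluster. The sketch below assumes double-precision elements (8 bytes each), treats 1 GB as 2^30 bytes, and uses a hypothetical `mem_fraction` of 0.8, a common rule of thumb rather than anything Lizard or HPL prescribes. The function name and parameters are illustrative. The result is rounded down to a multiple of NB so that N divides evenly.

```python
import math


def suggest_n(nodes, mem_per_node_gb, nb, mem_fraction=0.8):
    """Suggest a matrix order N that fits in a fraction of cluster memory."""
    total_bytes = nodes * mem_per_node_gb * 1024**3
    # An N x N matrix of doubles needs 8 * N * N bytes.
    max_elements = int(mem_fraction * total_bytes / 8)
    n = math.isqrt(max_elements)
    # Round down so that N is divisible by NB.
    return (n // nb) * nb


if __name__ == "__main__":
    print(suggest_n(nodes=4, mem_per_node_gb=2, nb=128))
```

For four nodes with 2 GB each, the 26,000 mentioned above corresponds to roughly 63% of total memory, so the estimate is best treated as a ceiling rather than a target.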
For convenience, you can use the Excel Linpack spreadsheet, which calculates the values of the coefficients on its own once you fill in the corresponding cells.
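If the spreadsheet is not at hand, the same bookkeeping can be sketched in a few lines of Python. The helper names below are hypothetical; they only encode the rules stated in this section (P * Q equals the number of processes, N is at least P * Q, N is divisible by NB) and make no claim about which grid performs best.

```python
def grid_candidates(processes):
    """List every (P, Q) with P * Q == processes and P <= Q."""
    pairs = []
    p = 1
    while p * p <= processes:
        if processes % p == 0:
            pairs.append((p, processes // p))
        p += 1
    return pairs


def check_parameters(n, nb, p, q, processes):
    """Return a list of rule violations; an empty list means no rule is broken."""
    problems = []
    if p * q != processes:
        problems.append("P * Q must equal the number of processes")
    if n < p * q:
        problems.append("N must be equal to or greater than P * Q")
    if n % nb != 0:
        problems.append("N must be divisible by NB")
    return problems


if __name__ == "__main__":
    # Hypothetical setup: 8 processes, N = 26000, NB = 100.
    for p, q in grid_candidates(8):
        print(p, q, check_parameters(26000, 100, p, q, 8))
```

Each candidate grid still has to be benchmarked; the check only filters out parameter sets that break the rules above.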
HPL saves the results, with detailed comments, to an hpl file in its working folder. Unfortunately, I did not manage to bring such a file from our configuration into a digestible form.
3) Lizard. Implementing Linpack for Windows Systems
It is only logical that Microsoft, which suddenly burst into the TOP500 with its new system, could not stand aside. For lazy Windows system administrators, a dedicated shell for cluster performance testing was developed (Lizard, short for Linpack Wizard). It is based on the canonical library, wrapped in a convenient visual wizard, and ships with HPC Tool Pack 2008. The wizard offers both a quick test (with standard parameters selected automatically) and an advanced mode for setting specific coefficients. Every step is accompanied by comments.
4) Lizard. Linpack Optimization for Windows Systems
As an optimization, Microsoft recommends stopping all services on which the system does not directly depend. Script:
sc stop wuauserv
sc stop winrm
sc stop WinHttpAutoProxySvc
sc stop WAS
sc stop W32Time
sc stop TrkWks
sc stop SstpSvc
sc stop spooler
sc stop ShellHWDetection
sc stop RemoteRegistry
sc stop RasMan
sc stop NlaSvc
sc stop netTcpActivator
sc stop netTcpPortSharing
sc stop netprofm
sc stop NetPipeActivator
sc stop MSDTC
sc stop KtmRm
sc stop keyIso
rem sc stop gpsvc
sc stop bfe
sc stop CryptSvc
sc stop BITS
sc stop AudioSrv
sc stop SharedAccess
sc stop SENS
sc stop EventSystem
sc stop PolicyAgent
sc stop AeLookupSvc
sc stop WerSvc
sc stop hkmsvc
sc stop UmRdpService
sc stop MpsSvc
sc config wuauserv start= disabled
sc config WinRM start= disabled
sc config WinHttpAutoProxySvc start= disabled
sc config WAS start= disabled
sc config W32Time start= disabled
sc config TrkWks start= disabled
sc config SstpSvc start= disabled
sc config Spooler start= disabled
sc config ShellHWDetection start= disabled
sc config RemoteRegistry start= disabled
sc config RasMan start= disabled
sc config NlaSvc start= disabled
sc config NetTcpActivator start= disabled
sc config NetTcpPortSharing start= disabled
sc config netprofm start= disabled
sc config NetPipeActivator start= disabled
sc config MSDTC start= disabled
sc config KtmRm start= disabled
sc config KeyIso start= disabled
rem sc config gpsvc start= disabled
sc config bfe start= disabled
sc config CryptSvc start= disabled
sc config BITS start= disabled
sc config AudioSrv start= disabled
sc config SharedAccess start= disabled
sc config SENS start= disabled
sc config EventSystem start= disabled
sc config PolicyAgent start= disabled
sc config AeLookupSvc start= disabled
sc config WerSvc start= disabled
sc config hkmsvc start= disabled
sc config UmRdpService start= disabled
sc config MpsSvc start= disabled
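Since the script consists of two parallel lists that must stay in sync, it can be convenient to generate both halves from a single list of service names. The following is a sketch only: `build_script` is a hypothetical helper, and the three services shown are just a shortened sample of the list above. Note that `sc` expects the option written as `start=` followed by a space.

```python
def build_script(services):
    """Return batch lines that stop each service and then disable it."""
    lines = [f"sc stop {name}" for name in services]
    # sc expects the option as "start=" immediately followed by a space.
    lines += [f"sc config {name} start= disabled" for name in services]
    return lines


if __name__ == "__main__":
    # Shortened, illustrative list; substitute the full list from the script.
    print("\n".join(build_script(["wuauserv", "W32Time", "BITS"])))
```

Redirecting the output to a .cmd file gives a script equivalent in form to the one above, with every stopped service guaranteed to have a matching config line.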
5) Native-tools for cluster testing
In addition to Linpack and Lizard, Windows HPC Server 2008 (namely, HPC Pack 2008) has standard cluster performance testing tools, such as:
MPI Ping-Pong Lightweight Throughput (packet forwarding between nodes)
MPI Ping-Pong Quick Check (check network latency, bandwidth etc)
Of course, the list of tests does not end there: there are more than 10 of them, covering the entire functionality of the cluster.
Thanks for your attention.