
DIY supercomputer

Today it is possible to build a supercomputer at home, and this article explains how.

The article covers the hardware side of building high-performance computing systems. One interesting use is cryptography: thanks to modern technology, cracking MD5 or WPA has become available to anyone. With some effort (such information gets taken down quickly), you can find on the Internet a way to break the A5/2 algorithm used in GSM. Other applications include engineering, financial and medical calculations, and bitcoin mining.

A bit of history


The first written mention of supercomputers can be dated to March 1, 1920, when New York newspapers wrote about machines that could do the work of a hundred mathematicians. These were tabulators, electromechanical computing machines made by IBM (then still called CTR). Later, computers became electronic.

Several players emerged in the supercomputer market, such as Cray, HP, IBM and NEC. Their machines had vector processors (that is, they operated on vectors rather than on individual numbers). Communication between computing nodes relied on the manufacturers' proprietary technologies. One example is connecting processors in a three-dimensional torus topology. The meaning behind these words is simple: each node is connected to six others.

Further development of supercomputers gave rise to massively parallel systems and clusters. Clusters, the quintessence of this direction, use roughly the same communication algorithms between computing nodes as supercomputers do, only built on network interfaces. These interfaces are the weak point of such systems. Besides a network topology that is non-standard compared with the classic star, such as Fat Tree, a multidimensional torus or Dragonfly, they also require special switching equipment.

On our topic, it must be mentioned that one of the promising directions in supercomputer development today is adding coprocessors to a standard computer architecture. These coprocessors resemble video cards in their design.

CPU selection


Today, the main processor manufacturers are Intel and AMD. RISC processors such as the Power 7+ are attractive but quite exotic and expensive. For example, even a server model that is not the newest costs more than a million.

(Incidentally, it is possible to build an inexpensive and efficient cluster from Xbox 360 or PS3 consoles. Their processors are roughly of the Power family, and for a million you can buy more than one console.)

With that in mind, let us look at the options that are interesting in terms of price for building a high-performance system. It must, of course, be a multiprocessor system. For such tasks Intel offers Xeon processors and AMD offers Opteron.

If you have a lot of money

First, a separate note on the extremely expensive but powerful line of Intel Xeon processors for the LGA1567 socket.
The top processor of this series is the E7-8870, with ten cores at 2.4 GHz. Its price is $4,616. For such CPUs, HP and Supermicro make eight-processor (!) server chassis. Eight 10-core Xeon E7-8870 2.4 GHz processors with HyperThreading give 8 * 10 * 2 = 160 threads, which Windows Task Manager displays as one hundred and sixty processor load charts in a 10x16 matrix.

To fit eight processors in the case, they are placed not directly on the motherboard but on separate boards that plug into it. The photo shows four processor boards installed in the motherboard, with two processors on each. This is the Supermicro solution. In the HP solution, each processor has its own board. The HP solution costs two to three million, depending on the processors, memory and other components. The Supermicro chassis costs $10,000, which is more attractive. In addition, the Supermicro chassis accepts four coprocessor expansion cards in PCI-Express x16 slots (and still leaves room for an Infiniband adapter to build a cluster of such machines), while the HP one accepts only two. The eight-processor Supermicro platform is therefore the more attractive base for a supercomputer. The following photo from an exhibition shows a supercomputer assembled with four GPU boards.
Supercomputer with 4 GPU boards

However, it is very expensive.

Cheaper options

There is, however, the prospect of assembling a supercomputer from more affordable processors: AMD Opteron G34, Intel Xeon LGA2011 and LGA1366.

To choose specific models, I made a table that gives price / (number of cores * frequency) for each processor. I excluded processors with a frequency below 2 GHz and, for Intel, those with a bus slower than 6.4 GT/s.
Model      Cores  Frequency, GHz  Price, $  Price / core, $  Price / core / GHz, $
AMD
6386 SE    16     2.8             1392      87               31
6380       16     2.5             1088      68               27
6378       16     2.4             867       54               23
6376       16     2.3             703       44               19
6348       12     2.8             575       48               17
6344       12     2.6             415       35               13
6328       8      3.2             575       72               22
6320       8      2.8             293       37               13
Intel
E5-2690    8      2.9             2057      257              89
E5-2680    8      2.7             1723      215              80
E5-2670    8      2.6             1552      194              75
E5-2665    8      2.4             1440      180              75
E5-2660    8      2.2             1329      166              76
E5-2650    8      2               1107      138              69
E5-2687W   8      3.1             1885      236              76
E5-4650L   8      2.6             3616      452              174
E5-4650    8      2.7             3616      452              167
E5-4640    8      2.4             2725      341              142
E5-4617    6      2.9             1611      269              93
E5-4610    6      2.4             1219      203              85
E5-2640    6      2.5             885       148              59
E5-2630    6      2.3             612       102              44
E5-2667    6      2.9             1552      259              89
X5690      6      3.46            1663      277              80
X5680      6      3.33            1663      277              83
X5675      6      3.06            1440      240              78
X5670      6      2.93            1440      240              82
X5660      6      2.8             1219      203              73
X5650      6      2.66            996       166              62
E5-4607    6      2.2             885       148              67
X5687      4      3.6             1663      416              115
X5677      4      3.46            1663      416              120
X5672      4      3.2             1440      360              113
X5667      4      3.06            1440      360              118
E5-2643    4      3.3             885       221              67

In the original table, bold italics mark the model with the lowest ratio, and underlining marks the most powerful AMD model and the Xeon that is, in my opinion, closest to it in performance.

Thus, my choice of processors for the supercomputer is Opteron 6386 SE, Opteron 6344, Xeon E5-2687W and Xeon E5-2630.
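The ranking metric is easy to recompute. Below is a minimal sketch that applies price / (number of cores * frequency) to the four chosen models, with the rows copied from the table. The function names are illustrative and do not come from the article.

```python
# Rows copied from the table above: (model, cores, frequency in GHz, price in $).
CPUS = [
    ("Opteron 6386 SE", 16, 2.8, 1392),
    ("Opteron 6344", 12, 2.6, 415),
    ("Xeon E5-2687W", 8, 3.1, 1885),
    ("Xeon E5-2630", 6, 2.3, 612),
]


def price_per_core_ghz(cores, ghz, price):
    """Price divided by (number of cores * frequency), in $ per core per GHz."""
    return price / (cores * ghz)


def rank(cpus):
    """Return (model, metric) pairs sorted from cheapest to most expensive."""
    scored = [(model, price_per_core_ghz(c, f, p)) for model, c, f, p in cpus]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for model, metric in rank(CPUS):
        print(f"{model:16s} {metric:6.1f} $ per core per GHz")
```

The metric treats every core and every GHz as equal across architectures, so it is only a rough guide to value and not a benchmark.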

Motherboards


PICMG

Regular motherboards cannot take more than four double-slot expansion cards. There is another architecture: backplanes, such as the BPG8032 PCI Express Backplane.
BPG8032 PCI Express Backplane

This backplane takes PCI Express expansion cards and one processor board, somewhat similar to the boards installed in the eight-processor Supermicro servers discussed above. These processor boards, however, follow the PICMG industry standards. Standards evolve slowly, so such boards often do not support the most advanced processors. The most powerful processor boards currently made carry two Xeon E5-2448L, for example the Trenton BXT7059 SBC.
Trenton BXT7059 SBC


Such a system will cost at least $5,000 without GPUs.

Ready-made TYAN platforms

For the same amount, you can buy a ready-made platform for building supercomputers, the TYAN FT72B7015. It takes up to eight GPUs and two Xeon LGA1366 processors.

"Normal" server motherboards

For LGA2011

Supermicro X9QR7-TF - 4 expansion cards and 4 processors can be installed on this motherboard.

Supermicro X9DRG-QF - this board is specially designed for the assembly of high-performance systems.

For Opteron

Supermicro H8QGL-6F - this board takes four processors and three expansion cards.

Boosting the platform with expansion cards


This market is almost completely captured by Nvidia, which makes computing cards in addition to gaming graphics cards. AMD has a smaller market share, and Intel entered this market relatively recently.

Such coprocessors are characterized by a large amount of on-board RAM, fast double-precision calculations and energy efficiency.
Model                     FP32, Tflops  FP64, Tflops  Price, thousand $  Memory, GB
Nvidia Tesla K20X         3.95          1.31          5.5                6
AMD FirePro S10000        5.91          1.48          3.6                6
Intel Xeon Phi 5110P      -             1             2.7                8
Nvidia GTX Titan          4.5           1.3           1.1                6
Nvidia GTX 680            3             0.13          0.5                2
AMD HD 7970 GHz Edition   4             1             0.5                3
AMD HD 7990 Devil 13      2x3.7         2x0.92        1.6                2x3

Nvidia's top solution is the Tesla K20X, built on the Kepler architecture. These cards are used in Titan, the world's most powerful supercomputer. Nvidia has also recently released the Geforce Titan video card. In older models, FP64 performance was cut to 1/24 of FP32 (GTX 680), but for Titan the manufacturer promises fairly high double-precision performance. AMD's solutions are also quite good, but they are built on a different architecture, which can make it difficult to run calculations optimized for CUDA (Nvidia's technology).

Intel's solution, the Xeon Phi 5110P, is interesting because all the cores in the coprocessor are based on the x86 architecture, so running calculations on it does not require special code optimization. My favorite among the coprocessors, though, is the relatively inexpensive AMD HD 7970 GHz Edition. In theory, this video card delivers the most performance per unit of cost.
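The performance-per-cost claim can be checked against the figures in the table. The sketch below assumes that the Price column is in thousands of dollars, and it omits the Xeon Phi because its row has only three values in the source. The helper names are illustrative.

```python
# (name, FP32 Tflops, FP64 Tflops, price) from the coprocessor table.
# Assumption: prices are in thousands of dollars.
CARDS = [
    ("Nvidia Tesla K20X", 3.95, 1.31, 5.5),
    ("AMD FirePro S10000", 5.91, 1.48, 3.6),
    ("Nvidia GTX Titan", 4.5, 1.3, 1.1),
    ("Nvidia GTX 680", 3.0, 0.13, 0.5),
    ("AMD HD 7970 GHz Edition", 4.0, 1.0, 0.5),
    ("AMD HD 7990 Devil 13", 2 * 3.7, 2 * 0.92, 1.6),
]


def tflops_per_kusd(tflops, price_kusd):
    """Theoretical Tflops bought per thousand dollars."""
    return tflops / price_kusd


def best_value(cards, precision="fp32"):
    """Name of the card with the highest Tflops per thousand dollars."""
    column = 1 if precision == "fp32" else 2
    return max(cards, key=lambda card: tflops_per_kusd(card[column], card[3]))[0]


if __name__ == "__main__":
    for name, fp32, fp64, price in CARDS:
        print(f"{name:26s} FP32 {tflops_per_kusd(fp32, price):5.2f}"
              f"  FP64 {tflops_per_kusd(fp64, price):5.2f}  Tflops per $1000")
```

These are peak figures from the table. Real throughput depends on how well a given program maps onto each architecture.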

Connecting machines into a cluster


To improve system performance, several computers can be combined into a cluster that will distribute the computational load among the computers included in the cluster.

Ordinary Gigabit Ethernet is too slow as a network interface for connecting the computers. Infiniband is most often used for this purpose. An Infiniband host adapter for a server is inexpensive. On eBay such adapters sell for as little as $40; an x4 DDR adapter (20 Gb/s), for example, costs about $100 including delivery to Russia.

Switching equipment for Infiniband, on the other hand, is quite expensive. And as mentioned above, the classic star is not the best network topology here.

However, Infiniband hosts can be connected to each other directly, without a switch. That makes one option quite interesting: a cluster of two computers connected via Infiniband. Such a supercomputer can be assembled at home.
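To get a feel for the difference, here is a rough sketch comparing ideal transfer times over the two links. It uses the nominal line rates only (1 Gb/s and the 20 Gb/s quoted above) and ignores encoding and protocol overhead, so real figures will be lower. The function name is illustrative.

```python
# Nominal line rates in Gbit/s. Encoding and protocol overhead are ignored.
LINKS = {
    "Gigabit Ethernet": 1.0,
    "Infiniband x4 DDR": 20.0,
}


def transfer_seconds(size_gigabytes, link_gbit_per_s):
    """Ideal time to move size_gigabytes over a link of the given rate."""
    return size_gigabytes * 8 / link_gbit_per_s


if __name__ == "__main__":
    size = 10  # gigabytes exchanged between nodes, an arbitrary example
    for name, rate in LINKS.items():
        print(f"{name:18s} {transfer_seconds(size, rate):6.1f} s for {size} GB")
```

Bandwidth is only part of the picture. Latency also matters for cluster workloads, and this sketch does not model it.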

How many graphics cards do you need


In Cray Titan, the most powerful supercomputer today, the ratio of processors to "video cards" is 1:1: it has 18688 16-core processors and 18688 Tesla K20X cards.

In Tianhe-1A, the Chinese supercomputer built on Xeons, the ratio is two six-core processors to one Nvidia M2050 card (which is weaker than the K20X).

We will take this ratio as optimal for our builds (to keep costs down), that is, 12-16 processor cores per GPU. The table below lists the number of CPUs needed for each GPU count; in the original, bold marks the practically possible options and underlining marks the most successful ones from my point of view.
GPUs  Cores  6-core CPUs  8-core CPUs  12-core CPUs  16-core CPUs
2     24-32  4-5          3-4          2-3           2
3     36-48  6-8          5-6          3-4           2-3
4     48-64  8-11         6-8          4-5           3-4
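The table can be regenerated from the 12-16 cores per GPU rule. In the sketch below, the rounding rule (round to the nearest whole CPU, halves up) is inferred from the table's values and is not stated in the text. The names are illustrative.

```python
import math

CORES_PER_GPU = (12, 16)    # lower and upper bound of the chosen ratio
CPU_SIZES = (6, 8, 12, 16)  # cores per CPU, as in the table columns


def round_half_up(x):
    """Round to the nearest integer, with .5 going up (assumed rounding rule)."""
    return math.floor(x + 0.5)


def cpus_needed(gpus, cores_per_cpu):
    """(min, max) CPU count that gives 12-16 cores per GPU."""
    low_cores = gpus * CORES_PER_GPU[0]
    high_cores = gpus * CORES_PER_GPU[1]
    return (round_half_up(low_cores / cores_per_cpu),
            round_half_up(high_cores / cores_per_cpu))


if __name__ == "__main__":
    for gpus in (2, 3, 4):
        cells = [cpus_needed(gpus, size) for size in CPU_SIZES]
        print(gpus, cells)
```

Python's built-in round() rounds halves to the nearest even number, which would turn 4.5 into 4. That is why the sketch uses an explicit half-up helper.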

If a system that already has the chosen processor / video card ratio can take additional computing devices, we will add them to increase the power of the build.

So how much does it cost

The options below cover the supercomputer chassis without RAM, hard drives or software. All of them use the AMD HD 7970 GHz Edition video adapter. It can be replaced with another card if the task requires it (for example, with a Xeon Phi). Where the system allows, one of the AMD HD 7970 GHz Edition cards is replaced by a three-slot AMD HD 7990 Devil 13.

Option 1 on the motherboard Supermicro H8QGL-6F

SC748TQ-R1400B Enclosure

Component             Model                     Qty  Price, $  Total, $
Motherboard           Supermicro H8QGL-6F       1    1200      1200
CPU                   AMD Opteron 6344          4    500       2000
CPU cooler            Thermaltake CLS0017       4    40        160
1400W case            SC748TQ-R1400B            1    1000      1000
Graphics accelerator  AMD HD 7970 GHz Edition   3    500       1500
Total                                                          5860

Theoretically, the performance will be about 12 Tflops.
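The total for Option 1 can be re-derived from the line items. This is a small sketch with the quantities and prices copied from the list above. The same helper works for the other options once their items are filled in.

```python
# (component, quantity, unit price in $) for Option 1, copied from the list above.
OPTION_1 = [
    ("Motherboard Supermicro H8QGL-6F", 1, 1200),
    ("CPU AMD Opteron 6344", 4, 500),
    ("CPU cooler Thermaltake CLS0017", 4, 40),
    ("1400W case SC748TQ-R1400B", 1, 1000),
    ("Graphics accelerator AMD HD 7970 GHz Edition", 3, 500),
]


def total_cost(items):
    """Sum of quantity * unit price over all line items, in $."""
    return sum(quantity * price for _, quantity, price in items)


if __name__ == "__main__":
    for name, quantity, price in OPTION_1:
        print(f"{name:46s} {quantity} x {price:5d} = {quantity * price:5d}")
    print(f"{'Total':46s} {total_cost(OPTION_1):17d}")
```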

Option 2 on the TYAN S8232 motherboard, cluster

This board does not support the Opteron 63xx, so the 62xx is used. In this option, two computers are clustered over Infiniband x4 DDR with two cables. In theory, the connection speed will be limited by the PCIe x8 speed, that is, 32 Gbit/s. Two power supplies are used. Instructions for making them work together can be found on the Internet.
Component             Model                      Qty  Price, $  Total, $
Motherboard           TYAN S8232                 1    790       790
CPU                   AMD Opteron 6282SE         2    1000      2000
CPU cooler            Noctua NH-U12DO A3         2    60        120
Case                  Antec Twelve Hundred Black 1    200       200
Power supply          FSP AURUM PRO 1200W        2    200       400
Graphics accelerator  AMD HD 7970 GHz Edition    2    500       1000
Graphics accelerator  AX7990 6GBD5-A2DHJ         1    1000      1000
Infiniband adapter    X4 DDR Infiniband          1    140       140
Infiniband cable      X4 DDR Infiniband          1    30        30
Total (per unit)                                                5680

A cluster needs two such units, so the total cost is $11,360. Power consumption at full load will be about 3000 W. In theory, performance will be up to 31 Tflops.

Option 3 on the platform Tyan FT72B7015

Tyan FT72B7015 Case

This option differs in that it has only two CPUs for eight GPUs. Its performance in real-world tasks will therefore depend on how well the program can be parallelized.
Component             Model                      Qty  Price, $  Total, $
Chassis (3000W)       Tyan FT72B7015             1    4900      4900
CPU                   Xeon X5680                 2    1300      2600
CPU cooler            SuperMicro SNK-P0040AP4    2    40        80
Graphics accelerator  AMD HD 7970 GHz Edition    8    500       4000
Total                                                           11580

Theoretically, the performance will be up to 32 Tflops.

Option 4 for LGA2011, clustered

Component             Model                      Qty  Price, $  Total, $
Motherboard           Supermicro X9DRG-QF        1    600       600
CPU                   Intel Xeon E5-2687W        2    2000      4000
CPU cooler            Supermicro SNK-P0050AP4    2    50        100
Case                  Antec Twelve Hundred Black 1    200       200
Power supply          FSP AURUM PRO 1200W        2    200       400
Graphics accelerator  AMD HD 7970 GHz Edition    3    500       1500
Graphics accelerator  AX7990 6GBD5-A2DHJ         1    1000      1000
Infiniband adapter    X4 DDR Infiniband          1    140       140
Infiniband cable      X4 DDR Infiniband          1    30        30
Total (per unit)                                                7970

A cluster needs two such units, so the total cost is $15,940. Total power consumption at full load will be about 4000 W. In theory, performance will be up to 39 Tflops.
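The theoretical performance figures quoted for the four options appear to follow from the GPU counts alone. The sketch below reproduces them under two assumptions that the article does not state: that only the FP32 peak of the graphics cards is counted (no CPU contribution), and that the AX7990 card delivers the 2x3.7 Tflops listed for the HD 7990 Devil 13. The names are illustrative.

```python
HD7970_TFLOPS = 4.0      # FP32 peak, from the coprocessor table
HD7990_TFLOPS = 2 * 3.7  # dual-GPU card, assumed equal to the Devil 13 figure

# option: (HD 7970 per unit, HD 7990 per unit, units in cluster, total cost in $)
OPTIONS = {
    "Option 1": (3, 0, 1, 5860),
    "Option 2": (2, 1, 2, 11360),
    "Option 3": (8, 0, 1, 11580),
    "Option 4": (3, 1, 2, 15940),
}


def peak_tflops(n7970, n7990, units):
    """Sum of GPU FP32 peaks across all units. CPUs are not counted."""
    return units * (n7970 * HD7970_TFLOPS + n7990 * HD7990_TFLOPS)


def dollars_per_tflops(n7970, n7990, units, cost):
    """Chassis cost divided by theoretical peak performance."""
    return cost / peak_tflops(n7970, n7990, units)


if __name__ == "__main__":
    for name, (a, b, units, cost) in OPTIONS.items():
        print(f"{name}: {peak_tflops(a, b, units):5.1f} Tflops, "
              f"{dollars_per_tflops(a, b, units, cost):6.0f} $ per Tflops")
```

The costs exclude RAM, hard drives and software, as in the tables, so the dollars-per-Tflops figure is only a lower bound on what a working system would cost.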

Source: https://habr.com/ru/post/170349/

