Today it is possible to build a supercomputer at home, and this article discusses how.
The article covers ways to assemble the hardware of a high-performance computing system. One interesting use is cryptography: thanks to modern hardware, cracking MD5 hashes or WPA has become available to anyone. If you look hard enough (the information gets removed quickly), you can even find a way online to break the A5/2 algorithm used in GSM. Other applications include engineering, financial, and medical calculations, as well as bitcoin mining.
A bit of history


The first written mention of supercomputers can be dated to March 1, 1920, when New York newspapers wrote about machines that could do the work of one hundred mathematicians. These were tabulators: electromechanical computers made by IBM (then called CTR). Later, computers became electronic, and several players emerged in the supercomputer market, such as Cray, HP, IBM, and NEC. Their machines had vector processors, that is, processors that operate on whole vectors rather than on individual numbers.
For communication between computing nodes, the manufacturers used proprietary technologies. One example is connecting processors in a three-dimensional torus topology. The meaning behind these words is simple: each node is connected to six others.
The further development of supercomputers gave rise to massively parallel systems and clusters. Clusters, the quintessence of this direction, use roughly the same communication algorithms between computing nodes as supercomputers do, only on top of network interfaces, which are the weak point of such systems. Besides a non-standard network topology (compared with the classic star), such as Fat Tree, a multidimensional torus, or Dragonfly, they require special switching devices.
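The link count of a torus is easy to check: every node has two neighbors along each dimension, so six links per node correspond to three dimensions. The sketch below is only an illustration; the function name `torus_neighbors` and the 4-node side length are assumptions, not something taken from a vendor's interconnect.

```python
def torus_neighbors(node, dims):
    """Direct neighbors of `node` in a torus whose side lengths are `dims`."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % size  # wrap around the ring
            neighbors.add(tuple(coords))
    neighbors.discard(tuple(node))  # a ring of length 1 would point back at itself
    return neighbors

print(len(torus_neighbors((0, 0, 0), (4, 4, 4))))        # 3D torus
print(len(torus_neighbors((0, 0, 0, 0), (4, 4, 4, 4))))  # 4D torus
```

With sides of at least three nodes, the count is simply twice the number of dimensions; shorter rings have fewer distinct neighbors because the two directions meet at the same node.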
On our topic, it is worth mentioning that one of the promising directions in supercomputer development today is adding coprocessors, which resemble video cards in architecture, to a standard computer architecture.
CPU selection
Today the main processor manufacturers are Intel and AMD. RISC processors such as the Power 7+, despite their appeal, are rather exotic and expensive. For example, a server of this kind, and not even the newest model, costs more than a million.
(Incidentally, it is possible to build an inexpensive and efficient cluster from Xbox 360 or PS3 consoles: their processors are roughly of the Power family, and a million buys more than one console.)
With that in mind, let us look at the options for a high-performance system that are attractive in price. It must, of course, be a multiprocessor system. For such tasks Intel offers Xeon processors and AMD offers Opteron.
If you have a lot of money


Separately, we note the extremely expensive but powerful line of Intel Xeon processors for the LGA1567 socket.
The top processor in this series is the E7-8870, with ten 2.4 GHz cores. It costs $4,616. For such CPUs, HP and Supermicro make eight-processor (!) server chassis. Eight 10-core Xeon E7-8870 2.4 GHz processors with Hyper-Threading support 8 * 10 * 2 = 160 threads, which Windows Task Manager shows as one hundred and sixty CPU load graphs in a 10x16 grid.
To fit eight processors into the case, they are placed not directly on the motherboard but on separate boards that plug into it. The photo shows four such boards with processors (two on each) installed in the motherboard. This is the Supermicro solution. In the HP solution, each processor has its own board. The HP solution costs two to three million, depending on the processors, memory, and other components. The Supermicro chassis costs $10,000, which is more attractive. In addition, the Supermicro chassis takes four coprocessor expansion cards in PCI-Express x16 slots (with room left for an InfiniBand adapter to build a cluster of such machines), while the HP one takes only two. The eight-processor Supermicro platform is therefore the more attractive base for a supercomputer. The following photo from an exhibition shows a supercomputer assembled with four GPU boards.
However, it is very expensive.
What is cheaper
There is, however, the prospect of building a supercomputer on the more affordable AMD Opteron G34, Intel Xeon LGA2011, and LGA1366 processors.
To choose a specific model, I made a table that gives price / (number of cores * frequency) for each processor. I left out processors with a frequency below 2 GHz and, for Intel, those with a bus slower than 6.4 GT/s.
Model | Number of cores | Frequency, GHz | Price, $ | Price / core, $ | Price / core / GHz, $ |
AMD | | | | | |
6386 SE | 16 | 2.8 | 1392 | 87 | 31 |
6380 | 16 | 2.5 | 1088 | 68 | 27 |
6378 | 16 | 2.4 | 867 | 54 | 23 |
6376 | 16 | 2.3 | 703 | 44 | 19 |
6348 | 12 | 2.8 | 575 | 48 | 17 |
6344 | 12 | 2.6 | 415 | 35 | 13 |
6328 | 8 | 3.2 | 575 | 72 | 22 |
6320 | 8 | 2.8 | 293 | 37 | 13 |
Intel | | | | | |
E5-2690 | 8 | 2.9 | 2057 | 257 | 89 |
E5-2680 | 8 | 2.7 | 1723 | 215 | 80 |
E5-2670 | 8 | 2.6 | 1552 | 194 | 75 |
E5-2665 | 8 | 2.4 | 1440 | 180 | 75 |
E5-2660 | 8 | 2.2 | 1329 | 166 | 76 |
E5-2650 | 8 | 2 | 1107 | 138 | 69 |
E5-2687W | 8 | 3.1 | 1885 | 236 | 76 |
E5-4650L | 8 | 2.6 | 3616 | 452 | 174 |
E5-4650 | 8 | 2.7 | 3616 | 452 | 167 |
E5-4640 | 8 | 2.4 | 2725 | 341 | 142 |
E5-4617 | 6 | 2.9 | 1611 | 269 | 93 |
E5-4610 | 6 | 2.4 | 1219 | 203 | 85 |
E5-2640 | 6 | 2.5 | 885 | 148 | 59 |
E5-2630 | 6 | 2.3 | 612 | 102 | 44 |
E5-2667 | 6 | 2.9 | 1552 | 259 | 89 |
X5690 | 6 | 3.46 | 1663 | 277 | 80 |
X5680 | 6 | 3.33 | 1663 | 277 | 83 |
X5675 | 6 | 3.06 | 1440 | 240 | 78 |
X5670 | 6 | 2.93 | 1440 | 240 | 82 |
X5660 | 6 | 2.8 | 1219 | 203 | 73 |
X5650 | 6 | 2.66 | 996 | 166 | 62 |
E5-4607 | 6 | 2.2 | 885 | 148 | 67 |
X5687 | 4 | 3.6 | 1663 | 416 | 115 |
X5677 | 4 | 3.46 | 1663 | 416 | 120 |
X5672 | 4 | 3.2 | 1440 | 360 | 113 |
X5667 | 4 | 3.06 | 1440 | 360 | 118 |
E5-2643 | 4 | 3.3 | 885 | 221 | 67 |
Bold italics mark the model with the lowest ratio; underlining marks the most powerful AMD model and the Xeon that is, in my opinion, closest to it in performance.
Thus, my choice of processors for the supercomputer is the Opteron 6386 SE, Opteron 6344, Xeon E5-2687W, and Xeon E5-2630.
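The ratio used in the table, price / (number of cores * frequency), can be recomputed in a few lines. This is a minimal sketch: the function name `price_per_core_ghz` is made up for illustration, and only the four chosen processors are included, with the numbers copied from the table.

```python
def price_per_core_ghz(price_usd, cores, freq_ghz):
    """Price divided by (cores * frequency); lower means cheaper aggregate clock."""
    return price_usd / (cores * freq_ghz)

# (model, cores, GHz, price in $) copied from the table
cpus = [
    ("Opteron 6386 SE", 16, 2.8, 1392),
    ("Opteron 6344", 12, 2.6, 415),
    ("Xeon E5-2687W", 8, 3.1, 1885),
    ("Xeon E5-2630", 6, 2.3, 612),
]

# Sort from the cheapest to the most expensive core-GHz
ranked = sorted(cpus, key=lambda c: price_per_core_ghz(c[3], c[1], c[2]))
for model, cores, ghz, price in ranked:
    print(f"{model}: {price_per_core_ghz(price, cores, ghz):.0f} $ per core-GHz")
```

Note that the metric treats one core-GHz as equal across architectures, which is a simplification: it says nothing about how much work each core does per clock cycle.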
Motherboards
PICMG
Regular motherboards cannot take more than four double-slot expansion cards. There is another architecture: backplanes, such as the BPG8032 PCI Express Backplane.
Such a backplane holds the PCI Express expansion cards and one processor board, somewhat similar to those in the eight-processor Supermicro servers discussed above, except that these processor boards follow the PICMG industry standards. The standards evolve slowly, and such boards often do not support the latest processors. At present, the most these processor boards offer is two Xeon E5-2448L, as on the Trenton BXT7059 SBC.
Without GPUs, such a system will cost at least $5,000.
Ready-made TYAN platforms
For about the same amount, you can buy a ready-made platform for building supercomputers, the TYAN FT72B7015. It takes up to eight GPUs and two LGA1366 Xeons.
"Normal" server motherboards
For LGA2011
Supermicro X9QR7-TF - this motherboard takes 4 processors and 4 expansion cards.
Supermicro X9DRG-QF - this board is designed specifically for building high-performance systems.
For Opteron
Supermicro H8QGL-6F - this board takes four processors and three expansion cards.
Strengthening the platform with expansion cards
This market is almost entirely held by Nvidia, which makes compute cards in addition to gaming graphics cards. AMD has a smaller share, and Intel entered the market relatively recently.
Such coprocessors feature a large amount of on-board RAM, fast double-precision calculations, and energy efficiency.
Model | FP32, Tflops | FP64, Tflops | Price, thousand $ | Memory, GB |
Nvidia Tesla K20X | 3.95 | 1.31 | 5.5 | 6 |
AMD FirePro S10000 | 5.91 | 1.48 | 3.6 | 6 |
Intel Xeon Phi 5110P | | 1 | 2.7 | 8 |
Nvidia GTX Titan | 4.5 | 1.3 | 1.1 | 6 |
Nvidia GTX 680 | 3 | 0.13 | 0.5 | 2 |
AMD HD 7970 GHz Edition | 4 | 1 | 0.5 | 3 |
AMD HD 7990 Devil 13 | 2x3.7 | 2x0.92 | 1.6 | 2x3 |
Nvidia's top solution is the Tesla K20X, based on the Kepler architecture. These cards are used in Titan, the most powerful supercomputer in the world. Nvidia has also recently released the GeForce Titan video card. Older models (such as the GTX 680) had FP64 performance cut to 1/24 of FP32, but for the Titan the manufacturer promises fairly high double-precision performance. AMD's solutions are also quite good, but they are built on a different architecture, which can make it difficult to run calculations optimized for CUDA (an Nvidia technology).
Intel's solution, the Xeon Phi 5110P, is interesting because all the cores in the coprocessor are based on the x86 architecture, so running calculations on it does not require special code optimization. My favorite among the coprocessors, though, is the relatively inexpensive AMD HD 7970 GHz Edition. In theory, this video card delivers the most performance per unit of cost.
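The performance-per-cost claim can be checked against the coprocessor table. The sketch below assumes that the price column is in thousands of dollars (which matches the $500 price of the HD 7970 GHz Edition used in the builds later on); the helper names `tflops_per_kusd` and `best_by` are illustrative only.

```python
# (name, FP32 Tflops, FP64 Tflops, price in thousand $) from the coprocessor table;
# None marks a value the table leaves blank.
cards = [
    ("Nvidia Tesla K20X", 3.95, 1.31, 5.5),
    ("AMD FirePro S10000", 5.91, 1.48, 3.6),
    ("Intel Xeon Phi 5110P", None, 1.0, 2.7),
    ("Nvidia GTX Titan", 4.5, 1.3, 1.1),
    ("Nvidia GTX 680", 3.0, 0.13, 0.5),
    ("AMD HD 7970 GHz Edition", 4.0, 1.0, 0.5),
    ("AMD HD 7990 Devil 13", 2 * 3.7, 2 * 0.92, 1.6),
]

def tflops_per_kusd(tflops, price_kusd):
    """Peak Tflops bought by one thousand dollars."""
    return tflops / price_kusd

def best_by(cards, column):
    """Name of the card with the most Tflops per thousand $ (column 1 = FP32, 2 = FP64)."""
    scored = [(tflops_per_kusd(c[column], c[3]), c[0])
              for c in cards if c[column] is not None]
    return max(scored)[1]

print("Best FP32 value:", best_by(cards, 1))
print("Best FP64 value:", best_by(cards, 2))
```

These are peak figures from the table, so the ranking only reflects theoretical throughput per dollar, not behavior on a particular workload.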
Connecting machines into a cluster
To increase performance, several computers can be combined into a cluster that distributes the computational load among its members.
Ordinary Gigabit Ethernet is too slow as the interconnect between the computers. InfiniBand is most often used for this purpose. An InfiniBand host adapter for a server is inexpensive: on the international eBay auction, such adapters sell for as little as $40, and an x4 DDR adapter (20 Gb/s), for example, costs about $100 with delivery to Russia.
InfiniBand switching equipment, however, is quite expensive. And, as mentioned above, the classic star is not the best topology for such a network.
However, InfiniBand hosts can be connected to each other directly, without a switch. This makes one option quite interesting: a cluster of two computers connected via InfiniBand. Such a supercomputer can be assembled at home.
How many video cards are needed
In Cray Titan, the most powerful supercomputer at present, the ratio of processors to "video cards" is 1:1: it has 18688 16-core processors and 18688 Tesla K20X cards.
In Tianhe-1A, the Chinese supercomputer built on Xeons, the ratio is two six-core processors to one Nvidia M2050 card (weaker than the K20X).
We will take this ratio as optimal for our builds (to keep costs down), that is, 12-16 processor cores per GPU. In the table below, bold marks the practically feasible options and underlining marks the most successful ones from my point of view.
GPUs | Cores | 6-core CPUs | 8-core CPUs | 12-core CPUs | 16-core CPUs |
2 | 24-32 | 4-5 | 3-4 | 2-3 | 2 |
3 | 36-48 | 6-8 | 5-6 | 3-4 | 2-3 |
4 | 48-64 | 8-11 | 6-8 | 4-5 | 3-4 |
If a system that already has this processor-to-GPU ratio can take additional compute devices on board, we will add them to increase the power of the build.
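The CPU counts in the table above follow from the 12-16 cores per GPU rule. The sketch below reproduces them under one assumption that is inferred from the numbers rather than stated in the text: fractional CPU counts are rounded half up. The function names are illustrative.

```python
import math

def cpus_needed(gpus, cores_per_gpu, cores_per_cpu):
    """CPU count giving roughly `cores_per_gpu` cores per GPU (rounded half up, an assumption)."""
    return math.floor(gpus * cores_per_gpu / cores_per_cpu + 0.5)

def cpu_range(gpus, cores_per_cpu, low=12, high=16):
    """CPU counts for the low and high ends of the cores-per-GPU ratio."""
    return (cpus_needed(gpus, low, cores_per_cpu),
            cpus_needed(gpus, high, cores_per_cpu))

for gpus in (2, 3, 4):
    row = [cpu_range(gpus, cores) for cores in (6, 8, 12, 16)]
    print(gpus, (gpus * 12, gpus * 16), row)
```

For example, `cpu_range(4, 6)` gives the pair for four GPUs and 6-core processors, which can be compared with the last row of the table.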
So how much does it cost
The options below are supercomputer chassis without RAM, hard drives, or software. All of them use the AMD HD 7970 GHz Edition video adapter. It can be replaced with another device, as the task requires (for example, a Xeon Phi). Where the system allows, one of the AMD HD 7970 GHz Edition cards is replaced by a three-slot AMD HD 7990 Devil 13.
Option 1 on the Supermicro H8QGL-6F motherboard
Component | Model | Quantity | Price, $ | Total, $ |
Motherboard | Supermicro H8QGL-6F | 1 | 1200 | 1200 |
CPU | AMD Opteron 6344 | 4 | 500 | 2000 |
CPU cooler | Thermaltake CLS0017 | 4 | 40 | 160 |
1400W case | SC748TQ-R1400B | 1 | 1000 | 1000 |
Graphics accelerator | AMD HD 7970 GHz Edition | 3 | 500 | 1500 |
Total | | | | 5860 |
Theoretically, the performance will be about 12 Tflops.
Option 2 on the TYAN S8232 motherboard, cluster
This board does not support the Opteron 63xx, so the 62xx is used. In this option, two computers are clustered over x4 DDR InfiniBand with two cables. In theory, the connection speed is then limited by the PCIe x8 speed, that is, 32 Gbit/s. Each unit uses two power supplies; how to make them work together can be found on the Internet.
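The bottleneck estimate can be worked through as simple arithmetic. The sketch below assumes a PCIe 2.0 slot (5 GT/s per lane), which the article does not state; InfiniBand DDR signals at 5 Gbit/s per lane, and both use 8b/10b encoding. The function names are illustrative, and protocol overhead above the line encoding is ignored.

```python
ENCODING_8B10B = 8 / 10  # 8 data bits carried per 10 line bits

def infiniband_data_gbps(lanes, lane_signal_gbps):
    """Data rate of one InfiniBand link in Gbit/s, one direction."""
    return lanes * lane_signal_gbps * ENCODING_8B10B

def pcie_data_gbps(lanes, lane_gtps):
    """Data rate of a PCIe slot in Gbit/s, one direction (8b/10b generations only)."""
    return lanes * lane_gtps * ENCODING_8B10B

ib_link = infiniband_data_gbps(lanes=4, lane_signal_gbps=5.0)  # x4 DDR: 20 Gbit/s signaling
two_links = 2 * ib_link                                        # two cables between the hosts
pcie_slot = pcie_data_gbps(lanes=8, lane_gtps=5.0)             # assumption: PCIe 2.0 x8

bottleneck = min(two_links, pcie_slot)
print(f"one link: {ib_link:.0f} Gbit/s, two links: {two_links:.0f} Gbit/s, "
      f"PCIe x8: {pcie_slot:.0f} Gbit/s, bottleneck: {bottleneck:.0f} Gbit/s "
      f"({bottleneck / 8:.0f} GB/s)")
```

Under these assumptions the two links and the slot come out equal, so the pair of cables just saturates the adapter's PCIe x8 connection; with a PCIe 1.x slot the slot would become the limit.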
Component | Model | Quantity | Price, $ | Total, $ |
Motherboard | TYAN S8232 | 1 | 790 | 790 |
CPU | AMD Opteron 6282SE | 2 | 1000 | 2000 |
CPU cooler | Noctua NH-U12DO A3 | 2 | 60 | 120 |
Case | Antec Twelve Hundred Black | 1 | 200 | 200 |
Power supply | FSP AURUM PRO 1200W | 2 | 200 | 400 |
Graphics accelerator | AMD HD 7970 GHz Edition | 2 | 500 | 1000 |
Graphics accelerator | AX7990 6GBD5-A2DHJ | 1 | 1000 | 1000 |
InfiniBand adapter | X4 DDR Infiniband | 1 | 140 | 140 |
InfiniBand cable | X4 DDR Infiniband | 1 | 30 | 30 |
Total | | | | 5680 (per unit) |
A cluster needs two such units, which together cost $11,360. Its power consumption at full load will be about 3000 W. In theory, performance will be up to 31 Tflops.
Option 3 on the Tyan FT72B7015 platform
This option differs in that it has only two CPUs for eight GPUs. Its performance in real-world tasks will therefore depend on how well the program can be parallelized.
Component | Model | Quantity | Price, $ | Total, $ |
Chassis (3000W) | Tyan FT72B7015 | 1 | 4900 | 4900 |
CPU | Xeon X5680 | 2 | 1300 | 2600 |
CPU cooler | SuperMicro SNK-P0040AP4 | 2 | 40 | 80 |
Graphics accelerator | AMD HD 7970 GHz Edition | 8 | 500 | 4000 |
Total | | | | 11580 |
Theoretically, the performance will be up to 32 Tflops.
Option 4 for LGA2011, clustered
Component | Model | Quantity | Price, $ | Total, $ |
Motherboard | Supermicro X9DRG-QF | 1 | 600 | 600 |
CPU | Intel Xeon E5-2687W | 2 | 2000 | 4000 |
CPU cooler | Supermicro SNK-P0050AP4 | 2 | 50 | 100 |
Case | Antec Twelve Hundred Black | 1 | 200 | 200 |
Power supply | FSP AURUM PRO 1200W | 2 | 200 | 400 |
Graphics accelerator | AMD HD 7970 GHz Edition | 3 | 500 | 1500 |
Graphics accelerator | AX7990 6GBD5-A2DHJ | 1 | 1000 | 1000 |
InfiniBand adapter | X4 DDR Infiniband | 1 | 140 | 140 |
InfiniBand cable | X4 DDR Infiniband | 1 | 30 | 30 |
Total | | | | 7970 (per unit) |
A cluster needs two such units, which together cost $15,940. Total power consumption at full load will be about 4000 W. In theory, performance will be up to 39 Tflops.
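The cost and performance figures for the four options can be cross-checked with a short sketch. It rests on two assumptions that the article does not spell out: the quoted Tflops are the summed FP32 peaks of the GPUs only (CPUs are not counted), and the AX7990 6GBD5-A2DHJ row is the dual-GPU HD 7990 Devil 13 at 2 x 3.7 Tflops. The names `builds` and `summarize` are illustrative.

```python
HD7970_TFLOPS = 4.0      # FP32 peak of one AMD HD 7970 GHz Edition (from the coprocessor table)
HD7990_TFLOPS = 2 * 3.7  # FP32 peak of one dual-GPU HD 7990 (assumed to be the AX7990 row)

# name: (price of one unit in $, HD 7970 cards per unit, HD 7990 cards per unit, units)
builds = {
    "Option 1": (5860, 3, 0, 1),
    "Option 2": (5680, 2, 1, 2),
    "Option 3": (11580, 8, 0, 1),
    "Option 4": (7970, 3, 1, 2),
}

def summarize(unit_price, n7970, n7990, units):
    """Return (total cost in $, GPU FP32 Tflops, Tflops per thousand $)."""
    cost = unit_price * units
    tflops = units * (n7970 * HD7970_TFLOPS + n7990 * HD7990_TFLOPS)
    return cost, tflops, 1000 * tflops / cost

for name, spec in builds.items():
    cost, tflops, value = summarize(*spec)
    print(f"{name}: ${cost}, {tflops:.1f} Tflops, {value:.2f} Tflops per thousand $")
```

Under these assumptions the sums land close to the quoted 12, 31, 32, and 39 Tflops, which suggests the estimates are GPU peak figures rather than measured performance.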