
TOP500 supercomputer ranking for June 2016: China introduces a 100 PFlops supercomputer

Today the 47th edition of the TOP500 ranking of the world's most powerful supercomputers, based on the Linpack benchmark, was published. Unlike the previous 6 editions, the list has a new leader: the Chinese supercomputer Sunway TaihuLight (神威·太湖之光) took first place with a Linpack result of 93 petaflops (theoretical peak 125.4 petaflops). That is about 3 times the performance of the previous leader, China's Tianhe-2.

Image: Sunway TaihuLight. Source: Jack Dongarra, Report on the Sunway TaihuLight System, June 2016

The new supercomputer was developed by China's National Research Center of Parallel Computer Engineering & Technology. The system is installed at the National Supercomputing Center in Wuxi, Jiangsu Province, in eastern China.

The supercomputer is built on new Chinese processors of the ShenWei family, the SW26010, which use a custom 64-bit RISC architecture and are presumably manufactured on a 28 nm process. Each processor has 260 cores, runs at 1.45 GHz, and delivers 3.06 teraflops.

The processor was developed at the Shanghai High Performance IC Design Center. It consists of 4 identical core groups connected by a network on a chip. Each core group has one management core (Management Processing Element, MPE), a 128-bit DDR3 memory controller, and 64 compute cores (Computing Processing Elements, CPE) arranged in an 8x8 array. Both types of cores have a microarchitecture with out-of-order execution. The MPE cores can run both the operating system and user code, support 256-bit vector operations, and have 32 KB L1 instruction and data caches and a 256 KB L2 cache. The CPE cores can run only user code, also with 256-bit vectors, and have a 16 KB instruction cache and 64 KB of scratchpad memory (Scratch Pad Memory). Each of the 4 core groups has access to 8 GB of DDR3-2133 memory, so a node has 32 GB of RAM with a total bandwidth of up to 136.5 GB/s.
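The node memory figures can be checked with a short calculation. This is only a sketch: it assumes that DDR3-2133 means 2133 million transfers per second on each 128-bit controller (the usual DDR naming convention, not something stated in the report), and the constant and function names are made up for illustration.

```python
# Assumption: DDR3-2133 = 2133 million transfers per second per controller.
TRANSFERS_PER_SEC = 2133e6
BUS_WIDTH_BYTES = 128 // 8   # 128-bit memory controller per core group
CORE_GROUPS = 4              # core groups per SW26010 chip
GB_PER_GROUP = 8             # DDR3 capacity attached to each core group


def node_memory():
    """Return (capacity in GB, peak bandwidth in GB/s) for one node."""
    per_group_bw = TRANSFERS_PER_SEC * BUS_WIDTH_BYTES / 1e9
    return CORE_GROUPS * GB_PER_GROUP, CORE_GROUPS * per_group_bw


capacity_gb, bandwidth_gbs = node_memory()
print(f"{capacity_gb} GB, {bandwidth_gbs:.1f} GB/s")  # 32 GB, 136.5 GB/s
```

Under these assumptions each controller provides about 34.1 GB/s, and four of them give the 136.5 GB/s quoted above.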

SIMD throughput is 16 double-precision (64-bit) floating-point operations per clock cycle on an MPE core and 8 operations per cycle on a CPE core. At 1.45 GHz, peak performance is therefore 23.2 gigaflops per MPE core and 11.6 gigaflops per CPE core.
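As a rough check, the per-core figures can be combined with the chip layout described earlier (4 core groups, each with 1 MPE and 64 CPEs) to reproduce the 3.06 teraflops per processor. The sketch below uses only numbers from the article; the names are illustrative.

```python
CLOCK_GHZ = 1.45
MPE_FLOPS_PER_CYCLE = 16   # double-precision operations per clock, MPE core
CPE_FLOPS_PER_CYCLE = 8    # double-precision operations per clock, CPE core
CORE_GROUPS = 4
CPES_PER_GROUP = 64        # 8x8 array; plus one MPE per core group


def chip_peak_gflops():
    """Return (per-MPE, per-CPE, whole-chip) peak performance in gigaflops."""
    mpe = MPE_FLOPS_PER_CYCLE * CLOCK_GHZ
    cpe = CPE_FLOPS_PER_CYCLE * CLOCK_GHZ
    chip = CORE_GROUPS * (mpe + CPES_PER_GROUP * cpe)
    return mpe, cpe, chip


mpe, cpe, chip = chip_peak_gflops()
print(f"MPE {mpe:.1f}, CPE {cpe:.1f}, chip {chip:.1f} gigaflops")
# MPE 23.2, CPE 11.6, chip 3062.4 gigaflops
```

The chip total of about 3062 gigaflops matches the 3.06 teraflops stated for the SW26010. Almost all of it comes from the 256 CPE cores.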

In total, the supercomputer has more than 10.6 million cores in 40,960 single-processor nodes, housed in 40 compute racks. Each rack holds 4 supernodes, and each supernode consists of 32 modules with 8 nodes each. The modules are water cooled. Few details are available about the supercomputer's main interconnect. It is known that each SW26010 chip has a PCI Express 3.0 x16 link to the three-level “Sunway Network”. The network diameter is 7 and the bisection bandwidth is 70 TB/s. According to Dongarra, Mellanox host channel adapters and switches are used, with a link bandwidth of about 12 GB/s (100 Gbit/s) and a latency of about 1 μs.
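The node and core totals follow directly from the packaging hierarchy. A minimal sketch, using only the counts given in the article (the names are illustrative):

```python
RACKS = 40
SUPERNODES_PER_RACK = 4
MODULES_PER_SUPERNODE = 32
NODES_PER_MODULE = 8
CORES_PER_NODE = 260       # one SW26010 processor per node


def system_totals():
    """Return (number of nodes, number of cores) for the whole machine."""
    nodes = (RACKS * SUPERNODES_PER_RACK
             * MODULES_PER_SUPERNODE * NODES_PER_MODULE)
    return nodes, nodes * CORES_PER_NODE


nodes, cores = system_totals()
print(f"{nodes:,} nodes, {cores:,} cores")  # 40,960 nodes, 10,649,600 cores
```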

Efficiency on the HPL (Linpack) benchmark was 74% of theoretical peak. On the more demanding HPCG benchmark, however, the system reached only 0.3% of peak (some systems reach 1-3%), which points to relatively slow memory and insufficient network bandwidth. For the SW26010, the ratio of peak flops to memory bandwidth is 22.4 flops per byte (for comparison, Intel Knights Landing has 7.2 flops per byte). Dongarra also noted that the system has relatively little RAM, only 1.3 PB (Tianhe-2 has 1.4 PB, and the American Titan, ranked 3rd in the TOP500, has 0.71 PB).
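These ratios can be recomputed from figures given elsewhere in the article. The sketch below assumes decimal units (1 PB = 10^6 GB) and uses the rounded values of 3.06 teraflops per chip, 136.5 GB/s and 32 GB per node, and 40,960 nodes. The names are illustrative.

```python
HPL_PFLOPS = 93.0
PEAK_PFLOPS = 125.4
CHIP_PEAK_GFLOPS = 3060.0    # 3.06 teraflops per SW26010
NODE_BANDWIDTH_GBS = 136.5   # peak memory bandwidth per node
NODES = 40_960
NODE_RAM_GB = 32


def balance_metrics():
    """Return (HPL efficiency, flops per byte, total RAM in PB)."""
    hpl_eff = HPL_PFLOPS / PEAK_PFLOPS
    flops_per_byte = CHIP_PEAK_GFLOPS / NODE_BANDWIDTH_GBS
    ram_pb = NODES * NODE_RAM_GB / 1e6   # assumption: 1 PB = 10**6 GB
    return hpl_eff, flops_per_byte, ram_pb


hpl_eff, flops_per_byte, ram_pb = balance_metrics()
print(f"{hpl_eff:.1%}, {flops_per_byte:.1f} flops/byte, {ram_pb:.2f} PB")
# 74.2%, 22.4 flops/byte, 1.31 PB
```

A ratio of 22.4 flops per byte means a kernel must perform about 22 floating-point operations for every byte it reads from memory to keep the cores busy, which is consistent with the weak result on the memory-bound HPCG benchmark.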

Average power consumption during the HPL run was 15.3 MW (slightly less than the 17 MW of Tianhe-2), with a maximum of just under 18 MW. According to Dongarra, energy efficiency was 6 gigaflops per watt (counting the consumption of the processors, memory, and network). The new supercomputer took third place in the green500.org ranking (the more energy-efficient systems are RIKEN Shoubu with 6.6 gigaflops/W and RIKEN Satsuki with 6.2 gigaflops/W).
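The efficiency figure is simply the HPL result divided by the average power draw. A minimal sketch with the article's rounded numbers (the function name is illustrative):

```python
def gflops_per_watt(pflops, megawatts):
    """Convert a result in petaflops and a power draw in MW to gigaflops/W."""
    gflops = pflops * 1e6    # 1 petaflops = 10**6 gigaflops
    watts = megawatts * 1e6  # 1 MW = 10**6 W
    return gflops / watts


efficiency = gflops_per_watt(93.0, 15.3)
print(f"{efficiency:.2f} gigaflops/W")  # 6.08 gigaflops/W
```

With these rounded inputs the result is slightly above 6 gigaflops per watt, in line with the value quoted by Dongarra.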

The supercomputer's operating system, Sunway Raise OS 2.0.5, is based on Linux. Users are offered C/C++ and Fortran compilers, automatic vectorization tools, and mathematical libraries. The SunAC OpenACC tool supports the OpenACC 2.0 standard to simplify programming of the multi-core processors.

Building the supercomputer cost 1.8 billion yuan, about 270 million US dollars.

The most detailed information about the supercomputer is available in the report by one of the ranking's founders, Jack Dongarra: Jack Dongarra, Report on the Sunway TaihuLight System, June 2016, http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-report-2016.pdf. The illustrations are taken from the article “The Sunway TaihuLight Supercomputer: System and Applications” by Fu HH, Liao JF, Yang JZ, et al., accepted for publication in Sci. China Inf. Sci., 2016, 59 (7): 072001, doi: 10.1007/s11432-016-5588-7.
Several slides from the TOP500 & Green500 Awards presentation at ISC 2016 have also been published:

image

Source: https://habr.com/ru/post/395203/

