Today the 47th edition of the TOP500 list of the world's most powerful supercomputers, ranked by the Linpack benchmark, was published. Unlike in the previous six editions, the leader has changed: the Chinese supercomputer Sunway TaihuLight (神威·太湖之光) took first place with a Linpack result of 93 petaflops (theoretical peak 125.4 petaflops). That is about 3 times the performance of the previous leader, the Chinese Tianhe-2.

The new supercomputer was developed by China's National Research Center of Parallel Computer Engineering & Technology (NRCPC). The system is installed at the National Supercomputing Center in the city of Wuxi, Jiangsu Province, in eastern China.

The supercomputer is built on new Chinese processors of the ShenWei family, the SW26010, which uses its own 64-bit RISC architecture and is presumably manufactured on a 28 nm process. Each processor has 260 cores, runs at 1.45 GHz, and delivers a peak of 3.06 teraflops.
The processor was developed at the Shanghai High Performance IC Design Center. It consists of 4 identical core groups connected by a network on chip. Each core group has one Management Processing Element (MPE), a 128-bit DDR3 memory controller, and 64 Computing Processing Elements (CPEs) arranged in an 8x8 array. Both types of cores have a microarchitecture with out-of-order execution. The MPE cores can run both the operating system and user code, use 256-bit vector operations, and have 32 KB L1 instruction and data caches and a 256 KB L2 cache. The CPE cores can run only user code, also with 256-bit vectors, and have a 16 KB instruction cache and 64 KB of scratch pad memory. Each of the 4 core groups has access to 8 GB of DDR3-2133 RAM, so a node has 32 GB of RAM with a total bandwidth of up to 136.5 GB/s.
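The 136.5 GB/s figure can be reproduced from the memory configuration described above. The following is a minimal Python sketch, assuming the usual convention that DDR3-2133 means roughly 2133 million transfers per second and that each transfer moves the full 128-bit bus width; the constant and function names are illustrative and are not taken from the report.

```python
# Peak memory bandwidth of one SW26010 node, from the figures in the text.
# Assumption: DDR3-2133 = 2133 mega-transfers/s, one full bus width per transfer.

CORE_GROUPS = 4              # core groups per processor
BUS_WIDTH_BITS = 128         # DDR3 controller width per core group
TRANSFERS_PER_SEC = 2133e6   # DDR3-2133
RAM_PER_GROUP_GB = 8         # DDR3 capacity attached to each core group


def peak_bandwidth_gbs(groups: int, bus_bits: int, transfers: float) -> float:
    """Return peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return groups * bytes_per_transfer * transfers / 1e9


node_bandwidth = peak_bandwidth_gbs(CORE_GROUPS, BUS_WIDTH_BITS, TRANSFERS_PER_SEC)
node_ram_gb = CORE_GROUPS * RAM_PER_GROUP_GB

print(f"{node_bandwidth:.1f} GB/s, {node_ram_gb} GB per node")
```

Each core group contributes about 34 GB/s, and the four controllers together give the quoted node total.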
With SIMD, an MPE core performs 16 double-precision (64-bit) floating-point operations per clock cycle, and a CPE core performs 8. At 1.45 GHz, the peak performance of one MPE core is therefore 23.2 gigaflops, and that of one CPE core is 11.6 gigaflops.
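The per-core figures combine into the 3.06 teraflops quoted for the whole chip. As a rough check, the sketch below multiplies clock frequency by operations per cycle and sums over the 4 core groups; it uses only numbers given in the text, and the variable names are illustrative.

```python
# Peak double-precision performance of one SW26010 chip.
CLOCK_GHZ = 1.45
CORE_GROUPS = 4
MPE_PER_GROUP = 1
CPE_PER_GROUP = 64
MPE_FLOPS_PER_CYCLE = 16
CPE_FLOPS_PER_CYCLE = 8

# Per-core peak: frequency (GHz) times flops per cycle gives gigaflops.
mpe_gflops = CLOCK_GHZ * MPE_FLOPS_PER_CYCLE
cpe_gflops = CLOCK_GHZ * CPE_FLOPS_PER_CYCLE

# One core group = 1 MPE + 64 CPEs; the chip has 4 such groups.
group_gflops = MPE_PER_GROUP * mpe_gflops + CPE_PER_GROUP * cpe_gflops
chip_gflops = CORE_GROUPS * group_gflops
cores_per_chip = CORE_GROUPS * (MPE_PER_GROUP + CPE_PER_GROUP)

print(f"MPE {mpe_gflops:.1f} GF, CPE {cpe_gflops:.1f} GF")
print(f"chip: {cores_per_chip} cores, {chip_gflops / 1000:.2f} TF")
```

Almost all of the peak comes from the CPE arrays: the 4 MPE cores contribute under 100 gigaflops of roughly 3062.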
In total, the supercomputer has more than 10.6 million cores in 40,960 single-processor nodes housed in 40 compute racks. Each rack holds 4 supernodes, and each supernode consists of 32 modules with 8 nodes each. The modules are water cooled. Few details are known about the supercomputer's main network. Each SW26010 chip is known to have a PCI Express 3.0 x16 connection to the three-level "Sunway Network". The network diameter is 7 and the bisection bandwidth is 70 TB/s. According to Dongarra, Mellanox host channel adapters and switches are used, with a link bandwidth of about 12 GB/s (100 Gbit/s) and a latency of about 1 μs.
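The node and core totals follow directly from the rack hierarchy. The sketch below multiplies out the levels given in the text; the per-node peak of 3.06 teraflops is the rounded value quoted for the processor, so the system peak it produces is only approximate.

```python
# System size derived from the packaging hierarchy described in the text.
RACKS = 40
SUPERNODES_PER_RACK = 4
MODULES_PER_SUPERNODE = 32
NODES_PER_MODULE = 8
CORES_PER_NODE = 260          # one SW26010 per node
PEAK_TFLOPS_PER_NODE = 3.06   # rounded per-chip figure from the text

nodes_per_supernode = MODULES_PER_SUPERNODE * NODES_PER_MODULE
nodes = RACKS * SUPERNODES_PER_RACK * nodes_per_supernode
cores = nodes * CORES_PER_NODE
peak_pflops = nodes * PEAK_TFLOPS_PER_NODE / 1000

print(f"{nodes_per_supernode} nodes per supernode, {nodes} nodes in total")
print(f"{cores:,} cores, about {peak_pflops:.1f} PF peak")
```

The result is slightly below the official 125.4 petaflops only because the per-chip input is rounded to two decimals.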
Computational efficiency on the HPL (Linpack) benchmark was 74% of theoretical peak. On the more demanding HPCG benchmark, however, the system reached only 0.3% of peak (some systems reach 1-3%), which points to relatively slow memory and insufficient network bandwidth. For the SW26010, the ratio of peak flops to memory bandwidth is 22.4 flops per byte (for comparison, Intel Knights Landing has 7.2 flops per byte). Dongarra also noted that the system has relatively little RAM, only 1.3 PB (Tianhe-2 has 1.4 PB, and the American Titan, ranked 3rd in the TOP500, has 0.71 PB).
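The efficiency, balance, and memory figures in this paragraph are simple ratios of numbers quoted earlier in the article. The following sketch recomputes them as a sanity check, assuming decimal units (1 PB = 1e6 GB); the names are illustrative.

```python
# Ratios behind the HPL efficiency, flops-per-byte and total RAM figures.
HPL_PFLOPS = 93.0             # measured Linpack result
PEAK_PFLOPS = 125.4           # theoretical peak
CHIP_PEAK_GFLOPS = 3060.0     # 3.06 TF per SW26010
NODE_BANDWIDTH_GBS = 136.5    # memory bandwidth per node
NODES = 40_960
RAM_PER_NODE_GB = 32

hpl_efficiency = HPL_PFLOPS / PEAK_PFLOPS
flops_per_byte = CHIP_PEAK_GFLOPS / NODE_BANDWIDTH_GBS
total_ram_pb = NODES * RAM_PER_NODE_GB / 1e6   # assumption: 1 PB = 1e6 GB

print(f"HPL efficiency: {hpl_efficiency:.0%}")
print(f"Machine balance: {flops_per_byte:.1f} flops/byte")
print(f"Total RAM: {total_ram_pb:.2f} PB")
```

A balance above 20 flops per byte means a kernel must perform many operations on each byte it loads to approach peak, which is consistent with the low HPCG result, since HPCG is dominated by memory traffic.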
Average power consumption during the HPL run was 15.3 MW (slightly less than Tianhe-2's 17 MW), with a maximum of just under 18 MW. According to Dongarra, energy efficiency was 6 gigaflops per watt (counting the consumption of processors, memory, and network). The new supercomputer ranked third in the green500.org list (more energy efficient are RIKEN Shoubu with 6.6 gflops/W and RIKEN Satsuki with 6.2 gflops/W).
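The energy-efficiency figure is the HPL result divided by the average power drawn during the run. A minimal sketch of that division, using the 93 petaflops and 15.3 MW quoted in the article (the helper name is illustrative):

```python
# Energy efficiency = sustained HPL performance / average power during the run.

def gflops_per_watt(hpl_pflops: float, power_mw: float) -> float:
    """Convert petaflops and megawatts to gigaflops per watt."""
    gflops = hpl_pflops * 1e6   # 1 PF = 1e6 GF
    watts = power_mw * 1e6      # 1 MW = 1e6 W
    return gflops / watts


taihulight = gflops_per_watt(hpl_pflops=93.0, power_mw=15.3)
print(f"Sunway TaihuLight: {taihulight:.2f} GF/W")
```

The ratio comes out just above 6 gigaflops per watt, consistent with the figure attributed to Dongarra.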
The supercomputer's operating system, Sunway Raise OS 2.0.5, is based on Linux. Users are provided with C/C++ and Fortran compilers, automatic vectorization tools, and mathematical libraries. The SunAC OpenACC tool supports the OpenACC 2.0 standard to simplify programming of the many-core processors.
The supercomputer cost 1.8 billion yuan to build, about 270 million US dollars.
The most detailed information about the supercomputer is available in the report by Jack Dongarra: Jack Dongarra, Report on the Sunway TaihuLight System, June 2016, http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-report-2016.pdf; illustrations are from the article "The Sunway TaihuLight Supercomputer: System and Applications" by Fu HH, Liao JF, Yang JZ, et al., accepted for publication in Sci. China Inf. Sci., 2016, 59(7): 072001, doi: 10.1007/s11432-016-5588-7.
Several slides from the TOP500 & Green500 Awards presentation at ISC 2016 have also been published:
