
Huawei KunLun server: our test results

First, a few words about the KunLun architecture, since practically no information about it is available in Russian. KunLun was designed as a high-end platform, so all of its components are duplicated, including the control modules and the NUMA node controllers. Its high-end features go beyond duplication, though: the system allows not only PCIe boards to be replaced without stopping the OS (nothing new in principle), but also processors and memory. The system proactively warns you which components are likely to fail soon, without waiting for the failure itself, so you can replace them without stopping the OS. At present, hot swapping of processors and memory modules is supported only in EulerOS (Huawei's counterpart to CentOS). Out-of-the-box support in RHEL and SLES is promised soon.

Server boards, each carrying 1 processor and 24 memory modules, are combined through a switching system into physical partitions of 4, 8, 16 or 32 processors. Finer granularity is available only through logical partitioning (a hypervisor).

The server also has built-in disks: up to 4 drive cages of 12 disks each, with hardware RAID available inside each cage. In some cases, this lets you do without an external disk array.

What is the main feature of KunLun? The ability to combine up to 32 Intel Xeon processors and up to 24 TB of memory in one partition. As a bonus, the system uses Huawei's own BIOS, and the vendor is ready to provide its source code for software certification.
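The board layout and the memory ceiling quoted above imply a simple capacity calculation. The Python sketch below works through it. The constant and function names are illustrative, and the module size it derives rests on two assumptions the article does not state: 1 TB = 1024 GB, and every slot of a 32-processor partition populated.

```python
# Figures taken from the article: 1 CPU and 24 memory modules per board,
# physical partitions of 4, 8, 16 or 32 CPUs, up to 24 TB per partition.
DIMMS_PER_CPU = 24
PARTITION_SIZES = (4, 8, 16, 32)
MAX_MEMORY_TB = 24


def dimm_slots(cpus: int) -> int:
    """Memory module slots in a partition with the given number of CPUs."""
    return cpus * DIMMS_PER_CPU


def implied_dimm_size_gb(cpus: int, memory_tb: float) -> float:
    """Module size needed to reach memory_tb with every slot populated.

    Assumption (not from the article): 1 TB = 1024 GB.
    """
    return memory_tb * 1024 / dimm_slots(cpus)


for cpus in PARTITION_SIZES:
    print(f"{cpus:2d} CPUs -> {dimm_slots(cpus)} memory slots")

print(implied_dimm_size_gb(32, MAX_MEMORY_TB), "GB per module at the 24 TB maximum")
```

Under these assumptions, the 24 TB ceiling corresponds to 768 slots filled with 32 GB modules.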

Why can't every manufacturer offer a 32-processor system?


The built-in facilities of Intel processors can combine no more than 8 processors in a single server. Going beyond that requires special devices: NUMA node controllers. Intel does not produce them, but the QPI bus is designed to allow for them. HP, SGI and Huawei took advantage of this, and each built its own controller. Creating such a controller clearly takes large-scale research and corresponding expense. Huawei, for example, spent 8 years on development.

The other vendors (Intel among them) chose not to develop their own controllers. The reasons? First, increasing the number of processors slows down memory access. This is largely due to the need to keep processor caches coherent: the more processors have cached a region of memory, the more notifications are needed when one of them modifies it. Second, for the vast majority of computational tasks, one to four processors are enough.
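The cache synchronization cost can be illustrated with a toy model. The Python sketch below assumes a simplified invalidate-on-write scheme in which the writer sends one notification to every other processor that holds a copy of the line. It is not a description of the QPI protocol or of any vendor's node controller, and the function name is hypothetical.

```python
def invalidations_on_write(sharers: set[int], writer: int) -> int:
    """Notifications sent when `writer` modifies a cache line.

    Toy model: one invalidation per processor, other than the writer,
    that currently holds a copy of the line.
    """
    return len(sharers - {writer})


# Worst case: every processor in the partition has cached the same line.
for n_cpus in (4, 8, 16, 32):
    sharers = set(range(n_cpus))
    print(n_cpus, "CPUs ->", invalidations_on_write(sharers, writer=0), "invalidations")
```

In this model the number of notifications per write grows linearly with the number of processors sharing the line, which is the effect the paragraph above describes.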

EulerOS


The manufacturer claims that processors and memory can be replaced on the fly. This requires a specialized OS, EulerOS. Information about it on the Internet is very scarce and mostly concerns its certification against the latest version of the Linux Standard Base. It turned out that EulerOS is built from RHEL (Red Hat Enterprise Linux) sources, much like CentOS. Huawei customizes it for its hardware, in particular by adding drivers for CPU and RAM hot swapping.

In addition to EulerOS, KunLun is claimed to support RHEL, SLES 11 and 12, and Windows Server 2012.

SPECint / SPECfp performance test


KunLun handles raw computation well. During SPECint runs, processes are pinned to specific cores and work only with local memory.

Server                                                SPECint   SPECfp
SGI UV 300 (32x Intel Xeon E7-8890 v3)                  22600    15700
KunLun 9032 (32x Intel Xeon E7-8890 v3)                 22900    16300
IBM Power E880 (16x Power8 4.0 GHz, 192 cores)          14400    11400
KunLun 9016 (16x Intel Xeon E7-8890 v3)                 11700     8050
SGI UV 300 (16x Intel Xeon E7-8890 v3)                  11400     7880
Integrity Superdome X (16x Intel Xeon E7-8890 v3)       11100     7670


An interesting comparison is KunLun against the top-end IBM Power E880 (also a 16-processor system): the gap between them is not that large. In other words, for computation, the Huawei server on Intel Xeon is a credible competitor to Power8.
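To put numbers on that comparison, the short Python sketch below computes score ratios from the results listed above. The dictionary keys and the helper name are illustrative; the scores are copied from the article's table.

```python
# (SPECint, SPECfp) scores as listed in the article's results table.
SCORES = {
    "KunLun 9032": (22900, 16300),
    "KunLun 9016": (11700, 8050),
    "IBM Power E880": (14400, 11400),
}


def ratio(a: str, b: str) -> tuple[float, float]:
    """SPECint and SPECfp scores of system `a` relative to system `b`."""
    (a_int, a_fp), (b_int, b_fp) = SCORES[a], SCORES[b]
    return a_int / b_int, a_fp / b_fp


for a, b in [("KunLun 9016", "IBM Power E880"), ("KunLun 9032", "KunLun 9016")]:
    r_int, r_fp = ratio(a, b)
    print(f"{a} vs {b}: SPECint x{r_int:.3f}, SPECfp x{r_fp:.3f}")
```

With these figures, the 16-processor KunLun reaches about 81% of the Power E880 SPECint score and about 71% of its SPECfp score, while going from 16 to 32 processors roughly doubles both KunLun scores.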

SLOB (Oracle) performance test


This test measured memory access speed more than computation itself. The DBMS processes are not bound to NUMA nodes; for the test, all memory is treated as equidistant from the processors. The results confirmed that server performance scales nonlinearly as resources are added.

A sevenfold increase in processor capacity (from 16 to 144 cores, which is nine times the core count, adjusted for the lower clock frequency) produced a fivefold increase in server performance (71% efficiency). A fourfold increase in the number of cores, from 16 (4 CPUs) to 64 (16 CPUs), raised performance by a factor of 2.7 (68% efficiency).
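The efficiency figures are simply the measured speedup divided by the factor by which resources grew. A minimal Python sketch of that arithmetic, using the numbers quoted above (the function name and labels are illustrative):

```python
def scaling_efficiency(speedup: float, resource_ratio: float) -> float:
    """Fraction of ideal (linear) scaling actually achieved."""
    return speedup / resource_ratio


# (measured speedup, resource growth factor) as quoted in the article.
cases = {
    "7x capacity (16 -> 144 cores, frequency-adjusted)": (5.0, 7.0),
    "4x cores (16 -> 64)": (2.7, 4.0),
}

for label, (speedup, resources) in cases.items():
    print(f"{label}: {scaling_efficiency(speedup, resources):.3f}")
```

The first case gives 5 / 7, or about 71%. The second gives 2.7 / 4 = 67.5%, which the article rounds to 68%.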

KunLun Applications


KunLun's main advantage is the impressive amount of memory on board (24 TB now, 32 TB in the future). This matters most for in-memory analytics, where the entire database is held in RAM. KunLun can cut data access time by three orders of magnitude compared with hard drives, which speeds up database queries. KunLun is well suited to SAP HANA and SAP S/4HANA workloads. The amount of memory allows HANA to run even in a single-node KunLun configuration. Oracle Database (especially with the In-Memory option) and QlikView also do well on the Chinese superserver.

Retailers can use this solution as a SAP HANA platform for analyzing large volumes of data on customer demand for particular goods, warehouse stock levels, and so on. The combination of the Oracle In-Memory option and KunLun will help banks assess customer creditworthiness on the fly, calculate capital adequacy ratios, and so on. Telecom operators can build subscriber loyalty management on this solution: profiling subscribers and targeting offers.

In addition, KunLun lets x86 replace RISC systems. Some companies have vertically scaled workloads that outgrew earlier x86 servers and now run on RISC. In this case, the cost of KunLun is roughly equal to the price of one year of service for a RISC system. KunLun matches RISC systems in reliability and wins on the variety of application software. Notably, in its home market KunLun is actively used for import substitution, mainly as a platform for migrating off RISC systems.

The article was prepared by Dmitry Glushchenko, system architect of the Jet Systems Infosystems Design Center. We welcome your constructive comments.

Source: https://habr.com/ru/post/309216/

