
HP Superdome X - a modern solution for business critical tasks

According to our observations, in recent years our customers have come to treat a growing number of Linux and Windows applications as business-critical, alongside the traditional business-critical workloads that run in Unix environments. Third-party analysts attribute this shift to several factors, such as changes in the style of IT and the emergence of new technologies: Cloud, Big Data, Mobility. Price and the large number of applications written for x86 systems also play an important role.



At the same time, the x86 segment is growing steadily:


Shipments of x86 and non-x86 systems according to IDC, 2014
As a result, a new IT market segment is emerging, with a growing need for mission-critical x86 systems that combine the reliability and high availability of the Unix world with the standard x86 architecture (according to analysts, 67% of customers require availability of 99.99% or higher for their business-critical tasks).


67% of organizations require at least 99.99% availability of their business applications, ITIC 2013
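
To put these availability levels in perspective, the percentages can be converted into the downtime they allow per year. The following is a minimal sketch of that arithmetic; the function name and the 365-day year are assumptions made for illustration, not something taken from the ITIC report.

```python
# Convert an availability percentage into the downtime it permits per year.
# Assumption: a 365-day year (leap years are ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for level in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(level)
        print(f"{level}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% corresponds to roughly 53 minutes of downtime per year, and 99.999% to a little over 5 minutes.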

In 2011, HP announced the Odyssey project, which sets out the development strategy for computing platforms for mission-critical enterprise applications. The project covers both the expansion of the existing, well-proven set of solutions in this category and the creation of a new business-critical platform based on the x86 architecture.

What about Itanium? Those systems are not going anywhere: our customers still demand them, so their development continues (more about Superdome on Itanium).

As part of the Odyssey project, HP is bringing to the x86 architecture some of its technologies and intellectual property, along with the experience gained over decades of running traditional business-critical HP-UX, OpenVMS and NonStop environments on Integrity. This strengthens Linux and Windows environments for our users' critical tasks and increases performance, scalability, fault tolerance and overall availability compared with what the x86 market currently offers.

The result of the Odyssey project is the new HP Superdome X server platform. The system is designed for demanding mission-critical workloads, yet it is built on the industry-standard x86 architecture. Its main applications include analytical and transactional workloads. The customer also gets high scalability: up to 16 processors in one system, with 48 DIMM memory slots on each server blade. So far, this is the only system on the x86 market that provides such scalability.


In addition to high scalability, each blade server is highly resilient: the HP Superdome X inherits from the Integrity Superdome the special HP sx3000 chipset, redundant data paths with automatic confirmation of transaction completion, and reliable detection of errors and system malfunctions.

Traffic between the HP Superdome X nodes is switched through the Crossbar architecture, which is distinguished by:

• End-to-end retransmission of data packets, including over redundant paths, to ensure that a transaction completes;

• Electrical isolation of hardware partitions for maximum flexibility, maintainability (independent power on/off) and physical security of data.


Blade server connection architecture in the HP Superdome X enclosure

The Crossbar has a bandwidth of more than 1.2 TB/s, which makes the HP Superdome X suitable for even the most demanding workloads. The aggregate throughput measured in internal tests is more than 1 TB/s.

It is important to note that HP and Intel developed the platform together. As a result, RAS functionality (reliability, availability, serviceability) was carried over from the Itanium platform to the Xeon E7 platform.

HP also worked actively with the Linux community, which made it possible to add support for this RAS functionality to Linux itself.

The RAS functionality implemented in Intel Xeon E7 processors and server memory, its support in Linux, and the Firmware First microcode together make it possible to speak of 99.999%+ availability for the x86-based HP Superdome X, a level comparable to RISC systems and beyond that of traditional x86 systems (see the ITIC report comparing the availability levels of RISC and x86 systems, which also shows the cost of unplanned downtime):


ITIC's report on the availability of x86 systems versus traditional RISC systems

The principal difference between the Mission Critical x86 (MC x86) architecture and the traditional x86 architecture is the way errors and faults are handled. In a conventional x86 system, after an uncorrectable hardware error is detected, the system software (firmware) halts the operating system to avoid further error propagation and, ultimately, data corruption.
In the HP Superdome X, by contrast, the special Firmware First microcode plays the main role in error handling. The Enhanced Machine Check Architecture (EMCA) of the Xeon E7 v2 processors allows the HP Superdome X microcode to "examine" the error logs and take action to mitigate the consequences of these errors before they are passed up to the operating system and application level.

Firmware First handles both correctable and uncorrectable errors in system components (processors, memory, I/O). The microcode also collects data on every incident so that the administrator can analyze it later. As a result, the system stops only the individual processes affected by an error, then tries to work around the fault and recover in software at the level of the firmware, the operating system, or even the application.

If work cannot continue, the system initiates an automatic controlled reboot, reconfigures the components, and saves full information about the error and the state of the modules for a report to the administrator. This failure-handling mechanism is possible only with close integration of all levels of the system: hardware, firmware, and the operating system.
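
The difference between the two error-handling approaches described above can be sketched as a small decision model. This is a simplified illustration, not HP's firmware logic: the `Severity` and `Action` enums, the `HardwareError` class and both function names are hypothetical, and real hardware distinguishes many more cases.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Severity(Enum):
    """Hypothetical, simplified error classes used only for this sketch."""
    CORRECTED = auto()                # already fixed by hardware (e.g. by ECC)
    UNCORRECTED_RECOVERABLE = auto()  # not fixed, but the damage is contained
    FATAL = auto()                    # work cannot continue safely


class Action(Enum):
    LOG_ONLY = auto()
    HALT_OS = auto()
    STOP_AFFECTED_PROCESS = auto()
    CONTROLLED_REBOOT = auto()


@dataclass
class HardwareError:
    component: str  # e.g. "memory", "cpu", "io"
    severity: Severity


def conventional_x86(error: HardwareError) -> Action:
    """Conventional behaviour as described in the text: any uncorrectable error halts the OS."""
    if error.severity is Severity.CORRECTED:
        return Action.LOG_ONLY
    return Action.HALT_OS


def firmware_first(error: HardwareError, incident_log: list) -> Action:
    """Firmware sees the error first, records it, then picks the mildest workable action."""
    incident_log.append(error)  # every incident is kept for the administrator
    if error.severity is Severity.CORRECTED:
        return Action.LOG_ONLY
    if error.severity is Severity.UNCORRECTED_RECOVERABLE:
        return Action.STOP_AFFECTED_PROCESS  # only the affected process is stopped
    return Action.CONTROLLED_REBOOT  # reboot, with reconfiguration of components


if __name__ == "__main__":
    log = []
    err = HardwareError("memory", Severity.UNCORRECTED_RECOVERABLE)
    print("conventional:", conventional_x86(err).name)
    print("firmware first:", firmware_first(err, log).name)
```

In this model the same contained memory error halts the whole operating system in the conventional case, but costs only the affected process in the firmware-first case.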

RAS functionality implemented in HP Superdome X:

• Deconfiguration of failed or failing components (lets the application and the system keep running when a memory module or CPU has a problem);

• Blade deconfiguration (lets the application or the system keep running if an entire blade server fails in a multi-blade configuration);

• Corrupt data containment (a mode in which data containing an error is marked with the “Error Containment” bit, after which the firmware and OS apply recovery scenarios, including UCNA (uncorrected, no action required), SRAO (software recoverable, action optional) and SRAR (software recoverable, action required). HP Superdome X supports all of these scenarios);

• Live error containment (the HP Superdome X firmware handles I/O errors on the fly);

• Viral error containment (a mode similar in principle to corrupt data containment: it tracks fatal addressing errors and prevents them from spreading to I/O devices);

• Processor interconnect fault resiliency (all communication links between CPUs, including QPI, the memory interconnect and PCIe, have redundant paths with CRC checking and a self-healing mechanism);

• Advanced MCA recovery (the HP Superdome X firmware handles memory errors);

• Clock redundancy (redundant clock generators);

• Partition and error isolation (a passive midplane that provides electrical isolation of blade servers).

Can your manufacturer of business-critical equipment offer such functionality?

In the second part of the series of articles on the HP Superdome X, we will examine in more detail the Advanced Error Recovery, Live Error Containment, Partition and Error Isolation mechanisms.

FAQ


Q: Are there any publicly available performance tests of the HP Superdome X?
A: Yes. HP Superdome X showed high performance in the standard SPECjbb2013 test, becoming the first x86 system to pass the 1 million jOPS mark.

June 2014 | November 2014 | December 2014

SPEC CPU2006 test

Q: I heard that performance does not grow linearly as the number of processors in a system increases. Is that true?
A: With the standard Intel architecture, yes. In the HP Superdome X, however, adding processors yields an almost linear increase in performance thanks to the high-performance Crossbar architecture: a factor of 1.92x when the system grows from 4 to 8 sockets and 1.86x when it grows from 8 to 16 sockets. The test results above confirm this.
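
The scaling factors quoted in this answer can be turned into scaling efficiency, that is, the measured speedup divided by the ideal linear speedup. The sketch below is only arithmetic on the two quoted factors; the function name and the 4-to-16 cumulative figure are derived here for illustration and are not separately published results.

```python
# Scaling steps quoted in the answer: (sockets before, sockets after, measured speedup).
STEPS = [(4, 8, 1.92), (8, 16, 1.86)]


def scaling_efficiency(sockets_before: int, sockets_after: int, speedup: float) -> float:
    """Measured speedup divided by the ideal (perfectly linear) speedup."""
    ideal_speedup = sockets_after / sockets_before
    return speedup / ideal_speedup


# Multiply the per-step factors to get the implied speedup from 4 to 16 sockets.
cumulative_speedup = 1.0
for _, _, step_speedup in STEPS:
    cumulative_speedup *= step_speedup

overall_efficiency = scaling_efficiency(4, 16, cumulative_speedup)

if __name__ == "__main__":
    for before, after, speedup in STEPS:
        eff = scaling_efficiency(before, after, speedup)
        print(f"{before} -> {after} sockets: {speedup:.2f}x, efficiency {eff:.1%}")
    print(f"4 -> 16 sockets (derived): {cumulative_speedup:.2f}x, efficiency {overall_efficiency:.1%}")
```

Each doubling keeps 93% to 96% of the ideal gain, and the two steps combined imply about 3.57x out of an ideal 4x.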

Q: Are there any public deployments of the HP Superdome X among Russian customers?
A: Yes, for example at MTS.

Q: Are there any figures on HP Superdome X database performance?
A: Yes, for example for SQL 2014.

Q: Are there any documents showing HP Superdome X tests with Oracle?
A: Yes, for Oracle 12c. Real customers have also tested their own data on HP Superdome X under Oracle; the references are not public, but the figures can be shared during a discussion.

Q: Can a hypervisor be installed on the HP Superdome X?
A: Yes, for example VMware; this can be checked in the compatibility matrix (http://www.vmware.com/resources/compatibility/search.php).

Further reading


» Running Linux on BL920c Gen8
» Running Windows on HP Superdome X
» Running SQL 2014 on HP Superdome X - reference guide
» Best Practices for Optimizing Superdome X Performance in Linux: NUMA, Power Consumption, Network, I/O

Source: https://habr.com/ru/post/262583/

