
We test Chinese hardware and find out how cheap and cheerful it is



There is no need to explain how sanctions have changed the Russian IT market, because they have not changed it all that much. What has changed is the mindset: it was sanctions that sparked interest in alternative brands, Chinese ones first of all.


We decided to take a closer look at what Chinese industry has achieved, test its products on typical computing tasks, and at the same time see how they behave under technical failures and other acts of vandalism. Below is the story of how we tested Inspur and Huawei products.


Everyone is talking about vendor replacement. In words it is quick and easy. In practice, replacing familiar and much-loved equipment from Western manufacturers with something little known is a difficult and risky step. Not so long ago the “Chinese” label meant second-rate. And although many prestigious products are now assembled in China (where, for example, was your iPhone put together?), stereotypes are still not easy to shake off.

The Huawei brand has been known on the market for a long time, above all in the telecommunications equipment segment. Since the late 1980s the company has grown by successfully producing low-cost competitors to Western equipment, primarily Cisco's. More than once it has been involved in scandals over unfair copying of technology. Huawei entered the server niche approximately 8 years ago and in that time has built up a rich portfolio of hardware and software solutions.

Inspur, in turn, has only recently become known outside China, but inside China this partly state-owned company has been developing the local IT market for a long time. In the 1960s and 70s it produced transistors that were used, among other things, in Chinese satellites; in the 1980s and 90s it was the first in China to build a PC and a server. In its modern form, as a manufacturer of server hardware and related solutions, Inspur was founded in 2000. Today the company ranks 5th in the world by number of servers sold (largely thanks to the Chinese market).

We tested individual samples of Chinese equipment partly in our own laboratory and partly at a Huawei site. But before describing the hardware and what we ended up with, I want to dwell on one more goal we pursued along the way. It is a global one: gradually migrating information systems from proprietary RISC/UNIX platforms to the good old, well-worn x86 platform.

There is a global trend toward a shrinking share of commercial UNIX in the server market. For more than 40 years RISC/UNIX systems dominated the enterprise segment because they worked reliably even under heavy load and scaled almost linearly as processors and memory were added. But progress does not stand still: the x86 architecture has begun to catch up with mainframe-class machines, and Linux has become steadily better at managing “big iron” efficiently. Of course, there are still tasks where the RISC architecture has no equal, for example when 1500 database users hammer away at a single point, which is typical of systems such as billing or a bank's card processing. However, many workloads, databases included, can now be moved successfully to the cheaper and simpler x86 architecture.

Standard-architecture servers are intuitive to anyone who, as a child, took apart and reassembled a PC against the clock, army style. And Linux has long ceased to be exotic: it is a set of software that any curious person can easily deploy on a home computer, and the more advanced ones in a virtual machine. In contrast, RISC servers are not found in everyday life, and you cannot learn them at home. So training a specialist who can manage, from the command line, a humming one-ton cabinet with nowhere to plug in even an ordinary monitor is a separate and lengthy process. And not everyone has the knack for it. On top of the hefty cost of the hardware itself, which costs as much as two used tanks plus a load of ammunition, training and retaining specialists is a headache of its own. Besides, they are always on the lookout for a chance to sell themselves at a higher price to your “market partners”. So the trend of squeezing out RISC servers is quite natural.

So, about the testing. Of course, we made no discovery by installing Linux on Intel hardware and running a database on it, and I am not angling for a MUZ-TV award here. But on the whole it was interesting to see how Chinese hardware would cope with typical tasks, and how the software installed on it would fare. How, so to speak, the red dragon would behave in the chicken coop. As for the hardware, we set out to evaluate:


Inspur


The first specimen from the great Chinese corporation that came our way was the NX5440 blade server. It is the junior model of the blade line, which also includes the NX5840 and NX8840. The NX5440 occupies one slot in the chassis (half-height), while the NX5840 and NX8840 occupy two each. All three models are installed in the same I8000 chassis, as in our case. Our chassis was equipped with a control module and an Ethernet module.



I should note that Inspur's x86 server range includes NF series rack servers, NP tower servers, NX blade servers, and a high-density disk storage server. All server types are built on a single chipset, the Intel C600, and use Xeon E3 V2 processors. Inspur also has the first, and so far the only, large UNIX server in China based on Itanium and the CC-NUMA architecture: the Inspur Tiansuo TSK1, which scales up to 128 physical cores and runs the company's own UNIX-like OS, K-UX.





The manufacturer claims unprecedented scalability and adaptability for its creations, but to an impartial eye they differ little in layout and design from their classmates from other vendors. Power supplies and fans are duplicated, the memory has error correction, and the network interfaces allow fault-tolerant aggregation. The server configuration is indeed flexible: the machines can be assembled from a large selection of components.

Our experience with the products that visited our laboratory showed, however, that the stereotypes I wrote about above are not so outdated. Our tribulations began almost as soon as we powered the whole thing on. First, a word about the manufacturer's documentation. The information is mostly descriptive: such-and-such a screen has such-and-such buttons. The pictures in the user manual are screenshots of the Chinese interface. On the whole it turned out to be easier to figure things out ourselves, without the manual.



During the week we spent testing this unit, we lost contact with the chassis control module (SMC) and the blade control module (BMC) three times. The trouble struck without any discernible pattern. The first time it happened we called in an engineer from Inspur. Support, by the way, is something Inspur does well, except that all the engineers in the Russian office are Chinese, probably because the office opened only recently. They come to the site straight away, with an interpreter.

The Chinese engineers quickly realized that we were good guys and cheerfully repaired the blade chassis by reseating the control module. We fixed all later losses of communication ourselves in the same way: something was reseated or rebooted. Such is the “floating” nature of the problem: it shows up spontaneously and is cured by random actions.

The SMC interface is not rich in features. The most useful ones are powering a server on or off and restarting its control module. You cannot change the network settings of a blade's BMC from the SMC; that can only be done from the web interface of the blade's own management module, and the current IP address, if unknown, can be found in the BIOS.



In the "Ethernet Module" menu you can see whether the module is installed and whether it is powered, and switch its power off if you wish. To top it all off, this module turned out to be a hub, not a switch. That model is already outdated, however, and Inspur now supplies more advanced modules.

All options were enabled in the logging settings menu, but when I tried to view the logs, only the message “Empty logs” was displayed. In the user management menu it turned out to be impossible to create an additional user with administrator rights, and a user of the “Common User” type can in practice only switch between menus and look, without changing anything. In the “Power Supply and Fan Module” menu you can check the voltages and fan speeds and, if you wish, set the speed manually for each fan module.

When controlling a blade remotely through KVM over IP, we quite often ran into various errors that were cured by reconnecting or, less often, by rebooting the BMC. When I tried to install Windows, I needed a driver for the RAID controller, which in theory is supplied on the “Inspur driver CD”. However, the installer on that disc simply did not work, even though the CD itself had no damage.

Once we had learned to handle this product and, when necessary, repair it, we installed RHEL 6.6 x86-64 on the server and put Oracle Database 11.2.0.4 on top. For testing we used Swingbench 2.4, plus a Perl routine that computes square roots to warm up the CPU. Swingbench (http://dominicgiles.com/swingbench.html) is a small free application, a standard way to put a synthetic load on a database, and it draws graphs while doing so.
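The CPU warm-up mentioned above is simple to reproduce. The original routine was written in Perl and is not shown in the article, so the following is only a rough Python analogue under that assumption; the names `burn` and `warm_up` are hypothetical. It starts one process per core and keeps each busy computing square roots for a fixed time.

```python
import math
import multiprocessing as mp
import time
from typing import List, Optional


def burn(seconds: float) -> int:
    """Compute square roots in a tight loop for about `seconds`.

    Returns the number of square roots computed (hypothetical helper).
    """
    deadline = time.monotonic() + seconds
    done = 0
    while time.monotonic() < deadline:
        # Do a batch between clock checks so the loop is mostly math.
        for i in range(1, 10_001):
            math.sqrt(i)
        done += 10_000
    return done


def warm_up(seconds: float, workers: Optional[int] = None) -> List[int]:
    """Run one `burn` process per CPU core (or `workers` processes)."""
    workers = workers or mp.cpu_count()
    with mp.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)


if __name__ == "__main__":
    counts = warm_up(seconds=60.0)
    print(f"{len(counts)} workers, {sum(counts)} square roots computed")
```

Processes rather than threads are used here so that every core is actually loaded; how long to run it and how many workers to start depends on the machine under test.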

The server successfully passed a day-long test. The processor core temperature stayed at 55-60 °C with the cooling air at about 20 °C. I deliberately will not dwell on performance figures (IOPS, TPS, MB/s), because they all depend on the input data and the settings. We did not aim to squeeze performance records out of the server; the point of the load testing was to check that the server can work under real-world conditions.



Overall, we concluded that the equipment we received for testing looked raw. In fairness, the loss of communication with the control modules could of course be a quirk of the particular unit we dealt with. Otherwise, this is a familiar x86 server with two Intel Xeon E5-2690 v2 processors and 128 GB of DDR3. And Chinese manufacturers are improving their products at such a pace that in a few months we may well see a new model free of all these teething problems. That, in fact, is why we recently became an official Inspur partner, watching the vendor develop its product line and work to sell it in Russia. Given the sanctions, another alternative supplier in our portfolio will not go amiss.

Huawei


We had an E6000H chassis with four BH620 V2 blades installed (configuration: 2x Intel Xeon CPU E5-2407 2.20GHz, 16 GB RAM, 2 Gigabit Ethernet, 2 Emulex FC3532 HBA).


Photo: the Huawei chassis in our laboratory. Rear view, because that angle came out better

This hardware belongs to the previous generation; the current line is v3. But it was our own laboratory equipment, which we did not mind abusing. We installed RHEL and Oracle Linux on it, put the same Oracle 11g database on top, and loaded it with Swingbench as well as with our own scripts emulating different load profiles: large write volumes, large read volumes, and heavy queries with heavy sorts. For those interested, the sequence of our load-testing steps is given at the end of the article.

While the systems were busy processing transactions, we would suddenly jump out from around the corner and subject them to assorted violence. We pulled drives and network cables on the fly and cut the power at the switch. We mainly wanted to know whether the OS would start correctly and whether the database could recover, resume work, and restore access to the data. On classic industrial systems this is what happens in most cases. But how would the Chinese product behave?

Frankly, everything worked well, and in the end we had nothing to complain about. The hardware ran stably and pulled no nasty tricks. The OS and the database behaved normally, going through all the proper recovery procedures after the abuse. We did not manage to corrupt the data in the database.

We also got our hands on the new generation of hardware. Huawei kindly gave us this opportunity at its demo site, which in places looks downright futuristic.


Here it is: the Huawei demo site

I must say that the Huawei product portfolio covers a whole range of solutions, from servers and storage to virtualization software. However, despite our ardent wish to get acquainted with FusionCompute, a proprietary virtualization environment based on Xen, Huawei's employees were unable to launch it during our stay. It is positioned as a replacement for VMware products but, as you can see, it does not start on demand. The Huawei hardware, on the other hand, we did get to study properly, so the rest of this section is about our impressions of it.

The third-generation server lines include rack, blade, and high-density models, offering plenty of competing options for medium and large businesses that are used to building infrastructure on machines such as the HP DL360e and 360p, HP DL380 Gen9, IBM x3650 and x3850, Dell R730 and R930, and so on. They use proven technology and ordinary modern hardware.



Huawei Rack Servers lineup


There are no entry-level Intel E3 processors in the lineup, and no AMD processors either. The new v3 generation supports DDR4, and support for 64 GB memory modules has been announced, once they appear, naturally. The servers' network interfaces are replaceable, and the LOM card also carries the management port. SAS RAID controllers are 6 Gb/s in the V2 generation and 12 Gb/s in V3. In both cases these are LSI controllers with the manufacturer's native firmware, not modified by Huawei. A supercapacitor serves as the cache battery.

In general, the set of components in the servers is standard and duplicated for fault tolerance. Power supplies and fans support hot swapping, just like the grown-ups.

Huawei servers can operate over a wide temperature range, up to 45 °C. For competitors the limit rarely exceeds 35 °C. This was achieved through a more carefully thought-out motherboard design and memory module placement. We have no complaints about the build quality: the whole structure and the mounting of components are solid, with no play, and replacing components causes no difficulty.







Huawei blade servers are installed in the new E9000 chassis, which is 12U high, so only three of them fit into a standard rack. The chassis holds 16 blade servers, all installed horizontally. Apart from the form factor, the blade servers do not differ from the rack servers.

Huawei Blade Server Line


There are 4-socket configurations in a full-size blade that occupies 2 chassis slots, as well as standard 2-socket configurations that differ in size depending on the number of disks. Blades can connect to the LAN and SAN through the large number of switches offered in the line. Huawei's general idea is to make everything converged.

All Huawei servers, starting with V2, officially support the following operating systems: Windows Server, RHEL, SLES, Solaris, CentOS, Citrix XenServer, VMware ESXi, FusionSphere. To avoid driver problems, the OS should be installed using the FusionServer Tools Service CD. This is a boot image that first copies all the necessary drivers to disk. The Service CD also lets you configure a RAID array on the server's drives and create a logical volume of the desired size. On the whole, everything worked cleanly and flawlessly. Drivers for all officially supported operating systems are updated regularly and are available for download from the portal.

The management tool for the E9000 chassis is called HMM. Since the first version of the MMO, only the graphical interface has changed. The rest of the functionality is, of course, still far from ideal. The old weaknesses remain untouched: uninformative alerts, hard-to-read chassis logs, built-in documentation, information about the server and its components, information on how server interfaces map to the switches, and LDAP authorization that works incorrectly.



iMana and iBMC are the server management web interfaces of the second and third generations respectively. iBMC is a continuation of iMana: redesigned, but with the remaining flaws still in place.

We spent several days in the demo center and, besides the servers, also studied storage systems and software. We pulled cables and disks on the fly here too, although nobody let us fully repeat the inhuman experiments we had carried out on our own equipment. On the fourth day, for example, we wanted to test the equipment for moisture resistance, but for some reason the vendor's employees would not let us unroll the fire hose.

Joking aside, Huawei products left a favorable overall impression despite some rough edges. There are shortcomings in the management interfaces, the English-language documentation still leaves much to be desired, and, obviously, there is not much to be learned from the forums. Otherwise, these are working machines built according to all the modern rules of server hardware.

Laboratory Chronicles


Here are the methods of torture we applied to the servers running Oracle 11g. The sequence of load-testing steps was as follows:
  1. Swingbench with heavy settings. The goal is to check how the system works under a varied load that remotely resembles users working in a spherical application in a vacuum. We installed and configured Swingbench, set the parameters to something demanding, and ran it. If nothing fell over, we gradually increased the load, and so on until performance or resources were completely exhausted.
  2. Large write volumes to the database. The goal is to make sure that the server-OS-DB stack stays healthy under extreme write volumes. To do this, we wrote a script that inserts a large amount of data into a table (something like insert into test_table01 select * from dba_objects a;). We ran this script simultaneously in a large number of threads, each of which frantically wrote to its own table.
  3. Large read volumes from disk. The goal is to make sure that the server-OS-DB stack stays healthy under extreme read volumes (bypassing the cache). To do this, we set a small value for the buffer_cache parameter and created a huge table without indexes. Then we selected from it by a random key value, many times over and in many threads.
  4. Heavy queries with heavy sorts. We wrote the most crooked queries we could against huge tables (using full joins, for example, or simply complex and unindexed ones) and launched many of them in parallel. This is a serious load in itself, and it was interesting to see how the system would cope.
  5. Concurrency, latches, mutexes. The goal was to see how the database works under heavy contention for non-shareable resources. To do this, we modified a) a single row of data from many threads, and b) a large, but always the same, data set.
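
The write-load step (step 2) has a simple shape: many threads, each inserting into its own table and committing in batches. The scripts we actually used are not reproduced here, so the following is only a minimal Python sketch of that shape. It makes one loud assumption so that it runs anywhere: an in-memory SQLite database stands in for each Oracle session. The function names and the table layout are hypothetical.

```python
import sqlite3
import threading
from typing import Dict


def writer(worker_id: int, rows: int, batch: int, results: Dict[int, int]) -> None:
    """Insert `rows` rows into this worker's own table, committing per batch."""
    # ASSUMPTION: in-memory SQLite stands in for a real Oracle session.
    conn = sqlite3.connect(":memory:")
    table = f"test_table{worker_id:02d}"
    conn.execute(f"CREATE TABLE {table} (id INTEGER, payload TEXT)")
    written = 0
    while written < rows:
        upper = min(written + batch, rows)
        chunk = [(i, "x" * 100) for i in range(written, upper)]
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", chunk)
        conn.commit()
        written = upper
    # Record how many rows actually landed in the table.
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    results[worker_id] = count
    conn.close()


def run_write_load(threads: int, rows: int, batch: int = 1000) -> Dict[int, int]:
    """Start `threads` writers in parallel and return rows written per worker."""
    results: Dict[int, int] = {}
    pool = [
        threading.Thread(target=writer, args=(w, rows, batch, results))
        for w in range(1, threads + 1)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results


if __name__ == "__main__":
    print(run_write_load(threads=8, rows=50_000))
```

Against a real database, each worker would open its own session to the server instead, and the thread count and row volume would be raised until the system runs out of resources.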


In lieu of a conclusion


Of the many things an engineer may be tasked with at work, testing new equipment seems just about the most exciting. But in this case we were dealing not just with a new model from a well-known brand but with fundamentally new market players, which may soon take their place in the server fleet. The x86 platform seems familiar but, figuratively speaking, it has to be eaten the Chinese way, with spicy sauce and chopsticks. And that takes skill. We have taken the first step in getting to know Inspur and Huawei servers, and we are not going to stop there. Together with a number of companies that have already approached us about vendor replacement, we are starting a series of pilot deployments in which real workloads will be moved to Chinese hardware. In short, if you are interested, get in touch.

Source: https://habr.com/ru/post/258433/

