
Errors and problems with Big Three servers, part two: HP



We continue our series of articles about the problems we encounter when preparing refurbished servers. Earlier we wrote about Dell servers; this time the subject is HP products. Our engineers solved all of these problems, and they are only a small part of the surprises this vendor's servers can spring. If you maintain servers yourself, our experience may come in handy.

RAM


When upgrading HP servers (and not only them), choosing RAM is often difficult. Practice shows that even experienced system administrators and engineers are not always well versed in the matter. If you install memory modules at random, the server will most likely not start at all. A milder outcome of an incorrect RAM configuration is also possible: the machine works, but below its maximum performance.

For multiprocessor HP servers you should, as a rule, use only registered memory with error correction (ECC RDIMM), and for single-processor servers unbuffered memory with ECC (ECC UDIMM). Although the official manuals say that UDIMMs can be installed in multiprocessor servers, it is not worth doing, for several reasons:
  1. Memory limit. Typically this is 24-32 GB per CPU.
  2. UDIMM modules, as a rule, must be genuine HP parts; otherwise the server may restart spontaneously. This behavior has been recorded on at least three models: DL380p Gen8, DL360e Gen8 and ML310e Gen8 v2. RDIMM memory from any vendor, by contrast, can be installed without trouble.

The advantage of UDIMM memory is that it is slightly faster than RDIMM, where the register adds latency. However, with a proper memory configuration in multichannel systems, RDIMM can outperform unbuffered memory. RDIMM and UDIMM modules cannot be installed at the same time.

You can tell UDIMM from RDIMM by the label. For example, if it reads 12800R, the module is registered memory; if it reads 12800E, it is unbuffered with ECC.

When installing RDIMMs, prefer single- and dual-rank modules (1Rx4, 2Rx4). Unlike, say, IBM (Lenovo) servers, HP servers are sensitive to memory configuration. When installing modules, distribute memory evenly between the server's processors and between channels. Otherwise the server may simply not turn on, or its performance will be reduced. Module voltage is not critical in HP servers, but still try to use modules of the same voltage.

The optimal placement of RAM in the DIMM slots is always shown under the server's cover and in the official manual.
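To illustrate what "evenly between processors and channels" means, here is a minimal Python sketch. It is not HP's population rule: the defaults of two CPUs with four memory channels each, the function names and the balance criterion are all assumptions made for the example. The actual slot order for a given model is the one printed under the server cover.

```python
from collections import defaultdict


def spread_dimms(dimm_count, cpus=2, channels_per_cpu=4):
    """Assign DIMMs round-robin: alternate CPUs first, then move to the
    next channel, so processors and channels fill up evenly.

    cpus and channels_per_cpu are assumed defaults, not HP data.
    """
    layout = defaultdict(int)  # (cpu, channel) -> number of DIMMs
    for i in range(dimm_count):
        cpu = i % cpus
        channel = (i // cpus) % channels_per_cpu
        layout[(cpu, channel)] += 1
    return dict(layout)


def is_balanced(layout, cpus=2, channels_per_cpu=4):
    """Hypothetical criterion: every CPU holds the same number of DIMMs
    and no channel holds more than one DIMM more than any other."""
    per_cpu = [
        sum(n for (c, _), n in layout.items() if c == cpu)
        for cpu in range(cpus)
    ]
    per_channel = [
        layout.get((c, ch), 0)
        for c in range(cpus)
        for ch in range(channels_per_cpu)
    ]
    return len(set(per_cpu)) == 1 and max(per_channel) - min(per_channel) <= 1
```

With these defaults, eight modules end up one per channel on both processors, while an odd number of modules, or all modules on one CPU, is reported as unbalanced.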

Note that HP servers prior to Gen9 do not support DDR4 memory, so first check which memory is compatible with your model. To select the correct configuration, you can use the company's online configurator.

When it comes to upgrading or repairing servers, the eternal question of component manufacturer comes up. Some people use only original components regardless of cost, while others choose compatible components from third-party manufacturers. We believe the following should be taken into account:


HP servers can safely use memory from different manufacturers. The main thing is that the modules have the same technical parameters. For example, if several 4 GB 1Rx4 PC3L-10600R modules are already installed in the server, expand the capacity with memory that has the same parameters. The manufacturer can be anyone.
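The "same parameters, any manufacturer" rule can be checked mechanically. The sketch below parses DDR3-style labels such as the PC3L-10600R example above and compares modules while ignoring the brand. The regular expression, the `ModuleSpec` type and the function names are illustrative assumptions; only the R and E suffixes mentioned in this article are recognized.

```python
import re
from typing import NamedTuple

# Matches labels like "4GB 1Rx4 PC3L-10600R" (DDR3-style; an assumption).
LABEL_RE = re.compile(
    r"(?P<size>\d+)\s*G[Bb]\s+"
    r"(?P<ranks>\d)Rx(?P<width>\d+)\s+"
    r"(?P<gen>PC\d?L?)-(?P<speed>\d+)(?P<kind>[RE])",
    re.IGNORECASE,
)

# Suffix after the speed figure: R = registered, E = unbuffered with ECC.
KINDS = {"R": "RDIMM", "E": "ECC UDIMM"}


class ModuleSpec(NamedTuple):
    size_gb: int
    ranks: int
    width: int
    generation: str  # e.g. "PC3" or "PC3L"
    speed: int       # e.g. 10600
    kind: str        # "RDIMM" or "ECC UDIMM"


def parse_label(label):
    """Extract the technical parameters from a module label."""
    m = LABEL_RE.search(label)
    if m is None:
        raise ValueError(f"unrecognised module label: {label!r}")
    return ModuleSpec(
        int(m["size"]), int(m["ranks"]), int(m["width"]),
        m["gen"].upper(), int(m["speed"]), KINDS[m["kind"].upper()],
    )


def same_parameters(labels):
    """True if all labels describe modules with identical parameters."""
    return len({parse_label(label) for label in labels}) <= 1
```

Because the manufacturer name is not part of `ModuleSpec`, two modules from different vendors with the same label parameters compare as equal, while mixing an R module with an E module does not.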

Drives


When choosing new drives for a server, it is harder to make a mistake than when changing the memory configuration. Still, there are pitfalls here too, and some myths.

There is a belief that HP servers require drives bought exclusively from HP. The justification is that all drives with the HP logo carry HP-specific firmware. "Native" drives are also significantly more expensive, and frankly, overpaying 2-2.5 times is a dubious pleasure. However, Hewlett-Packard does not manufacture drives itself; it orders them from other vendors. And as experience shows, many HP server models work perfectly well with HGST, Toshiba, Seagate and Western Digital products.

When choosing drives, check which drives your server's RAID controller supports. Some controllers do not support SAS drives, and drives larger than 2-3 TB may not be supported either.
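The two controller checks just mentioned (SAS support and maximum drive size) can be written down as a small pre-purchase check. This is only a sketch: the `ControllerLimits` and `Drive` types are hypothetical, and the real limits have to come from the documentation for your specific controller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerLimits:
    """Capabilities of a RAID controller (values must come from its docs)."""
    supports_sas: bool
    max_drive_tb: float


@dataclass(frozen=True)
class Drive:
    interface: str      # "SAS" or "SATA"
    capacity_tb: float


def drive_problems(drive, limits):
    """Return the reasons a drive will not work with a controller."""
    problems = []
    if drive.interface.upper() == "SAS" and not limits.supports_sas:
        problems.append("controller does not support SAS drives")
    if drive.capacity_tb > limits.max_drive_tb:
        problems.append(
            f"{drive.capacity_tb} TB exceeds the controller limit "
            f"of {limits.max_drive_tb} TB"
        )
    return problems
```

An empty list means neither of the two known restrictions applies; it does not guarantee compatibility.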

If the server does not see a third-party drive once it is connected, the cause is most often a fault in the drive or in the RAID controller. One more important detail: never put desktop-class disks in enterprise servers. Based on our experience, we can single out several of the most popular non-native disk models that work without any problems on servers from G7 to Gen9:


Processors


When replacing processors with more powerful ones, check the server specification to find out which processor models it supports. Do not forget to take into account the TDP supported by the heatsink and the TDP of the CPU itself. In most cases this helps to avoid potential problems.
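The two checks described above, the supported model list and the TDP, fit in a few lines of Python. The function name and parameters are hypothetical, and any wattages passed in are placeholders to be replaced with figures from the server and CPU specifications.

```python
def cpu_upgrade_problems(cpu_model, cpu_tdp_w, supported_models, heatsink_max_tdp_w):
    """Return the reasons a CPU should not be installed (empty if none).

    supported_models comes from the server specification;
    heatsink_max_tdp_w is the TDP the installed heatsink is rated for.
    """
    problems = []
    if cpu_model not in supported_models:
        problems.append(f"{cpu_model} is not in the server's supported CPU list")
    if cpu_tdp_w > heatsink_max_tdp_w:
        problems.append(
            f"CPU TDP {cpu_tdp_w} W exceeds heatsink rating {heatsink_max_tdp_w} W"
        )
    return problems
```

For example, `cpu_upgrade_problems("E5-2690", 135, {"E5-2609", "E5-2690"}, 95)` reports only the heatsink problem, given those placeholder wattages.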

However, when adding processors, never skip installing a cooler on each of them in the hope that the server room's air conditioning will cope. Each fan cools specific zones of the motherboard. Without proper cooling, the risk of the processors and RAM overheating, up to a server outage caused by melted or burnt electronic components, increases many times over.

After installing two processors that are far more powerful than the single stock one, the server may not turn on. We saw this, for example, with an HP ML350p Gen8. The reason is that the motherboards of some models have a fuse that cuts the power supply if the required voltage exceeds a certain baseline threshold. If this lockout trips, the only option is to replace the motherboard. If the server is out of warranty, this can cost a pretty penny, since HP is known for its rather high hardware prices.

However, there is a technique for getting around this protection. Suppose that instead of one or two entry-level E5-2609 (v1/v2/v3) processors you need to install two high-performance E5-2690 (v1/v2/v3). To avoid problems during the upgrade, it is best to proceed as follows:

  1. Update all firmware and software to the latest versions (iLO, BIOS, AHS, etc.).
  2. Wait until the server fully initializes with both E5-2609 installed.
  3. Install two "intermediate" processors, for example E5-2640, and wait until the POST check completes.
  4. Only after that install the desired E5-2690.

Do not forget that all firmware must be at the latest version.

Intelligent Provisioning and Server Update


HP ProLiant Gen8 and Gen9 servers use the powerful Intelligent Provisioning tool to configure the server, update the firmware of some components, and control the machine's hardware. Sometimes an update attempt fails with an error saying that it cannot connect to the HP database. The cause is an outdated version of Intelligent Provisioning itself. You can update it as follows:

  1. For Gen8, download the Intelligent Provisioning recovery media image, version 1.62b; for Gen9, download the latest version.
  2. Mount the image via iLO or burn it to a CD/DVD. Do not write the image to a USB flash drive: Intelligent Provisioning will not be updated when run from one.
  3. While the server is booting, select the One Time Boot to CD-ROM option.
  4. Once the server has booted from the disc (or image), on Gen9 select Interactive HP Intelligent Provisioning recovery media from the menu. On a Gen8 server the update starts automatically.
  5. On the next screen, click the Reinstall Intelligent Provisioning button, wait for it to finish, and reboot normally (Gen9 only).

Many owners of Gen8 and Gen9 servers try to update the BIOS using Intelligent Provisioning. But this tool can update only the firmware of iLO, the network card (Ethernet) and, in some cases, the RAID controller.

There are two options for a full server update.

  1. Manually download and install all the necessary drivers and firmware for your server model. This option is convenient if there is only one server and it already has an OS.
  2. If there are several servers and Windows is deployed on them, it is more practical to use the Service Pack for ProLiant (SPP).
    • Download the service pack image.
    • Install the HP USB Key Utility for Windows.
    • Use this utility to write the service pack image to a flash drive with a capacity of at least 8 GB.
    • Boot the server from the flash drive. We recommend choosing Interactive Firmware Update so that you can control the update process.
    • Once the client has loaded, select Update Firmware. After the hardware has been checked, the system offers a list of updates, which are installed when you click the Deploy button.
    • After the update is complete, you must reboot. The server will turn on and off several times while installing the firmware, after which it boots normally.

Network adapters are not detected


If you update the Emulex network adapter drivers from version 3.x.x straight to version 10.x.x, the network adapters may no longer be detected after a restart. To prevent this, first install Emulex 4.x.x and then the latest version. There is another way to avoid the error: first update from the OneConnect image, and then with the Service Pack for ProLiant. If the adapters are already no longer detected, simply update from the OneConnect image.
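The staged upgrade rule can be expressed as a small helper that returns the major versions to install in order. The function name and its defaults (4 as the intermediate version, 10 as the target) are taken from the example in this section and are otherwise assumptions; it does not talk to any Emulex tool.

```python
def emulex_upgrade_path(current_major, target_major=10, intermediate_major=4):
    """Return the major driver versions to install, in order.

    Jumping from below the intermediate version straight to the target
    is the case reported to make adapters disappear, so the
    intermediate version is inserted first.
    """
    if current_major >= target_major:
        return []  # already at or beyond the target
    if current_major < intermediate_major:
        return [intermediate_major, target_major]
    return [target_major]
```

For a server on 3.x.x the helper suggests installing 4.x.x and then 10.x.x; for one already on 4.x.x, only the final step remains.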

A "feature" of HP DL360p Gen8 servers


This model was originally designed for first-revision E5-26xx processors, but in 2013 Intel released the second iteration, v2. Vendors, including HP, began updating their lines. Dell and IBM did not change the engineering base; their motherboards merely received a different part number. HP went another way. As a result, there are two HP DL360p models on the market that are identical except for the heatsink mounts: the first version uses a lever mount, the second a screw mount.

A trifle, really. However, it can lead to extra costs. So if you decide to install a second processor, be sure to find out your server's revision (by serial number, or by looking under the cover).
Part number of the old lever-mount heatsink: 654770-B21.
Part number of the new screw-mount heatsink: 712731-B21.

Insufficient number of power supplies


Some owners of HP servers with an x4 redundant power supply (RPS) backplane, for example the ML350 Gen9, wonder why starting the machine requires at least three power supplies to be connected, when their total power significantly exceeds the server's maximum actual consumption.

The point is that an ML350 Gen9 can hold up to 9 PCI-E cards and up to 6 HDD backplanes (or, for example, an internal tape drive plus 5 HDD backplanes), and all of this can draw a lot of power. RPS backplanes provide the server with reserve power in case of a sharp increase in load, and hence in energy consumption. Power supplies are connected to the backplane on an N-1 scheme, where N is the total number of connectors. If the server needs reserve power, power supplies must be connected to all of the backplane's connectors. If reserve power is not required, starting the server takes three power supplies with an x4 backplane and one with an x2 backplane.
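The N-1 rule is simple arithmetic, shown here as a minimal Python sketch. The function names are made up for the illustration, and the rule is applied exactly as stated in the text above.

```python
def min_psus_to_start(backplane_connectors):
    """Minimum number of power supplies needed to start the server
    under the N-1 scheme, where N is the number of backplane connectors."""
    if backplane_connectors < 2:
        raise ValueError("an RPS backplane has at least two connectors")
    return backplane_connectors - 1


def has_reserve_power(installed_psus, backplane_connectors):
    """Reserve power is available only when every connector is populated."""
    return installed_psus == backplane_connectors
```

For an x4 backplane this gives three power supplies to start and four for reserve power; for an x2 backplane, one and two respectively.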

IPMI management error


IPMI can be used to manage servers remotely. Sometimes it is impossible to establish a connection with the server's IPMI service:

ipmitool -I lanplus -H $ip -U $user -P $pass
Error: Unable to establish IPMI v2 / RMCP+ session

There can be two reasons:

  1. The service is disabled for security reasons: IPMI v2 has a potential RAKP vulnerability (Remote Password Hash Vulnerability). You need to re-enable the service.
  2. The account in use has no administrator rights. In this case, grant the account the appropriate rights.
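When many servers are checked from a script, the session error shown above can be detected automatically. The following Python sketch wraps `ipmitool` with the same options as in the example. The `chassis status` subcommand in the usage note, the function names and the hint text are assumptions for the illustration, and `ipmitool` must be installed for `run_ipmi` to work.

```python
import subprocess

# Error text quoted earlier in this section.
SESSION_ERROR = "Unable to establish IPMI v2 / RMCP+ session"


def ipmi_command(host, user, password, *args):
    """Build the ipmitool command line for a lanplus session."""
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, *args]


def diagnose(output):
    """Map ipmitool output to a hint based on the two causes listed above."""
    if SESSION_ERROR in output:
        return ("session refused: check that the IPMI service is enabled "
                "and that the account has administrator rights")
    return "ok"


def run_ipmi(host, user, password, *args):
    """Run ipmitool and return a diagnosis of its combined output."""
    proc = subprocess.run(ipmi_command(host, user, password, *args),
                          capture_output=True, text=True)
    return diagnose(proc.stdout + proc.stderr)
```

A call such as `run_ipmi("10.0.0.5", "admin", "secret", "chassis", "status")`, with a hypothetical address and credentials, returns the hint when the session cannot be established.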


Random server reboots


This problem is rare and shows up as random spontaneous reboots of the server. There are no errors in the OS logs, and the iLO logs usually show nothing critical either. In such situations, software updates and replacing the power supply or the UPS cable usually do not help. The problem is solved by changing the power management settings in the server BIOS. In short, all CPU clock reduction mechanisms are disabled:


Failure after shutting down the server


We have seen several cases where the LEDs light up when the server is switched on, but there is no video signal. The machine does not respond to ping and iLO does not respond, although the LEDs show that iLO and Ethernet are active. The keyboard and mouse do not work. More often than not this happened after a normal shutdown of the server, with no manipulations and no power failures. The failure has been seen on servers from Gen5 to Gen8.

Neither an exact solution to this problem nor its causes have been found so far. In one case it helped to set all the “System Maintenance Switch” switches to ON and, after a while, back to OFF. Once a server came back to life after its memory modules were swapped. Unfortunately, in several cases the servers could not be restored.

Loud cooling system noise


This problem showed up most often on ML350e Gen8 servers. Immediately after the server is switched on, the fans spin up to high speed, and the speed does not drop under any load. The result is a constant, high noise level.

In some cases the problem was solved by removing PCI-E expansion cards: network cards and USB hubs. But the problem also occurred in servers with no expansion cards installed. Removing and reinstalling all the fans and their cages, and reconnecting the power wires, helped several times. Once the fans returned to normal speed after a firmware update and an iLO reset. There was also a case where the cooling setting in the BIOS had been changed, and it was enough to switch the value from Increased to Optimal Cooling.

Resetting the configuration on Gen8 servers


Finally, we want to describe not an error but a feature of HP Gen8 and Gen9 servers: their motherboards have no conventional configuration reset jumpers. If you need a reset, it can be done as follows:

  1. Shut down the server and unplug the power cord.
  2. On the motherboard, locate the group of small switches labelled “System Maintenance Switch” (see the diagram on the inside of the server cover).
  3. Using a thin tool (a pen, a sewing needle, etc.), set switch number 6 to ON.
  4. Connect the power cord to the server.
  5. If an image appears on the screen and the reset process begins, wait until the NVRAM clear procedure completes and the server restarts. If nothing is displayed on the screen for a long time after the power cord is connected, turn off the server.
  6. Shut down the server and unplug the power cord.
  7. Return switch number 6 to the OFF position.

Installing a second RAID controller on Gen8 and Gen9 servers


When a second RAID controller is installed (for example, one RAID for the system and a second for data), the server may hang during the OS boot phase or fail POST. This is most often caused by an incorrect boot order.

To solve the problem, configure the following:


Benefits of HP Servers


It would be unfair to talk only about the problems of HP servers; it is not for nothing that this manufacturer's products are so popular. ProLiant servers are considered among the best in their class and will certainly be remembered for their reliability rather than for iLO dropping out and somewhat inflated prices. It is HP that often sets the bar in server functionality and fault tolerance by offering unconventional but effective engineering solutions.

Here are just a few of the benefits of HP servers:


If you have run into errors on HP servers and ultimately beat them, share your experience in the comments. Thank you.

Source: https://habr.com/ru/post/302360/

