Sooner or later, every administrator deals with a server on the other side of the city that stops responding. Then, on a Friday evening, you have to drive across the muddy spring city, because the patient is unreachable over the Internet and there is nobody competent on site. I remember how painful it was, in one such case, to see "Press F1 to continue" on the screen: the UPS had failed, and after the reboot the old ProLiant decided that its "Power Supply Failed". Of all things, someone had forgotten to disable the BIOS prompt that waits for a keypress!
This will not be an ode to IPMI, though, but a collection of my own pitfalls and tricks for remotely recovering network and server hardware.
For example, I am too lazy to travel far just to update a router's firmware or add a few rules to the border firewall, so I do it remotely. The key is to treat the process with extreme care and think every step through, so as not to suddenly lose access to the configuration console, or to the network altogether.
A backup link through an additional router, or automatic rollback of bad settings, protects against such failures. The rollback option is more interesting, so I'll say a bit more about how it works on the devices I have had my hands on.
The D-Link DFL series has little in common with ordinary D-Link gear and its strange glitches: both the devices and their software were developed by Clavister.
These devices have the simplest protection mechanism: if, after the settings are applied, the administrator cannot reconnect to the web interface, the new parameters are not saved. Rock-solid, with no extra steps required.
The products of our Baltic brothers at MikroTik have Safe Mode. Since every configuration change is applied immediately, changes should be made only in Safe Mode, after which the device must be properly switched back to normal mode. The new configuration is committed only after you leave Safe Mode.
If Safe Mode is not turned off before the configuration session ends, then after 9 minutes the configuration is reverted to the one that was in effect before Safe Mode was enabled.
Relatively recently, Cisco added a mechanism that automatically rolls back settings on a timer, provided a configuration archive has been set up. Before making important changes, you simply set the time after which the device will revert them:
configure terminal revert timer X
Where X is the time in minutes.
Once the settings have been applied successfully, cancel the timer with the configure confirm command. For more examples and details, see the Cisco documentation.
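The DFL, Safe Mode and Cisco revert-timer mechanisms above share one idea: a change is applied tentatively and rolled back automatically unless it is confirmed in time. The Python sketch below models only that idea in the abstract; it does not talk to any real device, and the class `ConfirmedChange` and its methods are hypothetical names chosen for illustration. The clock is injectable so the timeout can be exercised without actually waiting nine minutes.

```python
import time
from typing import Callable, Optional


class ConfirmedChange:
    """Toy model of 'apply, then confirm or roll back' (not a device API)."""

    def __init__(self, config: dict, clock: Callable[[], float] = time.monotonic):
        self.config = dict(config)
        self._clock = clock
        self._backup: Optional[dict] = None
        self._deadline: Optional[float] = None

    def apply(self, changes: dict, revert_after_s: float) -> None:
        # Keep a copy of the known-good config before touching anything.
        self._backup = dict(self.config)
        self.config.update(changes)
        self._deadline = self._clock() + revert_after_s

    def confirm(self) -> bool:
        # A confirmation only counts if it arrives before the deadline.
        if self._deadline is None or self._clock() > self._deadline:
            return False
        self._backup = None
        self._deadline = None
        return True

    def tick(self) -> bool:
        """Called periodically by the 'device'; returns True if it rolled back."""
        if self._deadline is not None and self._clock() > self._deadline:
            self.config = self._backup  # restore the pre-change config
            self._backup = None
            self._deadline = None
            return True
        return False
```

With a fake clock such as `now = [0.0]` and `clock=lambda: now[0]`, calling `apply(..., revert_after_s=540)` and then moving `now[0]` past 540 makes the next `tick()` restore the old settings, which roughly mirrors what happens when the admin never gets back in to confirm.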
Rolling back a failed firmware update is harder. If there is no fully independent channel to the remote branch whose router you want to update, it is better to take a trip and do it in person.
If a particular server has no IP-KVM, or the link is too narrow to work with an ISO image remotely, consider alternative ways of loading recovery tools. A primitive solution is to leave a bootable USB flash drive with a set of familiar utilities next to the server rack. This emergency kit can be put to use with the help of Smart Hands: the most "technical" employee at the branch, who presses the buttons under your direction.
Of course, the most popular way to boot remotely starts with the web interface of the BMC.
A BMC (Baseboard Management Controller) is a small computer built into the server motherboard that runs autonomously, even when the rest of the system is powered off. It is used to monitor and control the server over several protocols, the best known of which is IPMI. Depending on the server vendor and model, the module may also provide console access via IP-KVM. Well-known BMC implementations include:
HPE iLO.
Lenovo IMM.
Dell iDRAC.
For example, for HPE ProLiant DL380 Gen9, it looks like this:
Even more interesting, though, is booting recovery tools over PXE:
There are plenty of guides for this scenario, so I'll just note that when the BMC has no IP-KVM, the image must be configured to start without asking any questions.
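One common way (not described in this article) to start a WinPE-based image such as DaRT over the network with no prompts at all is iPXE together with wimboot. The Python sketch below only renders such an iPXE script; the `kernel`/`initrd`/`boot` sequence follows the canonical example in the wimboot documentation, while the base URL, directory layout and function name are assumptions to adapt. It also assumes the script is fetched after iPXE has already obtained an address via DHCP.

```python
def ipxe_wimboot_script(base_url: str) -> str:
    """Render an iPXE script that boots a WinPE image (e.g. DaRT) unattended."""
    base = base_url.rstrip("/")
    # (path under base_url, file name wimboot should see); the layout is an assumption
    files = [
        ("bootmgr", "bootmgr"),
        ("boot/bcd", "BCD"),
        ("boot/boot.sdi", "boot.sdi"),
        ("sources/boot.wim", "boot.wim"),
    ]
    lines = ["#!ipxe", f"kernel {base}/wimboot"]
    lines += [f"initrd {base}/{path} {name}" for path, name in files]
    # No menu, prompt or timeout: iPXE goes straight to booting the image,
    # which matters when there is no IP-KVM to answer questions.
    lines.append("boot")
    return "\n".join(lines) + "\n"
```

For example, `ipxe_wimboot_script("http://192.0.2.10/dart/")` (a documentation-range address standing in for your boot server) yields a script you would place where your DHCP/TFTP setup hands it to iPXE.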
You don't need a dedicated server for network booting. For example, the already mentioned MikroTik routers can handle serving the image over the network. With them, you can even build a ready-made "rescue module" with a backup wireless link and a choice of images.
Then the IPMI command to enable PXE boot will look something like this:
ipmitool -I lanplus -H <ip> -U <user> -P <pass> chassis bootdev pxe
Other ways to control booting via IPMI are described in this article.
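Without an IP-KVM, the boot-device change is normally followed straight away by a reset, because the IPMI specification lets the BMC discard a one-time boot flag if no restart follows within about 60 seconds. A minimal Python sketch of that sequence, assuming `ipmitool` is installed and the BMC speaks IPMI v2.0 (`-I lanplus`); the helper names are hypothetical. It passes the password through the `IPMI_PASSWORD` environment variable (`-E`) instead of `-P`, so it does not show up in the process list.

```python
import os
import subprocess
from typing import List


def ipmitool_cmd(host: str, user: str, *args: str) -> List[str]:
    """Build an ipmitool command line for a remote BMC (IPMI v2.0, lanplus)."""
    # -E: read the password from the IPMI_PASSWORD environment variable,
    # so it is not visible in the process list the way -P <pass> is.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


def pxe_reboot(host: str, user: str, password: str) -> None:
    """Set a one-time PXE boot and hard-reset the server right away."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    # 1. Next boot from the network (one-time unless options=persistent is added).
    subprocess.run(ipmitool_cmd(host, user, "chassis", "bootdev", "pxe"), env=env, check=True)
    # 2. Hard reset, not a graceful shutdown; use "power on" instead if the box is off.
    subprocess.run(ipmitool_cmd(host, user, "chassis", "power", "reset"), env=env, check=True)
```

Keeping the command builder separate from `subprocess.run` makes it easy to print and double-check the exact command before firing it at a production BMC.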
The image itself deserves attention. Besides giving network access to the booted system, don't forget to include your favorite set of recovery tools. I usually use Microsoft DaRT, so I'll describe it in more detail.
DaRT can repair almost any broken Windows system, but many people use it only locally. To fix things remotely, someone on site must run the Remote Connection tool on the patient and give you the ticket data (IP address, port and ticket number). You then connect to the system with the DaRT Remote Connection Viewer and carry out the necessary recovery procedures. Once Remote Connection has started, the ticket data is written to Windows\System32\inv32.xml.
A ready-made image from the Internet will not start a remote connection automatically, so you will have to build one yourself:
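Download the DaRT image creation tool (the DaRT Recovery Image Wizard);
At the Remote Connection step, enable remote connection;
Create the image and mount it;
In the mounted image, edit Windows\System32\winpeshl.ini so that it contains: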
[LaunchApps]
"%windir%\system32\netstart.exe -network -remount"
"cmd /C start %windir%\system32\RemoteRecovery.exe -nomessage"
<on this line, add a command that copies %windir%\system32\inv32.xml to a network folder or sends it to the admin>
"%windir%\system32\WaitForConnection.exe"
"%SYSTEMDRIVE%\sources\recovery\recenv.exe"
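The editing step can be scripted. A minimal Python sketch, assuming the DaRT image is a WIM file with a single image at index 1 and that a plain `copy` to an SMB share is how you deliver the ticket; the share path, the exact copy command and the helper names are hypothetical. Note that RemoteRecovery.exe is started asynchronously, so inv32.xml may not exist yet at the instant the copy runs; a delay or retry may be needed, and this sketch does not handle it.

```python
from pathlib import PureWindowsPath
from typing import List, Optional

# The [LaunchApps] entries shown above; None marks where the copy command goes.
LAUNCH_APPS: List[Optional[str]] = [
    r'"%windir%\system32\netstart.exe -network -remount"',
    r'"cmd /C start %windir%\system32\RemoteRecovery.exe -nomessage"',
    None,
    r'"%windir%\system32\WaitForConnection.exe"',
    r'"%SYSTEMDRIVE%\sources\recovery\recenv.exe"',
]


def render_winpeshl(share: str) -> str:
    """Return winpeshl.ini text with a (hypothetical) copy-to-share command."""
    # ASSUMPTION: a plain `copy` to an SMB share that is reachable from WinPE.
    copy_cmd = rf'"cmd /C copy /Y %windir%\system32\inv32.xml {share}"'
    lines = [copy_cmd if entry is None else entry for entry in LAUNCH_APPS]
    return "[LaunchApps]\n" + "\n".join(lines) + "\n"


def winpeshl_path(mount_dir: str) -> PureWindowsPath:
    # WinPE reads winpeshl.ini from Windows\System32 inside the image.
    return PureWindowsPath(mount_dir, "Windows", "System32", "winpeshl.ini")


def dism_commands(wim: str, mount_dir: str) -> List[List[str]]:
    """DISM calls to mount the image and, after editing, commit the changes."""
    # ASSUMPTION: the DaRT WIM holds a single image at index 1.
    mount = ["Dism", "/Mount-Image", f"/ImageFile:{wim}", "/Index:1", f"/MountDir:{mount_dir}"]
    commit = ["Dism", "/Unmount-Image", f"/MountDir:{mount_dir}", "/Commit"]
    return [mount, commit]
```

On an elevated Windows admin machine you would run the mount command, write `render_winpeshl(...)` in text mode to `winpeshl_path(mount_dir)`, and then run the commit command.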
After that, unmount the image with the changes committed and reassemble it. All that remains is to take inv32.xml from the network share and enter its data into the DaRT Remote Connection Viewer.
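On the admin side, a small poller saves you from refreshing the share by hand while the patient boots. This Python sketch assumes the share is reachable as a local path (mounted drive or UNC path) and that the ticket file keeps the name inv32.xml; the function name and defaults are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def wait_for_ticket(share: Path, timeout_s: float = 600, poll_s: float = 5) -> Optional[Path]:
    """Poll the share until inv32.xml appears; return its path, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    ticket = share / "inv32.xml"
    while True:
        # Skip a zero-length file that may still be in the middle of being copied.
        if ticket.is_file() and ticket.stat().st_size > 0:
            return ticket
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)
```

Once it returns a path, open the file to read the IP address, port and ticket number for the Remote Connection Viewer.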
Almost every server manufacturer ships its own hardware management module (BMC). Some, however, sell part of its features separately, so either buy the license or learn the free toolset. Console commands, for example, are usually available at no extra cost.
If the BMC has IP-KVM, in most cases you can restart the system and boot from the network or from an ISO image, all without resorting to the risky Smart Hands resource. By the way, “the most intelligent employee” may well say something like “We learned German at school, I don’t understand what’s written here”, and the whole crystal world of your disaster recovery plan shatters.
But there are pleasant exceptions. In one emergency, I asked them to put the most computer-savvy employee on the phone. The strong Eastern accent on the other end of the line immediately put me in a pessimistic mood... But it turned out that Fayzulla had graduated from a technical university in Tashkent, worked nights as a freelance programmer, and spent his days working in a warehouse: resting his brain, so to speak, and saving on a gym membership.
A BMC connection should be thought of as a kind of backdoor, one that differs from a vulnerability only in that it serves your own goals. So exposing it to a public network is not the best idea. Even inside the local network, management interfaces should be moved to a separate VLAN reachable only from admin machines. For a remote site, you should also set up VPN access.
The ports used for the graphical console and for Remote Media forwarding by three popular vendors are as follows:
| Function | IMM port | iLO port | iDRAC port |
| Remote media | 3900 | 17988 | 5900 |
| Remote console | 3900 | 17990 | 5900 |
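To see which of these ports actually answer, a plain TCP connect is enough. The Python sketch below is useful from both sides: from an admin workstation, to confirm the console ports are reachable, and from an ordinary user segment, to confirm they are not (the VLAN isolation recommended above). Vendor defaults can be changed in the BMC settings, so treat the port numbers you pass in as assumptions to verify; `check_ports` is a hypothetical helper name.

```python
import socket
from typing import Dict


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP handshake to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out (socket.timeout is an OSError)
        return False


def check_ports(host: str, ports: Dict[str, int], timeout: float = 2.0) -> Dict[str, bool]:
    """Probe each named port, e.g. {"remote_console": 5900} for iDRAC."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}
```

A TCP check only proves that something is listening; it does not confirm that the service behind the port is the BMC console you expect.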
More complete information on network settings can be found in each vendor's official documentation.
When planning a network configuration, it is worth remembering that the BMC usually operates in one of three modes:
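Dedicated - the port is used for management only;
Shared - IPMI runs over the LAN1 network interface;
Failover - the dedicated port is used when it is connected, otherwise the BMC falls back to the shared interface.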
In Shared or Failover mode, the BMC can accidentally end up exposed on the common network, so use a dedicated interface wherever possible.
A GSM power socket can serve as an additional lever of control:
A power management module with a SIM card lets you restart any connected device regardless of whether the main Internet link is up. But such a restart can hurt the integrity of server data, because it is always a “hard” power cycle. For switches and routers, however, this SMS-controlled emergency switch is very convenient.
If a server was bought without any BMC at all, there are a couple of fallback options:
A standalone IP-KVM device, to which many "unmanaged" servers can be connected;
A BMC card on the PCIe bus that can be installed in almost any machine.
By Murphy's law, the Internet link or the router may fail at the same time as the server. So it is a good idea to use several separate devices rather than a single router with two uplinks. A good option is a 3G/LTE modem with VPN tunnel support: these networks are fast enough even for a remote desktop session.
In an era of universal “digital curiosity”, relying on a leaky server management protocol, and one running over UDP at that, is archaic. Besides convenient web interfaces, manufacturers often provide alternative management methods:
For example, the PowerShell command to reboot using WS-Management looks like this:
Restart-Computer -ComputerName "Server01" -Protocol WSMan -WSManAuthentication Kerberos
reset
Almost all vendors provide such alternative interfaces, but they are used more often by monitoring tools than by administrators themselves.
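One such alternative interface, not named in the text above, is the DMTF Redfish REST API, which current iLO, iDRAC and XClarity controllers implement. The Python sketch below only builds a reset request for the standard `ComputerSystem.Reset` action; the system ID differs by vendor, the set of accepted `ResetType` values depends on the BMC, and the function name and example credentials are hypothetical.

```python
import base64
import json
import urllib.request


def redfish_reset_request(bmc: str, user: str, password: str,
                          system_id: str = "1",
                          reset_type: str = "ForceRestart") -> urllib.request.Request:
    """Build (without sending) a Redfish ComputerSystem.Reset POST request."""
    # system_id varies by vendor: "1" on iLO, "System.Embedded.1" on iDRAC;
    # GET /redfish/v1/Systems lists the valid ones on your BMC.
    url = f"https://{bmc}/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"ResetType": reset_type}).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Sending it is `urllib.request.urlopen(req, timeout=10, context=ctx)`. Since BMCs often ship self-signed certificates, build `ctx` with `ssl.create_default_context(cafile=...)` pointing at the BMC's certificate rather than disabling verification. Unlike IPMI over LAN, this runs over TCP and TLS.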
Of course, cluster solutions protect against many of these problems. But we don't live in an ideal spherical world, and not every service can be made redundant. So the finer points of “tricky” recoveries come in handy for everyone at least once.
You probably have your own tricks for when "the ping is gone and the site is far away": share them with colleagues in the comments.
Source: https://habr.com/ru/post/315710/