
Karmic incompatibility and other vicissitudes of a service engineer's life



Recently we told a couple of instructive stories about how things can go awry no matter how much straw you lay down to cushion the fall. That reminded us of another case, both funny and extremely dramatic, especially for those directly involved. With the passage of time, though, they now recall those nightmarish hours with laughter.

We have one great customer; let's call him the Customer. And we had an engineer working at our company; let's call him N. He was a sensible, competent engineer who could do a lot and had worked successfully with a variety of clients. On one network vendor's products his expertise was rare indeed: specialists like that are hard to find on the market, even for impressive money. But whenever N handled the Customer's requests, it invariably ended in long and painful downtime. And each time the cause was not any lack of knowledge on N's part, but omissions, random errors and oversights. In the end, the Customer asked us to replace him. Again: the engineer was good, and there were no problems with any other client. But as soon as it came to the Customer... God knows why; clearly there was some kind of karmic incompatibility between them.

We had to revoke N's access to all of the Customer's requests and assign another specialist to this account.
So at that time we had two engineers working on this company's systems. The twist was that one day, when the Customer's next request came in, one of them was on vacation and the other fell ill. As an exception, we proposed that engineer N, who already knew the organization's systems, handle it. The Customer agreed: there was a problem, and it had to be fixed. We restored N's access, and he completed the request. Everyone was glad that it had gone smoothly. But literally two hours later came an angry call: engineer N had not kept an eye on the mobile-number segment, and it had gone down... We revoked N's access to this client's infrastructure once again.

The years went by. The storms' rebellious gust / Dispelled the former dreams. The Customer's infrastructure was growing very fast, roughly doubling every year. His own data center had become cramped: the equipment no longer fit, and the load on the network was extreme. The decision was made to move to a new, spacious data center, and the migration project was entrusted to us. We worked on it for six months; we foresaw everything and prepared everything.

Ahead of the approved moving date, we provided a list of the employees who would carry out the work. Engineer N was on it too. The Customer looked at the list and had his doubts: "There will be problems again." However, he agreed, because the scope of work required a team of engineers working on site non-stop for two days. We agreed that N would take part only in the initial installation of the equipment, while other specialists would bring up and configure the core systems.

It should be noted here that the schedule for launching the new data center was very tight. For each hour of downtime past the deadline, we faced seven-figure penalties. In short, we had no room for mistakes or delays.

We arrived at the old data center and dismantled the equipment in two hours. We took the first batch to the new data center and began racking it. To bring up the critical business systems, we quickly connected the equipment to the virtualization system, assembled, launched and tested everything, and then waited for the external links to come up so that the business could start using its applications. In other words, everything was ready, and we were waiting for the telephone gateways to be installed. Two hours remained before the deadline, after which the huge fines would kick in. Everyone was very tense.

To move the virtualization system quickly, a fairly large IBM X series server had been allocated. This was a necessary measure: the Customer had no backup data center at the time of the move and had asked us to come up with something. It took us about a week to find a single server with enough cores to run the business system. It already had plenty of RAM, and we scraped the bottom of the barrel to add even more. By a strange coincidence, in the new data center this server sat in the same rack where the telephone gateways were being mounted. And the mounting was being done by engineer N.

And then, quite suddenly, from a height of 38 units (i.e., more than 1.5 meters), he dropped a 10-kilogram telephone gateway squarely onto the virtualization server, on which the necessary software had already been brought up and configured and was waiting for the external links.

Dead silence fell over the machine room. Next to the rack stood a cart with the monitor used for setting up the server. The monitor went dark. A scattering of warning lights lit up, and the server went into a reboot.

The silence exploded into swearing; engineer N was removed from the machine room and sent off for a long walk. We began checking the server. It did not start on the first attempt, or the second, or the third. The fault indicators were flashing for every processor and memory module. We suspected a cracked motherboard, because the impact had knocked the latched cover clean off the server. We were afraid the server would not start at all. In that case, even with the whole team on it, assembling, cabling and testing a new server would take about 20 hours, and each of those hours would cost the company millions of rubles in fines. It was painful just to look at the project manager.

Within about an hour, the guys from the service center went through the server component by component and, by trial and error, found the faulty parts. Several memory modules were damaged. They replaced those, reseated the rest, and also reseated all the processors and the mezzanine cards. They checked every connector, tugging and wiggling each one. The server started.

We began bringing up the business applications again. The server's resources had been barely sufficient for them to begin with, so we had to limit performance artificially. By the time the applications were up, the external links were up too, and the data center was commissioned two minutes before the deadline.

We made it.

"Do you have many engineers like that?" you may ask. No, N was one of a kind, which is why he has been remembered for years. Still, we drew our conclusions: 1) not every professional will suit a particular customer; 2) karmic incompatibility does exist :). By the way, N now works at another company and is doing well.

Department of Remote Monitoring and Administration of Jet Infosystems

Source: https://habr.com/ru/post/345322/
