Among other things, we provide support over WhatsApp. It's evening, nothing hints at trouble, and suddenly a video call window pops up. Close-up: telecom equipment installed at a customer site... and it is on fire. Literally. You can see the flames; it looks like the insulation on the wires around the power supply is burning. The man asks what to do. I shout:
- Put it out!
He says:
- Am I allowed to?
- Yes, you are!
And only then does he put it out.

It turned out he knew that not everything can be put out with ordinary means: the equipment can hit back with a couple of tens of thousands of volts, or the extinguishing can disrupt important equipment. So he saw the fire, called support, and while the call was connecting he found a fire extinguisher and got it ready.
Anyway, hello, Habr! I work in a remote technical support team, and we talk a lot to users all over the country and abroad. And they do some pretty weird things. The stories are below.
What we do and what this is
CROC can take over support for offices, production sites and individual services. We have been doing this for many, many years. There is a call center team that answers with standard scripts and helps in typical situations; a second line (me and my colleagues) that handles difficult cases where you need to dig into the network, a server or application software; and field engineers who travel out and replace hardware. Plus a reboot team in every city, but more on that later. There is a lot of romance in the job, because we often work under very tight SLAs for banks and retailers, and we support transport infrastructure facilities. For obvious reasons I don't name the customers, and the security people have changed some minor details so that nobody can be identified for certain.
Heat
At the peak of the heat, the connection to one of the local servers drops. There are many such servers at the sites; they are packed quite tightly into technical rooms, cooling is a problem everywhere, and external forced cooling is often used. That is, a powerful fan pointed straight at the rack. Colleagues call it by the buzzword "free cooling", but it is a fan pointed at the rack.
But it does not happen on every hot day, only on about every other one. We start investigating, and sometimes it is like a detective story: it turns out that two people work in that room. One specialist knows what a rack is, or at least comes close to guessing at the mysterious connection between the blinking lights and the fan. The second specialist is a grandmother. She does not know. And when the heat peaks, the grandmother reaches her own thermal threshold and turns the fan toward herself. Because her own little fan is not as powerful.
The logical consequence: the grandmother cools down, the rack overheats. Then, once the temperature threshold is reached, a normal thermal shutdown kicks in. And we get another ticket.
This is not unusual. We write instructions and train the customer's key people, and they are supposed to train the line staff. But that does not always go right. In another similar room, the rack kept going down at night for six to eight minutes. Then it turned out that nobody had warned the new security guard: he pulled the rack's plug from the outlet, plugged in the kettle, and then put everything back as it was.
Some setups are just strange. One hapless electrician wired the air conditioner to the light switch in a technical room. While someone is in there, everything works. People leave, and the rack goes down. As a result, there is now a sign on the wall: "Do not turn off the light!!! I'll rip your hands off!!!" It seems the electrician's hands have already been ripped off, so he cannot redo the wiring properly and they have to make do with this workaround.
Permission to use the toilet
We send a field engineer to service one of the nodes of a large network. The engineer, a young woman, travels to the site. I should say that it is a very peculiar building with high ceilings, built around the time the USSR was born. After several renovations, a space where equipment can be installed was made above the stalls in the men's room. A common situation in this country, by the way: there is not enough room for hardware, so they build a "false ceiling". For some reason, usually there. I myself have connected switches a couple of times while standing on a toilet.
She goes to the site manager and asks for permission to enter the men's room. At first, nobody can understand why she needs to. Then the bureaucratic machine kicks in: the case is unfamiliar and nobody knows what to do. In the end, it took her a lot of effort to get everything done properly. They simply closed the toilet officially for the duration of the work and let her do whatever she needed inside.
In retail chains, for some reason, equipment is often mounted next to water or drain pipes. In a couple of server rooms and other premises we have watched water running down. The latest case we actually saw on the monitoring cameras: it starts to rain. There is a rack of equipment (powered, naturally), three basins next to it, and water dripping from the ceiling, evenly and monotonously. Everything turned out fine, and it seems the situation bothered nobody but us. Only our engineers were worried on the customer's behalf.
Another time a pipe burst above a server room. On video, the engineer takes a switch off its mount, turns it over, and about a glassful of water pours out. Tellingly, the switch keeps working. We took it to our lab and gave the customer a new one in exchange.
On one occasion, telecom equipment survived the triggering of a powder fire suppression system in one of the customer's offices. They simply shook all the powder out (that was quite hard, the unit had to be disassembled), and the box still works.
Drills
An audit of network equipment at a secured facility. The head of the technical department is presenting to the commission. He gets through it. At the end he complains:
- The power we get from the city is bad, the voltage is always off. Look, if you take a plug and put it into this outlet, things usually go wrong. The rack goes down.
And he puts the plug in to demonstrate.
It did not just knock out the rack: it also took down the gateway and then the server. On the server, the hard drive that hosted the facility management applications burned out. Everything went down, good and proper.
The commission was rescheduled for the next day. And we had to bring up new equipment overnight and get it on site.
In a similar case (only there it was a real power failure, not this kind of drill), the facility was served by a large domestic provider. Very large and very domestic. We open a ticket saying their equipment has burned out. Their SLA is eight hours. Their support replies:
- Well, yes, we know the hardware there is broken. Can't you see we're at lunch? The technician will come tomorrow or the day after.
It turned out they do have an SLA, but there is no penalty for breaching it.
The second story with drills went like this. A bank. Two in the morning, a ticket for a critical piece of hardware. Four hours to replace it. Shouting "Colleagues, all is lost!" (only in a single word), we get the Americans on the phone, they tell us where in Moscow the part can be picked up, we drive there and collect it while a colleague grovels before the logistics people. We make it. We deliver the part in an hour and fifteen minutes. They do not even let us into the building:
- Thank you, but we no longer need it.
- Guys! What was that?
- A drill!
Homeless SMS
We support a foreign mobile operator. One of the services we monitor converts SMS messages along the lines of "The subscriber tried to call you but has no money" into a missed call. That is, instead of a message the recipient gets a missed call, although the phone never rang. The operator, by the way, calculated that this makes a callback much more likely.
One day, all transactions vanish from the graph. There are simply no calls without money at all. We start investigating but cannot find any leads. Only an hour later does it dawn on us that there are no calls in the country at all.
And then they start up at night. It is the Muslim holy month of Ramadan, and the call pattern has shifted. We see the same thing at New Year, when there are almost no calls on the morning of January 1, and there it happened in spring.
With foreign customers, too, you always need to check exactly where their engineers are connecting. One Swedish vendor installs personnel management systems. There are two installations in Russia. One customer asks for an upgrade to the latest version because they need some new feature. The other has been running stably for almost half a year with no complaints. The Swedes connect, silently upgrade the second customer, report the upgrade to the first, and close the case.
We are getting ready to apologize and compensate (because the second customer's system was down for 20 minutes, and now a new maintenance window has to be agreed with the first) when it suddenly turns out that:
- The first customer is happy and confirms the ticket is resolved.
- The second did not notice any downtime.
We did not tell anyone at the time, but it was very strange.
Shooting yourself in the foot
When a supported customer is hosted in the cloud and asks for direct access to the machine instead of describing to us what needs to be done, we place bets on how soon they will shoot themselves in the foot. This is not the first case, or even the hundredth. Customer admins regularly lose remote access to the machine for all sorts of reasons. A recent case: they set up new authentication, and it went and kicked out the current users. And to fix that authentication and restore remote access, you first have to get inside somehow and configure everything. As the saying goes, configuring a firewall over remote access means a long journey ahead.
For such cases we hire a reboot team. That is, an admin who can restart a server or play the role of a robot remote-controlled over WhatsApp. That way, when you are configuring something in Khabarovsk, you do not have to fly out to Khabarovsk on a business trip that night.
On newer network hardware with proper configs, one large vendor has a built-in command for rolling back to the previous config. You arm a timer for half an hour. If the task is not canceled within that half hour, the device reboots and restores the previous version. If everything is configured well, you check (twice) and cancel the task. Only once you are sure everything works.
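The same confirm-or-rollback idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the vendor's command: the class name `ConfirmedChange`, its parameters and the dictionary standing in for a device config are all made up for the example, and the half-hour timer is shortened to a fraction of a second.

```python
import threading
import time


class ConfirmedChange:
    """Apply a change that reverts itself unless confirmed before a deadline.

    Hypothetical helper for illustration; not a real vendor API.
    """

    def __init__(self, apply, rollback, timeout_s):
        self._rollback = rollback
        self._lock = threading.Lock()
        self.state = "pending"
        apply()  # push the new config first
        # Arm the timer: if nobody confirms in time, the rollback runs.
        self._timer = threading.Timer(timeout_s, self._expire)
        self._timer.daemon = True
        self._timer.start()

    def _expire(self):
        with self._lock:
            if self.state != "pending":
                return  # already confirmed, nothing to undo
            self._rollback()
            self.state = "rolled_back"

    def confirm(self):
        """Cancel the pending rollback. Returns False if it already fired."""
        with self._lock:
            if self.state != "pending":
                return False
            self.state = "confirmed"
        self._timer.cancel()
        return True


if __name__ == "__main__":
    config = {"auth": "old"}  # stand-in for the device configuration
    change = ConfirmedChange(
        apply=lambda: config.update(auth="new"),
        rollback=lambda: config.update(auth="old"),
        timeout_s=0.1,  # half an hour in the story; shortened for the demo
    )
    time.sleep(0.5)  # the admin locked themselves out and never confirmed
    print(config["auth"], change.state)
```

The point of the design is that the safe outcome needs no action: an admin who has cut off their own access cannot call `confirm()`, so the old config comes back by itself.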
Sometimes you have to travel to install equipment. We have a guy nicknamed "the 13th". Because when a trip to Surgut came up and he was already on his way to the airport with the hardware packed, he was told en route that the same customer needed the same hardware much more urgently in Krasnodar. And his ticket was changed. Another time he flew out to do a replacement, everything came back up during the flight, and he sent photos of his feet on the beach to the work chat.
But the best case was this one. Before going home, someone at the customer went and deleted the link between two servers working as a pair. We are sitting there, and a request comes in: "Nothing works." We connect and take a look:
- What did you do?
- Before going home, I deleted the link between the servers.
- Why?
- What, wasn't I supposed to?
Do you have binoculars?
When we were testing a system for a transport company that recognizes people climbing over a fence (recognition on video surveillance footage), we went out in the mornings to mark the spots for installing cameras. It was important to find the fare dodgers without scaring them off, so that the cameras could go where people climb over most often. We took binoculars, but they were not needed, because the fare dodgers were not shy and did not scare at all.
Last month a photo studio opened in the building opposite our office. With big windows and natural light. They regularly shoot nude or very nominally dressed models, but the faces cannot be made out from a distance. So the binoculars came in handy after all. On the day of one particularly hot shoot, several tickets came in at once from office colleagues requesting them.
Under control
I came to a customer with many offices across Russia. There is a main server in Moscow and many servers connected to it from branch offices around the country. I am poking around in one of the regional boxes. A local manager comes up to me and says:
- You've been poking at it too long.
- Well, that's the job.
- You do understand that this is under the personal control of Himself...
- The president of the company?
- No, of Himself...
- This specific server?
- Yes.
I laughed. And he goes:
- You shouldn't laugh.
And he left.
And I thought: what a dangerous job. Maybe he really does keep it under control. Maybe I could have been punched in the face for such insolence. Personally by Himself...
Wi-Fi
A customer keeps opening incidents about Wi-Fi problems nonstop. I should say that this is a large hangar housing a warehouse, and because of the metal shelving (it holds blanks for a plant) the signal did not always reach the center. We did a quick radio survey for them and recommended what to install and where. They reported that they had done everything accordingly. And now it seems the central access point will not connect and keeps disappearing. We sent a field engineer. It turned out that when the access point locations were being calculated, a crane was standing in the center of the hangar. The customer's installers liked it a lot, and they mounted the access point right on it. The crane travels around the warehouse, and when it moves to one side, the network disappears on the other. For a while they tried to figure out why the network kept vanishing and then fixing itself, and then they came knocking on our door.
Best case
A complicated ticket; we have been on the phone with the user for almost half an hour. I am already cursing everything, because this is exactly the case where a person cannot clearly say what he did. And does not report everything he sees on the screen. And does not say everything he is doing right now. I already have a feeling that the need to do everything slowly and deliberately infuriates him no less than it does me. But for a different reason. And in the middle of yet another explanation that unless he reads out everything he sees on the screen I will not be able to help him, he suddenly says:
- Sorry, we have a fire here.
And hangs up. In the ticket I wrote "the building burned down along with the equipment" and went to check in person, because you never know...
Links