
Network Optimization Practice Stories

Hello! My colleagues and I optimize existing communication channels. People constantly confuse us with fiber-optic technicians, sometimes with cable installers, and sometimes simply with movers.



Our actual work is analyzing the protocol stack and completely reworking it for the characteristics of the channel: setting the optimal frame size, combining several packets into one on high-latency channels, deduplication, ordinary compression, and decrypting SSL and re-encrypting it with the same certificate. In the simplest case this is solved by installing a special appliance on the receiving and transmitting sides. Since we have to get to every site, we also work as field engineers. And, like any field engineers, we have a sea of stories. Below I will tell a few of them; I will only change some secondary details so that you cannot recognize the customer.
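Combining several small packets into one frame on a high-latency channel can be sketched roughly as follows. This is only an illustration, not how any particular appliance does it: the 2-byte length prefix, the names `coalesce` and `split_frame`, and the 1400-byte frame budget are all assumptions made for the example.

```python
import struct

MAX_FRAME = 1400  # assumed frame budget in bytes, not a value from the article


def coalesce(payloads, max_frame=MAX_FRAME):
    """Pack small payloads into as few frames as possible.

    Each payload is stored as a 2-byte big-endian length followed by its bytes.
    A single payload larger than the budget still gets a frame of its own.
    """
    frames, current = [], b""
    for payload in payloads:
        if len(payload) > 0xFFFF:
            raise ValueError("payload too large for a 2-byte length prefix")
        record = struct.pack("!H", len(payload)) + payload
        # Start a new frame when the next record would overflow the budget.
        if current and len(current) + len(record) > max_frame:
            frames.append(current)
            current = b""
        current += record
    if current:
        frames.append(current)
    return frames


def split_frame(frame):
    """Recover the original payloads from one coalesced frame."""
    payloads, offset = [], 0
    while offset < len(frame):
        (length,) = struct.unpack_from("!H", frame, offset)
        offset += 2
        payloads.append(frame[offset:offset + length])
        offset += length
    return payloads
```

The point of the trick is that on a link with a long round trip, thirty 100-byte messages sent as three frames pay the per-packet overhead three times instead of thirty.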
For example, there was night work in a very large store. The customer's admin and our engineer went into the server room at one in the morning and got to work. The engineer went to the toilet and came back. A few minutes later there was a knock at the door. They opened it, and a rapid response team with submachine guns burst in: boots into the body, face to the floor, handcuffs.

Then the police and the head of the store arrived. The head assessed the situation and stated weightily:
- This one I know, he's our admin. That one I don't know. Take him away.
Fortunately, it ended well. The admin hissed:
- We wrote to you!
The head took out his phone and read:
- "There will be night work... myself and so-and-so..." Ahhh, so that's it, - he turned to the police and continued with obvious regret:
- We'll have to let them go.

They did not notice when things got better


We installed mobile optimizers on the bank's workstations in one branch. I must say that the Internet connection there was so "good" that in the end we delivered the virtual machine image on a disk. The image contained a software agent that works together with a hardware appliance, in this case one located in the data center. We connected everything, handed out trial licenses, and the test deployment began.

There was no feedback. Well, we thought, that means they do not need an optimizer there. Maybe the channel had been fixed. But three months later they called and wanted to buy. It turned out that the trial had ended, and they suddenly noticed how bad things had become. Interestingly, when things got better, nobody said a word. But when things went back to bad, the users complained to management.

The telephone hooligan


We were doing a test deployment of a network diagnostics system. We deployed it in the data center, and it was slowly learning. I must say that the local infrastructure had grown through a gradual merger of companies and was therefore not very homogeneous. That is, many systems were in use, each showed the situation at its own level, and nobody had a complete picture of application traffic.

At that moment they were just launching a new application that had performed well in the test segment but had not yet been seen in the production network under load. It was rolled out, but because of the complex structure not all IT staff knew about it. Three or four days after the service was switched on, problems with IP telephony began. The call center ran on it, and calls were being dropped in all directions. Customer calls. For half a day they tried to solve the problem on their own but could not catch the cause at all: it was an intermittent bug that sometimes reproduced and sometimes did not.

The admins reached out to ask whether our diagnostic system could be used as a detective. It includes "Cascade", which can dissect traffic on the fly. We switched to debug mode and started looking for network anomalies.

It turned out that the new service used the same ports as telephony. The new traffic on those ports started landing in the priority class and crowding out the voice. After that, we changed the classification settings for telephony traffic.
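This failure mode is typical when traffic is classified by port alone. A minimal sketch of the difference, with entirely hypothetical port ranges, subnet prefix, field names and class names (the story does not say how the classification was actually changed):

```python
from dataclasses import dataclass

VOICE_PORTS = range(16384, 32768)  # hypothetical RTP port range
VOICE_SUBNETS = ("10.20.",)        # hypothetical prefix of the telephony VLAN


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_port: int
    protocol: str  # "udp" or "tcp"


def classify_by_port(flow):
    """Naive rule: anything on a voice port gets the priority queue."""
    return "priority" if flow.dst_port in VOICE_PORTS else "default"


def classify_strict(flow):
    """Tighter rule: the port must match AND the flow must look like telephony."""
    is_voice = (
        flow.dst_port in VOICE_PORTS
        and flow.protocol == "udp"
        and flow.src_ip.startswith(VOICE_SUBNETS)
    )
    return "priority" if is_voice else "default"


phone = Flow("10.20.1.5", 20000, "udp")    # a real call
new_app = Flow("10.30.7.9", 20000, "tcp")  # the new service reusing a voice port

print(classify_by_port(new_app), classify_strict(new_app), classify_strict(phone))
```

With the naive rule the new application shares the priority queue with voice; with the stricter rule only flows that match on several attributes are prioritized.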

A similar situation occurred at a large call center operator. The subscriber calls and hears the operator, but the operator does not hear the subscriber. For more than a month the local network engineers looked for a solution on their own. It took that long because the bug was intermittent and detecting it was quite nontrivial. Eventually it was escalated to the Moscow team, or more precisely to its head. He banged something on the table and set a deadline. They contacted us. We brought the same Riverbed diagnostic system. We collected traffic from different segments, compared it, and started watching voice quality on the different segments in real time. Within 5-6 hours we found a failing switch with queues building up. It had a strange hardware problem: its settings got lost from time to time. The local team dealt with the hooligan's body themselves: the switch was simply replaced, and that closed the issue completely.

Never update on a Friday afternoon


The update story was a great one. Also a bank, but a different one. The admin updated the load balancers' firmware at around midnight. At about two, one of them started silently dropping traffic it did not like. Not all of it. Random packets. It was noticed within a few minutes, when a security officer from Khabarovsk suspected something odd about a transaction. Six minutes later came the classic alarm: "X-team, move out." Moreover, the preliminary diagnosis was a hardware failure in the core switch. As usual: the elevator, the warehouse staff waiting downstairs with a new switch, and cars waiting for them. Fortunately, while we were on the road, a colleague who had connected remotely figured it out and rolled back the balancers' firmware.

The power of progress


We installed an optimizer in learning mode at a bank site. The local admin was supposed to switch it to production mode once it had "got used" to the local flows. Profiling usually takes from a day to a week, because sites with strict security have more quirks. Switching it over required a small recabling job with a night shift, which is probably why nothing happened on that front for two months. The box received its copy of the traffic and patiently learned. The bank was purchasing in several stages, so the chief admin used this license at another site. His reasoning: things are fine here for now, we will install it closer to the peak.

And then the terrestrial channel went down, and very seriously (or perhaps it did not go down, and the admin knew it would have to be replaced, which is why he did not switch the optimizer in during a period when they would have to live without a backup). They switched over to the satellite. The teleport was right on the roof and had already been used several times, so logically there should have been no problems. The plan was to stay on it for two weeks until the new terrestrial line was brought in. But in the time since the last long stay on the satellite, the software had been updated a bit and had become more demanding of bandwidth.

Simply put, the business software started performing badly.

A day later, the customers came. Service was slow, people got confused about who was behind whom, wandered between the queues, and did a fine job of virtualizing themselves by holding places in several queues at once. There were about three hundred virtual visitors in the bank and four times fewer physical ones. About once every three hours this whole system of queues had a kernel panic, and it threatened to come to a fight.

We were urgently asked to sell another license. That is how we learned the whole story. After that, technical progress helped people solve their problems.

Thank you, guys


Database replication between Moscow and a very distant Russian city. The link is a satellite one. The traffic carries a compressed database, in bursts, in suboptimal packets that are often lost. We proposed optimizers. With them, replication went faster, request queues stopped building up, and the link (as usual) was utilized much more fully. But there was still room to grow. The main point: we really did not like that the database was being replicated in compressed form. Taking compressed data apart and accelerating it yields only a few percent. It was much better to send the raw database so that the appliances could take it apart themselves.
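Why a compressed stream leaves so little for a deduplicating appliance to work with can be shown on a toy example. This is a sketch under stated assumptions: fixed 64-byte chunks, `zlib` as the compressor, and an invented "database" of text rows stand in for whatever the real systems used.

```python
import zlib

CHUNK = 64  # assumed fixed chunk size; real appliances use their own segmentation


def chunks(data, size=CHUNK):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old, new):
    """Fraction of chunk positions whose content is unchanged between versions."""
    old_chunks, new_chunks = chunks(old), chunks(new)
    same = sum(1 for a, b in zip(old_chunks, new_chunks) if a == b)
    return same / max(len(old_chunks), len(new_chunks))


# Two versions of a toy "database" that differ in a single field.
rows = [f"row-{i:06d};balance={(i * 37) % 1000:04d}\n".encode() for i in range(5000)]
old_raw = b"".join(rows)
rows[10] = b"row-000010;balance=9999\n"  # same length, one field changed
new_raw = b"".join(rows)

raw_reuse = shared_fraction(old_raw, new_raw)
compressed_reuse = shared_fraction(zlib.compress(old_raw), zlib.compress(new_raw))

print(f"raw chunks reusable:        {raw_reuse:.1%}")
print(f"compressed chunks reusable: {compressed_reuse:.1%}")
```

In the raw form only the chunk containing the changed field differs, so nearly everything already seen can be referenced instead of resent. In the compressed form a small change tends to alter the byte stream from that point on, so far fewer chunks can be reused.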

For some reason the raw database could not be sent: it would not get through the firewall. We dug deeper into the settings, and a couple of days later the raw production database started flowing normally. At that point the standard capabilities of their own hardware kicked in, and the CPU time previously spent on compression was freed up: replication went from 27 hours down to 23. With the optimizers we were getting 12-14. And then we were told: that's it, thank you, guys, the task was to meet the SLA. It already fits. Your optimizers are cool, but this is enough for now.

So, in the course of carrying out the order, we cancelled that order ourselves.

A server for a bottle


We were doing a pilot deployment of a Riverbed traffic optimizer at a particularly large industrial facility. The device is big, a full rack. The server room is on the second floor, and on the first floor is the shop floor itself, with rebar sticking out and tough, grimy men walking around. Below the server room is some administrative unit with accounting and control.

So we are walking through this shop when two very distinct representatives of the working class taxi out from behind one of the machines. As is proper, their faces bear goggle-shaped traces of either paint or fuel oil, and they have scratched helmets on their heads. They say:
- Listen, could we use the box while you haven't taken it away yet?
We:
- In what sense? We're just coming for it.
One of them hesitated:
- And how much does it cost?
The question was, of course, very alarming. We tried to play for time:
- We have no right to discuss the terms of the contract with you.
The second worker got straight to business:
- Well, roughly? We want to buy it ourselves.
Here we were a little stunned. The men misread our silence and decided it was time to start bargaining:
- Well... would you give it up for a case of vodka?

We somehow got away from them, explaining that the optimizer costs several tens of thousands of dollars, and quickly made it to the server room. There we started to find out what that had been about.

It turned out that these were not just laborers but operators of automated workstations who are responsible, among other things, for reporting. They have these critical periods when they come to work in that administrative unit. Their ERP is there, and data has to be entered into it. The channel is DSL, and of the kind not far removed from ISDN. Thin, noisy as an empty freight train, reeking of the legacy of the 90s: in short, a factory classic. The application is modern and "chatty", pushing a lot of encrypted traffic. So the lag between clicking a drop-down list and seeing it displayed is 10-20 seconds. In the best case. And there is a whole lot of data to enter. The situation is made somewhat worse by the fact that they type with one finger and often make mistakes, so they have to go back. Which, given the same lags, does not add much joy.

So they had learned that the miracle box carried with fanfare through the whole plant to the second floor was a traffic optimizer. They would not have understood anything if they had not stood behind the backs of those testing the device in the next room, where that same Riverbed was connected for the pilot. At that point they grasped everything at once and asked to be let in for a couple of days for a detailed test. They were allowed.

As a result, they finished all the required reports in the last two days of April. This impressed them greatly, because in the past couple of years they had been stuck at the plant through the May holidays until at least the 6th. So they decided they needed to buy such a box, perhaps even with their own money, chipping in from their salaries. They did not want to stay overtime for anything in the world.

Naturally, they simply had not expected such prices.

A couple of hours later we come out to tell the movers that everything is ready and can be taken away. At the exit our two heroes are already waiting:
- Guys, how about leaving it for tests for another month, eh? We'll give you a CASE.

How it ended, I do not know. After the test deployment, the holding's management bought a few dozen hardware optimizers. Whether our guys got a box in the end or not I cannot say, but I really want to believe they did. Or at least that they had a software agent installed that can talk to the "big" server.

References:


Source: https://habr.com/ru/post/274339/

