"Efficiency" standard WAN - only about 10%If you look into almost any communication channel between a branch of a company and a data center, you can see a rather suboptimal picture:
- First, a lot of redundant information (up to 60–70% of the channel) is transmitted: data that, one way or another, has already been requested before.
- Second, the channel is loaded by "chatty" applications designed for a local network: they exchange lots of short messages, which hurts their performance badly on a WAN link.
- Third, TCP was originally designed for local networks and works well with small RTTs and no packet loss. On real channels with packet loss, the speed degrades sharply and recovers slowly because of the large RTT.
I head the engineering team of the CROC telecommunications department and regularly optimize data center communication channels both for our own infrastructure and for energy companies, banks and other organizations. Below I will cover the basics and describe what I consider the most interesting solution.
Compression and Deduplication
The first problem has already been mentioned: a lot of redundant, duplicate data travels over the channel. The most striking example is a Citrix farm serving the branches of a bank: within a single office, 20–30 different machines may request the same data. Accordingly, the channel can easily be unloaded by 60–70% through deduplication.
Citrix itself, of course, lets you enable data compression, but its efficiency is several times lower than that of specialized traffic optimizers, mainly because optimizers not only compress data but also deduplicate it. The traffic of the entire branch passes through the optimizer, and the more users in the branch, the more repeated requests there are and the greater the effect of deduplication. For a single user, standard compression such as Lempel-Ziv may even beat deduplication, but as soon as there are more devices, deduplication comes out on top.
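To make the difference concrete, here is a minimal sketch (my own illustration, not Riverbed's actual algorithm): plain compression shrinks every transfer independently, while a deduplication dictionary lets a block that has already been seen be replaced by a short reference and not be transmitted at all.

```python
import zlib

block = b"SELECT balance FROM accounts WHERE region = 'north';" * 100  # repetitive payload

# Plain compression: every transfer of the block is compressed independently.
compressed = zlib.compress(block)
print(f"compression alone: {len(block)} -> {len(compressed)} bytes, every time")

# Deduplication: the first (cold) transfer carries the block plus a reference,
# every later transfer of the same block carries only the reference.
dictionary = {}               # reference -> block, kept on both optimizers
ref = zlib.crc32(block)       # short name for the block (illustration only)
dictionary[ref] = block

cold_transfer = len(compressed) + 4   # compressed data + a 4-byte reference
warm_transfer = 4                     # the reference alone
print(f"dedup: cold transfer {cold_transfer} bytes, repeat transfer {warm_transfer} bytes")
```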
As a rule, optimizers are hardware-software appliances, but they can also be deployed as virtual machines. To optimize traffic on a communication channel, optimizers must be installed at both sites. They are placed before the VPN gateways, since deduplicating already encrypted traffic is useless.
The deduplication algorithm works as follows:
- The branch makes a request to the data center;
- The server sends data to the office;
- Before getting into the communication channel, the data passes through the optimizer on the data center side;
- The optimizer segments and deduplicates the data: the data is divided into blocks, and each block receives a short name - a reference to the block;

- References and data blocks are stored in a local repository - the so-called dictionary;
- References and data blocks are then sent to the optimizer in the branch. Before sending, the data center optimizer additionally compresses the data, so even the first (cold) transfer does not make the data any larger;
- The branch optimizer decompresses the data received from the data center, builds its own symmetric dictionary of correspondences (data block - reference), strips the references out of the stream and delivers the original data to the client;
- From now on, any data passing through the branch or data center optimizer is checked for duplicate blocks. If a match with a block already in the dictionary is found, that raw block is replaced by a short reference; a known (already transmitted) block is not transmitted again.
It remains to add that the dictionary is constantly updated and, thanks to a special algorithm, the most popular data blocks remain in the dictionary.
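A minimal sketch of this block/reference scheme (my own simplification: fixed-size blocks and no eviction, whereas real optimizers segment data adaptively and keep only the most popular blocks):

```python
import hashlib

BLOCK = 4096  # fixed block size for this sketch; real optimizers segment data adaptively

class OptimizerDictionary:
    """Symmetric store of reference -> block, kept on both optimizers."""

    def __init__(self):
        self.blocks = {}

    def encode(self, data: bytes) -> list:
        """Replace already-known blocks with short references; learn new blocks."""
        out = []
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            ref = hashlib.sha1(block).digest()[:8]   # short name (reference) for the block
            if ref in self.blocks:
                out.append(("ref", ref))             # known block: send only the reference
            else:
                self.blocks[ref] = block
                out.append(("raw", ref, block))      # cold data: send the block plus its reference
        return out

    def decode(self, stream: list) -> bytes:
        """Rebuild the original data on the far side from references and raw blocks."""
        parts = []
        for item in stream:
            if item[0] == "ref":
                parts.append(self.blocks[item[1]])
            else:
                _, ref, block = item
                self.blocks[ref] = block             # learn the block, keeping the dictionaries in sync
                parts.append(block)
        return b"".join(parts)

# Cold transfer carries the blocks; a repeated transfer carries references only.
dc, branch = OptimizerDictionary(), OptimizerDictionary()
payload = b"A" * 8192
assert branch.decode(dc.encode(payload)) == payload
assert all(item[0] == "ref" for item in dc.encode(payload))
```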
This is the fundamental difference from traditional caching devices.
Caching devices work at the file level: if a file changes even slightly, it has to be transferred again. Optimizers work at the level of data blocks, so when a previously transferred file changes, only the changes go over the channel, and the rest is replaced by references.
Another problem is that TCP throughput is limited by the window size (TCP window size) - the amount of data the sender transmits before waiting for an acknowledgement from the receiver. Compressed traffic needs the window to be transmitted fewer times for the same user data, which translates into a higher effective transfer speed.
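The ceiling this imposes is easy to estimate with a rough back-of-the-envelope calculation (my own arithmetic, not vendor figures): at most one window can be in flight per round trip.

```python
def tcp_ceiling_mbit(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on TCP throughput: one full window delivered per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000

print(tcp_ceiling_mbit(65_535, rtt_ms=100))  # ~5.2 Mbit/s on a 100 ms WAN link
print(tcp_ceiling_mbit(65_535, rtt_ms=10))   # ~52 Mbit/s on a 10 ms, LAN-like link
```

So on a long channel even a wide pipe sits mostly idle unless the window grows or less data has to be pushed through each window.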
So once again, it works like this:
- Device A deduplicates traffic.
- Device B collects the “big picture” from its local storage.
- Both of these devices work symmetrically.
- Both of these devices do not affect the infrastructure or the configuration of anything behind them: they are simply inserted in-line ("into the break" of the channel), for example at the exit from the data center and at the entrance to the company's regional office.
- The devices do not prevent communication with nodes where no such device is installed.
Deduplication for encrypted channels
An encrypted channel is obviously poorly suited for compression and deduplication: there is almost no practical benefit in working with already encrypted traffic. Therefore, optimizers are inserted in-line before the encryption device: the data center sends data to the optimizer, the optimizer passes it to the encryption device (for example, into a secure VPN channel), on the far side the traffic is decrypted and handed to the local optimizer, which then delivers it to the network. This is a standard function of the "optimizer boxes", and it all happens without weakening the protection of the traffic against compromise.
Mobile deduplication
In recent years, people with laptops and tablets who also need a lot of data (the same virtual machine images or database extracts) often work directly with data centers. For them, instead of "optimizer boxes", special software is used that simply consumes some CPU time and some hard disk space for the same purposes. In effect, we trade a slight drop in laptop performance and some disk space for the cache against a faster channel. Users usually notice nothing except that network services get faster.
Who makes these optimizers?
We use Riverbed solutions. The company was founded in 2002 and introduced its first optimizer model for communication channels in 2004. Riverbed products and solutions - WAN optimization, performance management, application delivery and storage acceleration - let IT teams increase and manage performance. The optimizers are very easy to integrate into the network: the simplest way is to install them in-line between the LAN and the router or VPN gateway.
Competing solutions. In 2013 Riverbed held 50% of the WAN optimization market.
From the point of view of the customer's commercial director, these are a few boxes which, after a simple connection to the network, speed up slow channels 2-3 times and halve the load on the channels. For this, almost everyone loves them!
Optimizer connection
The easiest and most reliable way is to insert it in-line between the border router and the LAN switch. If the optimizer fails, it closes the contacts of its LAN and WAN interfaces and traffic simply passes through it, as through an ordinary cross-over cable. Accordingly, seeing non-optimized traffic, the optimizer on the far side also just passes it through without processing.
Accordingly:
- A branch with an optimizer talking to a data center with an optimizer - the traffic is optimized.
- A branch without an optimizer talking to a data center with an optimizer - the data center optimizer just passes the traffic through transparently, without changes.
- A branch with an optimizer talking to a data center with an optimizer, when either optimizer fails - the traffic is simply not compressed and goes "as is".
Naturally, in data centers optimizers are clustered for fault tolerance or extra capacity and are complemented by Interceptor balancers. More on that a little further down, when we get to the specific equipment.
TCP acceleration
TCP speed is limited by the size of the window. The window is the amount of information that the server can send to the client before receiving confirmation of receipt.
TCP's standard behavior is:
- slow connection ramp-up: the TCP window size increases;
- on packet loss, a sharp drop in speed (the window is halved);
- then another slow increase of the window;
- another packet loss, another dip in throughput, and so on.
Orange "saw" on the chart - the standard behavior of TCPOn communication channels with high bandwidth, but with the presence of any level of losses and large RTT delays, the available bandwidth is used inefficiently, that is, the channel is never fully loaded.
Riverbed thought along the same lines: since the optimizer boxes already sit at both ends of the channel, it would be foolish not to use them to modify TCP behavior and avoid these standard problems. So the optimizers can not only optimize traffic at the data level (deduplication/compression), but also accelerate the transport layer.
Several modes are available for TCP acceleration (a toy comparison of how they react to loss follows the list):
- HighSpeed TCP mode - the speed reaches its maximum much faster than with normal TCP, and when losses occur it does not drop as low or sink as much as standard TCP;
- MaxTCP mode - uses 100% of the bandwidth and does not slow down: a lost packet causes no back-off. This mode, however, requires QoS rules that limit the bandwidth MX-TCP traffic may occupy;
- SCPS mode - designed specifically for satellite communication channels. Here the bandwidth does not have to be capped, as in MaxTCP; SCPS adapts well to the floating characteristics of satellite links.
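The sketch below is a toy model built only from the behavior described above (my own illustration, not Riverbed's actual algorithms): standard TCP halves the window on loss, a HighSpeed-style mode backs off much more gently, and MaxTCP does not back off at all but stays at the QoS-configured rate.

```python
def window_after_loss(cwnd: int, mode: str, qos_cap: int) -> int:
    """How the congestion window reacts to a single loss event in each mode (toy model)."""
    if mode == "standard":
        return cwnd // 2             # classic TCP: the window is halved
    if mode == "highspeed":
        return int(cwnd * 0.9)       # gentle back-off, factor assumed purely for illustration
    if mode == "maxtcp":
        return min(cwnd, qos_cap)    # no back-off: keep sending at the QoS-limited rate
    raise ValueError(f"unknown mode: {mode}")

for mode in ("standard", "highspeed", "maxtcp"):
    print(mode, window_after_loss(cwnd=1000, mode=mode, qos_cap=1000))
```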
Application optimization
Many applications are "chatty": they may send up to 50 packets where one would be enough. As I said above, this is a consequence of being designed for local networks rather than for "long-distance" links. With optimizers, the number of round trips is reduced by a factor of 50 or more.
Here's what it looks like:

Optimizers act as transparent Layer 7 proxies for a number of the most common application protocols.
The data center optimizer acts as a client towards the server, and the branch optimizer acts as a server towards the clients. The inefficient, "chatty" application dialogue thus stays inside the local network, while between the optimizers the application messages travel in a form better suited to WAN links: the number of messages drops.
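The effect of keeping the chatty exchange on the LAN is easy to estimate. Suppose an application reads a file in 200 small requests, one request per round trip (the numbers below are purely illustrative, not measurements):

```python
def latency_cost_s(round_trips: int, rtt_ms: float) -> float:
    """Time spent purely waiting on round trips (bandwidth ignored)."""
    return round_trips * rtt_ms / 1000

# The chatty protocol straight over the WAN: every request crosses the 80 ms channel.
print(latency_cost_s(200, rtt_ms=80))                                # 16 seconds of pure waiting

# With optimizers: the 200 chatty requests stay on the 1 ms LAN, and the
# optimizers exchange a handful of aggregated messages over the WAN.
print(latency_cost_s(200, rtt_ms=1) + latency_cost_s(4, rtt_ms=80))  # ~0.5 seconds
```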
Riverbed optimization devices can accelerate the following application protocols at Layer 7:

Interestingly, encrypted applications are covered too, including encrypted Citrix and MAPI. Optimizing encrypted traffic does not lower the level of security.
Examples of application acceleration. In a real network, the acceleration depends on the communication channel: the worse the channel, the greater the acceleration that can be achieved.
Typical connection scheme

Steelhead optimizers are inserted into the break of the channel, but before the encryption devices. In data centers with special requirements, clustering is also used for extra reliability, plus Interceptor load balancers.
Result (example)
Green - WAN traffic. Blue - LAN traffic. Without Riverbed, they would be the same.
The highlighted column shows the percentage of compression on TCP ports.
Hardware lineup

Capacity can be extended by license. To improve performance, in some cases a hardware upgrade is required. Upgrade capabilities within the platform are indicated by green arrows.
The entry-level model is suitable even for a small online store: it starts from 1 Mbit/s and 20 connections. The flagship supports up to 150,000 simultaneously open connections on channels of 1.5 Gbit/s. If that is not enough, the Interceptor balancer is used: clusters of balancers and optimizers can handle channels of up to 40 Gbit/s with 1 million simultaneously open connections.
What does it cost?
The entry-level model costs about 100 thousand rubles, a device for a medium data center about 1.1 million rubles, and devices for large data centers start from 5.5 million rubles. The price varies quite strongly with the specific usage pattern, and discounts are possible, so these numbers are purely approximate - better check by mail (the address is at the end of the post). The payback for medium and large businesses is easy enough to calculate: just assume that you will free up 30 to 60% of the channel (again, I can give a specific figure to within 10% depending on how the channel is utilized), and users will stop complaining about slow applications.
More Riverbed materials:

Once a channel has been optimized in the way described, we often go on to monitor and troubleshoot problems with specific services and equipment. In practice these turn into real detective stories; I will tell about them a bit later. If you are interested, subscribe to the CROC corporate blog on Habr.
Who I have personally implemented this for:
I am not allowed to name all the customers, but I can say that Riverbed's traffic optimization solution was deployed at:
- five of the largest players in the banking sector;
- a large gold mining company;
- a large logistics company;
- a number of smaller companies.
Questions
If you are interested in something specific, ask in the comments or by mail: AVrublevsky@croc.ru. Via the same address I can send pricing, implementation schemes and an estimate of channel optimization after discussing your specific situation. Naturally, an accurate assessment is only possible after a test, but on average the error after a discussion is about 10%.