Protection from DDoS attacks from the point of view of a telecom operator. Part 1

We actively follow the articles on DDoS published on Habrahabr. Although a search across all streams returned 820 publications at the time of writing, we decided it would be worthwhile to share a telecom operator's view of the problem of detecting and controlling DDoS attacks.

In this first article we introduce the basic concepts. It is intended mostly for newcomers who understand network technologies at a basic level but have never come across industrial solutions for protecting against DDoS. If the material proves interesting, the following articles in the series will go into the technical details.

What is typical of a DDoS protection solution for a telecom operator?

The way a telecom operator builds its traffic analysis and DDoS detection solution is inextricably linked with the architecture of its network and with the capabilities of its network equipment. Let's look at an example: the simplified architecture of the backbone IP / MPLS network of Rostelecom (AS12389) is as follows.

Here, upstream is a higher-tier carrier, peer is a peering carrier or a large content provider, and customer is a client of AS12389.
Now let's map the network design onto geography:

And finally, the number of interconnections with upstream / peer / customer networks, in figures ( https://radar.qrator.net ):

Thus, even someone who has never designed or operated an operator's network can easily see that the network has many interconnection points and links, and that traffic routing is asymmetric: traffic toward an IP prefix and traffic from it take different routes. Unlike data centers or corporate networks, a telecom operator has no border in the classical sense, so it is not possible to place analysis tools at one or a few points on the border. It is therefore architecturally effective to build an AntiDDoS system out of two subsystems:

Anomaly detection subsystem: collects and analyzes traffic data.
Filtering subsystem: blocks parasitic traffic.

How are DDoS attacks detected?

To analyze traffic and detect anomalies for any IP address, whether it belongs to AS12389, is directly connected to it, or merely transits through it, we have to analyze all traffic (on every router and every IP interface). To solve this problem cost-effectively, we collect traffic information using network telemetry protocols (J-Flow v5/v9, NetStream, IPFIX). For simplicity, we will refer to this whole family of protocols as NetFlow. These protocols do not allow application-level information to be analyzed; they carry information only up to layer 4 of the OSI model. For example, a J-Flow v5 flow record has the following structure:

Where:

Source IP address - IP address of the sender
Destination IP address - IP address of the recipient
Next-Hop IP address - the IP address of the next router to which the flow is forwarded
Input ifIndex - SNMP index of the interface on which the router receives the flow
Output ifIndex - SNMP index of the interface through which the router sends the flow
Packets - the total number of packets in the flow
Bytes - the total number of bytes in the flow
Start time of flow - time at which the flow started
End time of flow - time at which the flow ended
Source port - transport-layer port of the sender
Destination port - transport-layer port of the recipient
TCP Flags - TCP flags seen in the flow
IP protocol - IP protocol number
ToS - type of service
Source AS - autonomous system number of the source IP address
Destination AS - autonomous system number of the destination IP address
Source Mask - network mask of the source IP address
Destination Mask - network mask of the destination IP address
Padding - filler bytes that align the record to its fixed length
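
To make the field list above concrete, here is a minimal Python sketch that unpacks one such record. It assumes the widely documented 48-byte NetFlow v5 record layout (which J-Flow v5 follows); the function name and dictionary keys are ours, chosen for illustration, and the export packet header that precedes the records is not handled.

```python
import ipaddress
import struct

# Assumed layout of one 48-byte NetFlow v5 flow record, network byte order.
# "x" marks the padding bytes from the field list above.
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")


def parse_v5_record(data: bytes) -> dict:
    """Unpack a single v5 flow record into a dictionary (illustrative keys)."""
    (src, dst, next_hop, input_if, output_if, packets, octets,
     start, end, src_port, dst_port, tcp_flags, protocol, tos,
     src_as, dst_as, src_mask, dst_mask) = V5_RECORD.unpack(data)
    return {
        "src_addr": str(ipaddress.IPv4Address(src)),
        "dst_addr": str(ipaddress.IPv4Address(dst)),
        "next_hop": str(ipaddress.IPv4Address(next_hop)),
        "input_if": input_if,
        "output_if": output_if,
        "packets": packets,
        "bytes": octets,
        "start": start,   # router uptime in ms when the flow started
        "end": end,       # router uptime in ms when the flow ended
        "src_port": src_port,
        "dst_port": dst_port,
        "tcp_flags": tcp_flags,
        "protocol": protocol,
        "tos": tos,
        "src_as": src_as,
        "dst_as": dst_as,
        "src_mask": src_mask,
        "dst_mask": dst_mask,
    }
```

Nothing above layer 4 appears in the record, which is why detection based on this data cannot look into application payloads.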

J-Flow v9 and IPFIX additionally carry information about:

ICMP type / code
IPv6
MPLS
BGP Peer AS

But the key difference between v9 / IPFIX and v5 is that the user can define which fields to export by creating a template. We do not use NetStream in the production system yet, but we plan to add it in the near future.

At the moment AS12389 has more than 300 routers, so NetFlow is collected by a dedicated infrastructure of collectors that can receive, process and write records to a database at high speed. Given that terabits per second are carried over the network, the routers generate more than 300 thousand NetFlow records per second even with a high sampling ratio (> 4k). With sampling, the router analyzes not every packet that passes through it but only a selection of packets, chosen according to a proprietary algorithm that each vendor implements in its equipment; this reduces the load on the control plane or on the router's service card.
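
Because only a fraction of packets is sampled, the counters in the records have to be scaled back up before they can be compared with thresholds. The sketch below shows the usual arithmetic; the 1:4096 ratio and the function name are assumptions for illustration (the text only says the ratio is above 4k), and real samplers are vendor-specific, so the result is an estimate.

```python
def estimate_rate(sampled_bytes: int, sampled_packets: int,
                  sampling_ratio: int, interval_s: float) -> tuple:
    """Scale sampled counters to estimated (bits/s, packets/s).

    sampling_ratio is N for 1-in-N packet sampling (assumed uniform).
    """
    bits_per_s = sampled_bytes * 8 * sampling_ratio / interval_s
    packets_per_s = sampled_packets * sampling_ratio / interval_s
    return bits_per_s, packets_per_s


# Example: 1.5 MB and 1000 packets seen in sampled flows over 60 seconds
# with an assumed sampling ratio of 1:4096.
bps, pps = estimate_rate(1_500_000, 1000, 4096, 60)
print(f"{bps / 1e6:.1f} Mbit/s, {pps:.0f} packets/s")
```

The higher the ratio, the cheaper the collection, but the coarser the estimate for small flows.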

A so-called Binning Table is created on the collectors; NetFlow records are mapped into it, and statistics are accumulated per protected object. By a protected object we mean an entity in the system that is described by any of the following:

IP prefix list (CIDR blocks and groups)
ASN, including the ability to set the attributes AS-Path and community
Network interfaces
Flow Filter: a logical expression over fields of the IP and transport headers and their combinations, for example "dst host 1.1.1.1 and proto tcp and dst port 80".

List of available fields:
- Average packet lengths
- Destination addresses
- Destination ports
- ICMP codes
- ICMP types
- Protocols
- Source addresses
- Source ports
- TCP flags
- TOS bits
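
The mapping of flow records onto protected objects can be sketched roughly as follows. This is a simplified illustration, not the actual Binning Table implementation: the object names, prefixes and record keys are hypothetical, and only prefix-based objects plus one hard-coded flow filter are modeled.

```python
import ipaddress
from collections import defaultdict

# Hypothetical protected objects: name -> list of CIDR prefixes.
PROTECTED_OBJECTS = {
    "customer-web": [ipaddress.ip_network("1.1.1.1/32")],
    "customer-net": [ipaddress.ip_network("198.51.100.0/24")],
}


def matches_filter(flow: dict) -> bool:
    """Hand-written equivalent of the flow filter
    "dst host 1.1.1.1 and proto tcp and dst port 80"."""
    return (flow["dst"] == "1.1.1.1"
            and flow["proto"] == 6      # 6 is the IP protocol number of TCP
            and flow["dst_port"] == 80)


def bin_flows(flows: list) -> dict:
    """Accumulate byte and packet counters per protected object."""
    table = defaultdict(lambda: {"bytes": 0, "packets": 0})
    for flow in flows:
        dst = ipaddress.ip_address(flow["dst"])
        for name, prefixes in PROTECTED_OBJECTS.items():
            if any(dst in prefix for prefix in prefixes):
                table[name]["bytes"] += flow["bytes"]
                table[name]["packets"] += flow["packets"]
    return dict(table)
```

A real collector would keep such counters per time interval and per field (ports, protocols, TCP flags and so on) so that profiles can be built for each of them.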

Based on the collected statistics, the system builds a dynamic profile of normal traffic behavior for the protected object. A static profile can also be set manually, as threshold values for the most common attack signatures; most amplification-type DDoS attacks (NTP, DNS, Chargen, SSDP, etc.), for example, are detected well this way. When traffic deviates from the threshold values, the system generates an anomaly report.

Depending on the percentage by which the threshold is exceeded, anomalies are divided into three severity levels: low, medium and high. Low anomalies are most often caused by a surge in legitimate traffic, for example a marketing campaign that brings more users than usual to the protected website. The specialists on duty therefore monitor medium and high anomalies more closely.
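
The severity classification can be illustrated with a few lines of Python. The cut-off percentages below (50% and 200% above the threshold) are invented for the example; the article does not state the real values, and in practice they would be tuned per protected object.

```python
from typing import Optional


def classify_anomaly(observed: float, threshold: float,
                     medium_pct: float = 50.0,
                     high_pct: float = 200.0) -> Optional[str]:
    """Return None if the threshold is not exceeded, else a severity level.

    medium_pct and high_pct are assumed cut-offs, expressed as the
    percentage by which the observed rate exceeds the threshold.
    """
    if observed <= threshold:
        return None
    excess_pct = (observed - threshold) / threshold * 100
    if excess_pct >= high_pct:
        return "high"
    if excess_pct >= medium_pct:
        return "medium"
    return "low"


# Example: a static threshold of 100 Mbit/s of UDP traffic from port 123.
for rate in (90, 120, 150, 400):
    print(rate, classify_anomaly(rate, 100))
```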

How are DDoS attacks filtered?

After the system detects an anomaly affecting a protected resource, the traffic to that resource can be redirected for filtering, either manually or automatically.

There are several filtering methods:

Flow Specification Filters;
Black-Hole Routing - we do not use it when providing the AntiDDoS service, so we will give it very little space in this article;
Intelligent filtering in traffic clearing centers.

In total, the network has two geo-redundant traffic clearing centers, each deployed in a fault-tolerant configuration at its site (GR + HA).

Redirection is carried out by announcing, within AS12389, a more-specific route to the protected object via a traffic clearing center. All traffic to the object, including parasitic traffic, is thus drawn into the clearing center, where it is filtered, and the “clean” traffic is then delivered to the client's network. To avoid routing loops, we deliver the traffic over MPLS, distributing labels via BGP Labeled-Unicast (a separate article will be devoted to the mechanisms for delivering cleaned traffic). By choosing this method and configuring our own equipment once, we eliminate the need for any additional configuration on the client side, so anyone with a connection to AS12389 can be protected. The response traffic from the client follows the best path, i.e. its routing is unchanged, and therefore bypasses the clearing center. The result is an unconditional asymmetry, which has both drawbacks (it limits the ability to apply certain countermeasures and to analyze application responses) and advantages (it adds no delay to the return traffic).
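
Redirection by a more-specific route works because routers prefer the longest matching prefix. The Python sketch below models only that selection rule; the prefixes and next-hop names are hypothetical, and real route selection involves many more BGP attributes.

```python
import ipaddress
from typing import Optional


def best_route(routes: list, destination: str) -> Optional[str]:
    """Return the next hop of the longest prefix that covers destination."""
    dst = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in routes if dst in net]
    if not candidates:
        return None
    return max(candidates, key=lambda route: route[0].prefixlen)[1]


# Normal state: the customer's /24 is reached over the usual best path.
routes = [(ipaddress.ip_network("203.0.113.0/24"), "customer-edge")]
print(best_route(routes, "203.0.113.10"))

# During an attack a /32 for the victim is announced via the clearing center.
routes.append((ipaddress.ip_network("203.0.113.10/32"), "clearing-center"))
print(best_route(routes, "203.0.113.10"))   # the victim is now diverted
print(best_route(routes, "203.0.113.11"))   # its neighbors are unaffected
```

Only traffic toward the attacked address is diverted, which keeps the load on the filtering equipment proportional to what is actually under attack.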

This asymmetry in traffic delivery limits the set of possible countermeasures (filtering rules) and forces system developers to look for ways of identifying parasitic traffic and bots that rely on incoming traffic alone.

Despite the fact that attack detection does not include the application layer, traffic filtering occurs up to the L7 level of the OSI model, using both signature and behavioral methods.

The traffic clearing centers are built on specialized equipment based on ATCA platforms, which delivers high filtering performance (including at the application layer) in a single chassis. In recent years, with the advent of technologies such as Intel DPDK and HyperScan, 10G and 40G network cards, and growing CPU core counts, it has become possible to parallelize the processing of network flows efficiently, so in the near future we plan to move from ATCA to the x86 architecture.

Why, then, is Flow Specification needed, and how is it used?

All modern carrier-class routers have built-in mechanisms for filtering up to L4 of the OSI model. Different manufacturers name them differently, but generically they are called Access Control Lists (ACLs). An ACL is implemented in hardware on the line cards and can filter both transit packets and packets addressed to the router itself at or near line rate. This makes the technology quite useful when we need to cut off parasitic traffic as close as possible to the source of the attack, i.e. at the border of our network. But an ACL is configured locally on each router, and since, as we said, we have more than 300 of them, applying filters quickly during an attack becomes impossible. BGP Flow Specification (RFC 5575) was developed to control such filters centrally (create and delete them).

Some operators offer FlowSpec to their customers as a service. Rostelecom does not yet, because it actively uses FlowSpec for its own purposes and the number of rules that routers support is not yet large enough. We recommend asking your operator whether such a service is available: FlowSpec is implemented in projects such as ExaBGP, which gives you an affordable tool for installing filters on the operator's network and defending against attacks aimed at saturating the channel without buying an expensive service. This option does not suit everyone, but it may be a sufficient and inexpensive alternative to a full-fledged AntiDDoS service.

The system we use can distribute these filters directly from its web interface. We can therefore configure triggers that automatically create filtering jobs from the anomalies the system detects.

Depending on the network equipment vendor and the operating system version, these filters can be applied either to all interfaces or only to selected ones. The latter reduces the load on the equipment, because not every packet from every interface has to be run through the chain of rules.

We use FlowSpec mainly as the first echelon of filtering, for attacks that can be described well by a pattern up to L4: almost all UDP-based amplification attacks fit perfectly here. This lets us cut parasitic traffic off as early as possible instead of hauling it to the traffic clearing center, and perform “fine” cleaning only on the remaining traffic.
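
As an illustration of why amplification attacks suit this first echelon, the sketch below models the matching logic of an L3/L4 rule in Python. It is not the BGP encoding defined in RFC 5575 and not any router's implementation; the class, field names and addresses are hypothetical. The example rule drops NTP amplification traffic, which arrives over UDP with source port 123.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowRule:
    """A simplified L3/L4 match with an action (illustrative only)."""
    dst_prefix: str
    protocol: Optional[int] = None   # IP protocol number, None = any
    src_port: Optional[int] = None   # None = any
    action: str = "discard"

    def matches(self, pkt: dict) -> bool:
        dst = ipaddress.ip_address(pkt["dst"])
        if dst not in ipaddress.ip_network(self.dst_prefix):
            return False
        if self.protocol is not None and pkt["proto"] != self.protocol:
            return False
        if self.src_port is not None and pkt["src_port"] != self.src_port:
            return False
        return True


def apply_rules(rules: list, pkt: dict) -> str:
    """Return the action of the first matching rule, or "accept"."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return "accept"


# Drop UDP (protocol 17) from source port 123 toward the victim address.
rules = [FlowRule(dst_prefix="203.0.113.10/32", protocol=17, src_port=123)]
ntp_reply = {"dst": "203.0.113.10", "proto": 17, "src_port": 123}
web_request = {"dst": "203.0.113.10", "proto": 6, "src_port": 51000}
print(apply_rules(rules, ntp_reply), apply_rules(rules, web_request))
```

Traffic that such a rule cannot describe, such as requests that look valid at L4, still has to go to the clearing center for deeper inspection.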

Is there a place for blackhole?

In the most basic case, including when parasitic traffic is directed at a resource on which nothing is published (this also happens), the operator can send all traffic destined for that resource to a blackhole. To do this, a route to the resource is installed on each router with a next hop that points to discard, i.e. the traffic is simply dropped. When blackhole routes need to be distributed centrally, the route reflectors are used: the route for the prefix is configured on one of them, and as a result all routers receive it.

What about the Blackhole community?

A useful technique for an operator is to offer various BGP communities that let the client manage its own traffic. One such community is the blackhole community. Operators usually publish this information in the remarks of their autonomous system's object in a routing registry database, for example RIPE. For Rostelecom, this community is 12389:55555. Prefixes carrying this community are accepted up to /32, while other prefixes are accepted only if they are no more specific than /24.
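
The acceptance policy described above can be written down as a short check. This Python sketch covers IPv4 only and models just the prefix-length rule from the text; the function names are ours, and a real import policy would also verify that the client is allowed to announce the prefix at all.

```python
import ipaddress

BLACKHOLE_COMMUNITY = "12389:55555"  # value given in the text


def accept_announcement(prefix: str, communities: set) -> bool:
    """Apply the prefix-length limit: /32 with the community, /24 without."""
    network = ipaddress.ip_network(prefix)
    max_length = 32 if BLACKHOLE_COMMUNITY in communities else 24
    return network.prefixlen <= max_length


def is_blackholed(prefix: str, communities: set) -> bool:
    """An accepted announcement that carries the community is dropped."""
    return (BLACKHOLE_COMMUNITY in communities
            and accept_announcement(prefix, communities))


print(accept_announcement("203.0.113.10/32", {"12389:55555"}))
print(accept_announcement("203.0.113.10/32", set()))
print(accept_announcement("203.0.113.0/24", set()))
```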

Do operators cooperate with each other on protection against DDoS attacks?

On some matters, yes. This mainly concerns enabling BGP FlowSpec at their interconnections, but operators do so rather cautiously, because bugs in the protocol implementation are periodically discovered on the equipment of one vendor or another. Otherwise, because DDoS protection is still a commercial service and operators compete with each other, there are no technical or organizational mechanisms for exchanging information about attacks (such as IoC).

What solutions are operators' DDoS detection and protection systems built on?

In Russia, the following solutions have gained the greatest popularity:

Arbor Networks "SP" and "TMS"
Radware "DefensePro"
MFI Soft "Perimeter"
Inoventica Technologies "InvGuard"
NSFocus "ADS" and "NTA"
Huawei "AntiDDoS8000 / 10000"

However, not all of these manufacturers offer an end-to-end solution (NetFlow traffic analysis plus filtering devices); their products are often bundled with another vendor's, for example Genie Networks "GenieATM", and some support working with different NetFlow collection solutions. A comparison of these solutions deserves a separate article, so we will not dwell on each of them here.

What is the difference between operators and cloud services?

A telecom operator provides the service only to clients that are physically connected to its network because, as you have already seen, the operator can collect traffic statistics and redirect traffic to a traffic clearing center for filtering only within its own network. In our case, connecting to the DDoS protection service requires no action from the client. The operator also protects the entire channel rather than individual applications and services, which gives full protection for the entire IT infrastructure.

In the early stage of their development, cloud services protected only websites. Traffic was redirected by changing the DNS A record to an IP address from the cloud provider's pool, and the cleaned traffic was delivered to the client through a reverse proxy. This method of redirection and delivery is still relevant and remains the most popular. But if the customer had other critical resources besides the website (DNS, mail servers, etc.) that needed protection, this method could not redirect all of their traffic to the cloud. Cloud services then began to connect customers' networks via VPN, which essentially turned them into overlay Internet service providers that filter the entire channel rather than a single application.

Recently, operators have also begun to deploy clusters with reverse proxies and WAFs on their networks, which lets them protect clients located outside their network. The conditional border between operators and cloud services is thus beginning to blur.

Perhaps it makes little sense to compare an abstract operator with an abstract cloud, since even cloud services differ significantly from one another. For example, some develop their system in-house; some build on ready-made industrial solutions from various vendors; some have a distributed network of traffic clearing centers connected to different upstream operators; some have one or more clearing centers in a single country, connected to one of the local telecom operators; some require sensors to be installed at the customer site; and some specialize only in web traffic. We will try to cover this topic in future articles.

To sum up, the following features are characteristic of a telecom operator:

because of its size, a telecom operator can take in a large volume of traffic for filtering;
the architecture of a large telecom operator allows filtering at the entrance to the network;
after the initial filtering, the flow of “dirty” traffic becomes much smaller and can then be analyzed in the traffic clearing center;
the quality of filtering depends on a number of parameters, chief among them the capabilities of the filtering system and the experience of the NOC / SOC;
if the client already uses the operator's services, it is often easier and faster for the operator to connect the client to protection and start filtering traffic.

In conclusion, we would like to note that our company has been developing its infrastructure for traffic analysis and DDoS protection since 2008. Over this time we have modernized our approaches to collecting analytics, filtering traffic and delivering cleaned traffic several times, and have implemented additional options such as CloudSignaling. In the following articles, as we describe the technologies we use, we will look back at this history and explain the reasons that guided our choice of development path.

Source: https://habr.com/ru/post/325138/
