Any large content or eyeball provider needs traffic management in its network, and the larger the network, the more acute that need becomes. In this article I will try to describe the basic principle of traffic management in the network of a company I am directly involved with. A caveat up front: the article mentions many trademarks, terms and pieces of jargon. There will be no router configuration examples and no explanations of these terms.
We tend to assume that an MPLS transport network is needed mainly for the services that run on top of it, of which there are many: L3VPN, L2VPN / VPLS, etc. Traffic Engineering in MPLS networks is remembered either out of a "good" life, or only in theory.
It is also commonly accepted that backup capacity is a luxury: as a rule, carriers and transport providers bill for a backup port just as they do for a regular one. A reasonable question arises: why buy capacity that will simply sit idle and perhaps be used for a short time a few times a month? On the other hand, it is equally wrong to claim that backups are unnecessary; backup capacity has to exist. What to do? That is what this article is about.
Own peering networks are quite common these days. Obviously, traffic that goes through a peering network is much cheaper than traffic sent into a carrier's network. As a rule, carriers also sell ports with a commit of 50% of the maximum link utilization (a 10GE link with a commit of only 5Gbps), so we can assume that at any moment we have spare capacity available to back up our peering network. Carriers often use the 95th percentile for billing, which means that for 5% of the month (36 hours in a 30-day month) we can use a given carrier above the commit at no extra charge, as long as traffic stays within the agreed commit for the rest of the month.
But how do we arrange this so that manual intervention is kept to a minimum, so that nobody has to reroute something in a hurry or make quick decisions that are not always correct? Packet loss and overloaded links are visible immediately, while making a decision takes time. How do we build a minimal peering network that gives the traffic on it the best possible protection from congestion at any given moment, peak hours in particular?
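The 36-hour figure can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: a 30-day month, one sample every 5 minutes, and a billing rule that discards the top 5% of samples and bills the highest remaining one. Real carrier contracts differ in the details, and the function name and traffic values are made up for the example.

```python
def percentile_95(samples_mbps):
    """Return the billable rate: sort the samples, discard the top 5%,
    and take the highest sample that remains."""
    if not samples_mbps:
        raise ValueError("no samples")
    ordered = sorted(samples_mbps)
    n = len(ordered)
    # Integer form of ceil(0.95 * n) - 1, to avoid floating-point surprises.
    index = (95 * n + 99) // 100 - 1
    return ordered[index]


# Assumption: 30-day month, one sample every 5 minutes.
SAMPLES_PER_MONTH = 30 * 24 * 12
kept_samples = (95 * SAMPLES_PER_MONTH + 99) // 100
free_hours = (SAMPLES_PER_MONTH - kept_samples) * 5 / 60

# Hypothetical month: 35 hours of bursting at 9.5 Gbps, 4 Gbps otherwise.
burst_samples = 35 * 12
month = [9500] * burst_samples + [4000] * (SAMPLES_PER_MONTH - burst_samples)

print(free_hours)            # hours per month that fall into the discarded 5%
print(percentile_95(month))  # billable rate in Mbps
```

With 35 hours of bursting the billable rate stays at the 4 Gbps baseline; once the bursts exceed 36 hours they start to define the bill.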
Consider the simplest peering network of the hosting company:

There are several Points of Presence (POPs). Links inside a POP are effectively free, but domestic and transatlantic links have to be paid for. The more we utilize these links, the lower the cost per unit of traffic becomes. In other words, the cost of traffic is inversely proportional to the load on the link.

As you can see, there are many different routes from the Dallas POP to AMS-IX. It might seem that nothing could be easier: set the appropriate interface metrics and the IGP will do everything for you. But it is not that simple. Everything works fine until a failure happens somewhere along the path. There are several ways to build a network. The first, the easiest and most reliable but far from the cheapest, is to keep the transport links loaded to about 50% (as transit carriers do, for example). Then the failure of one of two links does not degrade the quality of service, even at peak times. But we agreed at the very beginning that we are interested in a cheap network, built not so much by the book as to meet current market demands. Not everyone can afford to throw half of their capacity to the wind. The second way is manual: a 24x7 NOC that steers traffic by hand in response to monitoring alerts. Which of the two is cheaper and / or better we leave to the reader's judgment.
What do we offer instead? We propose building an MPLS network in which the transport label distribution protocol is RSVP-TE, i.e. any PE-B can be reached from any PE-A over a Label Switched Path (LSP) signaled by RSVP. Accordingly, no LDP traffic, and certainly no plain IP traffic, runs across the network outside RSVP LSPs. Half the job is done. But what is the fundamental difference from hundreds of similar networks? That's right, so far there is none. Now we will make our RSVP tunnels reserve bandwidth on every link along the entire LSP.
Here we need to stop and explain in more detail what is meant. The reservation in this case is nominal. Each LSP has a number of properties, including the requested bandwidth. The RSVP Path message passed from router to router requests the capacity that this particular LSP needs. If there is enough unreserved capacity on the current link, the Path message is passed on to the downstream router until it reaches the egress PE. Ultimately, if the LSP does come up, we can be sure that the required capacity is available along the entire path at that particular moment.
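The hop-by-hop check described above can be modeled in a few lines. This is a simplified sketch, not how any router implements RSVP-TE: the link names, the capacities and the `signal_lsp` function are hypothetical, and real signaling involves separate Path and Resv messages, priorities and preemption that are left out here.

```python
def signal_lsp(path_links, requested_mbps, unreserved_mbps):
    """Try to bring up an LSP over path_links.

    unreserved_mbps maps a link name to the bandwidth not yet reserved on it.
    The LSP comes up only if every link on the path can admit the request;
    otherwise nothing is reserved and the LSP stays down.
    """
    for link in path_links:
        if unreserved_mbps[link] < requested_mbps:
            return False  # admission fails on this hop
    for link in path_links:
        unreserved_mbps[link] -= requested_mbps  # nominal reservation
    return True


# Hypothetical two-hop path over 10GE links (values in Mbps).
unreserved = {"DAL-NY": 10000, "NY-AMS": 10000}
path = ["DAL-NY", "NY-AMS"]

results = [signal_lsp(path, 1500, unreserved) for _ in range(7)]
print(results)
print(unreserved)
```

Six 1.5 Gbps LSPs fit on the path; the seventh is refused because only 1 Gbps is left unreserved on each link, even though no actual traffic has been measured at all.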
At the same time, we let the RSVP tunnels find their paths themselves, guided by a few hints (ERO, setup / hold priority, link colors, etc.). To keep our LSPs from becoming too "fat", we roughly limit each LSP to 1-2Gbps, i.e. we map about 1Gbps of traffic to each tunnel. We delegate this task to our BGP RRs. The mapping is done in a simple way: through BGP next-hop reachability. According to the first step of the BGP best path selection algorithm, a BGP route is considered valid only if its next-hop is reachable. If the RSVP tunnel is up, the BGP next-hop is reachable, and the route via AMS-IX, in this case, is available as well. It is important to note that these BGP next-hops, which appear in our RIB / FIB when the MPLS tunnel comes up, are not carried by our IGP. This is the key point.
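To make the mapping idea concrete, here is a toy model of next-hop-driven route selection. It is a sketch under loud assumptions: the addresses are documentation prefixes, local preference is used as the only tie-breaker (the real best path algorithm has many more steps), and `Route` and `best_route` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    local_pref: int
    via: str


def best_route(routes, reachable_next_hops):
    """Pick the best route among those whose next-hop is reachable.

    A route with an unreachable next-hop is not valid and is ignored,
    which is the first check in BGP best path selection.
    """
    usable = [r for r in routes if r.next_hop in reachable_next_hops]
    if not usable:
        return None
    return max(usable, key=lambda r: r.local_pref)


# The peering next-hop resolves only through the RSVP LSP (not via the IGP);
# the transit next-hop is a directly connected carrier and is always reachable.
PEERING_NH = "198.51.100.1"
TRANSIT_NH = "192.0.2.1"
routes = [
    Route("203.0.113.0/24", PEERING_NH, 200, "AMS-IX over LSP"),
    Route("203.0.113.0/24", TRANSIT_NH, 100, "transit carrier"),
]

lsp_up = best_route(routes, {PEERING_NH, TRANSIT_NH})
lsp_down = best_route(routes, {TRANSIT_NH})
print(lsp_up.via)
print(lsp_down.via)
```

Because the peering next-hop exists only while the tunnel is up, a tunnel that cannot be signaled removes the peering route from consideration and the transit route wins without any manual action.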
Suppose our network is built. At peak hour every transport link between POPs carries ~9Gbps. At this point the network we have built is, in effect, no different from the IGP network described above.
But now the most important part begins. One or more of our transport links go down because of a failure.

With a plain IGP, traffic would be rerouted through NY RTR1, and on the 10GE link between the NY POP and the next router on the way to Amsterdam RTR1 we would get terrible congestion (almost 20Gbps trying to squeeze into a 10GE link at the same time) and a lot of customer complaints. What about our tunnels? The tunnels that were on the failed link start actively looking for alternative routes with sufficient capacity along the whole path. But what if, according to the Traffic Engineering Database (TED), there are no such routes? The tunnels that did not find a route stay down until enough capacity becomes available. But where does the traffic that used to go through these tunnels go? As we have already agreed, destinations (on a per AS-PATH basis) are mapped to a tunnel, and the corresponding BGP next-hop comes up together with that tunnel. In other words, on our PE Dallas RTR3 the routes received from a transit carrier became BGP best, and the excess traffic automatically shifted to transit (green LSP).
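The search for an alternative route with sufficient capacity is commonly done as a constrained shortest path computation (CSPF) over the TED: links that cannot fit the request are pruned, then an ordinary shortest path is run on what is left. The sketch below assumes a tiny hypothetical topology with made-up metrics and unreserved bandwidth; it ignores priorities, link colors and ERO constraints.

```python
import heapq


def cspf(links, src, dst, demand_mbps):
    """Return the lowest-metric path that has demand_mbps unreserved on
    every link, or None if no such path exists (the LSP stays down).

    links maps (node_a, node_b) to (metric, unreserved_mbps) and is
    treated as bidirectional.
    """
    graph = {}
    for (a, b), (metric, unreserved) in links.items():
        if unreserved >= demand_mbps:  # prune links that cannot fit the LSP
            graph.setdefault(a, []).append((b, metric))
            graph.setdefault(b, []).append((a, metric))

    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None


# Hypothetical TED after a failure: (metric, unreserved Mbps) per link.
ted = {
    ("DAL", "NY"): (10, 1000),
    ("NY", "AMS"): (10, 1000),
    ("DAL", "CHI"): (10, 4000),
    ("CHI", "AMS"): (20, 500),
}

print(cspf(ted, "DAL", "AMS", 800))   # fits via NY
print(cspf(ted, "DAL", "AMS", 1500))  # no path: tunnel stays down
```

An 800 Mbps tunnel still finds a path through NY, while a 1.5 Gbps tunnel finds none and stays down, which is exactly the moment its BGP next-hop disappears and transit takes over.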

Our traffic will stay there until either the failed link comes back up or enough capacity frees up along the whole path to the destination PE.
At this point the reader will object that a separate carrier (or several) would have to be connected to every PE in the scheme. This is not entirely true. We are interested in backup, and for that vendors have long since invented BGP best external for us (both Cisco and Juniper support it). But RSVP means a lot of manual work! Again, this is not the case: for PEs that carry an insignificant amount of traffic, RSVP automatic mesh comes to the rescue. In fact, the only routine work left in our network is monitoring the amount of traffic mapped to each LSP and keeping it balanced within the specified rate. To make sure the reservations reflect the real picture (as agreed, there is no LDP and / or plain IP traffic), we use Autobandwidth on all RSVP tunnels. Both Cisco and Juniper have this feature.
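The general idea behind Autobandwidth is to periodically re-signal an LSP with a reservation that follows its measured traffic. The sketch below only illustrates that idea: the function, its parameters and their default values are hypothetical and do not correspond to the configuration knobs of any vendor. The 2 Gbps ceiling mirrors the LSP size limit mentioned earlier.

```python
def adjust_reservation(current_mbps, samples_mbps, threshold_pct=10,
                       min_mbps=100, max_mbps=2000):
    """Return the reservation to signal for the next adjustment interval.

    The target is the highest traffic sample seen in the interval, clamped
    to [min_mbps, max_mbps]. Changes smaller than threshold_pct of the
    current reservation are ignored to avoid needless re-signaling.
    """
    if not samples_mbps:
        return current_mbps
    target = max(min_mbps, min(max_mbps, max(samples_mbps)))
    if current_mbps == 0:
        return target
    # Integer comparison of |target - current| / current against the threshold.
    if abs(target - current_mbps) * 100 < threshold_pct * current_mbps:
        return current_mbps
    return target


print(adjust_reservation(1000, [900, 1250, 1100]))  # grows to follow traffic
print(adjust_reservation(1000, [980, 1040]))        # change too small, kept
print(adjust_reservation(1000, [2600]))             # clamped to the ceiling
```

With reservations tracking real load this way, the unreserved bandwidth in the TED stays close to the capacity that is actually free on each link.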
As a result, we get a network that is both high-quality and inexpensive. We save not only our own money but also the money of our customers, who do not have to pay double for idle reserve capacity.