A Tale of the Real Internet

Abstract: A story about how the Internet is built as a "network of networks", written as plain reading, with no binary arithmetic and no BGP subtleties. Most of it is not about how a laptop talks to an access point, but about what happens after the data passes the "default gateway". Be warned: it is a long read.

Introduction

A small provocation to start with: none of the readers of this article is connected to the Internet. You are all connected to your provider's network, and nothing more. Connecting to the Internet is expensive and hard to do; it takes very serious equipment, several contracts with several telecom operators, and qualified staff. An ordinary home user never does it. Not to mention that no more than about 4 billion participants can ever be connected to the Internet (and until recently the limit was "no more than 65,536") [1]. Even if the whole Internet moves to IPv6, that number will not change.

Here is a graph of the number of participants connected to the Internet [2]:

The Y-axis is a plain count of units. Yes, individual units. And you are not counted among them.
Why?

The point is that "Internet", translated literally, means "inter-network". A network of networks. And the participants of the Internet are not users (their computers, tablets, Wi-Fi microwaves, and so on) but networks. Networks, and only networks, take part in the workings of the Internet. The Internet is what connects different networks to one another.

But individual nodes of those networks can, through their own network's connection to the Internet, communicate with nodes of other networks.

However, first things first.

What is a network?

I'll skip the whole dramatic and dusty history of the first decades of computing. At some point people wanted to move information from computer to computer by some means other than hauling thousands of punched cards around. After long suffering and billions of dollars invested in dead protocols that never became standards (and some that did, but died anyway), the concept of a "local area network" (or simply "LAN") emerged. A local network lets computers located near one another communicate by address within that network. The notion of "near" is very elastic: it can stretch across several buildings and, with real effort, across a couple of cities.

Why a "network"?

We are all used to this computer thing being called a "network". But we still remember that a net is what fish are caught with, and that the word describes other mesh-like structures too.

So a computer network, being a net, should also consist of cells. Yet everyday experience suggests that what we have is not a net at all but a proper computer tree. Leaves (computers, smartphones, tablets, and so on) connect to branches (routers, switches, access points), which connect in turn to other routers/switches, and so on up to the Main Router, from which a link usually goes to the Internet provider. Or, in the case of a purely local network, goes nowhere, because that router is the top.

Where is the network?

Answer: in this configuration there is no net. This is not a net. That is, it is still a computer network, but a very special, "torn" case of one. A true computer network means there is more than one router in it and the routers are connected to one another by several links.

Below is a diagram of a medium-sized local network. Round objects are routers and square ones are switches; apart from the source and destination points, all switches are omitted for simplicity. The preferred path is shown in green, the expensive segment in red.

Already more like a network?

The main idea of the Internet lies precisely in redundant connections. It was created by the American military (ARPANET) with a simple goal: if any intermediate node in this scheme is damaged (there is a war on, you know; someone dug a trench and cut the cables), connectivity must be preserved.

In fact, I am being slightly disingenuous: many local networks were (once) built not on IP but on other protocols (ATM, IPX/SPX).

But we are talking about the winner: IP (which stands for Internet Protocol). Networks built on IP, and the Internet in particular, work on the hop-by-hop principle.

Hop by hop

To rule out the existence of a "central router of the whole Internet", each router decides on its own where to send a received packet next, and it chooses only among its directly connected neighbors. This principle is called hop by hop ("step by step"). The alternatives would be either a central coordinating node that tells everyone how to send packets, or a route written into the packet itself.

The idea of a central coordinating node stumbles on one simple problem: how do you deliver information about a new route to a router if the route used to reach that router is the one that got damaged? Oops...

The idea of a pre-specified route was used in UUCP (a predecessor of ordinary e-mail), but in wartime (or, all at once, an earthquake, a tsunami, and a nuclear plant accident) it is naive to hope that the sender knows which nodes are working and which are not.

Thus the hop-by-hop principle places all responsibility for routing within a given segment on the router responsible for that segment (put that way, it sounds like a truism).
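
The hop-by-hop idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the router names R1 to R4, the destination label `net-B`, and the `forward` helper are hypothetical, and real routers match on address prefixes rather than names.

```python
# Hypothetical topology: router names and tables are illustrative only.
LOCAL = "local"  # marker: the destination network is directly attached

# Each router stores only its own decision: which neighbor gets the packet.
NEXT_HOP = {
    "R1": {"net-B": "R2"},
    "R2": {"net-B": "R4"},
    "R3": {"net-B": "R4"},
    "R4": {"net-B": LOCAL},
}

def forward(start, destination, tables, max_hops=16):
    """Walk a packet router by router; each step uses only local knowledge."""
    path = [start]
    current = start
    for _ in range(max_hops):
        decision = tables[current].get(destination)
        if decision is None:
            raise LookupError(f"{current} has no route to {destination}")
        if decision == LOCAL:
            return path
        path.append(decision)
        current = decision
    raise RuntimeError("hop limit exceeded")

print(forward("R1", "net-B", NEXT_HOP))  # path via R2 and R4

# The R2-R4 link fails: only R2 changes its own decision, R1 is not told.
NEXT_HOP["R2"]["net-B"] = "R3"
print(forward("R1", "net-B", NEXT_HOP))  # path now detours through R3
```

Note that R1's table is identical before and after the failure: the repair happens entirely at the router next to the broken link.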

A router can usually tell quite well which of its neighbors are alive and which are not. It can also talk to its neighbors, and through them to neighbors of neighbors, and find out which links are alive and which are not.

The second part (talking to neighbors) is called a "routing protocol". It describes how a router should communicate with its neighbors and exactly how that communication should affect the routing table. The protocols themselves come in two kinds: those for work inside a network and those for work between networks.

The routing table is the holy of holies of any router. Its structure is simple: traffic for such-and-such network goes out through such-and-such network interface, plus a preference for each route. The more specific a route, the more it is preferred; among equally specific routes, the route's priority decides. The last (worst) route is the one addressed "to grandpa in the village", that is, "all traffic". This is the so-called "default gateway". It is used only when there is no more specific route and, most interestingly, ordinary computers (phones, tablets, vacuum cleaners, video cameras, Wi-Fi toothpicks, and so on) very often have only that: the default route, ~~that is, nothing good in their lives~~.

But that was only the prologue. The tale itself is still to come.
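
As a rough sketch of the rule "the more specific route wins, and the default gateway is used last", here is a lookup built on Python's standard `ipaddress` module. The prefixes, interface names, and the tie-breaking rule (lower preference value wins) are assumptions for illustration; real routers use their own metrics.

```python
import ipaddress

# Hypothetical routing table: (prefix, outgoing interface, preference).
ROUTES = [
    ("0.0.0.0/0", "eth0", 10),    # default gateway: matches everything
    ("10.0.0.0/8", "eth1", 10),
    ("10.1.0.0/16", "eth2", 10),
    ("10.1.0.0/16", "eth3", 20),  # same prefix, worse preference
]

def lookup(address, routes=ROUTES):
    """Return the interface of the most specific route matching `address`.

    Ties on prefix length are broken by the lower preference value
    (an assumption made for this sketch).
    """
    ip = ipaddress.ip_address(address)
    candidates = []
    for prefix, interface, preference in routes:
        network = ipaddress.ip_network(prefix)
        if ip in network:
            # Negative prefix length so that min() picks the longest prefix.
            candidates.append((-network.prefixlen, preference, interface))
    if not candidates:
        return None
    return min(candidates)[2]

print(lookup("10.1.2.3"))  # matched by /16, /8 and /0; the /16 wins
print(lookup("10.2.0.1"))  # matched by /8 and /0; the /8 wins
print(lookup("8.8.8.8"))   # only the default route matches
```

A host that has nothing but the `0.0.0.0/0` entry would send everything to the same interface, which is exactly the situation of an ordinary home device.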

And what lies beyond the uplink?

An uplink is whoever you get your Internet access from.

As we have already discussed, the real Internet connects networks. Such networks are called "autonomous systems", and they are called that because they depend on no one: they exist on their own. Autonomous systems connect to one another (we will discuss how in a moment), hand their traffic to neighbors, and even carry traffic from one neighbor to another in transit.

It is important to understand that accepting traffic from a neighbor and passing it on to a neighbor is each autonomous system's own prerogative. If they want to, they pass it on. If they don't want to, they don't, or they send it not to the nearest neighbor but somewhere else entirely, which forwards the traffic to a third party, then a fifth, then to Australia and back. Who carries whose traffic is determined by inter-operator agreements (or, more simply, by contracts, if you have a small but proud autonomous system with two uplinks).

So, the real Internet consists of autonomous systems and connections between them.

Some readers may have imagined that the link between an autonomous system in China and one in, say, Moscow is thousands of kilometers long. No, no. The physical length of a link between autonomous systems is usually very small: sometimes tens of centimeters, sometimes meters, at most tens of meters.

Why? Because if the link between them were 10,000 kilometers long and hung on poles, who would look after those poles, water them, prop them up, and tie the wires to them? So most often all those thousands and thousands of kilometers of fiber (copper died out at such distances) are themselves an autonomous system. Note that this is a whole separate world, called "backbone operators". Their business is precisely to take traffic at one point and carry it to another, thousands of kilometers away, through frost, tractors, and bears.

But the connections between autonomous systems (they are called "interconnects") are usually located in cozy, cool, dry, and carefully guarded premises. It may be a server room (for example, Selectel has a number of so-called "operator racks" in its server rooms, precisely so that operators housed there can interconnect in comfort), or, for really large specialized sites, separate facilities are used (most often arising spontaneously from a high concentration of existing cable routes): an Internet Exchange (IX). So MSK-IX is not "Moscow-9", it is "Moscow Internet Exchange". Operators bring their lines there (their own or leased) and their switches (whole ones, or a small slice via a VLAN or a leased port). And then the hardworking ~~spiders start weaving the world wide web,~~ engineers set about making thousands of cross-connects (connecting one switch to another with a cable). The whole Internet rests on these cross-connects.

How do all these parties manage to agree with one another? And, most importantly, how do those agreements preserve the key property: surviving a death (including the death of a link to a neighbor)?

BGP

The main protocol of the Internet (by importance, not by traffic) is BGP (Border Gateway Protocol). It is used for communication between the routers of providers/operators at the boundaries of autonomous systems, that is, outside their own networks.

Each autonomous system participating in the Internet announces which routes it accepts and through which uplink. But there are many autonomous systems. Thousands of them! The complete list of all announcements is called Full View, and it describes the entire Internet on planet Earth (as far as I know, there are no autonomous systems off the planet, only individual nodes that route traffic through ground-based routers). Full View is fairly large (just under 400,000 entries for IPv4, taking from 200 MB to 2 GB depending on hardware and software).

Note that a router with Full View does not need to have a default gateway - it has a map of the entire Internet in front of it.

Since the operator itself decides which prefixes (network fragments of a given size) to announce and through whom, it can control through whom it receives traffic. For example, choosing between "good and expensive" and "cheap", the operator may prefer the cheap option and keep the expensive one as a backup.

It is very important here that "where the operator receives traffic from" is not the same as "where it sends traffic to". These are the so-called asymmetric routes. They appear as a result of economic policy and greed.

Here is an example of a modest asymmetric route (the map fragment is taken from [3]; the route is my own invention). Suppose that we, sitting in Kiev, decide to request a photo of a cat from a server in Vilnius. Our provider's router knows that the closest link to Vilnius is via Warsaw (the green arrow). The server in Vilnius rummages around, finds the cat, and sends it to us. But the network operator in Vilnius knows that it will be charged a lot of money for traffic on the cable to Warsaw. And it does not send traffic via Moscow for political reasons. So it sends the traffic through another operator, in Riga. That one forwards it to Stockholm, which forwards it further, and the traffic is handed on again... and so on until the picture finally crawls to the bored cat lover in Kiev.

Note that in announcing its network an operator can work wonders (or horrors). It can announce its network through several uplinks, in which case traffic will reach it through all of them, and each sender's choice of uplink will follow whichever path is most convenient (closer or cheaper, depending on how things are configured). This, incidentally, is the basis of most CDNs (content distribution networks): the operator keeps copies of the content on a heap of servers around the world, has a bunch of interconnects with local operators, and announces its (identical) addresses everywhere. As a result, in each region users' requests land on the server nearest to them (along the route) and are answered from there, which is much faster than asking across the whole planet.

The same operator may also, for example, not announce some of its addresses. In that case traffic dies at the first router that realizes there is no way forward.
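
To make the "same addresses announced everywhere" trick concrete, here is a deliberately simplified sketch. It assumes that each region simply prefers the announcement with the shortest AS path; real BGP best-path selection has more steps, and the region names, site names, and path lengths are all hypothetical.

```python
# Hypothetical AS-path lengths from each region to each CDN site that
# announces the same prefix. Illustrative numbers only.
PATH_LENGTH = {
    "Kiev": {"cdn-frankfurt": 2, "cdn-new-york": 5},
    "Boston": {"cdn-frankfurt": 4, "cdn-new-york": 1},
}

def serving_site(region, path_length=PATH_LENGTH):
    """Pick the site whose announcement has the shortest AS path."""
    sites = path_length[region]
    return min(sites, key=sites.get)

for region in PATH_LENGTH:
    print(region, "->", serving_site(region))
```

The user in each region addresses the same IP, yet ends up at a different physical server, without any central coordinator choosing for them.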

Here is an example of output I managed to capture during recent brief maintenance on network equipment. As news of the end of the BGP sessions between the router and its uplinks spread across the Internet, traffic was handed on, farther and farther, to routers that still believed they knew where to send it. As a result, after 255 hops (that is, transfers between 255 routers), the packet died of old age without ever reaching its destination.

 PING selectel.ru (188.93.16.26) 56(84) bytes of data.
 From ae0-0.par-gar-score-2-re1.interoute.net (212.23.42.26) icmp_seq=51 Time to live exceeded
 From ae-3-80.edge5.Frankfurt1.Level3.net (4.69.154.137) icmp_seq=54 Time to live exceeded
 From xe-0-2-1.par72.ip4.tinet.net (89.149.181.138) icmp_seq=68 Time to live exceeded
 From 94.79.28.33 icmp_seq=72 Time to live exceeded
 From so-0-0-0.IL2.NYC12.ALTER.NET (146.188.15.254) icmp_seq=92 Time to live exceeded
 64 bytes from 188.93.16.26: icmp_seq=326 ttl=58 time=0.732 ms
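
The "death from old age" in this output is the TTL (time to live) counter at work: every router decrements it, and the router that brings it to zero discards the packet and reports "Time to live exceeded". A minimal simulation, assuming a hypothetical loop of three routers that keep handing the packet to one another:

```python
def send(ttl, loop):
    """Pass a packet around a routing loop until its TTL runs out.

    `loop` is a list of hypothetical router names, each of which believes
    the next one knows the way. Returns the router that discards the packet
    and the number of routers the packet visited.
    """
    visited = 0
    while ttl > 0:
        router = loop[visited % len(loop)]
        visited += 1
        ttl -= 1  # every router decrements TTL before forwarding
        if ttl == 0:
            # This router would send back "Time to live exceeded".
            return router, visited
    raise ValueError("ttl must be positive")

router, visited = send(255, ["paris", "frankfurt", "new-york"])
print(f"dropped at {router} after visiting {visited} routers")
```

Which router happens to report the expiry depends only on where the countdown ends, which is why the ping output above shows replies from unrelated operators.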

Uplinks of uplinks: Tier 1

Simple logical reasoning shows that if an uplink has an uplink, then either there are infinitely many uplinks, or they close into a ring, or there are uplinks that have no uplinks.

And such uplinks exist. They are called Tier 1. What distinguishes them from everyone else is not that they have no uplinks (after all, this is a network, with no top or bottom in the formal sense), but that they pay no one for Internet. Imagine a company that receives hundreds of gigabits per second (terabits?) of traffic, sends as much, and does all of it for free. To get free Internet, all you need is to get close to the nearest McDonald's/Starbucks and find their Wi-Fi... Alas, that does not make you Tier 1. To be Tier 1 you need one more condition: being paid for Internet. So they pay no one, and they are paid for connectivity.

This is due to the very good connectivity (number of interconnects) of these operators. Obviously, it is a very cozy and tempting spot, so many aim for it. How Tier 1 operators "get along" with one another is described well on nag.ru [4].

Peers, Peering and Pyrrhic Victory

As we found out, Tier 1 operators take money from everyone and give it to no one. If there are two operators with a lot of traffic between them (say, yet another YouTube killer with millions of kitten videos and a new Megatelecom with millions of people eager to watch kittens), then the ideal arrangement (from the Tier 1 operator's point of view) looks like this: both operators connect to Tier 1 and pay for traffic. The YouTube killer pays for outgoing, the kitten recipient for incoming. Tier 1 is content, the YouTube killer cannot find a workable way to monetize kittens, and Megatelecom asks for a subsidy from the state budget.

The solution? Run or lease a cable to a cozy switching site and set up a local exchange, from the YouTube killer to Megatelecom. Bottom line: gigabytes of kittens go directly, and costs go down. Tier 1 is not thrilled, but its business is not hauling kittens; it is providing "the very best connectivity", so it will not be left without its crust of bread.

Such a connection is called peering. Its main condition is that the peering participants do not pay each other, or pay only a token amount for renting a port or a piece of the physical link.

At one time, the main peering direction was between "content generators" and "Internet providers". But then it began... Napster, file shares, eDonkey, DC, and, to the trumpet fanfare of copyright holders... torrents. Suddenly the volume of traffic "between users" became many times larger than that between content providers and consumers. And while YouTube and its clones can still put up a fight, any site with "lots of letters, few pictures" (Habrahabr, for example) clearly cannot keep up with users who decided to download all of Futurama and The Simpsons in one pack, and then seed it back to a ratio of 2.

So interconnects between providers became especially popular. Because many providers use NAT and private ("gray") addresses inside their networks, it went as far as peering over gray addresses; moreover, providers throttled bandwidth under the tariff only for Internet traffic, while the peered local network ran at wire speed.

As a result, operators found themselves holding an enormous, terabyte-scale volume of traffic.

So peering ought to benefit both operators. Both pay only for the port at the interconnect and a little for its upkeep. And both save... When your business saves money, that is good. When a competitor's business saves money, that is bad. And so big corporate politics enters the game.

If we have provider A with 10 Gb/s of traffic and provider B with 1 Gb/s, and the peering volume between them is roughly 500 Mb/s, then...

...I should also say that backbone operators most often charge by bandwidth, at the 95th percentile, and by whichever was greater, outgoing or incoming.
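
For readers unfamiliar with 95th-percentile billing, here is a rough sketch of the calculation. It assumes the nearest-rank percentile over evenly spaced bandwidth samples and takes the larger of the inbound and outbound results; actual contracts differ in sampling interval and in how the two directions are combined, and all the sample values are made up.

```python
def percentile_95(samples_mbps):
    """Nearest-rank 95th percentile: the top 5% of samples are ignored."""
    ordered = sorted(samples_mbps)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

def billable_rate(inbound_mbps, outbound_mbps):
    """Bill on whichever direction has the higher 95th percentile."""
    return max(percentile_95(inbound_mbps), percentile_95(outbound_mbps))

quiet_out = [50] * 100

# A short burst (4% of the samples) falls into the ignored top 5%.
short_burst_in = [100] * 96 + [1000] * 4
print(billable_rate(short_burst_in, quiet_out))

# A longer burst (6% of the samples) sets the bill for the whole period.
long_burst_in = [100] * 94 + [1000] * 6
print(billable_rate(long_burst_in, quiet_out))
```

The second case shows why a sustained flood of junk traffic is expensive even if the channel is otherwise lightly used.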

So, if A and B peer, A saves 5% of its traffic through the peering, and B saves 50%. Obviously, if A and B are competitors, then by refusing to peer A loses almost nothing, while B will pay its uplink a great deal to carry traffic to A.

It is even worse when there are three operators: A, B, and C. A and B are large, with a 10 Gb/s interconnect between them that is almost full. C is small, with only 500 Mb/s. A and B feast, and C is not invited. C goes to an uplink and pays hard-earned money for traffic to A and B. And since most users are with A and B, most of C's users want to send traffic to, or receive it from, A or B. For the alliance of A and B everything is fine: most of their traffic is local, and only crumbs go to competitors. For C it means that most of its traffic is paid for, and expensive.
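
The two-operator arithmetic above (10 Gb/s, 1 Gb/s, and about 500 Mb/s of peering) can be checked in a few lines. The helper name `peering_share` is hypothetical, and the sketch treats "savings" simply as the share of an operator's traffic that no longer goes through a paid uplink.

```python
def peering_share(total_mbps, peering_mbps):
    """Fraction of an operator's traffic that peering takes off paid uplinks."""
    return peering_mbps / total_mbps

share_a = peering_share(10_000, 500)  # provider A: 10 Gb/s in total
share_b = peering_share(1_000, 500)   # provider B: 1 Gb/s in total

print(f"A moves {share_a:.0%} of its traffic to peering")
print(f"B moves {share_b:.0%} of its traffic to peering")
```

The same 500 Mb/s is a rounding error for the large operator and half the business for the small one, which is what gives the large operator its leverage.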

And so the two big ones are friends, while the small one has it bad (expensive). It also happens that several large operators join forces and decide to set up a "business". The result is an OPG. Like any organized crime group, it starts to "milk" those it shelters and to crush those who resist. Well, you know, everyone wants to eat.

...Oh yes, OPG (which in Russian is also the usual abbreviation for an organized crime group) stands for something quite innocent here: United Peering Group. A little more about this is in Kipchatov's blog [5].

Wars around peering exist and will continue to exist, alas. A mesh network is good for users, but not for those who make money on the Internet.

Because peering is a service very much in demand, there are entire companies that build their business solely on providing peering. There are several methods:

Direct switching: operator A plugs into operator B physically.

Because of their primitiveness, DoS attacks (people usually like to say DDoS, but "distributed" is a separate conversation) are very easy to carry out. A dozen lines of C, one line in the shell, and there it is: yet another computer straining to flood the whole Internet with meaningless traffic. With several such computers you can get a stream of junk measured in gigabytes, tens of gigabytes, hundreds of gigabytes.

If all this garbage goes to one address, trouble follows. The incoming channel is saturated to the ceiling, and legitimate users simply cannot get their requests through.

The problem is that the flood can clog not only the channel of a specific server but also the incoming channel of the operator (yes, it happens). Given how channels between operators are billed (95th percentile), a large, prolonged flow of junk traffic is a plain wasted expense.

The simplest everyday solution is to drop traffic from the source at the nearest router, or even at a switch. But then, first, the incoming channel is still overloaded; second, it still has to be paid for; and third, we face the question of "whom to ban". If all the traffic comes from one or two addresses, the task is simple. But flooding with a forged sender address is easier than easy. So in real emergencies the recipient's address is blocked (yes, it "voluntarily dies" to save its neighbors), and the job is handed to a blackhole in BGP. In a normal configuration you add your own addresses to it, not other people's, but if an uplink, by agreement or through carelessness, lets you announce other people's addresses, that can be done too.

The technical side of BGP blackholing is described on Habr [6].

What does it look like? For blackholing, a special community is set aside (roughly speaking, one more, special, small full view) in which a provider can announce its addresses with a /32 prefix (for IPv4). It was given the romantic number 666. Border routers exchange this information over BGP, so the black hole creeps outward, gradually swallowing all traffic addressed to the banned address on every router that sees (and honors) these announcements. As a result, traffic to the "victim" starts being dropped at the uplinks, at the uplinks' uplinks, and so on down to the "understanding" routers closest to the sources of the attack. They drop the bad traffic, so the Internet will not notice the attack. The address, however, will be unreachable from the Internet, since a router cannot tell "good" traffic from "bad".
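
Why a /32 announcement is enough can be seen from a small sketch of the effect on one router. It models only the outcome (a more specific route with a "discard" action), not the BGP exchange or the community tagging; the prefix `203.0.113.0/24`, the name `uplink-1`, and the helper functions are all hypothetical.

```python
import ipaddress

DISCARD = "discard"  # hypothetical marker for a null route

# One ordinary route covering the whole customer network.
table = {
    ipaddress.ip_network("203.0.113.0/24"): "uplink-1",
}

def announce_blackhole(table, victim):
    """Install a host route (/32) for the victim that drops its traffic."""
    table[ipaddress.ip_network(f"{victim}/32")] = DISCARD

def next_hop(table, address):
    """Longest-prefix match: the most specific matching route wins."""
    ip = ipaddress.ip_address(address)
    matches = [network for network in table if ip in network]
    if not matches:
        return None
    best = max(matches, key=lambda network: network.prefixlen)
    return table[best]

announce_blackhole(table, "203.0.113.7")
print(next_hop(table, "203.0.113.7"))  # the victim: traffic is dropped
print(next_hop(table, "203.0.113.8"))  # a neighbor: still delivered
```

Because the /32 is more specific than the /24, only the victim's address is sacrificed, and the rest of the network keeps working.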