The Internet seems like a strong, independent and indestructible structure. In theory, the network is resilient enough to survive a nuclear explosion. In reality, one small router can take the Internet down. That is because the Internet is a jumble of contradictions, vulnerabilities, mistakes and cat videos. Its foundation, the BGP protocol, contains a pile of problems. It is amazing that it is still breathing. On top of the Internet's own errors, it is also being broken by everyone and their dog: large ISPs, corporations, states and DDoS attacks. What can you do about it, and how do you live with it?
Alexey Uchakin (Night_Snake), who leads the network engineering team at IQ Option, knows the answer. His main task is keeping the platform available for users. In this transcript of Alexey's talk at Saint HighLoad++ 2019, we'll cover BGP, DDoS attacks, Internet kill switches, provider errors, decentralization, and cases when one small router put the Internet to sleep. At the end there are a couple of tips on how to survive all of this.
The day when the Internet broke
I'll mention just a few incidents when Internet connectivity broke. That will be enough for the full picture.
"Incident with AS7007" . The first time the Internet broke in April 1997. There was an error in the software of one router from the 7007 autonomous system. At some point, the router announced its internal routing table to its neighbors and sent half of the network to black hole.
Pakistan vs. YouTube . In 2008, the brave guys from Pakistan decided to block YouTube. They did it so well that half the world was left without its cat videos.
“Rostelecom captures the VISA, MasterCard and Symantec prefixes” . In 2017, Rostelecom mistakenly began announcing the prefixes of VISA, MasterCard and Symantec. As a result, financial traffic went through a channel controlled by the provider. The leak did not last long, but it was unpleasant for the financial companies.
Google vs. Japan . In August 2017, Google began announcing the prefixes of the major Japanese providers NTT and KDDI on some of its uplinks. The traffic went to Google as transit, most likely by mistake. Since Google is not a provider and does not carry transit traffic, a significant part of Japan was left without the Internet.
"DV LINK captures the prefixes of Google, Apple, Facebook, Microsoft . " In the same 2017, the Russian provider DV LINK began to announce for some reason the networks of Google, Apple, Facebook, Microsoft and some other major players.
"ENet from the USA captured the AWS Route53 and MyEtherwallet prefixes . " In 2018, a provider from Ohio or one of his clients announced the Amazon Route53 network and MyEtherwallet cryptographic network. The attack was successful: even despite the self-signed certificate, a warning about which appeared to the user when entering the MyEtherwallet website, many wallets stole and stole part of the cryptocurrency.
There were more than 14,000 such incidents in 2017 alone! The network is still decentralized, so not everything breaks and not for everyone. But there are thousands of incidents, and all of them are related to BGP, the protocol the Internet runs on.
BGP and its problems
BGP, the Border Gateway Protocol, was first described in 1989 by two engineers from IBM and Cisco Systems on three "napkins" - sheets of A4 paper. These "napkins" still lie in the Cisco Systems head office in San Francisco as a relic of the networking world.
The protocol is based on the interaction of autonomous systems (AS). An autonomous system is simply an ID to which IP networks are assigned in a public registry. A router with this ID can announce those networks to the world. Accordingly, any route on the Internet can be represented as a vector called the AS Path . The vector consists of the numbers of the autonomous systems that have to be traversed to reach the destination network.
For example, take a network of several autonomous systems. You need to get from AS65001 to AS65003. The path from the first system is represented by the AS Path in the diagram: it consists of two autonomous systems, 65002 and 65003. For each destination address there is an AS Path vector made up of the autonomous system numbers we need to pass through.
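To make the idea concrete, here is a minimal sketch (my own illustration, not from the talk; the prefix and AS numbers are made up) that models routes as AS Path vectors and picks the best one the way plain BGP does by default, preferring the shortest path:

```python
# A route to a prefix is just the list of AS numbers to traverse (the AS Path).
# BGP's default tie-breaker among otherwise equal routes is the shortest AS Path.
routes = {
    "203.0.113.0/24": [
        [65002, 65003],          # learned from a peer: two hops
        [65005, 65004, 65003],   # learned from an upstream: three hops
    ],
}

def best_path(prefix: str) -> list[int]:
    """Pick the route with the shortest AS Path, as plain BGP would."""
    return min(routes[prefix], key=len)

print(best_path("203.0.113.0/24"))  # -> [65002, 65003]
```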
So what are the problems of BGP?
BGP is a trust protocol.
BGP is based on trust: by default we trust our neighbor. This is a feature of many protocols designed at the very dawn of the Internet. Let's look at what "trusting" actually means.
No neighbor authentication . Formally there is MD5, but MD5 in 2019... well, you get the idea.
No filtering . BGP has filters and they are documented, but they are either not used or used incorrectly. I'll explain why later.
It is very easy to establish a neighborship . Configuring a BGP neighbor on almost any router takes a couple of lines of config.
No license is required to run BGP . Nobody makes you pass an exam to confirm your qualifications, and nobody will revoke your right to configure BGP while drunk.
Two main problems
Prefix hijacks . A prefix hijack is announcing a network that does not belong to you, as in the MyEtherwallet case: the attackers took some prefixes, made a deal with a provider or hacked one, and announced those networks through it.
Route leaks . A leak is a bit more complicated: it is a change of the AS Path . In the best case the change just adds latency, because traffic takes a longer route or a less capacious link. In the worst case, the Google and Japan story repeats itself.
Google itself is not an operator and not a transit autonomous system. But when it announced the networks of the Japanese operators to its provider, traffic through Google looked preferable over the AS Path. The traffic went there and was simply dropped, because routing inside Google is more complicated than just the filters on the border.
Why don't the filters work?
Nobody cares . That is the main reason: nobody cares. The admin of a small provider, or of a company that connected to a provider over BGP, took a MikroTik, configured BGP on it, and does not even know that filters can be configured there.
Configuration errors . Someone was debugging something, got the mask wrong, plugged in the wrong network - and there's your error again.
No technical capability . For example, telecom providers have many clients. The smart thing would be to automatically update the customers' filters: track that a customer has a new network, or has leased their network to someone. Tracking this is hard, and doing it by hand is even harder. So they either set up relaxed filters or don't set any filters at all.
Exceptions . There are exceptions for favorite and big clients, especially for inter-operator interconnects. For example, TransTeleCom and Rostelecom have a lot of networks and an interconnect between them. If that interconnect goes down, nobody will be happy, so the filters are relaxed or removed entirely.
Outdated or irrelevant information in the IRR . Filters are built from information recorded in the IRR, the Internet Routing Registry - the registries of the regional Internet registrars. The registries often contain outdated or irrelevant information, or both at once. A rough sketch of pulling prefixes out of the IRR follows this list.
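To show where those filters come from, here is a rough sketch (my own, not from the talk) of what tools like bgpq3 do conceptually: ask an IRR for the route objects originated by an AS and turn them into a prefix list. The server name, the query form and the example AS are assumptions; check your IRR's documentation before relying on it.

```python
import socket

IRR_HOST = "whois.radb.net"   # assumption: a public IRR whois server
IRR_PORT = 43

def routes_for_origin(asn: str) -> list[str]:
    """Query the IRR for route objects with the given origin AS and return the prefixes."""
    query = f"-i origin {asn}\r\n".encode()
    with socket.create_connection((IRR_HOST, IRR_PORT), timeout=10) as sock:
        sock.sendall(query)
        data = b""
        while chunk := sock.recv(4096):
            data += chunk
    prefixes = []
    for line in data.decode(errors="replace").splitlines():
        if line.startswith("route:"):            # IPv4 route objects
            prefixes.append(line.split(":", 1)[1].strip())
    return prefixes

if __name__ == "__main__":
    # Hypothetical customer AS. The output only reflects what is registered in the IRR,
    # which is exactly why stale registry data produces broken filters.
    print(routes_for_origin("AS64500"))
```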
Who are these registrars?
All Internet addresses belong to the IANA organization - the Internet Assigned Numbers Authority . When you buy an IP network from someone, you are not buying the addresses but the right to use them. Addresses are an intangible resource, and by common agreement they all belong to IANA.
The system works like this. IANA delegates the management of IP addresses and autonomous system numbers to five regional registrars (RIRs). These allocate resources to local Internet registrars - LIRs . The LIRs then allocate IP addresses to end users.
The weakness of this system is that each regional registrar maintains its registries in its own way. Everyone has their own views on what information the registries should contain and who should or should not verify it. The result is the mess we have today.
How else can you deal with these problems?
IRR - mediocre quality . With the IRR it's clear: everything is bad there.
BGP communities . A community is an attribute described in the protocol. We can, for example, attach a special community to our announcement so that the neighbor does not pass our networks on to its own neighbors. On a P2P link we exchange only our own networks, and we attach the community so that routes do not accidentally leak into other networks.
Communities are not transitive . It is always an agreement between two parties, and that is their drawback. We cannot attach any community other than the few accepted by everyone by default. We cannot be sure that everyone will accept this community and interpret it correctly. At best, if you have an agreement with your uplink, it will understand what you want from it. But its neighbor may not understand, or the operator will simply strip your tag, and you will not get what you wanted.
RPKI + ROA solves only a small part of the problems . RPKI is the Resource Public Key Infrastructure - a framework for signing routing information. It is a good idea to make LIRs and their clients maintain an up-to-date address space database. But it has a problem.
RPKI is also a hierarchical public key system: IANA has a key from which the RIR keys are derived, and from those the LIR keys, with which LIRs sign their address space using ROAs - Route Origin Authorisations:
- I assure you that this prefix will be announced on behalf of this autonomous system.
Besides ROA there are other objects, but more on those later. It looks like a good and useful thing. But it does not protect us from leaks at all, and it does not solve every problem with prefix hijacking. That is why players are in no hurry to adopt it. Although major players like AT&T and large IXes have already promised to drop prefixes with invalid ROA records.
Perhaps they will, but for now we have a huge number of prefixes that are not signed at all. On the one hand, it is unclear whether they are announced validly. On the other hand, we cannot drop them by default, because we cannot be sure that would be correct.
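To illustrate what a ROA actually checks, here is a minimal sketch (my own illustration, with made-up prefixes and AS numbers) that validates an announced route against a set of ROAs: the origin AS must match, and the prefix must fit within the ROA's prefix and maxLength. It also shows why unsigned space ends up in the awkward "unknown" state.

```python
import ipaddress

# A ROA says: this origin AS may announce this prefix,
# and more-specifics down to max_length.
ROAS = [
    # (prefix, max_length, origin_asn) -- made-up example data
    ("192.0.2.0/24", 24, 64500),
]

def rpki_validate(prefix: str, origin_asn: int) -> str:
    """Return 'valid', 'invalid', or 'unknown' for an announcement."""
    announced = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_asn in ROAS:
        roa_net = ipaddress.ip_network(roa_prefix)
        if announced.subnet_of(roa_net):
            covered = True  # some ROA covers this space
            if announced.prefixlen <= max_len and origin_asn == roa_asn:
                return "valid"
    # Covered by a ROA but wrong origin or too specific -> invalid (possible hijack).
    # Not covered by any ROA -> unknown, which is why we can't just drop it.
    return "invalid" if covered else "unknown"

print(rpki_validate("192.0.2.0/24", 64500))     # valid
print(rpki_validate("192.0.2.0/25", 64500))     # invalid: more specific than maxLength
print(rpki_validate("198.51.100.0/24", 64500))  # unknown: no ROA at all
```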
What else is there?
BGPSec . This is a cool thing that academics came up with for a network of pink ponies. They said:
- We have RPKI + ROA, a mechanism for verifying address space signatures. Let's create a separate BGP attribute and call it BGPSec Path. Each router will sign the announcements it sends to its neighbors with its own signature. This way we get a trusted path built from the chain of signed announcements, and we can verify it.
In theory it's good, but in practice there are a lot of problems. BGPSec breaks a lot of existing BGP mechanics around next-hop selection and managing incoming and outgoing traffic directly on the router. BGPSec does not work until 95% of the entire market has implemented it, which is itself a utopia.
BGPSec has huge performance problems. On current hardware, announcements are verified at about 50 prefixes per second. At that rate, the current Internet table of 700,000 prefixes would take hours to load, and during that time it would change another 10 times.
BGP Open Policy (role-based BGP) . A fresh proposal based on the Gao-Rexford model, named after two scientists who study BGP.
The Gao-Rexford model is as follows. Simplifying, in BGP there is a small number of interaction types:
Provider-Customer;
P2P;
internal interaction, say, iBGP.
Based on a router's role, certain import/export policies can be applied by default. The administrator does not need to configure prefix lists: the roles that routers agree on among themselves, and which can be set explicitly, already give us some default filters - see the sketch below. This is currently a draft being discussed in the IETF. I hope we will soon see it as an RFC and implemented in hardware.
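To give a feel for what role-based defaults would do, here is a minimal sketch (my own illustration, with made-up neighbor names) of the classic Gao-Rexford export rule: routes learned from customers are exported to everyone, while routes learned from peers or providers are exported only to customers.

```python
# Each neighbor has a role relative to us: "customer", "peer", or "provider".
NEIGHBOR_ROLES = {
    "as64500": "customer",
    "as64501": "peer",
    "as64502": "provider",
}

def should_export(learned_from: str, export_to: str) -> bool:
    """Gao-Rexford (valley-free) export rule.

    Customer routes go to everyone; peer and provider routes go only
    to customers -- otherwise you become free transit and create
    exactly the kind of leak described above.
    """
    if NEIGHBOR_ROLES[learned_from] == "customer":
        return True
    return NEIGHBOR_ROLES[export_to] == "customer"

# A route learned from a peer must not be exported to a provider:
print(should_export("as64501", "as64502"))  # False
# A route learned from a customer may be exported anywhere:
print(should_export("as64500", "as64501"))  # True
```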
Major Internet Service Providers
Consider the provider CenturyLink as an example. It is the third largest US provider, serving 37 states and running 15 data centers.
In December 2018, CenturyLink was down for 50 hours across the US. During the incident there were problems with ATMs in two states, and for several hours the 911 number did not work in five states. On top of that, the lottery in Idaho was disrupted. The US Federal Communications Commission is now investigating the incident.
The cause of the tragedy was a single network card in one data center. The card failed, sent malformed packets, and all 15 of the provider's data centers went down.
The "too big to fail" idea did not work for this provider. It does not work at all: you can take any major player and knock it down with some small thing. In the US things are still fine with connectivity: CenturyLink customers who had a backup switched over to it en masse, and then the alternative operators complained that their links were overloaded.
If the conditional Kazakhtelecom went down the same way, the whole country would be left without the Internet.
Corporations
Surely Google, Amazon, Facebook and the other corporations are what hold the Internet up? No, they break it too.
In 2017, at the ENOG13 conference in St. Petersburg, Geoff Huston of APNIC presented the report "The Death of Transit" . It says that we are used to interaction, money flows and traffic on the Internet being vertical: small providers pay larger ones for connectivity, and those in turn pay for connectivity to global transit.
That is the vertically oriented structure we have now. Everything would be fine, but the world is changing: the major players are laying their own transoceanic cables to build their own backbones.
A news item about a CDN cable.
In 2018, TeleGeography released a study showing that more than half of the traffic on the Internet is no longer the Internet itself but the backbone CDNs of the major players. This traffic is related to the Internet, but it is no longer the same network we were talking about.
The Internet is breaking up into a large set of networks that are loosely coupled with each other.
Microsoft has its own network, Google has its own, and they barely overlap. Traffic that originates somewhere in the USA goes over Microsoft's channels across the ocean to Europe, to some CDN; then via the CDN or an IX it connects to your provider and reaches your router.
Decentralization is disappearing.
The strength that would let the Internet survive a nuclear explosion is being lost. There are now places where users and traffic are concentrated. If the conditional Google Cloud goes down, there will be many victims at once. We felt this in part when Roskomnadzor blocked AWS. And the CenturyLink example shows that a small thing is enough for that.
It used to be that not everything broke and not for everyone. In the future we may reach the point where, by affecting one major player, you can break many things, in many places, for many people.
States
Next in line are states, and this is what usually happens with them.
Here our Roskomnadzor is not a pioneer at all. A similar practice of switching off the Internet exists in Iran, India and Pakistan. In England there is a bill on the possibility of switching off the Internet.
Any large country wants a switch to turn off the Internet, either entirely or in parts: Twitter, Telegram, Facebook. It's not that they don't understand they will never succeed; they just really want it. The switch is used, as a rule, for political purposes: to eliminate political rivals, because elections are coming up, or because Russian hackers broke something again.
DDoS attacks
I won't take the bread out of the mouths of my colleagues at Qrator Labs - they do this much better than I do. They publish an annual report on Internet stability, and this is what they wrote in the 2018 report.
The average duration of DDoS attacks is dropping towards 2.5 hours . Attackers are also starting to count their money, and if a resource does not go down right away, it is quickly left alone.
The intensity of attacks is growing . In 2018, 1.7 Tbit/s was observed on the Akamai network, and that is not the limit.
New attack vectors appear, and the old ones get stronger . New protocols prone to amplification appear, along with new attacks on existing protocols, especially TLS and the like.
Most of the traffic comes from mobile devices . At the same time, Internet traffic is shifting to mobile clients, and both attackers and defenders need to be able to work with that.
Nobody is invulnerable . The main point: there is no universal protection that will reliably stop any DDoS, and there never will be.
The only system that cannot be taken down is one that is not connected to the Internet.
I hope I have scared you enough. Now let's think about what to do about it.
What to do?!
If you have free time, the desire and some English, take part in the working groups: IETF, RIPE WG. These are open mailing lists - subscribe, take part in discussions, come to the conferences. If you have LIR status, you can vote at RIPE, for example, on various initiatives.
For mere mortals, there is monitoring - so you know what has broken.
Monitoring: what to check?
Regular ping , and not just a binary up/down check: record the RTT history so you can look for anomalies later.
Traceroute . This is a utility for tracing the route that data takes in TCP/IP networks. It helps to detect anomalies and blocking.
HTTP checks of custom URLs and TLS certificates will help detect blocking or DNS spoofing for an attack, which is practically the same thing. Blocking is often done by spoofing DNS records and redirecting traffic to a stub page.
If possible, check from your clients how your origin resolves from different places, if you have an application. That way you will catch DNS interception anomalies, which providers are sometimes guilty of. A minimal check sketch follows this list.
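As a minimal illustration of the first points (my own sketch; the URL, host and thresholds are made up), the check below measures connection RTT to a host and verifies that an HTTPS endpoint answers with a valid certificate and an expected status code:

```python
import socket
import time
import urllib.request

TARGET_URL = "https://example.com/health"   # hypothetical endpoint
TARGET_HOST = "example.com"

def tcp_rtt(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Measure TCP connect time as a cheap RTT proxy (keep the history, not just up/down)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start

def https_check(url: str, timeout: float = 5.0) -> int:
    """Fetch the URL; certificate validation happens by default and raises on a bad cert."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status

if __name__ == "__main__":
    rtt = tcp_rtt(TARGET_HOST)
    status = https_check(TARGET_URL)
    print(f"rtt={rtt * 1000:.1f} ms, http_status={status}")
    # In real monitoring you would store these values and alert on anomalies:
    # RTT jumps, unexpected status codes, or TLS errors (possible DNS spoofing).
```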
Monitoring: from where to check?
There is no universal answer. Check from wherever your users are. If your users are in Russia, check from Russia, but don't limit yourself to it. If your users live in different regions, check from those regions. Better yet, from all over the world.
Monitoring: what to check with?
I came up with three options. If you know more, write in the comments.
RIPE Atlas.
Commercial monitoring.
Your own network of virtual machines.
Let's talk about each of them.
RIPE Atlas is a small box. For those who know the domestic "Revizor" ("Auditor") box: it is the same kind of box, but with a different label.
RIPE Atlas is a free program . You register, receive a probe by mail and plug it into your network. You earn credits when others use your probe, and with those credits you can run your own measurements. You can test in different ways: ping, traceroute, certificate checks. The coverage is quite large, with many nodes. But there are nuances.
The credit system does not allow you to build production solutions . There aren't enough credits for continuous research or commercial monitoring; they suffice for a short study or a one-off check. The daily allowance from one probe is eaten up by one or two checks.
The coverage is uneven . Since the program is free in both directions, coverage is good in Europe, in the European part of Russia and in some other regions. But if you need Indonesia or New Zealand, things are much worse - you may not even find 50 probes per country.
You cannot check HTTP from a probe . This is due to technical details. They promise to fix it in a new version, but for now HTTP cannot be checked; you can only check a certificate. HTTP checks can only be run against a special RIPE Atlas device called an Anchor. A rough sketch of creating a measurement through the Atlas API follows.
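If you end up using Atlas programmatically, measurements can be created through its API. Below is a rough sketch using the ripe.atlas.cousteau client library; treat the exact parameters as an assumption and check the current Atlas documentation, and note that the API key and target are placeholders.

```python
# pip install ripe.atlas.cousteau
from datetime import datetime
from ripe.atlas.cousteau import Ping, AtlasSource, AtlasCreateRequest

ATLAS_API_KEY = "YOUR-ATLAS-API-KEY"   # placeholder

# A one-off ping to our origin from 10 probes scattered worldwide.
ping = Ping(af=4, target="example.com", description="origin reachability")
source = AtlasSource(type="area", value="WW", requested=10)

request = AtlasCreateRequest(
    start_time=datetime.utcnow(),
    key=ATLAS_API_KEY,
    measurements=[ping],
    sources=[source],
    is_oneoff=True,
)

is_success, response = request.create()
print(is_success, response)  # on success, response contains the measurement IDs
```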
The second option is commercial monitoring . Is everything fine with it? You pay money, and you are promised a few dozen or hundreds of monitoring points around the world and beautiful out-of-the-box dashboards. But again, there are problems.
It is expensive, in places very expensive . Ping monitoring, checks from all over the world and lots of HTTP checks can cost several thousand dollars a year. If your finances allow it and you like this solution - go ahead.
There may be no coverage in the region you care about . With the same ping, the location is specified at best down to an abstract part of the world: Asia, Europe, North America. Few monitoring systems can narrow a probe down to a specific country or region.
Weak support for custom tests . If you need something custom, not just a plain check of a URL, that is a problem too.
The third option is your own monitoring . This is a classic: "Let's write our own!"
Your own monitoring turns into software development, and distributed development at that. You look for an infrastructure provider, work out how to deploy and monitor it - monitoring needs to be monitored too, right? - and then it needs support. Think ten times before taking this on. It may be easier to pay someone to do it for you.
Monitoring BGP anomalies and DDoS attacks
Here the available resources make things even easier. BGP anomalies are detected using specialized services such as Qrator.Radar and BGPmon . They receive full-view tables from many operators, and based on what they see from different operators they can detect anomalies, look for amplifiers and so on. Registration is usually free: you enter your AS number, subscribe to email notifications, and the service alerts you to problems. A rough sketch of the kind of origin check such services perform follows.
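For a rough idea of what such services check for your prefixes, here is a minimal sketch (my own, with made-up prefixes and AS numbers) that flags a BGP update as a possible hijack when the origin AS at the end of the AS Path is not in your expected set. In practice you would feed it updates from a service like BGPmon or the RIPE RIS Live stream rather than collect them yourself.

```python
# The prefixes we announce and the origin ASNs allowed to announce them.
EXPECTED_ORIGINS = {
    "203.0.113.0/24": {64500},
}

def check_update(prefix: str, as_path: list[int]) -> str | None:
    """Return a warning string if the update looks like a hijack, else None."""
    expected = EXPECTED_ORIGINS.get(prefix)
    if expected is None:
        return None                      # not our prefix, ignore
    origin = as_path[-1]                 # the last AS in the path originated the route
    if origin not in expected:
        return f"possible hijack of {prefix}: origin AS{origin}, path {as_path}"
    return None

# Example: someone else suddenly originates our prefix.
print(check_update("203.0.113.0/24", [65010, 65020, 64666]))
# -> possible hijack of 203.0.113.0/24: origin AS64666, path [65010, 65020, 64666]
```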
Monitoring DDoS attacks is also easy. As a rule, it is based on NetFlow and logs . There are specialized systems like FastNetMon , and modules for Splunk . As a last resort, there is your DDoS protection provider: it can also collect NetFlow and will notify you about attacks in your direction based on it. A toy NetFlow-style detector is sketched below.
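As a toy illustration of the NetFlow-based approach (not a real detector; the thresholds, baseline and flow records are made up), the sketch below aggregates bytes per destination over a time window and flags destinations whose rate exceeds a multiple of their baseline:

```python
from collections import defaultdict

# A flow record here is just (dst_ip, bytes); in reality it comes from a NetFlow/IPFIX collector.
WINDOW_SECONDS = 60
BASELINE_BPS = {"198.51.100.10": 5_000_000}   # historical bits-per-second per destination
THRESHOLD_MULTIPLIER = 10                     # alert when traffic is 10x the baseline

def detect_anomalies(flows: list[tuple[str, int]]) -> list[str]:
    """Return destinations whose observed rate exceeds threshold * baseline."""
    bytes_per_dst = defaultdict(int)
    for dst, nbytes in flows:
        bytes_per_dst[dst] += nbytes

    alerts = []
    for dst, total_bytes in bytes_per_dst.items():
        observed_bps = total_bytes * 8 / WINDOW_SECONDS
        baseline = BASELINE_BPS.get(dst, 1_000_000)
        if observed_bps > THRESHOLD_MULTIPLIER * baseline:
            alerts.append(
                f"{dst}: {observed_bps / 1e6:.0f} Mbit/s vs baseline {baseline / 1e6:.0f} Mbit/s"
            )
    return alerts

# One minute of flows where a single destination suddenly receives ~800 Mbit/s.
sample = [("198.51.100.10", 6_000_000_000)]
print(detect_anomalies(sample))
```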
Conclusions
Do not harbor illusions - the Internet will definitely break . Not everything and not for everyone, but 14 thousand incidents in 2017 hint that incidents will keep coming.
Your task is to notice problems as early as possible - at the very least, no later than your user does. It is not enough to notice; you should always have a "Plan B" in stock. The plan is a strategy for what you will do when everything breaks : backup operators, DCs, CDNs. It is a separate checklist against which you verify that everything works. The plan must work without involving network engineers, because there are few of them and they want to sleep.
That's all. I wish you high availability and green monitoring.
Next week, Novosibirsk expects sun, highload and a high concentration of developers at HighLoad++ Siberia 2019. A front of talks about monitoring, availability and testing, security and management is forecast in Siberia, with precipitation in the form of notes, networking, photos and social media posts. We recommend clearing your schedule for June 24 and 25 and booking tickets. We are waiting for you in Siberia!