A translation of an article by Avery Pennarun, a Google employee, about why the modern Internet is the way it is, about the history and prerequisites of IPv6, how an ideal IPv6 protocol would work, why reality falls short of that ideal, and how it could be brought closer.
Last November, I went to an IETF meeting for the first time. The IETF is an interesting place: it seems to be about one third maintenance grunt work, one third extending things that already exist, and one third crazy research far removed from reality (here Avery used the phrase "blue sky insanity", his own twist on the expression "blue skies research" - approx. transl.). I attended mostly because I wanted to see how people would react to TCP BBR, which was being presented there for the first time. (Answer: mostly positively, but with distrust. It seemed too good to be true.)
Be that as it may, IETF meetings are full of presentations about IPv6, the protocol that was supposed to replace IPv4, which forms the basis of the Internet. (Some would say the replacement is already underway; some, that it has already happened.) Alongside those IPv6 presentations there are plenty of people who consider it the best, the greatest thing ever, who are sure it is about to catch on at last (Any Day Now), and that IPv4 is just a big pile of hacks destined to die so that the Internet can become beautiful again.
I thought it would be a good opportunity to try to actually figure out what was going on. Why is IPv6 such a mess compared to IPv4? Wouldn't it have been better if it were just IPv4 with more bits in the address? But no, for the sake of all that is holy, everything is done differently. So I started asking everyone around, and this is what I learned.
Buses ruined everything
Once upon a time there was the telephone network, which used physical circuit switching. In essence, this meant moving connectors around so that your phone was literally connected by one very long wire (OSI layer 1). A "leased line" was a very long wire that you leased from the telephone company. You put bits in at one end of the wire, and they came out the other end a fixed amount of time later. You didn't need addresses, because there was exactly one machine at each end.
Eventually, the phone companies optimized this a bit. Time-division multiplexing (TDM) and "virtual circuit switching" were born. Telephone companies could transparently take bits at low speeds from many lines, group them together using multiplexers and demultiplexers, and pass them through the telephone system using fewer wires than before. Making this work took more effort than before, but for us modem users everything stayed the same: you put bits in at one end, and they crawl out the other. Still no addresses needed.
The Internet (not yet called that) was built on top of these channels. You had a bunch of wires into which you could put bits and catch them on the other side. If one computer had two or three network interfaces, then, properly instructed, it could forward bits from one line to another, and you could do something much more efficient than running separate communication lines between every pair of computers. And so IP addresses ("layer 3"), subnets, and routing were born. Even then, with these point-to-point channels, you didn't need MAC addresses, because once a packet entered the wire, there was only one place it could come out. You needed IP addresses only to decide where it should go after that.
Meanwhile, local area networks (LANs) were invented as an alternative. If you wanted to connect your computers (or terminals and a mainframe), you faced the inconvenience of needing a separate interface for every connection in a star topology. To cut electronics costs, people wanted a bus network (also known as a "broadcast domain," a concept that will be important later), where many stations could simply be attached to one wire and talk to anyone else attached to it. These were not the same people who were building the Internet, so they did not use IP addresses for this. They invented their own schemes ("layer 2").
One of the early bus-type LANs was arcnet, dear to my heart (I wrote the first Linux arcnet driver and arcnet poetry back in the distant nineties, long after arcnet was obsolete). Arcnet's layer 2 addresses were very simple: just 8 bits, set by jumpers or DIP switches on the back of the network card. As the network's owner, it was your job to assign the addresses and make sure there were no duplicates, or otherwise all manner of devilry would ensue. It was a bit of a pain, but arcnet networks were usually quite small, so it was only a bit of a pain.
A few years later, Ethernet came along and solved this problem once and for all by using far more bits (48, in fact) in the layer 2 address. That is enough bits to assign a different one (sharded-sequentially (here, apparently, this means that the first three bytes of the MAC address are assigned as a range to a specific manufacturer - approx. transl.)) to every device ever manufactured, without any overlaps. And that is exactly what they did! Thus the Ethernet MAC address was born.
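The "sharded-sequential" scheme is easy to picture in code. A tiny illustrative sketch (the helper name is invented for this example):

```python
# A MAC address is "sharded-sequential": the IEEE hands each manufacturer a
# 24-bit prefix (the OUI), and the manufacturer assigns the remaining
# 24 bits itself, so no two devices ever collide.

def split_mac(mac: str):
    octets = mac.split(":")
    oui = ":".join(octets[:3])      # manufacturer's shard, assigned by IEEE
    serial = ":".join(octets[3:])   # per-device part, assigned at the factory
    return oui, serial

print(split_mac("00:1a:2b:3c:4d:5e"))  # ('00:1a:2b', '3c:4d:5e')
```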
Various LAN technologies came and went, including one of my favorites, IPX (Internetwork Packet eXchange, though it had nothing to do with the "real" Internet) and Netware, which worked great as long as all the clients and servers were on the same bus. You never had to configure any addresses. It was beautiful and reliable and it worked. The golden age of networking, basically.
Of course, someone had to ruin it: big company and university networks. They wanted so many connected computers that sharing 10 Mbit/s of a single bus between them all became a bottleneck, so they needed a way to have many buses, and to interconnect - internetwork, if you will - those buses together. You are probably thinking: "Of course! They used the Internet Protocol (IP) for that, right?" Ha ha, no. The Internet Protocol, still not called that, was not yet mature or popular at the time, and nobody took it seriously. Netware-over-IPX (and the many other LAN protocols of the day) was serious business, and as serious businesses do, they invented their own things to extend the already popular thing, Ethernet. Ethernet devices already had addresses, MAC addresses, which were about the only thing the users of the different LAN protocols could agree on, so they decided to use Ethernet addresses as the keys for their routing mechanisms. (Actually, instead of "routing," they called it bridging and switching.)
The problem with Ethernet addresses is that they are assigned sequentially at the factory, so they cannot form a hierarchy. This means the "bridging table" is not as nice as a modern IP routing table, which can contain a single route entry for a whole subnet at once. To do bridging, you had to remember which bus network each MAC address could be found on. And people did not want to configure each of them by hand, so the network had to figure it out on its own. If you had an intricate interconnection of networks joined by bridges, things got a bit complicated. As I understand it, this is what led to the poem about the spanning tree, and I'll probably just leave it at that. Poetry is very important in networking technologies.
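The learning mechanism itself is simple enough to sketch. This is a hypothetical minimal model, not any real switch's code: the bridge remembers which port each source MAC was last seen on, and floods frames for destinations it has not learned yet.

```python
# Minimal sketch of a learning bridge (illustrative model only).
# It learns source MACs per port and floods unknown destinations.

class LearningBridge:
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.mac_table = {}  # MAC address -> port it was last seen on

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of ports to forward this frame out of."""
        # Learn: the sender is reachable via the port the frame arrived on.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            out = self.mac_table[dst_mac]
            return [] if out == in_port else [out]
        # Unknown destination: flood to every port except the ingress one.
        return [p for p in range(self.num_ports) if p != in_port]

bridge = LearningBridge(4)
print(bridge.handle_frame(0, "aa:aa", "bb:bb"))  # unknown dst, flood: [1, 2, 3]
print(bridge.handle_frame(1, "bb:bb", "aa:aa"))  # aa:aa was learned on port 0: [0]
```

Notice there is no hierarchy anywhere: the table has one entry per MAC address, which is exactly why bridging tables do not scale the way subnetted routing tables do.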
Be that as it may, for the most part it worked, though it was a bit of a mess: you got broadcast "floods" here and there, the routes were not always optimal, and it was almost impossible to debug. (You definitely could not write something like traceroute for bridges, because none of the tools needed to make that work - such as the ability of an intermediate bridge to even have an address - exist in plain Ethernet.)
On the other hand, all this bridging was hardware optimized. The hardware people basically invented the whole system as a way of fooling the software - which had no idea about the multitude of buses and the bridges between them - into working on larger networks. Hardware bridging meant bridging could be really fast, as fast as Ethernet itself. Nowadays that does not sound like much, but at the time it was a big deal. Ethernet was 10 Mbit/s, so you could probably saturate it by connecting several computers at once, but no single computer could produce 10 Mbit/s. In those days that sounded like crazy talk.
In any case, the point is that bridging was a mess that could not be debugged, but it was fast.
An internet of buses
While all this was happening, those same Internet people were hard at work, and of course they were not going to miss out on cool low-cost LAN technology. I think it may have been around this time that the ARPANET was renamed the Internet, though I am not so sure. Let's say it was, because the story sounds better when told confidently.
At some point, progress moved on from connecting individual computers to the Internet via long-distance point-to-point links to wanting to connect entire local networks together over point-to-point links. Basically, people wanted "long bridges."
You might think, "Hey, no problem, why not just build a bridge over a long line and be done with it?" Sounds good, but it does not work. I will not go into details, but in short the problem is congestion control (unfortunately, for some reason there is no Russian translation of that article on the wiki - approx. transl.). The terrible dark secret of Ethernet bridging is that it assumes all your links run at approximately the same speed and/or are very underloaded, because there is no braking mechanism. You just blast out data as fast as you can and expect it to arrive. But when your Ethernet runs at 10 Mbit/s and your point-to-point link at 0.128 Mbit/s, this is completely hopeless. Another problem is that discovering routes by flooding packets across all channels to figure out which one is correct - which is how bridging usually works - is far too expensive on slow links. And non-optimal routing, merely annoying on local networks with their low latency and high throughput, is absolutely disgusting on slow, expensive long-distance links. It just does not scale.
Fortunately, the Internet people (if it was called the Internet yet) were working on exactly these problems. If only we could use Internet tools to connect Ethernet buses together, we would be in great shape.
And so they developed a "frame format" for Internet packets over Ethernet (and arcnet while they were at it, and every other type of LAN).
And here everything went awry.
The first problem to solve was that now, when you put a packet on the wire, it was no longer clear which machine should "hear" it and possibly forward it along. If several Internet routers sit on the same Ethernet segment, you cannot have them all accept the packet and try to forward it; that way lie packet storms and routing loops. No, you have to choose which router on the Ethernet bus should pick it up. We cannot simply use the destination IP address field for this, because it already holds the address of the message's final recipient, not the router's. Instead, we identify the desired router by its MAC address in the Ethernet frame.
Thus, to set up your local IP routing table, you would want to be able to say something like "send packets to 10.1.1.1 via the router with MAC 11:22:33:44:55:66." That is the actual thing you want to express. Important! The destination of your packet is an IP address, but your router is a MAC address. But if you have ever configured a routing table, you may have noticed that nobody writes it that way. Instead you write: "send packets to 10.1.1.1 via the router at 192.168.1.1."
In truth, this only complicates things. Now your operating system must first find the MAC address for 192.168.1.1, discover that it is 11:22:33:44:55:66, and finally build a packet with Ethernet destination address 11:22:33:44:55:66 and IP destination address 10.1.1.1. The address 192.168.1.1 appears nowhere in the packet; it is just an abstraction for humans.
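The two-step indirection can be sketched like this (all tables and addresses here are the made-up ones from the text, and the function is invented for illustration):

```python
# Sketch of the indirection described above: the routing table names the
# next hop by IP, but the frame must carry the next hop's MAC address.

routes = {"10.1.1.0/24": "192.168.1.1"}           # destination subnet -> next-hop IP
arp_cache = {"192.168.1.1": "11:22:33:44:55:66"}  # next-hop IP -> MAC

def build_frame(dst_ip):
    next_hop_ip = routes["10.1.1.0/24"]    # step 1: route lookup, by IP
    next_hop_mac = arp_cache[next_hop_ip]  # step 2: ARP lookup, IP -> MAC
    # Note: the next-hop IP (192.168.1.1) appears nowhere in the result.
    return {"eth_dst": next_hop_mac, "ip_dst": dst_ip}

print(build_frame("10.1.1.1"))
# {'eth_dst': '11:22:33:44:55:66', 'ip_dst': '10.1.1.1'}
```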
To perform this pointless intermediate step, you need to add ARP (Address Resolution Protocol), a simple non-IP protocol whose job is to convert an IP address into an Ethernet address. It does this by broadcasting a request to everyone on the local Ethernet segment, asking who owns this IP address. If you have bridges, they must forward all ARP packets out all their interfaces, because they are broadcast packets, and that is exactly what broadcasting means. On a large, busy Ethernet network with many interconnected LANs, excessive broadcasts become one of your worst nightmares. It is especially bad on WiFi networks. Over time, to fight this problem, people built bridges and switches with special hacks that avoid forwarding ARP as far as it technically should go. Some devices (especially WiFi access points) simply reply with fake ARP responses to help. But these are all crutches, however necessary they sometimes are.
Death by legacy
Time went by. Eventually (and it actually took a decent amount of time) people almost entirely stopped using non-IP protocols over Ethernet. So basically, every network became a physical wire (layer 1), with many stations on a bus (layer 2), buses connected by bridges (gotcha! still layer 2), and those inter-buses connected by IP routers (layer 3).
Some time later, people got tired of manually configuring IP addresses, arcnet-style, and wanted them to configure themselves, Ethernet-style - except it was too late to do it Ethernet-style, because a) devices had already been manufactured with Ethernet addresses, not IP addresses; b) IP addresses were only 32 bits, which is not enough to just mint them endlessly without overlaps; and c) simply assigning IP addresses sequentially instead of using subnets would put us back at square one: it would just be another Ethernet made from scratch, and we already had Ethernet.
And so bootp and DHCP appeared. These protocols, by the way, are special - like ARP (except they try not to be special, by technically being IP packets). They have to be special, because an IP host must be able to send them before it has received an IP address, which is of course impossible, so it simply fills the IP headers with essentially nonsense (albeit nonsense specified by an RFC), so they might as well be dropped. (You can tell these headers are nonsense because the DHCP server has to open a raw socket and fill them in by hand; the IP layer in the kernel cannot do it.) But nobody wanted to be the one gleefully inventing yet another non-IP protocol, so they pretended it was IP, and everyone was happy. Well, as happy as one can be while inventing DHCP.
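To make the "nonsense headers" concrete, here is a sketch of the fields a DHCPDISCOVER goes out with. The helper function is invented for illustration, but the addresses and ports are the real ones DHCP uses:

```python
# Sketch of the "essentially nonsense" headers on a DHCPDISCOVER: the
# client has no IP address yet, so it sends from 0.0.0.0 and broadcasts
# to 255.255.255.255. Its only real identifier is its MAC address.

def dhcp_discover_headers(client_mac):
    return {
        "ip_src": "0.0.0.0",          # "I have no address yet"
        "ip_dst": "255.255.255.255",  # limited broadcast: everyone on the bus
        "udp_src_port": 68,           # the standard DHCP client port
        "udp_dst_port": 67,           # the standard DHCP server port
        "chaddr": client_mac,         # client hardware address: the MAC
    }

headers = dhcp_discover_headers("11:22:33:44:55:66")
print(headers["ip_src"], "->", headers["ip_dst"])  # 0.0.0.0 -> 255.255.255.255
```

The IP layer exists in this packet purely as ceremony; everything that matters (the `chaddr` field) is layer 2 information.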
I digress a little. The salient point here is this: unlike real IP services, the bootp and DHCP protocols need to know about Ethernet addresses, because, after all, their job is to hear your Ethernet address and assign you an IP address to go with it. Essentially, this is ARP in reverse, except we cannot say so, because there is already a RARP protocol, which literally is "reverse ARP." Actually, RARP worked quite well and did the same thing as bootp and DHCP while being much simpler, but let's not talk about that.
The point of all this is that Ethernet and IP were becoming ever more intertwined. Nowadays they are almost inseparable. It is hard to imagine a network interface (except ppp0) without a 48-bit MAC address, and it is hard to imagine that interface working without an IP address. You write your IP routing table using IP addresses, but of course you know you are lying when you name the router by its IP address; you are just indirectly saying that you want to route via a MAC address. And you have ARP, which gets bridged but not routed, and DHCP, which is IP but really Ethernet, and so on.
Moreover, we still have both bridging and routing, and both keep getting more complex as local networks and the Internet, respectively, get more complex. Bridging is still mostly hardware-based and defined by the IEEE, the people who control the Ethernet standards. Routing is still mostly software-based and defined by the IETF, the people who control the Internet standards. Both groups still try to pretend the other does not exist. Network operators simply choose bridging vs routing based on how fast they want things to go and how much they hate configuring DHCP servers, which they actually hate very much - which means they bridge as much as possible and route when they have to.
In fact, bridging has gotten so out of control that people decided to pull bridging decisions out entirely to a higher level (with configuration exchanged between bridges over a protocol layered on IP, of course!) so they can be centrally managed. This is called software-defined networking (SDN). It is a big improvement over letting switches and bridges do whatever they want, but it is also fundamentally silly, because you know what a software-defined network is? IP. It is literally IP, and it has always been the SDN you use to interconnect networks that have grown too large. But the problem is that IPv4 was initially too hard to accelerate in hardware, and in any case it never got hardware acceleration, and configuring DHCP is hell, so network operators simply learned to bridge bigger and bigger things. Nowadays big data centers are basically just SDN, and you might as well not be using IP in the data center at all, because nobody routes the packets. It is all just one big bus network.
This is, in short, a mess.
Now forget everything I just told you...
Good story, right? Good. Now let's pretend none of that happened, and we are back in the early 1990s, when most of it had in fact already happened, but people at the IETF were still pretending it had not, and that the "impending" catastrophe could be avoided. This is the good part!
I forgot to mention one thing in that long story above: somewhere along this chain of events, we completely stopped using bus networks. Ethernet is no longer actually a bus. It only pretends to be a bus.
Basically, we could not keep Ethernet's famous CSMA/CD working as speeds increased, so we went back to the good old star topology. We run a separate wire from a switch to every station, so the number of stations sharing each "bus" is now exactly... 1. There are no collisions to detect, the switch bridges between the wires, and Ethernet merely emulates the bus it used to be.
Nowadays, the only thing that still resembles a real bus network is WiFi - a shared medium, literally the airwaves! - but even WiFi is not truly a bus, because not every station can hear every other station (the famous "hidden node" problem). So WiFi stations do not talk to each other directly; they "associate" with an access point and relay everything through it. And since WiFi kept the Ethernet-style model, all of that is done with MAC addresses. Which leads to complications.
Let's look at an example. Suppose station X wants to send a packet to Internet host Z via its IP router Y, over WiFi through access point A. What does that look like? Something like this:
X -> [wifi] -> A -> [wifi] -> Y -> [internet] -> Z
Z is the final IP destination, so X labels the packet with IP destination Z. Y is the router, which X knows by its Ethernet MAC address, so X puts Y's MAC in the Ethernet destination field. But the WiFi hop between X and Y actually terminates at A (among other things, the WPA2 encryption is between X and A). So we somehow have to address A as well. Where do we put A's address?
No problem!
802.11 has a thing called three-address mode. They added a third Ethernet MAC address to each frame, so you can talk about the real Ethernet destination and the intermediate Ethernet destination separately. On top of that there are bit fields called "to-AP" and "from-AP", which tell you whether the packet is going from a station to an access point or from an access point to a station, respectively. In fact, both can be true at once, because that is what WiFi repeaters do (APs sending packets to APs).

Speaking of repeaters! If A is a repeater, it has to send the frame onward to base station B, on a path that looks like this:

X -> [wifi] -> A -> [wifi-repeater] -> B -> [wifi] -> Y -> [internet] -> Z
The X->A and A->B hops now involve four addresses in a single frame: the Ethernet source X and the Ethernet destination Y, plus the transmitter and receiver of this particular hop, A and B; X and Y ride along unchanged. And indeed, the 802.11 frame format has room for exactly four addresses.
(There is also a thing called 802.11s mesh networking, which uses even more addresses, but let's not even get into that.)
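The address bookkeeping described above can be summarized in code. This sketch is a simplified rendering of 802.11's standard address-interpretation rules, with the to-AP/from-AP bits selecting what each address slot means (the function itself is invented for illustration):

```python
# How 802.11 interprets its address slots depending on the to-AP ("To DS")
# and from-AP ("From DS") bits. Simplified summary of the standard's table.

def addr_meanings(to_ap: bool, from_ap: bool):
    if not to_ap and not from_ap:
        # Direct station-to-station (ad-hoc) traffic.
        return ["destination", "source", "BSSID"]
    if to_ap and not from_ap:
        # Station sending up to its access point.
        return ["AP (receiver)", "source", "final destination"]
    if not to_ap and from_ap:
        # Access point delivering down to a station.
        return ["destination", "AP (transmitter)", "original source"]
    # Both bits set: the four-address (AP-to-AP / repeater) case.
    return ["receiver", "transmitter", "final destination", "original source"]

print(len(addr_meanings(True, False)))  # 3 addresses: station -> AP
print(len(addr_meanings(True, True)))   # 4 addresses: repeater hop
```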
Avery, you were supposed to be talking about IPv6
Oh. Right. So what does all this have to do with IPv6?
Here's the thing. The people at the IETF, when they were designing IPv6, looked at all this mess - buses, bridging, ARP, DHCP, SDN, WiFi relaying - or foresaw the parts of it that did not exist yet, and decided that the right fix was a world without any of it. Imagine! A world in which:
- there are no more bus networks (every wire is point-to-point!)
- there are no more layer 2 networks of networks (no more bridging)
- there are no more broadcasts (broadcast to whom, exactly? - only multicast)
- there are no more MAC addresses (on a point-to-point link it is obvious who is talking to whom, and multicast groups can be identified by IP addresses)
- there are no more ARP and DHCP (with no MAC addresses, there is nothing to map IP addresses to)
- the IP header is simple again (so IP routing can be accelerated in hardware)
- there is no shortage of IP addresses (and no NAT)
- IP addresses configure themselves, except at the network core (the backbone - approx. transl.) (where they are assigned hierarchically, so routing tables stay small)
Stop and savor that world for a moment: WiFi access points would relay frames by IPv6 address. Mesh networks would just be routing. There would be no Ethernet header at all, no SDN, no ARP floods, and traceroute would work everywhere. Instead of spending 12 bytes (two MAC addresses) on every Ethernet frame and 18 bytes (transmitter/receiver/destination) on every WiFi frame, you would spend nothing at layer 2 at all. Yes, IPv6 costs an extra 24 bytes of addresses compared to IPv4, but you would get 12 of them back by dropping Ethernet, and another 12 if 64-bit IP addresses had been chosen instead of 128-bit ones; without Ethernet, IPv6 would have cost no more than what we pay today.
It would have been beautiful. There was just one problem: getting there.
As the saying goes: "You can't get there from here."
For all these wonders, you need the chance to start over and throw away the legacy built up by that point. And that, unfortunately, is mostly impossible. Even if IPv6 reached 99% penetration, it would not mean we had gotten rid of IPv4. And if we have not gotten rid of IPv4, we have not gotten rid of Ethernet addresses, or WiFi addresses. And if we must stay compatible with the IEEE 802.3 and 802.11 frame formats, we can never drop those bytes. So we will always need the IPv6 neighbor discovery protocol, which is simply a more complicated ARP. Even though we no longer use bus networks, we will always need some semblance of broadcast, because that is how ARP works. We will need to keep a local DHCP server running at home so that our legacy IPv4 light bulbs keep working. We still need NAT so that those legacy IPv4 light bulbs can reach the Internet. And so, piece by piece, the world in which IPv6 was a good design fails to arrive.
And that is the answer. Had IPv6 been deployed back in the 1990s, before the legacy hardened, IPv6 - the real IPv6, the one designed for that clean world - would have made everything simpler. Deployed into our world, full of IPv4 and MAC addresses, it amounts to just "IPv4 with bigger addresses" plus an extra pile of complexity. Will the legacy ever die out? Well - Ethernet is still with us, and have we managed to get rid of FTP yet? Exactly.
Side quest: mobile IP
Meanwhile, there is one problem that IP - any IP - has never solved well: mobility. Ethernet stations never moved; they were bolted to a desk. Today your computer is a phone. It rides an LTE network at highway speed! Or it hops from one WiFi access point to the next. And you would very much like your connections to survive the trip. How is that supposed to work?
Spoiler: the answer, as always, turns out to be bridging and tunneling. Your connections survive only as long as your IP address does not change, and since IP routing cannot follow you around, we fake it at layer 2 instead.
Corporate WiFi networks cheat by bridging the entire LAN together at layer 2, so the giant central DHCP server always gives you the same IP address no matter which corporate access point you attach to, and then your packets find you again within at most a few seconds while the bridges reconfigure. The newfangled home WiFi systems with multiple repeaters/extenders do the same thing. But if you hop from one WiFi network to another while walking down the street - if every store in a row offered public WiFi - everything breaks. Each network gives you a new IP address, and every time your IP address changes, all your connections die.

LTE tries harder. You keep your IP address (usually an IPv6 address, in the case of mobile networks), even while traveling for kilometers as numerous cell towers hand you off from one to the next. How?
Well... they usually just tunnel all your traffic to a central point, where it is all bridged together (albeit through aggressive firewall filtering) into one super-huge virtual layer 2 network. And your connections stay alive - at the cost of enormous complexity and a truly discouraging amount of extra latency, which the operators would dearly love to remove, but that is almost impossible.

Footnote 1: at least in the IPv6 case. IPv4 relies on NAT, and behind NAT it is a different story.

How to make mobile networks work
Well, it was a long story, but I did manage to drag it out of the people at the IETF. When we got to this point - the problem of mobile IP - I could not help asking: what went wrong? Why can't we make it work?

It turns out the answer is surprisingly simple. The big flaw lies in how the well-known "four-tuple" (source IP, source port, destination IP, destination port) was defined. We use the four-tuple to identify a TCP or UDP session; if a packet carries the same four fields, it belongs to that session, and we can deliver it to the socket serving that session. But the four-tuple spans two layers: network (layer 3) and transport (layer 4). If, instead, we had defined sessions using only layer 4 data, mobile clients would work perfectly.

A short example. Port 1111 on client X talks to port 80 on server Y, so X sends packets labeled with the four-tuple (X, 1111, Y, 80). Replies come back labeled (Y, 80, X, 1111), and the kernel delivers them to the socket that created the first packet. When X sends more packets labeled (X, 1111, Y, 80), Y delivers them to the same server socket, and so on.

Then X changes its IP address and becomes, say, Q. Now it starts sending packets labeled (Q, 1111, Y, 80). Y has no idea what that means, and throws them away. Meanwhile, any packets Y sends labeled (Y, 80, X, 1111) are lost, because X is no longer there to receive them.

What if, instead, we identified sessions without using IP addresses at all? Port numbers alone are too short (only 16 bits) to be unique. But if each session were labeled with, say, a 128- or 256-bit random number, collisions would essentially never happen.
So X sends Y a packet labeled (uuid, 80). The IP header still carries the addresses (X, Y), but only layer 3 uses them - strictly for routing.

Layer 3 delivers the packet, and the receiving kernel finds the right socket by the uuid. (The port, 80, matters only for the very first packet, to choose which server program should accept the new session.)

When Y replies, it sends the packet to X's last known IP address, labeling it with the same (uuid).

Now X changes its address and becomes Q. It keeps sending packets labeled (uuid, 80) to Y's IP address, but now from source address Q. Y looks up the session by (uuid), notices that the source address has changed from X to Q, and simply updates its records; its replies now go to Q instead of X. The session survives! (Assuming, of course, some protection against hijacking.)

There is a nice second effect: you no longer need a "connection setup" - no SYN-ACK-SYNACK dance like TCP's. When Y sees a packet from Q carrying the uuid of the X->Y session, Y knows it is the same session (an attacker cannot guess a 256-bit random uuid). Y should still verify that Q can actually receive packets before pouring replies at it, so that an attacker cannot redirect a stream at some victim (the same concern TCP's handshake addresses). Modern protocols (QUIC, for example) do exactly this.
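The scheme sketched above is easy to model. Here is a hypothetical toy version (not any real protocol's code), showing a session surviving an address change:

```python
# Toy model of session-id-based demultiplexing: sessions are identified by
# an unguessable random id instead of the (ip, port, ip, port) four-tuple,
# so a session survives when the client's address changes.

import secrets

sessions = {}  # session id -> last-known peer address

def open_session(peer_addr):
    sid = secrets.token_hex(32)  # 256-bit random id; effectively unguessable
    sessions[sid] = peer_addr
    return sid

def receive(sid, src_addr):
    """Look the session up by id alone; follow the peer if it moved."""
    if sid not in sessions:
        return "drop"            # unknown id: not our session
    if sessions[sid] != src_addr:
        sessions[sid] = src_addr  # client moved networks; replies follow it
    return "deliver"

sid = open_session(("X", 1111))
print(receive(sid, ("X", 1111)))  # deliver
print(receive(sid, ("Q", 1111)))  # deliver: same session, new address
print(sessions[sid])              # ('Q', 1111)
```

A real implementation would, as the text notes, also validate the new address before trusting it; that step is omitted here for brevity.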
There is only one catch: UDP and TCP do not work this way, and it is too late to change them. Upgrading UDP and TCP would be comparable to upgrading IPv4 to IPv6: a project that seemed simple back in the 1990s, but is still not half finished decades later (and the first half was the easy half; the rest is much harder).

Unless... there is a loophole. We cannot replace TCP, but - sneakily - we can run a new protocol on top of UDP, treating UDP as a mere substrate, since UDP still gets through the world's NATs and firewalls. Such a protocol can carry its own session identifier - a uuid - inside the UDP payload and ignore the four-tuple entirely.
Surprise: QUIC does exactly that. It runs over UDP and identifies sessions by its own connection id, not by addresses and ports. Someday, perhaps, the infrastructure will catch up too: imagine a load balancer that statelessly routes each packet to the right backend by its QUIC connection id instead of tracking four-tuples. Once the middleboxes understand session ids, mobile connections that simply never break stop being science fiction.
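As a toy illustration of routing by session id rather than by the address four-tuple (a hypothetical sketch, not any real load balancer's code):

```python
# Sketch of a stateless balancer that picks a backend by hashing the
# stable session/connection id. The same session keeps hitting the same
# server even if the client's address changes, with no per-flow state.

import hashlib

BACKENDS = ["server-a", "server-b", "server-c"]

def pick_backend(connection_id: bytes) -> str:
    digest = hashlib.sha256(connection_id).digest()
    index = int.from_bytes(digest[:8], "big") % len(BACKENDS)
    return BACKENDS[index]

cid = b"example-connection-id"
first = pick_backend(cid)
# The choice depends only on the connection id, not on the client address:
assert pick_backend(cid) == first
print(first in BACKENDS)  # True
```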
And so, step by step - on top of UDP and TCP, on top of bridges, MAC addresses, SDN, and DHCP - perhaps we will finally build the network we wanted all along. Not by throwing the legacy away, but by stacking one more layer on top of it.

So it goes.