MIT course "Computer Systems Security". Lecture 12: "Network Security", part 2
Massachusetts Institute of Technology. Course 6.858, "Computer Systems Security". Nickolai Zeldovich, James Mickens. 2014
Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.
Student: Perhaps you still have a collision problem, because you have 32 bits of peer addresses and many ports for each of them. Won't the sequence numbers of all these connections collide?
Professor: It turns out that these sequence numbers are specific to the source/destination pair of IP address and port number. So if the ports are different, the connections do not interfere with each other at all. In a sense, the ports sit below the sequence numbers.
Student: If the sequence numbers are global, can't an attacker break into a connection between other clients?
Professor: Yes, that's a good point. In fact, if the server increases the sequence number by, say, 64k for each connection, then you connect to the server, maybe 5 more people connect to it, and only then do you mount your attack. So to some extent you are right, it is a bit troublesome. On the other hand, you could probably arrange for the packet that triggers the last row (S -> A) to be delivered immediately before the packet in the first row (C -> S). If you send your packets one right after the other, there is a good chance that they will also arrive at the server back to back.
The server will receive your packet and respond (S -> A) with its sequence number (SNs). It will differ from the (SNs) in the second row, but it will be the sequence number right next to it. Then you know exactly which sequence number (SNs) to put into the third packet of your sequence. So I think this is not a fully reliable way to forge a connection to the server; it rests on assumptions. But if you arrange your packets carefully, you can guess the sequence number fairly easily. Or maybe you try several times and get lucky.
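The prediction described above can be sketched in a few lines. This is a toy model, not real TCP: the class `OldStyleServer`, the function `predict_next_isn`, and the exact step of 64,000 are assumptions made for illustration; the lecture only says the server adds roughly 64k per connection.

```python
MOD = 2**32      # sequence numbers are 32 bits wide and wrap around
STEP = 64_000    # assumed per-connection increment (the "64k" mentioned above)


class OldStyleServer:
    """Toy server with a single global ISN counter shared by all connections."""

    def __init__(self, start: int) -> None:
        self._isn = start % MOD

    def next_isn(self) -> int:
        """Return the ISN for a new connection and advance the global counter."""
        isn = self._isn
        self._isn = (self._isn + STEP) % MOD
        return isn


def predict_next_isn(observed_isn: int, connections_in_between: int = 0) -> int:
    """Attacker's guess for the ISN of the connection opened after the observed one."""
    return (observed_isn + STEP * (1 + connections_in_between)) % MOD


server = OldStyleServer(start=4_294_950_000)  # close to the wrap point on purpose
observed = server.next_isn()     # attacker's own connection reveals this value
spoofed_isn = server.next_isn()  # ISN the server picks for the forged connection
print(predict_next_isn(observed) == spoofed_isn)
```

If other clients connect in between, the attacker has to guess `connections_in_between` as well, which is why back-to-back packets make the guess much easier.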
Student: Even if the numbers are generated completely at random, you only need to guess one of 4 billion possible numbers. That is not very many, right? I think that within a year you could probably break in.
Professor: Yes, you are absolutely right. You should not rely too heavily on TCP for security, because it really is only 4 billion guesses. And you can probably send that many packets within a day if you have a fairly fast connection.
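A rough back-of-the-envelope calculation shows why 32 bits is not much. The packet rates below are assumptions chosen for illustration, not measurements, and `seconds_to_cover` is a hypothetical helper name.

```python
SEQ_SPACE = 2**32  # number of possible 32-bit sequence numbers


def seconds_to_cover(space: int, packets_per_second: float) -> float:
    """Time needed to try every value once at the given forged-packet rate."""
    return space / packets_per_second


# Assumed attacker sending rates, in forged packets per second.
for rate in (1_000, 100_000, 1_000_000):
    hours = seconds_to_cover(SEQ_SPACE, rate) / 3600
    print(f"{rate:>9} packets/s: {hours:10.2f} hours to try all values")
```

At an assumed one million packets per second the whole space is covered in a little over an hour, and on average a correct guess comes after half that time.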
So here we have an interesting argument about the insecurity of TCP: we only have 32 bits, and we cannot make that secure. But I think that many applications that rely heavily on this protocol do not think about security at all, and this really does become a problem.
But you are absolutely right. In practice, you want to use some kind of encryption on top of this in order to get more serious guarantees that no one has forged your data, since you use encryption keys longer than 32 bits. In most cases, this is still effective in preventing tampering with the TCP connection.
Let's now look at why it is bad if people can forge TCP connections from arbitrary addresses.
One reason this is bad is that it undermines authorization based on IP address, where the server checks which address a request came from. If a server decides whether to allow or deny a connection based on the IP address, this can be exploited by an attacker who forges a connection from an arbitrary source address.
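A minimal sketch of such an address-based check, to make the weakness concrete. The network `192.0.2.0/24` is a documentation-only range standing in for "trusted hosts", and `is_trusted` is a hypothetical name; real servers implement this check in their own ways.

```python
import ipaddress

# Hypothetical allow list; 192.0.2.0/24 is reserved for documentation examples.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_trusted(source_ip: str) -> bool:
    """Decide purely from the claimed source address of the connection."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


print(is_trusted("192.0.2.17"))    # address inside the trusted range
print(is_trusted("198.51.100.5"))  # address outside the trusted range
```

The function only ever sees the source address written in the packets. If an attacker can complete a forged TCP handshake with a trusted source address, this check passes even though the attacker is somewhere else entirely.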
One example where this was a problem, although today it is mostly solved, is the family of r-commands such as rlogin. It used to be that you could run something like rlogin to a computer at, say, athena.dialup.mit.edu. If your connection came from an MIT host, the rlogin command would succeed when you said: "Yes, I am user Alice on this computer, let me log in as user Alice on that other computer." And this would be allowed, because all the computers on the mit.edu network were trusted to make such statements.
I have to say that the Athena dialup never had this problem. It used Kerberos from the very beginning. But other systems certainly did have such problems. This is an example of using an IP address as the authentication mechanism for a connection, where the system decides from the address whether the client calling the server is trustworthy. So this used to be a problem and no longer is. But relying on IP addresses still seems like a bad plan.
Now rlogin is no longer used; it has been replaced by SSH, the Secure Shell, which is an excellent protocol. On the other hand, there are many other examples of protocols that rely on IP-based authorization. One of them is SMTP. When you send an email, you use SMTP to talk to some mail server in order to send messages. To prevent spam, many SMTP servers only accept incoming messages from specific source IP addresses. For example, the Comcast mail server accepts mail only from Comcast IP addresses. The same is true for MIT mail servers: they will only accept mail from MIT IP addresses. Or at least we had one server that did not work quite as it should, using IP authentication.
Things are not so bad here. In the worst case, someone sends some spam through your mail server. That is probably why mail servers still use IP-based authorization, while the things that let you log in to an arbitrary account have stopped using it.
So why is such an authentication mechanism a bad plan? Suppose some server used rlogin. What would you do to attack it? What could go wrong?
Student: An attacker can simply pose as your computer, impersonate a user who is logging in to the network with your login, and gain access to the network.
Professor: Yes, essentially the attacker impersonates the computer. He synthesizes data that looks like a valid set of rlogin commands that say: "Log in as this user and execute this command in my Unix shell."
You synthesize this data (SNc + 1), mount the entire attack, and send the data as if a legitimate user were interacting with the rlogin client, and then you can proceed.
Well, this is one of the reasons why you do not want your TCP sequence numbers to be guessable. Another problem is reset attacks. Just as we could send a SYN packet if we know someone's sequence number, we can also send a reset packet.
We briefly mentioned the legitimate client sending a reset packet to tear down the bogus connection that the attacker set up. An attacker can likewise try to send reset packets for an existing connection, if he somehow knows what your sequence number is on that connection. In fact, it is unclear how big a problem this is.
At some level, you must assume that any of your TCP connections can be broken at any time anyway; it is not as if your network is reliable. So perhaps you should simply expect connections to break.
In the case where routers talk to each other, this assumption becomes especially critical. If you have many routers that communicate with each other using some routing protocol, then there are physical links between them. But on top of these physical links, they speak a routing protocol that runs over TCP. In fact, over each of the physical links that routers use to exchange routing information, a TCP session is running. The protocol used here is BGP, which we'll talk about later.
This BGP protocol uses the fact that if a TCP connection is alive, then the physical connection is alive. So if a TCP connection is broken, the router considers that the connection is broken and begins to recalculate all of its routing tables.
Therefore, if an adversary wants to mount some kind of denial-of-service (DoS) attack here, he can try to guess the sequence numbers of these routers and reset these sessions. If a TCP session between two routers goes down, both routers assume that the link is dead and have to recalculate all their routing tables, which causes routes to change. After that, the attacker can knock down another connection, and so on.
So this is a somewhat worrying attack, not because it violates anyone's confidentiality, at least not directly, but because it causes serious availability problems for other users of the system.
Student: if you are an attacker and want to organize a targeted attack against a specific user, could you just keep sending requests to connect to the server on behalf of its IP address and force it to reset the connection to the server?
Professor: Suppose I use Gmail and you want to prevent me from receiving anything from Gmail, so you just send packets to my machine, pretending that they come from the Gmail server. In this case, you must guess the correct source and destination port numbers.
The destination port number is probably 443, because I use HTTPS. But the source port number will be some random 16-bit value. In addition, the sequence numbers will vary. So unless you guess a sequence number that falls within my TCP window, which is tens of kilobytes wide, you will not succeed.
So you have to guess a fair number of things. There is no oracle to consult: you can't just ask the server what its sequence number is. That is why this does not work so easily either.
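The size of this guessing problem can be estimated with a short calculation. The sketch below assumes the destination port is known, the source port is uniformly random over 16 bits, and one guess per window width is enough to land inside the window; `blind_guess_count` and the 32 KB window are illustrative assumptions.

```python
def blind_guess_count(window_bytes: int, port_bits: int = 16, seq_bits: int = 32) -> int:
    """Worst-case forged packets: every source port times one guess per window."""
    guesses_per_port = -(-(2**seq_bits) // window_bytes)  # ceiling division
    return 2**port_bits * guesses_per_port


# Assumed receive window of 32 KB ("tens of kilobytes" in the lecture).
print(blind_guess_count(window_bytes=32 * 1024))
```

With a 32 KB window this comes to 2**33, about 8.6 billion packets in the worst case, which is why the unknown source port matters as much as the sequence number.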
So many of these problems have been fixed, including the RST-based one, especially for BGP routers. There were actually two amusing fixes. One of them really shows how you can take existing mechanisms and use them to fix a specific problem. It relies on the property that these routers talk only to each other, and not to anyone else on the network. As a result, if a packet did not come from the router at the other end of the link, it is discarded.
A lucky find for the designers of these protocols is a convenient field in the packet called "time to live", or TTL. This is an 8-bit field that is decremented by each router to ensure that packets do not circulate in an infinite loop. The maximum TTL value is 255, and it only decreases from there.
So what do these clever protocols do? They drop any packet whose TTL value is not equal to 255. If a packet has a TTL of 255, it can only have come from the router on the other side of the link. If an adversary tries to inject any other packet into an existing BGP connection, it will arrive with a TTL below 255, because the value will have been decremented by the other routers along the path, including this router. So the packet is simply rejected by the recipient.
So this is one example of the smart combination of backward-compatible techniques that solve this very specific problem.
Student: Doesn't the bottom right router send something with a TTL of 255?
Professor: This is a physical router. It knows that these are separate links, so it looks at both the TTL and the link the packet arrived on. So if a packet came from the top left router, it will not accept it for the TCP connection between itself and the top right router.
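The acceptance rule just described, TTL of exactly 255 plus the right incoming link, can be sketched as a small predicate. The `Packet` type, the link labels, and `accept_bgp_packet` are hypothetical names for illustration and do not correspond to any particular router implementation.

```python
from dataclasses import dataclass

MAX_TTL = 255  # largest value an 8-bit TTL field can hold


@dataclass(frozen=True)
class Packet:
    ttl: int            # TTL as seen on arrival
    arrival_link: str   # hypothetical label of the physical link it came in on


def accept_bgp_packet(pkt: Packet, peer_link: str) -> bool:
    """Accept only packets that crossed zero routers and arrived on the peer's link."""
    return pkt.ttl == MAX_TTL and pkt.arrival_link == peer_link


print(accept_bgp_packet(Packet(ttl=255, arrival_link="link-right"), "link-right"))
print(accept_bgp_packet(Packet(ttl=254, arrival_link="link-right"), "link-right"))
print(accept_bgp_packet(Packet(ttl=255, arrival_link="link-left"), "link-right"))
```

A remote attacker can set the TTL to 255 when sending, but every router on the way decrements it, so the packet cannot still read 255 on arrival. A directly attached neighbor can send TTL 255, which is why the link check is needed as well.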
For the most part, these routers trust their immediate neighbors, and this process can be controlled using the AutoPath multi-path routing mechanism.
Other fixes for BGP are to implement some form of authentication header, including the MD5 authentication header. But in reality, the developers focused on this particular application, for which a reset attack is particularly critical.
This problem persists today. If there is some long-lived connection and I want to interrupt it, I just have to send a large number of RST packets, on the order of hundreds of thousands, but probably not 4 billion. That is because servers are actually somewhat lenient about which sequence number they accept for a reset.
It can be any sequence number within a certain window. In that case, the attacker can break the connection without much effort. This is still a problem for which there is no really good solution.
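Why hundreds of thousands rather than 4 billion follows from the window check. The sketch below models the lenient rule described here, where any sequence number inside the receive window is accepted; the function names and the 16 KB window are assumptions for illustration.

```python
MOD = 2**32  # sequence numbers wrap around at 32 bits


def rst_in_window(seq: int, rcv_next: int, window: int) -> bool:
    """True if seq lies in [rcv_next, rcv_next + window), computed modulo 2**32."""
    return (seq - rcv_next) % MOD < window


def rst_guesses_needed(window: int) -> int:
    """Evenly spaced guesses, one per window width, that are sure to hit the window."""
    return -(-MOD // window)  # ceiling division


print(rst_guesses_needed(16 * 1024))
print(rst_in_window(seq=5, rcv_next=MOD - 10, window=100))
```

With an assumed 16 KB window, 262,144 evenly spaced guesses are enough when the ports are already known. The second call shows that the check still works when the window straddles the wrap-around point.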
And the last bad thing that can happen because of the predictability of sequence numbers is the injection of data into existing connections. Suppose we have a hypothetical protocol, similar to rlogin, which does not actually perform IP-based authentication, so you must enter your password to log in.
The problem is that once you have entered your password, your TCP connection is simply established and will accept arbitrary data. So the attacker just needs to wait until one of you logs in to your computer by entering your password.
The attacker does not know what the password is, but as soon as you have established a TCP connection, he can try to guess your sequence number and inject some data into your existing connection. So if I can correctly guess your sequence number, I can make it look as if you, and not I, entered some command after you had correctly authenticated with your password.
All this suggests why you really do not want to rely on these 32-bit sequence numbers in terms of security. But let's see what modern TCP stacks actually do to mitigate this problem. One approach to the problem, which we will consider in the next 2 lectures, is to implement some degree of security at the application level. At this level, we will use cryptography to authenticate, encrypt, sign, and validate messages without special input from TCP.
Some existing fixes also help with these security problems, or at least make them harder for an attacker to exploit. People do this in practice today, for example in Linux and Windows, by maintaining different initial sequence numbers for each source/destination pair.
Most TCP implementations still compute the initial sequence number (ISN) just as we did before. Let's call that the old-style ISN. Then, to actually generate the sequence number for any particular connection, we add a random-looking 32-bit offset to this old-style ISN. That is, we add the output of a function, something like a hash function such as SHA-1, or something better.
This function takes as input the source IP address, source port number, destination IP address, destination port number, and some secret key that only the server knows. This gives us a nice property: within any particular connection, as identified by its source/destination IP address and port pair, we keep all the good features of the old-style sequence number assignment algorithm.
But if you have connections with different source/destination tuples, then nothing lets you learn the exact sequence number of a connection with another tuple. In fact, you would have to guess the key in order to compute that value.
Hopefully the server's OS kernel stores this key somewhere in its memory and does not give it to anyone. This is how most TCP stacks today deal with this particular problem within the constraint of 32-bit sequence numbers. It is not great, but it works.
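The scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the code of any real TCP stack: the lecture mentions "SHA-1 or something better", and HMAC-SHA-256 is swapped in here as the keyed function; the names `connection_offset` and `initial_sequence_number` and the 16-byte key length are hypothetical.

```python
import hashlib
import hmac
import os

MOD = 2**32
SECRET_KEY = os.urandom(16)  # assumed to be generated once and kept in the kernel


def connection_offset(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                      key: bytes = SECRET_KEY) -> int:
    """Keyed 32-bit offset that depends only on the connection's 4-tuple."""
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")  # keep the first 32 bits


def initial_sequence_number(old_style_isn: int, src_ip: str, src_port: int,
                            dst_ip: str, dst_port: int,
                            key: bytes = SECRET_KEY) -> int:
    """Old-style ISN shifted by the per-connection secret offset, modulo 2**32."""
    offset = connection_offset(src_ip, src_port, dst_ip, dst_port, key)
    return (old_style_isn + offset) % MOD


# Example addresses come from documentation-only ranges.
a = initial_sequence_number(1_000, "192.0.2.1", 40_000, "198.51.100.2", 443)
b = initial_sequence_number(1_250, "192.0.2.1", 40_000, "198.51.100.2", 443)
print((b - a) % MOD)  # the old-style counter's progress is preserved per tuple
```

For a fixed 4-tuple the offset is constant, so sequence numbers still advance exactly as the old-style counter does. For a different 4-tuple the offset is unrelated, and without the key an observer of one connection learns nothing useful about another.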
Student: could you repeat that again? What about the uniqueness of the key ...
Professor: When my machine boots, or when any machine boots, it generates a random key. Every time you reboot it, it generates a new key. This means that for a particular source/destination pair, the sequence numbers are always shifted by the same offset, because for a given source/destination pair the function's inputs are fixed. So you see the initial sequence numbers for new connections evolving according to the old algorithm, just shifted by a constant. This preserves the protection against old packets from previous connections being injected into new connections, as well as against packet reordering.
The only thing we need this old-style sequence number for is to prevent problems with these duplicate packets. Earlier we saw that if you obtain the sequence number for one connection, A -> S: SYN (...), you can infer the sequence number (SNs) to acknowledge in another connection.