MIT course "Computer Systems Security". Lecture 19: "Anonymous Networks", part 2 (lecture from the creator of the Tor network)

Massachusetts Institute of Technology. Lecture course 6.858, "Computer Systems Security". Nickolai Zeldovich, James Mickens. 2014

Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.

Lecture 1: "Introduction: threat models" Part 1 / Part 2 / Part 3
Lecture 2: "Control hijacking attacks" Part 1 / Part 2 / Part 3
Lecture 3: "Buffer overflow: exploits and protection" Part 1 / Part 2 / Part 3
Lecture 4: "Privilege separation" Part 1 / Part 2 / Part 3
Lecture 5: "Where Security Errors Come From" Part 1 / Part 2
Lecture 6: "Capabilities" Part 1 / Part 2 / Part 3
Lecture 7: "Sandbox Native Client" Part 1 / Part 2 / Part 3
Lecture 8: "Model of network security" Part 1 / Part 2 / Part 3
Lecture 9: "Web Application Security" Part 1 / Part 2 / Part 3
Lecture 10: "Symbolic execution" Part 1 / Part 2 / Part 3
Lecture 11: "Ur / Web programming language" Part 1 / Part 2 / Part 3
Lecture 12: "Network Security" Part 1 / Part 2 / Part 3
Lecture 13: "Network Protocols" Part 1 / Part 2 / Part 3
Lecture 14: "SSL and HTTPS" Part 1 / Part 2 / Part 3
Lecture 15: "Medical Software" Part 1 / Part 2 / Part 3
Lecture 16: "Side-channel attacks" Part 1 / Part 2 / Part 3
Lecture 17: "User Authentication" Part 1 / Part 2 / Part 3
Lecture 18: "Private Internet browsing" Part 1 / Part 2 / Part 3
Lecture 19: "Anonymous Networks" Part 1 / Part 2 / Part 3

Let's take a closer look at how the protocol works, because it would be a shame to have you read the paper and then not talk about the things it focuses on. I want to apologize again for my drawing on the blackboard; I spend most of my time at a desk typing on a computer.

A blackboard is alien technology to me. So, here is a relay. And here is Alice. Here is another relay, and here is Bob. Alice wants to talk to Bob, so the first thing she does is build a circuit through these relays to Bob. Let's say she chose these two relays, R1 and R2. First, Alice makes a TLS link to R1; let's say that R1 already has a TLS link to R2. Then Alice performs a one-way authenticated, one-way anonymous key negotiation: the relay is authenticated to Alice, while Alice stays anonymous.

The old Tor handshake was called TAP, the Tor Authentication Protocol; the new one is called ntor. Both have security proofs. The proofs are correct, although their write-ups contained mistakes.

After authentication, Alice picks a circuit ID, say 3, and tells the relay to create circuit 3 by sending "create 3"; the relay answers "created". Now Alice and the relay share a secret symmetric key S1. Both of them store it under the index 3, which refers to this circuit.

Alice can now use this key to send messages to R1. On circuit 3, the circuit identifier mentioned in the paper, she sends the relay an "extend" cell with some content.

The extend cell basically contains the first half of a handshake. This time, however, it is encrypted not with R1's public key but with R2's public key, and it says that it is meant for R2. So R1 knows that it has to open a new circuit to R2, and it tells R2 so with a create (...) cell, where the parentheses hold the half of the handshake that came from Alice. In doing so, R1 picks a circuit ID of its own, because circuit IDs identify circuits on a particular link, here the second TLS connection. Alice does not know which circuit IDs are in use there, because that is a private matter between R1 and R2.

So the relay might pick, for example, ID 95. In reality that is unlikely, because the circuit ID is chosen at random from a 4-byte space, but I don't want to write out full 32-bit numbers today.
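The bookkeeping described above can be sketched roughly as follows. This is a toy model, not Tor's actual code: the `Relay` class, its method names, and the link labels are all hypothetical, and the key is handed in directly instead of being derived from a handshake.

```python
import secrets


class Relay:
    """Toy relay that tracks circuits per link (hypothetical structure)."""

    def __init__(self, name):
        self.name = name
        self.keys = {}     # (link, circ_id) -> key shared with the client
        self.forward = {}  # (in_link, in_circ_id) -> (out_link, out_circ_id)
        self.used = {}     # link -> circuit IDs already in use on that link

    def new_circ_id(self, link):
        """Pick an unused random 32-bit circuit ID for the given link."""
        used = self.used.setdefault(link, set())
        while True:
            circ_id = secrets.randbelow(2**32)
            if circ_id not in used:
                used.add(circ_id)
                return circ_id

    def handle_create(self, link, circ_id, key):
        """Remember the key for a circuit whose ID was chosen by the peer."""
        self.used.setdefault(link, set()).add(circ_id)
        self.keys[(link, circ_id)] = key

    def handle_extend(self, in_link, in_circ_id, out_link):
        """Choose a fresh ID on the outgoing link and record the mapping."""
        out_id = self.new_circ_id(out_link)
        self.forward[(in_link, in_circ_id)] = (out_link, out_id)
        return out_id


r1 = Relay("R1")
r1.handle_create("alice", 3, b"S1")             # Alice: create 3
out_id = r1.handle_extend("alice", 3, "R2")     # R1 picks its own ID toward R2
```

Alice only ever sees circuit ID 3; the ID that R1 picked for its link to R2 stays inside R1's table.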

After that, R2 answers R1 with "created", and R1 sends an "extended" cell back to Alice, encrypted with the key S1. Now Alice and the relay R2 share the key S2, and Alice can send messages encrypted first with S2 and then with S1. She sends such a message; R1 removes the S1 encryption and forwards it.
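The layering can be illustrated with a deliberately simplified sketch. The keystream construction below (SHA-256 in a counter arrangement, XORed onto the data) is an assumption made only to keep the example self-contained; it is not the cipher Tor uses, it provides no integrity, and the helper names are hypothetical.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one encryption layer; XOR is its own inverse."""
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


S1 = b"key shared with R1"
S2 = b"key shared with R2"
message = b"open a connection to Bob"

cell = xor_layer(S1, xor_layer(S2, message))  # Alice: S2 first, then S1
at_r2 = xor_layer(S1, cell)                   # R1 removes the S1 layer
plain = xor_layer(S2, at_r2)                  # R2 removes the S2 layer
```

R1 sees only data that is still protected by S2, so it learns where to forward the cell but not what it says.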

R1 knows that cells arriving on circuit 3 should be sent on to R2 over circuit 95. When R2 receives this message, it sees that circuit 95 corresponds to the key S2 and uses it to decrypt the message: "Oh, it says to open a connection to Bob!" Having read this, R2 opens a TCP connection to Bob and reports this to Alice through the same process in reverse.

After all this, Alice says: "Great, then tell Bob something like GET /index.html HTTP/1.0," and life goes on.

Let's see what I skipped from the paper ... so ... this, this and this. OK, so what are we actually relaying? Some designs in this area say that you should carry IP packets back and forth, that is, the scheme should simply be a way of transporting IP packets. One problem with that is that we want to support as many users as possible, which means we have to run on all kinds of operating systems.

But the TCP stacks of different operating systems behave differently. If you have ever used Nmap or any other network traffic analysis tool, you know you can easily tell Windows TCP from FreeBSD TCP or Linux TCP. You can even tell different versions apart. Moreover, if you can send raw IP packets to a chosen host, you can provoke responses that depend in part on what that host is doing.

So if you carry IP packets back and forth, you need IP normalization. Since nothing short of a full IP stack can do that normalization, you do not want to go this way.

Instead, we take the easiest route: we just take the content of TCP streams, assuming that TCP is reliable and takes care of everything below it. The program accepts TCP connections from Alice's applications and simply relays their content, without doing anything complicated at the network level.
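Relaying stream content, as opposed to packets, amounts to cutting a byte stream into payloads and joining them again in order at the far end. A minimal sketch follows; the function names are hypothetical and the payload size of 16 bytes is an arbitrary value chosen for illustration, not a figure from the lecture.

```python
CHUNK = 16  # illustrative payload size, chosen arbitrarily


def to_payloads(data: bytes, size: int = CHUNK) -> list:
    """Split stream content into payloads of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def from_payloads(payloads) -> bytes:
    """Reassemble the stream at the far end, preserving order."""
    return b"".join(payloads)


request = b"GET /index.html HTTP/1.0\r\n\r\n"
payloads = to_payloads(request)
rebuilt = from_payloads(payloads)
```

Nothing about Alice's TCP implementation (options, window sizes, retransmission behavior) survives this step, which is the point of relaying content only.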

You could try to improve performance with other approaches described in the lecture materials. But the scheme I described is one we could actually implement, because when we created Tor we had paid far more attention in our security and compiler classes than in our networking classes. We have networking specialists now, but in 2003-2004 we were short of them.

TCP seems to be about the right level. The higher-level approach discussed in some of the original designs, with separate proxies for HTTP, FTP and so on at Alice's side, seems like a bad idea. That is because any protocol ought to be encrypted end to end across the whole Alice-Bob connection, and if we are lucky, Alice will run TLS all the way to Bob, carried through R2, with integrity and confidentiality.

But if that is the case, then any anonymizing transformations you want to apply to the encrypted data have to happen in the application Alice uses, before the data enters the TLS connection. A proxy cannot do that, so TCP suits us better.

Someone asked me where our security proofs are. We have security proofs for many of the cryptographic constructions we use; those are standard published results. For the protocol as a whole, there are security proofs for certain aspects of onion routing. But the models they need in order to prove that this provides anonymity have to assume such bizarre properties of the universe, of the network, or of the attacker's abilities that they can only satisfy the program committees of some theoretical conferences.
In short, those anonymity results have to prove that an attacker who can see data volume and timing on the Alice-R1 segment cannot identify that traffic when observing only the bytes leaving on the R2-Bob segment. But this is not quite a satisfactory result. Let's put it this way: what kind of security guarantees would you want from a system that you don't know how to build? OK, I have to be careful with these statements ... Recall that there are systems with the strongest anonymity guarantees that you do know how to build, but that you would never want to use. Classic DC-nets, for example, provide guaranteed anonymity, except that any participant can shut down the whole network simply by ceasing to participate. In addition, that system does not scale.

But for the things created in our time, the properties of anonymity are more probabilistic, and not categorically guaranteed. So instead of asking whether this system guarantees Alice’s security, it would be worth asking how much traffic Alice can safely send if she wants to have a 99% chance that this network activity cannot be linked to her activities?

The first question we asked ourselves when we started building Tor was: who is going to run all these things? We didn't know whether our system would really take off, so the only option was to try it and see what happened.

We had enough volunteers. A fair number of non-profit organizations simply wanted to take donations and use them to buy bandwidth and run Tor nodes. Some universities took part, along with several private companies whose security teams decided it would be fun to run their own Tor server.
There were also legal questions, but again, I am not a lawyer and I cannot give a legal assessment of these things. Still, five different people have asked me about the legality of our system. As far as I can tell, at least in the US, there are no legal obstacles to running a Tor server. It seems to me that the situation is similar in most European countries. In countries with less internet freedom, using Tor is more strictly regulated.

The problem is not so much whether using Tor is legal or illegal, but that someone can do something illegal or undesirable through my Tor server. For example, will my provider disconnect me from the network if I offer my computer as a Tor node? Will law enforcement believe that I am just running Tor, or will they come and take my computer to check?

For that reason I would advise you not to run a Tor server from your dorm room, or rather, not to use your computer to relay a large amount of exit traffic, even assuming the network policy allows it. Honestly, I have no idea what that policy is now, because it has changed a lot since my student days. In any case, a large amount of exit traffic from your computer in the dorm can lead to trouble. Running a relay that does not deliver traffic to the Internet will be less problematic. And if your provider allows you to do this, it is quite a reasonable thing to do.

Someone asked me: what if users do not trust a particular node? That brings us to the next topic. Clients run the software however they see fit; you cannot forbid them from using one program or force them to use another. But remember that anonymity loves company. If I use three nodes, you use three other nodes, and someone else uses three more, our traffic will not mix at all.

To the extent that we partition the network into the parts each of us uses, we are easy to tell apart. Now, if I exclude just one or two nodes, and you exclude just one or two nodes, that does not split the network much and makes it harder to identify us. But it would be best if everyone used the same nodes as far as possible. How do we achieve that?
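A crude way to see why partitioning hurts: group clients by the exact set of nodes they are willing to use. Under the simplifying assumption that an observer can tell which node set a circuit was drawn from, each group is the most that client can hide in. The function name and the sample data below are hypothetical.

```python
from collections import defaultdict


def anonymity_sets(client_nodes):
    """Group clients that use exactly the same set of nodes."""
    groups = defaultdict(set)
    for client, nodes in client_nodes.items():
        groups[frozenset(nodes)].add(client)
    return list(groups.values())


clients = {
    "alice": {"n1", "n2", "n3"},
    "bob": {"n4", "n5", "n6"},
    "carol": {"n1", "n2", "n3"},
}
groups = anonymity_sets(clients)
```

Here Alice and Carol hide among each other, while Bob, who picked his own three nodes, stands alone.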

In the first version of Tor, we simply shipped users a list of all the nodes. There were about 6 of them, three of which ran on the same computer in the Tech Square computer science lab. But that was not a good idea, because the number of nodes goes up and down, the nodes themselves change, and you do not want to release a new version of the software every time someone joins the network.

You could instead have each node keep a list of all the other nodes connected to it, with all of them advertising each other. Then a client joining the network only needs to know one node and ask it, "Hey, who is online?"

In fact, many people's designs are built on this principle. Many early peer-to-peer anonymity designs work this way. But it is a terrible idea. If you connect to a single node, ask who is online, and trust the answer, then I can tell you: "I'm online, my friend over here is online, my friend over there is online, and nobody else is on the network!" That is, I can hand you any number of fake nodes that I control and that intercept all of your traffic. This is what is called a route capture attack.

So perhaps we have just one directory run by a trusted party. That is not so good either, so let's instead assume we have several trusted parties. Clients go to these trusted parties, get a list of all the nodes from each one, and combine them into one common list of the nodes on the network.

This is not good, because we are again partitioned into identifiable clusters. If I pick these three trusted parties and you pick three others, we will use different sets of nodes, which is bad. In addition, if I use only the nodes that appear on every list I was given, any one of the trusted parties can keep me from using a node it does not like simply by leaving it off its list. If I use the combined list, then someone can flood me with 20 thousand fake servers by putting them on their list. I could do some kind of voting among the lists and so solve the last two problems, but I would still be partitioned from everyone who uses different trusted parties.
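The two failure modes can be shown with plain set operations. In this sketch, "using only what every list agrees on" is modeled as an intersection and "the combined list" as a union; that reading, and all node and directory names, are assumptions made for illustration.

```python
honest_nodes = {"n1", "n2", "n3"}

dir_a = set(honest_nodes)
dir_b = set(honest_nodes)
# A misbehaving directory: omits n2 and n3, and invents five fake nodes.
dir_evil = {"n1"} | {f"fake{i}" for i in range(5)}

# Requiring every list to agree lets one directory veto honest nodes.
intersection = dir_a & dir_b & dir_evil

# Accepting anything on any list lets one directory inject fake nodes.
union = dir_a | dir_b | dir_evil

vetoed = honest_nodes - intersection
injected = union - honest_nodes
```

With a single misbehaving directory, `vetoed` contains n2 and n3 and `injected` contains the five fakes, so neither rule is safe by itself.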

We could build a magic DHT, a distributed hash table, some kind of magic distributed structure running across all the nodes. I say "magic" because, although there are designs in this area and some are better than others, none of them currently has a solid security proof, solid enough that I could say with confidence that it really is secure.

So here is the solution we ended up with. Our network has several trusted authorities, run by trusted parties, which collect lists of nodes and vote every hour on which nodes are running on the network. They can also vote to exclude suspicious nodes, for example nodes that all run in the same /16 or a node that does strange things with traffic. Together they form a consensus, computed from the result of the vote.
Clients do not use a consensus unless it is signed by a sufficient number of the trusted authorities.
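A rough sketch of the voting idea follows. The strict-majority rule, the function names, and the authority names are assumptions for illustration; the lecture does not spell out the exact rule. Signature checking is abstracted away: `signers` stands for the authorities whose signatures the client has already verified.

```python
from collections import Counter


def build_consensus(votes):
    """Keep the nodes listed by a strict majority of authorities.

    votes maps an authority name to the set of nodes it listed.
    """
    counts = Counter(node for listed in votes.values() for node in listed)
    needed = len(votes) // 2 + 1
    return {node for node, count in counts.items() if count >= needed}


def client_accepts(signers, authorities, needed):
    """Accept a consensus only if enough known authorities signed it."""
    return len(set(signers) & set(authorities)) >= needed


votes = {
    "auth1": {"n1", "n2", "n3"},
    "auth2": {"n1", "n2", "n3"},
    "auth3": {"n1", "fake1", "fake2"},  # one misbehaving authority
}
consensus = build_consensus(votes)
ok = client_accepts({"auth1", "auth2"}, votes.keys(), needed=2)
rejected = client_accepts({"auth3", "mallory"}, votes.keys(), needed=2)
```

A single misbehaving authority can neither drop n2 and n3 nor add its fake nodes, and because every client uses the same consensus, clients are not partitioned by which authorities they asked.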

This is not the final version of the design, but it is the best we have managed to come up with so far. By the way, all you need to distribute to clients is a list of the authorities' public keys and a list of places to get directories from. You want all the nodes to cache these directories, because if they don't, the network load becomes dangerous and network bandwidth drops dramatically.

I intend to skip the next question and go straight to how clients should choose the paths they build through the network. I would like to talk about application problems and about building applications that do not give themselves away. I would like to talk about network abuse, about hidden services and how they work, about censorship resistance, and also about attacks and defenses. But we only have 35 minutes left, so I cannot talk about everything I want to. I am asking you to vote for the topics you consider most important to discuss.

If you think that one of the most important topics is the choice of paths and nodes, please raise your hand. If one of the most important topics is application problems and how to ensure that applications do not violate your anonymity, please raise your hand. If one of the most important problems is abuse and how to prevent it, please raise your hand. So, I see that this topic is popular, and I mark it.

If it matters to you how hidden services work and how they can be made to work better, please raise your hand.

[The rest of this part of the lecture did not survive extraction. Only isolated terms remain: IP addresses, BitTorrent, Gnutella, ports 80 and 443, IRC, My Little Pony, 2013, Silk Road, OPSEC.]


Source: https://habr.com/ru/post/431264/
