
Tor is an anonymity tool used by people who seek privacy and by people fighting Internet censorship. Over time, Tor has become very, very good at its task. That is why the security, stability, and speed of the network are critical for the people who rely on it.
But how does Tor work “under the hood”? In this article, we will dive into the structure and protocols of the network to take a close look at how Tor works.
A short history of Tor
The concept of onion routing (we will explain the name later) was first proposed in 1995. Initially, the research was funded by the Office of Naval Research, and in 1997 DARPA joined the project. Since then, the Tor Project has been funded by various sponsors, and not long ago the project won a Reddit donation campaign.
The code for the modern version of Tor was released as open source in October 2003, and it was already the third generation of onion routing software. The idea is to wrap traffic in layers of encryption (like the layers of an onion) to protect the data and the anonymity of both sender and receiver.
Tor Basics
With the history covered, let's get down to how it works. At the highest level, Tor works by routing your computer's connection to its target (for example, google.com) through several intermediary computers, or relays.

Figure: packet path: guard node, middle node, exit node, destination
As of February 2015, about 6,000 relays carry traffic on the Tor network. They are located around the world and are run by volunteers who agree to donate some of their bandwidth to a good cause. Importantly, most nodes need no special hardware or additional software: they all run the regular Tor software, configured to act as a relay.
The speed and anonymity of the Tor network depend on the number of nodes: the more, the better! This makes sense, since the bandwidth of any single node is limited. And the more nodes there are, the harder it is to track a user.
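One way to see why more nodes make tracking harder: an attacker who controls both the guard and the exit of a circuit can try to correlate the traffic at both ends. The sketch below computes that probability under heavy simplifications that are assumptions, not Tor's real behavior (uniform random choice, every relay usable as guard or exit, guard and exit always distinct). The attacker size of 60 relays is a made-up number.

```python
from fractions import Fraction


def p_guard_and_exit_bad(bad: int, total: int) -> Fraction:
    """Probability that one circuit's guard AND exit are both adversary relays.

    Simplifying assumptions (not Tor's real path selection): every relay can
    act as guard or exit, relays are picked uniformly at random, and the
    guard and exit are always two different relays.
    """
    if total < 2 or not 0 <= bad <= total:
        raise ValueError("need total >= 2 and 0 <= bad <= total")
    # Guard: bad/total chance. Exit: drawn from the remaining total - 1 relays.
    return Fraction(bad, total) * Fraction(max(bad - 1, 0), total - 1)


# Same hypothetical attacker (60 relays), smaller vs. larger network.
for total in (600, 6000):
    print(total, float(p_guard_and_exit_bad(60, total)))
```

With the attacker's size fixed, growing the network tenfold cuts the chance by roughly a factor of a hundred, because both ends must be hit.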
Types of nodes
By default, Tor passes traffic through 3 nodes. Each of them has its own role (we will analyze them in detail later).
Figure: client, guard node, middle node, exit node, destination
The guard (entry) node is the network's entry point. Guard nodes are chosen from relays that have been running for a long time and have proven stable and fast.
The middle node passes traffic from the guard node to the exit node. As a result, the guard knows nothing about the exit, and vice versa.
The exit node is the point where traffic leaves the network; it sends the traffic on to the destination the client requested.
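The three roles can be sketched as a simplified path builder. The flag names (Guard, Exit, Stable, Fast, BadExit) are real consensus flags, but the selection logic is an assumption for illustration: real Tor weights choices by bandwidth, keeps long-lived guards, and avoids relays from the same family or subnet. The `Relay` class and the nicknames are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relay:
    nickname: str
    flags: frozenset = field(default_factory=frozenset)


def build_path(relays, rng=random):
    """Pick (guard, middle, exit). A simplified sketch, not Tor's algorithm."""
    # Exit: must allow exiting and must not be flagged as a bad exit.
    exits = [r for r in relays if "Exit" in r.flags and "BadExit" not in r.flags]
    exit_relay = rng.choice(exits)
    # Guard: long-running, stable, fast relays (and not the chosen exit).
    guards = [r for r in relays
              if {"Guard", "Stable", "Fast"} <= r.flags and r != exit_relay]
    guard = rng.choice(guards)
    # Middle: any other relay; it only learns the guard's and exit's addresses.
    middle = rng.choice([r for r in relays if r not in (guard, exit_relay)])
    return guard, middle, exit_relay


# Hypothetical relays with made-up nicknames.
relays = [
    Relay("alpha", frozenset({"Guard", "Stable", "Fast"})),
    Relay("bravo", frozenset({"Guard", "Stable", "Fast", "Exit"})),
    Relay("charlie", frozenset({"Fast"})),
    Relay("delta", frozenset({"Exit", "Fast"})),
    Relay("echo", frozenset({"Exit", "BadExit"})),
]
guard, middle, exit_relay = build_path(relays, random.Random(7))
print(guard.nickname, "->", middle.nickname, "->", exit_relay.nickname)
```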
A safe way to run a guard or middle node is usually a virtual server (DigitalOcean, EC2): in this case, the server operators will see only encrypted traffic.
But exit node operators bear a special responsibility. Since they send traffic on to its destination, any illegal activity carried out through Tor will be traced back to the exit node. This can lead to police raids, abuse complaints, and other trouble.
If you meet an exit node operator, thank them. They deserve it.
So where do onions come in?
Now that we understand the route connections take through the nodes, let's ask ourselves: how can we trust them? Can we be sure they will not tamper with the connection and extract all the data from it? In short, we don't need to trust them!
The Tor network is designed so that nodes can be treated with minimal trust. This is achieved through encryption.
So what about the onions? Let's look at how encryption works when a client establishes a connection through the Tor network.
The client encrypts the data so that only the exit node can decrypt it.
This data is then encrypted again so that only the middle node can decrypt it.
And then this data is encrypted once more so that only the guard node can decrypt it.

It turns out that we have wrapped the original data in layers of encryption, like an onion. As a result, each node has only the information it needs: where the encrypted data came from and where to send it next. This encryption benefits everyone: the client's traffic is not exposed, and the nodes are not responsible for the content of the data they carry.
Note: exit nodes can see the original data, since they have to send it on to its destination. So they can extract valuable information from traffic sent in clear text over HTTP and FTP!
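The layering can be illustrated with a toy cipher. This is a sketch of the wrap-and-peel idea only: the SHA-256 keystream is an assumption chosen because it needs nothing beyond the standard library, it is not secure, and it is not what Tor uses (Tor encrypts relay cells with AES in counter mode, using keys negotiated with each relay). The keys and hop names are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks. For illustration only:
    this is NOT secure and NOT Tor's cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer: XOR with the key's keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Hypothetical per-hop keys; in Tor each one is negotiated with that relay.
KEYS = {"guard": b"key-guard", "middle": b"key-middle", "exit": b"key-exit"}
PATH = ["guard", "middle", "exit"]


def wrap(message: bytes) -> bytes:
    """Client side: innermost layer for the exit, outermost for the guard."""
    for hop in reversed(PATH):
        message = xor_layer(KEYS[hop], message)
    return message


def peel(onion: bytes, hops) -> bytes:
    """Relay side: each hop in turn removes exactly its own layer."""
    for hop in hops:
        onion = xor_layer(KEYS[hop], onion)
    return onion


cell = wrap(b"GET / HTTP/1.1")
print(peel(cell, ["guard"]))                    # still unreadable
print(peel(cell, ["guard", "middle", "exit"]))  # b'GET / HTTP/1.1' at the exit
```

Only after the exit removes the last layer does the plaintext appear, which is exactly why the exit node can read unencrypted protocols.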
Nodes and bridges: a problem with nodes
After it starts, the Tor client needs to obtain lists of all entry, middle, and exit nodes. This list is not a secret; later I will explain how it is distributed (you can search the documentation for the word “consensus” yourself). The list has to be public, but that creates a problem.
To understand it, let's pretend to be an attacker and ask ourselves: what would an Authoritarian Government (AG) do? Thinking this way helps us understand why Tor is built the way it is.
So what would the AG do? Censorship is serious business, and Tor lets people bypass it, so the AG would want to block users from accessing Tor. There are two ways to do this:
- block users exiting Tor;
- block users from entering Tor.
The first is possible, and it is the free choice of the owner of a router or website. They just need to download the list of Tor exit nodes and block all traffic from them. This is unfortunate, but Tor cannot do anything about it.
The second option is much worse. Blocking users exiting Tor can prevent them from visiting a particular service, but blocking all users from entering Tor prevents them from visiting any site at all. Tor would become useless for the very users who suffer from censorship and turned to it for that reason. If Tor had only public nodes, this would be possible, since the AG could download the list of guard nodes and block traffic to them.
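Both kinds of blocking come down to a lookup in the public relay list. The sketch below assumes a hypothetical list of relay addresses (documentation-range IPs); a real censor or site owner would extract them from the published consensus. It also shows why an address that is not in the list, such as a bridge, gets through.

```python
import ipaddress

# Hypothetical public relay list (documentation-range addresses). A real
# censor would extract these addresses from the published consensus.
PUBLIC_RELAYS = {
    ipaddress.ip_address(a) for a in ("192.0.2.10", "192.0.2.11", "198.51.100.7")
}


def is_listed_relay(ip: str) -> bool:
    return ipaddress.ip_address(ip) in PUBLIC_RELAYS


def censor_allows(dest_ip: str) -> bool:
    """Censor's rule: drop any user connection to a publicly listed relay."""
    return not is_listed_relay(dest_ip)


def website_allows(src_ip: str) -> bool:
    """Site owner's rule: drop requests arriving from a listed relay."""
    return not is_listed_relay(src_ip)


print(censor_allows("192.0.2.10"))   # False: a public guard is easy to block
print(censor_allows("203.0.113.5"))  # True: an unlisted bridge slips through
```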
Fortunately, the Tor developers thought about this and came up with a clever solution. Meet the bridges.
Bridges
In essence, bridges are nodes that are not published in the public list. Users behind a censorship wall can use them to reach the Tor network. But if they are not published, how do users find them? Is there a special list? We will come back to this, but in short, yes: there is a list of bridges maintained by the project developers.
It's just not public. Instead, users can get a small list of bridges that lets them connect to the rest of the network. This system, BridgeDB, gives users only a few bridges at a time. That is reasonable, since they don't need many bridges at once.
By handing out only a few bridges at a time, the project prevents the Authoritarian Government from blocking the whole network. Of course, an AG that learns about new nodes can block them, but can anyone discover all the bridges?
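One way to hand out "a few bridges at a time" is to give every requester a stable, requester-specific subset. The sketch below is an assumption loosely inspired by the idea, not BridgeDB's actual algorithm (which splits bridges among several distribution channels). The pool, requester IDs, and secret are all hypothetical.

```python
import hashlib
import hmac


def bridges_for(requester: str, pool: list, secret: bytes, k: int = 3) -> list:
    """Return a small, stable, requester-specific subset of the bridge pool.

    Bridges are ranked by HMAC-SHA256(secret, requester|bridge); the same
    requester always gets the same k bridges, so asking again reveals nothing
    new, while different requesters see different subsets.
    """
    def rank(bridge: str) -> bytes:
        msg = f"{requester}|{bridge}".encode()
        return hmac.new(secret, msg, hashlib.sha256).digest()

    return sorted(pool, key=rank)[:k]


# Hypothetical pool, requester IDs (e.g. a client's /24 subnet) and secret.
POOL = [f"bridge-{i:02d}" for i in range(30)]
SECRET = b"hypothetical-bridgedb-secret"
print(bridges_for("203.0.113.0/24", POOL, SECRET))
print(bridges_for("198.51.100.0/24", POOL, SECRET))
```

The weakness is built in: an adversary who can pose as many different requesters collects many different subsets, which is why enumerating all bridges is a real concern.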
Can anyone discover all the bridges?
The list of bridges is kept strictly secret. If the AG got hold of this list, it could block Tor completely. That is why the network's developers researched the ways in which a list of all bridges could be obtained.
I will describe two items from that list in detail, the 2nd and the 6th, since those are the methods I used to get access to bridges. In item 6, researchers looking for Tor bridges scanned the entire IPv4 space with the ZMap port scanner and found between 79% and 86% of all bridges.
Item 2 involves running a Tor middle node that monitors the requests coming into it. Only guard nodes and bridges connect to a middle node, so if the connecting node is not in the public list of nodes, it is obviously a bridge. This is a serious challenge for Tor, or for any other network. Since users cannot be trusted, the network has to be made as anonymous and closed as possible, which is why it is built this way.
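The inference in item 2 is essentially a set difference: previous hops seen by the middle node, minus the public relay list. The addresses below are hypothetical, and the sketch ignores complications such as relays that joined too recently to appear in the consensus or other relay-to-relay connections.

```python
def likely_bridges(incoming_ips, public_relay_ips):
    """Addresses that connected to our middle relay but are not public relays.

    Clients normally reach a middle relay only through a guard or a bridge,
    so an unlisted previous hop is probably a bridge. Illustrative only.
    """
    return set(incoming_ips) - set(public_relay_ips)


public = {"192.0.2.10", "192.0.2.11"}
seen = ["192.0.2.10", "203.0.113.5", "192.0.2.11", "203.0.113.5"]
print(likely_bridges(seen, public))  # {'203.0.113.5'}
```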
Consensus
Let's consider how the network works at a lower level: how it is organized and how to find out which nodes are active. We have already mentioned that the network has a list of nodes and a list of bridges. Let's talk about who compiles these lists.
Every Tor client contains hard-coded information about 10 powerful nodes run by trusted volunteers. They have a special task: monitoring the state of the entire network. They are called directory authorities (DAs).
They are spread around the world and are responsible for distributing a constantly updated list of all known Tor nodes. They decide which nodes are used, and when.
Why 10? A committee usually should not have an even number of members, so that a vote cannot end in a tie. The point is that 9 DAs handle the node lists, and one DA (Tonga) handles the list of bridges.
Figure: list of DAs
Reaching consensus
So how do the DAs keep the network running?
The status of all nodes is contained in a continuously updated document called the “consensus”. The DAs maintain it and update it hourly by voting. Here's how it works:
- each DA compiles a list of known nodes;
- then it computes all the other data: node flags, bandwidth weights, etc.;
- it sends this data to all the others as a “status vote”;
- it receives everyone else's votes;
- it combines the parameters of all the votes and signs the result;
- it sends the signed result to the others;
- a majority of DAs must agree on the data and confirm that a consensus exists;
- the consensus is published by every DA.
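The combining step can be sketched as majority voting over the DAs' lists. The rules below are a simplification and an assumption (the exact rules are in Tor's directory specification), signatures are omitted, and the fingerprints are made up. The strict-majority test also shows why an even number of voters is awkward: a 2-2 split produces no decision.

```python
from collections import Counter


def build_consensus(votes):
    """Combine directory-authority votes into a toy consensus.

    votes: one dict per authority, mapping relay fingerprint -> set of flags.
    Simplified rules: keep a relay if more than half of ALL authorities list
    it; keep a flag if more than half of the authorities that list the relay
    assign it. Signing is omitted.
    """
    n = len(votes)
    listed = Counter(fp for vote in votes for fp in vote)
    consensus = {}
    for fp, count in listed.items():
        if 2 * count <= n:  # no strict majority (e.g. a 2-2 tie among 4 voters)
            continue
        flag_votes = Counter(flag for vote in votes if fp in vote for flag in vote[fp])
        consensus[fp] = {flag for flag, c in flag_votes.items() if 2 * c > count}
    return consensus


# Three hypothetical authorities voting on two made-up fingerprints.
votes = [
    {"AAAA": {"Fast", "Guard"}, "BBBB": {"Exit"}},
    {"AAAA": {"Fast"}, "BBBB": {"Exit", "Fast"}},
    {"AAAA": {"Fast", "Guard"}},
]
print(build_consensus(votes))
```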
The consensus is published over HTTP, so anyone can download the latest version. You can check this yourself by downloading the consensus through Tor or from the tor26 directory authority.
And what does it mean?
Anatomy of consensus
The specification alone makes this document hard to understand. A visual representation helps me see how a structure works, so I made a poster in the style of corkami. Here is a graphical representation of this document.
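To look at the document yourself, a small parser for its router entries helps. This sketch reads the "r" line (nickname, identity, digest, publication date and time, IP, ORPort, DirPort, as in the default "ns" flavor) and the following "s" line (flags), skipping everything else. The sample excerpt is made up, and `fetch_consensus` assumes you supply a directory server's base URL; not every directory server will necessarily answer.

```python
import urllib.request


def fetch_consensus(base_url: str) -> str:
    """Download the current consensus. base_url (e.g. "http://<address>:<dirport>")
    must be supplied; the path is the one from Tor's directory specification."""
    with urllib.request.urlopen(base_url + "/tor/status-vote/current/consensus") as resp:
        return resp.read().decode("utf-8", "replace")


def parse_router_entries(text: str):
    """Yield (nickname, address, or_port, flags) from 'r' and 's' lines."""
    entry = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r":
            # r nickname identity digest YYYY-MM-DD HH:MM:SS IP ORPort DirPort
            entry = (parts[1], parts[6], int(parts[7]))
        elif parts[0] == "s" and entry is not None:
            yield (*entry, set(parts[1:]))
            entry = None


# Made-up excerpt with placeholder identities and documentation-range IPs.
SAMPLE = """\
network-status-version 3
r ExampleGuard AAAAAAAAAAAAAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBBBBBBBBBBBB 2015-02-01 12:00:00 192.0.2.10 9001 0
s Fast Guard Running Stable Valid
r ExampleExit CCCCCCCCCCCCCCCCCCCCCCCCCCC DDDDDDDDDDDDDDDDDDDDDDDDDDD 2015-02-01 12:05:00 198.51.100.7 443 80
s Exit Fast Running Valid
"""

for nick, ip, port, flags in parse_router_entries(SAMPLE):
    print(nick, ip, port, sorted(flags))
```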

What happens if an exit node goes bad?
In our detailed look at how the network works, we have not yet covered how exit nodes work. They are the last link in the Tor chain, completing the path from the client to the server. Because they send data on to its destination, they can see it as if it had just left the user's device.
Such transparency requires a lot of trust in exit nodes, and they usually behave responsibly. But not always. So what happens when an exit node operator decides to turn against Tor users?
The case of sniffers
Tor exit nodes are almost a textbook example of a “man in the middle” (MitM). This means they can monitor any unencrypted protocol (FTP, HTTP, SMTP), and that traffic includes logins and passwords, cookies, and uploaded and downloaded files.
Figure: exit nodes can see traffic as if it had just left the device.
The catch is that we cannot do anything about it (other than using encrypted protocols). Sniffing, that is, passively listening to the network, requires no active participation, so the only defense is to understand the problem and avoid sending sensitive data without encryption.
But suppose an exit node operator decides to do real damage to the network. Just listening is for amateurs. Let's modify the traffic!
Getting the most out of it
Recall that exit node operators are trusted not to modify the traffic passing to and from the client. Yeah, of course...
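To make "passive" concrete: a plaintext HTTP request reaches the exit node as readable bytes, and pulling out secrets takes only standard-library parsing. The request below, including the cookie and password, is made up. Over HTTPS, the exit would see only ciphertext at this point.

```python
from urllib.parse import parse_qs

# A made-up plaintext HTTP login request, as it would cross the exit node.
raw = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Cookie: session=abc123\r\n"
    b"\r\n"
    b"user=alice&password=hunter2"
)

# Split headers from body, then read both without any active interference.
head, _, body = raw.partition(b"\r\n\r\n")
headers = dict(line.split(b": ", 1) for line in head.split(b"\r\n")[1:])
form = parse_qs(body.decode())
print(headers[b"Cookie"], form["password"])  # b'session=abc123' ['hunter2']
```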
Let's see how you can change it.
SSL MiTM & sslstrip
SSL spoils all the fun when we try to mess with users. Fortunately for attackers, many sites implement it poorly, which lets us force the user onto unencrypted connections. Examples include redirects from HTTP to HTTPS, HTTP content included on HTTPS sites, and so on.
A convenient tool for exploiting these weaknesses is sslstrip. We only need to route all outgoing traffic through it, and in many cases we will be able to harm the user. Of course, we could also simply use a self-signed certificate and inspect the SSL traffic passing through the node. Easy!
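From the defender's side, the weaknesses listed above are easy to audit for: plain http:// links and resources on an HTTPS page are what an sslstrip-style attacker latches onto. This is a minimal auditing sketch using the standard library's HTML parser; the sample page is made up, and a real audit would also need to check redirects and response headers such as HSTS.

```python
from html.parser import HTMLParser


class InsecureRefFinder(HTMLParser):
    """Collect http:// URLs referenced from src/href/action attributes."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for attributes written without a value.
            if name in ("src", "href", "action") and value and value.lower().startswith("http://"):
                self.insecure.append((tag, name, value))


page = ('<a href="http://example.com/login">Log in</a>'
        '<script src="https://cdn.example.com/app.js"></script>'
        '<img src="http://example.com/logo.png">')
finder = InsecureRefFinder()
finder.feed(page)
for ref in finder.insecure:
    print(ref)
```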
Hooking browsers with BeEF
Once we can see the traffic in detail, we can move on to sabotage. For example, we can use the BeEF framework to take control of browsers. Then we can use Metasploit's "browser autopwn" feature to compromise the host and execute commands on it. Done!
Backdoor Binary
Suppose binaries, such as software or software updates, are downloaded through our node. Sometimes the user may not even know that updates are being downloaded. We just need to add a backdoor to them with tools like The Backdoor Factory. Then, once the program runs, the host is compromised. Done again!
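On the user's side, the standard countermeasure is to verify a downloaded binary against a published checksum before running it. A minimal sketch, with one important assumption: the expected hash must itself come over a channel the exit node cannot modify (for example HTTPS, or better, a signed release), otherwise the attacker simply replaces it too.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_file(path), expected_hex.strip().lower())


# Usage with a throwaway file standing in for a downloaded installer.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend installer")
expected = hashlib.sha256(b"pretend installer").hexdigest()
print(verify_download(tmp.name, expected))  # True
with open(tmp.name, "ab") as f:
    f.write(b"backdoor")                    # simulate tampering in transit
print(verify_download(tmp.name, expected))  # False
os.remove(tmp.name)
```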
How to catch Walter White
And although most Tor exit nodes behave decently, cases of malicious behavior are not that rare. All the attacks we have described in theory have already happened in practice.
The developers have thought about this and introduced a safeguard that keeps clients from using bad exit nodes. It works as a flag in the consensus called BadExit.
To catch bad exit nodes, a clever system called exitmap was developed. It works like this: for each exit node, a Python module is run that performs logins, downloads files, and so on. The results are then recorded.
Exitmap uses the Stem library (designed for working with Tor from Python) to build circuits through each exit node. Simple but effective.
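The core loop of such a scanner can be sketched without Tor at all: fetch the same resource through every exit and compare it with a known-good hash. `fetch_via` is a hypothetical stand-in for "build a circuit ending at this exit and download the URL over it" (the part exitmap handles with Stem and a running Tor process); this is not exitmap's actual module interface, and the exits below are simulated.

```python
import hashlib
from typing import Callable, Dict, Iterable


def scan_exits(exits: Iterable[str], fetch_via: Callable[[str, str], bytes],
               url: str, expected_sha256: str) -> Dict[str, str]:
    """Download one URL through every exit and classify the result."""
    report = {}
    for fp in exits:
        try:
            body = fetch_via(fp, url)
        except OSError:  # circuit or network failure: not evidence of tampering
            report[fp] = "error"
            continue
        digest = hashlib.sha256(body).hexdigest()
        report[fp] = "ok" if digest == expected_sha256 else "modified"
    return report


# Simulated network: EXIT-C injects content, EXIT-D times out.
ORIGINAL = b"<html>original file</html>"


def fake_fetch(fp: str, url: str) -> bytes:
    if fp == "EXIT-C":
        return ORIGINAL.replace(b"original", b"injected")
    if fp == "EXIT-D":
        raise TimeoutError("circuit failed")
    return ORIGINAL


report = scan_exits(["EXIT-A", "EXIT-B", "EXIT-C", "EXIT-D"], fake_fetch,
                    "http://example.com/", hashlib.sha256(ORIGINAL).hexdigest())
print(report)
# {'EXIT-A': 'ok', 'EXIT-B': 'ok', 'EXIT-C': 'modified', 'EXIT-D': 'error'}
```

Separating "error" from "modified" matters: a flaky exit is not the same as a malicious one, and only the latter should lead to a BadExit flag.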
Exitmap was created in 2013 as part of the “Spoiled Onions” project. Its authors found 65 exit nodes that modified traffic. This is not a catastrophe (at the time there were about 1,000 exit nodes), but the problem is serious enough to warrant monitoring. That is why exitmap is still running and maintained.
In another example, a researcher simply set up a fake login page and logged in to it through each exit node. He then checked the HTTP server logs for login attempts. Many exit nodes tried to log in to the site with the username and password the author had used.
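The trick that makes this experiment work is giving each exit node its own throwaway credentials, so any later login attempt points back to the exit that saw them. This sketch shows only that bookkeeping; it is an illustration of the idea, not the researcher's code, and the exit names are hypothetical.

```python
import secrets


def issue_decoys(exit_fingerprints):
    """Give every exit its own throwaway (username, password) pair."""
    return {fp: (f"user_{secrets.token_hex(4)}", secrets.token_urlsafe(12))
            for fp in exit_fingerprints}


def attribute_logins(decoys, login_attempts):
    """Map logged (username, password) attempts back to the exit that saw them."""
    by_cred = {cred: fp for fp, cred in decoys.items()}
    return sorted({by_cred[a] for a in login_attempts if a in by_cred})


decoys = issue_decoys(["EXIT-A", "EXIT-B", "EXIT-C"])
# Later, the server log shows EXIT-B's decoy being reused, plus unrelated noise.
log = [decoys["EXIT-B"], ("admin", "admin")]
print(attribute_logins(decoys, log))  # ['EXIT-B']
```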
This problem is not unique to Tor
Keep in mind that, with or without Tor, there are already quite a few nodes between you and the cat photo you want to look at. It takes only one of them with hostile intentions to cause a lot of harm. The best thing to do is to force encryption wherever possible. If traffic cannot be read, it cannot easily be modified.
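"Force encryption where possible" can be as simple as refusing to send data over a plain-HTTP URL, or upgrading it first. A minimal sketch; the assumption is that the server actually offers HTTPS at the same address, which is not always true.

```python
from urllib.parse import urlsplit, urlunsplit


def require_https(url: str, upgrade: bool = True) -> str:
    """Refuse to send data over plain HTTP; optionally upgrade http:// to https://."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        return url
    if parts.scheme == "http" and upgrade:
        # Keeps host, path and query; an explicit ":80" port would be kept too.
        return urlunsplit(parts._replace(scheme="https"))
    raise ValueError(f"refusing non-HTTPS URL: {url}")


print(require_https("http://example.com/login?next=/"))  # https://example.com/login?next=/
```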
And remember that this is only an example of bad behavior by some operators, not the norm. The overwhelming majority of exit node operators take their role very seriously and deserve great thanks for the risks they take in the name of the free flow of information.