
Shaping attacks in low-latency networks, or why Tor does not protect you from the security services



Timing attacks are a well-known weak point of the Tor network and have been discussed repeatedly, including on Habr, where you can find about 10 articles that touch on this topic in one way or another. Why do we need another one? There is a fairly common misconception that such attacks always require statistical analysis and are rather difficult to implement. The previously published articles deal with exactly that class of attacks. Here we will consider a completely realistic scenario in which a single request is enough to de-anonymize a user of the network.

Since the possibility of de-anonymizing Tor users is once again being actively discussed in Runet, I am publishing a “print” version of my presentation from PHDays 2014. The attack described below is not specific to Tor and can be used against any low-latency means of hiding the source of traffic: VPNs, proxy chains and even combinations of them.

We will look at a scenario that, in the case of Tor, requires two conditions:
  1. The attacker can interfere with traffic between the exit node and the destination server. This is possible with access to the destination server, to the exit node, or to any point the traffic passes through between them. In practice, anyone can meet this condition by running a sufficient number of their own exit nodes.
  2. The attacker has passive access to the traffic between the network client and the entry node.

What do the security services have to do with this? SORM equipment, installed at any Russian provider, is usually completely passive and can detect, log and interpret (for example, reconstruct instant messenger conversations) traffic with specified characteristics. It is almost impossible to detect this equipment from the outside, since it generates no traffic of its own and does not affect the traffic it listens to in any way.

Access to SORM equipment fully satisfies the second condition of the attack. So security services that have access to this equipment and are able to run a certain number of Tor exit nodes have everything they need, including the motivation to pursue this.

Principle of the attack

On the exit node side, predefined changes are introduced into the traffic. They do not alter the transmitted data, but they do affect the “shape” of the traffic: the sizes of the packets and the delays between them. Timing characteristics change little in low-latency networks, which is what timing attacks usually exploit. Packet sizes change quite predictably, as will be demonstrated. This means that by repartitioning the traffic into packets with a specific sequence of sizes and a specific sequence of delays, we can mark it on the exit node side so that the mark can be detected on the entry node side, and a connection or request to the server can thus be associated with a Tor user.

Moreover, this traffic can carry information to the listening party, for example a unique identifier of the client request. If there are 2 distinguishable packet sizes and 2 distinguishable delays, each data packet, starting from the second, can carry 2 bits of information to the party listening to the encrypted traffic between the user and the entry node. In fact, more distinguishable states can be introduced, but there are additional restrictions that we will discuss below.
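The 2-bits-per-packet idea can be illustrated with a small, purely offline sketch. The two sizes (100 and 400 bytes) are assumed values chosen for illustration, the function names are hypothetical, and the model is simplified: it assumes the observer sees roughly the sizes and delays that were sent, and that a reference packet precedes the first symbol so that every symbol has a measurable delay.

```python
from typing import List, Tuple

# Assumed alphabet for illustration: two sizes (bytes) and two delays (ms).
SIZES = (100, 400)      # index 0 or 1 encodes the "size" bit
DELAYS_MS = (60, 110)   # index 0 or 1 encodes the "delay" bit

Symbol = Tuple[int, int]  # (packet size, delay before the packet in ms)


def encode_bits(bits: List[int]) -> List[Symbol]:
    """Map every pair of bits to one (size, delay) symbol."""
    if len(bits) % 2:
        bits = bits + [0]  # pad to an even number of bits
    return [(SIZES[bits[i]], DELAYS_MS[bits[i + 1]])
            for i in range(0, len(bits), 2)]


def decode_symbols(symbols: List[Symbol]) -> List[int]:
    """Recover bits by choosing the nearest known size and delay."""
    bits: List[int] = []
    for size, delay in symbols:
        bits.append(min(range(2), key=lambda k: abs(SIZES[k] - size)))
        bits.append(min(range(2), key=lambda k: abs(DELAYS_MS[k] - delay)))
    return bits
```

Nearest-value decoding tolerates small jitter in both dimensions, which is why only well-separated sizes and delays are usable as distinct states.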

How hard is it to implement, and what does it look like in practice?

I implemented the attack by redirecting exit node traffic to a proxy server, to which I made minor changes so that it shapes the traffic according to a predefined pattern.

What does “normal” traffic look like in Tor? Here is a fragment of traffic from the entry node to the client, carrying the response to an HTTP request, with no marking added to the traffic:



This is TLS traffic carrying blobs (packets) of data, mostly 3648 octets in size. The size of a blob is determined by the number of fixed-size Tor cells that ended up in it. At the TCP level, blobs are split into IP packets, mostly 1414 octets in size, which is determined by the Path MTU. A TCP packet can contain a fragment of a single blob, or the end of one blob and the beginning of the next. There are also blobs of 560 octets and smaller TCP packets.

How the source data is split into blobs, and blobs into TCP packets, depends on various parameters: server timings, the sizes of the buffers it uses to send data, and network delays. A repeated request may produce a slightly different picture. Statistically, however, the same request to the same server produces a fairly distinct picture. Loading the same web page involves a fairly large number of requests, which usually carry the same requests and responses, that is, typical amounts of data with typical delays. By matching this data you can try to determine which resource a user is accessing, especially if they visit it regularly. Classic timing attacks are based on this.

But we will go a different way. Instead of passively measuring timings, we add a shaping mark (maRk) to the original traffic on the exit node side. The clear, unencrypted traffic sent into Tor now looks like this:



How is the marking done? We send small packets of two different sizes. The size difference of several hundred bytes has no effect here, but it makes the two types of packets in the series easy to tell apart visually. The delays for the two types of packets are 60 and 110 milliseconds; they were chosen to produce the most interesting picture at the output.
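The repartitioning step itself can be sketched offline as a function that cuts a payload into chunks according to a repeating pattern, without changing the bytes. This is a minimal illustration and not the proxy modification used for the article: `plan_chunks` is a hypothetical name, the chunk sizes of 100 and 400 bytes are assumed, and only the 60 and 110 ms delays come from the text.

```python
from typing import List, Tuple


def plan_chunks(payload: bytes,
                pattern: List[Tuple[int, float]]) -> List[Tuple[bytes, float]]:
    """Split payload into chunks following a repeating (size, delay_s) pattern.

    Joining the chunks in order gives back the original payload, so the
    transmitted data is unchanged and only its "shape" differs.
    """
    if not pattern or any(size <= 0 for size, _ in pattern):
        raise ValueError("pattern must be non-empty with positive sizes")
    plan: List[Tuple[bytes, float]] = []
    offset = 0
    step = 0
    while offset < len(payload):
        size, delay = pattern[step % len(pattern)]
        plan.append((payload[offset:offset + size], delay))
        offset += size
        step += 1
    return plan
```

Note that the last chunk may be shorter than the pattern asks for, which is one reason the amount of traffic available for shaping matters.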

When the same traffic passes through Tor, between the entry node and the user, it looks like this:



What do we see now? All blobs in TLS are now 560 octets in size, which means we can control the blob size, 3648 or 560 octets, by sending large or small portions of traffic. Incidentally, this reveals a traffic amplification problem: the minimum data packet grows more than tenfold. Each packet we send arrives in a separate TLS blob. At the same time, the 1414-octet IP packet contains two complete blobs and the beginning of a third, the 389-octet packet contains the end of the third blob, and the 619-octet packet contains a separate fourth blob. That is, 4 IP packets of the source traffic arrive as 3 IP packets in the Tor traffic. Is that good or bad? Bad, because we have lost part of the timing information.

What happened, and why such a strange sequence? It is caused by the TCP stack, namely the interaction between the Nagle algorithm and TCP delayed acknowledgments. However, the interval between groups of packets and between packets within a group remains unchanged, so we still get at least half of the timing information. To “overcome” the waiting and grouping of the Nagle algorithm, you can send enough traffic for the transmitted blobs to exceed the MTU size (but not enough to form a “large” 3648-octet blob).

Comparing the third picture with the first, it is clear that a distinct and easily detectable anomaly is introduced into the traffic, which lets monitoring equipment “trigger” when it appears. Moreover, in the case of SORM, the detectability of this anomaly is sufficient for it to serve as evidence.
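To make the idea of “triggering” concrete, here is a minimal sketch of a detector that works on recorded packet arrival times only. It is an illustration under assumptions rather than a description of real monitoring equipment: the function names and the 20 ms tolerance are hypothetical, and the 60 and 110 ms delays are the ones used in the example above.

```python
from typing import List, Optional, Sequence


def classify_gaps(timestamps: Sequence[float],
                  delays_ms: Sequence[float] = (60.0, 110.0),
                  tolerance_ms: float = 20.0) -> List[Optional[int]]:
    """Label each inter-arrival gap with the index of the nearest known delay.

    Timestamps are in seconds. Gaps too far from every known delay get None.
    """
    labels: List[Optional[int]] = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap_ms = (cur - prev) * 1000.0
        idx = min(range(len(delays_ms)), key=lambda k: abs(delays_ms[k] - gap_ms))
        labels.append(idx if abs(delays_ms[idx] - gap_ms) <= tolerance_ms else None)
    return labels


def contains_mark(timestamps: Sequence[float], expected: Sequence[int]) -> bool:
    """Return True if the expected sequence of delay labels occurs in the trace."""
    observed = classify_gaps(timestamps)
    n = len(expected)
    return any(observed[i:i + n] == list(expected)
               for i in range(len(observed) - n + 1))
```

For example, `contains_mark(arrival_times, [0, 1, 0, 1])` checks a trace for alternating gaps of about 60 and 110 ms. A practical detector would also have to use packet sizes and cope with merged packets.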

This attack can be applied to almost any encrypted or unencrypted connection, both in the direction from the server to the client, and in reverse.

What are the limitations?

The “nonlinear” behavior of traffic caused by the Nagle algorithm can lead to the loss of some timing information. However, this loss can be detected on the receiving side and compensated for by redundancy in the transmitted information. For traffic to be shaped, there must be enough of it. It would seem difficult to mark in this way, for example, an ssh connection with bash running and commands issued from time to time, since it does not transfer enough data to form packets with the necessary maRk signatures.
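One simple way to picture compensation by redundancy is a repetition code with erasures. This is a swapped-in textbook technique for illustration, not the scheme from the presentation: each bit is sent several times, positions whose timing was lost are represented as `None` (assuming the receiver can tell which positions were lost), and the remaining copies are combined by majority vote.

```python
from typing import List, Optional


def add_redundancy(bits: List[int], repeat: int = 3) -> List[int]:
    """Repeat every bit `repeat` times."""
    return [b for b in bits for _ in range(repeat)]


def recover(received: List[Optional[int]], repeat: int = 3) -> List[Optional[int]]:
    """Majority vote over the surviving copies of each bit.

    None marks a copy whose timing was lost. A bit whose copies were all
    lost is returned as None. Ties resolve to 0.
    """
    result: List[Optional[int]] = []
    for i in range(0, len(received), repeat):
        votes = [b for b in received[i:i + repeat] if b is not None]
        if not votes:
            result.append(None)
        else:
            result.append(1 if 2 * sum(votes) > len(votes) else 0)
    return result
```

With three copies per bit, a bit survives as long as at least one copy arrives intact and no surviving copy is misread.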

In fact, you can mark even a connection over which no data was transmitted at all. Even after the client application has initiated closing the TCP connection, data sent through Tor towards the client continues to be delivered. This makes it possible, after receiving FIN + ACK from the client side of a connection, to send a marked portion of random data towards the client. The client application will never read this data, but it will still reach the client and thus reveal it. That is, the attack can be carried out completely covertly from the client, and the amount of data available for marking the connection is more than sufficient. A similar method can be applied to most VPNs but, fortunately, it will not work with proxies and other application-level gateways.

Is there a reliable solution?

The attack may be harder to carry out while the client is actively working, since the packets belonging to a single circuit have to be identified, and unrelated traffic adds noise to the signature. You can also act as a relay in the Tor network, which makes marked traffic harder to detect. There are several ways to complicate the attack further: combining Tor, a VPN and a proxy chain makes the final shape of the traffic hard to predict and partially eliminates the attack through a half-closed connection. Non-standard TCP stack parameters distort known signatures and make them harder to detect. But there is no reliable way to completely eliminate such threats within Tor or common VPN networks. Shaping attacks, as a kind of timing attack, are outside their threat model.

The only reliable solution is a virtual channel inside an encrypted connection that carries a constant stream of fixed-length cells at a fixed bandwidth. This is how, for example, ATM (Asynchronous Transfer Mode) networks work. Moreover, encryption must be done without data compression, so that the bandwidth consumed stays constant. Such technologies cannot yet be widely used for everyday work, because the extra cost of a channel that is fully utilized even when idle is too high.
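The constant-stream idea can be sketched as follows. This is a minimal model under assumptions, not a protocol specification: the 512-byte cell size, the 2-byte length prefix and all function names are hypothetical, and encryption is left out entirely. In a real channel the cells would be encrypted, so that padding cells are indistinguishable from data cells.

```python
from typing import List

CELL_SIZE = 512   # assumed fixed cell size in bytes
PREFIX = 2        # assumed length prefix: number of payload bytes in the cell


def to_cells(data: bytes, cell_size: int = CELL_SIZE) -> List[bytes]:
    """Pack data into fixed-length cells, zero-padding the last one."""
    body = cell_size - PREFIX
    cells = []
    for off in range(0, len(data), body):
        chunk = data[off:off + body]
        cells.append(len(chunk).to_bytes(PREFIX, "big") + chunk.ljust(body, b"\x00"))
    return cells


def constant_stream(queue: List[bytes], ticks: int,
                    cell_size: int = CELL_SIZE) -> List[bytes]:
    """Emit exactly one cell per tick: a queued cell if any, else a dummy cell."""
    dummy = bytes(cell_size)  # length prefix 0 marks pure padding
    return [queue.pop(0) if queue else dummy for _ in range(ticks)]


def from_cells(cells: List[bytes]) -> bytes:
    """Reassemble the payload, dropping padding and dummy cells."""
    return b"".join(c[PREFIX:PREFIX + int.from_bytes(c[:PREFIX], "big")]
                    for c in cells)
```

Because one cell of the same size leaves on every tick whether or not there is data, an observer sees the same shape for an idle link and a busy one. The price is exactly the overhead mentioned above: the channel consumes its full bandwidth all the time.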

Source: https://habr.com/ru/post/232563/

