📜 ⬆️ ⬇️

Simplicity in design. Episode 2. DHT and PEX

Peer-to-peer BitTorrent network is very popular. And the more offensive is that such a network is based on a website, a tracker, which is absolutely not pythonic and potentially dangerous. Accordingly, since BitTorrent is alive, various attempts have been made to decentralize and the rest - getting a list of peers.

Computer science students have a popular thinking pattern: “Decentralize? DHT! ” DHT, Distributed Hash Tables - a speculatively simple idea: the key ranges of the hash tables are scattered across peers, reciprocal links and cheers are lined up. Hooray - in the ass hole. Because when confronted with a real network, unlike a simulator or a cluster, an enormous number of problems begin. More than half of the peers, for example, are hidden behind NATs and fairwalls, so they respond to DHT requests from one peers, but not from others, and in an unpredictable way. Peers constantly come and go, some peers are buggy, there are malicious peers, someone is connected via dial-up. In order to provide for all this and tighten up the corresponding plugs, we had to work hard. And the resulting code still raises censure. The root problem is that DHT has to build its own separate P2P network according to its own rules. What is bad for complexity, efficiency, safety.

Another attempt in the same direction is PEX (Peer EXchange) * , a gossip protocol in which already connected peers simply exchange the addresses of those to whom they are already attached. The protocol had a difficult fate, because initially Bram Cohen (the author of BitTorrent) was confident that PEX would lead to a swarm disintegration. He did some kind of simulator quickly and saw a complete breakdown. Some time ago it seemed to me that I understood why his swarms also fell apart. I also made a simulator, but with reasonable disintegration parameters the swarm could not be achieved. Apparently, he had some kind of mistake.
')
And PEX works fine. It was originally implemented in unofficial clients, it seems Azureus and µTorrent (the second has not yet been purchased by BitTorrent Inc). Gradually, the implementation of µTorrent, called ut_pex, became generally accepted. The protocol is very effective: from my laptop, with my special BitTorrent spider, in a couple of minutes I copied all the peers in the 100 thousandth swarm. The logic of work is simple, exponential. Having received twenty peers from the tracker and successfully joining two, we immediately get two hundred more by ut_pex. Well, and so on. The protocol itself is extremely simple and consists of one (!) Message. Another popular pattern of thinking: everyone thinks there should be two messages: request and response. No, there is no request. Just the messages are so small that there is no point in saving. And with a lot of trouble requests. Therefore, if the peer sees that you understand ut_pex, it simply periodically sends you IP addresses. The implementation of ut_pex in libtorrent-rasterbar takes up 7 times less space than a fairly compact implementation of DHT in the same place **.

* And what is now written about PEX on Wikipedia is original research or simply crap.

** The attentive reader will probably notice the juggling - ut_pex does not provide complete decentralization of tracking, because he needs starting peers. I will share a secret. DHT does not provide this either. Firstly, users still go to the site, because otherwise they are difficult to assemble. Secondly, from what I know, DHT is in practice bootstrapped from root servers (only this is a big secret! :))

The computer system are those. - G. Bell

Source: https://habr.com/ru/post/66599/


All Articles