
Kademlia (DHT) - A Practical Guide

This article is about DHT, using its implementation known as Kademlia as an example. DHT stands for distributed hash table and is a way of building a decentralized network for exchanging information. Everything described below works in an ED2K client for the Android platform and as a Linux daemon. Implementation details follow.



Resource



Each node has its own identifier. In addition, every resource in the DHT, whether a keyword or a file, also has an identifier. The identifier is the value of a hash function: SHA1 in torrents, MD4 in ED2K. The only difference is the length, 160 bits versus 128. So, to publish or find anything on the network, you need the hash of the resource in question.



In Kademlia, the same hash that the ED2K network itself uses is taken to publish a file, but with a caveat: while the file hash is serialized simply as a sequence of bytes, the KAD identifier is serialized as a sequence of four 32-bit words. This is a peculiarity of Kademlia.



You have to rotate the bytes:


/**
 * save/load as 4 32-bit words: stored in little endian, written in network byte order,
 * so the bytes inside each 4-byte portion have to be rotated
 */
@Override
public ByteBuffer get(ByteBuffer src) throws JED2KException {
    for (short i = 0; i < value.length; ++i) {
        byte b = src.get();
        value[(i / 4) * 4 + 3 - (i % 4)] = b;
    }
    return src;
}

@Override
public ByteBuffer put(ByteBuffer dst) throws JED2KException {
    for (short i = 0; i < value.length; ++i) {
        dst.put(value[(i / 4) * 4 + 3 - (i % 4)]);
    }
    return dst;
}


Serialization in ED2K networks
Historically, all primitives larger than one byte are serialized in the same order as on the original target platform, namely little endian (the most significant bytes at the higher addresses). I think this is because the eDonkey/eMule client was originally developed only for Windows and the corresponding architecture. The exception is the MD4 hash, which is serialized as a plain sequence of bytes. The IP address is stored in a 32-bit unsigned integer, also little endian, and is written to the packet as such.

Kademlia support was added later, most likely by other people. For some reason they decided that network byte order should be used, perhaps because it is (or is considered) the standard for TCP/IP networks. Moreover, they applied the new byte order only to the IP address: when written to KAD packets it is converted to big endian, while everything else stays as it was. The new KadId primitive was added not as a hash but as an array of four 32-bit numbers, which is why reading and writing the KAD identifier requires the transformations shown above.



To publish keywords, file names are broken down into words and each word is hashed; the resulting hashes are what gets published. A hash can be thought of either as one large integer with a byte order (big endian, i.e. the most significant bytes at the lower addresses, at the beginning) or simply as an array of bytes.
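
For illustration, here is a rough sketch of how keyword hashes could be produced. The Md4 interface, the word separators and the minimum word length are assumptions made for the example, not the exact code of the client.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class KeywordHashes {
    // Hypothetical MD4 helper - MD4 is not part of the standard JDK,
    // so the real client has to bring its own implementation.
    interface Md4 {
        byte[] digest(byte[] data);
    }

    // Split a file name into words and hash each word;
    // every resulting hash becomes a KAD identifier to publish under.
    public static List<byte[]> keywordHashes(String fileName, Md4 md4) {
        List<byte[]> result = new ArrayList<>();
        for (String word : fileName.toLowerCase().split("[\\s.\\-_()\\[\\]]+")) {
            if (word.length() < 3) continue;   // very short words are usually skipped
            result.add(md4.digest(word.getBytes(StandardCharsets.UTF_8)));
        }
        return result;
    }
}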



The DHT works on the ability to compute the distance between hashes, which lets you approach the target step by step by reducing that distance. The hash is 128 bits long, i.e. 16 bytes (MD4.HASH_SIZE in the code below).



Just xor:



// returns the distance between the two nodes
// using the kademlia XOR-metric
public static KadId distance(final KadId n1, final KadId n2) {
    assert n1 != null;
    assert n2 != null;

    KadId ret = new KadId();
    for (int i = 0; i < MD4.HASH_SIZE; ++i) {
        ret.set(i, (byte)(n1.at(i) ^ n2.at(i)));
    }
    return ret;
}


A comparator of two hashes relative to some reference hash. Typically the reference is the node's own hash.



// returns -1 if: distance(n1, ref) < distance(n2, ref)
// returns 1 if:  distance(n1, ref) > distance(n2, ref)
// returns 0 if:  distance(n1, ref) == distance(n2, ref)
public static int compareRef(final KadId n1, final KadId n2, final KadId ref) {
    for (int i = 0; i != MD4.HASH_SIZE; ++i) {
        int lhs = (n1.at(i) ^ ref.at(i)) & 0xFF;
        int rhs = (n2.at(i) ^ ref.at(i)) & 0xFF;
        if (lhs < rhs) return -1;
        if (lhs > rhs) return 1;
    }
    return 0;
}


And the most heavily used one: the function that determines the distance expressed as a K-bucket number, so to speak.



// returns n in: 2^n <= distance(n1, n2) < 2^(n+1)
// useful for finding out which bucket a node belongs to
public static int distanceExp(final KadId n1, final KadId n2) {
    int bt = MD4.HASH_SIZE - 1;
    for (int i = 0; i != MD4.HASH_SIZE; ++i, --bt) {
        assert bt >= 0;
        int t = (n1.at(i) ^ n2.at(i)) & 0xFF;
        if (t == 0) continue;
        assert t > 0;
        // we have found the first non-zero byte
        // return the bit-number of the first bit that differs
        int bit = bt * 8;
        for (int b = 7; b >= 0; --b) {
            if (t >= (1 << b)) return bit + b;
        }
        return bit;
    }
    return 0;
}


Network connection



First of all, the client generates its own identifier. It is a random MD4 hash (SHA1 for torrents). The address, port and similar data can be mixed into the generation for extra randomness.
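
A minimal sketch of the identifier generation, assuming a plain 16-byte array is enough to seed a KadId:

import java.security.SecureRandom;

public class NodeIdFactory {
    // Generate a random 128-bit (16-byte) node identifier.
    // Extra entropy such as the local address and port could be mixed in here.
    public static byte[] randomNodeId() {
        byte[] id = new byte[16];          // MD4 hash size in bytes
        new SecureRandom().nextBytes(id);
        return id;
    }
}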



One point that is puzzling at first glance is how the hash a node assigns to itself relates to the resources it provides. The answer is: in no way. Choosing the hash at random means the client will be responsible for the resources whose hashes are close to its own; other clients will come to it to publish and search for them. The client, in turn, publishes its own resources on other nodes, listing itself as a source.



Although the DHT is a decentralized network, to join it you need to know at least one node that is already connected. Knowing the address of such a node, the client sends a special bootstrap request and receives a list of nodes in response. Bootstrap requests can then be sent to those nodes, and so on. There are also web sites from which you can download files with sets of nodes in the ED2K format.
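
The general shape of the bootstrap phase could be sketched like this; Endpoint, Network and sendBootstrap are hypothetical placeholders for the client's real UDP layer, not its actual API.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class BootstrapSketch {
    // Hypothetical placeholders for the client's UDP/RPC layer.
    interface Endpoint {}
    interface Network {
        // Sends a bootstrap request and returns the nodes the peer replied with.
        List<Endpoint> sendBootstrap(Endpoint node);
    }

    // Starting from one known node, keep asking the returned nodes for more
    // until enough contacts have been collected for the routing table.
    public static void bootstrap(Endpoint initial, Network net, int wanted) {
        Deque<Endpoint> queue = new ArrayDeque<>();
        queue.add(initial);
        int collected = 0;
        while (!queue.isEmpty() && collected < wanted) {
            Endpoint next = queue.poll();
            List<Endpoint> more = net.sendBootstrap(next);
            collected += more.size();
            queue.addAll(more);
        }
    }
}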



Routing table



The routing table exists specifically to select the nodes closest to a given hash. The table consists of K-buckets. A K-bucket is really nothing more than a container of structures describing network nodes. The table is usually illustrated as a tree.



The routing table itself can be represented simply as a list of K-buckets, sorted in descending order of distance to our own identifier.
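
A rough sketch of the containers involved; the field names and the bucket capacity here are illustrative, only KadId comes from the code shown earlier.

import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class RoutingTableSketch {
    // One entry describes a known DHT node.
    static class NodeEntry {
        KadId id;                 // the project's 128-bit KAD identifier
        InetSocketAddress address;
        long lastSeen;            // used when deciding which entries to evict
        int failedRequests;
    }

    // A K-bucket is just a bounded container of node entries.
    static class Bucket {
        static final int K = 10;                  // capacity, illustrative value
        final List<NodeEntry> live = new ArrayList<>();
        boolean hasRoom() { return live.size() < K; }
    }

    // The table itself is a plain list of buckets, grown on demand.
    final List<Bucket> buckets = new ArrayList<>();
}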



Initially the table contains no K-buckets at all; they are created as nodes are added.



Let the table keep track of the number of K-buckets already created, N (numBuckets). The K-bucket index is calculated with the distanceExp function mentioned above as 128 - 1 - distanceExp, so the nodes closer to our hash end up closer to the tail of the list.



Every K-bucket at a position up to N-2 contains only nodes at its "own" distance n. The last bucket, at position N-1, contains not only the nodes at its distance n but also all nodes located closer, in other words all the remaining nodes. The range of n is [0..127]. This is easier to see in the bucket lookup code below.



Algorithm for adding a node



  1. Find the K-bucket index for the new node. This is easiest to illustrate with code.



     public int findBucket(final KadId id) {
         int numBuckets = buckets.size();
         if (numBuckets == 0) {
             buckets.add(new RoutingTable.RoutingTableBucket());
             ++numBuckets;
         }

         int bucketIndex = Math.min(KadId.TOTAL_BITS - 1 - KadId.distanceExp(self, id), numBuckets - 1);
         assert (bucketIndex < buckets.size());
         assert (bucketIndex >= 0);
         return bucketIndex;
     }


  2. If the K-bucket has free space, the node is added.
  3. If the K-bucket has no free space, try to clean it up using node metrics such as last update time, number of unanswered requests, and so on. If a slot frees up, the node is added.
  4. If the K-bucket still has no free space after all attempts, try to split it. If the split fails, the node is ignored (a sketch of the whole flow follows after this list).
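
A sketch of the overall add flow, assuming the bucket structures sketched earlier; cleanupBucket, canSplit and splitBucket are placeholders for the real eviction and split logic, not the project's exact methods.

// Add a node to the routing table, following steps 1-4 above.
public boolean addNode(NodeEntry node) {
    int index = findBucket(node.id);         // step 1: locate the bucket
    Bucket bucket = buckets.get(index);
    if (bucket.hasRoom()) {                   // step 2: free slot - just add
        bucket.live.add(node);
        return true;
    }
    cleanupBucket(bucket);                    // step 3: try to evict stale entries
    if (bucket.hasRoom()) {
        bucket.live.add(node);
        return true;
    }
    if (canSplit(index)) {                    // step 4: try to split the last bucket
        splitBucket();
        return addNode(node);                 // re-insert into the refined table
    }
    return false;                             // split failed - the node is ignored
}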


K-bucket split



A K-bucket can be split only if it is the last ("extreme") container in the table and it is not the last possible K-bucket. In theory there can be up to 128 K-buckets for MD4, but the very last one is never used, since hashes equal to the node's own hash are not processed. The principle is simple: nodes whose distance equals n stay in the current container, everything else moves into the new one. So after a split the table grows by one K-bucket.
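
A possible sketch of the split itself, using the same assumed structures as above (not the project's exact code):

// Split the last bucket: entries whose "natural" index is still the old one
// stay where they are, everything closer to our own id moves into the newly
// appended bucket. Requires java.util.Iterator.
private void splitBucket() {
    int last = buckets.size() - 1;
    Bucket newBucket = new Bucket();
    buckets.add(newBucket);                        // the table grows by one bucket
    Bucket old = buckets.get(last);
    for (Iterator<NodeEntry> it = old.live.iterator(); it.hasNext(); ) {
        NodeEntry e = it.next();
        // with the new bucket present, findBucket() reports a larger index for closer nodes
        if (findBucket(e.id) != last) {
            it.remove();
            newBucket.live.add(e);
        }
    }
}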



So the routing table is a list of K-buckets, and a K-bucket is a list of structures describing nodes of the DHT network. The implementation can be anything you like; for example, all of this could be kept in a database table. The table is never compacted: even if the nodes of some K-bucket disappear until the container is completely empty, the bucket stays in the table. Nodes can be removed, for example, when they have been unreachable for some time.



Table update



There is not much to examine in detail here; the update is an internal process that keeps the routing table current. Periodically a K-bucket that needs updating is selected, a random hash belonging to that K-bucket (and not coinciding with any existing one) is generated, and a search for it is started. The condition for triggering the update is, for example, that the last update was more than 15 minutes ago.
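
The trigger condition can be sketched roughly like this; the Bucket field and the 15-minute constant are the example's own assumptions.

import java.util.concurrent.TimeUnit;

public class RefreshSketch {
    static final long UPDATE_INTERVAL_MS = TimeUnit.MINUTES.toMillis(15);

    // Placeholder for a bucket carrying a "last touched" timestamp.
    static class Bucket {
        long lastActivity;
    }

    // Returns true if the bucket has been quiet long enough to warrant
    // generating a random id inside it and starting a lookup for that id.
    static boolean needsUpdate(Bucket b, long now) {
        return now - b.lastActivity > UPDATE_INTERVAL_MS;
    }
}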



Publish and search



Publication and search are one and the same process; at its end either a publish request or a search request is sent to the nodes that were found. The essence of it is that by successive iterations we get closer to the nodes whose identifiers are close to the identifier of the resource in question. By the network's logic, it is exactly these nodes that hold the information about keywords and file hashes close to their own hash.



  1. The 50 (or however many) closest nodes are extracted from the table to form a polling list.
  2. A Kad2Req request is sent, which simply asks for nodes even closer to the target than the node being polled. The polled node looks into its routing table and builds a response from the nodes that are closer to the requested hash than itself.
  3. The response is merged back into the polling list and the cycle starts over.
  4. Nodes located close enough to the target are asked whether they have the desired resource, or the publication is performed on them. There is a notion of a tolerance zone: how much a hash may differ for a direct search or publication to be started.
  5. Step 2 is repeated until the stop conditions are met: enough nodes found, everything polled, and so on.


Step 4 is a separate process that can run in parallel with the main one, but in the presented implementation it is launched separately, at the end of the main process. A sketch of the main lookup loop follows below.
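
To make the iteration concrete, here is a rough sketch of the main loop; Node, Network and requestCloserNodes are hypothetical stand-ins for the real Kad2Req machinery.

import java.util.*;

public class LookupSketch {
    interface Node { byte[] id(); }
    interface Network {
        // Hypothetical wrapper around a Kad2Req-style request:
        // ask 'node' for nodes closer to 'target' than itself.
        List<Node> requestCloserNodes(Node node, byte[] target);
    }

    // Iteratively approach the target id, starting from the closest
    // nodes in our own routing table (step 1 of the list above).
    public static List<Node> lookup(byte[] target, List<Node> initial, Network net,
                                    Comparator<Node> closerToTarget, int enough) {
        TreeSet<Node> candidates = new TreeSet<>(closerToTarget);  // closest first
        candidates.addAll(initial);
        Set<Node> queried = new HashSet<>();
        List<Node> result = new ArrayList<>();
        while (!candidates.isEmpty() && result.size() < enough) {
            Node next = candidates.pollFirst();          // currently closest unqueried node
            if (!queried.add(next)) continue;
            result.add(next);
            candidates.addAll(net.requestCloserNodes(next, target)); // steps 2-3
        }
        return result;
    }
}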



Storage structure for published keywords and sources
Without further ado, here is the table structure used for storing keywords and sources. It is used in the aggregator running on a separate host.



 create table kad.sources (
     kad_id        character(32) not null,
     host          inet not null,
     port_tcp      int not null default 0,
     port_udp      int not null default 0,
     packet        bytea not null,
     last_update   timestamp not null default current_timestamp,
     total_updates int not null default 0,
     source_type   int not null,
     constraint sources_pk primary key (kad_id, host, port_tcp, port_udp)
 );

 create table kad.keywords (
     kad_id        character(32),
     file_id       character(32),
     host          inet not null,
     packet        bytea not null,
     last_update   timestamp not null default current_timestamp,
     total_updates int not null default 0,
     constraint keywords_pk primary key (kad_id, file_id, host)
 );


You can see that a source is published against a single file hash, one to one. A keyword, on the other hand, is published together with the hashes of the files whose names contain it, one to many. The tables are denormalized for convenience; one could instead have a separate key table acting as a master over sources/keywords.



Conclusion



A few words about the architecture. DHT support is implemented in a separate tracker, an asynchronous UDP server running in a dedicated thread. It can be run separately from the client application, which is convenient for testing; in fact this tracker currently runs as a daemon on a separate machine. Requests to the network are organized as RPC calls through an RPC manager, which solves the problem of limiting the response time, allows marking unresponsive nodes, and so on.



Logically, outgoing requests are grouped into a manager (the algorithm). For each request a control structure (an observer) is created, and everything is launched, as mentioned, through the RPC manager. More details can be found in the code via the links.
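
As an illustration only, a minimal sketch of what such an observer registry might look like; the names and the timeout value are assumptions, not the project's API.

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class RpcManagerSketch {
    // A control structure (observer) created for every outgoing request.
    interface Observer {
        void reply(byte[] packet);   // invoked when an answer arrives in time
        void timeout();              // invoked when the response deadline passes
    }

    private static class Entry {
        final Observer observer;
        final long sentAt;
        Entry(Observer o, long t) { observer = o; sentAt = t; }
    }

    static final long TIMEOUT_MS = 10_000;   // illustrative response deadline

    // KAD has no transaction id, so a request is typically keyed by the
    // remote endpoint plus the expected response type.
    private final Map<Object, Entry> inFlight = new HashMap<>();

    // Register an observer when a request goes out.
    public void register(Object key, Observer o) {
        inFlight.put(key, new Entry(o, System.currentTimeMillis()));
    }

    // Dispatch an incoming packet to its observer, if one is still waiting.
    public void dispatch(Object key, byte[] packet) {
        Entry e = inFlight.remove(key);
        if (e != null) e.observer.reply(packet);
    }

    // Periodic tick: expire observers whose deadline has passed so that
    // unresponsive nodes can be marked and skipped later.
    public void tick(long now) {
        for (Iterator<Entry> it = inFlight.values().iterator(); it.hasNext(); ) {
            Entry e = it.next();
            if (now - e.sentAt > TIMEOUT_MS) {
                e.observer.timeout();
                it.remove();
            }
        }
    }
}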



A peculiarity of Kademlia is that it is not always possible to determine exactly which of the requests sent by the host a given answer belongs to, if several processes are running at once and several requests have been sent to the same node at the same time. For example, node lookup processes may well overlap, and here you have to resort to some tricks.



Torrents have smarter transaction support: when sending a request the client includes a special transaction_id block that must be echoed back to it. This not only makes it possible to identify the transaction precisely, it also gives the network a little extra protection.
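
A tiny sketch of the idea, assuming a two-byte id as many BitTorrent clients use.

import java.security.SecureRandom;
import java.util.Arrays;

public class TransactionIdSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generate a short random transaction id to include in the outgoing request.
    public static byte[] newTransactionId() {
        byte[] t = new byte[2];     // two bytes is a common choice
        RANDOM.nextBytes(t);
        return t;
    }

    // A reply is accepted only if it echoes the id we sent.
    public static boolean matches(byte[] sent, byte[] echoed) {
        return Arrays.equals(sent, echoed);
    }
}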



I did not cover publishing and searching for notes, because I have not implemented support for this Kademlia feature.



I hope I managed to explain the material without confusing anything.



Links



Source: https://habr.com/ru/post/325528/


