
Hyperlinks in the “ed2k://...” format and their use. Part 1: The format of ed2k file hyperlinks

More than ten years ago (on September 6, 2000, as the Internet Archive shows), the first release of the eDonkey2000 program appeared and introduced the world to the idea and implementation of hyperlinks in the “ed2k://...” format. For its time it was a remarkable combination of three ideas: first, file hashes; second, URIs (a uniform form of resource identification); and third, file sharing.

Although six years later (in September 2006) RIAA lawyers managed, through the courts, to force the maker of eDonkey2000 to stop distributing the program (and even to replace its site with an ominous warning about the illegality of file sharing), the “ed2k://...” hyperlink format was inherited and remains widely used to this day in all file-sharing programs and on all sites that deal with the ed2k or Kad file-sharing networks. Moreover, the format has undergone only a few refinements since the 2006 version. Such is the epic power of the ideas built into it.

The programs that implement the “ed2k://...” hyperlink format also turned out to be very popular. At the height of its fame, eDonkey2000, whose name really does come from the English word "donkey", competed on equal terms in the minds of Russian users for the slang nickname "donkey" with the hugely popular browser IE, which owes that nickname only to the accidental similarity between the pronunciation of "IE" and the Russian name of the donkey Eeyore from the Winnie-the-Pooh stories (and also, perhaps, to its donkey-like stubbornness in misinterpreting some web standards). (Apparently anime, and with it the understanding of "ie" as the Japanese word for "no", was not yet widespread in those years.) And the main ideological heir of eDonkey2000, the free open-source program eMule, is still at the top of the list of the most popular products (by number of downloads) on the SourceForge site.

Hashing files. URIs. File sharing. How are these three ideas intertwined in the “ed2k://...” hyperlink format?

Three sources, three components ...


Hashing a file is a mathematical process that associates with each file a number, called a hash, that is long (many bits) but still small in size. Even minor changes in the file lead to significant changes in this number, so, as a rule, different files have different hashes. Of course, the number of possible hashes is finite, although very large (for example, 2^160 for 160-bit hashes), so a collision (that is, a pair of different files that have the same hash) is possible. However, it is extremely unlikely. Therefore, if the hashing algorithm has mathematically proven cryptographic strength (that is, if it is computationally infeasible to find a file that matches some previously known hash, or even to find two different files whose hashes are the same), then the hash of a file can be used as a unique identifier of the file's contents (and at the same time as a means of checking the integrity of those contents).
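As a rough illustration of the property described above, the following sketch hashes two byte strings that differ in a single letter and counts how many bits of the hashes differ. SHA-1 from the Python standard library is used here only because it yields 160-bit hashes like the ones in the example; it is not the algorithm used in ed2k links, and the helper names are made up for this sketch.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the 160-bit SHA-1 hash of data as 40 hexadecimal digits."""
    return hashlib.sha1(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the SHA-1 hashes of a and b differ."""
    x = int.from_bytes(hashlib.sha1(a).digest(), "big")
    y = int.from_bytes(hashlib.sha1(b).digest(), "big")
    return bin(x ^ y).count("1")


original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown fox jumps over the lazy cog"  # one letter changed

print(sha1_hex(original))
print(sha1_hex(modified))
print(differing_bits(original, modified), "of 160 bits differ")
```

For a good hash function, roughly half of the bits can be expected to change even after such a small edit.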

The idea of the URI (a uniform form of resource identification) originally came to Tim Berners-Lee in 1994 in the form of the URL (a uniform form of resource addressing), that is, a way of writing the address of a file (or of some other resource) from which any browser could understand where the resource is located. Later (in June 1994, while writing RFC 1630), Berners-Lee generalized the idea of a uniform form of addressing into the idea of a uniform form of identification. For example, the identifier urn:isbn:0-395-36341-1 uses the International Standard Book Number (ISBN) to state clearly and unambiguously which book is meant, although it says nothing about where the book can be obtained.

Jed McCaleb, the creator of eDonkey2000, realized that a hash (a unique identifier of a file's contents) is an ideal basis for writing the URI of that file. As for the question of where to get the file, the answer was p2p file sharing: an automatic process of searching for data and then transferring it directly between users of the global network, first implemented in June 1999 by Shawn Fanning with the advent of Napster. The use of hashes allowed eDonkey2000 to surpass Napster in two respects. First, files were searched for on the network by hash, so renaming a file did not prevent it from being found (in Napster, only the name and size were sent from the client to the search server, but nothing about the contents of the file). Second, the recipient of a file could collect its fragments from several other network members (and not from just one, as in Napster) and still be sure of the file's integrity, since the integrity was checked by hash.

File link


Using the Adobe Reader X installer as an example, here is what a typical ed2k hyperlink that points to a file looks like:
ed2k://|file|AdbeRdr1001_en_US.exe|48536984|249634B84340FEB5778EC09A2A9C2B87|/ 

This example shows that the format is as follows:
 ed2k://|file|<file name>|<file size>|<file hash>|<name>=<value>|/ 

Like a MediaWiki template (for example, on Wikipedia), an ed2k hyperlink consists of a series of values separated by vertical bars. The first is always the URI scheme ("ed2k://"), and the last is a slash ("/"). The second is a keyword that gives the type of link; for ed2k file hyperlinks it is always "file". It is followed, in strict order, by the file name, the file size (in bytes) and the ed2k hash of the file, and then by optional parameters in name=value format, in arbitrary order.
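The field layout just described can be sketched as a small parser. This is only an illustration under the assumptions stated in the text (three mandatory fields in fixed order, then optional name=value fields); the function name and the shape of the returned dictionary are made up for this sketch and do not come from any ed2k client.

```python
from urllib.parse import unquote


def parse_ed2k_file_link(link: str) -> dict:
    """Split an ed2k file hyperlink into name, size, hash and optional parameters."""
    prefix, suffix = "ed2k://|file|", "|/"
    link = link.strip()
    if not (link.startswith(prefix) and link.endswith(suffix)):
        raise ValueError("not an ed2k file hyperlink")

    # Everything between the "file" keyword and the closing slash.
    fields = link[len(prefix):-len(suffix)].split("|")
    if len(fields) < 3:
        raise ValueError("file name, size and hash are mandatory")

    name, size, file_hash, *extras = fields
    if len(file_hash) != 32:
        raise ValueError("the ed2k hash must be 32 hexadecimal digits")
    int(file_hash, 16)  # raises ValueError if the hash is not hexadecimal

    # Optional parameters come in name=value form, in arbitrary order.
    params = dict(extra.split("=", 1) for extra in extras if "=" in extra)

    return {
        "name": unquote(name),      # undo the %XX encoding of the file name
        "size": int(size),          # file size in bytes
        "hash": file_hash.upper(),
        "params": params,
    }


example = "ed2k://|file|AdbeRdr1001_en_US.exe|48536984|249634B84340FEB5778EC09A2A9C2B87|/"
print(parse_ed2k_file_link(example))
```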

The file name, as always in a URI, may contain special characters, whose bytes must be written in hexadecimal (percent) encoding. A space, for example, is written as "%20", and the Russian letter "к" as "%D0%BA" (in UTF-8 it corresponds to two bytes), and so on.
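The encoding of file names can be tried out with the standard urllib.parse module. The helper name below is made up for this sketch, and the choice to encode every reserved character (including the vertical bar, which would otherwise collide with the field separator) is an assumption of the sketch, not a rule taken from any particular client.

```python
from urllib.parse import quote, unquote


def encode_ed2k_name(name: str) -> str:
    """Percent-encode a file name as UTF-8 bytes, leaving no reserved character bare."""
    return quote(name, safe="")


print(encode_ed2k_name(" "))   # the space from the text
print(encode_ed2k_name("к"))   # the Russian letter from the text, two UTF-8 bytes
print(unquote(encode_ed2k_name("my file|v2.txt")))  # decoding restores the name
```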

The ed2k hash is computed with the MD4 algorithm in such a way as to make possible the above-mentioned retrieval of separate pieces of a file from several file-sharing participants.
Wikipedia states that for this purpose large files are divided into equal chunks of 9,500 kilobytes (9,728,000 bytes) each, with the last chunk being smaller, after which a 128-bit MD4 hash is calculated for each chunk. (If the file size is a multiple of 9,500 kilobytes, the last chunk is considered empty, but its MD4 hash is still calculated.) The resulting MD4 hashes are then concatenated, and the MD4 hash of that concatenation becomes the ed2k hash of the file. If the entire file is smaller than 9,500 kilobytes, its MD4 hash becomes the ed2k hash of the file.
The file-sharing protocol is designed so that clients exchange lists of the MD4 hashes of all chunks. Accordingly, the overall ed2k hash makes it possible to check the validity of such a list. And having received at least one of these 9,500-kilobyte chunks from another file-sharing participant, the client can already verify the integrity of that chunk and immediately join the exchange, serving the chunk it has to others.
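The chunking scheme described above can be sketched as follows. Since hashlib does not provide MD4 on every system, the sketch includes a compact pure-Python MD4 written after RFC 1320; it is slow and meant only for illustration. The function names and the chunk_size parameter are assumptions of this sketch, and the handling of files whose size is an exact multiple of the chunk size follows the description in the text.

```python
import struct

ED2K_CHUNK = 9_728_000  # 9,500 kilobytes, as stated in the text


def _rol(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def md4(data: bytes) -> bytes:
    """Return the 128-bit MD4 digest of data (after RFC 1320)."""
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack("<Q", (8 * len(data)) % (1 << 64))
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    order3 = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15)
    for offset in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[offset:offset + 64])
        a, b, c, d = state
        for i in range(48):
            rnd, j = divmod(i, 16)
            if rnd == 0:    # round 1: F(b, c, d), words in natural order
                f, k, s, const = (b & c) | (~b & d), j, (3, 7, 11, 19)[j % 4], 0
            elif rnd == 1:  # round 2: G(b, c, d), words taken column by column
                f = (b & c) | (b & d) | (c & d)
                k, s, const = (j % 4) * 4 + j // 4, (3, 5, 9, 13)[j % 4], 0x5A827999
            else:           # round 3: H(b, c, d)
                f, k, s, const = b ^ c ^ d, order3[j], (3, 9, 11, 15)[j % 4], 0x6ED9EBA1
            a = _rol((a + f + x[k] + const) & 0xFFFFFFFF, s)
            a, b, c, d = d, a, b, c  # rotate the register roles
        state = [(v + w) & 0xFFFFFFFF for v, w in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state)


def ed2k_hash(data: bytes, chunk_size: int = ED2K_CHUNK) -> str:
    """Compute the ed2k hash of data as described in the text."""
    if len(data) < chunk_size:
        return md4(data).hex().upper()  # small file: plain MD4
    digests = [md4(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    if len(data) % chunk_size == 0:
        digests.append(md4(b""))        # the "empty last chunk" case
    return md4(b"".join(digests)).hex().upper()


print(ed2k_hash(b"abc"))
```

The chunk_size parameter exists only so that the multi-chunk branch can be tried on small inputs; real links always use the 9,728,000-byte chunk.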

Optional parameters


The optional (named) parameters can be the following:

Creating an ed2k file hyperlink


To create an ed2k hyperlink containing the file size (in bytes) and one hash (ed2k) or two (ed2k and AICH), you do not need to be a file-sharing participant or to install eMule or another similar client for the ed2k and/or Kad networks. A simple program called LinkCreator, distributed through the SourceForge website by the creators of eMule (125 KB ZIP), easily copes with this task on Windows (or under Wine).
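For readers who already know a file's size and ed2k hash, assembling the link itself is simple string formatting. The sketch below is not how LinkCreator works internally (that is not described here); the function name is made up, it takes the hash as a ready-made hexadecimal string, and it produces only the three mandatory fields.

```python
import string
from urllib.parse import quote


def make_ed2k_file_link(name: str, size: int, ed2k_hash_hex: str) -> str:
    """Assemble an ed2k file hyperlink from a name, a size in bytes and a hash."""
    if size < 0:
        raise ValueError("file size must not be negative")
    if len(ed2k_hash_hex) != 32 or any(c not in string.hexdigits for c in ed2k_hash_hex):
        raise ValueError("the ed2k hash must be 32 hexadecimal digits")
    # Percent-encode the name so that spaces, "|" and non-ASCII bytes are safe.
    encoded_name = quote(name, safe="")
    return f"ed2k://|file|{encoded_name}|{size}|{ed2k_hash_hex.upper()}|/"


print(make_ed2k_file_link(
    "AdbeRdr1001_en_US.exe", 48536984, "249634B84340FEB5778EC09A2A9C2B87"))
```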

(To be continued…)

Source: https://habr.com/ru/post/118171/

