
More than ten years ago (on September 6, 2000, as the Internet Archive shows), the first release of the eDonkey2000 program appeared, presenting to the world the idea and implementation of hyperlinks in the format “ed2k://...”, a combination of three ideas that was remarkable for its time: first, file hashes; second, URIs (a uniform way of identifying resources); and third, file sharing. Although six years later (in September 2006) RIAA lawyers succeeded in court in forcing the maker of eDonkey2000 to stop distributing the program (and even to replace its site with an ominous warning about the illegality of file sharing), the “ed2k://...” hyperlink format was inherited and is widely used to this day in all file-sharing programs and on all sites dealing with the ed2k or Kad file-sharing networks. Moreover, the format has even been developed a little further since the 2006 version. Such is the epic power of the ideas built into it.

The popularity of the programs that implement the “ed2k://...” hyperlink format also turned out to be considerable. At the height of its fame, eDonkey2000, whose name really does come from the English word “donkey”, competed on equal terms in the minds of Russian users for the slang nickname “donkey” with the hugely popular browser IE, which owes that nickname only to the accidental similarity between the pronunciation of “IE” and the name of the donkey Eeyore from the Winnie-the-Pooh stories, and also, perhaps, to its donkey-like stubbornness in misinterpreting some web standards. (Apparently anime, and with it the understanding of “ie” as the Japanese word for “no”, was not yet widespread in those years.) And the main ideological heir of eDonkey2000, the free open-source program eMule, is still at the top of the list of the most popular products (by number of downloads) on the SourceForge site.
File hashing. URIs. File sharing. How are these three ideas intertwined in the “ed2k://...” hyperlink format?
Three sources, three components
Hashing a file is a mathematical procedure that maps each file to a number, called a hash, that is long (many bits) yet still small in size. Even a minor change in the file leads to a significant change in this number, so, as a rule, different files have different hashes. Of course, the number of possible hashes is finite, although very large (for example, 2^160 for 160-bit hashes), so a collision (that is, a pair of different files with the same hash) is possible. It is, however, extremely unlikely. Therefore, if the hashing algorithm is cryptographically strong (that is, if it is computationally infeasible to find a file matching a given hash, or even to find two different files whose hashes coincide), then the hash of a file can be used as a unique identifier of the file's contents (and at the same time as a means of checking the integrity of those contents).
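The sensitivity of a hash to small changes can be seen in a minimal sketch. It assumes SHA-1 (a 160-bit hash from Python's standard `hashlib`) purely as an illustration; the two sample strings and the helper name `differing_bits` are made up for this example and have nothing to do with ed2k itself.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")

# Two inputs that differ in a single letter ("dog" versus "cog").
original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown fox jumps over the lazy cog"

d1 = hashlib.sha1(original).digest()
d2 = hashlib.sha1(modified).digest()

print(d1.hex())
print(d2.hex())
print(differing_bits(d1, d2), "of", len(d1) * 8, "bits differ")

# The number of distinct 160-bit hashes is finite but enormous.
print("possible 160-bit hashes:", 2 ** 160)
```

A one-letter change in the input typically flips roughly half of the digest bits, which is why the hash works as a fingerprint of the content.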
The idea of the URI (a uniform way of identifying resources) originally came to Tim Berners-Lee in 1994 in the form of the URL (a uniform way of addressing resources), that is, a way of writing down the address of a file (or of some other resource) from which any browser could work out where the resource is located. Later (in June 1994, when writing RFC 1630), Berners-Lee generalized the idea of uniform addressing into the idea of uniform identification. For example, the identifier urn:isbn:0-395-36341-1 uses the International Standard Book Number (ISBN) to indicate clearly and unambiguously which book is meant, although it says nothing about where the book can be obtained.
Jed McCaleb, the creator of eDonkey2000, realized that a hash (a unique identifier of a file's contents) is the perfect basis for writing the URI of that file. As for the question of where to get the file, the answer was P2P file sharing: an automatic process of searching for data and then transferring it directly between users of the global network, first implemented in June 1999 by Shawn Fanning with the advent of Napster. The use of hashes allowed eDonkey2000 to surpass Napster in two respects. First, files were searched for on the network by hash, so renaming a file did not prevent it from being found (in Napster, the client sent the search server only the name and size of a file, and nothing about its contents). Second, the recipient could assemble a file from fragments obtained from several other network members (rather than from one, as in Napster) and still be sure of the file's integrity, since the integrity was checked against the hash.
File link
Using the Adobe Reader X installer as an example, I will show what a typical ed2k hyperlink pointing to a file looks like:
ed2k://|file|AdbeRdr1001_en_US.exe|48536984|249634B84340FEB5778EC09A2A9C2B87|/
As this example shows, the format is as follows:
ed2k://|file|<file name>|<file size>|<file hash>|<name>=<value>|/
Like a MediaWiki template call (for example, on Wikipedia), an ed2k hyperlink consists of a series of values separated by vertical bars. The first is always the URI scheme (“ed2k://”), and the last is a slash (“/”). The second is a keyword indicating the type of link; for ed2k file hyperlinks, this is always “file”. It is followed, in strict order, by the file name, the file size (in bytes) and the ed2k hash of the file, and then by optional parameters in name=value format, in arbitrary order.
The file name, as always in a URI, may contain special characters, whose bytes must be percent-encoded in hexadecimal. A space, for example, is written as “%20”, the Russian letter “к” as “%D0%BA” (in UTF-8 it corresponds to two bytes), and so on.
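The field layout and the percent-encoding of the file name can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `parse_file_link` and `make_file_link` are hypothetical, optional parameters are simply collected as (name, value) pairs, and no real ed2k client is claimed to work this way.

```python
from urllib.parse import quote, unquote

def parse_file_link(link: str) -> dict:
    """Split an ed2k file link into its fields (hypothetical helper)."""
    fields = link.split("|")
    if len(fields) < 6 or fields[0] != "ed2k://" or fields[1] != "file":
        raise ValueError("not an ed2k file link")
    name, size, file_hash = fields[2:5]
    options = []
    for field in fields[5:]:
        if field == "/":          # the slash marks the end of the link
            break
        key, _, value = field.partition("=")
        options.append((key, value))
    return {
        "name": unquote(name),    # decode %20, %D0%BA and the like (UTF-8)
        "size": int(size),        # file size in bytes
        "hash": file_hash.upper(),
        "options": options,
    }

def make_file_link(name: str, size: int, file_hash: str) -> str:
    """Build a minimal ed2k file link, percent-encoding the file name."""
    return f"ed2k://|file|{quote(name, safe='')}|{size}|{file_hash}|/"

link = ("ed2k://|file|AdbeRdr1001_en_US.exe|48536984|"
        "249634B84340FEB5778EC09A2A9C2B87|/")
print(parse_file_link(link))
print(make_file_link("My file к.txt", 10, "0" * 32))
```

Passing `safe=''` to `quote` makes it encode the slash and the vertical bar as well, so a file name can never be mistaken for a field separator.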
The ed2k hash is computed with the MD4 algorithm in a way that makes possible the above-mentioned retrieval of separate pieces of a file from several file-sharing participants.
Wikipedia states that for this purpose large files are divided into equal chunks of 9,500 kilobytes (9,728,000 bytes) each, with the last chunk possibly smaller, after which a 128-bit MD4 hash is calculated for each chunk. (If the file size is a multiple of 9,500 kilobytes, the last chunk is considered empty, but its MD4 hash is still calculated.) The resulting MD4 hashes are then concatenated, and the MD4 hash of that concatenation becomes the ed2k hash of the file. If the entire file is smaller than 9,500 kilobytes, its MD4 hash simply becomes the ed2k hash of the file.
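The procedure just described can be sketched as follows. This is a minimal illustration that follows the description above literally, not the code of any particular client. The names `md4` and `ed2k_hash` are hypothetical; MD4 is written out in pure Python (after RFC 1320) because `hashlib.new("md4")` is not available in every Python build, and as a result the sketch is far too slow for real multi-megabyte files.

```python
import struct

MASK = 0xFFFFFFFF
CHUNK_SIZE = 9500 * 1024  # 9,728,000 bytes

def _rol(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

# (round function, additive constant, message word order, shift amounts)
_ROUNDS = (
    (lambda x, y, z: (x & y) | (~x & z), 0x00000000,
     tuple(range(16)), (3, 7, 11, 19)),
    (lambda x, y, z: (x & y) | (x & z) | (y & z), 0x5A827999,
     (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
    (lambda x, y, z: x ^ y ^ z, 0x6ED9EBA1,
     (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
)

def md4(data: bytes) -> bytes:
    """Pure-Python MD4 digest (16 bytes)."""
    # Pad with 0x80, zeros, and the 64-bit little-endian bit length.
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack("<Q", (len(data) * 8) & 0xFFFFFFFFFFFFFFFF)
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for off in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = state
        for func, const, order, shifts in _ROUNDS:
            for step, k in enumerate(order):
                t = (a + func(b, c, d) + x[k] + const) & MASK
                # Rotating the roles of a, b, c, d reproduces the
                # ABCD / DABC / CDAB / BCDA pattern of the specification.
                a, b, c, d = d, _rol(t, shifts[step % 4]), b, c
        state = [(s + v) & MASK for s, v in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state)

def ed2k_hash(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """ed2k hash of in-memory data, following the description above."""
    if len(data) < chunk_size:
        return md4(data).hex().upper()
    digests = [md4(data[i:i + chunk_size])
               for i in range(0, len(data), chunk_size)]
    if len(data) % chunk_size == 0:
        digests.append(md4(b""))  # the empty trailing chunk
    return md4(b"".join(digests)).hex().upper()

print(ed2k_hash(b"abc"))
```

The `chunk_size` parameter exists only so that the chunking logic can be tried on tiny inputs; real links always use the 9,728,000-byte chunk.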
The file-sharing protocol is designed so that clients exchange lists of the MD4 hashes of all chunks. The overall ed2k hash, accordingly, lets a client verify that such a list is genuine. Having received even one of these 9,500-kilobyte chunks from another participant, the client can immediately verify the integrity of that chunk and join the exchange, offering the chunk it already has to others.
Optional parameters
The optional (named) parameters can be the following:
- The "s" parameter specifies a source for the file that is an alternative to file sharing. Currently this is usually an HTTP server, although the 2006 standard also provided for receiving a file from the Overnet network. Example:
ed2k://|file|AdbeRdr1001_en_US.exe|48536984|249634B84340FEB5778EC09A2A9C2B87|s=http://ardownload.adobe.com/pub/adobe/reader/win/10.x/10.0.1/en_US/AdbeRdr1001_en_US.exe|/
A single ed2k hyperlink may contain several such parameters.
- The "p" parameter contains the complete set of MD4 hashes of the file's chunks, which allows a more reliable and thorough integrity check than the overall ed2k hash alone. If the file was downloaded from the URL in the “s” parameter and there was no communication with other ed2k clients (to obtain the chunk hashes), this set makes it possible to identify the one or more corrupted chunks, rather than merely to establish that the file and the hash do not match. Example:
ed2k://|file|AdbeRdr1001_en_US.exe|48536984|249634B84340FEB5778EC09A2A9C2B87| p=F9FB4A4E8EC04320AC49D0F796807795: 9159AD7B29693322F8455258F6D02B3C: A51E847EB4E2D67BD04F1AF95D0479EB: A489A6E25ADF20366E8C4BCD69DD0DA9: 3315A3CDAE777B7AE8E734161DAEFFE3|/
This example contains spaces so that it can wrap onto new lines. The actual URI is written without spaces.
- The “h” parameter specifies the AICH hash, which some modern ed2k clients (for example, eMule) use as an additional check on the integrity of the file. (It is calculated with the SHA-1 algorithm over small blocks of the file, 180 kilobytes each, after which hashes of hashes are computed within a tree structure; the “h” parameter holds the hash at the top of the tree.) The presence of this parameter makes collisions less likely and also simplifies error correction when the file's integrity is violated during transfer, since it allows a client to find out quickly which block of a chunk is damaged (and to re-fetch that one block instead of downloading the whole chunk again). Example:
ed2k://|file|AdbeRdr1001_en_US.exe|48536984|249634B84340FEB5778EC09A2A9C2B87| h=5XYGXYHANLAEAL3Y67HVF32OOJ2HXCCP|/
This example contains a space so that it can wrap onto a new line. The actual URI is written without spaces.
- The "f" parameter specifies the location of a text file containing a more complete ed2k hyperlink, in case the link exceeds 2038 characters in length, which is the limit on the size of a URI in some standards and browsers. Example:
ed2k://|file|AdbeRdr1001_en_US.exe|48536984|249634B84340FEB5778EC09A2A9C2B87|f=http://example.org/long.ed2k|/
I am afraid that hardly all modern clients support it.
- The “sources” parameter has a special form: it is always written at the end of the link and is preceded by a field consisting of a slash (“/”). Clients that do not understand the “sources” parameter (and that treat the slash as the end of the ed2k hyperlink) are thus free to ignore it. It consists of the keyword “sources” followed by a comma-separated list of addresses (or domain names) and ports of ed2k clients. Example:
ed2k://|file|AdbeRdr1001_en_US.exe|48536984|249634B84340FEB5778EC09A2A9C2B87|/|sources,ed2k.example.net:6789,ed2k.example.org:12345|/
Clearly, such addresses should be long-lived, which is achieved by reserving a permanent IP address or by associating a domain name with the address through dynamic DNS.
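The optional parameters listed above could be collected along the following lines. This is a rough sketch under assumptions: the function name `parse_optional_parameters` and the layout of the returned dictionary are invented for illustration, unknown parameters are silently skipped, and the value of “h” is assumed to be Base32 text (which is what the 32-character example above looks like, giving the 20 bytes of a SHA-1 hash).

```python
import base64

def parse_optional_parameters(link: str) -> dict:
    """Collect the optional parameters of an ed2k file link (hypothetical)."""
    # Drop the spaces that the examples insert only for line wrapping.
    fields = link.replace(" ", "").split("|")
    if len(fields) < 6 or fields[:2] != ["ed2k://", "file"]:
        raise ValueError("not an ed2k file link")
    result = {"s": [], "p": [], "h": None, "f": None, "sources": []}
    for field in fields[5:]:      # skip scheme, "file", name, size and hash
        if field in ("", "/"):
            continue              # includes the slash that precedes "sources"
        if field.startswith("sources,"):
            for entry in field.split(",")[1:]:
                host, _, port = entry.rpartition(":")
                result["sources"].append((host, int(port)))
            continue
        name, _, value = field.partition("=")
        if name == "s":
            result["s"].append(value)         # may occur several times
        elif name == "p":
            result["p"] = value.split(":")    # one MD4 hash per chunk
        elif name == "h":
            result["h"] = base64.b32decode(value)
        elif name == "f":
            result["f"] = value
    return result

base = ("ed2k://|file|AdbeRdr1001_en_US.exe|48536984|"
        "249634B84340FEB5778EC09A2A9C2B87|")
info = parse_optional_parameters(
    base + "h=5XYGXYHANLAEAL3Y67HVF32OOJ2HXCCP|/"
    "|sources,ed2k.example.net:6789,ed2k.example.org:12345|/")
print(len(info["h"]), "bytes of AICH hash;", info["sources"])
```

Unlike a client that stops at the first slash, this sketch deliberately reads past it so that the trailing “sources” field is picked up as well.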
Creating an ed2k hyperlink to a file
To create an ed2k hyperlink containing the file size (in bytes) and one hash (ed2k) or two (ed2k and AICH), it is not necessary to be a file-sharing participant or to install eMule or another similar client for the ed2k and/or Kad networks. The simple LinkCreator program, distributed through the SourceForge website by the creators of eMule (a 125 KB ZIP archive), easily copes with this task on Windows (or under Wine).
(To be continued…)