Interplanetary File System - Trivial Hash (Identity), DAG Block, and Protocol Buffers

I recently added support for the trivial (identity) hash to IPFS. In this article I describe it and show how it can be used.

As a reminder, the InterPlanetary File System is a new decentralized file-sharing network (an HTTP server and Content Delivery Network). I began describing it in the article "Interplanetary File System IPFS".

Normally, when data passes through a hash function, it is irreversibly "compressed" into a short identifier. This identifier lets you find the data on the network and check its integrity.

The trivial hash is the data itself. The data does not change at all and, accordingly, the size of the "hash" is equal to the size of the data.

The trivial hash serves the same purpose as a data: URL. The content identifier in this case contains the data itself instead of a hash. This lets you embed child blocks in the parent, making them available as soon as the parent is received. You can also include site data directly in a DNS record.

As an example, let's encode the text string "Привет мир" (Russian for "Hello world") into a content identifier (CID) with a trivial hash.

ID structure:

[base prefix][varint version CID][varint content type][varint ID hash][varint hash length][hash]

Let's start from the end.

[hash]

The trivial hash in our case is the string itself. Convert it to HEX.

 "Привет мир" = 0x"D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

This is the HEX of the string in UTF-8 encoding. To make sure the browser recognizes it as UTF-8, prepend 0xEFBBBF to it. This is the byte order mark (BOM).

 0x"EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

[varint hash length]

Now we can calculate the length of the hash. Every two HEX characters make one byte, so the resulting string is 22 bytes long. In HEX that is 0x16.

Add 0x16 to the beginning of the string.

 0x"16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"
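The payload and its length can be double-checked with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are illustrative and not part of IPFS.

```python
# Identity "hash" payload: the UTF-8 BOM followed by the UTF-8 bytes of the text.
text = "Привет мир"        # the string whose HEX is shown above
BOM = b"\xef\xbb\xbf"      # UTF-8 byte order mark

payload = BOM + text.encode("utf-8")
length = len(payload)

print(payload.hex().upper())  # EFBBBFD09FD180D0B8D0B2D0B5D18220D0BCD0B8D180
print(length, hex(length))    # 22 0x16
```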

[varint ID hash]

Now we need a hash identifier. In the multihash table, the trivial hash (identity) has the identifier 0x00.

Add 0x00 to the beginning of the string.

 0x"00 16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

This is already the multihash part of the identifier. You can re-encode the HEX into Base58 and the multihash is ready. But ipfs does not recognize it outside a content identifier (CID).

Let's move on.

[varint content type]

Now let's look at the multicodec table to get the content type. In our case this is raw data, so the identifier is 0x55.

Add 0x55 to the beginning of the string.

 0x"55 00 16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

[varint version CID]

We are encoding in the format of the first version of the content identifier, so the version is 0x01.

Add 0x01 to the beginning of the string.

 0x"01 55 00 16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180"

And so we are already at the finish line.

[base prefix]

It indicates which binary-to-text encoding is used.

HEX (F)

We can use the HEX string directly by prepending the HEX base prefix character "F".

 F01550016EFBBBFD09FD180D0B8D0B2D0B5D18220D0BCD0B8D180
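The whole assembly, from the end of the structure back to the base prefix, can be sketched in Python as follows. The helper names `encode_varint` and `identity_cid` are made up for this illustration; the codes 0x01, 0x55 and 0x00 are the ones used in the steps above.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def identity_cid(codec: int, data: bytes) -> bytes:
    """Binary CIDv1 whose multihash uses the identity function (0x00)."""
    return (encode_varint(0x01)          # CID version
            + encode_varint(codec)       # content type
            + encode_varint(0x00)        # hash ID: identity
            + encode_varint(len(data))   # hash length
            + data)                      # the "hash" is the data itself


payload = b"\xef\xbb\xbf" + "Привет мир".encode("utf-8")
cid_hex = "F" + identity_cid(0x55, payload).hex().upper()
print(cid_hex)  # F01550016EFBBBFD09FD180D0B8D0B2D0B5D18220D0BCD0B8D180
```

Using varints for every field matters once the data exceeds 127 bytes, because the length then no longer fits in a single byte.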

We got a HEX content identifier that contains the UTF-8 string "Привет мир" ("Hello world").

Test it: /ipfs/F01550016EFBBBFD09FD180D0B8D0B2D0B5D18220D0BCD0B8D180

Base58btc (z)

Base58btc is shorter, so let's convert our HEX string to base58btc. You can use an online converter.

 0x"01 55 00 16 EFBBBF D09F D180 D0B8 D0B2 D0B5 D182 20 D0BC D0B8 D180" = "3NDGAEgXCxbPucFFCQc9s5ScqZjqVFNr56P" (base58btc)

Prepend the base58btc base prefix character "z" to the resulting string.

 z3NDGAEgXCxbPucFFCQc9s5ScqZjqVFNr56P

We got a base58btc content identifier that contains the UTF-8 string "Привет мир" ("Hello world").

Test it: /ipfs/z3NDGAEgXCxbPucFFCQc9s5ScqZjqVFNr56P
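Instead of an online converter, the base58btc step can be done locally. Below is a minimal sketch of an encoder and decoder using the Bitcoin alphabet; the function names are my own, and the printed result can be compared with the identifier given above.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes as base58 (Bitcoin alphabet), without the multibase prefix."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    # Each leading zero byte is represented by one leading "1".
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + "".join(reversed(digits))


def base58btc_decode(text: str) -> bytes:
    """Inverse of base58btc_encode."""
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


cid = bytes.fromhex("01550016EFBBBFD09FD180D0B8D0B2D0B5D18220D0BCD0B8D180")
print("z" + base58btc_encode(cid))  # "z" is the base58btc base prefix
```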

DAG block

Plain text is fine, but to encode an HTML page we need to attach its data to a DAG directory block.

Here is our HTML:

 <b><i><u>Привет мир</u></i></b>

Following the same steps as above, we get the base58btc content identifier for this text:

 zeExnPvBXdTRwCBhfkJ1fHFDaXpdW4ghvQjfaCRHYxtQnd3H4w1MPbLczSqyCqVo

Now we write the JSON file:

 {
   "links": [
     {
       "Cid": { "/": "zeExnPvBXdTRwCBhfkJ1fHFDaXpdW4ghvQjfaCRHYxtQnd3H4w1MPbLczSqyCqVo" },
       "Name": "index.html"
     }
   ],
   "data": "CAE="
 }

"data" specifies the type of the DAG block: a directory ("CAE=" is Base64 for the bytes 0x08 0x01).
"links" is an array of links to files.
"Name" is the corresponding file name.
"Cid" contains the content identifier.

The command ipfs dag put -f"protobuf" converts the JSON into a DAG block via IPFS.

I got the multihash: QmXXixn4rCzGguhxQPjXQ8Mr5rdqwZfJTKkeB6DfZLt8EZ

At this stage we have a block containing a directory with one file embedded in the block itself.

Next, use this multihash to export the finished block.

 ipfs block get QmXXixn4rCzGguhxQPjXQ8Mr5rdqwZfJTKkeB6DfZLt8EZ > block.dag

Convert the contents of block.dag to HEX:

 0x"123F0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E120A696E6465782E68746D6C18000A020801"

Prepend:

CID version (0x01)
DAG Content Type (0x70)
trivial hash (0x00)
data size 69 bytes (0x45)

 0x"01 70 00 45 123F0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E120A696E6465782E68746D6C18000A020801"

Convert to Base58btc and add the prefix "z"

 z6S3Z3W1zuRxio8AJC41jRTdyU9pZWnU6sNbvyGyypEdD8JVNdW42ZmGYWKWGbVDELLvJNWcMspaZMUPZKt7JQmhdyXCqq7j37GL

Thus we have a content identifier holding a directory that contains the HTML page index.html with the text "Привет мир" ("Hello world").

Test it: /ipfs/z6S3Z3W1zuRxio8AJC41jRTdyU9pZWnU6sNbvyGyypEdD8JVNdW42ZmGYWKWGbVDELLvJNWcMspaZMUPZKt7JQmhdyXCqq7j37GL

This identifier can in turn be embedded in another block or written to a DNS dnslink record. That way a small, simple site fits in a single block.

DAG block and Protocol Buffers

A DAG block can also be assembled manually. A DAG block is data in the Protocol Buffers format. The top layer is merkledag.proto, whose Data field holds a unixfs.proto message.

Protocol buffers

Every Protocol Buffers field starts with a varint field identifier. The identifier often occupies a single byte because its value is less than 0x80. In our case the first byte is 0x12. The lower 3 bits of this byte are the type. The remaining bits are the field ID specified in the proto file.

Length-delimited

Decode the identifier:

 0x12 & 0x07 = 2 (Type: Length-delimited)
 0x12 >> 3 = 2 (ID: 2)

Length-delimited means that a varint with the field size in bytes follows immediately, and then the contents. This type is used for nested structures as well as raw data (string, bytes, embedded messages, packed repeated fields). Which of these it is depends on the proto file.

Varint

Let's decode an identifier of another type:

 0x18 & 0x07 = 0 (Type: Varint)
 0x18 >> 3 = 3 (ID: 3)

Varint means that the value follows immediately as a varint. This type is used for many kinds of values (int32, int64, uint32, uint64, sint32, sint64, bool, enum). Which one it is again depends on the proto file.
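The same bit arithmetic as a small Python sketch. The helper `decode_tag` is a name chosen for this example, and it only handles identifiers that fit in one byte, which is the case for every field in these blocks.

```python
WIRE_TYPES = {0: "Varint", 1: "64-bit", 2: "Length-delimited", 5: "32-bit"}


def decode_tag(tag: int):
    """Split a one-byte field identifier into (field ID, type)."""
    return tag >> 3, tag & 0x07


for tag in (0x12, 0x18, 0x0A, 0x08):
    field, wire = decode_tag(tag)
    print(f"0x{tag:02X}: ID {field}, type {wire} ({WIRE_TYPES[wire]})")
# 0x12: ID 2, type 2 (Length-delimited)
# 0x18: ID 3, type 0 (Varint)
# 0x0A: ID 1, type 2 (Length-delimited)
# 0x08: ID 1, type 0 (Varint)
```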

Let's parse the block.dag that we converted to HEX above.

To parse a block, you can use a site that will automatically parse any Protocol Buffer without using proto files.

 0x"123F0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E120A696E6465782E68746D6C18000A020801"

Parse the block and match the identifiers from the proto files.

merkledag.proto

 // An IPFS MerkleDAG Link
 message PBLink {
   // multihash of the target object
   optional bytes Hash = 1;
   // utf string name. should be unique per object
   optional string Name = 2;
   // cumulative size of target object
   optional uint64 Tsize = 3;
 }

 // An IPFS MerkleDAG Node
 message PBNode {
   // refs to other objects
   repeated PBLink Links = 2;
   // opaque user data
   optional bytes Data = 1;
 }

unixfs.proto

 message Data {
   enum DataType {
     Raw = 0;
     Directory = 1;
     File = 2;
     Metadata = 3;
     Symlink = 4;
     HAMTShard = 5;
   }
   required DataType Type = 1;
   optional bytes Data = 2;
   optional uint64 filesize = 3;
   repeated uint64 blocksizes = 4;
   optional uint64 hashType = 5;
   optional uint64 fanout = 6;
 }

 12 (Type: 2 (Length-delimited). ID: 2 (PBLink PBNode.Links (merkledag.proto)))
 3F (Length: 63 bytes)
   0A (Type: 2 (Length-delimited). ID: 1 (PBLink.Hash))
   2F (Length: 47 bytes)
     01 55 00 2B (CIDv1, Raw, Identity, 43 bytes)
     EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E = "<b><i><u>Привет мир</u></i></b>"
   12 (Type: 2 (Length-delimited). ID: 2 (PBLink.Name))
   0A (Length: 10 bytes)
     696E6465782E68746D6C = "index.html"
   18 (Type: 0 (Varint). ID: 3 (PBLink.Tsize))
   00 (Value: 0)
 0A (Type: 2 (Length-delimited). ID: 1 (PBNode.Data = Data (unixfs.proto)))
 02 (Length: 2 bytes)
   08 (Type: 0 (Varint). ID: 1 (Data.Type))
   01 (1 == Data.DataType.Directory)
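The same breakdown can be reproduced without a website. The sketch below is a minimal generic parser that only understands the two types used in this block (Varint and Length-delimited); `read_varint` and `parse_fields` are illustrative names, not an existing library API.

```python
def read_varint(buf: bytes, pos: int):
    """Read an unsigned varint starting at pos; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:          # high bit clear: this was the last byte
            return value, pos
        shift += 7


def parse_fields(buf: bytes):
    """Split one message into a list of (field ID, type, value) tuples."""
    pos, fields = 0, []
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire = tag >> 3, tag & 0x07
        if wire == 0:            # Varint
            value, pos = read_varint(buf, pos)
        elif wire == 2:          # Length-delimited
            size, pos = read_varint(buf, pos)
            value = buf[pos:pos + size]
            pos += size
        else:
            raise ValueError(f"type {wire} is not handled in this sketch")
        fields.append((field, wire, value))
    return fields


block = bytes.fromhex(
    "123F0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220"
    "D0BCD0B8D1803C2F753E3C2F693E3C2F623E120A696E6465782E68746D6C1800"
    "0A020801"
)

for field, wire, value in parse_fields(block):
    if field == 2:    # PBNode.Links: the value is a nested PBLink message
        for sub in parse_fields(value):
            print("PBLink", sub)
    elif field == 1:  # PBNode.Data: the value is a unixfs Data message
        print("Data", parse_fields(value))
```

Nested messages are not detected automatically here. The caller decides which Length-delimited values to parse again, based on the proto files.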

Accordingly, a block with two files will look like this:

 12 (Type: 2 (Length-delimited). ID: 2 (PBLink PBNode.Links (merkledag.proto)))
 3B (Length: 59 bytes)
   0A (Type: 2 (Length-delimited). ID: 1 (PBLink.Hash))
   2F (Length: 47 bytes)
     01 55 00 2B (CIDv1, Raw, Identity, 43 bytes)
     EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E = "<b><i><u>Привет мир</u></i></b>"
   12 (Type: 2 (Length-delimited). ID: 2 (PBLink.Name))
   06 (Length: 6 bytes)
     312E68746D6C = "1.html"
   18 (Type: 0 (Varint). ID: 3 (PBLink.Tsize))
   00 (Value: 0)
 12 (Type: 2 (Length-delimited). ID: 2 (PBLink PBNode.Links))
 3B (Length: 59 bytes)
   0A (Type: 2 (Length-delimited). ID: 1 (PBLink.Hash))
   2F (Length: 47 bytes)
     01 55 00 2B (CIDv1, Raw, Identity, 43 bytes)
     EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E = "<b><i><u>Привет мир</u></i></b>"
   12 (Type: 2 (Length-delimited). ID: 2 (PBLink.Name))
   06 (Length: 6 bytes)
     322E68746D6C = "2.html"
   18 (Type: 0 (Varint). ID: 3 (PBLink.Tsize))
   00 (Value: 0)
 0A (Type: 2 (Length-delimited). ID: 1 (PBNode.Data = Data (unixfs.proto)))
 02 (Length: 2 bytes)
   08 (Type: 0 (Varint). ID: 1 (Data.Type))
   01 (1 == Data.DataType.Directory)

That is, the PBNode.Links (0x12) field is repeated once for each file to be placed in the block.

To check, prepend "F 01 70 00" (HEX, CIDv1, DAG, Identity) followed by the DAG block size "7E" (126 bytes):

 F 01 70 00 7E 12 3B 0A 2F 01 55 00 2B EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E 12 06 312E68746D6C 18 00 12 3B 0A 2F 01 55 00 2B EFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E 12 06 322E68746D6C 18 00 0A 02 08 01

Check: /ipfs/F0170007E123B0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E1206312E68746D6C1800123B0A2F0155002BEFBBBF3C623E3C693E3C753ED09FD180D0B8D0B2D0B5D18220D0BCD0B8D1803C2F753E3C2F693E3C2F623E1206322E68746D6C18000A020801
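Putting the pieces together, a directory block with any number of embedded files can be assembled by a short script. This is a sketch under the assumptions used throughout this article (raw files with identity CIDs, Tsize left at 0, one-byte field identifiers); the function names are illustrative.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def length_delimited(field: int, body: bytes) -> bytes:
    """Field identifier with type 2, then the size, then the contents."""
    return varint(field << 3 | 2) + varint(len(body)) + body


def identity_cid(codec: int, data: bytes) -> bytes:
    """Binary CIDv1 with the identity hash (0x00)."""
    return varint(0x01) + varint(codec) + varint(0x00) + varint(len(data)) + data


def directory_block(files: dict) -> bytes:
    """Build a PBNode holding a unixfs directory with the files embedded."""
    block = b""
    for name, content in files.items():
        link = (length_delimited(1, identity_cid(0x55, content))  # PBLink.Hash
                + length_delimited(2, name.encode("utf-8"))       # PBLink.Name
                + varint(3 << 3 | 0) + varint(0))                 # PBLink.Tsize = 0
        block += length_delimited(2, link)                        # PBNode.Links
    unixfs = varint(1 << 3 | 0) + varint(1)       # Data.Type = Directory
    return block + length_delimited(1, unixfs)    # PBNode.Data


html = b"\xef\xbb\xbf" + "<b><i><u>Привет мир</u></i></b>".encode("utf-8")
block = directory_block({"1.html": html, "2.html": html})
print(len(block))  # 126
print("F" + identity_cid(0x70, block).hex().upper())
```

Called with a single entry named "index.html", the same function reproduces the 69-byte block.dag shown earlier.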

Conclusion

I hope I have given enough information for you to implement the creation of blocks and identifiers yourself.

Source: https://habr.com/ru/post/423073/
