
Part 3. Where to store data for decentralized applications on the blockchain?

In the first part of this article, we identified the problems with storing application data in the blockchain itself. In the second part, we described the requirements for a data store and examined how existing implementations meet them. The results were disappointing: no implementation was satisfactory. In this part we propose a concept for a decentralized data store that does meet the requirements. For a deeper understanding of what follows, it is worth reviewing the two previous parts.

So, current storage implementations do not meet the requirements for a database that would suit a wide class of decentralized applications on the blockchain.

Given that a loose coupling with the blockchain is acceptable for the sake of speed (that is, not every transaction will pass through the blockchain), the database must be resistant to malicious behavior by other database nodes, provide a sufficient level of replication, and have mechanisms that motivate participants to support the network. Such a database will therefore need the support of a blockchain with smart contracts.

When creating such a database, we propose building on existing NoSQL implementations, for example Apache Cassandra, while giving our database the following properties:

A mandatory cryptographic signature on each record ensures that a malicious party cannot modify or delete the record without knowing the private key of its owner. That is, a data store built this way is resistant to the Byzantine generals problem even without a consensus mechanism.

This gives hope that the speed of such a scheme will not differ much from that of existing NoSQL database implementations. However, an attacker can mount a Sybil attack, generating key pairs and creating garbage entries in the database. This problem is solved by introducing motivation.
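The signature rule can be illustrated with a minimal sketch. Everything here is hypothetical: the `Record` and `Node` names, the `owner` field, and the rule that a newer timestamp is required for an overwrite are assumptions made for illustration. HMAC-SHA256 stands in for a real public-key signature scheme only so that the sketch runs on the standard library; an actual node would verify with the owner's public key and would never hold the private key.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


def sign(secret: bytes, key: str, value: bytes, timestamp: int) -> bytes:
    # ASSUMPTION: HMAC is a stand-in for a public-key signature.
    message = repr((key, value, timestamp)).encode()
    return hmac.new(secret, message, hashlib.sha256).digest()


@dataclass(frozen=True)
class Record:
    key: str
    value: bytes
    timestamp: int
    owner: str          # hypothetical owner identifier
    signature: bytes


class Node:
    """Toy storage node that accepts only correctly signed records."""

    def __init__(self, owner_keys: dict[str, bytes]):
        self._owner_keys = owner_keys            # stand-in for public keys
        self._records: dict[str, Record] = {}

    def put(self, record: Record) -> bool:
        secret = self._owner_keys.get(record.owner)
        if secret is None:
            return False
        expected = sign(secret, record.key, record.value, record.timestamp)
        if not hmac.compare_digest(expected, record.signature):
            return False                         # forged or corrupted record
        current = self._records.get(record.key)
        if current is not None:
            # Only the same owner may overwrite, and only with newer data.
            if current.owner != record.owner:
                return False
            if record.timestamp <= current.timestamp:
                return False
        self._records[record.key] = record
        return True

    def get(self, key: str) -> Optional[Record]:
        return self._records.get(key)
```

In this sketch a node decides locally whether to accept a write, with no agreement round between nodes, which is the property the scheme relies on.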

Motivation


A public network assumes that participants are free to join it, providing equipment that increases the network's computing power and its capacity to store and distribute information. To encourage this, equipment owners should receive a reward that motivates them to work honestly.

This means that database operations will be paid for by the end user. This may sound crazy to someone unfamiliar with the blockchain, but it is quite reasonable. Blockchain projects often have no single owner; the community owns them. As a result, the community itself must cover the costs of the project. The amounts are very small, but nonzero. The existing decentralized file storages, which we discussed in the previous part of the article, also charge the user for file storage. There is no way around this: at the most basic level, the operation of the equipment must be paid for by its users. In principle, these costs can later be offset from other sources, but that topic is beyond the scope of this article.

As in Ethereum Swarm, the following rewards are assumed:


Rewards are paid from the funds of the user making requests to the database. Since payments through the blockchain take a very long time, two methods can be used for fast payments: off-chain transactions and checkbooks. With off-chain transactions, the user needs to open an off-chain channel with each database node (or use intermediate channels between nodes). Since each such channel requires funds to be reserved on it, this can be quite expensive. Therefore, we adopt the “checkbook” approach as the main one. Let us explain what it is.

Before accessing the database, the user must reserve part of their funds in a special “checkbook” smart contract. The address of this contract is then used by the database node to collect its reward: the checkbook contract holds its owner's money and allows third parties to cash checks signed by the owner simply by sending a transaction, with the check as data, to the contract's cashing method.


A check is cashed if:


Then, whenever a database node needs to be rewarded, the user simply sends it a check. The receiving node can keep only the last check received from each user and cash it periodically by sending it to the “checkbook” contract, which, with a reasonable degree of confidence, saves on blockchain transactions.
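A minimal in-memory sketch of the checkbook idea follows. The cashing conditions used here are assumptions, not the article's list: a check is taken to carry a cumulative total per beneficiary (which is what would let a node keep only the latest check), must carry a valid owner signature, and pays out only the not-yet-cashed difference. The names `Check`, `Checkbook`, and `sign_check` are hypothetical, and HMAC again stands in for a public-key signature purely so that the example runs on the standard library.

```python
import hashlib
import hmac
from dataclasses import dataclass


def sign_check(owner_secret: bytes, beneficiary: str, cumulative_amount: int) -> bytes:
    # ASSUMPTION: HMAC is a stand-in for the owner's public-key signature.
    message = repr((beneficiary, cumulative_amount)).encode()
    return hmac.new(owner_secret, message, hashlib.sha256).digest()


@dataclass(frozen=True)
class Check:
    beneficiary: str
    cumulative_amount: int   # total ever promised to this beneficiary
    signature: bytes


class Checkbook:
    """Toy model of the checkbook contract (not real contract code)."""

    def __init__(self, owner_secret: bytes, deposit: int):
        self._owner_secret = owner_secret
        self.balance = deposit
        self.paid_out: dict[str, int] = {}   # beneficiary -> total cashed

    def cash(self, check: Check) -> int:
        expected = sign_check(self._owner_secret, check.beneficiary,
                              check.cumulative_amount)
        if not hmac.compare_digest(expected, check.signature):
            return 0                          # not signed by the owner
        already = self.paid_out.get(check.beneficiary, 0)
        due = check.cumulative_amount - already
        if due <= 0:
            return 0                          # old or already cashed check
        payout = min(due, self.balance)       # cannot pay more than reserved
        self.balance -= payout
        self.paid_out[check.beneficiary] = already + payout
        return payout
```

With cumulative totals, cashing the latest check once has the same effect as cashing every earlier check separately, so a node loses nothing by discarding the older ones.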

Data Retrieval Reward


Data on the database nodes has a certain replication level: data with a given key is stored only on a subset of the nodes, say N of them. However, the user can contact any node for data. The node the user has contacted then acts as the “coordinator”.

The value of the data key is used to determine the N nodes responsible for storing that key, and requests are sent to them. The coordinator checks the electronic signatures on the data the nodes return, compares the timestamps, and returns the latest record to the user.

Both the coordinator's work and the replicas that store the data must be paid for. The exact proportions require a more detailed calculation, but to encourage correct behavior the following principles should be followed:
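The read path just described can be sketched as follows. The placement rule (a hash ring walked clockwise from the key's hash, loosely in the spirit of Cassandra) is an assumption, as are the names `replicas_for`, `coordinate_read`, `fetch`, and `verify`; signature checking is passed in as a callable so that the sketch does not commit to any particular scheme.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Record:
    key: str
    value: bytes
    timestamp: int
    signature: bytes


def _h(text: str) -> bytes:
    return hashlib.sha256(text.encode()).digest()


def replicas_for(key: str, nodes: list[str], n: int) -> list[str]:
    """Pick up to n distinct nodes by walking a hash ring from the key."""
    ring = sorted(nodes, key=_h)
    key_hash = _h(key)
    # First node whose hash is not below the key's hash; wrap to 0 if none.
    start = next((i for i, name in enumerate(ring) if _h(name) >= key_hash), 0)
    count = min(n, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


def coordinate_read(
    key: str,
    nodes: list[str],
    n: int,
    fetch: Callable[[str, str], Optional[Record]],
    verify: Callable[[Record], bool],
) -> Optional[Record]:
    """Query the replicas, drop invalid answers, return the newest record."""
    valid = []
    for node in replicas_for(key, nodes, n):
        record = fetch(node, key)        # hypothetical network call
        if record is not None and record.key == key and verify(record):
            valid.append(record)
    return max(valid, key=lambda r: r.timestamp, default=None)
```

A replica that returns a forged record with a very recent timestamp is simply ignored here, because the signature check runs before the timestamp comparison.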


Together with the data, the coordinator issues an invoice that states how much each node is owed. The user writes a check to each of them, and the coordinator forwards the checks to the nodes, along with the up-to-date data for any node that returned nothing or returned stale data.

To protect against malicious coordinators and users who do not pay, each node keeps a list of the users from whom it expects payment and of the coordinators that relayed requests from those users. If the debt exceeds a certain threshold, the node may stop accepting requests from those users and coordinators. The lists are adjusted as checks arrive.
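The bookkeeping a node needs for this can be sketched as a small ledger. The class and method names are hypothetical, and using a single shared threshold for users and coordinators is an assumption made to keep the example short.

```python
class DebtLedger:
    """Tracks unpaid work per user and per coordinator on one node."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.user_debt: dict[str, int] = {}
        self.coordinator_debt: dict[str, int] = {}

    def accepts(self, user: str, coordinator: str) -> bool:
        # Refuse service once either party owes more than the threshold.
        return (self.user_debt.get(user, 0) <= self.threshold
                and self.coordinator_debt.get(coordinator, 0) <= self.threshold)

    def record_request(self, user: str, coordinator: str, price: int) -> None:
        # Work was done: both the user and the relaying coordinator owe for it.
        self.user_debt[user] = self.user_debt.get(user, 0) + price
        self.coordinator_debt[coordinator] = (
            self.coordinator_debt.get(coordinator, 0) + price)

    def record_check(self, user: str, coordinator: str, amount: int) -> None:
        # A check arrived: reduce both debts, never below zero.
        self.user_debt[user] = max(0, self.user_debt.get(user, 0) - amount)
        self.coordinator_debt[coordinator] = max(
            0, self.coordinator_debt.get(coordinator, 0) - amount)
```

Charging the debt to the coordinator as well as the user means a coordinator that withholds checks is eventually cut off, even if it relays requests from many different users.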

Reward for storage


The retrieval reward indirectly encourages storage as well, but it only works for popular, frequently requested data. To encourage long-term storage of data, especially data that is rarely requested, a separate storage reward is required.

The article on Ethereum Swarm describes a storage reward system. Nodes enter into a storage contract with the data owner for a fixed period. Payment for storage can be made when the data is saved (or updated) or some time later, provided that the data is actually still stored. If data loss is detected during the contract, the node can be fined; for this purpose, each node must first register and put up a security deposit.

When saving data, the node returns a receipt, which serves as proof that it has accepted the data for storage. The receipt later makes it possible to check whether the corresponding data is still stored and, if not, to submit a transaction to a “court” smart contract that will punish the offending node.

In our case the data is not static: a record with the same key can be overwritten several times. Therefore a presented receipt can be matched not only by the original record but also by a newer record with the same key.

When a user initiates a delete operation, the data is not physically deleted but replaced with a special “zero” entry. The record can be physically deleted once its storage contract has expired.

Full-text search, secondary indexes


In NoSQL databases, a fast lookup that touches only a few nodes is possible only by primary key. Our database is not much different in this respect. Meanwhile, searching for records by keyword, or grouping records by some criterion, is hard to achieve without secondary indexes and full-text search. For full-text search and secondary indexes, we propose a solution similar to Elassandra, which maintains local ElasticSearch full-text indexes on each node of a distributed Cassandra NoSQL database. The coordinator sends a full-text query to all nodes, then merges the results and returns them to the client. Since the additional indexes are built locally and independently on each node, no extra protection against the Byzantine generals problem is required.
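The scatter-and-merge step can be sketched as below. This is not Elassandra's API: `search_local` is a hypothetical callable that returns `(key, score)` pairs from one node's local index. The sketch assumes that scores from different nodes are comparable and that replicated records may be returned by several nodes, so duplicates are collapsed by key.

```python
import heapq
from typing import Callable, Iterable


def scatter_gather(
    query: str,
    nodes: Iterable[str],
    search_local: Callable[[str, str], Iterable[tuple[str, float]]],
    limit: int = 10,
) -> list[tuple[str, float]]:
    """Send the query to every node and merge the hits by best score."""
    best: dict[str, float] = {}
    for node in nodes:
        for key, score in search_local(node, query):
            # The same record can live on several replicas: keep one hit.
            if score > best.get(key, float("-inf")):
                best[key] = score
    # Highest-scoring hits first, truncated to the requested page size.
    return heapq.nlargest(limit, best.items(), key=lambda item: item[1])
```

In a real deployment the coordinator would also verify the signature of each returned record before passing it to the client, in the same way as on the primary-key read path.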

Results


Thus, we have presented the concept of a public database tied to the blockchain that satisfies the requirements for use by decentralized applications:


Such a database can be used by decentralized applications on any blockchain that supports Turing-complete smart contracts (decentralized applications are not built on other blockchains anyway). For example, it can serve distributed applications built on top of Ethereum, RChain, and other blockchains.

The concept of this database was developed within the open-source project Superman. There is no implementation yet (though it is clearly in demand), and the concept is being presented to the public today. We hope that this concept of a public decentralized NoSQL database, introduced here for the first time, will interest you enough to discuss it or even join the development. You can also join the Telegram channel, where you can chat with the developers (in English).

→ The first part of the article
→ The second part of the article

Source: https://habr.com/ru/post/328004/

