In the first part of the article, we found problems with storing application data in the blockchain. In the second part, we described the requirements for a data store and examined how existing implementations meet them. The results were disappointing: no implementation was satisfactory. In this part we propose a concept of a decentralized data store that does meet the requirements. For a deeper understanding of what follows, we recommend reviewing the two previous parts.
So, current storage implementations do not meet the requirements for a database that would suit a wide class of decentralized applications on the blockchain.
Given that a loose coupling with the blockchain is acceptable for the sake of speed (that is, not all transactions will pass through the blockchain), the database must be resistant to malicious behavior of other database nodes, provide a sufficient level of replication, and have mechanisms that motivate participants to support the network. That is, such a database will need the support of a blockchain with smart contracts.
When creating such a database, we propose building on existing NoSQL implementations, for example Apache Cassandra, while endowing the database with the following properties:
- The database is public. A user (client) of the database is identified by their public key (the same key as in the blockchain), which serves as the user ID.
- Each user can send transactions to the database, each transaction must be signed by this user.
- A new record stores the ID of the user who created it as its owner.
- After creation, only the owner (or a user trusted by the owner through a permission mechanism implemented as a smart contract on the blockchain) can change the record.
- Everyone can read all the records.
- To prevent key conflicts between records of different users, all keys of a user's records are prefixed with the user ID.
- More complex permissions (for example, trust between specific users, or rights to create and delete tables) can be set using a smart contract in the blockchain.
- All permissions must be checked both for transactions and during replication.
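The properties listed above can be illustrated with a minimal sketch. Everything named here (`WriteTx`, `Node`, the `is_trusted` callback standing in for the on-chain permission contract, and the toy signature functions) is a hypothetical name chosen for illustration, not part of any existing implementation. The toy signature is deliberately insecure and only marks the place where real public-key verification would go.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict

# (signer_id, message, signature) -> True if the signature is valid.
# ASSUMPTION: a real node would plug in public-key verification here.
Verify = Callable[[str, bytes, bytes], bool]


@dataclass(frozen=True)
class WriteTx:
    sender_id: str    # ID (public key) of the user who signed the transaction
    owner_id: str     # ID of the user who owns the record
    key: str          # record key without the owner prefix
    value: bytes
    signature: bytes

    def full_key(self) -> str:
        # Keys are prefixed with the owner's ID so that users cannot collide.
        return f"{self.owner_id}:{self.key}"

    def message(self) -> bytes:
        # The bytes covered by the signature.
        return self.full_key().encode() + b"\x00" + self.value


class Node:
    def __init__(self, verify: Verify,
                 is_trusted: Callable[[str, str], bool] = lambda owner, sender: False):
        self.verify = verify
        self.is_trusted = is_trusted  # stand-in for the on-chain permission contract
        self.records: Dict[str, WriteTx] = {}

    def apply(self, tx: WriteTx) -> bool:
        """Apply a client transaction or a replicated record (same checks for both)."""
        if not self.verify(tx.sender_id, tx.message(), tx.signature):
            return False  # unsigned or forged
        if tx.sender_id != tx.owner_id and not self.is_trusted(tx.owner_id, tx.sender_id):
            return False  # only the owner or a trusted user may write
        self.records[tx.full_key()] = tx
        return True

    def read(self, owner_id: str, key: str) -> bytes:
        # Everyone can read all records.
        return self.records[f"{owner_id}:{key}"].value


# Toy signature scheme so the sketch runs with the standard library only.
# NOT secure: anyone who knows the ID can "sign" with it.
def toy_sign(signer_id: str, message: bytes) -> bytes:
    return hashlib.sha256(signer_id.encode() + message).digest()


def toy_verify(signer_id: str, message: bytes, signature: bytes) -> bool:
    return toy_sign(signer_id, message) == signature
```

Because the stored record carries its signature, a replica receiving it from another node can run exactly the same `apply` check instead of trusting the sender.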
Mandatory cryptographic signing of each record ensures that a malicious party cannot modify or delete it without knowing the private key of the record's owner. That is, a data store built this way is resistant to the Byzantine generals problem even without a consensus mechanism.
This gives hope that the speed of such a scheme will not differ much from the speed of existing NoSQL database implementations. However, an attacker can mount a Sybil attack, generating key pairs and creating garbage entries in the database. This problem is solved by introducing motivation.
Motivation
A public network assumes that participants are free to join it, contributing equipment that increases the network's computing power, storage capacity and data distribution. To stimulate this behavior, equipment owners should receive a reward that motivates them to work honestly.
This means that database operations will be paid for by the end user. This may sound crazy to someone unfamiliar with the blockchain, but it is quite reasonable. Blockchain projects often have no owner: they are owned by the community. As a result, the community itself must cover the costs of the project. The amounts are very small, but nonzero. The existing decentralized file storages, which we discussed in the previous part of the article, also charge the user for file storage. There is no way around this: at the basic level, the operation of the equipment must be paid for by its users. In principle, these costs can then be offset from other sources, but that topic is beyond the scope of this article.
Similar to Ethereum Swarm, the following rewards are assumed:
- Data retrieval reward
- Data storage reward
Rewards are paid from the funds of the user making requests to the database. Since payments through the blockchain take a very long time, two methods can be used for fast payments: off-chain transactions and checkbooks. In the case of off-chain transactions, the user needs to open an off-chain channel with each database node (or use intermediate channels between nodes). Since each such channel requires funds to be reserved on it, this can be quite expensive. Therefore, we adopt the “checkbook” approach as the main one. Let us explain what it is.
Before accessing the database, the user must reserve part of their funds in a special “checkbook” smart contract. The database node then uses the address of this contract to receive its remuneration: the checkbook contract holds its owner's money and allows third parties to cash checks signed by the owner, simply by sending a transaction with the check as data to the contract's cash method.
- The contract tracks the total amount written to each recipient during the connection.
- When writing a check, the owner must keep track of the total amount already sent to that recipient.
A check is cashed if:
- the contract address corresponds to the address on the check;
- the check is signed by the owner (user ID, that is, the public key);
- the total amount on the check is greater than the total on the previous check cashed by the recipient.
Then, when a database node needs to be rewarded, the user simply sends it a check. The receiving node needs to keep only the last check received from each user and cash it periodically by sending it to the “checkbook” contract, which, given some trust, saves on blockchain transactions.
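The checkbook rules above can be modeled off-chain in a few lines. This is only an in-memory sketch under stated assumptions: `Check`, `CheckWriter`, `Checkbook` and `check_message` are hypothetical names, a real checkbook would be a smart contract, and the toy signature functions are insecure placeholders for public-key signatures.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict

Sign = Callable[[str, bytes], bytes]
Verify = Callable[[str, bytes, bytes], bool]


@dataclass(frozen=True)
class Check:
    contract: str     # address of the checkbook contract
    recipient: str    # node to be paid
    total: int        # cumulative amount ever written to this recipient
    signature: bytes


def check_message(contract: str, recipient: str, total: int) -> bytes:
    return f"{contract}|{recipient}|{total}".encode()


class CheckWriter:
    """User side: remembers the running total written to each recipient."""

    def __init__(self, contract: str, owner_id: str, sign: Sign):
        self.contract, self.owner_id, self.sign = contract, owner_id, sign
        self.totals: Dict[str, int] = {}

    def pay(self, recipient: str, amount: int) -> Check:
        total = self.totals.get(recipient, 0) + amount
        self.totals[recipient] = total
        sig = self.sign(self.owner_id, check_message(self.contract, recipient, total))
        return Check(self.contract, recipient, total, sig)


class Checkbook:
    """In-memory model of the contract; the real one would live on-chain."""

    def __init__(self, address: str, owner_id: str, deposit: int, verify: Verify):
        self.address, self.owner_id, self.verify = address, owner_id, verify
        self.balance = deposit
        self.cashed: Dict[str, int] = {}  # recipient -> total already paid out

    def cash(self, check: Check) -> int:
        """Pay out the uncashed part of a check; raise ValueError if rejected."""
        if check.contract != self.address:
            raise ValueError("check was written for another contract")
        msg = check_message(check.contract, check.recipient, check.total)
        if not self.verify(self.owner_id, msg, check.signature):
            raise ValueError("check is not signed by the owner")
        already = self.cashed.get(check.recipient, 0)
        if check.total <= already:
            raise ValueError("check is not newer than the last cashed one")
        payout = min(check.total - already, self.balance)
        self.cashed[check.recipient] = already + payout
        self.balance -= payout
        return payout


# Toy, insecure signature so the sketch runs with the standard library only.
def toy_sign(signer_id: str, message: bytes) -> bytes:
    return hashlib.sha256(signer_id.encode() + message).digest()


def toy_verify(signer_id: str, message: bytes, signature: bytes) -> bool:
    return toy_sign(signer_id, message) == signature
```

Because each check carries the cumulative total rather than an increment, a node that holds only the latest check loses nothing by discarding the earlier ones.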
Data Retrieval Reward
Data on the database nodes has a certain level of replication, that is, data with a given key is stored only on a subset of the nodes, say N of them. However, the user can contact any node for data. The node the user contacts then acts as the “coordinator”.
The value of the data key is used to calculate the N nodes responsible for storing this key, and requests are sent to them. The coordinator checks the data returned by the nodes against their electronic signatures, compares the timestamps, and returns the latest record to the user.
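A coordinator's read path could look roughly like the sketch below. The article does not specify how the N nodes are computed from the key; the hash ring used here is an assumption in the spirit of Cassandra-style partitioning, and `replicas_for`, `latest_valid`, `Record` and the `is_valid` callback (a stand-in for signature verification) are hypothetical names.

```python
import hashlib
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


def _position(name: str) -> int:
    # Map a node name or a record key to a position on the hash ring.
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")


def replicas_for(key: str, nodes: List[str], n: int) -> List[str]:
    """Pick the n nodes that follow the key's position on the ring."""
    ring = sorted(nodes, key=_position)
    positions = [_position(node) for node in ring]
    start = bisect_right(positions, _position(key)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(n, len(ring)))]


@dataclass(frozen=True)
class Record:
    key: str
    value: bytes
    timestamp: int
    signature: bytes


def latest_valid(responses: Iterable[Optional[Record]],
                 is_valid: Callable[[Record], bool]) -> Optional[Record]:
    """Drop missing and badly signed answers, then return the newest record."""
    valid = [r for r in responses if r is not None and is_valid(r)]
    return max(valid, key=lambda r: r.timestamp, default=None)
```

Any node can run `replicas_for` with the same node list and obtain the same answer, which is what allows every node to act as coordinator.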
Both the work of the coordinator and that of the replicas storing the data are subject to payment. The exact proportions require a more detailed calculation, but in order to stimulate correct behavior, the following principles should be followed:
- The faster the node returns the data, the more payment it deserves.
- If the node returns outdated data, its payment decreases.
- A node that returns no data gets nothing.
- The coordinator gets a fixed small portion
Together with the data, the coordinator issues an invoice indicating how much each node is entitled to. The user writes checks to all of them. The coordinator forwards the checks to the nodes, along with updated data if a node returned nothing or returned outdated data.
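The article leaves the exact proportions open, so the following is only one possible way to turn the principles above into an invoice. The weighting (inverse latency, halved for outdated data) and the parameters `coordinator_share` and `stale_factor` are assumptions made for illustration, as are the names `Response` and `invoice`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Response:
    node: str
    latency_ms: Optional[float]  # None if the node returned nothing
    fresh: bool = True           # False if the node returned an outdated record


def invoice(budget: float, coordinator: str, responses: List[Response],
            coordinator_share: float = 0.1,
            stale_factor: float = 0.5) -> Dict[str, float]:
    """Split the budget for one read between the coordinator and the replicas."""
    weights: Dict[str, float] = {}
    for r in responses:
        if r.latency_ms is None:
            weights[r.node] = 0.0                  # no data, no payment
            continue
        weight = 1.0 / max(r.latency_ms, 1.0)      # faster answers weigh more
        weights[r.node] = weight if r.fresh else weight * stale_factor

    total = sum(weights.values())
    pool = budget * (1.0 - coordinator_share)
    bill = {coordinator: budget * coordinator_share}  # fixed small portion
    for node, weight in weights.items():
        share = pool * weight / total if total else 0.0
        bill[node] = bill.get(node, 0.0) + share      # coordinator may also be a replica
    return bill
```

For example, with a budget of 100, replicas answering in 10 ms and 20 ms with fresh data and a third answering in 10 ms with outdated data would receive 45, 22.5 and 22.5, and the coordinator 10. If no replica answers, the replica pool is simply not billed.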
To protect against malicious coordinators and users who do not pay, each node keeps a list of the users from whom it expects payment and of the coordinators forwarding requests from these users. If the debt level exceeds a certain threshold, the node may stop accepting requests from those users and coordinators. When checks are received, the lists are adjusted.
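Such bookkeeping on a node might be sketched as follows. `DebtLedger` and its methods are hypothetical names, and charging the same amount to both the user and the coordinator is an assumption, since the article does not say how debt is attributed between them.

```python
from collections import defaultdict
from typing import Dict, Tuple

Party = Tuple[str, str]  # ("user", id) or ("coordinator", id)


class DebtLedger:
    def __init__(self, threshold: int):
        self.threshold = threshold
        self.owed: Dict[Party, int] = defaultdict(int)

    @staticmethod
    def _parties(user: str, coordinator: str) -> Tuple[Party, Party]:
        return ("user", user), ("coordinator", coordinator)

    def charge(self, user: str, coordinator: str, amount: int) -> None:
        """Record work done for a request that has not been paid for yet."""
        for party in self._parties(user, coordinator):
            self.owed[party] += amount

    def settle(self, user: str, coordinator: str, amount: int) -> None:
        """Reduce the debt when a check for this amount arrives."""
        for party in self._parties(user, coordinator):
            self.owed[party] = max(0, self.owed[party] - amount)

    def accepts(self, user: str, coordinator: str) -> bool:
        """Serve the request only if neither party is over the threshold."""
        return all(self.owed.get(party, 0) <= self.threshold
                   for party in self._parties(user, coordinator))
```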
Data Storage Reward
The retrieval reward indirectly stimulates storage as well, but it only works for popular, frequently requested data. To encourage long-term storage of data, especially data that is rarely requested, a storage reward is required.
The article on Ethereum Swarm describes the storage reward system. Nodes enter into a data storage contract with the information owner for a period of time. Payment for storage can occur at the moment the data is saved (or updated), or after some time, provided that the data is actually still stored. If data loss is detected during the contract, the node can be fined, which is why each node must register in advance with a security deposit.
When saving data, the node returns a receipt, which serves as proof that the node has accepted the data for storage. This receipt later makes it possible to check whether the corresponding data is still stored and, if not, to send a transaction to an arbitration (“court”) smart contract that allows the offending node to be punished.
In our case, the data is not static: a record with the same key can be overwritten several times. Therefore, a presented receipt may correspond not only to the original record but also to a newer record with the same key.
When a user initiates a data deletion operation, the data is not physically deleted but replaced with a special “zero” entry. The record can be physically deleted after its storage contract expires.
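Deletion through a “zero” entry can be sketched like this. `Replica`, `Stored` and `purge` are hypothetical names, timestamps are plain integers for simplicity, and signature checks are omitted to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Stored:
    value: Optional[bytes]   # None marks the special "zero" entry
    timestamp: int
    contract_expires: int    # when the storage contract for this key ends


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, Stored] = {}

    def put(self, key: str, value: bytes, now: int, contract_expires: int) -> None:
        self.data[key] = Stored(value, now, contract_expires)

    def delete(self, key: str, now: int) -> None:
        # Overwrite with a "zero" entry instead of removing the key, so that
        # the deletion replicates like any other update.
        old = self.data[key]
        self.data[key] = Stored(None, now, old.contract_expires)

    def get(self, key: str) -> Optional[bytes]:
        entry = self.data.get(key)
        return None if entry is None else entry.value

    def purge(self, now: int) -> int:
        """Physically drop "zero" entries whose storage contract has expired."""
        expired = [key for key, entry in self.data.items()
                   if entry.value is None and now >= entry.contract_expires]
        for key in expired:
            del self.data[key]
        return len(expired)
```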
Full-text search, secondary indexes
In NoSQL databases, a quick search that touches only a small number of nodes is possible only by primary key. Our database is not much different in this respect. Meanwhile, searching records by keywords and grouping records by some criteria are difficult to achieve without secondary indexes and full-text search. For full-text search and secondary indexes, we propose a solution similar to Elassandra. That solution maintains local full-text Elasticsearch indexes on each node of a distributed Cassandra NoSQL database. Full-text requests are sent by the coordinator to all nodes, and the results are merged and returned to the client. Since the additional indexes are created locally and independently on each node, no additional protection against the Byzantine generals problem is required.
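The scatter-gather query described above can be illustrated with a toy model. `LocalIndex` is a deliberately naive in-memory inverted index used as a stand-in for a node's local index; it is not the Elasticsearch or Elassandra API, and `scatter_gather` is a hypothetical name for the coordinator's merge step.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, int, str]  # (key, timestamp, text)


class LocalIndex:
    """Toy stand-in for the full-text index a node keeps over its own records."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, str]] = {}
        self.terms: Dict[str, Set[str]] = defaultdict(set)

    def put(self, key: str, timestamp: int, text: str) -> None:
        if key in self.docs:  # drop index entries of the previous version
            for word in self.docs[key][1].lower().split():
                self.terms[word].discard(key)
        self.docs[key] = (timestamp, text)
        for word in text.lower().split():
            self.terms[word].add(key)

    def search(self, word: str) -> List[Hit]:
        return [(key, *self.docs[key]) for key in self.terms.get(word.lower(), ())]


def scatter_gather(word: str, nodes: List[LocalIndex]) -> List[Hit]:
    """Coordinator side: query every node, keep the newest hit per key."""
    merged: Dict[str, Tuple[int, str]] = {}
    for node in nodes:
        for key, timestamp, text in node.search(word):
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, text)
    return sorted((key, ts, text) for key, (ts, text) in merged.items())
```

Replicas of the same record answer the same query, so the merge step has to deduplicate by key. In a full implementation the coordinator would also verify the signatures of the returned records, as it does for reads by primary key.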
Results
Thus, we presented the concept of a public database associated with the blockchain, which satisfies the requirements for use by decentralized applications:
- Distribution
The database supports an unlimited number of replicas, each of which can be a coordinator. That is, referring to one of them, the user gets access to all the data.
- Publicity
The database is designed to work in a public environment. New nodes can be added to the network and take on part of the load at any time.
- Resistance to the Byzantine generals problem and other types of attacks in a public network
Since all data placed in the database is signed by its owner, nodes can neither replace data at their discretion nor corrupt it when replicating to other nodes. Substitution attempts are immediately detected using the electronic signature mechanism. For an attempted substitution, the offending node may be deprived of its registration deposit and excluded from the network. Deposits, access rights and mutual settlements between nodes are handled by a blockchain external to the database, which must support Turing-complete smart contracts.
- Sharding support (the ability to replicate only a part of the data on each node to increase the maximum total amount of data).
Each database node is responsible for a certain interval of the primary keys of the data it stores. The level of replication (the number of nodes that store copies of data with the same primary key) is set separately and can grow as the network grows.
- Speed
The principles of data storage suggest that the speed of writing and reading data in such a database will not differ much from current implementations of such databases, for example, Apache Cassandra.
- Ability to store structured data
Data in such a database retains its structure. For example, it may be a JSON document with a structure suitable for a particular application.
- Ability to delete data
Deletion of data is supported. Instant removal cannot be guaranteed, but eventually, if the nodes behave honestly, the data will be deleted. A malicious node can deliberately keep all the data that is being deleted. However, it cannot do this for all data, because it only receives requests for a certain interval of primary keys.
- Query language with the ability to search not only by primary key
Using Elasticsearch, in a way similar to its integration with Cassandra in the Elassandra project, it is possible to extend the query language with secondary keys and full-text search.
Such a database can be used by decentralized applications on any blockchain that supports Turing-complete smart contracts (decentralized applications are not built on other blockchains anyway). For example, it can serve distributed applications on top of Ethereum, RChain and other blockchains.
The concept of such a database was developed within the open-source project Superman. There is no implementation yet (although it is clearly in demand), and the concept is being presented to the public today. We hope that this concept of a public decentralized NoSQL database, presented here for the first time, will interest you enough to discuss it or even join the development. You can also join the Telegram channel, where you can chat with the developers (in English).
→ The first part of the article
→ The second part of the article