
Part 2. Where to store data for decentralized applications on the blockchain?

In the first part of the article, we identified the problem of storing application data on the blockchain. We recommend reading it first to understand the context. In this part, we set out the properties we want from an ideal data store and look at existing approaches to the problem.

So, if we have a decentralized application, what requirements should its data store meet? We propose the following requirements:


Let's see how existing technologies meet these requirements.

IPFS


IPFS (InterPlanetary File System) is a distributed file system based on a DHT (Distributed Hash Table) and a BitTorrent-style block exchange protocol. It uses content addressing to combine the file systems of different devices into a single one.
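To make content addressing concrete, here is a minimal sketch in Python. It is not the IPFS API: the `ContentStore` class and its methods are hypothetical, and plain SHA-256 hex digests stand in for the identifiers a real network would use. The point is only that the address of a blob is derived from its bytes, so any reader can check that what they received matches what they asked for.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self._blobs = {}  # address -> raw bytes

    def put(self, data: bytes) -> str:
        # The address is the hash of the content itself.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # The reader can verify integrity without trusting the storage node.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"hello, dApp")
```

Storing the same bytes twice yields the same address, and changing a single byte yields a different one. That is why content-addressed data is effectively immutable: an update is a new object with a new address.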
Advantages :


Disadvantages :


The AKASHA social network (Ethereum + IPFS) and the OpenBazaar marketplace are built on IPFS. They fully inherit the disadvantages of IPFS mentioned above, the main one being that you cannot publish information to the network and go offline until it has spread to other peers.

Distributed File Storage


Such storage combines individual devices into a shared cloud storage. Users can store their files there just as they would in a classic centralized service such as Dropbox, but more cheaply. The device owners ("farmers") provide space for other people's files and are paid by users in proportion to their contribution. To measure that contribution, ensure reliable storage, and prevent abuse, various cryptographic checks are used, for example proof of storage and proof of retrievability (proof that the file is available and can be retrieved). The user pays for each successful check, and the farmer receives a certain amount of cryptocurrency.

Such projects are built mainly on DHT technology and content addressing (where the hash of a file serves as its identifier). Some additionally use smart contracts.

There are currently quite a few such projects, for example Sia, Storj, Ethereum Swarm, and MaidSafe. They are all built on similar principles, and Ethereum Swarm was conceived, among other things, to provide convenient file storage for dApps.
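The idea behind such checks can be sketched as a simple challenge-response audit. This is an illustration under simplifying assumptions, not the scheme of any particular project: the function names are hypothetical, and real networks use more elaborate proofs. Before uploading, the owner precomputes a few single-use challenges. Later, the farmer can answer a challenge only by hashing the complete file together with a nonce it has never seen before.

```python
import hashlib
import os


def make_challenges(data: bytes, count: int) -> list:
    """Owner side: precompute (nonce, expected answer) pairs before upload."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        expected = hashlib.sha256(nonce + data).hexdigest()
        challenges.append((nonce, expected))
    return challenges


def farmer_respond(stored: bytes, nonce: bytes) -> str:
    """Farmer side: answering requires the complete stored file."""
    return hashlib.sha256(nonce + stored).hexdigest()


def audit(challenges: list, respond) -> bool:
    """Spend one challenge; True means the farmer proved possession."""
    nonce, expected = challenges.pop()
    return respond(nonce) == expected


original = b"x" * 1000
challenges = make_challenges(original, 3)
honest_ok = audit(challenges, lambda n: farmer_respond(original, n))
cheater_ok = audit(challenges, lambda n: farmer_respond(b"corrupted", n))
```

Each challenge can be used only once, because a farmer who has seen a nonce could cache the answer and discard the file. The owner therefore has to prepare enough challenges for the whole storage period, which is one reason production systems prefer more compact proofs.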

Advantages :


Disadvantages :


Distributed file storages look attractive for storing files. However, for structured, dynamic information such as social network user data, static files are a serious problem. File storages know nothing about the contents of a file, yet an application may need to search not only by a file's identifier (or name) but also by its contents: for example, find all users with the keyword "blockchain", or all users located in a particular city, or even run a full-text search over publications. So we continue to look for a better solution.
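The following Python sketch illustrates the difficulty. It assumes user profiles are stored as JSON blobs in a content-addressed store, modeled here by a plain dictionary. The profile fields and function names are hypothetical. A query by content either scans every blob or relies on an index that the application has to build and maintain itself.

```python
import hashlib
import json

blobs = {}  # address -> raw bytes; stands in for a distributed file storage


def put(profile: dict) -> str:
    raw = json.dumps(profile, sort_keys=True).encode()
    address = hashlib.sha256(raw).hexdigest()
    blobs[address] = raw
    return address


def find_by_city_scan(city: str) -> list:
    """Without an index: download and parse every file."""
    names = []
    for raw in blobs.values():
        profile = json.loads(raw)
        if profile.get("city") == city:
            names.append(profile["name"])
    return sorted(names)


def build_city_index() -> dict:
    """Application-maintained index: city -> sorted list of blob addresses."""
    index = {}
    for address, raw in blobs.items():
        city = json.loads(raw).get("city")
        index.setdefault(city, []).append(address)
    return {city: sorted(addresses) for city, addresses in index.items()}


alice = put({"name": "alice", "city": "Berlin"})
bob = put({"name": "bob", "city": "Moscow"})
carol = put({"name": "carol", "city": "Berlin"})
city_index = build_city_index()
```

The scan grows with the total number of files. The index avoids that, but it is itself a piece of mutable, shared state that must be stored and kept in sync somewhere, which brings back the original problem.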

Distributed Databases


Unfortunately, by the CAP theorem, it is impossible to build a fully distributed database that simultaneously guarantees consistency, availability, and partition tolerance (the last means that the database keeps functioning even if some nodes are disconnected from the network or their messages are lost).

For our needs, we want a distributed database that is, of course, partition tolerant and available: we need to get an answer from it quickly, even if it is perhaps not the most recent one. This limits our choice to NoSQL databases, because ACID SQL DBMSs put consistency first.

There are a great many implementations of distributed NoSQL databases, for example MongoDB, Cassandra, and RethinkDB. All of them can work with a large number of replicas joined into a cluster. The client works with one of the replicas, and the data is automatically synchronized with the others. For load balancing, sharding can be used, where a given part of the data is stored on only some of the replicas. Adding a new replica scales the cluster almost linearly, and some implementations (for example, Cassandra) let the new replica automatically take over part of the cluster's work.
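One common way to spread data over replicas so that a new node takes over only part of the work is a hash ring. The sketch below is a simplified illustration, not the actual partitioner of Cassandra or any other database. The `Ring` class, node names, and replication factor are assumptions made for the example.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a node name or a key to a position on the ring."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self._ring = sorted((_token(node), node) for node in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (_token(node), node))

    def owners(self, key: str) -> list:
        """The first nodes clockwise from the key's position hold its copies."""
        tokens = [token for token, _ in self._ring]
        start = bisect.bisect_right(tokens, _token(key))
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"])
keys = ["user:%d" % i for i in range(200)]
before = {key: ring.owners(key)[0] for key in keys}
ring.add_node("node-d")
after = {key: ring.owners(key)[0] for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

After `node-d` joins, every key whose primary owner changed now belongs to `node-d`, and all other keys stay where they were. Real systems usually place each node at many positions on the ring to even out the load, which this sketch omits.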

NoSQL databases provide eventual consistency, that is, the data becomes consistent after some time, once the individual replicas have synchronized. In this they resemble the blockchain: the more time has passed, the more likely a transaction is to be confirmed.

NoSQL databases can store plain key-value pairs, or they can be aware of the internal structure of the value and maintain additional indexes. The most advanced also offer basic transaction support and an SQL-like query language (for example, Cassandra).

Given all of the above, this class of databases may seem ideal for use alongside the blockchain. But there is a problem. Imagine that someone adds a malicious replica to a well-coordinated cluster, and it starts telling the other replicas that all data must be deleted. The other replicas will obediently delete all their data, and the database will be hopelessly corrupted. In other words, such coordinated work of replicas is currently possible only in a trusted environment: a cluster of such databases is not resilient to the Byzantine generals problem, and a single malicious replica can destroy the data of the entire cluster.
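A simple way to picture eventual consistency is a last-write-wins merge, sketched below in Python. This is one possible reconciliation rule chosen for illustration, not a description of how any specific database synchronizes. The `merge` function and the `(timestamp, value)` layout are assumptions of the example.

```python
def merge(local: dict, remote: dict) -> dict:
    """Last-write-wins: per key, keep the entry with the higher timestamp.

    Both arguments map key -> (timestamp, value). Equal timestamps with
    different values are not resolved here; a real system needs a tie-breaker.
    """
    merged = dict(local)
    for key, (timestamp, value) in remote.items():
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, value)
    return merged


# Two replicas accepted different writes while they were out of sync.
replica_a = {"name": (1, "alice")}
replica_b = {"name": (2, "Alice"), "city": (1, "Berlin")}

# After exchanging state in either direction, both hold the same data.
state_a = merge(replica_a, replica_b)
state_b = merge(replica_b, replica_a)
```

Until the exchange happens, a client reading from `replica_a` sees the older name. That is the price paid for always getting a quick answer.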
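The attack can be sketched with the same kind of timestamp-based replication. The code is a deliberately naive, hypothetical model rather than the replication protocol of any real database. An honest replica applies whatever updates a peer sends, and a deletion is just an update carrying a tombstone.

```python
TOMBSTONE = None  # marker meaning "this key was deleted"


def apply_updates(state: dict, updates: dict) -> None:
    """Naive replica: trust the peer, keep the newest entry per key."""
    for key, (timestamp, value) in updates.items():
        if key not in state or timestamp > state[key][0]:
            state[key] = (timestamp, value)


def live_keys(state: dict) -> list:
    return sorted(k for k, (_, v) in state.items() if v is not TOMBSTONE)


def malicious_updates(state: dict) -> dict:
    """A hostile replica deletes everything with a far-future timestamp."""
    far_future = 2 ** 62
    return {key: (far_future, TOMBSTONE) for key in state}


honest = {"alice": (1, "profile-a"), "bob": (2, "profile-b")}
keys_before = live_keys(honest)
apply_updates(honest, malicious_updates(honest))
keys_after_attack = live_keys(honest)

# Even a later legitimate write loses to the forged timestamp.
apply_updates(honest, {"alice": (3, "restored")})
keys_after_repair = live_keys(honest)
```

Nothing in the protocol lets the honest replica tell a forged deletion from a genuine one, so the damage spreads to every node that synchronizes with the attacker. Preventing this requires each replica to verify who is allowed to change which data, which is exactly what a trusted cluster does not do.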

Advantages :


Disadvantages :


BigChainDB


There is a blockchain implementation called BigChainDB or, as it is also known, IPDB (InterPlanetary DataBase), which is often cited as a solution to all data storage problems. Its authors claim a very high transaction rate (1 million per second) and a huge storage capacity (thanks to distributed storage with partial replication). BigChainDB achieves this through a simplified consensus for building blocks and by storing all blocks and transactions in an existing NoSQL database, RethinkDB or MongoDB.

Unfortunately, this architecture has a significant drawback: each node has full write access to the shared data store, which means the system as a whole is not resilient to the Byzantine generals problem. The authors of the project are aware of this and promise to address it later. However, fixing fundamental problems in the basic architecture after a product has been released is very laborious and often impossible, because it can lead to a substantially different product with a different architecture. This casual attitude toward a fundamental problem draws criticism from the community, because without BFT (Byzantine Fault Tolerance) the high speed and capacity demonstrated by BigChainDB differ little from what was originally demonstrated by RethinkDB and MongoDB, the NoSQL databases it uses for storage. And since full trust between nodes is still required, why not use these databases directly?

Thus, the practical use of BigChainDB is limited to private networks. To avoid misleading people, BigChainDB should be called BigPrivateBlockChain; then there would be no questions. It is not suitable for public networks.

Advantages :


Disadvantages :


Thus, BigChainDB is completely unsuitable for storing data of decentralized applications in public networks.

Conclusions


We have considered several approaches to organizing data storage for public networks that distributed applications can use. There are only a few of them, but no others exist at the moment. Unfortunately, none of them satisfies all the requirements we formulated at the beginning of the article.

The situation resembles the early days of computing, when programs could save data only in files, which was inconvenient. That is why DBMSs were created for computers, and some people made a fortune on them.

As a result, existing decentralized applications make do with storing data directly in the blockchain or in distributed file systems, as in the Stone Age. They are forced to implement their own indexes over files, invent their own data formats, and generally spend a lot of time reinventing the wheel, albeit a decentralized one.

But the world of decentralized applications cannot remain without convenient data storage. Therefore, the next part of the article presents the concept of a storage system that claims to satisfy all the requirements set out above.

→ The third part of the article
→ The first part of the article

Source: https://habr.com/ru/post/327947/

