This text will be a new chapter for a textbook on the protection of information of the Department of Radio Engineering and Control Systems MIPT (GU).The complete tutorial is available on github .On Habré, I also plan to post new “big” pieces, first, to collect useful comments and observations, and second, to give the community more overview material on useful and interesting topics.
When you have knowledge of what a cryptographically strong hash function is, it is very easy to understand what a blockchain is. A blockchain is a sequential set of blocks (or, more generally, a directed graph), each subsequent block in which includes the hash function from the previous block as hashed information.
The blockchain technology is used to organize transaction logs, whereby a transaction can be understood as anything: a financial transaction (transfer between accounts), an audit of authentication and authorization events, a record of the maintenance and maintenance of vehicles. In this case, the event is considered to have happened if an entry about it is included in the log. ')
In such systems there are three groups of actors:
event sources (transactions)
block sources (transaction fixers)
recipients (readers) of blocks and committed transactions.
Depending on the implementation, these groups may overlap. In systems like BitCoin, for example, all members of a distributed system can perform all three functions. Although the creation of blocks (commit transactions) are usually responsible for the allocated computing power, and the participants controlling them are called miners (see the section on the decentralized blockchain below).
The main requirement for such journals is:
Impossibility to modify the log: after adding a transaction to the log it should be impossible to delete or change it from there.
In order to understand how you can fulfill the requirement to ban modifications, you should deal with the following questions:
How is it guaranteed that information cannot be changed inside the block?
How does the system ensure that an existing block chain cannot be regenerated, thereby correcting the information in them?
The answer to the first question is simple: you need to provide each block with a hash sum from its contents. And include this hash sum as additional useful information (also hashable) in the next block. Then in order to change something in the block without destroying the clients' trust in it, it will be necessary to do it in such a way that the hash sum from the block does not change. And this is just almost impossible if we use a cryptographically stable hash function. Or change including the hash sum of the block. But then you have to change the value of this hash sum in the next block. And this will require changes, in turn, in the hash sum of the entire second block, and then in the third, and so on. It turns out that in order to change the information in one of the blocks, it will be necessary to regenerate the entire chain of blocks, starting with the modified one. Can this be done?
Here you need to answer the question of how in such systems are protected from the possibility of regeneration of a chain of blocks. We will consider three options for systems:
centralized with a trusted center
centralized with untrusted center
decentralized version using proof of work
Centralized blockchain with trusted center
If we have a trusted center, then we simply instruct it to create a new block after a certain period of time (or through a certain set of transactions), supplying it not only with a hash sum, but also with its electronic signature. Each client of the system has the opportunity to verify that all the blocks in the chain are generated by a trusted center and none other. Assuming that the trusted center is not compromised, the attacker does not have the possibility of modifying the log.
The use of blockchain technology in this case is redundant. If we have a trusted center, you can simply access it with the goal of signing each transaction, adding time and a serial number to it. The number ensures the order and the impossibility of adding (deleting) transactions from the chain, the electronic signature of the trusted center - the impossibility of modifying specific transactions.
Central blockchain with untrusted center
An interesting case is when a dedicated center is not trusted. More precisely, it is not fully trusted. We trust him in terms of fixing transactions in the journal, but we want to be sure that the dedicated center does not regenerate the entire block chain, removing more transactions from it that it does not need or adding necessary ones.
For this you can use, for example, the following two methods.
First method using additional trusted repository. After creating the next block, the center must send the hash code from the new block to the trusted and independent from this center. Trusted storage should not accept any changes to the hash codes of already created blocks. A decentralized system database, if present, can also be used as such storage. The size of the stored information may be small compared to the total volume of the magazine.
The second possible method is to add a timestamp generated by a trusted time center to each block. Such a label should contain the time of label generation and the electronic signature of the center, calculated on the basis of the hash code of the block and the time of the label. In case the “untrusted” center wants to regenerate part of the block chain, there will be a gap in the time stamps.
It should be noted that this method does not guarantee that the “untrusted” center will not generate two chains of blocks at once, supplementing them with correct time stamps, and then will not replace the other one.
Decentralized blockchain
The greatest interest for us (and - the smallest for companies selling blockchain solutions) is the decentralized blockchain system without dedicated block generation centers. Each participant can take a set of transactions waiting to be included in the log, and form a new block. Moreover, in systems like BitCoin, such a participant (we will call him a “miner”, from English to mine - dig) will also receive a bonus in the form of a certain amount and / or commission from the transactions accepted in the block.
But you can not just take and form a block in decentralized systems. The reliability of such systems is based precisely on the fact that a new unit cannot be formed faster (on average) than in a certain time. For example, in 10 minutes (BitCoin). This is provided by a mechanism called proof of work.
The mechanism is based on the following idea. Let there be a cryptographically strong hash function h(x) and some parameter is set t (from the English. target - the goal). 0<t<2n where n - output size of the hash function in bits. The correct new blockchain network will only recognize such a one whose hash sum is less than the current specified parameter. t . In this case, the algorithm of the miner works as follows:
collect from the pool of uncommitted transactions those that fit in 1 block (1 megabyte for the Bitcoin network) and have a maximum commission (solve the backpack problem)
add to the block information about the previous block;
add information about yourself to the block (as about the author of the block, to whom to charge commissions and bonuses for the block);
to establish r to some value for example 0 ;
perform in a loop:
update value r:=r+1 ;
calculate the value h=h(block||r) ;
if a h<t , add to block r and consider the block formed, otherwise - repeat the cycle.
For each iteration of the loop, the probability of getting the correct block is t/2n . Because t usually a little, the miners need to do a large number of iterations of the cycle to find the desired r . However, only one (usually the first) of the found blocks will be considered correct. The greater the computing power of a particular miner, the greater the likelihood that he will be the first to be able to find the necessary r .
Knowing the total computing power of the blockchain network, participants can agree on such a mechanism for changing a parameter t so that the generation time of the new correct block is approximately the specified time. For example, in the Bitcoin network parameter t recalculated every 2016 blocks so that the average block generation time is 10 minutes. This allows the network to adapt to changes in the number of participants, their computing power and the emergence of new mechanisms for calculating hash functions.
In addition to setting the parameter t It is possible to operate with other quantities, one way or another related to the power of the calculations.
Hashrate is the number of hashes that are counted for a unit of time by a particular miner or network as a whole. For example, in November 2017, the total hashrate for the Bitcoin network was approximately 7.7times1018 hashes per second.
Difficulty - the difficulty of finding the correct block, expressed as d=dconst/t where dconst - some constant of complexity, and t - the current goal (eng. Target ). In contrast to the parameter t , which decreases with an increase in the computational power of the network, d changes with hashrate , which makes it easier for people to understand and analyze.
In the case of approximately simultaneous generation of the next block by two or more miners (when information about the new block is published by the second miner before the information about the new block from the first one comes), a branching occurs in the directed graph of blocks. Then each of the miners chooses one of the new blocks (for example, which one they saw first) and tries to generate a new block based on the selected one, continuing the “branch” in the graph. In the end, one of the two such chains becomes longer (the one that was chosen by a large number of miners), and it is she who is recognized as the main one.
In the case of normal system behavior, the inclusion of specific transactions in the blocks is little affected, since each of the conscientious miners follow the same algorithm for including transactions in the block (for example, in the BitCoin network, the algorithm for maximizing commission per block). However, it can be assumed that some attacker will want to “moderate” the distributed blockchain, including or not including the transaction of his choice in the blocks. Suppose that the share of the attacker's computing resources (aimed at generating a new block) is equal to p (0% < p <50%). In this case, each next generated block with probability p will be generated by the attacker's powers. This will allow him to include in the block those transactions that other miners did not want to include.
But will this allow an attacker not to include something in the transaction chain? Not. Because after his block with probability (1−p) a block of the “ordinary” miner will follow, who will gladly (with the proportional commission-reward) include all transactions in his block.
However, the situation changes if the power of the attacker is more than 50% of the network capacity. In this case, if after an attacker's block was with probability 1−p a “normal” block has been generated, an attacker can simply ignore it and continue to generate new blocks, as if it is the only miner in the network. Then if the average time of generation of one unit by all powers t then for the time T the attacker will be able to generate NE=p∗T/t , and legal users NL=(1−p)∗T/t blocks, NE>NL . Even if with some probability legal users will generate 2 blocks faster than an attacker alone, the latter will still “catch up and overtake” the “legal” chain in about t/(2p−1) . Since there is an agreement in the blockchain that the longest chain is assumed to be the current state of the network, the attacker's chain will always be perceived as correct. It turns out that an attacker can, at will, include or not include transactions in chains.
True, the attacker still will not be able to use other people's money - since all blocks of transactions are checked for internal consistency and correctness of all transactions included in the block.
In addition to the concept of "proof of work" and others are used. For example, in the “proof of stake” approach used in the Etherium and EmerCoin networks, the probability of block generation is proportional to the amount of funds in the accounts of potential creators of the new block. This is much more energy efficient compared to PoW, and, in addition, binds the responsibility for the reliability and correctness of generating new units with the size of capital (the more money we have, the less we want to endanger the system). On the other hand, this gives additional motivation to concentrate more capital in one hand, which can lead to centralization of the system.
The mechanism for making changes to the protocol
Any system should be developed. But for decentralized systems one cannot simply “turn on one switch” and make the system participants work in a new way - otherwise the system cannot be called completely decentralized. At first glance, the mechanisms and methods for making changes may look nontrivial. For example:
apologists of the system suggest changes in the rules of work
Software authors make changes to the program code, allowing you to do two things:
indicate to the system participants that they support the new change
support new change
System members download the new version and set signal flags in the new transaction blocks (or the transactions themselves) indicating their intention to support the change.
if by a certain date a certain number of blocks contain a signal flag (pay attention to the binding of the number of votes to the number of generated blocks), then the change is considered accepted, and a large part (according to the number of blocks) of the system’s participants on a certain date includes these changes
those participants who did not accept the changes, or accepted the changes despite the lack of agreement on them by the majority of the participants, in the worst case, will begin to generate their block chain, only recognizing it as correct. They will consider the main chain of blocks incorrectly generated. In fact, this will lead to duplication (branching, fork) of the system, when at some date, instead of one transaction log, there are two transactions led by different people. These journals coincide until a certain date, after which a discrepancy begins in them.
Summing up, Satoshi Nakamoto (pseudonym), the author of blockchain and bitcoin technologies, was able to offer a working decentralized mechanism in which both the system behavior itself and the changes to this system pass through an explicit or implicit search for participants' consensus. To gain control over the system as a whole, an attacker will have to gain control of at least 50% of all system capacity (in the case of PoW), and without this you can only try to limit the possibility of using the system to specific participants.
However, the technology created is not without flaws. There are estimates that using the PoW method for a bitcoin system results in energy costs comparable to the consumption of electricity by entire cities or countries. There are problems with finding a consensus - a complex mechanism for making changes, as some experts believe, can lead to growth problems (for example, due to the limited number of transactions in the block), and, in the future, to the failure to use the mechanism as outdated and unresponsive tasks.
I would like to learn from the community about what other technologies should be told to students.On the one hand, they should definitely tell you about basic things - classical cryptography and public key cryptography.But I want to give a concept about modern things, which, perhaps, will not become an extra load of knowledge in five to ten years.The current curriculum content can be found here .
Change history
2017-11-17: CC-BY license indication added
2017-11-18: Updated and expanded information on the mechanism of proof-of-work and related definitions