
Bitcoin: development history, from CPU to FPGA


(photo source)
Previously, we covered the basic principles behind Bitcoin mining.
This time we will look at the history of the computing systems built for mining bitcoins, along with the technological advances and difficulties encountered along the way.

CPU: The first generation of miners


If you look at the source code of a Bitcoin miner, it turns out to be surprisingly simple. The core looks something like this:
while (1) {
    HDR[kNoncePos]++;
    if (SHA256(SHA256(HDR)) < (65535 << 208) / DIFFICULTY)
        return;
}
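The loop above can be turned into a small runnable sketch. This is only an illustration under stated assumptions, not the code of any particular miner: `header_prefix`, `target_for`, `mine` and the artificially easy demo target are hypothetical names and values introduced here, and the header is a dummy 76-byte prefix followed by a 4-byte little-endian nonce.

```python
import hashlib
import struct

# Difficulty-1 target, taken from the loop above: 65535 << 208.
MAX_TARGET = 65535 << 208


def target_for(difficulty: int) -> int:
    """Target threshold for a given integer difficulty."""
    return MAX_TARGET // difficulty


def double_sha256(data: bytes) -> bytes:
    """SHA256(SHA256(data)), as in the pseudocode."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Return the first nonce whose double hash is below target, else None.

    header_prefix is a hypothetical stand-in for the 76 header bytes
    that precede the 4-byte nonce field.
    """
    for nonce in range(max_nonce):
        header = header_prefix + struct.pack("<I", nonce)
        # The digest is compared as a little-endian 256-bit integer.
        if int.from_bytes(double_sha256(header), "little") < target:
            return nonce
    return None


# Demo with an artificially easy target so the search ends quickly
# (an assumption for illustration; real targets are far smaller).
easy_target = 1 << 244
print(mine(bytes(76), easy_target, max_nonce=1 << 20))
```

With a real target from `target_for(difficulty)`, the same loop would need on the order of billions of iterations or more, which is exactly why raw hash rate matters so much.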


The SHA256 computation operates on 512-bit data blocks and comprises 64 rounds, each requiring many 32-bit additions, shifts, and bitwise operations. Each round depends on the results of the previous one, creating a chain of dependencies. Although the rounds within a single hash cannot be parallelized, each candidate hash can be checked independently of the others, which makes the overall computation embarrassingly parallel.
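To illustrate why the search parallelizes so easily, here is a minimal sketch that splits the nonce space into disjoint ranges and scans each one independently. The function names, the dummy header and the easy target are assumptions made for this example. The thread pool only demonstrates that the workers share no state; in CPython it will not give a real speedup for hashing this small.

```python
import hashlib
import struct
from concurrent.futures import ThreadPoolExecutor


def split_nonce_space(workers: int, total: int = 2**32):
    """Divide [0, total) into contiguous, non-overlapping ranges."""
    step = total // workers
    bounds = [i * step for i in range(workers)] + [total]
    return [range(bounds[i], bounds[i + 1]) for i in range(workers)]


def search(header_prefix: bytes, target: int, nonces: range):
    """Scan one range; needs nothing from any other worker."""
    for nonce in nonces:
        header = header_prefix + struct.pack("<I", nonce)
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        if int.from_bytes(digest, "little") < target:
            return nonce
    return None


def parallel_search(header_prefix: bytes, target: int, workers: int, total: int):
    """Run one independent search per range and collect the hits."""
    chunks = split_nonce_space(workers, total)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: search(header_prefix, target, c), chunks))
    return [n for n in results if n is not None]


# Hypothetical demo: dummy header, very easy target, small nonce space.
print(parallel_search(bytes(76), 1 << 248, workers=4, total=1 << 16))
```

Each worker reports at most the first hit in its own range, and no worker ever needs a result from another, which is the property GPUs and FPGAs exploit.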
Such miners reach up to 33 MH/s per processor (Core i7 990x), and there is little more to say about them.

GPU: The second generation of miners


In October 2010, the code for the first open-source OpenCL miner was published, and community enthusiasts quickly adapted and optimized it. Such miners consisted of a Bitcoin protocol implementation in a language like Java or Python and a search algorithm in an OpenCL file, which is compiled for the ISA of the target GPU.
The huge variety of OpenCL implementations stemmed from attempts to squeeze everything possible out of the compilers in pursuit of better generated code. In addition, the non-OpenCL part of the code called the OpenCL API directly in order to double-check results or adjust GPU parameters in response to temperature and user settings.
Since this equipment was expected to run for months, users began tuning it: adjusting the supply voltage (lowering it to cut mining costs, or raising it together with the frequency to increase performance), the GPU core clock, and code parameters such as the number of threads. All of this aimed to maximize throughput within acceptable limits of stability and temperature.

Since Bitcoin mining places no particular load on RAM or floating-point units, many of the critical paths in the circuitry and bottlenecks in the GPU architecture never come into play. Over time, however, the parameters may need retuning, because power and cooling systems degrade with age.
AMD GPUs tend to outperform NVidia GPUs in GH/s per dollar, thanks in part to an instruction set well suited to SHA256 and a VLIW architecture with more ALUs running in parallel at a slightly lower frequency. In particular, the shift and bit-select operations can each be implemented with a single AMD ISA instruction.
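To make the last point concrete, here is a small sketch of the two SHA256 building blocks in question. It relies on the standard definitions of the SHA256 `Ch` and `Sigma1` functions; the helper names `rotr`, `bitselect` and `big_sigma1` are chosen for this example, and nothing here models a specific GPU instruction. The point is that `Ch` is exactly a per-bit select, so hardware with a native select or rotate replaces several logical operations with one.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """32-bit rotate right, built here from two shifts and an OR."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch(x: int, y: int, z: int) -> int:
    """SHA256 Ch as written in the standard: AND, NOT, AND, XOR."""
    return ((x & y) ^ (~x & z)) & MASK32


def bitselect(mask: int, a: int, b: int) -> int:
    """Per bit: take a where mask is 1, otherwise b."""
    return (b ^ (mask & (a ^ b))) & MASK32


def big_sigma1(e: int) -> int:
    """SHA256 Sigma1: three rotations XORed together."""
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


# Ch(x, y, z) and bitselect(x, y, z) agree on arbitrary sample values.
x, y, z = 0x510E527F, 0x9B05688C, 0x1F83D9AB
print(ch(x, y, z) == bitselect(x, y, z))
```

Every one of the 64 rounds evaluates `Ch` once and several rotations, so saving a few operations per call adds up across billions of hashes.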
Integrated graphics, Intel's in particular, already offer the best performance per watt, but a CPU has a far more modest power budget: the 200W typical of a GPU is completely unacceptable for a CPU, which, after all, has more to do than graphics. In addition, one computer cannot host as many CPUs as it can video cards (more on this below), and CPUs do not depreciate as quickly. Integrated graphics is therefore not a miner's choice.

The core miner code, written in OpenCL rather than in assembler or machine code, was often patched after compilation to use GPU instructions that OpenCL does not expose directly.
An OpenCL implementation is one large block of code that first selects a set of parameters based on the thread ID and then performs all 64 hashing rounds in a single unrolled loop.

A Datacenter In My Garage


Having spent $300-600 on GPU-based mining equipment that almost literally prints money, and a lot of time tuning its parameters, the natural next thought is to scale up the computing power.

Buy another such GPU, repeat the setup, and you double your profit. In fact, if coins are mined this quickly and are rising in price just as fast, it may be worth buying ten or even twenty GPUs! True, this could end in catastrophe: with miners buying up video cards en masse, mining difficulty would skyrocket and profitability would plummet. Fortunately, the catastrophe never happened. Thanks to the rising USD/BTC rate, the video cards did pay for themselves.
GPUs were far more accessible to end users than FPGAs. Mining with them does require PC assembly skills and hours of forum reading, but you can be a complete novice in parallel programming, let alone FPGA tooling. However, GPUs have several key limitations:
1. A GPU cannot work on its own. Each GPU must be plugged into an 8x or 16x PCI-E slot, and a motherboard has relatively few of them.
2. The motherboard, processor, hard disk and RAM are barely used in GPU mining, yet they make the system more expensive, i.e. they raise the cost of mining per unit of performance. A typical user has a single PC at hand, which can take 1-2 GPUs but no more.
3. Each GPU consumes 200-300W, which very quickly exceeds the capacity of the power supply and forces an upgrade.
4. Standard cases are not designed to cool multiple GPUs, especially when “multiple” means more than two.
5. Running many GPUs quickly hits the limits of power supply, cooling and noise acceptable in most residential settings.
6. Because of implementation problems (probably in software), OpenCL may require a display to be connected to the GPU, although the technology itself imposes no such requirement.
7. A typical GPU occupies two slots in the PC case, which limits how many video cards can be installed in one PC.
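The power constraint from the list can be sketched as simple arithmetic. Only the 200-300W per GPU comes from the text. The 750W power supply, the 100W for the rest of the system and the 80% usable headroom are assumptions picked for illustration.

```python
def rig_power_watts(gpus: int, watts_per_gpu: int, host_watts: int) -> int:
    """Total draw of a rig: all GPUs plus the host (CPU, board, disk)."""
    return gpus * watts_per_gpu + host_watts


def max_gpus(psu_watts: int, watts_per_gpu: int, host_watts: int,
             headroom_pct: int = 80) -> int:
    """How many GPUs fit if only headroom_pct of the PSU rating is used."""
    usable = psu_watts * headroom_pct // 100 - host_watts
    return max(0, usable // watts_per_gpu)


# Assumed: 750 W PSU, 100 W host. From the text: 200-300 W per GPU.
print(max_gpus(750, 300, 100))      # hungry cards
print(max_gpus(750, 200, 100))      # more frugal cards
print(rig_power_watts(6, 200, 100)) # a six-GPU rig at the low end
```

Under these assumptions an ordinary PC fits one or two cards, which matches the 1-2 GPUs mentioned in the list, while a six-GPU rig already needs well over a kilowatt.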

The following solutions emerged. First, because Bitcoin mining barely uses the bandwidth of the link to the motherboard, a PCI-E 1x link is sufficient, and a GPU works fine in a 1x slot. A simple $8 cable connects a 16x GPU to a 1x slot. This does mean the video card can no longer sit inside the PC case, which prompted enthusiasts to build racks designed solely for mounting GPUs. Choosing a motherboard with many cheap 1x slots solved the problem of connecting a large number of GPUs, and abandoning the conventional case made heat removal more effective. A resistor plugged into the DVI connector successfully simulates a connected monitor, if OpenCL requires one.
With this approach, one motherboard, CPU, and set of RAM can serve 5-6 GPUs, improving the cost-effectiveness of the venture.


Examples of what a GPU-based bitcoin mining machine may look like (photo source)

Some such systems run stably for several months before stability problems appear. The reason is that the GPUs draw too much current on the 12V line, overloading the motherboard connector. The solution is to power the video cards independently, bypassing the motherboard.

Once the GPU connection problems are solved, proper power supply and cooling come to the forefront. With a single GPU consuming around 200W, the power density of such a system is comparable to, or even exceeds, that of a data center. In practice, data centers are almost never used to host GPU miners because of the associated costs and equipment certification requirements. Few homes can supply that much electrical power either, and some electricity pricing schemes can produce exorbitant bills. For these reasons, mining is most successful in warehouse space, where noise and cooling are not major problems and electricity is available at industrial rates.


A Bitcoin miner containing 69 GPUs (photo source)

FPGA: The third generation of miners


June 2011 brought the first open-source FPGA implementation of a bitcoin miner. FPGAs are well suited to the shift and bitwise operations that make up the core of the mining algorithm. An interesting challenge for FPGA developers was to create a design that could make effective use of a range of FPGAs, both high-end and low-end.

The solution turned out to be quite elegant: the miner consists of several copies of a SHA256 module that is parameterized by unrolling depth. When fully unrolled, the module creates separate hardware for each of the 64 hashing rounds, separated by registers, forming a pipeline. This implementation achieves a throughput of 1 hash/clock/module. With a smaller unrolling depth, the pipeline is shorter, and computing a complete hash takes several passes through its stages. If the FPGA is large enough, several such pipelines can fit on it. The trade-off between unrolling depth and the number of pipeline copies is a matter of optimization.
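The trade-off can be expressed as a simplified throughput model. This is a sketch under assumptions, not a description of the actual design: it treats one "hash" as one 64-round computation, ignores control overhead, and assumes the unrolling depth divides 64 evenly. The 200 MHz clock in the demo is a hypothetical value.

```python
ROUNDS = 64  # rounds per hash, from the text


def passes_per_hash(unroll_depth: int) -> int:
    """Passes through a pipeline of unroll_depth stages for one hash."""
    if unroll_depth < 1 or ROUNDS % unroll_depth != 0:
        raise ValueError("unroll depth must divide 64")
    return ROUNDS // unroll_depth


def hashes_per_clock(unroll_depth: int, copies: int) -> float:
    """Steady-state hashes completed per clock across all copies."""
    return copies / passes_per_hash(unroll_depth)


def hash_rate_mhs(clock_mhz: float, unroll_depth: int, copies: int) -> float:
    """Hash rate in MH/s for a given clock, depth and copy count."""
    return clock_mhz * hashes_per_clock(unroll_depth, copies)


# Hypothetical 200 MHz clock: one full pipeline vs. two quarter-depth ones.
print(hash_rate_mhs(200, 64, 1))
print(hash_rate_mhs(200, 16, 2))
```

In this model, a fully unrolled module yields 1 hash per clock, as stated above, while shallower pipelines trade throughput for area, so the best configuration depends on how much logic the chip has and what clock each variant can reach.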

The main problem for FPGA miners is that their power consumption is much higher than in typical FPGA applications, because the logic elements are active almost constantly during computation. As a result, most off-the-shelf FPGA boards, such as the training kits readily available to students, could provide neither sufficient power nor sufficient cooling. For high-end chips, the problem was even more acute.

As a result, specialized boards appeared that minimized cost by discarding all redundant peripherals (RAM, I/O, etc.) and were designed solely to supply the power and cooling the FPGA needed. Boards based on the Spartan XC6SLX150 FPGA reached 860 MH/s at a frequency of 215 MHz, consumed 39W, and cost $1060. A proprietary design from Butterfly Labs (BFL), a company based in Kansas, showed a similar 830 MH/s at a price of $599. The same company's top product, based on Altera FPGAs, delivered 25.2 GH/s for $15K (650-750 MH/s per chip).
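The figures quoted above are easier to compare once converted to common metrics. The sketch below only does arithmetic on the numbers in the text; the function names are chosen for this example.

```python
def usd_per_ghs(price_usd: float, rate_mhs: float) -> float:
    """Purchase price per GH/s of hash rate."""
    return price_usd / (rate_mhs / 1000)


def mhs_per_watt(rate_mhs: float, watts: float) -> float:
    """Energy efficiency in MH/s per watt."""
    return rate_mhs / watts


def hashes_per_clock(rate_mhs: float, clock_mhz: float) -> float:
    """Hashes completed per clock cycle across the whole board."""
    return rate_mhs / clock_mhz


# Spartan XC6SLX150 board: 860 MH/s, 215 MHz, 39 W, $1060.
print(round(usd_per_ghs(1060, 860), 2))
print(round(mhs_per_watt(860, 39), 2))
print(hashes_per_clock(860, 215))
# BFL: 830 MH/s for $599, and 25.2 GH/s for $15K.
print(round(usd_per_ghs(599, 830), 2))
print(round(usd_per_ghs(15000, 25200), 2))
```

The Spartan board works out to four hashes per clock, which under the 1 hash/clock/module figure corresponds to four fully unrolled pipelines' worth of throughput; how that was distributed across chips is not stated in the text.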
BFL has been, and remains, the most successful commercial bitcoin company.

Unfortunately, FPGAs struggled to compete with GPUs: the latter cost ~30% less and had better resale potential once their career as bitcoin miners was over. It did not help that GPUs were ahead of FPGAs on process technology, often using a more modern and more energy-efficient node. However, the main advantage of FPGAs is an almost fivefold reduction in power consumption, which makes them as attractive as GPUs provided they are used for a year or two. In particular, the most advanced FPGAs, such as those Intel manufactures for Altera on the latest 22nm and 14nm processes, are extremely power-efficient but relatively expensive.
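The "year or two" can be sanity-checked with a rough break-even estimate. This is a sketch under loud assumptions: it takes the $1060, 39W FPGA board from the text as the baseline, assumes a GPU setup of equal hash rate costs 30% less and draws five times the power, and assumes an electricity price of $0.15/kWh. Resale value is ignored.

```python
def break_even_days(fpga_price: float, gpu_price: float,
                    fpga_watts: float, gpu_watts: float,
                    usd_per_kwh: float) -> float:
    """Days until FPGA power savings repay its higher purchase price."""
    saved_kw = (gpu_watts - fpga_watts) / 1000
    if saved_kw <= 0 or usd_per_kwh <= 0:
        raise ValueError("FPGA must save power and electricity must cost money")
    hours = (fpga_price - gpu_price) / (saved_kw * usd_per_kwh)
    return hours / 24


FPGA_PRICE, FPGA_WATTS = 1060.0, 39.0  # from the text
GPU_PRICE = 0.7 * FPGA_PRICE           # "~30% less", assumed per equal hash rate
GPU_WATTS = 5 * FPGA_WATTS             # "almost fivefold", rounded to 5x
PRICE_KWH = 0.15                       # assumed electricity price, USD

days = break_even_days(FPGA_PRICE, GPU_PRICE, FPGA_WATTS, GPU_WATTS, PRICE_KWH)
print(round(days))
```

With these inputs, the break-even point lands at roughly a year and a half, consistent with the claim above; cheaper electricity pushes it further out, and more expensive electricity pulls it in.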


A 500 GH/s ASIC-based miner from BFL, surrounded by four 25.2 GH/s FPGA miners. Note the two power cables: the miner consumes ~2700W. (photo source: James Gibson)

The age of the FPGA was short, because a new generation of hardware, the ASIC, was born, bringing orders-of-magnitude gains in performance and efficiency. The effort put into FPGA development was not wasted, though: much of it, from the Verilog descriptions of the miner to the printed circuit board layouts, was reused.

The ASIC generation, the fourth generation of bitcoin miners, which ushered in the Age of Bespoke Silicon, is the subject for next time.

Source: https://habr.com/ru/post/205530/

