
I don't know about anyone else, but I was struck by Bitcoin's rapid rise in 2017. Of course, the excitement has faded by now, but in 2017 everyone was talking and writing about cryptocurrency.
I watched people try to make money on cryptocurrencies in every way imaginable. Some spent all their savings on video cards and started mining in their garage. Some invested in cloud mining. Some tried to set up their own pool. Someone started producing chocolate bitcoins, and someone else bottled mineral water:

I also began to study what these bitcoins actually are. I even did some research of my own on the SHA256 algorithm and wrote an article here on Habr, "Is it possible to calculate bitcoins faster, simpler, or easier?". My research into hashing algorithms is still ongoing and far from finished; maybe someday I will write a separate article about it. For now, this one.
I also tried running a bitcoin miner on an FPGA. I understood that its time had already passed, but I still wanted to get hands-on with the technology. At the end of last year, for some reason, I suddenly remembered that my Terasic DE10-Standard board with the Intel Cyclone V 5CSXFC6D6F31C6 FPGA was sitting completely idle; this is the chip with the built-in ARM processor. I thought it would be interesting to run some altcoin miner on this board. Why? I don't need to invest in equipment, since I already have it. The main thing is that the board earns more than the energy it consumes.
Choosing an altcoin was fairly simple: I looked for ready-made FPGA projects that I could adapt to my board, and there were not many of them. As far as I can tell, only a handful of people in the whole world have built FPGA mining projects and, most importantly, published them openly, for example on GitHub.
So I took the project github.com/kramble/FPGA-Blakecoin-Miner, adapted it to the Mars Rover3 board I had, and then adapted it for the DE10-Standard as well.
How I adapted the project for the Mars Rover3 board is described here. For the Cyclone V, everything is essentially the same, except that there is a separate Quartus project revision, blake_cv; my sources are here.
Unfortunately, only three Blake hash cores fit into my Cyclone V.

The FPGA is just slightly too small for a fourth core. I run the design at 120 MHz, and each core computes one Blake hash per clock cycle, so the performance of my design is 120 × 3 = 360 MH/s. Not much, to be honest; but, as I said, I already had the board and don't need to recoup its cost. Quartus reports Fmax = 150 MHz. I could try raising the clock frequency, but I'm afraid I would then have to add a fan, which would hum, and I don't care about this crypto enough to listen to a hum in my room.
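The throughput figure follows directly from the clock rate and the number of cores. Below is a minimal sketch of that arithmetic, assuming each core is fully pipelined and outputs one hash per clock as described above; the helper name `hashrate_mhs` is my own and is not part of the project.

```c
/* Aggregate hash rate in MH/s of a fully pipelined design:
 * once the pipeline is full, every core finishes hashes_per_clock
 * hashes on every clock edge. */
static double hashrate_mhs(double clock_mhz, int cores, int hashes_per_clock)
{
    return clock_mhz * cores * hashes_per_clock;
}
```

`hashrate_mhs(120, 3, 1)` gives 360 MH/s. At the reported Fmax of 150 MHz, `hashrate_mhs(150, 3, 1)` would give 450 MH/s, and a fourth core at 120 MHz would give 480 MH/s.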
The general idea of the project is as follows: the board carries a chip that contains both an FPGA and a dual-core ARM processor:

When the board boots, U-Boot first loads the FPGA configuration, then Linux starts, and the cgminer mining program runs under it. At first I thought I could set up a virtual communication channel between the ARM and the FPGA; this is actually possible, but it didn't work out that way. The thing is, cgminer talks to hardware miners over USB using the libusb library. So it was easier for me to connect the FPGA to the Linux system through an FTDI USB-to-serial converter than to go to the trouble of attaching the FPGA to the ARM bus. I have done that once before, and it was not easy.
Now my "miner" looks like this (I put a heatsink with thermal paste on the Cyclone V, otherwise it gets very hot):

To tell the truth, my main problems were with cgminer, not with the FPGA design.
The problems were as follows:
1) Which cgminer should I use as the basis for my work? And the related question: where do I connect to start mining? How are these two questions related? You might think there is no problem: just take the most recent cgminer you can find. But wait: there are 98 forks of cgminer on GitHub. They all differ somehow; which ones are good, which are bad, and which ones even work at all? That's open source for you. Every author added or fixed something for himself, or broke something, or made his own coin. Sorting it out is not easy. Eventually I found a site that links, on a single page, both to a GitHub project and to the GitHub project for the FPGA. So these two projects apparently can, and should, work together.
2) Since I used kramble's project as the basis for the FPGA side, it would of course be logical to use the cgminer patches he ships with it. But that was not without problems either. He has patches for cgminer-3.1.1 and cgminer-3.4.3. I decided it was better to take the newer one, 3.4.3, but only lost time with it. It seems the author began adapting this version but never finished, and it is quite raw. I had to go with 3.1.1, which seems to be a really old version.
3) Authors who modify cgminer in their forks for their altcoins don't keep comments and function names accurate. The word "bitcoin" turns up all over the code, even though the fork itself apparently can no longer mine bitcoin at all, only the altcoin.
4) Tests. WHERE ARE THE TESTS? I don't understand how anyone can build a complex product without tests. I didn't find any.
Honestly, even getting started was hard. Imagine you need to run some design in an FPGA, but it's not clear what it should do, how it receives data, what data, and in what form it should produce the result. Some program has to work with this FPGA design; it's not clear exactly where to get that program, but it has to detect the miner board, send it something (nobody says what), and receive something back. In what format, in what blocks, how often: nothing is documented.
In the end, by studying kramble's cgminer patches, I was able to figure out how it is supposed to work.
The usbutils.c file contains a table of devices that cgminer treats as external hardware miners on the USB bus:
```c
static struct usb_find_devices find_dev[] = {
#ifdef USE_BFLSC
	{
		.drv = DRV_BFLSC,
		.name = "BAS",
		.ident = IDENT_BAS,
		.idVendor = IDVENDOR_FTDI,
		.idProduct = 0x6014,
		/* ... (excerpt ends here) */
```
I added an entry for my FTDI FT2232H USB-to-serial converter to this table. Now, when cgminer detects a device with VendorId:ProductId = 0x0403:0x6010, it tries to work with it as if it were an Icarus board, even though it is not one.
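To illustrate what such a table entry does, here is a simplified, self-contained model of the VID:PID lookup. The struct, its field names, the `find_miner` function, and the "ICA" label are hypothetical stand-ins, not cgminer's real `struct usb_find_devices`, which carries more fields (driver, ident, and so on). Only the IDs 0x0403:0x6014 and 0x0403:0x6010 come from the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a cgminer device-table entry. */
struct usb_id_entry {
    const char *name;
    uint16_t id_vendor;
    uint16_t id_product;
};

static const struct usb_id_entry known_miners[] = {
    { "BAS", 0x0403, 0x6014 },   /* FTDI-based entry shown in usbutils.c */
    { "ICA", 0x0403, 0x6010 },   /* FT2232H entry routed to the Icarus driver */
};

/* Return the matching table entry, or NULL if the VID:PID is unknown. */
static const struct usb_id_entry *find_miner(uint16_t vid, uint16_t pid)
{
    for (size_t i = 0; i < sizeof known_miners / sizeof known_miners[0]; i++) {
        if (known_miners[i].id_vendor == vid && known_miners[i].id_product == pid)
            return &known_miners[i];
    }
    return NULL;
}
```

A match only tells the program which driver to try; whether the device really is a miner is decided later by the detection handshake in the driver.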
Next, look at driver-icarus.c; it contains the function icarus_detect_one:
```c
static bool icarus_detect_one(struct libusb_device *dev, struct usb_find_devices *found)
{
	int this_option_offset = ++option_offset;
	struct ICARUS_INFO *info;
	struct timeval tv_start, tv_finish;
	const char golden_ob[] =
	/* ... (excerpt ends here) */
```
Here is what it does. The program sends the board a known hashing task that specifies the nonce to start from, and this starting nonce is slightly smaller than the real GOLDEN nonce. So the board starts counting from that point and, within a fraction of a second, hits the GOLDEN nonce and returns it. The program receives the result, compares it with the known correct answer, and immediately knows whether this really is a hardware miner it can work with.
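The detection handshake boils down to "send known work, compare the 4-byte reply". Below is a minimal sketch of the decoding and comparison steps, assuming the board returns the nonce as 4 raw bytes in the same order as the `golden_nonce` hex string. `hex_to_bytes` and `is_golden_reply` are hypothetical helpers, not cgminer functions (cgminer has its own hex conversion routines).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode exactly out_len bytes from a hex string; false on bad input. */
static bool hex_to_bytes(const char *hex, uint8_t *out, size_t out_len)
{
    if (strlen(hex) != out_len * 2)
        return false;
    for (size_t i = 0; i < out_len; i++) {
        unsigned v = 0;
        for (int k = 0; k < 2; k++) {
            char c = hex[2 * i + k];
            v <<= 4;
            if (c >= '0' && c <= '9') v |= (unsigned)(c - '0');
            else if (c >= 'a' && c <= 'f') v |= (unsigned)(c - 'a' + 10);
            else if (c >= 'A' && c <= 'F') v |= (unsigned)(c - 'A' + 10);
            else return false;
        }
        out[i] = (uint8_t)v;
    }
    return true;
}

/* Accept the device only if its 4-byte reply equals the expected golden nonce. */
static bool is_golden_reply(const uint8_t reply[4], const char *golden_nonce_hex)
{
    uint8_t expected[4];
    if (!hex_to_bytes(golden_nonce_hex, expected, sizeof expected))
        return false;
    return memcmp(reply, expected, sizeof expected) == 0;
}
```

Decoding the 128-digit `golden_ob` string this way yields 64 bytes of work. A reply of 00 46 8b b4 passes the check against "00468bb4" but fails against "000187a2", so at most one of two different golden nonces for the same work can ever be detected.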
And here I ran into a nasty problem. The project contains C patches, a Python test program, and a Verilog testbench for the FPGA.
In the C patches, the test data looks like this:
1) patch for cgminer-3.1.1
```c
const char golden_ob[] =
	"553bf521cf6f816d21b2e3c660f29469"
	"f8b6ae935291176ef5dda6fe442ca6e4"
	"00000000000000000000000000000000"
	"00000000d1d9011caafb56522d4278bf";
const char golden_nonce[] = "00468bb4";
const uint32_t golden_nonce_val = 0x00468bb4;
```
2) patch for cgminer-3.4.3
```c
const char golden_ob[] =
	"553bf521cf6f816d21b2e3c660f29469"
	"f8b6ae935291176ef5dda6fe442ca6e4"
	"00000000000000000000000000000000"
	"00000000d1d9011caafb56522d4278bf";
const char golden_nonce[] = "000187a2";
const uint32_t golden_nonce_val = 0x000187a2;
```
So which one is right? The input data is identical, yet the declared golden nonces differ! A paradox... (I'll say up front that the wrong one is the 0x000187a2 in the cgminer-3.4.3 patch, but how much time I spent figuring that out...)
The project also has a Python test program that reads a text file, extracts data from it, and sends it to the board over the serial port. The test data there looks like this:
0000007057711b0d70d8682bd9eace78d4d1b42f82da7d934fac0db4001124d600000000cfb48fb35e8c6798b32e0f08f1dc3b6819faf768e1b23cc4226b944113334cc45255cc1f1c085340967d6c0e000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
0000007057711b0d70d8682bd9eace78d4d1b42f82da7d934fac0db4001124d6000000008fa40da64f312f0fa4ad43e2075558faf4e6d910020709bb1f79d0fe94e0416f5255cc521c085340df6b6e01000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
0000007095696e4529ae6568e4b2a0057a18e82ccf8d370bf87e358900f8ab5000000000253c6078c7245036a36c8e25fb2c1f99c938aeb8fac0be157c3b2fe34da2fa0952587a471c00fa391d2e5b02000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
000000704445e0446fcf2a84c47ce7305722c76507ba74796eaf39fe0007d44d00000000cac961f63513134a82713b172f45c9b5e5eea25d63e27851fac443081f453de1525886fe1c01741184a5c70e000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
00000070a3ac7627ca52f2b9d9a5607ac8212674e50eb8c6fb1219c80061ccd500000000ed5222b4f77e0d1b434e1e1c70608bc5d8cd9d363a59cbeb890f6cd433a6bd8d5258a0141c00b4e770777200000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
000000706c90b789e84044d5be8b2fac01fafe3933ca3735269671e90043f8d900000000d74578c643ab8e267ab58bf117d61bb71a04960a10af9a649c0060cdb0caaca35258b3f81c00b4e7b1b94201000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
00000070171d2644781cccf873ce3b6e54967afda244c47fc963bb240141b4ad00000000d56c4fbdc326e8f672834c8dbca53a087147fe0996d0c3a908a860e3db0589665258da3d1c016a2a14603a0a000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
00000070d03c78cb0bb0b41a5a2c6ce75402e5be8a705a823928a5640011110400000000028fb80785a6310685f66a4e81e8f38800ea389df7f16cf2ffad16bb98e0c4855258dda01c016a2ae026d404000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
0000007091a7eef446c4cb686aff8908ab5539d03a9ab2e975b9fe5700ed4ca9000000000f83bb385440decc66c10c0657fcd05f94c0bc844ebc744bba25b5bc2a7a557b5258e27c1c016a2a6ce1900a000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
00000070856bd0a3fda5dac9ede45137e0c5648d82e64fbe72477f5300e96aec0000000026ca273dbbd919bdd13ba1fcac2106e1f63b70f1f5f5f068dd1da94491ed0aa45258e51b1c017a7644697709000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000
That's completely different! Later I realized that this is not the data sent to the board as-is: the program extracts fields from it, converts them into a task in a particular way, and only then sends that to the board.
Even so, among the Python program's test data there is NO task resembling the one in the C code!
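To see which fields the Python script could be extracting, the first 160 hex digits of each line can be split as a standard 80-byte block header: version, previous block hash, Merkle root, time, bits, nonce. This layout is my assumption based on the line length and contents, not something the project documents, and the rest of each line is treated as padding. The sketch reads each group of 8 hex digits as one 32-bit word in the order written and does not attempt the conversion into a board task; `header_words`, `parse_word`, and `parse_header_line` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed 80-byte header layout, as 20 words of 8 hex digits each. */
struct header_words {
    uint32_t version;
    uint32_t prev_hash[8];
    uint32_t merkle_root[8];
    uint32_t time;
    uint32_t bits;
    uint32_t nonce;
};

/* Parse exactly 8 hex digits into a 32-bit word. */
static bool parse_word(const char *p, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 8; i++) {
        char c = p[i];
        unsigned d;
        if (c >= '0' && c <= '9') d = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (unsigned)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = (unsigned)(c - 'A' + 10);
        else return false;
        v = (v << 4) | d;
    }
    *out = v;
    return true;
}

/* Split the first 160 hex digits of a test line; ignore the padding after them. */
static bool parse_header_line(const char *line, struct header_words *h)
{
    uint32_t w[20];
    if (strlen(line) < 160)
        return false;
    for (int i = 0; i < 20; i++)
        if (!parse_word(line + 8 * i, &w[i]))
            return false;
    h->version = w[0];
    memcpy(h->prev_hash, &w[1], sizeof h->prev_hash);      /* words 1..8  */
    memcpy(h->merkle_root, &w[9], sizeof h->merkle_root);  /* words 9..16 */
    h->time = w[17];
    h->bits = w[18];
    h->nonce = w[19];
    return true;
}
```

For the first line this gives version 0x00000070, time 0x5255cc1f, bits 0x1c085340, and nonce 0x967d6c0e.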
Then I looked at the Verilog testbench:
```verilog
blakeminer #(.comm_clk_frequency(comm_clk_frequency)) uut
	(clk, RxD, TxD, led, extminer_rxd, extminer_txd, dip, TMP_SCL, TMP_SDA, TMP_ALERT);

// TEST DATA (diff=1) NB target, nonce, data, midstate (shifted from the msb/left end) - GENESIS BLOCK
reg [415:0] data = 416'h000007ffffbd9207ffff001e11f35052d554469e3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b;

// ALSO test starting at -1 and -2 nonce to check for timing issues
// reg [415:0] data = 416'h000007ffffbd9206ffff001e11f35052d554469e3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b;
// reg [415:0] data = 416'h000007ffffbd9205ffff001e11f35052d554469e3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b;

reg serial_send = 0;
wire serial_busy;
reg [31:0] data_32 = 0;
reg [31:0] start_cycle = 0;

serial_transmit #(.comm_clk_frequency(comm_clk_frequency), .baud_rate(baud_rate)) sertx
	(.clk(clk), .TxD(RxD), .send(serial_send), .busy(serial_busy), .word(data_32));
```
This is the data packet the board is presumably supposed to receive. But, again, it looks nothing like the packet in the C code or the data for the Python test program.
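The 416-bit constant in the testbench can at least be split into its parts. The testbench comment lists the fields as target, nonce, data, midstate, starting from the MSB; assuming widths of 32, 32, 96, and 256 bits (which add up to exactly 416), a sketch of the split looks like this. The struct and function names are hypothetical.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed field layout of the testbench word, MSB (leftmost digit) first. */
struct tb_packet {
    uint32_t target;       /* 32 bits  */
    uint32_t nonce;        /* 32 bits  */
    uint32_t data[3];      /* 96 bits  */
    uint32_t midstate[8];  /* 256 bits */
};

static bool parse_tb_packet(const char *hex, struct tb_packet *p)
{
    uint32_t w[13];                      /* 13 x 32 = 416 bits */
    if (strlen(hex) != 104)              /* 416 bits = 104 hex digits */
        return false;
    for (int i = 0; i < 13; i++)
        if (sscanf(hex + 8 * i, "%8" SCNx32, &w[i]) != 1)
            return false;
    p->target = w[0];
    p->nonce = w[1];
    memcpy(p->data, &w[2], sizeof p->data);          /* words 2..4  */
    memcpy(p->midstate, &w[5], sizeof p->midstate);  /* words 5..12 */
    return true;
}
```

With the GENESIS BLOCK constant this gives target 0x000007ff and nonce 0xffbd9207. The two commented-out variants differ only in the nonce field (0xffbd9206 and 0xffbd9205), which matches the note about starting one and two nonces earlier.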
The lack of shared test data across the Python, C, and Verilog parts really spoils the picture. There are no common reference points between the components and no common tests, which is a pity.
On top of that, deep inside the blakecoin miner project lurked one more outright torment for me.
If you simulate the design with the Verilog testbench, then with this test data, 416'h000007ffffbd9207ffff001e11f35052d5544 ..., the simulator finds and returns the GOLDEN nonce just fine.
Then I compile the design for the real FPGA board, send the same data from the Python program, and... the board does not find the GOLDEN nonce.
It turns out the test data in the Verilog testbench is “slightly wrong”: it is for a low difficulty, where the resulting hash has only 24 leading zero bits instead of the required 32.
The file experimental/LX150-FourPiped/BLAKE_CORE_FOURPIPED.v contains this code:
```verilog
reg gn_match_d = 1'b0;
always @(posedge clk)
`ifndef SIM
	gn_match_d <= (IV7 ^ b76 ^ d74) == 0;
`else
	gn_match_d <= (IV7[23:0] ^ b76[23:0] ^ d74[23:0]) == 0;
`endif
```
So the simulator does not check the result the way the hardware does! On a real FPGA board the design checks for 32 zero bits, while in simulation it checks only 24. Just lovely. I wanted to beat the author up.
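The effect of the `ifdef` is easy to model in C. Below, `gn_match_hw` and `gn_match_sim` are hypothetical names for the two branches; the XOR-and-compare logic and the 24-bit mask are taken from the Verilog above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Synthesis branch: all 32 bits of IV7 ^ b76 ^ d74 must be zero. */
static bool gn_match_hw(uint32_t iv7, uint32_t b76, uint32_t d74)
{
    return (iv7 ^ b76 ^ d74) == 0;
}

/* SIM branch: only bits [23:0] are compared, bits [31:24] are ignored. */
static bool gn_match_sim(uint32_t iv7, uint32_t b76, uint32_t d74)
{
    return ((iv7 ^ b76 ^ d74) & 0x00ffffffu) == 0;
}
```

Any result such as 0x5a000000, whose low 24 bits are zero but whose top byte is not, satisfies the simulation check and fails the hardware check. That is exactly how a testbench can report a GOLDEN nonce that the board never finds.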
Of course, I beat it all in the end. At least the Python test program now prints a cheerful message:

So, what's the result? How much did I mine? Unfortunately, nothing at all.
Just as I was ready to start mining, literally at the end of January, the Blake difficulty rose sharply:

Now I could leave the board running for a whole day, and although it did find solutions, the pool did not accept them: they still had too few leading zeros.
I tried switching to another coin, VCASH. With it, the pool at least sometimes gave me encouraging messages like this:

But the VCASH pool still doesn't credit me anything. Sad.
I'd like to take this opportunity to ask knowledgeable people a question. I have an Nvidia GTX 1060 video card. On Blakecoin it does 1.25 GH/s, and two or three times an hour it finds a nonce that the pool accepts (and credits me a tiny amount). I figured that if my FPGA board does 360 MH/s, that is, roughly three and a half times slower than the video card, then I should get at least one nonce accepted by the pool every couple of hours. But that doesn't happen. Not a single penny in a whole day. Where the catch is remains a mystery to me.
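The expectation in this question can be put into numbers. Below is a minimal sketch, assuming the pool assigns the same share difficulty to both devices (many pools adjust it per connection, so this may not hold) and taking 2.5 shares per hour as the midpoint of "two or three". The function name is hypothetical.

```c
/* Expected accepted shares per hour for a device with hash rate `rate`,
 * scaled linearly from a reference device. Valid only if both devices
 * mine at the same share difficulty on the same pool. */
static double expected_shares_per_hour(double ref_rate,
                                       double ref_shares_per_hour,
                                       double rate)
{
    return ref_shares_per_hour * rate / ref_rate;
}
```

With 1250 MH/s and 2.5 shares per hour as the reference, 360 MH/s gives about 0.72 shares per hour, roughly one every 1.4 hours. Under a Poisson model, a full day with no shares at that rate has probability e^(-0.72 × 24) = e^(-17.28), about 3 × 10^-8, so under these assumptions bad luck alone is an unlikely explanation.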
Now, in my spare time, I'm trying to figure out whether the existing FPGA design can be optimized somehow, for example by using the on-chip memory or something else. If I'm lucky, maybe I'll come up with something.