More efficient writing of arrays to the permanent memory of a smart contract in Solidity

Recently, I had to work a little bit with the Ethereum blockchain . The idea I was working on required to store some rather large numbers of integers right in the blockchain, so that the smart contract would have easy access to them. Most of the lessons on the development of smart contracts tell us "do not store a lot of data in the blockchain, it is expensive!". But how much is “a lot” and how much does the price get too high for practical use? This I had to find out, because we couldn’t take our data off-line, the whole idea collapsed.

I am just starting to work with Solidity and EVM, so this article does not pretend to be the ultimate truth, but I couldn’t find any other materials on this topic either in Russian or in English (although it’s very bad that I didn’t find this article before ), so I hope that she can be useful to someone. Well, or in a pinch, it may be useful to me if experienced comrades tell you how and where I am wrong in it.

To begin with, I decided to quickly estimate - will we succeed? Let's take the standard, widespread type of contract - the ERC20 token. At least, such a contract stores in the blockchain the correspondence of the addresses of the people who bought the tokens to their balances. In reality, only balances are stored, each of which occupies 32 bytes (in fact, there is no sense in saving here due to the features of Solidity and EVM). A more or less successful token can easily have tens of thousands of owners, and thus we find that storing around 320,000 bytes in the blockchain is quite acceptable. And we do not need more!
')

Naive approach

Well, let's try to save our data. Most of them are 8-bit unsigned integer numbers, so we will transfer their array to a contract, and try to write them into permanent memory:

uint8[] m_test; function test(uint8[] data) public { m_test = data; }

Opanki! This function eats gas, as not in itself. Attempting to save 100 values cost us 814033 gas, 8100 gas per byte!

Let's exhale and take a step back to theory. What is the minimum cost (in gas) of storing data in the Ethereum blockchain? We must remember that the data is stored in blocks of 32 bytes. EVM can read or write only a whole block at once, so ideally, the recorded data should be packaged as efficiently as possible so that one write command can immediately save more. Because this very recording team - SSTORE - in itself costs 20,000 gas (if we write to a memory cell that we haven’t written to yet). So our theoretical minimum, ignoring all other expenses, is about 625 gas per byte. Far from the 8100 that we got in the example above! Now is the time to dig deeper and find out who is eating our gas, and how to stop it.

Our first rush should be to look at the code generated by the Solidity compiler from our single line (m_test = data), because there’s nothing more to watch. This is a good, right impulse, which will acquaint us with a terrifying fact - the compiler in this place has generated some ancient horrors that at first glance you will not understand! Looking at the listing with a quick glance, we see not only SSTORE there (which is expected), but also SLOAD (loading from the permanent memory) and even EXP (exponentiation)! In general, it all looks like a very expensive way to record data. And worst of all, it becomes quite obvious that SSTORE is called too, too often. What is going on here?

A few things. It turns out that storing 8-bit integers is almost the worst thing you can do with EVM / Solidity (the article, the link to which I cited at the beginning, speaks about this). We lose productivity (which means we pay more gas) at every step. First, when we pass an array of 8-bit values to the input of our function, each of them expands to 256 bits. That is, only on the size of the transaction data, we already lose 32 times! Nicely! However, an attentive reader will notice that the cost per saved byte is, however, only 13 times higher than the theoretical minimum, and not at 32, which means that while saving the permanent memory of the contract is not so bad. The point is this: Solidity, while still saving the data, and in the permanent memory of the contract, our 8-bit numbers will be stored as efficiently as possible, 32 pieces in each memory block. This raises the question, how does the conversion of "256-bit" unpacked numbers that came to us in the input data of the function in a packaged form? The answer is “the most foolish way I can imagine.”

If we write everything that happens in a simplified form, then our lonely line of code turns into a creepy loop:

 for(uint i = 0; i < data.length; ++i) { //      ,    256-bit  8-bit uint8 from_value = uint8(data[i]); //  32-     -        ,     uint256 to_value = get_storage_data_at_offset(m_test, i); //        (    2  ) add_byte_to_value(to_value, i % 32, from_value); //  32-      set_storage_data_at_offset(m_test, i, to_value); }

How this code looks like has almost no effect on enabling or disabling optimization (in any case, in the Solidity compiler version 0.4.24), and, as you can see, it calls SSTORE (as part of set_storage_data_at_offset) 32 times more often than necessary (once for every 8-bit number, and not once for 32 such numbers). The only thing that saves us from a complete fiasco is that re-recording in the same cell costs not 5,000, but 5,000 gas. So every 32 bytes cost us 20,000 + 5,000 * 31 = 125,000 gas, or about 4,000 gas per byte. The rest of the cost, which we saw above, comes from reading memory (also not a cheap operation) and other calculations hidden in the code above in functions (and there are a lot of them).

Well, we can not do anything with the compiler, ~~so we will look for a button~~ . It only remains to conclude that it is not necessary to transfer and store in the contract arrays of 8-bit numbers in this way.

Simple solution for 8-bit numbers

And what should? And so:

 bytes m_test; function test(bytes data) public { m_test = data; }

We operate in all fields of type bytes. With this approach, the preservation of 100 values will cost 129914 gas - just 1300 gas per byte, 6 times better than when using uint8 []! Some inconvenience will be a payment for this — elements of an array of type bytes have type bytes1, which is not automatically converted to any of the usual integer types, so you will have to place explicit type conversions in the right places. Not very nice, but the gain is 6 times at the cost of the record, I think it is worth it! And, yes, we will lose a little when working with this data later, when reading, compared to storing each number as 256-bit, but here the scale already begins to matter: the gain from saving thousands of other 8-bit numbers in packaged form can , depending on the task, outweigh the losses when reading them later.

Before coming to this approach, I first tried to write a more efficient data saving function on the local macro assembler JULIA , but I ran into some problems that caused my solution to be a little less efficient, and gave about 1530 gas consumption per byte. However, it is still useful to us in this article, so the work was not done in vain.

Additionally, I note that the more data you save at a time - the lower the cost per byte, which suggests that part of the cost is fixed. For example, if you save 3000 values, then when approaching with bytes we get 900 gas per byte.

More general solution

Well, that, all is well, that ends well, right? But our problems did not end here, because sometimes you want to write in the memory of the contract not only 8-bit numbers, but other data types that are not a direct match to the type bytes. That is, it is clear that anything can be encoded into the byte buffer, but then it may not be convenient to get it from there, and it is even expensive because of the extra gestures to convert the raw memory to the desired type. So the function that saves the passed bytes array to the array of the desired type is still useful to us. It is quite simple, but it took me a long time to find all the necessary information and get into EVM and JULIA to write it, and all this was not collected in one place. Therefore, I think it will be useful if I give here what I’ve dug.

First, let's talk about how Solidity stores an array in memory. Arrays are a concept that exist only within the framework of Solidity, EVM does not know anything about them, but simply stores a virtual array of 2 ^ 256 32-byte blocks. It is clear that empty blocks are not stored, but in fact, we have a table of non-empty blocks, the key of which is a 256-bit number. And it is this number that is taken as the input to the EVM SSTORE and SLOAD commands (this is not entirely obvious from the documentation).

To store arrays, Solidity does such a tricky thing : first, the “main” block of the array is allocated for it somewhere in the permanent memory, in the usual order of placement of contract members (or structures, but this is a separate song), as if it were The usual 256-bit number. This ensures that the array receives one complete block for itself, regardless of other stored variables. In this block, the length of the array is stored. But since it is not known in advance, and it can change (we are talking about dynamic arrays), then the authors of Solidity had to figure out where to put the data in the array so that they do not accidentally intersect with the data of another array. Strictly speaking, this is an insoluble task: if you create two arrays longer than 2 ^ 128, then they are guaranteed to intersect where you don’t place them, but in practice no one should do that, so this simple trick is used: the SHA3 hash is taken from the number of the main block of the array , and the resulting number is used as a key in the block table (which, I recall, 2 ^ 256). According to this key, the first data block of the array is placed, and the rest are sequentially after it, if necessary. The probability of collision of non-giant arrays is extremely small.

Thus, in theory, all we need to do is find where the array data is and copy the byte buffer passed to us block by block. While we are working with types smaller than half a block, we will at least a little bit, but win over the “naive” solution generated by the compiler.

Only one problem remains - if everything is done like this, then the bytes in our array will turn out backwards. Because EVM is big-endian. The easiest and most effective way is, of course, to expand the bytes when sending, but for the simplicity of the API, I decided to do this in the contract code. If you want to save a little more - feel free to throw away this part of the function, and do everything at the time of sending.

Here is the function that I got to turn an array of bytes into an array of 64-bit signed integers (however, it can be easily adapted to other types):

 function assign_int64_storage_from_bytes(int64[] storage to, bytes memory from) internal { //    .      int64,     8    (sizeof  Solidity  :( ) to.length = from.length / 8; //     ,  SHA3      uint256 addr; bytes32 base; assembly{ // keccak256   ,    ,          mstore(addr, to_slot) base := keccak256(addr, 32) } uint i = 0; for(uint offset = 0; offset < from.length; offset += 32) { //  32-     //     32  -  ,   ,     uint256 tmp; assembly{ tmp := mload(add(from, add(offset,32))) } //   .  ,     ,       . for(uint b = 0; b < 16; ++b) { uint shift = b*8; uint shift2 = (256 - (b+1)*8); uint low = (tmp & (0xFF << shift)) >> shift; uint high = (tmp & (0xFF << shift2)) >> shift2; tmp = tmp & ~( (0xFF << shift) | (0xFF << shift2)); tmp = tmp | (low << shift2) | (high << shift); } //      assembly{ sstore(add(base, i), tmp) } i += 1; } }

With 64-bit numbers, we win not so much as with 8-bit ones, compared to the code that the compiler generates, but nevertheless this function spends 718466 gas (7184 gas per number, 898 gas per byte) versus 1003225 naive solutions (1003 gas per number, 1254 per byte), which makes its use quite meaningful. And as mentioned above, you can save more by removing the address bytes to the caller.

It is worth noting that the gas limit per unit in Ethereum sets a limit on how much data we can write in one transaction. To make matters worse, writing data to an already filled array is a much more difficult task, except when the last used block of the array was filled to the limit (in which case you can use the same function, but with a different indent). At the moment, the gas limit per unit is about 6 million, which means that we can more or less save 6K of data at a time, and in fact - even less, due to other expenses.

Coming changes

The forthcoming changes in the Ethereum network in October, which will occur with the activation of Constantinople's EIPs, should lead to the fact that saving data will become easier and cheaper - EIP 1087 assumes that data storage will not be charged for each SSTORE command, but for the number of blocks changed, which will make the naive approach used by the compiler almost as beneficial as the hand-written code in JULIA (but not quite - a lot of unnecessary gestures will remain there, especially for 8-bit values). The future transition to WebAssembly as the base language of EVM will change the picture even more, but this is still a very distant prospect, and the tasks should be solved now.

This post does not claim to be the best solution to the problem posed, and I will be glad if someone offers a more effective one - so far I have just started to get used to Ethereum, and could have overlooked some EVM features that could help me. But in my searches on the network, I did not see anything about this problem, and maybe the above thoughts and code will be useful for someone as a starting point for optimization.

Source: https://habr.com/ru/post/423639/

All Articles

More efficient writing of arrays to the permanent memory of a smart contract in Solidity

Naive approach

Simple solution for 8-bit numbers

More general solution

Coming changes

More articles: