Under the hood of the Ethereum Virtual Machine. Part 1 - Solidity basics

Lately the words "cryptocurrency" and "blockchain" come up in the news more and more often. As a result, many people are becoming interested in these technologies, and a huge number of new products are appearing with them. Projects often use smart contracts, special programs created on the Ethereum platform and living inside its blockchain, to implement internal logic or to raise funds. There is already plenty of material online about writing simple smart contracts and the basic principles, but almost nothing describes how the Ethereum Virtual Machine (hereinafter EVM) works at a lower level. In this series of articles I would like to examine the workings of the EVM in more detail.

Solidity, the language created for developing smart contracts, is relatively young: its development began only in 2014, and as a result it is still "raw" in places. In this article I will start with a general description of how the EVM works and with some distinctive features of Solidity that are needed to understand the lower-level details.

P.S. This article assumes basic knowledge of writing smart contracts and of the Ethereum blockchain in general. If you are hearing about them for the first time, I recommend familiarizing yourself with the basics first.

Memory types
- Storage
- Memory
- Stack
Data location of complex types
Transactions and message calls
Visibility
Links

Memory types

Before diving into the subtleties of the EVM, you should understand one of the most important points: where and how data is stored. This matters because the memory areas in the EVM are built very differently, so not only the cost of reading and writing data differs between them, but also the mechanisms for working with it.

Storage

The first and most expensive type of memory is storage. Each contract has its own storage, which holds all global variables (state variables); their state persists between function calls. It can be compared to a hard disk: after the current code finishes executing, everything is written to the blockchain, and the next time the contract is called we have access to all the data stored earlier.

    contract Test {
        // this variable is stored in storage
        uint some_data; // has the default value for the uint type (0)

        function set(uint arg1) {
            some_data = arg1; // some_data was changed and saved in storage
        }
    }

Structurally, storage is a key-value store in which every cell is 32 bytes in size, which strongly resembles a hash table. This memory is therefore very sparse, and we gain nothing by storing data in two neighboring cells: storing one variable in the first cell and another in the 1000th cell costs as much gas as storing them in cells 1 and 2.

 [32 bytes][32 bytes][32 bytes]...

As I said before, this type of memory is the most expensive: occupying a new cell in storage costs 20,000 gas, changing an occupied one costs 5,000, and reading costs 200. Why is it so expensive? The reason is simple: data stored in a contract's storage is written to the blockchain and remains there forever.

It is also easy to calculate the maximum amount of information a contract can store: the number of cells is 2^256 and each is 32 bytes in size, so we get 2^261 bytes! In effect, we have a kind of Turing machine, with recursive calls / jumps and almost infinite memory. More than enough to simulate another Ethereum inside it, which will in turn simulate Ethereum :)
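The cost model above can be checked with a small sketch. It is written in Python rather than Solidity because it only models the bookkeeping, not a real contract. The class `ToyStorage` and its method names are hypothetical, the gas prices are the ones quoted in this article, and the rule "a cell is new if it was never written" is a simplification of what the EVM actually does.

```python
# Toy model of contract storage: a sparse key-value map with 2**256 possible
# cells of 32 bytes each. Names are illustrative; gas prices are the ones
# quoted in the article, not a complete fee schedule.
SLOT_COUNT = 2**256
SLOT_SIZE = 32  # bytes

GAS_NEW_SLOT = 20_000     # writing to a cell that was empty
GAS_UPDATE_SLOT = 5_000   # overwriting an occupied cell
GAS_READ_SLOT = 200       # reading any cell


class ToyStorage:
    def __init__(self):
        self.slots = {}    # only occupied cells take up space
        self.gas_used = 0

    def store(self, key, value):
        assert 0 <= key < SLOT_COUNT and 0 <= value < 2**256
        # Simplification: a cell counts as "new" if it was never written.
        self.gas_used += GAS_UPDATE_SLOT if key in self.slots else GAS_NEW_SLOT
        self.slots[key] = value

    def load(self, key):
        self.gas_used += GAS_READ_SLOT
        return self.slots.get(key, 0)  # untouched cells read as zero


# Two variables in neighbouring cells...
neighbours = ToyStorage()
neighbours.store(1, 42)
neighbours.store(2, 43)

# ...and the same two variables 999 cells apart.
far_apart = ToyStorage()
far_apart.store(1, 42)
far_apart.store(1000, 43)

print(neighbours.gas_used, far_apart.gas_used)  # both runs cost the same
print(SLOT_COUNT * SLOT_SIZE == 2**261)         # capacity from the text
```

In this model the distance between keys never enters the price, which is exactly why packing data into adjacent cells brings no benefit by itself.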

Memory

The second type of memory is memory. It is much cheaper than storage, is cleared between external function calls (you can read about the types of functions in the following chapters), and is used to store temporary data: for example, arguments passed to functions, local variables, and return values. It can be compared to RAM: when the computer (in our case, the EVM) is turned off, its contents are erased.

    contract Test {
        ...
        function add(uint a, uint b) returns (uint) {
            // a and b are stored in memory
            uint c = a + b; // c has been written to memory too
            return c;
        }
    }

Internally, memory is a byte array. It starts with a size of zero but can be expanded in 32-byte chunks. Unlike storage, memory is contiguous and therefore densely packed: it is much cheaper to keep 2 variables in an array of length 2 than in an array of length 1000 that holds the same 2 variables at its ends and zeroes in the middle.

Reading or writing one machine word (which, recall, is 256 bits in the EVM) costs only 3 gas, while the cost of expanding memory grows with its current size. Using a few KB is inexpensive, but 1 MB already costs millions of gas, because the price grows quadratically.

    // fee for expanding memory to SZ
    TOTALFEE(SZ) = SZ * 3 + floor(SZ**2 / 512)

    // if we need to expand memory from x to y, it would be
    // TOTALFEE(y) - TOTALFEE(x)
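To see how quickly the quadratic term takes over, here is a minimal Python sketch of the formula above. It assumes SZ is measured in 32-byte words (the unit the Yellow Paper uses for memory size); the function names `total_fee` and `expansion_fee` are made up for this illustration.

```python
WORD_SIZE = 32  # bytes per EVM machine word


def total_fee(size_in_words):
    """Gas paid in total for memory that is size_in_words words long."""
    return size_in_words * 3 + size_in_words**2 // 512


def expansion_fee(old_words, new_words):
    """Gas paid for growing memory from old_words to new_words."""
    if new_words <= old_words:
        return 0  # memory never shrinks, and re-using it costs nothing extra
    return total_fee(new_words) - total_fee(old_words)


four_kb = 4 * 1024 // WORD_SIZE      # 128 words
one_mb = 1024 * 1024 // WORD_SIZE    # 32768 words

print(total_fee(four_kb))                 # 416
print(total_fee(one_mb))                  # 2195456
print(expansion_fee(four_kb, one_mb))     # 2195040
```

For 4 KB the linear part dominates (384 of 416 gas), while for 1 MB almost all of the roughly 2.2 million gas comes from the quadratic part.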

Stack

Since the EVM is a stack machine, it is not surprising that the last memory area is the stack. It is used for all EVM computations, and the cost of using it is similar to that of memory. It has a maximum size of 1024 elements of 256 bits each, but only the top 16 elements are directly accessible. You can, of course, move stack elements to memory or storage, but random access to deeper elements is impossible without first removing the top of the stack. If the stack overflows, contract execution is aborted, so I advise you to leave working with it to the compiler ;)
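The two limits mentioned above (1024 elements in total, 16 reachable from the top) can be sketched with a toy stack in Python. The class `ToyStack` and the exception `StackError` are hypothetical names; `dup` is only loosely modeled on the DUP1..DUP16 instructions, and nothing here executes real bytecode.

```python
class StackError(Exception):
    """Raised where the real EVM would abort contract execution."""


class ToyStack:
    MAX_DEPTH = 1024   # maximum number of elements
    MAX_REACH = 16     # how deep an instruction can look from the top

    def __init__(self):
        self.items = []

    def push(self, value):
        if len(self.items) >= self.MAX_DEPTH:
            raise StackError("stack overflow")
        self.items.append(value % 2**256)  # every element is a 256-bit word

    def pop(self):
        if not self.items:
            raise StackError("stack underflow")
        return self.items.pop()

    def dup(self, depth):
        """Copy the element at the given depth (1 = top) onto the top."""
        if not 1 <= depth <= self.MAX_REACH:
            raise StackError("only the top 16 elements are reachable")
        if depth > len(self.items):
            raise StackError("stack underflow")
        self.push(self.items[-depth])


stack = ToyStack()
for value in range(20):
    stack.push(value)

stack.dup(16)          # allowed: 16th element from the top
copied = stack.pop()

try:
    stack.dup(17)      # one element too deep
except StackError as error:
    print("dup(17):", error)

full = ToyStack()
for value in range(ToyStack.MAX_DEPTH):
    full.push(value)
try:
    full.push(0)       # element number 1025
except StackError as error:
    print("push:", error)
```

To get at an element deeper than 16, the code first has to pop or move the elements above it, which is the bookkeeping the compiler normally does for us.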

Data location of complex types

In Solidity, 'complex' types such as structs and arrays, which may not fit into 256 bits, need to be handled more carefully. Since copying them can be quite expensive, we have to think about where to store them: in memory (which is not persistent) or in storage (where all global variables live). For this purpose, arrays and structs in Solidity have an additional parameter, the 'data location'. Depending on the context, this parameter always has a default value, but it can be overridden with the keywords storage and memory. The default for function arguments is memory, for local variables it is storage (for simple types it is still memory), and for global variables it is always storage.

There is also a third location, calldata. Data there is immutable, and otherwise it behaves much like memory. Arguments of external functions are always stored in calldata.

The data location also matters because it affects how the assignment operator works: assignments between storage and memory always create an independent copy, while assigning to a local storage variable only creates a reference that points to the global variable. Assignments from memory to memory do not create a copy either.

    contract C {
        uint[] x; // the data location of x is storage

        // the data location of memoryArray is memory
        function f(uint[] memoryArray) {
            x = memoryArray; // works, copies the whole array to storage
            // var is just a shortcut that lets the compiler infer the type;
            // you can replace it with uint[]
            var y = x; // works, assigns a pointer, data location of y is storage
            y[7]; // fine, returns the 8th element of x
            y.length = 2; // fine, modifies x through y
            delete x; // fine, clears the array, also modifies y

            uint[] memory tmpArr = new uint[](3); // tmpArr is located in memory
            var z = tmpArr; // works, assigns a pointer, data location of z is memory

            // The following does not work; it would need to create a new temporary /
            // unnamed array in storage, but storage is "statically" allocated:
            // y = memoryArray;
            // This does not work either, since it would "reset" the pointer, but there
            // is no sensible location it could point to.
            // delete y;

            g(x); // calls g, handing over a reference to x
            h(x); // calls h and creates an independent, temporary copy of x in memory
            h(tmpArr); // calls h, handing over a reference to tmpArr
        }

        function g(uint[] storage storageArray) internal {}
        function h(uint[] memoryArray) internal {}
    }
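The copy-versus-reference rules from the listing can be mimicked with plain Python lists. This is only an analogy: Python has no data locations, so copying with `list(...)` stands in for a storage/memory assignment and a plain assignment stands in for a reference. The variable names mirror the Solidity example; everything else is an assumption of the sketch.

```python
# x plays the role of the state variable in storage,
# memory_array the role of the function argument in memory.
x = []
memory_array = [7, 8, 9]

# storage <- memory: an independent copy is made.
x = list(memory_array)
memory_array[0] = 100          # changing the source afterwards...
copy_is_independent = x[0] == 7    # ...does not affect the copy

# Local storage variable: only a reference to x is created.
y = x
y[1] = 55                      # writing through y...
reference_modifies_x = x[1] == 55  # ...modifies x itself

# memory <- memory: also just a reference, no copy.
tmp_arr = [0, 0, 0]
z = tmp_arr
z[2] = 1
memory_alias = tmp_arr[2] == 1

print(copy_is_independent, reference_modifies_x, memory_alias)
```

The practical consequence is the same as in the contract: a copy costs gas proportional to the size of the data, while a reference costs almost nothing but lets two names change the same data.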

Transactions and message calls

In Ethereum there are two types of accounts that share the same address space: external accounts, which are regular accounts controlled by public-private key pairs (or, more simply, people's accounts), and contract accounts, which are controlled by the code stored with them (smart contracts). A transaction is a message from one account to another (which may be the same account or the special zero account, see below) containing some data (the payload) and Ether.

With transactions between regular accounts everything is clear: they simply transfer value. When the target account is the zero account (with address 0), the transaction creates a new contract whose address is derived from the sender's address and the number of transactions the sender has sent (the account's 'nonce'). The payload of such a transaction is interpreted by the EVM as bytecode and executed, and the output is saved as the contract's code.

If the target account is a contract account, its code is executed and the payload is passed as input. Contract accounts cannot send transactions on their own, but they can send them in response to ones they receive (both from external accounts and from other contract accounts). This makes it possible for contracts to interact with each other through internal transactions (message calls). Internal transactions are identical to ordinary ones: they also have a sender, a recipient, Ether, gas, and so on, and the contract can set their gas limit when sending. The only difference from transactions created by regular accounts is that they live exclusively in the Ethereum execution environment.
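The three cases described above (plain transfer, contract creation, call into a contract) can be sketched as a toy dispatcher in Python. Everything in it is a simplification made up for this illustration: `ToyChain` and its methods are hypothetical, Python callables stand in for bytecode, gas is left out, and the new contract's address is a placeholder tuple rather than the real hash-based derivation.

```python
ZERO_ADDRESS = 0


class ToyChain:
    def __init__(self):
        self.balances = {}  # address -> Ether (arbitrary units)
        self.code = {}      # address -> callable standing in for bytecode
        self.nonces = {}    # address -> number of transactions sent

    def _transfer(self, sender, target, value):
        if self.balances.get(sender, 0) < value:
            raise ValueError("insufficient balance")
        self.balances[sender] = self.balances.get(sender, 0) - value
        self.balances[target] = self.balances.get(target, 0) + value

    def message_call(self, sender, target, value=0, payload=None):
        """Internal transaction: may be issued by running contract code."""
        self._transfer(sender, target, value)
        handler = self.code.get(target)
        if handler is None:
            return None  # regular account: only value is transferred
        return handler(self, target, sender, payload)

    def transact(self, sender, target, value=0, payload=None):
        """Transaction started by an external account."""
        if sender in self.code:
            raise ValueError("contract accounts cannot start transactions")
        nonce = self.nonces.get(sender, 0)
        self.nonces[sender] = nonce + 1
        if target == ZERO_ADDRESS:
            # Placeholder address built from sender and nonce (not a real hash).
            address = ("contract", sender, nonce)
            self.code[address] = payload()  # output of the init code is stored
            self._transfer(sender, address, value)
            return address
        return self.message_call(sender, target, value, payload)


def forwarder_init():
    """Stand-in for creation bytecode: returns the contract's runtime code."""
    def runtime(chain, self_address, caller, payload):
        # Forward the whole balance to the address given in the payload.
        chain.message_call(self_address, payload, chain.balances[self_address])
        return "forwarded"
    return runtime


chain = ToyChain()
chain.balances["alice"] = 10
forwarder = chain.transact("alice", ZERO_ADDRESS, payload=forwarder_init)
result = chain.transact("alice", forwarder, value=4, payload="bob")
print(result, chain.balances["alice"], chain.balances["bob"])
```

Note that the second transfer, from the forwarder to "bob", is never started by the contract on its own: it only happens inside the execution triggered by the transaction from "alice".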

Visibility

In Solidity there are 4 types of 'visibility' for functions and variables: external, public, internal and private. For functions the default is public. For global variables the default is internal, and external is not possible. Let's consider all the options:

External - functions of this type are part of the contract interface, which means they can be called from other contracts by means of a message call. The called contract receives a fresh copy of memory and access to the payload data, which is located in a separate area, calldata. After execution completes, the returned data is placed in a pre-allocated region of the calling contract's memory. An external function cannot be called directly from within the contract (that is, we cannot use func(), although the call this.func() is still possible). When a lot of data is passed as input, these functions can be more efficient than public ones (I will write about this below).
Internal - functions and global variables of this type can be used only within the contract itself and in contracts derived from it. Unlike external functions, they do not use message calls but work by 'jumping' through the code (the JUMP instruction). Because of this, memory is not cleared when such a function is called, which makes it possible to pass complex types stored in memory by reference (remember the example from the chapter Data location of complex types, where tmpArr is passed to function h by reference).
Public - public functions are universal: they can be called both externally (that is, they are part of the contract interface) and from inside the contract. For public global variables a special getter function is generated automatically; it has external visibility and returns the value of the variable.
Private - private functions and variables are no different from internal ones, except that they are not visible in derived contracts.

For clarity, consider a small example.

    contract C {
        uint private data;

        function f(uint a) private returns (uint b) { return a + 1; }
        function setData(uint a) { data = a; } // defaults to public
        function getData() public returns (uint) { return data; }
        function compute(uint a, uint b) internal returns (uint) { return a + b; }
    }

    contract D {
        uint local;

        function readData() {
            C c = new C();
            uint local = c.f(7); // error: member "f" is not visible
            c.setData(3);
            local = c.getData();
            local = c.compute(3, 5); // error: member "compute" is not visible
        }
    }

    contract E is C {
        function g() {
            C c = new C();
            uint val = compute(3, 5); // access to internal member (from derived to parent contract)
            uint tmp = f(8); // error: member "f" is not visible in derived contracts
        }
    }

One of the most frequent questions is "why do we need external functions if you can always use public?" Indeed, there is no case in which external cannot be replaced by public; however, as I already wrote, in some cases external is more efficient. Let's look at a specific example.

    contract Test {
        function test(uint[3] a) public returns (uint) {
            // a is copied to memory
            return a[2] * 2;
        }

        function test2(uint[3] a) external returns (uint) {
            // a is located in calldata
            return a[2] * 2;
        }
    }

Executing the public function costs 413 gas, while calling the external version costs only 281. This happens because the public function copies the array into memory, while the external function reads directly from calldata. Allocating memory is obviously more expensive than reading from calldata.

Public functions need to copy all arguments into memory because they can also be called from within the contract, which is a completely different process: as I wrote earlier, internal calls work by jumping in the code, and arrays are passed as pointers to memory. Thus, when the compiler generates code for an internal function, it expects to find the arguments in memory.

For external functions the compiler does not need to support internal calls, so it lets them read data directly from calldata, skipping the step of copying into memory.

Thus, choosing the right 'visibility' serves not only to restrict access to functions but also lets them be used more efficiently.

P.S. In the following articles I will move on to analyzing how complex types work and how they can be optimized at the bytecode level, and I will also write about the main vulnerabilities and bugs present in Solidity at the moment.

Links

Source: https://habr.com/ru/post/340928/
