If you google search “Virtual Machine” you’ll get two (vague and non-exclusive) definitions:
The EVM falls into the latter definition.
The EVM, aside from allowing smart contract execution on numerous miner architectures and environments, provides:
In the simplest case, VMs are programs that consume and interpret bytecode. Bytecode consists of instructions (comprised of opcodes and arguments), which inform the VM how to proceed with computation. The VM keeps track of the specific section of the bytecode it is currently executing with a pointer called the program counter (PC).
There are additional complexities that can be introduced (e.g. just-in-time compilation) but let’s just ignore them for now.
A VM execution loop looks something like this:
Fetch the instruction the PC points to
Execute the instruction
If the instruction jumps, set the PC to the new target
Otherwise, increment the PC
In fact, we can see this exact structure in Geth’s (go-ethereum) EVM implementation (example heavily truncated):
Now that we understand how VMs work generally, let's dive into some specifics about the EVM.
Program VMs are typically built in one of two architectures:
The EVM is a stack-based VM. It uses big endian byte ordering for instructions, memory, and input data. Words in the EVM are 256 bits wide.
For a quick refresher on stacks, you typically PUSH data onto the top of it, POP data off, and apply instructions like ADD or MULT to the first few values that lay on top of it.
The EVM state components are:
Both the stack and memory are volatile (deleted after contract execution), but storage is persistent and stored in Ethereum’s world state (sticks around after execution).
You can think of the EVM memory like a big array (that’s the “linear” part). It can be addressed (indexed into) at the byte level (return 8 bit values) or at the word level (return 256 bit values).
Account storage is more like a map, pairing 256 bit keys with 256 bit values (words to words). Each location in storage is 0 initialized.
(diagrams adapted from EVM Illustrated)
As mentioned earlier, the EVM uses a cost mechanism for computation called gas. It exists to prevent network spam attacks, infinite loops, and excess resource usage.
Gas is spent on a per-instruction basis with some types of instructions (like storage writes) being far more expensive than others (like math operations). A list of opcodes with their respective gas prices can be found here.
There exist several implementations of the EVM specification, including the one embedded in Geth. We’ll use Geth as it is one of the most popular Ethereum clients and handily supports an EVM command line utility for easy invocation.
This, in addition to building the rest of Geth from source, will leave you with an evm executable in the build folder. Feel free to copy this executable to wherever you’d like to work.
We’re going to be using https://ethervm.io/ as our opcode reference. Make sure you keep it handy in another tab!
Now that we have an EVM implementation setup, let’s write some code! First, create an empty easm file to edit and open it up in your favorite editor.
Let’s write some EVM assembly, which is a text-based human readable format that will be assembled into EVM bytecode for us by geth’s evm
binary.
Go ahead and run ./evm compile hello.easm
. You should get the rather intractable looking string 600260020200
out. This is your evm assembly assembled into evm bytecode (in hexadecimal format)!
Now go ahead and try ./evm --debug run hello.easm.
The first value (0x
) is our return value. We have not explicitly called the RETURN opcode, so this is empty.
We can also see the execution trace of our bytecode, along with the program counter value (pc), current remaining gas, and gas cost of each instruction. Additionally we can see a full view of the stack and every step of execution. Neat! Notice that the stack is shown top to bottom (index 0 is the top most element).
From this trace we can see our bytecode first pushes the value “2” on top of the stack, and then pushes another “2”. It then calls “mult” which multiplies the first two values on the stack together and pushes the result (4). Finally, it calls stop, halting execution.
Looking at the return opcode definition, you can see that it returns values stored in memory (the expression form is return memory[offset:offset+length]
). So, let’s write a value to memory!
As previously mentioned, memory is sequentially addressable. We’ll use the MSTORE8
instruction, which sets memory at the index value on the top of the stack (we’ll call this s) to the second value on the stack (s+1).
Basically we’re telling the EVM to store the 8 bit value “2” at location “0”.
Running this through the evm, we can now see a layout of memory too, with our 2 value sitting in the first byte. Nice!
Next, let’s go ahead and try that return.
This is telling the evm “go ahead and return memory values 0 to 0+1”. Running that, we get a return value of 0x2
. Perfect!
We won’t be digging into the implementation details of how Ethereum handles persistent storage (for that, check out this StackExchange answer on Merkle Patricia Tries or this fantastic article). Instead we can treat storage as totally isolated on a per-contract basis.
As mentioned before, storage in the EVM operates as a map of 32 byte keys to 32 byte values. It is persistent, meaning that the current storage state sticks around even after contract exectuion completes (STOP or RETURN is called). Any subsequent runs of a contract will have read and write access to the same storage space. There is no opportunity for data races because the EVM does not currently support concurrent execution of a single contract - all transactions are executed sequentially.
We can see from the SSTORE
opcode definition that it behaves very similarly to the MSTORE
set of instructions.
The evm
tool will give us a key-value view into memory (leading 0s truncated for legibility):
Reading from storage behaves the same way.
One thing to note is the gas cost. Our SSTORE
cost a whopping 22100 gas! SLOAD
cost 100. Reading and writing to storage is very expensive.
Part 2 to come soon!