# A Playdate with the EVM ## What's a VM? If you google search “Virtual Machine” you’ll get two (vague and non-exclusive) definitions: * **System virtual machines**, which are programs that emulate a specific real machine ([QEMU](https://www.qemu.org/), [VMware](https://www.vmware.com/products/workstation-pro.html), and even emulators like [Dolphin](https://dolphin-emu.org/) and [DeSmuME](http://desmume.org/) fall into this category) * **Process virtual machines**, which are programs that intend to allow other programs to run on multiple platforms (like the [JVM](https://en.wikipedia.org/wiki/Java_virtual_machine) or the [.NET runtime](https://en.wikipedia.org/wiki/.NET_Framework)) The EVM falls into the latter definition. ## Why a VM? The EVM, aside from allowing smart contract execution on numerous miner architectures and environments, provides: * Sandboxing for smart contract execution * Ethereum specific functionality including: * Addressing the [halting problem](https://en.wikipedia.org/wiki/Halting_problem) with the [gas mechanism](https://ethereum.org/en/developers/docs/gas/) * Storage and memory primitives * Execution contexts (caller, origin, balance, gas limit, etc) * Quality of life functions (keccak256, etc) * Cross-contract communication (call, delegatecall, return, etc) * A simple compilation target for languages like Solidity and Vyper ## How do VMs Work? In the simplest case, VMs are programs that consume and interpret *bytecode*. Bytecode consists of *instructions* (comprised of opcodes and arguments), which inform the VM how to proceed with computation. The VM keeps track of the specific section of the bytecode it is currently executing with a pointer called the *program counter* (PC). There are additional complexities that can be introduced (e.g. [just-in-time compilation](https://aboullaite.me/understanding-jit-compiler-just-in-time-compiler/)) but let’s just ignore them for now. A VM execution loop looks something like this: > Fetch the instruction the PC points to > Execute the instruction > If the instruction jumps, set the PC to the new target > Otherwise, increment the PC In fact, we can see this exact structure in [Geth’s (go-ethereum) EVM implementation](https://github.com/ethereum/go-ethereum/blob/f05419f0fb8c5328dca92ea9fb184d082300344a/core/vm/interpreter.go#L196) (example heavily truncated): ```go pc = uint64(0) for { op = contract.GetOp(pc) operation := in.cfg.JumpTable[op] res, err = operation.execute(&pc, in, callContext) switch { case err != nil: return nil, err case operation.reverts: return res, ErrExecutionReverted case operation.halts: return res, nil case !operation.jumps: pc++ } } ``` ## EVM Nitty-Gritty Now that we understand how VMs work generally, let's dive into some specifics about the EVM. Program VMs are typically built in one of two architectures: * **Register-based VMs** that closely mirror the execution model of traditional instruction set architectures (e.g. x86, ARM). These primarily operate on data stored in a fixed set of memory locations called registers. * **Stack-based VMs** that primarily operate on data stored in a [stack data structure](https://en.wikipedia.org/wiki/Stack_(abstract_data_type)) (LIFO). These are also sometimes called [stack machines](https://en.wikipedia.org/wiki/Stack_machine). The EVM is a stack-based VM. It uses [big endian](https://en.wikipedia.org/wiki/Endianness) byte ordering for instructions, memory, and input data. [Words](https://en.wikipedia.org/wiki/Word_(computer_architecture)) in the EVM are 256 bits wide. For a quick refresher on stacks, you typically PUSH data onto the top of it, POP data off, and apply instructions like ADD or MULT to the first few values that lay on top of it. ![](https://i.imgur.com/q6iEY7Z.png) ![](https://i.imgur.com/SYJBUBS.png) ## Memory & Storage The EVM state components are: * The **Stack** - with maximum **1024 elements**. Each element is **256 bits** (1 word). * **Memory** (volatile memory) - byte addressed linear memory. * **Storage** (persistent memory) - key-value store of **256 bit to 256 bit pairs**. ![](https://i.imgur.com/IM8CqHz.png) Both the stack and memory are **volatile** (deleted after contract execution), but storage is **persistent** and stored in Ethereum’s world state (sticks around after execution). You can think of the EVM memory like a **big array** (that’s the “linear” part). It can be addressed (indexed into) at the **byte level** (return 8 bit values) or at the **word level** (return 256 bit values). Account storage is more like a **map**, pairing **256 bit keys** with **256 bit values** (words to words). Each location in storage is **0 initialized**. ![](https://i.imgur.com/94oyuPz.png) *(diagrams adapted from EVM Illustrated)* ## Gas As mentioned earlier, the EVM uses a cost mechanism for computation called gas. It exists to prevent network spam attacks, infinite loops, and excess resource usage. Gas is spent on a per-instruction basis with some types of instructions (like storage writes) being far more expensive than others (like math operations). [A list of opcodes with their respective gas prices can be found here](https://github.com/crytic/evm-opcodes). ## EVM Setup There exist several implementations of the EVM specification, including the one embedded in Geth. We’ll use Geth as it is one of the most popular Ethereum clients and handily supports an EVM command line utility for easy invocation. ```shell $ git clone https://github.com/ethereum/go-ethereum.git $ cd go-ethereum $ make all ``` This, in addition to building the rest of Geth from source, will leave you with an evm executable in the build folder. Feel free to copy this executable to wherever you’d like to work. We’re going to be using [https://ethervm.io/](https://ethervm.io/) as our opcode reference. Make sure you keep it handy in another tab! ## Hello, EVM! Now that we have an EVM implementation setup, let’s write some code! First, create an empty easm file to edit and open it up in your favorite editor. ```shell $ touch hello.easm ``` Let’s write some **EVM assembly**, which is a text-based human readable format that will be *assembled* into EVM bytecode for us by geth’s `evm` binary. ``` PUSH 0x2 PUSH 0x2 MUL STOP ``` Go ahead and run `./evm compile hello.easm`. You should get the rather intractable looking string `600260020200` out. This is your evm assembly assembled into evm bytecode (in hexadecimal format)! Now go ahead and try `./evm --debug run hello.easm.` ``` 0x #### TRACE #### PUSH1 pc=00000000 gas=10000000000 cost=3 PUSH1 pc=00000002 gas=9999999997 cost=3 Stack: 00000000 0x2 MUL pc=00000004 gas=9999999994 cost=5 Stack: 00000000 0x2 00000001 0x2 STOP pc=00000005 gas=9999999989 cost=0 Stack: 00000000 0x4 #### LOGS #### ``` The first value (`0x`) is our return value. We have not explicitly called the RETURN opcode, so this is empty. We can also see the execution trace of our bytecode, along with the program counter value (pc), current remaining gas, and gas cost of each instruction. Additionally we can see a full view of the stack and every step of execution. Neat! Notice that the stack is shown top to bottom (index 0 is the top most element). From this trace we can see our bytecode first pushes the value “2” on top of the stack, and then pushes another “2”. It then calls “mult” which multiplies the first two values on the stack together and pushes the result (4). Finally, it calls stop, halting execution. ## Returning a Value Looking at the [return opcode definition](https://ethervm.io/#F3), you can see that it returns values stored in memory (the expression form is `return memory[offset:offset+length]`). So, let’s write a value to memory! As previously mentioned, memory is sequentially addressable. We’ll use the `MSTORE8` instruction, which sets memory at the index value on the top of the stack (we’ll call this *s*) to the second value on the stack (*s+1*). ``` PUSH 0x2 PUSH 0x0 MSTORE8 ``` Basically we’re telling the EVM to store the 8 bit value “2” at location “0”. Running this through the evm, we can now see a layout of memory too, with our 2 value sitting in the first byte. Nice! ``` Memory: 00000000 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ``` Next, let’s go ahead and try that return. ``` PUSH 0x1 PUSH 0x0 RETURN ``` This is telling the evm “go ahead and return memory values 0 to 0+1”. Running that, we get a return value of `0x2`. Perfect! ## Storage We won’t be digging into the implementation details of how Ethereum handles persistent storage (for that, check out this [StackExchange answer on Merkle Patricia Tries](https://ethereum.stackexchange.com/questions/6415/eli5-how-does-a-merkle-patricia-trie-tree-work) or [this fantastic article](https://rockwaterweb.com/ethereum-merkle-patricia-trees-javascript-tutorial/)). Instead we can treat storage as totally isolated on a per-contract basis. As mentioned before, *storage* in the EVM operates as a map of 32 byte keys to 32 byte values. It is *persistent*, meaning that the current storage state sticks around even after contract exectuion completes (STOP or RETURN is called). Any subsequent runs of a contract will have read and write access to the same storage space. There is no opportunity for [data races](https://en.wikipedia.org/wiki/Race_condition) because the EVM does not currently support concurrent execution of a single contract - all transactions are executed sequentially. We can see from the [`SSTORE` opcode definition](https://ethervm.io/#55) that it behaves very similarly to the `MSTORE` set of instructions. ``` PUSH 0x1337 PUSH 0x0 SSTORE ``` The `evm` tool will give us a key-value view into memory (leading 0s truncated for legibility): ``` Storage: 0: 1337 ``` Reading from storage behaves the same way. ``` PUSH 0x0 SLOAD ``` ``` SLOAD pc=00000008 gas=9999977891 cost=100 Stack: 00000000 0x0 Storage: 0000000000000000000000000000000000000000000000000000000000000000: 0000000000000000000000000000000000000000000000000000000000001337 STOP pc=00000009 gas=9999977791 cost=0 Stack: 00000000 0x1337 ``` One thing to note is the gas cost. Our `SSTORE` cost a whopping 22100 gas! `SLOAD` cost 100. Reading and writing to storage is very expensive. ## Part 2 Part 2 to come soon! ## References * [EVM Illustrated](https://takenobu-hs.github.io/downloads/ethereum_evm_illustrated.pdf) * [Ethereum Virtual Machine Opcodes Reference](https://ethervm.io/) * [EVM Docs](https://ethereum.org/en/developers/docs/evm/) * [Geth’s EVM implementation](https://github.com/ethereum/go-ethereum/tree/master/core/vm) * [Stack vs Register Based VMs](https://www.codeproject.com/Articles/461052/Stack-Based-vs-Register-Based-Virtual-Machine-Arch) * [EVM Opcodes Gas List](https://github.com/crytic/evm-opcodes)