# A Playdate with the EVM
## What's a VM?
If you google search “Virtual Machine” you’ll get two (vague and non-exclusive) definitions:
* **System virtual machines**, which are programs that emulate a specific real machine ([QEMU](https://www.qemu.org/), [VMware](https://www.vmware.com/products/workstation-pro.html), and even emulators like [Dolphin](https://dolphin-emu.org/) and [DeSmuME](http://desmume.org/) fall into this category)
* **Process virtual machines**, which are programs that intend to allow other programs to run on multiple platforms (like the [JVM](https://en.wikipedia.org/wiki/Java_virtual_machine) or the [.NET runtime](https://en.wikipedia.org/wiki/.NET_Framework))
The EVM falls into the latter definition.
## Why a VM?
The EVM, aside from allowing smart contract execution on numerous miner architectures and environments, provides:
* Sandboxing for smart contract execution
* Ethereum specific functionality including:
* Addressing the [halting problem](https://en.wikipedia.org/wiki/Halting_problem) with the [gas mechanism](https://ethereum.org/en/developers/docs/gas/)
* Storage and memory primitives
* Execution contexts (caller, origin, balance, gas limit, etc)
* Quality of life functions (keccak256, etc)
* Cross-contract communication (call, delegatecall, return, etc)
* A simple compilation target for languages like Solidity and Vyper
## How do VMs Work?
In the simplest case, VMs are programs that consume and interpret *bytecode*. Bytecode consists of *instructions* (comprised of opcodes and arguments), which inform the VM how to proceed with computation. The VM keeps track of the specific section of the bytecode it is currently executing with a pointer called the *program counter* (PC).
There are additional complexities that can be introduced (e.g. [just-in-time compilation](https://aboullaite.me/understanding-jit-compiler-just-in-time-compiler/)) but let’s just ignore them for now.
A VM execution loop looks something like this:
> Fetch the instruction the PC points to
> Execute the instruction
> If the instruction jumps, set the PC to the new target
> Otherwise, increment the PC
In fact, we can see this exact structure in [Geth’s (go-ethereum) EVM implementation](https://github.com/ethereum/go-ethereum/blob/f05419f0fb8c5328dca92ea9fb184d082300344a/core/vm/interpreter.go#L196) (example heavily truncated):
pc = uint64(0)
op = contract.GetOp(pc)
operation := in.cfg.JumpTable[op]
res, err = operation.execute(&pc, in, callContext)
case err != nil:
return nil, err
return res, ErrExecutionReverted
return res, nil
## EVM Nitty-Gritty
Now that we understand how VMs work generally, let's dive into some specifics about the EVM.
Program VMs are typically built in one of two architectures:
* **Register-based VMs** that closely mirror the execution model of traditional instruction set architectures (e.g. x86, ARM). These primarily operate on data stored in a fixed set of memory locations called registers.
* **Stack-based VMs** that primarily operate on data stored in a [stack data structure](https://en.wikipedia.org/wiki/Stack_(abstract_data_type)) (LIFO). These are also sometimes called [stack machines](https://en.wikipedia.org/wiki/Stack_machine).
The EVM is a stack-based VM. It uses [big endian](https://en.wikipedia.org/wiki/Endianness) byte ordering for instructions, memory, and input data. [Words](https://en.wikipedia.org/wiki/Word_(computer_architecture)) in the EVM are 256 bits wide.
For a quick refresher on stacks, you typically PUSH data onto the top of it, POP data off, and apply instructions like ADD or MULT to the first few values that lay on top of it.
## Memory & Storage
The EVM state components are:
* The **Stack** - with maximum **1024 elements**. Each element is **256 bits** (1 word).
* **Memory** (volatile memory) - byte addressed linear memory.
* **Storage** (persistent memory) - key-value store of **256 bit to 256 bit pairs**.
Both the stack and memory are **volatile** (deleted after contract execution), but storage is **persistent** and stored in Ethereum’s world state (sticks around after execution).
You can think of the EVM memory like a **big array** (that’s the “linear” part). It can be addressed (indexed into) at the **byte level** (return 8 bit values) or at the **word level** (return 256 bit values).
Account storage is more like a **map**, pairing **256 bit keys** with **256 bit values** (words to words). Each location in storage is **0 initialized**.
*(diagrams adapted from EVM Illustrated)*
As mentioned earlier, the EVM uses a cost mechanism for computation called gas. It exists to prevent network spam attacks, infinite loops, and excess resource usage.
Gas is spent on a per-instruction basis with some types of instructions (like storage writes) being far more expensive than others (like math operations). [A list of opcodes with their respective gas prices can be found here](https://github.com/crytic/evm-opcodes).
## EVM Setup
There exist several implementations of the EVM specification, including the one embedded in Geth. We’ll use Geth as it is one of the most popular Ethereum clients and handily supports an EVM command line utility for easy invocation.
$ git clone https://github.com/ethereum/go-ethereum.git
$ cd go-ethereum
$ make all
This, in addition to building the rest of Geth from source, will leave you with an evm executable in the build folder. Feel free to copy this executable to wherever you’d like to work.
We’re going to be using [https://ethervm.io/](https://ethervm.io/) as our opcode reference. Make sure you keep it handy in another tab!
## Hello, EVM!
Now that we have an EVM implementation setup, let’s write some code! First, create an empty easm file to edit and open it up in your favorite editor.
$ touch hello.easm
Let’s write some **EVM assembly**, which is a text-based human readable format that will be *assembled* into EVM bytecode for us by geth’s `evm` binary.
Go ahead and run `./evm compile hello.easm`. You should get the rather intractable looking string `600260020200` out. This is your evm assembly assembled into evm bytecode (in hexadecimal format)!
Now go ahead and try `./evm --debug run hello.easm.`
#### TRACE ####
PUSH1 pc=00000000 gas=10000000000 cost=3
PUSH1 pc=00000002 gas=9999999997 cost=3
MUL pc=00000004 gas=9999999994 cost=5
STOP pc=00000005 gas=9999999989 cost=0
#### LOGS ####
The first value (`0x`) is our return value. We have not explicitly called the RETURN opcode, so this is empty.
We can also see the execution trace of our bytecode, along with the program counter value (pc), current remaining gas, and gas cost of each instruction. Additionally we can see a full view of the stack and every step of execution. Neat! Notice that the stack is shown top to bottom (index 0 is the top most element).
From this trace we can see our bytecode first pushes the value “2” on top of the stack, and then pushes another “2”. It then calls “mult” which multiplies the first two values on the stack together and pushes the result (4). Finally, it calls stop, halting execution.
## Returning a Value
Looking at the [return opcode definition](https://ethervm.io/#F3), you can see that it returns values stored in memory (the expression form is `return memory[offset:offset+length]`). So, let’s write a value to memory!
As previously mentioned, memory is sequentially addressable. We’ll use the `MSTORE8` instruction, which sets memory at the index value on the top of the stack (we’ll call this *s*) to the second value on the stack (*s+1*).
Basically we’re telling the EVM to store the 8 bit value “2” at location “0”.
Running this through the evm, we can now see a layout of memory too, with our 2 value sitting in the first byte. Nice!
00000000 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Next, let’s go ahead and try that return.
This is telling the evm “go ahead and return memory values 0 to 0+1”. Running that, we get a return value of `0x2`. Perfect!
As mentioned before, *storage* in the EVM operates as a map of 32 byte keys to 32 byte values. It is *persistent*, meaning that the current storage state sticks around even after contract exectuion completes (STOP or RETURN is called). Any subsequent runs of a contract will have read and write access to the same storage space. There is no opportunity for [data races](https://en.wikipedia.org/wiki/Race_condition) because the EVM does not currently support concurrent execution of a single contract - all transactions are executed sequentially.
We can see from the [`SSTORE` opcode definition](https://ethervm.io/#55) that it behaves very similarly to the `MSTORE` set of instructions.
The `evm` tool will give us a key-value view into memory (leading 0s truncated for legibility):
Reading from storage behaves the same way.
SLOAD pc=00000008 gas=9999977891 cost=100
STOP pc=00000009 gas=9999977791 cost=0
One thing to note is the gas cost. Our `SSTORE` cost a whopping 22100 gas! `SLOAD` cost 100. Reading and writing to storage is very expensive.
## Part 2
Part 2 to come soon!
* [EVM Illustrated](https://takenobu-hs.github.io/downloads/ethereum_evm_illustrated.pdf)
* [Ethereum Virtual Machine Opcodes Reference](https://ethervm.io/)
* [EVM Docs](https://ethereum.org/en/developers/docs/evm/)
* [Geth’s EVM implementation](https://github.com/ethereum/go-ethereum/tree/master/core/vm)
* [Stack vs Register Based VMs](https://www.codeproject.com/Articles/461052/Stack-Based-vs-Register-Based-Virtual-Machine-Arch)
* [EVM Opcodes Gas List](https://github.com/crytic/evm-opcodes)