Try   HackMD

This is a guide to understanding the EVM, its relationship with solidity, and how to use some debugging tools.

Overview

The EVM is a stack-based virtual machine with an ephemeral memory byte-array and persistent key-value storage (persisted in a Merkle tree). Elements on the stack are 32-byte words, and all keys and values in storage are 32 bytes.

Security

The EVM is a security oriented virtual machine, designed to permit untrusted code to be executed by a global network of computers. To do so securely, it imposes the following restrictions:

  • Every computational step taken in a program's execution must be paid for up front, thereby preventing Denial-of-Service attacks.
  • Programs may only interact with each other by transmitting a single arbitrary-length byte array; they do not have access to each other's state.
  • Program execution is sandboxed; an EVM program may access and modify its own internal state and may trigger the execution of other EVM programs, but nothing else.
  • Program execution is fully deterministic and produces identical state transitions for any conforming implementation beginning in an identical state.

These restrictions motivated many of the design decisions of the overall Ethereum state transition machine, and their enforcement is pervasive throughout the specification.

Here is the list from the pyethereum client, annotated with rough category names.

# schema: [opcode, ins, outs, gas]

opcodes = {

    # arithmetic
    0x00: ['STOP', 0, 0, 0],
    0x01: ['ADD', 2, 1, 3],
    0x02: ['MUL', 2, 1, 5],
    0x03: ['SUB', 2, 1, 3],
    0x04: ['DIV', 2, 1, 5],
    0x05: ['SDIV', 2, 1, 5],
    0x06: ['MOD', 2, 1, 5],
    0x07: ['SMOD', 2, 1, 5],
    0x08: ['ADDMOD', 3, 1, 8],
    0x09: ['MULMOD', 3, 1, 8],
    0x0a: ['EXP', 2, 1, 10],
    0x0b: ['SIGNEXTEND', 2, 1, 5],

    # boolean
    0x10: ['LT', 2, 1, 3],
    0x11: ['GT', 2, 1, 3],
    0x12: ['SLT', 2, 1, 3],
    0x13: ['SGT', 2, 1, 3],
    0x14: ['EQ', 2, 1, 3],
    0x15: ['ISZERO', 1, 1, 3],
    0x16: ['AND', 2, 1, 3],
    0x17: ['OR', 2, 1, 3],
    0x18: ['XOR', 2, 1, 3],
    0x19: ['NOT', 1, 1, 3],
    0x1a: ['BYTE', 2, 1, 3],

    # crypto
    0x20: ['SHA3', 2, 1, 30],
    
    # contract context
    0x30: ['ADDRESS', 0, 1, 2],
    0x31: ['BALANCE', 1, 1, 20],
    0x32: ['ORIGIN', 0, 1, 2],
    0x33: ['CALLER', 0, 1, 2],
    0x34: ['CALLVALUE', 0, 1, 2],
    0x35: ['CALLDATALOAD', 1, 1, 3],
    0x36: ['CALLDATASIZE', 0, 1, 2],
    0x37: ['CALLDATACOPY', 3, 0, 3],
    0x38: ['CODESIZE', 0, 1, 2],
    0x39: ['CODECOPY', 3, 0, 3],
    0x3a: ['GASPRICE', 0, 1, 2],
    0x3b: ['EXTCODESIZE', 1, 1, 20],
    0x3c: ['EXTCODECOPY', 4, 0, 20],

    # blockchain context
    0x40: ['BLOCKHASH', 1, 1, 20],
    0x41: ['COINBASE', 0, 1, 2],
    0x42: ['TIMESTAMP', 0, 1, 2],
    0x43: ['NUMBER', 0, 1, 2],
    0x44: ['DIFFICULTY', 0, 1, 2],
    0x45: ['GASLIMIT', 0, 1, 2],
  
    # storage and execution
    0x50: ['POP', 1, 0, 2],
    0x51: ['MLOAD', 1, 1, 3],
    0x52: ['MSTORE', 2, 0, 3],
    0x53: ['MSTORE8', 2, 0, 3],
    0x54: ['SLOAD', 1, 1, 50],
    0x55: ['SSTORE', 2, 0, 0],
    0x56: ['JUMP', 1, 0, 8],
    0x57: ['JUMPI', 2, 0, 10],
    0x58: ['PC', 0, 1, 2],
    0x59: ['MSIZE', 0, 1, 2],
    0x5a: ['GAS', 0, 1, 2],
    0x5b: ['JUMPDEST', 0, 0, 1],

    # logging
    0xa0: ['LOG0', 2, 0, 375],
    0xa1: ['LOG1', 3, 0, 750],
    0xa2: ['LOG2', 4, 0, 1125],
    0xa3: ['LOG3', 5, 0, 1500],
    0xa4: ['LOG4', 6, 0, 1875],
	
    # arbitrary length storage (proposal for metropolis hardfork)
    0xe1: ['SLOADBYTES', 3, 0, 50],
    0xe2: ['SSTOREBYTES', 3, 0, 0],
    0xe3: ['SSIZE', 1, 1, 50],

    # closures
    0xf0: ['CREATE', 3, 1, 32000],
    0xf1: ['CALL', 7, 1, 40],
    0xf2: ['CALLCODE', 7, 1, 40],
    0xf3: ['RETURN', 2, 0, 0],
    0xf4: ['DELEGATECALL', 6, 0, 40],
    0xff: ['SUICIDE', 1, 0, 0],
}

# push
for i in range(1, 33):
    opcodes[0x5f + i] = ['PUSH' + str(i), 0, 1, 3]

# duplicate and swap
for i in range(1, 17):
    opcodes[0x7f + i] = ['DUP' + str(i), i, i + 1, 3]
    opcodes[0x8f + i] = ['SWAP' + str(i), i + 1, i + 1, 3]

The table tells us how many arguments each opcode pops off the stack and pushes back onto the stack, as well as how much gas is consumed. Most opcodes take some number of arguments off the stack, and push one or no results back onto the stack. Some, like GAS and PC, take no arguments off the stack, and push the remaining gas and program counter, respectively, onto the stack. A number of opcodes, like SHA3, CREATE, and RETURN, take arguments off the stack that refer to positions and sizes in memory, allowing them to operate on a contiguous array of memory.

All arithmetic happens on big integers using elements on the stack (ie. 32-byte Big Endian integers). Currently, the only crypto operation is the SHA3 hash function, which takes an arbitrary length byte-array from memory (specified by an initial position in memory and a length) and outputs the hash on the stack. Contract and blockchain level contexts give access to various useful environmental information - for instance CALLDATACOPY will copy the input data sent to the contract (known as call-data) into memory, and NUMBER can be used to time-lock behaviour by block number.

The EVM operates on its ephemeral memory via MLOAD and MSTORE, and on its persistent storage via SLOAD and SSTORE. JUMP can be used to jump to arbitrary points in the program, where those points must be a JUMPDEST. The PUSH1-PUSH32 opcodes push anywhere from 1 to 32 bytes to the stack. The DUP1-DUP16 opcodes push a duplicate of one of the top 16 elements of the stack to the top of the stack. The SWAP1-SWAP16 opcodes swap the top element of the stack with any of the preceding 16.

The LOG opcodes enable event logging which is recorded in blocks and can be verified efficiently by light clients. Finally, CALL and CREATE allow contracts to call and create other contracts, respectively, while RETURN returns a chunk of memory from a call, and SUICIDE causes the contract to be destroyed and return all funds to a specified address.