Simple Subroutines for the EVM

--- eip: 2315 title: Simple Subroutines for the EVM description: Two opcodes for efficient, safe, and static subroutines. author: Greg Colvin (@gcolvin), Martin Holst Swende (@holiman), Brooklyn Zelenka (@expede), John Max Skaller <skaller@internode.on.net> discussions-to: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941 status: Review type: Standards Track category: Core created: 2019-10-17 requires: 3540, 3670, 4200 --- ## Abstract This proposal provides a _complete_, _efficient_, _safe_ and _static_ control-flow facility. It introduces two new opcodes to support calling and returning from subroutines: * `RJUMPSUB relative_offset` -- relative jump to subroutine * `RETURNSUB` -- return to `PC` after most recent `RJUMPSUB`. It depends on the two new opcodes proposed by [EIP-4200](./eip-4200.md) to support static jumps: * `RJUMP relative_offset` — relative jump to `PC + relative_offset` * `RJUMPI relative_offset` — conditional relative jump It deprecates `JUMP` and `JUMPI`, allowing valid code to support streaming, one-pass, and other near-linear compilers. In concert with [EIP-3540](./eip-3540.md) and [EIP-3670](./eip-3670.md) it ensures, at initialization time, that valid code will not execute invalid instructions or jump to invalid locations, will not underflow stack, will maintain consistent numbers of inputs and outputs for subroutines, and will have bounded stack height in the absence of recursion. This is among the simplest possible proposals that meets these requirements. ## Motivation ### We need a complete control-flow facility. Jumps, conditional jumps and subroutines were proposed by Alan Turing in 1945 as a means of organizing the logic of the code and the design of the memory crystals for his Automatic Computing Engine: > We wish to be able to arrange that sequences of orders can divide at various points, continuing in different ways according to the outcome of the calculations to date... We also wish to be able to arrange for the splitting up of operations into subsidiary operations... To start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation. > > — [Alan Turing](https://academic.oup.com/comjnl/article/20/3/269/751954) — *in B.E. Carpenter, R.W. Doran, "The other Turing machine." The Computer Journal, Volume 20, Issue 3, January 1977. [DOI: 10.1093/comjnl/20.3.269](https://doi.org/10.1093/comjnl/20.3.269)* In more contemporary terms, we have sequences of instructions, jumps and conditional jumps that divide sequences into blocks, subroutine calls, and a stack of addresses to return to. The details vary, but similar facilities have proven their value across a long line of important machines over the last 75 years, including most all of the machines we have programmed or implemented -- physical machines including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP-11 and VAX, Motorola 68000, Sun SPARC, and Intel x86s, as well as virtual machines for Scheme, Forth, Pascal, Java, Wasm, and others. Unlike these machines, the Ethereum Virtual Machine _does not_ provide subroutine operations. Instead, they must be synthesized using the dynamic `JUMP` instruction, which takes its destination on the stack. Further, the EVM provides _only_ dynamic jumps, impeding the static analysis we need. ### We need efficient control-flow. Efficient to write by hand, compile from high level labguages, validate at deploy time, interpret by VMs, and compile to machine code. Static jumps, conditional jumps, and subroutines are sufficient and efficient in space and time, as shown by historical experience and as we will show for the EVM below. ### We need safe control-flow. The EVM has unusually high requirements for safety. Not only do many smart contracts control inordinately large amounts of valuable Ether, but once placed on the blockchain any defects are visible to attackers and cannot be repaired. We need to statically validate important safety constraints on code at initialization time. ### We need static control-flow. The EVM's dynamic jumps cause two major problems. First, the need to synthesize static jumps and subroutines with dynamic jumps wastes space and gas with needlessly complex code, as we will show below. Worse, jumps that can dynamically branch to any destination in the code can cause quadratic "path explosions" when traversing the flow of control. For Ethereum, this is a denial-of-service vulnerability that prevents us, at initialization time, from validating the safe use of EVM code and from compiling EVM code to machine code. We need static control-flow to support streaming, one-pass, and other near-linear compilers of EVM bytecode to machine code. ## Specification ### Opcodes #### `RJUMPSUB (0x5f) relative_offset` Transfers control to a subroutine. 1. Decode the `relative_offset` from the immediate data at `PC`. 2. Push the current `PC + 3` to the `return stack`. 3. Set `PC` to `PC + relative_offset`. The `relative_offset` is relative to the current `PC`. The offset is encoded as a two-byte, twos-complement signed integer, stored MSB-first. The gas cost is _low_. #### `RETURNSUB (0x5e)` Returns control to the caller of a subroutine. 1. Pop the `return stack` to `PC`. The gas cost is _verylow_. _Notes:_ * _Values popped off the `return stack` do not need to be validated, since they are alterable only by `RJUMPSUB` and `RETURNSUB`._ * _The description above lays out the semantics of these instructions in terms of a `return stack`. But the actual state of the `return stack` is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code `RJUMPSUB` to unobservably push `PC` on the `return stack` rather than `PC + 1`, which is allowed so long as `RETURNSUB` observably returns control to the `PC + 3` location.)_ ### Validity If the execution of an instruction would violate a condition, then the execution is in an exceptional halting state. The Yellow Paper defines five such states. 1. Insufficient gas 2. More than 1024 stack items 3. Insufficient stack items 4. Invalid jump destination 5. Invalid instruction We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state. In practice, we must test at runtime for conditions 1 and 2 — sufficient gas and sufficient stack. We don’t know how much gas there will be, we don’t know how deep a recursion may go, and analysis of stack depth even for non-recursive programs is nontrivial. All of the remaining conditions MUST be validated statically, in time and space quasi-linear in the size of the code. #### Static Constraints on Valid Code * Every instruction MUST be valid: * The `JUMP` and `JUMPI` instructions ARE NOT valid. * Every jump MUST be valid: * The `RJUMP`, `RJUMPI`, or `RJUMPSUB` instructions MUST NOT address immediate data or addresses outside of their code section. * The stacks MUST be valid: * The number of items on the `data stack` MUST always be positive. * The number of items on the `return stack `MUST always be positive. * The data stack MUST be consistently aligned: * The data stack height is * the absolute difference between the current `stack pointer` and the `stack pointer` on entry to the current subroutine. * It MUST be the same for every reachable path through a given `PC` and * MUST NOT exceed 1024. ## Rationale This is a purely semantic specification, placing no constraints on the syntax of code sections beyond being a sequence of opcodes and immediate data – a subroutine is not a contiguous sequence of bytecode, it is a subgraph of the bytecode's control-flow graph. The EVM is a simple state machine. We only promise that valid code will not, as it were, jam up the gears of the machine. By avoiding syntactic constraints we allow for optimizations like tail call elimination, multiple-entry subroutines, moving "cold" code out of line, and other ways of reducing redundancy and keeping "hot" code in cache. Since we wish to support one-pass compilation of EVM code to machine code it is crucial that the EVM code be as well optimized as possible up front. ### Validation Rather than enforce constraints via syntax, we enforce them via validation. The constraints on valid code cover all of the exceptional halting states that we can validate and allow code to be validated and compiled in time and space quasi-linear in the size of the code. The `RJUMP`, `RJUMPI` and `RJUMPSUB` instructions take their *relative_offset* as immediate arguments, which cannot change at runtime. Having constant destinations for all jumps means that all jump destinations can be validated at initialization time, not runtime. Dynamic jumps can branch to any destination in the code, so exploitable quadratic "path explosions" are possible when traversing the control flow graph -- the time and space needed can go up quadratically with the size of the code. Deprecating `JUMP` and `JUMPI` prevents this. Requiring a consistently aligned `data stack` * prevents stack underflow * ensures that all calls to a subroutine have the same number of inputs and the same number of outputs and * ensures that stack height is bounded in the absence of recursion. Requiring a consistently aligned `data stack` also allows some algorithms that traverse the control-flow graph -- including code validation and compilation -- to break cycles at joins, again preventing quadratic path explosion. When a traversal gets to a `PC` it has visited before it is either at the beginning of a loop or the entry to a function -- and the stacks will align. So we know that loops will not grow stack, and that the number of arguments to a subroutine will always be the same -- there is no need to traverse that path again. _Note: The JVM and Wasm enforce similar constraints for similar reasons._ ### Alternative Designs There are a few major designs for a subroutine facility, two of which are considered here. The others are mostly not appropriate for the EVM, such as the Wheeler Jump -- self-modifying code that writes return addresses into called subroutines. *1. Keep return addresses on a dedicated return stack.* Turing's design is often used by stack machines, including those for Forth, Java, Wasm, and others. The data stack is used for computation, with a dedicated stack for return addresses. A single instruction suffices to call, and another to return from a routine. *2. Keep return addresses on the data stack.* This design is often used by register machines, including those from CDC, IBM, DEC, Intel, and ARM. The registers are used primarily for computation, and the stack maintains call frames for return addresses, arguments, and local variables. On the EVM there are no registers for computation, so using the stack for both purposes can cause the sort of inefficiencies we saw above. Pascal p-code does use this design, but as part of a complex calling convention with dedicated registers. #### We prefer the dedicated return stack. * It maintains a clear separation between calculation and flow of control: * the data stack is free of vulnerable return addresses and * it's impossible to overwrite the return stack. * It improves efficiency: * it uses native arithmetic rather than 256-bit EVM instructions for the return address, * doesn't use up a `data stack` slot for the return address and * needs less motion of 256-bit data on the stack. ### Efficiency We illustrate here how subroutine instructions can be used to reduce the complexity and gas costs of both ordinary and optimized subroutine calls compared to using `JUMP`. #### **Simple Subroutine Call** Consider these examples of a fairly minimal subroutine, including the code to call it. Subroutine call, using `RJUMPSUB`: ``` SQUARE: dup1 ; 3 gas mul ; 5 gas returnsub ; 3 gas CALL_SQUARE: push 0x02 ; 3 gas rjumpsub SQUARE ; 5 gas ``` _Total gas: 19_ Subroutine call, using `JUMP`: ``` SQUARE: jumpdest ; 1 gas swap1 ; 3 gas dup1 ; 3 gas mul ; 5 gas swap1 ; 3 gas jump ; 8 gas CALL_SQUARE: jumpdest ; 1 gas push 0x02 ; 3 gas push RTN_CALL: ; 3 gas push SQUARE ; 3 gas jump ; 8 gas RTN_CALL: jumpdest ; 1 gas ``` _Total: 41 gas_. Using `RJUMPSUB` versus `JUMP` saves _41 - 19 = 22 gas_ — a _54%_ improvement. #### **Tail Call Optimization** Of course in cases like this one we can optimize the tail call, so that the return from `SQUARE` actually returns from `TEST_SQUARE`. Tail call optimization, using `RJUMPSUB` and `RETURNSUB`: ```SQUARE: dup1 ; 3 gas mul ; 5 gas returnsub ; 3 gas CALL_SQUARE: push 0x02 ; 3 gas rjump SQUARE ; 3 gas ``` _Total: 17 gas_ Tail call optimization, using `JUMP`: ``` SQUARE: jumpdest ; 1 gas swap1 ; 3 gas dup1 ; 3 gas mul ; 5 gas swap2 ; 3 gas jump ; 8 gas CALL_SQUARE: jumpdest ; 1 gas push 0x02 ; 3 gas push SQUARE ; 3 gas jump ; 8 gas ``` _Total: 38 gas_ Using `RJUMPSUB` versus `JUMP` saves _38 - 17 = 21 gas_ — a _55%_ improvement. #### Efficiency Caveats We can see that these instructions provide a simpler and more gas-efficient subroutine mechanism than using `JUMP` — in our examples they cut gas use by about half. Clearly, the benefits of this efficiency are greater for programs that have been factored into smaller subroutines. How small? Wrapping code in a subroutine costs only _8 gas_ using `RJUMPSUB` and `RETURNSUB` versus _30 gas_ using `JUMP`, `PUSH` and `SWAP` as above. ### Costs The _low_ cost of `RJUMPSUB` versus the _mid_ cost of `JUMP` is justified by needing only to decode the immediate two byte destination to the `PC` and push the return address on the `return stack`, all using native arithmetic, versus using the data stack with emulated 256-bit instructions. The _verylow_ cost of `RETURNSUB` is justified by needing only to pop the `return stack` into the `PC`. Benchmarking will be needed to tell if the costs are well-balanced. ## Backwards Compatibility These changes affect the semantics of existing EVM code: bytes that would have been interpreted as valid jump destinations may now be interpreted as immediate data. Since this proposal depends on the Ethereum Object Format to signal the change this is not a practical issue. ## Test Cases ### Simple routine This should jump into a subroutine, back out and stop. Bytecode: `0x5f0003005e` (`RJUMPSUB 3, RETURNSUB, STOP`) | Pc | Op | Cost | Stack | RStack | |-------|-------------|------|-----------|-----------| | 0 | RJUMPSUB | 5 | [] | [] | | 2 | STOP | 0 | [] | [] | | 3 | RETURNSUB | 3 | [] | [] | Output: 0x Consumed gas: `10` ### Two levels of subroutines This should execute fine, going into one two depths of subroutines Bytecode: `0x5f00045F00025200` (`RJUMPSUB 4, RJUMPSUB 2, RETURNSUB, RETURNSUB, STOP`) | Pc | Op | Cost | Stack | RStack | |-------|-------------|------|-----------|-----------| | 0 | RJUMPSUB | 5 | [] | [] | | 3 | RJUMPSUB | 5 | [] | [] | | 4 | RETURNSUB | 5 | [] | [] | | 5 | RETURNSUB | 5 | [] | [] | | 6 | STOP | 0 | [] | [] | Consumed gas: `20` ### Failure 1: invalid jump This should fail, since the given location is outside of the code-range. Bytecode: `0X5fff`(`RJUMPSUB -1`) | Pc | Op | Cost | Stack | RStack | |-------|-------------|------|-----------|-----------| | 0 | RJUMPSUB | 10 | [] | [] | ``` Error: at pc=0, op=RJUMPSUB: invalid jump destination ``` ### Failure 2: shallow `return stack` This should fail at first opcode, due to shallow `return_stack` Bytecode: `0x5e` (`RETURNSUB`) | Pc | Op | Cost | Stack | RStack | |-------|-------------|------|-----------|-----------| | 0 | RETURNSUB | 5 | [] | [] | ``` Error: at pc=0, op=RETURNSUB: invalid retsub ``` ### Subroutine at end of code In this example the RJUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' _after_ the bytecode, and not exit with error Bytecode: `0x5c035e5ff` (`RJUMP 3, RETURNSUB, RJUMPSUB -1`) | Pc | Op | Cost | Stack | RStack | |-------|-------------|------|-----------|-----------| | 0 | RJUMP | 5 | [] | [] | | 2 | RETURNSUB | 5 | [] | [] | | 3 | RJUMPSUB | 5 | [] | [] | | 5 | STOP | 0 | [] | [] | Consumed gas: `15` ## Reference Implementation The following is a pseudo-Python implementation of an algorithm for predicating code validity. An equivalent algorithm must be run at initialization time. This algorithm performs a symbolic execution of the program that recursively traverses the _code_, emulating its control flow and stack use and checking for violations of the rules above. It runs in time equal to `O(vertices + edges)` in the program's control-flow graph, where edges represent control flow and the vertices represent _basic blocks_ — thus the algorithm takes time proportional to the size of the _code_. It maintains a stack of continuations for conditional jumps, the size of which is at most proportional to the size of the _code_. ### Validation Function For simplicity's sake we assume that all jumpdest analysis and prior validation has been done, including EIP-3540, EIP-3670, and EIP-4200, so EOF headers and sections are well-formed, and there are no invalid instructions or jumps. In practice, all passes of validation can be folded into a single loop no recursion. We also assume some helper functions. * `is_valid(opcode)` returns true iff opcode is valid. * `is_terminator(opcode)` returns true iff opcode is terminator. * `is_valid_jumpdest(pc)` returns true iff `pc` is a valid jump destination. * `immediate_data(pc)` returns the immediate data for the instruction at `pc`. * `immediate_size(opcode)` returns the size of the immediate data for an opcode. * `removed_items(opcode)` returns the number of items removed from the `data_stack` by the `opcode`. * `added_items(opcode)` returns the number of items added to the `data_stack` by the `opcode`. ``` # returns true iff code is valid def validate_code(code: bytes, pc: int, sp: int, bp: int) -> boolean: continuations = [] do while pc < len(code): opcode = code[pc] if !is_valid(opcode): return false if is_terminator(opcode): return true # check stack height and return if we have been here before stack_height = sp - bp if stack_height > 1024 return false if pos in stack_heights: if stack_height != stack_heights[pos]: return false return true else: stack_heights[pos] = stack_height if opcode == RJUMP: # reset pc to destination of jump jumpdest = immediate_data(pc) pc += jumpdest if !is_valid_jumpdest(pc) return false elif opcode == RJUMPI: jumpdest = pc + immediate_data(pc) if !is_valid_jumpdest(pc) return false # continue true side of conditional later continations.push((jumpdest, sp, bp)) # continue false side of conditional now elif opcode == RJUMPSUB: # will enter subroutine at destination bp = sp # push return address and reset pc to destination jumpdest = pc + immediate_data(pc) if !is_valid_jumpdest(pc) return false push(return_stack, pc + 3) pc = jumpdest continue elif opcode == RETURNSUB: # will return to subroutine at destination bp = sp # pop return address and check for preceding call pc = pop(return_stack) if code[pc - 3] != RJUMPSUB: return false # apply instructions to stack sp -= removed_items(opcode) if sp < 0 return false sp += added_items(opcode) # Skip opcode and immediate data pc += 1 + immediate_size(opcode) while (pc, sp, bp) = continuations.pop() return true ``` ## Security Considerations These changes introduce new flow control instructions. They do not introduce any new security considerations. This EIP is intended to improve security by validating a higher level of safety for EVM code deployed on the blockchain. The validation algorithm must be quasi-linear in time and space to not be a denial of service vulnerability. The algorithm here makes one linear-time pass of the bytecode, and uses a stack of continuations that cannot exceed the number of `RJUMPI` instructions in the code. ## Copyright Copyright and related rights waived via [CC0](../LICENSE.md).