Try   HackMD

Abstract

This EIP proposes to validate at initialization time that contracts will not underflow stack, jump to invalid locations, or execute invalid instructions. This further implies that subroutines are always called with and return the same number of argument and that looping will not change the stack height. Linear-time validation algorithms are required and one is provided.

Motivation

Safety

For our purposes we define a safe EVM contract as one that cannot encounter an exceptional halting state. Unsafe code can attempt to overflow stack, underflow stack, execute invalid instructions, and jump to invalid locations. Unsafe contracts are exploits waiting to happen.

From the standpoint of security it would be best if unsafe contracts were never placed on the blockchain.

Validity

We cannot guarantee complete safety we must allow valid contracts to overflow stack or run out of gas. But we guarantee that valid contracts cannot execute invalid instructions, jump to invalid locations, or underflow stack.

To prevent denial of service attacks all control flow must be validated at initialization time, and in time and space linear in the size of the code. We present an algorithm for doing so below.

Static Jumps and Subroutines

The relative jumps of EIP-4200 and the simple subroutines of EIP-2315 provide a complete set of static control flow instructions:

RJUMP offset

  • Jumps to IP+offset.

RJUMPI offset

  • Jumps if the top of stack is non-zero.

RJUMPSUB offset

  • Pushes IP+1 on the return stack and jumps to IP+offest.

RETURNSUB

  • Pops an address off the return stack and jumps to it

Note that each jump creates at most two paths of control through the code, such that the complexity of traversing the entire control-flow graph is linear in the size of the code.

Dynamic Jumps

JUMP or JUMPI also provide a complete set of control flow instructions. But they are an obstacle to proving validity in linear time jumps can be to any destination in the code, creating a quadratic "path explosion". We propose to deprecate JUMP and JUMPI.

Performance

Validating control flow at initialization time has potential performance advantages, as checking for invalid jumps and stack underflow does not need to be done at runtime.

Specification

This EIP does not introduce any new opcodes. Rather, it restricts the use of proposed control-flow instructions. These restrictions must be validated at contract initialization time – not at runtime – by the provided algorithm or its equivalent. This algorithm must take time and space linear in the size of the contract, so as not to be a denial of service vulnerability.

This specification is entirely semantic it imposes no further syntax on bytecode. Ethereum Virtual Machine bytecode is just that a sequence of bytes that when executed causes a sequence of changes to the machine state. The safety we seek here is simply to not, as it were, jam up the gears.

Validity

In theory, theory and practice are the same. In practice, they're not. Albert Einstein

Exceptional Halting States

The execution of each instruction is defined in the Yellow Paper as a change to the EVM state that preserves the invariants of EVM state. At runtime, if the execution of an instruction would violate an invariant the EVM is in an exceptional halting state. The Yellow Paper defined five such states.

  1. Insufficient gas
  2. More than 1024 stack items
  3. Insufficient stack items
  4. Invalid jump destination
  5. Invalid instruction

A program is safe iff no execution can lead to an exceptional halting state.

We would like to consider EVM programs valid iff they are safe.

In practice, we must be able to validate code in linear time to avoid denial of service attacks. And we must support dynamically-priced instructions, loops, and recursion, which can use arbitrary amounts of gas and stack.

Thus our validation algorithm cannot consider concrete computations it only performs a limited symbolic execution of the code. This means we will reject programs if we detect any invalid execution paths, even if those paths are not reachable at runtime. And we will count as valid programs that may not always produce correct results.

We must check for the first two states at runtime:

  • out of gas and
  • stack overflow.

The remaining three states we can check at validation time:

  • stack underflow,
  • invalid jump, and
  • invalid instruction.

Constraints on Valid Code

  • Every instruction MUST be valid.
    • The JUMP and JUMPI instructions ARE NOT valid.
  • Every jump MUST be valid:
    • The RJUMP, RJUMPI, or RJUMPSUB instructions MUST NOT address immediate data.
  • The stacks MUST always be valid:
    • The number of items on the data stack MUST always be positive, and at most 1024.
    • The number of items on the return stackMUST always be positive, and at most 1024.
  • The data stack MUST be consistently aligned:
    • The number of items on the data stack between the current stack pointer and the stack pointer on entry to the most recent subroutine MUST be the same for each execution of a byte_code.

The RJUMP, RJUMPI and RJUMPSUB instructions take their destination as an immediate argument, which cannot change at runtime.

Having constant destinations for all jumps means that all jump destinations can be validated at initialization time, not runtime.

Requiring a consistently aligneddata stack prevents stack underflow. It can also catch irreducible control flows and calls to subroutines with the wrong number of arguments.

Taken together, these rules allow for EVM code to be validated by traversing its control-flow graph, in time and space linear in the size of the code, following each edge only once.

Rationale

If we are to be able to validate contracts in linear time we have two real choices.

  1. Deprecate dynamic jumps. This is trivial.
  2. Constrain dynamic jumps. This requires static analysis.

Unfortunately, the second choice is a nonstarter: we cannot know whether a JUMP or JUMPI are subroutine calls in linear time. We must deprecate them.

There are of course other useful safety properties that could be so validated, but we do not consider them here.

Backwards Compatibility

These changes affect the semantics of EVM code – the use of JUMP, JUMPI are deprecated, such that some code that would otherwise run correctly will nonetheless be invalid EVM code. In practice this will not be a problem, as this change applpies only to EOF code.

Reference Implementation

The following is a pseudo-Go implementation of an algorithm for predicating code validity. An equivalent algorithm must be run at initialization time.

This algorithm performs a symbolic execution of the program that recursively traverses the code, emulating its control flow and stack use and checking for violations of the rules above.

It runs in time equal to O(vertices + edges) in the program's control-flow graph, where edges represent control flow and the vertices represent basic blocks thus the algorithm takes time proportional to the size of the code. It uses recursion to maintain a stack of continuations for conditional jumps, the size of which is at most proportional to the size of the code

*Note: All valid code has a control-flow graph that can be traversed in time and space linear in the length of the code, while avoiding stack underflow and misalignment. That means that some other static analyses and code transformations that might otherwise require quadratic time can also be written to run in linear time, including linear-time complilers and streaming machine code generators.

Validation Function

For simplicity's sake we assume that jumpdest analysis has been done and that we have some helper functions.

  • is_valid_instruction(pc) returns true if pc points at a valid instruction.
  • is_valid_jumpdest(dest) returns true if dest is a valid jumpdest.
  • immediate_data(pc) returns the immediate data for the instruction at pc.
  • advance_pc(pc) advances to and returns the next pc, skipping any immediate data.
  • removed_items(pc,sp) returns the number of items removed from the data_stack by the instruction at pc.
  • add_items(pc,sp) returns the number of items added to the data_stack by the instruction at pc.
// emulated machine state
var code          [code_len]byte
var avail_items   [code_len]int
var return_stack  []int
var data_stack    []uint256

// return nil or an error
func validate(pc := 0, sp := 0, bp := 0, rp := 0) int, error {

   for pc < code_len {
      if !is_valid_instruction(pc) {
         return pc, invalid_instruction
      }
      
      // if available items on stack for `pc` are non-zero
      //    we have been here before
      //    so return to break cycle
      if avail_items[pc] != 0 {

          // invalid if available items not the same
          if avail_items[pc] != sp - bp {
            return invalid_stack
          }
      }
      avail_items[pc] = sp - bp
      if avail_items[pc] < 0 {
         return stack_underflow
      }
      
      switch code[pc] {

      // successful termination
      case STOP:
         return nil
      case RETURN:
         return nil
      case SELFDESTRUCT:
         return nil
      case REVERT:
         return nil
      case INVALID:
         return pc, invalid_instruction

      // track constants pushed on data stack
      case PUSH1 <= code[pc] && code[pc] <= PUSH32 {
         sp++
         if (sp > 1023) {
            return pc, stack_overflow
         }
         data_stack[sp] = immediate_data(pc)
         pc = advance_pc(pc)
         continue
          
      case RJUMP:

         // check for valid jump destination
         if !is_valid_jumpdest(jumpdest) {
            return pc, invalid_destination
         }
         
         // reset pc to destination of jump
         pc += immediate_data(pc)

      case RJUMPI:

         // recurse to validate true side of conditional
         jumpdest = pc + immediate_data(pc)
         if !is_valid_jumpdest(pc + jumpdest) {
            return pc, invalid_destination
         }
         max_left, err = validate(jumpdest, sp, bp, rp)
         if err {
            return pc, err
         }
         
         // recurse to validate false side of conditional
         pc = advance_pc(pc)
         max_right, err = validate(pc, sp, bp, rp)
         if err {
            return pc, err
         }
         
         return nil

      case RJUMPSUB:

         // check for valid jump destination
         jumpdest = imm_data(pc)
         if !is_valid_jumpdest(pc + jumpdest) {
            return pc, invalid_destination
         }

         // will enter subroutine at destination
         bp = sp

         // push return address and reset pc to destination
         rp++
         return_stack = append(return_stack, pc + 1)
         pc += jumpdest

      case RETURNSUB:
      
         // will return to subroutine at destination
         bp = sp

         // pop return address and check for valid destination
         pc = return_stack[--rp]
         return_stack = return_stack[:rp]
         if !code[pc - 1] == JUMPSUB {
            return pc, invalid_destination

         data_stack[sp++] = immediate_data(pc)
         pc = advance_pc(pc)

      default:
      
         pc = advance_pc(pc)

         // apply other instructions to stack pointer
         sp -= removed_items(pc, sp)
         if (sp < 0) {
            return pc, stack_underflow
         }
         sp += added_items(pc, sp)
      }
   }

   // successful termination
   return nil
}

Security Considerations

This EIP is intended to improve security by ensuring a higher level of safety for EVM code deployed on the blockchain.

Copyright and related rights waived via CC0.