This EIP proposes to validate at initialization time that contracts will not underflow stack, jump to invalid locations, or execute invalid instructions. This further implies that subroutines are always called with and return the same number of argument and that looping will not change the stack height. Linear-time validation algorithms are required and one is provided.
For our purposes we define a safe EVM contract as one that cannot encounter an exceptional halting state. Unsafe code can attempt to overflow stack, underflow stack, execute invalid instructions, and jump to invalid locations. Unsafe contracts are exploits waiting to happen.
From the standpoint of security it would be best if unsafe contracts were never placed on the blockchain.
We cannot guarantee complete safety – we must allow valid contracts to overflow stack or run out of gas. But we guarantee that valid contracts cannot execute invalid instructions, jump to invalid locations, or underflow stack.
To prevent denial of service attacks all control flow must be validated at initialization time, and in time and space linear in the size of the code. We present an algorithm for doing so below.
The relative jumps of EIP-4200 and the simple subroutines of EIP-2315 provide a complete set of static control flow instructions:
RJUMP
offset
RJUMPI
offset
RJUMPSUB
offset
RETURNSUB
Note that each jump creates at most two paths of control through the code, such that the complexity of traversing the entire control-flow graph is linear in the size of the code.
JUMP
or JUMPI
also provide a complete set of control flow instructions. But they are an obstacle to proving validity in linear time – jumps can be to any destination in the code, creating a quadratic "path explosion". We propose to deprecate JUMP
and JUMPI
.
Validating control flow at initialization time has potential performance advantages, as checking for invalid jumps and stack underflow does not need to be done at runtime.
This EIP does not introduce any new opcodes. Rather, it restricts the use of proposed control-flow instructions. These restrictions must be validated at contract initialization time – not at runtime – by the provided algorithm or its equivalent. This algorithm must take time and space linear in the size of the contract, so as not to be a denial of service vulnerability.
This specification is entirely semantic – it imposes no further syntax on bytecode. Ethereum Virtual Machine bytecode is just that – a sequence of bytes that when executed causes a sequence of changes to the machine state. The safety we seek here is simply to not, as it were, jam up the gears.
In theory, theory and practice are the same. In practice, they're not. – Albert Einstein
The execution of each instruction is defined in the Yellow Paper as a change to the EVM state that preserves the invariants of EVM state. At runtime, if the execution of an instruction would violate an invariant the EVM is in an exceptional halting state. The Yellow Paper defined five such states.
A program is safe iff no execution can lead to an exceptional halting state.
We would like to consider EVM programs valid iff they are safe.
In practice, we must be able to validate code in linear time to avoid denial of service attacks. And we must support dynamically-priced instructions, loops, and recursion, which can use arbitrary amounts of gas and stack.
Thus our validation algorithm cannot consider concrete computations – it only performs a limited symbolic execution of the code. This means we will reject programs if we detect any invalid execution paths, even if those paths are not reachable at runtime. And we will count as valid programs that may not always produce correct results.
We must check for the first two states at runtime:
out of gas
andThe remaining three states we can check at validation time:
JUMP
and JUMPI
instructions ARE NOT valid.RJUMP
, RJUMPI
, or RJUMPSUB
instructions MUST NOT address immediate data.data stack
MUST always be positive, and at most 1024.return stack
MUST always be positive, and at most 1024.data stack
between the current stack pointer
and the stack pointer
on entry to the most recent subroutine MUST be the same for each execution of a byte_code.The RJUMP
, RJUMPI
and RJUMPSUB
instructions take their destination as an immediate argument, which cannot change at runtime.
Having constant destinations for all jumps means that all jump destinations can be validated at initialization time, not runtime.
Requiring a consistently aligneddata stack
prevents stack underflow. It can also catch irreducible control flows and calls to subroutines with the wrong number of arguments.
Taken together, these rules allow for EVM code to be validated by traversing its control-flow graph, in time and space linear in the size of the code, following each edge only once.
If we are to be able to validate contracts in linear time we have two real choices.
Unfortunately, the second choice is a nonstarter: we cannot know whether a JUMP
or JUMPI
are subroutine calls in linear time. We must deprecate them.
There are of course other useful safety properties that could be so validated, but we do not consider them here.
These changes affect the semantics of EVM code – the use of JUMP
, JUMPI
are deprecated, such that some code that would otherwise run correctly will nonetheless be invalid EVM code. In practice this will not be a problem, as this change applpies only to EOF code.
The following is a pseudo-Go implementation of an algorithm for predicating code validity. An equivalent algorithm must be run at initialization time.
This algorithm performs a symbolic execution of the program that recursively traverses the code, emulating its control flow and stack use and checking for violations of the rules above.
It runs in time equal to O(vertices + edges)
in the program's control-flow graph, where edges represent control flow and the vertices represent basic blocks – thus the algorithm takes time proportional to the size of the code. It uses recursion to maintain a stack of continuations for conditional jumps, the size of which is at most proportional to the size of the code
*Note: All valid code has a control-flow graph that can be traversed in time and space linear in the length of the code, while avoiding stack underflow and misalignment. That means that some other static analyses and code transformations that might otherwise require quadratic time can also be written to run in linear time, including linear-time complilers and streaming machine code generators.
For simplicity's sake we assume that jumpdest analysis has been done and that we have some helper functions.
is_valid_instruction(pc)
returns true if pc
points at a valid instruction.is_valid_jumpdest(dest)
returns true if dest
is a valid jumpdest.immediate_data(pc)
returns the immediate data for the instruction at pc
.advance_pc(pc)
advances to and returns the next pc
, skipping any immediate data.removed_items(pc,sp)
returns the number of items removed from the data_stack
by the instruction at pc
.add_items(pc,sp)
returns the number of items added to the data_stack
by the instruction at pc
.This EIP is intended to improve security by ensuring a higher level of safety for EVM code deployed on the blockchain.
Copyright and related rights waived via CC0.