Franck Cassez, Head of Research at Mantle. All views expressed are my own.
Control flow graphs (CFGs) provide an abstract semantics of programs.
They are used in compilers to optimise code, e.g. for constant propagation or for minimising the number of registers used.
CFGs are also instrumental in qualitative and quantitative analysis of programs and are used as inputs to static analysers or worst-case execution-time analysers.
Building CFGs from high-level (structured) code is reasonably easy, but unfortunately, building CFGs from low-level code, e.g. ARM assembly or, in our context, EVM bytecode, is notoriously hard.
This is mostly due to dynamic jumps: the next value of the program counter may depend on the history of the computation and is not hard-coded.
In this post I'll describe an algorithm to compute precise and minimal CFGs for EVM bytecode. The algorithm I have designed can handle complex EVM bytecode and construct CFGs for contracts like an ERC-20, or the Deposit Contract.
In the next sections I'll explain what a CFG is and how the algorithm works.
The algorithm uses a combination of abstract interpretation, path unfolding, loop detection and automata minimisation.
I have implemented the algorithm in Dafny, and the code is available in the evm-dis repository.
Dafny has several backends to generate code in C#, Java, Python and some other languages.
You can refer to the usage section of the repository to see how to run your favourite[1] artefact(s) of the CFG generator.
In this section I'll explain what a CFG is and give some simple examples for programs in EVM bytecode.
Given an EVM program `P`, an execution of `P` is a sequence of states `s_0 s_1 s_2 ...` where each state `s_{i+1}` is obtained by executing, from `s_i`, the instruction at `s_i.pc`. You can find the formal definition of an EVM state in the Dafny-EVM semantics. A state `s` contains, among other components, a program counter `s.pc` and a stack `s.stack`.
The set of executions of `P` collects all such sequences, for all possible input data.
Now, assume we restrict our attention to the program counter in executions. Given an execution `s_0 s_1 s_2 ...`, its projection on PCs is the sequence `s_0.pc s_1.pc s_2.pc ...`
We can apply this projection to all the executions of `P` and obtain the set of sequences of PCs of `P`.
It turns out that this set is not computable in general, so the best we can do is over-approximate it: a CFG for `P` is a finite automaton that accepts (at least) all the sequences of PCs of `P`.
By this definition, an automaton with a node per segment and edges between any two nodes is a CFG: it over-approximates the sequences of PCs. However, this is not a very useful CFG as it does not provide any information about the program.
The art of building good CFGs is to construct approximations that are not too coarse and provide some information on the program structure.
The following simple EVM program, `TwoCalls`, simulates two successive calls to the same function and illustrates some of the difficulties in constructing CFGs.
*Figure: TwoCalls – EVM Bytecode | Control Flow Graphs. CFG1 (top) over-approximates the sequences of PCs; CFG2 is more accurate and captures exactly the set of feasible sequences of PCs.*
The program counter is updated after each instruction. For the majority of instructions like `ADD`, `DUP1`, etc., the program counter is advanced to the next instruction in the code, something like `pc := pc + 1`. For other instructions like `JUMP/JUMPI`, the execution can jump/branch to an address that is not the instruction after the current one.
In the example above, the bytecode is split into linear segments: contiguous sequences of instructions with no branching except possibly at the last instruction. Segments' boundaries are determined by `JUMPDEST` (instructions we can jump to), `JUMP/JUMPI`, `STOP` or `REVERT` instructions. Each segment ends in one of the following instructions: `JUMP/JUMPI`, or `STOP/REVERT` which end the program. In the example above, there are four segments.
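To make this concrete, here is a minimal Python sketch of the segment splitting, under the assumption that segment boundaries are exactly the ones described above (illustrative code, not the Dafny implementation in evm-dis):

```python
# Minimal sketch: split EVM bytecode into linear segments.
# Opcode values are from the EVM specification; the helper names are
# illustrative and not the evm-dis API.

JUMPDEST, JUMP, JUMPI, STOP, REVERT = 0x5B, 0x56, 0x57, 0x00, 0xFD

def instruction_size(opcode: int) -> int:
    """PUSH1..PUSH32 (0x60..0x7F) carry 1..32 immediate bytes."""
    if 0x60 <= opcode <= 0x7F:
        return 1 + (opcode - 0x5F)
    return 1

def split_segments(code: bytes) -> list[list[int]]:
    """Return segments as lists of PCs: a segment starts at 0 or at a
    JUMPDEST, and ends at a JUMP/JUMPI/STOP/REVERT or just before a
    JUMPDEST."""
    segments, current, pc = [], [], 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST and current:       # a JUMPDEST opens a new segment
            segments.append(current)
            current = []
        current.append(pc)
        pc += instruction_size(op)
        if op in (JUMP, JUMPI, STOP, REVERT):  # these close a segment
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

# Toy example: PUSH1 0x03; JUMP; JUMPDEST; STOP -> two segments.
print(split_segments(bytes([0x60, 0x03, 0x56, 0x5B, 0x00])))  # [[0, 2], [3, 4]]
```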
A CFG (a finite automaton) defines sequences of CFG nodes as follows: a) there is a designated initial node, e.g. `N0` in the CFGs above, and b) sequences are defined as the nodes we see on paths in the graph. Paths are built by following directed edges between CFG nodes. For instance, CFG1 contains a path `N0, N1, N2, N1, N2, N3`. Each node is labelled with a segment, and a sequence of nodes defines a sequence of segments, the labels of the nodes.
One of the difficulties in constructing a CFG for this program is that the target of the `JUMP` in Segment 3 is determined by the top element of the stack when we execute `0x0e: JUMP`; and this is not the same value the first and the second time.
Let's now assume that we have a way to track the states. To build a CFG, we can just unfold each possible execution of the program, one at a time, and build the CFG as a collection of all the paths.
This is going to work if executions have a maximum depth, but it may never end if the program has loops.
Indeed, another difficulty is dealing with `JUMP/JUMPI` instructions that branch to a segment we have already seen on the path we are currently unfolding. This is what happens in the `TwoCalls` program: after Segment 0, Segment 3, Segment 1, we encounter Segment 3 again. Do we add an edge to the previously seen node `N1` that has label Segment 3, or do we continue to unfold the graph?
If we choose to create an edge to the previously seen Segment 3 (CFG1), we add infeasible paths to the graph. If we continue to unfold and we are on a loop, the unfolding may never end.
For the reasons explained above, designing an algorithm that builds precise CFGs (minimising the number of infeasible paths) and always terminates is hard.
In the example above, CFG1 contains loops whereas `TwoCalls` does not, so CFG1 is not a very precise CFG for `TwoCalls`.
In the next sections I'll explain how we can track the useful content of the state, and provide a criterion to stop unfolding a path.
As mentioned before, it is notoriously hard to build CFGs for low-level code like ARM assembly. In ARM assembly, target addresses of JUMPs can be stored in memory or computed at runtime. And the target addresses can be anywhere in the code.
The situation for EVM bytecode is much more favourable.
First, there is a known and fixed set of valid targets for `JUMP`s: the `JUMPDEST` instructions. In EVM bytecode, every target of a `JUMP/JUMPI` must be a `JUMPDEST` instruction. This is the case for the program `TwoCalls` introduced above.
This means that all the `JUMP` targets are in the set of `JUMPDEST`s. In itself, this enables us to define a simple algorithm to build a CFG: from each `JUMP/JUMPI`, we create edges to each `JUMPDEST`. This over-approximates the set of sequences of PCs, but the over-approximation may be too coarse to be of any use (e.g. CFG1 above).
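As a sketch, this coarse construction takes a few lines of Python (reusing the opcode constants and `instruction_size` helper from the previous sketch; again illustrative, not the evm-dis code):

```python
def instruction_pcs(code: bytes):
    """Yield the PC of every instruction, skipping PUSH immediate bytes."""
    pc = 0
    while pc < len(code):
        yield pc
        pc += instruction_size(code[pc])

def coarse_cfg(code: bytes) -> set[tuple[int, int]]:
    """Over-approximating CFG edges: every JUMP/JUMPI may branch to
    every JUMPDEST; JUMPI also falls through to the next instruction."""
    jumpdests = [pc for pc in instruction_pcs(code) if code[pc] == JUMPDEST]
    edges = set()
    for pc in instruction_pcs(code):
        op = code[pc]
        if op in (JUMP, JUMPI):
            edges |= {(pc, d) for d in jumpdests}
        if op == JUMPI:
            edges.add((pc, pc + 1))
    return edges
```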
There is a second EVM feature that impacts how targets of jumps are stored and retrieved across successive jumps: gas metering. It is unlikely that targets of `JUMP/JUMPI` will be stored in memory or storage, because using memory or storage is much more expensive, gas-wise, than using the stack in the EVM.
As a result, compilers to EVM bytecode will almost certainly stack return addresses. It is also unlikely that targets of jumps are computed, e.g. with `ADD` or `MUL`, as every operation costs some gas. So in practice, tracking the stack should be enough to resolve the targets of `JUMP/JUMPI` instructions.
An (EVM) abstract state `s` has two components: `pc` is the program counter, and `stack` is the stack, represented by a sequence of elements. We use `s.pc` and `s.stack` to reference the components, and we write `s = (pc, stack)`.
The empty stack is the empty sequence `[]`, and for non-empty stacks, the top is the first element of the sequence, e.g. `[a, b]` means that `a` is at the top of the stack.
We can write `a::st` for the stack whose top element is `a` and whose remaining elements form the sequence `st`. Given a sequence `st`, `Peek(i)` denotes the element of `st` at position `i`, so `Peek(0)` is the top of the stack.
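One way to prototype these abstract states in Python is the following sketch (the authoritative definitions are the Dafny ones in evm-dis; the names here are mine):

```python
from dataclasses import dataclass

UNKNOWN = "?"   # the unknown stack value

@dataclass(frozen=True)
class AState:
    """Abstract EVM state: a program counter and an abstract stack.
    Stack elements are concrete non-negative integers or UNKNOWN;
    index 0 of the sequence is the top of the stack."""
    pc: int
    stack: tuple   # e.g. (0x02, UNKNOWN): 0x02 is at the top

    def peek(self, i: int):
        """Peek(i): the element at position i from the top."""
        return self.stack[i]

s = AState(pc=0x06, stack=(0x02, UNKNOWN))
assert s.peek(0) == 0x02   # the top of the stack is a known value
```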
The abstract semantics of EVM instructions is given by a function that maps an abstract state to its successor(s); the complete definition of this function is available in the evm-dis repository. The semantics[2] of `JUMP` and `JUMPI` are: given a state `s`,

* `JUMP`: if `s = (pc, a::st)`, the target `a` is popped and the next state is `(a, st)`;
* `JUMPI`: if `s = (pc, a::b::st)`, the target `a` and the condition `b` are popped, and the next state is `(a, st)` if `b != 0`, and `(pc + 1, st)` otherwise.
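Reusing the `AState` sketch above, the abstract transitions for `JUMP` and `JUMPI` could be prototyped as follows; since the condition of `JUMPI` is abstracted away, both successors are kept (a sketch, not the evm-dis code):

```python
def step_jump(s: AState) -> set[AState]:
    """JUMP: pop the target a and set pc := a. The target must be a
    concrete value, otherwise the construction fails."""
    a, rest = s.stack[0], s.stack[1:]
    if a == UNKNOWN:
        raise ValueError("unresolved jump target")
    return {AState(pc=a, stack=rest)}

def step_jumpi(s: AState) -> set[AState]:
    """JUMPI: pop the target a and the condition b. The condition is
    abstracted away, so both successors (branch, fall-through) are kept."""
    a, rest = s.stack[0], s.stack[2:]
    if a == UNKNOWN:
        raise ValueError("unresolved jump target")
    return {AState(pc=a, stack=rest), AState(pc=s.pc + 1, stack=rest)}
```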
The abstract semantics of other instructions can be defined in the same way. However, we also want to ignore some of the stack content that is not relevant to tracking the `JUMP` targets.
To do so, we allow two types of elements in the stack: values, which are non-negative integers, and the unknown value, denoted `?`.
As mentioned before, most of the time, target addresses of `JUMP`s will be stacked with a `PUSH` instruction. The stack is then managed in a way that the target addresses are on top of the stack when needed by a `JUMP` (it is the role of the compiler to ensure that the code meets this requirement).
This also means that lots of operations on the stack, e.g. `ADD`, will not produce targets of `JUMP`s, and their actual results can be ignored, but of course not their effect on the stack size.
As a result, we define the abstract semantics of operations like `ADD`, `SUB`, `PUSH0` and others as follows:

`ADD`: `(pc, a::b::st) → (pc + 1, ?::st)`

The trick here is that `ADD` pops the two topmost elements of the stack, adds them and pushes the result, but in the abstract semantics we omit the actual result and replace it with the unknown value `?`.
For operations that may push jump targets, like `PUSH1` to `PUSH32`, we want to track the values pushed on the stack[3]:

`PUSHk v`: `(pc, st) → (pc + k + 1, v::st)`

And finally, for other stack operations like `DUPk`, `SWAPk`, we re-arrange the stack elements according to the operation without modifying their contents:

`DUP1`: `(pc, a::st) → (pc + 1, a::a::st)`
`SWAP1`: `(pc, a::b::st) → (pc + 1, b::a::st)`
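Continuing the same Python sketch (with `AState` and `UNKNOWN` as above), the three families of rules could look like this:

```python
def step_add(s: AState) -> AState:
    """ADD (and similar operations): pop two operands and push UNKNOWN,
    since the result is never treated as a tracked jump target."""
    return AState(pc=s.pc + 1, stack=(UNKNOWN,) + s.stack[2:])

def step_push(s: AState, k: int, v: int) -> AState:
    """PUSHk v: push the concrete value v; the next instruction is at
    pc + k + 1 because of the k immediate bytes."""
    return AState(pc=s.pc + k + 1, stack=(v,) + s.stack)

def step_dup(s: AState, k: int) -> AState:
    """DUPk: duplicate the k-th stack element (1-indexed) on top."""
    return AState(pc=s.pc + 1, stack=(s.stack[k - 1],) + s.stack)

def step_swap(s: AState, k: int) -> AState:
    """SWAPk: swap the top of the stack with the (k+1)-th element."""
    st = list(s.stack)
    st[0], st[k] = st[k], st[0]
    return AState(pc=s.pc + 1, stack=tuple(st))
```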
Let's see on an example how this works. The program below, Loop 1, contains a loop.
In this code snippet, the loop can repeat forever and the stack size can grow arbitrarily large[4].
If we use our abstract semantics to track the content of the stack (`?` denotes the unknown stack value), we obtain the result below.
*Figure: Loop 1: Function Call/Return – EVM Bytecode | Control Flow Graph. The bytecode is shown with the abstract stack contents after each instruction, alongside the resulting CFG.*
This seems to be very good, and we can successfully track the targets of `JUMP`s after each iteration.
But if we unfold the CFG after each iteration, we build arbitrarily long paths, as the stack size grows arbitrarily large.
So how do we compute the CFG on the righthand side? How can we guarantee that we always end up with the same target address after the `JUMP` instruction at `0x06`?
When we unfold a path, we record the values of the PC we encounter along it. If we reach a value of PC that we have already seen on the path, we try to stop unfolding and fold the path by adding a loop edge.
Let's take the example of Loop 1 above.
Unfolding the graph leads to something like: `Node 0 [Segment 0] → Node 1 [Segment 1] → Node 2 [Segment 1] → …`
As we have seen an occurrence of Segment 1 before `Node 2`, we would like to know whether we can stop at `Node 2` and add a loop edge on `Node 1`.
We would like to fold this graph like so: `Node 0 [Segment 0] → Node 1 [Segment 1]`, with a loop edge from `Node 1` back to itself.
We can do so if we can guarantee that every time we execute Segment 1 (`Node 1`), we end up at Segment 1 again after the `JUMP`, i.e. with `pc == 0x02`.
A sufficient condition to test whether this holds is the following:

1. compute the weakest precondition for Segment 1 that guarantees the postcondition `pc == 0x02`;
2. check that this precondition is an invariant of Segment 1, i.e. it is preserved by executing Segment 1.
;If you are not familiar with preconditions and weakest preconditions, I'll try to explain what they mean.
Assume we want to guarantee that after executing a `JUMP` instruction, the program counter is `0x02`. As a `JUMP` instruction updates the program counter to the value at the top of the stack, the least we can require is that, before we execute the `JUMP` instruction, the top of the stack contains `0x02`.
Any other choice would not lead to `0x02` after the `JUMP`.
Overall, if we start in an (EVM) state such that the top of the stack is `0x02`, written as the predicate `Peek(0) == 0x02`, a `JUMP` instruction will update the program counter to `0x02`.
`Peek(0) == 0x02` is a precondition (it holds before) that ensures that executing a `JUMP` instruction leads to the postcondition (it holds after) `pc == 0x02`.
If you are familiar with Hoare triples and Hoare logic, this can be summarised as:

`{ Peek(0) == 0x02 } JUMP { pc == 0x02 }`
Moreover, `Peek(0) == 0x02` is the minimal requirement to ensure that `pc == 0x02` after executing a `JUMP`. So `Peek(0) == 0x02` is the weakest precondition for `pc == 0x02` after a `JUMP`. Let's write that as follows:

`WPre(JUMP, pc == 0x02) = (Peek(0) == 0x02)`

Remember that a weakest precondition is a predicate, so the equation above reads "WPre is the predicate `Peek(0) == 0x02`", not "WPre is equal to `Peek(0)`, itself equal to `0x02`".
Computing weakest preconditions can be extended to sequences of instructions by computing the WPre from the last to the first instruction. For a sequence of two instructions `i1; i2` and a postcondition `P`:

`WPre(i1; i2, P) = WPre(i1, WPre(i2, P))`

For instance:

`WPre(PUSH1 0x02; JUMP, pc == 0x02) = WPre(PUSH1 0x02, Peek(0) == 0x02) = true`

since `PUSH1 0x02` puts `0x02` at the top of the stack, so the requirement is satisfied whatever the initial state. We just back-propagate weakest preconditions from the last instruction to the first.
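Here is a small Python sketch of this back-propagation, where a predicate is represented as a mapping from stack positions to required values (`{0: 0x02}` stands for `Peek(0) == 0x02`, `{}` for `true`). It only covers a few instruction kinds and assumes the `pc` is only constrained by the final `JUMP` of a segment; it is illustrative, not the evm-dis implementation.

```python
# A predicate is a dict: the key "pc" constrains the program counter,
# an integer key i constrains Peek(i); {} means "true", False "unsat".

def wpre_instr(instr, post):
    """Weakest precondition of a single instruction (illustrative)."""
    op = instr[0]
    if op == "JUMP":
        # To end with pc == t, t must be on top of the stack before the
        # JUMP; deeper stack constraints shift down by one position.
        pre = {i + 1: v for i, v in post.items() if i != "pc"}
        if "pc" in post:
            pre[0] = post["pc"]
        return pre
    if op == "PUSH":                         # ("PUSH", v)
        # The pushed value must meet any constraint on Peek(0);
        # other stack constraints shift up by one position.
        v = instr[1]
        if post.get(0, v) != v:
            return False                     # unsatisfiable
        return {i - 1: c for i, c in post.items() if i != 0}
    if op == "SWAP1":
        # Constraints on positions 0 and 1 are swapped.
        pre = {i: v for i, v in post.items() if i > 1}
        if 0 in post: pre[1] = post[0]
        if 1 in post: pre[0] = post[1]
        return pre
    raise NotImplementedError(op)

def wpre_seq(instrs, post):
    """Back-propagate from the last instruction to the first."""
    for instr in reversed(instrs):
        if post is False:
            return False
        post = wpre_instr(instr, post)
    return post

# WPre(PUSH 0x02; JUMP, pc == 0x02) is {} i.e. "true":
print(wpre_seq([("PUSH", 0x02), ("JUMP",)], {"pc": 0x02}))
```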
If we do this computation for all the instructions in Segment 1, we obtain the weakest precondition for Segment 1 that guarantees the postcondition `pc == 0x02` after its final `JUMP`. Let's call this precondition `WPRE1`.
To decide whether we can add a loop edge on `Node 1` in the CFG above, we are going to check whether `WPRE1` is preserved by Segment 1.
By preserved, I mean: if an EVM abstract state satisfies `WPRE1`, and we execute Segment 1, then the resulting state satisfies `WPRE1`.
You can see that if this is the case, `WPRE1` is an invariant for Segment 1: if it holds before we execute it, it holds at the end.
And this implies that after any number of executions of Segment 1, we have `pc == 0x02`.
Putting everything together, we can guarantee that any future computation after Segment 1 can only execute Segment 1, and so we can safely add a loop edge to `Node 1` in our CFG.
How do we check that `WPRE1` is an invariant?
We are going to compute the postcondition of `WPRE1` after Segment 1. If it implies `WPRE1`, then it is an invariant.
I'll skip the details here, but the idea is to start in a state that satisfies `WPRE1` (and constrains nothing else), execute the abstract semantics of Segment 1, and check whether the resulting state satisfies `WPRE1`. Does it? Yes, because the top of the stack at the end of Segment 1 is again `0x02`. So `WPRE1` is an invariant for Segment 1.
And this means that we can add an edge from `Node 1` to `Node 1` in the CFG, because we have proved that this is the only possible target of the `JUMP`.
To be complete, we also need to take into account any stack constraint `SC1` that should hold when we reach the first occurrence of `Node 1`. We do so by building the conjunct `WPre := WPRE1 and SC1` and checking the invariance of `WPre`.
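Putting the pieces together, the invariance check can be prototyped as follows: build the most general abstract state satisfying the candidate precondition, run the segment's abstract semantics on it (here an assumed `run_segment` callable), and check that the precondition still holds. This is a sketch under the same assumptions as the previous ones.

```python
def state_from_pred(pc: int, pred: dict, depth: int) -> AState:
    """Most general abstract state satisfying pred: constrained stack
    positions hold their required values, everything else is UNKNOWN."""
    return AState(pc=pc, stack=tuple(pred.get(i, UNKNOWN)
                                     for i in range(depth)))

def satisfies(s: AState, pred: dict) -> bool:
    """Does s satisfy every Peek(i) == v constraint in pred?"""
    return all(i < len(s.stack) and s.stack[i] == v
               for i, v in pred.items())

def is_invariant(pred: dict, run_segment, pc0: int, depth: int) -> bool:
    """pred is an invariant of the segment if executing the segment's
    abstract semantics from the most general state satisfying pred
    yields a state that satisfies pred again."""
    return satisfies(run_segment(state_from_pred(pc0, pred, depth)), pred)

# Toy usage: a segment whose abstract effect pushes 0x02 and jumps to 0x02.
segment1 = lambda s: AState(pc=0x02, stack=(0x02,) + s.stack)
print(is_invariant({0: 0x02}, segment1, pc0=0x02, depth=3))  # True
```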
The algorithm presented above unfolds each execution path separately and stops the unfolding when an invariant (as presented above) is detected on the path.
This may lead to CFGs that are not minimal. For instance, the CFG on the lefthand side below corresponds to the unfolding of two paths that end up in the same sequence of segments: `Segment 3, Segment 1`.
In other words, the orange and red nodes can be merged, as they define the same future sequences of PCs.
Minimising the raw CFG on the left amounts to minimising an automaton and can be done with standard techniques that partition the set of nodes into equivalence classes. I have implemented a minimisation algorithm[6] that can generate a minimised version of the path-unfolding CFG. The result of the minimisation for the raw CFG below is depicted on the righthand side.
*Figure: Raw CFG (left) | Minimised CFG (right).*
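The partition refinement itself fits in a few lines. Here is a generic Moore-style sketch in Python; the actual implementation (see footnote 6) mixes Moore and Hopcroft, so this toy version only illustrates the idea:

```python
def minimise(nodes, label, succs):
    """Moore-style partition refinement: nodes with the same label start
    in the same block; blocks split until successors agree blockwise.
    `label` maps node -> segment label; `succs` maps node -> set of nodes."""
    block = {n: label(n) for n in nodes}      # initial partition: by label
    while True:
        # Signature: own block + set of successor blocks.
        sig = {n: (block[n], frozenset(block[m] for m in succs(n)))
               for n in nodes}
        if len(set(sig.values())) == len(set(block.values())):
            return sig   # stable: sig induces the final partition
        block = sig

# Toy raw CFG: two nodes with the same label and the same successors
# (like the orange and red nodes in the post) collapse into one class.
nodes = ["N0", "N1", "N2", "N3"]
label = {"N0": "Seg0", "N1": "Seg3", "N2": "Seg3", "N3": "Seg1"}.get
succs = {"N0": {"N1", "N2"}, "N1": {"N3"}, "N2": {"N3"}, "N3": set()}.get
classes = minimise(nodes, label, succs)
print(classes["N1"] == classes["N2"])   # True: N1 and N2 are merged
```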
I have presented an algorithm to construct CFGs from EVM bytecode.
The algorithm uses an abstract semantics of the EVM, and a semantically sound criterion to detect loops on execution paths.
Because we use an abstract semantics, the construction may fail in some cases, for instance when:

1. the target of a `JUMP` is stored outside of the stack,
2. the target of a `JUMP` is computed, e.g. with `ADD`,
3. no invariant can be detected on a path, in which case the unfolding may not terminate.

As discussed in this post, 1. and 2. are unlikely to occur, as reasonable compiler strategies would choose to store targets of `JUMP`s on the stack.
Regarding 3., "the proof (of usefulness) is in the pudding": I have tried to construct several CFGs for programs with loops, and the algorithm always terminated and successfully detected path invariants.
The algorithm presented above offers precision, by unfolding each path separately; sound loop folding, by using a semantically sound criterion to detect loops on a path; and a minimal-size CFG, by collapsing nodes into equivalence classes in the graph.
Thanks to Andreas Penzkofer for insightful comments on the draft version of this post.
I have previously worked on CFG construction. The technique presented above is inspired by the Master's Thesis work of D. Klumpp that I supervised at Macquarie University, Sydney.

Automated Control Flow Reconstruction from Assembler Programs. D. Klumpp. Master's Thesis. Dec. 2018.
Other builds will be released when the corresponding Dafny backends are ready, e.g. Rust. ↩︎
I omit the checks on the stack sizes and assume the stack always has enough elements when an instruction pops values. ↩︎
Note that `PUSHk` instructions have `k` arguments (immediate bytes), so the next instruction is at address `pc + k + 1`. ↩︎
I know this is not feasible in the EVM but we are building CFGs that are over-approximations here. ↩︎
We only need to collect the PCs at the beginning of each segment. ↩︎
A mixture of Moore and Hopcroft ↩︎