
Franck Cassez, Head of Research at Mantle. — all views expressed are my own.


Control flow graphs (CFGs) provide an abstract semantics of programs.
They are used in compilers to optimise the code, e.g. for constant propagation or for minimising the number of used registers.
CFGs are also instrumental in qualitative and quantitative analysis of programs and are used as inputs to static analysers or worst-case execution-time analysers.

Building CFGs from high-level (structured) code is reasonably easy, but unfortunately, building CFGs from low-level code, e.g., ARM assembly or, in our context, EVM bytecode, is notoriously hard.
This is mostly due to dynamic jumps: the next value of the program counter may depend on the history of the computation and is not hard-coded.

In this post I'll describe an algorithm to compute precise and minimal CFGs for EVM bytecode. The algorithm I have designed can handle complex EVM bytecode and construct CFGs for contracts like an ERC-20, or the Deposit Contract.

Overview

In the next sections I'll explain what a CFG is and how the algorithm works.
The algorithm uses a combination of abstract interpretation, path unfolding, loop detection and automata minimisation.

I have implemented the algorithm in Dafny, and the code is available in the evm-dis repository.
Dafny has several backends to generate code in C#, Java, Python and some other languages.

I have built the Python, JavaScript, Java and C# artefacts so you don't need to install Dafny to run the CFG generator; you can use one of the target builds in Python, JavaScript, Java or C#.

You can refer to the usage section of the repository to see how to run your favourite[1] artefact(s) of the CFG generator.

What are Control Flow Graphs?

In this section I'll explain what a CFG is and give some simple examples for programs in EVM bytecode.

Execution of EVM Bytecode

Given an EVM program p in bytecode, and an input i for p, the semantics of the EVM uniquely defines run(p,i), the execution of p on input i, as a finite sequence of EVM states.
You can find the formal definition of an EVM state in the Dafny-EVM semantics. A state s comprises the program counter s.pc, the stack content s.stack, the memory content s.memory, and several other fields.
The set of executions of p over all its possible inputs is:
Runs(p) = { run(p, i) | i ∈ I }
where I is the set of all inputs.

Now, assume we restrict our attention to the program counter in executions. Given an execution e ∈ Runs(p), which is a sequence s0, s1, …, sn of EVM states, we define e.pc = s0.pc, s1.pc, …, sn.pc, the corresponding sequence of program counters.
We can apply this projection to all the executions of p and we obtain the set:
PCRuns(p) = { e.pc | e ∈ Runs(p) }.
PCRuns(p) is a (possibly infinite) set of sequences of program counters. Sets of sequences are sometimes called languages, and finite automata can finitely represent the languages in the family of regular languages.
It turns out that PCRuns(p) is often not a regular language, but it can be over-approximated by a regular language.
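To make the projection above concrete, here is a toy Python sketch: a run is a sequence of EVM states, modelled minimally as (pc, stack) pairs. The types and names are illustrative, not the Dafny-EVM definitions.

```python
def pc_projection(run):
    """Project a run onto its sequence of program counters (e.pc)."""
    return tuple(pc for (pc, _stack) in run)

# Two toy runs of the same program on different inputs.
run1 = [(0x00, []), (0x02, [5]), (0x04, [13, 5]), (0x0d, [5])]
run2 = [(0x00, []), (0x02, [7]), (0x04, [13, 7]), (0x0d, [7])]

# PCRuns(p) = { e.pc | e in Runs(p) }: both runs project to the same word,
# even though the stack contents differ.
pc_runs = {pc_projection(r) for r in (run1, run2)}
print(pc_runs)  # {(0, 2, 4, 13)}
```

The projection typically collapses many runs onto the same PC sequence, which is why a finite automaton has a chance of representing the result.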

A CFG for p is a finite automaton that over-approximates PCRuns(p).

By this definition, an automaton with a node per segment and edges between any two nodes is a CFG: it over-approximates the sequences of PCs. However, this is not a very useful CFG as it does not provide any information about the program.
The art of building a good CFG is to construct approximations that are not too coarse and that provide some information about the program structure.

Examples of CFGs

The simple EVM program TwoCalls below simulates two successive calls to the same function and illustrates some of the difficulties in constructing CFGs.

TwoCalls – EVM Bytecode Control Flow Graphs

The execution of the program below starts with pc == 0, Segment 0. Segment 3 (0x0d) in the code below is executed twice. There are two successive jumps to Segment 3:

  • first jump from 0x04;
  • second jump from 0x0a.

PC :Instruction // stack
// Segment 0 
0x00:PUSH1 0x05 // [0x05]
0x02:PUSH1 0x0d // [0x0d, 0x05]
0x04:JUMP       // call1 JUMP to 0x0d
// Segment 1                   
0x05:JUMPDEST   // [] empty
0x06:PUSH1 0x0b // [0x0b]
0x08:PUSH1 0x0d // [0x0d, 0x0b]
0x0a:JUMP       // call2 JUMP to 0x0d
// Segment 2
0x0b:JUMPDEST   // [] empty 
0x0c:STOP       // [] 
// Segment 3
0x0d:JUMPDEST // call1 [0x05]   | call2 [0x0b]
0x0e:JUMP     // Return to 0x05 | Return to 0x0b 

After the first jump we should return to 0x05; this
is the address that is pushed prior to the first call.
After the second jump, we should return to 0x0b;
this is the address pushed prior to the second call.

CFG1 (below) over-approximates the sequences of PCs
in TwoCalls but contains some infeasible runs, e.g.,
Seg0, Seg3, Seg1, Seg3, Seg1, Seg3, Seg2.

CFG2 is more accurate and captures exactly the set of
feasible sequences of PCs in TwoCalls.







[Figure: CFG1 for TwoCalls: nodes N0 (Segment 0, 0x0), N1 (Segment 3, 0xd), N2 (Segment 1, 0x5), N3 (Segment 2, 0xb); edges N0→N1, N1→N2, N1→N3, N2→N1]

[Figure: CFG2 for TwoCalls: nodes N0 (Segment 0, 0x0), N1 (Segment 3, 0xd), N2 (Segment 1, 0x5), N3 (Segment 3, 0xd), N4 (Segment 2, 0xb); edges N0→N1, N1→N2, N2→N3, N3→N4]
How can we build CFGs?

The program counter is updated after each instruction.
For the majority of instructions like ADD, DUP1 etc, the program counter is advanced to the instruction that is next in the code, something like pc := pc + 1.
For other instructions like JUMP/JUMPI the execution can jump/branch to an address that is not the instruction after the current one.

In the example above, the bytecode is split into linear segments: contiguous sequences of instructions with no branching except possibly at the last instruction. Segment boundaries are determined by JUMPDEST (instructions we can jump to) and by JUMP/JUMPI, STOP or REVERT instructions.
Each segment ends in one of the following instructions: JUMP/JUMPI, or STOP/REVERT which end the program.
In the example above, there are four segments.
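The segmentation step can be sketched in a few lines of Python. This is a hypothetical illustration over a pre-decoded instruction list, not the evm-dis implementation; the function name and encoding are assumptions.

```python
# EVM opcodes relevant to segment boundaries.
JUMPDEST, JUMP, JUMPI, STOP, REVERT = 0x5B, 0x56, 0x57, 0x00, 0xFD
ENDS_SEGMENT = {JUMP, JUMPI, STOP, REVERT}

def split_segments(instrs):
    """instrs: list of (pc, opcode) pairs. Returns a list of segments,
    cutting after JUMP/JUMPI/STOP/REVERT and before JUMPDEST."""
    segments, current = [], []
    for pc, op in instrs:
        if op == JUMPDEST and current:   # a jump target starts a new segment
            segments.append(current)
            current = []
        current.append((pc, op))
        if op in ENDS_SEGMENT:           # branching/halting ends the segment
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

# TwoCalls, opcodes only (0x60 is PUSH1; its argument byte explains the pc gaps).
twocalls = [(0x00, 0x60), (0x02, 0x60), (0x04, JUMP),
            (0x05, JUMPDEST), (0x06, 0x60), (0x08, 0x60), (0x0A, JUMP),
            (0x0B, JUMPDEST), (0x0C, STOP),
            (0x0D, JUMPDEST), (0x0E, JUMP)]
print(len(split_segments(twocalls)))  # 4 segments, as in the example
```

On TwoCalls this recovers the four segments starting at 0x00, 0x05, 0x0b and 0x0d.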

In each segment it is easy to track the program counter up to the last instruction: it progresses to the next instruction in the code. The CFG of a segment is linear. To avoid cluttering the CFG, we collapse all the nodes within a segment into a single CFG node.

A CFG (a finite automaton) defines sequences of CFG nodes as follows: a) there is a designated initial node, e.g. N0 in the CFGs above, and b) sequences are defined as the nodes we see on paths in the graph. Paths are built by following directed edges between two CFG nodes. For instance CFG1 contains a path N0, N1, N2, N1, N2, N3. Each node is labelled with a segment and a sequence of nodes defines a sequence of segments that are the labels of the nodes.
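Viewing a CFG as an edge relation over segment labels makes this concrete. The sketch below (hypothetical encoding, with CFG1's edges taken from the figure) checks that the infeasible run mentioned earlier is nonetheless a path of CFG1:

```python
# CFG1 from the TwoCalls example as an edge relation over segment labels.
cfg1 = {"Seg0": {"Seg3"}, "Seg3": {"Seg1", "Seg2"},
        "Seg1": {"Seg3"}, "Seg2": set()}

def is_path(cfg, nodes):
    """Is this sequence of nodes a path following directed edges?"""
    return all(b in cfg[a] for a, b in zip(nodes, nodes[1:]))

# The infeasible run is a path of CFG1, so CFG1 over-approximates PCRuns.
print(is_path(cfg1, ["Seg0", "Seg3", "Seg1",
                     "Seg3", "Seg1", "Seg3", "Seg2"]))  # True
```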

One of the difficulties in constructing a CFG for this program is that the target of the JUMP in Segment 3 is determined by the top element of the stack when we execute 0x0e: JUMP; and this is not the same for the first and the second executions.

To construct CFGs we may have to track the history of states in the executions.

Let's now assume that we have a way to track the states. To build a CFG, we can just unfold each possible execution of the program, one at a time, and build the CFG as a collection of all the paths.
This is going to work if executions have a bounded depth, but it may never end if the program has loops.

Indeed, another difficulty is to deal with JUMP/JUMPI instructions that branch to a segment we have already seen on the path we are currently unfolding. This is similar to what happens in the TwoCalls program: after Segment 0, Segment 3, Segment 1, we encounter Segment 3 again. Do we add an edge to the previously seen node N1 that has label Segment 3, or do we continue to unfold the graph?
If we choose to create an edge to the previously seen Segment 3 (CFG1), we add infeasible paths to the graph. If we continue to unfold and we are on a loop, the unfolding may never end.


When unfolding a path in the CFG, if we encounter a segment that was already seen on a node in this path, how do we decide whether we can branch back to the previously seen node and stop the unfolding of the path?

For the reasons explained above, designing an algorithm that builds precise CFGs (minimising the number of infeasible paths) and always terminates is hard.
For instance, CFG1 above contains loops whereas TwoCalls does not, so it is not a very precise CFG for TwoCalls.

In the next sections I'll explain how we can track the useful content of the state, and provide a criterion to stop unfolding a path.

An Algorithm to build CFGs

As mentioned before, it is notoriously hard to build CFGs for low-level code like ARM assembly. In ARM assembly, target addresses of JUMPs can be stored in memory or computed at runtime. And the target addresses can be anywhere in the code.

The situation for EVM bytecode is much more favourable.
First, there is a known and fixed set of valid targets for JUMPs: the JUMPDEST instructions. In EVM bytecode, every target of a JUMP/JUMPI must be a JUMPDEST instruction. This is the case for the program TwoCalls introduced above.
This means that all the JUMP targets are in the set of JUMPDESTs. In itself this enables us to define a simple algorithm to build a CFG: from each JUMP/JUMPI, we create edges to every JUMPDEST. This over-approximates the set of sequences of PCs, but the over-approximation may be too coarse to be of any use (e.g. CFG1 above).
There is a second EVM feature that impacts how targets of jumps are stored and retrieved across successive jumps.

It is very unlikely that target addresses of JUMP/JUMPI will be stored in memory or storage because using memory or storage is much more expensive, gas-wise, than using the stack in the EVM.

As a result, compilers to EVM bytecode will almost certainly stack return addresses. It is also unlikely that targets of jumps are computed e.g. with ADD or MUL as every operation costs some gas.

Tracking the content of the stack and the effect of some simple stack operations may be enough to determine target addresses of JUMP/JUMPI instructions.

Abstract States & Semantics

An (EVM) abstract state s has two components: pc is the program counter, and stack is the stack represented by a sequence of elements. We use s.pc and s.stack to reference the components. We let AState be the set of abstract states.
The empty stack is the empty sequence [], and for non-empty stacks, the top is the first element of the sequence e.g., [a, b] means that a is at the top of the stack.
We write s.Peek(k) for the (k+1)-th element of the stack from the top, s.Peek(0) being the top element.
Given a sequence x = x0, x1, x2, …, xn, tail(x) is the sequence x1, x2, …, xn.

The abstract semantics of EVM instructions is given by the function
Next : AState × I → AState, with I the set of EVM instructions.
The complete definition of the Next function can be found here.

The semantics[2] of JUMP and JUMPI are: given a state s = (pc, st),

Next((pc, st), JUMP)  = (st.Peek(0), tail(st))
Next((pc, st), JUMPI) = if st.Peek(1) > 0 then (st.Peek(0), tail(tail(st))) else (pc + 1, tail(tail(st))).
The abstract semantics of other instructions can be defined in the same way.
However, we also want to ignore some of the stack content that may not be relevant to tracking the JUMP targets.
To do so, we have two types of elements in the stack: values, which are non-negative integers, and the unknown value, denoted ⊥.
As mentioned before, most of the time, target addresses of JUMPs are stacked with a PUSH instruction. The stack is then managed in a way that the target addresses are on top of the stack when needed by a JUMP (it is the role of the compiler to ensure that the code meets this requirement).
This also means that many operations on the stack, e.g., ADD, will not produce targets of JUMPs, so their actual result can be ignored, though of course not their effect on the stack size.
As a result we define the abstract semantics of operations like ADD, SUB, PUSH0 and others as follows:
Next((pc, st), ADD)   = (pc + 1, [⊥] + tail(tail(st)))
Next((pc, st), PUSH0) = (pc + 1, [⊥] + st).
The trick here is that ADD pops the two topmost elements of the stack, adds them and pushes the result, but in the abstract semantics we omit the actual result and replace it with ⊥.
For operations that may push jump targets like PUSH1 to PUSH32, we want to track the values pushed on the stack[3]:
Next((pc, st), PUSHk v) = (pc + 1 + k, [v] + st).
And finally, for other stack operations like DUPk, SWAPk, we re-arrange the stack elements according to the operation without modifying their contents:
Next((pc, st), DUP1)  = (pc + 1, [st.Peek(0)] + st)
Next((pc, st), SWAP1) = (pc + 1, [st.Peek(1), st.Peek(0)] + tail(tail(st))).

Let's see on an example how this works. The program below, Loop 1, contains a loop.
In this code snippet, the loop can repeat forever and the stack size can grow arbitrarily large[4].
If we use our abstract semantics to track the content of the stack (? denotes the unknown stack value ⊥), we obtain the values below.

Loop 1: Function Call/Return – EVM Bytecode Control Flow Graph
PC  Instruction // stack 

//  Segment 0
0x00:PUSH1 0x02 // [0x02]

//  Segment 1
0x02:JUMPDEST   // [0x02]
0x03:PUSH0      // [?, 0x02]
0x04:SWAP1      // [0x02, ?]
0x05:DUP1       // [0x02, 0x02, ?]
0x06:JUMP       // [0x02, ?] ...

After n jumps, the abstract stack content at 0x02 is
[0x02, ?, ?, ... ?] (n unknown values).
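The abstract semantics can be sketched in a few lines of Python and used to replay one iteration of Segment 1. This is a hedged illustration: None stands for ⊥, and the instruction encoding is made up for the example, not the evm-dis types.

```python
def next_abs(state, instr):
    """Abstract Next for a handful of instructions; state = (pc, stack)."""
    pc, st = state
    op, *arg = instr
    if op == "JUMP":                 # target is the (tracked) top of stack
        return (st[0], st[1:])
    if op == "PUSH":                 # PUSHk v: k argument bytes, tracked value
        k, v = arg
        return (pc + 1 + k, [v] + st)
    if op == "PUSH0":                # result abstracted away as ⊥ (None)
        return (pc + 1, [None] + st)
    if op == "ADD":                  # pops two, pushes ⊥
        return (pc + 1, [None] + st[2:])
    if op == "DUP1":
        return (pc + 1, [st[0]] + st)
    if op == "SWAP1":
        return (pc + 1, [st[1], st[0]] + st[2:])
    if op == "JUMPDEST":
        return (pc + 1, st)
    raise NotImplementedError(op)

# One iteration of Loop 1's Segment 1, starting at 0x02 with stack [0x02]:
s = (0x02, [0x02])
for i in [("JUMPDEST",), ("PUSH0",), ("SWAP1",), ("DUP1",), ("JUMP",)]:
    s = next_abs(s, i)
print(s)  # (2, [2, None]): back at pc 0x02, as in the trace above
```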







[Figure: Loop 1 CFG: nodes Segment 0 (0x0: PUSH1 0x02) and Segment 1 (0x2: JUMPDEST PUSH0 SWAP1 DUP1 JUMP); edges Segment 0 → Segment 1 and a self-loop Segment 1 → Segment 1]
This seems to be very good: we can successfully track the targets of JUMPs after each iteration.
But if we unfold the CFG after each iteration, we build arbitrarily long paths as the stack size grows arbitrarily large.

So how do we compute the CFG above? How can we guarantee that we always end up with the same target address after the JUMP instruction at 0x06?

Loop Detection

When we unfold a path π in the CFG, we can collect all the values of PCs[5] encountered so far.
If we reach a value of PC that we have already seen on π at index i, i.e. π[i], we may check whether we can stop unfolding the path and branch back to π[i].

But what is exactly the criterion to decide if we can stop unfolding and add an edge to a node we have already seen?

Let's take the example of Loop 1 above.
Unfolding the graph leads to something like:







[Figure: unfolded path: Node 0 (Segment 0: PUSH1 0x02) → Node 1 (Segment 1: JUMPDEST PUSH0 ... DUP1 JUMP) → Node 2 (Segment 1) → Node 3 (Segment 1) → ..., with each JUMP unfolding a fresh copy of Segment 1]
As we have seen an occurrence of Segment 1 before Node 2, we would like to know if we can stop at Node 2 and add a loop edge on Node 1.
We would like to fold this graph like so:







[Figure: folded graph: Node 0 (Segment 0) → Node 1 (Segment 1), labelled Next, with a candidate loop edge "JUMP?" from Node 1 back to itself]
We can do so if we can guarantee that every time we execute Segment 1 (Node 1), we end up after the JUMP at Segment 1 again, i.e. with pc==0x02.

A sufficient condition to test whether this holds is the following:

  1. compute the weakest precondition for Segment 1, with target postcondition pc == 0x02;
  2. check whether this weakest precondition is invariant when executing Segment 1.

Weakest Preconditions

If you are not familiar with preconditions and weakest preconditions, I'll try to explain what they mean.

Assume we want to guarantee that after executing a JUMP instruction, the program counter is 0x02. As a JUMP instruction updates the program counter to the value at the top of the stack, the least we can require is that, before we execute the JUMP instruction, the top of the stack contains 0x02.
Any other choice would not lead to 0x02 after the JUMP.
Overall, if we start in an (EVM) state such that the top of the stack is 0x02, written as the predicate Peek(0)==0x02, a JUMP instruction will update the program counter to 0x02.

Peek(0)==0x02 is a precondition (holds before) that ensures that executing a JUMP instruction leads to the postcondition (holds after) pc==0x02.
If you are familiar with Hoare triples and Hoare logic, this can be summarised as:
{ Peek(0) == 0x02 } JUMP { pc == 0x02 }.
Moreover, Peek(0) == 0x02 is the minimal requirement to ensure that pc == 0x02 after executing a JUMP. So Peek(0) == 0x02 is the weakest precondition for pc == 0x02 after a JUMP. Let's write that as follows:
WPre(JUMP, pc == 0x02) = (Peek(0) == 0x02).
Remember that a weakest precondition is a predicate, so the equation above reads "WPre is the predicate Peek(0) == 0x02", not "WPre is equal to Peek(0), itself equal to 0x02".
Computing weakest preconditions can be extended to sequences of instructions by computing the WPre from the last to the first instruction. For a sequence of instructions xi + [a] and a predicate P,
WPre(xi + [a], P) = WPre(xi, WPre(a, P)).

For instance:
WPre(DUP1 JUMP, pc == 0x02)
  = WPre(DUP1, WPre(JUMP, pc == 0x02))
  = WPre(DUP1, Peek(0) == 0x02)
  = Peek(0) == 0x02
We just back-propagate weakest preconditions from the last instruction to the first.
If we do this computation for all the instructions in Segment 1, we obtain:
WPre(JUMPDEST PUSH0 SWAP1 DUP1 JUMP, pc == 0x02)
  = WPre(JUMPDEST PUSH0 SWAP1, Peek(0) == 0x02)
  = WPre(JUMPDEST PUSH0, WPre(SWAP1, Peek(0) == 0x02))
  = WPre(JUMPDEST PUSH0, Peek(1) == 0x02)
  = WPre(JUMPDEST, WPre(PUSH0, Peek(1) == 0x02))
  = WPre(JUMPDEST, Peek(0) == 0x02)
  = Peek(0) == 0x02
So the weakest precondition for Segment 1 to guarantee the postcondition pc == 0x02 is Peek(0) == 0x02.
Let's call this precondition WPRE1.
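This back-propagation is easy to mechanise for the simple stack predicates used here. Below is a hedged Python sketch: the function names and the (k, v) encoding of the predicate Peek(k) == v are assumptions for illustration, not the evm-dis API.

```python
def wpre_instr(instr, pred):
    """pred = (k, v) means Peek(k) == v *after* instr; return the
    (k', v) that must hold *before* instr."""
    k, v = pred
    if instr == "JUMPDEST":           # no stack effect
        return (k, v)
    if instr == "PUSH0":              # shifts the stack down by one
        assert k >= 1, "PUSH0 pushes the unknown value, not v"
        return (k - 1, v)
    if instr == "DUP1":               # Peek(0) and Peek(1) after are old Peek(0)
        return (k - 1, v) if k >= 1 else (0, v)
    if instr == "SWAP1":              # positions 0 and 1 are exchanged
        return ({0: 1, 1: 0}.get(k, k), v)
    raise NotImplementedError(instr)

def wpre_seq(instrs, pred):
    """Back-propagate pred from the last instruction to the first."""
    for instr in reversed(instrs):
        pred = wpre_instr(instr, pred)
    return pred

# Segment 1 minus the final JUMP; WPre(JUMP, pc == 0x02) is Peek(0) == 0x02:
print(wpre_seq(["JUMPDEST", "PUSH0", "SWAP1", "DUP1"], (0, 0x02)))
# (0, 2), i.e. Peek(0) == 0x02: this is WPRE1
```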

Is the Weakest Precondition an invariant?

To decide whether we can add a loop edge on Node 1 in the CFG above, we are going to check whether WPRE1 is preserved by Segment 1.
By preserved, I mean, if an EVM abstract state s0 satisfies WPRE1, and we execute Segment 1, then the resulting state s1 also satisfies WPRE1.
You can see here that if this is the case, WPRE1 is an invariant for Segment 1: if it holds before we execute it, it holds at the end.
And this implies that after any number of executions of Segment 1, we have pc == 0x02.
Putting everything together, we can guarantee that any future computations after Segment 1 can only execute Segment 1 and so we can safely add a loop edge to Node 1 in our CFG.

How do we check that WPRE1 is an invariant?
We are going to compute the postcondition of WPRE1 after Segment 1.
If it implies WPRE1 then it is an invariant.
I'll skip the details here, but let's start in a state s that satisfies
WPRE1, i.e. such that s.stack = [0x02, …] and s.pc = 0x02. After executing Segment 1 from s, we end up in a new state s' with s'.stack = [0x02, ⊥, …] and s'.pc = 0x02. Does s' satisfy WPRE1? Yes, because the top of the stack is 0x02. So WPRE1 is an invariant for Segment 1.

And this means that we can add an edge from Node 1 to Node 1 in the CFG because we have proved that this is the only possible target of the JUMP.

To be complete, we also need to take into account any stack constraint SC1 that should hold when we reach the first occurrence of Node 1. We do so by building the conjunction WPRE := WPRE1 ∧ SC1 and checking the invariance of WPRE.
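The invariance check for Segment 1 can be sketched as follows: start from an abstract state satisfying Peek(0) == 0x02, execute the segment with the abstract semantics (None stands for ⊥), and re-check the predicate on the result. This is an illustrative sketch, not the evm-dis code.

```python
def step(state, instr):
    """Abstract semantics for the instructions of Segment 1."""
    pc, st = state
    if instr == "JUMPDEST": return (pc + 1, st)
    if instr == "PUSH0":    return (pc + 1, [None] + st)
    if instr == "SWAP1":    return (pc + 1, [st[1], st[0]] + st[2:])
    if instr == "DUP1":     return (pc + 1, [st[0]] + st)
    if instr == "JUMP":     return (st[0], st[1:])
    raise NotImplementedError(instr)

SEGMENT1 = ["JUMPDEST", "PUSH0", "SWAP1", "DUP1", "JUMP"]

def preserves_wpre1(stack):
    """Does Segment 1 map a state with Peek(0) == 0x02 to another one?"""
    s = (0x02, stack)
    assert s[1][0] == 0x02            # precondition WPRE1 holds
    for instr in SEGMENT1:
        s = step(s, instr)
    return s[1][0] == 0x02            # postcondition: WPRE1 holds again

print(preserves_wpre1([0x02]))        # True: WPRE1 is an invariant
```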

Automata Minimisation

The algorithm presented above unfolds each execution path separately and stops the unfolding when an invariant (as described in the previous section) is detected on the path.

This may lead to CFGs that are not minimal. For instance, the raw CFG below corresponds to the unfolding of two paths that end in the same sequence of segments: Segment 3, Segment 1.
In other words, the orange and red nodes can be merged as they define the same future sequences of PCs.

Minimising the raw CFG amounts to minimising an automaton and can be done with standard techniques that partition the set of nodes into equivalence classes. I have implemented a minimisation algorithm[6] that can generate a minimised version of the path-unfolding CFG. The result of the minimisation of the Raw CFG is depicted below.
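The partitioning can be sketched with a simple Moore-style refinement: start by grouping nodes with the same segment label, then repeatedly split groups whose members disagree on the groups of their successors. The sketch below is illustrative; the node names and the encoding of the Raw CFG are assumptions, not the evm-dis implementation.

```python
def minimise(labels, edges):
    """labels: node -> segment label; edges: node -> tuple of successors.
    Returns node -> class id; equivalent nodes share a class id."""
    # Initial partition: group by label and out-degree.
    cls = {n: (labels[n], len(edges[n])) for n in labels}
    while True:
        # Signature: own class plus the classes of the successors.
        sig = {n: (cls[n], tuple(cls[s] for s in edges[n])) for n in labels}
        ids, new = {}, {}
        for n in labels:
            ids.setdefault(sig[n], len(ids))
            new[n] = ids[sig[n]]
        if new == cls:                 # partition is stable: done
            return cls
        cls = new

# The Raw CFG of this section, with duplicated Segment 3 / Segment 1 nodes:
labels = {"A": "Seg0", "B": "Seg2", "C": "Seg3", "D": "Seg4",
          "E": "Seg1", "C2": "Seg3", "E2": "Seg1"}
edges = {"A": ("B",), "B": ("C", "D"), "C": ("E",), "D": ("C2",),
         "E": (), "C2": ("E2",), "E2": ()}
cls = minimise(labels, edges)
print(cls["C"] == cls["C2"], cls["E"] == cls["E2"])  # True True
```

The two copies of Segment 3 (and of Segment 1) end up in the same class and can be merged, which yields the minimised CFG below.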

Raw CFG and Minimised CFG

[Figure: Raw CFG: Segment 0 (0x0) → Segment 2 (0x13); Segment 2 → Segment 3 (0x1c) and Segment 4 (0x1f); Segment 3 → Segment 1 (0xa); Segment 4 → a second copy of Segment 3, which leads to a second copy of Segment 1]

[Figure: Minimised CFG: the duplicate copies of Segment 3 and Segment 1 are merged: Segment 0 → Segment 2; Segment 2 → Segment 3 and Segment 4; Segment 4 → Segment 3; Segment 3 → Segment 1]
Conclusion

I have presented an algorithm to construct CFGs from EVM bytecode.
The algorithm uses an abstract semantics of the EVM, and a semantically sound criterion to detect loops on execution paths.
Because we use an abstract semantics, the construction may fail in some cases, for instance:

  1. if a target of a JUMP is stored outside of the stack,
  2. if a target of a JUMP is computed with e.g. ADD,
  3. if we never find an invariant.

As discussed in this post, 1. and 2. are unlikely to occur as reasonable
compiler strategies would choose to store targets of JUMPs on the stack.
Regarding 3., "the proof (of usefulness) is in the pudding":
I have tried to construct CFGs for several programs with loops, and the algorithm always terminated and successfully detected path invariants.

The algorithm presented above offers precision by unfolding each path separately, sound loop folding by using a semantically sound criterion to
detect loops on a path, and a minimal size CFG by collapsing nodes to equivalence classes of nodes in the graph.

Acknowledgements

Thanks to Andreas Penzkofer for insightful comments on the draft version of this post.

I have previously worked on CFG construction. The technique presented above is inspired by the Master's Thesis work
of D. Klumpp, which I supervised at Macquarie University, Sydney.

Automated Control Flow Reconstruction from Assembler Programs. D. Klumpp. Master's Thesis. Dec. 2018.


  1. Other builds will be released when the corresponding Dafny backends are ready, e.g. Rust. ↩︎

  2. I omit the checks on the stack sizes and assume when Peek(k) is used there are at least k + 1 elements on the stack. ↩︎

  3. Note that a PUSHk instruction has a k-byte argument, so the next instruction is at address pc + 1 + k in the code. ↩︎

  4. I know this is not feasible in the EVM but we are building CFGs that are over-approximations here. ↩︎

  5. We only need to collect the PCs at the beginning of each segment. ↩︎

  6. A mixture of Moore and Hopcroft ↩︎