Try   HackMD

Everything You Wanted to Know About Symbolic Execution for Ethereum Smart Contracts (But Were Afraid to Ask)

Author: Palina Tolmach

Formal verification techniques are widely regarded as a trustworthy, although complicated, way to rigorously ensure security and correctness of smart contracts. One such technique is symbolic execution, which allows users to check whether an execution of a program can violate a certain property or reach a vulnerable state. Symbolic execution is in the spotlight lately, with many tools available, including established ones like Mythril or Manticore, or newer ones like Halmos. In this blog post we introduce symbolic execution and explore the existing bytecode-level symbolic execution tools for Ethereum smart contracts.

We discuss our experiments with some of these tools and highlight their strengths and limitations. We also conveniently provide shell commands for you to reproduce every experiment!

This post is the first of two on the topic of symbolic analysis of smart contracts. The link to part 2 covering symbolic testing will be here once it's ready, so check back later! We publish a small smart contract benchmark in this post, and an extended version is to come in the later one. For now, enjoy part 1!

What is symbolic execution?

If you know the answer to that question, feel free to skip to the next section :)

Definitions

Symbolic execution is a program analysis technique which explores multiple execution paths at the same time. If a program is run concretely, i.e., on a specific concrete input, a single control flow path of this program is explored—that is what happens during the execution of a unit test. Similarly, fuzzing concretely executes a program on a set of random (yet, concrete) inputs. In contrast to these techniques, symbolic execution assigns symbolic—rather than concrete—values to input variables that determine which paths should be be executed. A symbolic value represents an arbitrary value that can be assigned to a variable; in other words, a symbolic value means that a variable can take any value that a variable of its type can have (modulo constraints imposed on this variable by the program code leading to the current execution point).

if (x > 5) {
    // symbolic variable x is _any_ integer > 5
} else {
    // symbolic variable x is _any_ integer <= 5
}

Symbolic execution can thus be viewed as a way to generalize testing. It is commonly used to check whether a certain property can be violated by any program execution, i.e., if a certain vulnerable state is reachable.

if (balance >= 0) {
    ...
} else {
    // this should be unreachable
    // but is it?...
}

To determine reachability, symbolic execution engines typically rely on an off-the-shelf satisfiability modulo theories (SMT) solver, such as Z3. Reachability is equivalent to the feasibility of a certain execution path. Execution paths are defined by a series of conditions in the execution flow (e.g., conditions contained in the if-statements). These path conditions are converted into a logical formula, which is then given to an SMT solver to check its satisfiability: whether this combination of conditions can theoretically evaluate to true. An SMT solver either finds a satisfying concrete assignment to symbolic variables, thus demonstrating path feasibility (e.g., the output might be along the lines of "this path is feasible if i == 6 && j == 17"); or proves that a formula is unsatisfiable, indicating that no satisfying assignment exists and the path is infeasible. Generating concrete program inputs that will cause specific paths to execute can be used to automatically generate test cases.

Symbolic Execution by Example

Consider the following smart contract. It was slightly modified from the code included in an issue from the Foundry repository titled condition not caught by invariant/fuzzer. Conditions that should always hold are often explicitly specifed in the code via assertions. In this particular code sample, we're interested in the assert in line 15. We wrap the body of a function in an unchecked block to ignore a possible underflow in line 8 (so that we focus on a failing assert detection).

pragma solidity ^0.8.17; contract PostExample { // https://github.com/foundry-rs/foundry/issues/2851 function backdoor(uint256 x) external pure { uint256 number = 99; unchecked { uint256 z = x - 1; if (z == 6912213124124531) { number = 0; } else { number = 1; } } assert(number != 0); } }

The contract PostExample contains a function backdoor that is called with an argument uint256 x. At the beginning of the function, the variable number is initialized with its value set to 99 (line 6). Line 8 initializes another variable z that is equal to x - 1, where x is the input parameter. Depending on whether z is equal to6912213124124531 (line 9), number is set either to 0 (line 10) or to 1 (line 12). Finally, the assert statement in line 15 is used to check whether number is not equal to 0 (i.e., if the true branchline 10is executed, the assert gets violated).

As the title of the issue and the screenshot below suggest, Foundry's fuzzer cannot identify the input that triggers the execution of the true branch of the conditional statement in line 9 even if we increase the number of runs:

import "forge-std/Test.sol"; import "../src/PostExample.sol"; contract PostExampleTest is Test { PostExample public example; function setUp() public { example = new PostExample(); } function testBackdoor(uint256 x) public view { example.backdoor(x); } }

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Symbolic execution, on the other hand, is particularly useful when it comes to the identification of a vulnerable state that might be hard to reach otherwise, e.g., using testing or black-box fuzzing. Let's see how the violation of assert(number != 0) in line 15 could have been identified using symbolic execution. The following figure illustrates the process of symbolically executing backdoor statement by statement.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

For each explored control flow path of a program, a symbolic execution engine maintains (1) a path condition—a first-order logic formula over variable values that captures the conditions satisfied by the branches taken along that path, and (2) a symbolic memory store that maps variables to symbolic expressions or values computed along that path. An execution of a branch statement updates the path condition, while an assignment updates the store. In the initial state, the path constraints

π are true and the input parameter x is assigned a symbolic (i.e., arbitary) value 🤷‍♀️. When number is initialized in line 6, the symbolic store
σ
is updated by associating a corresponding variable
number
with a concrete value of
99
. The symbolic store is then updated again when the value of z is set to x - 1, where x is an input parameter and, therefore, a symbolic variable.

A conditional branch in line 9 forks further execution into two subtrees. When a true branch is taken, the constraint from line 9 (in symbolic terms) is added to the path conditions, which become

π= 🤷‍♀️
1==6912213124124531
. When a false branch is taken, the negation of a branch condition is added to path conditionsfor example, if z == 6912213... does not hold, the path condition is updated as follows:
π=¬(
🤷‍♀️
1==6912213124124531)
. Depending on the branch taken, lines 10 and 12 assign variable number the value of 0 or 1, respectively, and, therefore, update the symbolic state.

The violation of an assert is equivalent to satisfiability of a logical formula constructed as a conjunction of a path condition and a negation of the asserted expression (shown at the very bottom of the illustration). After unrolling both execution traces of backdoor, we can conclude that one trace does reach a state where this formula is satisfiable. The assertion can be violated if line 10 (setting number to 0) gets executed, in other words, if the condition in line 9 (z == 6912213124124531) is satisfied. Since the value of z is derived from the input parameter x as z = x - 1, an SMT solver used to determine satisfiability of path constraints can also suggest concrete input values corresponding to this execution path:x = 6912213124124531 + 1 = 6912213124124532. The output produced by a symbolic execution tool (corresponding to a counterexample leading to the assert violation) usually looks like this: postExample.backdoor(6912213124124532).

As an illustration of the symbolic execution process, please watch Nicolas "Symbolic Execution" Cage forking himself at every branching point to explore all possible execution paths and identify a failed assertion in a 2007 movie "Next".

Here are a couple of links to our previous, more detailed writeup on symbolic execution and some exercises to get a better grasp on what is going on:

Challenges in Symbolic Execution

Although useful for various analyses—as will be seen in the remainder of this post—exhaustive exploration of all possible execution paths performed by symbolic execution comes with certain performance and scalability constraints (duh!). The challenges include:

  • Handling state space explosion (the number of states grows exponentially in the number of system components, and exploring smart contracts with, e.g., loops and recursive calls might be infeasible)
  • Constraint solving (certain types of constraints pose a difficulty to SMT solvers—these include non-linear arithmetic expressions which are common in DeFi smart contracts, e.g., the Uniswap invariant; or hashing of any kind)
  • Interaction with the environment (e.g., calling other smart contracts and accounting for possible side effects)
  • Modeling memory and storage (which can be fully symbolic, partially symbolic, or concretized)

What tools are there?

Symbolic execution has been widely used to implement vulnerability detection tools for smart contracts, with examples including Pakala, Oyente, Osiris, HoneyBadger, Maian, teEther. However, most of these tools are unmaintained, support a limited subset of EVM, are limited to Z3 as a solver, or have other limitations or drawbacks. After a high-level analysis and exploratory experiments, we selected the following four open-source EVM-level symbolic execution tools for comprehensive empirical analysis:

Manticore
A symbolic execution tool that is developed by Trail of Bits. Manticore is implemented in Python and supports several SMT solvers (Z3, Yices2, Boolector, cvc4). Manticore has several interfaces: it can be used as a CLI tool, through its Python API, or using manticore-verifier-a property-based symbolic execution tool. Besides vulnerability detection, Manticore automatically generates test cases corresponding to the execution paths identified through symbolic execution.

Mythril
A security analysis tool for EVM bytecode that performs vulnerability detection using symbolic execution. Mythril is developed by ConsenSys. It's implemented in Python and uses Z3 as an SMT solver through its Python API.

hevm
A symbolic execution engine and equivalence checker for EVM bytecode. Initially developed as part of the dapptools framework, hevm has been forked by Ethereum Foundation and is currently under active development-we are using the newer Ethereum Foundation fork for our experiments. hevm is implemented in Haskell and supports Z3 and cvc5 as solvers.

EthBMC
A bounded model checker for EVM bytecode based on symbolic execution developed in Ruhr-University Bochum. The only academic tool that made it into our list, EthBMC has been proposed in an academic paper presented at Usenix Security, 2020. A distinctive feature of EthBMC is its support of concrete validation of the identified counterexamples using geth. EthBMC is implemented in Rust and supports several SMT solvers (Z3, Yices2, Boolector).

Two other relevant tools include Halmos, a new tool by a16z; and KEVM, which is built on top of the K Framework. Both tools did not make it into this blog post for various reasons, but both perform symbolic testing, which is a topic for our next post, so stay tuned ;)

What can they do?

Let's look at how these four tools can be used to analyze Ethereum smart contracts. All experiments described in this post were performed on a MacBook Pro with an M1 Max chip and 64 GB of memory. When run in Docker (as suggested in the "follow along" snippets), you may see degraded performance depending on the amount of resources allocated to your containers.

Setup

If you want to follow along, you can pull pre-built Docker images using the following commands:

docker pull baolean/manticore:latest
docker pull baolean/mythril:latest
docker pull baolean/ethbmc:latest
docker pull baolean/hevm:latest

You can find these container images in this Docker Hub registry.

The smart contracts used for the evaluation are available in this repository, you can clone it using this command:

git clone https://github.com/baolean/symexec-bench.git && \
cd symexec-bench && \
git submodule init && git submodule update

Now you can run the Docker containers with the cloned repo mounted in /examples directory in the container.
Assuming that your current directory (pwd) is the cloned repo (you are in the symexec-bench folder), the command would look something like this:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/examples baolean/hevm:latest

Try running hevm version to verify that everything works. To exit the container, type exit and press enter. Thanks to --rm flag, the container (not the image!) will self-delete upon exit, so you don't need to cleanup afterwards.

Detailed steps to install all the tools that we examine in this post can be found in the Dockerfiles: Manticore, Mythril, EthBMC, and hevm. If you prefer, you can build the images from scratch by running

docker build -t IMAGE_NAME .

in the respective cloned repository, or just install the tools directly on your machine.

Simple Example

First, let's compare the selected four tools on our PostExample smart contract from the introduction. Unless specified otherwise, the tools here are run in their default configuration-for vulnerability detection tools, such as Mythril, Manticore, EthBMC, this implies that all default detectors are enabled. In this post, however, we mainly compare the tools based on their ability to identify a potentially failing assert statement, which is commonly used to denote the condition that should always hold. Failed assert detection is supported by all analyzed tools except EthBMC, for which we have added it ourselves; in addition, we had to update the fork of Ethereum used by geth to a newer one (London). Manticore identifies failed asserts based on the INVALID opcode, so we set the version of Solidity to v0.7.6 in experiments with Manticore.

Symbolic Execution with Z3

Since all four tools support Z3 as an SMT solver, let's run all of them with Z3 as a solver as the first step. The analysis done by symbolic execution tools is typically performed by issuing one or more symbolic transactions, i.e., transactions with symbolic calldata. Since the failed assert can be detected based on one function call (with proper input), here we limit the symbolic analysis performed by all tools to one transaction.

Mythril performs symbolic execution through its myth a command, which takes a Solidity file as input, and in 2.7s identifies a failed assertion as well as the input that causes it (backdoor(6912213124124532)). The output shown on the screenshot below demonstrates the identified counterexample as well as some supplementary information related to the number of queries sent to an SMT solver and the time spent on these queries (76 queries processed in 0.31 seconds). Z3 is the only solver supported by Mythril, so it is used by default.

To follow along, enter the container by running:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/examples baolean/mythril:latest

In the container, run:

myth a examples/PostExample/PostExample.sol -t1

Manticore's CLI can also take a Solidity file as input, however, as mentioned above, the version of Solidity should be downgraded to 0.7.6. The --smt.solver z3 flag ensures that Manticore will use Z3 as an SMT solver. Manticore should also be run with the --thorough-mode flag for it to report potential vulnerabilities (corresponding to WARNING: INVALID instruction in the screenshot shown below). The time used by an SMT solver is also available in the output (0.13 seconds).

Using Docker, it can be reproduced as:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/examples baolean/manticore:latest

manticore examples/PostExample/PostExample_0.7.sol \
   --smt.solver z3 --thorough-mode --txlimit 1

Manticore also generates test cases for each possible execution path, as well as some other information related to the analysis (code coverage, etc)-all in 1.7 seconds! Exploring one of the generated files shows the identified counterexample:

Other benefits of using Manticore include the flexibility and customization of the analysis, e.g., different gas models, turning reasoning about symbolic balances on and off, etc. The information about the corresponding flags is available through manticore --help.

Different from Manticore and Mythril that can operate on a Solidity file, the following two tools-hevm and EthBMC-perform the analysis based on the runtime bytecode of a smart contract. The runtime binary can be generated by the Solidity compiler solc using the command solc --bin-runtime examples/PostExample/PostExample.sol:

hevm symbolically analyzes the binary of a smart contract when invoked via hevm symbolic. Z3 is used in hevm by default, but we can explicitly instruct hevm to use Z3 using --solver z3:


To reproduce:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/examples baolean/hevm:latest

hevm symbolic --code \
   $(<examples/PostExample/PostExample.bin-runtime)

The execution takes less than a second (and even less than 0.1s!). However, the output requires additional decoding-the decoded output from the screenshot is indeed backdoor(6912213124124532), which can be verified using one of the available calldata decoders. This is left to the reader as an exercise; to do the decoding, use the ABI generated by the compiler (solc --abi examples/PostExample/PostExample.sol) and the generated calldata.

Calldata

0x99d7cd3400000000000000000000000000000000000000000000000000188e9f07e00f74

PostExample ABI

[{"inputs":[{"internalType":"uint256","name":"x","type":"uint256"}],"name":"backdoor","outputs":[],"stateMutability":"pure","type":"function"}]


EthBMC accepts the input in an even more specific format-it requires a user to supply a configuration YAML file, which includes the addresses of all contracts involved in the analysis, their balances, runtime binary, and nonce. The initial values of storage variables can also be provided in this file-otherwise, they are assumed to have default values for their type (the constructor of a smart contract is not executed or analyzed, since it is not included in the runtime bytecode, which is also true for hevm).

When provided with such input, EthBMC outputs an identified counterexample (that also has to be decoded) in 4.3s. As mentioned above, EthBMC concretely validates the identified counterexample against smart contracts locally deployed on geth, which helps avoid false positives (concrete validation can be turned off by the --no-verify flag).
EthBMC uses Z3 when run with the --solver z3 flag.

To reproduce:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/examples baolean/ethbmc:latest

ethbmc examples/PostExample/PostExample.yml --solver z3

Our initial evaluation demonstrates that all four analyzed tools can successfully identify a failed assert in our examples within a few seconds-which is very good! The results averaged across five runs using Z3 as a solver are as follows:

Tool Language Solvers Time (Z3)
Mythril Python Z3 2.8 sec
Manticore Python Z3, cvc4, Boolector, Yices2 1.6 sec
hevm Haskell Z3, cvc5 0.1 sec
EthBMC Rust Z3, Boolector, Yices2 3.6 sec

Different Solvers

While the results of the performed analysis look very convincing, let's see if we can improve them even further by using SMT solvers other than Z3, since the performance may vary from solver to solver when presented with different tasks.

Indeed: running EthBMC with Yices2 as a solver makes the analysis complete in 0.19s, which is significantly less that 3.6s it took with Z3!

ethbmc examples/PostExample/PostExample.yml \
   --solver yice

Boolector also performs better than Z3, making the analysis finish in 0.3s:

ethbmc examples/PostExample/PostExample.yml \
   --solver boolector

Manticore takes 2s when run with Yices2:

manticore examples/PostExample/PostExample_0.7.sol \
   --smt.solver yices --thorough-mode --txlimit 1

1.6s with Boolector:

manticore examples/PostExample/PostExample_0.7.sol \
   --smt.solver boolector --thorough-mode --txlimit 1

While cvc4 takes 4.9s (which might be attributed to the fact that I run it on M1 through Rosetta):

manticore examples/PostExample/PostExample_0.7.sol \
   --smt.solver cvc4 --thorough-mode --txlimit 1

The performance of hevm with cvc5 is similar to the one hevm shows with Z3, taking around 0.08 seconds to finish the analysis. To reproduce:

hevm symbolic --code \
   $(<examples/PostExample/PostExample.bin-runtime) \
   --solver cvc5

The averaged results can be summarized as follows (the best performing solver is in bold, where applicable):

Tool Language Solvers Time (Z3) Best time
Mythril Python Z3 2.6 sec 2.6 sec
Manticore Python Z3, cvc4, Boolector, Yices2 1.6 sec 1.6 sec
hevm Haskell Z3, cvc5 0.1 sec 0.08 sec
EthBMC Rust Z3, Boolector, Yices2 3.6 sec 0.19 sec

Simple Example with Two Transactions

The example we were using so far demonstrates the advantages of using symbolic execution-however, it is also quite a trivial one. In many cases, it takes more than one transaction to get to the vulnerable state. Let's slightly modify our example contract so that it has to be "activated" via calling setLive(true) before the backdoor function can be invoked without being reverted.

pragma solidity ^0.8.17; contract PostExample { bool live; modifier isLive() { require(live); _; } function setLive(bool _live) external { live = _live; } // https://github.com/foundry-rs/foundry/issues/2851 function backdoor(uint256 x) external view isLive { uint256 number = 99; unchecked { uint256 z = x - 1; if (z == 6912213124124531) { number = 0; } else { number = 1; } } assert(number != 0); } }

Most of the tools can identify the two transactions leading to the assert violation, although it understandably takes almost twice as long. For example, Mythril correctly identifies the sequence of two function calls: setLive(true) followed by backdoor(691221...) in 5.4 seconds:

To reproduce:

myth a examples/PostExample/PostExample_2tx.sol -t2

Manticore takes 3.2 seconds to identify a violation and generate test cases:

To reproduce:

manticore examples/PostExample/PostExample_2tx_0.7.sol \
   --smt.solver boolector --thorough-mode --txlimit 2

EthBMC finds the vulnerable function call sequence in just 0.5 seconds:

To reproduce:

ethbmc examples/PostExample/PostExample_2tx.yml \
  --solver yice

Symbolic Storage

hevm however, detects a failed assert in only one transaction (and, still, in less than 0.1 seconds). Now, however, section Storage is added to the output, indicating that in slot 0x0 (corresponding to the live variable) the value is supposed to be 1 (i.e., true) in order for the violation to occur. This behavior corresponds to having so-called symbolic storage, which is used in hevm by default. With symbolic storage, the default value returned bySLOAD is assumed to be a symbolic value without other constraints (instead of it being equal to the default value for this type, e.g., false for a Boolean or 0 for a uint). In this mode, the tool performs under-constrained analysis which is likely to generate false positives corresponding to unreachable states. However, it is useful in some verification scenarios-similar to the one we're using-as it can reduce the number of function calls to be explored, i.e., in this example we can avoid having to call setLive(true) first. In addition to symbolic storage, hevm supports other storage models, which can be configured using the --storage-model [ConcreteS, SymbolicS, InitialS] flag). hevm currently limits the analysis to a single transaction, therefore, it cannot identify the violation when using a "concrete" storage model (--storage-model InitialS).

To reproduce:

hevm symbolic --code \
   $(<examples/PostExample/postExample_2tx.bin-runtime)

hevm symbolic --code \
   $(<examples/PostExample/postExample_2tx.bin-runtime) \
   --storage-model InitialS

In Mythril, symbolic storage is enabled via the --unconstrained-storage flag. The analysis exploring one symbolic transaction performed in this mode successfully finds a violation; it completes faster (3.4 sec) than the previously discussed two-transaction analysis (5.4 sec), but is slower than the one-transaction analysis performed in the previous section (1.6 sec). You may also notice that the query count is higher now.

To reproduce:

myth a examples/PostExample/PostExample_2tx.sol \
   -t1 --unconstrained-storage

We've added symbolic storage to both Manticore (see here) and EthBMC (see here). In Manticore, it could be achieved by simply changing the default storage value from zero to a symbolic value; in EthBMC, it took significantly more effort-in particular, due to the need to concretize the symbolic values corresponding to both symbolic storage locations and the values stored there in order to enable geth-based concrete validation; perhaps, a topic for another blog post.

When exploring one transaction with symbolic storage enabled, Manticore (--evm.symbolic_storage true) and EthBMC (--symbolic-storage) identify the violation in 2.2 and 0.2 seconds, respectively:

To reproduce:

manticore examples/PostExample/PostExample_2tx_0.7.sol \
   --smt.solver boolector --thorough-mode \
   --txlimit 1 --evm.symbolic_storage True

Similarly to hevm, section Concretized storage in the output produced by EthBMC shows the concretized locations and value of storage variables that enable the identified attack:

To reproduce:

ethbmc examples/PostExample/PostExample_2tx.yml \
   --solver yice --symbolic-storage
Tool Time (Z3) Best time Best: 2 txns Best: 2txns + SS
Mythril 2.6 sec 2.6 sec 5.1 sec 3.4 sec
Manticore 1.6 sec 1.6 sec 3.2 sec 2.2 sec
hevm 0.1 sec 0.08 sec x 0.08 sec
EthBMC 3.6 sec 0.19 sec 0.5 sec 0.2 sec

SS stands for Symbolic Storage.

Examples from Echidna VS Forge Repository

We can further test the analyzed symbolic execution tools by running them against the examples from the repository by clabby. Intially created to examine the performance of Echidna and Foundry fuzzers, the repository contains two smart contracts: Foo (having uint256 variables as function inputs) and FooBytes (having bytes32 variables as function inputs). The objective is to find two inputs equal to the magic number 0x69 to supply to contract's foo(uint256/bytes32) and bar(uint256/bytes32) functions. If both of these functions are called with an input of 0x69, another function revertIfCracked() will revert.

To test the analyzed tools, we have slightly modified the contracts to use the assert Solidity statement in revertIfCracked(), since the tools only report the violation if they identify a specific error code in the execution trace.

Mythril can find the three transactions leading to the violation in both smart contracts in 11 seconds. To detect the issue, Mythril should be called with the -t3 flag, increasing the limit of the analyzed transactions-otherwise, the analysis will complete without detecting the issue after only two symbolic transactions.

The decimal number 105 suggested as an argument to foo and bar corresponds to 0x69 in hexadecimal, which can be verified in a converter.

To reproduce:

myth a examples/echidna-vs-forge/symexec/Foo_assert.sol -t3

EthBMC identifies two possible counterexamples (with two different orderings of foo() and bar()) in 1.2 sec:

ethbmc examples/echidna-vs-forge/symexec/Foo.yml \
   --solver yice

Interestingly, even when instructed to analyze 3 transactions using the corresponding flag (--txlimit 3), Manticore finished the analysis after exploring only two transactions without uncovering the violation. After investigation, we realized that the third transaction is not analyzed because no new coverage is identified after the second one-commenting this out in the code makes Manticore find the issue in 43 seconds. Since the significant amount of time is spent on writing the generated test cases (this analysis of Foo generates 187 test cases!), we added the --only-invalid-testcases flag that makes it generate only the test cases that correspond to the identified violations. The analysis of Foo with this flag completes in 29 seconds:

To reproduce:

manticore examples/echidna-vs-forge/symexec/Foo_assert_0.7.sol \
 --thorough-mode --smt.solver boolector \
 --only-invalid-testcases --txlimit 3

hevm explores only one symbolic transaction-it, however, can identify a violation in under 0.1 sec by using symbolic storage (instead of setting the values of variables through calling the foo and bar functions):

To reproduce:

hevm symbolic --code \
   $(<examples/echidna-vs-forge/symexec/Foo.bin-runtime)

On the FooBytes smart contract that uses bytes32 variables, all tools perform similarly to Foo smart contract that uses uint256 variables (unlike fuzzers that perform differently depending on the type).

Tool Time (Z3) Best time Best: 2 txns Best: 2txns + SS Foo[Bytes]
Mythril 2.6 sec 2.6 sec 5.1 sec 3.4 sec 11 sec
Manticore 1.6 sec 1.6 sec 3.1 sec 2.2 sec 29 sec
hevm 0.1 sec 0.08 sec x 0.08 sec 0.07 sec (SS)
EthBMC 3.6 sec 0.19 sec 0.5 sec 0.2 sec 1.2 sec

Foo and FooBytes have similar time. SS stands for Symbolic Storage.

A More Realistic Example

What happens if we run all these tools on slighly more complicated smart contract code? Let's consider a simplified version of the Vat smart contract, which is part of MakerDAO. This example is based on a violation of the Fundamental DAI Equation invariant identified using Certora Prover-another formal verification tool.

The (simplified) MiniVat contract contains four functions init(), move(address, int256), frob(int256), and fold(int256). In addition to these, we encode the counterexample leading to the invariant violation in a separate function:

function counterexample() external { init(); frob(10**18); fold(-10**27); init(); assert(debt == Art * rate); }

Therefore, the task for a symbolic execution tool here is to analyze the counterexample() function and report a violation of the assert statement in line 6 (the arguments of functions are concrete and are leading to the violation, so here we're simply testing the capabilities of the tools to analyze the more or less realistic bytecode). You'd think that any symbolic execution tool would immediately catch that-and you'd be surprised ;).

As you will see, EthBMC is the only tool that maintains its high performance when used to analyze the MiniVat example (especially when using Boolector as a solver-the analysis is this case takes 0.8 sec; with Yices2 it takes 3.2 sec; with Z3-29 sec):

To reproduce:

ethbmc examples/Vat/MiniVat.yml \
   --solver boolector

Since EthBMC performs so well, we can ask it to identify a sequence of 2 function calls leading to the invariant violation, e.g.,

    function counterexample() public {
        init();
        frob(10**18);
        fold(-10**27);
        init();
    }

    function check_assert() public {
        assert(debt == Art * rate);
    }

This leads to a solver timeout (set to 120 seconds by default) on one of the queries, which, however, does not prevent EthBMC from identifying a vulnerable sequence of two transactions (counterexample(); check_assert()) in under 3 minutes when using Yices2 as a solver (interestingly, it is not the best-performing solver for a 1txn-MiniVat example). Experiments with increasing the timeout or modifying the contract may be an interesting exercise for the reader.

To reproduce:

ethbmc examples/Vat/MiniVat_2tx.yml \
   --solver yice

Manticore takes a long time (more than 10 minutes) to identify the violation in the MiniVat smart contract. However, the performance improves dramatically if we turn off vulnerability detectors for issues other than failing assert (by providing an --exclude flag). When generating only failing test cases (--only-invalid-testcases) and exploring only one transaction (--txlimit 1), Manticore identifies the assert violation in MiniVat in just 22 seconds! Please keep in mind that this approach only works if you know the vulnerability you are looking for-which is unfortunately rarely the case.

To reproduce:

manticore examples/Vat/MiniVat_0.7.sol --thorough-mode \
 --txlimit 1 --only-invalid-testcases \
 --exclude 'overflow,uninitialized-storage,uninitialized-memory,'`
   `'reentrancy,reentrancy-adv,unused-return,suicidal,'`
   `'delegatecall,ext-call-leak,env-instr,lockdrop'

Interestingly, when run on the 2-transaction example (counterexample(); check_assert()), Manticore successfully identifies the assert violation we're looking for in 16 minutes. However, it also reports two false positives that occur due to the "unsound symbolication" strategy used in Manticore to handle hashes, which incorrectly assumes there is a possible hash collision. These false positives are not produced when running the same analysis with a different hash strategy (--evm.sha3 concretize)-the execution in this case completes in a bit over 6 minutes.

To reproduce:

manticore examples/Vat/MiniVat_2tx_0.7.sol --thorough-mode \
 --only-invalid-testcases \
 --exclude 'overflow,uninitialized-storage,uninitialized-memory,'`
   `'reentrancy,reentrancy-adv,unused-return,suicidal,'`
   `'delegatecall,ext-call-leak,env-instr,lockdrop' \
 --evm.sha3 concretize

Since our experiments with other tools show that they can't keep up with the performance of EthBMC and Manticore on MiniVat, to make it easier for them, for the rest of the section by default we're using a modified version of MiniVat, in which counterexample() is the only external function, i.e., the only function that can and should be analyzed in the contract (the other four functions are made internal).

When run on this version of MiniVat, hevm finds a failed assert in a bit over 2 minutes if Z3 solver is used and the storage is assumed to be concrete (we did encounter an issue with an earlier version of hevm, which was very promptly addressed).

To reproduce:

hevm symbolic --code \
    $(<examples/Vat/MiniVat_internal.bin-runtime) \
    --storage-model InitialS

However, when cvc5 is used as a solver (--solver cvc5), the counterexample is not identified and the following error is displayed: Could not determine reachability of the following end states: ....

The analysis of the original version of MiniVat with all functions having public or external visibility completes in approximately 5m30s. To reproduce:

hevm symbolic --code \
    $(<examples/Vat/MiniVat.bin-runtime) \
    --storage-model InitialS

Mythril takes 2m19s to explore one symbolic transaction of the simplified MiniVat version (with internal functions) and does not report any violations. 125 seconds out of 140 are spent on constraint solving, which suggests there might be an SMT solver timeout, which prevents Mythril from completing the analysis:

To reproduce:

myth a examples/Vat/MiniVat_internal.sol -t1

As expected, if we increase the solver timeout, the analysis will not finish in 2.5 minutes (note that the warning message related to the timeout is not displayed under the default log level). To the contrary, the analysis does not complete even in 30 minutes before being interrupted (notice that 1814 seconds out of 1825 are spent on constraint solving):

To reproduce:

myth a examples/Vat/MiniVat_internal.sol -t1 \
   --solver-timeout 10000000

Delayed Constraint Solving

We can, however, further improve the performance of Mythril. First, lets restrict the types of vulnerabilities detected to failed asserts (-m Exceptions), similarly to what we did in Manticore. In addition, we can also use the --sparse-pruning 0 flag, which turns on delayed constraint evaluation. The same feature is used in hevm by default and seems to be available in Manticore's experimental branch, which was used during the Uniswap V3 audit. When performing delayed (or lazy) constraint evaluation, path reachability is not checked at every JUMPI-instead, solving the reachability queries is delayed, e.g., until the end of transaction. While effective, this approach may cause issues in the presence of loops (that will be analyzed infinitely); therefore, in Mythril, the frequency of reachability checking is controlled by the argument supplied with --sparse-pruning, while --ask-smt-iterations in hevm is used to specify the number of loop iterations to be explored before reachability checking.

The (optimized) analysis performed with the described parameters by Mythril takes only 1.5 minutes to find the counterexample:

To reproduce:

myth a examples/Vat/MiniVat_internal.sol -t1 \
   --solver-timeout 10000000 --pruning-factor 0 \
   -m Exceptions

We hope these short examples illustrate the splendeurs et misères of the symbolic execution tools available for Ethereum smart contracts! Deeply insightful with a right set of flags, they time out and fail in inexperienced hands. A lot of our work on symbolic execution at MakerDAO is aimed at simplifying and democratizing symbolic execution tools for a wider audience of smart contract developers who are not necessarily experts on formal verification.

tldr;

All four tools selected for evaluation (Mythril, Manticore, hevm, EthBMC) support basic EVM and symbolic execution features including hashing and inter-contract analysis; they also perform basic query optimizations such as, e.g., constant folding (evaluating constant expressions before sending them in a query to a solver).

Python-based tools such as Mythril and Manticore tend to have better UX and offer good flexibility and configurability of the performed analysis. hevm and EthBMC, on the other hand, are often faster. Manticore, hevm, and EthBMC support multiple solvers, which is useful in practice as different solvers have their strengths and weaknesses. More details on the pros and cons of each of these tools are discussed below:

Mythril

Upsides:

  • Actively maintained by ConsenSys
  • Good UX, highly configurable, e.g., supports several path exploration strategies
  • Supports multiple optimizations and features including lazy constraint solving and symbolic storage

Downsides:

  • Slower than hevm and EthBMC
  • Solver support is limited to Z3

Manticore

Upsides:

  • Maintained by Trail of Bits
  • Has several APIs
  • Supports multiple solvers
  • Highly configurable: supports different strategies for metering gas, modeling ETH balances, and handling hashes

Downsides:

  • Often slower than hevm and EthBMC
  • Does not fully support some of the optimizations and features, e.g., symbolic storage (although it's easy to add) and lazy constraint evaluation (although it's implemented in an experimental branch that was used during the Uniswap V3 audit)

EthBMC

Upsides:

  • Fast
  • Supports multiple solvers
  • Written in Rust (a Foundry-compatible language, which would allow for tighter integration, which might be useful, e.g., for concolic execution)

Downsides:

  • Not maintained since July 2021, misses several hard-fork updates (e.g., newer opcodes)
  • Suboptimal UX: input should be supplied as a YAML configuration file, output calldata should be decoded
  • Missing some features: symbolic storage and failed assert detection was added by us

hevm

Upsides:

  • Fast
  • Integrated with dapptools, supports symbolic testing
  • Supports symbolic storage mode, multiple SMT solvers

Downsides:

  • Under active development
  • Suboptimal UX: input should be supplied as a runtime binary, output calldata should be decoded
  • Does not explore transaction sequences longer than one

Summary

To conclude, symbolic execution is a promising technique when it comes to code security, with our experiments proving that it can find issues that some other techniques cannot. It is very exciting that there is active development in this area from many teams. At the same time, in their current state, symbolic execution tools often are fragile, cranky, and wayward. To get desired results and even get them to work at all, you often need to know their internals in and out. This post is the first in a series that documents our efforts on making symbolic execution more practical and accessible for wider audiences.

Promising Directions

Promising directions in symbolic execution for smart contracts include concolic execution-a combination of concrete testing (or fuzzing) and symbolic execution, which might help overcome both the scalability challenges of pure symbolic execution and the incomprehensiveness of random testing; and symbolic testing-symbolic execution of parameter-based tests that is performed by hevm through its integration with dapptools and by Halmos.

Based on our results, we decided to further experiment with EthBMC by integrating it with Foundry to enable symbolic testing as well. While symbolic testing is a topic of the next blog post, here is a small snippet of what this integration can do:

Other Resources

As mentioned in the introduction, we also prepared a tutorial and two exercises on symbolic execution for smart contracts:

References

Keith Schwarz. 2019. Lectures on Mathematical Foundations of Computing. Stanford University. https://web.stanford.edu/class/archive/cs/cs103/cs103.1202/lectures/04/Small04.pdf, https://web.stanford.edu/class/archive/cs/cs103/cs103.1202/lectures/05/Small05.pdf

R. Baldoni, E. Coppa, D. C. D'Elia, C. Demetrescu, and A. Finocchi. 2018. Survey of Symbolic Execution Techniques. ACM Computing Surveys. https://arxiv.org/pdf/1610.00502.pdf

J. C. King. 1976. Symbolic execution and program testing. ACM Communications. https://madhu.cs.illinois.edu/cs598-fall10/king76symbolicexecution.pdf

Trail of Bits. 2020. "How to use Manticore to find bugs in smart contracts". https://ethereum.org/en/developers/tutorials/how-to-use-manticore-to-find-smart-contract-bugs/

Web-resource. dapptools & Ethereum Foundation. 2023. "symbolic - hevm". https://hevm.dev/symbolic.html