Everything You Wanted to Know About Symbolic Execution for Ethereum Smart Contracts (But Were Afraid to Ask)

# Everything You Wanted to Know About Symbolic Execution for Ethereum Smart Contracts (But Were Afraid to Ask) Author: [Palina Tolmach](https://twitter.com/palinatolmach) Formal verification techniques are widely regarded as a trustworthy, although complicated, way to rigorously ensure security and correctness of smart contracts. One such technique is symbolic execution, which allows users to check whether an execution of a program *can* violate a certain property or reach a vulnerable state. Symbolic execution is in the spotlight lately, with many tools available, including established ones like [Mythril](https://github.com/ConsenSys/mythril) or [Manticore](https://github.com/trailofbits/manticore), or newer ones like [Halmos](https://a16zcrypto.com/content/article/symbolic-testing-with-halmos-leveraging-existing-tests-for-formal-verification/). In this blog post we introduce symbolic execution and explore the existing bytecode-level symbolic execution tools for Ethereum smart contracts. We discuss our experiments with some of these tools and highlight their strengths and limitations. We also conveniently provide shell commands for you to reproduce every experiment!  *This post is the first of two on the topic of symbolic analysis of smart contracts. The link to part 2 covering **symbolic testing** will be here once it's ready, so check back later! We publish a small smart contract benchmark in this post, and an extended version is to come in the later one. For now, enjoy part 1!* ## What is symbolic execution? If you know the answer to that question, feel free to skip [to the next section](#What-tools-are-there) :) ### Definitions *Symbolic execution* is a program analysis technique which explores multiple execution paths at the same time. If a program is run *concretely*, i.e., on a specific *concrete* input, a single control flow path of this program is explored—that is what happens during the execution of a unit test. Similarly, fuzzing concretely executes a program on a set of random (yet, concrete) inputs. In contrast to these techniques, symbolic execution assigns *symbolic*—rather than concrete—values to input variables that determine which paths should be be executed. A symbolic value represents an arbitrary value that can be assigned to a variable; in other words, a symbolic value means that a variable can take *any* value that a variable of its type can have (modulo constraints imposed on this variable by the program code leading to the current execution point). ```solidity if (x > 5) { // symbolic variable x is _any_ integer > 5 } else { // symbolic variable x is _any_ integer <= 5 } ``` Symbolic execution can thus be viewed as a way to generalize testing. It is commonly used to check whether a certain property *can* be violated by any program execution, i.e., if a certain vulnerable state is *reachable*. ```solidity if (balance >= 0) { ... } else { // this should be unreachable // but is it?... } ``` To determine reachability, symbolic execution engines typically rely on an off-the-shelf [satisfiability modulo theories](https://en.wikipedia.org/wiki/Satisfiability_modulo_theories) (SMT) solver, such as [Z3](https://github.com/Z3Prover/z3). Reachability is equivalent to the feasibility of a certain execution path. Execution paths are defined by a series of conditions in the execution flow (e.g., conditions contained in the `if`-statements). These path conditions are converted into a logical formula, which is then given to an SMT solver to check its *satisfiability*: whether this combination of conditions can theoretically evaluate to `true`. An SMT solver either finds a satisfying concrete assignment to symbolic variables, thus demonstrating path feasibility (e.g., the output might be along the lines of "this path is feasible if `i == 6 && j == 17`"); or proves that a formula is unsatisfiable, indicating that no satisfying assignment exists and the path is infeasible. Generating concrete program inputs that will cause specific paths to execute can be used to automatically generate test cases.  ### Symbolic Execution by Example Consider the following smart contract. It was slightly modified from the code included in an [issue](https://github.com/foundry-rs/foundry/issues/2851) from the Foundry repository titled `condition not caught by invariant/fuzzer`. Conditions that should always hold are often explicitly specifed in the code via assertions. In this particular code sample, we're interested in the assert in line 15. We wrap the body of a function in an `unchecked` block to ignore a possible underflow in line 8 (so that we focus on a failing assert detection). ```solidity= pragma solidity ^0.8.17; contract PostExample { // https://github.com/foundry-rs/foundry/issues/2851 function backdoor(uint256 x) external pure { uint256 number = 99; unchecked { uint256 z = x - 1; if (z == 6912213124124531) { number = 0; } else { number = 1; } } assert(number != 0); } } ``` The contract `PostExample` contains a function `backdoor` that is called with an argument `uint256 x`. At the beginning of the function, the variable `number` is initialized with its value set to 99 (line 6). Line 8 initializes another variable `z` that is equal to `x - 1`, where `x` is the input parameter. Depending on whether `z` is equal to`6912213124124531` (line 9), `number` is set either to 0 (line 10) or to 1 (line 12). Finally, the assert statement in line 15 is used to check whether `number` is not equal to 0 (i.e., if the *true* branch--line 10--is executed, the assert gets violated).  As the title of the issue and the screenshot below suggest, Foundry's fuzzer cannot identify the input that triggers the execution of the *true* branch of the conditional statement in line 9 even if we increase the number of runs:  ```solidity= import "forge-std/Test.sol"; import "../src/PostExample.sol"; contract PostExampleTest is Test { PostExample public example; function setUp() public { example = new PostExample(); } function testBackdoor(uint256 x) public view { example.backdoor(x); } } ``` ![](https://i.imgur.com/5mvqmvO.png) Symbolic execution, on the other hand, is particularly useful when it comes to the identification of a vulnerable state that might be hard to reach otherwise, e.g., using testing or black-box fuzzing. Let's see how the violation of `assert(number != 0)` in line 15 could have been identified using symbolic execution. The following figure illustrates the process of symbolically executing `backdoor` statement by statement. ![](https://i.imgur.com/WhR7DWW.png)  For each explored control flow path of a program, a symbolic execution *engine* maintains (1) a *path condition*—a [first-order logic](https://web.stanford.edu/class/archive/cs/cs103/cs103.1202/lectures/04/Small04.pdf) formula over variable values that captures the conditions satisfied by the branches taken along that path, and (2) a *symbolic memory store* that maps variables to symbolic expressions or values computed along that path. An execution of a branch statement updates the path condition, while an assignment updates the store. In the initial state, the path constraints $\pi$ are *true* and the input parameter `x` is assigned a symbolic (i.e., arbitary) value 🤷‍♀️. When `number` is initialized in line 6, the symbolic store $\sigma$ is updated by associating a corresponding variable $number$ with a concrete value of $99$. The symbolic store is then updated again when the value of `z` is set to `x - 1`, where `x` is an input parameter and, therefore, a symbolic variable. A conditional branch in line 9 forks further execution into two subtrees. When a *true* branch is taken, the constraint from line 9 (in symbolic terms) is added to the path conditions, which become $\pi =$ 🤷‍♀️ $- 1==6912213124124531$. When a *false* branch is taken, the *negation* of a branch condition is added to path conditions--for example, if `z == 6912213...` does not hold, the path condition is updated as follows: $\pi = \neg($ 🤷‍♀️ $- 1==6912213124124531)$. Depending on the branch taken, lines 10 and 12 assign variable `number` the value of 0 or 1, respectively, and, therefore, update the symbolic state. The violation of an assert is equivalent to satisfiability of a logical formula constructed as a conjunction of a path condition and a negation of the asserted expression (shown at the very bottom of the illustration). After unrolling both execution traces of `backdoor`, we can conclude that one trace does reach a state where this formula is satisfiable. The assertion can be violated if line 10 (setting `number` to 0) gets executed, in other words, if the condition in line 9 (`z == 6912213124124531`) is satisfied. Since the value of `z` is derived from the input parameter `x` as `z = x - 1`, an SMT solver used to determine satisfiability of path constraints can also suggest concrete input values corresponding to this execution path:`x = 6912213124124531 + 1 = 6912213124124532`. The output produced by a symbolic execution tool (corresponding to a counterexample leading to the assert violation) usually looks like this: `postExample.backdoor(6912213124124532)`. As an illustration of the symbolic execution process, please watch Nicolas "Symbolic Execution" Cage forking himself at every branching point to explore all possible execution paths and identify a failed assertion in a 2007 movie ["Next"](https://www.wikiwand.com/en/Next_(2007_film)). <video autoplay loop muted playsinline src="https://bazzilic.me/files/cage-small.mp4" style="width: 100%" type="video/mp4"></video> Here are a couple of links to our previous, more detailed writeup on symbolic execution and some exercises to get a better grasp on what is going on: - [Tutorial](https://github.com/WilfredTA/formal-methods-curriculum/blob/master/courses/2_Approaches_Modeling_Verification/content/1_Symbolic_Execution/symbolic_execution.md) - [Exercise 1—Practical Symbolic Execution Using Manticore](https://github.com/WilfredTA/formal-methods-curriculum/blob/master/courses/2_Approaches_Modeling_Verification/exercises/1_Symbolic_Execution/exercise_1.md) - [Exercise 2—Developing Symbolic Execution Tools for EVM Using Z3](https://github.com/WilfredTA/formal-methods-curriculum/blob/master/courses/2_Approaches_Modeling_Verification/exercises/1_Symbolic_Execution/exercise_2.md) ### Challenges in Symbolic Execution Although useful for various analyses—as will be seen in the remainder of this post—exhaustive exploration of all possible execution paths performed by symbolic execution comes with certain performance and scalability constraints (duh!). The challenges include: - Handling state space explosion (the number of states grows exponentially in the number of system components, and exploring smart contracts with, e.g., loops and recursive calls might be infeasible) - Constraint solving (certain types of constraints pose a difficulty to SMT solvers—these include non-linear arithmetic expressions which are common in DeFi smart contracts, e.g., the Uniswap invariant; or hashing of any kind) - Interaction with the environment (e.g., calling other smart contracts and accounting for possible side effects) - Modeling memory and storage (which can be fully symbolic, partially symbolic, or concretized) ## What tools are there? Symbolic execution has been widely used to implement vulnerability detection tools for smart contracts, with examples including [Pakala](https://github.com/palkeo/pakala), [Oyente](https://github.com/enzymefinance/oyente), [Osiris](https://github.com/christoftorres/Osiris), [HoneyBadger](https://github.com/christoftorres/HoneyBadger), [Maian](https://github.com/ivicanikolicsg/MAIAN), [teEther](https://github.com/nescio007/teether). However, most of these tools are unmaintained, support a limited subset of EVM, are limited to Z3 as a solver, or have other limitations or drawbacks. After a high-level analysis and exploratory experiments, we selected the following four open-source EVM-level symbolic execution tools for comprehensive empirical analysis: [Manticore](https://github.com/trailofbits/manticore) A symbolic execution tool that is developed by Trail of Bits. Manticore is implemented in Python and supports several SMT solvers (Z3, Yices2, Boolector, cvc4). Manticore has several interfaces: it can be used as a CLI tool, through its Python API, or using `manticore-verifier`---a property-based symbolic execution tool. Besides vulnerability detection, Manticore automatically generates test cases corresponding to the execution paths identified through symbolic execution. [Mythril](https://github.com/ConsenSys/mythril) A security analysis tool for EVM bytecode that performs vulnerability detection using symbolic execution. Mythril is developed by ConsenSys. It's implemented in Python and uses Z3 as an SMT solver through its Python API.  [hevm](https://github.com/ethereum/hevm) A symbolic execution engine and equivalence checker for EVM bytecode. Initially developed as part of the [dapptools](https://github.com/dapphub/dapptools) framework, hevm has been forked by Ethereum Foundation and is currently under active development---we are using the newer Ethereum Foundation fork for our experiments. hevm is implemented in Haskell and supports Z3 and cvc5 as solvers. [EthBMC](https://github.com/RUB-SysSec/EthBMC) A [bounded model checker](https://en.wikipedia.org/wiki/Model_checking#Techniques) for EVM bytecode based on symbolic execution developed in Ruhr-University Bochum. The only academic tool that made it into our list, EthBMC has been proposed in an academic [paper](https://www.usenix.org/system/files/sec20-frank.pdf) presented at Usenix Security, 2020. A distinctive feature of EthBMC is its support of concrete validation of the identified counterexamples using `geth`. EthBMC is implemented in Rust and supports several SMT solvers (Z3, Yices2, Boolector). Two other relevant tools include [Halmos](https://github.com/a16z/halmos/), a new tool by a16z; and [KEVM](https://github.com/runtimeverification/evm-semantics), which is built on top of the [K Framework](https://kframework.org/). Both tools did not make it into this blog post for various reasons, but both perform *symbolic testing*, which is a topic for our next post, so stay tuned ;) ## What can they do? Let's look at how these four tools can be used to analyze Ethereum smart contracts. All experiments described in this post were performed on a MacBook Pro with an M1 Max chip and 64 GB of memory. When run in Docker (as suggested in the "follow along" snippets), you may see degraded performance depending on the amount of resources allocated to your containers. ### Setup  If you want to follow along, you can pull pre-built Docker images using the following commands: ```shell docker pull baolean/manticore:latest docker pull baolean/mythril:latest docker pull baolean/ethbmc:latest docker pull baolean/hevm:latest ``` You can find these container images in [this Docker Hub registry](https://hub.docker.com/u/baolean). The smart contracts used for the evaluation are available in [this repository](https://github.com/baolean/symexec-bench/), you can clone it using this command: ```shell git clone https://github.com/baolean/symexec-bench.git && \ cd symexec-bench && \ git submodule init && git submodule update ``` Now you can run the Docker containers with the cloned repo mounted in `/examples` directory in the container. Assuming that your current directory (`pwd`) is the cloned repo (you are in the `symexec-bench` folder), the command would look something like this: ```shell docker run -it --rm --platform linux/amd64 \ -v $(pwd):/examples baolean/hevm:latest ``` Try running `hevm version` to verify that everything works. To exit the container, type `exit` and press enter. Thanks to `--rm` flag, the container (not the image!) will self-delete upon exit, so you don't need to cleanup afterwards. Detailed steps to install all the tools that we examine in this post can be found in the Dockerfiles: [Manticore](https://github.com/baolean/manticore/blob/master/Dockerfile), [Mythril](https://github.com/baolean/mythril/blob/develop/Dockerfile), [EthBMC](https://github.com/baolean/EthBMC/blob/master/Dockerfile), and [hevm](https://github.com/baolean/hevm/blob/main/Dockerfile). If you prefer, you can build the images from scratch by running ```shell docker build -t IMAGE_NAME . ``` in the respective cloned repository, or just install the tools directly on your machine.  ### Simple Example First, let's compare the selected four tools on our [`PostExample` smart contract](https://github.com/baolean/symexec-bench/blob/main/PostExample/PostExample.sol) from the introduction. Unless specified otherwise, the tools here are run in their default configuration---for vulnerability detection tools, such as Mythril, Manticore, EthBMC, this implies that all default detectors are enabled. In this post, however, we mainly compare the tools based on their ability to identify a potentially failing `assert` statement, which is commonly used to denote the condition that should always hold. Failed assert detection is supported by all analyzed tools except EthBMC, for which we have [added it ourselves](https://github.com/baolean/EthBMC/commit/7cc533b2b16af0eeb1a53c0276632c8ec42a1748); in addition, we had to update the fork of Ethereum used by `geth` to a newer one (London). Manticore identifies failed asserts based on the `INVALID` opcode, so we set the version of Solidity to v0.7.6 in experiments with Manticore. #### Symbolic Execution with Z3 Since all four tools support Z3 as an SMT solver, let's run all of them with Z3 as a solver as the first step. The analysis done by symbolic execution tools is typically performed by issuing one or more symbolic transactions, i.e., transactions with symbolic calldata. Since the failed assert can be detected based on one function call (with proper input), here we limit the symbolic analysis performed by all tools to one transaction. **Mythril** performs symbolic execution through its `myth a` command, which takes a Solidity file as input, and in 2.7s identifies a failed assertion as well as the input that causes it (`backdoor(6912213124124532)`). The output shown on the screenshot below demonstrates the identified counterexample as well as some supplementary information related to the number of queries sent to an SMT solver and the time spent on these queries (76 queries processed in 0.31 seconds). Z3 is the only solver supported by Mythril, so it is used by default. ![](https://i.imgur.com/bdoPqAH.png) To follow along, enter the container by running: ```shell docker run -it --rm --platform linux/amd64 \ -v $(pwd):/examples baolean/mythril:latest ``` In the container, run: ```shell myth a examples/PostExample/PostExample.sol -t1 ``` **Manticore**'s CLI can also take a Solidity file as input, however, as mentioned above, the version of Solidity should be [downgraded to 0.7.6](https://github.com/baolean/symexec-bench/blob/main/PostExample/PostExample_0.7.sol). The `--smt.solver z3` flag ensures that Manticore will use Z3 as an SMT solver. Manticore should also be run with the `--thorough-mode` flag for it to report potential vulnerabilities (corresponding to `WARNING: INVALID instruction` in the screenshot shown below). The time used by an SMT solver is also available in the output (0.13 seconds). ![](https://i.imgur.com/LUTIz4n.png) Using Docker, it can be reproduced as: ``` docker run -it --rm --platform linux/amd64 \ -v $(pwd):/examples baolean/manticore:latest manticore examples/PostExample/PostExample_0.7.sol \ --smt.solver z3 --thorough-mode --txlimit 1 ``` Manticore also generates test cases for each possible execution path, as well as some other information related to the analysis (code coverage, etc)---all in 1.7 seconds! Exploring one of the generated files shows the identified counterexample: ![](https://i.imgur.com/4S9Cfss.png) *Other benefits of using Manticore include the flexibility and customization of the analysis, e.g., different gas models, turning reasoning about symbolic balances on and off, etc. The information about the corresponding flags is available through `manticore --help`.* Different from Manticore and Mythril that can operate on a Solidity file, the following two tools---hevm and EthBMC---perform the analysis based on the runtime bytecode of a smart contract. The runtime binary can be generated by the Solidity compiler `solc` using the command `solc --bin-runtime examples/PostExample/PostExample.sol`: ![](https://i.imgur.com/n8PbnXP.png) **hevm** symbolically analyzes the binary of a smart contract when invoked via `hevm symbolic`. Z3 is used in `hevm` by default, but we can explicitly instruct `hevm` to use Z3 using `--solver z3`:  ![](https://i.imgur.com/cBLwk7y.png) To reproduce: ``` docker run -it --rm --platform linux/amd64 \ -v $(pwd):/examples baolean/hevm:latest hevm symbolic --code \ $(<examples/PostExample/PostExample.bin-runtime) ``` The execution takes less than a second (and even less than 0.1s!). However, the output requires additional decoding---the decoded output from the screenshot is indeed `backdoor(6912213124124532)`, which can be verified using one of the available [calldata decoders](https://tools.deth.net/calldata-decoder). This is left to the reader as an exercise; to do the decoding, use the ABI generated by the compiler (`solc --abi examples/PostExample/PostExample.sol`) and the generated calldata. :::spoiler Calldata `0x99d7cd3400000000000000000000000000000000000000000000000000188e9f07e00f74` ::: :::spoiler PostExample ABI `[{"inputs":[{"internalType":"uint256","name":"x","type":"uint256"}],"name":"backdoor","outputs":[],"stateMutability":"pure","type":"function"}]` ::: <br/> **EthBMC** accepts the input in an even more specific format---it requires a user to supply a [configuration YAML file](https://github.com/RUB-SysSec/EthBMC#yaml-format), which includes the addresses of all contracts involved in the analysis, their balances, runtime binary, and nonce. The initial values of storage variables can also be provided in this file---otherwise, they are assumed to have default values for their type (the constructor of a smart contract is not executed or analyzed, since it is not included in the runtime bytecode, which is also true for hevm). ![](https://i.imgur.com/39r8EC0.png) When provided with such input, EthBMC outputs an identified counterexample (that also has to be decoded) in 4.3s. As mentioned above, EthBMC concretely validates the identified counterexample against smart contracts locally deployed on `geth`, which helps avoid false positives (concrete validation can be turned off by the `--no-verify` flag). EthBMC uses Z3 when run with the `--solver z3` flag. ![](https://i.imgur.com/ws8xKyn.jpg) To reproduce: ``` docker run -it --rm --platform linux/amd64 \ -v $(pwd):/examples baolean/ethbmc:latest ethbmc examples/PostExample/PostExample.yml --solver z3 ```   Our initial evaluation demonstrates that all four analyzed tools can successfully identify a failed assert in our examples within a few seconds---which is very good! The results averaged across five runs using Z3 as a solver are as follows: | Tool | Language | Solvers | Time (Z3) | --------- | -------- | ------- | -------- | | Mythril | Python | Z3 | 2.8 sec | Manticore | Python | Z3, cvc4, Boolector, Yices2 | 1.6 sec | hevm | Haskell | Z3, cvc5 | 0.1 sec | EthBMC | Rust | Z3, Boolector, Yices2 | 3.6 sec **Different Solvers** While the results of the performed analysis look very convincing, let's see if we can improve them even further by using SMT solvers other than Z3, since the performance may vary from solver to solver when presented with different tasks.  Indeed: running **EthBMC** with Yices2 as a solver makes the analysis complete in 0.19s, which is significantly less that 3.6s it took with Z3! ``` ethbmc examples/PostExample/PostExample.yml \ --solver yice ``` Boolector also performs better than Z3, making the analysis finish in 0.3s: ``` ethbmc examples/PostExample/PostExample.yml \ --solver boolector ``` **Manticore** takes 2s when run with Yices2: ``` manticore examples/PostExample/PostExample_0.7.sol \ --smt.solver yices --thorough-mode --txlimit 1 ``` 1.6s with Boolector: ``` manticore examples/PostExample/PostExample_0.7.sol \ --smt.solver boolector --thorough-mode --txlimit 1 ``` While cvc4 takes 4.9s *(which might be attributed to the fact that I run it on M1 through Rosetta)*: ``` manticore examples/PostExample/PostExample_0.7.sol \ --smt.solver cvc4 --thorough-mode --txlimit 1 ``` The performance of **hevm** with cvc5 is similar to the one hevm shows with Z3, taking around 0.08 seconds to finish the analysis. To reproduce: ```shell hevm symbolic --code \ $(<examples/PostExample/PostExample.bin-runtime) \ --solver cvc5 ``` The averaged results can be summarized as follows (the best performing solver is in bold, where applicable): | Tool | Language | Solvers | Time (Z3) | Best time | | -------- | -------- | -------- | -------- |-------- | | Mythril | Python | **Z3** | 2.6 sec | 2.6 sec | Manticore | Python | Z3, cvc4, **Boolector**, Yices2 | 1.6 sec | 1.6 sec | hevm | Haskell | Z3, **cvc5** | 0.1 sec | 0.08 sec | EthBMC | Rust | Z3, Boolector, **Yices2** | 3.6 sec | 0.19 sec ### Simple Example with Two Transactions The example we were using so far demonstrates the advantages of using symbolic execution---however, it is also quite a trivial one. In many cases, it takes more than one transaction to get to the vulnerable state. Let's slightly modify our example contract so that it has to be "activated" via calling `setLive(true)` before the `backdoor` function can be invoked without being reverted. ```solidity= pragma solidity ^0.8.17; contract PostExample { bool live; modifier isLive() { require(live); _; } function setLive(bool _live) external { live = _live; } // https://github.com/foundry-rs/foundry/issues/2851 function backdoor(uint256 x) external view isLive { uint256 number = 99; unchecked { uint256 z = x - 1; if (z == 6912213124124531) { number = 0; } else { number = 1; } } assert(number != 0); } } ``` Most of the tools can identify the two transactions leading to the assert violation, although it understandably takes almost twice as long. For example, **Mythril** correctly identifies the sequence of two function calls: `setLive(true)` followed by `backdoor(691221...)` in 5.4 seconds: ![](https://i.imgur.com/QVclaG9.png) To reproduce: ``` myth a examples/PostExample/PostExample_2tx.sol -t2 ``` **Manticore** takes 3.2 seconds to identify a violation and generate test cases: ![](https://i.imgur.com/QTcGYYp.jpg) To reproduce: ``` manticore examples/PostExample/PostExample_2tx_0.7.sol \ --smt.solver boolector --thorough-mode --txlimit 2 ``` **EthBMC** finds the vulnerable function call sequence in just 0.5 seconds: ![](https://i.imgur.com/6NkWNQl.png) To reproduce: ``` ethbmc examples/PostExample/PostExample_2tx.yml \ --solver yice ``` #### Symbolic Storage **hevm** however, detects a failed assert in only one transaction (and, still, in less than 0.1 seconds). Now, however, section `Storage` is added to the output, indicating that in slot `0x0` (corresponding to the `live` variable) the value is supposed to be `1` (i.e., `true`) in order for the violation to occur. This behavior corresponds to having so-called *symbolic storage*, which is used in hevm by default. With symbolic storage, the default value returned by`SLOAD` is assumed to be a symbolic value without other constraints (instead of it being equal to the default value for this type, e.g., `false` for a Boolean or `0` for a uint). In this mode, the tool performs under-constrained analysis which is likely to generate false positives corresponding to unreachable states. However, it is useful in some verification scenarios---similar to the one we're using---as it can reduce the number of function calls to be explored, i.e., in this example we can avoid having to call `setLive(true)` first. In addition to symbolic storage, `hevm` supports other storage models, which can be configured using the `--storage-model [ConcreteS, SymbolicS, InitialS]` flag). hevm currently limits the analysis to a single transaction, therefore, it cannot identify the violation when using a "concrete" storage model (`--storage-model InitialS`). ![](https://i.imgur.com/12AqPUu.png) To reproduce: ``` hevm symbolic --code \ $(<examples/PostExample/postExample_2tx.bin-runtime) hevm symbolic --code \ $(<examples/PostExample/postExample_2tx.bin-runtime) \ --storage-model InitialS ``` In **Mythril**, symbolic storage is enabled via the `--unconstrained-storage` flag. The analysis exploring one symbolic transaction performed in this mode successfully finds a violation; it completes faster (3.4 sec) than the previously discussed two-transaction analysis (5.4 sec), but is slower than the one-transaction analysis performed in the previous section (1.6 sec). You may also notice that the query count is higher now. ![](https://i.imgur.com/uO24ZDL.png) To reproduce: ``` myth a examples/PostExample/PostExample_2tx.sol \ -t1 --unconstrained-storage ``` We've added symbolic storage to both Manticore ([see here](https://github.com/baolean/manticore/commit/dce7c4abfc2cde3409d51ac586431df2937e9a59)) and EthBMC ([see here](https://github.com/baolean/EthBMC/commit/d103beef94c91d7e185a73751e9f2a916244fe0f)). In Manticore, it could be achieved by simply changing the default storage value from zero to a symbolic value; in EthBMC, it took significantly more effort---in particular, due to the need to concretize the symbolic values corresponding to both symbolic storage locations and the values stored there in order to enable `geth`-based concrete validation; perhaps, a topic for another blog post. When exploring one transaction with symbolic storage enabled, **Manticore** (`--evm.symbolic_storage true`) and **EthBMC** (`--symbolic-storage`) identify the violation in 2.2 and 0.2 seconds, respectively: ![](https://i.imgur.com/o5X1SLy.jpg) To reproduce: ```shell manticore examples/PostExample/PostExample_2tx_0.7.sol \ --smt.solver boolector --thorough-mode \ --txlimit 1 --evm.symbolic_storage True ``` Similarly to hevm, section `Concretized storage` in the output produced by EthBMC shows the concretized locations and value of storage variables that enable the identified attack: ![](https://i.imgur.com/ibpUCu8.jpg) To reproduce: ``` ethbmc examples/PostExample/PostExample_2tx.yml \ --solver yice --symbolic-storage ``` | Tool | Time (Z3) | Best time | Best: 2 txns | Best: 2txns + SS | | -------- | -------- |-------- | -------- | -------- | | Mythril | 2.6 sec | 2.6 sec | 5.1 sec | 3.4 sec | Manticore | 1.6 sec | 1.6 sec | 3.2 sec | 2.2 sec | hevm | 0.1 sec | 0.08 sec | x | 0.08 sec | EthBMC | 3.6 sec | 0.19 sec | 0.5 sec | 0.2 sec *SS stands for Symbolic Storage.* ### Examples from [Echidna VS Forge](https://github.com/clabby/echidna-vs-forge) Repository We can further test the analyzed symbolic execution tools by running them against the examples from the [repository by clabby](https://github.com/clabby/echidna-vs-forge). Intially created to examine the performance of Echidna and Foundry fuzzers, the repository contains two smart contracts: [Foo](https://github.com/clabby/echidna-vs-forge/blob/main/src/Foo.sol) (having `uint256` variables as function inputs) and [FooBytes](https://github.com/clabby/echidna-vs-forge/blob/main/src/FooBytes.sol) (having `bytes32` variables as function inputs). The objective is to find two inputs equal to the magic number `0x69` to supply to contract's `foo(uint256/bytes32)` and `bar(uint256/bytes32)` functions. If both of these functions are called with an input of `0x69`, another function `revertIfCracked()` will revert. To test the analyzed tools, we have [slightly modified](https://github.com/baolean/echidna-vs-forge/blob/main/symexec/Foo_assert.sol) the contracts to use the `assert` Solidity statement in `revertIfCracked()`, since the tools only report the violation if they identify a specific error code in the execution trace. **Mythril** can find the three transactions leading to the violation in both smart contracts in 11 seconds. To detect the issue, Mythril should be called with the `-t3` flag, increasing the limit of the analyzed transactions---otherwise, the analysis will complete without detecting the issue after only two symbolic transactions. ![](https://i.imgur.com/nXizJcT.png) The decimal number 105 suggested as an argument to `foo` and ` bar` corresponds to 0x69 in hexadecimal, which can be verified in a [converter](https://www.rapidtables.com/convert/number/decimal-to-hex.html). To reproduce: ```shell myth a examples/echidna-vs-forge/symexec/Foo_assert.sol -t3 ```  **EthBMC** identifies two possible counterexamples (with two different orderings of `foo()` and `bar()`) in 1.2 sec: ![](https://i.imgur.com/tZd32XG.png) ``` ethbmc examples/echidna-vs-forge/symexec/Foo.yml \ --solver yice ``` Interestingly, even when instructed to analyze 3 transactions using the corresponding flag (`--txlimit 3`), **Manticore** finished the analysis after exploring only two transactions without uncovering the violation. After investigation, we realized that the third transaction is not analyzed because no new coverage is identified after the second one---[commenting this out in the code](https://github.com/baolean/manticore/commit/e5dea21b288e663af970f6a0e0f8c4c88ac5a1ef) makes Manticore find the issue in 43 seconds. Since the significant amount of time is spent on writing the generated test cases (this analysis of `Foo` generates 187 test cases!), [we added](https://github.com/baolean/manticore/commit/b9e19f7ad71224b39b234f34e9d3ad727683e28a) the `--only-invalid-testcases` flag that makes it generate only the test cases that correspond to the identified violations. The analysis of `Foo` with this flag completes in 29 seconds: ![](https://i.imgur.com/Prq2h80.png) To reproduce: ```shell manticore examples/echidna-vs-forge/symexec/Foo_assert_0.7.sol \ --thorough-mode --smt.solver boolector \ --only-invalid-testcases --txlimit 3 ``` **hevm** explores only one symbolic transaction---it, however, can identify a violation in under 0.1 sec by using symbolic storage (instead of setting the values of variables through calling the `foo` and `bar` functions): ![](https://i.imgur.com/Ov5refu.png) To reproduce: ``` hevm symbolic --code \ $(<examples/echidna-vs-forge/symexec/Foo.bin-runtime) ``` On the `FooBytes` smart contract that uses `bytes32` variables, all tools perform similarly to `Foo` smart contract that uses `uint256` variables (unlike fuzzers that perform differently depending on the type). | Tool | Time (Z3) | Best time | Best: 2 txns | Best: 2txns + SS | Foo[Bytes] | -------- | -------- |-------- | -------- | -------- | -------- | | Mythril | 2.6 sec | 2.6 sec | 5.1 sec | 3.4 sec | 11 sec | Manticore | 1.6 sec | 1.6 sec | 3.1 sec | 2.2 sec | 29 sec | hevm | 0.1 sec | 0.08 sec | x | 0.08 sec | 0.07 sec (SS) | EthBMC | 3.6 sec | 0.19 sec | 0.5 sec | 0.2 sec | 1.2 sec *`Foo` and `FooBytes` have similar time. SS stands for Symbolic Storage.* ### A More Realistic Example What happens if we run all these tools on slighly more complicated smart contract code? Let's consider a [simplified version](https://github.com/baolean/symexec-bench/blob/main/Vat/MiniVat.sol) of the [Vat](https://github.com/makerdao/xdomain-dss/blob/master/src/Vat.sol) smart contract, which is part of MakerDAO. This example is based on a [violation of the Fundamental DAI Equation invariant](https://hackmd.io/@SaferMaker/DAICertoraSurprise) identified using Certora Prover---another formal verification tool.  The (simplified) `MiniVat` contract contains four functions `init()`, `move(address, int256)`, `frob(int256)`, and `fold(int256)`. In addition to these, we encode the counterexample leading to the invariant violation in a separate function: ```solidity= function counterexample() external { init(); frob(10**18); fold(-10**27); init(); assert(debt == Art * rate); } ``` Therefore, the task for a symbolic execution tool here is to analyze the `counterexample()` function and report a violation of the assert statement in line 6 (the arguments of functions are concrete and *are* leading to the violation, so here we're simply testing the capabilities of the tools to analyze the more or less realistic bytecode). You'd think that any symbolic execution tool would immediately catch that---and you'd be surprised ;). As you will see, EthBMC is the only tool that maintains its high performance when used to analyze the `MiniVat` example (especially when using Boolector as a solver---the analysis is this case takes 0.8 sec; with Yices2 it takes 3.2 sec; with Z3---29 sec): ![](https://i.imgur.com/InfDPYL.png) To reproduce: ``` ethbmc examples/Vat/MiniVat.yml \ --solver boolector ``` Since EthBMC performs so well, we can ask it to identify a sequence of 2 function calls leading to the invariant violation, e.g., ```solidity function counterexample() public { init(); frob(10**18); fold(-10**27); init(); } function check_assert() public { assert(debt == Art * rate); } ``` This leads to a solver timeout (set to 120 seconds by default) on one of the queries, which, however, does not prevent EthBMC from identifying a vulnerable sequence of two transactions (`counterexample(); check_assert()`) in under 3 minutes when using Yices2 as a solver (interestingly, it is not the best-performing solver for a 1txn-MiniVat example). Experiments with increasing the timeout or modifying the contract may be an interesting exercise for the reader. ![](https://i.imgur.com/m3ikweg.png) To reproduce: ``` ethbmc examples/Vat/MiniVat_2tx.yml \ --solver yice ``` **Manticore** takes a long time (more than 10 minutes) to identify the violation in the `MiniVat` smart contract. However, the performance improves dramatically if we turn off vulnerability detectors for issues other than failing assert (by providing an `--exclude` flag). When generating only failing test cases (`--only-invalid-testcases`) and exploring only one transaction (`--txlimit 1`), Manticore identifies the assert violation in `MiniVat` in just 22 seconds! Please keep in mind that this approach only works if you know the vulnerability you are looking for---which is unfortunately rarely the case. ![](https://i.imgur.com/g7FwebE.png) To reproduce: ```shell manticore examples/Vat/MiniVat_0.7.sol --thorough-mode \ --txlimit 1 --only-invalid-testcases \ --exclude 'overflow,uninitialized-storage,uninitialized-memory,'` `'reentrancy,reentrancy-adv,unused-return,suicidal,'` `'delegatecall,ext-call-leak,env-instr,lockdrop' ``` Interestingly, when run on the 2-transaction example (`counterexample(); check_assert()`), Manticore successfully identifies the assert violation we're looking for in 16 minutes. However, it also reports two false positives that occur due to the "unsound symbolication" strategy used in Manticore to handle hashes, which incorrectly assumes there is a possible hash collision.  These false positives are not produced when running the same analysis with a different hash strategy (`--evm.sha3 concretize`)---the execution in this case completes in a bit over 6 minutes. ![](https://i.imgur.com/LRk0AnX.png) To reproduce: ```shell manticore examples/Vat/MiniVat_2tx_0.7.sol --thorough-mode \ --only-invalid-testcases \ --exclude 'overflow,uninitialized-storage,uninitialized-memory,'` `'reentrancy,reentrancy-adv,unused-return,suicidal,'` `'delegatecall,ext-call-leak,env-instr,lockdrop' \ --evm.sha3 concretize ``` Since our experiments with other tools show that they can't keep up with the performance of EthBMC and Manticore on `MiniVat`, to make it easier for them, for the rest of the section by default we're using a [modified version of `MiniVat`](https://github.com/baolean/symexec-bench/blob/main/Vat/MiniVat_internal.sol), in which `counterexample()` is the only `external` function, i.e., the only function that *can* and should be analyzed in the contract (the other four functions are made `internal`). When run on this version of `MiniVat`, **hevm** finds a failed assert in a bit over 2 minutes if Z3 solver is used and the storage is assumed to be *concrete* (*we did encounter an [issue](https://github.com/ethereum/hevm/issues/223) with an earlier version of hevm, which was very promptly addressed*). ![](https://i.imgur.com/pbYVx6c.png) To reproduce: ``` hevm symbolic --code \ $(<examples/Vat/MiniVat_internal.bin-runtime) \ --storage-model InitialS ``` However, when cvc5 is used as a solver (`--solver cvc5`), the counterexample is not identified and the following error is displayed: `Could not determine reachability of the following end states: ...`. The analysis of the original version of `MiniVat` with all functions having `public` or `external` visibility completes in approximately 5m30s. To reproduce: ``` hevm symbolic --code \ $(<examples/Vat/MiniVat.bin-runtime) \ --storage-model InitialS ``` **Mythril** takes 2m19s to explore one symbolic transaction of the simplified `MiniVat` version (with internal functions) and does not report any violations. 125 seconds out of 140 are spent on constraint solving, which suggests there might be an SMT solver timeout, which prevents Mythril from completing the analysis: ![](https://i.imgur.com/z15CqMi.png) To reproduce: ```shell myth a examples/Vat/MiniVat_internal.sol -t1 ``` As expected, if we increase the solver timeout, the analysis will not finish in 2.5 minutes (*note that the [warning message](https://github.com/baolean/mythril/blob/d7256d98a9bb206867145266a7451809f4ea53d6/mythril/support/model.py#L80) related to the timeout is not displayed under the default log level*). To the contrary, the analysis does not complete even in 30 minutes before being interrupted (notice that 1814 seconds out of 1825 are spent on constraint solving): ![](https://i.imgur.com/Dy26PZ7.png) To reproduce: ```shell myth a examples/Vat/MiniVat_internal.sol -t1 \ --solver-timeout 10000000 ``` #### Delayed Constraint Solving We can, however, further improve the performance of Mythril. First, lets restrict the types of vulnerabilities detected to failed asserts (`-m Exceptions`), similarly to what we did in Manticore. In addition, we can also use the `--sparse-pruning 0` flag, which turns on delayed constraint evaluation. The same feature is used in hevm [by default](https://hevm.dev/symbolic.html) and seems to be available in Manticore's [experimental branch](https://github.com/trailofbits/manticore/tree/dev-evm-experiments), which was used during the [Uniswap V3 audit](https://github.com/Uniswap/v3-core/blob/main/audits/tob/README.md#verification-with-manticore). When performing delayed (or lazy) constraint evaluation, path reachability is not checked at every `JUMPI`---instead, solving the reachability queries is delayed, e.g., until the end of transaction. While effective, this approach may cause issues in the presence of loops (that will be analyzed infinitely); therefore, in Mythril, the frequency of reachability checking is controlled by the argument supplied with `--sparse-pruning`, while `--ask-smt-iterations` in hevm is used to specify the number of loop iterations to be explored before reachability checking. The (optimized) analysis performed with the described parameters by Mythril takes only 1.5 minutes to find the counterexample: ![](https://i.imgur.com/VKYQSFZ.png) To reproduce: ```shell myth a examples/Vat/MiniVat_internal.sol -t1 \ --solver-timeout 10000000 --pruning-factor 0 \ -m Exceptions ``` We hope these short examples illustrate the *splendeurs et misères* of the symbolic execution tools available for Ethereum smart contracts! Deeply insightful with a right set of flags, they time out and fail in inexperienced hands. A lot of our work on symbolic execution at MakerDAO is aimed at simplifying and democratizing symbolic execution tools for a wider audience of smart contract developers who are not necessarily experts on formal verification. ## tldr; All four tools selected for evaluation (Mythril, Manticore, hevm, EthBMC) support basic EVM and symbolic execution features including hashing and inter-contract analysis; they also perform basic query optimizations such as, e.g., constant folding (evaluating constant expressions before sending them in a query to a solver). Python-based tools such as Mythril and Manticore tend to have better UX and offer good flexibility and configurability of the performed analysis. hevm and EthBMC, on the other hand, are often faster. Manticore, hevm, and EthBMC support multiple solvers, which is useful in practice as different solvers have their strengths and weaknesses. More details on the pros and cons of each of these tools are discussed below:    #### Mythril *Upsides:* - Actively maintained by ConsenSys - Good UX, highly configurable, e.g., supports several path exploration strategies - Supports multiple optimizations and features including lazy constraint solving and symbolic storage  *Downsides:* - Slower than hevm and EthBMC - Solver support is limited to Z3      #### Manticore *Upsides:* - Maintained by Trail of Bits - Has several APIs - Supports multiple solvers - Highly configurable: supports different strategies for metering gas, modeling ETH balances, and handling hashes  *Downsides:* - Often slower than hevm and EthBMC - Does not fully support some of the optimizations and features, e.g., symbolic storage (although it's easy to add) and lazy constraint evaluation (although it's implemented in an experimental branch that was used during the [Uniswap V3 audit](https://github.com/Uniswap/v3-core/blob/main/audits/tob/README.md#verification-with-manticore))    #### EthBMC *Upsides:* - Fast - Supports multiple solvers - Written in Rust (a Foundry-compatible language, which would allow for tighter integration, which might be useful, e.g., for [concolic execution](https://en.wikipedia.org//wiki/Concolic_testing))   *Downsides:* - Not maintained since July 2021, misses several hard-fork updates (e.g., newer opcodes) - Suboptimal UX: input should be supplied as a YAML configuration file, output calldata should be decoded - Missing some features: symbolic storage and failed assert detection was added by us  #### hevm *Upsides:* - Fast - Integrated with `dapptools`, supports symbolic testing - Supports symbolic storage mode, multiple SMT solvers  *Downsides:* - Under active development - Suboptimal UX: input should be supplied as a runtime binary, output calldata should be decoded - Does not explore transaction sequences longer than one ### Summary To conclude, symbolic execution is a promising technique when it comes to code security, with our experiments proving that it can find issues that some other techniques cannot. It is very exciting that there is active development in this area from many teams. At the same time, in their current state, symbolic execution tools often are fragile, cranky, and wayward. To get desired results and even get them to work at all, you often need to know their internals in and out. This post is the first in a series that documents our efforts on making symbolic execution more practical and accessible for wider audiences. ### Promising Directions Promising directions in symbolic execution for smart contracts include *concolic execution*---a combination of concrete testing (or fuzzing) and symbolic execution, which might help overcome both the scalability challenges of pure symbolic execution and the incomprehensiveness of random testing; and *symbolic testing*---symbolic execution of parameter-based tests that is performed by hevm through its integration with dapptools and by [Halmos](https://github.com/a16z/halmos). Based on our results, we decided to further experiment with EthBMC by integrating it with Foundry to enable symbolic testing as well. While symbolic testing is a topic of the next blog post, here is a small snippet of what this integration can do: <video autoplay controls muted playsinline src="https://palinatolma.ch/assets/ethbmc_forge.mov" style="width: 100%" type="video/mp4"></video> ## Other Resources As mentioned in the introduction, we also prepared a tutorial and two exercises on symbolic execution for smart contracts: - [Tutorial](https://github.com/WilfredTA/formal-methods-curriculum/blob/master/courses/2_Approaches_Modeling_Verification/content/1_Symbolic_Execution/symbolic_execution.md) - [Exercise 1—Practical Symbolic Execution Using Manticore](https://github.com/WilfredTA/formal-methods-curriculum/blob/master/courses/2_Approaches_Modeling_Verification/exercises/1_Symbolic_Execution/exercise_1.md) - [Exercise 2—Developing Symbolic Execution Tools for EVM Using Z3](https://github.com/WilfredTA/formal-methods-curriculum/blob/master/courses/2_Approaches_Modeling_Verification/exercises/1_Symbolic_Execution/exercise_2.md) ## References Keith Schwarz. 2019. Lectures on Mathematical Foundations of Computing. Stanford University. https://web.stanford.edu/class/archive/cs/cs103/cs103.1202/lectures/04/Small04.pdf, https://web.stanford.edu/class/archive/cs/cs103/cs103.1202/lectures/05/Small05.pdf R. Baldoni, E. Coppa, D. C. D'Elia, C. Demetrescu, and A. Finocchi. 2018. Survey of Symbolic Execution Techniques. ACM Computing Surveys. https://arxiv.org/pdf/1610.00502.pdf J. C. King. 1976. Symbolic execution and program testing. ACM Communications. https://madhu.cs.illinois.edu/cs598-fall10/king76symbolicexecution.pdf Trail of Bits. 2020. "How to use Manticore to find bugs in smart contracts". https://ethereum.org/en/developers/tutorials/how-to-use-manticore-to-find-smart-contract-bugs/ *Web-resource*. dapptools & Ethereum Foundation. 2023. "symbolic - hevm". https://hevm.dev/symbolic.html

Read more

The Easy Way To Quit (Concrete) Testing

When Invariants Aren't: DAI's Certora Surprise