The Easy Way To Quit (Concrete) Testing

Author: Palina Tolmach

This blog post serves as the second installment of a two-part series exploring the concept of symbolic execution for smart contracts. For background information on symbolic execution and an overview of existing tools for Ethereum smart contracts verification, be sure to check out part one!

In this post, we will dive into the topic of symbolic testing. The first half of this post will provide a comprehensive review of existing tools that support symbolic testing for EVM smart contracts. In the second half, we'll share our own experience of integrating EthBMC—one of the top-performing tools from part one—with the Foundry development framework to enable symbolic testing.

What is symbolic testing?

Symbolic execution provides an effective way to identify potential vulnerabilities in smart contracts by analyzing multiple execution paths simultaneously. However, it can also be limited by scalability and usability problems. Symbolic testing partially addresses these issues by analyzing tests using symbolic inputs rather than systematically examining all possible sequences of function calls made to the smart contract. Symbolic testing also allows reusing existing unit or fuzz tests for formal analysis, removing the necessity to separately define specifications, a labor-intensive blocker for adoption. Last but not least, symbolic testing also takes advantage of the setUp() function that is usually supported by testing frameworks such as Foundry, which helps (concretely) set up the initial state for the analysis; this feature is particularly useful for symbolic execution tools like hevm and EthBMC that analyze runtime bytecode, which does not include constructor logic.

Let's consider a simple smart contract example from the previous post. The contract is based on the code included in an issue from the Foundry repository titled condition not caught by invariant/fuzzer. We're interested in the assert on line 15, which should fail if the function is called with x being equal to 6912213124124532 (6912213124124531 + 1).

















pragma solidity ^0.8.17;

contract PostExample {
    // https://github.com/foundry-rs/foundry/issues/2851
    function backdoor(uint256 x) external pure {
        uint256 number = 99;
        unchecked {
            uint256 z = x - 1;
            if (z == 6912213124124531) {
                number = 0;
            } else {
                number = 1;
            }
        }
        assert(number != 0);
    }
}

As indicated by the issue title, Foundry–-a popular Rust-based development framework–-offers support for fuzz testing. When given a test that takes an least one parameter, Foundry uses fuzzing to analyze it, i.e., Foundry runs the test multiple times with different randomized concrete inputs; those are usually generated based on a set of heuristics. For instance, testBackdoor(uint256 x) in line 11 of the PostExampleTest contract below corresponds to a test that, in turn, calls the backdoor(uint256 x) function from PostExample with the same argument x. Running forge test will initiate the fuzzing of this test. In case the input generation strategy produces 6912213124124532 as input, the assert in line 15 of PostExample will fail and the test will not pass.














import "forge-std/Test.sol";
import "../src/PostExample.sol";

contract PostExampleTest is Test {
    PostExample public example;

    function setUp() public {
        example = new PostExample();
    }

    function testBackdoor(uint256 x) public view {
        example.backdoor(x);
    }
}

However, Foundry's fuzzer cannot pinpoint the input that triggers this assert violation (this happens due to its restricted data-flow reasoning capabilities–-if we change z = x - 1 to z = x in line 8 of PostExample, Foundry's fuzzer will successfully identify the issue).

At the same time, as demonstrated in the previous post, symbolic execution tools successfully identify the backdoor function's assert violation in a few seconds at most, and can, therefore, be applied in this setting as well.

What tools are there?

In this post, we evaluate the tools based on their ability to symbolically analyze tests written as part of the Foundry project. We will briefly discuss the structure of Foundry projects in the Setup section below; for more information on Foundry, please refer to its documentation. To the best of our knowledge, symbolic testing of the Ethereum smart contracts is only supported by the following tools, which we evaluate in this post:

hevm
A symbolic execution engine and equivalence checker for EVM bytecode, extensively analyzed in the previous post. Initially developed as part of the dapptools framework, hevm has been forked by the Ethereum Foundation and is currently under active development–-we are using the newer Ethereum Foundation fork for our experiments. hevm is implemented in Haskell and supports Z3 and cvc5 as solvers. hevm was one of the earliest tools to support symbolic analysis of unit tests for EVM smart contracts via its integration with the dapp-test framework; it has also recently added support for symbolic testing of Foundry projects.

Halmos
A symbolic bounded model checker designed for symbolic testing of Ethereum smart contract bytecode. Halmos is developed by a16z, is implemented in Python, and supports Z3 SMT solver through its Python API.

KEVM Foundry
KEVM implements a formal semantics (i.e., a model) of the EVM defined with the K Framework, which, among other things, can be used to perform symbolic execution of smart contract bytecode. KEVM can symbolically execute Foundry tests, supports an extensive list of Foundry cheatcodes, and defines several additional ones. K, upon which KEVM is built, is a general framework for the development of formal semantics and verification, and, different from all other tools reviewed in this and the previous post, is not limited to EVM. Therefore, the same symbolic execution module that is applied to Ethereum smart contracts can also be used to analyze smart contracts running on other platforms, such as Algorand, WASM, and some others. KEVM relies on Z3 as an SMT solver, with the underlying K framework implemented in several programming languages, including Python, Haskell, LLVM, C/C++, and Haskell.

EthBMC
A symbolic bounded model checker for EVM bytecode developed in Ruhr-University Bochum, EthBMC was thoroughly investigated in our previous post. This time, we examine its symbolic testing capabilities based on our own experimental integration of EthBMC with (an instrumented version of) Foundry. EthBMC is implemented in Rust and supports several SMT solvers (Z3, Yices2, Boolector).

Two other verification tools that are capable of analyzing Foundry projects are Mythril–-a symbolic execution-based security analysis tool for EVM bytecode–-and SMTChecker–-a Solidity smart contract verifier that is shipped together with the compiler. Both of these tools, however, analyze the source code of the smart contract implementation but not the tests defined for it, so we do not consider them today.

What can they do?

Let's take a look at how these tools can be used to analyze tests written for Ethereum smart contracts using Foundry. All experiments described in this post were performed on a DigitalOcean droplet with Ubuntu 22.04, 32 GB of RAM, and 4 CPUs. When run in Docker (as suggested in the "follow along" snippets), you may see degraded performance depending on the amount of resources allocated to your containers.

Setup

To follow along, you can pull pre-built Docker images using the following commands:

docker pull baolean/halmos:latest
docker pull baolean/kevm:latest
docker pull baolean/hevm-symtest:latest
docker pull baolean/ethbmc-forge:latest

You can find these container images in this Docker Hub registry.

Since all symbolic testing tools that we evaluate support the analysis of tests written using Foundry, we will be using a Foundry project for the experiments throughout this post. The project that includes all code used for the evaluation is available in the symtest branch of the symexec-bench repository under the SymTest folder. Most of the tools considered in this post rely on the compilation artifacts produced by Foundry–-to generate the corresponding files, we should executeforge build in the project folder. To clone and build the project, run:

git clone https://github.com/baolean/symexec-bench.git && \
    cd symexec-bench && \
    git checkout symtest -- && \
    cd SymTest && forge build

The /src folder contains the smart contract implementations that we will be examining, while /test includes the contracts defining tests that will be symbolically analyzed. Each test contract should be inherited from forge-std/src/Test.sol, which provides access to cheatcodes, an extended set of assertions, and other functionality that facilitates testing. Test contracts in Foundry also contain a setUp() function that is used to create smart contracts involved in the tests as well as to setup an initial state, from which the analysis will be performed. We expect that the examined tools should respect setUp() and use the result of its execution as the initial state; this functionality is particularly helpful for the tools that perform runtime bytecode analysis, such as hevm and EthBMC, as it provides a straightforward way to initialize the contract state in the absence of a constructor.

After you have cloned the symexec-bench repository and entered its SymTest directory, you can run the Docker containers with the current SymTest folder mounted in the /SymTest directory in the container. Assuming that your current directory (pwd) is the symexec-bench/SymTest folder, the command would look like this:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/SymTest baolean/hevm-symtest:latest

Try running hevm version to verify that everything works. To exit the container, type exit and press enter. Thanks to --rm flag, the container (but not the image) will self-destruct upon exit, so you won't have to clean up afterwards.

Detailed steps to install all the tools that we examine in this post can be found in the Dockerfiles we prepared for Halmos, hevm, EthBMC/Foundry, and KEVM. If you prefer, you can build the images from scratch by running

docker build -t IMAGE_NAME .

in the respective cloned repository, or just install the tools directly on your machine.

All tools considered in this post perform symbolic analysis of test functions that follow a certain naming convention: hevm and EthBMC/Foundry symbolically execute functions that start with prove, while Halmos and KEVM target the ones beginning with test.

Simple Example

First, let’s compare the selected tools based on the testBackdoor(uint256 x) function that analyzes the behavior of the PostExample contract from the introduction:

function testBackdoor(uint256 x) public view {
    example.backdoor(x);
}

The code of the PostExample smart contract is available in src/PostExample.sol, while the tests are located in test/PostExample.t.sol.

Halmos symbolically executes functions that start with thetest keyword and will, therefore, analyze testBackdoor symbolically. To follow along, enter the Docker container by running:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/SymTest baolean/halmos:latest

In the container, run the following commands to enter the SymTest folder with the Foundry project we will be analyzing and, then, recompile its source files:

cd SymTest && forge build --force

Now, to run Halmos on the PostExampleTest smart contract that contains the definitions of test functions, run the following command in the project folder:

halmos --contract PostExampleTest

The analysis successfully completes in less than 5 seconds, with the output showing the correctly identified value of the input variable x leading to the assert violation:

The option --contract we used in the command above restricts the analysis performed in this run of Halmos to a specific test contract, PostExampleTest. Running halmos without --contract would initiate the analysis of all functions prefixed with test.

In a similar fashion, we can use hevm to symbolically analyze test functions in a Foundry project by running hevm test in the project repository. In contrast to Halmos, hevm symbolically executes functions that are prefixed with prove instead of test–-that is necessary to distinguish symbolic tests from the ones that should be analyzed concretely and using fuzzing; therefore, we have added a proveBackdoor function to our test contract as well:

function proveBackdoor(uint256 x) public view {
    example.backdoor(x);
}

To use hevm in a Docker container, run:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/SymTest baolean/hevm-symtest:latest

Similar to how we set up the environment in the Halmos container, execute the following commands to enter the directory with the analyzed Foundry project and recompile it:

cd SymTest && forge build --force

hevm test uses the --match option to specify the test or the contract to be analyzed; the instructions on its usage are available in the dapptools documentation. To analyze PostExampleTest, we can run:

hevm test --match PostExampleTest

In just 1.5 seconds, hevm reports a revert that happens in proveBackdoor with the Panic code 1, which corresponds to a failed assertion in Solidity versions >= 0.8:

The output shown on the screenshot above also reports on testBackdoor(uint256) analyzed using fuzzing (as can be seen from the results, 100 runs performed by hevm are insufficient to identify the input triggering the violation). Besides, you may notice that, different from the results produced by hevm symbolic in the previous post, hevm test returns concrete (decoded) calldata, and does not require a user to provide the runtime binary of the analyzed contract, which significantly improves the user experience.

Driven by a similar motivation, we implemented an experimental integration of EthBMC with Foundry to support symbolic testing within the Foundry framework itself. Based on the findings of our previous post, EthBMC performs well on the experimental smart contracts, but its user-friendliness is hindered by the necessity for users to provide a runtime binary in a particular configuration file, and the requirement to decode the returned counterexample calldata. The integration of EthBMC with Foundry partially solves this usability issue: we leverage Foundry to automatically compile Solidity code of the analyzed smart contracts, send it to EthBMC for processing, and decode the results of the symbolic execution returned from it. EthBMC also uses the state of the test contract after setUp() as the initial state for the analysis. Symbolic testing performed by the EthBMC/Foundry integration can be configured in the foundry.toml file located in the root of a Foundry project in the following manner:







[symbolic]
symbolic_storage = false
concrete_validation = true
solver_timeout = 120_000
solver = 2 # 0: z3, 1: boolector, 2: yices2
loop_bound = 256
call_bound = 1

We describe our experience of developing this integration in the second part of this post. To evaluate EthBMC with respect to symbolic testing functionality, we run an instrumented version of Foundry available in this repository. The installation instructions are also provided later in the post.

To run EthBMC/Foundry using Docker, you can enter the container as well as the Foundry project directory by executing the following commands:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/SymTest baolean/ethbmc-forge:latest

cd SymTest && forge build --force

Much like hevm, EthBMC/Foundry symbolically analyzes tests that start with the prove keyword–-such tests are symbolically executed by EthBMC once the following command is run in the project folder:

forge test --match-contract PostExampleTest

where the optional --match-contract argument provides the name of the test contract that should be analyzed, similar to the --match option in hevm. In 1.1 seconds, this command returns the (decoded!) input leading to the assert violation that happens within proveBackdoor–-note that this execution time also includes the fuzzing of the testBackdoor function performed by Foundry in the same run. For this evaluation, we reduce the number of fuzz runs to 100–-the default value used by hevm.

The fourth symbolic testing tool that we examined, KEVM, symbolically analyzes test functions that start with test, like Halmos. At the moment, KEVM does not automatically run setUp() at the beginning of the analysis, as other tools do–-therefore, we manually invoke setUp() in body of the test function. The test contract analyzed by KEVM looks like this:












contract PostExampleTest is Test {
    PostExample public example;

    function setUp() public {
        example = new PostExample();
    }
    ...
    function testBackdoor_k(uint256 x) public {
        setUp();
        example.backdoor(x);
    }
}

For every test function we have defined in the SymTest/test repository, we have added a counterpart containing a call to setUp(). Such tests are postfixed with _k (as they are meant to be analyzed with KEVM) and by default are commented out–-otherwise, they would also be analyzed by other tools considered in this evaluation.

To follow along using the Docker container, run:

docker run -it --rm --platform linux/amd64 \
   -v $(pwd):/SymTest baolean/kevm:latest

cd SymTest && forge build --force

The process of symbolic testing with KEVM consists of two steps. First, we need to build the K definition for the analyzed test suite by executing the following command in the project folder:

kevm foundry-kompile --require lemmas.k \
    --module-import DEMO-LEMMAS

The command also makes use of some lemmas (lemmas.k) defined to facilitate the analysis; we have included this file in the test project (it is also available in this repository). If the contracts are modified later on, execute forge build and use a --regen flag when building a K definition to update it, i.e.,

forge build && \
    kevm foundry-kompile --require lemmas.k \
       --module-import DEMO-LEMMAS --regen

The compilation of our whole test project took a bit less than 3 minutes:

Note that foundry-kompile produces compilation artifacts that are added to the /out folder, which normally only contains the files generated by forge build. Other tools, such as hevm, perform the analysis based on the files present in this directory and might complain about the ones generated by KEVM or another tool. If that happens, the original contents of /out can be regenerated using forge build --force (however, please note that you would have to execute foundry-kompile again to use KEVM).

In KEVM, symbolic testing is performed via the kevm foundry-prove command. Once the compilation is finished, run the following command to symbolically execute the testBackdoor_k function, which–-different from the tests we analyzed before–-includes an explicit call to setUp(). To reproduce:

kevm foundry-prove \
    --test PostExampleTest.testBackdoor_k

The optional --test argument instructs KEVM on the test function that should be symbolically executed. The analysis of testBackdoor_k peformed by KEVM completes in 10 minutes and reports that the test has failed.

At the moment, KEVM takes a different approach to presenting the results of its analysis: in contrast to other symbolic testing tools, it does not currently demonstrate the concrete input leading to the assert violation. Instead, it lets the user visualize and explore the (symbolic) execution of the analyzed test either interactively using:

kevm foundry-view-kcfg PostExampleTest.testBackdoor_k

or statically:

kevm foundry-show PostExampleTest.testBackdoor_k

Note that if we don't add setUp() to the body of a test function, the test will pass, i.e., the failed assert will not be identified, as the PostExample contract called within a test in this case will not be instantiated prior to the analysis:

To reproduce:

kevm foundry-prove \
    --test PostExampleTest.testBackdoor

Our initial evaluation shows that all tools considered in this post successfully identify a failed assert in the PostExample smart contract using symbolic testing! The best results reported by these four tools are as follows:

Tool	Language	Solvers	Time
Halmos	Python	Z3	4.2 sec
hevm	Haskell	Z3, cvc5	1.5 sec
EthBMC	Rust	Z3, Boolector, Yices2	1.1 sec
KEVM	Haskell, Python, etc.	Z3	10 min 24 sec

Simple Example with Two Transactions

Following in the footsteps of our previous post, we will further analyze these four tools on the two-transaction version of the PostExample smart contract. In this contract, the assert violation happens as a result of two function calls: setLive(true); backdoor(6912213124124532). By including the first setLive(true) call in the setUp() function, we can examine how well the analyzed symbolic testing tools support the corresponding setup functionality. This version of the PostExample contract is available in src/PostExample_2tx.sol, while the tests are specified in test/PostExample_2tx.t.sol. The test contract looks as follows:




















import "../src/PostExample_2tx.sol";

contract PostExampleTwoTest is Test {
    
    PostExample public example;

    function setUp() public {
       example = new PostExample();
       example.setLive(true);
    }

    function testBackdoor(uint256 x) public {
        example.backdoor(x);
    }

    function proveBackdoor(uint256 x) public {
        example.backdoor(x);
    }
    
}

When run on this contract, Halmos, hevm, and EthBMC/Foundry exhibit behavior comparable to the single-transaction example, completing the analysis in 4.4, 1.4, and 1.2 seconds, respectively, and returning identical counterexample information–-this indicates that the setUp() function is handled correctly. To reproduce:

halmos --contract PostExampleTwoTest

hevm test --match PostExampleTwoTest

forge test --match-contract PostExampleTwoTest

To analyze this test using KEVM, we again include the call to setUp() into the testBackdoor(uint256 x) function:

function testBackdoor_k(uint256 x) public {
    setUp();
    example.backdoor(x);
}

In a bit less than 15 minutes, KEVM is able to detect the failure of this test. To reproduce:

kevm foundry-prove \
    --test PostExampleTwoTest.testBackdoor_k

The difference in the tools' performance can be observed when running them on a modification of the two-transaction test smart contract, which removes the example.setLive(true) function call from setUp(), and instead addsexample.setLive(bool isLive) to the test function itself, with isLive being a symbolic argument taken by the test function:

function setUp() public {
    example = new PostExample();
}

function testBackdoor(bool isLive, uint256 x) public {
    example.setLive(isLive);
    example.backdoor(x);
}

Halmos demonstrates comparable performance by taking 4.5 seconds to detect and report the failure of the test:

To reproduce:

halmos --contract PostExampleTwoLiveTest

As seen on the screenshot above, Halmos reports a single counterexample that involves setting the value of live to true and calling the backdoor function with x being equal to 6912213124124532.

EthBMC/Foundry takes more time to analyze the test than it did before–-it completes the analysis in 11.5 seconds. Interestingly, the best results for this test can be obtained with Boolector being used as the SMT solver, while Yices2 performed better on all previous examples from this post.

To reproduce:

forge test --match-contract PostExampleTwoLiveTest

EthBMC/Foundry identifies two failing tests in PostExampleTwoLiveTest. For proveBackdoor, which is analyzed symbolically, it returns a counterexample similar to the one reported by Halmos. In addition to it, Foundry's fuzzer also identifies the input that makes testBackdoor (analyzed concretely) fail–-this happens due to a revert that occurs in the require(live); statement if the value of live has been set to false . If we comment out the content of the testBackdoor function, forge test will return a single counterexample corresponding to the proveBackdoor test.

hevm also reports both testBackdoor and proveBackdoor as failing (in just 1.5 seconds!). However, different from EthBMC/Foundry and Halmos, it returns three counterexamples it total–-one for testBackdoor and two for proveBackdoor:

To reproduce:

hevm test --match PostExampleTwoLiveTest

The second counterexample returned for proveBackdoor is similar to the one that is (concretely) generated for testBackdoor, i.e., it corresponds to a revert that occurs due to a failed require(live); check if the value of live is false. By default, Foundry and hevm report reverted tests as failed; even though Halmos also reports a test failure if it gets reverted, Halmos returns a single counterexample as the output when run on the test smart contract considered in this section.

Interestingly, KEVM does not complete the analysis of this variation of the testBackdoor function within the timeout of one hour:

    function testBackdoor_k(bool isLive, uint256 x) public {
        setUp();
        example.setLive(isLive);
        example.backdoor(x);
    }

To reproduce:

kevm foundry-prove  \
    --test PostExampleTwoLiveTest.testBackdoor_k

The summary for the analysis performed in this post so far is shown in the table below:

Tool	Language	Best: 1tx	Best: sU+1tx	Best: 2tx
Halmos	Python	4.2 sec	4.4 sec	4.5 sec
hevm	Haskell	1.5 sec	1.4 sec	1.5 sec
EthBMC	Rust	1.1 sec	1.2 sec	11.5 sec
KEVM	Haskell, Python, etc.	10 min 24 sec	14 min 46 sec	x

"sU+1tx" stands for "setUp() + 1tx", which corresponds to the earlier variation of the test, where setLive is called in the setUp function;
2tx corresponds to the latter variation, where setLive is called in the test function with a symbolic argument.

Examples from Echidna VS Forge Repository

Similarly to the previous post, we further compare the tools based on the smart contracts from the Echidna VS Forge repository by clabby. This repository contains two smart contracts: Foo (having uint256 variables as function inputs) and FooBytes (having bytes32 variables as function inputs); these contracts were originally intended to be used for fuzzer comparison. The objective is to find two inputs equal to the magic number 0x69 to supply to the contract's foo(uint256/bytes32) and bar(uint256/bytes32) functions. If both of these functions are called with an input of 0x69, another function revertIfCracked() will fail with the assert violation.

The inclusion of this repository into the evaluation performed in this post also helps showcase the usability improvements offered by symbolic testing compared to traditional symbolic execution: Echidna VS Forge is, in fact, a Foundry project that already contains two tests (one for each of the two analyzed smart contracts). Each test encodes three function calls that eventually lead to a revert when called with specifically crafted input; we can directly re-use these already-defined tests to evaluate the considered symbolic testing tools based on their ability to identify these revert-triggering inputs. In the scope of our previous post, we modified the analyzed smart contracts (Foo and FooBytes) to use assert instead of a revert statement in order to make it detectable by all symbolic execution tools. For the analysis performed in this post, we also had to rename the test contracts (Foo_Test and FooBytes_Test) to eliminate the underscore in the contract name, which at the moment is considered a forbidden character in KEVM. The analyzed smart contacts are available in src/Foo.sol and src/FooBytes.sol; the tests are defined in test/Foo.t.sol and test/FooBytes.t.sol.

Two tests that we analyze in this section take the inputs to foo and bar as (in our case, symbolic) parameters and perform the following three functions calls to Foo and FooBytes, respectively:

Foo internal a;

function setUp() public {
    a = new Foo();
}

function testFuzz_cracked(uint256 x, uint256 y) public {
   a.foo(x);
   a.bar(y);
   a.revertIfCracked();
}

FooBytes internal a;

function setUp() public {
    a = new FooBytes();
}

function testFuzz_cracked(bytes32 x, bytes32 y) public {
   a.foo(x);
   a.bar(y);
   a.revertIfCracked();
}

Untill now, we used the --contract option in Halmos, --match-contract in EthBMC/Foundry, and --match in hevm to specify the test contract that should be symbolically analyzed by these tools. In Echidna VS Forge, both testFuzz_cracked tests are named similarly but are located in two different test contracts. However, all the tools we are considering in this post provide the ability to symbolically execute test functions that match a certain regular expression. This helps us analyze both testFuzz_cracked tests in the same run, and can be done using --function in Halmos and --match-test in EthBMC/Foundry. In hevm, we can analyze specific test functions using the same --match option as before.

Halmos takes 5.8 seconds to detect the assert violations happening in both the uint256 and bytes32 tests and output the identified counterexample data (note that 105 is the decimal representation of 0x69, which is the expected result):

To reproduce:

halmos --function testFuzz

hevm takes around 1.6 seconds to find both violations and the input that leads to them:

To reproduce:

hevm test --match 'proveFuzz'

While hevm and Halmos consistently demonstrate high performance, EthBMC/Foundry now takes around 1 minute 41 seconds to analyze both tests. Our empirical observations suggest that the performance decline is associated with an increase in the number of external function calls being analyzed within the test and needs further investigation.

To reproduce:

forge test --match-test 'proveFuzz'

KEVM takes 35 minutes to complete the analysis of both tests. In this example, we make use of the paralellization capability by supplying the -j2 flag together with the kevm foundry-prove command, which instructs KEVM to analyze two tests in parallel (the analysis of the same two tests takes more than an hour if run without this flag):

kevm foundry-prove \
    --test FooTest.testFuzz_cracked_k \
    --test FooBytesTest.testFuzz_cracked_k -j2

KEVM reports both of the tests as failed, indicating that it successfully identified the corresponding inputs:

The following table shows the summary of the evaluation:

Tool	1tx	sU+1tx	2tx	Foo/FooBytes
Halmos	4.2 sec	4.4 sec	4.5 sec	5.8 sec
hevm	1.5 sec	1.4 sec	1.5 sec	1.6 sec
EthBMC	1.1 sec	1.2 sec	11.5 sec	1 min 41 sec
KEVM	10 min 24 sec	14 min 46 sec	x	35 min 14 sec

MiniVat–-A More Realistic Example

Finally, following the structure of our previous post, let's run our tools on the MiniVat smart contract, which represents more realistic smart contract code. The violation of an assert expressing the fundamental equation of DAI invariant occurs through a sequence of four function calls, which we include in a test function called testInvariant. We express the invariant using a DSTest/Foundry assertEq supplementary function, which checks an equality between two values. We discuss how this assertion functionality is implemented under the hood later in the post. The analyzed smart contact can be found in src/MiniVat.sol; the tests are provided in test/MiniVat.t.sol.

The test function we are analyzing is shown below. All the arguments are concrete:

function testInvariant() public {
    vat.frob(10 ** 18);
    vat.fold(-10 ** 27);
    vat.init();
    (uint Art, uint rate, uint debt) = vat.getValues();

    assertEq(debt, Art * rate);
}

This test contains several function calls to MiniVat, which are external to the test contract. As demonstrated by the previous example, the increased number of external calls introduces overhead to the analysis performed by tools such as EthBMC/Foundry and KEVM. This observation can be verified by comparing the performance of these tools on the test shown above with their performance on a test involving a single external call to the counterexample()function, which, in turn, invokes the same set of smart contract functions but within the MiniVat smart contract itself. The latter test will be analyzed much faster.

function testCounterexample() public {
    vat.counterexample();
}

When run on a proveInvariant/testInvariant test that contains several external calls and an assertEq statement, EthBMC/Foundry correctly identifies a violation in 3 minutes 51 seconds:

To reproduce:

forge test --match-contract MiniVatTest

It takes around 1.5 seconds for hevm to report the assert violation in the symbolic (proveInvariant) as well as the concrete tests (testInvariant):

To reproduce:

hevm test --match MiniVatTest

Halmos reports the testInvariant test as failing in a bit less than 3 minutes. However, it encounters some issues with processing and returning the counterexample as it outputs the Counterexample: unknown error message (even though there are no parameters in this test, Halmos normally returns [] as a counterexample in this case).

To reproduce:

halmos --contract MiniVatTest

KEVM takes 32 minutes to finish the analysis and successfully identifies the violation happening within the test:

To reproduce:

kevm foundry-prove \
    --test MiniVatTest.testInvariant_k

The summary of all results of the evaluation performed in this post is shown below:

Tool	1tx	sU+1tx	2tx	Foo/FooBytes	MiniVat
Halmos	4.2 sec	4.4 sec	4.5 sec	5.8 sec	2 min 48 sec
hevm	1.5 sec	1.4 sec	1.5 sec	1.6 sec	1.5 sec
EthBMC	1.1 sec	1.2 sec	11.5 sec	1 min 41 sec	3 min 51 sec
KEVM	10 min 24 sec	14 min 46 sec	x	35 min 14 sec	32 min 12 sec

tl;dr

All four tools considered in this post–-Halmos, hevm, KEVM, and EthBMC/Foundry–-support symbolic execution of tests defined as part of a Foundry project. Most of the tools perform well on the majority of the experimental smart contracts and their corresponding tests, using only several seconds for the analysis. Based on the evaluation described in this post, hevm demonstrates the best performance, followed by Halmos and EthBMC, and, finally, KEVM. Most of the tools output the concrete counterexample corresponding to the test failure. KEVM informs the user on the status of the test verification, but takes a different approach to the representation of the results: it allows the user to step through the execution, thereby visualizing the analysis process.

Apart from being able to analyze the compilation artifacts produced by Foundry, all tools support some of the DSTest/Foundry-specific functionality. For example, hevm, Halmos, and EthBMC support the state initialization via setUp(), and all four tools recognize Foundry's supplementary assert functions, such as assertEq, assertTrue, etc.

In this post, we chose not to compare the tools based on their hevm/Foundry cheatcode support as they seem to implement an almost non-intersecting sets of cheatcodes–-however, all tools support at least one cheatcode. hevm supports a comperehensive set of cheatcodes including warp, roll, store, load etc. In fact, these cheatcodes are actively used in Foundry projects but were originally introduced by hevm itself. KEVM supports arguably the most extensive list of cheatcodes at the moment, including several cheatcodes that are specific to symbolic testing with KEVM. The assume cheatcode is supported by EthBMC and Halmos, with the latter also supporting getCode.

Summary

In summary, symbolic testing is a rapidly developing technique that shows great promise in improving the usability and user-friendliness of symbolic execution for smart contracts, making it easier to integrate symbolic analysis into the smart contract development processes. All tools reviewed in this post support basic functionality related to symbolic testing of Foundry projects, which enables re-use of existing tests and properties defined for a smart contract. All these tools, however, are also undergoing active development, so we expect further improvements in their performance as well as extended support for Foundry's supplementary testing features, such as cheatcodes.

Bonus Section: The Hitchhiker's Guide to Foundry Integration

In this section, we document our experience of implementing the experimental integration of EthBMC with Foundry. The purpose of this integration is two-fold: first, we introduce symbolic testing into the Foundry development framework; and, second, it also helps improve the user experience for symbolic execution with EthBMC. For a more detailed discussion on the performance and usability of EthBMC, please refer to our previous post.

In the evaluation described in this post, we used an instrumented version of Foundry that is available in the symexec branch of our Foundry fork. To carry out symbolic testing with EthBMC, our fork of Foundry uses a local crate built from the EthBMC implementation located in a folder in the same directory as one containing the Foundry source code. The version of EthBMC that supports Foundry integration is implemented in the forge branch of our fork of EthBMC.

To install the version of Foundry that supports symbolic testing based on EthBMC, run the following commands:

git clone https://github.com/baolean/EthBMC.git && \
    cd EthBMC && \
    git checkout forge && \
    cd ..

git clone https://github.com/baolean/foundry.git && \
    cd foundry && \
    git checkout symexec && \
    foundryup --path .

With this version of Foundry installed, every run of forge test not only executes the concrete, fuzz, and invariant tests supported by Foundry, but also symbolically analyzes test functions that start with prove.

The rest of this section will outline the necessary steps to successfully integrate a symbolic execution tool with Foundry to facilitate symbolic testing; these steps include input and output processing, symbolic analysis of test functions that follow a chosen naming convention, and support of supplementary testing features available in Foundry.

Input Processing

To symbolically execute a smart contract, EthBMC relies on a runtime binary of a smart contract that has to be provided by a user. The binary, together with other information such as contract addresses and initial storage state (the constructor code is not included in the runtime binary!), should be supplied in the format of a specific YAML configuration file. Integrating EthBMC with Foundry helps us streamline the process of source code compilation by using the build artifacts generated by Foundry (as a result of forge build or forge test) and supplying the resulting bytecode to EthBMC automatically.

In our implementation, the symbolic execution functionality of EthBMC is invoked directly from Foundry based on the results of the test contract code pre-processing. This approach helps us leverage Foundry's compilation and calldata decoding capabilities, and output the result of the analysis in the format that is consistent with those for tests analyzed concretely. The obvious downside of this approach is the necessity to run a modified version of Foundry, which might be challenging to keep in sync with the original branch, as it is updated very actively. An alternative approach is taken by some other tools, such as hevm and KEVM, that also perform symbolic testing based on the compilation artifacts produced by Foundry, but are run independently from it.

In addition to runtime binaries of the analyzed smart contracts, the information that should be supplied to EthBMC includes the initial states of the considered smart contracts; the address of the test contract that will be analyzed; function selectors corresponding to the tests that should be symbolically executed; and configuration information related to symbolic execution, if available.

The initial states of the smart contracts that will be used for analysis correspond to their states after the execution of the setUp() function. As we mentioned earlier in this post, setUp() therefore provides a straightforward way to initialize the state in the absence of a constructor, which is not included in the runtime bytecode. For every smart contract that is present in the project, we extract the information about its bytecode and storage layout after the execution of setUp()–-this data will then be used in EthBMC to define the initial state for the symbolic analysis. Foundry and EthBMC use different implementations of Ethereum types, so, for simplicity, we record the values in the form of hex strings.

for (address, account) in db.accounts.iter() {
    let mut account_storage: HashMap<String, String> = HashMap::new();
    // Recording the contract information:
    // address, bytecode, and initial storage 
    let account_code: String = hex::encode(&account.info.code.as_ref()                 
                                                      .unwrap().bytes());
    // Recording storage values to set up symbolic tests
    for (slot, value) in &account.storage {
        account_storage.insert(
            hex::encode(utils::u256_to_h256_be(*slot)),
            hex::encode(utils::u256_to_h256_be(*value)),
        );
    }
    setup_accounts.insert(hex::encode(address),
                            (account_code, account_storage));
}

Similar to hevm, our EthBMC/Foundry integration symbolically executes tests that are prefixed with the keyword prove. To supply this information to EthBMC, we instruct Foundry to recognize tests that should be symbolically executed, and record the signatures for functions that follow this naming convention:

fn is_symbolic_test(&self) -> bool {
        self.starts_with("prove")
    }
...

pub fn run_symbolic_test(...) -> ... {
    ...
    // Recording `prove...` function selectors
    let signature: String = hex::encode(function.short_signature());
    ...
}

One other type of input we need to provide to EthBMC is the symbolic execution configuration. The cutomizable parameters should include the SMT solver used and the loop bound; we should also be able to enable or disable the usage of symbolic storage and concrete validation. We have instrumented Foundry with the ability to recognize the following configuration section that can be added to the foundry.toml file:

[symbolic]
symbolic_storage = false
concrete_validation = true
solver_timeout = 120_000_000   # ms
solver = 1                     # 0: z3, 1: boolector, 2: yices2
loop_bound = 50
call_bound = 1

We set the value of call_bound, which is the number of function calls to be analyzed, to 1 by default, as we are only interested in executing symbolic tests, and not sequences of function calls. The parsed configuration is recorded in the following structure:

pub struct SymbolicConfig {
    /// ... whether to assume that default storage values are symbolic
    pub symbolic_storage: bool,
    /// ... whether to perform geth-based concrete validation
    pub concrete_validation: bool,
    /// SMT solver to be used
    /// {0: z3, 1: boolector, 2: yices2}
    pub solver: u8,
    /// The timeout (ms) for the solver
    pub solver_timeout: u32,
    /// The number of loops to be unrolled in a single execution
    pub loop_bound: u32,
    /// The number of calls symbolically analyzed in a sequence
    pub call_bound: u32,
}

Finally, if the test contract contains symbolic tests, our version of Foundry starts processing these tests using EthBMC one by one. Analyzing one transaction at a time (in contrast to passing a set of symbolic tests to EthBMC) helps decode the results corresponding to the analysis of each function properly, correctly measure the time used to analyze each test, and possibly prevent symbolic execution of all tests from being stalled if one of them cannot be processed. By doing so, we also take advantage of the very short startup time of EthBMC. For every symbolic test identified in a project, we run EthBMC while also supplying the collected input, which includes the address of the analyzed contract, runtime bytecode and storage setup of smart contracts involved in the analysis, symbolic test function selectors, and the configuration information related to the symbolic execution.

pub fn run_symbolic_test(...) -> Result<TestResult> {
    ... 

    // Running symbolic analysis in EthBMC
    let symbolic_result: Option<(String, String, String)> = 
        esvm::forge_analysis(
            hex::encode(*analyzed_contract),
            signature,
            setup_accounts,
            serde_json::to_string(&test_options.symbolic).unwrap(),
    );

    ...
}

Symbolic Execution of Tests

As we have discussed in the previous post, symbolic execution tools usually perform the analysis by issuing a transaction with symbolic calldata, which represents a call to any smart contract function with symbolic parameters. Since, in this case, we perform symbolic testing, we are interested in symbolically executing only a specific function that starts with prove and corresponds to a funciton selector supplied from Foundry. A straightforward way to make sure that the analysis is restricted to such functions is to impose a constraint on the function selector (i.e., the first four bytes) of the otherwise symbolic calldata:

// Restricting the analysis to `prove_xxx` functions
// Loading the calldata from memory
let load = mload(&new_state.memory, new_state.input_tx().data,
                                                    &const256("0"));
// If the `prove` function selector is provided by Foundry,
if let Some(selector) = env.get_selector() {
    let prove_selector =
        const_vec(&hexdecode::decode(selector.as_bytes()).unwrap());
    // Selecting the first 4 bytes from the calldata
    let shiftval = const_u256(U256::from(224));
    let shiftop = lshr(&load, &shiftval);
    // Forming a constraint that asserts that these 4 bytes == selector 
    let prove_constraint = eql(&prove_selector, &shiftop);
    // Adding the constraint to the state
    new_state.push_constraint(prove_constraint);
}

Supporting Supplementary Testing Functionality

To facilitate testing, Foundry supports a set of supplementary features that should also be supported by the symbolic execution component of our integration. In this section, we consider two types of these features: DSTest/Foundry asserts and cheatcodes.

Handling DSTest/Forge asserts

DSTest and Forge Standard Library provide a smart contract developer with a set of functions that facilitate the specification of assertions. These functions include, for example, assertTrue(bool), assertEq(address a, address b), and many others. Different from regular asserts provided by Solidity, the violation of the condition expressed using these constructs does not immediately stop the execution of a smart contract; instead, the variable _failed in the DSTest smart contract is set to true. In a Foundry test contract that inherits from forge-std/Test.sol, this variable is located at slot 0 with byte offset 1.

To identify the violation of this type of an assert, we instrumented EthBMC with an additional check on the value located at this offset in the test smart contract storage during the symbolic execution process (and later during the concrete validation using Geth):

// Detecting forge-std/Test assert failures
// If the analysis is run in the Foundry symtest mode
if potential_attack_state.env.func_selector.is_some() {
    // Clone the current smart contract state
    let mut check = potential_attack_state.clone();
    // To identify failed `assertEq`, `assertTrue`, etc.,
    // read the value located at slot 0, offset 1
    let failed_val = byte_at(
        &sload(&check.memory, check.account().storage, &const_usize(0)),
        &one(),
    );

    // Add constraint that requires this value to be 1
    check.push_constraint(eql(&failed_val, &one()));
    // If it's possible--the assert might be violated
    if check.check_sat() {
        info!("Foundry's assert might be violated!");
        // Use concrete validation to make sure it is a true positive
        // Or just return results if the validation is disabled

Cheatcodes

Another feature provided by Foundry to facilitate smart contract analysis is cheatcodes. While there are many cheatcodes supported by Foundry, we decided to start with the experimental support of the assume cheatcode.

Cheatcodes are implemented as calls to a special Vm smart contract at address 0x7109709ecfa91a80626ff3989d68f67f5b1dd12d; to process the cheatcode correctly, we need to intercept the calls to this hardcoded address, parse the calldata with which this call was performed to identify the exact cheatcode to be processed, and update the symbolically analyzed state accordingly.

In EthBMC, support for the assume cheatcode is implemented in commits one and two. assume(bool) is used to declare the assumptions on the values of some variables. During fuzzing, the inputs that don't satisfy the assumption have to be rejected (possibly leading to the corresponding issue); during symbolic execution, this assumptions can be accomodated in a straightfoward way: by adding a corresponding constraint to a set of path conditions (i.e., to the formula that comprises conditions that should be satisfied for a specific execution path to be taken).

In our fork of EthBMC, we instruct it to recognize the calls performed to 0x7109709ecfa91a80626ff3989d68f67f5b1dd12d, decode the calldata supplied with this call, and, if it is equal to that of the assume cheatcode, add the corresponding expression to the set of constraints corresponding to this execution path:

let mut is_assume: bool = false;

if *to
    == parse_hex_string(
        &String::from("0x7109709ecfa91a80626ff3989d68f67f5b1dd12d")))
{
    // Executing a call to the Vm smart contract 
    is_vm = true;
}

// Loading the calldata from memory,
// extracting a function selector from it
let load = mload(&call.memory, call.mem, in_off);
let shiftval = const_u256(U256::from(224));
let shiftop = lshr(&load, &shiftval);

// If the selector is equal to
if FVal::simplified(&shiftop) == 
    // bytes4(keccak256("assume(bool)"))
    parse_hex_string(&String::from("0x4C63E562")) 
{
    // Executing an `assume` cheatcode
    is_assume = true;
}

// vm.assume
if is_hevm && is_assume {
    // 128 + 4 == 132
    let args = add(in_off, &const_usize(4));
    // Getting the condition contained provided in `assume`
    let load_args = mload(&call.memory, call.mem, &args);
    call.push_constraint(load_args);
    ...
    // Adding this condition to path conditions (current constraints)
    call.push_constraint(eql(&callres, &one()));
    ...
}

Output Processing

Finally, after the analysis performed by EthBMC is completed, we need to output the results of the symbolic testing. EthBMC does not automatically decode the identified counterexample calldata; however, we can utilize the decoding capabilities of Foundry. For this, we return the result of symbolic testing back to the original foundry test process that had run EthBMC. The returned information includes the addresses of the sender, receiver, and the calldata that leads to the test failure identified using symbolic testing:

    // Returning the first identified counterexample to Foundry
    if let Some(counterexample) = counterexamples.get(0) {
        let sender = counterexample[0].sender.clone();
        let receiver = counterexample[0].receiver.clone();
        let input_data = counterexample[0].input_data.clone();

        let result: (String, String, String) =
                    (sender, receiver, input_data);
        
        return Some(result);
    }

This information is then used to generate a counterexample in Foundry; this counterexample will also be shown in the output produced by forge test:










if let Some(result) = symbolic_result {
    let counterexample = BaseCounterExample::create(
        H160::from_str(&result.0).unwrap(),
        H160::from_str(&result.1).unwrap(),
        &Bytes::from_str(&result.2).unwrap(),
        &identified_contracts,
        None,
    )
    .wrap_err("Failed to create counterexample")?;
}

This approach also helps us return the results in the same format as Foundry does, as well as to execute symbolic tests in the same run as other concrete and fuzz tests.

Some of the code excerpts of our integration project between EthBMC and Foundry that were given in this post are, admittedly, crude hacks (e.g., primitive string marshalling in the example above). Please keep in mind that in this rough proof-of-concept we optimized for time-to-functioning-prototype rather than anything else, and any contributions are extremely welcome!

The Easy Way To Quit (Concrete) Testing

What is symbolic testing?

What tools are there?

What can they do?

Setup

Simple Example

Simple Example with Two Transactions

Examples from Echidna VS Forge Repository

MiniVat–-A More Realistic Example

tl;dr

Summary

Bonus Section: The Hitchhiker's Guide to Foundry Integration

Input Processing

Symbolic Execution of Tests

Supporting Supplementary Testing Functionality

Handling DSTest/Forge asserts

Cheatcodes

Output Processing

Read more

Everything You Wanted to Know About Symbolic Execution for Ethereum Smart Contracts (But Were Afraid to Ask)

When Invariants Aren't: DAI's Certora Surprise