Foundry mutation test

Background

Mutation testing is a technique for assessing how good a test suite is. While coverage answers "is this line executed during the tests?", mutation testing answers "is this line actually being tested?".
To do so, the mutation tool creates "mutants" (like the Ninja Turtles, but with neither turtles nor ninjas involved): copies of the contract under test, each modified with a single small change (e.g. one random == is replaced with !=). If at least one (previously passing) test now fails, we say the mutant is killed. Poor turtle, but it's a good thing, as it means the test actually depended on that ==.
On the other hand, if no test fails after a mutation is introduced (i.e. the mutant survives and goes to live in the sewer, eating pizzas with Donatello), it's a bad thing, as no test was really testing that ==.
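
The killed/surviving distinction can be illustrated with a minimal sketch. It uses Python rather than Solidity purely for brevity, and every name in it (`is_owner`, `strong_test`, `weak_test`, `mutant_status`) is hypothetical. Both tests pass on the original code, but only one of them actually pins the == comparison:

```python
# Hypothetical contract logic, modelled as Python source for illustration only.
ORIGINAL = "def is_owner(caller, owner):\n    return caller == owner\n"

def mutate(source: str) -> str:
    """Swap the first `==` for `!=` (a single, small mutation)."""
    return source.replace("==", "!=", 1)

def load(source: str):
    """Execute the source and return its `is_owner` function."""
    namespace = {}
    exec(source, namespace)
    return namespace["is_owner"]

def strong_test(fn) -> bool:
    # Checks both the accepting and the rejecting path.
    return fn("alice", "alice") is True and fn("bob", "alice") is False

def weak_test(fn) -> bool:
    # Only checks that a boolean comes back: it never pins the comparison.
    return isinstance(fn("alice", "alice"), bool)

def mutant_status(test, source: str) -> str:
    """A mutant is killed when a test that passes on the original fails on it."""
    assert test(load(source)), "the test must pass on the unmodified code first"
    return "survived" if test(load(mutate(source))) else "killed"
```

With `strong_test` the mutant is killed; with `weak_test` it survives, even though both tests execute the mutated line (so coverage alone would not tell them apart).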

Some mutants are obviously useless (e.g. those that cannot be compiled; these are “not viable”), but that still leaves a lot of possibilities, which means mutation tests are usually slow and resource-heavy. Leveraging Foundry, the “blazing fast (…) toolkit”, therefore makes sense.

General

This project would introduce the possibility to run mutation tests in Foundry’s forge test (forge test --mutate).

Requirements

running forge test --mutate produces a table with the accurate number of dead and surviving mutants
for any surviving mutant, the user should get the filename, the line and a diff of the mutation (with an option to deactivate this, e.g. for large codebases with partial coverage):
```
Surviving mutant in Foo.sol:ln23
-- a = b - 4;
++ a = b + 4;
```
running-time is human-scale
consumer-grade hardware friendly
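
A minimal sketch of how the summary and the per-survivor report could be assembled, following the output format shown above. The `Mutant` record, its fields and the `show_diffs` switch are assumptions made for illustration, not an existing forge data structure or flag:

```python
from dataclasses import dataclass

@dataclass
class Mutant:
    # Hypothetical record for one mutation result.
    file: str
    line: int
    original: str
    mutated: str
    killed: bool

def summary(mutants):
    """Count dead and surviving mutants."""
    dead = sum(1 for m in mutants if m.killed)
    return {"dead": dead, "surviving": len(mutants) - dead}

def report_survivor(m: Mutant) -> str:
    """Filename, line and diff for one surviving mutant."""
    return f"Surviving mutant in {m.file}:ln{m.line}\n-- {m.original}\n++ {m.mutated}"

def render(mutants, show_diffs=True):
    """Summary line, optionally followed by one report per survivor."""
    counts = summary(mutants)
    lines = [f"dead: {counts['dead']}, surviving: {counts['surviving']}"]
    if show_diffs:  # can be switched off for large, partially covered codebases
        lines += [report_survivor(m) for m in mutants if not m.killed]
    return "\n".join(lines)
```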

In-Depth

There is a first iteration of this idea, using Gambit as the mutation engine. This PR has now been confirmed as stale by its main author: https://github.com/foundry-rs/foundry/pull/6588 https://hackmd.io/@tICezjHsSaiehIn9jbcUAA/SkTEyvuHa
Gambit (https://github.com/Certora/gambit/blob/master/src/lib.rs) is a tool built by Certora (mostly for use with their prover), which parses the Solidity codebase, creates new mutants and writes them to Solidity files, then compiles them and discards non-viable mutants.
Gambit was suggested amongst other possibilities for the mutation engine.
Other tools and strategies exist, source-code based mutations being considered superior (see the related discussion in the Foundry issue).

Tech spec

We wouldn’t reuse Gambit as-is. An initial MVP would instead use Solar to parse the AST, then mutate it from within forge (possibly reusing Gambit’s mutation dictionary), emit Solidity code, use foundry-compilers as a solc binding to compile the mutant, then run the tests.
A single thread should handle [AST mutation; emitting Solidity; solc compilation; forge test] for a given mutant, as each step blocks the next.
We should always start with a non-mutated test run (i.e. running forge test).
We should only mutate one contract at a time, to avoid having to recompile the whole codebase on each iteration/thread.
Mutations would be taken from the Gambit generator, which looks a bit more exhaustive than the (universalmutator-based) Vertigo.
foundry-compilers (i.e. the solc binding) should be a temporary step, replaced with Solar once it reaches feature completeness.*
The main and major improvement would be avoiding thousands of temporary Solidity files used only by the compiler (i.e. parse the original codebase, mutate the AST, use this AST to compile/check whether the mutant is viable) → if the Solar dev cycle is fast enough, we’d wait and ensure we stay generic enough for future integration; if a major delay is forecast, it might then be interesting to explore creating new bindings (or just FFI) for some of solc’s C++ functions (HIR generation, etc).
UX-wise, it should support filtering by contract, function and path (same as for forge test).
There should at least be some trivial equivalency check implemented (todo: check how Gambit or Vertigo handle this). Maybe a hashmap of the hashed flattened AST would work? Or hash(bytecode) (but then we don’t “save” a compilation)?
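
The "hashed flattened AST" idea could look roughly like the sketch below. It assumes a toy AST made of nested tuples, which is not Solar's actual representation, and it only catches syntactically identical trees (a mutant equal to the original, or a duplicate of an earlier mutant), not semantic equivalence:

```python
import hashlib

def flatten(node) -> str:
    """Serialise a nested-tuple AST into a canonical string."""
    if isinstance(node, tuple):
        return "(" + " ".join(flatten(child) for child in node) + ")"
    return repr(node)  # repr keeps the int 4 and the string "4" distinct

def ast_hash(node) -> str:
    return hashlib.sha256(flatten(node).encode()).hexdigest()

def filter_trivial(original, mutants):
    """Drop mutants whose AST equals the original's or an earlier mutant's."""
    seen = {ast_hash(original)}
    unique = []
    for mutant in mutants:
        digest = ast_hash(mutant)
        if digest not in seen:
            seen.add(digest)
            unique.append(mutant)
    return unique
```

Because the check runs on the AST, a filtered mutant never reaches the compiler. Hashing bytecode instead would catch more equivalent mutants, at the cost of compiling each one first.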

External Requirements

Solar completion

*:
pre-solar/mvp:

parse with solar -> get the ast
mutate the ast
emit solidity code to file
foundry-compile the file

post-solar

parse with solar -> ast
mutate the ast
solar compile (hopefully easy to compile starting from an ast)
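
The control flow of these steps can be sketched independently of the actual tooling. In the sketch below, `emit`, `compile_source` and `run_tests` are placeholder callables standing in for Solar, foundry-compilers/solc and forge test; none of them is a real API, and the worker count is arbitrary. Each worker handles every step for one mutant, and the campaign refuses to start if the non-mutated run fails:

```python
from concurrent.futures import ThreadPoolExecutor

def run_mutant(mutant, emit, compile_source, run_tests):
    """All steps for one mutant run sequentially, since each blocks the next."""
    source = emit(mutant)              # mutated AST -> Solidity source
    artifact = compile_source(source)  # None means the mutant does not compile
    if artifact is None:
        return "not viable"
    return "survived" if run_tests(artifact) else "killed"

def mutation_campaign(original, mutants, emit, compile_source, run_tests, workers=4):
    """Run the baseline first, then one task per mutant; results keep input order."""
    baseline = compile_source(emit(original))
    if baseline is None or not run_tests(baseline):
        raise RuntimeError("baseline test run must pass before mutating")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda m: run_mutant(m, emit, compile_source, run_tests), mutants)
        )
```

In the post-Solar flow, the `emit` step would disappear and `compile_source` would take the mutated AST directly; the surrounding loop would stay the same.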

Milestones and Estimates

tbd

Open Questions and Thoughts

Solar exact status?
Nice to have: a --resume option, which would allow running mutation tests in multiple chunks (provided test and implementation haven’t changed)
Nice to have: same filtering as forge test (match-contract, match-test and match-path), but applied on the target contracts
Nice to have: coverage integration (non-covered code will always produce surviving mutants) → either a warning, or avoid uncovered zones?
Nice to have: mutant equivalency and redundancy check (see BenTheKush infra)

Signatures

@Simon Something at February 17, 2025
[Exported from Notion]