# Foundry mutation test

## Background

Mutation testing is a technique for assessing "how good" a test suite is. While coverage answers "is this line executed during the tests?", mutation testing answers "is this line actually being tested?".

To do so, the mutation tool creates "mutants" (like the Ninja Turtles, but with neither turtles nor ninjas involved): copies of the contract under test, each modified with one small difference (e.g. one random `==` is replaced with a `!=`). If at least one previously passing test now fails, we say the mutant is killed. Poor turtle, but *it's a good thing*, as it means the test really depended on that `==`. On the other hand, if no test fails after a mutation is introduced (i.e. the mutant survives and goes to live in the sewer, eating pizzas with Donatello), it's *a bad thing*, as no test was really testing that `==`.

Some mutants are obviously useless (for instance those that cannot be compiled; these are “not viable”), but there are still **a lot** of possibilities, which means mutation tests are usually slow and resource-heavy. Leveraging Foundry, the “blazing fast (…) toolkit”, therefore makes sense.

```mermaid
flowchart TD
    A[Start: Source Code]
    B[Apply Mutation Operators]
    C[Generate Mutants]
    D{Is Mutant Viable?}
    E[Execute Test Suite on Viable Mutants]
    F{Mutant Killed?}
    G[Mark as Killed]
    H[Mark as Survived]
    I[Mark as Non-Viable]
    J[Aggregate Mutation Results]
    K[Calculate Mutation Score]
    L[Report Results]
    A --> B
    B --> C
    C --> D
    D -- Yes --> E
    D -- No --> I
    E --> F
    F -- Yes --> G
    F -- No --> H
    G --> J
    H --> J
    I --> J
    J --> K
    K --> L
```

## General

This project would introduce the possibility of running mutation tests in Foundry’s Forge test (`forge test --mutate`).
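As a rough illustration of the killed / survived / non-viable bookkeeping described in the Background section, here is a minimal Rust sketch. All names (`MutantOutcome`, `MutationSummary`, `score`) are hypothetical and not part of Forge. The score formula (killed mutants divided by viable mutants) is an assumption: it is a common convention, but this document does not define it.

```rust
/// Outcome of running the test suite against one mutant.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MutantOutcome {
    /// At least one previously passing test failed.
    Killed,
    /// Every test still passed.
    Survived,
    /// The mutant could not be compiled.
    NonViable,
}

/// Aggregated counts over all mutants.
#[derive(Debug, Default, PartialEq, Eq)]
struct MutationSummary {
    killed: usize,
    survived: usize,
    non_viable: usize,
}

impl MutationSummary {
    /// Tally the outcomes of a mutation run.
    fn from_outcomes(outcomes: &[MutantOutcome]) -> Self {
        let mut summary = Self::default();
        for outcome in outcomes {
            match outcome {
                MutantOutcome::Killed => summary.killed += 1,
                MutantOutcome::Survived => summary.survived += 1,
                MutantOutcome::NonViable => summary.non_viable += 1,
            }
        }
        summary
    }

    /// Assumed definition: killed / (killed + survived).
    /// Returns `None` when there is no viable mutant to score.
    fn score(&self) -> Option<f64> {
        let viable = self.killed + self.survived;
        if viable == 0 {
            None
        } else {
            Some(self.killed as f64 / viable as f64)
        }
    }
}

fn main() {
    let outcomes = [
        MutantOutcome::Killed,
        MutantOutcome::Killed,
        MutantOutcome::Survived,
        MutantOutcome::NonViable,
    ];
    let summary = MutationSummary::from_outcomes(&outcomes);
    println!("{summary:?}, score = {:?}", summary.score());
}
```

Non-viable mutants are counted but left out of the denominator, since they never reach the test suite.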
### Requirements

- [ ] running `forge test --mutate` produces an array with the accurate number of dead and surviving mutants
- [ ] for any surviving mutant, the user should get the filename, the line and a diff of the mutation (with an option to deactivate this, for instance for large codebases with partial coverage):

  ```markdown
  Surviving mutant in Foo.sol:ln23
  -- a = b - 4;
  ++ a = b + 4;
  ```

- [ ] running time is human-scale
- [ ] consumer-grade hardware friendly

## In-Depth

- There is a first iteration of this idea, using Gambit as the mutation engine. This PR has now been confirmed as stale by its main author: https://github.com/foundry-rs/foundry/pull/6588 https://hackmd.io/@tICezjHsSaiehIn9jbcUAA/SkTEyvuHa
- Gambit (https://github.com/Certora/gambit/blob/master/src/lib.rs) is a tool built by Certora (mostly for use with their prover), which parses the Solidity codebase, creates and writes new mutants as Solidity files, then compiles them and discards the non-viable ones.
- Gambit was suggested amongst other possibilities for the mutation engine.
- Other tools and strategies exist, with source-code based mutation being considered superior (see the [Foundry issue related discussion](https://github.com/foundry-rs/foundry/issues/478#issuecomment-1064303581)).

# Tech spec

- We wouldn’t reuse Gambit as-is. An initial MVP would instead use Solar to parse the AST, then mutate it from within forge (possibly reusing Gambit's mutation dictionary), emit Solidity code, use foundry-compilers as a solc binding to compile the mutant, then run the tests.
- A single thread should handle [AST mutation; emitting Solidity; solc compilation; forge test], as each step blocks the next.
- We should always start with a non-mutated test run (i.e. running `forge test`).
- We should only mutate one contract at a time, to avoid having to recompile the whole codebase on each iteration/thread.
- Mutations would be taken from the [Gambit generator](https://github.com/Certora/gambit/blob/bf7ab3c91c47a10dcf272380b6406f0404f3b5d1/src/mutation.rs#L250), which looks a bit more exhaustive than the (universalmutator-based) [Vertigo](https://github.com/JoranHonig/vertigo/blob/master/eth_vertigo/mutator/solidity/solidity_mutator.py).
- foundry-compilers (i.e. the solc binding) should be a temporary step, replaced with Solar once it reaches feature completeness.\*
- The main improvement would be avoiding thousands of temporary Solidity files that are only used by the compiler (i.e. parse the original codebase, mutate the AST, then use this AST to compile/check whether the mutant is viable) → if the Solar dev cycle is fast enough, we’d wait and ensure we stay generic enough for future integration; if a major delay is forecast, it might then be interesting to explore creating new bindings (or just FFI) for some of solc's C++ functions (HIR generation, etc).
- UX-wise, it should support filtering by contract, function and path (same as for `forge test`).
- There should be at least some trivial equivalency check implemented (todo: check how Gambit or Vertigo handle this). Maybe a hashmap of the hashed flattened AST would work? Or hash(bytecode) (but then we don’t “save” a compilation)?

## External Requirements

Solar completion \*:

pre-Solar/MVP:

- parse with Solar -> get the AST
- mutate the AST
- emit Solidity code to file
- foundry-compile the file

post-Solar:

- parse with Solar -> AST
- mutate the AST
- Solar compile (hopefully easy to compile starting from an AST)

## Milestones and Estimates

tbd

## Open Questions and Thoughts

- Solar's exact status?
- Nice to have: a `--resume` option, which would allow running mutation tests in multiple chunks (provided tests and implementation haven’t changed)
- Nice to have: same filtering as `forge test` (match-contract, match-test and match-path), but applied to the target contracts
- Nice to have: coverage integration (non-covered code will always produce surviving mutants) → either emit a warning, or skip uncovered zones?
- Nice to have: mutant equivalency and redundancy check (see BenTheKush infra):

  ![image](https://hackmd.io/_uploads/Sy5UVsgqkg.png)

## Signatures

- @Simon Something at February 17, 2025