
Foundry mutation test

Background

Mutation testing is a technique for assessing how good a test suite is. While coverage answers "is this line executed during the tests?", mutation testing answers "is this line actually tested?".
To do so, the mutation tool creates "mutants" (like the Ninja Turtles, but with neither turtles nor ninjas involved): copies of the contract under test with a single small change (e.g. one == replaced with !=). If at least one previously passing test now fails, the mutant is said to be killed. Poor turtle, but it's a good thing: it means the tests actually depend on that ==.
On the other hand, if no test fails after a mutation is introduced (i.e. the mutant survives and goes to live in the sewer, eating pizza with Donatello), it's a bad thing, as no test was really checking that ==.
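The kill/survive rule above can be sketched as follows, assuming each test run is summarized as a map from test name to pass/fail. TestResults, MutantStatus and classify are hypothetical names for illustration, not Forge APIs.

```rust
use std::collections::HashMap;

/// Test name -> passed? (hypothetical summary of one test run)
type TestResults = HashMap<String, bool>;

/// Hypothetical status of a viable mutant after running the suite on it.
#[derive(Debug, PartialEq)]
enum MutantStatus {
    Killed,
    Survived,
}

/// A mutant is killed if at least one test that passed on the unmutated
/// (baseline) code now fails; otherwise it survives. Tests missing from the
/// mutant run are treated as not failing.
fn classify(baseline: &TestResults, mutant_run: &TestResults) -> MutantStatus {
    let killed = baseline
        .iter()
        .any(|(name, passed)| *passed && mutant_run.get(name) == Some(&false));
    if killed {
        MutantStatus::Killed
    } else {
        MutantStatus::Survived
    }
}
```

Tests that already failed on the baseline are ignored, which is why a clean, non-mutated run is needed first.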

Some mutants are obviously useless (e.g. those that don't compile; these are “non-viable”), but there are still a lot of possibilities, so mutation tests are usually slow and resource-heavy. Leveraging Foundry, the “blazing fast (…) toolkit”, therefore makes sense.

Mutation testing flow:
  • Start: source code → apply mutation operators → generate mutants
  • Is the mutant viable? No → mark as non-viable; Yes → execute the test suite on it
  • Is the mutant killed? Yes → mark as killed; No → mark as survived
  • Aggregate mutation results → calculate mutation score → report results
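The last steps of the flow above (aggregating results and calculating the mutation score) can be sketched as below. The score definition used here, killed mutants divided by viable mutants with non-viable ones excluded, is a common convention assumed for illustration; Outcome and Summary are hypothetical names.

```rust
/// Hypothetical per-mutant outcome, mirroring the three marks in the flow.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Killed,
    Survived,
    NonViable,
}

/// Aggregated counts for the final report.
#[derive(Debug, Default, PartialEq)]
struct Summary {
    killed: usize,
    survived: usize,
    non_viable: usize,
}

impl Summary {
    /// Count each outcome.
    fn aggregate(outcomes: &[Outcome]) -> Self {
        let mut s = Summary::default();
        for o in outcomes {
            match o {
                Outcome::Killed => s.killed += 1,
                Outcome::Survived => s.survived += 1,
                Outcome::NonViable => s.non_viable += 1,
            }
        }
        s
    }

    /// Killed / viable, as a percentage; None when no mutant was viable.
    fn score(&self) -> Option<f64> {
        let viable = self.killed + self.survived;
        if viable == 0 {
            None
        } else {
            Some(100.0 * self.killed as f64 / viable as f64)
        }
    }
}
```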

General

This project would add the ability to run mutation tests from Foundry’s forge test (forge test --mutate).

Requirements

  • running forge test --mutate produces a table with the accurate number of killed and surviving mutants

  • for every surviving mutant, the user should get the file name, the line number and a diff of the mutation (with an option to disable this, e.g. for a large codebase with partial coverage):

    Surviving mutant in Foo.sol:ln23
    -- a = b - 4;
    ++ a = b + 4;
    
  • running time is human-scale

  • consumer-grade hardware friendly
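A sketch of the proposed report, assuming the killed count and each surviving mutant's file, line and before/after text have already been collected. SurvivingMutant, render and report are hypothetical names, and the show_diffs flag stands in for the opt-out mentioned above.

```rust
/// Hypothetical record of a surviving mutant: enough to print file:line + diff.
struct SurvivingMutant {
    file: String,
    line: usize,
    original: String,
    mutated: String,
}

/// Render one surviving mutant in the proposed output format.
fn render(m: &SurvivingMutant) -> String {
    format!(
        "Surviving mutant in {}:ln{}\n-- {}\n++ {}",
        m.file,
        m.line,
        m.original.trim(),
        m.mutated.trim()
    )
}

/// Render the whole report; when show_diffs is false only the counts are
/// printed (e.g. for large codebases with partial coverage).
fn report(killed: usize, survivors: &[SurvivingMutant], show_diffs: bool) -> String {
    let mut out = format!("killed: {}, survived: {}", killed, survivors.len());
    if show_diffs {
        for m in survivors {
            out.push('\n');
            out.push_str(&render(m));
        }
    }
    out
}
```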

In-Depth

Tech spec

  • We wouldn’t reuse Gambit as-is. Instead, an initial MVP would use Solar to parse the AST, mutate it from within Forge (possibly reusing Gambit’s mutation dictionary), emit Solidity code, compile the mutant with foundry-compilers as the solc binding, then run the tests.
  • A single thread should handle the whole chain [AST mutation; Solidity emission; solc compilation; forge test] for a given mutant, as each step blocks the next.
  • We should always start with a run on the non-mutated code (i.e. a plain forge test), since a mutant can only be killed by a test that passed beforehand.
  • We should only mutate one contract at a time, to avoid having to recompile the whole codebase on each iteration/thread
  • Mutations would be taken from Gambit’s generator, which looks somewhat more exhaustive than Vertigo (based on universalmutator)
  • foundry-compilers (i.e. the solc binding) should be a temporary step, to be replaced with Solar once it reaches feature completeness.*
  • The main improvement would be avoiding thousands of temporary Solidity files used only by the compiler (i.e. parse the original codebase, mutate the AST, and use that AST directly to compile the mutant and check whether it is viable). If Solar’s development cycle is fast enough, we’d wait for it and make sure to stay generic enough for future integration; if a major delay is forecast, it might be worth exploring new bindings (or plain FFI) to some of solc’s C++ functions (HIR generation, etc.).
  • UX-wise, it should support filtering by contract, function and path (as forge test does)
  • There should be at least some trivial equivalence check (TODO: check how Gambit or Vertigo handle this). Maybe a hashmap of the hashed, flattened AST would work? Or hash(bytecode) (but then we don’t “save” a compilation)?
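One way to try the "hashmap of the hashed flattened AST" idea: fingerprint each mutant's flattened form and skip it when the fingerprint matches the original or an earlier mutant, before spending a compilation on it. This is a sketch under assumptions: the flattened AST is approximated by a whitespace-normalized string (which also ignores whitespace inside string literals), and fingerprint and DedupFilter are hypothetical names. It only catches syntactic duplicates, not semantically equivalent mutants.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash a normalized textual form of a flattened AST (approximated here by
/// the source with whitespace collapsed, a stand-in for a real AST printer).
fn fingerprint(flattened: &str) -> u64 {
    let tokens: Vec<&str> = flattened.split_whitespace().collect();
    let mut h = DefaultHasher::new();
    tokens.hash(&mut h);
    h.finish()
}

/// Trivial equivalence filter: rejects mutants identical to the original
/// or to a previously admitted mutant.
struct DedupFilter {
    seen: HashSet<u64>,
}

impl DedupFilter {
    fn new(original: &str) -> Self {
        let mut seen = HashSet::new();
        seen.insert(fingerprint(original));
        DedupFilter { seen }
    }

    /// Returns true if the mutant is new and should be compiled and tested.
    fn admit(&mut self, mutant: &str) -> bool {
        self.seen.insert(fingerprint(mutant))
    }
}
```

DefaultHasher's algorithm is not guaranteed to stay the same across Rust releases, so these fingerprints should not be persisted (e.g. for a --resume option) without switching to a stable hash.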

External Requirements

Solar completion

*: pipeline before and after Solar completion

Pre-Solar (MVP):

  • parse with Solar to get the AST
  • mutate the AST
  • emit the Solidity code to a file
  • compile the file with foundry-compilers
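The MVP chain can be sketched as a blocking per-mutant pipeline, with parallelism only across mutants. Compiler and TestRunner are hypothetical traits standing in for foundry-compilers (later Solar) and forge test; keeping compilation behind a trait is one way to stay generic enough for the later switch to Solar. Mutants are assumed to be already-emitted Solidity sources.

```rust
use std::thread;

/// Hypothetical outcome of one mutant.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Killed,
    Survived,
    NonViable,
}

/// Compilation backend (hypothetical): foundry-compilers in the MVP, Solar later.
trait Compiler: Sync {
    /// Returns build artifacts, or an error if the mutant does not compile.
    fn compile(&self, source: &str) -> Result<Vec<u8>, String>;
}

/// Test runner stand-in: true if every test passes against the artifact.
trait TestRunner: Sync {
    fn all_pass(&self, artifact: &[u8]) -> bool;
}

/// The blocking chain for one mutant: compile, then test.
fn process(source: &str, c: &dyn Compiler, t: &dyn TestRunner) -> Outcome {
    match c.compile(source) {
        Err(_) => Outcome::NonViable,
        Ok(artifact) if t.all_pass(&artifact) => Outcome::Survived,
        Ok(_) => Outcome::Killed,
    }
}

/// Each worker thread runs whole chains; mutants are split across workers.
/// Results come back in the same order as the input.
fn run_all(mutants: &[String], c: &dyn Compiler, t: &dyn TestRunner, workers: usize) -> Vec<Outcome> {
    let chunk = mutants.len().div_ceil(workers.max(1)).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = mutants
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(|m| process(m, c, t)).collect::<Vec<_>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```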

Post-Solar:

  • parse with Solar to get the AST
  • mutate the AST
  • compile with Solar (hopefully straightforward when starting from an AST)

Milestones and Estimates

TBD

Open Questions and Thoughts

  • What is Solar’s exact status?
  • Nice to have: a --resume option, allowing mutation tests to run in multiple chunks (provided the tests and implementation haven’t changed)
  • Nice to have: the same filtering as forge test (--match-contract, --match-test and --match-path), but applied to the target contracts
  • Nice to have: coverage integration (uncovered code will always produce surviving mutants) → either emit a warning or skip uncovered regions?
  • Nice to have: mutant equivalence and redundancy checks (see BenTheKush’s infra).
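A sketch of the coverage idea: split mutants by whether their line is ever hit, so uncovered ones can be skipped or reported as a warning. The line hit counts are assumed to come from a coverage report (for instance lcov output from forge coverage --report lcov) parsed elsewhere; Mutant and partition_by_coverage are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical mutant location.
struct Mutant {
    file: String,
    line: usize,
}

/// Split mutants into (covered, uncovered) using line hit counts keyed by
/// (file, line). Lines absent from the map count as uncovered. Uncovered
/// mutants would always survive, so they can be skipped or just warned about.
fn partition_by_coverage<'a>(
    mutants: &'a [Mutant],
    hits: &HashMap<(String, usize), u64>,
) -> (Vec<&'a Mutant>, Vec<&'a Mutant>) {
    mutants
        .iter()
        .partition(|m| hits.get(&(m.file.clone(), m.line)).copied().unwrap_or(0) > 0)
}
```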


Signatures

  • @Simon Something at February 17, 2025
    [Exported from Notion]