Carver: Rustc Reducer

Questions

Designed for rustc developer or CI suite? Can run in background or is user waiting for completion
log failed reductions (w/ snippets)?
save diffs after every successful reduction?
stopping conditions? (user-determined, after N passes, time-limit, etc)
library or cli tool? OracleBuilder for custom Conditions

Feature List

Reductions

preprocessing
- inlining
- loopification
- comment removal
cfgmenting (for users doing incremental reductions)
unused dependency removal
demoduling
depublication
deimplification
- split impls
- parameter reductions
ADT reduction
unusedification
bisect module tree
reduction by bisection-loopifying
RHS inlining
Generification (of parameters)
param elimination (cross-crate)
Defaultification
Ret-elimination
None-defaulting (for compile-time errors)
De-objectification

Prior Art

pnkfelix
- Rust Bug Minimiztion Patterns
John Regehr
- creduce
- Reduction Articles
- Test Case Reduction for C Compiler - paper, slides
David R Mclever
rust
- rust-mre by shepmaster
- cargo-minimize by nils
other
- theft C, generate input to find obscure bugs, then reduce to minimal failing input
- DustMite Golang
- picireny Python, grammar based
- QuickCheck Haskell
- Find More Bugs in QuickCheck paper, bug avoidance during test generation not reduction
- halfempty, parallel test minimizer by Google
- tstl
- One Test to Rule Them All paper

Appendix

Rust Property-Based Testing

Goal: find bridge between using input generated by program and generalized input, mainly, an arbitrary rust program. https://www.reddit.com/r/rust/comments/yi3y33/comment/iuicswx/

Rust-Analyzer

Rust-analyzer architecture and implementation: overview, videos.

Virtual File System

rust-analyzer uses a virtual file system that is platform agnostic.

CrateGraph

Unlike rustc, which operates over a single crate and pulls compiled binaries from crates.io, rust-analyzer works over all crates and can identify usages of a method,type,etc across the entire crate graph.

CrateGraph: information about project structure, specifies which files are crate roots, which cfg flags are specified for each crate and what dependencies exist between the crates. This is the input (ground) state. The analyzer keeps all this input data in memory and never does any IO. Because the input data is source code, which typically measures in tens of megabytes at most, keeping everything in memory is OK.

If you think about “using rust-analyzer as a library”, the hir_crate is most likely the façade you’ll be talking to.

`ide` crate

The ide crate builds on top of hir semantic model to provide high-level IDE features like completion or goto definition. It is an API Boundary. If you want to use IDE parts of rust-analyzer via LSP, custom flatbuffers-based protocol or just as a library in your text editor, this is the right API.

rust-analyzer combines snapshots and changes between snapshots. The structure that holds Snapshot A also holds a vector of all the changes until Snapshot B is taken.

`syntax`

An input like fn f() { 90 + 2 } might be parsed as

FN@0..17
  FN_KW@0..2 "fn"
  WHITESPACE@2..3 " "
  NAME@3..4
    IDENT@3..4 "f"
  PARAM_LIST@4..6
    L_PAREN@4..5 "("
    R_PAREN@5..6 ")"
  WHITESPACE@6..7 " "
  BLOCK_EXPR@7..17
    L_CURLY@7..8 "{"
    WHITESPACE@8..9 " "
    BIN_EXPR@9..15
      LITERAL@9..11
        INT_NUMBER@9..11 "90"
      WHITESPACE@11..12 " "
      PLUS@12..13 "+"
      WHITESPACE@13..14 " "
      LITERAL@14..15
        INT_NUMBER@14..15 "2"
    WHITESPACE@15..16 " "
    R_CURLY@16..17 "}"

Notes on Test-Case Reduction

Test-case reduction is the process of taking some value (a test case, usually represented as a sequence of bytes, though in property-based testing land it can be an arbitrary value) that demonstrates some property of interest (usually that it triggers a bug in some program) and turning it into a smaller and simpler test case with the same property.

Conjecture's Shrinker - MPL 2.0 license tracks the API the oracle uses to access the data and uses that to guide the reduction.

Why test case reduction?

Smaller examples are nicer for everyone concerned, computer and human alike.
- faster to run
- easier to read
- don't expose IP
Helps to normalize bugs – Many bugs will become the same after reduction, reducing your triage workload
Every feature of a reduced case is in some way essential – you can’t just delete a bit and expect it to work. This significantly aids the developer in localizing the fault.
reducers can be used as fuzzers

Reducer doesn't need to be perfect.

…doing test-case reduction perfectly is basically impossible. Because of the black-box nature of the problem, almost all test-case reduction problems can only be solved perfectly with an exponential search.

All based on a form of local search … try to make it smaller and simpler. Stop when no further transformations work.

Problems with Oracles

Test Validity
Slippage

Test Validity - when oracle's domain doesn't contain all of the values that the reducer might want to try, and its behaviour is incorrect outside of that domain.

Differences in compilation options may result in 'correct' outputs that have undefined behavior (which may be the real bug). Use miri?

Slippage - Oracle captures bugs outside of intended scope

The oracle functions we use are always “An exception of this type is thrown from this line”. This is reasonably robust to slippage. (use HIR span)
Because of how we apply test-case reduction, any reduced example is one that could have been generated, so we can only reduce to invalid examples if we could have generated invalid examples in the first place. This makes it much easier for users to fix their tests by adding preconditions when this happens.
Record all successful reductions. When running tests, replay the recorded reductions first. If path diverges to different bug, pause or backtrack.

Reduction order Use some weight/discriminator to decide which reduction path to take. Ex. lines of code where smaller is better. Hypothesis uses shortlex, aka radix (a,b,c..aa,bb,cc,..aaa,bbb,ccc…)

QuickCheck doesn't use one, and makes it harder to reason about the Oracle.

One Test to Rule them All paper.

tstl: Automated Test Generation for Python

hypothesis groups reduction and normalization as test-case reduction. reduction: reducing the length normalization: canonicalization operations that decrease the test-case lexicographically while keeping the length fixed

Caching for greedy reducers, can use bloom filter two-level cache strategy for non-greedy reducers

all known good examples
bloom filter for bad examples

First look up your test case in the cache. If you get a cache miss, canonicalize it and look that up in the cache. If that’s still a cache miss, run the oracle. Either way, update the cache entry for both the original and canonicalized form

The Oracle's caller handles caching and tracking best-case. Makes the reducer an 'anytime-program', capable of being interrupted and returning the best case at the time of interruption.

Metrics hypothesis prioritizes progress (actual reduction of test cases) over walltime and resource utilization.

Reduction Passes ex. loopification, inlining, demoduling, etc What order do you run them?

author is reasonably sure of the following regarding pass-ordering

interleave passes – a pass that has just run is much less likely to make good progress the second time you run it.
try to have a pass that is primarily designed to promote cache hits run fairly early on (e.g. variable name minimization, removing comments, loopification, remove unused dependencies).
It can be worth deprioritising passes that don’t seem to do anything useful.
Postpone running +quadratic passes for as long as you possible. Idally, don't have them. Often best treated as “emergency passes” that only run when all of the fast passes get stuck.
code-reducing passes over non-reducing passes. cheaper passes before more expensive passes.

Useful to run adaptive algorithms – ones which try to turn small reductions into much larger ones. e.g. when you successfully delete something, try deleting progressively larger regions around it using an exponential probe and binary search. This can often replace O(n) oracle invocations with only O(log(n)).

In what order to run operations:

randomized works well. a good trade off between progress and cost – larger reductions are great for reducing cost but they often don’t work. Smaller reductions are more likely to work but if you only make small reductions you’ll take forever to progress.
When you do make progress you should try to avoid doing things that look too much like things you’ve already tried this pass. e.g. a common mistake is that when trying to delete elements from a sequence you start again at the beginning when you’ve deleted one. This is a great way to turn a linear time pass into a quadratic time one.

Stopping Conditions Ideal: stop when all of the reduction passes fail to make progress, progress is at a local minimum.

Others:

User determination, can stop anytime because of anytime algorithm nature
After you have performed N oracle invocations.
After some time limit has passed.
After you have successfully reduced the test case N times. (Hypothesis, N=500, QuickCheck, prevents zig-zagging)

Reduction through Parsing if you have a formal grammar for the language of valid test cases, use the grammar to suggest reductions. remove the contents of the braces, remove the entire region, just remove the braces and leave their contents intact.

Pessimistic Concurrency Adopt a parallelisation strategy that assumes the oracle is always false

Zig Zagging The problem of zig zagging occurs when a small change unlocks a small change unlocks another small change etc. This is particularly pernicious when the small change is not one that changes the size of the example, because it can put you into exponential time failure modes.