reduce across crates
mode that removes dependence on libstd, libcore, (yielding #![no_std] or #![no_core] crates respectively). useful not to build libstd.
general-case reductions, oracle determining if the problem persisted E.g. “did the compiler continue to ICE”, or “does this code continue to be incorrectly rejected by the type system”
option to record non-target bugs for later analysis. Reducers as Fuzzers - J.Regehr
non-trivial oracle implementation, (“the program, when run, continues to malfunction.”)
testing environments
preprocessing
cfgmenting (for users doing incremental reductions)
unused dependency removal
demoduling
depublication
deimplification
ADT reduction
unusedification
bisect module tree
reduction by bisection-loopifying
RHS inlining
Generification (of parameters)
param elimination (cross-crate)
Defaultification
Ret-elimination
None-defaulting (for compile-time errors)
De-objectification
pnkfelix
John Regehr
David R Mclever
rust
other
Goal: find bridge between using input generated by program and generalized input, mainly, an arbitrary rust program. https://www.reddit.com/r/rust/comments/yi3y33/comment/iuicswx/
Rust-analyzer architecture and implementation: overview, videos.
rust-analyzer uses a virtual file system that is platform agnostic.
Unlike rustc, which operates over a single crate and pulls compiled binaries from crates.io, rust-analyzer works over all crates and can identify usages of a method,type,etc across the entire crate graph.
CrateGraph
: information about project structure, specifies which files are crate roots, which cfg flags are specified for each crate and what dependencies exist between the crates. This is the input (ground) state. The analyzer keeps all this input data in memory and never does any IO. Because the input data is source code, which typically measures in tens of megabytes at most, keeping everything in memory is OK.
If you think about “using rust-analyzer as a library”, the hir_crate is most likely the façade you’ll be talking to.
ide
crateThe ide crate builds on top of hir semantic model to provide high-level IDE features like completion or goto definition. It is an API Boundary. If you want to use IDE parts of rust-analyzer via LSP, custom flatbuffers-based protocol or just as a library in your text editor, this is the right API.
rust-analyzer combines snapshots and changes between snapshots. The structure that holds Snapshot A also holds a vector of all the changes until Snapshot B is taken.
syntax
An input like fn f() { 90 + 2 }
might be parsed as
FN@0..17
FN_KW@0..2 "fn"
WHITESPACE@2..3 " "
NAME@3..4
IDENT@3..4 "f"
PARAM_LIST@4..6
L_PAREN@4..5 "("
R_PAREN@5..6 ")"
WHITESPACE@6..7 " "
BLOCK_EXPR@7..17
L_CURLY@7..8 "{"
WHITESPACE@8..9 " "
BIN_EXPR@9..15
LITERAL@9..11
INT_NUMBER@9..11 "90"
WHITESPACE@11..12 " "
PLUS@12..13 "+"
WHITESPACE@13..14 " "
LITERAL@14..15
INT_NUMBER@14..15 "2"
WHITESPACE@15..16 " "
R_CURLY@16..17 "}"
Test-case reduction is the process of taking some value (a test case, usually represented as a sequence of bytes, though in property-based testing land it can be an arbitrary value) that demonstrates some property of interest (usually that it triggers a bug in some program) and turning it into a smaller and simpler test case with the same property.
Conjecture's Shrinker - MPL 2.0 license tracks the API the oracle uses to access the data and uses that to guide the reduction.
Why test case reduction?
Reducer doesn't need to be perfect.
…doing test-case reduction perfectly is basically impossible. Because of the black-box nature of the problem, almost all test-case reduction problems can only be solved perfectly with an exponential search.
All based on a form of local search … try to make it smaller and simpler. Stop when no further transformations work.
Problems with Oracles
Test Validity - when oracle's domain doesn't contain all of the values that the reducer might want to try, and its behaviour is incorrect outside of that domain.
Differences in compilation options may result in 'correct' outputs that have undefined behavior (which may be the real bug). Use miri?
Slippage - Oracle captures bugs outside of intended scope
Reduction order Use some weight/discriminator to decide which reduction path to take. Ex. lines of code where smaller is better. Hypothesis uses shortlex, aka radix (a,b,c..aa,bb,cc,..aaa,bbb,ccc…)
QuickCheck doesn't use one, and makes it harder to reason about the Oracle.
One Test to Rule them All paper.
tstl: Automated Test Generation for Python
hypothesis groups reduction and normalization as test-case reduction. reduction: reducing the length normalization: canonicalization operations that decrease the test-case lexicographically while keeping the length fixed
Caching for greedy reducers, can use bloom filter two-level cache strategy for non-greedy reducers
First look up your test case in the cache. If you get a cache miss, canonicalize it and look that up in the cache. If that’s still a cache miss, run the oracle. Either way, update the cache entry for both the original and canonicalized form
The Oracle's caller handles caching and tracking best-case. Makes the reducer an 'anytime-program', capable of being interrupted and returning the best case at the time of interruption.
Metrics hypothesis prioritizes progress (actual reduction of test cases) over walltime and resource utilization.
Reduction Passes ex. loopification, inlining, demoduling, etc What order do you run them?
author is reasonably sure of the following regarding pass-ordering
Useful to run adaptive algorithms – ones which try to turn small reductions into much larger ones. e.g. when you successfully delete something, try deleting progressively larger regions around it using an exponential probe and binary search. This can often replace O(n) oracle invocations with only O(log(n)).
In what order to run operations:
Stopping Conditions Ideal: stop when all of the reduction passes fail to make progress, progress is at a local minimum.
Others:
Reduction through Parsing if you have a formal grammar for the language of valid test cases, use the grammar to suggest reductions. remove the contents of the braces, remove the entire region, just remove the braces and leave their contents intact.
Pessimistic Concurrency Adopt a parallelisation strategy that assumes the oracle is always false
Zig Zagging The problem of zig zagging occurs when a small change unlocks a small change unlocks another small change etc. This is particularly pernicious when the small change is not one that changes the size of the example, because it can put you into exponential time failure modes.