# Parsing Rust Code Considered Harmful

<aside class="notes">
Hi everyone

Welcome to my talk, entitled Parsing Rust Code Considered Harmful

Goals:
- How can we perform static analysis of Rust code?
- The different techniques
- The strengths and weaknesses of each of them
- Why parsing should never be considered
</aside>

---

## Who am I?

<aside class="notes">
Let me introduce myself!

I'm Sasha Pourcelot, my pronouns are she and her.
My handle is scrabsha almost everywhere.

Studying computer science at Polytech Nice Sophia, in France.
I'll graduate in September!

Software Engineer at TrustInSoft
Static analysis tool for C code
Goal: exhaustively list all UBs in a C program
Self-role: POSIX-compliant filesystem functions -> POSIX sorceress

Contributions to the Rust ecosystem:
- Rust compiler
- Tremor project
</aside>

Sasha Pourcelot (she/her)

@scrabsha on `{GitHub, Twitter}`

CS student at Polytech Nice Sophia (France)

Software Engineer at TrustInSoft (static analysis of C programs)

Rust ecosystem contributor

---

## Static analysis

<aside class="notes">
Let's define static analysis!

Deducing properties about code *without running it*

Use:
- detecting potential errors (type checking)
- removing useless
</aside>

Deducing properties about code *without running it*

Examples:
- type checking
- dead code path detection
- unused attributes/variants
- breaking change detection

---

## `cargo-breaking`

Detect breaking changes in a Rust crate

- Catch breaking changes early (before release)
- Useful in CI
- Make dependency upgrades safer

----

## Breaking change?

- Removal, renaming, ...

```rust
// before
pub fn knight_name(friend: &Friend) -> String;

// after
pub fn knight_name(friend: &Friend, mood: Mood) -> String;
```

⚠️ Breaking change: `knight_name` has a new parameter

----

## Breaking change?

- Ambiguous trait method resolution
- `#[non_exhaustive]` attribute
- `Send` and `Sync` traits
- Type size?!

And so many other very subtle things

---

## Smol disclaimer

Stability is nice to have, but not required

`RUSTC_BOOTSTRAP=1` is where the fun begins

---

## Problem

Getting information about a crate

---

## Parsing

`src/lib.rs` → AST

----

## Parser

Parse all the Rust syntax

`syn` can parse from `&str`:

```rust
pub fn parse_file(content: &str) -> Result<File>;
```

----

## Macro support?

Parse the output of `cargo-expand`

⚠️ Breaks hygiene

But OK here as we're not looking at function bodies

----

## Stability

New syntax may break the parser

Fix with `cargo update`

----

## This doesn't work.

No import resolution

No dependency support

----

## Actually, it could

Build your own path resolution algorithm

Download & parse additional dependencies from `crates.io`

----

## But that's not what you want

Rewriting `cargo` and `rustc` is not fun

And very complex (I tried)

---

## Problem<sup>2</sup>

Getting information about a crate

No reimplementation work

Allow for dependency handling

---

## `rustc` as a library

Instead of rewriting `rustc`, let's use it as a lib

----

## Core idea

A nightly feature: `#![feature(rustc_private)]`

Gives access to `rustc`'s public API

Documented at [https://doc.rust-lang.org/nightly/nightly-rustc/](https://doc.rust-lang.org/nightly/nightly-rustc/)

----

## Usages in the wild

Clippy

(Linting is static analysis, after all)

----

## Dependency handling

Need to tell `rustc` about dependencies

- Read the `Cargo.toml` file
- Compute a dependency graph
- Compile each dependency separately
- Pass the artifact path to `rustc`

Or maybe we could use cargo :thinking_face:

----

## `cargo` integration

`RUSTC_WRAPPER` env. variable

- Fallback to system `rustc` when building a dependency
- Run actual static analysis at last invocation

----

## Interfacing with `rustc`

Hooks defined in the `Callbacks` trait

Enables:
- Altering invocation settings
- Altering (raw/macro-expanded) AST
- MIR code analysis

----

## `rustc`'s query engine

<aside class="notes">
`TyCtxt`: one query = one method

568 methods
</aside>

Goal: reducing duplicate work with memoization

`TyCtxt`: structure to perform queries against

----

## It's too complex

Very tied to `rustc`

Compiler development knowledge needed

Very steep learning curve

----

## Stability

Constantly moving API

Your OSS project probably does not have enough bandwidth

---

## Problem<sup>3</sup>

Getting information about a crate

No reimplementation work

Allow for dependency handling

Not too tied to the compiler

---

## `rustdoc` JSON output

Freeing ourselves from the compiler internals

----

## Core idea

`rustdoc --output-format json`

Writes information about the API to a JSON file

----

## Output deserialization

Datatypes defined in `rustdoc_json_types` in the Rust repository

Available on [crates.io](https://crates.io) as `rustdoc_types`

Just Use Serde(tm)

----

## Dependency handling

Integrates very well with cargo:

`cargo rustdoc -- --output-format json`

----

## Information available

Limited to items

No pre-expansion information

Can't be used for function body analysis

----

## Stability

More stable than `rustc` as a lib

Automated release process of `rustdoc_types`

---

Fin
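
---

## `RUSTC_WRAPPER` sketch

A minimal, std-only sketch of the wrapper idea from the `cargo` integration slide, not the actual `cargo-breaking` code. It assumes that cargo invokes the wrapper as `<wrapper> <path to rustc> <rustc args...>`, and that the `CARGO_PRIMARY_PACKAGE` environment variable is only set for the crate the user asked to build. `Action` and `choose_action` are hypothetical names, and the analysis itself is left as a placeholder.

```rust
use std::env;
use std::ffi::OsString;
use std::process::{exit, Command};

/// What the wrapper does for one compiler invocation (hypothetical type).
#[derive(Debug, PartialEq)]
enum Action {
    /// Dependency build: hand everything over to the real `rustc`.
    Fallback,
    /// Crate under analysis: run the static analysis.
    Analyze,
}

/// Assumption: cargo sets `CARGO_PRIMARY_PACKAGE` only for the crate(s) the
/// user asked to build, so its presence marks the crate to analyze.
fn choose_action(primary_package: Option<OsString>) -> Action {
    match primary_package {
        Some(_) => Action::Analyze,
        None => Action::Fallback,
    }
}

fn main() {
    // Skip our own name: the next argument is the path to the real compiler,
    // the remaining ones are the arguments cargo wants to pass to it.
    let mut args = env::args_os().skip(1);
    let Some(rustc) = args.next() else {
        eprintln!("expected to be invoked by cargo through RUSTC_WRAPPER");
        return;
    };
    let rustc_args: Vec<OsString> = args.collect();

    match choose_action(env::var_os("CARGO_PRIMARY_PACKAGE")) {
        Action::Fallback => {
            // Forward the invocation unchanged and mirror the exit code.
            let status = Command::new(&rustc)
                .args(&rustc_args)
                .status()
                .expect("failed to spawn rustc");
            exit(status.code().unwrap_or(1));
        }
        Action::Analyze => {
            // Placeholder: a real tool would start its analysis here.
            eprintln!("would analyze a crate ({} rustc arguments)", rustc_args.len());
        }
    }
}
```

Usage, assuming the binary is installed under the hypothetical name `my-analyzer`: `RUSTC_WRAPPER=my-analyzer cargo check`.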