Parsing Rust Code Considered Harmful


Who am I?

Sasha Pourcelot (she/her)

@scrabsha on {GitHub, Twitter}

CS student at Polytech Nice Sophia (France)

Software Engineer at TrustInSoft (static analysis of C programs)

Rust ecosystem contributor


Static analysis

Deducing properties about code without running it

Examples:

  • type checking
  • dead code path detection
  • unused attributes/variants
  • breaking change detection

cargo-breaking

Detect breaking changes in a Rust crate

  • Catch breaking changes early (before release)
  • Useful in CI
  • Make dependency upgrades safer

Breaking change?

  • Removal, renaming, signature changes:
// before
pub fn knight_name(friend: &Friend) -> String;

// after
pub fn knight_name(friend: &Friend, mood: Mood) -> String;

⚠️ Breaking change: knight_name has a new parameter
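
A check like this one can be pictured as a diff between two snapshots of a crate's public API. The sketch below is a toy model, not cargo-breaking's actual algorithm: `ApiSnapshot`, `Breakage` and `diff` are hypothetical names, and signatures are compared as plain strings.

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot of a public API: item name -> signature rendered as text.
pub type ApiSnapshot = BTreeMap<String, String>;

/// A breaking change found between two snapshots (illustrative, far from exhaustive).
#[derive(Debug, PartialEq)]
pub enum Breakage {
    /// The item exists in the old API but not in the new one.
    Removed(String),
    /// The item exists in both snapshots, but its signature differs.
    SignatureChanged {
        item: String,
        before: String,
        after: String,
    },
}

/// Reports every item of `old` that is missing from `new` or differs in it.
/// Items that only exist in `new` are additions and are not reported.
pub fn diff(old: &ApiSnapshot, new: &ApiSnapshot) -> Vec<Breakage> {
    let mut found = Vec::new();
    for (item, before) in old {
        match new.get(item) {
            None => found.push(Breakage::Removed(item.clone())),
            Some(after) if after != before => found.push(Breakage::SignatureChanged {
                item: item.clone(),
                before: before.clone(),
                after: after.clone(),
            }),
            Some(_) => {}
        }
    }
    found
}
```

With this model, a renamed item shows up as a removal (the new name is only an addition), and `knight_name` above is reported as `SignatureChanged`.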


Breaking change?

  • Ambiguous trait method resolution
  • #[non_exhaustive] attribute
  • Send and Sync traits
  • Type size?!

And so many other very subtle things
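
Two of these can be reproduced with nothing but the standard library. In the sketch below, `Before` and `After` are hypothetical versions of the same public type: only private fields change, yet the type grows and stops being `Send`.

```rust
use std::mem::size_of;
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical "before" version of a public type with a private field.
#[allow(dead_code)]
pub struct Before {
    name: Arc<str>,
}

/// Hypothetical "after" version: `Rc` replaces `Arc` and a private field is added.
#[allow(dead_code)]
pub struct After {
    name: Rc<str>,
    level: u64,
}

/// Compiles only for types that can be sent to another thread.
pub fn assert_send<T: Send>() {}

/// Downstream code may depend on a type's size, for instance through
/// `transmute` or a fixed-size buffer.
pub fn grew() -> bool {
    size_of::<After>() > size_of::<Before>()
}

/// What a downstream crate could have relied on.
pub fn downstream_checks() {
    assert_send::<Before>();
    // assert_send::<After>(); // would not compile: `Rc<str>` is not `Send`
}
```

Neither change shows up in a function signature, which is why a tool needs more than a textual comparison to catch them.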


Smol disclaimer

Stability is nice to have, but not required

RUSTC_BOOTSTRAP=1 is where the fun begins


Problem

Getting information about a crate


Parsing

src/lib.rs → AST


Parser

Parse all the Rust syntax

syn can parse from &str:

pub fn parse_file(content: &str) -> Result<File>;

Macro support?

Parse the output of cargo-expand

⚠️ Breaks hygiene
But OK here as we're not looking at function bodies


Stability

New syntax may break the parser

Fix with cargo update


This doesn't work.

No import resolution

No dependency support


Actually, it could

Build your own path resolution algorithm

Download & parse additional dependencies from crates.io


But that's not what you want

Rewriting cargo and rustc is not fun

And very complex (I tried)


Problem 2

Getting information about a crate

No reimplementation work

Allow for dependency handling


rustc as a library

Instead of rewriting rustc, let's use it as a lib


Core idea

A nightly feature: #![feature(rustc_private)]

Gives access to rustc's public API

Documented at https://doc.rust-lang.org/nightly/nightly-rustc/


Usages in the wild

Clippy

(Linting is static analysis, after all)


Dependency handling

Need to tell rustc about dependencies

  • Read the Cargo.toml file
  • Compute a dependency graph
  • Compile each dependency separately
  • Pass the artifact path to rustc

Or maybe we could use cargo



cargo integration

RUSTC_WRAPPER environment variable

  • Fall back to the system rustc when building a dependency
  • Run the actual static analysis at the last invocation
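
The decision made by such a wrapper can be sketched as follows. When RUSTC_WRAPPER is set, cargo runs the wrapper with the path to the real rustc as first argument, followed by the usual rustc arguments. `Action` and `plan` are hypothetical names, and using cargo's CARGO_PRIMARY_PACKAGE variable to recognise the crate under analysis is an assumption of this sketch; the slide only says the analysis runs at the last invocation.

```rust
use std::ffi::OsString;

/// What the wrapper should do for one invocation coming from cargo.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Forward the invocation unchanged to the real compiler.
    RunRustc { rustc: OsString, args: Vec<OsString> },
    /// Run the static analysis on this crate (the analysis itself is out of scope here).
    Analyze { args: Vec<OsString> },
}

/// Decides what to do with one wrapper invocation.
///
/// `argv` is the wrapper's argument list without its own program name:
/// the real rustc path first, then the rustc arguments.
/// `is_primary` tells whether cargo marked this crate as the primary package.
pub fn plan(mut argv: Vec<OsString>, is_primary: bool) -> Option<Action> {
    if argv.is_empty() {
        // Not invoked by cargo: there is no compiler path to forward to.
        return None;
    }
    let rustc = argv.remove(0);
    Some(if is_primary {
        Action::Analyze { args: argv }
    } else {
        Action::RunRustc { rustc, args: argv }
    })
}
```

A real `main` would call `plan(std::env::args_os().skip(1).collect(), std::env::var_os("CARGO_PRIMARY_PACKAGE").is_some())`, then spawn the real rustc with `std::process::Command` for `RunRustc`.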

Interfacing with rustc

Hooks defined in the Callbacks trait

Enables:

  • Altering invocation settings
  • Altering (raw/macro-expanded) AST
  • MIR code analysis

rustc's query engine

Goal: reducing duplicate work with memoization

TyCtxt: structure to perform queries against
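
The memoization idea can be shown with a toy context. This is not rustc's `TyCtxt` API: `ToyCtxt` and its `arity` query are hypothetical names that only illustrate "compute once, then answer from the cache".

```rust
use std::collections::HashMap;

/// Toy stand-in for a query context (not rustc's `TyCtxt`).
pub struct ToyCtxt {
    /// Input data: the parameter names of each function.
    params: HashMap<String, Vec<String>>,
    /// Memoized results of the `arity` query.
    arity_cache: HashMap<String, usize>,
    /// How many times the query was actually computed.
    pub computed: usize,
}

impl ToyCtxt {
    pub fn new(params: HashMap<String, Vec<String>>) -> Self {
        ToyCtxt {
            params,
            arity_cache: HashMap::new(),
            computed: 0,
        }
    }

    /// Query: number of parameters of `item`, computed at most once per item.
    /// Returns `None` for an unknown item.
    pub fn arity(&mut self, item: &str) -> Option<usize> {
        if let Some(&hit) = self.arity_cache.get(item) {
            return Some(hit); // answered from the cache, no recomputation
        }
        let value = self.params.get(item)?.len();
        self.computed += 1;
        self.arity_cache.insert(item.to_string(), value);
        Some(value)
    }
}
```

Asking for the same item twice increments `computed` only once.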


It's too complex

Very tied to rustc

Knowledge of compiler development needed

Very steep learning curve


Stability

Constantly moving API

Your OSS project probably does not have enough bandwidth


Problem 3

Getting information about a crate

No reimplementation work

Allow for dependency handling

Not too tied to the compiler


rustdoc JSON output

Freeing ourselves from the compiler internals


Core idea

rustdoc --output-format json

Writes information about API in a JSON file


Output deserialization

Datatypes defined in rustdoc_json_types in the Rust repository

Available on crates.io as rustdoc_types

Just Use Serde


Dependency handling

Integrates very well with cargo:

cargo +nightly rustdoc -- -Z unstable-options --output-format json
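
Driving this from a tool could look like the sketch below. It assumes a rustup-managed nightly toolchain (`+nightly`), a library crate (`--lib`), and that rustdoc writes its output to `<target>/doc/<crate>.json`; `rustdoc_json_command` and `json_path` are hypothetical helper names.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Builds (without running) the cargo invocation that produces the JSON output.
/// `-Z unstable-options` is needed while the JSON format is unstable.
pub fn rustdoc_json_command(manifest_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.current_dir(manifest_dir).args([
        "+nightly",
        "rustdoc",
        "--lib",
        "--",
        "-Z",
        "unstable-options",
        "--output-format",
        "json",
    ]);
    cmd
}

/// Expected location of the generated file. Hyphens in the package name
/// become underscores in the crate name.
pub fn json_path(target_dir: &Path, package_name: &str) -> PathBuf {
    target_dir
        .join("doc")
        .join(format!("{}.json", package_name.replace('-', "_")))
}
```

After `rustdoc_json_command(dir).status()` succeeds, the file at `json_path` can be read and handed to a deserializer.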


Information available

Limited to items

No pre-expansion information

Can't be used for function body analysis


Stability

More stable than rustc as a lib

Automated release process of rustdoc_types


Fin