# Parsing Rust Code Considered Harmful
<aside class="notes">
Hi everyone!
Welcome to this talk,
entitled *Parsing Rust Code Considered Harmful*.
Goals:
- How can we perform static analysis of Rust code?
- The different techniques
- The strengths and weaknesses of each of them
- Why parsing should never be considered
</aside>
---
## Who am I?
<aside class="notes">
Let me introduce myself!
I'm Sasha Pourcelot, my pronouns are she and her.
My handle is scrabsha almost everywhere.
Studying computer science at Polytech Nice Sophia, in France.
I'll graduate in September!
Software Engineer at TrustInSoft
Static analysis tool for C code
Goal: exhaustively list all UBs in a C program
My role: POSIX-compliant filesystem functions
-> self-proclaimed POSIX sorceress
Contributions to the Rust ecosystem:
- Rust compiler
- Tremor project
</aside>
Sasha Pourcelot (she/her)
@scrabsha on `{GitHub, Twitter}`
CS student at Polytech Nice Sophia (France)
Software Engineer at TrustInSoft (static analysis of C programs)
Rust ecosystem contributor
---
## Static analysis
<aside class="notes">
Let's define static analysis!
Deducing properties about code *without running it*
Use:
- detecting potential errors (type checking)
- removing useless code (dead code detection)
</aside>
Deducing properties about code *without running it*
Examples:
- type checking
- dead code paths detection
- unused attributes/variants
- breaking change detection
---
## `cargo-breaking`
Detect breaking changes in a Rust crate
- Catch breaking changes early (before release)
- Useful in CI
- Make dependency upgrades safer
----
## Breaking change?
- Removal, renaming, ...
```rust
// before
pub fn knight_name(friend: &Friend) -> String;
// after
pub fn knight_name(friend: &Friend, mood: Mood) -> String;
```
⚠️ Breaking change: `knight_name` has a new parameter
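At its core, the check compares two snapshots of a crate's public API. A minimal sketch, assuming a hypothetical `Api` map from function names to parameter types (real tools compare far more: generics, trait impls, visibility, ...):

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified view of a public API:
/// function name -> parameter types, as strings.
type Api = HashMap<&'static str, Vec<&'static str>>;

/// Report functions that were removed or whose parameters changed.
/// Functions that only exist in `after` are ignored in this sketch.
fn breaking_changes(before: &Api, after: &Api) -> Vec<String> {
    let mut report = Vec::new();
    for (name, old_params) in before {
        match after.get(name) {
            None => report.push(format!("`{name}` was removed")),
            Some(new_params) if new_params != old_params => report.push(format!(
                "`{name}` changed parameters: {old_params:?} -> {new_params:?}"
            )),
            Some(_) => {}
        }
    }
    report.sort(); // HashMap iteration order is unspecified
    report
}
```

With `knight_name` going from `["&Friend"]` to `["&Friend", "Mood"]`, this reports exactly one change.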
----
## Breaking change?
- Ambiguous trait method resolution
- `#[non_exhaustive]` attribute
- `Send` and `Sync` traits
- Type size?!
And so many other very subtle things
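Size matters when downstream code depends on it, for example through a compile-time assertion or a `transmute`. A minimal illustration, with hypothetical `HandleV1`/`HandleV2` standing for two releases of the same type; `#[repr(C)]` keeps the layout predictable:

```rust
use std::mem::size_of;

// Release 1.0 of a hypothetical public type (its fields are private).
#[allow(dead_code)]
#[repr(C)]
pub struct HandleV1 {
    id: u32,
}

// Release 1.1 adds another private field: no public signature changes.
#[allow(dead_code)]
#[repr(C)]
pub struct HandleV2 {
    id: u32,
    flags: u8,
}

// A downstream crate may have written this check against 1.0.
// The same check against 1.1 no longer compiles: 4 + 1 bytes are
// padded up to the 4-byte alignment of `u32`, giving 8 bytes.
const _: () = assert!(size_of::<HandleV1>() == 4);
```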
---
## Smol disclaimer
Stability is nice to have, but not required
`RUSTC_BOOTSTRAP=1` is where the fun begins
---
## Problem
Getting information about a crate
---
## Parsing
`src/lib.rs` → AST
----
## Parser
Parse all the Rust syntax
`syn` can parse from `&str`:
```rust
pub fn parse_file(content: &str) -> Result<File>;
```
----
## Macro support?
Parse the output of `cargo-expand`
⚠️ Breaks hygiene
But OK here as we're not looking at function bodies
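Why expansion is needed: macros can generate public items that never appear literally in the source. In this minimal example (the macro and `answer` are hypothetical), the file never contains `fn answer`, yet `answer` is part of the public API; only the expanded output shows it:

```rust
// Generates a public function returning a constant.
macro_rules! make_getter {
    ($name:ident, $value:expr) => {
        pub fn $name() -> u32 {
            $value
        }
    };
}

// After expansion, this becomes `pub fn answer() -> u32 { 42 }`.
make_getter!(answer, 42);
```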
----
## Stability
New syntax may break the parser
Fix with `cargo update`
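For instance, `let ... else` was stabilized in Rust 1.65. rustc accepts the function below, but a parser release that predates the syntax rejects the whole file (`parse_port` is a hypothetical example):

```rust
/// Parse a TCP port, returning `None` on invalid input.
pub fn parse_port(input: &str) -> Option<u16> {
    // `let ... else`: bind on success, otherwise run the diverging block.
    let Ok(port) = input.trim().parse::<u16>() else {
        return None;
    };
    Some(port)
}
```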
----
## This doesn't work.
No import resolution
No dependency support
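A small illustration of why import resolution matters (all names hypothetical): `Friend` lives in a private module and is only reachable through re-exports, one of which renames it. A parser sees a single `pub struct Friend` inside a private module; the compiler sees two public paths to the same type:

```rust
use std::any::TypeId;

mod knights {
    // `pub`, but inside a private module: only reachable via re-exports.
    pub struct Friend;
}

// Public path `crate::Friend`.
pub use knights::Friend;
// Second public path `crate::Buddy`, naming the very same type.
pub use knights::Friend as Buddy;

/// Both names resolve to a single type.
pub fn same_type() -> bool {
    TypeId::of::<Buddy>() == TypeId::of::<Friend>()
}
```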
----
## Actually, it could
Build your own path resolution algorithm
Download & parse additional dependencies from `crates.io`
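To give an idea of the starting point, here is a toy resolver that only follows chains of `pub use` aliases, assuming a hypothetical map from each re-exported path to its target. Globs, `self`/`super`, macros, shadowing and other crates are all left out, which is exactly where the complexity lives:

```rust
use std::collections::{HashMap, HashSet};

/// Follow re-export aliases until reaching a path that is not an alias.
/// Returns `None` if the aliases form a cycle.
pub fn resolve(path: &str, reexports: &HashMap<String, String>) -> Option<String> {
    let mut current = path.to_string();
    let mut seen = HashSet::new();
    while let Some(target) = reexports.get(&current) {
        if !seen.insert(current.clone()) {
            return None; // cycle detected: give up
        }
        current = target.clone();
    }
    Some(current)
}
```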
----
## But that's not what you want
Rewriting `cargo` and `rustc` is not fun
And very complex (I tried)
---
## Problem<sup>2</sup>
Getting information about a crate
No reimplementation work
Allow for dependency handling
---
## `rustc` as a library
Instead of rewriting `rustc`, let's use it as a lib
----
## Core idea
A nightly feature: `#![feature(rustc_private)]`
Gives access to `rustc`'s internal crates (an unstable API)
Documented at [https://doc.rust-lang.org/nightly/nightly-rustc/](https://doc.rust-lang.org/nightly/nightly-rustc/)
----
## Usages in the wild
Clippy
(Linting is static analysis, after all)
----
## Dependency handling
Need to tell `rustc` about dependencies
- Read the `Cargo.toml` file
- Compute a dependency graph
- Compile each dependency separately
- Pass the artifact path to `rustc`
Or maybe we could use cargo :thinking_face:
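The last step boils down to `--extern NAME=PATH` flags, plus `-L dependency=DIR` for transitive dependencies. A sketch that only builds the command, with hypothetical artifact paths that cargo would normally compute:

```rust
use std::process::Command;

/// Build (without running) a `rustc` invocation for `src/lib.rs`,
/// passing already-compiled dependencies explicitly.
pub fn rustc_command(deps: &[(&str, &str)], deps_dir: &str) -> Command {
    let mut cmd = Command::new("rustc");
    cmd.args(["--edition", "2021", "--crate-type", "lib", "src/lib.rs"]);
    // Directory where rustc may look up transitive dependencies.
    cmd.arg("-L").arg(format!("dependency={deps_dir}"));
    // One `--extern` flag per direct dependency.
    for (name, artifact) in deps {
        cmd.arg("--extern").arg(format!("{name}={artifact}"));
    }
    cmd
}
```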
----
## `cargo` integration
`RUSTC_WRAPPER` env. variable
- Fall back to system `rustc` when building a dependency
- Run actual static analysis at last invocation
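Cargo runs the wrapper as `<wrapper> <path to rustc> <rustc args...>`. A minimal sketch of the decision, assuming the `CARGO_PRIMARY_PACKAGE` heuristic (cargo sets it only for the packages selected on the command line, never for dependencies); `Action`, `decide` and `fallback_command` are hypothetical names:

```rust
use std::process::Command;

/// What the wrapper does for one invocation.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Dependency: forward everything to the real `rustc`.
    Fallback,
    /// Crate under analysis: run the static analysis.
    Analyze,
}

/// `primary_package` is the value of `CARGO_PRIMARY_PACKAGE`, if set.
pub fn decide(primary_package: Option<&str>) -> Action {
    match primary_package {
        Some(_) => Action::Analyze,
        None => Action::Fallback,
    }
}

/// Forward an invocation to the real compiler. `wrapper_args` excludes
/// the wrapper's own argv[0], so its first element is the rustc path.
pub fn fallback_command(wrapper_args: &[String]) -> Option<Command> {
    let (rustc, rest) = wrapper_args.split_first()?;
    let mut cmd = Command::new(rustc);
    cmd.args(rest);
    Some(cmd)
}
```

In the wrapper's `main`, this would be driven by `std::env::var("CARGO_PRIMARY_PACKAGE").ok()` and `std::env::args().skip(1)`.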
----
## Interfacing with `rustc`
Hooks defined in the `Callbacks` trait
Enables:
- Altering invocation settings
- Altering (raw/macro-expanded) AST
- MIR code analysis
----
## `rustc`'s query engine
<aside class="notes">
`TyCtxt`: one query = one method
568 methods
</aside>
Goal: reducing duplicate work with memoization
`TyCtxt`: structure to perform queries against
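The memoization idea in miniature. This is a toy, not rustc's actual `TyCtxt` API: a hypothetical context caches each query result by key, so asking the same question twice only computes it once:

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

#[derive(Default)]
pub struct ToyCtxt {
    cache: RefCell<HashMap<String, usize>>,
    computations: Cell<usize>,
}

impl ToyCtxt {
    /// Hypothetical query; `key.len()` stands in for expensive work.
    pub fn query(&self, key: &str) -> usize {
        if let Some(&cached) = self.cache.borrow().get(key) {
            return cached; // memoized: no recomputation
        }
        self.computations.set(self.computations.get() + 1);
        let result = key.len();
        self.cache.borrow_mut().insert(key.to_string(), result);
        result
    }

    /// How many times a result was actually computed.
    pub fn computations(&self) -> usize {
        self.computations.get()
    }
}
```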
----
## It's too complex
Very tied to `rustc`
Knowledge in compiler development needed
Very steep learning curve
----
## Stability
Constantly moving API
Your OSS project probably does not have enough bandwidth
---
## Problem<sup>3</sup>
Getting information about a crate
No reimplementation work
Allow for dependency handling
Not too tied to the compiler
---
## `rustdoc` JSON output
Freeing ourselves from the compiler internals
----
## Core idea
`rustdoc -Z unstable-options --output-format json` (nightly only)
Writes information about the crate's API to a JSON file
----
## Output deserialization
Datatypes defined in `rustdoc_json_types` in the Rust repository
Available on [crates.io](https://crates.io) as `rustdoc_types`
Just Use Serde(tm)
----
## Dependency handling
Integrates very well with cargo:
`cargo +nightly rustdoc -- -Z unstable-options --output-format json`
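The JSON backend is unstable, so in practice the invocation needs a nightly toolchain and `-Z unstable-options`. A sketch that builds the command and computes where the file lands, assuming rustup (for `+nightly`), the default `target` directory and no custom `[lib] name`:

```rust
use std::path::PathBuf;
use std::process::Command;

/// Build (without running) the command producing rustdoc's JSON output.
pub fn rustdoc_json_command() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args([
        "+nightly", "rustdoc", "--",
        "-Z", "unstable-options", "--output-format", "json",
    ]);
    cmd
}

/// Expected output file: `target/doc/<crate name>.json`, where the
/// crate name is the package name with `-` replaced by `_`.
pub fn json_output_path(package_name: &str) -> PathBuf {
    PathBuf::from("target")
        .join("doc")
        .join(format!("{}.json", package_name.replace('-', "_")))
}
```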
----
## Information available
Limited to items
No pre-expansion information
Can't be used for function body analysis
----
## Stability
More stable than `rustc` as a lib
Automated release process of `rustdoc_types`
---
Fin