Parsing Rust Code Considered Harmful

Hi everyone!

Welcome to my talk, entitled "Parsing Rust Code Considered Harmful".

Goals:
  - How can we perform static analysis of Rust code?
  - The different techniques
  - The strengths and weaknesses of each of them
  - Why parsing should never be considered

Who am I?

Let me introduce myself!

I'm Sasha Pourcelot, my pronouns are she and her.
My handle is scrabsha almost everywhere.

Studying computer science at Polytech Nice Sophia, in France.
I'll graduate in September!

Software Engineer at TrustInSoft
Static analysis tool for C code
Goal: exhaustively list all UBs in a C program
Self-role: POSIX-compliant filesystem functions
-> POSIX sorceress

Contributions to the Rust ecosystem:
  - Rust compiler
  - Tremor project

Sasha Pourcelot (she/her)

@scrabsha on {GitHub, Twitter}

CS student at Polytech Nice Sophia (France)

Software Engineer at TrustInSoft (static analysis of C programs)

Rust ecosystem contributor

Static analysis

Let's define static analysis!

Deducing properties about code *without running it*

Uses:
  - detecting potential errors (type checking)
  - removing useless code

Deducing properties about code without running it

Examples:

type checking
dead code paths detection
unused attributes/variants
breaking change detection

`cargo-breaking`

Detect breaking changes in a Rust crate

Catch breaking changes early (before release)
Useful in CI
Make dependencies upgrades safer

Breaking change?

Removal, renaming, …

// before
pub fn knight_name(friend: &Friend) -> String;

// after
pub fn knight_name(friend: &Friend, mood: Mood) -> String;

⚠️ Breaking change: knight_name has a new parameter
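A minimal sketch of how such a change could be detected, assuming the public API of each version has already been collected as a map from item path to signature. That representation, and the names `Change`, `diff_api` and `is_breaking`, are hypothetical and only serve the illustration; this is not how cargo-breaking is necessarily implemented.

```rust
use std::collections::BTreeMap;

/// A difference between two versions of a public API.
#[derive(Debug, PartialEq)]
enum Change {
    Removed(String),
    Modified(String),
    Added(String),
}

/// Compares two hypothetical "item path -> signature" maps.
/// Removed and modified items are listed first, then added ones.
fn diff_api(before: &BTreeMap<&str, &str>, after: &BTreeMap<&str, &str>) -> Vec<Change> {
    let mut changes = Vec::new();
    for (path, old_sig) in before {
        match after.get(path) {
            None => changes.push(Change::Removed(path.to_string())),
            Some(new_sig) if new_sig != old_sig => {
                changes.push(Change::Modified(path.to_string()))
            }
            Some(_) => {}
        }
    }
    for path in after.keys() {
        if !before.contains_key(path) {
            changes.push(Change::Added(path.to_string()));
        }
    }
    changes
}

/// Simplification: removals and modifications break downstream code,
/// additions do not. Real-world rules are subtler.
fn is_breaking(change: &Change) -> bool {
    !matches!(change, Change::Added(_))
}
```

With `knight_name` mapped to `fn(&Friend) -> String` before and `fn(&Friend, Mood) -> String` after, `diff_api` reports `Modified("knight_name")`, which `is_breaking` flags.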

Breaking change?

Ambiguous trait method resolution
#[non_exhaustive] attribute
Send and Sync traits
Type size?!

And so many other very subtle things
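To make two of these subtle cases concrete, here is a small sketch. The `Knight` type and its fields are made up for the example. Adding a private field changes no function signature, yet the type stops being `Send` and its size grows.

```rust
use std::mem::size_of;

mod before {
    #[allow(dead_code)]
    pub struct Knight {
        name: String,
    }
}

mod after {
    use std::rc::Rc;

    #[allow(dead_code)]
    pub struct Knight {
        name: String,
        // New private field: `Rc` is neither `Send` nor `Sync`,
        // so `Knight` silently loses both auto traits.
        horse: Rc<str>,
    }
}

/// Compiles only when `T` is `Send`.
fn assert_send<T: Send>() {}

/// Size difference, in bytes, between the two versions of `Knight`.
fn size_growth() -> usize {
    size_of::<after::Knight>() - size_of::<before::Knight>()
}

fn check_before() {
    assert_send::<before::Knight>();
    // The same check on `after::Knight` would be rejected by the compiler:
    // assert_send::<after::Knight>();
}
```

Downstream code that moves a `Knight` to another thread, or that relies on its size, breaks even though no public item was renamed or removed.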

Smol disclaimer

Stability is nice to have, but not required

RUSTC_BOOTSTRAP=1 is where the fun begins

Problem

Getting information about a crate

Parsing

src/lib.rs → AST

Parser

Parse all the Rust syntax

syn can parse from &str:

pub fn parse_file(content: &str) -> Result<File>;

Macro support?

Parse the output of cargo-expand

⚠️ Breaks hygiene
But OK here as we're not looking at function bodies
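A small sketch of what "breaks hygiene" means. The `double!` macro is made up for the example. Inside the compiler, the macro's own `x` and the caller's `x` are distinct bindings. Once the expansion is printed back as plain text and parsed again, that distinction is lost.

```rust
macro_rules! double {
    ($e:expr) => {{
        let x = 2; // lives in the macro's own hygiene context
        $e * x
    }};
}

/// What the compiler computes: the caller's `x` (5) times the macro's `x` (2).
fn hygienic() -> i32 {
    let x = 5;
    double!(x)
}

/// What the same expansion means once written out as plain text:
/// both `x` now name the inner binding.
#[allow(unused_variables)]
fn pasted_as_text() -> i32 {
    let x = 5;
    let result = {
        let x = 2;
        x * x
    };
    result
}
```

Here `hygienic()` returns 10 while `pasted_as_text()` returns 4. The difference only shows up inside bodies, which is why it is tolerable when looking at signatures alone.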

Stability

New syntax may break the parser

Fix with cargo update

This doesn't work.

No import resolution

No dependency support
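Why missing import resolution hurts, as a sketch. All module and type names are invented for the example. Both versions of the API below expose the exact same text, `pub fn mood() -> Mood`, so a purely syntactic comparison sees no change, yet the return types are different.

```rust
mod castle {
    #[derive(Debug)]
    pub struct Mood;
}

mod tavern {
    #[derive(Debug)]
    pub struct Mood;
}

mod api_v1 {
    use super::castle::*;

    pub fn mood() -> Mood {
        Mood
    }
}

mod api_v2 {
    use super::tavern::*; // only this import changed

    pub fn mood() -> Mood {
        Mood
    }
}

/// Returns the fully resolved type name of a value.
fn type_name_of<T>(_: &T) -> &'static str {
    std::any::type_name::<T>()
}
```

`type_name_of(&api_v1::mood())` ends with `castle::Mood`, while `type_name_of(&api_v2::mood())` ends with `tavern::Mood`. Telling them apart from the AST alone requires resolving the glob import, which is exactly the work the parser does not do.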

Actually, it could

Build your own path resolution algorithm

Download & parse additional dependencies from crates.io

But that's not what you want

Rewriting cargo and rustc is not fun

And very complex (I tried)

Problem²

Getting information about a crate

No reimplementation work

Allow for dependency handling

`rustc` as a library

Instead of rewriting rustc, let's use it as a lib

Core idea

A nightly feature: #![feature(rustc_private)]

Gives access to rustc's public API

Documented at https://doc.rust-lang.org/nightly/nightly-rustc/

Usages in the wild

Clippy

(Linting is static analysis, after all)

Dependency handling

Need to tell rustc about dependencies

Read the Cargo.toml file
Compute a dependency graph
Compile each dependency separately
Pass the artifact path to rustc
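The steps above can be sketched as follows, assuming the dependency graph has already been read from Cargo.toml into a map. The graph representation, the function names and the artifact path layout are all hypothetical; only the `--extern name=path` flag is rustc's own.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns crate names ordered so that each crate comes after its dependencies.
fn build_order(graph: &BTreeMap<&str, Vec<&str>>) -> Vec<String> {
    fn visit(
        name: &str,
        graph: &BTreeMap<&str, Vec<&str>>,
        done: &mut BTreeSet<String>,
        order: &mut Vec<String>,
    ) {
        // Skip crates that were already visited.
        if !done.insert(name.to_string()) {
            return;
        }
        for dep in graph.get(name).into_iter().flatten() {
            visit(dep, graph, done, order);
        }
        order.push(name.to_string());
    }

    let mut done = BTreeSet::new();
    let mut order = Vec::new();
    for name in graph.keys() {
        visit(name, graph, &mut done, &mut order);
    }
    order
}

/// Builds the `--extern` flags telling rustc where each compiled dependency lives.
/// The `lib<name>.rlib` layout is an assumption made for the example.
fn extern_flags(deps: &[&str], artifact_dir: &str) -> Vec<String> {
    deps.iter()
        .flat_map(|dep| {
            [
                "--extern".to_string(),
                format!("{dep}={artifact_dir}/lib{dep}.rlib"),
            ]
        })
        .collect()
}
```

Even this toy version ignores features, build scripts, target-specific dependencies and version resolution, which is the part cargo already solves.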

Or maybe we could use cargo

`cargo` integration

RUSTC_WRAPPER env. variable

Fallback to system rustc when building a dependency
Run actual static analysis at last invocation
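A rough sketch of the decision such a wrapper has to make. Cargo runs the wrapper with the path to the real rustc as its first argument, followed by the usual rustc arguments. Identifying the crate through `--crate-name`, and the names `Action`, `decide` and `forward_to_rustc`, are assumptions made for the example.

```rust
use std::process::{Command, ExitStatus};

/// What the wrapper should do for one invocation.
#[derive(Debug, PartialEq)]
enum Action {
    /// A dependency: let the system rustc do its job.
    Forward,
    /// The crate under study: run the static analysis.
    Analyze,
}

/// Extracts the value following `--crate-name` in a rustc argument list.
fn crate_name(rustc_args: &[String]) -> Option<&str> {
    rustc_args
        .iter()
        .position(|arg| arg == "--crate-name")
        .and_then(|index| rustc_args.get(index + 1))
        .map(String::as_str)
}

/// Analyzes only the crate named `target`, forwards everything else.
fn decide(rustc_args: &[String], target: &str) -> Action {
    if crate_name(rustc_args) == Some(target) {
        Action::Analyze
    } else {
        Action::Forward
    }
}

/// Runs the real rustc with the arguments cargo provided.
fn forward_to_rustc(rustc: &str, rustc_args: &[String]) -> std::io::Result<ExitStatus> {
    Command::new(rustc).args(rustc_args).status()
}
```

The wrapper's `main` would split its arguments into the rustc path and the rest, call `decide`, then either call `forward_to_rustc` or hand over to the analysis.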

Interfacing with `rustc`

Hooks defined in the Callbacks trait

Enables:

Altering invocation settings
Altering (raw/macro-expanded) AST
MIR code analysis

`rustc`'s query engine

Goal: reducing duplicate work with memoization

TyCtxt: structure to perform queries against
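The memoization idea can be illustrated with a toy context. This is not rustc's actual `TyCtxt` or query system, only a sketch of the principle: a query is computed once, then served from a cache. `ToyCtxt` and its `pub_fn_count` query are invented for the example.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Toy stand-in for a compiler context that answers queries.
struct ToyCtxt {
    /// Hypothetical input: module name -> source text.
    sources: HashMap<&'static str, &'static str>,
    /// Results of queries that were already computed.
    cache: RefCell<HashMap<&'static str, usize>>,
    /// How many times a query actually had to be computed.
    computations: Cell<usize>,
}

impl ToyCtxt {
    fn new(sources: HashMap<&'static str, &'static str>) -> Self {
        ToyCtxt {
            sources,
            cache: RefCell::new(HashMap::new()),
            computations: Cell::new(0),
        }
    }

    /// Query: number of `pub fn` occurrences in a module's source.
    fn pub_fn_count(&self, module: &'static str) -> usize {
        if let Some(&hit) = self.cache.borrow().get(module) {
            return hit; // memoized: no recomputation
        }
        self.computations.set(self.computations.get() + 1);
        let count = self
            .sources
            .get(module)
            .map_or(0, |src| src.matches("pub fn").count());
        self.cache.borrow_mut().insert(module, count);
        count
    }
}
```

Asking the same question twice only bumps `computations` once. The real query engine applies the same principle to much richer questions.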

It's too complex

Very tied to rustc

Knowledge in compiler development needed

Very steep learning curve

Stability

Constantly moving API

Your OSS project probably does not have enough bandwidth

Problem³

Getting information about a crate

No reimplementation work

Allow for dependency handling

Not too tied to the compiler

`rustdoc` JSON output

Freeing ourselves from the compiler internals

Core idea

rustdoc --output-format json

Writes information about the crate's API to a JSON file

Output deserialization

Datatypes defined in rustdoc_json_types in the Rust repository

Available on crates.io as rustdoc_types

Just Use Serde™

Dependency handling

Integrates very well with cargo:

cargo +nightly rustdoc -- -Z unstable-options --output-format json

Information available

Limited to items

No pre-expansion information

Can't be used for function body analysis

Stability

More stable than rustc as a lib

Automated release process of rustdoc_types

Fin
