
Peeking at compiler-internal data

for fun and profit


About me: oli-obk


Note:

what do we say about us?


Where is Oli?



About me: nikomatsakis



About me: I am evil


I plan to mercilessly show you his private notes, at least when they're endearing.


About you

Hopefully you are not here to play Rust the game



About you

You're here because you

  • want to analyze Rust code
  • are analyzing Rust code

This Talk

  • Why integrate with the compiler?
  • How?
  • The Future

Oli's notes to himself

"oh boy, let me tell you about my PhD thesis which is just about that topic"

  • So you should ask him! But in a few weeks, once the baby is grown up and functional and the parents' lives have more or less gone back to normal.

Why does it matter how you integrate with the compiler?

  • DRY
  • don't repeat yourself
  • DO NOT repeat yourself



Effects of DRY

  • compiler and tool are in sync
  • the compiler does parsing, type checking, etc.
  • the compiler's APIs get improved

Note:

because you didn't duplicate logic with slightly different behaviour

so you can just grab all the info from the compiler

because you give feedback on APIs


How

  • do you integrate with the compiler?
  • does the community help you?

Integrate with rustc

  1. Create a binary crate
  2. Call compiler APIs from your binary
  3. Report all the problems!!!



What we are going to do

Write a rustc that runs a custom lint to detect comparisons like x == x.

Then we can give a nice friendly error message!


Example error message



All examples work with

rustc 1.53.0-nightly (f82664191 2021-03-21)

You can follow these examples via the hackmd of this presentation.

Also you can learn tons about rustc in the rustc-dev-guide:

https://rustc-dev-guide.rust-lang.org


get rustc as a lib

rustup component add rustc-dev llvm-tools-preview

Unstable stuff

#![feature(rustc_private)]
#![deny(rustc::internal)]
extern crate rustc_driver;
extern crate rustc_interface;
extern crate rustc_errors;
extern crate rustc_lint;

At present, the API is forever unstable; use it at your own risk.


Your own compiler

struct MyCallbacks;
impl rustc_driver::Callbacks for MyCallbacks {}

fn main() -> Result<(), rustc_errors::ErrorReported> {
    
    
    
}

fn main() -> Result<(), rustc_errors::ErrorReported> {
    let args: Vec<_> = std::env::args().collect();
    
    
}

fn main() -> Result<(), rustc_errors::ErrorReported> {
    let args: Vec<_> = std::env::args().collect();
    let mut my_cb = MyCallbacks;
    
}

fn main() -> Result<(), rustc_errors::ErrorReported> {
    let args: Vec<_> = std::env::args().collect();
    let mut my_cb = MyCallbacks;
    rustc_driver::RunCompiler::new(&args, &mut my_cb).run()
}

You have now reproduced rustc. You rock!


Too bad people could already run rustc.


Callbacks

struct MyCallbacks;
impl rustc_driver::Callbacks for MyCallbacks {}

Callbacks is a trait which you can use to customize your compilation.


Curious




Callbacks

impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut Config) {
    
    
    
    
    
    }
}


Note:

Config has lots of fun things to configure

We concentrate on lints, so register_lints


Callbacks

impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut Config) {
        config.register_lints = Some(Box::new(|_, lint_store| {
        
        
        
        }));
    }
}

Callbacks

impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut Config) {
        config.register_lints = Some(Box::new(|_, lint_store| {
            lint_store.register_late_pass(|| {
            
            })
        }));
    }
}

Note: we'll get into what a late pass is in a second


Callbacks

impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut Config) {
        config.register_lints = Some(Box::new(|_, lint_store| {
            lint_store.register_late_pass(|| {
                Box::new(MyLint)
            })
        }));
    }
}

Note: we haven't defined MyLint yet


Custom lints

struct MyLint;

impl rustc_lint::LintPass for MyLint {
    fn name(&self) -> &'static str {
        "The best lint"
    }
}

impl<'tcx> rustc_lint::LateLintPass<'tcx> for MyLint {}

Note: interesting part: LateLintPass


A (very) brief tour of rustc's IRs

graph LR

.rs --> AST;
AST --Macro expansion--> AST;
AST --> HIR;
HIR --Type checking--> HIR;
HIR --> MIR;
MIR --Optimization--> MIR;
MIR --> LLVM;
LLVM --Dear god who knows--> LLVM;
LLVM --> .exe;
  • AST, pre-expansion: Just what the user wrote
  • AST, post-expansion: Macros expanded
  • HIR: High-level IR, an AST but with names resolved etc
    • Type-checking stores the info in "side tables"
  • MIR: kind of like JVM byte-code for Rust
  • LLVM: very low-level

LintPass(es)

lint type        trait name      data structure
pre-expansion    EarlyLintPass   AST
post-expansion   EarlyLintPass   AST
type-checked     LateLintPass    HIR

Note: types are cool, so always use LateLintPass if you can


impl<'tcx> rustc_lint::LateLintPass<'tcx> for MyLint {
    fn check_expr(
        &mut self,
        cx: &rustc_lint::LateContext<'tcx>,
        expr: &rustc_hir::Expr<'tcx>,
    ) {
        // Static analysis goes here
    }
}

Note: called on all expressions, cannot cancel recursion
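
To picture what "called on all expressions, cannot cancel recursion" means, here is a toy model. It is not rustc code: `Expr`, `LintPass`, `walk_expr`, and `count_visits` are hypothetical stand-ins invented for this sketch. The point is only that the walker owns the recursion, so the pass sees every node.

```rust
// Toy model, not rustc's API: every name here is a hypothetical stand-in.
#[derive(Debug)]
enum Expr {
    Lit(i64),
    Var(String),
    Binary(Box<Expr>, Box<Expr>),
}

trait LintPass {
    // Called once per expression. It returns nothing, so the pass has no
    // way to tell the walker to skip the children of this node.
    fn check_expr(&mut self, expr: &Expr);
}

// The walker owns the recursion: it calls the pass on a node and then
// always descends into that node's children.
fn walk_expr(pass: &mut dyn LintPass, expr: &Expr) {
    pass.check_expr(expr);
    if let Expr::Binary(lhs, rhs) = expr {
        walk_expr(pass, lhs);
        walk_expr(pass, rhs);
    }
}

// A pass that only counts how often it gets called.
struct CountExprs(usize);

impl LintPass for CountExprs {
    fn check_expr(&mut self, _expr: &Expr) {
        self.0 += 1;
    }
}

// Runs the counting pass over a whole expression tree.
fn count_visits(expr: &Expr) -> usize {
    let mut pass = CountExprs(0);
    walk_expr(&mut pass, expr);
    pass.0
}
```

For a tree shaped like `x + (1 + 2)`, `count_visits` reports five calls: two for the binary nodes and three for the leaves.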


An actual lint

if let rustc_hir::ExprKind::Binary(op, l, r) = expr.kind {
    if l.kind == r.kind {
        // Complain loudly
    }
}

Not the code you really want, but gives you the idea:

  • == compares too strictly
  • maybe want to consider types
  • needs diagnostics; check the rustc-dev-guide

Check the clippy version for something more realistic.
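
One plausible reading of "== compares too strictly" is that nodes carry source positions, so two occurrences of `x` never compare equal. The sketch below shows that on a toy tree. It is not rustc's HIR: `Expr`, `ExprKind`, `BinOp`, `same_shape`, `is_self_comparison`, and `var` are all hypothetical names made up for illustration.

```rust
// Toy model, not rustc's HIR: all names are hypothetical stand-ins.
#[derive(Debug, Clone, PartialEq)]
struct Expr {
    kind: ExprKind,
    // Byte offset where the expression starts in the source text.
    pos: usize,
}

#[derive(Debug, Clone, PartialEq)]
enum ExprKind {
    Var(String),
    Lit(i64),
    Binary(BinOp, Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BinOp {
    Eq,
    Add,
}

// Structural equality that ignores `pos`, unlike the derived `==`.
fn same_shape(a: &Expr, b: &Expr) -> bool {
    match (&a.kind, &b.kind) {
        (ExprKind::Var(x), ExprKind::Var(y)) => x == y,
        (ExprKind::Lit(x), ExprKind::Lit(y)) => x == y,
        (ExprKind::Binary(op1, l1, r1), ExprKind::Binary(op2, l2, r2)) => {
            op1 == op2 && same_shape(l1, l2) && same_shape(r1, r2)
        }
        _ => false,
    }
}

// The lint condition: `lhs == rhs` where both sides have the same shape.
fn is_self_comparison(expr: &Expr) -> bool {
    match &expr.kind {
        ExprKind::Binary(BinOp::Eq, lhs, rhs) => same_shape(lhs, rhs),
        _ => false,
    }
}

// Helper that builds a variable expression at a given position.
fn var(name: &str, pos: usize) -> Expr {
    Expr { kind: ExprKind::Var(name.to_string()), pos }
}
```

With this, `var("x", 0) == var("x", 5)` is false, while `same_shape` on the same pair is true. A real lint would still need the type information mentioned above before complaining.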


Config

  • file_loader
  • register_lints
  • override_queries
  • make_codegen_backend

Note:
the interesting parts


file_loader

  • completely work on a VFS
  • manipulate files before passing them to rustc
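
A rough sketch of both bullet points, under loud assumptions: the `FileLoader` trait below is defined locally for this example and only imitates the idea. rustc's real trait may differ in name and methods, and `InMemoryLoader`, `PrependHeader`, and `demo_loader` are hypothetical.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical trait for this sketch; rustc's real file-loading trait may
// differ in name and methods.
trait FileLoader {
    fn file_exists(&self, path: &Path) -> bool;
    fn read_file(&self, path: &Path) -> io::Result<String>;
}

// A virtual file system: every "file" lives in memory.
struct InMemoryLoader {
    files: HashMap<PathBuf, String>,
}

impl FileLoader for InMemoryLoader {
    fn file_exists(&self, path: &Path) -> bool {
        self.files.contains_key(path)
    }

    fn read_file(&self, path: &Path) -> io::Result<String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such virtual file"))
    }
}

// Wraps another loader and rewrites each file before the consumer sees it.
struct PrependHeader<L> {
    inner: L,
    header: String,
}

impl<L: FileLoader> FileLoader for PrependHeader<L> {
    fn file_exists(&self, path: &Path) -> bool {
        self.inner.file_exists(path)
    }

    fn read_file(&self, path: &Path) -> io::Result<String> {
        let src = self.inner.read_file(path)?;
        Ok(format!("{}\n{}", self.header, src))
    }
}

// Builds a loader with one virtual file and a header that gets prepended.
fn demo_loader() -> PrependHeader<InMemoryLoader> {
    let mut files = HashMap::new();
    files.insert(PathBuf::from("main.rs"), "fn main() {}".to_string());
    PrependHeader {
        inner: InMemoryLoader { files },
        header: "#![allow(dead_code)]".to_string(),
    }
}
```

Layering loaders like this keeps "where do files come from" separate from "what do we do to them", so either part can be swapped on its own.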

22 second introduction to queries

digraph {
    mir_built -> compute_hir
    layout_of -> typeck
    optimized_mir -> analyzed_mir -> mir_built
    const_eval -> mir_for_ctfe -> analyzed_mir
    codegen -> optimized_mir -> layout_of
    codegen -> const_eval -> layout_of
    late_lints -> compute_hir
    typeck -> compute_hir
}
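
The graph reads as "a query asks for the queries it points to". A toy model of that idea, demand-driven and memoized, might look like the sketch below. This is not rustc's query system: `Ctx`, its fields, and the `String` results are hypothetical stand-ins, and only three of the queries from the graph are modelled.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Toy model of demand-driven, memoized queries. Not rustc's query system:
// `Ctx` and the `String` results are hypothetical stand-ins.
struct Ctx {
    cache: RefCell<HashMap<&'static str, String>>,
    // Names of queries in the order they were actually computed.
    log: RefCell<Vec<&'static str>>,
}

impl Ctx {
    fn new() -> Self {
        Ctx {
            cache: RefCell::new(HashMap::new()),
            log: RefCell::new(Vec::new()),
        }
    }

    // Return the cached result, or run `compute` once and remember it.
    fn query(&self, name: &'static str, compute: fn(&Ctx) -> String) -> String {
        let cached = self.cache.borrow().get(name).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        let value = compute(self);
        self.log.borrow_mut().push(name);
        self.cache.borrow_mut().insert(name, value.clone());
        value
    }
}

// Leaf query: depends on nothing else.
fn compute_hir(cx: &Ctx) -> String {
    cx.query("compute_hir", |_| "hir".to_string())
}

// typeck -> compute_hir
fn typeck(cx: &Ctx) -> String {
    cx.query("typeck", |cx| format!("typeck({})", compute_hir(cx)))
}

// late_lints -> compute_hir
fn late_lints(cx: &Ctx) -> String {
    cx.query("late_lints", |cx| format!("late_lints({})", compute_hir(cx)))
}
```

Asking for `typeck` and then `late_lints` computes `compute_hir` only once; the second request is served from the cache.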

override_queries

  • access to original query
  • insert new query
    • modify input
    • modify output
    • completely replace

wow
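
The list above can be sketched with a table of function pointers. This is a toy, not rustc's provider type: `Providers` here has a single made-up field, the "layout" is just a size in bytes, and `default_layout_of`, `padded_layout_of`, and this `override_queries` function are hypothetical.

```rust
// Toy model of a provider table. Not rustc's provider type: the struct,
// its single field, and the `u64` "layout" are hypothetical stand-ins.
struct Providers {
    // Computes the size in bytes of the type with the given name.
    layout_of: fn(&str) -> u64,
}

// The "original" query implementation.
fn default_layout_of(ty: &str) -> u64 {
    match ty {
        "u8" => 1,
        "u32" => 4,
        "u64" => 8,
        _ => 0,
    }
}

fn default_providers() -> Providers {
    Providers { layout_of: default_layout_of }
}

// The replacement query: it still has access to the original...
fn padded_layout_of(ty: &str) -> u64 {
    let size = default_layout_of(ty);
    // ...and modifies its output by rounding up to a multiple of 4.
    (size + 3) / 4 * 4
}

// Shaped like an override hook: it receives the table and swaps an entry.
fn override_queries(providers: &mut Providers) {
    providers.layout_of = padded_layout_of;
}
```

Because the table holds plain function pointers, the replacement cannot capture state; it reaches the original by calling it by name.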


Examples of queries to override

  • modify layout computation: layout_of
  • inject MIR optimizations: optimized_mir
  • access MIR before borrowck: mir_built
  • inject additional items for codegen

Integrations

  • driver
    • binary crate
    • uses compiler as a library
  • codegen-backends (plugin)

miri

  • a driver
  • heavily manipulates command line args
  • uses after-analysis callback
    • finds __start symbol
    • starts evaluation at that symbol
  • uses rustc_mir::interpret
    • also used by CTFE
    • generic MIR interpretation system

Note:

most argument manipulation is for cargo-miri integration


cranelift

  • a codegen backend
  • calls compiler queries
  • independent codegen framework

Note:

you are not forced to use rustc's codegen_ssa framework


The community helps

  • library-ification
  • new abstractions for your use cases
    • compiler APIs that break less often
    • even if their internals break a lot
  • integrating your feedback
    • tell us about your compiler usage
    • upstream parts of your project

Note:

abstractions aren't required to be useful for the compiler, just consistent with the rest of the APIs


Library-ification

  • split parts of the compiler out into crates.io
    • allow them to be reused for many purposes
  • example crates:
    • chalk handles trait solving
    • polonius handles borrow checking
  • rust-analyzer uses chalk, for example
  • working towards a generic definition of Rust types that includes full Rust static analysis
    • you can help!

Note:

I skimmed over this a bit like it wasn't important


Summary

  • rustc is a library
  • we want to make it more of a library
  • incrementally create your own compiler
  • mentoring available for contributing