# Peeking at compiler-internal data for fun and profit

---

## About me: oli-obk

![oli](https://avatars.githubusercontent.com/u/332036?s=460&v=4)

Note: what do we say about us?

----

## Where is Oli?

![super cute dog dad](https://images.fatherly.com/wp-content/uploads/2017/04/dogsbaddads-header.jpg?q=65&enable=upscale&w=1200)

----

## About me: nikomatsakis

![](https://avatars.githubusercontent.com/u/155238?s=400&u=c09aaff33aa53ea99359e53bef06aa5058ac8d15&v=4)

----

## About me: I am evil

![Dr Evil](https://media.giphy.com/media/xl5QdxfNonh3q/source.gif)

I plan to mercilessly show you his private notes, at least when they're endearing.

---

## About you

----

## About you

Hopefully you are not here to play Rust the game...

![](https://i.imgur.com/KM4ghcI.jpg)

----

## About you

You're here because you

* want to analyze Rust code
* are analyzing Rust code

---

## This Talk

* Why integrate with the compiler?
* How?
* The Future :tm:

----

## Oli's notes to himself :purple_heart:

"oh boy, let me tell you about my PhD thesis which is just about that topic"

* So you should ask him! But in a few weeks, once the baby is grown up and functional and the parents' lives have more or less gone back to normal.

---

# Why does it matter how you integrate with the compiler?

* DRY <!-- .element: class="fragment" -->
* don't repeat yourself <!-- .element: class="fragment" -->
* DO NOT repeat yourself <!-- .element: class="fragment" -->

:face_palm: <!-- .element: class="fragment" -->

----

## Effects of DRY

* compiler and tool are in sync
* the compiler does parsing, type checking, etc.
* the compiler's APIs get improved

Note: because you didn't duplicate logic with slightly different behaviour
so you can just grab all the info from the compiler
because you give feedback on APIs

---

## How...

* do you integrate with the compiler?
* does the community help you?

---

## Integrate with rustc

1. Create a binary crate
2. Call compiler APIs from your binary
3. Report all the problems!!!

----

![image](https://pbs.twimg.com/media/Bs13i6LCcAAvwCf.jpg)

---

## What we are going to do

Write a rustc that runs a custom lint to detect comparisons like `x == x`.

Then we can give a nice friendly error message!

----

## Example error message

<iframe src="https://giphy.com/embed/Vi0lBaOIVF8atUPmOd" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/sunnyfxx-iasip-its-always-sunny-in-philadelphia-Vi0lBaOIVF8atUPmOd">via GIPHY</a></p>

---

All examples work with

```
rustc 1.53.0-nightly (f82664191 2021-03-21)
```

You can follow these examples via [the hackmd](https://hackmd.io/RiztubvfT4eOk4-4nM8Y7Q?both) of this presentation.

You can also learn tons about rustc in the rustc-dev-guide: https://rustc-dev-guide.rust-lang.org

----

## get rustc as a lib

```
rustup component add rustc-dev llvm-tools-preview
```

----

## Unstable stuff

```rust
#![feature(rustc_private)]
#![deny(rustc::internal)]

extern crate rustc_driver;
extern crate rustc_interface;
extern crate rustc_errors;
extern crate rustc_hir; // needed by the lint code on later slides
extern crate rustc_lint;
```

At present, the API is forever unstable; use at your own risk.

---

## Your own compiler

```rust
struct MyCallbacks;

impl rustc_driver::Callbacks for MyCallbacks {}
```

----

```rust
fn main() -> Result<(), rustc_errors::ErrorReported> {
}
```

----

```rust
fn main() -> Result<(), rustc_errors::ErrorReported> {
    let args: Vec<_> = std::env::args().collect();
}
```

----

```rust
fn main() -> Result<(), rustc_errors::ErrorReported> {
    let args: Vec<_> = std::env::args().collect();
    let mut my_cb = MyCallbacks;
}
```

----

```rust
fn main() -> Result<(), rustc_errors::ErrorReported> {
    let args: Vec<_> = std::env::args().collect();
    let mut my_cb = MyCallbacks;
    rustc_driver::RunCompiler::new(&args, &mut my_cb).run()
}
```

You have now reproduced rustc. You rock! :punch:

Too bad people could already run rustc.
<!-- .element: class="fragment" -->

---

## Callbacks

```rust
struct MyCallbacks;

impl rustc_driver::Callbacks for MyCallbacks {}
```

`Callbacks` is a trait which you can use to customize your compilation.

----

![Curious](https://media.giphy.com/media/h81fYY4QWj4hlEuqiN/source.gif)

----

![](https://i.imgur.com/Zj4eGTR.png)

----

## Callbacks

```rust
impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut Config) {
    }
}
```

----

![](https://i.imgur.com/kNIKmzo.png)

Note: Config has lots of fun things to configure
We concentrate on lints, so `register_lints`

----

## Callbacks

```rust
impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut Config) {
        config.register_lints = Some(Box::new(|_, lint_store| {
        }));
    }
}
```

----

## Callbacks

```rust
impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut Config) {
        config.register_lints = Some(Box::new(|_, lint_store| {
            lint_store.register_late_pass(|| {
            })
        }));
    }
}
```

Note: we'll get into what a late pass is in a second

----

## Callbacks

```rust
impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut Config) {
        config.register_lints = Some(Box::new(|_, lint_store| {
            lint_store.register_late_pass(|| {
                Box::new(MyLint)
            })
        }));
    }
}
```

Note: we haven't defined MyLint yet

---

## Custom lints

```rust
struct MyLint;

impl rustc_lint::LintPass for MyLint {
    fn name(&self) -> &'static str {
        "The best lint"
    }
}

impl<'tcx> rustc_lint::LateLintPass<'tcx> for MyLint {}
```

Note: interesting part: LateLintPass

----

## A (very) brief tour of rustc's IRs

```mermaid
graph LR
    .rs --> AST;
    AST --Macro expansion--> AST;
    AST --> HIR;
    HIR --Type checking--> HIR;
    HIR --> MIR;
    MIR --Optimization--> MIR;
    MIR --> LLVM;
    LLVM --Dear god who knows--> LLVM;
    LLVM --> .exe;
```

* AST, pre-expansion: Just what the user wrote
* AST, post-expansion: Macros expanded
* HIR: High-level IR, an AST but with names resolved etc
* Type-checking stores the info in "side tables"
* MIR: kind of like
JVM byte-code for Rust
* LLVM: very low-level

----

## LintPass(es)

| lint type      | trait name      | data structures |
| -------------- | --------------- | --------------- |
| pre-expansion  | `EarlyLintPass` | AST             |
| post-expansion | `EarlyLintPass` | AST             |
| type-checked   | `LateLintPass`  | HIR             |

Note: types are cool, so always use `LateLintPass` if you can

----

```rust
impl<'tcx> rustc_lint::LateLintPass<'tcx> for MyLint {
    fn check_expr(
        &mut self,
        cx: &rustc_lint::LateContext<'tcx>,
        expr: &rustc_hir::Expr<'tcx>,
    ) {
        // Static analysis goes here
    }
}
```

Note: called on all expressions, cannot cancel recursion

----

## An actual lint

```rust
// Match on the expression's kind, not on the `Expr` wrapper itself
if let rustc_hir::ExprKind::Binary(op, l, r) = expr.kind {
    if l.kind == r.kind {
        // Complain loudly
    }
}
```

Not the code you really want, but gives you the idea:

* `==` compares too strictly
* maybe want to consider types
* needs diagnostics -- check the rustc-dev-guide

Check the clippy version for something more realistic.

---

## Config

* `file_loader`
* `register_lints`
* `override_queries`
* `make_codegen_backend`

Note: the interesting parts

----

## `file_loader`

* completely work on a VFS
* manipulate files before passing them to rustc

----

## 22 second introduction to queries

```graphviz
digraph {
    mir_built -> compute_hir
    layout_of -> typeck
    optimized_mir -> analyzed_mir -> mir_built
    const_eval -> mir_for_ctfe -> analyzed_mir
    codegen -> optimized_mir -> layout_of
    codegen -> const_eval -> layout_of
    late_lints -> compute_hir
    typeck -> compute_hir
}
```

----

## `override_queries`

* access to original query
* insert new query
* modify input
* modify output
* completely replace

----

![wow](https://media.giphy.com/media/l0ExsURGF4fthsLJe/source.gif)

----

# Examples of queries to override

* modify layout computation: `layout_of`
* inject MIR optimizations: `optimized_mir`
* access MIR before borrowck: `mir_built`
* inject additional items for codegen

---

## Integrations

* driver
    * binary crate
    * uses compiler as a library
* codegen-backends
(plugin)

----

## miri

* a driver
* heavily manipulates command line args
* uses after-analysis callback
    * finds `__start` symbol
    * starts evaluation at that symbol
* uses `rustc_mir::interpret`
    * also used by CTFE
    * generic MIR interpretation system

Note: most argument manipulation is for cargo-miri integration

----

## cranelift

* a codegen backend
* calls compiler queries
* independent codegen framework

Note: you are not forced to use rustc's codegen_ssa framework

---

## The community helps

* [library-ification](https://smallcultfollowing.com/babysteps/blog/2020/04/09/libraryification/)
* new abstractions for your use cases
* compiler APIs that break less often
    * even if their internals break a lot
* integrating your feedback
    * tell us about your compiler usage
    * upstream parts of your project

Note: abstractions aren't required to be useful for the compiler, just consistent with the rest of the APIs

----

## Library-ification

* split parts of the compiler out into crates.io
    * allow them to be reused for many purposes
* example crates:
    * chalk -- handles trait solving
    * polonius -- handles borrow checker
    * rust-analyzer uses chalk, for example
* working towards a generic definition of Rust types that includes full Rust static analysis
* you can help!

Note: I skimmed over this a bit like it wasn't important

----

## Summary

* rustc is a library
* we want to make it more of a library
* incrementally create your own compiler
* mentoring available for contributing
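---

## Appendix: the `x == x` check on a toy tree

If you want to play with the idea behind "An actual lint" without a nightly toolchain, here is a minimal sketch on a made-up expression type. `Expr`, `BinOp` and `find_self_comparisons` are hypothetical stand-ins, not rustc types; a real lint would do the equivalent match on `rustc_hir::ExprKind` inside `check_expr`, and would want a smarter comparison than derived `==`.

```rust
/// Toy expression tree; a hypothetical stand-in for `rustc_hir::Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Lit(i64),
    Binary(BinOp, Box<Expr>, Box<Expr>),
}

/// Toy operator set; only `Eq` is interesting to the check.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BinOp {
    Eq,
    Add,
}

/// Walks the tree and collects every `lhs == rhs` node whose two operands
/// are structurally identical.
fn find_self_comparisons<'a>(expr: &'a Expr, found: &mut Vec<&'a Expr>) {
    if let Expr::Binary(op, l, r) = expr {
        if *op == BinOp::Eq && l == r {
            found.push(expr);
        }
        // Unlike `check_expr`, nobody visits the children for us here.
        find_self_comparisons(l, found);
        find_self_comparisons(r, found);
    }
}

fn main() {
    let x = || Box::new(Expr::Var("x".to_string()));
    // (x == x) + 1
    let e = Expr::Binary(
        BinOp::Add,
        Box::new(Expr::Binary(BinOp::Eq, x(), x())),
        Box::new(Expr::Lit(1)),
    );
    let mut found = Vec::new();
    find_self_comparisons(&e, &mut found);
    for hit in &found {
        println!("suspicious self-comparison: {:?}", hit);
    }
}
```

The derived `PartialEq` is the toy analogue of the "`==` compares too strictly" caveat: it only catches operands that are spelled exactly the same.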