---
tags: presentation
---
# Peeking at compiler-internal data
for fun and profit
---
## About me: oli-obk
![oli](https://avatars.githubusercontent.com/u/332036?s=460&v=4)
Note:
what do we say about us?
----
## Where is Oli?
![super cute dog dad](https://images.fatherly.com/wp-content/uploads/2017/04/dogsbaddads-header.jpg?q=65&enable=upscale&w=1200)
----
## About me: nikomatsakis
![](https://avatars.githubusercontent.com/u/155238?s=400&u=c09aaff33aa53ea99359e53bef06aa5058ac8d15&v=4)
----
## About me: I am evil
![Dr Evil](https://media.giphy.com/media/xl5QdxfNonh3q/source.gif)
I plan to mercilessly show you his private notes, at least when they're endearing.
---
## About you
----
## About you
Hopefully you are not here to play Rust the game...
![](https://i.imgur.com/KM4ghcI.jpg)
----
## About you
You're here because you
* want to analyze Rust code
* are analyzing Rust code
---
## This Talk
* Why integrate with the compiler?
* How?
* The Future :tm:
----
## Oli's notes to himself :purple_heart:
"oh boy, let me tell you about my PhD thesis which is just about that topic"
* So you should ask him! But in a few weeks, once the baby is grown up and functional and the parents' lives have more or less gone back to normal.
---
# Why does it matter how you integrate with the compiler?
* DRY <!-- .element: class="fragment" -->
* don't repeat yourself <!-- .element: class="fragment" -->
* DO NOT repeat yourself <!-- .element: class="fragment" -->
:face_palm: <!-- .element: class="fragment" -->
----
## Effects of DRY
* compiler and tool are in sync
* the compiler does parsing, type checking, etc.
* the compiler's APIs get improved
Note:
because you didn't duplicate logic with slightly different behaviour
so you can just grab all the info from the compiler
because you give feedback on APIs
---
## How...
* do you integrate with the compiler?
* does the community help you?
---
## Integrate with rustc
1. Create a binary crate
2. Call compiler APIs from your binary
3. Report all the problems!!!
----
![image](https://pbs.twimg.com/media/Bs13i6LCcAAvwCf.jpg)
---
## What we are going to do
Write a rustc that runs a custom lint to detect comparisons like `x == x`.
Then we can give a nice friendly error message!
----
## Example error message
<iframe src="https://giphy.com/embed/Vi0lBaOIVF8atUPmOd" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/sunnyfxx-iasip-its-always-sunny-in-philadelphia-Vi0lBaOIVF8atUPmOd">via GIPHY</a></p>
---
All examples work with
```
rustc 1.53.0-nightly (f82664191 2021-03-21)
```
You can follow these examples via [the hackmd](https://hackmd.io/RiztubvfT4eOk4-4nM8Y7Q?both) of this presentation.
Also you can learn tons about rustc in the rustc-dev-guide:
https://rustc-dev-guide.rust-lang.org
----
## Get rustc as a lib
```
rustup component add rustc-dev llvm-tools-preview
```
----
## Unstable stuff
```rust
#![feature(rustc_private)]
#![deny(rustc::internal)]
extern crate rustc_driver;
extern crate rustc_interface;
extern crate rustc_errors;
extern crate rustc_hir; // needed later for `rustc_hir::Expr`
extern crate rustc_lint;
```
The API is permanently unstable for now. Use at your own risk.
---
## Your own compiler
```rust
struct MyCallbacks;
impl rustc_driver::Callbacks for MyCallbacks {}
```
----
```rust
fn main() -> Result<(), rustc_errors::ErrorReported> {
}
```
----
```rust
fn main() -> Result<(), rustc_errors::ErrorReported> {
    let args: Vec<_> = std::env::args().collect();
}
```
----
```rust
fn main() -> Result<(), rustc_errors::ErrorReported> {
    let args: Vec<_> = std::env::args().collect();
    let mut my_cb = MyCallbacks;
}
```
----
```rust
fn main() -> Result<(), rustc_errors::ErrorReported> {
    let args: Vec<_> = std::env::args().collect();
    let mut my_cb = MyCallbacks;
    rustc_driver::RunCompiler::new(&args, &mut my_cb).run()
}
```
You have now reproduced rustc. You rock! :punch:
Too bad people could already run rustc.
<!-- .element: class="fragment" -->
---
## Callbacks
```rust
struct MyCallbacks;
impl rustc_driver::Callbacks for MyCallbacks {}
```
Callbacks is a trait which you can use to customize your compilation.
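
Why does an empty `impl` compile? A trait whose hooks all have default bodies can be implemented with nothing at all. Below is a toy sketch of that pattern in plain Rust. `Hooks`, `Settings` and `run_driver` are hypothetical stand-ins for illustration, not rustc's real `Callbacks`, `Config` or `RunCompiler`.

```rust
// Toy model only: `Hooks`, `Settings` and `run_driver` are hypothetical
// stand-ins, not rustc's real `Callbacks`, `Config` or `RunCompiler`.

/// Stand-in for the compiler configuration.
#[derive(Debug, Default, PartialEq)]
struct Settings {
    lints: Vec<String>,
}

/// Stand-in for `Callbacks`: the hook has a default no-op body.
trait Hooks {
    fn config(&mut self, _settings: &mut Settings) {}
}

/// Overrides nothing, so the driver behaves as if no hooks were installed.
struct Plain;
impl Hooks for Plain {}

/// Overrides one hook to register an extra lint by name.
struct WithLint;
impl Hooks for WithLint {
    fn config(&mut self, settings: &mut Settings) {
        settings.lints.push("self_comparison".to_string());
    }
}

/// Stand-in for the driver: builds the settings, then lets the hooks edit them.
fn run_driver(hooks: &mut dyn Hooks) -> Settings {
    let mut settings = Settings::default();
    hooks.config(&mut settings);
    settings
}
```

Calling `run_driver(&mut Plain)` yields the default settings, while `run_driver(&mut WithLint)` yields settings with one extra lint.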
----
![Curious](https://media.giphy.com/media/h81fYY4QWj4hlEuqiN/source.gif)
----
![](https://i.imgur.com/Zj4eGTR.png)
----
## Callbacks
```rust
impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut rustc_interface::Config) {
    }
}
```
----
![](https://i.imgur.com/kNIKmzo.png)
Note:
Config has lots of fun things to configure
We concentrate on lints, so `register_lints`
----
## Callbacks
```rust
impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut rustc_interface::Config) {
        config.register_lints = Some(Box::new(|_, lint_store| {
        }));
    }
}
```
----
## Callbacks
```rust
impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut rustc_interface::Config) {
        config.register_lints = Some(Box::new(|_, lint_store| {
            lint_store.register_late_pass(|| {
            })
        }));
    }
}
```
Note: we'll get into what a late pass is in a second
----
## Callbacks
```rust
impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut rustc_interface::Config) {
        config.register_lints = Some(Box::new(|_, lint_store| {
            lint_store.register_late_pass(|| {
                Box::new(MyLint)
            })
        }));
    }
}
```
Note: we haven't defined MyLint yet
---
## Custom lints
```rust
struct MyLint;
impl rustc_lint::LintPass for MyLint {
    fn name(&self) -> &'static str {
        "The best lint"
    }
}
impl<'tcx> rustc_lint::LateLintPass<'tcx> for MyLint {}
```
Note: interesting part: LateLintPass
----
## A (very) brief tour of rustc's IRs
```mermaid
graph LR
.rs --> AST;
AST --Macro expansion--> AST;
AST --> HIR;
HIR --Type checking--> HIR;
HIR --> MIR;
MIR --Optimization--> MIR;
MIR --> LLVM;
LLVM --Dear god who knows--> LLVM;
LLVM --> .exe;
```
* AST, pre-expansion: Just what the user wrote
* AST, post-expansion: Macros expanded
* HIR: High-level IR, an AST but with names resolved etc.
    * Type-checking stores the info in "side tables"
* MIR: kind of like JVM byte-code for Rust
* LLVM: very low-level
----
## LintPass(es)
| lint type | trait name | data structure |
| -------------- | ------------------- | -------- |
| pre-expansion | `EarlyLintPass` | AST |
| post-expansion | `EarlyLintPass` | AST |
| type-checked | `LateLintPass` | HIR |
Note: types are cool, so always use `LateLintPass` if you can
----
```rust
impl<'tcx> rustc_lint::LateLintPass<'tcx> for MyLint {
    fn check_expr(
        &mut self,
        cx: &rustc_lint::LateContext<'tcx>,
        expr: &rustc_hir::Expr<'tcx>,
    ) {
        // Static analysis goes here
    }
}
```
Note: called on all expressions, cannot cancel recursion
----
## An actual lint
```rust
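// Sketch only: a real lint needs a structural comparison of `l` and `r`
if let rustc_hir::ExprKind::Binary(op, l, r) = expr.kind {
    // Only `==` is interesting for this lint
    if op.node == rustc_hir::BinOpKind::Eq && l.kind == r.kind {
        // Complain loudly
    }
}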
```
Not the code you really want, but gives you the idea:
* `==` compares too strictly
* maybe want to consider types
* needs diagnostics -- check the rustc-dev-guide
Check the clippy version for something more realistic.
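
To see why a plain `==` on expression nodes is too strict, here is a toy sketch in standalone Rust. It assumes a miniature expression tree where every node carries a source position. A derived `==` would compare positions too, so the two `x` in `x == x` would never be equal. `Expr`, `ExprKind`, `same_expr` and this `check_expr` are hypothetical and far simpler than rustc's HIR.

```rust
// Toy model only: these types are hypothetical and far simpler than rustc's HIR.

/// An expression plus the source offset where it was written.
#[derive(Debug)]
struct Expr {
    kind: ExprKind,
    span: usize, // used for diagnostics only, never for comparison
}

#[derive(Debug)]
enum ExprKind {
    Var(String),
    Lit(i64),
    Eq(Box<Expr>, Box<Expr>),
}

/// Small constructor to keep call sites short.
fn expr(kind: ExprKind, span: usize) -> Expr {
    Expr { kind, span }
}

/// Structural equality that deliberately ignores spans.
fn same_expr(a: &Expr, b: &Expr) -> bool {
    match (&a.kind, &b.kind) {
        (ExprKind::Var(x), ExprKind::Var(y)) => x == y,
        (ExprKind::Lit(x), ExprKind::Lit(y)) => x == y,
        (ExprKind::Eq(l1, r1), ExprKind::Eq(l2, r2)) => {
            same_expr(l1, l2) && same_expr(r1, r2)
        }
        _ => false,
    }
}

/// Returns the span to report if `e` is `lhs == rhs` with identical operands.
fn check_expr(e: &Expr) -> Option<usize> {
    match &e.kind {
        ExprKind::Eq(l, r) if same_expr(l, r) => Some(e.span),
        _ => None,
    }
}
```

Feeding it `x == x` (two `Var("x")` nodes at different offsets) returns the span of the comparison, while `x == 1` returns `None`.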
---
## Config
* `file_loader`
* `register_lints`
* `override_queries`
* `make_codegen_backend`
Note:
the interesting parts
----
## `file_loader`
* completely work on a VFS
* manipulate files before passing them to rustc
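
Both bullets can be sketched with a small loader trait in standalone Rust. The `Loader` trait below is a hypothetical stand-in, only loosely modeled on the idea of a file-loading hook. `MemoryFs` and `Prepend` are made-up names for illustration.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};

// Toy model only: `Loader`, `MemoryFs` and `Prepend` are hypothetical names.

/// Stand-in for a file-loading hook.
trait Loader {
    fn file_exists(&self, path: &Path) -> bool;
    fn read_file(&self, path: &Path) -> io::Result<String>;
}

/// A virtual file system: every file is served from memory.
struct MemoryFs {
    files: HashMap<PathBuf, String>,
}

impl Loader for MemoryFs {
    fn file_exists(&self, path: &Path) -> bool {
        self.files.contains_key(path)
    }

    fn read_file(&self, path: &Path) -> io::Result<String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such file"))
    }
}

/// Wraps another loader and rewrites each source before handing it on.
struct Prepend<L> {
    inner: L,
    header: String,
}

impl<L: Loader> Loader for Prepend<L> {
    fn file_exists(&self, path: &Path) -> bool {
        self.inner.file_exists(path)
    }

    fn read_file(&self, path: &Path) -> io::Result<String> {
        let source = self.inner.read_file(path)?;
        Ok(format!("{}{}", self.header, source))
    }
}
```

Wrapping a `MemoryFs` in `Prepend` gives a compiler front end sources that never touched the disk and that were edited on the way in.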
----
## 22 second introduction to queries
```graphviz
digraph {
mir_built -> compute_hir
layout_of -> typeck
optimized_mir -> analyzed_mir -> mir_built
const_eval -> mir_for_ctfe -> analyzed_mir
codegen -> optimized_mir -> layout_of
codegen -> const_eval -> layout_of
late_lints -> compute_hir
typeck -> compute_hir
}
```
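
One way to read the graph: an arrow means "asks for", and each result is computed at most once. A toy sketch of that demand-driven, memoized evaluation in standalone Rust follows. `Ctxt` and `query` are hypothetical and the `u32` results are placeholders. Only the query names are borrowed from the graph.

```rust
use std::collections::HashMap;

// Toy model only: `Ctxt` and `query` are hypothetical, not rustc's query system.

/// Caches query results by name and records which providers actually ran.
struct Ctxt {
    cache: HashMap<&'static str, u32>,
    executed: Vec<&'static str>,
}

impl Ctxt {
    fn new() -> Self {
        Ctxt { cache: HashMap::new(), executed: Vec::new() }
    }

    /// Returns the cached result, or runs the provider once and caches it.
    fn query(&mut self, name: &'static str, provider: fn(&mut Ctxt) -> u32) -> u32 {
        if let Some(&cached) = self.cache.get(name) {
            return cached;
        }
        let value = provider(self);
        self.executed.push(name);
        self.cache.insert(name, value);
        value
    }
}

/// Leaf query: depends on nothing.
fn compute_hir(_cx: &mut Ctxt) -> u32 {
    1
}

/// Depends on `compute_hir`.
fn typeck(cx: &mut Ctxt) -> u32 {
    cx.query("compute_hir", compute_hir) + 1
}

/// Also depends on `compute_hir`, which is already cached by then.
fn late_lints(cx: &mut Ctxt) -> u32 {
    cx.query("compute_hir", compute_hir) + 10
}
```

Asking for `typeck` and then `late_lints` runs `compute_hir` only once. The second request is served from the cache.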
----
## `override_queries`
* access to original query
* insert new query
* modify input
* modify output
* completely replace
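
A toy sketch of "access the original, modify the output" in standalone Rust, assuming queries are plain function pointers stored in a table. `Providers`, `default_layout_of` and `padded_layout_of` are hypothetical and the numbers are made up. Only the idea of swapping an entry in a provider table is taken from the slide.

```rust
// Toy model only: this `Providers` table is hypothetical, not rustc's.

/// A table of query providers, one function pointer per query.
struct Providers {
    layout_of: fn(u64) -> u64,
}

/// The "original" provider: the size is used as-is.
fn default_layout_of(size: u64) -> u64 {
    size
}

fn default_providers() -> Providers {
    Providers { layout_of: default_layout_of }
}

/// Replacement provider: calls the original, then rounds up to a multiple of 8.
fn padded_layout_of(size: u64) -> u64 {
    let original = default_layout_of(size);
    (original + 7) / 8 * 8
}

/// The override hook: swap one entry, leave the rest of the table alone.
fn override_queries(providers: &mut Providers) {
    providers.layout_of = padded_layout_of;
}
```

Everything that asks the table for `layout_of` after `override_queries` has run sees the padded result, without knowing the provider was swapped.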
----
![wow](https://media.giphy.com/media/l0ExsURGF4fthsLJe/source.gif)
----
## Examples of queries to override
* modify layout computation: `layout_of`
* inject MIR optimizations: `optimized_mir`
* access MIR before borrowck: `mir_built`
* inject additional items for codegen
---
## Integrations
* driver
    * binary crate
    * uses compiler as a library
* codegen-backends (plugin)
----
## miri
* a driver
* heavily manipulates command line args
* uses after-analysis callback
    * finds `__start` symbol
    * starts evaluation at that symbol
* uses `rustc_mir::interpret`
    * also used by CTFE
    * generic MIR interpretation system
Note:
most argument manipulation is for cargo-miri integration
----
## cranelift
* a codegen backend
* calls compiler queries
* independent codegen framework
Note:
you are not forced to use rustc's
codegen_ssa framework
---
## The community helps
* [library-ification](https://smallcultfollowing.com/babysteps/blog/2020/04/09/libraryification/)
    * new abstractions for your use cases
    * compiler APIs that break less often
        * even if their internals break a lot
* integrating your feedback
    * tell us about your compiler usage
    * upstream parts of your project
Note:
abstractions aren't required to be useful for the compiler, just consistent with the rest of the APIs
----
## Library-ification
* split parts of the compiler out into crates on crates.io
    * allow them to be reused for many purposes
* example crates:
    * chalk -- handles trait solving
    * polonius -- handles borrow checking
    * rust-analyzer uses chalk, for example
* working towards a generic definition of Rust types that includes full Rust static analysis
* you can help!
Note:
I skimmed over this a bit like it wasn't important
----
## Summary
* rustc is a library
* we want to make it more of a library
* incrementally create your own compiler
* mentoring available for contributing