Polonius as type-checking

## Summary I think I have a model of the borrow-checker that is semi-formal and handles the same set of programs as Polonius. The model is a type-based model: it describes a set of well-typed programs. It does concern itself with an type-inference scheme, though the borrow-checking part of it is finite so "exhaust all the possibilities" would be a complete (but very inefficient) type-inference scheme. The goal is that Polonius would be a sound type inference system for it, in the sense that given the "inner results" of Polonius, it should be easy to create type inference. ## Continuation-Passing Style The model assumes that MIR is described in continuation-passing style, and that for model simplicity all MIR statements are basic-block terminators with 1 out-edge. The model also assumes that the variables that are passed are "mathematical variables" and can't have addresses taken of. To use it with a real MIR, "alloc" and "dealloc" statements are provided. Of course, for code to be compile-able, alloc and dealloc need to be used in a stack-like discipline. Validating that is pretty easy but beyond the scope of the model. I'm using `~T` as "type of alloca to T". Every statement is typed as something of the form: ∃ 'α,'β,.. LIFETIME_BOUNDS, LOCALS_TYPES_AND_BORROWS -> ( (∃ '𝞬,'δ,... LIFETIME_BOUNDS, LOCALS_TYPES_AND_BORROWS) -> ⊥ ) -> ⊥ It can have a set of free lifetimes and free types, like normal Rust code. I'll call "∃ 'α,'β,.. LIFETIME_BOUNDS, LOCALS_TYPES_AND_BORROWS" a "program-state" ``` 'a lifetime, 'b lifetime ------- 'a: 'b lifetime-bound ------ live liveness ------ dead liveness ------ imm borrow-type ------ mut borrow-type <path> self-based-path <borrow-type> borrow-type 'lt lifetime ------- 'lt <borrow-type> <path> borrow %x local-id ty type <l> liveness ('lt <borrow-type> <path> borrow)* ------ <l> %x {('lt <borrow-type> <path>)*}: ty typed-local ('a: 'b lifetime-bound)* (<l> %x: ty typed-local)* ----------- ∃ 'α,'β,.. ('a: 'b)* (<l> %x: ty)* program-state ``` Example program states: ``` These have freevars = {'r, K, V} ∃'α 'r: 'α, live %map {'α mut *self}: &'r mut HashMap<K, V>, live %key {}: K, live %0 {}: &'α mut HashMap<K, V> dead %map {}: &'r mut HashMap<K, V>, live %ret {}: &'r mut V ``` There is a magic lifetime known as 'dead to indicate dead lifetimes. Every lifetime outlives `'dead`. The lifetime bounds should be treated as a transitive closure with respect to 'x, so `'x: 'y, 'y: 'z` is the same as `'x: 'x, 'y: 'y, 'z: 'z, 'x: 'y, 'y: 'z, 'x: 'z, 'x: 'dead, 'y: 'dead, 'z: 'dead, 'dead: 'dead'` There's another rule with `'dead`: if a type is live, then `'dead` must not outlive any lifetime in the type. So a statement has a type of form ``` A program-state B1..Bn program-state X untyped-statement with n successors ------------- X: A -> (B1 -> ⊥) -> ... -> (Bn -> ⊥) -> ⊥ typed-statement ``` The rules of Rust should allow for typing statements in a rather obvious way, e.g. ``` 'x lifetime 'y lifetime A program-state A |- %b: ~T -- type of alloca to T A |- %a: &'y mut T A |- *%b has no live borrows A contains `%a: &mut` B is formed from A by adding `'x: 'y`, `'x mut *self` borrow to `%b`, removing all borrows of `*%a`, and marking `%a` as live B relates free regions the same as A ------------- %a = &mut %b: A -> (B -> ⊥) -> ⊥ ``` The interesting part comes from combining statements to a function. This is basically a sort of "letrec". It requires that all statements to be well-typed, and for the precondition of every statement to be a subtype of the postcondition of every predecessor. So now we need to define subtyping of program states. For that, do higher-ranked relating, except that the leak-check is reverse: every bound that the LHS+typing relations prove must be provable in the RHS, rather than the other way around. The reverse leak check means that if you have `'new1: 'old1: 'new2`, then you must have `'new1: 'new2` in the RHS. This means that bounds can be added in the RHS, including `'dead: 'x` bounds to kill lifetimes. Also add the rule that free regions can't be related, even if it's not required for strict soundness, for the usual reasons. The basic case of subtyping: ``` Sup: ∃'α,'β. 'α: 'β, live %x {'α mut *self}: ~u32, live %y: &'β mut u32 --- Sub: ∃'𝞬,'δ, ['𝞬: 'δ], live %x {'𝞬 mut *self}: ~u32, live %y: &'δ mut u32 When subtyping, you subtype the borrows on %x, which requires '𝞬: 'α (since the loan must be longer when subtyping) and you subtype %y, you have &'δ mut u32 <: &'β mut u32 'β: 'δ along with 'α: 'β, we have '𝞬: 'α: 'β: 'δ, which requires '𝞬: 'δ to appear. It would also be possible to force both '𝞬 and 'δ lifetimes, or just 'δ to be dead, since 'x: 'dead holds for all 'x. ``` Interesting cases: ``` Sup1: ∃'α,'β. 'r: 'α, live %map {'α mut *self}: &'r mut HashMap<K, V>, live %key {'β *self}: ~K, live %2: Option<&'α mut V>, live %1: &'β K, live %0: &'α mut HashMap<K, V> Sub1a: dead %map {'r mut *self}, live %2: Option<&'r mut V>, dead %1: &'dead K ``` Subtyping only concerns live variables, so this is OK by default. === Liveness question === I *think* the only time we care about the live/dead distinction is when an invariant value is joined. Maybe it is better to require liveness only at point of use. ``` fn foo<'a, 'b>(x: Invariant<'a>, y: Invariant<'b>) { // should this compile? let z; if * { z = x; } else { z = y; } } ``` === Semantics === When you have a syntactical model, it's interesting whether there exist simple semantics that satisfy it. I think the following semantics work for my model: you have `create_lifetime('x)` `relate('x: 'y)` `relate('dead: 'x)` intrinsics. they are created by code in the places the "subtyping" says they do. If a lifetime has `'dead: 't1 : ... : 'x"`, it is dead. Borrows hold a lock on memory until they are killed. With that semantics, both the caller and callee promise not to create new relationships between a pair of free (/'dead/'static) lifetimes while a function is having them as a parameter. The idea of the semantics is that you could have a proof on the lines of: 1. If a program is well-typed then it is lifetime-safe (i.e. only accesses memory via valid non-dead borrows) 2. If a program is well-typed and lifetime-safe then it is memory-safe