---
title: "Region solving in new trait solver and polonius"
tags: T-types, 2023-meetup-BRU, minutes
date: 2023-10-10
url: https://hackmd.io/qd9Wp03cQVy06yOLnro2Kg
---
# Region solving in new trait solver and polonius
## Background
See also the [rustc-dev-guide chapter on higher-ranked region inference](https://rustc-dev-guide.rust-lang.org/borrow_check/region_inference/placeholders_and_universes.html?highlight=universe#what-is-a-universe).
### Higher-ranked trait bounds
A **higher-ranked trait bound** is something like `for<'a> T: SomeTrait<'a>`. We currently have some rudimentary structure for solving these, but it is quite simplistic. When we have to prove a goal like this, we begin by incrementing the **universe counter** in the environment, thus creating a fresh universe `U` that never existed before. Then we introduce **placeholders**, or fresh, universally quantified variables, associated with that universe `U` (we write `!X` to indicate a free placeholder variable). In the case of `T: SomeTrait<'a>`, this would mean that we have `T: SomeTrait<!a>`, where `!a` is the fresh placeholder we created for `'a`. Then we can recursively try to prove that.
Now, imagine that there is an impl of `SomeTrait` like so:
```rust
// Lifetime parameters must be declared before type parameters.
impl<'a, T> SomeTrait<'a> for T
where
    T: 'a,
{}
```
In this case, to prove `T: SomeTrait<!a>` we would have to prove `T: !a`. In the current solver, these outlives bounds are accumulated into a big list in the `InferCtxt`. We will see that some parts of the system as it is implemented today distinguish between *type-outlives* bounds (`T: 'r`) and *region-outlives* bounds (`'r1: 'r2`) and handle them slightly differently, although they are conceptually the same. Either way, the trait solver currently defers proving all outlives bounds and instead propagates them outward as constraints for the region check to prove later. (Well, almost, as we will see when we get to the leak check.) So proving `T: SomeTrait<!a>` would succeed but lodge the constraint `T: !a` in the environment.
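As a user-level illustration, here is a minimal sketch of code that exercises such a higher-ranked bound. The helper names `needs_hr` and `demo`, and the choice of `u32`, are hypothetical and not taken from the notes.

```rust
// Hypothetical names, for illustration only.
trait SomeTrait<'a> {}

// The impl from above: `T: SomeTrait<'a>` holds whenever `T: 'a`.
impl<'a, T> SomeTrait<'a> for T where T: 'a {}

// Calling this requires proving the higher-ranked bound
// `for<'a> T: SomeTrait<'a>`, i.e. `T: SomeTrait<!a>` for a placeholder `!a`.
fn needs_hr<T>() -> usize
where
    for<'a> T: SomeTrait<'a>,
{
    std::mem::size_of::<T>()
}

// `u32` contains no borrowed data, so the deferred constraint `u32: !a`
// is satisfiable and this call is accepted.
fn demo() -> usize {
    needs_hr::<u32>()
}
```

By contrast, a call like `needs_hr::<&'x u32>()` for a non-`'static` `'x` would be expected to fail, because the deferred constraint `&'x u32: !a` cannot hold for an arbitrary placeholder.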
### Higher-ranked types and subtyping
Higher-ranked constraints can also arise from subtyping. Consider these two types:
* `T1 = for<'a, 'b> fn(&'a u32, &'b u32)`
* `T2 = for<'b> fn(&'static u32, &'b u32)`
In Rust, `T1 <: T2`, but why is that and how do we compute it? The algorithm for relating two such types works as follows:
* Instantiate all the higher-ranked regions in `T2` with placeholder variables in a fresh universe, `U`, so that we have `T2' = fn(&'static u32, &'!b u32)`.
* Instantiate all the higher-ranked regions in `T1` with existential variables in `U`, so that we have `T1' = fn(&'?a u32, &'?b u32)`.
* Relate `T1' <: T2'` (function arguments are contravariant, so the argument types are related in the reverse direction):
    * `fn(&'?a u32, &'?b u32) <: fn(&'static u32, &'!b u32)`
        * `&'static u32 <: &'?a u32`
            * `'static : '?a`
        * `&'!b u32 <: &'?b u32`
            * `'!b : '?b` (note that these are distinct regions)
These outlives relations would then be accumulated into the inference context as well (and considered to hold, provisionally).
Interestingly, this means that the *subtyping code proper* can't distinguish some cases where subtyping is in fact an error. For example: `fn(&'static u32) <: for<'a> fn(&'a u32)` would give rise to a constraint `'!a : 'static`, which is in fact unsolvable. It's just not true that any arbitrary region `'!a` outlives `'static`. But the subtyping engine would just accumulate that outlives constraint and deem it true.
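The `T1 <: T2` relation can be observed from ordinary Rust code. The sketch below mirrors the two types above, with a `-> u32` return type added purely so that the result can be inspected. The names `add`, `upcast`, `ONE`, and `demo` are hypothetical.

```rust
// `T1` and `T2` mirror the types above, plus an illustrative return type.
type T1 = for<'a, 'b> fn(&'a u32, &'b u32) -> u32;
type T2 = for<'b> fn(&'static u32, &'b u32) -> u32;

fn add(x: &u32, y: &u32) -> u32 {
    *x + *y
}

// Accepted because `T1 <: T2`: a function that works for any `'a`
// in particular works for `'a = 'static`.
fn upcast(f: T1) -> T2 {
    f
}

static ONE: u32 = 1;

fn demo() -> u32 {
    let g: T2 = upcast(add);
    let local = 41;
    // The first argument must be `&'static`, the second may be any borrow.
    g(&ONE, &local)
}
```

The reverse direction, returning a `fn(&'static u32)` where a `for<'a> fn(&'a u32)` is expected, is the unsolvable `'!a : 'static` case described above and is rejected once the region constraints are actually checked.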
### Universes™ and the leak check™
So when do we figure out whether the trait relation actually holds, or whether the subtyping is correct? The answer is a bit messy. =) It turns out that we do this in two ways.
The first is something called the **leak check**. You can think of it as a "quick and dirty" approximation for the region check, which will come later. The leak check detects some kinds of errors early, essentially deciding between "this set of outlives constraints is guaranteed to result in an error eventually" and "this set of outlives constraints may be solvable".
The leak check interface runs on the inference context and it takes a universe as argument:
```rust
impl InferCtxt {
    fn leak_check(&self, u: Universe) -> LeakCheckResult { ... }
}

enum LeakCheckResult {
    False,
    MaybeTrue,
}
```
The leak check is looking for two kinds of scenarios, both of which are guaranteed to be an error:
* `'!p1 : '!p2`, where `'!p1` and `'!p2` are different placeholders in the universe `u`
* `'!p : '?e`, where `'!p` is a placeholder in the universe `u` and `'?e` is an inference variable in some parent universe of `u`
Note that these scenarios can also occur indirectly, i.e., you may have `'!p1 : '?e1` and `'?e1 : '?e2` and `'?e2 : '!p2`, which means that (by transitivity) `'!p1 : '!p2` must hold. The leak check would discover this too.
Note that `'?e: '!p` is NOT an error, even if `'?e` is in a universe that cannot name `'!p`. That is because `'?e` could always be `'static`, which is in the root universe and which outlives everything.
**The leak check does NOT look at type outlives**. So you could have a type outlives like `&'!a u32: '!b` and the leak check would pass.
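The two scenarios, including the transitive case, can be sketched as a reachability check over region-outlives constraints. This is a toy model and not the rustc implementation: the `Region` type, the encoding of universes as integers (a parent universe being a smaller number), and the constraint list are all assumptions made for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Region {
    Placeholder { id: u32, universe: u32 },
    Infer { id: u32, universe: u32 },
}

#[derive(Debug, PartialEq, Eq)]
enum LeakCheckResult {
    False,
    MaybeTrue,
}

/// Toy leak check. `constraints` holds pairs `(a, b)` meaning `a: b`.
/// Assumption: universe `p` is a parent of `u` exactly when `p < u`.
fn leak_check(constraints: &[(Region, Region)], u: u32) -> LeakCheckResult {
    for &(start, _) in constraints {
        // Only placeholders of universe `u` are starting points.
        let Region::Placeholder { universe, .. } = start else { continue };
        if universe != u {
            continue;
        }
        // Walk everything `start` is (transitively) required to outlive.
        let mut seen = vec![start];
        let mut stack = vec![start];
        while let Some(r) = stack.pop() {
            for &(a, b) in constraints {
                if a != r || seen.contains(&b) {
                    continue;
                }
                match b {
                    // Scenario 1: a different placeholder of universe `u`.
                    Region::Placeholder { universe: ub, .. } if ub == u => {
                        return LeakCheckResult::False;
                    }
                    // Scenario 2: an inference variable in a parent universe.
                    Region::Infer { universe: ue, .. } if ue < u => {
                        return LeakCheckResult::False;
                    }
                    _ => {}
                }
                seen.push(b);
                stack.push(b);
            }
        }
    }
    LeakCheckResult::MaybeTrue
}
```

Constraints of the form `'?e: '!p` never start a walk in this sketch, which matches the observation that they are not leak-check errors.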
#### Where the leak check is used
We run the leak check in two specific places:
* Comparing candidates in trait selection
* Coherence
In the case of trait selection, at the end of `evaluate_candidate` for a particular candidate, we will run the leak check. This detects gross mismatches. For example, consider these two impls, which appear in a number of crates, and notably in wasm-bindgen:
```rust
impl Trait for for<'a> fn(&'a u32) {}
impl<'a> Trait for fn(&'a u32) {}
```
Imagine that we are checking whether `for<'a> fn(&'a u32): Trait`. The second impl is in fact inapplicable. That's because we'd have to equate the self type `fn(&?a u32)` (i.e., `fn(&u32)` for some free lifetime `?a`) with `for<'a> fn(&'a u32)`. Equating means doing subtyping in both directions, and `fn(&?a u32) <: for<'a> fn(&'a u32)` does not hold. If you work through the algorithm, though, it succeeds, but it creates an invalid outlives relation `!a : ?a`. This doesn't create an immediate error; it just gets enqueued in the inference context as an outlives relation. So from the perspective of the trait solver, there would be two viable candidates. But when we run the leak check, we see that `!a : ?a` is required, so the second candidate is rejected.
The other side of the coin then is coherence. When coherence runs, it will try to unify those two impls in essentially the same way. Coherence doesn't run the full region check, so it would consider those two impls to be overlapping, except that it runs the leak check. The leak check determines that the two impls cannot both apply to the same types, and so coherence will decide that the two impls are distinct.
### Region check
OK, so we covered the leak check, but we mentioned that it's only an approximation for the full region check. How does the full region check work?
#### Simplifying "type outlives" today
The first step is that it has to simplify "type outlives" constraints. The code for this lives here:
https://github.com/rust-lang/rust/blob/5c37696f6026f91a869d51ab555cb0efae488972/compiler/rustc_infer/src/infer/outlives/obligations.rs#L216-L228
Most of the time, type outlives can be trivially converted either into simpler type outlives or into region outlives. For example, `&'a T: 'b` can be converted into `'a: 'b` and `T: 'b` (we actually leave out the second one, because we require that `T: 'a`, and it follows by transitivity).
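A minimal sketch of this simplification step, using a toy type representation rather than rustc's: the `Ty` and `Constraint` enums, the string-based region names, and the component-wise tuple rule are illustrative assumptions. Type parameters are left as pending constraints because they need the environment, as discussed next.

```rust
#[derive(Debug)]
enum Ty {
    U32,
    Ref(&'static str, Box<Ty>), // `&'r T`
    Tuple(Vec<Ty>),
    Param(&'static str), // a type parameter such as `X`
}

#[derive(Debug, PartialEq)]
enum Constraint {
    RegionOutlives(&'static str, &'static str), // `'a: 'b`
    ParamOutlives(&'static str, &'static str),  // `X: 'b`, needs the environment
}

/// Simplify `ty: bound` into simpler constraints, appended to `out`.
fn simplify(ty: &Ty, bound: &'static str, out: &mut Vec<Constraint>) {
    match ty {
        // Scalars outlive everything: nothing to record.
        Ty::U32 => {}
        // `&'a T: 'b` becomes `'a: 'b`; `T: 'b` is skipped because it
        // follows from the well-formedness requirement `T: 'a`.
        Ty::Ref(region, _referent) => {
            out.push(Constraint::RegionOutlives(*region, bound));
        }
        // Assumed rule for this sketch: each component must outlive `bound`.
        Ty::Tuple(parts) => {
            for part in parts {
                simplify(part, bound, out);
            }
        }
        // Cannot be decomposed further without looking at the environment.
        Ty::Param(name) => out.push(Constraint::ParamOutlives(*name, bound)),
    }
}
```

For example, simplifying `(&'a X, u32): 'b` yields the single region-outlives constraint `'a: 'b`.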
There are two tricky cases.
**One of them is type parameters.** Proving something like `X: ?a` ultimately has to rely on something from the environment, because we don't know what `X` is. So imagine we have `fn foo<'b, X>() where X: 'b` -- in that case, we could convert `X: ?a` into the region constraint `'b: ?a`[^1].
[^1]: while we could, we currently don't; the algorithm winds up deciding there is ambiguity because we also include an implicit bound on `X` representing just the function body (but really that `'b` is known to outlive that implicit bound also, so we could ignore it, we just don't). Also, it wouldn't help our inference algorithm, which given `A: B` flows data from B into A (and the inference variable here is on the RHS, so no data flows into it).
But what about this case? `fn foo<'b, 'c, X>() where X: 'b + 'c`? It's not clear then what we should do, since either `'b: ?a` or `'c: ?a` would suffice. What we do is to convert to a **`Verify`**, which is a funny name but refers to "some kind of constraint that is too complex for the region checker to understand, but which it should check must hold by the end". The idea is that, often, `?a` will be constrained by other things to have some value `V`, and so we can just check at the end whether `'b: V` or `'c: V`. If one or both is true, great. If not, we'll error out. This isn't ideal, because maybe we could have picked a different value `V'` for `?a` that satisfied ALL the constraints, but it avoids us making arbitrary choices.
The next complex case is **alias types** (like associated types). They are similar to type parameters, but with a few more ways to prove them to be true. In this case, there are four ways to prove an outlives like `<X as Trait>::Item<Y> : 'a`:
* **Normalize**: you can normalize `<X as Trait>::Item<Y>` to another type `T` and then prove that `T: 'a` ([in-progress a-mir-formality code](https://github.com/nikomatsakis/a-mir-formality-ndm/blob/f16ef10aaea38a194d126a1698561656e4514348/crates/formality-prove/src/prove/prove_outlives.rs#L109-L123))
* **Env Bounds**: you may have something in the environment like `where <X as Trait>::Item<Y>: 'b`, in which case if you can prove that `'b: 'a`, you are all set. (this falls out from [checking the env](https://github.com/nikomatsakis/a-mir-formality-ndm/blob/f16ef10aaea38a194d126a1698561656e4514348/crates/formality-prove/src/prove/prove_wc.rs#L50-L55) in a-mir-formality)
* **Declared Item Bounds**: similarly maybe the trait declares some bounds, like `trait Foo { type Item<Y>: 'static }`. In this case, since we know that `'static: 'a`, we are all set (this falls out from [implied bounds](https://github.com/nikomatsakis/a-mir-formality-ndm/blob/f16ef10aaea38a194d126a1698561656e4514348/crates/formality-prove/src/prove/prove_wc.rs#L88-L95) in a-mir-formality, or should)
* **Fallback**: If none of the above apply, we can prove the rule by requiring that all the input components outlive `'a`, e.g. `X: 'a` and `Y: 'a` ([in-progress a-mir-formality code](https://github.com/nikomatsakis/a-mir-formality-ndm/blob/f16ef10aaea38a194d126a1698561656e4514348/crates/formality-prove/src/prove/prove_outlives.rs#L67-L80))
The final tricky case is an unresolved inference variable `?X: 'b`. This is hard because we don't know what `?X` is. But this doesn't come up in the type outlives code today, because it runs after type check has completed, and all inference variables that matter have values (or you get an error for other reasons). This case *can* come up in the proposed end-design, though.
#### Region solver
The region solver itself basically takes in two kinds of constraints:
* **Outlives bounds** like `'r1: 'r2`.
* **Verify constraints** like the ones we discussed above.
It uses the outlives bounds to find a value `V_r` for each region variable `r`. And then it checks those values to see that the verify constraints hold.
Finding values is pretty simple, but it does have to be a bit clever around cross-universe outlives bounds. If you have `?e : !p` and `?e` is in a smaller universe than `!p` (so `?e` cannot name `!p`), then you can convert that to `?e : 'static`.
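The two phases (compute values from outlives bounds, then check verifies) can be sketched as a small fixed-point computation. This is a toy model under stated assumptions: region values are represented as sets of abstract points, `a: b` means the value of `a` must contain the value of `b`, and regions are identified by index. The names `solve` and `verify_any` are hypothetical, and cross-universe handling is left out.

```rust
use std::collections::BTreeSet;

// Assumption: a region's value is a set of abstract "points".
type Value = BTreeSet<u32>;

/// `outlives` holds pairs `(a, b)` meaning `a: b`, so data flows from `b`
/// into `a`. Iterates until no value grows any further.
fn solve(mut values: Vec<Value>, outlives: &[(usize, usize)]) -> Vec<Value> {
    let mut changed = true;
    while changed {
        changed = false;
        for &(a, b) in outlives {
            let from_b = values[b].clone();
            let before = values[a].len();
            values[a].extend(from_b);
            changed |= values[a].len() != before;
        }
    }
    values
}

/// Verify constraint: at least one region in `bounds` must outlive `r`,
/// i.e. contain every point in the value computed for `r`.
fn verify_any(values: &[Value], bounds: &[usize], r: usize) -> bool {
    bounds.iter().any(|&b| values[b].is_superset(&values[r]))
}
```

In this sketch a verify is only checked against the values the solver happened to compute, which is why a failure does not prove that no solution exists.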
## Expected outcomes
* a plan for how we will handle region solving in the short term and, to a lesser degree, in the medium and long term, especially around higher-ranked stuff
* leak check future compat lint
## Key discussion topics
### lint in coherence
When we use the leak check in coherence, we currently emit a lint, because we had hoped to remove the leak check. Removing it is no longer possible, so we can remove the lint instead.
### What we can do for polonius shorter term
Make a variation of leak check that doesn't have to care about inference variables (because it runs after type outlives have been transformed) and which converts from `?e: !p` to `?e: 'static` -- but we have to be careful about diagnostics. *What about verify bounds?*
The current code is there because we hoped to get rid of leak check and just move *everything* to region check.
### How lcnr and I think it should work eventually
ideally the trait solver would convert placeholder relations into root bounds by figuring out the bounds
*But*:
* leak check also has to handle uninferred type variables and other sorts of type outlives
* Example from lcnr:
* `query(for<'a> ?X::Item<'a>: 'a)`
* we have a type outlives that `?X::Item<'a>: 'a`
* try to normalize `?X::Item`
### Incompleteness in higher-ranked inference
- There is a gap between the bounds (which MUST be true), they can't express disjunction, so we have VERIFIES -- but sometimes we fail to find a solution that exists
- This doesn't seem to come up a lot but could cause us to reject programs we should accept
- IF we used regionck during coherence, because there we are only accepting the program if there is an error, this would be *UNSOUND*, unless we distinguish these cases with an "ambiguous" failure
### Asserting that we don't mix universes with generational counters
* we can decrement but it could give rise to funky bugs
* we could include parity information on the universe if we wanted (e.g., a generational counter)
## Notes from the meetup itself