---
tags: reference
---
# Polonius Rules
## Big picture
* Initialization analysis
    * computes which paths are initialized at each CFG node
    * reports errors for things that are used when not necessarily initialized
* Liveness analysis
    * computes which variables may be used in the future at each CFG node, and how (fully used or only dropped)
    * draws upon initialization to make these results more precise
    * this in turn suggests which *origins* are "live" (meaning that they may be dereferenced)
* Loan analysis
    * computes which loans may be referenced by each origin
    * combined with liveness results, this suggests which *loans* are live
    * paths borrowed by a live loan cannot be modified
## Naming conventions
* `foo_at(..., Node)` -- indicates a relation that occurs at a particular node
* `foo_on_entry(..., Node)` -- indicates a relation that holds on entry to a given node, typically computed over all paths through the graph
* `foo_on_exit(..., Node)` -- indicates a relation that holds on exit from a given node, typically computed over all paths through the graph
* we also try to use distinct names for "instantaneous facts" and "computed relations" (e.g., assigned vs. initialized)
* we try to embed the types of the parameters in the name (modulo nodes, which use the "at" vs. "on" convention)
    * e.g., `path_assigned_at` rather than `assigned_at`
## Control-flow graph
This section contains inputs that are common to all the other analyses.
### Inputs
```prolog
.decl cfg_edge(SourceNode:node, TargetNode:node)
.input cfg_edge
```
### Relations
Enumerates all nodes. Since nodes are derived only from edges, a graph consisting of a single node with no edges cannot be represented.
```prolog
.decl cfg_node(Node:node)
cfg_node(Node) :- cfg_edge(Node, _).
cfg_node(Node) :- cfg_edge(_, Node).
```
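To make the two rules concrete, here is a minimal Rust sketch that derives the node set from a list of edges. It assumes nodes are plain `u32` indices; the function name `cfg_nodes` is hypothetical and not part of Polonius.

```rust
use std::collections::BTreeSet;

/// Assumption: CFG nodes are modelled as plain `u32` indices.
type Node = u32;

/// Mirrors the two `cfg_node` rules: every edge source and every edge target is a node.
fn cfg_nodes(cfg_edge: &[(Node, Node)]) -> BTreeSet<Node> {
    let mut nodes = BTreeSet::new();
    for &(source, target) in cfg_edge {
        nodes.insert(source); // first rule: sources of edges
        nodes.insert(target); // second rule: targets of edges
    }
    nodes
}
```

An edge-free input yields an empty set, which is exactly the single-node caveat above.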
## Initialization analysis
### Inputs
```rust
// Relates the path to the variable it begins with;
// so `path_begins_with_var(a.b, a)` is true, and so
// forth.
.decl path_begins_with_var(Path:path, Var:var)
// True if Child is a direct "subpath" of Parent.
// e.g., `parent_path(a, a.b)` would be true, but not
// `parent_path(a, a.b.c)`.
.decl parent_path(Parent:path, Child:path)
// Indicates that `Path` is assigned a value
// at the point `Node`.
//
// Important: This includes a tuple for each
// argument to the function, indicating that it is
// initialized on entry.
.decl path_assigned_at_base(Path:path, Node:node)
// Indicates that the value in `Path` is moved
// at the point `Node`.
//
// Important: This includes a tuple for each
// local variable in the MIR, indicating that it is
// "moved" (uninitialized) on entry.
.decl path_moved_at_base(Path:path, Node:node)
// Indicates that the value in `Path` is accessed
// at the point `Node`.
.decl path_accessed_at_base(Path:path, Node:node)
```
### Transitive paths
#### the ancestor relation is the transitive closure of `parent_path`
```prolog
.decl ancestor_path(Parent:path, Child:path)
ancestor_path(Parent, Child) :-
    parent_path(Parent, Child).
ancestor_path(Parent, Grandchild) :-
    ancestor_path(Parent, Child),
    parent_path(Child, Grandchild).
```
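A naive fixpoint evaluation of `ancestor_path` in Rust looks roughly like this. It is a sketch assuming paths are represented as strings such as `"a.b"`; the function name `ancestor_paths` is hypothetical.

```rust
use std::collections::BTreeSet;

/// Assumption: paths are modelled as strings like "a.b".
type Path = &'static str;

/// Naive fixpoint for the two `ancestor_path` rules.
fn ancestor_paths(parent_path: &[(Path, Path)]) -> BTreeSet<(Path, Path)> {
    // Base rule: every `parent_path` fact is an `ancestor_path` fact.
    let mut ancestor: BTreeSet<(Path, Path)> = parent_path.iter().copied().collect();
    loop {
        let mut new_facts = Vec::new();
        // Recursive rule: ancestor_path(P, C), parent_path(C, G) => ancestor_path(P, G).
        for &(parent, child) in &ancestor {
            for &(p, grandchild) in parent_path {
                if p == child && !ancestor.contains(&(parent, grandchild)) {
                    new_facts.push((parent, grandchild));
                }
            }
        }
        // Stop once an iteration derives nothing new.
        if new_facts.is_empty() {
            return ancestor;
        }
        ancestor.extend(new_facts);
    }
}
```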
#### assigned at
If you assign the path `a`, you also assign `a.b`
```prolog
.decl path_assigned_at(Path:path, Node:node)
path_assigned_at(Path, Node) :-
    path_assigned_at_base(Path, Node).
path_assigned_at(ChildPath, Node) :-
    path_assigned_at(Path, Node),
    ancestor_path(Path, ChildPath).
```
#### moved
If you move the path `a`, you also move `a.b`
```prolog
.decl path_moved_at(Path:path, Node:node)
path_moved_at(Path, Node) :-
    path_moved_at_base(Path, Node).
path_moved_at(ChildPath, Node) :-
    path_moved_at(Path, Node),
    ancestor_path(Path, ChildPath).
```
#### accessed at
If you access the path `a`, you also access `a.b`
```prolog
.decl path_accessed_at(Path:path, Node:node)
path_accessed_at(Path, Node) :-
    path_accessed_at_base(Path, Node).
path_accessed_at(ChildPath, Node) :-
    path_accessed_at(Path, Node),
    ancestor_path(Path, ChildPath).
```
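The assigned, moved, and accessed rules above all share one shape: copy each base fact to every descendant path at the same node. A minimal Rust sketch of that shape follows, assuming `ancestor_path` has already been closed transitively (so a single pass suffices); `propagate_to_descendants` is a hypothetical helper name.

```rust
use std::collections::BTreeSet;

type Path = &'static str;
type Node = u32;

/// Extends base `(Path, Node)` facts to every descendant of `Path` at the same node.
/// Because `ancestor_path` is assumed to be transitively closed, one pass reaches the fixpoint.
fn propagate_to_descendants(
    base: &[(Path, Node)],
    ancestor_path: &[(Path, Path)],
) -> BTreeSet<(Path, Node)> {
    // First rule: every base fact holds.
    let mut result: BTreeSet<(Path, Node)> = base.iter().copied().collect();
    // Second rule: the fact also holds for each descendant of the path.
    for &(path, node) in base {
        for &(ancestor, child) in ancestor_path {
            if ancestor == path {
                result.insert((child, node));
            }
        }
    }
    result
}
```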
### computing "maybe" initialized
Here we compute the set of paths that *may* contain a value on exit from each node in the CFG. This is used later in the liveness analysis: if a value is not even maybe initialized (e.g., it has been moved on every path), then dropping it is a no-op.
This is not used to compute move errors -- it would be too "optimistic", since it only computes if a value *may* be initialized. See the next section on computing *uninitialization*.
```prolog
.decl path_maybe_initialized_on_exit(Path:path, Node:node)
path_maybe_initialized_on_exit(Path, Node) :-
    path_assigned_at(Path, Node).
path_maybe_initialized_on_exit(Path, TargetNode) :-
    path_maybe_initialized_on_exit(Path, SourceNode),
    cfg_edge(SourceNode, TargetNode),
    !path_moved_at(Path, TargetNode).
```
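This is a classic forward "gen/kill" dataflow: assignments generate facts, moves at the target stop propagation. A minimal Rust sketch of the fixpoint, assuming string paths and `u32` nodes (the function name is hypothetical):

```rust
use std::collections::BTreeSet;

type Path = &'static str;
type Node = u32;

/// Forward fixpoint for `path_maybe_initialized_on_exit`.
fn maybe_initialized_on_exit(
    path_assigned_at: &[(Path, Node)],
    path_moved_at: &[(Path, Node)],
    cfg_edge: &[(Node, Node)],
) -> BTreeSet<(Path, Node)> {
    let moved: BTreeSet<(Path, Node)> = path_moved_at.iter().copied().collect();
    // First rule: a path assigned at a node may be initialized on exit from it.
    let mut result: BTreeSet<(Path, Node)> = path_assigned_at.iter().copied().collect();
    let mut changed = true;
    while changed {
        changed = false;
        let current: Vec<(Path, Node)> = result.iter().copied().collect();
        for (path, source) in current {
            for &(s, target) in cfg_edge {
                // Second rule: follow the edge unless the path is moved at the target.
                if s == source && !moved.contains(&(path, target)) {
                    changed |= result.insert((path, target));
                }
            }
        }
    }
    result
}
```

In a diamond where `a` is moved on only one branch, the join node still ends up "maybe initialized", because one incoming path suffices.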
We also compute which **variables** may be initialized (or at least partly initialized). Drops of variables that are not even partly initialized are known to be no-ops.
```prolog
.decl var_maybe_partly_initialized_on_exit(Var:var, Node:node)
var_maybe_partly_initialized_on_exit(Var, Node) :-
    path_maybe_initialized_on_exit(Path, Node),
    path_begins_with_var(Path, Var).
```
### computing uninitialization
Here we compute the set of paths that are maybe *uninitialized* on exit from a node. Naturally, it would be illegal to access a path that is maybe uninitialized.
We compute "maybe uninitialized" rather than its complement, "must be initialized", because the latter requires intersecting facts over all incoming paths, and intersection is not available in "core Datalog". It may make sense, as an optimization, to try converting to intersection, although it is debatable which formulation results in more tuples overall.
```prolog
.decl path_maybe_uninitialized_on_exit(Path:path, Node:node)
path_maybe_uninitialized_on_exit(Path, Node) :-
    path_moved_at(Path, Node).
path_maybe_uninitialized_on_exit(Path, TargetNode) :-
    path_maybe_uninitialized_on_exit(Path, SourceNode),
    cfg_edge(SourceNode, TargetNode),
    !path_assigned_at(Path, TargetNode).
```
### computing move errors
```prolog
.decl move_error(Path:path, Node:node)
move_error(Path, TargetNode) :-
    path_maybe_uninitialized_on_exit(Path, SourceNode),
    cfg_edge(SourceNode, TargetNode),
    path_accessed_at(Path, TargetNode).
```
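The two pieces above (maybe-uninitialized propagation and the move-error join) can be sketched in Rust as follows. This assumes string paths and `u32` nodes; both function names are hypothetical.

```rust
use std::collections::BTreeSet;

type Path = &'static str;
type Node = u32;

/// `path_maybe_uninitialized_on_exit`: moves generate facts, assignments stop propagation.
fn maybe_uninitialized_on_exit(
    path_moved_at: &[(Path, Node)],
    path_assigned_at: &[(Path, Node)],
    cfg_edge: &[(Node, Node)],
) -> BTreeSet<(Path, Node)> {
    let assigned: BTreeSet<(Path, Node)> = path_assigned_at.iter().copied().collect();
    let mut result: BTreeSet<(Path, Node)> = path_moved_at.iter().copied().collect();
    let mut changed = true;
    while changed {
        changed = false;
        let current: Vec<(Path, Node)> = result.iter().copied().collect();
        for (path, source) in current {
            for &(s, target) in cfg_edge {
                if s == source && !assigned.contains(&(path, target)) {
                    changed |= result.insert((path, target));
                }
            }
        }
    }
    result
}

/// `move_error`: a path that may be uninitialized on exit from a predecessor is accessed.
fn move_errors(
    maybe_uninit: &BTreeSet<(Path, Node)>,
    path_accessed_at: &[(Path, Node)],
    cfg_edge: &[(Node, Node)],
) -> BTreeSet<(Path, Node)> {
    let mut errors = BTreeSet::new();
    for &(path, source) in maybe_uninit {
        for &(s, target) in cfg_edge {
            if s == source && path_accessed_at.contains(&(path, target)) {
                errors.insert((path, target));
            }
        }
    }
    errors
}
```

Note that a move on just one branch is enough to report an access at the join node: this is why "maybe uninitialized" rather than "maybe initialized" drives the errors.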
## Liveness analysis
The role of the liveness computation is to figure out, for each cfg node, which variables may be accessed at some point in the future. We also distinguish between variables that may be accessed in general and those that may only be dropped. This is because a "full access" may potentially dereference any reference found in the variable, whereas a drop is more limited in its effects.
One interesting wrinkle around drops is that we also need to consider the initialization state of each variable. This is because `Drop` statements can be added for variables which are never initialized, or whose values have been moved. Such statements are considered no-ops in MIR.
### Inputs
```rust
// Variable is used at the given CFG node
.decl var_used_at(Variable:var, Node:node)
// Variable is defined (overwritten) at the given CFG node
.decl var_defined_at(Variable:var, Node:node)
// Variable is dropped at this cfg node
.decl var_dropped_at(Variable:var, Node:node)
// References with the given origin may be
// dereferenced when the variable is used.
//
// In rustc, we generate this whenever the
// type of the variable includes the given
// origin.
.decl use_of_var_derefs_origin(Variable:var, Origin:origin)
// References with the given origin may be
// dereferenced when the variable is dropped
//
// In rustc, we generate this by examining the type
// and taking into account various
// unstable attributes. It is always a subset
// of `use_of_var_derefs_origin`.
.decl drop_of_var_derefs_origin(Variable:var, Origin:origin)
```
### Variables that are live on entry
```prolog
.decl var_live_on_entry(Var:var, Node:node)
var_live_on_entry(Var, Node) :-
    var_used_at(Var, Node).
var_live_on_entry(Var, SourceNode) :-
    var_live_on_entry(Var, TargetNode),
    cfg_edge(SourceNode, TargetNode),
    !var_defined_at(Var, SourceNode).
```
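Liveness flows *backwards* along CFG edges. A minimal Rust sketch of the two rules, assuming string variable names and `u32` nodes (function name hypothetical):

```rust
use std::collections::BTreeSet;

type Var = &'static str;
type Node = u32;

/// Backward fixpoint for the two `var_live_on_entry` rules.
fn var_live_on_entry(
    var_used_at: &[(Var, Node)],
    var_defined_at: &[(Var, Node)],
    cfg_edge: &[(Node, Node)],
) -> BTreeSet<(Var, Node)> {
    let defined: BTreeSet<(Var, Node)> = var_defined_at.iter().copied().collect();
    // First rule: a variable is live on entry to a node that uses it.
    let mut live: BTreeSet<(Var, Node)> = var_used_at.iter().copied().collect();
    let mut changed = true;
    while changed {
        changed = false;
        let current: Vec<(Var, Node)> = live.iter().copied().collect();
        for (var, target) in current {
            for &(source, t) in cfg_edge {
                // Second rule: liveness flows to a predecessor unless it (re)defines the variable.
                if t == target && !defined.contains(&(var, source)) {
                    changed |= live.insert((var, source));
                }
            }
        }
    }
    live
}
```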
### Variables that are "drop live" on entry
The initial rule is that, when a variable is dropped, that makes it drop-live -- unless we know that the variable is fully uninitialized, in which case the drop is a no-op.
```prolog
.decl var_drop_live_on_entry(Var:var, Node:node)
var_drop_live_on_entry(Var, TargetNode) :-
    var_dropped_at(Var, TargetNode),
    cfg_edge(SourceNode, TargetNode),
    var_maybe_partly_initialized_on_exit(Var, SourceNode).
var_drop_live_on_entry(Var, SourceNode) :-
    var_drop_live_on_entry(Var, TargetNode),
    cfg_edge(SourceNode, TargetNode),
    !var_defined_at(Var, SourceNode),
    var_maybe_partly_initialized_on_exit(Var, SourceNode).
```
**Optimization:** In rustc, we compute drop-live only up to the point where something becomes "use-live". We could do the same here by adding some `!` checks against `var_live_on_entry`, though it would require stratification in the datalog (not a problem).
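A minimal Rust sketch of the two drop-liveness rules follows, without the optimization just described. It assumes `var_maybe_partly_initialized_on_exit` has already been computed; the function name is hypothetical.

```rust
use std::collections::BTreeSet;

type Var = &'static str;
type Node = u32;

fn var_drop_live_on_entry(
    var_dropped_at: &[(Var, Node)],
    var_defined_at: &[(Var, Node)],
    var_maybe_partly_initialized_on_exit: &[(Var, Node)],
    cfg_edge: &[(Node, Node)],
) -> BTreeSet<(Var, Node)> {
    let defined: BTreeSet<(Var, Node)> = var_defined_at.iter().copied().collect();
    let maybe_init: BTreeSet<(Var, Node)> =
        var_maybe_partly_initialized_on_exit.iter().copied().collect();
    let mut live = BTreeSet::new();
    // First rule: a drop makes the variable drop-live, unless it is fully
    // uninitialized on exit from every predecessor (then the drop is a no-op).
    for &(var, target) in var_dropped_at {
        for &(source, t) in cfg_edge {
            if t == target && maybe_init.contains(&(var, source)) {
                live.insert((var, target));
            }
        }
    }
    // Second rule: propagate backwards through predecessors that do not define
    // the variable and where it may still be (partly) initialized.
    let mut changed = true;
    while changed {
        changed = false;
        let current: Vec<(Var, Node)> = live.iter().copied().collect();
        for (var, target) in current {
            for &(source, t) in cfg_edge {
                if t == target
                    && !defined.contains(&(var, source))
                    && maybe_init.contains(&(var, source))
                {
                    changed |= live.insert((var, source));
                }
            }
        }
    }
    live
}
```

Here a variable `y` that is dropped but never initialized never becomes drop-live at all.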
### Origins that are "live"
An origin is live at the node N if some reference with that origin may be dereferenced in the future.
```prolog
.decl origin_live_on_entry(Origin:origin, Node:node)
origin_live_on_entry(Origin, Node) :-
    var_live_on_entry(Var, Node),
    use_of_var_derefs_origin(Var, Origin).
origin_live_on_entry(Origin, Node) :-
    var_drop_live_on_entry(Var, Node),
    drop_of_var_derefs_origin(Var, Origin).
```
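Both rules are plain joins on the variable. A minimal Rust sketch, assuming origins are strings like `"'a"`; `join_on_var` and `origin_live_on_entry` are hypothetical names.

```rust
use std::collections::BTreeSet;

type Var = &'static str;
type Origin = &'static str;
type Node = u32;

/// Joins `(Var, Node)` liveness facts with `(Var, Origin)` facts into `(Origin, Node)`.
fn join_on_var(live: &[(Var, Node)], derefs: &[(Var, Origin)], out: &mut BTreeSet<(Origin, Node)>) {
    for &(var, node) in live {
        for &(v, origin) in derefs {
            if v == var {
                out.insert((origin, node));
            }
        }
    }
}

fn origin_live_on_entry(
    var_live_on_entry: &[(Var, Node)],
    use_of_var_derefs_origin: &[(Var, Origin)],
    var_drop_live_on_entry: &[(Var, Node)],
    drop_of_var_derefs_origin: &[(Var, Origin)],
) -> BTreeSet<(Origin, Node)> {
    let mut live = BTreeSet::new();
    // First rule: a use-live variable makes every origin a use may dereference live.
    join_on_var(var_live_on_entry, use_of_var_derefs_origin, &mut live);
    // Second rule: a drop-live variable makes only the origins its drop may dereference live.
    join_on_var(var_drop_live_on_entry, drop_of_var_derefs_origin, &mut live);
    live
}
```

Because `drop_of_var_derefs_origin` is a subset of `use_of_var_derefs_origin`, a variable that is only drop-live keeps fewer origins alive.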
## Loan analysis
### Inputs
```rust
// Indicates that the given loan `Loan` was "issued" at
// the given node `Node`, creating a reference with the
// origin `Origin`.
.decl loan_issued_at(Origin:origin, Loan:loan, Node:node)
.input loan_issued_at
// Indicates that the path borrowed by the loan `Loan` has changed
// in such a way that the loan no longer needs to be tracked.
// (In particular, mutations to the path that was borrowed
// no longer invalidate the loan.)
.decl loan_killed_at(Loan:loan, Node:node)
.input loan_killed_at
// Indicates that the loan is "invalidated" by some
// action that takes place at the given node; if any
// origin that references this loan is live, that is
// an error.
.decl loan_invalidated_at(Loan:loan, Node:node)
.input loan_invalidated_at
// Indicates that `Origin1 <= Origin2` at the node `Node` -- i.e.,
// the set of loans in `Origin1` is a subset of those in `Origin2`.
.decl subset_base(Origin1:origin, Origin2:origin, Node:node)
.input subset_base
// Declares a "placeholder origin" and loan. These are
// the named lifetimes that appear on function declarations
// and the like (e.g., the `'a` in `fn foo<'a>(...)`).
.decl placeholder(O:origin, L:loan)
.input placeholder
// Declares a known subset relation between two
// placeholder origins. For example, `fn foo<'a, 'b: 'a>()`
// would have the relation `'b: 'a`. This is not transitive;
// its transitive closure is computed as `known_placeholder_subset`.
.decl known_placeholder_subset_base(Origin1:origin, Origin2:origin)
.input known_placeholder_subset_base
```
### Known contains
Computes the placeholder loans that a given placeholder origin is known to contain. This is derived from the `known_placeholder_subset` relation. (This is currently used in the `LocationInsensitive` variant to compute illegal subset relationship errors.)
```prolog
.decl placeholder_known_to_contain(O:origin, L:loan)
placeholder_known_to_contain(Origin, Loan) :-
    placeholder(Origin, Loan).
placeholder_known_to_contain(Origin2, Loan1) :-
    placeholder_known_to_contain(Origin1, Loan1),
    known_placeholder_subset(Origin1, Origin2).
```
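A minimal Rust sketch of this fixpoint, using the `fn foo<'a, 'b: 'a>()` example from the inputs: `'b: 'a` is recorded as `known_placeholder_subset('b, 'a)`, so `'a` is known to contain the placeholder loan of `'b`. Loans as `u32` and the function name are assumptions.

```rust
use std::collections::BTreeSet;

type Origin = &'static str;
type Loan = u32;

fn placeholder_known_to_contain(
    placeholder: &[(Origin, Loan)],
    known_placeholder_subset: &[(Origin, Origin)],
) -> BTreeSet<(Origin, Loan)> {
    // First rule: each placeholder origin contains its own placeholder loan.
    let mut known: BTreeSet<(Origin, Loan)> = placeholder.iter().copied().collect();
    loop {
        let mut new_facts = Vec::new();
        // Second rule: if Origin1 is a known subset of Origin2, Origin2 contains Origin1's loans.
        for &(origin1, loan) in &known {
            for &(o1, origin2) in known_placeholder_subset {
                if o1 == origin1 && !known.contains(&(origin2, loan)) {
                    new_facts.push((origin2, loan));
                }
            }
        }
        if new_facts.is_empty() {
            return known;
        }
        known.extend(new_facts);
    }
}
```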
### Transitive known placeholder subset
Similarly to the `placeholder_known_to_contain` relation above, illegal subset relationship errors need the transitive closure of the `known_placeholder_subset_base` input.
```prolog
.decl known_placeholder_subset(Origin1:origin, Origin2:origin)
known_placeholder_subset(Origin1, Origin2) :-
    known_placeholder_subset_base(Origin1, Origin2).
known_placeholder_subset(Origin1, Origin3) :-
    known_placeholder_subset(Origin1, Origin2),
    known_placeholder_subset_base(Origin2, Origin3).
```
### Loan liveness
```prolog
.decl loan_live_on_entry(Loan:loan, Node:node)
loan_live_on_entry(Loan, Node) :-
    origin_contains_loan_on_entry(Origin, Loan, Node),
    (origin_live_on_entry(Origin, Node); placeholder(Origin, _)).
```
### Subset and contains (the heart of the borrow check)
```prolog
.decl subset(Origin1:origin, Origin2:origin, Node:node)
subset(Origin1, Origin2, Node) :-
    subset_base(Origin1, Origin2, Node).
subset(Origin1, Origin3, Node) :-
    subset(Origin1, Origin2, Node),
    subset(Origin2, Origin3, Node).
subset(Origin1, Origin2, TargetNode) :-
    subset(Origin1, Origin2, SourceNode),
    cfg_edge(SourceNode, TargetNode),
    (origin_live_on_entry(Origin1, TargetNode); placeholder(Origin1, _)),
    (origin_live_on_entry(Origin2, TargetNode); placeholder(Origin2, _)).
```
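A naive Rust fixpoint over the three `subset` rules is sketched below. It assumes the liveness facts and the set of placeholder origins are already known; `subset_fixpoint` is a hypothetical name.

```rust
use std::collections::BTreeSet;

type Origin = &'static str;
type Node = u32;

fn subset_fixpoint(
    subset_base: &[(Origin, Origin, Node)],
    cfg_edge: &[(Node, Node)],
    origin_live_on_entry: &[(Origin, Node)],
    placeholder_origins: &[Origin],
) -> BTreeSet<(Origin, Origin, Node)> {
    // An origin counts as live at a node if it is live there or is a placeholder.
    let is_live = |origin: Origin, node: Node| {
        placeholder_origins.contains(&origin) || origin_live_on_entry.contains(&(origin, node))
    };
    // First rule: every base fact holds.
    let mut facts: BTreeSet<(Origin, Origin, Node)> = subset_base.iter().copied().collect();
    loop {
        let mut new_facts = Vec::new();
        for &(o1, o2, node) in &facts {
            // Second rule: transitivity at the same node.
            for &(p, o3, n) in &facts {
                if p == o2 && n == node {
                    new_facts.push((o1, o3, node));
                }
            }
            // Third rule: carry the relation across an edge if both origins are live at the target.
            for &(source, target) in cfg_edge {
                if source == node && is_live(o1, target) && is_live(o2, target) {
                    new_facts.push((o1, o2, target));
                }
            }
        }
        let before = facts.len();
        facts.extend(new_facts);
        if facts.len() == before {
            return facts;
        }
    }
}
```

Because the transitive rule is applied at the node where the facts are created, `'a <= 'c` survives across an edge even after the intermediate origin `'b` is dead.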
```prolog
.decl origin_contains_loan_on_entry(Origin:origin, Loan:loan, Node:node)
origin_contains_loan_on_entry(Origin, Loan, Node) :-
    loan_issued_at(Origin, Loan, Node).
origin_contains_loan_on_entry(Origin2, Loan, Node) :-
    origin_contains_loan_on_entry(Origin1, Loan, Node),
    subset(Origin1, Origin2, Node).
origin_contains_loan_on_entry(Origin, Loan, TargetNode) :-
    origin_contains_loan_on_entry(Origin, Loan, SourceNode),
    !loan_killed_at(Loan, SourceNode),
    cfg_edge(SourceNode, TargetNode),
    (origin_live_on_entry(Origin, TargetNode); placeholder(Origin, _)).
origin_contains_loan_on_entry(Origin, Loan, Node) :-
    cfg_node(Node),
    placeholder(Origin, Loan).
```
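A Rust sketch of the four `origin_contains_loan_on_entry` rules follows. It assumes the `subset` relation has already been computed (passed in as a set) and that loans and nodes are `u32`; the function signature is hypothetical.

```rust
use std::collections::BTreeSet;

type Origin = &'static str;
type Loan = u32;
type Node = u32;

fn origin_contains_loan_on_entry(
    loan_issued_at: &[(Origin, Loan, Node)],
    subset: &BTreeSet<(Origin, Origin, Node)>,
    loan_killed_at: &[(Loan, Node)],
    cfg_edge: &[(Node, Node)],
    origin_live_on_entry: &[(Origin, Node)],
    placeholder: &[(Origin, Loan)],
    cfg_node: &[Node],
) -> BTreeSet<(Origin, Loan, Node)> {
    let is_placeholder = |origin: Origin| placeholder.iter().any(|&(o, _)| o == origin);
    let is_live = |origin: Origin, node: Node| {
        is_placeholder(origin) || origin_live_on_entry.contains(&(origin, node))
    };
    // First rule: an issued loan is contained in the origin of the new reference.
    let mut facts: BTreeSet<(Origin, Loan, Node)> = loan_issued_at.iter().copied().collect();
    // Fourth rule: a placeholder origin contains its placeholder loan at every node.
    for &node in cfg_node {
        for &(origin, loan) in placeholder {
            facts.insert((origin, loan, node));
        }
    }
    loop {
        let mut new_facts = Vec::new();
        for &(origin, loan, node) in &facts {
            // Second rule: loans flow from Origin1 into Origin2 when subset(Origin1, Origin2, Node).
            for &(o1, o2, n) in subset {
                if o1 == origin && n == node {
                    new_facts.push((o2, loan, node));
                }
            }
            // Third rule: loans flow along edges unless killed at the source,
            // and only while the origin is live (or a placeholder) at the target.
            if !loan_killed_at.contains(&(loan, node)) {
                for &(source, target) in cfg_edge {
                    if source == node && is_live(origin, target) {
                        new_facts.push((origin, loan, target));
                    }
                }
            }
        }
        let before = facts.len();
        facts.extend(new_facts);
        if facts.len() == before {
            return facts;
        }
    }
}
```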
```prolog
.decl loan_live_at(Loan:loan, Node:node)
loan_live_at(Loan, Node) :-
    origin_contains_loan_on_entry(Origin, Loan, Node),
    origin_live_on_entry(Origin, Node).
```
## Error reporting
```prolog
.decl errors(Loan:loan, Node:node)
errors(Loan, Node) :-
    loan_invalidated_at(Loan, Node),
    loan_live_at(Loan, Node).
```
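`loan_live_at` and `errors` are simple joins. A minimal Rust sketch, assuming the containment and liveness facts are already available as slices (function names mirror the relations but are otherwise hypothetical):

```rust
use std::collections::BTreeSet;

type Origin = &'static str;
type Loan = u32;
type Node = u32;

/// A loan is live at a node if some origin live there contains it.
fn loan_live_at(
    origin_contains_loan_on_entry: &[(Origin, Loan, Node)],
    origin_live_on_entry: &[(Origin, Node)],
) -> BTreeSet<(Loan, Node)> {
    origin_contains_loan_on_entry
        .iter()
        .filter(|&&(origin, _, node)| origin_live_on_entry.contains(&(origin, node)))
        .map(|&(_, loan, node)| (loan, node))
        .collect()
}

/// An error is reported where a loan is invalidated while still live.
fn errors(
    loan_invalidated_at: &[(Loan, Node)],
    loan_live_at: &BTreeSet<(Loan, Node)>,
) -> BTreeSet<(Loan, Node)> {
    loan_invalidated_at
        .iter()
        .copied()
        .filter(|fact| loan_live_at.contains(fact))
        .collect()
}
```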
```prolog
.decl subset_errors(Origin1:origin, Origin2:origin, Node:node)
subset_errors(Origin1, Origin2, Node) :-
    subset(Origin1, Origin2, Node),
    placeholder(Origin1, _),
    placeholder(Origin2, _),
    !known_placeholder_subset(Origin1, Origin2).
```
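A minimal Rust sketch of the subset-error check, assuming (as the rule above does) that both origins must be placeholders and that `known_placeholder_subset` is already transitively closed. The function signature is hypothetical.

```rust
use std::collections::BTreeSet;

type Origin = &'static str;
type Node = u32;

/// Reports `subset(Origin1, Origin2, Node)` facts between placeholders that the
/// function signature does not justify.
fn subset_errors(
    subset: &BTreeSet<(Origin, Origin, Node)>,
    placeholder_origins: &[Origin],
    known_placeholder_subset: &BTreeSet<(Origin, Origin)>,
) -> BTreeSet<(Origin, Origin, Node)> {
    subset
        .iter()
        .copied()
        .filter(|&(o1, o2, _)| {
            placeholder_origins.contains(&o1)
                && placeholder_origins.contains(&o2)
                && !known_placeholder_subset.contains(&(o1, o2))
        })
        .collect()
}
```

For `fn foo<'a, 'b: 'a>()`, a derived `subset('b, 'a)` is accepted, while `subset('a, 'b)` is reported.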
### Compiler notes on generating the placeholder loans support
We need to do a few things:
* Generate the `placeholder` facts
    * for this we are going to need to synthesize some "placeholder loan" strings
    * since these have to be integers...
    * I think we maybe want to generate a `Loan(I + X)` where `I` is the universal region index and `X` is the number of loans created by ordinary means
* Process the errors and report them
    * it's key that the errors are expressed in terms of origins
    * we should be able to "hook in" to the existing error reporting, but it's going to take a bit of hacking
    * we need to thread that part of the output into the `RegionInferenceContext::solve` function
        * and from there to `check_universal_regions`
    * the function `check_universal_region_relation` needs to be slightly refactored
        * currently it begins with a ["is known subset? return" check](https://github.com/rust-lang/rust/blob/a0d40f8bdfcc3c28355467973f97fd4c45ac5876/src/librustc_mir/borrow_check/nll/region_infer/mod.rs#L1444-L1447) and then handles the ["handle error" case](https://github.com/rust-lang/rust/blob/a0d40f8bdfcc3c28355467973f97fd4c45ac5876/src/librustc_mir/borrow_check/nll/region_infer/mod.rs#L1449-L1505)
        * we want to factor out the "handle error" code to a helper `report_or_propagate_universal_region_error`
    * then we can replace [`check_universal_regions`] with a loop that goes over the `subset_errors`
        * only in polonius mode, of course
        * and invokes the `report_or_propagate_universal_region_error` helper we extracted above for each one
        * note that [`check_universal_region`] is just not needed when polonius is used
* One open question: placeholder errors?
    * I think that with the chalk approach I'd like to take here, these would get reported earlier in any case
    * but for now we *could* keep the [part of the iteration in `check_universal_regions` that goes over the placeholders and dump them out](https://github.com/rust-lang/rust/blob/a0d40f8bdfcc3c28355467973f97fd4c45ac5876/src/librustc_mir/borrow_check/nll/region_infer/mod.rs#L1347-L1349)

So when we're done, we'll have replaced `check_universal_regions` with a `check_polonius_subset_errors` (or something similar) which looks like:
```rust
for subset_error in errors {
    self.report_or_propagate_universal_region_error(...);
}
for (fr, fr_definition) in self.definitions.iter_enumerated() {
    match fr_definition.origin {
        NLLRegionVariableOrigin::FreeRegion => { /* handled by polonius */ }
        NLLRegionVariableOrigin::Placeholder(placeholder) => {
            self.check_bound_universal_region(infcx, body, mir_def_id, fr, placeholder);
        }
        NLLRegionVariableOrigin::Existential { .. } => { /* nothing to check here */ }
    }
}
```
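The placeholder loan numbering floated in the notes (`Loan(I + X)`) can be sketched as below. The `Loan` newtype, `placeholder_loan`, and `placeholder_facts` are hypothetical stand-ins, not rustc types. Presumably the point of offsetting by `X` is that, if ordinary loans are numbered `0..X`, placeholder loans never collide with them.

```rust
/// Hypothetical loan identifier; the notes only say loans "have to be integers".
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct Loan(usize);

/// `Loan(I + X)`: `I` is the universal region index, `X` the number of ordinary loans.
fn placeholder_loan(universal_region_index: usize, ordinary_loan_count: usize) -> Loan {
    Loan(universal_region_index + ordinary_loan_count)
}

/// Builds `placeholder(Origin, Loan)`-style pairs, identifying each origin by its index.
fn placeholder_facts(universal_regions: &[usize], ordinary_loan_count: usize) -> Vec<(usize, Loan)> {
    universal_regions
        .iter()
        .map(|&region| (region, placeholder_loan(region, ordinary_loan_count)))
        .collect()
}
```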
[`check_universal_regions`]: https://github.com/rust-lang/rust/blob/a0d40f8bdfcc3c28355467973f97fd4c45ac5876/src/librustc_mir/borrow_check/nll/region_infer/mod.rs#L1319-L1328
[`check_universal_region`]: https://github.com/rust-lang/rust/blob/a0d40f8bdfcc3c28355467973f97fd4c45ac5876/src/librustc_mir/borrow_check/nll/region_infer/mod.rs#L1366-L1376