owned this note
owned this note
Published
Linked with GitHub
# RFC 2229 Plans
## Goal
The goal is to refactor closures so that they capture [HIR places] and not just variables. For the initial version, this behavior will be gated, and we will always capture the most fine-grained places that we can. This will result in better borrow check results, but less efficient code, with some changes to runtime behavior (specifically, drop code may run at different times). This version would not be expected to stabilize as is, but rather to serve as the basis for evaluation about the runtime cost and the amount of potentially affected code -- but we could conceivably stabilize it over an edition, or we might make further changes.
[HIR places]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_typeck/expr_use_visitor/struct.Place.html
## High-level view of the changes required
* We will rewrite code that currently iterates over the "upvars" of a closure to iterate over a set of "places" produced by type-check.
* The factorings below are steps in this direciton.
* We will adjust the existing [upvar inference] for closures. Today, when it sees (say) a read of `a.b.c`, it tracks that down to a shared borrow of `a`. But the new mode inference will be coallescing into a minimal set of [HIR places] that are accessed (e.g., if `a.b.c` and `a.b` are both read, then we would just capture `a.b`).
* We will have to rewrite the HAIR and MIR lowering so that `a.b.c` is rewritten to use the minimal place instead of always being relative to an upvar `a`.
* The generated MIR that used to capture variables like `a` when constructing a [closure] will now capture the place (e.g., `a.b.c`)
* The only changes required in borrow check (afaik) are related to error reporting
[closure]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.AggregateKind.html#variant.Closure
[upvar inference]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_typeck/check/upvar/index.html
## Future optimizations
* After we complete the above work, we may wish to have some kind of "optimized form" of closures that don't capture paths like `a.b` and `a.c` individually, but instead capture just `a` at runtime (even though the borrow check and type system would view the capture as capturing `a.b` and `a.c`). I'm not sure yet how to approach this: my inclination is to write a MIR optimization that runs after borrow check and reworks the cin some grody way to capture a raw pointer (`a`) and use downcasts and things in the closure MIR. But we'll have to figure out various complications here, e.g. the closure layout, and it may not even turn out to be that important in practice, so I'm inclined to defer that.
* We will definitely have to figure out how to "transition" to the new drop behavior -- my preference would be to do this over an edition. It should be easy enough to detect when we are now capturing a place instead of a root variable and "suspicious dtors" would run at different times. We can warn and suggest a change like inserting `let x = x;` in the closure to preserve the old drop behavior. The main unknown is how many such things would be produced.
## Notes
### Upvar tuple refactoring
Delay the [instantiation of the "upvar tuple"](https://github.com/rust-lang/rust/issues/57482)
### Rename existing upvars query
There is a [query] [`upvars`] that returns the list of "upvars" (captured variables) used in a closure. Currently, the list of variables used corresponds one to one with each thing that the closure captures, but the whole point of this change is to break that association, so that a query like `|| (x.foo, x.bar)` might capture *just* `x.foo` and `x.bar`, and not capture `x`.
[query]: https://rustc-dev-guide.rust-lang.org/query.html
[`upvars`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.upvars
Much of the code that uses [`upvars`] therefore will want to be refactored, because it really wants to iterate over *all things captured* and not the *upvars* that the closure references.
We should introduce a terminology split, actually:
* "mentioned upvars" will be the list of variable names that are mentioned by a closure (i.e, `x`). This list is produced by the name resolution code.
* "captures" or "captured paths" will refer to the paths captured by the closure (e.g., `x.foo`). This list is produced by the type check.
We should rename the [`upvars`] query, therefore, to `upvars_mentioned`. This would make a nice, simply PR.
Similarly, we should rename the [`upvar_list`] field of `TypeckTable` to `closure_captures`.
[`upvar_list`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TypeckTables.html#structfield.upvar_list
The [`upvar_capture_map`] can keep its name, as I imagine it will be removed eventually, but for now it really does refer to *upvars* and not *captures*.
[`upvar_capture_map`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TypeckTables.html#structfield.upvar_capture_map
### Refactor uses of upvars query
Now that the rename is done, we want to rewrite things currently using the [`upvars`] [query] (updated version of [#60205](https://github.com/rust-lang/rust/issues/60205)) to instead iterate over the *captures* from the query that is produced by type-check (the [`upvar_list`] field we renamed in the previous step).
For now, the [`upvars`] query and the [`upvar_list`] field ultimately give access to a `FxIndexMap<HirId, Upvar>` map. Eventually this will change, with the [`upvar_list`] having a more general structure that expresses captured paths, not just the mentioned upvars. But we'll do that later.
(You can see the return type of the query [here](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/query/queries/struct.upvars.html), in the associated query struct.)
This means that we can change calls like
```rust
tcx.upvars(def_id)
```
to
```rust
tcx.typeck_tables(def_id).upvars_list[def_id]
```
We may not want to change every call, just those that occurs in code that comes "after" typecheck.
| Change? | Directory | Why |
| --- | --- | --- |
| :heavy_check_mark: | `librustc_middle/mir` | MIR related code comes after typeck |
| :confused: | `pretty.rs` | this pretty printing code could run at different times, it should probably be changed, but not entirely clear to what yet |
| :heavy_check_mark: | librustc_mir/... | MIR related code comes after typeck |
| :heavy_check_mark: | librustc_mir_build/... | MIR related code comes after typeck |
| :heavy_check_mark: | librusc_passes/liveness.rs | liveness analyses come after typeck |
| :confused: | librustc_typeck/expr_user_visitor | executes during typeck, so the tables are accessed differently, but still should be updated...I think |
| :confused: | librustc_typeck/mem_categorization | similar to the above |
| :x: | librustc_typeck/check/closure.rs | this code will be changing but it will need access to the raw mentioned upvars eventually |
| :x: | librustc_typeck/check/upvar.rs | this code will likely have the job of computing the upvar-list |
### Refactor the capture list to be a path
* Refactor the [UpvarListMap] so that the values are [HIR places], currently only capturing local variables
* now we can express capturing `*v` or `v[0]` or whatever
* have to change a bit of the code above
[HIR places]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_typeck/expr_use_visitor/struct.Place.html
[UpvarListMap]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.UpvarListMap.html