Compute generator saved locals on MIR, [#101692](https://github.com/rust-lang/rust/pull/101692)

# Compute generator saved locals on MIR, [#101692](https://github.com/rust-lang/rust/pull/101692) ## Context The inner state of a generator contains so-called "witness" types. Those types influence which auto types a generator implements: a generator may not be `Send` if its inner state contains a non-`Send` type. Generators are currently type-checked as follows. The generator has type `Generator(parent_substs, resume ty, yield ty, return ty, witness, upvars types)`. During most of typeck, `witness` is a type variable. Once most inference is done, the set of captured types is computed on HIR. The computation of the `captured types` needs to know which variables and temporaries are live at `yield` point, and which are not. Finally, `witness` is unified with a `GeneratorWitness(tuple of captured types)`, and typeck finalizes trait fulfillment. However, HIR is not very suited for such dataflow computations. This PR proposes to move the computation of captured types on MIR. Such a computation already exists on MIR as part of generator lowering. ## Proposal This PR proposes to drop the HIR-based computation, and only keep the MIR one. This refactor is gated behind `-Zdrop-tracking-mir` flag. The general idea is to skip checking any obligation that depends on a witness type during typeck, save them into typeck results, and then later verify that these obligations actually hold. Since the actual witness types are not available during typeck, we introduce a `GeneratorWitnessMIR(generator def id, parent substs)` to act as a placeholder. Trait selection will fetch the required information from a `mir_generator_info(def_id)` query when encountering it. More precisely: 1. During type-checking, do not unify the `witness` type variable. This stalls all the obligations that depend on it until the end of type-checking. 2. Perform type-checking, and call `select_obligations_where_possible`. 3. Unify `witness` with `GeneratorWitnessMIR`. The newly unstalled obligations are the one that bind witness types, or depend on those. As we do not have enough information to check them at this point, we save them into typeck results. Within the typeck context, these obligations are marked as succesful to proceed. As this unification happens at the end of typeck, normal inference will not attempt to fetch the witness types from MIR, avoiding a cycle. (Ambiguity error reporting may, I'm not sure.) 4. When checking traits that involve `GeneratorWitnessMIR(def_id, substs)`, fetch the MIR-computed witness types using a `mir_generator_witness(def_id)` query. As MIR only gives us erased lifetimes, we replace them with locally bound lifetimes, like what happens today with regular `GeneratorWitness`. 5. Finally, verify that the obligations we stashed in step 3 against the actually computed types. This is done in the new `check_generator_obligations` query. If any obligation was wrongly marked as fulfilled in step 2, it must be reported here. ## Open questions / Issues * We mark all generators as `needs_drop` to avoid cycles during MIR drop elaboration. @lcnr: alternatively we'd need to provide "should this treat generators as opaque" as input to `needs_drop`. I think it's fine to always require generators to be drop generators :shrug: we should check that it is consistent with `dropck_outlives` though, see the 17.07 meeting :grin: * This creates additional normalization steps, as occurences of `GeneratorWitnessMIR` need to be normalized and contain opaque types. @lcnr: this is completely fine. Having to normalize here is alright and my only worry would be a slight performance regression. This regression should be outweight by not having to recurse into the generator witness while folding/relating. * The diagnostics for captured types don't show where they are used/dropped. * We did not attempt to support chalk. @lcnr: that's fine ### Handling of fake borrows In match guards, MIR lowering introduces extra locals for borrow-checking purposes. For instance, consider ```rust match foo() { x if { bar(x); yield } => {} _ => {} } ``` Creates MIR that looks like: ```rust _1: T _2: &T bb0: _1 = foo() // matched-on local _2 = &_1 // match guard local <choose branch depending on _1> bb1: _3 = bar(_1) // compute guard value Yield -> bb2 bb2: FakeRead(ForMatch, _2) // ensure that _1 has not changed since choosing branch <proceed to arm depending on _3> ``` The local `_2` has type `&T` and is live at the yield point, while this local does not exist in the original code. This local `_2` only exists to prevent mutation of the original `_1`. It has no runtime counterpart. Proposed resolution: ignore fake borrow locals. ### Handling of statics Static variables appear in MIR as as raw pointers. ```rust= static A: T = /**/; let x = A; ``` Gives the following MIR: ```rust _1: *const T _2: T bb0: _1 = StaticRef(A) _2 = *_1 ``` The presence of a raw pointer in generator witnesses may result in wrongly infering `!Send` and `!Sync`. As they are `static`s, the exact point where they are borrowed does not matter, and they can be moved later than the `Yield` terminator. Proposed resolution: ignore raw pointers that come from statics. ## Meeting Topic/Question Queue ### how do I add a topic/question? pnkfelix: Like this! ### lcnr vibe why this is absolutely fine Having to prove all goals by the end of hir typeck is a bit unnecessary. You could imagine this to be: at the end of hir typeck inference needs to have made enough progress for `writeback` to succeed, but any goals which are not required for that can also be proven later. Doing so after MIR building is therefore fine. ### Simplifying to avoid phase intercommunication nikomatsakis: I like the general idea. I used to be opposed, but I've decided this is clearly right, and we should resolve the issues. However, I'm not convinced of the particulars. I would like to avoid "deferred" type check obliations from typeck that have be loaded and used from MIR, as that implies that MIR is not standalone. I'm also not sure if they are truly needed. Put another way, I'd like us to be clear about what inputs *MIR* needs to successfully type check. Although in writing this I'm wondering about the cycle scenario and how I expected to resolve that. =) Maybe I thought those cannot happen, which may be true. By "cycle scenario" what I mean is two async functions calling one another. I think for that to type check requires. @lcnr: i think the dependency graph is/should be the following: ```mermaid graph LR hirtypeck --"pending obligations"-->provehiroblig hirtypeck --"writeback results"-->mirbuild mirbuild --"generatorlocals"-->provehiroblig mirbuild --"provides mir body" -->mirborrowck provehiroblig --"disables if any errors"-->mirborrowck ``` ### potential digression: replacing erased lifetimes with locally-bound lifetimes pnkfelix: the text says: "As MIR only gives us erased lifetimes, we replace them with locally bound lifetimes, like what happens today with regular GeneratorWitness." Am I wrong to read that and get a little worried? How much risk, if any, (with respect to unsoundness or weird interactions with other code, including unsafe code), is there within injecting locally-bound lifetimes as replacements for erased lifetimes? (I understand the text says that this is already what happens today, and therefore one might argue that if there's any risk, we're already incurring it. Still, I'd like to better understand what that risk is, if any.) @lcnr: it does not pose soundness risk (apart from more general bugs with higher ranked bounds), bound lifetimes force us to prove "this holds for all lifetimes", which is more restrictive than what we'd prove for any specific local lifetime. It does cause annoying issues however, see https://github.com/rust-lang/rust/issues/110338/https://github.com/rust-lang/rust/issues/60658