
Pattern matching

Fake Reads

MIR Build

Generates MIR for a match expression.

The MIR that we generate for a match looks like this.

[ 0. Pre-match ]
       |
[ 1. Evaluate Scrutinee (expression being matched on) ]
[ (fake read of scrutinee) ]
       |
[ 2. Decision tree -- check discriminants ] <--------+
       |                                             |
       | (once a specific arm is chosen)             |
       |                                             |
[pre_binding_block]                           [otherwise_block]
       |                                             |
[ 3. Create "guard bindings" for arm ]               |
[ (create fake borrows) ]                            |
       |                                             |
[ 4. Execute guard code ]                            |
[ (read fake borrows) ] --(guard is false)-----------+
       |
       | (guard results in true)
       |
[ 5. Create real bindings and execute arm ]
       |
[ Exit match ]
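
As a concrete instance of the flow above, here is a minimal sketch (the function name `classify` is made up for illustration) in which a guard that evaluates to false sends control back to the decision tree, which then tries the next arm:

```rust
// Hypothetical helper, only here to exercise the control flow in the diagram.
fn classify(n: Option<i32>) -> &'static str {
    match n {
        // Steps 3-4: `v` is bound for the guard. If `v > 0` is false we go
        // back to the decision tree and the next candidate arm is tried.
        Some(v) if v > 0 => "positive",
        // Reached for `Some(_)` whenever the guard above was false.
        Some(_) => "non-positive",
        None => "none",
    }
}
```

Calling `classify(Some(-1))` takes the "guard is false" edge once and ends up in the second arm.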

Borrow checker

The borrow checker handles these fake reads as follows:

fn visit_statement_before_primary_effect(
    &mut self,
    flow_state: &Flows<'cx, 'tcx>,
    stmt: &'cx Statement<'tcx>,
    location: Location,
) {
    // ....
    match &stmt.kind {
        StatementKind::Assign(box (lhs, ref rhs)) => {
            // ....
        }
        StatementKind::FakeRead(_, box ref place) => {
            // Read for match doesn't access any memory and is used to
            // assert that a place is safe and live. So we don't have to
            // do any checks here.
            //
            // FIXME: Remove check that the place is initialized. This is
            // needed for now because matches don't have never patterns yet.
            // So this is the only place we prevent
            //      let x: !;
            //      match x {};
            // from compiling.
            self.check_if_path_or_subpath_is_moved(
                location,
                InitializationRequiringAction::Use,
                (place.as_ref(), span),
                flow_state,
            );
        }
        // ....
    }
}

Example

fn main() {
    let t = (String::new(), String::new());

    match t {
        _ => {}
    };
}

http://csclub.uwaterloo.ca/~a52arora/test_suite/mir_dump/match_mir.main.-.nll.0.mir
Generated MIR contains:

    bb2: {
        _1 = (move _2, move _3);         // scope 0 at match_mir.rs:2:13: 2:43
        StorageDead(_3);                 // scope 0 at match_mir.rs:2:42: 2:43
        StorageDead(_2);                 // scope 0 at match_mir.rs:2:42: 2:43
        FakeRead(ForLet, _1);            // scope 0 at match_mir.rs:2:9: 2:10
        StorageLive(_4);                 // scope 1 at match_mir.rs:4:5: 6:6
        FakeRead(ForMatchedPlace, _1);   // scope 1 at match_mir.rs:4:11: 4:12
        _4 = const ();                   // scope 1 at match_mir.rs:5:14: 5:16
        StorageDead(_4);                 // scope 1 at match_mir.rs:6:6: 6:7
        _0 = const ();                   // scope 0 at match_mir.rs:1:11: 7:2
        drop(_1) -> [return: bb3, unwind: bb5]; // scope 0 at match_mir.rs:7:1: 7:2
    }

We have two fake reads here: one for the pattern in the let statement (ForLet) and one for the scrutinee of the match (ForMatchedPlace).
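
To see why the scrutinee only gets a fake read, here is a small sketch (the function name is hypothetical) showing that a `_` arm neither moves nor borrows the matched value, so it remains usable afterwards:

```rust
// Hypothetical function name; mirrors the `match t { _ => {} }` example.
fn wildcard_match_keeps_value() -> (String, String) {
    let t = (String::from("a"), String::from("b"));

    // `_` creates no binding, so `t` is neither moved nor borrowed here.
    match t {
        _ => {}
    }

    // `t` is still fully usable after the match.
    t
}
```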

Possible solution

Personally I feel this solution might be a little far from being ideal because:

  • We are special casing closures too much and it can result in harder debugging down the line
  • Several assumptions/assertions that we have when the feature is enabled will change
  • I'm also not sure if some of this is actually feasible to do; I'm thinking maybe not.

My thoughts

Assumption

Can we do something like

  • _1 = FakeRead(); _2 = CreateClosure(move _1) or
  • _2 = CreateClosure(FakeRead(_2))

Essentially we introduce a list of "mentioned" upvars/captures as Fake operands into the closure struct.

How do we do the capture analysis side of this?

  • ExprUseVisitor has a fake_read method, that is called on the pattern/match block scrutinee

    • Clippy can just treat it as an imm borrow, or can special-case it, which might result in new lints :)
// Rvalue for the RHS, no projections
let x = SomeEnum::A(0, 0, 0);

// place: Place { base_ty: SomeEnum, base: Local(HirId { owner: DefId(0:11 ~ match_mir_2[317d]::main), local_id: 9 }), projections: [Projection { ty: i32, kind: Field(0, 0) }] } }
// place: Place { base_ty: SomeEnum, base: Local(HirId { owner: DefId(0:11 ~ match_mir_2[317d]::main), local_id: 9 }), projections: [Projection { ty: i32, kind: Field(1, 0) }] } }
match x {
    // _a and _b are bindings, giving the two places above; `_` creates none
    SomeEnum::A(_a, _b, _) => {}
    _ => {}
}

let y = Some(SomeEnum::A(0, 0, 0));

match y {
    Some(_) => {}
    _ => {}
}

let z = SomeEnum::A(0, 0, 0);

match z {
    // no bindings created so we don't do anything here
    SomeEnum::A(..) => {}
    _ => {}
}

let w = SomeEnum::A(0, 0, 0);

match w {
    // no bindings created so we don't do anything here
    SomeEnum::A(..) => {}
    // no bindings created so we don't do anything here
    SomeEnum::B => {}
    // place: Place { base_ty: SomeEnum, base: Local(HirId { owner: DefId(0:14 ~ match_mir_2[317d]::main), local_id: 87 }), projections: [Projection { ty: i32, kind: Field(0, 2) }]
    SomeEnum::C(_a) => {}
}

Some concerns:

  • Broken assumption: multi-variant enums are captured completely (MIR build would need to handle Downcast projections). We must ensure a downcast is followed by a field projection.
  • If we have an enum where none of the variants contain any data of their own, then we would not capture it.
  • The amount of information we store for captures will grow. Consider an enum with 10+ variants and 2-3 fields per variant (e.g. ExprKind/TyKind in rustc): we now have 30 entries instead of 1.

Performance

Perf results after partial impl of use place builder everywhere

This is a bit interesting because we still would've converted the interned slice into a vector which would've required memory allocation and free etc.

We would also need to modify CFG to use PlaceBuilder which might affect perf all around the mir_build.

2021-01-15

Example that causes fake reads:

let x: !; match x { }

Right fix is to introduce ! patterns, where match x { } is shorthand for this:

let x: !; match x { ! }

Example with closures, which we also don't want to compile:

let x: !; let c = || match x { };
  • could do something like this:
    • if the match would introduce a fake read in MIR build, but the place is not a real pattern, it is ignored during MIR build
    • separately, when a closure is constructed, insert a fake read for every upvar mentioned (in addition to its captures)

problems with this exact formulation:

  • imprecise: for match x.y { }, it would still do a FakeRead of x

  • improved version:

    • expr use visitor records "let/match scrutinees" ("fake reads")
    • must also record reads for patterns with discriminant tests etc (e.g., Foo::A)
    • when you match a place that is not captured, you do not introduce a fake read
      • match x { _ => { } }, x would not be captured, and therefore the match does not introduce a fake read
        • try to convert to a real place, if it fails, do not introduce a fake read
      • but match x { y => { } } would introduce a fake read, because x is captured
    • when a closure is constructed, insert fake read for every "let/match scrutinee" place
      • (or at least let/match scrutinees that are not otherwise captured)

Aman's premeeting thoughts 2021-01-22

For any Place inside a closure that starts off a variable that was defined outside the closure, we have a place that looks like:

    * UpvarRef(variable)
        * SomeProjection
            * AnotherProjection
                * YetAnotherProjection
    

Now when we need to use this place inside a closure, we convert it to a capture index that represents an ancestor path to this Place being built.

Eg: If a closure captured x.0.0.0 and we see a use of x.0.0.0.2 we will convert it into _1.<capture_index>.2
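
The conversion in the example above can be sketched with a toy model. `ToyPlace` and `resolve_capture` are made-up names for illustration and are not rustc's real types or functions; the sketch only assumes that a place is a root variable plus a list of field projections:

```rust
// Toy model, NOT rustc's real types: a place is a root variable plus a list
// of field projections, e.g. `x.0.0.0.2` is ("x", [0, 0, 0, 2]).
#[derive(Debug, Clone, PartialEq)]
struct ToyPlace {
    base: &'static str,
    projections: Vec<usize>,
}

// Find a captured place that is an ancestor of `used` and return
// (capture_index, projections that still have to be applied on top of it).
fn resolve_capture(captures: &[ToyPlace], used: &ToyPlace) -> Option<(usize, Vec<usize>)> {
    captures.iter().enumerate().find_map(|(idx, cap)| {
        let is_ancestor =
            cap.base == used.base && used.projections.starts_with(&cap.projections);
        if is_ancestor {
            // Keep only the projections that go beyond the captured path.
            Some((idx, used.projections[cap.projections.len()..].to_vec()))
        } else {
            None
        }
    })
}
```

With `x.0.0.0` as the only capture, `x.0.0.0.2` resolves to capture index 0 with the remaining projection `.2`, while a shorter place such as `x.0.0` has no captured ancestor and resolves to `None`.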

In the case of

let (val, _) = x.0.0;
  • MIR build will try to introduce a "FakeRead" for x.0.0, and since this isn't really captured, the compiler ICEs.
  • Last meeting we discussed storing the fake reads of the upvars mentioned and then using them to introduce fake reads on top of those upvars.

How do we represent this in terms of data structures?

  • Essentially we can add these as capture indices starting at index n, where n is the number of actual captures.

  • We want to prioritize the set of real captures before we try looking into the fake captures.

  • If we do convert a PlaceBuilder into a Place made for a fake capture, we should keep the PlaceBuilder so that, when more projections are applied, we can at some point get a Place starting off an actual capture.

    • PlaceBuilder::place_for_fake_read() -> (Place, PlaceBuilder)
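
A rough sketch of the indexing scheme described above, under the assumption that fake captures are simply numbered after the n real ones. All names here (`ToyPlace`, `Resolved`, `ancestor_of`, `resolve`) are hypothetical and do not correspond to rustc's actual data structures:

```rust
// Toy model, NOT rustc's real types.
#[derive(Debug, Clone, PartialEq)]
struct ToyPlace {
    base: &'static str,
    projections: Vec<usize>,
}

#[derive(Debug, PartialEq)]
enum Resolved {
    // Index into the real captures, plus the projections left to apply.
    Real { index: usize, rest: Vec<usize> },
    // Index >= n (the number of real captures), plus remaining projections.
    Fake { index: usize, rest: Vec<usize> },
}

// Position of the first entry in `list` that is an ancestor of `used`.
fn ancestor_of(list: &[ToyPlace], used: &ToyPlace) -> Option<(usize, Vec<usize>)> {
    list.iter().enumerate().find_map(|(i, p)| {
        if p.base == used.base && used.projections.starts_with(&p.projections) {
            Some((i, used.projections[p.projections.len()..].to_vec()))
        } else {
            None
        }
    })
}

// Real captures are tried first; fake captures are numbered from `real.len()`.
fn resolve(real: &[ToyPlace], fake: &[ToyPlace], used: &ToyPlace) -> Option<Resolved> {
    if let Some((index, rest)) = ancestor_of(real, used) {
        return Some(Resolved::Real { index, rest });
    }
    ancestor_of(fake, used).map(|(i, rest)| Resolved::Fake {
        index: real.len() + i,
        rest,
    })
}
```

In this model, a place that first resolves to a fake capture (`x.0.0`) resolves to a real capture once one more projection is applied (`x.0.0.0`), which is the reason for keeping the builder around.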

foo

let x = (.., ..); || let (a, _) = x; // x.0 is captured, but FakeRead(x, ForLet) is introduced in MIR build

what this would compile to in mir

goal

// creator
let x = (.., ..);
FakeRead(x);
Closure { x0: x.0 }

// closure
a = self.x0

today (which fails)

// creator
let x = (.., ..);
FakeRead(x);
Closure { x0: x.0 }

// closure
FakeRead(self.x) // but there is no self.x
a = self.x0

how do we get to the goal

  • upvar analysis:
    • accesses: [x.0] // is used to create upvars
    • fakereads: [x] // not used to create upvars
  • mir build in the creator:
    • compile a closure expression to:
      • for each place p in the fake read list, a FakeRead(p) instruction
      • a Closure { ... } aggregate where ... is upvar0: place_for_upvar0 ... upvarN: place_for_upvarN (taken from the accesses list)
  • mir build in the closure
    • ignore fakeread for let (a, _) = x because x is an upvar with no capture

another example:

let x = (.., ..); || let a = x; // x is captured, and FakeRead(x, ForLet) is introduced in MIR build
  • upvar analysis:
    • accesses: [x] // is used to create upvars
    • fakereads: [x] // not used to create upvars
  • mir build in the creator:
    • compile a closure expression to:
      • for each place p in the fake read list, a FakeRead(p) instruction
      • a Closure { ... } aggregate where ... is upvar0: place_for_upvar0 ... upvarN: place_for_upvarN (taken from the accesses list)
  • mir build in the closure
    • ignore fakeread for let a = x because x is an upvar (the creator already emitted the fake read)

final proposal

  • improved version v2:
    • expr use visitor records both accesses (as today) and "fake reads" ("let/match scrutinees")
      • gets stored in typeck results somewhere
    • must also record reads for patterns with discriminant tests etc (e.g., Foo::A)
    • when a match scrutinee or rhs of a let is some place p that begins with an upvar, you omit the fake read
      • ideally, we assert that p is in the fake read list for the enclosing closure
    • when a closure is constructed, insert fake read for every "fake read" place
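
The proposal can be sketched as a toy lowering. This is only an illustration under the assumptions listed in the bullets above: `ToyUpvarAnalysis`, `lower_closure_creation` and `closure_body_emits_fake_read` are hypothetical names, and places and MIR statements are modelled as plain strings rather than rustc's real types:

```rust
// Toy model of the proposal, NOT rustc's real data structures.
struct ToyUpvarAnalysis {
    accesses: Vec<&'static str>,   // used to create upvars
    fake_reads: Vec<&'static str>, // not used to create upvars
}

// Creator side: one FakeRead per recorded fake-read place, followed by the
// closure aggregate built from the accesses list.
fn lower_closure_creation(analysis: &ToyUpvarAnalysis) -> Vec<String> {
    let mut stmts: Vec<String> = analysis
        .fake_reads
        .iter()
        .map(|p| format!("FakeRead({})", p))
        .collect();
    let fields: Vec<String> = analysis
        .accesses
        .iter()
        .enumerate()
        .map(|(i, p)| format!("upvar{}: {}", i, p))
        .collect();
    stmts.push(format!("Closure {{ {} }}", fields.join(", ")));
    stmts
}

// Closure side: a let/match scrutinee that begins with an upvar gets no fake
// read in the closure body, because the creator already emitted one.
fn closure_body_emits_fake_read(scrutinee_base: &str, upvars_mentioned: &[&str]) -> bool {
    !upvars_mentioned.iter().any(|u| *u == scrutinee_base)
}
```

For `|| let (a, _) = x;` with accesses `[x.0]` and fake reads `[x]`, the creator side of this model yields `FakeRead(x)` followed by `Closure { upvar0: x.0 }`, and the closure body emits no fake read for `x`.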