Background

For those unfamiliar, mem2reg (memory to register) is the pass Noir uses to replace references with the values those references hold. It is an optimization that runs at several points in our SSA pipeline, as various other optimizations simplify the program and references become easier to track.

Now onto the main topic: our current mem2reg algorithm is insufficient. In particular it is missing:

  • Support for references passed as arguments to function calls
  • Support for aliased references

Without delving into the "why" too much further, the short version is that we need to expand the mem2reg pass to support these features, and we need to do so cleanly.

New mem2reg algorithm

Consider each block and reference in isolation.

For any given block bN and reference vN, the value of vN at the beginning of bN will always be equal to the unification of each value vN has at the end of each predecessor block of bN.

  • Values A and B from two predecessor blocks unify to Known(A) iff A = B. If the values from the predecessor blocks differ, the result of the unification is "Unknown".
  • If bN has no predecessors, the starting value of vN is also "Unknown." If it helps, you can think of this as unifying an empty set of predecessor blocks always resulting in an "Unknown" value.

So the value of a reference at the start of a block is either Unknown, or Known(value). We can represent this cleanly in an enum:

enum ReferenceValue {
    /// This reference may hold any value at this point
    Unknown,
    /// This reference definitely holds the given value
    Known(ValueId),
}
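
Unification then becomes a small method on this enum. Here is a minimal sketch, assuming ValueId is a simple copyable id type (the stand-in alias and the method name are assumptions on my part):

type ValueId = u32; // stand-in for the compiler's real value id type

impl ReferenceValue {
    /// Unify the values a reference holds at the end of two predecessor
    /// blocks. Any disagreement collapses to Unknown.
    fn unify(self, other: ReferenceValue) -> ReferenceValue {
        match (self, other) {
            (ReferenceValue::Known(a), ReferenceValue::Known(b)) if a == b => {
                ReferenceValue::Known(a)
            }
            _ => ReferenceValue::Unknown,
        }
    }
}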

Now that we have the value of a reference at the start of a block, to find its value at the end of a block there are several cases:

  1. There is no store to vN within bN -> The value of vN at the end of bN equals its value at the beginning
  2. There is a store to vN within bN -> The value of vN after the store is Known(stored_value)
  3. There is a function call in bN containing vN as an argument -> The value of vN after the call is Unknown

At any point while going through the instructions of a block, if we find a vA = load vN in which the ReferenceValue for vN is Known(vB), we can replace all uses of vA with vB.

With these rules we have a relatively simple framework to handle allocations in each block. To expand this to a whole function, we could traverse each block in execution order, remembering the ReferenceValue of each reference at the end of each block and feeding those into the unified inputs of each subsequent block.
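
To make these cases concrete, here is a rough sketch of the per-block analysis in Rust. The Instruction enum is a simplified stand-in for Noir's real SSA instructions, replacements records which loads we can substitute afterwards, and ReferenceValue is assumed to derive Clone:

use std::collections::HashMap;

// Simplified stand-ins for the real SSA instructions we care about.
enum Instruction {
    Store { address: ValueId, value: ValueId },
    Load { result: ValueId, address: ValueId },
    Call { arguments: Vec<ValueId> },
}

/// Compute the value of each reference at the end of a block, given the
/// already-unified values at its start.
fn analyze_block(
    start_values: &HashMap<ValueId, ReferenceValue>,
    instructions: &[Instruction],
    replacements: &mut HashMap<ValueId, ValueId>,
) -> HashMap<ValueId, ReferenceValue> {
    let mut values = start_values.clone();
    for instruction in instructions {
        match instruction {
            // Case 2: after a store, the reference is Known(stored value)
            Instruction::Store { address, value } => {
                values.insert(*address, ReferenceValue::Known(*value));
            }
            // A load of a Known reference can be replaced with that value
            Instruction::Load { result, address } => {
                if let Some(ReferenceValue::Known(value)) = values.get(address) {
                    replacements.insert(*result, *value);
                }
            }
            // Case 3: a reference passed to a call becomes Unknown
            Instruction::Call { arguments } => {
                for argument in arguments {
                    if let Some(value) = values.get_mut(argument) {
                        *value = ReferenceValue::Unknown;
                    }
                }
            }
        }
    }
    values // Case 1 is implicit: untouched references keep their start value
}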

Compared to our current mem2reg algorithm, this version is fairly similar but provides a simpler base to build upon. Notably, it does not require any logic for finding loops within the program and extends to passing references to function calls cleanly.

Example

Consider a simple looping program:

fn main() {
    let mut foo = 0;
    for i in 0..3 {
        foo += 1;
        println(foo);
    }
    println(foo);
}

And its SSA representation before any optimizations:

b0():
  v0 = allocate          // let mut foo;
  store Field 0 in v0    // foo = 0;
  jmp b1(Field 0)
b1(v1: Field):           // for i in 0..3 {
  v2 = lt v1, Field 3
  jmpif v2, then: b2, else: b3
b2():
  v3 = load v0
  v4 = add v3, Field 1
  store v4 in v0         // foo += 1;
  v5 = load v0
  call println(v5)       // println(foo);
  v6 = add v1, Field 1
  jmp b1(v6)
b3():
  v7 = load v0
  call println(v7)       // println(foo);
  return

Let's apply the mem2reg algorithm we have so far to this code:

Block b0:

  1. It is the initial block of the function, so all references are Unknown
  2. We find a store Field 0 in v0 within, so the value of v0 by the end of the block is Field 0

Block b1:

  1. The predecessors are b0 and b2. Since we haven't visited b2 yet, its value for v0 is still Unknown at this point in the analysis. So our starting value for v0 is Unknown (after unifying Known(Field 0) and Unknown).
  2. No stores occur, so the ending value is also Unknown.

Block b2:

  1. The sole predecessor is b1, so our starting value for v0 is Unknown.
  2. We find a v3 = load v0, but we do not know the value of v0, so it is kept in the code.
  3. We find a store v4 in v0, so the value of v0 is now Known(v4).
  4. We find a v5 = load v0 and replace v5 with v4.
  5. The final value of v0 is v4 since there are no later stores.

Block b3:

  1. The sole predecessor is b1 so the value of v0 is inherited as Unknown.
  2. We are not able to replace v7 = load v0 with anything yet.
  3. No stores occur, so the final value of v0 is Unknown.

There weren't many loads to replace in this example, but we can consider how changing the code may affect the result. E.g. what if we added a store Field 7 in v0 in b1? That would change the initial value of v0 in both blocks b2 and b3 to Known(Field 7), so we'd replace both v3 and v7 with Field 7. Replacing the load after the loop may seem unintuitive at first, as if it ignores all the mutations within the loop, but note that the loop block b2 always jumps back to b1, which only afterwards jumps to b3 to end the loop. Thus, there is no way to reach the end of the loop without going through the new store Field 7 in v0 first.

Hopefully I've managed to convince you, the reader, so far that this new algorithm is simple and sound. Now that that is done, let's break this trust and disprove that assumption:

Aliasing

If handling function calls were all we needed, there would be no reason to change our mem2reg algorithm or write this whole document explaining it.

Let's consider the elephant in the room, aliasing:

fn main(v0: bool) {
    let v1 = &mut 0;
    let mut v2 = [v1, v1];

    if v0 {
        *v2[0] = 1;
    } else {
        *v2[1] = 2;
    }

    // What is the value of v1?
}

And in SSA:

b0(v0: u1):
  v1 = allocate
  store Field 0 in v1
  v2 = allocate
  store [v1, v1] in v2
  jmpif v0, then: b1, else: b2

b1():
  v3 = load v2
  v4 = array_get v3, index 0
  store Field 1 in v4
  jmp b3()

b2():
  v5 = load v2
  v6 = array_get v5, index 1
  store Field 2 in v6
  jmp b3()

b3():
  // What is the value of v1?
  return

Saving you the trouble of applying the algorithm yourself: the result would be that v1 still holds Field 0, as if it were not modified in either block (*). This is because we've only been tracking references via their value ids (v1 here) so far. To properly handle this case we'd need to employ one of two strategies:

  1. More granularly track the flow of each reference throughout the program. In the above example, this implies remembering it was used in the array [v1, v1] which was then stored in v2.
  2. Find a good flag to use so that we know when to give up and remember ReferenceValue::Unknown for v1 despite there being no conflicting direct stores to it.

(*) Note that Noir's current mem2reg algorithm does not actually fail on this case yet, but more complex aliasing cases will still fail, e.g. https://github.com/noir-lang/noir/issues/2316.

Aliasing Part 2: Take a step back and research

Unfortunately, there's no single, standard algorithm for optimizing out loads to arbitrary pointers. Instead, there are many different trade-offs to consider along the way. We must also note that identifying all pointer aliases correctly has been proven to be undecidable (Landi, 1992), so we can abandon the notion of always identifying every alias. It can be helpful to get some background information on what options are available though. I've found this resource which seems like a good starter:

https://www.cs.toronto.edu/~pekhimenko/courses/cscd70-w19/docs/Lecture 7 [Pointer Analysis] 03.07.2019.pdf

I'll be going through each option to slowly build up the pass we need (you can find this list starting on slide 8):

  • Alias Representation:
    • We should track pointer aliases directly. Not only does this improve the accuracy of the algorithm, but I think it is also simpler and more straightforward to understand.
  • Heap modeling:
    • It is somewhat unclear whether we need to model the heap at all for Noir. Unlike other languages, cyclical references are not possible (no null value and no enums). For this reason, I suggest we conservatively adopt the "heap merged" approach and do not directly model the heap.
  • Aggregate modeling:
    • We should treat each array element as an individual location. Note that arrays are also a bit different in Noir since you cannot actually obtain a reference to an element in an array, unless you declared it as [&mut T]. So this simplifies things somewhat. We also do not need to worry about structures at all since they are not present in our SSA form.
  • Flow sensitivity:
    • The algorithm detailed in the prior sections is flow-sensitive (and not path-sensitive). I think this represents a nice compromise between the simplicity and the accuracy of the algorithm.
  • Context sensitivity:
    • Our algorithm should be entirely context insensitive. That is - we shouldn't consider each callsite at all. This works for Noir because one of the required steps of our SSA optimizations is to inline every function. The mem2reg pass also currently occurs only after inlining takes place, so tracking contexts should be unneeded.

With these we can start to build out the algorithm a bit more.

Aliasing Part 3: A Solution!

Most of the points above are already covered by the existing algorithm structure that was outlined. The most notable addition is the first point on tracking aliases directly. We can add some additional rules to track alias sets:

  1. When a reference is created vN = allocate, it is initially aliased only to itself: {vN}.
  2. When a reference is taken in as a parameter, it is conservatively aliased with all other reference parameters. Unlike Rust, we cannot assume mutable references are unique. Also note that while this limits optimizations greatly, it doesn't come into play often since we forcibly inline every non-unconstrained function during our inlining pass.
  3. When we encounter store v1 in v2 where v1 is also a reference, we add *v2 to v1's alias set: {v1, *v2}.
  4. Similarly, when an array is created containing a reference vN, we add the array element to vN's alias set as well:
  • vA = [..., vN, ...] => {vN, vA[i]}
  • vA = array_set vB, index i, value vN => {vN, vA[i]}
  • If the index is non-constant, we can conservatively include the entire array in the alias set: {vN, vA[?]}
  • We'll also need to carry this analysis forward into functions that return arrays/slices like slice_push_back, slice_insert, slice_remove, etc.
  5. Finally, where we handled a block with multiple predecessors in our previous algorithm by taking the union of each value, we'll amend this to also union alias sets (one possible set representation is sketched below).
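
As a sketch of one possible representation for these alias sets (assuming ValueId is Copy and Ord; using None for a wholly unknown alias set is my own representational choice):

use std::collections::BTreeSet;

/// The set of things a reference may alias, where None means the alias
/// set has become unknown entirely.
#[derive(Clone, Debug)]
struct AliasSet(Option<BTreeSet<ValueId>>);

impl AliasSet {
    /// Rule 1: a fresh allocation aliases only itself.
    fn fresh(reference: ValueId) -> Self {
        let mut set = BTreeSet::new();
        set.insert(reference);
        AliasSet(Some(set))
    }

    /// Rule 5: when a block has multiple predecessors, union their sets.
    fn unify(&mut self, other: &AliasSet) {
        match (&mut self.0, &other.0) {
            (Some(this), Some(other)) => this.extend(other.iter().copied()),
            _ => self.0 = None, // an unknown alias set is contagious
        }
    }
}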

Luckily, Noir's core semantics have a fairly small surface area so references and arrays are our only worries. We don't have to worry about pointer arithmetic, pointer casting, long jumps, etc.

Applying the solution to the problematic example

Taking the example with aliasing from earlier:

b0(v0: u1):
  v1 = allocate
  store Field 0 in v1
  v2 = allocate
  store [v1, v1] in v2
  jmpif v0, then: b1, else: b2

b1():
  v3 = load v2
  v4 = array_get v3, index 0
  store Field 1 in v4
  jmp b3()

b2():
  v5 = load v2
  v6 = array_get v5, index 1
  store Field 2 in v6
  jmp b3()

b3():
  // What is the value of v1?
  return

Since we're now storing alias sets in addition to the ReferenceValue enum from before, I'll write this as the tuple of (alias set, reference value). Now, let's apply our algorithm:

  • b0:
    • We see ({v1}, Known(0)) initially
    • Then we see a new reference ({v2}, Known([v1, v1]))
      • Because this array contains v1, we modify v1's alias set to ({v1, *v2[0], *v2[1]}, Known(0))
  • b1:
    • v3 now gets added to v1's alias set: {v1, *v2[0], *v2[1], v3[0], v3[1]}
    • As does v4: {v1, *v2[0], *v2[1], v3[0], v3[1], v4}
    • When we see store Field 1 in v4, since v4 is only in a single alias set (it could be in multiple if we were unsure which reference it aliased), we can actually change our ReferenceValue here as well: ({v1, ...}, Known(1)). If v4 had aliased several references, we would have had to change all of those references to ReferenceValue::Unknown (see the sketch after this walkthrough).
  • b2:
    • This block is similar to b1 but using v5, v6, and storing Field 2 instead.
  • b3:
    • b3 has two predecessors, so we merge the reference values:
      • ({v1, *v2[0], *v2[1], v3[0], v3[1], v4}, Known(1)) from b1
      • ({v1, *v2[0], *v2[1], v5[0], v5[1], v6}, Known(2)) from b2
      • ({v1, *v2[0], *v2[1], v3[0], v3[1], v4, v5[0], v5[1], v6}, Unknown) is the result
    • Since the result is Unknown, we would (correctly) fail to replace any loads to v1 in b3!
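
The store handling from this walkthrough could then look roughly like the following, building on the AliasSet sketch from earlier and assuming we can look up the set of references a store's address may alias:

use std::collections::HashMap;

/// Handle `store value in address`, given the references the address may alias.
fn handle_store(
    value: ValueId,
    address_aliases: &AliasSet,
    values: &mut HashMap<ValueId, ReferenceValue>,
) {
    match &address_aliases.0 {
        // Exactly one possible alias: we know precisely which reference
        // was written, so it becomes Known
        Some(aliases) if aliases.len() == 1 => {
            let reference = *aliases.first().unwrap();
            values.insert(reference, ReferenceValue::Known(value));
        }
        // Several possible aliases: any one of them may now hold `value`,
        // so all of them become Unknown
        Some(aliases) => {
            for alias in aliases {
                values.insert(*alias, ReferenceValue::Unknown);
            }
        }
        // Unknown alias set: conservatively invalidate every reference
        None => values.values_mut().for_each(|v| *v = ReferenceValue::Unknown),
    }
}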

This should be a workable algorithm, but it could use some fine tuning. There are some items we could improve upon, for example:

  • We could make some attempt to remove old aliases that are no longer used
  • We could try to clean up our alias sets somehow. In particular, arrays and array indices clutter the set, especially when the same reference is stored in many indices of the array.
  • We could consider tracking "points-to" information instead of alias sets (see slide 9 in the reference link). The reference mentions this would be more efficient but less precise, although truthfully I am not sure why.

And that's it! As long as there are no major objections to this approach, we will likely be seeing a greatly changed mem2reg in the near future. Also, I've done absolutely no proofreading, so let me know what you think.


Better Loop Optimizations

At the time of writing, the algorithm above has been in use for quite a while and has generally worked well. Over time, however, we began focusing more on code size and on optimizations for unconstrained code. Compared to constrained code, different control flow patterns are much more important to handle well in unconstrained code since they won't be flattened into a single block later on. The mem2reg pass as it exists above doesn't optimize loops as well as it could. Consider the program:

unconstrained fn main() {
    let mut foo = 7;
    for i in 0 .. 10 {}
    assert_eq(foo, 7);
}

We can see from the above that although it is mutable, foo is never modified within the loop (or anywhere), so we should be able to optimize the assert to assert_eq(7, 7), then remove it entirely. When converting this program to SSA we get:

brillig(inline) fn main f0 {
  b0():
    v1 = allocate -> &mut Field
    store Field 7 at v1
    jmp b1(u32 0)
  b1(v0: u32):
    v5 = lt v0, u32 10
    jmpif v5 then: b3, else: b2
  b3():
    v9 = add v0, u32 1
    jmp b1(v9)
  b2():
    v6 = load v1 -> Field
    constrain v6 == Field 7
    return
}

Running mem2reg on this gives the same SSA back, unchanged. Why is this happening? It comes down to how references are tracked across blocks. Going step by step, mem2reg will visit:

  • b0: We find that *v1 = 7, ok
  • b1:
    • The starting value of each reference is the unification of its values in each predecessor block, so we have:
      • In b0, we have *v1 = 7
      • But we've yet to visit b3, so v1 is Unknown there, resulting in it being unknown in b1 as well.
  • b3: Only predecessor is b1, so it inherits v1 as unknown
  • b2: Only predecessor is b1, so it inherits v1 as unknown
    • Since v1 is unknown, we can't optimize out the load in this block

To better handle this case, we need a way to distinguish between Unknown (due to program logic that can't be known at compile-time) and Unknown (because a predecessor block hasn't been visited yet). Notably, we can't just introduce a NotYetKnown value for this case and allow unify(Known(v), NotYetKnown) = Known(v) since that would break mutations in loops. Consider the code:

brillig(inline) fn main f0 {
  b0():
    v1 = allocate -> &mut Field
    store Field 7 at v1
    jmp b1(u32 0)
  b1(v0: u32):
    v5 = lt v0, u32 10
    jmpif v5 then: b3, else: b2
  b3():
    v9 = add v0, u32 1
    v10 = load v1 -> Field    <-- new instruction
    constrain v10 == Field 7  <-- new instruction, `assert_eq(foo, 7)` in the loop
    store v9 at v1            <-- new instruction, `foo = i;` in the loop
    jmp b1(v9)
  b2():
    v6 = load v1 -> Field
    constrain v6 == Field 7
    return
}

If we had NotYetKnown with the unification change above, we'd perform:

  • b0: We know *v1 = 7
  • b1: Unify v1 from predecessors:
    • *v1 = 7 in b0
    • v1 is NotYetKnown in b3
    • So unify(Known(7), NotYetKnown) = Known(7), and we know *v1 = 7 in b1
  • b3: We inherit *v1 = 7 from b1
    • We optimize v10 = Field 7 (!)
    • We remove the constraint since it is now constrain 7 == 7 (!)
    • We see store v9 at v1 so we now know *v1 = v9
  • b2: We inherit *v1 = 7 from b1 (!)
    • We optimize out the constrain here as well (!)

This is incorrect, however. We're mutating foo within the loop, so we'd expect to fail the assert_eq(foo, 7); on the second iteration of the loop since foo would no longer be 7 by then, but the constrain was optimized out, so this is not what happens. Instead the program executes successfully when it should have failed. Another sign of this failure is that b2, after the loop, inherits the value of v1 from before the loop, completely ignoring any mutations that happened within the loop. Finally, a third sign of failure is that if we go back to b1 and check its predecessors again, we find that they'd now be Known(7) and Known(v9), which should unify to Unknown, not 7. All of these issues result from the unification change with NotYetKnown, so this is not a solution.

Make Mem2reg Loop-Aware

We can modify the above approach a bit by changing the unification rule to unify(Known(v), NotYetKnown) = NotYetKnown instead. This mirrors the rule for Unknown, and we'll generally treat NotYetKnown values as unknown as well. We'll also have unify(NotYetKnown, Unknown) = Unknown.
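
Extending the earlier enum and its unification with this new variant might look like the following sketch:

enum ReferenceValue {
    NotYetKnown, // a predecessor (e.g. the loop body) hasn't been visited yet
    Unknown,     // genuinely unknowable at compile time
    Known(ValueId),
}

impl ReferenceValue {
    fn unify(self, other: ReferenceValue) -> ReferenceValue {
        use ReferenceValue::*;
        match (self, other) {
            (Known(a), Known(b)) if a == b => Known(a),
            // NotYetKnown poisons Known values just as Unknown does...
            (Known(_), NotYetKnown) | (NotYetKnown, Known(_)) => NotYetKnown,
            (NotYetKnown, NotYetKnown) => NotYetKnown,
            // ...but two differing Known values, or anything involving
            // Unknown, unify to Unknown
            _ => Unknown,
        }
    }
}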

To use these new NotYetKnown values, we'll explicitly handle loops within mem2reg. This can be done by finding any strongly-connected components (SCCs) in the control flow graph for the program. Any SCCs larger than a single block would be a loop consisting of at least a header block (b1 above) and a body block (b3 above), although the body will often be larger than a single block.
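
Finding SCCs is standard; Tarjan's algorithm also returns them in reverse topological order, which is convenient here. As an illustration only (Noir's compiler has its own CFG types; the petgraph crate and the BlockId stand-in below are assumptions):

use petgraph::algo::tarjan_scc;
use petgraph::graph::DiGraph;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BlockId(u32); // stand-in for the compiler's basic block id

/// Group a function's blocks into strongly-connected components.
/// Any SCC with more than one block is a loop (header plus body blocks).
fn sccs_in_control_flow_order(cfg: &DiGraph<BlockId, ()>) -> Vec<Vec<BlockId>> {
    // tarjan_scc returns components in reverse topological order, so
    // reversing yields the SCCs in control-flow order.
    let mut components = tarjan_scc(cfg);
    components.reverse();
    components
        .into_iter()
        .map(|scc| scc.into_iter().map(|index| cfg[index]).collect())
        .collect()
}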

After identifying where loops are, change the iteration order of blocks in mem2reg to be over SCCs in control-flow order instead of over blocks in control-flow order. For blocks not in loops this will be identical to the previous behavior, but for looping blocks this means we'll have a set of blocks to iterate over at the same time. For looping blocks we will:

  1. Perform mem2reg as normal on each loop block. We should expect the same safe but pessimistic behavior from the original algorithm.
  • The one difference here is that we now expect the initial unification of reference values from the header to be NotYetKnown instead of Unknown
  • In b1 above, this would be *v1 = unify(Known(7), NotYetKnown) = NotYetKnown
  2. After iterating over each block of the loop, perform unification again on the loop header block (a sketch of this step follows the list).
  • Now that the whole loop is analyzed, the results from the end of the loop should be known.
  • If a value was mutated, we'll get unify(Known(v1), Known(v2)) = Unknown. Alternatively, the value could also become Unknown within the loop, at which point it'd still unify to Unknown.
  • If a value was never mutated, it should still be NotYetKnown. If this is the case, instead of unifying, the known value of the reference becomes the value from the pre-header block.
    • In the example above, we'd get unify(Known(7), Known(v9)) = Unknown, which correctly identifies that the variable is mutated within the loop and can't be optimized out.
    • In the original example without mutation, we'd get unify(Known(7), NotYetKnown), and at this point we could manually override the result to Known(7).
  • If any reference value goes from NotYetKnown to Known, we can iterate over the loop blocks once more at this point to potentially optimize them further.
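
The special unification at the loop header in step 2 could be captured by a small helper like this sketch, building on the three-variant enum above:

/// Re-unify one reference at the loop header after the whole loop has been
/// analyzed (step 2). `from_pre_header` is the value entering the loop;
/// `from_back_edge` is its value at the end of the loop body.
fn finalize_header_value(
    from_pre_header: ReferenceValue,
    from_back_edge: ReferenceValue,
) -> ReferenceValue {
    match from_back_edge {
        // Never stored to within the loop: the pre-header value survives,
        // so we override the usual unification result
        ReferenceValue::NotYetKnown => from_pre_header,
        // Otherwise unify normally; a mutated reference yields Unknown
        other => from_pre_header.unify(other),
    }
}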

Full Example

Going back to the original example:

unconstrained fn main() {
    let mut foo = 7;
    for i in 0 .. 10 {}
    assert_eq(foo, 7);
}

With the SSA:

brillig(inline) fn main f0 {
  b0():
    v1 = allocate -> &mut Field
    store Field 7 at v1
    jmp b1(u32 0)
  b1(v0: u32):
    v5 = lt v0, u32 10
    jmpif v5 then: b3, else: b2
  b3():
    v9 = add v0, u32 1
    jmp b1(v9)
  b2():
    v6 = load v1 -> Field
    constrain v6 == Field 7
    return
}

This loop-aware algorithm would visit:

  1. b0: find that *v1 = 7
  2. Find loop blocks {b1, b3}:
  3. b1: *v1 = unify(Known(7), NotYetKnown) = NotYetKnown (unify values of v1 from predecessors b0 & b3)
  4. b3: inherit *v1 = NotYetKnown
  5. Loop is finished, check unifications on b1 again
  6. Find that *v1 is still NotYetKnown, so we ignore the NotYetKnown from b3 and set *v1 in b1 to Known(7) from the pre-header b0 instead. Because a value changed from NotYetKnown to Known, we iterate the loop blocks again:
    • Iterate through b1 again, looking for further optimizations now that we know v1 (there are none)
    • Iterate through b3 again, looking for further optimizations now that we know v1 (there are none)
    • End value of *v1 in b3 is overwritten to Known(7) since we ran this block again
  7. b2: *v1 = unify(Known(7), Known(7)) = Known(7)
    • Now we can optimize out the load and constrain in b2

This algorithm can now optimize this looping case successfully. If we look back at the example with mutation in the loop, the main change is in step 6: when we recheck the unification of b1 we'd have *v1 = unify(Known(7), Known(v9)) = Unknown. Since no value changed from NotYetKnown to Known, we don't iterate the loop blocks again. After the loop, in b2, the value of *v1 would still be Unknown, so we correctly do not optimize out the constrain in the mutation case.

Nested Loops

Let's see how this handles nested loops:

unconstrained fn main() {
    let mut foo = 7;
    
    for i in 0 .. 2 {
        for j in 0 .. 3 {}
    }
    
    assert_eq(foo, 7);
}

As the working notes below show, handling nested loops correctly requires running this loop analysis recursively on inner loops.

Other Cases to Consider

Variables defined inside a loop:

unconstrained fn main() {
    for _ in 0 .. 2 {
        let mut foo = 7;
        for j in 0 .. 3 {}
        assert_eq(foo, 7);
    }
}
Another case to consider is a reference that is only conditionally mutated inside a loop (here C is some constant):

    let mut leading_zeroes = 0;
    let mut stop = false;
    let bits: [u1; 64] = C.to_be_bits();
    for i in 0..64 {
        if (bits[i] == 0) & !stop {
            leading_zeroes += 1;
        } else {
            stop = true;
        }
    }

    assert_eq(leading_zeroes, 30);

And its SSA:

acir(inline) fn main f0 {
  b0():
    v1 = allocate -> &mut Field
    store Field 0 at v1
    v3 = allocate -> &mut u1
    store u1 0 at v3
    v6 = make_array [u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 1, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 1, u1 0] : [u1; 64]
    inc_rc v6
    jmp b1(u32 0)
  b1(v0: u32):
    v9 = lt v0, u32 64
    jmpif v9 then: b3, else: b2
  b3():
    v13 = array_get v6, index v0 -> u1
    v14 = not v13
    v15 = load v3 -> u1
    v16 = not v15
    v17 = mul v14, v16
    jmpif v17 then: b5, else: b4
  b5():
    v18 = load v1 -> Field
    v20 = add v18, Field 1
    store v20 at v1
    jmp b6()
  b6():
    v22 = add v0, u32 1
    jmp b1(v22)
  b4():
    store u1 1 at v3
    jmp b6()
  b2():
    v10 = load v1 -> Field
    v12 = eq v10, Field 30
    constrain v10 == Field 30
    return
}

Walking through the loop-aware algorithm on this SSA (abbreviating unify as U and NotYetKnown as NYK):

  • b0:
    • *v1 = 0
    • *v3 = false
  • Looping blocks {b1, b3, b4, b5, b6}
    • b1 (preds b0, b6):
      • *v1 = 0 U NYK = NYK
      • *v3 = false U NYK = NYK
    • b3: inherit from b1
      • all NYK: no opts, no stores
    • b4: inherit from b3
      • all NYK to start, *v3 = true
    • b5: inherit from b3
      • all NYK to start, *v1 = v20
    • b6 (preds b4, b5):
      • *v1 = NYK U v20 = NYK
      • *v3 = true U NYK = NYK
    • Recheck b1
      • b1:
        • *v1 = 0 U NYK = NYK
        • *v3 = false U NYK = NYK
        • Need to distinguish NYK (unchanged) from NYK (now unknown)
        • Change unification & initialization rules
          • v U NYK = Unknown
          • Now only need to override the initial b1 unifications:
            • *v1 = 0 U NYK = NYK (override from Unknown)
            • *v3 = false U NYK = NYK (override from Unknown)
            • Only if they remain NYK can we then set them to 0/false.

However, this next case breaks:

fn main() {
    let mut acc: Field = 0;
    let array = [[1, 2], [3, 4]];

    for i in 0..2/*57*/ {
        for j in 0..2/*57*/ {
            acc += array[i][j];
        }
    }
    assert(acc != 0);
}

And its SSA:

acir(inline) fn main f0 {
  b0():
    v2 = allocate -> &mut Field
    store Field 0 at v2
    v6 = make_array [Field 1, Field 2] : [Field; 2]
    v9 = make_array [Field 3, Field 4] : [Field; 2]
    inc_rc v6
    inc_rc v9
    v10 = make_array [v6, v9] : [[Field; 2]; 2]
    inc_rc v10
    jmp b1(u32 0)
  b1(v0: u32):
    v13 = lt v0, u32 2
    jmpif v13 then: b3, else: b2
  b3():
    jmp b4(u32 0)
  b4(v1: u32):
    v18 = lt v1, u32 2
    jmpif v18 then: b6, else: b5
  b6():
    v21 = load v2 -> Field
    v22 = array_get v10, index v0 -> [Field; 2]
    inc_rc v22
    v23 = array_get v22, index v1 -> Field
    v24 = add v21, v23
    store v24 at v2
    v25 = add v1, u32 1
    jmp b4(v25)
  b5():
    v20 = add v0, u32 1
    jmp b1(v20)
  b2():
    v14 = load v2 -> Field
    v15 = eq v14, Field 0
    v16 = not v15
    constrain v15 == u1 0
    return
}

With the loop handling described so far:

  • b0: *v2 = 0
  • loop {b1, b3, b4, b5, b6}
    • b1 (p b0, b5): *v2 = 0 U NYK = NYK (overridden)
    • b3: inherit b1
    • b4 (p b3, b6): *v2 = NYK U NYK = NYK
    • b5: inherit b4
    • b6: inherit b4
      • load remains, *v2 = v24
    • Recheck b1: *v2 = 0 U NYK (!), because the blocks were analyzed in the wrong order:
      • b5 didn't get updated from b6 & b4
      • It is impossible to update b4 from b6 without a nested loop pass

If the loop check were recursive:

  • b0: *v2 = 0
  • loop {b1, b3, b4, b5, b6}
    • b1 (p b0, b5): *v2 = 0 U NYK = NYK (overridden)
    • b3: inherit b1
    • loop {b4, b6}
      • b4 (p b3, b6): *v2 = NYK U NYK = NYK
      • b6: inherit b4
        • load stays, *v2 = v24
      • recheck b4: *v2 = NYK U v24 = Unknown
    • b5: inherit b4
    • recheck b1: *v2 = 0 U Unknown = Unknown, so the load in b2 correctly remains un-optimized