# Background

For those unfamiliar, the mem2reg (memory to register) algorithm is the algorithm Noir uses to replace references with the values held by those references. It is an optimization pass that runs at several points in our SSA pipeline, as various other optimizations simplify the program and references become easier to track.

Now onto the main topic: our current mem2reg algorithm is insufficient. In particular, it is missing:

- Alias tracking (https://github.com/noir-lang/noir/issues/2316)
- Invalidating references after they are passed to a function (https://github.com/noir-lang/noir/pull/2348#issuecomment-1681202470)

Without delving into the "why" much further, the short of it is that we need to expand the mem2reg pass to support these features, and we need to do so cleanly.

# New mem2reg algorithm

Consider each block and reference in isolation. For any given block `bN` and reference `vN`, the value of `vN` at the beginning of `bN` will always be equal to the unification of the values `vN` has at the end of each predecessor block of `bN`.

- Values `A` and `B` from two blocks unify iff `A = B`. If the values from two predecessor blocks differ, the result of the unification is "Unknown".
- If `bN` has no predecessors, the starting value of `vN` is also "Unknown". If it helps, you can think of this as unifying an empty set of predecessor blocks, which always results in an "Unknown" value.

So the value of a reference at the start of a block is either Unknown or Known(value). We can represent this cleanly in an enum:

```rs
enum ReferenceValue {
    Unknown,
    Known(ValueId),
}
```

Now that we have the value of a reference at the start of a block, to find its value at the end of the block there are several cases:

1. There is no store to `vN` within `bN` -> the value of `vN` at the end of `bN` equals its value at the beginning.
2. There is a store to `vN` within `bN` -> the value of `vN` after the store is `Known(stored_value)`.
3. There is a function call in `bN` taking `vN` as an argument -> the value of `vN` after the call is `Unknown`.

At any point while going through the instructions of a block, if we find a `vA = load vN` in which the `ReferenceValue` for `vN` is `Known(vB)`, we can replace the value `vA` with the value `vB`.

With these rules we have a relatively simple framework to handle allocations in each block. To expand this to a whole function, we could traverse each block in execution order, remembering the `ReferenceValue` of each reference at the end of each block and feeding those into the unified inputs of each successor block.
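To make this concrete, here is a minimal sketch of the per-block walk and the predecessor unification. The types are simplified stand-ins for illustration (`ValueId` as a plain integer, a toy `Instruction` enum, plain `HashMap`s for the per-block state), not the compiler's actual SSA types:

```rs
use std::collections::HashMap;

// Stand-in for the compiler's real value id type.
type ValueId = u32;

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
enum ReferenceValue {
    Unknown,
    Known(ValueId),
}

impl ReferenceValue {
    /// Two values unify iff they are equal; any mismatch is Unknown.
    fn unify(self, other: ReferenceValue) -> ReferenceValue {
        if self == other { self } else { ReferenceValue::Unknown }
    }
}

/// A toy subset of SSA instructions, enough to show the three cases.
enum Instruction {
    Store { address: ValueId, value: ValueId },
    Load { result: ValueId, address: ValueId },
    Call { arguments: Vec<ValueId> },
}

/// The starting state of a block: unify each reference's end-of-block
/// value across all predecessors. No predecessors yields no Known values.
fn unify_predecessors(
    predecessors: &[HashMap<ValueId, ReferenceValue>],
) -> HashMap<ValueId, ReferenceValue> {
    let Some((first, rest)) = predecessors.split_first() else {
        return HashMap::new();
    };
    let mut result = first.clone();
    for predecessor in rest {
        for (reference, value) in result.iter_mut() {
            // A reference missing from a predecessor is Unknown there.
            let other = predecessor
                .get(reference)
                .copied()
                .unwrap_or(ReferenceValue::Unknown);
            *value = value.unify(other);
        }
    }
    result
}

/// Walk one block from its starting state, recording loads whose
/// results can be replaced, and return the end-of-block state.
fn analyze_block(
    mut values: HashMap<ValueId, ReferenceValue>,
    instructions: &[Instruction],
    replacements: &mut HashMap<ValueId, ValueId>,
) -> HashMap<ValueId, ReferenceValue> {
    for instruction in instructions {
        match instruction {
            // Case 2: a store makes the reference Known(stored_value).
            Instruction::Store { address, value } => {
                values.insert(*address, ReferenceValue::Known(*value));
            }
            // A load of a Known reference can be replaced outright.
            Instruction::Load { result, address } => {
                if let Some(ReferenceValue::Known(value)) = values.get(address) {
                    replacements.insert(*result, *value);
                }
            }
            // Case 3: passing a reference to a call invalidates it.
            Instruction::Call { arguments } => {
                for argument in arguments {
                    if let Some(value) = values.get_mut(argument) {
                        *value = ReferenceValue::Unknown;
                    }
                }
            }
        }
        // Case 1 is implicit: untouched references keep their value.
    }
    values
}
```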
Compared to our current mem2reg algorithm, this version is fairly similar but provides a simpler base to build upon. Notably, it does not require any logic for finding loops within the program, and it extends cleanly to references passed to function calls.

### Example

Consider a simple looping program:

```rs
fn main() {
    let mut foo = 0;
    for i in 0..3 {
        foo += 1;
        println(foo);
    }
    println(foo);
}
```

And its SSA representation before any optimizations:

```rs
b0():
  v0 = allocate          // let mut foo;
  store Field 0 in v0    // foo = 0;
  jmp b1(Field 0)
b1(v1: Field):           // for i in 0..3 {
  v2 = lt v1, Field 3
  jmpif v2, then: b2, else: b3
b2():
  v3 = load v0
  v4 = add v3, Field 1
  store v4 in v0         // foo += 1;
  v5 = load v0
  call println(v5)       // println(foo);
  v6 = add v1, Field 1
  jmp b1(v6)
b3():
  v7 = load v0
  call println(v7)       // println(foo);
  return
```

Let's apply the mem2reg algorithm we have so far to this code:

Block `b0`:
1. It is the initial block of the function, so all references start as `Unknown`.
2. We find a `store Field 0 in v0`, so the value of `v0` by the end of the block is `Known(Field 0)`.

Block `b1`:
1. The predecessors are `b0` and `b2`. Since we haven't explored `b2` yet, it must occur later in this analysis, so our starting value for `v0` is `Unknown` (after unifying `Known(Field 0)` and `Unknown`).
2. No stores occur, so the ending value is also `Unknown`.

Block `b2`:
1. The sole predecessor is `b1`, so our starting value for `v0` is `Unknown`.
2. We find a `v3 = load v0`, but we do not know the value of `v0`, so the load is kept in the code.
3. We find a `store v4 in v0`, so the value of `v0` is now `Known(v4)`.
4. We find a `v5 = load v0` and replace `v5` with `v4`.
5. The final value of `v0` is `Known(v4)` since there are no later stores.

Block `b3`:
1. The sole predecessor is `b1`, so the value of `v0` is inherited as `Unknown`.
2. We are not able to replace `v7 = load v0` with anything yet.
3. No stores occur, so the final value of `v0` is `Unknown`.

There weren't many loads to replace in this example, but we can consider how changing the code would affect the result. E.g. what if we added a `store Field 7 in v0` in `b1`? That would change the initial value of `v0` in both blocks `b2` and `b3` to `Known(Field 7)`, so we'd replace both `v3` and `v7` with `Field 7`. It may seem unintuitive at first to replace the load at the end of the loop, apparently ignoring all the mutations within the loop, but note that the loop block `b2` always jumps back to `b1`, which only afterwards jumps to `b3` to end the loop. Thus, there is no way to get to the end of the loop without going through the new `store Field 7 in v0` first.

Hopefully I've managed to convince you, the reader, so far that this new algorithm is simple and sound. Now that that is done, let's break this trust and disprove that assumption:

# Aliasing

If handling function calls were all we needed, there would be no reason to change our mem2reg algorithm or write this whole document explaining it. Let's consider the elephant in the room, aliasing:

```rs
fn main(v0: bool) {
    let v1 = &mut 0;
    let mut v2 = [v1, v1];
    if v0 {
        *v2[0] = 1;
    } else {
        *v2[1] = 2;
    }
    // What is the value of v1?
}
```

And in SSA:

```rs
b0(v0: u1):
  v1 = allocate
  store Field 0 in v1
  v2 = allocate
  store [v1, v1] in v2
  jmpif v0, then: b1, else: b2
b1():
  v3 = load v2
  v4 = array_get v3, index 0
  store Field 1 in v4
  jmp b3()
b2():
  v5 = load v2
  v6 = array_get v5, index 1
  store Field 2 in v6
  jmp b3()
b3():
  // What is the value of v1?
  return
```

Saving you the trouble of applying the algorithm yourself, the result would be `Field 0`, as if `v1` were not modified in either block (\*). This is because we've only been tracking references via their value ids (`v1` here) so far. To properly handle this case we'd need to employ one of two strategies:

1. More granularly track the flow of each reference throughout the program. In the above example, this implies remembering that it was used in the array `[v1, v1]`, which was then stored in `v2`.
2. Find a good flag to use so that we know when to give up and remember `ReferenceValue::Unknown` for `v1` despite there being no conflicting direct stores to it.

(\*) Note that Noir's current mem2reg algorithm does not actually fail on this case yet, but more complex aliasing cases will still fail, e.g. https://github.com/noir-lang/noir/issues/2316.

## Aliasing Part 2: Take a step back and research

Unfortunately, there's no single, standard algorithm for optimizing out loads through arbitrary pointers. Instead, there are many different trade-offs to consider along the way. We must also note that identifying all pointer aliases correctly has been proven to be undecidable (Landi, 1992), so we can abandon the notion of always identifying every alias. It can be helpful to get some background information on what options are available, though. I've found this resource, which seems like a good starter: https://www.cs.toronto.edu/~pekhimenko/courses/cscd70-w19/docs/Lecture%207%20%5BPointer%20Analysis%5D%2003.07.2019.pdf

I'll be going through each option to slowly build up the pass we need (you can find this list starting on slide 8):

- Alias representation:
  - We should track pointer aliases directly. Not only does this improve the accuracy of the algorithm, but I think it is simpler and more straightforward to understand as well.
- Heap modeling:
  - It is somewhat unclear whether we need to model the heap at all for Noir. Unlike other languages, cyclical references are not possible (no null value and no enums). For this reason, I suggest we conservatively adopt the "heap merged" approach and do not directly model the heap.
- Aggregate modeling:
  - We should treat each array element as an individual location. Note that arrays are also a bit different in Noir since you cannot actually obtain a reference to an element in an array unless the array was declared as `[&mut T]`, so this simplifies things somewhat. We also do not need to worry about structures at all since they are not present in our SSA form.
- Flow sensitivity:
  - The algorithm detailed in the prior sections is flow sensitive (and not path sensitive). I think this represents a nice compromise between the simplicity and the accuracy of the algorithm.
- Context sensitivity:
  - Our algorithm should be entirely context insensitive. That is, we shouldn't consider each callsite at all. This works for Noir because one of the required steps of our SSA optimizations is to inline every function. The mem2reg pass also currently occurs only after inlining takes place, so tracking contexts should be unneeded.

With these we can start to build out the algorithm a bit more.

## Aliasing Part 3: A Solution!

Most of the points above are already covered by the existing algorithm structure that was outlined. The most notable addition is the first point on tracking aliases directly. We can add some additional rules to track alias sets:

1. When a reference is created via `vN = allocate`, it is initially aliased only to itself: `{vN}`.
2. When a reference is taken in as a parameter, it is conservatively aliased with all other reference parameters. Unlike Rust, we cannot assume mutable references are unique. Also note that while this limits optimizations greatly, it doesn't come into play often since we forcibly inline every non-unconstrained function during our inlining pass.
3. When we encounter `store v1 in v2` where `v1` is also a reference, we add `*v2` to `v1`'s alias set: `{v1, *v2}`.
4. Similarly, when an array is created containing a reference, we add the array element to the alias set as well:
   - `vA = [..., vN, ...]` => `{vN, vA[i]}`
   - `vA = array_set vB, index i, value vN` => `{vN, vA[i]}`
   - If the index is non-constant, we conservatively add the entire array to the alias set: `{vN, vA[?]}`
   - We'll also need to carry this analysis forward into functions that return arrays/slices like `slice_push_back`, `slice_insert`, `slice_remove`, etc.
5. Finally, where we handled a block with multiple predecessors in our previous algorithm by taking the union of each value, we'll amend this to also union alias sets.

Luckily, Noir's core semantics have a fairly small surface area, so references and arrays are our only worries. We don't have to worry about pointer arithmetic, pointer casting, long jumps, etc.
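Before applying these rules, here is one possible shape for the alias-set state. This is a hedged sketch: `Expression` is a hypothetical stand-in for "what an id can refer to" (a plain reference, or an array element like `vA[i]`), not a type from the actual pass:

```rs
use std::collections::BTreeSet;

type ValueId = u32;

/// What an alias can point at: a reference itself, an element of an
/// array at a known index, or an unknown element of an array.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Expression {
    Reference(ValueId),
    ArrayElement(ValueId, usize), // vA[i]
    AnyArrayElement(ValueId),     // vA[?], for non-constant indices
}

/// The set of expressions a reference may be aliased by.
#[derive(Clone, Debug)]
struct AliasSet {
    aliases: BTreeSet<Expression>,
}

impl AliasSet {
    /// Rule 1: a fresh `allocate` aliases only itself.
    fn new(reference: ValueId) -> Self {
        let mut aliases = BTreeSet::new();
        aliases.insert(Expression::Reference(reference));
        Self { aliases }
    }

    /// Rules 3 and 4: record a new way this reference can be reached.
    fn insert(&mut self, alias: Expression) {
        self.aliases.insert(alias);
    }

    /// Rule 5: joining two predecessor blocks unions their alias sets.
    fn unify(&mut self, other: &AliasSet) {
        self.aliases.extend(other.aliases.iter().cloned());
    }
}
```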
### Applying the solution to the problematic example

Taking the example with aliasing from earlier:

```
b0(v0: u1):
  v1 = allocate
  store Field 0 in v1
  v2 = allocate
  store [v1, v1] in v2
  jmpif v0, then: b1, else: b2
b1():
  v3 = load v2
  v4 = array_get v3, index 0
  store Field 1 in v4
  jmp b3()
b2():
  v5 = load v2
  v6 = array_get v5, index 1
  store Field 2 in v6
  jmp b3()
b3():
  // What is the value of v1?
  return
```

Since we're now storing alias sets in addition to the `ReferenceValue` enum from before, I'll write the state as the tuple `(alias set, reference value)`. Now, let's apply our algorithm:

- `b0`:
  - We see `({v1}, Known(0))` initially.
  - Then we see a new reference: `({v2}, Known([v1, v1]))`.
  - Because this references `v1`, we modify `v1`'s alias set: `({v1, *v2[0], *v2[1]}, Known(0))`.
- `b1`:
  - `v3` now gets added to `v1`'s alias set: `{v1, *v2[0], *v2[1], v3[0], v3[1]}`.
  - As does `v4`: `{v1, *v2[0], *v2[1], v3[0], v3[1], v4}`.
  - When we see `store Field 1 in v4`, since we know `v4` is only in a single alias set (it can be in multiple if we are unsure which reference it is aliased to), we can actually update our `ReferenceValue` here as well: `({v1, ...}, Known(1))`. If `v4` had aliased several references, we would have had to change all of those references to `ReferenceValue::Unknown`.
- `b2`:
  - This block is similar to `b1`, but using `v5` and `v6`, and storing `Field 2` instead.
- `b3`:
  - `b3` has two predecessors, so we merge the reference values:
    - `({v1, *v2[0], *v2[1], v3[0], v3[1], v4}, Known(1))` from `b1`
    - `({v1, *v2[0], *v2[1], v5[0], v5[1], v6}, Known(2))` from `b2`
    - `({v1, *v2[0], *v2[1], v3[0], v3[1], v4, v5[0], v5[1], v6}, Unknown)` is the result
  - Since the result is `Unknown`, we would (correctly) fail to replace any `load`s of `v1` in `b3`!
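The `store Field 1 in v4` step in `b1` is the subtle part: a store through an alias only stays precise when the alias maps to exactly one reference. A minimal sketch of that rule, assuming the caller has already computed which tracked references the store address may alias (the types are the same hypothetical stand-ins as in the earlier sketches):

```rs
use std::collections::HashMap;

type ValueId = u32;

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
enum ReferenceValue {
    Unknown,
    Known(ValueId),
}

/// Handle `store stored_value in address`, where `aliased_references`
/// lists every tracked reference whose alias set contains `address`.
fn handle_store(
    values: &mut HashMap<ValueId, ReferenceValue>,
    aliased_references: &[ValueId],
    stored_value: ValueId,
) {
    match aliased_references {
        // The address aliases exactly one reference: we still know
        // precisely what that reference holds after the store.
        [single] => {
            values.insert(*single, ReferenceValue::Known(stored_value));
        }
        // The address may alias several references: we cannot tell
        // which one was written, so all of them become Unknown.
        several => {
            for reference in several {
                values.insert(*reference, ReferenceValue::Unknown);
            }
        }
    }
}
```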
This should be a workable algorithm, but it could use some fine tuning. There are some items we could improve upon, for example:

- We could make some attempt to remove old aliases that are no longer used.
- We could try to clean up our alias sets somehow. In particular, arrays and array indices clutter the set, particularly when the same reference is stored in many indices of the array.
- We could consider tracking "points-to" information instead of alias sets (see slide 9 in the reference link). The reference mentions this would be more efficient but less precise, although truthfully I am not sure why.

... And that's it! As long as there are no major objections to this approach, we will likely be seeing a greatly changed mem2reg in the near future. Also, I've done absolutely no proofreading, so let me know what you think :slightly_smiling_face:.

# Better Loop Optimizations

At the time of writing, the algorithm above has been in use for quite a while and has generally worked well. Over time, however, we began focusing more on code size and on optimizations for unconstrained code. Compared to constrained code, different control flow patterns are much more important to handle well in unconstrained code since they won't be flattened into a single block later on.

The mem2reg pass as it exists above doesn't optimize loops in particular as well as it could. Consider the program:

```rust
unconstrained fn main() {
    let mut foo = 7;
    for i in 0 .. 10 {}
    assert_eq(foo, 7);
}
```

We can see from the above that although it is mutable, `foo` is never modified within the loop (or anywhere else), so we should be able to optimize the assert to `assert_eq(7, 7)`, then remove it entirely. When converting this program to SSA we get:

```
brillig(inline) fn main f0 {
  b0():
    v1 = allocate -> &mut Field
    store Field 7 at v1
    jmp b1(u32 0)
  b1(v0: u32):
    v5 = lt v0, u32 10
    jmpif v5 then: b3, else: b2
  b3():
    v9 = add v0, u32 1
    jmp b1(v9)
  b2():
    v6 = load v1 -> Field
    constrain v6 == Field 7
    return
}
```

Which we can then run mem2reg on to get... the same SSA back. Why is this happening? It's happening due to how references are tracked across blocks. Going step by step, mem2reg will visit:

- `b0`: We find that `*v1 = 7`, ok.
- `b1`:
  - The starting value of each reference is the unification of its values in each predecessor block, so we have:
    - In `b0`, we have `*v1 = 7`.
    - But we've yet to visit `b3`, so `v1` is `Unknown` there, resulting in it being unknown in `b1` as well.
- `b3`: The only predecessor is `b1`, so it inherits `v1` as unknown.
- `b2`: The only predecessor is `b1`, so it inherits `v1` as unknown.
  - Since `v1` is unknown, we can't optimize out the load in this block.

To better handle this case, we need a way to distinguish between Unknown (due to program logic that can't be known at compile time) and Unknown (because a predecessor block hasn't been visited yet). Notably, we can't just introduce a `NotYetKnown` value for this case and allow `unify(Known(v), NotYetKnown) = Known(v)`, since that would break mutations in loops. Consider the code:

```
brillig(inline) fn main f0 {
  b0():
    v1 = allocate -> &mut Field
    store Field 7 at v1
    jmp b1(u32 0)
  b1(v0: u32):
    v5 = lt v0, u32 10
    jmpif v5 then: b3, else: b2
  b3():
    v9 = add v0, u32 1
    v10 = load v1 -> Field     <-- new instruction
    constrain v10 == Field 7   <-- new instruction, `assert_eq(foo, 7)` in the loop
    store v9 at v1             <-- new instruction, `foo = i;` in the loop
    jmp b1(v9)
  b2():
    v6 = load v1 -> Field
    constrain v6 == Field 7
    return
}
```

If we had `NotYetKnown` with the unification change above, we'd perform:

- `b0`: We know `*v1 = 7`.
- `b1`: Unify `v1` from its predecessors:
  - `*v1 = 7` in `b0`
  - `v1` is `NotYetKnown` in `b3`
  - So by `unify(Known(7), NotYetKnown)`, we know `*v1 = 7` in `b1`.
- `b3`: We inherit `*v1 = 7` from `b1`.
  - We optimize `v10` to `Field 7` (!)
  - We remove the constraint since it is now `constrain 7 == 7` (!)
  - We see `store v9 at v1`, so we now know `*v1 = v9`.
- `b2`: We inherit `*v1 = 7` from `b1` (!)
  - We optimize out the constrain here as well (!)

This is incorrect, however. We're mutating `foo = i;` within the loop, so we'd expect to fail the `assert_eq(foo, 7);` on the second iteration of the loop since `foo` would be `0` then, but the constrain was optimized out, so this is not what happens. Instead, the program executes successfully when it should have failed.

Another sign of this failure is that `b2` after the loop inherits the value of `v1` from before the loop, completely ignoring any mutations that happened within the loop. Finally, a third sign of failure is that if we go back to `b1` and check its predecessors again, we find they'd now be `Known(7)` and `Known(v9)`, which should unify to `Unknown`, not `Known(7)`. All of these issues resulted from the unification change for `NotYetKnown`, so this is not a solution.

## Make Mem2reg Loop-Aware

We can modify the above approach a bit by changing the unification rule to `unify(Known(v), NotYetKnown) = NotYetKnown` instead. This is identical to the rule for `Unknown`, and we'll generally treat these values as unknown as well. We'll also have `unify(NotYetKnown, Unknown) = Unknown`.
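As a sketch, the earlier `ReferenceValue` enum would grow a third variant, with unification rules along these lines (again with `ValueId` as a stand-in type):

```rust
type ValueId = u32;

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
enum ReferenceValue {
    /// The value cannot be known at compile time.
    Unknown,
    /// A predecessor block simply hasn't been visited yet.
    NotYetKnown,
    Known(ValueId),
}

impl ReferenceValue {
    fn unify(self, other: ReferenceValue) -> ReferenceValue {
        use ReferenceValue::*;
        match (self, other) {
            // Matching known values stay known.
            (Known(a), Known(b)) if a == b => Known(a),
            // Differing known values are unknown.
            (Known(_), Known(_)) => Unknown,
            // Unknown is contagious, even against NotYetKnown.
            (Unknown, _) | (_, Unknown) => Unknown,
            // A NotYetKnown predecessor keeps the result NotYetKnown
            // until the loop is fully analyzed.
            (NotYetKnown, _) | (_, NotYetKnown) => NotYetKnown,
        }
    }
}
```

Note that this encodes the rules exactly as stated here; the nested-loop discussion further below revises the `Known`/`NotYetKnown` rule once more.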
To use these new `NotYetKnown` values, we'll explicitly handle loops within mem2reg. This can be done by finding the strongly-connected components (SCCs) in the control flow graph of the program. Any SCC larger than a single block is a loop consisting of at least a header block (`b1` above) and a body block (`b3` above), although the body will often be larger than a single block.

After identifying where the loops are, we change the iteration order of blocks in mem2reg to be over SCCs in control-flow order instead of over individual blocks in control-flow order. For blocks not in loops this is identical to the previous behavior, but for loops this means we'll have a set of blocks to iterate over at the same time. For these looping blocks we will (see the sketch after this list):

1. Perform mem2reg as normal on each loop block. We should expect the same safe but pessimistic behavior as from the original algorithm.
   - The one difference is that we now expect the initial unification of reference values in the header to produce `NotYetKnown` instead of `Unknown`.
   - In `b1` above, this would be `*v1 = unify(Known(7), NotYetKnown) = NotYetKnown`.
2. After iterating over each block of the loop, perform unification again on the loop header block.
   - Now that the whole loop has been analyzed, the results from the end of the loop should be known.
   - If a value was mutated, we'll get `unify(Known(v1), Known(v2)) = Unknown`. Alternatively, the value could also become `Unknown` within the loop, at which point it'd still unify to `Unknown`.
   - If a value was never mutated, it should still be `NotYetKnown`. If this is the case, instead of unifying, the known value of the reference becomes the value from the pre-header block.
   - In the mutation example above, we'd get `unify(Known(7), Known(v9)) = Unknown`, which correctly identifies that the variable is mutated within the loop and can't be optimized out.
   - In the original example without mutation, we'd get `unify(Known(7), NotYetKnown)`, and at this point we can manually override the result to `Known(7)`.
   - If any reference value goes from `NotYetKnown` to `Known`, we can iterate over the loop blocks once more at this point to potentially optimize them further.
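Here is a rough sketch of that driver. It is purely illustrative: `Block` state, `analyze_block`, and `reunify_header` are hypothetical stand-ins (stubbed out below), and the real pass tracks far more state:

```rust
use std::collections::HashMap;

type ValueId = u32;
type BlockId = u32;

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
enum ReferenceValue {
    Unknown,
    NotYetKnown,
    Known(ValueId),
}

type BlockStates = HashMap<BlockId, HashMap<ValueId, ReferenceValue>>;

/// Analyze one SCC. Single blocks are handled as before; loops get the
/// analyze / re-unify-header / maybe-reanalyze treatment.
fn analyze_scc(scc: &[BlockId], header: BlockId, states: &mut BlockStates) {
    if scc.len() == 1 {
        analyze_block(scc[0], states);
        return;
    }

    // 1. First pass: unvisited predecessors contribute NotYetKnown, so
    //    loop-carried references end this pass as NotYetKnown.
    for block in scc {
        analyze_block(*block, states);
    }

    // 2. Re-unify the header now that the loop back-edge is analyzed.
    //    References still NotYetKnown were never mutated in the loop,
    //    so they keep their pre-header value (the manual override).
    let changed = reunify_header(header, states);

    // If anything went from NotYetKnown to Known, one more pass over
    // the loop blocks may unlock further optimizations.
    if changed {
        for block in scc {
            analyze_block(*block, states);
        }
    }
}

fn analyze_block(_block: BlockId, _states: &mut BlockStates) {
    // Per-block walk from the earlier sketches (elided).
}

fn reunify_header(_header: BlockId, _states: &mut BlockStates) -> bool {
    // Unify predecessor end-states again; returns whether any value
    // changed from NotYetKnown to Known (elided).
    false
}
```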
## Full Example

Going back to the original example:

```rust
unconstrained fn main() {
    let mut foo = 7;
    for i in 0 .. 10 {}
    assert_eq(foo, 7);
}
```

With the SSA:

```
brillig(inline) fn main f0 {
  b0():
    v1 = allocate -> &mut Field
    store Field 7 at v1
    jmp b1(u32 0)
  b1(v0: u32):
    v5 = lt v0, u32 10
    jmpif v5 then: b3, else: b2
  b3():
    v9 = add v0, u32 1
    jmp b1(v9)
  b2():
    v6 = load v1 -> Field
    constrain v6 == Field 7
    return
}
```

This loop-aware algorithm would visit:

1. `b0`: We find that `*v1 = 7`.
2. Find the loop blocks `{b1, b3}`:
3. `b1`: `*v1 = unify(Known(7), NotYetKnown) = NotYetKnown` (unifying the values of `v1` from predecessors `b0` and `b3`).
4. `b3`: Inherit `*v1 = NotYetKnown`.
5. The loop is finished, so check the unifications on `b1` again.
6. Find that `*v1` is still `NotYetKnown`, so we ignore the `b3` input and set it to the `Known(7)` from the pre-header block `b0` instead. Because a value changed from `NotYetKnown` to `Known`, we iterate the loop blocks again:
   - Iterate through `b1` again, looking for further optimizations now that we know `v1` (there are none).
   - Iterate through `b3` again, looking for further optimizations now that we know `v1` (there are none).
   - The end value of `*v1` in `b3` is overwritten to `Known(7)` since we ran this block again.
7. `b2`: `*v1 = unify(Known(7), Known(7)) = Known(7)`.
   - Now we can optimize out the load and constrain in `b2`!

This algorithm can now optimize this looping case successfully.

If we look at the mutation example in the loop, the main change is in step 6: when we recheck the unification of `b1`, we'd have `*v1 = unify(Known(7), Known(v9)) = Unknown`. Since no value changed from `NotYetKnown` to `Known`, we don't iterate the loop blocks again. After the loop, in `b2`, the value of `*v1` would still be `Unknown`, so we correctly do not optimize out the constrain in the mutation case.

## Nested Loops

Let's see how this handles nested loops:

```rust
unconstrained fn main() {
    let mut foo = 7;
    for i in 0 .. 2 {
        for j in 0 .. 3 {}
    }
    assert_eq(foo, 7);
}
```

(The nested-loop behavior is worked through with the examples below.)

## Other Cases to Consider

Variables defined inside a loop:

```rust
unconstrained fn main() {
    for _ in 0 .. 2 {
        let mut foo = 7;
        for j in 0 .. 3 {}
        assert_eq(foo, 7);
    }
}
```

A loop whose mutations depend on control flow within the loop body:

```rust
let mut leading_zeroes = 0;
let mut stop = false;
let bits: [u1; 64] = C.to_be_bits();
for i in 0..64 {
    if (bits[i] == 0) & !stop {
        leading_zeroes += 1;
    } else {
        stop = true;
    }
}
assert_eq(leading_zeroes, 30);
```

```
acir(inline) fn main f0 {
  b0():
    v1 = allocate -> &mut Field
    store Field 0 at v1
    v3 = allocate -> &mut u1
    store u1 0 at v3
    v6 = make_array [u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 1, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 1, u1 0] : [u1; 64]
    inc_rc v6
    jmp b1(u32 0)
  b1(v0: u32):
    v9 = lt v0, u32 64
    jmpif v9 then: b3, else: b2
  b3():
    v13 = array_get v6, index v0 -> u1
    v14 = not v13
    v15 = load v3 -> u1
    v16 = not v15
    v17 = mul v14, v16
    jmpif v17 then: b5, else: b4
  b5():
    v18 = load v1 -> Field
    v20 = add v18, Field 1
    store v20 at v1
    jmp b6()
  b6():
    v22 = add v0, u32 1
    jmp b1(v22)
  b4():
    store u1 1 at v3
    jmp b6()
  b2():
    v10 = load v1 -> Field
    v12 = eq v10, Field 30
    constrain v10 == Field 30
    return
}
```

- `b0`:
  - `*v1 = 0`
  - `*v3 = false`
- Looping blocks `{b1, b3, b4, b5, b6}`:
  - `b1` (preds `b0`, `b6`):
    - `*v1 = 0 U NYK = NYK`
    - `*v3 = false U NYK = NYK`
  - `b3`: inherit from `b1`. All NYK: no optimizations, no stores.
  - `b4`: inherit from `b3`. All NYK to start; then `*v3 = true`.
  - `b5`: inherit from `b3`. All NYK to start; then `*v1 = v20`.
  - `b6` (preds `b4`, `b5`):
    - `*v1 = NYK U v20 = NYK`
    - `*v3 = true U NYK = NYK`
  - Recheck `b1`:
    - `*v1 = 0 U NYK = NYK`
    - `*v3 = false U NYK = NYK`
    - Both would be overridden to their pre-header values, yet `*v1` was mutated in `b5`. We need to distinguish NYK (truly unchanged) from NYK (mutated somewhere behind another NYK).
  - Change the unification and initialization rules:
    - `v U NYK = Unknown`
    - Now we only need to override the initial `b1` unifications:
      - `*v1 = 0 U NYK = NYK` (overridden from `Unknown`)
      - `*v3 = false U NYK = NYK` (overridden from `Unknown`)
    - Only if they remain NYK by the recheck can we then set them to `0`/`false`.
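Sketching this revision in the same style as before: the `(Known, NotYetKnown)` arm from the earlier sketch becomes `Unknown`, and the override is applied only at a loop header's initial unification. `initial_header_value` is a hypothetical helper and this is my reading of the notes above, not a definitive implementation:

```rust
type ValueId = u32;

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
enum ReferenceValue {
    Unknown,
    NotYetKnown,
    Known(ValueId),
}

impl ReferenceValue {
    /// Revised rule: NotYetKnown no longer survives unification with a
    /// Known value, so mutations hiding behind a NYK are detected.
    fn unify(self, other: ReferenceValue) -> ReferenceValue {
        use ReferenceValue::*;
        match (self, other) {
            (Known(a), Known(b)) if a == b => Known(a),
            (NotYetKnown, NotYetKnown) => NotYetKnown,
            // Everything else, including `v U NYK`, is Unknown.
            _ => Unknown,
        }
    }
}

/// At a loop header's first visit only, override the `Known U NYK`
/// result back to NotYetKnown; if it survives the whole loop as
/// NotYetKnown, it then takes the pre-header Known value.
fn initial_header_value(pre_header: ReferenceValue) -> ReferenceValue {
    match pre_header {
        ReferenceValue::Known(_) => ReferenceValue::NotYetKnown,
        other => other,
    }
}
```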
Now this is breaking:

```rust
fn main() {
    let mut acc: Field = 0;
    let array = [[1, 2], [3, 4]];
    for i in 0..2/*57*/ {
        for j in 0..2/*57*/ {
            acc += array[i][j];
        }
    }
    assert(acc != 0);
}
```

```
acir(inline) fn main f0 {
  b0():
    v2 = allocate -> &mut Field
    store Field 0 at v2
    v6 = make_array [Field 1, Field 2] : [Field; 2]
    v9 = make_array [Field 3, Field 4] : [Field; 2]
    inc_rc v6
    inc_rc v9
    v10 = make_array [v6, v9] : [[Field; 2]; 2]
    inc_rc v10
    jmp b1(u32 0)
  b1(v0: u32):
    v13 = lt v0, u32 2
    jmpif v13 then: b3, else: b2
  b3():
    jmp b4(u32 0)
  b4(v1: u32):
    v18 = lt v1, u32 2
    jmpif v18 then: b6, else: b5
  b6():
    v21 = load v2 -> Field
    v22 = array_get v10, index v0 -> [Field; 2]
    inc_rc v22
    v23 = array_get v22, index v1 -> Field
    v24 = add v21, v23
    store v24 at v2
    v25 = add v1, u32 1
    jmp b4(v25)
  b5():
    v20 = add v0, u32 1
    jmp b1(v20)
  b2():
    v14 = load v2 -> Field
    v15 = eq v14, Field 0
    v16 = not v15
    constrain v15 == u1 0
    return
}
```

- `b0`: `*v2 = 0`
- Loop `{b1, b3, b4, b5, b6}`:
  - `b1` (preds `b0`, `b5`): `*v2 = 0 U NYK = NYK` (overridden)
  - `b3`: inherit from `b1`
  - `b4` (preds `b3`, `b6`): `*v2 = NYK U NYK = NYK`
  - `b5`: inherit from `b4`
  - `b6`: inherit from `b4`. The load remains; `*v2 = v24`.
  - Recheck `b1`: `*v2 = 0 U NYK` (!). The blocks were analyzed in the wrong order:
    - `b5` didn't get updated from `b6` and `b4`
    - It is impossible to update `b4` from `b6` without a nested loop pass

If the loop check were recursive:

- `b0`: `*v2 = 0`
- Loop `{b1, b3, b4, b5, b6}`:
  - `b1` (preds `b0`, `b5`): `*v2 = 0 U NYK = NYK` (overridden)
  - `b3`: inherit from `b1`
  - Inner loop `{b4, b6}`:
    - `b4` (preds `b3`, `b6`): `*v2 = NYK U NYK = NYK`
    - `b6`: inherit from `b4`. The load stays; `*v2 = v24`.
    - Recheck `b4`: `*v2 = NYK U v24 = Unknown`
  - `b5`: inherit from `b4`
  - Recheck `b1`: `*v2 = 0 U Unknown = Unknown`
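A recursive variant of the earlier driver sketch would capture this: an inner SCC is analyzed to completion (header recheck included) before the outer loop finishes its pass, so the outer recheck sees `Unknown` instead of a stale `NotYetKnown`. As before, this is illustrative only, with hypothetical stubbed helpers:

```rust
type BlockId = u32;

/// A node in the condensed control flow graph: either a single block
/// or a nested loop with its own header and body in control-flow order.
enum CfgNode {
    Block(BlockId),
    Loop { header: BlockId, body: Vec<CfgNode> },
}

struct Pass; // stand-in for the real pass state

impl Pass {
    fn analyze(&mut self, node: &CfgNode) {
        match node {
            CfgNode::Block(block) => self.analyze_block(*block),
            CfgNode::Loop { header, body } => {
                // First pass: inner loops recurse here, so a nested loop
                // is fully resolved before the outer loop moves on.
                for inner in body {
                    self.analyze(inner);
                }
                // Recheck the header; re-run the body if anything went
                // from NotYetKnown to Known.
                if self.reunify_header(*header) {
                    for inner in body {
                        self.analyze(inner);
                    }
                }
            }
        }
    }

    fn analyze_block(&mut self, _block: BlockId) {
        // Per-block walk from the earlier sketches (elided).
    }

    fn reunify_header(&mut self, _header: BlockId) -> bool {
        // Returns whether any value changed NotYetKnown -> Known (elided).
        false
    }
}
```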