For those unfamiliar, the mem2reg (memory to register) pass is the algorithm Noir uses to replace references with the values those references hold. It is an optimization pass that occurs at several points in our SSA pipeline, as various other optimizations simplify the program and references become easier to track.

Now onto the main topic: our current mem2reg algorithm is insufficient. In particular, it is missing:

- Support for references passed into function calls.
- Support for aliased references, e.g. references stored inside arrays.

Without delving into the "why" too much further, the short of it is that we need to expand the mem2reg pass to support these features, and we need to do so cleanly.
Consider each block and reference in isolation. For any given block `bN` and reference `vN`, the value of `vN` at the beginning of `bN` will always be equal to the unification of each value `vN` has at the end of each predecessor block of `bN`.

- Two values `A` and `B` from two blocks unify iff `A = B`. If the value from either predecessor block differs, the result of the unification is "Unknown".
- If `bN` has no predecessors, the starting value of `vN` is also "Unknown". If it helps, you can think of this as unifying an empty set of predecessor blocks, which always results in an "Unknown" value.

So the value of a reference at the start of a block is either Unknown or Known(value). We can represent this cleanly in an enum:
enum ReferenceValue {
Unknown,
Known(ValueId),
}
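Unification then falls out directly. As a minimal sketch (with `ValueId` as a stand-in for the compiler's actual SSA value identifier type):

```rust
// Stand-in for the compiler's actual SSA value identifier type.
type ValueId = u32;

impl ReferenceValue {
    /// Unify the values a reference holds at the end of two predecessor
    /// blocks: the result is Known iff both sides are Known and equal.
    fn unify(self, other: ReferenceValue) -> ReferenceValue {
        match (self, other) {
            (ReferenceValue::Known(a), ReferenceValue::Known(b)) if a == b => {
                ReferenceValue::Known(a)
            }
            _ => ReferenceValue::Unknown,
        }
    }
}
```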
Now that we have the value of a reference at the start of a block, to find its value at the end of the block there are several cases:

- No store to `vN` within `bN` -> the value of `vN` at the end of `bN` equals its value at the beginning.
- A `store value in vN` within `bN` -> the value of `vN` after the store is `Known(stored_value)`.
- A function call within `bN` taking `vN` as an argument -> the value of `vN` after the call is `Unknown`.
At any point while going through the instructions of a block, if we find a `vA = load vN` in which the `ReferenceValue` for `vN` is `Known(vB)`, we can replace the value `vA` with the value `vB`.
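To illustrate these rules, here's a rough sketch of a single block's scan; the `Instruction` shape below is a simplified stand-in for illustration, not the compiler's actual SSA types:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real SSA instructions.
enum Instruction {
    Store { address: ValueId, value: ValueId },
    Load { result: ValueId, address: ValueId },
    Call { arguments: Vec<ValueId> },
}

/// Scan one block, updating each reference's value and collecting the
/// `load` results that can be replaced with known values.
fn scan_block(
    instructions: &[Instruction],
    values: &mut HashMap<ValueId, ReferenceValue>,
) -> HashMap<ValueId, ValueId> {
    let mut replacements = HashMap::new();
    for instruction in instructions {
        match instruction {
            // A store makes the reference's value Known.
            Instruction::Store { address, value } => {
                values.insert(*address, ReferenceValue::Known(*value));
            }
            // A load of a Known reference means the result can be replaced.
            Instruction::Load { result, address } => {
                if let Some(ReferenceValue::Known(value)) = values.get(address) {
                    replacements.insert(*result, *value);
                }
            }
            // Passing a reference to a function invalidates what we know about it.
            Instruction::Call { arguments } => {
                for argument in arguments {
                    if values.contains_key(argument) {
                        values.insert(*argument, ReferenceValue::Unknown);
                    }
                }
            }
        }
    }
    replacements
}
```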
With these rules we have a relatively simple framework to handle allocations in each block. To expand this to a whole function, we could traverse each block in execution order, remembering the ReferenceValue
of each reference at the end of each block and feeding those into the unified inputs of each subsequent block.
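A sketch of that unification step over predecessors (an empty predecessor set yields `Unknown`, matching the rule above):

```rust
/// The value of a reference at the start of a block: unify the values it
/// holds at the end of every predecessor block.
fn block_start_value(predecessor_end_values: Vec<ReferenceValue>) -> ReferenceValue {
    predecessor_end_values
        .into_iter()
        .reduce(ReferenceValue::unify)
        .unwrap_or(ReferenceValue::Unknown)
}
```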
Compared to our current mem2reg algorithm, this version is fairly similar but provides a simpler base to build upon. Notably, it does not require any logic for finding loops within the program and extends to passing references to function calls cleanly.
Consider a simple looping program:
fn main() {
let mut foo = 0;
for i in 0..3 {
foo += 1;
println(foo);
}
println(foo);
}
And its SSA representation before any optimizations:
b0():
v0 = allocate // let mut foo;
store Field 0 in v0 // foo = 0;
jmp b1(Field 0)
b1(v1: Field): // for i in 0..3 {
v2 = lt v1, Field 3
jmpif v2, then: b2, else: b3
b2():
v3 = load v0
v4 = add v3, Field 1
store v4 in v0 // foo += 1;
v5 = load v0
call println(v5) // println(foo);
v6 = add v1, Field 1
jmp b1(v6)
b3():
v7 = load v0
call println(v7) // println(foo);
return
Let's apply the mem2reg algorithm we have so far to this code:
Block `b0`:

- `b0` has no predecessors, so the starting value of `v0` is `Unknown`.
- There is a `store Field 0 in v0` within, so the value of `v0` by the end of the block is `Field 0`.

Block `b1`:

- Predecessors are `b0` and `b2`. Since we haven't explored `b2` yet, it must occur later in this analysis. So our starting value for `v0` is `Unknown` (after unifying `Known(Field 0)` and `Unknown`).
- There are no stores in `b1`, so the value of `v0` at the end of the block is also `Unknown`.

Block `b2`:

- The only predecessor is `b1`, so our starting value for `v0` is `Unknown`.
- We find `v3 = load v0`, but we do not know the value of `v0`, so it is kept in the code.
- There is a `store v4 in v0`, so the value of `v0` is now `Known(v4)`.
- We find `v5 = load v0` and replace `v5` with `v4`.
- The final value of `v0` is `v4` since there are no later stores.

Block `b3`:

- The only predecessor is `b1`, so the value of `v0` is inherited as `Unknown`.
- We cannot replace the `v7 = load v0` with anything yet.
- The final value of `v0` is `Unknown`.

There weren't many loads to replace in this example, but we can consider how changing the code may affect the result. E.g. what if we added a `store Field 7 in v0` in `b1`? That would change the initial value of `v0` in both blocks `b2` and `b3` to `Known(Field 7)`, so we'd replace both `v3` and `v7` with `Field 7`. It may seem unintuitive at first to replace the load at the end of the loop, thus ignoring all the mutations within the loop, but note that the loop block `b2` always jumps back to `b1`, which only afterwards jumps to `b3` to end the loop. Thus, there would be no way to get to the end of the loop without going through the new `store Field 7 in v0` first.
Hopefully I've managed to convince you, the reader, so far that this new algorithm is simple and sound. Now that that is done, let's break this trust and disprove that assumption:
If handling function calls were all we needed, there would be no reason to change our mem2reg algorithm or write this whole document explaining it.
Let's consider the elephant in the room, aliasing:
fn main(v0: bool) {
let v1 = &mut 0;
let mut v2 = [v1, v1];
if v0 {
*v2[0] = 1;
} else {
*v2[1] = 2;
}
// What is the value of v1?
}
And in SSA:
b0(v0: u1):
v1 = allocate
store Field 0 in v1
v2 = allocate
store [v1, v1] in v2
jmpif v0, then: b1, else: b2
b1():
v3 = load v2
v4 = array_get v3, index 0
store Field 1 in v4
jmp b3()
b2():
v5 = load v2
v6 = array_get v5, index 1
store Field 2 in v6
jmp b3()
b3():
// What is the value of v1?
return
Saving you the trouble of applying the algorithm yourself, the result would be `Field 0`, as if it were not modified in either block (*). This is because we've only been tracking references via their value ids (`v1` here) so far. To properly handle this case we'd need to employ one of two strategies:

- Track the aliases of each reference directly, so we'd know `v1` was aliased by the array `[v1, v1]` which was then stored in `v2`.
- Conservatively fall back to `ReferenceValue::Unknown` for `v1` despite there being no conflicting direct stores to it.

(*) Note that Noir's current mem2reg algorithm does not actually fail on this case yet, but more complex aliasing cases will still fail, e.g. https://github.com/noir-lang/noir/issues/2316.
Unfortunately, there's no single, standard algorithm for optimizing out loads to arbitrary pointers. Instead, there are many different trade-offs to consider along the way. We must also note that identifying all pointer aliases correctly has been proven to be undecidable (Landi, 1992), so we can abandon the notion of always identifying every alias. It can be helpful to get some background information on what options are available though. I've found this resource which seems like a good starter:
I'll be going through each option to slowly build up the pass we need (you can find this list starting on slide 8):
In Noir's SSA, the only compound values that can contain references are arrays of references like `[&mut T]`, so this simplifies things somewhat. We also do not need to worry about structures at all since they are not present in our SSA form. With these observations we can start to build out the algorithm a bit more.
Most of the points above are already covered by the existing algorithm structure that was outlined. The most notable addition is the first point on tracking aliases directly. We can add some additional rules to track alias sets (a sketch of a possible representation appears below):

- When a reference `vN = allocate` is first created, it is initially aliased only to itself: `{vN}`.
- On a `store v1 in v2` where `v1` is also a reference, we add `*v2` to `v1`'s alias set: `{v1, *v2}`.
- On an array creation `vA = [..., vN, ...]`, we add `vA[i]` to `vN`'s alias set: `{vN, vA[i]}`.
- On an array update `vA = array_set vB, index i, value vN`, we likewise add `vA[i]`: `{vN, vA[i]}`.
- If the index is not known at compile time, we record an alias at an unknown index: `{vN, vA[?]}`.
- We must also handle the slice intrinsics `slice_push_back`, `slice_insert`, `slice_remove`, etc.

Luckily, Noir's core semantics have a fairly small surface area, so references and arrays are our only worries. We don't have to worry about pointer arithmetic, pointer casting, long jumps, etc.
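As mentioned above, here's a sketch of one possible representation for these alias sets (the names are illustrative, not the compiler's actual types):

```rust
use std::collections::HashSet;

/// One possible shape for the entries of an alias set.
#[derive(Clone, PartialEq, Eq, Hash)]
enum AliasTarget {
    /// The reference itself, e.g. `v1`.
    Reference(ValueId),
    /// A reference stored behind another reference, e.g. `*v2`.
    Dereference(ValueId),
    /// A reference stored at a known array index, e.g. `vA[i]`.
    ArrayElement(ValueId, usize),
    /// A reference stored at an index unknown at compile time, e.g. `vA[?]`.
    UnknownArrayElement(ValueId),
}

type AliasSet = HashSet<AliasTarget>;

/// Rule 1: a fresh allocation `vN = allocate` aliases only itself.
fn new_allocation(reference: ValueId) -> AliasSet {
    HashSet::from([AliasTarget::Reference(reference)])
}
```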
Taking the example with aliasing from earlier:
b0(v0: u1):
v1 = allocate
store Field 0 in v1
v2 = allocate
store [v1, v1] in v2
jmpif v0, then: b1, else: b2
b1():
v3 = load v2
v4 = array_get v3, index 0
store Field 1 in v4
jmp b3()
b2():
v5 = load v2
v6 = array_get v5, index 1
store Field 2 in v6
jmp b3()
b3():
// What is the value of v1?
return
Since we're now storing alias sets in addition to the `ReferenceValue` enum from before, I'll write this as the tuple of `(alias set, reference value)`. Now, let's apply our algorithm:

Block `b0`:

- `v1` is `({v1}, Known(0))` initially.
- `v2` is `({v2}, Known([v1, v1]))`.
- Since the stored array `[v1, v1]` contains `v1`, we modify `v1`'s entry to `({v1, *v2[0], *v2[1]}, Known(0))`.

Block `b1`:

- The loaded array `v3` now gets added to `v1`'s alias set: `{v1, *v2[0], *v2[1], v3[0], v3[1]}`.
- The same happens for the array_get result `v4`: `{v1, *v2[0], *v2[1], v3[0], v3[1], v4}`.
- On the `store Field 1 in v4`, since we know `v4` is only in a single alias set (it can be in multiple if we are unsure which reference it is aliased to), we can actually change our `ReferenceValue` here as well: `({v1, ...}, Known(1))`. If `v4` had aliased several references, we would have had to change all of those references to `ReferenceValue::Unknown`.

Block `b2`:

- The same as `b1` but using `v5`, `v6`, and storing `Field 2` instead.

Block `b3`:

- `b3` has two predecessors, so we merge the reference values:
  - `({v1, *v2[0], *v2[1], v3[0], v3[1], v4}, Known(1))` from `b1`
  - `({v1, *v2[0], *v2[1], v5[0], v5[1], v6}, Known(2))` from `b2`
  - `({v1, *v2[0], *v2[1], v3[0], v3[1], v4, v5[0], v5[1], v6}, Unknown)` is the result
- So we cannot optimize out any `load`s of `v1` in `b3`!

This should be a workable algorithm, but it could use some fine tuning. There are some items we could improve upon, for example:
… And that's it! As long as there are no major objections to this approach, we will likely be seeing a greatly changed mem2reg in the near future. Also, I've done absolutely no proofreading, so let me know what you think!
At the time of writing, the algorithm above has been in use for quite a while and has generally worked well. Over time, however, we began focusing more on code size and on optimizing unconstrained code. Compared to constrained code, handling different control flow patterns well matters much more for unconstrained code, since they won't be flattened into a single block later on. The mem2reg pass as it exists above doesn't optimize looping in particular as well as it could. Consider the program:
unconstrained fn main() {
let mut foo = 7;
for i in 0 .. 10 {}
assert_eq(foo, 7);
}
We can see from the above that although it is mutable, foo
is never modified within the loop (or anywhere), so we should be able to optimize the assert to assert_eq(7, 7)
, then remove it entirely. When converting this program to SSA we get:
brillig(inline) fn main f0 {
b0():
v1 = allocate -> &mut Field
store Field 7 at v1
jmp b1(u32 0)
b1(v0: u32):
v5 = lt v0, u32 10
jmpif v5 then: b3, else: b2
b3():
v9 = add v0, u32 1
jmp b1(v9)
b2():
v6 = load v1 -> Field
constrain v6 == Field 7
return
}
Which we can then run mem2reg on to get… the same SSA back. Why is this happening? It's due to how references are tracked across blocks. Going step by step, mem2reg will visit:

- `b0`: We find that `*v1 = 7`, ok.
- `b1`:
  - From `b0`, we have `*v1 = 7`.
  - We haven't visited `b3` yet, so `v1` is `Unknown` there, resulting in it being unknown in `b1` as well.
- `b3`: Only predecessor is `b1`, so it inherits `v1` as unknown.
- `b2`: Only predecessor is `b1`, so it inherits `v1` as unknown.
  - Since `v1` is unknown, we can't optimize out the load in this block.

To better handle this case, we need a way to distinguish between Unknown (due to program logic that can't be known at compile-time) and Unknown (because a predecessor block hasn't been visited yet). Notably, we can't just introduce a `NotYetKnown` value for this case and allow `unify(Known(v), NotYetKnown) = Known(v)`, since that would break mutations in loops. Consider the code:
brillig(inline) fn main f0 {
b0():
v1 = allocate -> &mut Field
store Field 7 at v1
jmp b1(u32 0)
b1(v0: u32):
v5 = lt v0, u32 10
jmpif v5 then: b3, else: b2
b3():
v9 = add v0, u32 1
v10 = load v1 -> Field <-- new instruction
constrain v10 == Field 7 <-- new instruction, `assert_eq(foo, 7)` in the loop
store v9 at v1 <-- new instruction, `foo = i;` in the loop
jmp b1(v9)
b2():
v6 = load v1 -> Field
constrain v6 == Field 7
return
}
If we had `NotYetKnown` with the unification change above, we'd perform:

- `b0`: We know `*v1 = 7`.
- `b1`: Unify `v1` from predecessors:
  - `*v1 = 7` in `b0`.
  - `v1` is `NotYetKnown` in `b3`.
  - From `unify(Known(7), NotYetKnown)`, we know `*v1 = 7` in `b1`.
- `b3`: We inherit `*v1 = 7` from `b1`.
  - The load is replaced: `v10 = Field 7` (!)
  - The constrain becomes `constrain 7 == 7` (!)
  - There is a `store v9 at v1`, so we now know `*v1 = v9`.
- `b2`: We inherit `*v1 = 7` from `b1` (!)
This is incorrect, however. We're mutating `foo = i;`
within the loop so we'd expect to fail the assert_eq(foo, 7);
on the second iteration of the loop since foo
would be 0
then, but the constrain was optimized out, so this is not what happens. Instead the program executes successfully when it should have failed. Another sign we can see of this failure is b2
after the loop inherits the value of v1
from before the loop, completely ignoring any mutations that happened within the loop. Finally, a third sign of failure is if we go back to b1
and check its predecessors again we find that they'd now be Known(7)
and Known(v9)
, which should unify to Unknown
, not 7
. All of these issues resulted from the unification change with NotYetKnown
, so this is not a solution.
We can modify the above approach a bit by changing the unification rule to unify(Known(v), NotYetKnown) = NotYetKnown
instead. This is identical to the rule for Unknown
and we'll generally treat these values as unknown as well. We'll also have unify(NotYetKnown, Unknown) = Unknown
.
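A sketch of the extended enum with these rules (again with `ValueId` as a stand-in):

```rust
#[derive(Copy, Clone, PartialEq, Eq)]
enum ReferenceValue {
    Unknown,
    NotYetKnown,
    Known(ValueId),
}

impl ReferenceValue {
    fn unify(self, other: ReferenceValue) -> ReferenceValue {
        use ReferenceValue::*;
        match (self, other) {
            // Agreeing known values stay known.
            (Known(a), Known(b)) if a == b => Known(a),
            // Conflicting known values are unknown.
            (Known(_), Known(_)) => Unknown,
            // A genuinely unknown side poisons the result, including
            // unify(NotYetKnown, Unknown) = Unknown.
            (Unknown, _) | (_, Unknown) => Unknown,
            // Otherwise a side is NotYetKnown and the result stays provisional:
            // unify(Known(v), NotYetKnown) = NotYetKnown.
            _ => NotYetKnown,
        }
    }
}
```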
To use these new `NotYetKnown` values, we'll explicitly handle loops within mem2reg. This can be done by finding the strongly-connected components (SCCs) in the control flow graph of the program. Any SCC larger than a single block is a loop consisting of at least a header block (`b1` above) and a body block (`b3` above), although the body will often be larger than a single block.
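For illustration, SCCs can be found with a standard algorithm like Tarjan's; here's a sketch using the petgraph crate on a toy CFG shaped like the example above (the real pass may well implement this directly instead):

```rust
use petgraph::algo::tarjan_scc;
use petgraph::graph::DiGraph;

fn main() {
    // A toy CFG mirroring the example: b0 -> b1, b1 -> {b3, b2}, b3 -> b1.
    let mut cfg = DiGraph::<&str, ()>::new();
    let b0 = cfg.add_node("b0");
    let b1 = cfg.add_node("b1");
    let b2 = cfg.add_node("b2");
    let b3 = cfg.add_node("b3");
    cfg.extend_with_edges([(b0, b1), (b1, b3), (b1, b2), (b3, b1)]);

    // Any SCC with more than one node is a loop: a header plus its body.
    for scc in tarjan_scc(&cfg) {
        if scc.len() > 1 {
            let blocks: Vec<_> = scc.iter().map(|node| cfg[*node]).collect();
            println!("loop found: {blocks:?}"); // prints ["b3", "b1"] (in some order)
        }
    }
}
```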
After identifying where loops are, change the iteration order of blocks in mem2reg to be over SCCs in control-flow order instead of over blocks in control-flow order. For blocks not in loops this will be identical to the previous behavior, but for looping blocks this means we'll have a set of blocks to iterate over at the same time. For looping blocks we will:

- Treat unvisited predecessors as `NotYetKnown` instead of `Unknown`.
  - For `b1` above, this would be `*v1 = unify(Known(7), NotYetKnown) = NotYetKnown`.
- If the reference is mutated within the loop, we'd eventually perform `unify(Known(v1), Known(v2)) = Unknown`. Alternatively, the value could also become `Unknown` within the loop, at which point it'd still unify to `Unknown`.
- After every block in the loop has been visited once, recheck the loop header and see whether the value coming from the loop's back edge is still `NotYetKnown`. If this is the case, instead of unifying, the known value of the reference becomes the value from the pre-header block.
  - In the mutation example we'd perform `unify(Known(7), Known(v9)) = Unknown`, which correctly identifies the variable is mutated within the loop and can't be optimized out.
  - In the original example we'd perform `unify(Known(7), NotYetKnown)`, and at this point we could manually override the result to `Known(7)`.
- If any value changed from `NotYetKnown` to `Known`, we can iterate over the loop blocks once more at this point to potentially optimize them further.

Going back to the original example:
unconstrained fn main() {
let mut foo = 7;
for i in 0 .. 10 {}
assert_eq(foo, 7);
}
With the SSA:
brillig(inline) fn main f0 {
b0():
v1 = allocate -> &mut Field
store Field 7 at v1
jmp b1(u32 0)
b1(v0: u32):
v5 = lt v0, u32 10
jmpif v5 then: b3, else: b2
b3():
v9 = add v0, u32 1
jmp b1(v9)
b2():
v6 = load v1 -> Field
constrain v6 == Field 7
return
}
This loop-aware algorithm would visit:

1. `b0`: find that `*v1 = 7`.
2. `b1`: `*v1 = unify(Known(7), NotYetKnown) = NotYetKnown` (unifying the values of `v1` from predecessors `b0` & `b3`).
3. `b3`: inherit `*v1 = NotYetKnown`.
4. Check `b1` again: `*v1` is still `NotYetKnown` on the back edge, so we ignore `b3` and set it to `Known(7)` from `b0` instead. Because a value changed from `NotYetKnown` to `Known`, we iterate the loop blocks again:
5. Visit `b1` again, looking for further optimizations now that we know `v1` (there are none).
6. Visit `b3` again, looking for further optimizations now that we know `v1` (there are none).
   - `*v1` in `b3` is overwritten to `Known(7)` since we ran this block again.
7. `b2`: `*v1 = unify(Known(7), Known(7)) = Known(7)`, so the load in `b2` can finally be optimized out.
This algorithm can now optimize this looping case successfully. If we look at the mutation example in the loop, the main change is in step 4: when we recheck the unify of `b1` we'd have `*v1 = unify(Known(7), Known(v9)) = Unknown`. Since no value changed from `NotYetKnown` to `Known`, we don't iterate the loop blocks again. After the loop in `b2`, the value of `*v1` would still be `Unknown`, so we correctly do not optimize out the constrain in the mutation case.
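To make the header rule concrete, here's a sketch of how a loop header's value could be resolved after the first pass over the loop (a hypothetical helper, building on the `unify` sketch above):

```rust
/// After one pass over a loop's blocks, resolve a reference's value at the
/// loop header. If every back edge is still NotYetKnown, the loop never
/// mutated the reference, so take the pre-header's value; otherwise unify.
fn resolve_loop_header(
    pre_header: ReferenceValue,
    back_edge_values: &[ReferenceValue],
) -> ReferenceValue {
    let still_unvisited = back_edge_values
        .iter()
        .all(|value| matches!(value, ReferenceValue::NotYetKnown));
    if still_unvisited {
        pre_header
    } else {
        back_edge_values
            .iter()
            .copied()
            .fold(pre_header, ReferenceValue::unify)
    }
}
```

With the first example this yields `Known(7)`; in the mutation example the back edge carries `Known(v9)`, so the fold gives `unify(Known(7), Known(v9)) = Unknown`.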
Let's see how this handles nested loops:
unconstrained fn main() {
let mut foo = 7;
for i in 0 .. 2 {
for j in 0 .. 3 {}
}
assert_eq(foo, 7);
}
Variables defined inside a loop:
unconstrained fn main() {
for _ in 0 .. 2 {
let mut foo = 7;
for j in 0 .. 3 {}
assert_eq(foo, 7);
}
}
Another case, with branching and mutation inside the loop:

let mut leading_zeroes = 0;
let mut stop = false;
let bits: [u1; 64] = C.to_be_bits();
for i in 0..64 {
if (bits[i] == 0) & !stop {
leading_zeroes += 1;
} else {
stop = true;
}
}
assert_eq(leading_zeroes, 30);
In SSA:

acir(inline) fn main f0 {
b0():
v1 = allocate -> &mut Field
store Field 0 at v1
v3 = allocate -> &mut u1
store u1 0 at v3
v6 = make_array [u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0,
u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 1, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0,
u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 0, u1 1, u1 0] : [u1; 64]
inc_rc v6
jmp b1(u32 0)
b1(v0: u32):
v9 = lt v0, u32 64
jmpif v9 then: b3, else: b2
b3():
v13 = array_get v6, index v0 -> u1
v14 = not v13
v15 = load v3 -> u1
v16 = not v15
v17 = mul v14, v16
jmpif v17 then: b5, else: b4
b5():
v18 = load v1 -> Field
v20 = add v18, Field 1
store v20 at v1
jmp b6()
b6():
v22 = add v0, u32 1
jmp b1(v22)
b4():
store u1 1 at v3
jmp b6()
b2():
v10 = load v1 -> Field
v12 = eq v10, Field 30
constrain v10 == Field 30
return
}
Tracing the loop-aware pass:

- `b0`: `*v1 = 0`, `*v3 = false`
- Loop SCC: `{b1, b3, b4, b5, b6}`
- `b1` (preds `b0`, `b6`):
  - `*v1 = 0 U NYK = NYK`
  - `*v3 = false U NYK = NYK`
- `b3`: inherit from `b1`
- `b4`: inherit from `b3`, then `*v3 = true`
- `b5`: inherit from `b3`, then `*v1 = v20`
- `b6` (preds `b4`, `b5`):
  - `*v1 = NYK U v20 = NYK`
  - `*v3 = true U NYK = NYK`
- Back at `b1`:
  - `*v1 = 0 U NYK = NYK`
  - `*v3 = false U NYK = NYK`
  - (treating the leftover `NYK` as unknown would instead give `v U NYK = Unknown`)
- Recheck `b1`'s unifications with the override rule:
  - `*v1 = 0 U NYK = NYK` (override from `Unknown`)
  - `*v3 = false U NYK = NYK` (override from `Unknown`)

Now this is breaking:
fn main() {
let mut acc: Field = 0;
let array = [[1, 2], [3, 4]];
for i in 0..2/*57*/ {
for j in 0..2/*57*/ {
acc += array[i][j];
}
}
assert(acc != 0);
}
In SSA:

acir(inline) fn main f0 {
b0():
v2 = allocate -> &mut Field
store Field 0 at v2
v6 = make_array [Field 1, Field 2] : [Field; 2]
v9 = make_array [Field 3, Field 4] : [Field; 2]
inc_rc v6
inc_rc v9
v10 = make_array [v6, v9] : [[Field; 2]; 2]
inc_rc v10
jmp b1(u32 0)
b1(v0: u32):
v13 = lt v0, u32 2
jmpif v13 then: b3, else: b2
b3():
jmp b4(u32 0)
b4(v1: u32):
v18 = lt v1, u32 2
jmpif v18 then: b6, else: b5
b6():
v21 = load v2 -> Field
v22 = array_get v10, index v0 -> [Field; 2]
inc_rc v22
v23 = array_get v22, index v1 -> Field
v24 = add v21, v23
store v24 at v2
v25 = add v1, u32 1
jmp b4(v25)
b5():
v20 = add v0, u32 1
jmp b1(v20)
b2():
v14 = load v2 -> Field
v15 = eq v14, Field 0
v16 = not v15
constrain v15 == u1 0
return
}
Tracing with a single flat SCC:

- `b0`: `*v2 = 0`
- Loop SCC: `{b1, b3, b4, b5, b6}`
- `b1`: `*v2 = 0 U NYK = NYK` (overridden)
- `b4`: `*v2 = NYK U NYK = NYK`
- `b6`: `*v2 = v24`
- `*v2 = 0 U NYK` (!) analyzed in wrong order

If the loop check were recursive:

- `b0`: `*v2 = 0`
- Outer loop SCC: `{b1, b3, b4, b5, b6}`
- `b1`: `*v2 = 0 U NYK = NYK` (overridden)
- Inner loop SCC: `{b4, b6}`
- `b4`: `*v2 = NYK U NYK = NYK`
- `b6`: `*v2 = v24`
- Recheck `b4`: `*v2 = NYK U v24 = Unknown`
- Recheck `b1`: `*v2 = 0 U Unknown = Unknown`, so the load in `b2` is correctly kept.
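Finally, a sketch of what the recursive loop check could look like, building on the petgraph example from earlier: after handling the outer SCC's header, remove it and re-run SCC detection to uncover inner loops like `{b4, b6}` (illustrative only; `remove_node` shifts petgraph indices, so a real implementation would key blocks by id instead):

```rust
use petgraph::algo::tarjan_scc;
use petgraph::graph::{DiGraph, NodeIndex};

/// Find loops nested inside an outer loop by deleting the outer header from a
/// copy of the CFG and re-running SCC detection on what remains.
fn inner_loops<'a>(cfg: &DiGraph<&'a str, ()>, outer_header: NodeIndex) -> Vec<Vec<&'a str>> {
    let mut pruned = cfg.clone();
    pruned.remove_node(outer_header);
    tarjan_scc(&pruned)
        .into_iter()
        .filter(|scc| scc.len() > 1) // single blocks are not loops
        .map(|scc| scc.into_iter().map(|node| pruned[node]).collect())
        .collect()
}
```

For the nested example, removing the outer header `b1` from `{b1, b3, b4, b5, b6}` leaves `{b4, b6}` as the only multi-block SCC, matching the trace above.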