MIR move elimination

# MIR move elimination ## Summary This RFC aims to clarify the semantics of move operations in MIR, which will improve rustc's ability to optimize away unnecessary copies. ## Motivation Consider the following idiomatic Rust code which constructs an outer object containing an inner object. <details> <summary>Rust code</summary> [Godbolt](https://rust.godbolt.org/z/q5hcWafbK) ```rust struct Inner { array: [i32; 5], } impl Inner { fn new(val: i32) -> Self { let mut x = Inner { array: [val; 5] }; x.init(); x } #[inline(never)] fn init(&mut self) { // ... } } struct Outer { inner: Inner, } impl Outer { fn new(val: i32) -> Self { let mut x = Outer { inner: Inner::new(val), }; x.init(); x } #[inline(never)] fn init(&mut self) { // ... } } fn main() { let mut foo = Outer::new(123); // ... } ``` </details> The construction of `Outer` involves several function calls which create local objects, mutate them and then return them. The MIR produced by rustc copies the 20-byte array **4 times** before it reaches its final location as a local variable in `main`. LLVM is able to eliminate 2 of these copies, but it fundamentally cannot do more because the LLVM IR produced by rustc forbids `main`, `Outer::init` and `Inner::init` from observing the same address. Contrast this with the equivalent code in C++: <details> <summary>C++ code</summary> [Godbolt](https://godbolt.org/z/fKxf5EhY6) ```c++ struct Inner { int array[5]; Inner(int val): array{val, val, val, val, val} { this->init(); } void init(); }; struct Outer { Inner inner; Outer(int val): inner(val) { this->init(); } void init(); }; int main() { Outer foo(123); // ... } ``` </details> In C++, objects are always constructed at their final location in memory. C++ doesn't have a concept of implicit copies/moves like Rust does. Instead, all copies are explicit and involve calling a copy constructor which creates a *new* object at the destination address. This means that when the constructor for `Inner` is called, `this` already points to the local variable `foo` in `main`. As a result, the resulting assembly code does not have any copies. The inability of Rust to eliminate these copies requires awkward workarounds to avoid a performance hit or excessive stack usage (which could lead to stack overflows), usually in the form of "deferred initialization". This involves creating an object in an uninitialized state at its final location and then manually initializing it, often using unsafe code. As an example, the `brie-tree` crate needs to use this pattern ([1](https://github.com/Amanieu/brie-tree/blob/5b0a72fcf66dc12e4754f387794afe59167bbc3b/src/lib.rs#L186-L187) [2](https://github.com/Amanieu/brie-tree/blob/5b0a72fcf66dc12e4754f387794afe59167bbc3b/src/cursor.rs#L1069-L1076)) to avoid a 15% hit in the performance of B-Tree insertion. ## Status quo ```rust= struct Foo([u8; 100]); unsafe extern "C" { safe fn observe(b: *mut Foo); safe fn foo() -> Foo; } pub fn example() { let mut a = foo(); observe(&raw mut a); let mut b = a; observe(&raw mut b); } ``` <details> <summary>MIR</summary> ``` fn example() -> () { let mut _0: (); let mut _1: Foo; let _2: (); let mut _3: *mut Foo; let _5: (); let mut _6: *mut Foo; scope 1 { debug a => _1; let mut _4: Foo; scope 2 { debug b => _4; } } bb0: { StorageLive(_1); _1 = foo() -> [return: bb1, unwind unreachable]; } bb1: { StorageLive(_3); _3 = &raw mut _1; _2 = observe(move _3) -> [return: bb2, unwind unreachable]; } bb2: { StorageDead(_3); StorageLive(_4); _4 = move _1; StorageLive(_6); _6 = &raw mut _4; _5 = observe(move _6) -> [return: bb3, unwind unreachable]; } bb3: { StorageDead(_6); StorageDead(_4); StorageDead(_1); return; } } ``` </details> To understand why rustc is unable to perform move optimization we need to look at the generated MIR in detail. In this example we would like the move on line 11 to be eliminated. This is only possible if `a` and `b` have the same address, which would turn the assignment into a no-op. `a` and `b` are mapped to locals `_1` and `_4` respectively in the MIR. Each local corresponds to a stack allocation with a certain lifetime. The lifetime of `_4` is specified by a pair of `StorageLive`/`StorageDead` statements, while `_1` has no such statements and is therefore live over the entire function. Since the lifetimes of the locals overlap, they are forbidden from having the same address. There are 2 important factors at play here: * MIR generation assigns storage lifetimes based on scope, meaning that the storage of `a` and `b` start from the `let` binding and end once the name goes out of scope. * The assignment `_4 = move _1` is treated the same way as `_4 = copy _1` for operational semantics purposes: `move` is only used for borrow checking[^1]. [^1]: This is only true for assignments. `move` has special operational semantics for call arguments. As a consequence it is perfectly valid today, from an operational semantics point of view, for the first call to `observe` to stash a copy of the address of `a` and in the second call mutate the data behind that pointer. Fundamentally, the language needs to be changed to allow `a` and `b` to share the same address in this example and eliminate the copy. Specifically, accessing memory that has been moved must become UB. ## Proposed MIR changes This section proposes several changes to MIR to enable the move optimization. ### Storage lifetime Currently, the lifetime of locals in a function is determined using `StorageLive` and `StorageDead` statements in MIR. These serve two purposes: - They are lowered to [`llvm.lifetime.start`](https://llvm.org/docs/LangRef.html#int-lifestart) and [`llvm.lifetime.end`](https://llvm.org/docs/LangRef.html#llvm-lifetime-end-intrinsic) intrinsics which are used by LLVM for stack slot coloring, which reduces stack usage. - `StorageDead` is also used by the borrow checker to ensure that any borrows do not outlive the underlying allocation. This RFC proposes to change MIR to make the lifetime of a local implicitly defined as the point where it is initialized to the point where its contents are moved out. This involves removing `StorageLive` and reworking how `StorageDead` works. This has previously been proposed in [this issue](https://github.com/rust-lang/rust/issues/68622), but the idea is further expanded here. To achieve this we need to introduce the concept of an *active* and *inactive* allocation. When an allocation is active, it works just like an allocation today. However it is UB to access (read or write) the memory of an inactive allocation *with the provenance of that allocation*. This is an important distinction because while an allocation is inactive it is possible for another allocation to overlap with it. At any one time a byte of memory can be part of any number of inactive allocations but at most one active allocation. Specifically: - On function entry, all locals have a fixed address for the duration of the function and their allocations start as inactive. - An allocation becomes active when it is the destination of a MIR call or assignment. - An allocation becomes inactive when it is used as a `Move` operand. - The address of allocation doesn't change when it is activated and de-activated. `StorageDead` is changed to take a place instead of a local and becomes logically equivalent to an assignment that moves from the place (invalidating borrows and de-activating the allocation), but has no effect if the underlying allocation is already inactive. This needs to be preserved for the correctness of unsafe code which assumes that the allocation remains valid until the end of the scope but can be eliminated for locals that are not borrowed or where the place is always known to be inactive. Here's how this plays out with `example`: ```rust pub fn example() { // Inactive allocations for a and b is created on function entry. Both are // assigned address 0x1000. // The allocation for a is activated when it is assigned. let a = Foo([0; 100]); // Address 0x1000 is observed. observe(&a); // In one atomic operation, the allocation for a is de-activated and the // allocation for b is activated. No memory copying is needed since they // have the same address. let b = a; // Address 0x1000 is observed. observe(&b); } ``` <details> <summary>MIR</summary> ``` fn example() -> () { let mut _0: (); let mut _1: Foo; let _2: (); let mut _3: *mut Foo; let _5: (); let mut _6: *mut Foo; scope 1 { debug a => _1; let mut _4: Foo; scope 2 { debug b => _4; } } bb0: { _1 = foo() -> [return: bb1, unwind unreachable]; } bb1: { _3 = &raw mut _1; _2 = observe(move _3) -> [return: bb2, unwind unreachable]; } bb2: { _4 = move _1; _6 = &raw mut _4; _5 = observe(move _6) -> [return: bb3, unwind unreachable]; } bb3: { StorageDead(_4); return; } } ``` </details> ### Partially active allocations Moves and assignments operate on places, but allocations are created for entire locals. This means that an assignment could partially activate a local and a move could partially de-activate a local. Therefore we need to allow each byte in an allocation to be individually activated or de-activated. ```rust pub fn example2() { let mut a = (foo(), foo()); observe(&raw mut a.0); observe(&raw mut a.1); let mut b = a.0; observe(&raw mut b); } ``` <details> <summary>MIR</summary> ``` fn example2() -> () { let mut _0: (); let mut _1: (Foo, Foo); let mut _2: Foo; let mut _3: Foo; let _4: (); let mut _5: *mut Foo; let _6: (); let mut _7: *mut Foo; let _9: (); let mut _10: *mut Foo; scope 1 { debug a => _1; let mut _8: Foo; scope 2 { debug b => _8; } } bb0: { _2 = foo() -> [return: bb1, unwind unreachable]; } bb1: { _3 = foo() -> [return: bb2, unwind unreachable]; } bb2: { _1 = (move _2, move _3); _5 = &raw mut (_1.0: Foo); _4 = observe(move _5) -> [return: bb3, unwind unreachable]; } bb3: { _7 = &raw mut (_1.1: Foo); _6 = observe(move _7) -> [return: bb4, unwind unreachable]; } bb4: { _8 = move (_1.0: Foo); _10 = &raw mut _8; _9 = observe(move _10) -> [return: bb5, unwind unreachable]; } bb5: { StorageDead(_8); StorageDead(_1); return; } } ``` </details> In the example above, the move only de-activates `a.0` but not `a.1`. This leaves `a` in a partially-activated state, which allows the allocation for `b` to overlap with `a.0` while still preserving the validity of the pointer to `a.1` passed to `observe`. ### Move paths Activation and de-activation only applies to places which form a valid [*move path*](https://rustc-dev-guide.rust-lang.org/borrow_check/moves_and_initialization/move_paths.html). This concept is used by the borrow checker to track which parts of a local are currently initialized. Move paths are places which have projections restricted to only the following: - Field access - Constant indexing of array types - Deref of `Box` (notably *not* references or raw pointers!) This has the following consequences: - Move operands are only allowed on places that are valid move paths. Non-move-path places can only be copied from since we don't track whether they are active. - The destination place of any MIR statement (e.g. assignments) *except for calls*: - if it is a valid move path, will be activated it as part of the write. - if it is not a valid move path, must already be active otherwise behavior is undefined. - The return place of a MIR call must be a valid move path.[^2] - This is always the case for the initial MIR lowering since a fresh local is allocated for the result of a call. - This constraint is necessary because the return place is de-activated before the call. This allows the return place in the callee to have the same address as the call destination in the caller. [^2]: An alternative design would be to treat the return place just like any other destination place, but this prevents de-activating the destination before the call. This doesn't seem beneficial in any way. Note that it is possible for MIR optimizations to transform a non-move-path place into a valid move path, such as by removing deref projections where the target is known. This doesn't cause any soundness issues since the borrow checker ensures that borrowed places are fully initialized and remain initialized as long as they are borrowed. The reverse transformation (turning a move path into a non-move-path) is possible in theory by ensuring the place is already active when it is used via a non-move-path, but this isn't done because it generally isn't a beneficial transformation. ### Evaluation order Since `move` operands now effectively have side-effects, it is necessary to precisely define the order in which the side-effects of a MIR statement occur. The general rule is that the side-effect of `move` de-activating a place occurs after all operands and places have been evaluated, but before the main effect of a statement (writing the destination of an assignment, calling a function, etc). This means that in a MIR assignment, the source and destination places are not active at the same time and can therefore overlap. While this may seem like a downside since it forces the use of `memmove` instead of `memcpy`, in practice LLVM is able to optimize this back down to `memcpy` when the copy involves separate `alloca` which are always disjoint. Call terminators are the one exception to this rule because they involve sharing places between two functions ([#71117](https://github.com/rust-lang/rust/issues/71117)). The specific evaluation order for calls is: 1. Operands are evaluated and copied to the corresponding local in the callee. This all happens as a single parallel assignment, with the place of any `move` operands being de-activated. This also means that argument locals in the callee are allowed to re-use the address of `move` operands since their lifetimes do not overlap. 2. The destination place of the call terminator is de-activated before the call if it was previously active. This guarantees that the return place of the callee can overlap the destination place of the call. 3. The callee's body is executed. The return place of the callee starts out inactive and must be activated (i.e. initialized) before returning. 4. Upon returning, the return place of the callee is copied to the destination place of the call. This can be no-op in the compiled code if they share the same address (whether this is the case in practice depends on the ABI). This model neatly clarifies all of the [outstanding issues](https://github.com/rust-lang/rust/issues/71117) around MIR function calls, in particular justifying *why* and *when* some places in a callee are allowed to have the same address as a place in the caller. ## Impact on the compiler & Miri ### Miri Since this RFC introduces new forms of UB, Miri will need to be modified to detect accesses to de-activated allocations. Specifically, each allocation will need to track the activation state of each byte in that allocation. Notably, Miri does *not* need to handle overlapping allocations: Miri can always assign locals to non-overlapping addresses. ### MiniRust MiniRust aims to completely specify the behavior of Rust and needs to describe how memory addresses are selected for allocation. This is a non-deterministic choice from the set of addresses that an allocation *could* have, which today consists of the set of addresses not currently in use by other active allocations. However, partially-active allocations lead to a surprising behavior: in the example below, it is not possible to know at the point where `b` is defined whether that allocation is allowed to overlap with `a` since this depends on future information. ```rust // a.0 is allowed to overlap with b fn example3() { let mut a = (foo(), foo()); observe(&a.0); let b = a.0; // De-activate a.0, activate b observe(&b); drop(b); // De-activate b a.0 = foo(); // Re-activate a.0 } // a.0 is not allowed to overlap with b fn example4() { let mut a = (foo(), foo()); observe(&a.0); let b = a.0; // De-activate a.0, activate b observe(&b); // b stays active until the end of the function a.0 = foo(); // Re-activate a.0 } ``` This is not an issue for Miri and rustc since they can pick an address conservatively: it is always possible to fall back to picking an address that is not currently used by any other allocation. MiniRust needs to be able to explore all possible behaviors of a Rust program and therefore needs to reject executions where overlapping allocation addresses later cause conflicts where the same byte is active in two allocations. This would be done by declaring that executions where such conflict occur has "no behavior" and are therefore invalid. The non-deterministic choice of allocation addresses would only be allowed to select addresses which do not cause "no behavior" in the future. ### MIR optimizations MIR optimizations need to be careful not to shorten the live range of a local by moving or eliminating assignments or moves. Doing so could cause the lifetime analysis to conclude that 2 locals could share the same address where this would not be allowed in the source program. Note that this restriction only applies to locals whose address has been taken. Extending the live range of a local is not a problem since it just pessimizes the optimization by forcing locals to have separate addresses. In cases where an assignment needs to be moved down, a placeholder assignment `_dest = undef` can be left in the place of the old one. This doesn't emit any machine code, but has the effect of activating the destination local without writing anything into it, thus ensuring its live range matches the one specified in the original program. ### MIR destination propagation Rustc currently has a [destination propagation](https://github.com/rust-lang/rust/blob/master/compiler/rustc_mir_transform/src/dest_prop.rs) MIR optimization that attempts to eliminate unnecessary copies between locals. However it is currently limited to copies between unprojected locals and only supports locals whose address is not taken. This RFC aims to extend this optimization to also apply to locals which have had their address taken. This will be done in 3 steps: 1. As a pre-processing step, shrink the live range of locals. For each local whose address is not taken, perform a backwards liveness analysis to turn any `Copy` of that local into a `Move` where the local is dead after that operand. This causes these locals to be de-initialized early, but this is fine since this is not observable for these locals. 2. Then, compute the points where each local is initialized as a `SparseIntervalMatrix`. This is a forward analysis which starts from assignments to a local and ends when it is used as a `Move` operand or when an `StorageDead` is reached. 3. We can then iteratively unify pairs of locals that are linked by assignment where the source operand is moved, as long as their live range is disjoint. This is sound because the storage of a local is inactive while it is not live. The analysis could be further extended to work on move paths rather than locals, which would allow unifying a projected local with an unprojected one. For example with `_1 = _2.field` all uses of `_1` in the MIR body could be replaced with `_2.field`. ### LLVM lifetime intrinsics With the removal of `StorageLive`/`StorageDead`, codegen will need to use liveness analysis to determine the correct place to insert the LLVM lifetime intrinsics. Specifically, the lifetime of a local starts when its first byte becomes active and ends whenever all of its bytes become inactive. This lifetime analysis is already required by the destination propagation optimization, so this comes at little cost in optimized builds. Non-optimized builds don't use lifetime intrinsics. ## Drawbacks ### `Copy` is no longer "free" One surprising consequence of these changes is that implementing `Copy` for a type can inhibit the move optimization. This is because a copy doesn't invalidate any borrows of a value and therefore forbids re-using the allocation. It migth be possible to address this with a language-level `move` keyword which forces a move even for `Copy` types, but at this point it's not clear that there is sufficient justification for adding this. ### Operands have side-effects This change causes `Move` operands to have side-effects in the operational semantics, which is not the case today. While it may be possible to represent activation and de-activation of place using separate statements like `StorageLive`/`StorageDead`, this would actually make lifetime analysis much harder for MIR optimization. ### `ptr::offset` Since allocations can now have holes which are effectively "outside" the allocation, it isn't clear how to interpret the contraint on `ptr::offset` that the resulting pointer must remain in-bounds of the allocation. Note that this isn't an issue at the LLVM IR level since such "allocations with holes" only exist at the MIR level, and the holes no longer exist after lowering to LLVM IR. ### Discriminant access after drop Drop elaboration [currently generates code](https://github.com/rust-lang/rust/issues/91029) that will attempt to read the discriminant of an enum after a field in that enum has been moved out. This is currently fine since dropping currently maintains the validity invariants of the underlying data. However this would no longer work if this RFC is accepted since accessing the memory of a moved-out field is now UB. This can be worked around by changing drop elaboration to track enum discriminants separately if a field is moved out. ### Src/dest aliasing in assignments MIR assignments currently have the constraint that, for types which are not treated as scalars, the source and destination places are [not allowed to overlap](https://github.com/rust-lang/rust/issues/68364). This constraint exists to make codegen easier since such assignments can be lowered to a `memcpy` call. Not having this constraint would require codegen to insert an intermediate allocation to handle cases such as `_1 = (_1.1, _1.0)` where the source and destination may alias. This kind of MIR does not arise directly from MIR lowering, and MIR optimizations often have to do extra work to avoid generating invalid MIR. This has in the led to several bugs in MIR optimizations ([#141038](https://github.com/rust-lang/rust/issues/141038) [#141313](https://github.com/rust-lang/rust/issues/141313) [#146383](https://github.com/rust-lang/rust/issues/146383)). This constraint will have to be relaxed for this proposal to be implemented since there will no longer be any guarantee that 2 locals don't share the same address. The impact on codegen can be mitigated by apply some basic alias analysis: the extra intermediate allocation in codegen isn't needed if the source or destination is a local whose address hasn't been taken. ### Potentially breaking change This is technically a breaking change since it effectively reduces the live range for which an allocation is valid. This has 2 effects in practice, the most obvious one being that it's possible to write unsafe code that was previously accepted by Miri but that will result in UB under this new model. For example: ```rust struct Foo(i32); // Doesn't implement Copy fn main() { let a = Foo(123); let ptr = &raw const a; drop(a); // This is fine today: the storage of `a` // is valid until the end of the scope. let b = unsafe { ptr.read() }; assert_eq!(b.0, 123); } ``` Such examples are almost always contrived and unlikely to occur in real code. In fact, the behavior of allocations after a move is an [open question](https://github.com/rust-lang/unsafe-code-guidelines/issues/188) in the unsafe code guidelines and not something that we make a hard guarantee on in the language. The second effect of this change is that users will now be able to observe some place addresses being the same when this would previously have been impossible. The example from earlier demonstrates this: ```rust struct Foo([u8; 100]); unsafe extern "C" { safe fn observe(b: *mut Foo); safe fn foo() -> Foo; } pub fn example() { let mut a = foo(); observe(&raw mut a); let mut b = a; observe(&raw mut b); } ``` In some situations, users may rely on the address of a local as a unique hash map key. But even then it's unreasonable to assume that this address remains unique once the local is dropped, so breakage in practice should be non-existant.