# MIR move elimination
## Summary
This RFC aims to clarify the semantics of move operations in MIR, which will improve rustc's ability to optimize away unnecessary copies.
## Motivation
Consider the following idiomatic Rust code which constructs an outer object containing an inner object.
<details>
<summary>Rust code</summary>
[Godbolt](https://rust.godbolt.org/z/q5hcWafbK)
```rust
struct Inner {
array: [i32; 5],
}
impl Inner {
fn new(val: i32) -> Self {
let mut x = Inner { array: [val; 5] };
x.init();
x
}
#[inline(never)]
fn init(&mut self) {
// ...
}
}
struct Outer {
inner: Inner,
}
impl Outer {
fn new(val: i32) -> Self {
let mut x = Outer {
inner: Inner::new(val),
};
x.init();
x
}
#[inline(never)]
fn init(&mut self) {
// ...
}
}
fn main() {
let mut foo = Outer::new(123);
// ...
}
```
</details>
The construction of `Outer` involves several function calls which create local objects, mutate them and then return them. The MIR produced by rustc copies the 20-byte array **4 times** before it reaches its final location as a local variable in `main`.
LLVM is able to eliminate 2 of these copies, but it fundamentally cannot do more because the LLVM IR produced by rustc forbids `main`, `Outer::init` and `Inner::init` from observing the same address.
Contrast this with the equivalent code in C++:
<details>
<summary>C++ code</summary>
[Godbolt](https://godbolt.org/z/fKxf5EhY6)
```c++
struct Inner {
int array[5];
Inner(int val): array{val, val, val, val, val} {
this->init();
}
void init();
};
struct Outer {
Inner inner;
Outer(int val): inner(val) {
this->init();
}
void init();
};
int main() {
Outer foo(123);
// ...
}
```
</details>
In C++, objects are always constructed at their final location in memory. C++ doesn't have a concept of implicit copies/moves like Rust does. Instead, all copies are explicit and involve calling a copy constructor which creates a *new* object at the destination address. This means that when the constructor for `Inner` is called, `this` already points to the local variable `foo` in `main`. As a result, the resulting assembly code does not have any copies.
The inability of Rust to eliminate these copies requires awkward workarounds to avoid a performance hit or excessive stack usage (which could lead to stack overflows), usually in the form of "deferred initialization". This involves creating an object in an uninitialized state at its final location and then manually initializing it, often using unsafe code.
As an example, the `brie-tree` crate needs to use this pattern ([1](https://github.com/Amanieu/brie-tree/blob/5b0a72fcf66dc12e4754f387794afe59167bbc3b/src/lib.rs#L186-L187) [2](https://github.com/Amanieu/brie-tree/blob/5b0a72fcf66dc12e4754f387794afe59167bbc3b/src/cursor.rs#L1069-L1076)) to avoid a 15% hit in the performance of B-Tree insertion.
## Status quo
```rust=
struct Foo([u8; 100]);
unsafe extern "C" {
safe fn observe(b: *mut Foo);
safe fn foo() -> Foo;
}
pub fn example() {
let mut a = foo();
observe(&raw mut a);
let mut b = a;
observe(&raw mut b);
}
```
<details>
<summary>MIR</summary>
```
fn example() -> () {
let mut _0: ();
let mut _1: Foo;
let _2: ();
let mut _3: *mut Foo;
let _5: ();
let mut _6: *mut Foo;
scope 1 {
debug a => _1;
let mut _4: Foo;
scope 2 {
debug b => _4;
}
}
bb0: {
StorageLive(_1);
_1 = foo() -> [return: bb1, unwind unreachable];
}
bb1: {
StorageLive(_3);
_3 = &raw mut _1;
_2 = observe(move _3) -> [return: bb2, unwind unreachable];
}
bb2: {
StorageDead(_3);
StorageLive(_4);
_4 = move _1;
StorageLive(_6);
_6 = &raw mut _4;
_5 = observe(move _6) -> [return: bb3, unwind unreachable];
}
bb3: {
StorageDead(_6);
StorageDead(_4);
StorageDead(_1);
return;
}
}
```
</details>
To understand why rustc is unable to perform move optimization we need to look at the generated MIR in detail. In this example we would like the move on line 10 to be eliminated. This is only possible if `a` and `b` have the same address, which would turn the assignment into a no-op.
`a` and `b` are mapped to locals `_1` and `_4` respectively in the MIR. Each local corresponds to a stack allocation with a certain lifetime. The lifetime of `_4` is specified by a pair of `StorageLive`/`StorageDead` statements, while `_1` has no such statements and is therefore live over the entire function. Since the lifetimes of the locals overlap, they are forbidden from having the same address.
There are 2 important factors at play here:
* MIR generation assigns storage lifetimes based on scope, meaning that the storage of `a` and `b` start from the `let` binding and end once the name goes out of scope.
* The assignment `_4 = move _1` is treated the same way as `_4 = copy _1` for operational semantics purposes: `move` is only used for borrow checking[^1].
[^1]: This is only true for assignments. `move` has special operational semantics for call arguments.
As a consequence it is perfectly valid today, from an operational semantics point of view, for the first call to `observe` to stash a copy of the address of `a` and in the second call mutate the data behind that pointer.
Fundamentally, the semantics of MIR need to be changed to allow `a` and `b` to share the same address in this example and eliminate the copy. Specifically, accessing memory that has been moved must become UB.
## Proposed MIR changes
This section proposes several changes to MIR to enable the move optimization.
### Storage lifetime
Currently, the lifetime of locals in a function is determined using `StorageLive` and `StorageDead` statements in MIR. These serve two purposes:
- They are lowered to [`llvm.lifetime.start`](https://llvm.org/docs/LangRef.html#int-lifestart) and [`llvm.lifetime.end`](https://llvm.org/docs/LangRef.html#llvm-lifetime-end-intrinsic) intrinsics which are used by LLVM for stack slot coloring, which reduces stack usage.
- `StorageDead` is also used by the borrow checker to ensure that any borrows do not outlive the underlying allocation.
This RFC proposes to remove `StorageLive` and `StorageDead` from MIR and instead make the lifetime of a local implicitly defined as the point where it is initialized to the point where its contents are moved out. This has previously been proposed in [this issue](https://github.com/rust-lang/rust/issues/68622), but the idea is further expanded here.
To achieve this we need to introduce the concept of an *active* and *inactive* allocation. When an allocation is active, it works just like an allocation today. However it is UB to access (read or write) the memory of an inactive allocation *with the provenance of that allocation*. This is an important distinction because while an allocation is inactive it is possible for another allocation to overlap with it. At any one time a byte of memory can be part of any number of inactive allocations but at most one active allocation.
Specifically:
- On function entry, all locals have a fixed address for the duration of the function and their allocations start as inactive.
- An allocation becomes active when it is the destination of a MIR call or assignment.
- An allocation becomes inactive when it is used as a `Move` operand.
- The address of allocation doesn't change when it is activated and de-activated.
The behavior of `StorageDead` around borrow checking is preserved by adding an `InvalidateBorrows(<local>)` statement to MIR where `StorageDead` used to be. This is logically equivalent to an assignment that moves from the place (invalidating borrows and de-activating the allocation), but has no effect if the underlying allocation is already inactive. These are necessary for the correctness of unsafe code which assumes that the allocation remains valid until the end of the scope but can be eliminated for locals that are not borrowed or where the place is always known to be inactive.
Here's how this plays out with `example`:
```rust
pub fn example() {
// Inactive allocations for a and b is created on function entry. Both are
// assigned address 0x1000.
// The allocation for a is activated when it is assigned.
let a = Foo([0; 100]);
// Address 0x1000 is observed.
observe(&a);
// In one atomic operation, the allocation for a is de-activated and the
// allocation for b is activated. No memory copying is needed since they
// have the same address.
let b = a;
// Address 0x1000 is observed.
observe(&b);
}
```
<details>
<summary>MIR</summary>
```
fn example() -> () {
let mut _0: ();
let mut _1: Foo;
let _2: ();
let mut _3: *mut Foo;
let _5: ();
let mut _6: *mut Foo;
scope 1 {
debug a => _1;
let mut _4: Foo;
scope 2 {
debug b => _4;
}
}
bb0: {
_1 = foo() -> [return: bb1, unwind unreachable];
}
bb1: {
_3 = &raw mut _1;
_2 = observe(move _3) -> [return: bb2, unwind unreachable];
}
bb2: {
_4 = move _1;
_6 = &raw mut _4;
_5 = observe(move _6) -> [return: bb3, unwind unreachable];
}
bb3: {
InvalidateBorrows(_4);
return;
}
}
```
</details>
### Partially active allocations
Moves and assignments operate on places, but allocations are created for entire locals. This means that an assignment could partially activate a local and a move could partially de-activate a local. Therefore we need to allow each byte in an allocation to be individually activated or de-activated.
```rust
pub fn example2() {
let mut a = (foo(), foo());
observe(&raw mut a.0);
observe(&raw mut a.1);
let mut b = a.0;
observe(&raw mut b);
}
```
<details>
<summary>MIR</summary>
```
fn example2() -> () {
let mut _0: ();
let mut _1: (Foo, Foo);
let mut _2: Foo;
let mut _3: Foo;
let _4: ();
let mut _5: *mut Foo;
let _6: ();
let mut _7: *mut Foo;
let _9: ();
let mut _10: *mut Foo;
scope 1 {
debug a => _1;
let mut _8: Foo;
scope 2 {
debug b => _8;
}
}
bb0: {
_2 = foo() -> [return: bb1, unwind unreachable];
}
bb1: {
_3 = foo() -> [return: bb2, unwind unreachable];
}
bb2: {
_1 = (move _2, move _3);
_5 = &raw mut (_1.0: Foo);
_4 = observe(move _5) -> [return: bb3, unwind unreachable];
}
bb3: {
_7 = &raw mut (_1.1: Foo);
_6 = observe(move _7) -> [return: bb4, unwind unreachable];
}
bb4: {
_8 = move (_1.0: Foo);
_10 = &raw mut _8;
_9 = observe(move _10) -> [return: bb5, unwind unreachable];
}
bb5: {
InvalidateBorrows(_8);
InvalidateBorrows(_1);
return;
}
}
```
</details>
In the example above, the move only de-activates `a.0` but not `a.1`. This leaves `a` in a partially-activated state, which allows the allocation for `b` to overlap with `a.0` while still preserving the validity of the pointer to `a.1` passed to `observe`.
### Evaluation order
Since `move` operands now effectively have side-effects, it is necessary to precisely define the order in which the side-effects of a MIR statement occur. The general rule is that the side-effect of `move` de-activating a place occurs after all operands and places have been evaluated, but before the main effect of a statement (writing the destination of an assignment, calling a function, etc).
This means that in a MIR assignment, the source and destination places are not active at the same time and can therefore overlap. While this may seem like a downside since it forces the use of `memmove` instead of `memcpy`, in practice LLVM is able to optimize this back down to `memcpy` when the copy involves separate `alloca` which are always disjoint.
Call terminators are the one exception to this rule because they involve sharing places between two functions ([#71117](https://github.com/rust-lang/rust/issues/71117)). The specific evaluation order for calls is:
1. All operands and places are evaluated.
2. The values of `copy` and `const` operands are read and copied to a new place in the callee.
3. The places of `move` operands may be non-deterministically used directly by the callee, or have their value copied to a new place in the callee. This is optimization and backend dependent.
4. If the destination place was previously active, it is de-activated.
5. The call happens.
6. After returning, the places of `move` operands is de-activated.
7. The destination place may be non-deterministically used directly by the callee, or have its value copied from the callee's destination place after the call returns. In the former case it must have been activated by the callee, in the latter the copy activates it.
One consequence of this is that the places of `move` operands are not allowed to overlap with the destination place since both may be directly used by the callee.
## Impact on the compiler & Miri
### Miri
Since this RFC introduces new forms of UB, Miri will need to be modified to detect accesses to de-activated allocations. Specifically, each allocation will need to track the activation state of each byte in that allocation.
Notably, Miri does *not* need to handle overlapping allocations: Miri can always assign locals to non-overlapping addresses.
### MiniRust
MiniRust aims to completely specify the behavior of Rust and needs to describe how memory addresses are selected for allocation. This is a non-deterministic choice from the set of addresses that an allocation *could* have, which today consists of the set of addresses not currently in use by other active allocations.
However, partially-active allocations lead to a surprising behavior: in the example below, it is not possible to know at the point where `b` is defined whether that allocation is allowed to overlap with `a` since this depends on future information.
```rust
// a.0 is allowed to overlap with b
fn example3() {
let mut a = (foo(), foo());
observe(&a.0);
let b = a.0; // De-activate a.0, activate b
observe(&b);
drop(b); // De-activate b
a.0 = foo(); // Re-activate a.0
}
// a.0 is not allowed to overlap with b
fn example4() {
let mut a = (foo(), foo());
observe(&a.0);
let b = a.0; // De-activate a.0, activate b
observe(&b);
// b stays active until the end of the function
a.0 = foo(); // Re-activate a.0
}
```
This is not an issue for Miri and rustc since they can pick an address conservatively: it is always possible to fall back to picking an address that is not currently used by any other allocation.
MiniRust needs to be able to explore all possible behaviors of a Rust program and therefore needs to reject executions where overlapping allocation addresses later cause conflicts where the same byte is active in two allocations. This would be done by declaring that executions where such conflict occur has "no behavior" and are therefore invalid. The non-deterministic choice of allocation addresses would only be allowed to select addresses which do not cause "no behavior" in the future.
### MIR optimizations
MIR optimizations need to be careful not to shorten the live range of a local by moving or eliminating assignments or moves. Doing so could cause the lifetime analysis to conclude that 2 locals could share the same address where this would not be allowed in the source program. Note that this restriction only applies to locals whose address has been taken. Extending the live range of a local is not a problem since it just pessimizes the optimization by forcing locals to have separate addresses.
### MIR destination propagation
Rustc currently has a [destination propagation](https://github.com/rust-lang/rust/blob/master/compiler/rustc_mir_transform/src/dest_prop.rs) MIR optimization that attempts to eliminate unnecessary copies between locals. However it is currently limited to copies between unprojected locals and only supports locals whose address is not taken.
This RFC aims to extend this optimization to also apply to locals which have had their address taken. This will be done in 3 steps:
1. As a pre-processing step, shrink the live range of locals. For each local whose address is not taken, perform a backwards liveness analysis to turn any `Copy` of that local into a `Move` where the local is dead after that operand. This causes these locals to be de-initialized early, but this is fine since this is not observable for these locals.
2. Then, compute the points where each local is initialized as a `SparseIntervalMatrix`. This is a forward analysis which starts from assignments to a local and ends when it is used as a `Move` operand or when an `InvalidateBorrows` is reached.
3. We can then iteratively unify pairs of locals that are linked by assignment where the source operand is moved, as long as their live range is disjoint. This is sound because the storage of a local is inactive while it is not live.
The analysis could be further extended to work on move paths rather than locals, which would allow unifying a projected local with an unprojected one. For example with `_1 = _2.field` all uses of `_1` in the MIR body could be replaced with `_2.field`.
### LLVM lifetime intrinsics
With the removal of `StorageLive`/`StorageDead`, codegen will need to use liveness analysis to determine the correct place to insert the LLVM lifetime intrinsics.
Specifically, the lifetime of a local starts when its first byte becomes active and ends whenever all of its bytes become inactive.
This lifetime analysis is already required by the destination propagation optimization, so this comes at little cost in optimized builds. Non-optimized builds don't use lifetime intrinsics.
## Drawbacks
### `Copy` is no longer "free"
One surprising consequence of these changes is that implementing `Copy` for a type can inhibit the move optimization. This is because a copy doesn't invalidate any borrows of a value and therefore forbids re-using the allocation.
It migth be possible to address this with a language-level `move` keyword which forces a move even for `Copy` types, but at this point it's not clear that there is sufficient justification for adding this.
### Operands have side-effects
This change causes `Move` operands to have side-effects in the operational semantics, which is not the case today. While it may be possible to represent activation and de-activation of place using separate statements like `StorageLive`/`StorageDead`, this would actually make lifetime analysis much harder for MIR optimization.
### `ptr::offset`
Since allocations can now have holes which are effectively "outside" the allocation, it isn't clear how to interpret the contraint on `ptr::offset` that the resulting pointer must remain in-bounds of the allocation. Note that this isn't an issue at the LLVM IR level since such "allocations with holes" only exist at the MIR level, and no longer exist after lowering to LLVM IR.