# RFC Draft: Guaranteed Value Emplacement
## Summary
Introduce two features to the language:
- **Guaranteed Value Emplacement (GVE)**
- **Guaranteed Named Value Emplacement (GNVE)**
These features would let us create locals and temporary values in a way that's guaranteed to not involve a move.
## Motivation
In Rust, when you initialize a local to a value and then pass that value to a function, the value is usually moved several times in its lifetime:
```rust=
let foo: MyStruct = make_foo();
some_collection.push(foo);
```
Conceptually, the `MyStruct` instance is first created by `make_foo()`, then moved to the `foo` local, and then moved to the argument slot of `push()`.
"Emplacement" means being able to initialize the `foo` value directly in its destination, in this case inside the memory owned by `some_collection`.
Safe emplacement is something the Rust community has wanted for years.
It's a prerequisite for two categories of features:
- Creating large values cheaply.
- Returning self-referential values.
### Creating large values
In the above example, a slot for `MyStruct` has to be reserved on the stack, so that the result of `make_foo()` can be moved to it.
This is a problem if `MyStruct` is large enough: the struct may be expensive to copy; reserving space for the struct might even trigger a stack overflow.
Emplacement would let us write the `MyStruct` instance directly into its final address on the heap.
Currently the compiler may or may not optimize the intermediary steps away.
These optimizations are useful, but unreliable, and we want a way to express the desired semantics explicitly.
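For comparison, guaranteeing in-place heap construction in current Rust means writing the allocation and the field writes by hand. The following is a minimal sketch of that workaround, not part of the proposal: the fields of `MyStruct`, the `PAYLOAD_LEN` constant and the `make_boxed_foo` helper are all hypothetical names chosen for illustration.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr::addr_of_mut;

const PAYLOAD_LEN: usize = 1 << 20;

// Hypothetical large struct; the fields are chosen only for illustration.
struct MyStruct {
    id: u32,
    payload: [u8; PAYLOAD_LEN],
}

// Builds a `MyStruct` directly in heap memory, without ever materializing
// the whole struct on the stack.
fn make_boxed_foo(id: u32) -> Box<MyStruct> {
    let layout = Layout::new::<MyStruct>();
    // SAFETY: `layout` has a non-zero size, every field is written before
    // the pointer becomes a `Box`, and the memory comes from the global
    // allocator with the layout of `MyStruct`.
    unsafe {
        let ptr = alloc(layout).cast::<MyStruct>();
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Write each field through a raw pointer to its final address.
        addr_of_mut!((*ptr).id).write(id);
        addr_of_mut!((*ptr).payload)
            .cast::<u8>()
            .write_bytes(0xAB, PAYLOAD_LEN);
        Box::from_raw(ptr)
    }
}
```

Emplacement would let the equivalent safe code (`Box::new(MyStruct { .. })` or a `push()` call) carry the same guarantee without the `unsafe` block.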
### Returning self-referential values
Consider one of the classic use cases for immovable types: interop with C++ strings.
```rust=
struct CppString {
    ptr: *mut u8,
    len: usize,
    capacity: usize,
}
```
C++ does something called "small string optimization", where for small-enough strings `ptr` may (or may not) point to the area covered by `len` and `capacity`.
This means Rust code cannot safely move a C++ string.
Ideally, we'd like to write a constructor like this:
```rust=
use std::ptr::null_mut;

impl CppString {
    fn new_small(content: &[u8]) -> Self {
        let mut value = Self {
            ptr: null_mut(),
            len: 0,
            capacity: 0,
        };
        // Point `ptr` at the value's own inline storage.
        value.ptr = (&raw mut value.len).cast::<u8>();
        // ... (Interpret len,capacity as a byte buffer and write to it.)
        value
    }
}
```
The problem with that code is that `value` is moved when the function returns it, and therefore `value.ptr` might be outdated if it had previously pointed to `value`'s fields.
Emplacement would let us write a self-referential value directly into its final address.
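The stale-pointer problem can be reproduced in current Rust with a simplified stand-in for the C++ string. This is only a sketch: `InlineString`, `init_in` and `points_to_self` are hypothetical names, and the value is initialized through a `MaybeUninit` slot so that its address is known while `ptr` is being set.

```rust
use std::mem::MaybeUninit;
use std::ptr::{addr_of, addr_of_mut};

// Hypothetical stand-in for a small-string-optimized type: `ptr` is meant
// to point at the inline buffer `buf` stored in the same struct.
struct InlineString {
    ptr: *const u8,
    buf: [u8; 16],
}

impl InlineString {
    // Initializes `slot` in place so that `ptr` points at `slot`'s own `buf`.
    // Content longer than 16 bytes is truncated.
    fn init_in(slot: &mut MaybeUninit<InlineString>, content: &[u8]) {
        let len = content.len().min(16);
        let mut buf = [0u8; 16];
        buf[..len].copy_from_slice(&content[..len]);
        let this = slot.as_mut_ptr();
        // SAFETY: `this` is valid for writes, and both fields are written.
        unsafe {
            addr_of_mut!((*this).buf).write(buf);
            addr_of_mut!((*this).ptr).write(addr_of!((*this).buf).cast::<u8>());
        }
    }

    // True if `ptr` still points at this value's own buffer.
    fn points_to_self(&self) -> bool {
        self.ptr == self.buf.as_ptr()
    }
}
```

Right after `init_in`, `points_to_self()` returns `true` for the value in the slot. Once the value is read out of the slot into another local, it returns `false`: the bytes were copied, but `ptr` still holds the slot's address.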
## Guide-level explanation
In this document, we use "places" to refer to locations in memory where values are stored.
They can be accessed through [place expressions](https://doc.rust-lang.org/reference/expressions.html#place-expressions-and-value-expressions).
### Guaranteed Value Emplacement
**Guaranteed Value Emplacement**, or GVE, means that some expressions are guaranteed not to move the values of their sub-expressions. Instead, whatever place the parent expression ends up stored in, the sub-expression is directly initialized and stored in a subset of that place.
The rest of this document will detail what that means in practice.
For instance:
```rust=
let x = ParentStruct {
    foo: foo(),
    child: VeryBigStruct {
        // ...
    },
    bar: bar(),
};
```
In the above code, the compiler guarantees that it will not move `VeryBigStruct`: instead, the `VeryBigStruct` expression will be emplaced directly inside `x`, at the address of `x` plus `offset_of!(ParentStruct, child)`.
When expressions emplace their arguments instead of moving them, we say that these expressions "preserve GVE".
The following expressions preserve GVE:
- `GroupedExpression`
- `ArrayExpression` in comma-separated form.
- `TupleExpression`
- `StructExpression`
- `CallExpression` of tuple structs.
- `RangeExpression`
- Any `ExpressionWithBlock` with a tail expression.
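To make the list concrete, here is one instance of each expression form, written in current Rust. It is a sketch with hypothetical types (`Big`, `Named`, `Wrapper`, `Forms`) and a hypothetical `big()` constructor. Today the compiler may or may not move each `big(..)` result. Under this proposal each one would be emplaced in the corresponding field of the returned `Forms`.

```rust
use std::ops::Range;

// Hypothetical payload type standing in for a value we would rather not move.
#[derive(Debug, PartialEq)]
struct Big([u64; 4]);

// Hypothetical struct with a named field, and a tuple struct, wrapping `Big`.
#[derive(Debug, PartialEq)]
struct Named {
    big: Big,
}

#[derive(Debug, PartialEq)]
struct Wrapper(Big);

fn big(n: u64) -> Big {
    Big([n; 4])
}

// One field per expression form from the list above.
struct Forms {
    grouped: Big,
    array: [Big; 2],
    tuple: (Big, u8),
    named: Named,
    wrapper: Wrapper,
    range: Range<Big>,
    block: Big,
}

#[allow(unused_parens)]
fn build_forms(flag: bool) -> Forms {
    // The outer `Forms { .. }` is itself a StructExpression.
    Forms {
        grouped: (big(1)),                          // GroupedExpression
        array: [big(2), big(3)],                    // ArrayExpression, comma-separated
        tuple: (big(4), 0),                         // TupleExpression
        named: Named { big: big(5) },               // StructExpression
        wrapper: Wrapper(big(6)),                   // CallExpression of a tuple struct
        range: big(7)..big(8),                      // RangeExpression
        block: if flag { big(9) } else { big(10) }, // ExpressionWithBlock with a tail
    }
}
```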
#### GVE is transitive
GVE applies to *any combination* of the above expressions.
The sub-expressions' values will all be emplaced at addresses derived from the root place.
For instance:
```rust=
let x = ParentStruct {
    wrapper: WrapperStruct {
        child: VeryBigStruct {
            // ...
        },
    },
};
```
In this code, `VeryBigStruct` will be emplaced in a sub-place of `x`.
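The notion of a "sub-place" can be checked with plain address arithmetic. The sketch below assumes concrete field layouts for the structs (the `data` and `tag` fields are made up) and a hypothetical helper `is_sub_place`. It only shows what "contained in the root place" means for nested fields. It does not demonstrate the emplacement guarantee itself.

```rust
use std::mem::size_of;

// Hypothetical definitions; the fields are chosen only for illustration.
struct VeryBigStruct {
    data: [u8; 4096],
}

struct WrapperStruct {
    child: VeryBigStruct,
}

struct ParentStruct {
    tag: u32,
    wrapper: WrapperStruct,
}

// Returns true if the bytes of `inner` lie entirely within the bytes of
// `outer`, i.e. if `inner` is a sub-place of `outer`.
fn is_sub_place<O, I>(outer: *const O, inner: *const I) -> bool {
    let outer_start = outer as usize;
    let inner_start = inner as usize;
    inner_start >= outer_start
        && inner_start + size_of::<I>() <= outer_start + size_of::<O>()
}
```

For a local `x: ParentStruct`, `is_sub_place(&x, &x.wrapper.child)` holds, and so does the check for the intermediate `x.wrapper`: this is the transitivity the section describes.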
#### Destination place
In the above definition, we said intermediary values are emplaced in a subset of "whatever place the parent expression ends up stored in".
That place depends on what context the value expression is evaluated in:
- In a [`let` statement](https://doc.rust-lang.org/reference/statements.html#grammar-LetStatement), the destination place is the new variable.
- In an [assignment expression](https://doc.rust-lang.org/reference/expressions/operator-expr.html#assignment-expressions):
    - If the assignee expression is an uninitialized variable, the destination place is that variable.
    - Else, the destination place is a temporary.
- In a [`return` expression](https://doc.rust-lang.org/reference/expressions/return-expr.html), the destination place is the function's return slot.
- In the tail expression of a function, the destination place is the function's return slot.
- In every other case, the destination place is a temporary.
Later sections will have more detail on this.
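As a quick illustration, the snippet below shows one instance of each context from the list, written in current Rust. It is a sketch: `Pair`, `total`, `via_return` and `via_locals` are hypothetical names, and the comments describe the destination place the proposal would assign, not something the current compiler guarantees.

```rust
// Hypothetical value type used to illustrate each context.
#[derive(Debug, PartialEq)]
struct Pair {
    a: u32,
    b: u32,
}

fn total(pair: Pair) -> u32 {
    pair.a + pair.b
}

fn via_return(early: bool) -> Pair {
    if early {
        // `return` expression: the destination is the function's return slot.
        return Pair { a: 1, b: 2 };
    }
    // Tail expression of the function: the destination is also the return slot.
    Pair { a: 3, b: 4 }
}

fn via_locals(already_initialized: &mut Pair) -> (Pair, Pair, u32) {
    // `let` statement: the destination is the new variable `first`.
    let first = Pair { a: 10, b: 20 };

    // Assignment to an uninitialized variable: the destination is `second`.
    let second;
    second = Pair { a: 30, b: 40 };

    // Assignment to anything else: the value is built in a temporary,
    // which then replaces the old contents of `*already_initialized`.
    *already_initialized = Pair { a: 50, b: 60 };

    // Every other case, such as a function argument: a temporary.
    let sum = total(Pair { a: 70, b: 80 });

    (first, second, sum)
}
```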
### Guaranteed Named Value Emplacement
**Guaranteed Named Value Emplacement**, or GNVE, means that some local variables can be assigned a place at creation, in a way that guarantees the contents of the local variable are never moved, even when the local is used as the argument of another expression.
Example:
```rust=
let child = VeryBigStruct {
    // ...
};
let x = ParentStruct {
    child
};
```
In the above code, the compiler *still* guarantees that it will not move `VeryBigStruct`, and `child` will be constructed in a sub-place of `x`.
We can verify this empirically:
```rust=
let child = VeryBigStruct {
    // ...
};
let child_addr = &raw const child as usize;
let x = ParentStruct {
    child,
    // ...
};
let x_addr = &raw const x as usize;
assert_eq!(child_addr, x_addr + offset_of!(ParentStruct, child));
```
The assertion will always pass.
#### Restrictions on GNVE
To facilitate implementation, we limit the scope of GNVE as much as possible.
GNVE only applies to a local variable when:
- It's moved exactly once.
- It's moved in the block it's declared in.
- It's not moved in child blocks.
- The expression that moves it preserves GVE.
- The expression that moves it is either an initializing assignment to a local declared in the same block, a return expression, or the tail expression of the block.
We can intuitively think of many use cases where emplacement ought to work that aren't covered by the above, but for the initial implementation we try to keep the rules simple.
Future proposals might loosen these restrictions.
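The sketch below applies these rules to a few small functions. The types and function names are hypothetical, and the "qualifies" / "does not qualify" comments reflect one reading of the restrictions above, not the behavior of any existing compiler. All four functions compile and run in current Rust.

```rust
// Hypothetical types used to illustrate the restrictions.
#[derive(Debug, PartialEq)]
struct Big([u64; 8]);

struct Parent {
    child: Big,
}

fn consume(big: Big) -> u64 {
    big.0[0]
}

// Qualifies: `child` is moved exactly once, in the block it is declared in,
// by a struct expression that initializes a local of the same block.
fn qualifies() -> Parent {
    let child = Big([1; 8]);
    let parent = Parent { child };
    parent
}

// Does not qualify: `child` is moved inside a child block.
fn moved_in_child_block() -> Parent {
    let child = Big([2; 8]);
    let parent = { Parent { child } };
    parent
}

// Does not qualify: a call to an ordinary function does not preserve GVE.
fn moved_into_call() -> u64 {
    let child = Big([3; 8]);
    consume(child)
}

// Does not qualify: the struct expression is assigned through a reference,
// not to a new local, a `return`, or the tail of the block.
fn moved_into_reference(slot: &mut Parent) {
    let child = Big([4; 8]);
    *slot = Parent { child };
}
```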
### Schrodinger's GVE
GVE and GNVE are only guaranteed in cases where they can be *observed*.
For example, the last code snippet with the `assert_eq!` statement is an instance of observation: it reads the addresses of the places of `child` and `x`.
In formal terms, GVE and GNVE guarantee that the observable addresses of places follow the rules listed above.
While we can assume that the compiler doesn't add superfluous moves (for performance reasons if nothing else), *the compiler is still allowed to do so* as long as these moves do not change the addresses observed by the program.
As an example:
```rust=
let z = if CONDITION {
    let x: u32 = 100;
    let mut y = x;
    y += bar(33);
    y
} else {
    // ...
};
do_stuff(z);
```
While in theory, the compiler should enforce that `x`, `y` and `z` have the same address, in practice that address isn't observable anywhere, and the compiler backend is likely to store these values in registers with no memory address at all.
### GVE across function returns
If a GVE-preserving expression ends up as the tail expression of a function, or the argument of a return expression, then the sub-expressions' values will be emplaced directly into the function's return place.
That means, in some cases, that the values may be emplaced directly in a place defined in the caller, or a transitive caller:
```rust=
fn make_big_struct() -> VeryBigStruct {
    VeryBigStruct {
        // ...
    }
}
fn foo() -> ParentStruct {
    ParentStruct {
        wrapper: WrapperStruct {
            child: make_big_struct(),
        },
    }
}
let x = foo();
```
In the above example, `VeryBigStruct` likely isn't moved at all.
**By default, the language does not guarantee that function returns preserve GVE**.
For instance, `VeryBigStruct` might be emplaced in `make_big_struct()`'s return place, and *then* moved to `foo()`'s return place.
The language does not guarantee that those are the same.
The reason for this is that functions in some ABIs return small values through registers, which do not have a memory address.
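Without such a guarantee, code that needs cross-function emplacement today has to pass the destination explicitly. Below is a minimal sketch of that out-pointer style, assuming a made-up `data` field for `VeryBigStruct`. `make_big_struct_into` and `foo_into` are hypothetical counterparts of the functions above, and each returns the address it wrote to so that the result can be checked.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

// Hypothetical definitions; the `data` field is chosen only for illustration.
struct VeryBigStruct {
    data: [u8; 4096],
}

struct WrapperStruct {
    child: VeryBigStruct,
}

struct ParentStruct {
    wrapper: WrapperStruct,
}

// Out-pointer counterpart of `make_big_struct()`: initializes `out` in place
// and returns the address the value was built at.
fn make_big_struct_into(out: &mut MaybeUninit<VeryBigStruct>) -> usize {
    let ptr = out.as_mut_ptr();
    // SAFETY: `ptr` is valid for writes of a whole `VeryBigStruct`.
    unsafe {
        addr_of_mut!((*ptr).data).cast::<u8>().write_bytes(0x2A, 4096);
    }
    ptr as usize
}

// Out-pointer counterpart of `foo()`: forwards the sub-place of `out` that
// corresponds to `wrapper.child`, so the child is never built elsewhere.
fn foo_into(out: &mut MaybeUninit<ParentStruct>) -> usize {
    let parent = out.as_mut_ptr();
    // SAFETY: the field pointer stays in bounds of `out`, and
    // `MaybeUninit<T>` has the same layout as `T`.
    let child_slot = unsafe {
        &mut *addr_of_mut!((*parent).wrapper.child).cast::<MaybeUninit<VeryBigStruct>>()
    };
    make_big_struct_into(child_slot)
}
```

After `foo_into(&mut slot)`, the returned address equals the address of `wrapper.child` inside `slot`, and the slot can be read with `assume_init_ref()` because every field has been written.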
#### Emplacing functions
For cases where we do need cross-function GVE, we add a perma-unstable function attribute named `#[rustc_emplace]`.
This attribute is a placeholder, and future proposals should come up with better ways to specify GVE-preserving functions.
Here's how we might write the "C++ string constructor" example:
```rust=
use std::ptr::null_mut;

impl CppString {
    #[rustc_emplace]
    unsafe fn new_small(content: &[u8]) -> Self {
        let mut value = Self {
            ptr: null_mut(),
            len: 0,
            capacity: 0,
        };
        // Point `ptr` at the value's own inline storage.
        value.ptr = (&raw mut value.len).cast::<u8>();
        // ... (Interpret len,capacity as a byte buffer and write to it.)
        value
    }
}
```
Note that `new_small()` is declared as `unsafe`, because the user might still move the returned string after it is returned.
For that string to remain valid, the caller must guarantee that it is emplaced and then immediately pinned.
Other APIs could then rely on `ptr` pointing to non-garbage data.
(See the [Future possibilities - Pinned places](#pinned-places) section for more.)
### Emplacing into a pointer
In the [Destination place](#destination-place) section, we said:
> - If the assignee expression is an uninitialized variable, the destination place is that variable.
> - Else, the destination place is a temporary.
This means that it's impossible to emplace a value directly into the pointee of a mutable reference, pointer, or slice, into a static, or into an already initialized local variable.
This is because emplacement may put a value in a partially initialized state.
If the emplacing code panics (or if the value is accessed elsewhere through interior mutability), this could lead to undefined behavior:
```rust=
fn foo(x: &mut MyStruct) {
    let a = ...;
    panic!("oops");
    let b = ...;
    *x = MyStruct { a, b };
}
fn bar() {
    let mut x: MyStruct = ...;
    foo(&mut x);
    // Do we call the destructors of x.a and x.b?
}
```
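For contrast, current Rust evaluates the right-hand side completely before touching `*x`, so a panic during evaluation leaves the old value intact. The sketch below demonstrates this with hypothetical `String` fields for `MyStruct` and hypothetical helpers `make_b`, `overwrite` and `overwrite_catching`. It assumes the default unwinding panic strategy.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Hypothetical definition with fields that own heap memory.
#[derive(Debug, PartialEq)]
struct MyStruct {
    a: String,
    b: String,
}

fn make_b(should_panic: bool) -> String {
    if should_panic {
        panic!("oops");
    }
    String::from("new b")
}

// Overwrites `*x`. The right-hand side is fully evaluated before the old
// value is dropped and replaced.
fn overwrite(x: &mut MyStruct, should_panic: bool) {
    let a = String::from("new a");
    let b = make_b(should_panic);
    *x = MyStruct { a, b };
}

// Runs `overwrite` and reports whether it panicked.
fn overwrite_catching(x: &mut MyStruct, should_panic: bool) -> bool {
    catch_unwind(AssertUnwindSafe(|| overwrite(x, should_panic))).is_err()
}
```

When `make_b` panics, `*x` still holds its old `a` and `b`. If `a` had been emplaced into `x.a` before the panic, the caller would be left with a half-overwritten value.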
So as a general rule, you can only write into a destination place if it's safe for that place to hold uninitialized memory.
If you have a pointer that you know respects that condition, you can use the unsafe macro `std::mem::emplace_into` to emplace an expression into a pointer:
```rust=
fn foo(x: &mut MaybeUninit<MyStruct>) {
    let a = ...;
    let b = ...;
    // SAFETY: MaybeUninit::as_mut_ptr() can safely point to
    // partially-initialized memory.
    unsafe { emplace_into!(x.as_mut_ptr(), MyStruct { a, b }) };
}
```
### Enforcing GVE and GNVE
The rules listed above are intricate, and purely implicit.
In a large enough function, it might be hard to notice that a value you wanted to emplace ends up moved instead.
To address this, we add an `#[emplace]` attribute, which can annotate both local variables and value expressions.
For every expression annotated with `#[emplace]`, the compiler will check that the expression is assigned to a sub-place of the desired destination.
By default, the desired destination is the return place, but developers can use the `#[emplace(variable_name)]` syntax to instead mark a given local variable as the desired destination.
```rust=
#[emplace(x)]
let child = VeryBigStruct {
    // ...
};
let x = ParentStruct {
    // ERROR: child is supposed to be emplaced into x
    child: Box::new(child)
    //              ^~~~~ But child is moved here.
};
```
## Reference-level explanation
TODO
## Drawbacks
This proposal makes the rules for place expressions more complex.
This proposal introduces a kind of "time-travel" effect where later expressions may affect how earlier expressions are computed (though this is already the case with type inference).
The proposal might give users a false sense of security: they might assume that a value is automatically emplaced in cases where it is actually moved.
## Rationale and alternatives
TODO
- Not guaranteeing emplacement across returns
- [Init expressions](https://hackmd.io/@aliceryhl/BJutRcPblx)
- [In-place initialization via outptr](/awB-GOYJRlua9Cuc0a3G-Q)
## Future possibilities
### APIs based on placement return
TODO
https://github.com/rust-lang/rfcs/pull/2884
### Pinned places
TODO
Pinned places and pinned return values.
https://without.boats/blog/pinned-places/
https://poignardazur.github.io/2024/08/16/pinned-places/
### Init and PinInit traits
TODO
### Returning `impl Trait` from `dyn` traits.
#### `async fn` in `dyn` traits.
TODO
### `#[must_emplace]` attribute for types
TODO
Similar to `#[must_use]`
### Accessing `#[repr(transparent)]` field
TODO
### Looser restrictions on GNVE
TODO