This document is also on [Internals](https://internals.rust-lang.org/t/pre-rfc-minimal-flexible-placement-and-returning-unsized-types/10032). I'll try to keep them in sync.
Pre-RFC: Minimal, flexible placement and returning unsized types
===
Implement ["placement"](https://internals.rust-lang.org/t/removal-of-all-unstable-placement-features/7223) with no new syntax, by extending the existing capabilities of ordinary `return`. This involves [copying Guaranteed Copy Elision rules pretty much wholesale from C++17](https://jonasdevlieghere.com/guaranteed-copy-elision/), adding functions like `fn new_with<F: FnOnce() -> T, T: ?Sized>(f: F) -> Self` to Box and `fn push_with<F: FnOnce() -> T>(&mut self, f: F)` to Vec to allow performing the allocation before evaluating the closure, providing raw access to the return slot for functions as an unsafe feature, and allowing unsized objects to be returned directly by compiling such functions into a special kind of "generator."
Starting with the questions given at the end of the [old RFC's mortician's note](https://github.com/rust-lang/rust/issues/27779):
* **Does the new RFC support DSTs? serde and fallible creation?** Yes on DSTs. On fallible creation, it punts it into the future section.
* **Does the new RFC have a lot of traits? Is it justified?** It introduces no new traits at all.
* **Can the new RFC handle cases where allocation fails? Does this align with wider language plans (if any) for fallible allocation?** Yes.
* **Are there upcoming/potential language features that could affect the design of the new RFC? e.g. custom allocators, NoMove, HKTs? What would the implications be?** Not really. `Pin` can have a `new_with` function just like anyone else, custom allocators would happen entirely behind this, true HKTs are probably never going to be added, and associated type constructors aren't going to affect this proposal since the proposal defines no new traits or types that would use them.
# Motivation
[motivation]: #Motivation
Rust has a dysfunctional relationship with objects that are large or variable in size. It can accept them as parameters pretty well using references, but creating them is horrible:
* A function pretty much has to use `Vec` to create huge arrays, even if the array is fixed size. The way you'd want to do it, `Box::new([0; 1_000_000])`, will allocate the array on the stack and then copy it into the Box. This same form of copying shows up in tons of APIs, like serde's Serialize trait.
* There's no safe way to create gigantic, singular structs at all. If your 1M array is wrapped somehow, you pretty much have to allocate the memory by hand and transmute.
* You can't return bare unsized types. [You can create them locally, and you can pass them as arguments](https://github.com/rust-lang/rfcs/blob/master/text/1909-unsized-rvalues.md), but not return them.
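For concreteness, here is what the status quo looks like on stable Rust: a `Vec` detour for big arrays, and a hand-rolled allocation for a big struct. The `Giant` type is made up for illustration, and the hand-rolled version only works because all-zeroes happens to be a valid bit pattern for it:

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

// A ~4 MB struct: too big to build on the stack and then copy into a Box.
pub struct Giant {
    pub data: [i32; 1_000_000],
}

// Today's workaround: allocate by hand and rely on all-zeroes being a valid
// bit pattern for the type (true here, since the only field is i32s).
pub fn giant_zeroed() -> Box<Giant> {
    unsafe {
        let layout = Layout::new::<Giant>();
        let p = alloc_zeroed(layout) as *mut Giant;
        if p.is_null() {
            handle_alloc_error(layout);
        }
        // Safety: p came from the global allocator with Giant's layout,
        // and zeroed memory is a valid Giant.
        Box::from_raw(p)
    }
}

fn main() {
    // For plain arrays, Vec is the usual dodge: the buffer starts on the heap.
    let big: Box<[i32]> = vec![0i32; 1_000_000].into_boxed_slice();
    assert_eq!(big.len(), 1_000_000);

    let giant = giant_zeroed();
    assert_eq!(giant.data[999_999], 0);
}
```

Neither workaround generalizes: `vec!` only covers arrays of cloneable elements, and the zeroed-allocation trick only covers types where zero bits are valid.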
As far as existing emplacement proposals go, this one was written with the following requirements in mind:
* **It needs to be possible to wrap it in a safe API.** Safe API examples are given for built-in data structures, with a full sketch of the implementation for Box, including exception safety.
* **It needs to support already-idiomatic constructors like `fn new() -> GiantStruct { GiantStruct { ... } }`** Since this proposal is defined in terms of Guaranteed Copy Elision, this is a gimme.
* **It needs to be possible to in-place populate data structures that cannot be written using a single literal expression.** The `write_return_with` intrinsic allows this to be done in an unsafe way. Sketches for APIs built on top of it are also given in the [future-possibilities] section.
* **It needs to avoid adding egregious overhead in cases where the values being populated are small (in other words, if the value being initialized is the size of a pointer or smaller, it needs to be possible for the compiler to optimize away the outptr).** Since this proposal is written in terms of Guaranteed Copy Elision, this is a gimme. The exception of the "weird bypass functions" `read_return_with` and `write_return_with` may seem to break this; see the [example desugarings here](#How-do-the-return-slot-functions-work-when-the-copy-is-not-actually-elided) for info on how these functions work when the copy is not actually elided.
* **It needs to solve most of the listed problems with the old proposals.** Since this one actually goes the distance and defines when copy elision will kick in, it fixes the biggest problems that the old `box` literal system had. It is also written in terms of present-day Rust, using `impl Trait`, and with the current direction of Rust in mind.
# Guide-level explanation
[guide-level-explanation]: #Guide-level-explanation
If you need to allocate a very large data structure, or a DST, you should prefer using `Box::new_with` over `Box::new`. For example:
```rust
let boxed_array = Box::new_with(|| [0; 1_000_000]); // instead of Box::new([0; 1_000_000])
let boxed_data_struct = Box::new_with(DataStruct::new); // instead of Box::new(DataStruct::new())
```
The `new_with` function will perform the allocation first, then evaluate the closure, placing the result directly within it. The `new` function, on the other hand, evaluates the argument *then* performs the allocation and copies the value into it.
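A rough model of this ordering can be written with today's allocator APIs. `box_new_with` here is a made-up stand-in, it only handles sized, non-zero-sized types, it ignores exception safety, and without the guaranteed elision this proposal adds the closure's result still materializes on the stack before `ptr::write`; the point is only the allocate-before-evaluate ordering:

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr;

// Hypothetical sketch of Box::new_with's ordering for sized T:
// allocate first, then run the closure, then write the value in.
// Assumes T is not zero-sized; leaks the allocation if `f` panics.
pub fn box_new_with<T, F: FnOnce() -> T>(f: F) -> Box<T> {
    unsafe {
        let layout = Layout::new::<T>();
        let p = alloc(layout) as *mut T;
        if p.is_null() {
            handle_alloc_error(layout);
        }
        // The closure runs only after the allocation has succeeded.
        ptr::write(p, f());
        Box::from_raw(p)
    }
}

fn main() {
    let b = box_new_with(|| [7u8; 64]);
    assert_eq!(b[63], 7);
}
```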
Similar functions exist for Vec and other data structures, with names like `push_with` and `insert_with`.
When writing constructors, you should create large data structures directly on the return line. For example:
```rust
// This function will allocate its return value directly in the return slot.
fn good() -> [i32; 1_000_000] {
    [1; 1_000_000]
}
// This function will return a raw slice, and can also be called as `Box::new_with(good_dst)`
fn good_dst() -> [i32] {
    let n = 1_000_000;
    [1; n]
}
// This function will copy the array when it returns.
fn bad() -> [i32; 1_000_000] {
    let mut arr = [0; 1_000_000];
    for i in 0..1_000_000 {
        arr[i] = i as i32;
    }
    arr
}
// This function will compile successfully, but it will allocate the array on the stack,
// which is probably very bad.
fn bad_dst() {
    fn takes_dst(_k: [i32]) {}
    fn returns_dst() -> [i32] { let n = 1_000_000; [1; n] }
    takes_dst(returns_dst())
}
```
This is guaranteed to see through if expressions, function calls, literals, unsafe blocks, and the return statement. It is not guaranteed to see through variable assignment or break with value.
```rust
// "good" functions will write their results directly in the caller-provided slot
fn good() -> Struct {
    if something() { Struct { ... } } else { Struct { ... } }
}
fn good() -> Struct {
    loop { return Struct { ... } }
}
fn good() -> Struct2 {
    Struct2 { member: Struct { ... } }
}
// "bad" functions will not necessarily do that
fn bad() -> Struct {
    loop { break Struct { ... } }
}
fn bad() -> Struct {
    let q = Struct { ... };
    q
}
fn bad() -> Struct2 {
    let q = Struct { ... };
    Struct2 { member: q }
}
```
In other words, Rust does not currently guarantee Named Return Value Optimization.
The Rustonomicon should mention that the return-slot intrinsics can be used to build a function that's equivalent to `bad()`:
```rust
use std::mem::write_unsized_return_with;
use std::alloc::Layout;
use std::slice;
fn not_bad() -> [i32] {
    unsafe {
        write_unsized_return_with(Layout::new::<[i32; 1_000_000]>(), |arr: *mut u8| {
            let arr: &mut [i32] = slice::from_raw_parts_mut(arr as *mut i32, 1_000_000);
            for i in 0..1_000_000 {
                arr[i] = i as i32;
            }
            arr
        })
    }
}
```
# Reference-level explanation
[reference-level-explanation]: #Reference-level-explanation
## Did you say I can return unsized types?
A function that directly returns an unsized type should be compiled into two functions, essentially as a special kind of generator. Like this, basically:
```rust
// sugar
fn my_function() -> str {
    *"hello world"
}
fn function_that_calls_my_function() -> str {
    println!("Hi there!");
    my_function()
}
// desugar my_function
// this is written in order to aid understanding, not because these are good APIs
use std::alloc::Layout;
struct __MyFunction__Internal {
    local_0: &'static str,
}
impl __MyFunction__Internal {
    /* "unsize coroutines" are different from normal generators because their behavior is much more
       restricted. They yield exactly one Layout, and then they are finished by writing to a pointer. */
    unsafe fn start(state: *mut __MyFunction__Internal) -> Layout {
        /* As usual for generators, locals (including anonymous ones) get desugared into member
           variables of an anonymous struct. In this case, the local variable is just a string literal. */
        (*state).local_0 = "hello world";
        Layout::for_value("hello world")
    }
    unsafe fn finish(state: *mut __MyFunction__Internal, slot: *mut u8) -> &mut str {
        let local_0 = (*state).local_0;
        ptr::copy(local_0.as_ptr(), slot, local_0.len());
        /* Notice how I also have to return a properly-built fat pointer?
           This isn't very important for string slices, but it's mandatory for trait objects, because I need to
           supply a vtable. */
        str::from_utf8_unchecked_mut(slice::from_raw_parts_mut(slot, local_0.len()))
    }
}
// desugar function_that_calls_my_function
use std::alloc::Layout;
struct __FunctionThatCallsMyFunction__Internal {
    delegate: __MyFunction__Internal,
}
impl __FunctionThatCallsMyFunction__Internal {
    unsafe fn start(state: *mut __FunctionThatCallsMyFunction__Internal) -> Layout {
        /* first, run everything leading up to the return */
        println!("Hi there!");
        /* then, desugar the return (in this case, by delegating) */
        __MyFunction__Internal::start(&mut (*state).delegate)
    }
    unsafe fn finish(state: *mut __FunctionThatCallsMyFunction__Internal, slot: *mut u8) -> &mut str {
        __MyFunction__Internal::finish(&mut (*state).delegate, slot)
    }
}
```
This interface ends up putting some pretty harsh limitations on what functions that return unsized types can do. An unsized type-returning function can always return the following kinds of expression:
* Constants and dereferencing constants (as shown above). These are desugared by yielding the layout of the literal and returning the literal by copying it.
* Directly returning the value of another function that also returns the same unsized type. These are desugared by forwarding, as shown above with `function_that_calls_my_function`.
* [RFC 1909](https://github.com/rust-lang/rfcs/blob/master/text/1909-unsized-rvalues.md) variable-length array literals. These are desugared by yielding the length variable, and returned by writing the value repeatedly through pointer offsets and pointer writes.
* Unsized coercions. These are desugared by storing the sized type as function state, yielding the layout of the sized type, and returning by copying.
* Blocks, unsafe blocks, and branches that have acceptable expressions in their tail position.
As is typical for generators, these functions may need to be desugared into simple "state machines" if they return branches or have more than one exit point.
```rust
fn with_branch() -> Option<Error> {
    if coin_flip() {
        None as Option<!>
    } else {
        Some("tails")
    }
}
fn with_multiple_exit_points() -> [i32] {
    while keep_going() {
        if got_cancelation() {
            return *&[]; // returning constant
        }
    }
    let n = 100;
    [1; n] // returning variable-length-array expression
}
// desugar with_branch
// this is written in order to aid understanding, not because these are good APIs
use std::alloc::Layout;
enum __WithBranch__Internal {
    S0(Option<!>),
    S1(Option<&'static str>),
}
impl __WithBranch__Internal {
    unsafe fn start(state: *mut __WithBranch__Internal) -> Layout {
        if coin_flip() {
            *state = __WithBranch__Internal::S0(None);
            Layout::new::<Option<!>>()
        } else {
            *state = __WithBranch__Internal::S1(Some("tails"));
            Layout::new::<Option<&'static str>>()
        }
    }
    unsafe fn finish(state: *mut __WithBranch__Internal, slot: *mut u8) -> &mut Option<Error> {
        match *state {
            __WithBranch__Internal::S0(ref mut value) => {
                ptr::copy(value as *mut _ as *const u8, slot, mem::size_of_val(value));
                /* as you can see, I need to extract the vtable pointer */
                let t: TraitObject = mem::transmute(value as &mut Option<Error>);
                mem::transmute(TraitObject { data: slot as *mut (), ..t })
            }
            __WithBranch__Internal::S1(ref mut value) => {
                ptr::copy(value as *mut _ as *const u8, slot, mem::size_of_val(value));
                let t: TraitObject = mem::transmute(value as &mut Option<Error>);
                mem::transmute(TraitObject { data: slot as *mut (), ..t })
            }
        }
    }
}
// desugar with_multiple_exit_points
enum __WithMultipleExitPoints__Internal {
    S0(&'static [i32]),
    S1(i32, usize),
}
impl __WithMultipleExitPoints__Internal {
    unsafe fn start(state: *mut __WithMultipleExitPoints__Internal) -> Layout {
        while keep_going() {
            if got_cancelation() {
                *state = __WithMultipleExitPoints__Internal::S0(&[]);
                return Layout::for_value(&[] as &[i32]);
            }
        }
        let n = 100;
        *state = __WithMultipleExitPoints__Internal::S1(1, n);
        Layout::from_size_align_unchecked(n * mem::size_of::<i32>(), mem::align_of::<i32>())
    }
    unsafe fn finish(state: *mut __WithMultipleExitPoints__Internal, slot: *mut u8) -> &mut [i32] {
        match *state {
            __WithMultipleExitPoints__Internal::S0(literal) => {
                ptr::copy(literal.as_ptr() as *const u8, slot, mem::size_of_val(literal));
                slice::from_raw_parts_mut(slot as *mut i32, literal.len())
            }
            __WithMultipleExitPoints__Internal::S1(value, n) => {
                let slot = slot as *mut i32;
                for i in 0..n { ptr::write(slot.add(i), value); }
                slice::from_raw_parts_mut(slot, n)
            }
        }
    }
}
```
Important point: returning unsized variables is not allowed.
```rust
// fn invalid() -> [i32] {
//     let n = 100;
//     let k = [1; n];
//     k // ERROR: cannot return unsized variable
// }
```
This is because there's no way to break all such functions in half the way we need to. Where would we put `k` between the calls to `start()` and the call to `finish()`? We can't just put it in the struct, because we need to know how big a generator's stack frames will be in order to generate its state machine struct.
On the other hand, a function that returns an unsized value can have its return value assigned to a
local variable (the compiler will `start()` the function to get a layout for it, then `alloca()` the space, then `finish()` the function into that space). This is a natural thing to support, and it's great for trait objects, but it's probably a horrible idea to do it with dynamically-sized slices, and should be linted against.
```rust
fn valid_but_terrible() {
    let n: [i32] = returns_slice();
    takes_slice(n);
}
```
## How this works with sized types
Any function that returns a sized type can be trivially adapted to the "unsized return generator" interface of `start()` and `finish()`, by having `start` return a constant and having `finish` do all the work.
```rust
fn return_a_sized_value() -> i32 {
    println!("got random value");
    4 // determined by a fair dice roll
}
// desugar return_a_sized_value
// this is written in order to aid understanding, not because these are good APIs
use std::alloc::Layout;
struct __ReturnASizedValue__Internal;
impl __ReturnASizedValue__Internal {
    unsafe fn start(state: *mut __ReturnASizedValue__Internal) -> Layout {
        // there's only one possible layout for sized types; that's the definition of being "sized"
        Layout::new::<i32>()
    }
    unsafe fn finish(state: *mut __ReturnASizedValue__Internal, slot: *mut u8) -> &mut i32 {
        // just invoke the function and copy its return value into the slot
        // this mostly gets the behavior we want, but see below for info about copy elision
        ptr::copy(&return_a_sized_value() as *const i32 as *const u8, slot, mem::size_of::<i32>());
        &mut *(slot as *mut i32)
    }
}
```
This is how it would conceptually work whenever a facility designed to handle placement is invoked on a function that returns a sized value. This is not how it should be implemented, though, because we don't want a bunch of unnecessary copying.
## Absolutely minimum viable copy elision
To make sure this functionality is useful for sized types, we should probably also guarantee some amount of copy elision:
* Directly returning the result of another function that also returns the same type.
* Blocks, unsafe blocks, and branches that have acceptable expressions in their tail position.
* Struct and array literals.
The first two cases are chosen as "must-have" copy elision because they allow functions that use the unsafe methods to be composed with other functions without introducing overhead. The last rule is added because it's trivial to guarantee, and because the line between "tuple struct literal" and "function call" is blurry anyway.
Guaranteed Copy Elision only kicks in when a GCE-applicable expression is directly returned from a function, either by being the function's last expression or by being the expression part of a `return` statement.
## The unsafe method of writing return values
The `std::mem` module exposes two symmetrical intrinsics that can allow you to achieve Copy Elision in any circumstance.
### `write_return_with`
```rust
pub unsafe fn write_return_with<T, F: FnOnce(*mut u8)>(f: F) -> T {
    write_unsized_return_with(Layout::new::<T>(), |p| { f(p); &mut *(p as *mut T) })
}
extern "intrinsic" pub unsafe fn write_unsized_return_with<T: ?Sized, F: for<'a> FnOnce(*mut u8) -> &'a mut T>(layout: Layout, f: F) -> T;
```
[write_return_with]: #write_return_with
Directly access the caller-provided return slot.
The return slot is an uninitialized memory space provided by a function's caller to write the return value, and implicitly passed as a pointer. A function which directly returns the result of evaluating another function, like `fn foo() -> ReturnedValue { bar() }`, may simply pass the slot along, thus avoiding expensive copies. In cases where it is impossible to implement a function by directly returning the value of a function call or a primitive expression, directly accessing the return slot and writing it directly may be necessary.
Return slots are not always used; small values like pointers, numbers, and booleans may be returned by filling a register instead. In this case, `write_return_with` will implicitly create a "return slot", pass a pointer to it to `f`, and then copy the return value from the "return slot" to the register.
The pointer returned by `write_unsized_return_with`'s callback should be the same as the one provided to it, along with any necessary unsize information (such as the slice's length or the trait object's vtable pointer). Additionally, when using `write_unsized_return_with` for sized types, the provided layout must be exactly the same as the one produced by `Layout::new::<T>()`. These two values must be correct even when they're redundant. A function that returns a slice may return a slice that is smaller than the requested allocation Layout, in case it is unable to predict the amount of data that will be available to it, but when producing a trait object, it must know exactly the right size for its allocation.
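The metadata bookkeeping described here is observable on stable Rust: fat references carry a length or a vtable pointer alongside the data pointer, and `Layout::for_value` reads that metadata back to recover the allocation layout. The `dyn_alloc_size` helper below is a made-up name for illustration:

```rust
use std::alloc::Layout;
use std::fmt::Debug;
use std::mem;

// The size in bytes that an allocator must provide for this trait object,
// recovered from the reference's vtable metadata.
pub fn dyn_alloc_size(v: &dyn Debug) -> usize {
    Layout::for_value(v).size()
}

fn main() {
    // References to unsized types are "fat": a data pointer plus metadata
    // (a length for slices, a vtable pointer for trait objects).
    assert_eq!(mem::size_of::<&[i32]>(), 2 * mem::size_of::<usize>());
    assert_eq!(mem::size_of::<&dyn Debug>(), 2 * mem::size_of::<usize>());

    // Layout::for_value reads the metadata to compute the allocation size.
    let slice: &[i32] = &[1, 2, 3];
    assert_eq!(Layout::for_value(slice).size(), 3 * mem::size_of::<i32>());
    assert_eq!(dyn_alloc_size(&0u64), 8);
}
```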
#### Panicking
`f` is allowed to panic. If it panics, the underlying memory should still be freed (the built-in collections are well-behaved), but the return slot implementation will assume that the allocated memory was not initialized, *and shall not call T's destructor*. If `f` allocates additional resources itself before panicking, it is responsible for freeing them.
If allocation fails, `write_return_with` may panic without calling `f`. The return slot may also be pre-allocated, resulting in the allocation failure before the call to `write_return_with` is reached.
IMPORTANT IMPLEMENTATION NOTE: Because of the way unsized return generators are codegenned as generators, it would be possible to tell that `write_unsized_return_with` wasn't actually panicking by wrapping its invocation in `catch_unwind`. To ensure the user cannot do this, the closure passed to `catch_unwind` must return a sized type; we still technically won't be unwinding through their stack frames, but we will be calling the drop functions with `thread::panicking()` returning true, so they won't be able to tell. Additionally, of course, the return slot for sized types is always pre-allocated, so this function will never panic in that case.
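The drop-during-unwind behavior this relies on is observable today: destructors consult `std::thread::panicking()`. A small probe (the `Probe` type and `drop_saw_unwind` helper are illustrative):

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

static SAW_UNWIND: AtomicBool = AtomicBool::new(false);

struct Probe;

impl Drop for Probe {
    fn drop(&mut self) {
        // Destructors can observe whether they run during an unwind.
        SAW_UNWIND.store(thread::panicking(), Ordering::SeqCst);
    }
}

// Runs a panicking closure under catch_unwind and reports whether Probe's
// destructor observed thread::panicking() == true.
pub fn drop_saw_unwind() -> bool {
    panic::set_hook(Box::new(|_| {})); // silence the default panic message
    let result = panic::catch_unwind(|| {
        let _p = Probe; // dropped while the panic is in flight
        panic!("boom");
    });
    assert!(result.is_err());
    SAW_UNWIND.load(Ordering::SeqCst)
}

fn main() {
    assert!(drop_saw_unwind());
}
```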
#### Example: raw_as_bytes_with
```rust
mod str {
    /// This is a function adapter. Usage:
    ///
    ///     fn get_str() -> str;
    ///     let get_bytes = str::raw_as_bytes_with(get_str);
    ///     get_bytes()
    pub fn raw_as_bytes_with<Args, F: FnOnce<Args, Output = str>>(f: F) -> impl FnOnce<Args, Output = [u8]> {
        struct ConvertFn<F>(F);
        impl<Args, F: FnOnce<Args, Output = str>> FnOnce<Args> for ConvertFn<F> {
            type Output = [u8];
            extern "rust-call" fn call_once(self, a: Args) -> [u8] {
                unsafe {
                    let finish = read_unsized_return_with(|| self.0.call_once(a));
                    write_unsized_return_with(finish.layout(), |slot: *mut u8| {
                        let s = finish.finish(&mut *(slot as *mut MaybeUninit<str>));
                        &mut *(s as *mut str as *mut [u8])
                    })
                }
            }
        }
        ConvertFn(f)
    }
}
```
### `read_return_with`
```rust
pub unsafe fn read_return_with<T, F: FnOnce() -> T>(f: F, slot: &mut MaybeUninit<T>) {
    let finish = read_unsized_return_with(f);
    debug_assert!(finish.layout() == Layout::new::<T>());
    finish.finish(slot);
}
#[lang(read_unsized_return_with_finish)]
pub trait ReadUnsizedReturnWithFinish<T: ?Sized> {
    fn finish<'a>(self, slot: &'a mut MaybeUninit<T>) -> &'a mut T;
    fn layout(&self) -> Layout;
}
extern "intrinsic" pub unsafe fn read_unsized_return_with<T: ?Sized, F: FnOnce() -> T>(f: F) -> impl ReadUnsizedReturnWithFinish<T>;
```
[read_return_with]: #read_return_with
Directly supply a return slot. See [`write_return_with`] for information on what return slots are.
#### Example: Box::new_with
```rust
struct BoxUninit<T: ?Sized> {
    p: Option<NonNull<MaybeUninit<T>>>,
}
impl<T: ?Sized> Drop for BoxUninit<T> {
    fn drop(&mut self) {
        // only reached if the closure panics: free the allocation without
        // running T's destructor, since the memory was never initialized
        if let Some(p) = self.p {
            unsafe { System.dealloc(p.as_ptr() as *mut u8, Layout::for_value(p.as_ref())) };
        }
    }
}
struct Box<T: ?Sized> {
    p: NonNull<T>,
}
impl<T: ?Sized> Box<T> {
    fn new_with<F: FnOnce() -> T>(f: F) -> Self {
        unsafe {
            let mut uninit = BoxUninit { p: None };
            let finish = read_unsized_return_with(f);
            let p = NonNull::new_unchecked(System.alloc(finish.layout()) as *mut MaybeUninit<T>);
            uninit.p = Some(p);
            // finish() hands back a properly-built fat pointer to the now-initialized value
            let r = finish.finish(&mut *p.as_ptr());
            forget(uninit);
            Box { p: NonNull::from(r) }
        }
    }
}
impl<T: ?Sized> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            let p = self.p.as_ptr();
            let layout = Layout::for_value(&*p);
            drop_in_place(p);
            System.dealloc(p as *mut u8, layout);
        }
    }
}
```
#### Example: Vec::extend_from_raw_slice_with
Copy-and-paste this example to create `String::extend_from_raw_str_with`.
```rust
impl<T> Vec<T> {
    pub fn extend_from_raw_slice_with<F: FnOnce() -> [T]>(&mut self, f: F) {
        unsafe {
            let finish = read_unsized_return_with(f);
            let layout = finish.layout();
            debug_assert_eq!(layout.size() % size_of::<T>(), 0);
            debug_assert_eq!(layout.align(), align_of::<T>());
            let count = layout.size() / size_of::<T>();
            self.reserve(count);
            let p = self.as_mut_ptr().add(self.len());
            let slot = slice::from_raw_parts_mut(p, count) as *mut [T] as *mut MaybeUninit<[T]>;
            // this slice may be smaller than the given allocation, as described above
            let slice = finish.finish(&mut *slot);
            debug_assert!(slice.len() <= count);
            self.set_len(self.len() + slice.len());
        }
    }
}
```
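For sized elements, the reserve-then-write-then-`set_len` pattern that this hypothetical function automates is already expressible on stable Rust with `Vec::spare_capacity_mut`; `extend_in_place` is a made-up name for illustration:

```rust
use std::mem::MaybeUninit;

// Extend a Vec in place: reserve, write through MaybeUninit slots, then
// commit with set_len. This is the sized-element version of the pattern.
pub fn extend_in_place(v: &mut Vec<i32>, count: usize) {
    v.reserve(count);
    let spare: &mut [MaybeUninit<i32>] = v.spare_capacity_mut();
    for (i, slot) in spare.iter_mut().take(count).enumerate() {
        slot.write(i as i32);
    }
    // Safety: the first `count` spare slots were just initialized.
    unsafe {
        v.set_len(v.len() + count);
    }
}

fn main() {
    let mut v = vec![10, 20];
    extend_in_place(&mut v, 3);
    assert_eq!(v, [10, 20, 0, 1, 2]);
}
```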
### Lint: incorrect use of `write_return_with`
All uses of `write_return_with` should call it directly in their return clause, or in an unsafe block directly in their return clause.
```rust
fn good() -> BigDataStructure {
    unsafe {
        write_return_with(|slot| {
            populate_big_data_structure(slot);
        })
    }
}
fn bad() -> BigDataStructure {
    unsafe {
        let k = write_return_with(|slot| {
            populate_big_data_structure(slot);
        });
        k
    }
}
fn also_bad_even_though_it_technically_works() -> BigDataStructure {
    unsafe {
        if coin_flip() {
            write_return_with(|slot| {
                populate_big_data_structure(slot);
            })
        } else {
            unimplemented!()
        }
    }
}
```
Assigning the result of `write_return_with` to a local variable, like in `bad()`, is essentially a no-op. The data structure is being emplaced into `k`, and then it's being immediately copied on return. If you actually intend to manually initialize a local value, just use `MaybeUninit` like a normal person.
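For reference, the `MaybeUninit` pattern for a local looks like this on stable Rust (`counted_array` is a made-up name):

```rust
use std::mem::MaybeUninit;

// Manually initialize a local value with MaybeUninit instead of routing it
// through a return slot.
pub fn counted_array() -> [i32; 8] {
    let mut slot = MaybeUninit::<[i32; 8]>::uninit();
    let p = slot.as_mut_ptr() as *mut i32;
    for i in 0..8 {
        // Safety: p..p+8 lies entirely inside `slot`.
        unsafe { p.add(i).write(i as i32) };
    }
    // Safety: every element was initialized in the loop above.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(counted_array(), [0, 1, 2, 3, 4, 5, 6, 7]);
}
```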
Since it's always possible to write a function with `write_return_with` in the return clause of a function or an unsafe block which is itself in the return clause of a function (if you need a conditional somewhere, you can just put the conditional *inside* the closure), the lint should just require you to do that.
### How do the return slot functions work when the copy is not actually elided?
When the copy is not elided (which is only ever the case for values with a statically known size), they simply use temporaries for their implementation. The return slot functions still have to "work" when the copy is not elided, so that they can be used in generic contexts.
```rust
// Imagine these functions being used only when T is sized and less than one pointer.
#[inline(always)]
unsafe fn write_return_with<T, F: FnOnce(*mut u8)>(f: F) -> T {
    let mut slot = MaybeUninit::<T>::uninit();
    f(slot.as_mut_ptr() as *mut u8);
    slot.assume_init()
}
#[inline(always)]
unsafe fn read_return_with<T, F: FnOnce() -> T>(f: F, slot: *mut u8) {
    ptr::write(slot as *mut T, f());
}
```
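Sized-only stand-ins for these fallbacks can be written and run on stable Rust today (the names mirror the proposal's, but these are ordinary functions, not intrinsics, and there is no elision involved):

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Stand-in for the non-elided fallback: materialize the value in a
// temporary and hand out a raw pointer to it.
pub unsafe fn write_return_with<T, F: FnOnce(*mut u8)>(f: F) -> T {
    let mut slot = MaybeUninit::<T>::uninit();
    f(slot.as_mut_ptr() as *mut u8);
    // Safety: the caller's closure must have fully initialized the slot.
    slot.assume_init()
}

// Stand-in for the symmetrical direction: evaluate, then write through the
// caller-supplied pointer.
pub unsafe fn read_return_with<T, F: FnOnce() -> T>(f: F, slot: *mut u8) {
    ptr::write(slot as *mut T, f());
}

fn main() {
    unsafe {
        let x: u32 = write_return_with(|p| ptr::write(p as *mut u32, 42));
        assert_eq!(x, 42);

        let mut out = MaybeUninit::<u32>::uninit();
        read_return_with(|| x + 1, out.as_mut_ptr() as *mut u8);
        assert_eq!(out.assume_init(), 43);
    }
}
```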
## New functions added to existing types
```rust
impl<T: ?Sized> Box<T> {
    fn new_with<F: FnOnce() -> T>(f: F) -> Self;
}
impl<T> Vec<T> {
    fn from_raw_slice_with<F: FnOnce() -> [T]>(f: F) -> Self;
    fn push_with<F: FnOnce() -> T>(&mut self, f: F);
    fn insert_with<F: FnOnce() -> T>(&mut self, index: usize, f: F);
    fn extend_from_raw_slice_with<F: FnOnce() -> [T]>(&mut self, f: F);
}
impl<K: Eq + Hash, V, S: BuildHasher> HashMap<K, V, S> {
    /// Unlike the regular `insert` function, this one returns a `&mut` to the new value,
    /// not an optional old one.
    /// This is because the old value can't be returned without copying it
    /// from the map to the return slot, which is wasteful for large objects.
    fn insert_with<'a, FV: FnOnce() -> V>(&'a mut self, k: K, fv: FV) -> &'a mut V;
}
impl<K: Eq + Hash, V, S: BuildHasher> hash_map::OccupiedEntry<K, V, S> {
    fn insert_with<FV: FnOnce() -> V>(&mut self, fv: FV) -> V;
}
impl<K: Eq + Hash, V, S: BuildHasher> hash_map::VacantEntry<K, V, S> {
    fn insert_with<'a, FV: FnOnce() -> V>(&'a mut self, fv: FV) -> &'a mut V;
}
impl String {
    fn from_raw_str_with<F: FnOnce() -> str>(f: F) -> Self;
    fn extend_from_raw_str_with<F: FnOnce() -> str>(&mut self, f: F);
}
mod str {
    // Yes, this is a generic function adapter, up to and including use of the unstable Fn* traits.
    // Use it like raw_as_bytes_with(fn_that_returns_raw_str)
    fn raw_as_bytes_with<Args, F: FnOnce<Args, Output = str>>(f: F) -> impl FnOnce<Args, Output = [u8]>;
}
```
# Drawbacks
[drawbacks]: #Drawbacks
The biggest issue with Guaranteed Copy Elision is that it's actually kind of hard to *specify* it. The abstract machine, after all, doesn't specify the amount of memory that a function call occupies; that's part of the ABI. So how do you phrase GCE in an ABI-agnostic way? The C++ answer is "a convoluted combination of requirements about when the address of an object may change and when a move constructor may be called". The specification of GCE in the to-be-written Rust specification will probably be just as bad, since while it's pretty obvious how it works for unsized types (which cannot be moved without invoking an allocator, and we can certainly specify when that's allowed), we also want to guarantee it for sized types.
There have also been people recommending a more do-what-I-mean approach where `Box::new(f())` is guaranteed to perform copy elision. That would induce the absolute minimal churn, though how you'd handle intermediate functions like `fn foo(t: T) -> Box<T> { Box::new(t) }` is beyond me.
The third drawback, and I personally think this is the *worst* drawback, is that it's invisible. This means that when a user accidentally writes code that performs copies, it isn't always obvious that they messed up. There isn't any way to get existing code to achieve GCE without making it invisible, and I wanted to avoid churn in everyone's `new()` functions, as described in [rationale-and-alternatives]. Presumably, it can be solved using optional lints.
Another major drawback is that it doesn't compose well with pattern matching (or `?`). Imagine you have a function `fn foo() -> Result<Struct, Error>` and you want to do something like `Box::new(foo()?)`, but you want to directly place it. This is impossible; not just because with closures `Box::new_with(|| foo()?)` won't do what you want, but any system where the function being called writes its entire result through a pointer (like the old Placer proposal) cannot have the function being called write part of its result in one place (the `Ok` value should be written into the `Box<Struct>`) while other parts are written in other places (the discriminator should be written to a slot on the stack, since there isn't room for it). You can only assemble a `Box<Result<Struct, Error>>`, or abandon Result entirely and use an error handler callback or a panic/catch.
Also, like all placement proposals, it involves adding a new API surface area to most of the built-in data structures. Since there are known problems with how this proposal works with error handling, the GCE-based version may end up being replaced with an approach that does.
Additionally, this proposal deliberately does not implement NRVO. This means people will end up writing stilted code, or just using the unsafe bypass functions, instead of doing what's most readable.
# Rationale and alternatives
[rationale-and-alternatives]: #Rationale-and-alternatives
## Why not use an alternative ABI for returning Result?
The biggest drawback to this proposal, like most placement proposals, is that it works poorly with Result. You can emplace a function returning `Result<T>` into a `Box<Result<T>>`, but you cannot emplace it into a `Box<T>`. Making this work would require the inner payload to be written to the box, while the tag is written to somewhere on the stack.
This proposal does not define an alternative ABI for results because it would work poorly with `read_return_with` and `write_return_with`. Both of these functions expose the return slot as a raw pointer, which means the return slot has to be in a contiguous region of memory. This proposal does not, strictly speaking, lock out alternative ABIs for Result, but it does impose copying whenever such ABIs are used along with the `return_with` functions. These functions have to work with results, because they have to work in generic contexts.
Using an alternative ABI remains practical as a solution for the "peanut butter" conditional cost (the `return_with` functions will be rarely used in most Rust code, especially when the code is just propagating error flags), but the way these functions work precludes using an alternative ABI as a solution for placement.
## Vs older emplacement proposals
[The previous emplacement RFC was rejected for the following reasons](https://github.com/rust-lang/rust/issues/27779#issuecomment-378416911), most of which this new RFC addresses:
* The implementation does not fulfil the design goals
* Place an object at a specific address
This is literally what `read_return_with` does.
* Allocate objects in arenas (references the original C++ goals)
This is what `new_with` and related functions achieve.
* Be competitive with the implementation in C++
Passing a closure is essentially the same thing as passing an initializer list, so it should have the same performance as C++ emplacement.
* The functionality of placement is unpredictable
This is probably the least-well-addressed concern. This RFC spends a lot of its text spelling out when GCE can kick in, which means it should be possible for a coder (or a lint) to tell whether a value is being copied or not, but it's still invisible.
The other way it tries to address predictability is by offering the explicit-mode `write_return_with` intrinsic. Unfortunately, it's unsafe.
* Specific unresolved questions
* make placement work with fallible creation
This RFC honestly has terrible support for this, mostly because it's painted itself into a corner. The sized-value-returning functions all have to be emplaceable, and when they return a Result, they return it by writing it directly through a pointer all at once. Making placement work really well with fallible creation rould require the Result to be split up, with the discriminant going into a stack frame or something, while the payload goes into the allocated area.
* trait design
The `ReadUnsizedReturnWithFinish` trait will probably be somewhat controversial. It is an unsafe abstraction, but that's not a good excuse for a poor design. OTOH, I don't know of a better design to use.
For providing a way to allocate unsized types by return, I don't think its design could be any better: you're passing a layout, and getting a pointer that you can fill with your data.
The sized type story is the annoying one, since we're using an abstraction that pretends the callee is allocating the space when the space was really allocated before the callee was even called, and the compiler has to generate a fake implementation just to keep up the illusion. The only-for-sized-types wrapper functions try to paper over that, so at least they aren't requiring `write_return_slot_with` users to provide a `Layout` that only has one valid value anyway.
* support for DST/unsized structs
Check!
* More speculative unresolved questions include:
* better trait design with in the context of future language features
I suppose the read/write-return-slot-with functions would probably be affected by future proposals for custom unsized types. Since they use fat pointers, it should work no matter what they are, but there might eventually be a better way to handle that.
* interaction between custom allocators and placement
This interface should support any allocator, since it uses a "supply your own pointer" philosophy for `read_return_with` users like collections.
## Vs other possible tweaks to this RFC
The design for this proposal is meant to allow already-idiomatic "constructors" like [this one](https://github.com/rust-ammonia/ammonia/blob/a6f0cf7886653ce1a982aab1c522cd44eab1267a/src/lib.rs#L342-L355) to be allocated directly into a container. Any proposal that wishes to pull that off *has* to involve GCE, including more elaborate ones like RFC-8089. The choice of modeling that part of the proposal after existing parts of C++17 was made for familiarity and ease of implementation. Clang already does this stuff in LLVM, so we know it's possible.
The weird bypass functions, `write_return_with` and `read_return_with`, both accept closures because the only alternatives I could think of involved even weirder calling convention hacks, even weirder type system hacks (`fn get_return_slot<T>() -> *mut T`, for example, would be required to ensure that `T` is the same as the return type of the current function), or new language features like Go's nameable return slot.
The idea of compiling functions that return unsized types into pseudo-generators came from the Rust Discord server. [Thanks @rpjohnst](https://discordapp.com/channels/442252698964721669/443151225160990732/550731094027403285)!
This RFC supports returning DSTs for two reasons:
* It allows emplacing dynamic byte blobs, which is really important for use cases like Mio and Serde.
* It is one of the main requirements set out in the older RFC's mortician's note.
The specific design for DST returning is, of course, optimized for emplacement, and the main use case is for emplacing big byte blobs. It supports returning trait objects as well, but that's not really what it's for, and it kind of shows. A proposal using implicit boxing would be more useful for such a case, though at that point it seems unnecessary (if you're given a function that returns a bare trait object, you can call it with `Box::new_with` and get on with your life).
# Prior art
[prior-art]: #Prior-art
Any place where my proposal is ambiguous? Let me know and I'll try to make it [the same as C++17 Guaranteed Copy Elision](https://jonasdevlieghere.com/guaranteed-copy-elision/).
It was after I came up with the idea that I realized `write_return_with` is basically a feature in the Go language, but Go has dedicated syntax for it and zero-initializes the return slot. I don't know of any prior art for `read_return_with`; it's not really all that similar to "placement new", even though it's arguably achieving the same goal.
The `_with` suffix is based on functions like [`get_or_insert_with`](https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.get_or_insert_with). They're basically filling the same role as C++ `emplace_` methods.
# Unresolved questions
[unresolved-questions]: #Unresolved-questions
Commence the bikeshedding for alternative names and designs for the new functions:
* `write_return_with` could be named `get_return_slot`, and `read_return_with` could be named `put_return_slot`.
* Or we could use the existing names, but with "slot" in them, like `write_return_slot_with`.
* I personally oppose having "slot" in the function names, because for small values that get returned in registers, there isn't actually a return slot anywhere.
* `read_return_with` seems like a particularly bad name, since it writes to the supplied pointer.
* What happens if a `read_return_with` user simply doesn't call `finish()`? It would be as if it had panicked, but the destructors would be called with `is_panicking` set to false.
* This is the best way to support non-panicking fallible allocation, so it should probably be allowed. Existing functions cannot experience this problem: since they return sized types, which have a no-op `start()` and do all of their work in `finish()`, they won't experience breakage. Additionally, this sort of happy nonsense is already possible in `async` functions and generators, so most data types will already be able to handle this. Just don't return an unsized type if you don't want to act like a stackless coroutine.
* Right now, the API exclusively uses raw pointers, for syntactic simplicity. Maybe it should use `MaybeUninit`?
* NRVO is not impossible. In fact, since the implementation of RVO is almost certainly going to be based on MIR or LLVM IR, it's almost impossible *not* to provide NRVO in cases where the "named" variant desugars to exactly the same MIR code as the "unnamed" variant that this proposal guarantees RVO for. How much do we want to guarantee, vs just "accidentally providing it" because of the way the compiler is implemented?
# Future possibilities
[future-possibilities]: #Future-possibilities
## Additional lints
[lints]: #Additional-lints
In an attempt to directly address the problem of "it's too implicit", an HIR lint might be used to detect functions that are copying large amounts of data around. Deciding the cutoffs for this thing sounds like a mess, and it should probably be off by default for just that reason. Worse yet, we're stuck deciding when to warn on a struct literal where the struct itself gets returned in-place, *but none of its components do*. Honestly, the more I think about it, the more it seems like the "big copy" lint is just a gigantic quagmire of edge cases.
Additionally, a few much-simpler lints can push users in the direction of getting GCE. For example, since break-with-value isn't GCE'ed but return short-circuiting is, a lint should recommend returning from top-level loops instead of breaking from them. Similarly, a lint should recommend assigning struct members inline instead of going through local variables.
## Safe abstractions on top of `write_return_with`
While this RFC specifies a bunch of cases where existing containers should incorporate `read_unsized_return_with` to allow in-place construction of container members, it doesn't specify any built-in functions that should use `write_(unsized_)return_with` exclusively.
Functions which return potentially-large data structures that they construct will probably wind up using it. For example, `std::mem::zeroed` can be implemented like this:
```rust
unsafe fn zeroed<T>() -> T {
    write_return_with(|slot: *mut u8| {
        for i in 0..size_of::<T>() {
            *slot.add(i) = 0;
        }
    })
}
```
Once type-level integers are provided, a function for constructing arrays in-place would also be possible:
```rust
fn fill_array<T, F: Fn() -> T, const N: usize>(f: F) -> [T; N] {
    unsafe {
        write_return_with(|slot: *mut u8| {
            let start = slot as *mut T;
            let mut filled = Filled(start, start);
            for _ in 0..N {
                read_return_with(&f, filled.1 as *mut u8);
                filled.1 = filled.1.offset(1);
            }
            forget(filled);
        })
    }
}
// This struct is used to take care of freeing the already-created items
// in case `f` panics in the middle of filling the array.
struct Filled<T>(*mut T, *mut T);
impl<T> Drop for Filled<T> {
    fn drop(&mut self) {
        while self.0 < self.1 {
            drop_in_place(self.0);
            self.0 = self.0.offset(1);
        }
    }
}
```
## Integration with futures, streams, serde, and other I/O stuff
It's an annoying fact that this RFC does not compose terribly well with `Result`, `Option`, or other pattern-matching constructs, specifically because it's not possible to write the value and the discriminant to separate parts of memory (ideally, the error flags would be written to a stack variable or register somewhere, while the data would be written directly to the heap- or kernel-backed buffer somewhere). `Future` and `Stream`, in particular, wrap the result of polling in the `Poll` enum, so while it's certainly possible to allocate such a result into a `Box<Poll<T>>` without copying, it's not possible to get a `Box<T>` without copying given the existing futures API.
Mapping doesn't work, either, because it will pass the payload of a future as a parameter, which means that futures have to allocate storage for their contents. Sized types work as usual, and `Future<[u8]>` would wind up allocating stack space for the byte blob in order to pass it to the mapping function as a move-ref, as described in [RFC 1909 "unsized rvalues"](https://github.com/rust-lang/rfcs/blob/master/text/1909-unsized-rvalues.md).
The only real option for handling zero-copy data, in any of these cases, is to build these things with errors treated as side-channels. None of this is hard, but it is annoying and not very dataflow-y. It's largely inherent to the goal of having zero-copy I/O while the type system doesn't support keeping pattern-matching semantics orthogonal to memory representation.
```rust
trait Read {
    /// Read into a raw slice. This allows the reader to determine the proper size to read,
    /// unlike the other `read` function, which allows the caller to do so.
    ///
    /// # Example
    ///
    /// fn read_to_end<R: Read>(reader: &mut R, v: &mut Vec<u8>) -> Result<(), Error> {
    ///     let mut prev_len = v.len();
    ///     let mut err = None;
    ///     v.extend_from_raw_slice_with(|| reader.read_to_raw_slice(&mut err));
    ///     while err.is_none() && v.len() != prev_len {
    ///         prev_len = v.len();
    ///         v.extend_from_raw_slice_with(|| reader.read_to_raw_slice(&mut err));
    ///     }
    ///     if let Some(err) = err { Err(err) } else { Ok(()) }
    /// }
    fn read_to_raw_slice(&mut self, err: &mut Option<Error>) -> [u8] {
        unsafe {
            mem::write_unsized_return_with(
                // This function should be overloaded with better default buffer sizes.
                // For example, BufReader can just use the size of its own buffer,
                // and File can use fstat.
                Layout::from_size_align_unchecked(super::DEFAULT_BUF_SIZE, 1),
                |slot: *mut u8| {
                    let slice = slice::from_raw_parts_mut(slot, super::DEFAULT_BUF_SIZE);
                    match self.read(slice) {
                        Ok(count) => {
                            assert!(count <= super::DEFAULT_BUF_SIZE);
                            slice[..count]
                        }
                        Err(e) => {
                            *err = Some(e);
                            []
                        }
                    }
                },
            )
        }
    }
}
```
The default implementation is kind of dumb, but a properly specialized implementation would be able to know how big its buffer should be instead of requiring the caller to guess (or use something like fstat to communicate to the caller manually), and it would allow you to read directly into buffers that cannot give you an `&mut [u8]`, like the internal buffers maintained by Serde or Tokio. Additionally, because the Read implementation can request a buffer, scribble all over it, and then return an error anyhow, doing it this way is much better than returning a Result anyway.
Presumably, the asynchronous systems could yield non-blocking Read implementations as streams or futures,
but, based on [previous discussion of what a good I/O abstraction would be](https://users.rust-lang.org/t/towards-a-more-perfect-rustio/18570), we probably want to avoid using a get-a-byte abstraction for everything.
```rust
/// A non-blocking, non-copying, unbounded iterator of values.
/// This trait should be used instead of Iterator for I/O backed streams,
/// since it can report error values separately from result values,
/// and it can be used with async functions and futures-based reactors.
trait Stream {
    type Item;
    type Error;
    /// Fetch the next batch of values.
    ///
    /// This function has six possible results:
    /// * an empty slice with `Async::Pending` means the async operation is in progress
    /// * an empty slice with `Async::Ready(None)` means the stream has terminated and `poll_next` should never be called again
    /// * an empty slice with `Async::Ready(Some(_))` means there was a fatal error
    /// * a non-empty slice with `Async::Pending` is a violation of the API contract and should lead to a panic
    /// * a non-empty slice with `Async::Ready(None)` means that the underlying async operation is done, and the caller should call `poll_next` again immediately to get the rest of the data
    /// * a non-empty slice with `Async::Ready(Some(_))` means that a fatal error occurred and the results should be ignored and dropped without further processing
    fn poll_next(&mut self, cx: &mut Context, result: &mut Async<Option<Error>>) -> [Item];
}
```
The requirements here are the same as Read:
* You need to be able to hand more than one value into some recipient, at once. Byte-at-a-time reading is not acceptable.
* You need to be able to implement the API using direct I/O, writing the result directly to wherever the consumer wants it written. Returning a `Vec` is not acceptable, though it seems easy enough to write an adapter that will convert to one. The goal here is to request a page and DMA directly into it (in this case, the implementation will probably have to use weird special-cases to fall back to software I/O if the slice isn't aligned).
* We want the same API to be usable for types other than `u8`.
* We need to be able to request our output buffer, scribble all over it, and then yield an error.
* It needs to allow infinite streams, so just returning a single raw slice and calling it done won't be enough.
That last point isn't so bad if you're just populating a Vec or other data structure that allows you to just append forever,
but many abstractions, like Serde, would actually have to directly support Streams themselves.
```rust
trait Serializer {
    /// This function uses an intermediate buffer by default,
    /// but third-party implementations can use a zero-copy implementation.
    /// The produced format is identical to using `serialize_bytes`.
    async fn serialize_byte_stream<S: Stream<Item = u8>>(self, stream: S) -> Result<Self::Ok, Self::Error> {
        let mut v = Vec::new();
        let mut e = Async::Ready(None);
        let mut prev_len = 1;
        while prev_len != v.len() {
            prev_len = v.len();
            // Completely hand-waving both the right way to await a stream
            // (basically asking it for a `Future<Output = ()>` and awaiting
            // that) and the `Context` argument that `poll_next` needs.
            await stream.unit();
            v.extend_from_raw_slice_with(|| stream.poll_next(&mut e));
        }
        if let Async::Ready(Some(e)) = e { return Err(e.into()); }
        self.serialize_bytes(&v[..])
    }
}
```