# Opsem for `async` blocks
#### Dependencies
Besides all of the "standard" parts of our current working semantics, this proposal depends on two non-standard parts:
1. `AliasCell<T>`. This type has been proposed a number of times; the intent is that `AliasCell<T>` is to `&mut` as `UnsafeCell<T>` is to `&`. (I have not thought about whether there are more subtleties to this, so if there are, that's probably a bug in what I'm saying here.)
2. [Overlapping allocations]. See [UCG#328] for context. We will actually need to adjust that proposal a bit; more on this below.
[Overlapping allocations]: https://hackmd.io/@2S4Crel_Q9OwC_vamlwXmw/HkQaMnB49
[UCG#328]: https://github.com/rust-lang/unsafe-code-guidelines/issues/328
### StorageLive modifications
One key thing that we need to justify is that locals in generators have addresses that overlap with the memory of their enclosing generator value. We achieve this by adding a field `generator_info: Option<(AllocId, Range<Address>)>` to stack frames. This value is set to `Some` for all stack frames that are associated with generator bodies. When performing the allocation for a `StorageLive`, we search the entire call stack for frames in which this field is `Some`. We then explicitly permit the address chosen for the new allocation to overlap each recorded allocation within its recorded address range, even if both allocations are forced.
*We previously proved that the scheme laid out in UCG#328 was implementable wrt ptr2int roundtrips by arguing that forced allocations never overlap. This obviously breaks that argument. However, implementability of this scheme follows from the generator lowering pass being correct, and so there should be no "real" problems here.*
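As a rough illustration of the relaxed overlap rule, here is a toy placement check. The names `Allocation`, `Frame`, and `placement_allowed` are hypothetical, addresses are plain `u64`s, and reading "within the recorded range" as "the candidate must lie entirely inside it" is an assumption of this sketch, not something the proposal pins down.

```rust
use std::ops::Range;

/// Hypothetical stand-in for an allocation identifier.
type AllocId = u32;

/// Hypothetical model of a live allocation with a concrete (forced) address range.
struct Allocation {
    id: AllocId,
    range: Range<u64>,
}

/// Toy stack frame carrying only the field described above.
struct Frame {
    generator_info: Option<(AllocId, Range<u64>)>,
}

fn overlaps(a: &Range<u64>, b: &Range<u64>) -> bool {
    a.start < b.end && b.start < a.end
}

fn contains(outer: &Range<u64>, inner: &Range<u64>) -> bool {
    outer.start <= inner.start && inner.end <= outer.end
}

/// Returns true if `candidate` is an acceptable address range for a new local.
/// Assumption: an overlap is excused only if some frame on the stack records
/// the overlapped allocation and `candidate` lies inside the recorded range.
fn placement_allowed(candidate: &Range<u64>, live: &[Allocation], stack: &[Frame]) -> bool {
    live.iter().all(|alloc| {
        if !overlaps(candidate, &alloc.range) {
            return true;
        }
        stack.iter().any(|frame| match &frame.generator_info {
            Some((id, range)) => *id == alloc.id && contains(range, candidate),
            None => false,
        })
    })
}
```

With a generator allocation recorded on the stack, a candidate inside its range is accepted, while overlapping any unrelated allocation is still rejected.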
### Generator Types
Generator types are always of the form `AliasCell<GenType>` where `GenType` is freshly generated for each generator and of the form:
```rust=
#[repr(Rust)]
enum GenType {
    Unstarted,
    Finished,
}
```
### Generator Table
The key idea of this proposal is to store all info about generator state in a "generator table" (GT). The GT is a `Map<(AllocId, Address), GeneratorSuspendState>`, where:
```rust=
struct GeneratorSuspendState {
    frame: StackFrame,
    resume_block: BbName,
    resume_place: Place,
    unwind_block: BbName,
}
```
There is exactly one entry in the GT for each currently suspended generator. The gist of the idea is that all execution basically proceeds as expected, and when hitting a yield point, instead of returning completely, we stash the current stack frame in the generator table so that we can resume execution later.
Concretely, this means that `resume` looks like this (in MiniRust):
```rust=
impl Machine {
    // Interface depends on how we decide to set up shims. For now, we assume
    // that we call this shim instead of pushing a stack frame.
    fn builtin_generator_resume_shim(
        &mut self,
        // Magic type that happens to contain all the information we need
        generator_info: GeneratorInfo,
        self_pointer: Pointer,
        argument: Value,
    ) -> Result<()> {
        // First, retag the `Pin<&mut Self>` pointer like for any call
        let self_pointer = self.mem.retag(self_pointer);
        let address = self_pointer.address;
        // Next, check if there is already an in-progress execution
        let key = (self_pointer.alloc_id, address);
        if let Some(mut suspend_state) = self.generator_table.remove(key) {
            // Resuming with the wrong type is UB
            if suspend_state.frame.func != generator_info.body {
                throw_ub!();
            }
            self.mem.typed_store(
                suspend_state.resume_place,
                argument,
                generator_info.resume_ty,
            );
            suspend_state.frame.next = (suspend_state.resume_block, 0);
            // FIXME: Fails to correctly set the return place
            self.push_stack_frame(suspend_state.frame);
            return Ok(());
        }
        let start_new = evaluate!(
            let self_pointer = self_pointer.get().cast::<GenType>();
            match *self_pointer {
                GenType::Unstarted => true,
                GenType::Finished => false,
            }
        );
        if start_new {
            // Won't re-write everything here, none of this is interesting
            let mut frame = self.make_frame_for_call(generator_info.body, ...);
            frame.generator_key = Some(key);
            // The only thing that is different from a normal function call
            // is that we must retag the self pointer with a protector
            // belonging to the generator frame.
            self.mem.retag(self_pointer, protector = frame.call_id);
            self.push_stack_frame(frame);
        } else {
            // Resuming a finished generator panics
            self.panic_shim();
        }
        Ok(())
    }
}
```
The `drop` impl for `GenType` is very similar but simpler because there are fewer cases we have to care about:
```rust=
impl Machine {
    // Interface depends on how we decide to set up shims. For now, we assume
    // that we call this shim instead of pushing a stack frame.
    fn builtin_generator_drop_shim(
        &mut self,
        // Magic type that happens to contain all the information we need
        generator_info: GeneratorInfo,
        self_pointer: Pointer,
        argument: Value,
    ) -> Result<()> {
        // First, retag the `Pin<&mut Self>` pointer like for any call
        let self_pointer = self.mem.retag(self_pointer);
        // Next, check if there is a suspended execution
        let key = (self_pointer.alloc_id, self_pointer.address);
        if let Some(mut suspend_state) = self.generator_table.remove(key) {
            // Dropping with the wrong type is UB
            if suspend_state.frame.func != generator_info.body {
                throw_ub!();
            }
            // Continue the suspended frame at its unwind block
            suspend_state.frame.next = (suspend_state.unwind_block, 0);
            self.push_stack_frame(suspend_state.frame);
            return Ok(());
        }
        // Drops are a nop before execution has started or after it has ended
        Ok(())
    }
}
```
### Yield and Return
At MIR building time, MIR inside generator bodies is modified so that `return foo;` actually does `return GeneratorState::Complete(foo);` and `yield foo;` does `yield GeneratorState::Yielded(foo);`.
The implementation of yield is then extremely simple: copy the yielded value similarly to what happens when returning, then pop the top frame off the stack and insert it into the generator table.
FIXME: The return place vs yield place thing kind of makes a mess here. Might have to jump through some hoops as long as we don't have return by value.
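The yield/resume bookkeeping can be modeled in a few lines of ordinary Rust. This is only a sketch under simplifying assumptions: `Machine`, `StackFrame`, `yield_at`, and `resume` here are toy stand-ins (not the MiniRust definitions), the resume argument, return place, and unwind block are omitted, and the table is a plain `HashMap`.

```rust
use std::collections::HashMap;

// Hypothetical simplified types; the real ones carry much more information.
type AllocId = u32;
type Address = u64;
type BbName = u32;

struct StackFrame {
    func: &'static str,
    /// Next statement to execute: (basic block, statement index)
    next: (BbName, usize),
}

struct GeneratorSuspendState {
    frame: StackFrame,
    resume_block: BbName,
}

#[derive(Default)]
struct Machine {
    stack: Vec<StackFrame>,
    generator_table: HashMap<(AllocId, Address), GeneratorSuspendState>,
}

impl Machine {
    /// Yield: pop the running generator frame and stash it under `key`.
    fn yield_at(&mut self, key: (AllocId, Address), resume_block: BbName) {
        let frame = self.stack.pop().expect("yield with an empty stack");
        let previous = self
            .generator_table
            .insert(key, GeneratorSuspendState { frame, resume_block });
        // There is exactly one entry per suspended generator.
        assert!(previous.is_none(), "generator was already suspended");
    }

    /// Resume: move the stashed frame back onto the stack.
    /// Returns false if no suspended frame exists for `key`.
    fn resume(&mut self, key: (AllocId, Address)) -> bool {
        match self.generator_table.remove(&key) {
            Some(mut state) => {
                state.frame.next = (state.resume_block, 0);
                self.stack.push(state.frame);
                true
            }
            None => false,
        }
    }
}
```

The point of the model is that a suspended frame lives in exactly one place: either on the stack or in the table, never both.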
## Correctness
These are just outlines, but should get the point across.
### Optimizations
The only change to the way that functions normally execute is the change to StorageLive (and having more overlapping allocations). With this exception, the proofs of correctness for optimizations before this proposal are expected to continue to go through after this proposal.
### Generator lowering
We show correctness by doing the lowering in four phases:
#### Storing yield point in memory
Number the yield points 1..Y. We introduce a new local to the function that is always set to N immediately before yield point N.
#### Using generator memory as an allocator
When we created the stack frame for the generator, its memory was retagged with Unique permissions and we included a protector. We now store the resulting pointer (in the stack frame or something), and replace all locals with places that are derived from that pointer. We assume that the generator was large enough so that it is possible to find offsets for all locals such that simultaneously storage-live locals get non-overlapping memory.
As a part of this change, we also stop including the generator info in the generator's stack frame.
This change is correct wrt code observing addresses because we explicitly allowed ahead of time for these allocations to be within the given range. They are still non-overlapping wrt each other. They are also non-overlapping wrt other allocations - this is true by induction hypothesis (or something) when the generator frame is created, and because we now no longer insert the generator info into the frame, it is not possible for any other allocation to overlap with the ones that are local to this body.
This change is also hopefully correct wrt SB.
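To make the "find offsets for all locals" assumption concrete, here is a minimal first-fit sketch. It is not the layout algorithm rustc uses; `Local`, `assign_offsets`, and the modeling of storage liveness as a single half-open range of program points are all assumptions made for illustration, and alignment is ignored.

```rust
use std::ops::Range;

/// Hypothetical description of a local: its size in bytes and the half-open
/// range of program points during which it is storage-live.
struct Local {
    size: u64,
    live: Range<u32>,
}

fn ranges_overlap<T: PartialOrd>(a: &Range<T>, b: &Range<T>) -> bool {
    a.start < b.end && b.start < a.end
}

/// First-fit assignment of offsets inside the generator's memory.
/// Locals that are live at the same time get disjoint byte ranges; locals
/// that are never live together may share memory.
/// Returns `None` if the layout does not fit into `generator_size` bytes.
fn assign_offsets(locals: &[Local], generator_size: u64) -> Option<Vec<u64>> {
    let mut offsets: Vec<u64> = Vec::with_capacity(locals.len());
    for (i, local) in locals.iter().enumerate() {
        // Byte ranges held by earlier locals that are live at the same time.
        let mut taken: Vec<Range<u64>> = (0..i)
            .filter(|&j| ranges_overlap(&locals[j].live, &local.live))
            .map(|j| {
                let start = offsets[j];
                start..start + locals[j].size
            })
            .collect();
        taken.sort_by_key(|r| r.start);
        // Slide the candidate past each conflicting range in ascending order.
        let mut offset = 0;
        for r in &taken {
            if ranges_overlap(&(offset..offset + local.size), r) {
                offset = r.end;
            }
        }
        if offset + local.size > generator_size {
            return None;
        }
        offsets.push(offset);
    }
    Some(offsets)
}
```

For example, two locals that are each live together with a third local, but never with each other, end up sharing the same offset after it.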
#### Removing the protected retag
We now replace all uses of the pointer that is created by the protected retag with the pointer that is passed in to the `poll` or `drop` call.
I don't think this is, strictly speaking, actually sound. Specifically, I have concerns about all the pointers not being in a contiguous SRW block; however, I think the bug here is that we have insufficiently strong requirements for the pointers used before this transformation.
#### Match + Inlining
The remainder of the generator transform now just replaces what is left of the shim with a `match` on the local we introduced in the first step and inlines the relevant section of the generator body into each arm of that match. (I'm blindly assuming that correctness would trivially follow if we rigorously wrote down what this transform actually was)
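For intuition, this is roughly what the end result of the four phases looks like when written by hand for a tiny body (`let x = 10; yield x; yield x + 1; return "done";`). It is a sketch, not compiler output: `Lowered` and its fields are made-up names, `GeneratorState` is redefined locally so the example compiles on stable, and `resume` takes `&mut self` rather than `Pin<&mut Self>` and no resume argument.

```rust
/// Local mirror of the standard library's (unstable) enum.
#[derive(Debug, PartialEq)]
enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

const UNSTARTED: u32 = 0;
const FINISHED: u32 = u32::MAX;

/// Hand-lowered generator: all state lives in the generator's own memory.
struct Lowered {
    /// The local from the first phase: 0 = unstarted, N = suspended at
    /// yield point N, `FINISHED` = returned.
    state: u32,
    /// The former stack local `x`, now placed inside the generator.
    x: i32,
}

impl Lowered {
    fn new() -> Self {
        Lowered { state: UNSTARTED, x: 0 }
    }

    fn resume(&mut self) -> GeneratorState<i32, &'static str> {
        // One arm per entry point, with the matching slice of the body inlined.
        match self.state {
            UNSTARTED => {
                self.x = 10;
                self.state = 1; // about to suspend at yield point 1
                GeneratorState::Yielded(self.x)
            }
            1 => {
                self.state = 2; // about to suspend at yield point 2
                GeneratorState::Yielded(self.x + 1)
            }
            2 => {
                self.state = FINISHED;
                GeneratorState::Complete("done")
            }
            _ => panic!("generator resumed after completion"),
        }
    }
}
```

Resuming three times produces the two yielded values followed by the return value; a fourth resume panics, mirroring the `Finished` case of the shim.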