# WebAssembly Heap Reference-to-Reference Semantics in Rust
## WebAssembly Memory Recap
When talking about "memory", we usually think of a continuous segment of bytes that can be addressed in some fashion. This is called "linear memory" in WebAssembly. The Rust heap and the Rust stack live in linear memory. Rust references are indices into linear memory[^1].
[^1]: Function pointers are not. They are also indices, but not into linear memory.
WebAssembly programs often need to interact with objects that cannot be represented in linear memory. These objects live on the WebAssembly heap, and the only way to interact with them is via references to the WebAssembly heap.
## Terminology
Assume that `H` is some type that can only exist on the WebAssembly heap, that `HeapRef<H>` is a (non-null) reference to the WebAssembly heap, and that `heap_ref` is a variable of type `HeapRef<H>`.
Unless explicitly noted otherwise, everything below that talks about Rust references applies equally to raw pointers.
## Properties of `HeapRef<H>`
`HeapRef<H>`:
* has a representation that is entirely _**opaque**_ to the compiler.
* _**can**_ be used as the type of a local variable.
* _**can**_ be used as a function argument type or return type.
* _**does not**_ have a known size or alignment.
* _**does not**_ have a bit pattern that could be read or written.
* _**cannot**_ be stored in linear memory.
* _**cannot**_ be part of an aggregate data type.
* _**cannot**_ have its address taken.
* Thus a "normal" reference to `HeapRef<H>` cannot exist.
## What about `&HeapRef<H>`?
So if `HeapRef<H>` cannot have its address taken, how should Rust handle `&HeapRef<H>` and `&heap_ref`?
### Option 1: Uninhabited References
`&HeapRef<H>` is considered uninhabited and `&heap_ref` is a compile-time error (possibly post-monomorphization). (Put differently: The type `&HeapRef<H>` is allowed to exists, but attempting to create a value of that type is prohibited).
Functions with a `param: &HeapRef<H>` parameter are allowed and `match param {}` works.
Positives:
* The most explicit and least magic option.
Negatives:
* `HeapRef<H>` is mostly useless if it is not `Copy`.
* If `HeapRef<H>` implements `Copy`, it also needs to implement `Clone`, which was the main motivation for treating `&HeapRef<H>` as uninhabited instead of forbidding the type completely.
* `HeapRef<H>` cannot be used with the majority of stdlib traits, because most take `&self` as an argument.
### Option 1a: Support `&H`
:::info
Note: This extension is possible for all options, but is most relevant for Option 1.
:::
Instead of special-casing `HeapRef<H>`, special-case `&H` (and `&mut H`, `*const H`, ...). That is, for any WebAssembly heap type `H` (potentially represented by an "extern type"), treat `&H` as a reference to the WebAssembly heap, with the semantics and properties described for `HeapRef<H>` above.
`HeapRef<H>` would then be essentially equivalent to `NonNull<H>`.
This allows implementing a lot of stdlib traits for `H`, but not for `&H`.
### Option 2: Fake References
`&HeapRef<H>` is allowed and is represented the same as `HeapRef<H>`, i.e. they represent the same address. (In Rust terms this would basically mean `(&x as *const _).addr() == (&&x as *const _).addr()`). `&heap_ref` would be a no-op at the assembly level.
Positives:
* "zero cost".
* A lot of stdlib traits become usable.
Negatives:
* Does not support trait objects (and thus no `format_args!`).
* Writing to dereferences would not be possible (i.e. with `m: &mut HeapRef<T>`, `*m = heap_ref` would be a compile-time error, possibly post-monomorphization).
* Non-trivial nullability semantics:
* `&&HeapRef<H>` is allowed, `*const *const HeapRef<H>` is not.
* With fake references, we only have a single bit of nullability information across all levels of indirection. So if we have a possibly-null `*const HeapRef<T>`, then we cannot add another layer of possibly-null around that. (We would not be able to differentiate which of the two `*const` is null).
* This is mostly relevant for code dealing with raw pointers.
* We already have a similar situation with Option 1: `Option<HeapRef<H>>` is fine (and considered a nullable reference to the WebAssembly heap). `Option<Option<HeapRef<H>>>` is not allowed.
* Though this would also get more complicated with Option 2: `Option<&HeapRef<H>>` is fine, `Option<&Option<HeapRef<H>>>` is not.
### Option 3: Indirect References
WebAssembly has support for "tables", which can be though of as global `Vec<HeapRef<H>>` values. They can be used to assign an index to a `HeapRef<H>`. The index can be stored in linear memory and can also be used to retrieve the original `HeapRef<H>` again.
`&HeapRef<H>` is represented as an index into a table managed and owned by the Rust compiler, and when encountering `&heap_ref` the compiler would write `heap_ref` to the table and then use the corresponding index.
Summarizing the representations:
* `HeapRef<H>`: opaque value, can be stored only in local variables (and tables).
* `&HeapRef<H>`: Index into a Rust-owned table.
* `&&HeapRef<H>`: Index into linear memory
Positives:
* Most general solution.
* Trait objects are supported.
* This is somewhat similar to how function pointers are currently treated (though in that case LLVM owns the table).
Negatives:
* Not "zero cost".
* There is some precedence in Rust for making indirection explicit: Implicit trait objects were deprecated in favor of the `dyn` syntax.
* Requires a non-trivial runtime library to manage the table.
* Non-trivial codegen.
### Option 4:
:::info
TODO: Expand this section.
:::
Represent `HeapRef<H>` as an actual WebAssembly heap reference in function signatures. In all other cases, represent `HeapRef<H>` as a table index. Implement optimization passes (MIR, LLVM, or binaryen) to avoid unnecessary table accesses.