Try   HackMD

WebAssembly Heap Reference-to-Reference Semantics in Rust

WebAssembly Memory Recap

When talking about "memory", we usually think of a continuous segment of bytes that can be addressed in some fashion. This is called "linear memory" in WebAssembly. The Rust heap and the Rust stack live in linear memory. Rust references are indices into linear memory[1].

WebAssembly programs often need to interact with objects that cannot be represented in linear memory. These objects live on the WebAssembly heap, and the only way to interact with them is via references to the WebAssembly heap.

Terminology

Assume that H is some type that can only exist on the WebAssembly heap, that HeapRef<H> is a (non-null) reference to the WebAssembly heap, and that heap_ref is a variable of type HeapRef<H>.

Unless explicitly noted otherwise, everything below that talks about Rust references applies equally to raw pointers.

Properties of HeapRef<H>

HeapRef<H>:

  • has a representation that is entirely opaque to the compiler.
  • can be used as the type of a local variable.
  • can be used as a function argument type or return type.
  • does not have a known size or alignment.
  • does not have a bit pattern that could be read or written.
  • cannot be stored in linear memory.
  • cannot be part of an aggregate data type.
  • cannot have its address taken.
    • Thus a "normal" reference to HeapRef<H> cannot exist.

What about &HeapRef<H>?

So if HeapRef<H> cannot have its address taken, how should Rust handle &HeapRef<H> and &heap_ref?

Option 1: Uninhabited References

&HeapRef<H> is considered uninhabited and &heap_ref is a compile-time error (possibly post-monomorphization). (Put differently: The type &HeapRef<H> is allowed to exists, but attempting to create a value of that type is prohibited).

Functions with a param: &HeapRef<H> parameter are allowed and match param {} works.

Positives:

  • The most explicit and least magic option.

Negatives:

  • HeapRef<H> is mostly useless if it is not Copy.
    • If HeapRef<H> implements Copy, it also needs to implement Clone, which was the main motivation for treating &HeapRef<H> as uninhabited instead of forbidding the type completely.
  • HeapRef<H> cannot be used with the majority of stdlib traits, because most take &self as an argument.

Option 1a: Support &H

Note: This extension is possible for all options, but is most relevant for Option 1.

Instead of special-casing HeapRef<H>, special-case &H (and &mut H, *const H, ). That is, for any WebAssembly heap type H (potentially represented by an "extern type"), treat &H as a reference to the WebAssembly heap, with the semantics and properties described for HeapRef<H> above.

HeapRef<H> would then be essentially equivalent to NonNull<H>.

This allows implementing a lot of stdlib traits for H, but not for &H.

Option 2: Fake References

&HeapRef<H> is allowed and is represented the same as HeapRef<H>, i.e. they represent the same address. (In Rust terms this would basically mean (&x as *const _).addr() == (&&x as *const _).addr()). &heap_ref would be a no-op at the assembly level.

Positives:

  • "zero cost".
  • A lot of stdlib traits become usable.

Negatives:

  • Does not support trait objects (and thus no format_args!).
  • Writing to dereferences would not be possible (i.e. with m: &mut HeapRef<T>, *m = heap_ref would be a compile-time error, possibly post-monomorphization).
  • Non-trivial nullability semantics:
    • &&HeapRef<H> is allowed, *const *const HeapRef<H> is not.
    • With fake references, we only have a single bit of nullability information across all levels of indirection. So if we have a possibly-null *const HeapRef<T>, then we cannot add another layer of possibly-null around that. (We would not be able to differentiate which of the two *const is null).
    • This is mostly relevant for code dealing with raw pointers.
    • We already have a similar situation with Option 1: Option<HeapRef<H>> is fine (and considered a nullable reference to the WebAssembly heap). Option<Option<HeapRef<H>>> is not allowed.
      • Though this would also get more complicated with Option 2: Option<&HeapRef<H>> is fine, Option<&Option<HeapRef<H>>> is not.

Option 3: Indirect References

WebAssembly has support for "tables", which can be though of as global Vec<HeapRef<H>> values. They can be used to assign an index to a HeapRef<H>. The index can be stored in linear memory and can also be used to retrieve the original HeapRef<H> again.

&HeapRef<H> is represented as an index into a table managed and owned by the Rust compiler, and when encountering &heap_ref the compiler would write heap_ref to the table and then use the corresponding index.

Summarizing the representations:

  • HeapRef<H>: opaque value, can be stored only in local variables (and tables).
  • &HeapRef<H>: Index into a Rust-owned table.
  • &&HeapRef<H>: Index into linear memory

Positives:

  • Most general solution.
  • Trait objects are supported.
  • This is somewhat similar to how function pointers are currently treated (though in that case LLVM owns the table).

Negatives:

  • Not "zero cost".
    • There is some precedence in Rust for making indirection explicit: Implicit trait objects were deprecated in favor of the dyn syntax.
  • Requires a non-trivial runtime library to manage the table.
  • Non-trivial codegen.

Option 4:

TODO: Expand this section.

Represent HeapRef<H> as an actual WebAssembly heap reference in function signatures. In all other cases, represent HeapRef<H> as a table index. Implement optimization passes (MIR, LLVM, or binaryen) to avoid unnecessary table accesses.


  1. Function pointers are not. They are also indices, but not into linear memory. ↩︎