# Proposal for Component Pre-initialization
## TL;DR
I'm proposing a pre-initialization method for WebAssembly [components](https://github.com/WebAssembly/component-model) that operates on an input component of a certain shape (single memory; no subcomponents). It temporarily adds exported functions that expose the memory and global variables, so that a snapshot can be captured using any component-capable Wasm runtime.
## Background
One of the major challenges we've faced in designing and implementing [shared-everything linking](https://hackmd.io/IlY4lICRRNy9wQbNLdb2Wg) for the component model is that, as of this writing, there are no tools supporting [Wizer](https://github.com/bytecodealliance/wizer)-style pre-initialization for components. We've discussed various options, including adding such support to either Wizer or Wasmtime, but none of them are particularly attractive. Adding it to Wizer would require adding new APIs to Wasmtime for walking a component instance hierarchy and extracting internal state, and it's not clear that the feature belongs in Wasmtime itself.
## Proposal
I'm proposing that we create a new tool, initially focused on the specific shape of component produced by a shared-everything linking operation (single memory; single table; each module instantiated exactly once; no subcomponents), which does the following:
- Validates the input component, verifying that it performs no runtime table operations and does not use reference types (i.e. the same restrictions Wizer currently enforces, except that modules inside the component _may_ import a memory, table, and globals from the "main" module).
- Adds exports to each module for any mutable globals not already exported by that module.
- Adds a synthesized module which imports the memory and all mutable globals from all modules and exports functions which may be lifted to provide the following component-level exports:
    - One function for each mutable global in each module, of the form `get-{module}-{name}: func() -> {type}`, where `{module}` identifies the module, `{name}` identifies the global, and `{type}` is the lifted type of the global.
    - A `get-memory: func() -> list<u8>` which returns the entire content of the memory.
- Instantiates the resulting component using any component-capable runtime (Wasmtime or otherwise) and invokes the pre-init function specified by the user.
- Calls the above synthetic functions to create a snapshot of the memory and globals.
- Edits the component again, this time removing the synthesized module and any global exports added earlier. This step also removes all data segments and `start` functions from all modules, replacing them with new data segments composed of the non-zero parts of the memory snapshot captured above. Finally, it updates the initializers for each mutable global to match the snapshot values.
- Outputs the resulting component.
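
To make the "non-zero parts of the memory snapshot" step more concrete, here is a minimal sketch of how a snapshot might be split into data segments. It is not taken from the draft implementation: the `Segment` type, the `nonzero_segments` and `apply_segments` functions, and the `min_gap` parameter are all hypothetical names introduced for illustration. The sketch assumes that short runs of zeros are cheaper to keep inside a segment than to pay for a separate segment, which is a design guess rather than something this proposal specifies.

```rust
/// Hypothetical type: one contiguous run of snapshot bytes, to be emitted
/// as a single active data segment at `offset`.
#[derive(Debug, PartialEq, Eq)]
pub struct Segment {
    /// Byte offset into linear memory where the segment is written.
    pub offset: usize,
    /// Bytes to place at `offset`; first and last byte are always non-zero.
    pub bytes: Vec<u8>,
}

/// Splits a memory snapshot into segments covering all non-zero bytes.
///
/// A run of at least `min_gap` zero bytes ends the current segment; shorter
/// zero runs are kept inside it (assumption: per-segment overhead makes
/// merging worthwhile). A `min_gap` of 0 is treated as 1.
pub fn nonzero_segments(memory: &[u8], min_gap: usize) -> Vec<Segment> {
    let min_gap = min_gap.max(1);
    let mut segments = Vec::new();
    let mut i = 0;
    while i < memory.len() {
        if memory[i] == 0 {
            i += 1;
            continue;
        }
        let start = i;
        // `end` is the exclusive index just past the last non-zero byte seen.
        let mut end = i + 1;
        let mut j = end;
        while j < memory.len() {
            if memory[j] != 0 {
                end = j + 1;
            } else if j + 1 - end >= min_gap {
                // The current zero run is long enough to split on.
                break;
            }
            j += 1;
        }
        segments.push(Segment {
            offset: start,
            bytes: memory[start..end].to_vec(),
        });
        i = j;
    }
    segments
}

/// Replays segments onto zero-filled memory of length `len`, mimicking what
/// instantiation does with active data segments. Useful as a round-trip check.
pub fn apply_segments(segments: &[Segment], len: usize) -> Vec<u8> {
    let mut memory = vec![0u8; len];
    for segment in segments {
        let end = segment.offset + segment.bytes.len();
        memory[segment.offset..end].copy_from_slice(&segment.bytes);
    }
    memory
}
```

For example, `nonzero_segments(&[0, 1, 2, 0, 0, 3, 0, 0, 0, 0, 4], 3)` yields two segments, one at offset 1 holding `[1, 2, 0, 0, 3]` and one at offset 10 holding `[4]`. Replaying them with `apply_segments` onto zeroed memory of the same length reproduces the snapshot, which is the property the rewritten component relies on.
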
Note that, while this proposal only considers a certain shape of component, I expect this technique could be generalized to support arbitrary component graphs. One complication in the general case is that a given module may be instantiated more than once. In that case it may be necessary to edit that module so that it exports its memory and all its mutable globals, and these would be initialized by separate, synthesized modules, one per instantiation.
Also note that, since there's nothing runtime-specific about this approach, we could theoretically use a Wasm interpreter which itself compiles to Wasm and package the whole tool as a Wasm module or component.
## Implementation
The `componentize-py` repository contains a first-draft [implementation](https://github.com/dicej/componentize-py/blob/98a7c3bbc60fe8cafcf39174df7b599e91b66202/component-init/src/lib.rs) which I hope will form the basis for a more polished, general-purpose library.