# Shared-Everything Linking RFC
## Caveat
This is an early draft and will evolve based on feedback and experimentation.
## Summary
This is a proposal to standardize "shared-everything" linking of WebAssembly modules, including a runtime `dlopen`/`dlsym`-style interface to accommodate existing software which uses those APIs. It reuses the existing [`emscripten` convention](https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md) for dynamic linking and anticipates the [Component Model approach](https://github.com/WebAssembly/component-model/blob/main/design/mvp/examples/SharedEverythingDynamicLinking.md) to "shared-everything" linking.
## Motivation
The [Component Model](https://github.com/WebAssembly/component-model) is a proposed standard for composing WebAssembly binaries. It primarily uses a "shared-nothing" model, in which each component has its own private memory, tables, etc., which are not directly accessible to other components. While this "shared-nothing" model is a good default choice for composition, it's not necessarily ideal for the large body of existing shared libraries that were designed and written for a "shared-everything" model. Such libraries may rely on passing pointers to native data structures for performance. Even where performance is not critical, adapting an existing library (or ecosystem of libraries) for use in a "shared-nothing" context is a significant undertaking, especially when those libraries must continue to work in existing "shared-everything" contexts. The vast ecosystem of Python native extensions is a good example of this.
Therefore, we would like to support intra-component, shared-everything linking as a complementary feature to inter-component, shared-nothing linking. Fortunately, the [emscripten](https://emscripten.org/) project has already established a general-purpose [dynamic linking ABI](https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md), and the Component Model provides [the tools](https://github.com/WebAssembly/component-model/blob/main/design/mvp/examples/SharedEverythingDynamicLinking.md) we need to link modules adhering to that ABI (modulo [cycle-breaking](https://github.com/WebAssembly/component-model/blob/main/design/mvp/examples/SharedEverythingDynamicLinking.md#cyclic-dependencies)) without code duplication.
Our vision is to build a tool which accepts modules targeting the `emscripten` dynamic library ABI and produces a component according to the Component Model shared-everything linking proposal. However, such a component could not be pre-initialized using [Wizer](https://github.com/bytecodealliance/wizer) because, as of this writing, Wizer does not yet support components. This is a problem because most high-level languages benefit considerably from pre-initialization, and losing that ability carries an unacceptable performance cost for many applications. This leaves us with a few options:
1. Require pre-initialization to happen _before_ linking, and disallow access to dynamic libraries until after pre-init.
2. Implement dynamic linking as part of pre-initialization in Wizer.
3. Add component support to Wizer (or at least a subset which covers the style of component produced by linking shared library modules).
4. For the time being, make the linker generate a module instead of a component so that Wizer can handle it. This module may be pre-initialized by Wizer as it exists today, and then converted into a component using `wit-component`.
Option (1) would be limiting and confusing to users (e.g. adding an import to your Python app might break pre-init), while option (2) would be a big effort without moving us closer to the end goal of component-level linking. Option (3) would be ideal, but is a very large undertaking. This document proposes (4) for the initial implementation since it's the easiest one to deliver quickly. Option (3) would be the logical next step, and could even proceed in parallel if there are resources available.
The primary drawbacks of option (4) are:
- Code duplication: since the main module and its shared libraries are flattened into a monolithic module, there's no realistic opportunity for the host to reuse those libraries (e.g. to save memory and AOT compilation time).
- Any DWARF debug info will likely be discarded, since transforming it in parallel with the code transformations needed for linking will probably be prohibitively difficult.
We expect both of those drawbacks to be addressed once we transition to component output.
## Proposal
We propose to add a new tool, named `wasm-dyld` (open to bikeshedding), which accepts as input a main module and one or more dynamic library modules conforming to the `emscripten` ABI. In the initial implementation, the output of this tool will also be a module representing the linked composition of the inputs. Once Wizer supports components, `wasm-dyld` will be updated to output a component instead.
We do not expect that the transition to component output will require changes to the format or ABI of the input files. However, each new `wasi-sdk` release may introduce breaking changes at the libc level; see the `ABI and compatibility` section below for further discussion.
Note that, since the component model disallows cyclic dependencies, `wasm-dyld` will need to transform its input modules using the cycle-breaking algorithm described [here](https://github.com/WebAssembly/component-model/blob/main/design/mvp/examples/SharedEverythingDynamicLinking.md#cyclic-dependencies).
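To make the problem concrete, here is a simplified sketch of the general shape of cycle breaking; it is not the algorithm from the linked document. A depth-first traversal picks an instantiation order, and any import that points back at a module still being visited is marked as deferred, meaning it would have to be satisfied indirectly (for example, through a table slot filled in after every module has been instantiated). The function name `plan_instantiation` and its input format (a module name mapped to the modules it imports from) are hypothetical.

```python
def plan_instantiation(deps):
    """Pick an instantiation order for modules with (possibly cyclic) imports.

    `deps` maps each module name to the modules it imports from. Returns
    (order, deferred): `order` lists modules so that every non-deferred
    dependency comes before its importer, and `deferred` lists the
    (importer, exporter) edges that close a cycle and so must be resolved
    indirectly after instantiation.
    """
    order, deferred = [], []
    state = {}  # module name -> "visiting" or "done"

    def visit(name):
        state[name] = "visiting"
        for dep in deps.get(name, []):
            if dep not in state:
                visit(dep)
            elif state[dep] == "visiting":
                # Back edge: `dep` is still on the DFS stack, so it cannot be
                # instantiated before `name`. Defer this import.
                deferred.append((name, dep))
        state[name] = "done"
        order.append(name)

    for name in deps:
        if name not in state:
            visit(name)
    return order, deferred


order, deferred = plan_instantiation(
    {"main": ["libfoo"], "libfoo": ["libbar"], "libbar": ["libfoo"]}
)
print(order)     # ['libbar', 'libfoo', 'main']
print(deferred)  # [('libbar', 'libfoo')]
```

In this example, `libbar` and `libfoo` import from each other, so one of those edges (here `libbar -> libfoo`) must be deferred for any valid order to exist.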
For each function or global which is explicitly imported by the main module and exported by one of the shared libraries (and vice-versa), `wasm-dyld` will link them together, erasing the original imports and exports. For example, if the main module imports a function `foo` from the `libfoo` module, and `libfoo.so.wasm` exports a function `foo` with a matching core Wasm type, then each call to `foo` in the main module will be updated to call the `libfoo` version directly.
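The matching step above can be sketched as a symbol-resolution pass. This is a minimal illustration under assumptions of our own: each library is represented by a name and a map from export name to core Wasm function type, the first library in load order that exports a name wins, and a type mismatch is treated as a link error. `FuncType` and `resolve_imports` are hypothetical names, and the actual rewriting of call instructions is out of scope here.

```python
from typing import Dict, List, NamedTuple, Tuple


class FuncType(NamedTuple):
    """A core Wasm function type, e.g. FuncType(("i32",), ("i32",))."""
    params: Tuple[str, ...]
    results: Tuple[str, ...]


def resolve_imports(imports, libraries):
    """Match each (module, name, type) import against library exports.

    `libraries` is an ordered list of (library_name, exports) pairs, where
    `exports` maps an export name to its FuncType. The first library that
    exports a name wins. Returns (resolved, unresolved): `resolved` maps
    (module, name) -> library_name, and `unresolved` lists imports that no
    library provides (e.g. WASI imports left for the host).
    """
    resolved: Dict[Tuple[str, str], str] = {}
    unresolved: List[Tuple[str, str]] = []
    for module, name, ty in imports:
        for lib_name, exports in libraries:
            if name in exports:
                if exports[name] != ty:
                    raise TypeError(
                        f"{module}.{name}: imported as {ty}, "
                        f"but {lib_name} exports {exports[name]}"
                    )
                resolved[(module, name)] = lib_name
                break
        else:  # no library exports this name
            unresolved.append((module, name))
    return resolved, unresolved
```

A first-match rule mirrors a flat symbol namespace; a stricter variant could instead require the import's module name to match the library name.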
In order to support existing apps which use `dlopen` and `dlsym` to link code at runtime, `wasm-dyld` will detect whether the main module imports those functions from a well-known module (name TBD; `wasm-dyld-v0`?) and, if so, synthesize them based on the shared library import names and the names of their exported functions and globals. Note that we will _not_ support true dynamic linking in the sense of loading code at runtime: the synthesized functions will simply resolve symbols to code and data already present in the module (table indexes for functions, and linear-memory addresses for data symbols, which the `emscripten` ABI exports as globals).
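A minimal model of what the synthesized `dlopen`/`dlsym` would compute, written in Python rather than Wasm for readability. It assumes that function symbols resolve to table indexes and data symbols to linear-memory addresses, that `dlopen` matches on the file name only, and that handle `0` stands in for `NULL`. The `SymbolTables` class is hypothetical; in the real tool, equivalent tables would presumably be baked into the generated module.

```python
import posixpath


class SymbolTables:
    """Link-time tables backing hypothetical synthesized dlopen/dlsym."""

    def __init__(self, libraries):
        # `libraries` maps library file name -> {symbol: (kind, value)}, where
        # kind is "func" (value = table index) or "data" (value = address).
        self.handles = {}
        self.symbols = {}
        # Handles start at 1 so that 0 can play the role of NULL.
        for handle, (lib, syms) in enumerate(libraries.items(), start=1):
            self.handles[lib] = handle
            self.symbols[handle] = syms

    def dlopen(self, path):
        # Nothing is loaded at runtime: only already-linked libraries resolve.
        return self.handles.get(posixpath.basename(path), 0)

    def dlsym(self, handle, name):
        entry = self.symbols.get(handle, {}).get(name)
        return 0 if entry is None else entry[1]
```

For example, with `{"libfoo.so": {"foo": ("func", 7)}}`, `dlopen("/usr/lib/libfoo.so")` returns handle `1` and `dlsym(1, "foo")` returns table index `7`.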
Finally, in order to support apps which search the filesystem for shared libraries, `wasm-dyld` will optionally virtualize filesystem access, creating the illusion that those files exist even though they do not.
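One way the optional filesystem virtualization could work, sketched under the assumption that apps only probe for library files in a fixed set of search directories: paths for linked libraries in those directories are reported as existing, and every other path falls through to the real filesystem. `VirtualLibDir` and its `real_exists` callback are hypothetical; a real implementation would presumably intercept the corresponding WASI filesystem calls and also answer `stat`/`open` requests.

```python
import posixpath


class VirtualLibDir:
    """Hypothetical overlay that makes linked libraries appear on disk."""

    def __init__(self, search_dirs, libraries, real_exists):
        # Every (search dir, library) combination becomes a virtual file.
        self.virtual = {
            posixpath.normpath(posixpath.join(d, lib))
            for d in search_dirs
            for lib in libraries
        }
        self.real_exists = real_exists  # fallback for every other path

    def exists(self, path):
        path = posixpath.normpath(path)
        return path in self.virtual or self.real_exists(path)
```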
### Details
#### ABI and compatibility
As of this writing, there is only one known ABI difference between `wasm32-unknown-emscripten` and `wasm32-wasi` (the alignment of the `long double` type -- 8 and 16 bytes, respectively), and we expect that difference to be eliminated in an upcoming LLVM release. That means there is no need to distinguish between the two in the `dylink.0` metadata.
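To see why the `long double` alignment difference matters, the sketch below computes a C struct layout with the usual padding rules for `struct { char c; long double d; }`. It assumes `sizeof(long double)` is 16 bytes on both targets, so only the alignment (8 vs. 16 bytes) differs; `struct_layout` is a hypothetical helper.

```python
def struct_layout(fields):
    """Return (offsets, size, align) for a C struct of (size, align) fields."""
    offset = 0
    max_align = 1
    offsets = []
    for size, align in fields:
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    size = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, size, max_align


# struct { char c; long double d; }
print(struct_layout([(1, 1), (16, 8)]))   # emscripten: ([0, 8], 24, 8)
print(struct_layout([(1, 1), (16, 16)]))  # wasi:       ([0, 16], 32, 16)
```

Under these assumptions, `d` sits at a different offset in a differently sized struct, so code built for one target would misread such a struct when it is passed in by code built for the other.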
On the other hand, since `wasi-libc` does not yet have the resources needed to ensure backwards compatibility across releases, `WASI` shared libraries will need to indicate which version of `wasi-libc` they require via the `WASM_DYLINK_NEEDED` subsection of the `dylink.0` custom section, e.g. `wasi-libc.so.21`.
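A minimal sketch of how a linker could read the `WASM_DYLINK_NEEDED` entries and extract the required `wasi-libc` version. It assumes the subsection encoding described in the tool-conventions `DynamicLinking.md` (a one-byte subsection type, a LEB128 payload length, and, for `WASM_DYLINK_NEEDED`, a LEB128 count followed by length-prefixed UTF-8 strings), takes as input the `dylink.0` payload that follows the custom section name, and assumes the `wasi-libc.so.N` naming shown above. The function names are hypothetical.

```python
WASM_DYLINK_NEEDED = 2


def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return result, pos


def needed_libraries(payload):
    """Return the WASM_DYLINK_NEEDED entries of a dylink.0 section payload."""
    pos, needed = 0, []
    while pos < len(payload):
        kind = payload[pos]
        length, pos = read_uleb128(payload, pos + 1)
        end = pos + length
        if kind == WASM_DYLINK_NEEDED:
            count, p = read_uleb128(payload, pos)
            for _ in range(count):
                n, p = read_uleb128(payload, p)
                needed.append(payload[p:p + n].decode("utf-8"))
                p += n
        pos = end  # skip to the next subsection (unknown kinds included)
    return needed


def wasi_libc_version(needed):
    """Return N from a `wasi-libc.so.N` entry, or None if there is none."""
    prefix = "wasi-libc.so."
    for name in needed:
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdigit():
            return int(suffix)
    return None
```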
The main module input to `wasm-dyld` may export a memory and a table, and is _not_ required to use position-independent code (TODO: need to verify this experimentally; alternative is to add support in `wasi-sdk` for building libc with `-fPIC`). It _is_, however, required to use only memory it has allocated itself, either via its original minimum memory size or via `memory.grow` (and likewise for table allocations). This is because `wasm-dyld` will extend the memory and tables for use by the shared libraries, and the main module must not try to use those allocations.
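The memory and table extension can be sketched as a simple bump allocation. This assumes each library's requirements come from its `WASM_DYLINK_MEM_INFO` subsection (sizes in bytes and table slots, alignments encoded as powers of two) and that libraries are placed, in order, immediately after the main module's original minimum memory and table. `plan_bases` is a hypothetical helper; the bases it returns correspond to the `__memory_base`/`__table_base` values each library would be instantiated with.

```python
WASM_PAGE_SIZE = 65536


def align_up(value, log2_align):
    """Round `value` up to a multiple of 2**log2_align."""
    a = 1 << log2_align
    return (value + a - 1) // a * a


def plan_bases(main_pages, main_table_size, libraries):
    """Assign memory and table bases to each library.

    `libraries` is a list of (name, memory_size, memory_align_log2,
    table_size, table_align_log2) tuples. Returns (bases, pages, table),
    where `bases` maps name -> (memory_base, table_base), and `pages` and
    `table` are the new minimum memory (in pages) and table sizes.
    """
    mem = main_pages * WASM_PAGE_SIZE  # everything below belongs to main
    table = main_table_size
    bases = {}
    for name, mem_size, mem_align, tab_size, tab_align in libraries:
        mem = align_up(mem, mem_align)
        table = align_up(table, tab_align)
        bases[name] = (mem, table)
        mem += mem_size
        table += tab_size
    pages = -(-mem // WASM_PAGE_SIZE)  # round up to whole pages
    return bases, pages, table
```

Because library data sits directly above the main module's original minimum memory, any access by the main module outside its own allocations could overlap a library's data, which is why the requirement above is needed.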
The output module produced by `wasm-dyld` will conform to the Core WebAssembly specification, version 1 and the `wasm32-wasi` ABI. Once it begins generating components instead, the output format may vary over time as the Component Model proposal evolves.
#### Comparison to ELF, etc.
Existing executable formats such as ELF, PE, Mach-O, etc. may support features beyond what the `emscripten` format supports. For example, Mach-O has a [two-level namespacing scheme](https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/executing_files.html#//apple_ref/doc/uid/TP40001829-97182-TPXREF112), which may eventually be useful for Wasm linking. Adding support for such a feature will likely involve bumping the custom section name from `dylink.0` to `dylink.1` and either adding or changing subsection definitions in the conventions document.
#### Where does each piece live? Who is responsible for what?
LLVM already supports generating shared libraries, and we expect support will continue to live there. It may be necessary to change LLVM and/or `wasm-ld`, e.g. to ensure an entry for `wasi-libc` is added to `WASM_DYLINK_NEEDED` when appropriate, but we expect any such changes to be minimal.
We propose that `wasm-dyld` be hosted by the Bytecode Alliance organization on GitHub and be maintained by members of that organization. (TODO: would the WebAssembly org on GitHub be a better home?)
#### What new features are required in core Wasm or the Component Model?
None expected.
#### Security implications
Shared-everything linking implies that each shared library linked into an app will have full, unrestricted access to that app's memory, tables, globals, etc., and vice-versa. Thus, this is only suitable for linking trusted code.
TODO: other considerations?
## Examples
TODO