# Wasm split
:::info
Since this is largely about size efficiency, we define splitting
only for the binary format (not WAT).
:::
Design goals:
- The hash of a "top-level split Wasm" should validate the contents of all (recursively) split sections
- The split algorithm is deterministic, to allow signature/hash validation of both original and split content
- Split content can be _spliced_ without knowing the canonical _split_ algorithm
- The original size can be determined from the split container (allowing space to be pre-allocated while splicing)
- Split "fragments" contain just the "main content" of their original section/segment
## Candidate splits
- Components
- Components (recursive)
- Custom sections
- Core modules
- Data segments
- Custom sections
- Core data segments (if added)
## Proposal
> TBD: preamble layer "split" bit
### Split algorithm
Taking binary Wasm as input:
- Set the preamble layer "split" bit and emit the updated preamble
- For each section:
  - If the section is _not_ to be split:
    - Emit the section unchanged
  - If the section _is_ to be split:
    - Hash the section's "content" (see "Split section") and write the content to a store, keyed by the digest
    - Emit the corresponding "split section"
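
The steps above can be sketched roughly as follows. This is a minimal, non-recursive illustration, not a definitive implementation: `SPLIT_SECTION_ID` and `SPLIT_LAYER_BIT` are hypothetical placeholders (both values are TBD in this proposal), `should_split` and `store` are illustrative names, and only sections whose split payload is a bare `typeddigest` (component and core module sections) are handled.

```python
import hashlib

# Hypothetical placeholders: the proposal leaves both values TBD.
SPLIT_SECTION_ID = 0x7F  # stand-in for the split section id `X`
SPLIT_LAYER_BIT = 0x80   # stand-in "split" bit, set in the last preamble byte

def encode_u32(value):
    """Unsigned LEB128, as used for u32 in the Wasm binary format."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def decode_u32(buf, pos):
    """Read an unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def split(wasm, should_split, store):
    """Split top-level sections; `store` maps typed digest -> section content."""
    out = bytearray(wasm[:8])      # 8-byte preamble
    out[7] |= SPLIT_LAYER_BIT      # set the (placeholder) split bit
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, start = decode_u32(wasm, pos + 1)
        end = start + size
        if should_split(section_id):
            content = bytes(wasm[start:end])
            digest = b"\x00" + hashlib.sha256(content).digest()
            store[digest] = content
            # Split section body: original id, original size, typed digest.
            body = bytes([section_id]) + encode_u32(size) + digest
            out += bytes([SPLIT_SECTION_ID]) + encode_u32(len(body)) + body
        else:
            out += wasm[pos:end]   # emit the section unchanged
        pos = end
    return bytes(out)
```

For example, `split(binary, lambda section_id: section_id == 1, store)` would split only core module sections of a component and leave everything else inline.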
#### Digest calculation algorithm
Digest calculation is a variation of the split algorithm that requires all possible (recursive) section splits to be performed. Split contents do not need to be stored; they only need to be included in the recursive digest calculations.
### Splice algorithm
Taking binary Wasm as input:
- Check the preamble layer "split" bit:
  - If unset, emit the entire binary and quit
  - If set, unset it and emit the updated preamble
- For each section:
  - If the section is _not_ a "split section":
    - Emit the section unchanged and continue with the next section
  - If the section _is_ a "split section":
    - Reconstruct and emit the corresponding section header
    - Look up the section contents by digest and emit them
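
A matching sketch of the splice steps, under the same assumptions: `SPLIT_SECTION_ID` and `SPLIT_LAYER_BIT` are hypothetical placeholders for values this proposal leaves TBD, `store` is an illustrative digest-keyed mapping, and only split sections carrying a bare `typeddigest` are handled. Verifying the size and digest of the looked-up content is an addition of this sketch rather than a stated requirement.

```python
import hashlib

# Hypothetical placeholders: the proposal leaves both values TBD.
SPLIT_SECTION_ID = 0x7F
SPLIT_LAYER_BIT = 0x80

def encode_u32(value):
    """Unsigned LEB128, as used for u32 in the Wasm binary format."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def decode_u32(buf, pos):
    """Read an unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def splice(wasm, store):
    """Rebuild the original binary; `store` maps typed digest -> content."""
    if not wasm[7] & SPLIT_LAYER_BIT:
        return bytes(wasm)                 # not split: emit as-is
    out = bytearray(wasm[:8])
    out[7] &= ~SPLIT_LAYER_BIT & 0xFF      # unset the (placeholder) split bit
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, start = decode_u32(wasm, pos + 1)
        end = start + size
        if section_id != SPLIT_SECTION_ID:
            out += wasm[pos:end]           # ordinary section, unchanged
        else:
            original_id = wasm[start]
            original_size, digest_pos = decode_u32(wasm, start + 1)
            digest = bytes(wasm[digest_pos:end])
            content = store[digest]
            expected = b"\x00" + hashlib.sha256(content).digest()
            if len(content) != original_size or digest != expected:
                raise ValueError("stored content does not match split section")
            # Reconstruct the section header, then emit the content.
            out += bytes([original_id]) + encode_u32(original_size) + content
        pos = end
    return bytes(out)
```

Note that the splicer never needs to know which sections a splitter chose to split; it only follows the split sections it encounters.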
### Typed hash digest
Content is hashed using a supported hash algorithm to produce a content digest. This is encoded along with an identifier of the hash algorithm:
```
typeddigest ::= 0x00 sha256digest:byte[32]
```
For split sections with optional splitting of inner sections/segments, the `typeddigest` _must_ be computed "as if" all allowed splitting has been done.
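
As a minimal sketch of this encoding, assuming Python's standard `hashlib`: a `typeddigest` is the algorithm byte followed by the raw 32-byte SHA-256 digest. The function names are illustrative and not part of the proposal, and the sketch hashes the content as given; it does not perform the "as if" recursive splitting described above.

```python
import hashlib

SHA256_ID = 0x00   # algorithm identifier from the typeddigest grammar
SHA256_LEN = 32    # length of a raw SHA-256 digest in bytes

def typed_digest(content):
    """Return the algorithm byte followed by the SHA-256 digest of content."""
    return bytes([SHA256_ID]) + hashlib.sha256(content).digest()

def verify_typed_digest(typed, content):
    """Check content against a typeddigest; reject unknown algorithms."""
    if len(typed) != 1 + SHA256_LEN or typed[0] != SHA256_ID:
        raise ValueError("unsupported or malformed typeddigest")
    return bytes(typed) == typed_digest(content)
```

Keeping the algorithm identifier as a leading byte lets a reader reject unknown algorithms before attempting a lookup or comparison.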
### Split section
> Split section ID (`X`) TBD; should be reserved in component and core specs
```
splitsection_N(S) ::= section_X(
sectionid:N
sectionsize:u32
S
)
```
- `sectionid` and `sectionsize` record the original section's id and size
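
The wrapper can be sketched as below. This assumes `sectionid` is encoded as a single byte (as section ids are in the Wasm binary format) and that `u32` is unsigned LEB128; the split section id is passed in as a parameter because `X` is still TBD, and the function names are illustrative.

```python
def encode_u32(value):
    """Unsigned LEB128, as used for u32 in the Wasm binary format."""
    if not 0 <= value < 2**32:
        raise ValueError("value out of u32 range")
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def decode_u32(buf, pos):
    """Read an unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_split_section(split_id, original_id, original_size, payload):
    """Wrap payload `S` as splitsection_N(S), with N = original_id."""
    body = bytes([original_id]) + encode_u32(original_size) + payload
    return bytes([split_id]) + encode_u32(len(body)) + body

def decode_split_section(section):
    """Return (original id, original size, payload) of a split section."""
    body_size, pos = decode_u32(section, 1)   # skip the split section id byte
    end = pos + body_size
    original_id = section[pos]
    original_size, pos = decode_u32(section, pos + 1)
    return original_id, original_size, bytes(section[pos:end])
```

Because the original size is stored in the wrapper, a splicer can pre-allocate the output before fetching any split content.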
#### Component
```
componentsplit ::= splitsection_4(typeddigest)
```
#### Core module
```
coremodulesplit ::= splitsection_1(typeddigest)
```
#### Custom
```
customsplit ::= splitsection_0(customsplitdata)
customsplitdata ::= customname:name typeddigest
```
- `customname` is copied from the original custom section's [`name`](https://webassembly.github.io/spec/core/binary/values.html#binary-name)
#### Data
```
datasplit ::= splitsection_11(segments:datasegmentsplitopt*)
datasegmentsplitopt ::= 0:u8 segmentdata:vec<byte>
| 1:u8 segmentheader:vec<byte>
segmentdatasize:u32
segmentdatadigest:typeddigest
```
- The first variant of `datasegmentsplitopt` allows a splitting implementation to leave a data segment inline (`segmentdata`). This variant must not be used when calculating digests for the parent core module.
- `segmentheader` records the original data segment's "header": the segment type and any other fields that come before the segment's actual data. This allows the splice algorithm to mechanically reconstruct segments without "knowing" about segment types.
- `segmentdatasize` and `segmentdatadigest` record the original segment data's size and digest
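
The following sketch encodes and splices a single `datasegmentsplitopt`. It rests on several assumptions not spelled out above: the inline variant's `segmentdata` holds the complete original segment encoding, `segmentheader` excludes the length prefix of the segment's data, and the caller has already separated header from data. All function names are illustrative.

```python
import hashlib

def encode_u32(value):
    """Unsigned LEB128, as used for u32 in the Wasm binary format."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)

def decode_u32(buf, pos):
    """Read an unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_segment_inline(segment):
    """Variant 0: keep the (assumed complete) segment encoding inline."""
    return b"\x00" + encode_u32(len(segment)) + segment

def encode_segment_split(header, data, store):
    """Variant 1: keep the header inline, store the data under its digest."""
    digest = b"\x00" + hashlib.sha256(data).digest()
    store[digest] = data
    return (b"\x01" + encode_u32(len(header)) + header
            + encode_u32(len(data)) + digest)

def splice_segment(entry, store):
    """Rebuild the original segment bytes from one datasegmentsplitopt."""
    tag = entry[0]
    length, pos = decode_u32(entry, 1)
    if tag == 0:
        return bytes(entry[pos:pos + length])
    header = bytes(entry[pos:pos + length])
    size, pos = decode_u32(entry, pos + length)
    data = store[bytes(entry[pos:pos + 33])]   # 1 algorithm byte + 32 digest bytes
    if len(data) != size:
        raise ValueError("stored data does not match segmentdatasize")
    # Header, then the data re-encoded as vec(byte): length prefix + bytes.
    return header + encode_u32(size) + data
```

For a passive segment (`0x01` followed by the data vector), the header is the single byte `0x01`, so the splicer can rebuild the segment without interpreting the segment type.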
## Other thoughts
### MIME Types
You could think of "split" Wasm as just another kind of binary Wasm with particular features, in which case it could be covered by the existing `application/wasm` type. However, it may be useful to allow consumers to differentiate between "split" and "unsplit" forms. We could specify a separate type (e.g. `application/split+wasm`) or a parameter (e.g. `application/wasm; split=1`) to distinguish between them.