# Wasm split

:::info
Since this is largely about size efficiency we will only define this splitting for binary forms (not WAT).
:::

Design goals:

- The hash of a "top level split Wasm" should validate the contents of all (recursively) split sections
- Deterministic split algo to allow signature/hash validation of both original and split content
- Can be _spliced_ without knowing the canonical _split_ algo
- Can determine original size from the split container (allow pre-allocating space while splicing)
- Split "fragments" contain just the "main content" of their original section/segment

## Candidate splits

- Components
  - Components (recursive)
  - Custom sections
  - Core modules
    - Data segments
    - Custom sections
  - Core data segments (if added)

## Proposal

> TBD: preamble layer "split" bit

### Split algorithm

Taking binary Wasm as input:

- Set the preamble layer "split" bit and emit the updated preamble
- For each section:
  - If the section is _not_ to be split:
    - Emit the section, unchanged
  - If the section _is_ to be split:
    - Hash the section's "content" (see "Split section" below) and write the content to a store, keyed by the digest
    - Emit the corresponding "split section"

#### Digest calculation algorithm

Digest calculation is a variation of the split algorithm in which all possible (recursive) section splits are performed. Split contents do not need to be stored, just included in recursive digest calculations.

### Splice algorithm

Taking binary Wasm as input:

- Check the preamble layer "split" bit
  - If unset, emit the entire binary and quit
  - If set, unset it and emit the updated preamble
- For each section:
  - If the section is _not_ a "split section":
    - Emit the section unchanged and continue with the next section
  - If the section _is_ a "split section":
    - Reconstruct and emit the corresponding section header
    - Look up the section contents by digest and emit them

### Typed hash digest

Content is hashed using a supported hash algorithm to produce a content digest.
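
As an illustration, the store used by the split and splice algorithms can be sketched as a content-addressed map keyed by typed digest. This is a minimal Python sketch under stated assumptions, not a reference implementation. It assumes SHA-256 with identifier `0x00`, the only algorithm the `typeddigest` encoding below defines. It also assumes an in-memory `dict` as the store. The names `typed_digest` and `ContentStore` are hypothetical.

```python
import hashlib

# Assumed hash algorithm identifier for SHA-256 (see `typeddigest`).
SHA256_ID = 0x00


def typed_digest(content: bytes) -> bytes:
    """Return a typed digest: 1-byte algorithm id followed by the SHA-256 digest."""
    return bytes([SHA256_ID]) + hashlib.sha256(content).digest()


class ContentStore:
    """Hypothetical in-memory content-addressed store keyed by typed digest."""

    def __init__(self) -> None:
        self._blobs: dict[bytes, bytes] = {}

    def put(self, content: bytes) -> bytes:
        # Split side: hash the section content and store it under its digest.
        digest = typed_digest(content)
        self._blobs[digest] = content
        return digest

    def get(self, digest: bytes) -> bytes:
        # Splice side: look up content by digest and re-verify it before use.
        content = self._blobs[digest]
        if typed_digest(content) != digest:
            raise ValueError("stored content does not match its digest")
        return content
```

Re-hashing in `get` is optional. It lets a splicer detect corrupted or mismatched content before emitting it.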

This is encoded along with an identifier of the hash algorithm:

```
typeddigest ::= 0x00 sha256digest:byte[32]
```

For split sections with optional splitting of inner sections/segments, the `typeddigest` _must_ be computed "as if" all allowed splitting has been done.

### Split section

> Split section ID (`X`) TBD; should be reserved in component and core specs

```
splitsection_N(S) ::= section_X( sectionid:N sectionsize:u32 S )
```

- `sectionid` and `sectionsize` record the original section's id and size

#### Component

```
componentsplit ::= splitsection_4(typeddigest)
```

#### Core module

```
coremodulesplit ::= splitsection_1(typeddigest)
```

#### Custom

```
customsplit ::= splitsection_0(customsplitdata)
customsplitdata ::= customname:name typeddigest
```

- `customname` is copied from the original custom section's [`name`](https://webassembly.github.io/spec/core/binary/values.html#binary-name)

#### Data

```
datasplit ::= splitsection_11(segments:datasegmentsplitopt*)
datasegmentsplitopt ::= 0:u8 segmentdata:vec<byte>
                      | 1:u8 segmentheader:vec<byte> segmentdatasize:u32 segmentdatadigest:typeddigest
```

- The first variant of `datasegmentsplitopt` allows a splitting implementation to leave a data segment inline (`segmentdata`), but it may not be used when calculating digests for the parent core module.
- `segmentheader` records the original data segment's "header": the segment type and any other fields that come before the segment's actual data. This allows the splice algorithm to mechanically reconstruct segments without "knowing" about segment types.
- `segmentdatasize` and `segmentdatadigest` record the original segment data's size and digest.

## Other thoughts

### MIME Types

You could think of "split" Wasm as just another kind of binary Wasm with particular features, in which case it could be covered by the existing `application/wasm` type. However, it may be useful to allow consumers to differentiate between "split" and "unsplit" forms.
We could specify a separate type (e.g. `application/split+wasm`) or a parameter (e.g. `application/wasm; split=1`) to distinguish between them.