Push-Based Hypermachines

# Push-Based Hypermachines What if instead of the pull-based model above that's based on a combination of polling and waiting, we used a pushed-based model instead? This should eliminate the need for the `wait` syscall and simplify the remaining syscalls as well. ## Execution Trace The runtime will append records to an execution trace feed for all traced syscalls on two conditions: 1. Whenever a traced syscall is executed by the guest. 2. Whenever the guest is notified that new blocks are available and `on_append` is invoked. The execution trace can be used to replay a machine during auditing. Replaying a machine involves iterating over the trace (starting from an optional checkpoint) and calling `on_append`, `on_pause`, and `on_resume` in the order specified by the trace. In auditing mode, the tree hashes of the inputs/outputs used during replay are compared against those of the machine being audited to ensure the machine was executed correctly. ### Trace Schema An execution trace is a feed of the form: `[ TraceMessage1, TraceMessage2, ... TraceMessageN ]` Trace messages are encoded using Protocol Buffers. Each message is encoded as a `TraceMessage`, which is a union type of the form: ```proto message TraceMessage { enum Type { AddInput = 1; AddOutput = 2; RemoveInput = 3; RemoveOutput = 4; Has = 5; Get = 6; Append = 7; Pause = 8; Terminate = 9; } required Type type = 1; oneof msg { AddInput add_input = 2; AddOutput add_output = 3; RemoveInput remove_input = 4; RemoveOutput remove_output = 5; Has has = 6; Get get = 7; Append append = 8; Pause pause = 9; Terminate terminate = 10; } } ``` #### Common Types Trace messages will need to reference both feeds and blocks in inputs/outputs. These types are defined at the top-level of the schema: ##### Seq References a block in a feed. The seq can be made immutable by including a Merkle tree hash, which will be verified by the runtime. ```proto message Seq { required uint64 pos = 1; optional bytes hash = 2; } ``` ##### FeedLink A feed link references a feed before it has been assigned an identifier by the runtime. This type is used when adding new inputs/outputs. If `seq` is provided, then the feed will be static (it will never update). ```proto message FeedLink { required bytes key = 1; optional Seq seq = 2; } ``` ##### IdLink An ID link references a feed after it has been assigned an identifier by the runtime. This type is used both when fetching data and when removing inputs/outputs. If `seq` is provided, then the IdLink references a block. ```proto message IdLink { required uint32 id = 1; optional Seq seq = 2; } ``` ##### Range A range represents a range of contiguous blocks within a feed, referenced by ID. ```proto message Range { required uint32 id = 1; required Seq start = 2; optional Seq end = 3; } ``` #### Message Types ##### AddInput Traced whenever the machine has dynamically created a new input, and when the machine is initialized with an initial set of (externally-defined) inputs. The latter inputs are indicated with an `external` flag. This message type defines the mapping from an input FeedLink to ID. ```proto message AddInput { required uint32 id = 1; required FeedLink link = 2; required bool external = 3; // true if this input was added externally during initialization. } ``` ##### AddOutput Traced whenever the machine has dynamically created a new input, and when the machine is initialized with an initial set of (externally-defined) inputs. The latter inputs are indicated with an `external` flag. The message type defines the mapping from an output FeedLink to an ID. ```proto message AddOutput { required uint32 id = 1; required FeedLink link = 2; required bool external = 3; // true if this input was added externally during initialization. } ``` ##### RemoveInput Traced whenever the machine has dynamically removed an existing input. The input must be referenced by its ID. ```proto message RemoveInput { required uint32 id = 1; } ``` ##### RemoveOutput Traced whenever the machine has dynamically removed an existing output. The output must be referenced by its ID. ```proto message RemoveOutput { required uint32 id = 1; } ``` ##### Has Traced whenever the runtime notifies the guest that new data is available on a single input (via the `on_append` call). The `Has` message must precede any trace messages generated during the `on_append` call. Everything during the `on_append` call, including the `Has`, must be atomically batched into the trace. ```proto message Has { required IdLink input = 1; required Seq length = 2; optional Seq previousLength = 3; } ``` ##### Get Traced whenever the guest performs a `get` syscall to read multiple block ranges from a set of inputs. ```proto message Get { repeated Range ranges = 1; } ``` ##### Append Traced whenever the guest performs an `append` syscall to write ranges to a set of outputs. ```proto message Append { repeated Range ranges = 1; } ``` ##### Pause Traced whenever the runtime pauses execution of a machine. ```proto message Pause {} ``` ##### Terminate Traced whenever the guest performs a `terminate` syscall. ```proto message Terminate {} ``` ## Syscalls Each Hypermachine will import a standard library of syscalls for: 1. Creating/removing inputs and outputs. 2. Getting information about inputs/outputs. 3. Getting blocks from inputs/outputs. 4. Terminating. A subset of these syscalls will be traced: a record of their execution will be stored in the machine's execution trace feed. Additionally, a subset of these syscalls are asynchronous and will transparently block. That is, the engine will be paused until a result is available. The remainder of this document will use AssemblyScript pseudocode to describe syscalls and code examples for clarity, rather than using base WebAssembly types. In practice, all syscalls will consume/output WASM types. All types assume a 32-bit WASM runtime. ### Library Types #### `Pointer: (i32, i32)` An offset/length pair pointing to a spot in WASM linear memory. #### `Id: i32` An identifier used to represent an input/output feed. #### `TreeHashPointer: Pointer` A pointer/length pair to a feed's Merkle tree root hash. #### `KeyPointer: Pointer` A pointer/length pair for a feed's key. #### `Seq: (i64, TreeHashPointer | Null)` An integer/hash referencing a feed block. This type must be 64-bit because i32::MAX is too small to represent large feeds (with over 2e9 blocks). The reference can be made immutable by including an optional TreeHashPointer. #### `FeedLink: (KeyPointer, Seq | Null)` Represents a link to a particular feed. The link can be made immutable by including a `Seq`. If a `Seq` is included, the input or output will be static. #### `IdRef: (Id, Seq)` Represents a link to a particular feed or block, referenced by ID. The link can be made to reference an (optionally immutable) block by including a `Seq` with an optional `TreeHashPointer`. #### `RangeRef: (Id, Seq, Seq)` Points to a contiguous range of blocks within a single input/output. #### `BlockPointer: (IdRef, Pointer)` Points to the block data for a given IdLink. #### `RangePointer: (RangeRef, Pointer, Pointer)` Points to a contiguous range of blocks. The second pointer references a list of i32s containing block lengths. ### Traced Syscalls These syscalls will store protobuf records in the machine's execution trace. The execution trace can be used to replay and verify a machine during auditing. #### `add_input(link: FeedRef | Null): Id` Creates a new input. If `link` is `null`, then the feed will be initialized. Otherwise, an existing feed will be loaded. Returns the ID that has been assigned to the input by the runtime. Traces an `AddInput` message. #### `add_output(link: FeedRef | Null): Id` Creates a new output. If `link` is `null`, then the feed will be initialized. Otherwise, an existing feed will be loaded. Returns the ID that has been assigned to the output by the runtime. Traces an `AddOutput` message. #### `remove_input(id: Id)` Removes an existing input by ID. Traces a `RemoveInput` message. #### `remove_output(id: Id)` Removes an existing output by ID. Traces a `RemoveOutput` message. #### `get(ranges: [RangePointer])` Fetches block data into the memory sections specified by the ranges links. A `get` can fetch ranges from many different inputs/outputs. This syscall is blocking. The machine will be paused while the requested data is fetched from the network. Traces a `Get` message. #### `append(ranges: [RangePointer]` Writes ranges to a set of outputs. Each range can contain many blocks to append to a specified output, and these blocks will be batch inserted. This syscall is blocking. The machine will be paused until the data has been appended to the feeds. Traces an `Append` message. #### `terminate()` Terminate the machine. Traces a `Terminate` message. ### Untraced Syscalls These syscalls do not need to record information in the execution trace, since they return information that must be consistent across all possible program executions, and they do not modify state. #### `block_length(block: IdRef): i32` Gets the length of the specified block, if that block is available. -1 otherwise. If the IdLink has a Seq with offset -1, then the length of the latest block will be returned. ## Lifecycle Methods Every Hypermachine must export a subset of the following methods, which are invoked by the runtime at different points during the machine's lifecycle. Every method is optional. #### `on_initialize(num_inputs: i32, num_outputs: i32)` Called when the machine is started for the very first time. If the machine has been created with an initial set of inputs/outputs, then it's assumed that those feeds have been assigned IDs starting from 1, and the largest ID for each is passed in as an argument. #### `on_append(id: IdRef, start: Seq, end: Seq)` Called when new data is available on one of the inputs. `prev` is the `Seq` of the first unprocessed block, and `end` is the `Seq` of the latest block in the feed. When this function exits, the runtime must record a `Has` message in the trace log, preceding any additional messages generated during the call (in a batch). #### `on_pause()` Called when the runtime wishes to cleanly shut down. This function can be used to record checkpoints, for example. This method must trace a `Pause` message, as it might trigger side-effects. During replay, pause/resume message must be used to reinvoke `on_pause` and `on_resume`. #### `on_resume()` Called when a previously-initialized machine is restarted. ## Examples The following examples are written in AssemblyScript pseudocode. The chosen examples are order-dependent, which better highlights the use-cases of Hypermachines. ### Hasher This hashing machine computes a rolling has over all blocks in a set of N statically-defined input feeds. Each new hash is appended to a single statically-defined output. This machine does not need to implement any of the other lifecycle methods, as it does not require any initialization and also does not keep any in-memory indexes (which might require checkpointing). ```javascript= const machine = require('@hypermachines/machine') const hash = require('some-wasm-hash-function') const { RangeRef, IdRef, Seq, RangePointer } = require('@hypermachines/types') // Can we provide this as a library, given that it'll need to malloc? function allocateRange (range: RangeRef): RangePointer { var prevOffset: i64 = range.start.seq const latestOffset: i64: range.end.seq const lengths = [] var totalLength = 0; for (let i = prevOffset; i <= latestOffset; i++) { const blockRef = IdRef(range.id, Seq(i, null)) const blockLength = machine.block_length(blockRef) lengths.push(blockLength) totalLength += blockLength } const blocks = malloc(totalLength) return RangePointer(range, blocks, lengths) } class HashingMachine { on_append (range: RangeRef) { const input = allocateRange(range) // The machine only has one statically-defined output at ID 0. // Allocate room for the latest block in feed 0. // A RangeRef where both start and end are -1 will point to the latest block. const latestOutputRef = RangeRef(0, Seq(-1, null), Seq(-1, null)) const output = allocateRange(lastOutputRef) machine.get([input, output]) var nextHash = output.blocks[0] for (const block of input.blocks) { nextHash = hash(nextHash + block) } // We can reuse the memory allocated for the previous hash now. // The lengths should be the same, as hashes have consistent block lengths. output.blocks = nextHash machine.append([output]) } } module.exports = HashingMachine ```