# Maru: Current Status & What Next

This document has three purposes:

1. Describe precisely what the goal of Maru is
2. Describe the current state of the project
3. Describe what still must be done to get to:
    * a working demo of ingress
    * a working demo of something useful (e.g. a TWAP)

## Goal of Maru

The goal of Maru is, most abstractly, to be able to succinctly verify dataflows whose input is the Ethereum blockchain. To be more precise, we need a few definitions. We say a *dataflow* is a chain of *operators* composed together, and an *operator* is a construct with three components:

1. an input stream
2. a state machine that processes elements of the input stream(s) one message at a time
3. an output stream

A dataflow over the Ethereum blockchain means that all data in every stream ultimately comes from the Ethereum blockchain. Thus we can split our system into two pieces:

* ingress - how to "pull in" data from the Ethereum blockchain into a more friendly, standardized "stream" format
* processing - how to actually process the stream

Let's first focus on how we prove correctness of processing the stream. We'll come back to ingress.

### Processing

Such systems are nothing new. With Maru, our goal is to make them *succinctly verifiable*, which means, at a high level, being able to output a *single* proof that, at a particular logical timestep (index in the output stream), the entire history of the output is "correct" - i.e. each of the operators computed the correct result, they were composed in the correct order, and they processed all messages, in the correct order. However, for the proof to be *succinct*, we cannot require the verifier to download the entire history of the stream. Instead, we *commit* to a stream, where the commitment serves as a "snapshot" of the stream at a particular (logical) point in time.
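To fix intuition before getting into the proof statement, here is a minimal native sketch of the operator abstraction defined above. All names (`Operator`, `sum_step`) are hypothetical illustrations, not part of the actual codebase:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of an "operator": a state machine that consumes an
# input stream one message at a time and emits an output stream.
@dataclass
class Operator:
    step: Callable  # (state, msg) -> (new_state, output_msgs)
    state: object

    def run(self, input_stream):
        output_stream = []
        for msg in input_stream:
            self.state, outputs = self.step(self.state, msg)
            output_stream.extend(outputs)
        return output_stream

# Example: a running-sum operator that emits the cumulative sum so far.
def sum_step(state, msg):
    new_state = state + msg
    return new_state, [new_state]

op = Operator(step=sum_step, state=0)
print(op.run([1, 2, 3]))  # [1, 3, 6]
```

A dataflow is then just a chain of such operators, where each one's output stream is the next one's input stream.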
Then, we generate a SNARK proof of the following statement:

* public inputs:
    * `input_stream_commitment`
    * `output_stream_commitment`
    * `input_timestamp`
    * `output_timestamp`
    * `op_chain_digest`
* private inputs:
    * `input_stream`
    * `output_stream`
* statement:
    * `input_stream_commitment = MSH([(i, m_i) for i in 0..input_timestamp])`, where `m_i` is the `i`th element of `input_stream`
    * `output_stream_commitment = MSH([(i, output_m_i) for i in 0..output_timestamp])`, where `output_m_i` is the `i`th element of `output_stream`
    * `op_chain_digest` is an order-dependent hash of every operator in the chain
    * the history of `output_stream` as of `output_timestamp` is the correct output of the chain of operators when fed the entirety of `input_stream` as input
    * the last timestamp for the input stream's history is `input_timestamp`
    * the last timestamp for the output stream's history is `output_timestamp`

There are three questions to be answered:

1. How are "stream histories" defined?
2. What exactly is a "stream commitment", how is it computed, and how are the snapshots defined?
3. How do we prove each operator's logic and compose the operators together into a single proof?

The answer to the first question is as follows:

1. A stream's history is the list containing every message sent over the stream
2. A message is an arbitrary chunk of data. For instance, it can be a list of all receipts from a single Ethereum block.

The answer to the second question is that the commitment is something called a multiset hash (denoted `MSH`, read about it [here](https://cronokirby.com/posts/2021/07/on_multi_set_hashing/)) and it's calculated as follows:

* define `indexed_msg(i, stream_history) = (i, stream_history[i])`. `indexed_msg` maps an index of a message received over the stream to a tuple containing the index and the message.
* the stream's commitment is then `MSH([(i, m_i) for i in 0..len(stream_history)])`

The answer to the third question has two parts:

1.
How do we prove correctness for a single operator?
2. How do we compose them together?

To accomplish the first part, for a given operator `Op`, we can write a *recursive* circuit for `Op`'s state machine and generate a proof for the operator's state transition over a contiguous batch of input messages. At the base case, the batch size is 1, so we actually check the operator logic. If we're not at the base case, we verify two "inner" proofs for processing two adjacent batches and accumulate the multiset hashes:

* public inputs:
    * `input_commitment`
    * `output_commitment`
    * `i` - the index of the last input message before processing the batch
    * `j` - the index of the last input message after processing the batch
    * `k` - the index of the last output message before processing the batch
    * `l` - the index of the last output message after processing the batch
    * `curr_operator_state_commitment`
    * `next_operator_state_commitment`
* private inputs:
    * `batch` - the batch of input messages, where the `n`th message in the batch is `m_(i + n)`
    * `curr_operator_state` - the operator state before the batch was processed
    * `next_operator_state` - the operator state after the batch was processed
    * `output_msgs` - the `l - k` output messages emitted while processing the batch
    * `left_proof` - a proof of an "inner" instance of this SNARK and its public inputs (we don't care what this is if `j - i == 1`)
    * `right_proof` - same as `left_proof`
* statement:
    * if `i == 0`, then `curr_operator_state` is the initial state for the state machine.
    * `j - i > 0`
    * if `j - i == 1`, it's the base case and we ignore both `left_proof` and `right_proof`:
        * `curr_operator_state_commitment = state_commit(curr_operator_state)`
        * `next_operator_state_commitment = state_commit(next_operator_state)`
        * `input_commitment` is `MSH([(n, m_n) for n in i..j])`
        * `output_commitment` is `MSH([(n, output_msg_n) for n in k..l])` (the multiset hash of the empty set is defined as the zero element of the group)
        * when `Op` has state `curr_operator_state` and processes the batch in order, the new state is `next_operator_state`
        * when `Op` has state `curr_operator_state` and processes the batch in order, the output messages are `(n, output_msg_n) for n in k..l`, in that order
    * otherwise, we accumulate:
        * `left_proof`'s verifier accepts given its public inputs
        * `right_proof`'s verifier accepts given its public inputs
        * `i == left_proof.i`
        * `j == right_proof.j`
        * `left_proof.j == right_proof.i`
        * `k == left_proof.k`
        * `l == right_proof.l`
        * `left_proof.l == right_proof.k`
        * `curr_operator_state_commitment == left_proof.curr_operator_state_commitment`
        * `next_operator_state_commitment == right_proof.next_operator_state_commitment`
        * `left_proof.next_operator_state_commitment == right_proof.curr_operator_state_commitment`
        * `input_commitment == left_proof.input_commitment + right_proof.input_commitment`
        * `output_commitment == left_proof.output_commitment + right_proof.output_commitment`

Using a proof system with fast, cyclic recursion such as [plonky2](https://github.com/mir-protocol/plonky2), such a circuit can be implemented and the performance won't be horrible. This design allows for horizontal scalability, as the [proof recursion structure forms a tree and can be pipelined](https://minaprotocol.com/blog/fast-accumulation-on-streams).
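To make the commitment scheme and the accumulation step concrete, here is a toy multiset hash in Python. It is **not** secure - the real construction sums points on the EcGFp5 curve after a hash-to-curve map - it only illustrates the additive structure that lets the recursive circuit merge the `left_proof` and `right_proof` commitments:

```python
import hashlib

# Toy multiset hash: hash each (index, message) pair to an integer and
# sum modulo a large prime. NOT secure - shown only for the structure.
P = 2**255 - 19  # arbitrary large prime standing in for the group order

def h(i, m):
    digest = hashlib.sha256(f"{i}:{m}".encode()).digest()
    return int.from_bytes(digest, "big") % P

def msh(indexed_msgs):
    # the MSH of the empty set is the group's zero element
    return sum(h(i, m) for i, m in indexed_msgs) % P

stream = ["msg_a", "msg_b", "msg_c", "msg_d"]
indexed = list(enumerate(stream))

# Including the index makes the stream commitment order-dependent even
# though the multiset hash itself is commutative.
full = msh(indexed)

# Additivity: the commitment over a range splits into adjacent batches,
# which is exactly the `input_commitment == left + right` check above.
left = msh(indexed[:2])
right = msh(indexed[2:])
assert full == (left + right) % P
```

The same additivity is what makes the commitment incrementally updatable: appending message `m` at timestamp `t` just adds `h(t, m)` to the running commitment.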
> Note: If `Op`'s logic or state is complex, it may make sense to write the logic itself in a STARK VM like Cairo, feed the input / output messages and state commitment into the VM via a public memory argument, and then wrap *that* STARK proof in a SNARK with an interface identical to the one above. The only difference is that, instead of checking `Op`'s logic, the SNARK proof described above would verify the STARK proof and check the public memory argument to ensure the correct messages were processed / emitted.

Now let's answer the second part - composition. Let's name the above SNARK system `OpSNARK(op)` for a given operator `op`. Suppose we had proofs for two of them for different operators, `OpSNARK(op_0)` and `OpSNARK(op_1)`. Then we can compose them with another recursive SNARK, which we'll call `CompSNARK`:

* public inputs:
    * `input_commitment`
    * `output_commitment`
    * `input_timestamp`
    * `output_timestamp`
    * `op_chain_digest`
    * `num_ops`
* private inputs:
    * `left_proof` - a proof of an "inner" instance of this SNARK, along with its public inputs
    * `right_proof` - same as `left_proof`
    * `left_op_proof` - a proof for `OpSNARK(op_left)` for some operator `op_left`, and its public inputs
    * `right_op_proof` - same as `left_op_proof`, but for a different operator `op_right`
* statement:
    * `num_ops > 0`
    * if `num_ops == 2`, it's a base case and we ignore both `left_proof` and `right_proof`:
        * verifier for `left_op_proof` accepts
        * verifier for `right_op_proof` accepts
        * `input_commitment == left_op_proof.input_commitment`
        * `output_commitment == right_op_proof.output_commitment`
        * `left_op_proof.output_commitment == right_op_proof.input_commitment`
        * `input_timestamp == left_op_proof.input_timestamp`
        * `output_timestamp == right_op_proof.output_timestamp`
        * `left_op_proof.output_timestamp == right_op_proof.input_timestamp`
        * `op_chain_digest == hash(left_circuit_digest, right_circuit_digest)`
            * where `left_circuit_digest` and `right_circuit_digest` are the circuit
digests for `OpSNARK(op_left)` and `OpSNARK(op_right)` respectively. This is something you have access to when building a recursive circuit with `plonky2`, and it should also be the case for other fast recursive proof systems.
    * else if `num_ops == 1`, it's the other base case and we ignore `left_proof`, `right_proof`, and `right_op_proof`:
        * verifier for `left_op_proof` accepts
        * `input_commitment == left_op_proof.input_commitment`
        * `output_commitment == left_op_proof.output_commitment`
        * `input_timestamp == left_op_proof.input_timestamp`
        * `output_timestamp == left_op_proof.output_timestamp`
        * `op_chain_digest == left_circuit_digest`
            * where `left_circuit_digest` is defined the same as it is in the other base case
    * otherwise, we accumulate and ignore both `left_op_proof` and `right_op_proof`:
        * verifier for `left_proof` accepts
        * verifier for `right_proof` accepts
        * `input_commitment == left_proof.input_commitment`
        * `output_commitment == right_proof.output_commitment`
        * `left_proof.output_commitment == right_proof.input_commitment`
        * `input_timestamp == left_proof.input_timestamp`
        * `output_timestamp == right_proof.output_timestamp`
        * `left_proof.output_timestamp == right_proof.input_timestamp`
        * `num_ops == left_proof.num_ops + right_proof.num_ops`
        * `op_chain_digest == hash(left_proof.op_chain_digest, right_proof.op_chain_digest)`

This recursive proof achieves the interface we desire for a dataflow, and we get a few nice properties from this approach: it's horizontally scalable *and* succinct - the user only has to verify a single proof to audit the entire computational trace of the dataflow.

> If we consider a single system where each operator is proven only once, the structure pertaining to how each operator flows into the next looks like a tree, allowing different applications to share computational resources and never duplicate work.
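The way `CompSNARK` builds `op_chain_digest` can be sketched natively as a binary hash over the per-operator circuit digests, mirroring the recursion tree. This is a toy illustration with hypothetical helper names; it assumes prover and verifier agree on a split convention (balanced here) so the digest is well-defined, and uses SHA-256 as a stand-in for the circuit's hash:

```python
import hashlib

def hash2(left: bytes, right: bytes) -> bytes:
    # stand-in for the in-circuit hash used to combine digests
    return hashlib.sha256(left + right).digest()

def op_chain_digest(circuit_digests):
    n = len(circuit_digests)
    assert n > 0                       # num_ops > 0
    if n == 1:                         # num_ops == 1 base case
        return circuit_digests[0]
    if n == 2:                         # num_ops == 2 base case
        return hash2(circuit_digests[0], circuit_digests[1])
    mid = n // 2                       # accumulation case
    return hash2(op_chain_digest(circuit_digests[:mid]),
                 op_chain_digest(circuit_digests[mid:]))

# A verifier recomputes the digest for the operator chain it expects
# (operator names here are made up for illustration):
digests = [hashlib.sha256(name.encode()).digest()
           for name in ["op_decode", "op_filter", "op_twap"]]
expected = op_chain_digest(digests)
```

Because the digest commits to both the set of operators and their order, a prover cannot silently reorder the chain or swap in a different operator without changing `op_chain_digest`.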
It's possible to extend the scheme to allow multiple input streams, which would extend the overall system to a DAG.

However, there's one important thing we need to take care of: the verifier needs to independently check the `input_commitment` - otherwise, a malicious prover could just run the dataflow on a different input. With our current scheme, this would require the user to download the entire input stream and hash it themselves. Instead, we need some "root of trust" for the input into the dataflow - one that contains a lot of valuable data and provides a client with the ability to trustlessly keep track of a commitment to all of that data. This is why we only ingest data from a public blockchain, namely Ethereum: it more or less satisfies those requirements. However, Ethereum doesn't use a multiset hash to commit to the data it contains, so we need a special kind of operator whose `input_commitment` is the Ethereum block hash. We call that the "ingress operator".

### Ingress

Ethereum commits to its data in the following pattern:

* the block header contains the roots of Modified Merkle Patricia Tries for different parts of the blockchain's data and/or state, as well as the hash of the previous block
* the hash of the block header - i.e. the `block_hash` - commits to it all

We ingest data from each root in the block header separately - namely the transactions root, receipts root, and state root. For Maru, most of the relevant data for an application can be extracted from contract logs, which are packed inside transaction receipts. Thus, here we focus on the design for an "ingress" operator against `receiptsRoot` and the receipt trie. That said, a similar construction can be utilized for the other tries.
What we need to end up with is a SNARK that matches the following interface:

* public inputs:
    * `block_hash`
    * `receipts_msh`
    * `num_txs`
* private inputs:
    * `block_header`
    * `receipts`
* statement:
    * `receipts_msh = MSH([(i, R_i) for i in 0..num_txs])`, where `num_txs` is the total number of transactions processed in the Ethereum blockchain whose latest block hash is `block_hash`, and `R_i` denotes the receipt for the `i`th transaction.

To generate such a proof, we take a similar strategy to the above:

1. generate proofs of the above for each block individually
2. use recursion to accumulate the proofs into a single proof for the entire blockchain

For the first part, we want a SNARK that matches the following interface, which we'll call `RBlockSNARK`:

* public inputs:
    * `receipts_root`
    * `receipts_msh`
    * `total_txs` - the number of txs in the blockchain before the current block
    * `num_txs` - the number of txs in the current block
* private inputs:
    * `receipts`
* statement:
    * `num_txs = len(receipts)`
    * `receipts_root = IMPT(receipts)`
        * where `IMPT` computes the root hash of Ethereum's receipt trie, which is a modified merkle patricia trie where the corresponding key for each receipt is the variable-length (e.g. 0 is "", 1 is "0x01", etc.), big-endian representation of the index in the block of the corresponding transaction
    * `receipts_msh = MSH([(i, R_i) for i in total_txs..total_txs + num_txs])`

Then, we can construct an "outer" recursive SNARK that accumulates `RBlockSNARK` proofs.
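Before moving on to the outer SNARK, here is a quick sketch of the index encoding `IMPT` uses for keys, per the description above. The function name is hypothetical; note that in Ethereum's actual tries the key stored is the RLP encoding of this byte string:

```python
def receipt_trie_index_key(tx_index: int) -> bytes:
    """Variable-length big-endian encoding of a transaction's index in
    the block, as described above: 0 -> b"", 1 -> b"\\x01", ..."""
    if tx_index == 0:
        return b""
    length = (tx_index.bit_length() + 7) // 8  # minimal number of bytes
    return tx_index.to_bytes(length, "big")

assert receipt_trie_index_key(0) == b""
assert receipt_trie_index_key(1) == b"\x01"
assert receipt_trie_index_key(256) == b"\x01\x00"
```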
Let's call it `RIngressSNARK`:

* public inputs:
    * `block_hash`
    * `receipts_msh`
    * `num_txs`
* private inputs:
    * `block_header` - the current block's header
    * `prev_proof` - a proof for `RIngressSNARK` for the previous block and its public inputs
    * `block_proof` - a proof for `RBlockSNARK` for the current block and its public inputs
    * `GENESIS_NUM_TXS` - a constant representing the number of txs in the genesis block
    * `GENESIS_BLOCK_HASH` - a constant representing the genesis block's hash
    * `GENESIS_RECEIPTS_MSH` - a constant representing the MSH of the genesis block's receipts
* statement:
    * `num_txs >= GENESIS_NUM_TXS`
    * if `num_txs == GENESIS_NUM_TXS`, it's the base case and both proofs are ignored:
        * `block_hash = GENESIS_BLOCK_HASH`
        * `receipts_msh = GENESIS_RECEIPTS_MSH`
    * otherwise, accumulate:
        * `block_proof`'s verifier accepts
        * `prev_proof`'s verifier accepts
        * `block_hash == keccak256(block_header)`
        * `block_header.prev_block_hash == prev_proof.block_hash`
        * `block_header.receipts_root == block_proof.receipts_root`
        * `receipts_msh == prev_proof.receipts_msh + block_proof.receipts_msh`
        * `num_txs == prev_proof.num_txs + block_proof.num_txs`

#### Implementation

So how do we actually implement these? To generate the inner proof, we need the prover to accomplish two things in-circuit given the claimed list of receipts:

* compute `receipts_root`
* compute `receipts_msh`

The following challenges have to be overcome:

1. The list of receipts is not fixed-size, and neither are the receipts themselves.
2. Computing `receipts_root` entails iterating through the receipts, creating a complex recursive structure, RLP-encoding it, and hashing each node with Keccak256.
3. Computing `receipts_msh` entails elliptic curve arithmetic, but Goldilocks is too small to directly embed a secure curve.

These are difficult to meet with a fixed-size SNARK. The first point is a challenge because the length of receipts can vary *drastically* - some can be as small as 300-something bytes, others can be almost 200KB.
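The accumulation case of `RIngressSNARK` above can be sketched natively as a per-block fold. All names here are hypothetical, the header is reduced to just the two fields the statement inspects, its serialization is a stand-in for RLP, and `sha3_256` stands in for `keccak256`:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Header:
    prev_block_hash: bytes
    receipts_root: bytes

    def serialize(self) -> bytes:  # stand-in for RLP-encoding the header
        return self.prev_block_hash + self.receipts_root

@dataclass
class BlockProof:                  # public inputs of RBlockSNARK
    receipts_root: bytes
    receipts_msh: int              # a group element, shown as an int
    num_txs: int

def ingress_step(prev, header, block_proof):
    """One accumulation step; `prev` is (block_hash, receipts_msh,
    num_txs) taken from prev_proof's public inputs."""
    prev_hash, prev_msh, prev_num_txs = prev
    assert header.prev_block_hash == prev_hash          # chain linkage
    assert header.receipts_root == block_proof.receipts_root
    block_hash = hashlib.sha3_256(header.serialize()).digest()
    return (block_hash,
            prev_msh + block_proof.receipts_msh,        # MSH group add
            prev_num_txs + block_proof.num_txs)

# Folding from a genesis state over one block:
genesis = (b"\x00" * 32, 0, 0)
hdr = Header(prev_block_hash=genesis[0], receipts_root=b"\x11" * 32)
bp = BlockProof(receipts_root=b"\x11" * 32, receipts_msh=42, num_txs=7)
state = ingress_step(genesis, hdr, bp)
assert state[1] == 42 and state[2] == 7
```

Folding `ingress_step` over every header from genesis yields the chain-wide `(block_hash, receipts_msh, num_txs)` triple; the recursion is what makes this fold succinctly verifiable instead of re-executed by every client.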
The second point presents an even larger challenge because of the nature of the computation - it requires highly complex control flow, including recursion and lots of branching logic.

Instead, we use STARKs to write state machines for each of the components, and then we wrap their proofs in a SNARK that matches the interface for `RBlockSNARK`. We then use cross-table lookups to "link" the components together, similar to how [Hermez does this for their zkEVM](https://docs.hermez.io/zkEVM/zk-Tooling/PIL/Modularity/). To address the third problem, we implement our MSH over an elliptic curve called EcGFp5, whose base field is a degree-5 extension field of Goldilocks we refer to as `GF(p^5)`, to avoid non-native arithmetic.

The current architecture contains the following STARKs:

* `ro_memory`: a STARK that checks the semantics of a read-only memory - this more or less does offline memory checking in a STARK
* `stack`: a STARK that checks the semantics of a stack memory - this more or less does offline memory checking in a STARK, but specifically for push/pop semantics
* `keccak_f`: a STARK for the Keccak permutation function used by `keccak256`
* `keccak256_sponge`: a STARK that computes `keccak256` hashes by operating as a sponge using the `keccak_f` STARK as the underlying permutation
* `keccak256_memory`: a STARK that reads encoded byte strings from the output (read-only) memory of the `rlp` STARK, hashes them with `keccak256_sponge`, and "writes" the output to another read-only memory called the `hash_memory`
* `rlp`: a STARK that reads structured input data from a memory with a self-describing binary format, recursively traverses it using a `stack` STARK, and "writes" the output to another read-only memory called the `rlp_output_memory`. Each byte string written to this memory is read by `keccak256_memory`.
* `idx_keyed_trie`: a STARK that reads from an instance of `ro_memory` called the `receipt_memory` that contains the list of receipts and "writes" the recursive node structure into the RLP STARK's memory while reading intermediate hashes from the `hash_memory`, checking that its first public input - `receipts_root` - is equal to the hash of the root node in `hash_memory`, and counting the number of receipts and exposing it as the second public input - `num_receipts`
* `poseidon`: a STARK implementing the `poseidon` permutation, similar to `keccak_f`
* `poseidon_sponge`: a STARK that implements a sponge over `poseidon`, similar to `keccak256_sponge`
* `poseidon_memory`: a STARK that reads each receipt `R_i` from memory and uses `poseidon_sponge` to hash `(i, R_i)` to an array of five Goldilocks field elements, which can directly be interpreted as an element of `GF(p^5)`. It "writes" this to an `ro_memory` we call `field_hash_memory`.
* `ecgfp5_vartime`: a STARK that implements curve addition and scalar multiplication for EcGFp5
* `ecgfp5_hash_to_curve`: a STARK that hashes an element of `GF(p^5)` to a point on EcGFp5 in a manner suitable for use as a random oracle (necessary for the MSH to be secure), as defined [here](https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hash-to-curve-07#section-3)
* `multiset_hasher`: a STARK that reads `GF(p^5)` elements from `field_hash_memory`, uses `ecgfp5_hash_to_curve` to hash them to curve points, and accumulates them into a multiset hash using `ecgfp5_vartime` for arithmetic

In total, there are 18 STARK instances linked together with cross-table lookups (many of them are instances of memories and stacks).
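The "linking" that a cross-table lookup provides can be modeled natively as multiset equality between the rows one STARK exposes and the rows another STARK claims to have read. A toy sketch (the real argument enforces this equality with a randomized running-product check over committed columns rather than computing it directly):

```python
from collections import Counter

# Toy model of a cross-table lookup (CTL): the multiset of (address,
# value) rows the "looking" table consumes must equal the multiset of
# rows the "looked" table exposes.
def ctl_holds(looking_rows, looked_rows):
    return Counter(looking_rows) == Counter(looked_rows)

# e.g. keccak256_memory reading byte strings out of rlp_output_memory:
rlp_output_memory = [(0, 0xC0), (1, 0x83), (2, 0x61)]
keccak_reads      = [(2, 0x61), (0, 0xC0), (1, 0x83)]  # any order

assert ctl_holds(keccak_reads, rlp_output_memory)
assert not ctl_holds([(0, 0xC0)], rlp_output_memory)   # missing reads
```

Because the check is over multisets, the two tables can process the shared rows in different orders, which is what lets each component STARK lay out its trace however is cheapest for its own constraints.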
Here's a diagram of the STARKs involved and which ones are "linked" together via cross-table lookups, with arrows indicating the direction of the lookups ("looked" STARK at the base of the arrow, "looking" STARK at the head):

![](https://i.imgur.com/MiA8peV.png)

## Current State

As of now, the receipt ingress is only partially completed. The following component STARKs have been implemented:

* `xor`
* `ro_memory`
* `stack`
* `keccak_f`
* `keccak256_sponge`
* `rlp`
* `ecgfp5_vartime`

These STARKs are implemented atop a fork of `starky` which (for now) is just called `starky2`. `starky2` adds some additional features to `starky`, namely gadgets for `ConstraintConsumer` and a generic implementation of cross-table lookups (CTLs) via [RapidUp](https://eprint.iacr.org/2022/1050), a multi-domain lookup argument that recently came out of Polygon. `starky2` can be found at https://github.com/proxima-one/plonky2/tree/starky2/starky2. See the readme for an explanation of how to write STARKs and how to use CTLs. Component STARKs can be found at https://github.com/proxima-one/plonky2/tree/starky2/starky2/src/starky2lib. See the README for the relevant STARKs for a high-level description of how they work and the "shape" of CTLs they expect.

## What Must Be Done

We split this section into two parts:

1. What must be done to finish the receipt ingress operator
2. What must be done to do something useful

### For ingress

As mentioned above, the ingress is only partly finished. The following still has to happen to finish it:

1. Remaining component STARKs must be implemented:
    * `keccak256_memory`
    * `idx_keyed_trie`
    * `poseidon_sponge`
    * `multiset_hasher`
    * `ecgfp5_hash_to_curve`
2. Recursion circuitry for cross-table lookups must be implemented
3. Recursive constraints for all of the component STARKs must be written
4. `CtlStark` and `AllStark` must be implemented for the full multi-STARK construction
5.
A plonky2 SNARK implementing `RBlockSNARK` that verifies the `AllStark` must be written
6. Plumbing:
    * use an RPC client to fetch receipts and block headers
    * set up the recursion pipeline

### For something useful

Once the ingress is done, we have a prover that generates proofs of the form required by `RIngressSNARK`. Then, we can implement something useful by figuring out what chain of operators we need, and then implementing the scheme above, which entails implementing:

* `CompSNARK`
* `OpSNARK(op)` for every operator `op` one wishes to support

In principle, any prover we write that satisfies those interfaces should work. In practice, there are two strategies that might be sensible:

1. a different circuit for every operator
2. use a Cairo-like VM

The first approach has greater complexity, as we need `CompSNARK` to aggregate a non-fixed set of inner circuits. It is also more difficult to hand-write circuits than assembly (or even a higher-level language) targeting a VM. And it is easier to audit / formally verify a small, constant number of VM circuits than a constantly growing number of operator-specific circuits. Therefore we consider the second approach to be more suitable.

To make the second approach work, we need to be able to "hash" the bytecode for `op_chain_digest`. That is, instead of using the inner circuit digest in `CompSNARK`, we use the bytecode hash by exposing it as a public input of `OpSNARK(op)` and using that instead. We also need to read/write messages to/from the VM. Therefore we define an operator's prover as a multi-STARK construction with the following STARKs:

1. an input message read-only memory
2. an output message read-only memory
3. a read-only memory for bytecode
4. a read-write scratch memory
5. a VM that reads from / writes to these memories
6. a poseidon sponge that hashes the bytecode, input memory, and output memory
7.
an instance of `ecgfp5_hash_to_curve` that hashes the input and output memories' hashes to curve elements, which happen to be the single-element multiset hashes for the input/output messages

Luckily, we already have an implementation of a VM very similar to Cairo (with a few minor differences) over an old version of `starky`. However, some modifications will need to be made to link it together with the above architecture (see the appendix for more information).

Once the above construction is implemented (for which all STARKs would have already been implemented - only the VM has to change), `OpSNARK(op)` can be constructed as a plonky2 verifier for the VM construction and its CTLs that exposes the bytecode hash as an additional public input. Since we use the same set of STARKs and CTLs for every operator, we actually only need to implement it once. Then `CompSNARK` can be implemented with the same interface as above as a cyclically recursive `plonky2` SNARK, using the bytecode hash instead of the circuit digest to check `op_chain_digest`.

## Appendix

### `starky` Cairo implementation

The prototype Cairo implementation can be found here: https://github.com/proxima-one/aincrad. Currently, it's built off of the `maru-stark` branch of the Proxima fork of plonky2 (https://github.com/proxima-one/plonky2/tree/maru-stark). It relies on a public memory argument (see the Cairo paper linked below) that's not supported in starky directly, so I more or less had to implement it separately on top of `starky`. The way it works is that there's a thing called the "runner" that runs machine code and generates the STARK trace. It then feeds the trace into a STARK implementation to generate proofs.
The runner can be found here:

* https://github.com/proxima-one/aincrad/blob/runner/runner/src/lib.rs

The STARK can be found here:

* https://github.com/proxima-one/aincrad/tree/runner/air

It also comes with a prototype assembler, which can be found here:

* https://github.com/proxima-one/aincrad/tree/runner/assembler
* You can see an example fibonacci program here: https://github.com/proxima-one/aincrad/blob/main/assembler/test_asm/fib.aincrad

To make it ready to use as a building block, the following needs to be done:

1. Make it use the `starky2` branch instead of `maru-stark` so that it shares the same underlying implementation as the rest of the STARKs.
2. The `starky2` branch includes an implementation of Cairo's read-only memory argument, but not the public memory extension yet, so we need to add that if we wish to make certain memory cells public inputs.
    * In principle this isn't *strictly* necessary. You could dedicate fixed memory addresses to contain the input / output multiset hashes and add constraints to check they're equivalent to the multiset hashes given as public inputs.
3. All the accompanying STARKs described above need to be assembled alongside the VM STARK via cross-table lookups.
4. (Optional, but probably a good idea) Currently `aincrad` isn't exactly the same as Cairo. Making it equivalent to Cairo might allow us to use the Cairo language and its corresponding toolchain to write operators, if that's possible.

### Submitting / Verifying Proofs On-chain

Depending on the destination chain, this might look different. On EVM chains, plonky2 proof verification is expensive due to proof size and the lack of precompiles for the field, so the best option currently seems to be wrapping a plonky2 proof in a groth16 proof. There's a prototype implementation that some have been experimenting with here:

* https://github.com/polymerdao/plonky2-circom
For a Cosmos chain, you might just want to write a module that runs the existing plonky2 verifier. For Solana, you'll have to split it up into many transactions, and you'll probably want to add a syscall for SHAKE operations (generating a lot of entropy by "squeezing" a really long message out of a sponge).

### Various source material

* Keccak and Sponge Construction: https://keccak.team/sponge_duplex.html
* RLP Encoding: https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/
* Goldilocks-native elliptic curve: https://github.com/pornin/ecgfp5
* Hash-to-curve: https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hash-to-curve-07