A fast-forwardable Witnet superblock chain without introducing any new cryptographic primitives (approach #2)

# A fast-forwardable Witnet superblock chain without introducing any new cryptographic primitives (approach #2) > This is a follow up to [this other document](https://hackmd.io/uUbOoiurT6asFXuhk32GMQ) outlining a mechanism for fast-forwarding a Witnet superblock chain using polynomial vector commitments and Verkle trees. ## Abstract Mechanisms for fast-forwarding a Witnet superblock chain have been proposed to use novel cryptographic primitives such as vector commitments and Verkle trees. This document tries to explore the feasibility of constructing a similar mechanism solely using well-proven primitives that already exist in the Witnet protocol, namely, Merkle trees. ## Constraints The constraints for this analysis are the same as for the former document using Verkle trees, albeit this adds another one: - No new cryptographic primitive should be introduced. ## Welcome merkle addresses To achieve the goals of this proposal, the concept of *merkle addresses* is introduced. Merkle addresses build on the idea of merkle indexes, i.e. the position of a leaf node inside the deepest level of a perfectly balanced merkle tree. However, we will be using unbalanced trees, and hence need a system for addressing leaf nodes that may exist not only in the deepest level of a merkle tree but also on other intermediary levels, in a sort of 2-dimensional way. While merkle indexes are static (appending new leaves to the tree will not change the index of existing leaves), merkle addresses can change over time as the addition of new leaves grow the number of levels of the tree. This happens because addresses encode what is the path from an arbitrary tree node to the tree root. As the addition of leaf nodes can make the root "climb" to a new level, the paths encoded in addresses do change accordingly. ### Bidimensional encoding of merkle addresses In their most straightforward and readable format, merkle addresses can be expressed as a 2-item tuple in which the first item is the level in which a node is (as counted from the tree root, being `0` the 2-items first layer below the tree root); and the position that it occupies inside that level (following the direction of merkle concatenation). ``` BAddress(n) = (Layer(n), Index(n)) ``` ### Compact encoding of merkle addresses When working in code, merkle addresses can be expressed in a compact numeric / binary form by adding `Index(n)` to `2^Layer(n)`: ``` CAddress(n) = 2^Layer(n) + Index(n) ``` This will result in a number whose binary representation immediately tells us about the path from the root to the node (whose address is `0`). For example, merkle address `7` (`0b111`) tells us that from the root to that leaf, we need to go twice to the right (`0` is left, `1` is right). As another example, merkle address `20` (`0b10100`) tells to go left, right, left, left. As a consequence, the number of significant bits in this representation of the merkle address is always `Level(n) + 1`, as the first bit is always a `1`, used to mark where the significant bits actually start. Decoding a merkle address into its bidimensional form is equally straightforward: ``` Layer(n) = floor(log2(CAddress)) Inex(n) = Caddress - 2^Layer(n) ``` Hence merkle address `7` translates to `(2, 3)`, and `20` translates to `(4, 4)`. ## Merkle tree engrafting > In Stampery we called this "stitching". This technique allows to merge together pre-computed merkle trees into a bigger meta-tree, and deriving merkle indexes on the fly. In the same way, when we "engraft" a tree into another one, we are also able to efficiently derive updated merkle addresses for each of the nodes in the engrafted tree, based in the position that its root will take in the receiving tree. Given a pre-built tree, we can always engraft it into another tree without having to actually nest those two data structures by inserting the root of the tree being engrafted as a leaf into the receiving tree. As a consequence, may the root of the engrafted tree be inserted as a leaf at merkle address `(A, B)` in the receiving tree, a leaf from the engrafted tree that used to occupy address `(C, D)` will now be found at `(A + C, B * C ^ 2 + D)` in the receiving tree. ## Data structures and identifiers For making this fast-forwarding protocol possible, superblocks and their identifiers need to be constructed in a particular way. Namely, what really matters is how a superblock points to older superblocks and to the blocks contained in itself. ### Superblocks The content of a superblock includes the identifier of the 10 blocks that it confirms, plus the merkleization of the identities in its voting committee: | Merkle address | Name | Description | | -------------- | ----- | ---------------------------------------------------------- | | 7 | `c` | The identifier for the signing committee of superblock `n` | | 16 | `b_0` | The block identifier for epoch `n*10` | | 17 | `b_1` | The block identifier for epoch `n*10+1` | | 18 | `b_2` | The block identifier for epoch `n*10+2` | | 19 | `b_3` | The block identifier for epoch `n*10+3` | | 20 | `b_4` | The block identifier for epoch `n*10+4` | | 21 | `b_5` | The block identifier for epoch `n*10+5` | | 22 | `b_6` | The block identifier for epoch `n*10+6` | | 23 | `b_7` | The block identifier for epoch `n*10+7` | | 13 | `b_8` | The block identifier for epoch `n*10+8` | | 14 | `b_9` | The block identifier for epoch `n*10+9` | To derive the identifier for a superblock (`R_n`), a merkle tree is constructed replicating the merkle addresses above: | `b_0` | `b_1` | `b_2` | `b_3` | `b_4` | `b_5` | `b_6` | `b_7` | | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | | `b_0·b_1` | `b_2·b_3` | `b_4·b_5` | `b_6·b_7` | `b_8` | `b_9` | | --------- | --------- | --------- | --------- | ----- | ----- | | `(b_0·b_1)·(b_2·b_3)` | `(b_4·b_5)·(b_6·b_7)` | `b_8·b_9` | `c` | | --------------------- | --------------------- | --------- | --- | | `((b_0·b_1)·(b_2·b_3))·((b_4·b_5)·(b_6·b_7))` | `(b_8·b_9)·c` | | --------------------------------------------- | ------------- | | `R_n = (((b_0·b_1)·(b_2·b_3))·((b_4·b_5)·(b_6·b_7)))·((b_8·b_9)·c)` | | ------------------------------------------------------------------- | When using merkle trees in their *bubble-up* flavor (if a leaf has not a sibling to hash with, promote it to the next level towards the root), `committee`, `b_8` and `b_9` can take multiple mekle addresses, but its smalles value is always preferred. For example, `b_8` could also take address `24`. ### Superchain state One key feature of this protocol is the merkleization of the superblock chain (aka *superchain*) into a single root `S_n` that is a cryptographic commitment of every superblock and block that has existed prior to superepoc `n`. #### Genesis superchain For any given superblock `R_n`, the superchain state `S_n` commits to every superblock with a superepoch number that differs in a natural power of two. | Merkle address | Name | Description | | -------------- | ------------ | ------------------------------------------ | | 15 | `R_n` | The identifier for the superblock `n` | | 16 | `S_(n-1024)` | Superchain state for superepoch `n - 1024` | | 17 | `S_(n-512)` | Superchain state for superepoch `n - 512` | | 18 | `S_(n-256)` | Superchain state for superepoch `n - 256` | | 19 | `S_(n-128)` | Superchain state for superepoch `n - 128` | | 20 | `S_(n-64)` | Superchain state for superepoch `n - 64` | | 21 | `S_(n-32)` | Superchain state for superepoch `n - 32` | | 22 | `S_(n-16)` | Superchain state for superepoch `n - 16` | | 23 | `S_(n-8)` | Superchain state for superepoch `n - 8` | | 12 | `S_(n-4)` | Superchain state for superepoch `n - 4` | | 13 | `S_(n-2)` | Superchain state for superepoch `n - 2` | | 14 | `S_(n-1)` | Superchain state for superepoch `n - 1` | Because of the nested nature of this merkleization, all prior superchain states are contained within the superchain tree. That is, you can construct a merkle tree that connects `S_n` to any former superchain state in a few "jumps". E.g. `S_(n-200)` is contained as `S_(n-8)` within `S_(n-192)`, which in turn is `S_(n-64)` within `S_(n-128)`, which is finally part of `S_n`. Hence, `S_n` commits to `S_(n-200)` with only 13 hashing operations. ### Signing committee identifiers The identities in signing committees need to be merkleized in a similar way to how the superblocks themselves are mekleized. That is, they need to be added as leaves in the deepest level of an unbalanced merkle tree with 7 levels (as `ceil(log2(100)` equals 7). The root of that tree, `C_n` will be the signing committe identifier for superepoch `n`. Hence, if using a bubble-up merkle tree, the merkle addresses for identities `0..63` will be encoded with level `6`, while identities `64..99` can be more efficiently encoded with level `5`. ## Superblock endorsements Superblock endoresements need to operate in this proposal just in the same way they do in the former [Verkle trees powered](https://hackmd.io/uUbOoiurT6asFXuhk32GMQ) proposal. That is, members of past superblock voting comittees are expected to produce and broadcast signatures—called *endorsements* in this context—for some specific future superblocks. While those signatures are not counted for the in-Witnet superblock consensus, and rather only for superchain fast-forwarding, it remains to be explored if we could also use them as a recovery mechanism in cases of huge ARS evictions and other black swan events. However, unlike the former proposal, the *powers of two* requirement is not compulsory here. Any other committee recycling period will work just as good, with the only trade-off of latency in case that the amount of superblocks that need to be fast-forwarded cannot be factorized with the chosen periods. Therefore, it is still recommended to implement it using powers of two, as any integer number can be factorized as the addition of a non-repeating sequence of those (which is exactly how you convert any integer number into binary). ### Degrees of endorsements For better comprehension, here we introduce the concept of degrees of endorsement. An `Nth` degree endorsement of some superchain state `S_n` is that produced by an identity belonging to the superblock voting committee of `R_(n-2^N)`. That is, a 1st degree endorsement is produced by an identity from 2 superblocks back, a 2nd degree endorsement is 4 superblocks old, and so forth. So imagine that we need to prove superchain state `S_7` to a client whose latest confirmed superchain state is `S_3`. As `S_3` contains a commitment to `C_3`, and `S_7` also commits to `S_3` and every superchain state in between, we can proof `S_7` provided that the identities in `C_3` are producing 2nd degree endorsements of `S_7`. In a more complex scenario in which for example we want to go from `S_3` to `S_9`, as `S_9` commits to `S_7`, we can use the 2nd degree endorsements of `S_7` from the identities in `C_3` to advance to `S_7`, and potentially in the same trasanction, use the 1st degree endorsements of `S_9` from the identities in `C_7` In order to limit the bandwith taken by endorsements, there needs to be a maximum number of degrees of endorsement. While the former [Verkle trees powered](https://hackmd.io/uUbOoiurT6asFXuhk32GMQ) proposal used 16 of them, it is here suggested to use only 10, which allow for 5-days *single-jump* and 10 days *multi-jump* fast forwarding, while keeping the in-memory data structures more manageable. May a longer fast-forward be needed, it can still be performed in several stances or transactions. ### Aggregation of endorsements Taking into account the limitations of the clients validating proofs and endorsements generated by this protocol (e.g. an EVM smart contract), endorsements should use a signature scheme that allows aggregation. As in all former proposals, it is suggested to use BLS for that purpose. With BLS, the size of the element of proof that needs to be validated remains constant, regardless of the number of individual endorsements that we are aggregating. However, in order to validate the aggregate endorsement, the BLS public keys of the endorsers need to be known. The disclosure of those public keys is indeed not aggregable (because of *existential forgery*), and its size is actually linear to the number of endorsers. Luckily, we can leverage the clever math behind BLS, and instead of providing the public keys of the endorsers (between 67 and 100), we can choose to provide those of the non-endorsers (between 0 and 33) and then perform the validation using the complementary of the aggregated BLS public key of those non-endorsers (which happen to equal the aggregated BLS public key of the actual endorsers). Under the assumption that the failure rate for endorsements will be only slightly superior to that of superblock votes, most often we would only need to disclose some ~5 public keys. ### Endorsement algorithm Potential endorsing identities / superblock voters need at all times to keep track of `S_n` for the last 1024 superblocks (assuming that the degrees of endorsements are limited to 10). At any superepoch `n`, all identities MUST check whether they were members of one or more committees `C_(n-2^x)` for all `x` in `[0, 10]`. (Note that this also includes the preceding supererpoch `n-1`, as there is no need to handle superblock votes and superblock endorsements differently at this point.) May they be members of any of those committee, they MUST: 1. Construct `S_n` as defined above. 2. Derive `M_n` by appending the big-endian byte representation of the endorsement degree and `S_n`. 3. Produce a BLS signature of `M_n`, `E_n`. 4. Broadcast `E_n` in a `SUPERBLOCK_VOTE` message. This endorsement procedure is reasonabily cheap, computation wise—as only 11 hash operations are needed to produce an endorsement. ## Merkle multiproofs There is now significant prior art on the topic of proving multiple leaves in a merkle tree using the most compact proof possible. That is, presenting the smallest number of nodes from the tree that act as "siblings" of the leaves that we are trying to prove. It is worth mentioning [Champine](https://gitlab.com/NebulousLabs/merkletree/-/commit/bae5b547495f9dddbd4ddeecfdcb00cb89d99d76), [Wuille](https://github.com/bitcoin/bitcoin/commit/4bedfa9223d38bbc322d19e28ca03417c216700b), [McDonald](https://www.wealdtech.com/articles/understanding-sparse-merkle-multiproofs/) and specially, [Ramabaja et. al](https://arxiv.org/pdf/2002.07648.pdf). Merkle multiproofs have even been recently implemented in the famous Solidity [OpenZeppelin](https://github.com/OpenZeppelin/openzeppelin-contracts/blob/82a63f63890d8b136478158ca12346d396154855/contracts/utils/cryptography/MerkleProof.sol#L59-L64) library. Merkle multiproofs are perfect for the use case of fast forwarding, specially when going from a superchain state to another in multiple "jumps" because the difference in superepochs between those two is not a power of two. However, [Ramabaja et. al](https://arxiv.org/pdf/2002.07648.pdf) operates on the assumption that all merkle trees are perfectly balanced, and that all leaves to be proven lay in the deepest level of the tree. This is not a big deal thanks to merkle addresses. In our version of merkle multiproofs, instead of presenting the merkle indexes of the sibling / complementary nodes, we just need to present the merkle addresses, which do not take much more bytespace (arguibly, they can be encoded into a single byte for trees of up to 8 levels, 2 bytes for 16 levels, etc.). Then the obvious question is what nodes exactly need to be proven for convincing a client that a subsequent superchain state is legitimate, given a previously validated superchain state that predates the one to be proven. ## Proving and validating algorithm Foreseeably, the sizes of the proofs—and the computation cost of their validation—are going to be mostly linear with the number of "jumps", i.e. whether the superepoch difference between a pre-validated superchain state and the target state is itself a power of 2, or on the contrary, factorization is needed to turn that into the addition of the lesser number of powers of 2. For the sake of consistency, it is in theory possible—and advisable—to implement the single-jump and the muti-jaup fast-forwarding procedures through only one method. However, may a client incurr in a non-nengible additional gas cost for performing a single-jump through the same method as multi-jumps, the creation of a single-jump dedicated method is recommended. ### Single jump case The *single jump* case is that in which we want to fast-forward from a proven superchain state `S_n` to a subsequent state `S_m = S_(n + 2 ^ x)`. That is, the difference between the two superepochs is a natural power of two, and hence, we can collect endorsements for `S_m` from the committeee of `S_n`. #### Facts to be proven - Superblock identifier `R_m` MUST exist inside `S_m` at merkle address `(3, 8) = 15`. - Committee identifier `C_n` MUST exist inside `S_m` at a specific merkle address[^1] depending on how many superepochs we are fast-forwarding: - For `m = n + 1`, merkle address `(3,6)(3,7)(2,3) = (8,247) = 503`. - For `m = n + 2`, merkle address `(3,5)(3,7)(2,3) = (8,211) = 467`. - For `m = n + 4`, merkle address `(3,4)(3,7)(2,3) = (8,175) = 431`. - For `m = n + 8`, merkle address `(4,7)(3,7)(2,3) = (9,283) = 795`. - For `m = n + 16`, merkle address `(4,6)(3,7)(2,3) = (9,247) = 759`. - For `m = n + 32`, merkle address `(4,5)(3,7)(2,3) = (9,211) = 723`. - For `m = n + 64`, merkle address `(4,4)(3,7)(2,3) = (9,175) = 687`. - For `m = n + 128`, merkle address `(4,3)(3,7)(2,3) = (9,139) = 651`. - For `m = n + 256`, merkle address `(4,2)(3,7)(2,3) = (9,103) = 615`. - For `m = n + 512`, merkle address `(4,1)(3,7)(2,3) = (9,63) = 579`. - For `m = n + 1024`, merkle address `(4,0)(3,7)(2,3) = (9,31) = 543`. - Each of the non-endorsing public keys MUST exist inside `C_n` at any merkle address, i.e. they MUST exist in `S_m` at an address within range `[C_n * 128, C_n * 128 + 100)`. - An aggregated endorsement `ΣE_n_m` is valid for the complementary of the aggregated public key `ΣK_n` of the non-endorsing identities `K_n[]` from `C_n`. > [^1] The merkle address of `C_n` is always the merkle address of the superchain identifier of `S_n` inside `S_m`, times `32`, plus `31`. #### Elements of proof Given a capable client of this protocol, the *elements of proof* are pieces of data that are not previously known to the client, and that we need to provide in order to convince it of fast-forwarding from `S_n` to `S_m`, getting `R_m` verified in its way. - An already validated superepoch, `n` - The target superepoch, `m` - The identifier of the superblock to be proven, `R_m` - The aggregated endorsement of `S_m` from identities in `C_n`, `ΣE_n_m`. - The public keys of the non-endorsing identities from `C_n`, `K_n[]`. - A merkle multiproof `P_n_m` that proves that `S_m` commits to `R_m` and `K_n[]`. #### Proving algorithm 1. Derive `S_m` as described above. 2. Collect and aggregate `ΣE_n_m`. 3. Collect `K_n[]`. 4. Compute `P_n_m`. 5. Present all these pieces to the verifying client. > TODO: elaborate on how to compute the multiproof and what merkle addresses actually need to be provided (as some of them are static or can easily be derived from `n` or `m`) The merkle multiproof `P_n_m` is composed of: - A vector containing the merkle address of each `K_n[]` within `S_m`. - A vector of merkle sibling nodes, equivalent to the `M` term in [Ramabaja et. al](https://arxiv.org/pdf/2002.07648.pdf) #### Validation algorithm The validation process goes like this: 1. Aggregate `K_n[]` into `ΣK_n`. 2. Compute the complementary of `ΣK_n`, aka `Σ!K_n`. 3. Derive `M_m` from `S_m` and as described above (read *endorsement algorithm*), using `log2(m - n)` as the endorsement degree. 4. Verify that `ΣE_n_m` is a valid BLS signature for message `M_m` and public key `Σ!K_n`. 5. Use the provided multiproof to verify that all keys in `K_n[]` exist in `C_n`, and that `S_m` also commits to `R_m`. ### Multi-jump case The *multi-jump* case is that in which we want to fast-forward from a proven superchain state `S_n` to a subsequent state `S_m` such that `floor(log2(m-n)) != log2(m-n)` . That is, the difference between the two superepochs is NOT a natural power of two, and hence, we CANNOT collect endorsements for `S_m` from the committeee of `S_n`. When *multijumping*, the minimum number of jumps is 2 and the maximum is 10. While a single-jump fast-forwarding transaction can only fast-forward up to 5 days and 8 hours, a 10-jumps multi-jump can fast-forward twice as long: up to a total of 10 days and 16 hours. This multi-jump mechanism is made possible by the fact that `S_m` is itself the root of a merkle tree in which the root of other merkle trees is engrafted. Therefore, the membership of identities to signing committees from multiple intermediary superepochs can be efficiently proven at once by using a single merkle multiproof does not necessarily grow linearly with the number of jumps𠅅—as most of the nodes in the merkle tree that leads to `S_m` are actually shared by the subtrees that are engrafted into it. However, as mentioned before, a different instance of `K_n[]` is needed for each of the jumps; and no aggregation whatsoever is possible because of concerns of signature forgery (abusing the math behind BLS to factor in some made-up "negative" keys that will cancel out or replace other legitimate keys). #### Elements of proof Again, the *elements of proof* are pieces of data that are not previously known to a client, and that we need to provide in order to convince it of fast-forwarding from `S_n` to `S_m`, getting one or more`R_m` verified in its way. ## High-level changes - The message to be signed in superblock votes and endorsements MUST include a first byte indicating the degree of endorsement (being `0x00` a regular superblock vote, `0x01` the 1st degree of endorsement, etc. up until `0x10`) - Superblock votes MUST be signed using BLS instead of `ECDSA-Secp256k1` (or both as in [WIP0006](https://github.com/witnet/WIPs/blob/master/wip-0006.md)) - Merkleization of superblocks and superchain states SHOULD use `keccak256` for cheaper gas cost - Needs MUST store and persist a circular buffer of 1024 superchain states - Nodes MUST compute endorsement eligibility at the beginning of every superepoch, produce endorsements and broadcast them as `SUPERBLOCK_VOTE` messages - Nodes MUST feed endorsement messages through JSONRPC subscriptions - A new bridge component MUST be created such that it does: - collect, store and persist endorsement messages by subscribing to a node - monitor instances of the block relay contract and determine when a fast-forward is needed (remember: moving 1 superepoch forward is also a fast-forward) - aggregate `ΣE_n_m` and compose the fast-forwarding proof - present the proof to the block relay contract