A fast-forwardable Witnet superblock chain without introducing any new cryptographic primitives (approach #1)

# A fast-forwardable Witnet superblock chain without introducing any new cryptographic primitives (approach #1) > This is a follow up to [this other document](https://hackmd.io/uUbOoiurT6asFXuhk32GMQ) outlining a mechanism for fast-forwarding a Witnet superblock chain using polynomial vector commitments and Verkle trees. ## Abstract Mechanisms for fast-forwarding a Witnet superblock chain have been proposed to use novel cryptographic primitives such as vector commitments and Verkle trees. This document tries to explore the feasibility of constructing a similar mechanism solely using well-proven primitives that already exist in the Witnet protocol, namely, Merkle trees. ## Constraints The constraints for this analysis are the same as for the former document using Verkle trees, albeit this adds another one: - No new cryptographic primitive should be introduced. ## Welcome merkle addresses To achieve the goals of this proposal, the concept of *merkle addresses* is introduced. Merkle addresses build on the idea of merkle indexes, i.e. the position of a leaf node inside the deepest level of a perfectly balanced merkle tree. However, we will be using unbalanced trees, and hence need a system for addressing leaf nodes that may exist not only in the deepest level of a merkle tree but also on other intermediary levels, in a sort of 2-dimensional way. While merkle indexes are static (appending new leaves to the tree will not change the index of existing leaves), merkle addresses can change over time as the addition of new leaves grow the number of levels of the tree. This happens because addresses encode what is the path from an arbitrary tree node to the tree root. As the addition of leaf nodes can make the root "climb" to a new level, the paths encoded in addresses do change accordingly. ### Bidimensional encoding of merkle addresses In their most straightforward and readable format, merkle addresses can be expressed as a 2-item tuple in which the first item is the level in which a node is (as counted from the tree root, being `0` the 2-items first layer below the tree root); and the position that it occupies inside that level (following the direction of merkle concatenation). ``` BAddress(n) = (Layer(n), Index(n)) ``` ### Compact encoding of merkle addresses When working in code, merkle addresses can be expressed in a compact numeric / binary form by adding `Index(n)` to `2^Layer(n)`: ``` CAddress(n) = 2^Layer(n) + Index(n) ``` This will result in a number whose binary representation immediately tells us about the path from the root to the node (whose address is `0`). For example, merkle address `7` (`0b111`) tells us that from the root to that leaf, we need to go twice to the right (`0` is left, `1` is right). As another example, merkle address `20` (`0b10100`) tells to go left, right, left, left. As a consequence, the number of significant bits in this representation of the merkle address is always `Level(n) + 1`, as the first bit is always a `1`, used to mark where the significant bits actually start. Decoding a merkle address into its bidimensional form is equally straightforward: ``` Layer(n) = floor(log2(CAddress)) Inex(n) = Caddress - 2^Layer(n) ``` Hence merkle address `7` translates to `(2, 3)`, and `20` translates to `(4, 4)`. ### Merkle tree engrafting TO BE DEVELOPED. > Some notes: in Stampery we called this "stitching". This technique allows to merge together pre-computed merkle trees into a bigger meta-tree, and deriving merkle indexes on the fly. In the same way, we should also be able to derive updated merkle addresses when stitching trees). ## Data structures and identifiers For making this construct possible, superblocks and their identifiers need to be constructed in a particular way. Namely, what really matters is how a superblock points to older superblocks and to the blocks contained in itself. ### Superblocks The content of a superblock includes the identifier of the 10 blocks that it confirms, plus the merkleization of the identities in its voting committee: | Merkle address | Name | Description | | -------------- | ----- | ---------------------------------------------------------- | | 7 | `c` | The identifier for the signing committee of superblock `n` | | 16 | `b_0` | The block identifier for epoch `n*10` | | 17 | `b_1` | The block identifier for epoch `n*10+1` | | 18 | `b_2` | The block identifier for epoch `n*10+2` | | 19 | `b_3` | The block identifier for epoch `n*10+3` | | 20 | `b_4` | The block identifier for epoch `n*10+4` | | 21 | `b_5` | The block identifier for epoch `n*10+5` | | 22 | `b_6` | The block identifier for epoch `n*10+6` | | 23 | `b_7` | The block identifier for epoch `n*10+7` | | 13 | `b_8` | The block identifier for epoch `n*10+8` | | 14 | `b_9` | The block identifier for epoch `n*10+9` | To derive the identifier for a superblock (`R_n`), a merkle tree is constructed replicating the merkle addresses above: | `b_0` | `b_1` | `b_2` | `b_3` | `b_4` | `b_5` | `b_6` | `b_7` | | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | | `b_0·b_1` | `b_2·b_3` | `b_4·b_5` | `b_6·b_7` | `b_8` | `b_9` | | --------- | --------- | --------- | --------- | ----- | ----- | | `(b_0·b_1)·(b_2·b_3)` | `(b_4·b_5)·(b_6·b_7)` | `b_8·b_9` | `c` | | --------------------- | --------------------- | --------- | --- | | `((b_0·b_1)·(b_2·b_3))·((b_4·b_5)·(b_6·b_7))` | `(b_8·b_9)·c` | | --------------------------------------------- | ------------- | | `R_n = (((b_0·b_1)·(b_2·b_3))·((b_4·b_5)·(b_6·b_7)))·((b_8·b_9)·c)` | | ------------------------------------------------------------------- | When using merkle trees in their *bubble-up* flavor (if a leaf has not a sibling to hash with, promote it to the next level towards the root), `committee`, `b_8` and `b_9` can take multiple mekle addresses, but its smalles value is always preferred. For example, `b_8` could also take address `24`. ### Superchain state One key feature of this proposal is the merkleization of the superblock chain (aka *superchain*) into a single root `S_n` that is a cryptographic commitment of every superblock and block that has existed prior to superepoc `n`. #### Genesis superchain Given a genesis superblock `R_0`, the superchain state `S_0` equals `R_0`. #### Appending to the superchain Given a first genesis superblock `R_0`, and a subsequent superblock `R_1`, the superchain state `S_1` will equal the merkleization of `R_0` and `R_1`. #### Generalization For any given superblock `R_n`, the superchain state `S_n` will equal the merkleization of every prior superblock. For example, `S_9` equals `(((R_0·R_1)·(R_2·R_3))·((R_4·R_5)·(R_6·R_7)))·(R_8·R_9)`. Because of the nested nature of this merkleization, all prior superchain states are contained within the superchain tree, and hence, `S_9` also equals `S_7·(R_8·R_9)`. ### Signing committee identifiers The identities in signing committees need to be merkleized in a similar way to how the superblocks themselves are mekleized. That is, they need to be added as leaves in the deepest level of an unbalanced merkle tree with 7 levels (as `ceil(log2(100)` equals 7). The root of that tree, `C_n` will be the signing committe identifier for superepoch `n`. Hence, if using a bubble-up merkle tree, the merkle addresses for identities `0..63` will be encoded with level `6`, while identities `64..99` can be more efficiently encoded with level `5`. ## Superblock endorsements Superblock endoresements need to operate in this proposal just in the same way they do in the former [Verkle trees powered](https://hackmd.io/uUbOoiurT6asFXuhk32GMQ) proposal. That is, members of past superblock voting comittees are expected to produce and broadcast signatures—called *endorsements* in this context—for some specific future superblocks. While those signatures are not counted for the in-Witnet superblock consensus, and rather only for superchain fast-forwarding, it remains to be explored if we could also use them as a recovery mechanism in cases of huge ARS evictions and other black swan events. However, unlike the former proposal, the *powers of two* requirement is not compulsory here. Any other committee recycling period will work just as good, with the only trade-off of latency in case that the amount of superblocks that need to be fast-forwarded cannot be factorized with the chosen periods. Therefore, it is still recommended to implement it using powers of two, as any integer number can be factorized as the addition of a non-repeating sequence of those (which is exactly how you convert any integer number into binary). ### Degrees of endorsements For better comprehension, here we introduce the concept of degrees of endorsement. An `Nth` degree endorsement of some superchain state `S_n` is that produced by an identity belonging to the superblock voting committee of `R_(n-2^N)`. That is, a 1st degree endorsement is produced by an identity from 2 superblocks back, a 2nd degree endorsement is 4 superblocks old, and so forth. So imagine that we need to prove superchain state `S_7` to a client whose latest confirmed superchain state is `S_3`. As `S_3` contains a commitment to `C_3`, and `S_7` also commits to `S_3` and every superchain state in between, we can proof `S_7` provided that the identities in `C_3` are producing 2nd degree endorsements of `S_7`. In a more complex scenario in which for example we want to go from `S_3` to `S_9`, as `S_9` commits to `S_7`, we can use the 2nd degree endorsements of `S_7` from the identities in `C_3` to advance to `S_7`, and potentially in the same trasanction, use the 1st degree endorsements of `S_9` from the identities in `C_7` In order to limit the bandwith taken by endorsements, there needs to be a maximum number of degrees of endorsement. While the former [Verkle trees powered](https://hackmd.io/uUbOoiurT6asFXuhk32GMQ) proposal used 16 of them, it is here suggested to use only 10, which allow for 5-days single-step fast forwarding, while keeping the in-memory data structures more manageable. ### Aggregation of endorsements Taking into account the limitations of the clients validating proofs and endorsements generated by this protocol (e.g. an EVM smart contract), endorsements should use a signature scheme that allows aggregation. As in all former proposals, it is suggested to use BLS for that purpose. With BLS, the size of the element of proof that needs to be validated remains constant, regardless of the number of individual endorsements that we are aggregating. However, in order to validate the aggregate endorsement, the BLS public keys of the endorsers need to be known. The disclosure of those public keys is indeed not aggregable (because of *existential forgery*), and its size is actually linear to the number of endorsers. Luckily, we can leverage the clever math behind BLS, and instead of providing the public keys of the endorsers (between 67 and 100), we can choose to provide those of the non-endorsers (between 0 and 33) and then perform the validation using the complementary of the aggregated BLS public key of those non-endorsers (which happen to equal the aggregated BLS public key of the actual endorsers). Under the assumption that the failure rate for endorsements will be only slightly superior to that of superblock votes, most often we would only need to disclose some ~5 public keys. ## Merkle multiproofs There is now significant prior art on the topic of proving multiple leaves in a merkle tree using the most compact proof possible. That is, presenting the smallest number of nodes from the tree that act as "siblings" of the leaves that we are trying to prove. It is worth mentioning [Champine](https://gitlab.com/NebulousLabs/merkletree/-/commit/bae5b547495f9dddbd4ddeecfdcb00cb89d99d76), [Wuille](https://github.com/bitcoin/bitcoin/commit/4bedfa9223d38bbc322d19e28ca03417c216700b), [McDonald](https://www.wealdtech.com/articles/understanding-sparse-merkle-multiproofs/) and specially, [Ramabaja et. al](https://arxiv.org/pdf/2002.07648.pdf). Merkle multiproofs have even been recently implemented in the famous Solidity [OpenZeppelin](https://github.com/OpenZeppelin/openzeppelin-contracts/blob/82a63f63890d8b136478158ca12346d396154855/contracts/utils/cryptography/MerkleProof.sol#L59-L64) library. Merkle multiproofs are perfect for the use case of fast forwarding, specially when going from a superchain state to another in multiple "jumps" because the difference in superepochs between those two is not a power of two. However, [Ramabaja et. al](https://arxiv.org/pdf/2002.07648.pdf) operates on the assumption that all merkle trees are perfectly balanced, and that all leaves to be proven lay in the deepest level of the tree. This is not a big deal thanks to merkle addresses. In our version of merkle multiproofs, instead of presenting the merkle indexes of the sibling / complementary nodes, we just need to present the merkle addresses, which do not take much more bytespace (arguibly, they can be encoded into a single byte for trees of up to 8 levels, 2 bytes for 16 levels, etc.). Then the obvious question is what nodes exactly need to be proven for convincing a client that a subsequent superchain state is legitimate, given a previously validated superchain state that predates the one to be proven. ## Proving and validating algorithm Foreseeably, the sizes of the proofs—and the computation cost of their validation—are going to be mostly linear with the number of "jumps", i.e. whether the superepoch difference between a pre-validated superchain state and the target state is itself a power of 2, or on the contrary, factorization is needed to turn that into the addition of the lesser number of powers of 2. ### Single jump case The *single jump* case is that in which we want to fast-forward from a proven superchain state `S_n` to a subsequent state `S_m = S_(n + 2 ^ x)`. That is, the difference between the two superepochs is a natural power of two, and hence, we can collect endorsements for `S_m` from the committeee of `S_n`. #### Facts to be proven The facts to be proven are: - Superchain state dentifier `S_m` MUST be the root of the merkle tree. - Superblock identifier `R_n` MUST exist in the merkle tree. - Committee identifier `C_n` MUST exist in the merkle tree. - Position of all the previous elements in the tree MUST be as expected, to avoid some potential malleability attacks (yet to be confirmed). As you can see, `S_n` is not proved, because `S_n` is disposable for all `n != 2 ^ x - 1`. #### Elements of proof Given a capable client of this protocol, the *elements of proof* are pieces of data that are not previously known to the client, and that we need to provide in order to convince it of fast-forwarding from `R_n` to `R_m` - `n` - `m` - `R_m` - `C_n` - TBC... #### Proving algorithm #### Elements to store