
# Merkle Trees

###### tags: `audit`

###### Authors: Suyash and Arijit

**Disclaimer:** This documentation was written in September 2021. It is intended to give readers a high-level understanding. The codebase is the canonical source of truth, and over time this document might fall behind the implementation details of the code.

## Introductory Theory

A Merkle tree provides a way to cryptographically commit to a set of values. Further, it allows proving membership of a particular value stored in the Merkle tree. A Merkle tree with depth $d$ can store up to $2^d$ values, and its membership proofs are of size $\mathcal{O}(d)$. We start by hashing each value in the set, and then keep hashing our way up the tree in pairs until we reach the root node.

Let us look at a simple example: suppose the set of values we wish to commit to is $(f_0, f_1, \dots, f_7)$ and $H, H'$ are collision-resistant hash functions. Note that $H$ and $H'$ could be the same hash function.

![](https://hackmd.io/_uploads/Bkp8a5fKF.png)

The leaves of the tree are shown in yellow boxes while the nodes are shown in blue circles. The depth of the tree is equal to the number of steps required to reach the root node from the leaves. Here, the depth is $d=3$.

:::success
💡 If each left and right node in a given pair is marked with $0$ and $1$ respectively, then reading the markers along the path from the root down to $f_4$ (shown as a dotted path) gives the binary number $0b100 \equiv 4$, which is exactly the index of the leaf $f_4$ in the tree!
:::

### Merkle Proofs of Membership

In the above example, the Merkle proof of inclusion of the value $\color{blue}{f_4}$ in the tree with root $R := h_0^{3}$ is given by the sibling nodes $\left(\color{red}{H(f_5)}, \color{red}{h_3^{1}}, \color{red}{h_0^{2}}\right)$ of $f_4$.
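To make the index bookkeeping concrete, here is a minimal C++ sketch (not Barretenberg code) that builds the depth-3 example tree, collects the sibling nodes of a leaf, and recomputes the root from them. Bit $i$ of the leaf index (least significant bit first) decides whether the running node is a left or a right child at level $i$. The functions `toy_leaf_hash` and `toy_compress` are hypothetical, non-cryptographic stand-ins for $H$ and $H'$ over 64-bit integers rather than $\mathbb{F}_r$. `toy_compress` is order-sensitive, so swapping left and right changes the result, which is the role the left/right distinction plays in the check below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, NON-cryptographic stand-ins for H and H' (64-bit integers instead of F_r).
// mix() is the splitmix64 finalizer, which is a bijection on 64-bit values.
uint64_t mix(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
uint64_t toy_leaf_hash(uint64_t v) { return mix(v ^ 0x243F6A8885A308D3ULL); }  // plays H
uint64_t toy_compress(uint64_t l, uint64_t r) { return mix(mix(l) ^ r); }       // plays H', order-sensitive

// levels[0] holds the hashed leaves and levels[d] holds only the root.
// Assumes the number of values is a power of two.
std::vector<std::vector<uint64_t>> build_levels(const std::vector<uint64_t>& values) {
    std::vector<std::vector<uint64_t>> levels(1);
    for (uint64_t v : values) levels[0].push_back(toy_leaf_hash(v));
    while (levels.back().size() > 1) {
        std::vector<uint64_t> next;
        for (size_t j = 0; j + 1 < levels.back().size(); j += 2)
            next.push_back(toy_compress(levels.back()[j], levels.back()[j + 1]));
        levels.push_back(std::move(next));
    }
    return levels;
}

// Sibling of the path node at each level, from the leaf upward (index 4: H(f5), h_3^1, h_0^2).
std::vector<uint64_t> sibling_path(const std::vector<std::vector<uint64_t>>& levels, size_t index) {
    std::vector<uint64_t> siblings;
    for (size_t i = 0; i + 1 < levels.size(); ++i, index >>= 1) siblings.push_back(levels[i][index ^ 1]);
    return siblings;
}

// Recomputes the root; bit i of index tells whether the level-i path node is a left (0) or right (1) child.
bool verify_proof(uint64_t root, uint64_t value, size_t index, const std::vector<uint64_t>& siblings) {
    uint64_t node = toy_leaf_hash(value);
    for (uint64_t s : siblings) {
        node = (index & 1) ? toy_compress(s, node) : toy_compress(node, s);
        index >>= 1;
    }
    return node == root;
}
```

For index 4 the bits read from the leaf upward are 0, 0, 1, so $f_4$'s running hash is the left input twice and the right input at the top. This matches the dotted path in the figure read from the bottom.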
To verify the correctness of this Merkle proof, one can check:

$$
R \stackrel{?}{=} H'\left( \color{red}{h_0^{2}}, \ H'\left( H'\left(H(\color{blue}{f_4}), \color{red}{H(f_5)}\right), \ \color{red}{h_3^{1}} \right) \right).
$$

Note that a Merkle proof *only* proves that a particular value is present as a leaf in the tree. To ensure that the leaf is present at the correct index, we need to keep a record of whether a given node is a left or a right node. We encode this left/right information in the generator indices of the Pedersen hash function $H'$. We define the hash function $H' : \mathbb{F}_r^{2} \rightarrow \mathbb{F}_r$ as

$$
H'(l,r) := (l\cdot G_0 + r\cdot G_1)_x
$$

where $(G_0, G_1) \in \mathbb{G}_1^2$ are group generators and $(P)_x \in \mathbb{F}_r$ denotes the $x$-coordinate of the group element $P \in \mathbb{G}_1$. In the above figure, if we trace the generator indices on the dotted path from the root to $f_4$, we get $(1,0,0) \in \mathbb{Z}_2^{3}$, which is $4$ in decimal: the index of $f_4$ in the tree!

#### Hash Paths in Aztec

In Aztec, we refer to Merkle proofs as hash paths. A hash path is a more explicit representation of a Merkle proof. For example, the hash path from the leaf $f_4$ to the root node is:

$$
\vec{h}(f_4) := \left\{ \left(H(f_4),H(f_5)\right), \ (h^1_2,h^1_3), \ (h^2_0,h^2_1) \right\} \in \mathbb{F}_r^{2d}.
$$

Note that $\vec{h}[0]$ stores the left and right (hashed) leaf values, while $\vec{h}[1]$ and $\vec{h}[2]$ store the node values at the subsequent levels.

:::success
💡 Sibling leaves (such as $f_4$ and $f_5$) have exactly the same hash path. Furthermore, given the root of a subtree in a Merkle tree, every leaf in the subtree has the same hash path starting from the subtree root! For example, in the following Merkle tree, consider the subtree with green leaves. For any green leaf, the hash path from the subtree root $S$ is always $\{(S, h_1), \ (h_2^0, h_2^1)\}$.
Also, if we update only the subtree, the nodes $(h_1, h_2^0)$ do not change.

![](https://hackmd.io/_uploads/H1TuTTEtF.png)
:::

### Sparse Merkle Trees and Non-Membership Proofs

Merkle trees are very convenient for proving membership, but proving *non-membership* is not trivial using simple Merkle trees. A sparse Merkle tree is a Merkle tree with indexed data: a particular value can be inserted only at a particular index in the tree. For example, consider the following tree, which can store the letters $(A, B, \dots, P)$ in its leaves at the same (zero-based) index at which they appear in the alphabet.

![](https://hackmd.io/_uploads/Sk8ntwJiO.png)

In the current state, only the letters $A, F, N$ have been inserted into the Merkle tree, at indices $0, 5$ and $13$ respectively. The rest of the leaf values are $\phi$ (null, shown in grey). Suppose we wish to check that $J$ has not already been inserted in the tree. In that case, a Merkle proof that the leaf at index $9$ is empty suffices! This is the non-membership proof of the letter $J$ in this example.

### Merkle Trees in Aztec

In Aztec 2.0, we use Merkle trees to store the state of the system.
Particularly, we use the data tree and the nullifier tree:

| | Data Tree | Nullifier Tree |
| --- | ----------- | ------------------ |
| **Type** | Merkle tree | Sparse Merkle tree |
| **Depth** | $32$ | $256$ |
| **Description** | Stores all the notes ever *created* in Aztec | Keeps a record of the notes in Aztec that have been *spent* |
| **Usage** | Membership proofs to prove ownership of notes before spending them | Non-membership proofs to prove that a note has not already been spent |
| **Updates** | Supports batch updates | Cannot support batch updates |

## Implementation

The Merkle tree implementation in Barretenberg's `stdlib` consists primarily of three parts:

- **Memory tree**: a preliminary Merkle tree implementation without optimizations,
- **Merkle tree**: an optimized Merkle tree implementation using levelDB and a memory store,
- **Memberships**: circuit-optimized verification of state updates of Merkle trees.

The memory tree and Merkle tree submodules are used by the rollup providers to update the data and nullifier trees and to compute membership and non-membership proofs (i.e. hash paths). The rollup and root rollup circuits use the memberships submodule to verify that the state was correctly updated.

### Memory Tree

In this section, we elaborate on `src/aztec/stdlib/merkle_tree/memory_tree.cpp`. The memory tree is a generic, auxiliary Merkle tree construction that does not use any database optimization such as levelDB. The `MemoryTree` class declaration is given below.

```cpp
class MemoryTree {
  public:
    MemoryTree(size_t depth);

    fr_hash_path get_hash_path(size_t index);

    fr update_element(size_t index, fr const& value);

    fr root() const { return root_; }

  private:
    size_t depth_;                         // depth d of the tree
    size_t total_size_;                    // number of leaves, 2^d
    barretenberg::fr root_;                // root, kept outside hashes_
    std::vector<barretenberg::fr> hashes_; // leaves and intermediate nodes
};
```

The leaves and the intermediate nodes of a memory tree are stored in the vector `hashes_` sequentially from left to right, bottom to top.
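The flat, bottom-to-top layout of `hashes_` can be sketched as below. This is a hypothetical, simplified analogue of `MemoryTree`, not the Barretenberg source. It uses 64-bit integers instead of `fr`, a toy non-cryptographic `toy_compress` in place of `compress_native`, and `UINT64_MAX` as a stand-in for the field element -1 that marks empty leaves. Level $i$ starts at offset $2^d + 2^{d-1} + \dots + 2^{d-i+1}$ in `hashes_` (offset 0 for the leaves), and the pair containing local index $j$ starts at $j$ with its lowest bit cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, NON-cryptographic stand-in for compress_native (64-bit values instead of fr).
uint64_t mix(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
uint64_t toy_compress(uint64_t l, uint64_t r) { return mix(mix(l) ^ r); }

constexpr uint64_t kEmptyLeaf = UINT64_MAX;  // stand-in for the field element -1
using ToyHashPath = std::vector<std::pair<uint64_t, uint64_t>>;

// Simplified analogue of MemoryTree (requires depth >= 1). hashes_ holds levels 0..d-1, each left to
// right, bottom level first: 2 * total_size_ - 2 entries. The root is kept separately in root_.
class ToyMemoryTree {
  public:
    explicit ToyMemoryTree(size_t depth)
        : depth_(depth), total_size_(size_t{1} << depth), hashes_(2 * total_size_ - 2, kEmptyLeaf) {
        size_t offset = 0;  // start of the current level inside hashes_
        for (size_t layer = total_size_; layer > 1; layer /= 2) {
            for (size_t j = 0; j < layer; j += 2) {
                uint64_t parent = toy_compress(hashes_[offset + j], hashes_[offset + j + 1]);
                if (layer == 2) root_ = parent;
                else hashes_[offset + layer + j / 2] = parent;  // parent level starts right after this one
            }
            offset += layer;
        }
    }

    ToyHashPath get_hash_path(size_t index) const {
        ToyHashPath path;
        size_t offset = 0;
        for (size_t layer = total_size_; layer > 1; layer /= 2) {
            size_t left = index & ~size_t{1};  // left member of the pair containing index
            path.emplace_back(hashes_[offset + left], hashes_[offset + left + 1]);
            offset += layer;
            index /= 2;
        }
        return path;
    }

    uint64_t update_element(size_t index, uint64_t value) {
        hashes_[index] = (value == 0) ? kEmptyLeaf : value;  // zero is stored as the empty marker
        size_t offset = 0;
        for (size_t layer = total_size_; layer > 1; layer /= 2) {
            size_t left = index & ~size_t{1};
            uint64_t parent = toy_compress(hashes_[offset + left], hashes_[offset + left + 1]);
            index /= 2;  // local index of the parent on the next level
            if (layer == 2) root_ = parent;
            else hashes_[offset + layer + index] = parent;
            offset += layer;
        }
        return root_;
    }

    uint64_t root() const { return root_; }
    size_t depth() const { return depth_; }
    const std::vector<uint64_t>& hashes() const { return hashes_; }

  private:
    size_t depth_;
    size_t total_size_;
    uint64_t root_ = 0;
    std::vector<uint64_t> hashes_;
};
```

With depth 2, `hashes()` has the six entries `[A, B, C, D, E, F]` and `get_hash_path(3)` returns `((C,D), (E,F))`, as in the figure. Updating a leaf only rehashes the $d$ nodes on its path.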
![](https://hackmd.io/_uploads/HkTO7LStK.png)

For example, the above Merkle tree of depth 2 is stored in the `hashes_` vector as follows:

```
hashes_ = [A, B, C, D, E, F].
```

Note that the root is public information and is not stored in the `hashes_` vector. Also note that a Merkle tree of depth $d$ has $2^d$ leaves (denoted by `total_size_` in the code), and the corresponding `hashes_` vector has `2 * total_size_ - 2` elements. (This is the sum of the series $2^d + 2^{d-1} + \ldots + 2$, which equals $2^{d+1} - 2$.)

* The constructor `MemoryTree::MemoryTree(size_t depth)` initialises all leaf values to -1 and computes each intermediate node as the hash of its two children. For example, node E in the above figure is computed as `compress_native(A, B)`.
* The method `MemoryTree::get_hash_path(size_t index)` returns the hash path of the leaf at `index`. For example, the hash path of D is returned as `((C,D), (E,F))`.
* The method `MemoryTree::update_element(size_t index, fr const& value)` updates the leaf at index `index` with value `value`. If `value = 0`, it stores -1 instead.

### Merkle Tree

In `merkle_tree/merkle_tree.hpp`, we define a template class `MerkleTree` which uses either a levelDB database or a memory store. It has the following member variables:

- `store_`: the underlying database,
- `zero_hashes_`: pre-computed roots of empty subtrees of every possible height,
- `depth_`: depth $d$ of the tree,
- `tree_id_`: identifier of the tree.

The Merkle tree is stored as a key-value dictionary, with levelDB used in the background. An example of how a depth-3 Merkle tree is stored is shown below. Note that each node acts as a *key* whose *value* field stores the pair of its child nodes.

![](https://hackmd.io/_uploads/SyEoNcmtF.png)

Here, $h^{i}_j$ denotes the value of the node at level $i$ and local index $j$.
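The key-value layout for a fully populated tree, and the top-down read of a hash path from it, can be sketched as follows. This is a minimal illustration under assumptions, not the `MerkleTree` code. It uses a `std::unordered_map` keyed by node value in place of levelDB, 64-bit integers instead of field elements, and a toy non-cryptographic `toy_compress`. All names are hypothetical. Each internal node maps to its (left, right) children, and at level $i$ the walk picks the left or right child according to bit $i$ of the leaf index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, NON-cryptographic stand-in for the 2-to-1 hash (64-bit values, not field elements).
uint64_t mix(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
uint64_t toy_compress(uint64_t l, uint64_t r) { return mix(mix(l) ^ r); }

using Children = std::pair<uint64_t, uint64_t>;
// Stand-in for the levelDB-backed store: key = node value, value = (left child, right child).
using ToyStore = std::unordered_map<uint64_t, Children>;

// Hashes a full layer of 2^d leaves upward, recording every internal node's children; returns the root.
uint64_t build_store(std::vector<uint64_t> layer, ToyStore& store) {
    while (layer.size() > 1) {
        std::vector<uint64_t> next;
        for (size_t j = 0; j + 1 < layer.size(); j += 2) {
            uint64_t parent = toy_compress(layer[j], layer[j + 1]);
            store[parent] = Children(layer[j], layer[j + 1]);
            next.push_back(parent);
        }
        layer = std::move(next);
    }
    return layer[0];
}

// Top-down read of the hash path of leaf `index` (assumes no hash collisions between internal nodes).
// path[i] is the pair of nodes at level i, so path[0] is the pair of leaves.
std::vector<Children> read_hash_path(const ToyStore& store, uint64_t root, size_t depth, size_t index) {
    std::vector<Children> path(depth);
    uint64_t node = root;
    for (size_t i = depth; i-- > 0;) {
        const Children& children = store.at(node);  // both children present (the "64-byte" case)
        path[i] = children;
        node = ((index >> i) & 1) ? children.second : children.first;
    }
    return path;
}
```

Because keys are node values, identical subtrees share one entry. This content addressing also lets a single store hold several versions of a tree.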
The above example assumes that all the leaves in the tree are filled. Suppose only the leaves at indices $0,1,2,3$ and $6$ are filled (grey nodes denote empty leaves).

![](https://hackmd.io/_uploads/BJb2xIoYK.png)

In this case, we don't need to store all the intermediate nodes. In the right subtree at height $i=2$, observe that only one leaf is filled, so all we need to store is that value and its index relative to the subtree. For instance, in the third row of the key-value table, we store $h_1^{2}: (h_6^0, \ 2, \ \texttt{true})$, where $2$ is the index of the value $h^0_6$ in the subtree marked with dotted lines and $\texttt{true}$ is a single byte indicating that it is a stump. This ensures that we use minimal memory for storing sparse Merkle trees. Another example, with two stumps in a tree, is shown below.

![](https://hackmd.io/_uploads/HybLlLBYK.png)

:::success
💡 If a subtree has only one leaf filled, it is referred to as a *stump*. Stumps are particularly useful when computing hash paths in a sparse Merkle tree.
:::

#### Computing Hash Paths in `MerkleTree`

For a depth-$d$ Merkle tree, we compute the hash path of the leaf at a given index using the following algorithm:

1. Start an iteration from the root $r$ of the tree. Set the current node $n=r$ and the current level $i=d-1$.
2. Read the two node values at level $i$: $\text{data} \leftarrow \text{read}(n)$. Here $\text{data}$ is a buffer storing the value read from the key-value map, and $\text{read}(k)$ is a function that reads the value for a key $k$.
3. If $len(\text{data}) = 64$, fill $\vec{h}[i] \leftarrow \{\text{data}[0], \text{data}[1]\}$. This case implies that both nodes at level $i$ are *non-empty*, i.e. each of the subtrees at level $i$ contains at least one non-empty leaf. Then descend: set $n$ to $\text{data}[0]$ or $\text{data}[1]$ according to bit $i$ of the leaf index, and decrement $i$.
4. If $len(\text{data}) = 65$, the subtree rooted at $n$ is a stump (it contains exactly one non-empty leaf).
   In this case, we reconstruct that subtree from the stored information:
   - the filled leaf value in the stump: $val = \text{data}[0]$,
   - the index of that leaf within the subtree: $j_{\text{subtree}} = \text{read}_{idx}(n)$.
5. If $len(\text{data}) = 0$, we have reached a level where both subtrees are completely empty. In this case, we use the pre-computed `zero_hashes_` to fill up the hash path.
6. Continue the iteration until $i=0$.

#### Updating an element in `MerkleTree`

To update the tree with root $r$ by writing the value $v$ at index $\text{idx}$, we use the following algorithm:

1. Start from the root $r$ of the tree. Set the current node $n=r$.
2. Read the data stored under the current node at level $i=d-1$: $\text{data} \leftarrow \text{read}(n)$.
3. If the tree is empty, i.e. $len(\text{data}) = 0$, we simply create a stump with $\text{key}=r$ and $\text{value}=(v, \text{idx}, \texttt{true})$.
4. If $len(\text{data}) = 65$, we have reached a stump node. From here on, there are two cases:
   - If $\text{idx} = \text{data}[1]$, we are updating the existing leaf, so this is easy: set $\text{data}[0] \leftarrow v$ and recompute the root of the new tree.
   - Otherwise, we need to fork the stump: this can create two smaller stumps in the same subtree, or the new value may end up in the leaf adjacent to the original non-empty leaf.
5. If $len(\text{data}) = 64$, we need to update a regular subtree. In this case, we recursively traverse down the tree until we reach the leaves, and then update the leaf at $\text{idx}$ with the new value $v$. The recursion ensures that the nodes further up the tree are correctly updated.

As explaining the above algorithms in words is difficult, we have put together some animations to illustrate them.
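The stump and empty-subtree cases of the hash-path algorithm can be sketched as follows. This is a hypothetical illustration, not the `MerkleTree` code. It uses 64-bit values, a toy non-cryptographic `toy_compress`, invented function names, and no byte-level encoding. `make_zero_hashes` plays the role of `zero_hashes_`. `stump_hash_path` rebuilds the hash-path pairs inside a stump of a given height from its single leaf value and local index. At each level, the pair belongs to the target leaf's ancestor one level up. If that ancestor is also an ancestor of the stump leaf, the pair holds the recomputed node and a zero hash. Otherwise both entries are zero hashes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, NON-cryptographic stand-in for the 2-to-1 hash (64-bit values, not field elements).
uint64_t mix(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
uint64_t toy_compress(uint64_t l, uint64_t r) { return mix(mix(l) ^ r); }

using Children = std::pair<uint64_t, uint64_t>;

// zero[k] = root of an all-empty subtree of height k (zero[0] is the empty-leaf value).
std::vector<uint64_t> make_zero_hashes(size_t depth, uint64_t empty_leaf) {
    std::vector<uint64_t> zero(depth + 1);
    zero[0] = empty_leaf;
    for (size_t k = 0; k < depth; ++k) zero[k + 1] = toy_compress(zero[k], zero[k]);
    return zero;
}

// Rebuilds path[0..height-1] inside a stump of the given height whose only non-empty leaf `value`
// sits at local index `stump_idx`; `target_idx` is the local index of the requested leaf.
std::vector<Children> stump_hash_path(uint64_t value, size_t stump_idx, size_t target_idx, size_t height,
                                      const std::vector<uint64_t>& zero) {
    std::vector<Children> path(height);
    uint64_t current = value;  // ancestor of the stump leaf at level k
    for (size_t k = 0; k < height; ++k) {
        bool stump_on_right = (stump_idx >> k) & 1;
        Children pair = stump_on_right ? Children(zero[k], current) : Children(current, zero[k]);
        // path[k] holds the children of the target's ancestor at level k + 1. If that ancestor is not
        // also an ancestor of the stump leaf, its whole subtree is empty.
        bool shared = (stump_idx >> (k + 1)) == (target_idx >> (k + 1));
        path[k] = shared ? pair : Children(zero[k], zero[k]);
        current = toy_compress(pair.first, pair.second);
    }
    return path;
}
```

For the stump $h_1^2: (h_6^0, 2, \texttt{true})$ above (height 2, local index 2), the path of local leaf 2 is $(h_6^0, z_0), (z_1, \cdot)$. The path of local leaf 0 starts with the all-zero pair $(z_0, z_0)$.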
<iframe src="https://aztec.slides.com/suyashbagad_aztec/merkle-trees/embed?byline=hidden&share=hidden" width="720" height="500" title="Merkle Trees" scrolling="no" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>

### Memberships

In this section, we discuss the methods of `src/aztec/stdlib/merkle_tree/membership.hpp`.

![](https://hackmd.io/_uploads/r1hhC58Ft.png)

* `check_subtree_membership`: This method checks whether a subtree is correctly inserted in a Merkle tree. Suppose we want to check whether the subtree rooted at node S in the above figure is inserted correctly. The method takes the following inputs:
    1. the subtree root (S),
    2. the hash path `hashes` of **any leaf** of the subtree (in the above figure, `hashes` = ((E,F),(A,S),(B,C))); here, E is denoted by `hashes[0].first`, and so on,
    3. the `index` of **any leaf** of the subtree (110 or 111),
    4. the `height` of the subtree (1 in our case),
    5. the root $R$ of the Merkle tree.

  We need to check the following equation to verify whether the subtree is inserted correctly:

  $$
  R \stackrel{?}{=} \text{Hash}(B,\text{Hash}(A,S)).
  $$

  The values A and B can be obtained from the `hashes` vector. But how do we know which values of the hash path to take, and in which order (left or right)? This information comes from the bits of `index` from position `height` up to the MSB (11 in our example, for `index` = 110 or 111). For example, the LSB of 11 instructs us to compute the first hash as

  $$
  \text{hash}_1 = \text{Hash}(\texttt{hashes[1].first}, S)
  $$

  since S is in the right position. (Had the bit string been 10 instead of 11, S would be in the left position and we would compute $\text{hash}_1 = \text{Hash}(S, \texttt{hashes[1].second})$.) The MSB of 11 instructs us to compute the second hash as

  $$
  \text{hash}_2 = \text{Hash}(\texttt{hashes[2].first}, \text{hash}_1).
  $$

  Finally, we check $R \stackrel{?}{=} \text{hash}_2$, since `hashes` stops at index 2.
* `assert_check_subtree_membership`: Asserts that a given subtree is correctly inserted in the Merkle tree, using the `check_subtree_membership` method.
* `check_membership`: Checks whether a leaf value is correctly updated in the Merkle tree, using the `check_subtree_membership` method with `height = 0` (i.e. the "subtree" is a single leaf, so there are no leaves below node S in the above figure).
* `assert_check_membership`: Asserts that a leaf value is correctly updated in the Merkle tree, using the method `check_membership`.

Note that in all four methods, it does not matter whether the hash path is computed before or after updating the Merkle tree. To see this, consider the vector `hashes = ((E,F),(A,S),(B,C))` for the subtree with root S shown above. The method `check_subtree_membership` only reads the sibling values A and B, which remain the same before and after the update; E, F, S and C are either inside the subtree or recomputed from the new subtree root. Of course, the Merkle root passed to `check_subtree_membership` has to be the root computed after the update. *This property helps us ensure that we are dealing with the same Merkle tree before and after the update*. The same principle applies to the methods described below.

* `update_membership`: Asserts that the old and new states of the tree are correct after updating a single leaf. It uses the `assert_check_membership` method with the old and new Merkle roots and the old and new leaf values, keeping the index and hash path the same to ensure that both roots refer to the same Merkle tree.
* `update_subtree_membership`: Asserts that the old and new states of the Merkle tree are correct after a subtree update. It uses the `assert_check_subtree_membership` method together with the height of the subtree, and the same principle of a common index and hash path to ensure that the old and new trees are the same Merkle tree.
* `compute_tree_root`: Computes the root of a tree whose leaves are the elements of the vector `input`. It first converts any zero value in `input` to -1. Then it recursively computes all the intermediate nodes up to the root by hashing pairs of adjacent nodes from the layer below. It returns the root of the tree.
* `check_tree`: Checks whether a given root matches the root computed from a set of leaves using the method `compute_tree_root`.
* `assert_check_tree`: Asserts that a given root matches the root computed from a set of leaves, using the method `check_tree`.
* `batch_update_membership`: Updates the tree with a vector of new values starting from the leaf at `start_index`. It determines the height of the subtree to be inserted and calculates its root. Then it updates the Merkle tree with this new root using the method `update_subtree_membership`.
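The index-bit logic of `check_subtree_membership` and the bottom-up hashing of `compute_tree_root` can be sketched natively (outside a circuit) as below. This is a hypothetical illustration, not the Barretenberg API. It uses 64-bit integers instead of circuit field elements and a toy non-cryptographic `toy_compress` instead of the Pedersen hash. `UINT64_MAX` stands in for -1, and the functions return `bool` rather than adding circuit constraints. Bits of `index` below `height` are ignored, so any leaf of the subtree gives the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, NON-cryptographic stand-in for the Pedersen compression (64-bit values).
uint64_t mix(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
uint64_t toy_compress(uint64_t l, uint64_t r) { return mix(mix(l) ^ r); }

constexpr uint64_t kEmptyLeaf = UINT64_MAX;  // stand-in for the field element -1
using ToyHashPath = std::vector<std::pair<uint64_t, uint64_t>>;

// Sketch of compute_tree_root: zeros become the empty marker, then adjacent pairs are hashed layer
// by layer. Assumes a non-empty input whose size is a power of two.
uint64_t toy_compute_tree_root(std::vector<uint64_t> layer) {
    for (uint64_t& v : layer)
        if (v == 0) v = kEmptyLeaf;
    while (layer.size() > 1) {
        std::vector<uint64_t> next;
        for (size_t j = 0; j + 1 < layer.size(); j += 2) next.push_back(toy_compress(layer[j], layer[j + 1]));
        layer = std::move(next);
    }
    return layer[0];
}

// Sketch of check_subtree_membership: starting from the subtree root, hash with the sibling taken from
// `hashes` at every level i >= height; bit i of `index` says whether the running node is on the right.
bool toy_check_subtree_membership(uint64_t root, const ToyHashPath& hashes, uint64_t subtree_root,
                                  size_t index, size_t height) {
    uint64_t current = subtree_root;
    for (size_t i = height; i < hashes.size(); ++i) {
        bool is_right = (index >> i) & 1;
        current = is_right ? toy_compress(hashes[i].first, current) : toy_compress(current, hashes[i].second);
    }
    return current == root;
}

// check_membership as the height-0 special case: the "subtree root" is the leaf itself.
bool toy_check_membership(uint64_t root, const ToyHashPath& hashes, uint64_t leaf, size_t index) {
    return toy_check_subtree_membership(root, hashes, leaf, index, 0);
}
```

With the hash path of leaf 6 in a depth-3 tree, calls with `index` 6 and 7 at `height` 1 perform the same two hashes. The same `hashes` also verifies a new subtree root against the new tree root, because only sibling entries are read.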
