# Rust-Verkle

https://github.com/crate-crypto/rust-verkle

# English Version

## Verkle tree motivation

Stateless clients are a key milestone on the Ethereum roadmap. The state of the current Ethereum execution layer is stored in an MPT structure. Although an MPT could also support stateless clients, the corresponding proof size is too large to be broadcast to all nodes within one block interval:

> Verkle trees solve a key problem standing in the way of Ethereum being stateless-client-friendly: witness sizes. A witness accessing an account in today’s hexary Patricia tree is, in the average case, close to 3 kB, and in the worst case it may be three times larger. Assuming a worst case of 6000 accesses per block (15m gas / 2500 gas per access), this corresponds to a witness size of ~18 MB, which is too large to safely broadcast through a p2p network within a 12-second slot. Verkle trees reduce witness sizes to ~200 bytes per account in the average case, allowing stateless client witnesses to be acceptably small.
>
> from https://eips.ethereum.org/EIPS/eip-6800

And a Verkle Tree greatly reduces the proof size.

## What is Verkle tree

A Verkle tree ("Vector commitment" and "Merkle Tree") is a trie-based data structure. Its goal is to store a collection of KV pairs, to support inserting, modifying and getting KVs efficiently, and to provide a proof for any given set of KVs showing that the V corresponding to each K is indeed this value, or does not exist.

### Commitment, Proof

Here is a brief primer for readers who are not familiar with commitments and proofs. Consider a game between Alice and Bob:

- Alice: I know a polynomial $f(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1}$, and I can prove it to you.
- Bob: I know nothing about $f(x)$, how can I verify that?
- Alice: Right, at the moment $f(x)$ is entirely up to me and I could modify its coefficients at will. Therefore I first need to publish a **Commitment** to $f(x)$, so that I can no longer change $f(x)$ arbitrarily.
- Bob: Yes. I first generate a random number $r$, which I will not tell you, then pick two points $G_1, G_2$ of BLS12-381, compute $(P_0, P_1, P_2, \dots, P_{n-1}) = (G_1, rG_1, r^2G_1, \dots, r^{n-1}G_1)$, and send you these $n$ points on the elliptic curve. By the elliptic curve discrete logarithm assumption, there is no way to recover $r$ from $rG$.
- Alice: OK, I compute $P = f(r)G_1 = a_0G_1 + (a_1\cdot r)G_1 + (a_2\cdot r^2)G_1 + \cdots + (a_{n-1}r^{n-1})G_1 = a_0P_0 + a_1P_1 + a_2P_2 + \cdots + a_{n-1}P_{n-1}$ and send the point $P$ to you; $P$ is the commitment to $f(x)$. Note that although I do not know $r$, I can still compute the commitment of any polynomial.
- Bob: OK, I now generate another challenge value (a random number) $s$ and send this $s$ to you.
- Alice: I compute $f(s)$ and $q(x) = \frac{f(x) - f(s)}{x-s}$, as well as $Q = q(r)G_1$, and I send $f(s), Q$ to you. Here $f(s)$ is the result and $Q$ is the proof of $f(s)$.
- Bob: I verify whether $e(f(r)G_1 - f(s)G_1, G_2) = e(q(r)G_1, (r-s)G_2)$ holds. If it holds, it proves $f(r) - f(s) = q(r)(r-s)$, and I can believe with high probability that you know $f(x)$. $e(G_1, G_2)$ is a pairing; some elliptic curves, such as BLS12-381, have this property.

The above is a simplified description of KZG.
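To spell out why Bob's final check works, here is the same step written out with the notation above ($P = f(r)G_1$ and $Q = q(r)G_1$ are the points Alice sent; bilinearity of the pairing moves scalars into the exponent):

$$\begin{align*}
e\big(f(r)G_1 - f(s)G_1,\ G_2\big) &= e\big((f(r)-f(s))\,G_1,\ G_2\big) = e(G_1, G_2)^{\,f(r)-f(s)} \\
e\big(q(r)G_1,\ (r-s)G_2\big) &= e(G_1, G_2)^{\,q(r)(r-s)}
\end{align*}$$

So the two pairings are equal exactly when $f(r) - f(s) = q(r)(r-s)$. For an honest Alice this holds at every point, because $q(x)(x-s) = f(x) - f(s)$ as polynomials; a cheating Alice would have to satisfy the identity at the secret $r$, which she can only do with negligible probability.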
For details, please refer to https://hackmd.io/DA-UGXFFRzCCznM1JBuZcA

To sum up, Bob never learns the coefficients $a_i$ of $f(x)$, but he believes with high probability that Alice knows $f(x)$. Which $f(x)$ does he believe Alice knows? The one corresponding to the commitment $P$.

Back to the Verkle Tree: you do not need to know the state (the whole set of KV pairs). You only need to know the state root in advance, which acts like a commitment to all the KVs, plus a proof for the relevant KVs, and you can then verify that those KVs are indeed under the tree represented by this state root.

### Structure of Verkle Tree

![image](https://hackmd.io/_uploads/r1feentzA.png)
*figure 1*

A typical Verkle tree is shown in the figure above and contains 3 types of nodes; following the naming in rust-verkle, they are `Branch` nodes, `Stem` nodes, and `Leaf` nodes. Note that although the values inside the nodes in the figure are written with the same symbols, the values in different nodes are not necessarily equal. $C$ denotes the commitment of a `Branch` node, which is a point on the elliptic curve; $H_C$ denotes the hash of that commitment, an element of the scalar field $\mathbb{F}_r$ of the curve. $C_1, C_2, C_s$ denote the commitments of a `Stem` node. $K, V$ denote the actual data to be stored.

From the figure we can conclude:

- The root node is a `Branch` node
- The children of a `Branch` node may be `Branch` nodes or `Stem` nodes
- **The parent of a `Stem` node always has multiple children**, because a branch with only a single child would be optimized away into the stem
- The children of a `Stem` node are `Leaf` nodes, and the parent of a `Leaf` node is a `Stem` node
- A `Branch`/`Stem` node has at most `256` children
- Because the key of a `Leaf` node is a `[u8;32]`, the maximum depth of the whole tree is `32`

Other information (see the sketch below):

- The key of a `Stem` node is `31` bytes. Its path in the tree only yields a prefix of that key (from the first byte up to the whole key, depending on its depth), so an additional field is required to store the full stem key.
- Taking those `31` bytes plus the index of a leaf's position under the `Stem` node gives `32` bytes, which serve as the key of the **`Leaf`** node.
- Knowing the key of a `Stem` node and its depth `d`, its path is `path = key[0..d]`.
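A minimal sketch of these key/path relations (illustrative only, not code from rust-verkle; the helper names are made up):

```rust
/// Leaf key = 31-byte stem key ++ 1-byte index under the stem.
fn leaf_key(stem: [u8; 31], index: u8) -> [u8; 32] {
    let mut key = [0u8; 32];
    key[..31].copy_from_slice(&stem);
    key[31] = index;
    key
}

/// A stem node at depth `d` hangs under the branch path `key[0..d]`.
fn stem_path(stem: &[u8; 31], d: usize) -> Vec<u8> {
    stem[..d].to_vec()
}

fn main() {
    let stem = [0xab; 31];
    assert_eq!(&leaf_key(stem, 7)[..31], &stem[..]);
    assert_eq!(stem_path(&stem, 2), vec![0xab, 0xab]);
}
```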
Questions:

- What is the function of the `Stem` node? It is an optimization of the trie. MPT has a similar optimization: when consecutive nodes in the trie each have only one child, they can be compressed into a single node, which reduces the depth of the tree.
- For example, when a `2-ary` tree is used to picture the `256-ary` tree, the figure below simulates the actual situation. Because all keys are obtained by a Pedersen hash of the address and other information, they can be regarded as random numbers, so leaf nodes are evenly distributed among the branches starting from the root. If there were no `Stem` nodes, then after depth 3–4 most nodes would degenerate into a linked list, as shown below. ![image](https://hackmd.io/_uploads/r15UNYDXC.png) Some data show that in practice the average depth of the Verkle tree is about 3.8, which is reasonable: $256^{3.8} \approx 1.4\ \text{B}$, while [the number of Ethereum addresses is 0.268 B](https://etherscan.io/chart/address), so the capacity is sufficient.
- Why does the second-to-last node start to fork again (the so-called suffix trie)? ![image](https://hackmd.io/_uploads/HyF1s_Kz0.png) Interesting question; I am still discussing it with the Ethereum R&D team responsible for the verkle tree.
  > I do have a question about the design of verkle trees, here a binary tree is used to represent a 256-ary tree. My understanding is that since all keys are the results of pedersen_hash and are relatively random, the keys will be evenly distributed in each branch starting from the root. In actual situations, the structure of the tree should be like Figure 1: the first 3-4 layers are almost a balanced tree, and the subsequent nodes will degenerate into linked lists. Then use stem node to represent nodes that degenerate into linked lists for optimization. My question is, 1. Why not put the nonce, balance, and even some storage slots of the same address under the same stem node, because when a transaction occurs, these fields are relatively easy to update at the same time, and this can reduce the number of required to update stem nodes and branch nodes. The actual situation is that they will be randomly distributed in different stem nodes. 2. Since all leaf nodes will be randomly distributed, why is Figure 2 chosen for the final layout, which branches into 256 child nodes after the stem node? I think the actual situation is that under a stem node, in most cases only one child has a value, and the others are Null.
- The disadvantage of the `Stem` node is that if, say, Leaf nodes 2 and 4, or 3 and 4, are present at the same time, it no longer reduces the depth, and this is a possible DDoS attack surface. You do not even need the private key of the relevant address to attack: for example, by transferring a tiny amount of ether to the accounts represented by Leaf nodes 2 and 3, I can make the Verkle Tree deeper. The mitigation is that the key is not generated directly from the account address but goes through a hash first, which is reflected in [EIP 6800](https://eips.ethereum.org/EIPS/eip-6800).
- Why is the key of a `Stem` node designed to be 31 bytes? A `Stem` node can never be the root, so at least its first byte could be derived from the stem's path instead of being stored. My understanding is that this is an implementation choice; this is how memoryDB implements it: `[u8;31]`.
- I read somewhere that with the `Verkle Tree` there is no need to create a separate Storage tree for a contract account. How is that achieved? The potential key space of the Verkle tree is very large ($2^{256}$), so the storage tree is embedded into the Verkle tree in the following form.
```python=
def get_tree_key_for_storage_slot(address: Address32, storage_key: int):
    if storage_key < (CODE_OFFSET - HEADER_STORAGE_OFFSET):
        pos = HEADER_STORAGE_OFFSET + storage_key
    else:
        pos = MAIN_STORAGE_OFFSET + storage_key
    return get_tree_key(
        address,
        pos // VERKLE_NODE_WIDTH,
        pos % VERKLE_NODE_WIDTH
    )
```

## Verkle Tree based on Memory

```rust=
// https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/database/memory_db.rs#L6-L11
pub struct MemoryDb {
    pub leaf_table: HashMap<[u8; 32], [u8; 32]>,
    pub stem_table: HashMap<[u8; 31], StemMeta>,
    pub branch_table: HashMap<Vec<u8>, BranchChild>,
}

pub struct StemMeta {
    pub c_1: Element,
    pub hash_c1: Fr,
    pub c_2: Element,
    pub hash_c2: Fr,
    pub stem_commitment: Element,
    pub hash_stem_commitment: Fr,
}

pub enum BranchChild {
    Stem([u8; 31]),
    Branch(BranchMeta),
}

pub struct BranchMeta {
    pub commitment: Element,
    pub hash_commitment: Fr,
}
```

The three types of nodes correspond to three tables: leaf_table holds the 32-byte KV pairs; stem_table maps the 31-byte stem key to the values $C_1, C_2, C_s, H_{C_1}, H_{C_2}, H_{C_s}$; branch_table uses variable-length bytes (a path) as the key, and the node under that path may be a `Stem` or a `Branch`.

![image](https://hackmd.io/_uploads/B1cTziKGC.png)

The picture above is the result of printing Figure 1. You can see 5 leaf nodes, 31 branch nodes, and 4 stem nodes; note that branch_table contains both the 31 branch nodes and the 4 stem nodes.

## Method of Verkle Tree

```rust=
// https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/lib.rs#L17-L50
pub type Key = [u8; 32];
pub type Value = [u8; 32];

pub trait TrieTrait {
    /// Inserts multiple values into the trie
    /// If the number of items is below FLUSH_BATCH, they will be persisted
    /// atomically
    /// This method will implicitly compute the new root
    fn insert(&mut self, kv: impl Iterator<Item = (Key, Value)>);

    /// Inserts a single value
    /// This method will implicitly compute the new root
    fn insert_single(&mut self, key: Key, value: Value) {
        self.insert(vec![(key, value)].into_iter())
    }

    /// Gets the value at the `Key` if it exists
    /// Returns an none if it does not exist
    /// TODO: Find out if this method is ever needed
    fn get(&self, key: Key) -> Option<Value>;

    /// Returns the root of the trie
    fn root_hash(&self) -> Fr;

    /// Returns the root commitment of the trie
    fn root_commitment(&self) -> Element;

    /// Creates a verkle proof over many keys
    fn create_verkle_proof(
        &self,
        key: impl Iterator<Item = Key>,
    ) -> Result<proof::VerkleProof, ProofCreationError>;
}
```

In fact, Ethereum's State Trie only needs to implement `TrieTrait`, i.e. it only needs to expose the functions above to do its job, regardless of whether the underlying structure is a Verkle Tree or SALT. Here we analyze how it is implemented with a Verkle Tree.

- `insert`: given a set of KV pairs, store them. In MemoryDB this inserts into leaf_table.
- `insert_single`: simply calls insert.
- `get`: given a K, return its V. In MemoryDB this reads the KV pair from leaf_table.
- `root_hash`: the hash of the root node's commitment.
- `root_commitment`: the root node's commitment.
- `create_verkle_proof`: given a set of Ks, return a VerkleProof.
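As a quick orientation before diving into the implementations, here is a minimal usage sketch written only against the `TrieTrait` methods above (the `demo` function is made up for illustration; constructing a concrete `Trie` is not shown):

```rust
// Assumes `TrieTrait` (shown above) is in scope; works for any implementation,
// e.g. a Trie backed by MemoryDb.
fn demo<T: TrieTrait>(trie: &mut T) {
    let key = [1u8; 32];
    let value = [2u8; 32];

    // Insert a single pair; the root is recomputed implicitly.
    trie.insert_single(key, value);
    assert_eq!(trie.get(key), Some(value));

    // Root hash / root commitment after the update.
    let _root_hash = trie.root_hash();
    let _root_comm = trie.root_commitment();

    // A proof that `key` maps to `value` under the current root.
    let _proof = trie.create_verkle_proof(std::iter::once(key));
}
```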
Apart from `insert_single`, let's look at how the remaining 5 interfaces are implemented.

First, the definition of `Trie`. It has two fields, storage and committer, and both types are generic: there is no bound on the `Storage` type, while the `PolyCommit` type must implement the `Committer` trait. We focus on storage first.

```rust=
// https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/trie.rs#L12-L15
pub struct Trie<Storage, PolyCommit: Committer> {
    pub storage: Storage,
    pub committer: PolyCommit,
}
```

### get

When the `Storage` type implements the `ReadWriteHigherDb` trait, the trait provides a get_leaf interface, so `self.storage.get_leaf(key)` is called directly.

```rust=
// https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/trie.rs#L26-L28
impl<S: ReadWriteHigherDb, P: Committer> TrieTrait for Trie<S, P> {
    fn get(&self, key: crate::Key) -> Option<crate::Value> {
        self.storage.get_leaf(key)
    }
}
```

So how does MemoryDb implement `get_leaf`? It simply calls get on the leaf_table HashMap.

```rust=
// https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/database.rs#L8
pub trait ReadWriteHigherDb: ReadOnlyHigherDb + WriteOnlyHigherDb {}

// https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/database/memory_db.rs#L59-L63
impl ReadOnlyHigherDb for MemoryDb {
    fn get_leaf(&self, key: [u8; 32]) -> Option<[u8; 32]> {
        self.leaf_table
            .get(&key)
            .map(|val| val.to_vec().try_into().unwrap())
    }
}
```

I think there is no need to convert the type back and forth; it could be written like this:

```rust=
fn get_leaf(&self, key: [u8; 32]) -> Option<[u8; 32]> {
    self.leaf_table.get(&key).copied()
}
```

**Complexity**

In the MemoryDB implementation, get is on average an $\mathcal{O}(1)$ operation, because it is a direct HashMap get.

### root_hash and root_commitment

Similarly, when the `S` type implements the `ReadWriteHigherDb` trait, the trait provides a get_branch_meta interface, so the trie calls self.storage.get_branch_meta with the root node's path `&[]` to obtain the root node's meta information, i.e. its commitment and hash.

```rust=
impl<S: ReadWriteHigherDb, P: Committer> TrieTrait for Trie<S, P> {
    fn root_hash(&self) -> Fr {
        // This covers the case when the tree is empty
        // If the number of stems is zero, then this branch will return zero
        let root_node = self
            .storage
            .get_branch_meta(&[])
            .expect("this should be infallible as every trie should have a root upon creation");
        root_node.hash_commitment
    }

    fn root_commitment(&self) -> Element {
        // TODO: This is needed for proofs, can we remove the root hash as the root?
        let root_node = self.storage.get_branch_meta(&[]).unwrap();
        root_node.commitment
    }
}

impl ReadOnlyHigherDb for MemoryDb {
    fn get_branch_meta(&self, key: &[u8]) -> Option<BranchMeta> {
        let branch_child = match self.branch_table.get(key) {
            Some(b_child) => b_child,
            None => return None,
        };
        match branch_child {
            BranchChild::Stem(stem_id) => panic!(
                "expected branch meta data, however under this path there is a stem: {}",
                hex::encode(stem_id)
            ),
            BranchChild::Branch(b_meta) => Some(*b_meta),
        }
    }
}
```

And how does MemoryDb implement get_branch_meta? Since branch_table is a HashMap, the value can be read directly. However, as described before, the value has two variants, Branch and Stem; if the value is a Stem, the function panics, because the path you passed in belongs to a stem while a branch is expected here.

**Complexity**

In the MemoryDB implementation, root_hash and root_commitment are on average $\mathcal{O}(1)$ operations, because they are direct HashMap gets.
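Both lookups above boil down to plain HashMap reads, which a tiny sketch makes concrete (the `use` paths follow the file layout linked above but are an assumption; `MemoryDb` is built here via its public fields as quoted):

```rust
use std::collections::HashMap;
// Assumed module paths, based on verkle-trie/src/database/memory_db.rs and database.rs:
use verkle_trie::database::{memory_db::MemoryDb, ReadOnlyHigherDb};

fn main() {
    let mut db = MemoryDb {
        leaf_table: HashMap::new(),
        stem_table: HashMap::new(),
        branch_table: HashMap::new(),
    };

    // A leaf lookup is just leaf_table.get(..).
    db.leaf_table.insert([1u8; 32], [9u8; 32]);
    assert_eq!(db.get_leaf([1u8; 32]), Some([9u8; 32]));

    // No root has been inserted yet, so the root path `&[]` has no BranchMeta.
    assert!(db.get_branch_meta(&[]).is_none());
}
```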
### insert

Insert is more involved; we analyze it with the schematic diagrams and the code. From the implementation below you can see that it has two stages: each KV pair to be inserted is first converted into a series of small instructions, and the instructions are then applied to the tree.

![image](https://hackmd.io/_uploads/rk5ax-oM0.png)

```rust=
impl<S: ReadWriteHigherDb, P: Committer> TrieTrait for Trie<S, P> {
    fn insert(&mut self, kv: impl Iterator<Item = (crate::Key, crate::Value)>) {
        for (key_bytes, value_bytes) in kv {
            let ins = self.create_insert_instructions(key_bytes, value_bytes);
            self.process_instructions(ins);
        }
    }
}
```

Note, however, that the KV pairs are processed serially: the next pair is only handled after the previous one has been fully applied. One optimization idea is to preprocess the pairs so that keys sharing a common path are handled together, for example by grouping them by stem as sketched below, which would reduce the number of updates to the stem and branch nodes on the shared path.

![image](https://hackmd.io/_uploads/H1CqoUjMA.png)
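A sketch of that grouping idea (not part of rust-verkle; it only relies on the fact that the stem is the first 31 bytes of the key):

```rust
use std::collections::HashMap;

/// Group KV pairs by their 31-byte stem, so that all leaves under one stem
/// could be applied with a single pass over the shared stem/branch nodes.
fn group_by_stem(
    kvs: Vec<([u8; 32], [u8; 32])>,
) -> HashMap<[u8; 31], Vec<([u8; 32], [u8; 32])>> {
    let mut groups: HashMap<[u8; 31], Vec<([u8; 32], [u8; 32])>> = HashMap::new();
    for (key, value) in kvs {
        let mut stem = [0u8; 31];
        stem.copy_from_slice(&key[..31]);
        groups.entry(stem).or_default().push((key, value));
    }
    groups
}

fn main() {
    let k1 = [1u8; 32];
    let mut k2 = [1u8; 32];
    k2[31] = 2; // same stem as k1, different suffix
    let groups = group_by_stem(vec![(k1, [0u8; 32]), (k2, [0u8; 32])]);
    assert_eq!(groups.len(), 1); // both pairs share one stem
}
```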
There are three types of instructions: `UpdateLeaf`, `ChainInsert`, `InternalNodeFallThrough`.

```rust=
pub enum Ins {
    UpdateLeaf { ... },
    ChainInsert { ... },
    InternalNodeFallThrough { ... },
}
```

#### UpdateLeaf

```rust=
pub(crate) type BranchId = Vec<u8>;

UpdateLeaf {
    key: [u8; 32],
    new_leaf_value: [u8; 32],
    depth: u8,
    branch_id: BranchId,
    branch_child_index: u8,
},
```

Updating a leaf node updates every node on the path from the root to that leaf:

1. Update the value of the leaf node
2. Update one of $C_1, C_2$ of its stem node, and the corresponding hash
3. Update $C$ and $H_C$ of the stem node's parent branch nodes

The fields carry:

- `key`, `new_leaf_value`: used for the first update
- `depth`: the depth of the leaf node
- `branch_id`: the path of the branch node that is the parent of the leaf's stem node
- `branch_child_index`: the index of that stem node among the branch node's children

One UpdateLeaf scenario is shown below, assuming leaf node 6 is added to the existing tree:

![image](https://hackmd.io/_uploads/SyBFoIhzA.png)

Let's first look at how the `UpdateLeaf` instruction is handled in `process_instructions`:

```rust=
impl<Storage: ReadWriteHigherDb, PolyCommit: Committer> Trie<Storage, PolyCommit> {
    pub fn process_instructions(&mut self, instructions: Vec<Ins>) {
        for ins in instructions.into_iter().rev() {
            match ins {
                Ins::UpdateLeaf {
                    key,
                    new_leaf_value,
                    depth,
                    branch_id,
                    branch_child_index,
                } => {
                    let leaf_update = match self.update_leaf_table(key, new_leaf_value, depth) {
                        Some(leaf_update) => leaf_update,
                        None => {
                            // No value was updated, early exit
                            return;
                        }
                    };
                    let stem_update = self.update_stem_table(leaf_update, depth);
                    self.update_branch_table(stem_update, branch_id, branch_child_index, depth);
                }
                ...
            }
        }
    }
    ...
}
```

As shown above, if the leaf actually changed, `update_leaf_table`, `update_stem_table`, and `update_branch_table` are called in sequence. If `update_leaf_table` returns None, only that call happens and the function returns early, which feels a bit odd.

Let's look at the implementation of these three updates.

**update_leaf_table**:

```rust=
// Store the leaf, we return data on the old leaf, so that we can do the delta optimization
//
// If a leaf was not updated, this function will return None
// else Some will be returned with the old value
fn update_leaf_table(
    &mut self,
    key: [u8; 32],
    value: [u8; 32],
    depth: u8,
) -> Option<LeafUpdated> {
    let old_val = match self.storage.insert_leaf(key, value, depth) {
        Some(vec) => {
            // Check if they have just inserted the previous value
            // if so, we exit early and return None
            if vec == value {
                return None;
            }
            Some(vec)
        }
        None => None,
    };

    Some(LeafUpdated {
        old_val,
        new_value: value.to_vec(),
        key: key.to_vec(),
    })
    // Storing a leaf means we need to change the stem table too
}

impl WriteOnlyHigherDb for MemoryDb {
    fn insert_leaf(&mut self, key: [u8; 32], value: [u8; 32], _depth: u8) -> Option<Vec<u8>> {
        self.leaf_table
            .insert(key, value)
            .map(|old_val| old_val.to_vec())
    }
}
```

It calls the insert_leaf method of storage, which in turn calls HashMap::insert: if there was a previous value it is returned, otherwise None.

To summarize, update_leaf_table returns None only when the new value equals the value already stored for the key, and in that case UpdateLeaf stops after update_leaf_table. Otherwise it returns the update result LeafUpdated (with old_val set to None if the key had no value before), and update_stem_table and update_branch_table are executed next.

**update_stem_table**

- Why does a branch node need only one commitment while a stem node needs two, when both commit to 256 children? Because the committed objects are different. A branch node's children are branch nodes or stem nodes, and what is committed is the hash of the child's commitment, whose range is `[0, order)`, so it can be committed directly. A stem node's children are KV pairs, and what is committed is V, whose range is `[0, 2^256)`; the part beyond the field order cannot be committed, so each value has to be split and two commitments are required. How to split? A natural idea is to put bits [0,127] of all 256 values into $C_1$ and bits [128,255] into $C_2$, but then updating any single value would force both $C_1$ and $C_2$ to be updated. So a different split is used: $C_1$ commits to the values with index 0–127, and $C_2$ to the values with index 128–255. Concretely:

![image](https://hackmd.io/_uploads/ByZLayWQA.png)

There are 256 leaf nodes $V_0$–$V_{255}$ under a stem node. For each $K_i, V_i$ of type `[u8; 32]`:

- `Vi_l = Fr::from_le_bytes_mod_order(V_i[0..=15]) + 2^128`
- `Vi_h = Fr::from_le_bytes_mod_order(V_i[16..=31])`
- `stem: [u8;31] = Ki[0..=30]`

If the node with index `i` does not exist, `Vi` is replaced by 0.
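A small sketch of this split (assuming `Fr` is the scalar-field type used in the snippets above; in rust-verkle it comes from the `banderwagon` crate and implements ark-ff's `PrimeField`):

```rust
use ark_ff::{Field, PrimeField};
use banderwagon::Fr;

/// Split a 32-byte leaf value into the two 128-bit halves that get committed.
fn split_leaf_value(v: &[u8; 32]) -> (Fr, Fr) {
    // The 2^128 term marks "this leaf exists", so a present leaf whose low
    // half is zero is still distinguishable from an absent leaf (which is 0).
    let marker = Fr::from(2u64).pow([128u64]);
    let low = Fr::from_le_bytes_mod_order(&v[0..16]) + marker;
    let high = Fr::from_le_bytes_mod_order(&v[16..32]);
    (low, high)
}

fn main() {
    let v = [0u8; 32];
    let (low, high) = split_leaf_value(&v);
    // An existing-but-zero leaf still differs from an absent one.
    assert_ne!(low, Fr::from(0u64));
    assert_eq!(high, Fr::from(0u64));
}
```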
Assume the commitment base is $(G_0, G_1, \dots, G_{255})$; in fact the $C$ of every branch node and the $C_1$, $C_2$ of every stem node all use the same commitment base. Then:

$$\begin{align*}
C_1 &= \sum_{i=0}^{127} (V_{i,l}\cdot G_{2i} + V_{i,h} \cdot G_{2i+1}) \\
C_2 &= \sum_{i=0}^{127} (V_{i+128,l}\cdot G_{2i} + V_{i+128,h} \cdot G_{2i+1}) \\
H_{C} &= \text{group\_to\_field}(C) \\
C_{stem} &= 1 \cdot G_0 + \text{stem} \cdot G_1 + H_{C_1} \cdot G_2 + H_{C_2} \cdot G_3
\end{align*}$$

where $V_{i,l}$ and $V_{i,h}$ are the `Vi_l` and `Vi_h` defined above. This calculation of $C_{stem}$ is also where the green node in the figure below comes from.

![image](https://hackmd.io/_uploads/HyVsdyWXC.png)

> from https://blog.ethereum.org/2021/12/02/verkle-tree-structure

I won't paste all of the code, as there is quite a lot of it; you can view it here: https://github.com/crate-crypto/rust-verkle/blob/e07b17b0538fcd227c221d2388e5468417116f00/verkle-trie/src/trie.rs#L537-L675

Note that the additive homomorphism of elliptic curve points is used as an optimization. Suppose $V_1$ is updated; then:

$$\begin{align*}
C_1^\prime &= C_1 + (V_{1,l}^\prime - V_{1,l}) \cdot G_2 + (V_{1,h}^\prime - V_{1,h}) \cdot G_3 \\
C_2^\prime &= C_2 \\
C_{stem}^\prime &= C_{stem} + (H_{C_1}^\prime - H_{C_1}) \cdot G_2
\end{align*}$$

The following is my step-by-step calculation of $C_1, C_2, C_{stem}$ from these formulas, checked against the values in the updated trie: https://github.com/flyq/verkle_tree_example/blob/da877d33309ee036661c4408a519b27357dbd335/src/main.rs#L14-L47

Finally, insert_stem is called to insert directly into MemoryDb's stem_table:

```rust=
fn insert_stem(&mut self, key: [u8; 31], meta: StemMeta, _depth: u8) -> Option<StemMeta> {
    self.stem_table.insert(key, meta)
}
```

**update_branch_table**

$$\begin{align*}
C_{branch} &= \sum_{i=0}^{255} H_{i} \cdot G_i \\
H_i &= \text{group\_to\_field}(C_i)
\end{align*}$$

where $C_i$ is the commitment of the child node at index $i$ under the branch node, which may be a branch node or a stem node.
```rust=
fn update_branch_table(
    &mut self,
    stem_update: StemUpdated,
    branch_id: BranchId,
    branch_index: u8,
    depth: u8,
) -> Fr {
    let old_stem_hash = stem_update.old_val.unwrap_or_else(Fr::zero);
    let new_stem_hash = stem_update.new_val;
    let delta = new_stem_hash - old_stem_hash;

    let old_branch_comm = self.storage.get_branch_meta(&branch_id).unwrap().commitment;

    let delta_comm = self.committer.scalar_mul(delta, branch_index as usize);
    let updated_branch_comm = old_branch_comm + delta_comm;
    let hash_updated_branch_comm = group_to_field(&updated_branch_comm);

    // Update the branch metadata
    self.storage.insert_branch(
        branch_id.clone(),
        BranchMeta {
            commitment: updated_branch_comm,
            hash_commitment: hash_updated_branch_comm,
        },
        depth,
    );

    let mut branch_child_id = branch_id;
    branch_child_id.push(branch_index);
    self.storage
        .add_stem_as_branch_child(branch_child_id, stem_update.stem, depth);

    hash_updated_branch_comm
}
```

This code again takes advantage of the additive homomorphism of elliptic curve points:

$C_{branch}^\prime = C_{branch} + (H_i^\prime - H_i) G_i$

The following is my step-by-step calculation of $C_{branch}$ from the original formula, checked against the value in the trie: https://github.com/flyq/verkle_tree_example/blob/d4fdd7f5565db9bcb5aaabe861c549847063598e/src/main.rs#L78-L112
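To see why this delta update gives the same result as recomputing the commitment from scratch, write out the sum with only child $i$ changed:

$$C_{branch}^\prime = \sum_{j \neq i} H_j \cdot G_j + H_i^\prime \cdot G_i = \left(\sum_{j=0}^{255} H_j \cdot G_j\right) + (H_i^\prime - H_i)\cdot G_i = C_{branch} + (H_i^\prime - H_i)\cdot G_i$$

So only one scalar multiplication (`scalar_mul(delta, branch_index)`) and one point addition are needed per child update, instead of a 256-term multi-scalar multiplication.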
#### InternalNodeFallThrough

```rust=
InternalNodeFallThrough {
    branch_id: BranchId,
    branch_child_index: u8,
    child: BranchId,
    old_child_value: Option<Meta>,
    depth: u8,
},
```

```rust=
Ins::InternalNodeFallThrough {
    branch_id,
    branch_child_index,
    child,
    depth,
    old_child_value,
} => {
    let new_branch_meta = self.storage.get_branch_meta(&child).unwrap();
    let new_hash_comm = new_branch_meta.hash_commitment;

    let old_hash_comm = match old_child_value {
        Some(old_branch_meta) => old_branch_meta.into_branch().hash_commitment,
        None => Fr::zero(),
    };

    let delta = new_hash_comm - old_hash_comm;
    let delta_comm = self
        .committer
        .scalar_mul(delta, branch_child_index as usize);

    let old_parent_branch_metadata = self.storage.get_branch_meta(&branch_id).unwrap();
    let old_branch_comm = old_parent_branch_metadata.commitment;

    let updated_comm = old_branch_comm + delta_comm;
    let hash_updated_comm = group_to_field(&updated_comm);

    self.storage.insert_branch(
        branch_id,
        BranchMeta {
            commitment: updated_comm,
            hash_commitment: hash_updated_comm,
        },
        depth,
    );
}
```

This is used to update the $C$ and $H_C$ of a special kind of branch node: one whose affected child is itself a branch node, **not** a stem node. It is applied after the trie has already updated that child branch node.

#### ChainInsert

```rust=
ChainInsert {
    starting_depth: u8,
    chain_insert_path: Vec<u8>,
    parent_branch_node: BranchId,
    child_index: u8,
    old_leaf_index: u8,
    new_leaf_key: [u8; 32],
    new_leaf_value: [u8; 32],
    new_leaf_index: u8,
},
```

This instruction is used to update a chain of branch nodes under a given path. I won't paste the source code; it is at: https://github.com/crate-crypto/rust-verkle/blob/e07b17b0538fcd227c221d2388e5468417116f00/verkle-trie/src/trie.rs#L362-L485

### create_verkle_proof

```rust=
fn create_verkle_proof(
    &self,
    keys: impl Iterator<Item = [u8; 32]>,
) -> Result<crate::proof::VerkleProof, crate::errors::ProofCreationError> {
    use crate::proof::prover;
    prover::create_verkle_proof(&self.storage, keys.collect())
}

pub fn create_verkle_proof<Storage: ReadOnlyHigherDb>(
    storage: &Storage,
    keys: Vec<[u8; 32]>,
) -> Result<VerkleProof, ProofCreationError> {
    if keys.is_empty() {
        return Err(ProofCreationError::EmptyKeySet);
    }
    let (queries, verification_hint) = create_prover_queries(storage, keys);

    // Commitments without duplicates and without the root, (implicitly) sorted by path, since the queries were
    // processed by path order
    let root_query = match queries.first() {
        Some(query) => query,
        None => return Err(ProofCreationError::ExpectedOneQueryAgainstRoot),
    };
    let root_comm = root_query.commitment;

    let comms_sorted: Vec<_> = queries
        .iter()
        // Filter out the root commitment
        .filter(|query| query.commitment != root_comm)
        // Pull out the commitments from each query
        .map(|query| query.commitment)
        // Duplicate all commitments
        .dedup()
        .collect();

    let mut transcript = Transcript::new(b"vt");
    let proof = MultiPoint::open(CRS.clone(), &PRECOMPUTED_WEIGHTS, &mut transcript, queries);

    Ok(VerkleProof {
        comms_sorted,
        verification_hint,
        proof,
    })
}
```

The verkle proof relies on the IPA implementation.

#### ipa-multipoint

```rust=
pub struct IPAProof {
    pub(crate) L_vec: Vec<Element>,
    pub(crate) R_vec: Vec<Element>,
    pub(crate) a: Fr,
}
```

Let's work through the `create` function in `ipa-multipoint/src/ipa.rs`.

Parameters:

- `transcript`: used for the Fiat-Shamir transform, i.e. the interactive part of IPA (generating random challenge values) is replaced by a random oracle. The transcript plays the role of the random oracle, and the verifier can derive the same challenge values by following the same steps, so the transcript itself does not need to be transmitted.
- `crs`: Common Reference String, used here to hold the bases of the Pedersen commitments, i.e. 256 + 1 linearly independent points (`Element`s)
- `a_vec`: the vector $\vec{a}$
- `a_comm`: the commitment to $\vec{a}$, equal to $\sum_{i=0}^{n-1} a_i G_i$
- `b_vec`: the vector $\vec{b}$
- `input_point`: the evaluation point of the polynomial, only used for the transcript

Logic:

- $output\_point = \vec{a} \cdot \vec{b}$
- $Q = w \cdot Q$
- $C_{L_i} = a_R \cdot G_L + a_R \cdot b_L \cdot Q$
- $C_{R_i} = a_L \cdot G_R + a_L \cdot b_R \cdot Q$
- $L = (C_{L_0}, C_{L_1}, \dots, C_{L_{rounds}})$
- $R = (C_{R_0}, C_{R_1}, \dots, C_{R_{rounds}})$
- $a = a_L + x \cdot a_R$
- $b = b_L + x^{-1} b_R$
- $G = G_L + x^{-1} G_R$

Output: the proof $L, R, a[0]$

**verify function**

Parameters:

- `transcript`: same as for create
- `crs`: same as for create
- `b`: same as `b_vec` of create
- `a_comm`: same as for create
- `input_point`: the input point
- `output_point`: the result, equal to `f(input_point)`, which is also $\vec{a} \cdot \vec{b}$

Compared with create, the parameters omit the vector $\vec{a}$ and add `output_point`.
Logic:

- $a\_comm = a\_comm + w \cdot Q \cdot output\_point$
- $a\_comm = a\_comm + C_{L_i} \cdot x_i + C_{R_i} \cdot x_i^{-1}$
- $G_i = G_{L_{i-1}} + x_i^{-1} G_{R_{i-1}}$
- $b_i = b_{L_{i-1}} + x_i^{-1} b_{R_{i-1}}$
- $a\_comm \overset{?}{=} G_n[0] \cdot proof.a + proof.a \cdot b_n[0] \cdot w \cdot Q$

To summarize the above:

- $\mathcal{P}$ knows $\vec{a}$, $\vec{b}$
- $\mathcal{V}$ knows $\vec{b}$, and does not know (does not want to know) $\vec{a}$
- the bases are $\vec{G} = (G_0, G_1, \cdots, G_{n-1})$ and $U$

Process:

- $C_a = \vec{a} \cdot \vec{G}$
- $U = w \cdot U$
- $C_0 = C_a + \vec{a} \cdot \vec{b}\,U$
- $C_{L_1}= \vec{a}_{R_0} \cdot \vec{G}_{L_0} + \vec{a}_{R_0} \cdot \vec{b}_{L_0}U$
- $C_{R_1}= \vec{a}_{L_0} \cdot \vec{G}_{R_0} + \vec{a}_{L_0} \cdot \vec{b}_{R_0}U$
- $C_1 = C_0 + x_1 C_{L_1} + x_1^{-1}C_{R_1}$

and so on until $k = \log n$:

- $\vec{a}_k = \vec{a}_{L,k-1} + x_k \vec{a}_{R,k-1}$
- $\vec{b}_k = \vec{b}_{L,k-1} + x_k^{-1}\vec{b}_{R,k-1}$
- $C_{L_k}= \vec{a}_{R,k-1} \cdot \vec{G}_{L,k-1} + \vec{a}_{R,k-1} \cdot \vec{b}_{L,k-1}U$
- $C_{R_k}= \vec{a}_{L,k-1} \cdot \vec{G}_{R,k-1} + \vec{a}_{L,k-1} \cdot \vec{b}_{R,k-1}U$
- $C_k = C_{k-1} + x_kC_{L_k} + x_k^{-1}C_{R_k}$

Also, $\vec{a}_k$ and $\vec{G}_k$ both have length 1 at this point.

$C_a, \vec{a}_k, \vec{a} \cdot \vec{b}, \vec{C}_{L}, \vec{C}_{R}$ are sent to $\mathcal{V}$.

$\mathcal{V}$ verifies:

- $C_k = C_a + \vec{a}\cdot \vec{b}\,U + \sum_{i=1}^k (x_iC_{L_i} + x_i^{-1} C_{R_i})$
- $C_k \overset{?}{=} \vec{a}_k[0] \cdot \vec{G}_k[0] + \vec{a}_k[0]\,\vec{b}_k[0]\,U$

verify_multiexp:

- $\sum_{i=1}^k (x_iC_{L_i} + x_i^{-1} C_{R_i}) + C_a + (\vec{a}\cdot \vec{b} - \vec{a}_k[0]\,\vec{b}_k[0]) \cdot U - \vec{a}_k[0] \cdot \vec{G}_k[0] \overset{?}{=} 0$
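To see why the final check passes for an honest prover, here is the single-round case ($n = 2$, one challenge $x$) written out with the definitions above, using $\vec{a} = (a_1, a_2)$, $\vec{b} = (b_1, b_2)$, $\vec{G} = (G_1, G_2)$:

$$\begin{align*}
C_0 &= a_1G_1 + a_2G_2 + (a_1b_1 + a_2b_2)\,U, \qquad C_{L} = a_2G_1 + a_2b_1U, \qquad C_{R} = a_1G_2 + a_1b_2U \\
a' &= a_1 + x a_2, \qquad b' = b_1 + x^{-1}b_2, \qquad G' = G_1 + x^{-1}G_2 \\
a'G' + a'b'U &= \big(a_1G_1 + a_2G_2 + (a_1b_1+a_2b_2)U\big) + x\big(a_2G_1 + a_2b_1U\big) + x^{-1}\big(a_1G_2 + a_1b_2U\big) \\
&= C_0 + xC_{L} + x^{-1}C_{R} = C_1
\end{align*}$$

Expanding $a'G' + a'b'U$ and collecting terms by powers of $x$ gives exactly the three groups above, so the folded claim about $(a', b', G')$ is equivalent to the original one; applying this round by round leads to the final length-1 check $C_k = \vec{a}_k[0]\cdot\vec{G}_k[0] + \vec{a}_k[0]\vec{b}_k[0]U$.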
### MultiPointProof

```rust=
pub struct MultiPointProof {
    open_proof: IPAProof,
    g_x_comm: Element,
}
```

```rust=
pub struct PrecomputedWeights {
    // This stores A'(x_i) and 1/A'(x_i)
    barycentric_weights: Vec<Fr>,
    // This stores 1/k for k \in [-255, 255]
    inverted_domain: Vec<Fr>,
}
```

- `PrecomputedWeights.barycentric_weights`: in the verkle tree setting this holds 256 + 256 values, $A'(0), A'(1), \dots, A'(255), \frac{1}{A'(0)}, \frac{1}{A'(1)}, \dots, \frac{1}{A'(255)}$
- `PrecomputedWeights.inverted_domain`: in the verkle tree setting this holds 255 + 255 values, $1, \frac{1}{2}, \frac{1}{3}, \dots, \frac{1}{255}, -1, -\frac{1}{2}, -\frac{1}{3}, \dots, -\frac{1}{255}$

```rust=
pub struct LagrangeBasis {
    // We assume that the domain starts at zero,
    // so we only need to supply the upperbound
    domain: usize,
    values: Vec<Fr>,
}
```

For the polynomial $f(x)$ obtained by Lagrange interpolation through $(0, f(0)), (1, f(1)), (2, f(2)), \dots, (255, f(255))$:

- `LagrangeBasis.domain` is 256
- `LagrangeBasis.values` is $(f(0), f(1), f(2), \dots, f(255))$

Methods:

- `divide_by_linear_vanishing`: computes the new polynomial $q(x) = \frac{f(x) - f(x_i)}{x-x_i}$
- `evaluate_in_domain`: when $x_i \in \{0, 1, \dots, 255\}$, simply returns $f(x_i)$
- `evaluate_lagrange_coefficients`: when the evaluation point $z$ is not in the domain, evaluates via $f(z) = A(z)\sum_{i=0}^{255}\frac{f(i)}{A^\prime(i)(z-i)}$, where
  - $A(z) = \prod_{i=0}^{255} (z-i)$
  - it returns a vector containing $res_i = \frac{A(z)}{A^\prime(i)(z-i)}$

```rust=
pub struct ProverQuery {
    pub commitment: Element,
    pub poly: LagrangeBasis,
    // TODO: Make this a reference so that upstream libraries do not need to clone
    // Given a function f, we use z_i to denote the input point and y_i to denote the output, ie f(z_i) = y_i
    pub point: usize,
    pub result: Fr,
}
```

With the supporting data structures covered, let's see how `open` is implemented: `open()` -> `MultiPointProof`.

Inputs:

- `crs`: the bases of the Pedersen commitment
- `precomp`: the precomputed $A^\prime$ values and $\pm\frac{1}{i}$
- `transcript`: for the Fiat-Shamir transform
- `queries`: an array of length $t$, representing $t$ polynomials $f_i(x)$

Logic:

- Obtain a random $r$ and generate $1, r, r^2, r^3, \dots, r^{t-1}$
- Grouping: each query $i \in [0..t]$ contributes its pair $(r^i, f_i(x))$ to the group of its evaluation point
- The aggregated polynomial is kept as its 256 evaluations; writing $h_i(x) = r^i \cdot f_i(x)$, we have $h_i(j) = r^i \cdot f_i(j)$ on the domain, and off the domain $h_i(x) = A(x) \cdot r^i \cdot \sum_{j=0}^{255} \frac{f_i(j)}{A^\prime(j)\,(x-j)}$

The polynomials opened at the same point, `HashMap<point, Vec<(f_i, r^i)>>`, are aggregated:

- $ag\_poly = \sum_i r^i \cdot \vec{f_i}$

We then have a vector of $(point, ag\_poly)$ pairs. Writing $d_i(x)$ for the $i$-th $ag\_poly$ and $z_i$ for its point:

$$g(x) = \sum_{i = 0}^{h} \frac{d_i(x) - d_i(z_i)}{x - z_i}$$

where $h$ is the number of distinct points, which is also the length of the vector above.

$$C_g = \sum_{i=0}^{255} g(i) \cdot G_i$$

## Reference

- https://blog.ethereum.org/2021/12/02/verkle-tree-structure
- https://vitalik.eth.limo/general/2021/06/18/verkle.html
- https://ihagopian.com/posts/anatomy-of-a-verkle-proof
- https://verkle.dev
- https://verkle.info
- https://eips.ethereum.org/EIPS/eip-6800
- https://dankradfeist.de/ethereum/2021/07/27/inner-product-arguments.html
- https://dankradfeist.de/ethereum/2021/06/18/pcs-multiproofs.html
- Alice:是的,现在 $f(x)$ 完全是我说了算,我可以随意修改它的系数。因此需要我先公开一个 $f(x)$ 的**承诺(Commitment)**,这样我就不能乱改 $f(x)$ - Bob:是的,我先生成一个随机数 $r$,但是这个 $r$ 我不会告诉你,然后选取 BLS12-381 的 $G_1, G_2$ 两点, 并且计算 $(P_0, P_1, P_2 \dots, P_{n-1}) = (G_1, rG_1, r^2G_1, \dots, r^{n-1}G_1)$,并将这 $n$ 个椭圆曲线上的点发送给你。因为根据椭圆曲线离散对数难题,拥有 $rG$ 无法得到 $r$ 的值。 - Alice:好的,我计算 $P = f(r)G_1 = a_0G_1 + (a_1\cdot r)G_1 + (a_2\cdot r^2)G_1 + \cdots + (a_{n-1}r^{n-1})G_1 = a_0P_0 + a_1P_1 + a_2P_2 + \cdots + a_{n-1}P_{n-1}$,然后将 $P$ 点发送给你,那么 $P$ 点就是 $f(x)$ 的承诺。可以看出,虽然我不知道 $r$,但是我能够计算任意多项式的承诺。 - Bob:好的,我再生成一个挑战值(随机数) $s$,并且把这个 $s$ 发送给你。 - Alice:我计算 $f(s)$,以及 $q(x) = \frac{f(x) -f(s)}{x-s}$,并计算 $Q = q(r)G_1$,我把 $f(s), Q$ 发送给你。其中 $f(s)$ 是结果,$Q$ 是对 $f(s)$ 的证明 proof。 - Bob:我验证 $e(f(r)G_1 - f(s)G_1, G_2) = e(q(r)G_1, (r-s)G_2)$ 是否成立,如果成立,就能证明 $f(r) - f(s) = q(r)(r-s)$, 我就大概率可以相信你知道 $f(x)$。$e(G_1, G_2)$ 是 pairing,部分椭圆曲线拥有该性质,比如 BLS12-381。 以上是对 KZG 的一个简化描述。具体可以参考 https://hackmd.io/DA-UGXFFRzCCznM1JBuZcA 总结一下,Bob 到最后都不知道 $f(x)$ 的系数 $a_i$,但是他大概率相信 Alice 知道 $f(x)$,那么他相信 Alice 知道哪个 $f(x)$?即承诺 $P$ 对应的那个。 回到 Verkle Tree,你不需要知道具体 KV 的值,只需要提前知道 state root,类似对所有的 KV 的承诺,以及相关 KV proof,你就能验证相关 KV 确实在这个 state root 代表的 Tree 的下面。 ### Verkle Tree 的结构 ![image](https://hackmd.io/_uploads/r1feentzA.png) *图 1* 一个典型的 Verkle tree 如上图所示,包含 3 类节点,根据 rust-verkle 里面的命名,为 `Branch` node,`Stem` node, `Leaf` node。注意,上图里面 node 的里面值虽然用相同字符表示,但不同 node 中的值并不一定相等。其中 $C$ 表示 `Branch` node 的 commitment,是一个椭圆曲线上的点,$H_C$ 表示对该 Commitment 的hash 值,是椭圆曲线上的标量域 $\mathbb{F}_r$ 里面的元素。$C_1, C_2, C_s$ 分别表示 `Stem` node 的 Commitment。$K, V$ 表示要存的实际数据。 我们可以根据上图得出结论: - root 节点是 `Branch` node - `Branch` node 的子节点可能是 `Branch` node,可能是 `Stem` node - **`Stem` node 的父节点有多个子节点**,因为假设只有一个子节点的话,它会被优化进 stem - `Stem` node 的子节点为 `Leaf` node,`Leaf` node 的父节点为 `Stem` node - 某个 `Branch`/`Stem` node 下最多有 `256` 子节点 - 因为 Leaf node 的 key 使用 `[u8;32]` 表示,整个 tree 最深为 `32` 其他信息: - `Stem` node 的 key 值为 `31` bytes,它在 Tree 里面的 Path 只能获取这个 key 的前面一部分到全部的值(取决于它所处深度),因此需要额外字段存储。 - `Leaf` node 在这 `31` bytes 的基础上,再加上它在 `Stem` node 下所处位置的 index,就能组成 `32` bytes,作为 **`Leaf`** node 的 key - 已知 `Stem` node 的 key,以及它所处的深度 `d`,可以得到它的 path:`path = key[0..d]` 问题: - `Stem` node 的作用是什么?这是对 trie 的一种优化,MPT 里面也存在类似的优化:当 Trie 中连续 node 都只有一个子节点时,可以把它们压缩成一个节点来表示,这样可以降低 Tree 的深度。 - 比如使用 `2-ary` 来代表 `256-ary` 时,下图是对实际情况的一种模拟,因为所有的 Key 都是通过 address 和其他信息的 pedersen hash 得来,可以看作是随机数,因此 leaf node 从 root 开始就均匀分布在各个分支,假设没有 `Stem` node,深度在 3-4 之后,大部分 node 将呈现如上图所示的情况,即退化成 Linked list。![image](https://hackmd.io/_uploads/r15UNYDXC.png) 某个资料显示,在实际情况中 Verkle tree 的平均深度为 3.8,这是正常的,$256^{3.8} \approx 1.4 \ \text{B}$,而[以太坊的地址数为 0.268B](https://etherscan.io/chart/address),容量足够。 - 为什么到倒数第 2 个 node 又开始分叉了(所谓的 suffix trie)?![image](https://hackmd.io/_uploads/HyF1s_Kz0.png) 有意思的问题,还在和 ETH R&D 负责 verkle tree 团队讨论。 > I do have a question about the design of verkle trees, here a binary tree is used to represent a 256-ary tree. My understanding is that since all keys are the results of pedersen_hash and are relatively random, the keys will be evenly distributed in each branch starting from the root. In actual situations, the structure of the tree should be like Figure 1: the first 3-4 layers are almost a balanced tree, and the subsequent nodes will degenerate into linked lists. Then use stem node to represent nodes that degenerate into linked lists for optimization. My question is, 1. 
Why not put the nonce, balance, and even some storage slots of the same address under the same stem node, because when a transaction occurs, these fields are relatively easy to update at the same time, and this can reduce the number of required to update stem nodes and branch nodes. The actual situation is that they will be randomly distributed in different stem nodes. 2. Since all leaf nodes will be randomly distributed, why is Figure 2 chosen for the final layout, which branches into 256 child nodes after the stem node? I think the actual situation is that under a stem node, in most cases only one child has a value, and the others are Null. - `Stem` node 的缺点是如果同时出现 Leaf node 2 和 4,或者 3 和 4 时,它就没起到降低深度的作用了,是一个可能的 DDoS 攻击面。并且你并不需要拥有相关地址的私钥就能攻击,比如我直接给 Leaf node 2 和 3 所代表的账号转入微量的 ether 作为余额,就可以让 Verkle Tree 深度加深。解决方案是 key 不是根据 account address 直接生成,而是经过一次 hash,在 EIP 6800 里面有体现。 - 为什么 `Stem` node 的 Key 设计为 31 bytes,`Stem` node 不能做 root,那么至少它的第一个 byte 就可以不存,直接从 `Stem` 的 Path 获取?我的理解这是实现方式的差异,memoryDB 里面就是这么实现的:`[u8;31]` - 之前在某个资料里面看到,有了 `Verkle Tree` 之后,account 下面不需要给 contract account 额外新建 Storage tree 了,这是怎么实现的?Verkel tree 的潜在空间非常大($2^{256}$),因此按照下面的形式将 storage tree 嵌入到 Verkle tree 里面。 ```python= def get_tree_key_for_storage_slot(address: Address32, storage_key: int): if storage_key < (CODE_OFFSET - HEADER_STORAGE_OFFSET): pos = HEADER_STORAGE_OFFSET + storage_key else: pos = MAIN_STORAGE_OFFSET + storage_key return get_tree_key( address, pos // VERKLE_NODE_WIDTH, pos % VERKLE_NODE_WIDTH ) ``` ## 基于 Memory 的 Verkle Tree ```rust= // https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/database/memory_db.rs#L6-L11 pub struct MemoryDb { pub leaf_table: HashMap<[u8; 32], [u8; 32]>, pub stem_table: HashMap<[u8; 31], StemMeta>, pub branch_table: HashMap<Vec<u8>, BranchChild>, } pub struct StemMeta { pub c_1: Element, pub hash_c1: Fr, pub c_2: Element, pub hash_c2: Fr, pub stem_commitment: Element, pub hash_stem_commitment: Fr, } pub enum BranchChild { Stem([u8; 31]), Branch(BranchMeta), } pub struct BranchMeta { pub commitment: Element, pub hash_commitment: Fr, } ``` 三种类型的 node,对应三个 table,leaf_table 对应 32 bytes 的 KV pair,stem_table 对应 31 bytes 的 key 和 $C_1, C_2, C_s, H_{C_1}, H_{C_2}, H_{C_s}$ 的 value。branch_table 对应可变长度的 bytes(Path)作为 key,该 Path 下的 node 可能是 `Stem`, 可能是 `Branch`。 ![image](https://hackmd.io/_uploads/B1cTziKGC.png) 上图为打印图 1 的结果,可以看到 5 个 leaf node,31 个 branch node,4 个 stem node,注意 branch_table 里面包含 31 个branch node 和 4 个 stem node。 ## Verkle Tree 的 Method ```rust= // https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/lib.rs#L17-L50 pub type Key = [u8; 32]; pub type Value = [u8; 32]; pub trait TrieTrait { /// Inserts multiple values into the trie /// If the number of items is below FLUSH_BATCH, they will be persisted /// atomically /// This method will implicitly compute the new root fn insert(&mut self, kv: impl Iterator<Item = (Key, Value)>); /// Inserts a single value /// This method will implicitly compute the new root fn insert_single(&mut self, key: Key, value: Value) { self.insert(vec![(key, value)].into_iter()) } /// Gets the value at the `Key` if it exists /// Returns an none if it does not exist /// TODO: Find out if this method is ever needed fn get(&self, key: Key) -> Option<Value>; /// Returns the root of the trie fn root_hash(&self) -> Fr; /// Returns the root commitment of the trie fn root_commitment(&self) -> Element; /// Creates a verkle proof over many keys fn 
create_verkle_proof( &self, key: impl Iterator<Item = Key>, ) -> Result<proof::VerkleProof, ProofCreationError>; } ``` 实际上,以太坊的 State Trie 只需要实现 `TrieTrait`,即对外只需要提供以上几条功能就能完成任务,而不管底层使用 Verkle Tree 还是 SALT。这里我们先分析如果是 Verkle Tree 将如何实现。 - `insert`,给定一组 KV pair,将其保存。在 MemoryDB 中,即 insert 到 leaf_table - `insert_single`,直接的调用 insert - `get`,给定一个 K,返回 V。在 MemoryDB 中,get leaf_table 里面的 KV pair - `root_hash`,root 节点 commitment 的 hash - `root_commitment`,root 节点 commitment - `create_verkle_proof`,给定一组 K,返回一个 VerkleProof 除了 `insert_single`,接下来看一下剩下 5 个接口的实现。 首先看一下 Trie 的定义,Trie 有两个字段,storage 和 committer,他们的类型都是泛型,其中 Storage 类型没做要求, PolyCommit 类型还需要实现 Committer 这个 trait,我们先关注 storage。 ```rust= // https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/trie.rs#L12-L15 pub struct Trie<Storage, PolyCommit: Committer> { pub storage: Storage, pub committer: committer, } ``` ### get 对于 `Storage` 类型实现了 `ReadWriteHigherDb` trait 的 Trie,因为 `ReadWriteHigherDb` trait 里面有 get_leaf 接口,因此直接调用 `self.storage.get_leaf(key)` ```rust= // https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/trie.rs#L26-L28 impl<S: ReadWriteHigherDb, P: Committer> TrieTrait for Trie<S, P> { fn get(&self, key: crate::Key) -> Option<crate::Value> { self.storage.get_leaf(key) } } ``` 那么 MemoryDb 是怎么实现 `get_leaf` 的?直接从 MemoryDb 里面的 leaf_table 这个 HashMap get 就能拿到。 ```rust= // https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/database.rs#L8 pub trait ReadWriteHigherDb: ReadOnlyHigherDb + WriteOnlyHigherDb {} // https://github.com/crate-crypto/rust-verkle/blob/7688f0aedfb147d3d391abfe8495e46c46d72ce0/verkle-trie/src/database/memory_db.rs#L59-L63 impl ReadOnlyHigherDb for MemoryDb { fn get_leaf(&self, key: [u8; 32]) -> Option<[u8; 32]> { self.leaf_table .get(&key) .map(|val| val.to_vec().try_into().unwrap()) } } ``` 实际上我觉得不必把类型转换来转换去,可以这样实现: In fact, I think there is no need to convert types. It can be achieved like this: ```rust= fn get_leaf(&self, key: [u8; 32]) -> Option<[u8; 32]> { self.leaf_table.get(&key).copied() } ``` **复杂度** 在 MemoryDB 中的实现,平均来说 get 是一个 $\mathcal{O}(1)$ 的操作,因为直接调 HashMap 里面的 get ### root_hash 和 root_commitment 同样,对于 S 类型实现了 ReadWriteHigherDb trait 的 Trie,因为 ReadWriteHigherDb trait 里面有 get_branch_meta 接口,因此直接调用 self.storage.get_branch_meta,输入 root node 的 path:`&[]`,就能获取 root node 的 meta 信息,包括 commitment 和 hash。 ```rust= impl<S: ReadWriteHigherDb, P: Committer> TrieTrait for Trie<S, P> { fn root_hash(&self) -> Fr { // This covers the case when the tree is empty // If the number of stems is zero, then this branch will return zero let root_node = self .storage .get_branch_meta(&[]) .expect("this should be infallible as every trie should have a root upon creation"); root_node.hash_commitment } fn root_commitment(&self) -> Element { // TODO: This is needed for proofs, can we remove the root hash as the root? 
let root_node = self.storage.get_branch_meta(&[]).unwrap(); root_node.commitment } } impl ReadOnlyHigherDb for MemoryDb { fn get_branch_meta(&self, key: &[u8]) -> Option<BranchMeta> { let branch_child = match self.branch_table.get(key) { Some(b_child) => b_child, None => return None, }; match branch_child { BranchChild::Stem(stem_id) => panic!( "expected branch meta data, however under this path there is a stem: {}", hex::encode(stem_id) ), BranchChild::Branch(b_meta) => Some(*b_meta), } } } ``` 而 MemoryDb 是怎么实现 get_branch_meta 的?因为存了 branch_table 这个 HashMap,可以直接拿,但是正如前面描述的, value 有两种类型,一种是 Branch,另一种是 Stem,如果是 Stem 的时候,直接 panic,因为你给的参数是 Stem 的 path,然后你想在这里拿 Branch。 **复杂度** 在 MemoryDB 中的实现,平均来说 root_hash 和 root_commitment 是一个 $\mathcal{O}(1)$ 的操作,因为直接调 HashMap 里面的 get ### insert insert 会比较复杂,我们结合示意图和代码来分析。 从下面的实现可以看到实现涉及到两个流程,把要 insert 的每个 KV pair 转化成指令组成的一系列小操作 instructions,然后再把 instructions 作用到 Tree 上。 ![image](https://hackmd.io/_uploads/rk5ax-oM0.png) ```rust= impl<S: ReadWriteHigherDb, P: Committer> TrieTrait for Trie<S, P> { fn insert(&mut self, kv: impl Iterator<Item = (crate::Key, crate::Value)>) { for (key_bytes, value_bytes) in kv { let ins = self.create_insert_instructions(key_bytes, value_bytes); self.process_instructions(ins); } } } ``` 但是需要注意的是,这里对 KV 的处理是串行,更新完一个 KV 后才能进行才一个 KV,这了有一个优化思路是可以提前对这些 KV 进行预处理,尽量让有公共 path 的 KV 同时处理,这样就可以减少公共 path 上的 stem node 和 branch node 的更新次数。 ![image](https://hackmd.io/_uploads/H1CqoUjMA.png) 指令有 3 种,`UpdateLeaf`,`ChainInsert`,`InternalNodeFallThrough` ```rust= pub enum Ins { UpdateLeaf { ... }, ChainInsert { ... }, InternalNodeFallThrough { ... }, } ``` #### UpdateLeaf ```rust= pub(crate) type BranchId = Vec<u8>; UpdateLeaf { key: [u8; 32], new_leaf_value: [u8; 32], depth: u8, branch_id: BranchId, branch_child_index: u8, }, ``` 更新 leaf node,会把从 root 到该 leaf node 的 path 上所有的 node 都更新一遍: 1. 更新该 leaf node 的 value 2. 更新它对应的 stem node 的 $C_1, C_2$ 中的某一个,以及对应的 hash 3. 更新 stem node 的父节点们(branch node)的 $C$ 和 $H_C$ 包含的信息: - `key`, `new_leaf_value`,用于第一条更新 - `depth`,该 leaf node 所处深度 - `branch_id`,该 leaf node 对应的 stem node 的 path - `branch_child_index`,该 leaf node 在 stem node 下的位置。 UpdateLeaf 的一个场景是如下图,假设在已有的 tree 的基础上新增 leaf node 6: ![image](https://hackmd.io/_uploads/SyBFoIhzA.png) 先看 `process_instructions` 里面是怎么处理 `UpdateLeaf` 指令的: ```rust= impl<Storage: ReadWriteHigherDb, PolyCommit: Committer> Trie<Storage, PolyCommit> { pub fn process_instructions(&mut self, instructions: Vec<Ins>) { for ins in instructions.into_iter().rev() { match ins { Ins::UpdateLeaf { key, new_leaf_value, depth, branch_id, branch_child_index, } => { let leaf_update = match self.update_leaf_table(key, new_leaf_value, depth) { Some(leaf_update) => leaf_update, None => { // No value was updated, early exit return; } }; let stem_update = self.update_stem_table(leaf_update, depth); self.update_branch_table(stem_update, branch_id, branch_child_index, depth); } ... } } } ... 
} ``` 如上图所示, 如果该 leaf node 之前有值,就依次调用了 `update_leaf_table`,`update_stem_table`,`update_branch_table`。如果返回是空,就只需要 `update_leaf_table`,感觉有点很奇怪。 看这个三个 update 的实现吧: **update_leaf_table**: ```rust= // Store the leaf, we return data on the old leaf, so that we can do the delta optimization // // If a leaf was not updated, this function will return None // else Some will be returned with the old value fn update_leaf_table( &mut self, key: [u8; 32], value: [u8; 32], depth: u8, ) -> Option<LeafUpdated> { let old_val = match self.storage.insert_leaf(key, value, depth) { Some(vec) => { // Check if they have just inserted the previous value // if so, we early exit and return None if vec == value { return None; } Some(vec) } None => None, }; Some(LeafUpdated { old_val, new_value: value.to_vec(), key: key.to_vec(), }) // Storing a leaf means we need to change the stem table too } impl WriteOnlyHigherDb for MemoryDb { fn insert_leaf(&mut self, key: [u8; 32], value: [u8; 32], _depth: u8) -> Option<Vec<u8>> { self.leaf_table .insert(key, value) .map(|old_val| old_val.to_vec()) } } ``` 调用了 storage 的 insert_leaf method。然后调用了 HashMap 的 insert。如果有值就返回值,没值就返回 none。 总结就是 update_leaf_table 如果 leaf node - 新的 KV 和之前的 KV 相同 - 之前该 K 不存在 V 就返回 None,UpdateLeaf 就只执行 update_leaf_table。否则返回更新结果:LeafUpdated,并继续执行 update_stem_table,update_branch_table。 **update_stem_table** - 为什么 branch node 里面只需要一个 commitment 就行,而 stem node 里面需要两个 commitment,他们不都是对 256 个子节点进行承诺吗?因为被承诺的对象不同。branch node 的子节点是 branch node 或者是 stem node,被承诺的对象是 branch node 或者 stem node 的 commitment 的 hash 值,取值范围为 `[0, order)`,直接承诺就行。而 stem node 的子节点是 KV,被承诺的对象是 V,取值范围 `[0,2^256)`,超出 order 的那一部分无法被承诺。因此需要两个 commitment。怎样划分呢?一个自然的想法是把子节点 256 个 V 的 [0,127] bit 放在第一个 $C_1$,[128,255] bit 放在 $C_2$,但是这样做,只要更新 V 里面的某一个值,就会导致需要更新 $C_1$, $C_2$。因此采用不同的划分办法: $C_1$ 用于 index 为 0-127 的 V 的承诺,$C_2$ 用于 index 为 128-255 的 V 的承诺。具体实现为: ![image](https://hackmd.io/_uploads/ByZLayWQA.png) 一个 stem node 下面有 $V_0$~$V_{255}$ 共 256 个 leaf node,每一个 $K_i, V_i$,假设为 [u8; 32],则: - `Vi_l = Fr::from_le_bytes_mod_order(V_i[0..=15]) + 2^128` - `Vi_h = Fr::from_le_bytes_mod_order(V_i[16..=31])` - `stem: [u8;31] = Ki[0..=30]` 如果 index 为 `i` 的 node 不存在,则 `Vi` 使用 0 代替。 假设 Commitment base 为 $(G_0, G_1,\dots, G_{255})$,实际上所有的 branch node 的 $C$,stem node 的 $C_1$, $C_2$ 都是用同一套 Commitment base。那么计算公式为: Assume that the Commitment base is $(G_0, G_1,\dots, G_{255})$. In fact, all $C$ of branch nodes and $C_1$ and $C_2$ of stem nodes use the same set of Commitment base. 
Then the calculation formula is: $$\begin{align*}C_1 &= \sum_{i=0}^{127} (V_i\_l\cdot G_{2i} + V_i\_h \cdot G_{2i+1}) \\ C_2 & = \sum_{i=0}^{127} (V_{i+128}\_l\cdot G_{2i} + V_{i+128}\_h \cdot G_{2i+1}) \\ H_{C} &= \text{group_to_field}(C) \\ C_{stem} &= 1 \cdot G_0 + \text{stem} \cdot G_1 + H_{C_1} \cdot G_2 + H_{C_2} \cdot G_3\end{align*}$$ 其中 $C_{stem}$ 的计算方式也是下图绿色节点的来源, The calculation method of $C_{stem}$ is also the source of the green node in the figure below, ![image](https://hackmd.io/_uploads/HyVsdyWXC.png) > from https://blog.ethereum.org/2021/12/02/verkle-tree-structure 具体的代码就不全部贴上来了,还挺多,可以自行去这里查看: https://github.com/crate-crypto/rust-verkle/blob/e07b17b0538fcd227c221d2388e5468417116f00/verkle-trie/src/trie.rs#L537-L675 需要注意的是,里面利用了椭圆曲线上点的加法同态来优化,假设更新了 $V_1$,那么: $C_1^\prime = C_1 + (V_1\_l^\prime - V_1\_l) \cdot G_2 + (V_1\_h^\prime - V_1\_h) \cdot G_3$ $C_2^\prime = C_2$ $C_{stem}^\prime = C_{stem} + (H_{C_1}^\prime - H_{C_1})*G_2$ 然后下面是我根据上面的公式一步步计算 $C_1, C_2, C_{stem}$ 和更新后的 trie 里面的相等: https://github.com/flyq/verkle_tree_example/blob/da877d33309ee036661c4408a519b27357dbd335/src/main.rs#L14-L47 最后调用 insert_stem 直接在 MemoryDb 的 stem_table 上 insert: ```rust= fn insert_stem(&mut self, key: [u8; 31], meta: StemMeta, _depth: u8) -> Option<StemMeta> { self.stem_table.insert(key, meta) } ``` **update_branch_table** $$\begin{align*} C_{branch} &= \sum_{i=0}^{255} H_{i} \cdot G_i \\ H_i &= \text{group_to_field}(C_i) \end{align*}$$ 其中 $C_i$ 表示该 branch node 下面的子节点的 commitment,可能是 branch 节点,可能是 stem 节点。 ```rust= fn update_branch_table( &mut self, stem_update: StemUpdated, branch_id: BranchId, branch_index: u8, depth: u8, ) -> Fr { let old_stem_hash = stem_update.old_val.unwrap_or_else(Fr::zero); let new_stem_hash = stem_update.new_val; let delta = new_stem_hash - old_stem_hash; let old_branch_comm = self.storage.get_branch_meta(&branch_id).unwrap().commitment; let delta_comm = self.committer.scalar_mul(delta, branch_index as usize); let updated_branch_comm = old_branch_comm + delta_comm; let hash_updated_branch_comm = group_to_field(&updated_branch_comm); // Update the branch metadata self.storage.insert_branch( branch_id.clone(), BranchMeta { commitment: updated_branch_comm, hash_commitment: hash_updated_branch_comm, }, depth, ); let mut branch_child_id = branch_id; branch_child_id.push(branch_index); self.storage .add_stem_as_branch_child(branch_child_id, stem_update.stem, depth); hash_updated_branch_comm } ``` 上面的代码实现同样利用了椭圆曲线点的加法同态性: $C_{branch}^\prime = C_{branch} + (H_i^\prime - H_i) G_i$ 然后下面是我根据原始的公式一步步计算 $C_{branch}$ 和 trie 里面的相等: https://github.com/flyq/verkle_tree_example/blob/d4fdd7f5565db9bcb5aaabe861c549847063598e/src/main.rs#L78-L112 #### InternalNodeFallThrough ```rust= InternalNodeFallThrough { branch_id: BranchId, branch_child_index: u8, child: BranchId, old_child_value: Option<Meta>, depth: u8, }, ``` ```rust= Ins::InternalNodeFallThrough { branch_id, branch_child_index, child, depth, old_child_value, } => { let new_branch_meta = self.storage.get_branch_meta(&child).unwrap(); let new_hash_comm = new_branch_meta.hash_commitment; let old_hash_comm = match old_child_value { Some(old_branch_meta) => old_branch_meta.into_branch().hash_commitment, None => Fr::zero(), }; let delta = new_hash_comm - old_hash_comm; let delta_comm = self .committer .scalar_mul(delta, branch_child_index as usize); let old_parent_branch_metadata = self.storage.get_branch_meta(&branch_id).unwrap(); let old_branch_comm = old_parent_branch_metadata.commitment; let updated_comm = old_branch_comm + 
delta_comm; let hash_updated_comm = group_to_field(&updated_comm); self.storage.insert_branch( branch_id, BranchMeta { commitment: updated_comm, hash_commitment: hash_updated_comm, }, depth, ); } ``` 这个用于更新一类特殊 branch node 的 C 和 Hc。这类 branch node 的子节点也是 branch node,**不是** stem node。并且是在 trie 已经更新好该 branch node 的子节点下,再更新本 branch node。 #### ChainInsert ```rust= ChainInsert { starting_depth: u8, chain_insert_path: Vec<u8>, parent_branch_node: BranchId, child_index: u8, old_leaf_index: u8, new_leaf_key: [u8; 32], new_leaf_value: [u8; 32], new_leaf_index: u8, }, ``` 这个指令用于更新某个路径下一系列的 branch node。源码就不贴了,在 https://github.com/crate-crypto/rust-verkle/blob/e07b17b0538fcd227c221d2388e5468417116f00/verkle-trie/src/trie.rs#L362-L485 ### create_verkle_proof ```rust= fn create_verkle_proof( &self, keys: impl Iterator<Item = [u8; 32]>, ) -> Result<crate::proof::VerkleProof, crate::errors::ProofCreationError> { use crate::proof::prover; prover::create_verkle_proof(&self.storage, keys.collect()) } pub fn create_verkle_proof<Storage: ReadOnlyHigherDb>( storage: &Storage, keys: Vec<[u8; 32]>, ) -> Result<VerkleProof, ProofCreationError> { if keys.is_empty() { return Err(ProofCreationError::EmptyKeySet); } let (queries, verification_hint) = create_prover_queries(storage, keys); // Commitments without duplicates and without the root, (implicitly) sorted by path, since the queries were // processed by path order let root_query = match queries.first() { Some(query) => query, None => return Err(ProofCreationError::ExpectedOneQueryAgainstRoot), }; let root_comm = root_query.commitment; let comms_sorted: Vec<_> = queries .iter() // Filter out the root commitment .filter(|query| query.commitment != root_comm) // Pull out the commitments from each query .map(|query| query.commitment) // Duplicate all commitments .dedup() .collect(); let mut transcript = Transcript::new(b"vt"); let proof = MultiPoint::open(CRS.clone(), &PRECOMPUTED_WEIGHTS, &mut transcript, queries); Ok(VerkleProof { comms_sorted, verification_hint, proof, }) } ``` verkle proof 依赖 IPA 的实现 #### ipa-multipoint ```rust= pub struct IPAProof { pub(crate) L_vec: Vec<Element>, pub(crate) R_vec: Vec<Element>, pub(crate) a: Fr, } ``` 推导一下 `ipa-multipoint/src/ipa.rs` 里面的 `create` function: 参数: - `transcript`,用于 Fiat-Shamir Transform,即将 IPA 中的交互式部分(产生随机挑战值),使用 random oracle 代替,transcript 扮演 random oracle,并且,验证者按照相同步骤也能获取到对应的挑战值,省去 transcript 的传输 - `crs`,Common Reference String, 这里用于保存 Pedersen commitments 对应的 base,即 256 + 1 个线性无关的点 Element - `a_vec`,向量 a - `a_comm`,向量 a 的承诺,等于 $\sum_{i=0}^{n-1} a_i G_i$ - `b_vec`,向量 b - `input_point`,多项式自变量的值,仅仅用于 transcript 逻辑: - $out\_point = \vec{a} \cdot \vec{b}$ - $Q = w \cdot Q$ - $C_{L_i} = a_R \cdot G_L + a_R \cdot b_L \cdot Q$ - $C_{R_i} = a_L \cdot G_R + a_L \cdot b_R \cdot Q$ - $L = (C_{L_0}, C_{L_1}, \dots, C_{L_{rounds}} )$ - $R = (C_{R_0}, C_{R_1}, \dots, C_{R_{rounds}} )$ - $a = a_L + x \cdot a_R$ - $b = b_L + x^{-1} b_R$ - $G = G_L + x^{-1} G_R$ 输出: proof: $L, R, a[0]$ **verify function** 参数: - `transcript`,和 create 一样 - `crs`,和 create 一样 - `b`,和 create 的 `b_vec` 一样 - `a_comm`, 和 create 一样 - `input_point`, input_point - `output_point`,结果,等于`f(input_point)`,也等于 $\vec{a} \cdot \vec{b}$ 参数和 create 相比,少了向量 a,多了 output_point 逻辑: - $a\_comm = a\_comm + w \cdot Q \cdot output\_point$ - $a\_comm = a\_comm + C_{L_i} \cdot x_i + C_{R_i} \cdot x^{-1}$ - $G_i = G_{L_{i-1}} + x_i^{-1} G_{R_{i-1}}$ - $b_i = b_{L_{i-1}} + x_i^{-1} b_{R_{i-1}}$ - $a\_comm ?= G_n[0] \cdot proof.a + proof.a \cdot b_n[0] \cdot w \cdot Q$ 以上总结就是: - 
### create_verkle_proof

```rust=
fn create_verkle_proof(
    &self,
    keys: impl Iterator<Item = [u8; 32]>,
) -> Result<crate::proof::VerkleProof, crate::errors::ProofCreationError> {
    use crate::proof::prover;
    prover::create_verkle_proof(&self.storage, keys.collect())
}

pub fn create_verkle_proof<Storage: ReadOnlyHigherDb>(
    storage: &Storage,
    keys: Vec<[u8; 32]>,
) -> Result<VerkleProof, ProofCreationError> {
    if keys.is_empty() {
        return Err(ProofCreationError::EmptyKeySet);
    }
    let (queries, verification_hint) = create_prover_queries(storage, keys);

    // Commitments without duplicates and without the root, (implicitly) sorted by path,
    // since the queries were processed in path order
    let root_query = match queries.first() {
        Some(query) => query,
        None => return Err(ProofCreationError::ExpectedOneQueryAgainstRoot),
    };
    let root_comm = root_query.commitment;

    let comms_sorted: Vec<_> = queries
        .iter()
        // Filter out the root commitment
        .filter(|query| query.commitment != root_comm)
        // Pull out the commitments from each query
        .map(|query| query.commitment)
        // Deduplicate consecutive commitments
        .dedup()
        .collect();

    let mut transcript = Transcript::new(b"vt");
    let proof = MultiPoint::open(CRS.clone(), &PRECOMPUTED_WEIGHTS, &mut transcript, queries);

    Ok(VerkleProof {
        comms_sorted,
        verification_hint,
        proof,
    })
}
```

The verkle proof relies on the IPA implementation.

#### ipa-multipoint

```rust=
pub struct IPAProof {
    pub(crate) L_vec: Vec<Element>,
    pub(crate) R_vec: Vec<Element>,
    pub(crate) a: Fr,
}
```

Let us walk through the `create` function in `ipa-multipoint/src/ipa.rs`.

Parameters:

- `transcript`: used for the Fiat-Shamir transform, i.e. the interactive part of IPA (producing random challenges) is replaced by a random oracle. The transcript plays the role of that oracle; since the verifier can replay the same steps and derive the same challenges, the transcript itself never needs to be transmitted.
- `crs`: the Common Reference String, which here stores the bases of the Pedersen commitments, i.e. 256 + 1 linearly independent `Element` points
- `a_vec`: the vector $\vec{a}$
- `a_comm`: the commitment to $\vec{a}$, equal to $\sum_{i=0}^{n-1} a_i G_i$
- `b_vec`: the vector $\vec{b}$
- `input_point`: the evaluation point of the polynomial, used only for the transcript

Logic:

- $output\_point = \vec{a} \cdot \vec{b}$
- $Q = w \cdot Q$
- $C_{L_i} = \vec{a}_R \cdot \vec{G}_L + (\vec{a}_R \cdot \vec{b}_L) \cdot Q$
- $C_{R_i} = \vec{a}_L \cdot \vec{G}_R + (\vec{a}_L \cdot \vec{b}_R) \cdot Q$
- $L = (C_{L_0}, C_{L_1}, \dots, C_{L_{rounds}})$
- $R = (C_{R_0}, C_{R_1}, \dots, C_{R_{rounds}})$
- $a = a_L + x \cdot a_R$
- $b = b_L + x^{-1} \cdot b_R$
- $G = G_L + x^{-1} \cdot G_R$

Output: the proof $(L, R, a[0])$.

**verify function**

Parameters:

- `transcript`: same as in `create`
- `crs`: same as in `create`
- `b`: same as `b_vec` in `create`
- `a_comm`: same as in `create`
- `input_point`: the evaluation point
- `output_point`: the result, equal to `f(input_point)`, which also equals $\vec{a} \cdot \vec{b}$

Compared with `create`, the vector $\vec{a}$ is no longer passed in, and `output_point` is added.

Logic:

- $a\_comm = a\_comm + w \cdot Q \cdot output\_point$
- $a\_comm = a\_comm + x_i \cdot C_{L_i} + x_i^{-1} \cdot C_{R_i}$
- $G_i = G_{L_{i-1}} + x_i^{-1} \cdot G_{R_{i-1}}$
- $b_i = b_{L_{i-1}} + x_i^{-1} \cdot b_{R_{i-1}}$
- $a\_comm \overset{?}{=} proof.a \cdot G_n[0] + proof.a \cdot b_n[0] \cdot w \cdot Q$

To summarize:

- $\mathcal{P}$ knows $\vec{a}$ and $\vec{b}$
- $\mathcal{V}$ knows $\vec{b}$, and does not know (and does not want to learn) $\vec{a}$
- the bases are $\vec{G} = (G_0, G_1, \cdots, G_{n-1})$ and $U$

Process:

- $C_a = \vec{a} \cdot \vec{G}$
- $U = w \cdot U$
- $C_0 = C_a + (\vec{a} \cdot \vec{b})U$
- $C_{L_1} = \vec{a}_{R_0} \cdot \vec{G}_{L_0} + (\vec{a}_{R_0} \cdot \vec{b}_{L_0})U$
- $C_{R_1} = \vec{a}_{L_0} \cdot \vec{G}_{R_0} + (\vec{a}_{L_0} \cdot \vec{b}_{R_0})U$
- $C_1 = C_0 + x_1 C_{L_1} + x_1^{-1}C_{R_1}$

and so on, until round $k = \log n$:

- $\vec{a}_k = \vec{a}_{L_{k-1}} + x_k \vec{a}_{R_{k-1}}$
- $\vec{b}_k = \vec{b}_{L_{k-1}} + x_k^{-1}\vec{b}_{R_{k-1}}$
- $\vec{G}_k = \vec{G}_{L_{k-1}} + x_k^{-1}\vec{G}_{R_{k-1}}$
- $C_{L_k} = \vec{a}_{R_{k-1}} \cdot \vec{G}_{L_{k-1}} + (\vec{a}_{R_{k-1}} \cdot \vec{b}_{L_{k-1}})U$
- $C_{R_k} = \vec{a}_{L_{k-1}} \cdot \vec{G}_{R_{k-1}} + (\vec{a}_{L_{k-1}} \cdot \vec{b}_{R_{k-1}})U$
- $C_k = C_{k-1} + x_k C_{L_k} + x_k^{-1}C_{R_k}$

At that point $\vec{a}_k$ and $\vec{G}_k$ both have length 1.

$C_a$, $\vec{a}_k$, $\vec{a} \cdot \vec{b}$, $\vec{C}_{L}$ and $\vec{C}_{R}$ are sent to $\mathcal{V}$.

$\mathcal{V}$ verifies:

- $C_k = C_a + (\vec{a}\cdot \vec{b})U + \sum_{i=1}^k (x_iC_{L_i} + x_i^{-1} C_{R_i})$
- $C_k \overset{?}{=} \vec{a}_k[0] \cdot \vec{G}_k[0] + \vec{a}_k[0]\vec{b}_k[0]U$

In `verify_multiexp`, this check is rearranged into a single multi-scalar multiplication:

- $\sum_{i=1}^k (x_iC_{L_i} + x_i^{-1} C_{R_i}) + C_a + (\vec{a}\cdot \vec{b} - \vec{a}_k[0]\vec{b}_k[0]) \cdot U - \vec{a}_k[0] \cdot \vec{G}_k[0] \overset{?}{=} 0$
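To make the fold above concrete, here is a minimal, self-contained sketch of the reduction over a toy prime field (`u128` arithmetic mod a prime `P`) rather than over Banderwagon points. The round-by-round algebra is the same, but there is no hiding or binding, and the challenge is a deterministic stand-in for the Fiat-Shamir transcript; `P`, `inner` and `fold` are illustrative names, not the `ipa-multipoint` API.

```rust=
// Toy run of the IPA reduction: the "group" is integers mod P, scalar
// multiplication is ordinary multiplication mod P. Illustration only.
const P: u128 = 1_000_000_007;

fn mod_pow(mut base: u128, mut exp: u128) -> u128 {
    let mut acc = 1u128;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

// Modular inverse via Fermat's little theorem (P is prime).
fn inv(x: u128) -> u128 {
    mod_pow(x, P - 2)
}

// Inner product <u, v> mod P.
fn inner(u: &[u128], v: &[u128]) -> u128 {
    u.iter().zip(v).fold(0, |acc, (a, b)| (acc + a * b % P) % P)
}

// Elementwise fold: left_half + x * right_half.
fn fold(v: &[u128], x: u128) -> Vec<u128> {
    let (l, r) = v.split_at(v.len() / 2);
    l.iter().zip(r).map(|(a, b)| (a + x * b % P) % P).collect()
}

fn main() {
    // Toy "CRS": bases G_i and U (scalars standing in for curve points).
    let mut g: Vec<u128> = vec![11, 23, 37, 51];
    let u: u128 = 97;

    // Prover's vector a and the public vector b.
    let mut a: Vec<u128> = vec![3, 1, 4, 1];
    let mut b: Vec<u128> = vec![2, 7, 1, 8];

    // C_0 = <a, G> + (a . b) * U.
    let mut c = (inner(&a, &g) + inner(&a, &b) * u) % P;

    while a.len() > 1 {
        let half = a.len() / 2;

        // Cross terms sent to the verifier each round:
        //   C_L = <a_R, G_L> + (a_R . b_L) U,   C_R = <a_L, G_R> + (a_L . b_R) U
        let c_l = (inner(&a[half..], &g[..half]) + inner(&a[half..], &b[..half]) * u) % P;
        let c_r = (inner(&a[..half], &g[half..]) + inner(&a[..half], &b[half..]) * u) % P;

        // Nonzero stand-in for a Fiat-Shamir challenge derived from the cross terms.
        let x = (c_l + 3 * c_r) % (P - 1) + 1;
        let xi = inv(x);

        // Fold: a <- a_L + x a_R,  b <- b_L + x^{-1} b_R,  G <- G_L + x^{-1} G_R,
        //       C <- C + x C_L + x^{-1} C_R.
        a = fold(&a, x);
        b = fold(&b, xi);
        g = fold(&g, xi);
        c = (c + x * c_l % P + xi * c_r % P) % P;
    }

    // Final check: C_k == a[0] * G[0] + a[0] * b[0] * U.
    let rhs = (a[0] * g[0] % P + a[0] * b[0] % P * u % P) % P;
    assert_eq!(c, rhs);
    println!("IPA fold check passed");
}
```

In the real prover, `input_point`, the commitment and each round's cross terms are absorbed into the transcript, so the verifier re-derives exactly the same challenges when it replays the fold.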
### MultiPointProof

```rust=
pub struct MultiPointProof {
    open_proof: IPAProof,
    g_x_comm: Element,
}
```

```rust=
pub struct PrecomputedWeights {
    // This stores A'(x_i) and 1/A'(x_i)
    barycentric_weights: Vec<Fr>,
    // This stores 1/k for k \in [-255, 255]
    inverted_domain: Vec<Fr>,
}
```

- `PrecomputedWeights.barycentric_weights`: in the verkle tree setting this holds 256 + 256 values, $A'(0), A'(1), \dots, A'(255), \frac{1}{A'(0)}, \frac{1}{A'(1)}, \dots, \frac{1}{A'(255)}$
- `PrecomputedWeights.inverted_domain`: in the verkle tree setting this holds 255 + 255 values, $1, \frac{1}{2}, \frac{1}{3}, \dots, \frac{1}{255}, -1, -\frac{1}{2}, -\frac{1}{3}, \dots, -\frac{1}{255}$

```rust=
pub struct LagrangeBasis {
    // We assume that the domain starts at zero,
    // so we only need to supply the upperbound
    domain: usize,
    values: Vec<Fr>,
}
```

For the polynomial $f(x)$ obtained by Lagrange interpolation over $(0, f(0)), (1, f(1)), (2, f(2)), \dots, (255, f(255))$:

- `LagrangeBasis.domain` is 256
- `LagrangeBasis.values` is $(f(0), f(1), f(2), \dots, f(255))$

Methods:

- `divide_by_linear_vanishing`: computes the quotient polynomial $q(x) = \frac{f(x) - f(x_i)}{x - x_i}$
- `evaluate_in_domain`: when $x_i \in \{0, 1, \dots, 255\}$, simply look up $f(x_i)$
- `evaluate_lagrange_coefficients`: when the evaluation point $z$ lies outside the domain, returns a new `LagrangeBasis` holding the Lagrange coefficients used in $f(z) = A(z)\sum_{i=0}^{255}\frac{f(i)}{A^\prime(i)(z-i)}$, where
    - $A(z) = \prod_{i=0}^{255} (z - i)$
    - the returned vector contains $res_i = \frac{A(z)}{A^\prime(i)(z - i)}$, so that $f(z) = \sum_{i} res_i \cdot f(i)$

```rust=
pub struct ProverQuery {
    pub commitment: Element,
    pub poly: LagrangeBasis, // TODO: Make this a reference so that upstream libraries do not need to clone
    // Given a function f, we use z_i to denote the input point and y_i to denote the output, ie f(z_i) = y_i
    pub point: usize,
    pub result: Fr,
}
```

Having gone through the supporting data structures, let us look at how `open` is implemented.

`open()` -> `MultiPointProof`

Inputs:

- `crs`: the bases of the Pedersen commitments
- `precomp`: the precomputed $A^\prime(i)$ and $\pm \frac{1}{i}$ values
- `transcript`: used for the Fiat-Shamir transform
- `queries`: an array of length $t$, representing $t$ polynomials $f_i(x)$ together with their opening points and results

Logic:

- obtain a random challenge $r$ and form the powers $1, r, r^2, r^3, \dots, r^{t-1}$
- pair each query with its power of $r$: $(i, (r^i, f_i(x)))$, where $i \in [0..t)$
- each scaled polynomial $h_i(x) = r^i \cdot f_i(x)$ is kept in evaluation form, i.e. as 256 values:
    - $h_i(j) = r^i \cdot f_i(j)$ for $j \in \{0, 1, \dots, 255\}$
    - $h_i(x) = A(x) \cdot r^i \cdot \sum_{j=0}^{255} \frac{f_i(j)}{A^\prime(j) \cdot (x - j)}$
- polynomials opened at the same point, collected in a `HashMap<point, Vec<(f_i, r^i)>>`, are aggregated:
    - $ag\_poly = \sum_i r^i \cdot \vec{f_i}$

This yields a Vector of $(point, ag\_poly)$ pairs. Writing $d_i(x)$ for the $i$-th $ag\_poly$, opened at $point_i$:

$g(x) = \sum_{i = 0}^{h-1} \frac{d_i(x) - d_i(point_i)}{x - point_i}$

where $h$ is the number of distinct opening points, i.e. the length of the Vector above.

$C_g = \sum_{i=0}^{255} g(i) \cdot G_i$

## References

- https://blog.ethereum.org/2021/12/02/verkle-tree-structure
- https://vitalik.eth.limo/general/2021/06/18/verkle.html
- https://ihagopian.com/posts/anatomy-of-a-verkle-proof
- https://verkle.dev
- https://verkle.info
- https://eips.ethereum.org/EIPS/eip-6800
- https://dankradfeist.de/ethereum/2021/07/27/inner-product-arguments.html
- https://dankradfeist.de/ethereum/2021/06/18/pcs-multiproofs.html