# Sin7Y Tech Review (23): Verkle Tree For ETH
Compared to the Merkle Tree, the Verkle Tree greatly reduces the Proof size, which is why it is a critical part of the ETH2.0 upgrade. For a data set of one billion elements, a Merkle Tree proof takes about 1 kB, while a Verkle Tree proof needs no more than 150 bytes.
The Verkle Tree concept was proposed in 2018 (more details can be found [here](https://math.mit.edu/research/highschool/primes/materials/2018/Kuszmaul.pdf)). The 23rd tech review by Sin7Y explains the principle of the Verkle Tree.
## Merkle Tree
Before digging into Verkle Tree, it is important to understand the Merkle Tree concept. Merkle Tree is a common Accumulator, which can be used to prove that one element exists in the Accumulator as shown in the following figure:
![](https://i.imgur.com/RMzKhzv.png)
Figure 1: Merkle Tree
To prove that (key: value) = (06: 32) exists in the Tree (green-marked), the Proof must contain all red-marked nodes in the figure. The verifier can calculate the Root according to the method shown in Figure 1, and then compare it with the expected Root (grey-marked).
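The verification described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a binary tree over eight leaves hashed with SHA-256; the leaf encoding, the other leaf values, and the helper names (`build_levels`, `make_proof`, `verify`) are assumptions made for the example and do not reproduce the exact layout of Figure 1.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash function used for both leaves and inner nodes (assumed: SHA-256)."""
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every level of the tree, from hashed leaves up to the root.
    The number of leaves is assumed to be a power of two."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def make_proof(levels, index):
    """Collect the sibling hash at each level (the red nodes of a proof)."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling of the current node
        index //= 2                     # move to the parent
    return proof

def verify(leaf, index, proof, root):
    """Recompute the root from the leaf and its siblings, then compare."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Hypothetical data: key i holds an arbitrary value, key 06 holds 32.
values = [10, 11, 12, 13, 14, 15, 32, 17]
leaves = [f"{k:02d}:{v}".encode() for k, v in enumerate(values)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = make_proof(levels, 6)
print(len(proof), verify(b"06:32", 6, proof, root))
```

With eight leaves the proof holds three sibling hashes, one per level, which is the $log_2(n)$ behaviour discussed next.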
As the depth and width of the Tree grow, the Proof grows as well (for a binary tree the proof contains $log_2(n)$ nodes, while for a k-ary tree it contains $(k - 1)log_k(n)$ nodes). The verifier also has to perform a hash calculation at every level, from the leaf up to the Root. Thus, a deeper and wider Tree increases both the proof size and the verifier's workload, which quickly becomes unacceptable.
## Verkle Tree - Concept
Simply increasing the Tree's width reduces its depth, but it does not reduce the proof size, because the proof size changes from the original $log_2(n)$ to $(k -1)log_k(n)$: for each layer, the prover must provide $k-1$ sibling nodes. In his Verkle Trees paper, John Kuszmaul proposed a scheme that reduces the proof complexity to $log_k(n)$. If we set $k = 1024$, the proof shrinks by a factor of $log_2(k) = 10$ compared to the binary Merkle Tree.
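To make these counts concrete, the following sketch evaluates the three formulas for one billion elements. The function names and the choice of $k = 16$ for the k-ary Merkle Tree are assumptions for illustration; the code counts nodes per proof only and does not model the opening proofs a Verkle Tree also carries.

```python
def depth(n: int, k: int) -> int:
    """Smallest d such that k**d >= n, computed with integers
    to avoid floating-point rounding in logarithms."""
    d, capacity = 0, 1
    while capacity < n:
        capacity *= k
        d += 1
    return d

def merkle_proof_nodes(n: int, k: int) -> int:
    """A k-ary Merkle proof needs the k - 1 siblings at each level."""
    return (k - 1) * depth(n, k)

def verkle_proof_nodes(n: int, k: int) -> int:
    """A k-ary Verkle proof needs one commitment per level."""
    return depth(n, k)

n = 10**9
print(merkle_proof_nodes(n, 2))     # binary Merkle Tree: 30 nodes
print(merkle_proof_nodes(n, 16))    # 16-ary Merkle Tree: 120 nodes
print(verkle_proof_nodes(n, 1024))  # 1024-ary Verkle Tree: 3 nodes
```

With 32-byte hashes, the 30 nodes of the binary Merkle proof come to 960 bytes, in line with the roughly 1 kB quoted in the introduction, and the ratio 30 / 3 is the factor of 10 mentioned above.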
The Verkle Tree design is shown as follows:
![](https://i.imgur.com/VILZyK2.png)
Figure 2. Verkle Tree
For each node, there are two pieces of information: (1) value; (2) existence proof $\pi$. For example, the green-marked $(H(k,v), \pi_{03})$ shows that $H(k,v)$ exists in commitment $C_0$ and $\pi_{03}$ is the proof of this argument. Similarly, $(C_0, \pi_0)$ means that $C_0$ exists in commitment $C_{Root}$ and $\pi_{0}$ is the proof of this argument.
In the paper Verkle Tree, the method of such existence commitment is called [Vector commitment](https://eprint.iacr.org/2011/495.pdf). If the Vector commitment scheme is used to execute existence commitment for the original data, the Proof with $O(1)$ complexity will be obtained while the complexity of Construct Proof and that of update Proof are $O(n^2), O(n)$, respectively.
Therefore, to strike a balance, the K-ary Verkle Tree scheme is used in the paper Verkle Tree (as shown in Figure 2) to make the complexity of construct Proof, update Proof and Proof be $O(kn), O(klog_k\ n), O(log_k\ n)$, respectively. The specific performance comparison is shown in Table 1:
| Scheme/Op | Construct Proof | Update Proof | Proof Size |
| ------------| -------- | -------- |--------|
| Merkle Tree | $$O(n)$$ | $$O(log_2\ n)$$ |$$O(log_2\ n)$$|
|K-ary Merkle Tree|$$O(n)$$|$$O(klog_k\ n)$$|$$O(klog_k\ n)$$|
|Vector Commitment|$$O(n^2)$$|$$O(n)$$|$$O(1)$$|
|K-ary Verkle Tree|$$O(kn)$$|$$O(klog_k\ n)$$|$$O(log_k\ n)$$|
This article does not give a detailed introduction to specific vector commitment schemes, which John Kuszmaul has explained well in his paper. Fortunately, we have a more efficient tool than the vector commitment, called polynomial commitment. Given a set of coordinates $(c_1,c_2,....,c_n)$ and a set of values $(y_1,y_2,....,y_n)$, you can construct a polynomial ([Lagrange interpolation](https://en.wikipedia.org/wiki/Lagrange_polynomial)) satisfying $P(c_i) = y_i$ and commit to this polynomial. [KZG10](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html) and [IPA](https://twitter.com/VitalikButerin/status/1371844878968176647) are common polynomial commitment schemes (in these schemes, the commitment is a point on an elliptic curve, typically between 32 and 48 bytes in size).
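The interpolation step can be sketched as follows. This is a toy example over a small prime field: the modulus `P_MOD`, the sample coordinates, and the helper names are assumptions chosen for illustration, whereas real schemes work in the scalar field of the chosen elliptic curve.

```python
P_MOD = 2**31 - 1  # toy prime modulus (assumption, not a curve parameter)

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P_MOD
    return out

def poly_eval(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

def lagrange_interpolate(xs, ys):
    """Return the coefficients of the unique polynomial of degree < len(xs)
    with P(xs[i]) = ys[i]. The xs must be pairwise distinct modulo P_MOD."""
    result = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis, denom = [1], 1
        for j, xj in enumerate(xs):
            if j != i:
                basis = poly_mul(basis, [(-xj) % P_MOD, 1])  # times (x - xj)
                denom = denom * (xi - xj) % P_MOD
        scale = yi * pow(denom, -1, P_MOD) % P_MOD  # needs Python 3.8+
        for d, c in enumerate(basis):
            result[d] = (result[d] + scale * c) % P_MOD
    return result

cs = [1, 2, 3, 4]   # hypothetical coordinates c_i
ys = [7, 1, 0, 9]   # hypothetical values y_i
P = lagrange_interpolate(cs, ys)
print([poly_eval(P, c) for c in cs] == ys)
```

Committing to `P` would then mean evaluating it "in the exponent" at a secret point, which is what the next section describes.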
## Basis
### KZG for single point
Take KZG10 as an example. For the polynomial $P(x)$, we use $[P(s)]_1$ to represent the polynomial commitment, i.e. the evaluation of $P$ at the secret point $s$ of the trusted setup, encoded as an elliptic curve point. If $P(z) = y$, then $(x-z)|(P(x) - y)$. That is to say, if we set $Q(x)= (P(x) - y) / (x - z)$, then $Q(x)$ is a polynomial.
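Now, we generate a proof that $P(x)$ satisfies $P(z) = y$. That is, the prover calculates $[Q(s)]_1$ and sends it to the verifier, who needs to verify the following pairing equation (here $H$ denotes the generator of the second group, i.e. $[1]_2$):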
$$e([Q(s)]_1, [s - z]_2) = e([P(s)]_1 - [y]_1, H)$$
Because $s$ is a randomly chosen point in the finite field $F$ and is unknown to the prover, the probability that a cheating prover succeeds is about $degree(Q)/p$, where $p$ is the size of $F$ ([Schwartz–Zippel lemma](https://en.wikipedia.org/wiki/Schwartz%E2%80%93Zippel_lemma)).
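The algebra behind this check can be tried out without elliptic curves. The sketch below is a toy model under loud assumptions: it works in a small prime field, it treats the secret $s$ as a known number, and it replaces the pairing equation by the plain field identity $Q(s)(s - z) = P(s) - y$. It therefore illustrates the divisibility argument only and offers no security.

```python
P_MOD = 2**31 - 1  # toy prime modulus (assumption)

def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest degree first) at x with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

def divide_by_linear(coeffs, z):
    """Synthetic division of coeffs(x) by (x - z).
    Returns (quotient coefficients, remainder)."""
    quotient, carry = [0] * (len(coeffs) - 1), 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + carry * z) % P_MOD
        quotient[i - 1] = carry
    return quotient, (coeffs[0] + carry * z) % P_MOD

def open_at(poly, z, y):
    """Quotient and remainder of (P(x) - y) / (x - z)."""
    shifted = [(poly[0] - y) % P_MOD] + poly[1:]
    return divide_by_linear(shifted, z)

poly = [5, 2, 0, 1]      # P(x) = x^3 + 2x + 5 (hypothetical example)
z = 3
y = poly_eval(poly, z)   # honest claim: P(3) = 38

Q, remainder = open_at(poly, z, y)
print(Q, remainder)      # Q(x) = x^2 + 3x + 11, remainder 0

# Toy stand-in for the pairing check, with s known in the clear.
s = 123456789
lhs = poly_eval(Q, s) * (s - z) % P_MOD
rhs = (poly_eval(poly, s) - y) % P_MOD
print(lhs == rhs)

# A false claim leaves a nonzero remainder, so no polynomial Q exists.
_, bad_remainder = open_at(poly, z, y + 1)
print(bad_remainder != 0)
```

In the real scheme the verifier never learns $s$; the same identity is checked on group elements through the pairing $e$.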
### KZG for multiple points
Now, we want to prove that the values of the polynomial $P(x)$ on $(z_0,z_1,....,z_{k-1})$ are $(y_0,y_1,....,y_{k-1})$, respectively. Therefore, we need to define two polynomials:
$$I(x) : I(z_i) = y_i \ for\ all\ i\in [0,k)$$
$$V(x) = (x-z_0)(x-z_1)...(x-z_{k-1})$$
According to the description mentioned above, we need to satisfy $V(x)|(P(x)-I(x))$. That is, there exists a polynomial $Q(x)$ , satisfying:
$$Q(x) * V(x) = P(x) - I(x)$$
Therefore, the Prover needs to provide the commitments $[P(s)]_1, [Q(s)]_1$ for $P(x)$ and $Q(x)$, and send the commitments to the verifier. The Verifier calculates $[I(s)]_1, [V(s)]_2$ locally, and verifies the equation:
$$e([Q(s)]_1, [V(s)]_2) = e([P(s)]_1 - [I(s)]_1, H)$$
It is clear that the proof size is constant no matter how many Points there are. If we choose the BLS12-381 curve, the Proof size is only 48 bytes, which is very efficient.
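The multi-point construction can be checked with the same kind of toy model. In this sketch the field modulus, the example polynomial, and all helper names are assumptions, and the pairing equation is again replaced by a plain field identity at a known $s$, so it demonstrates only that $V(x)$ divides $P(x) - I(x)$.

```python
P_MOD = 2**31 - 1  # toy prime modulus (assumption)

def poly_eval(c, x):
    acc = 0
    for a in reversed(c):
        acc = (acc * x + a) % P_MOD
    return acc

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P_MOD
    return out

def poly_sub(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x - y) % P_MOD for x, y in zip(a, b)]

def poly_divmod(num, den):
    """Polynomial long division; returns (quotient, remainder)."""
    num = num[:]
    quot = [0] * max(len(num) - len(den) + 1, 1)
    inv_lead = pow(den[-1], -1, P_MOD)
    for i in range(len(num) - len(den), -1, -1):
        factor = num[i + len(den) - 1] * inv_lead % P_MOD
        quot[i] = factor
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - factor * d) % P_MOD
    return quot, num[:len(den) - 1]

def interpolate(xs, ys):
    """Lagrange interpolation: coefficients of I(x) with I(xs[i]) = ys[i]."""
    result = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis, denom = [1], 1
        for j, xj in enumerate(xs):
            if j != i:
                basis = poly_mul(basis, [(-xj) % P_MOD, 1])
                denom = denom * (xi - xj) % P_MOD
        scale = yi * pow(denom, -1, P_MOD) % P_MOD
        for d, c in enumerate(basis):
            result[d] = (result[d] + scale * c) % P_MOD
    return result

poly = [5, 2, 0, 1]                    # P(x) = x^3 + 2x + 5 (hypothetical)
zs = [1, 2]
ys = [poly_eval(poly, z) for z in zs]  # [8, 17]

I = interpolate(zs, ys)                # I(x) = 9x - 1
V = [1]
for z in zs:
    V = poly_mul(V, [(-z) % P_MOD, 1]) # V(x) = (x - 1)(x - 2)
Q, rem = poly_divmod(poly_sub(poly, I), V)
print(Q, rem)                          # Q(x) = x + 3, remainder zero

s = 987654321                          # toy secret, known in the clear here
lhs = poly_eval(Q, s) * poly_eval(V, s) % P_MOD
rhs = (poly_eval(poly, s) - poly_eval(I, s)) % P_MOD
print(lhs == rhs)
```

However many points are opened, the prover still sends a single quotient commitment, which is why the proof size stays constant.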
## Verkle Tree - ETH
In a Merkle Tree, proving the existence of an element still requires a proof of size $O(log_2n)$. The Verkle Tree greatly improves on this proof size.
Let's check out a simple example of Verkle Tree.
![](https://i.imgur.com/wAreLZA.jpg)
Figure 3. Verkle Tree for ETH
As the figure shows, similar to the Merkle Patricia Tree structure, nodes can be divided into three types: empty node, inner node, and leaf node. The width of each inner node is 16 (indices 0000 -> 1111 in binary, i.e. 0x0 -> 0xF in hexadecimal). To prove that the state of the leaf node is (0101 0111 1010 1111 -> 1213), we need to open the commitments of Inner node B, Inner node A, and the Root:
1. Prove that the value of Inner node B's commitment is hash (0101 0111 1010 1111, 1213) at index 1010.
2. Prove that the value of Inner node A's commitment is hash (cm_B) at index 0111.
3. Prove that the value of node Root's commitment is hash (cm_A) at index 0101.
Use $C_0$ (Inner node B), $C_1$ (Inner node A), and $C_2$ (Root) to represent the commitments mentioned above, and let $f_0(x), f_1(x), f_2(x)$ be the corresponding polynomials. Therefore, the Prover needs to prove:
1. $$f_0(w^{0b1010}) = H(0101 0111 1010 1111, 1213)$$
2. $$f_1(w^{0b0111}) = H(C_0)$$
3. $$f_2(w^{0b0101}) = H(C_1)$$
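The indices used in these three openings come directly from the key. A minimal sketch, assuming a 16-bit key and 4-bit-wide nodes as in the example above (the function name `path_indices` is hypothetical):

```python
def path_indices(key: int, key_bits: int = 16, width_bits: int = 4):
    """Split a key into width_bits-sized chunks, most significant chunk first.
    Each chunk selects a child slot at one level of the tree."""
    mask = (1 << width_bits) - 1
    return [(key >> shift) & mask
            for shift in range(key_bits - width_bits, -1, -width_bits)]

key = 0b0101_0111_1010_1111
indices = path_indices(key)
print([format(i, "04b") for i in indices])  # ['0101', '0111', '1010', '1111']
```

The first three chunks are the positions opened at the Root, Inner node A, and Inner node B, respectively; the last chunk is the remaining part of the key held by the leaf.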
### Compress for multiple polys
To make it easy, we will use $z_i$ to represent the index. The prover needs to prove that for the polynomial set $f_0(x), f_1(x),....,f_{m-1}(x)$, it satisfies the following conditions at points $z_0,z_1,....,z_{m-1}$, respectively:
$$f_0(z_0) = y_0 \\
f_1(z_1) = y_1 \\
. \\
. \\
.\\
f_{m-1}(z_{m-1}) = y_{m-1}$$
According to the previous description (KZG for Single point), for each polynomial, there exists a quotient polynomial satisfying:
$$q_0(x) = (f_0(x) - y_0) / (x - z_0)\\
q_1(x) = (f_1(x) - y_1) / (x - z_1) \\
. \\
. \\
.\\
q_{m-1}(x) = (f_{m-1}(x) - y_{m-1}) / (x - z_{m-1})$$
The Prover needs to commit to the original polynomials and the quotient polynomials, and send the commitments to the Verifier:
$$[f_0(s)]_1 ,[q_0(s)]_1 \\
[f_1(s)]_1 ,[q_1(s)]_1 \\
. \\
. \\
.\\
[f_{m-1}(s)]_1 ,[q_{m-1}(s)]_1$$
Verifier executes the verification:
$$e([q_0(s)]_1, [s-z_0]_2) = e([f_0(s)]_1 - [y_0]_1, H) \\
e([q_1(s)]_1, [s-z_1]_2) = e([f_1(s)]_1 - [y_1]_1, H) \\
. \\
. \\
.\\
e([q_{m-1}(s)]_1, [s-z_{m-1}]_2) = e([f_{m-1}(s)]_1 - [y_{m-1}]_1, H) $$
Obviously, we don't want the verifier to execute so many pairing operations, because they are expensive. Therefore, we need to compress the proofs as follows.
Generate some random numbers $r_0,r_1,....,r_{m-1}$, and gather the above quotient polynomials together:
$$g(x) = r_0q_0(x) + r_1q_1(x)+...+r_{m-1}q_{m-1}(x)$$
$g(x)$ is a polynomial if and only if each $q_i(x)$ is a polynomial, except with very low probability (because of the random numbers, it is very unlikely that the fractional parts of different $q_i(x)$ exactly cancel).
The prover commits to the polynomial $g(x)$ and sends $[g(s)]_1$ to the verifier.
Next, the prover has to convince the verifier that $[g(s)]_1$ is indeed the commitment to the polynomial $g(x)$.
Observe the form of the polynomial $g(x)$, which can be written as:
$$g(x) = \sum_{i=0}^{m-1}r_i \frac {(f_i(x) - y_i)}{ (x -z_i)} = \sum_{i=0}^{m-1}r_i \frac {f_i(x)}{ (x -z_i)} - \sum_{i=0}^{m-1}r_i \frac {y_i}{ (x -z_i)}$$
Choose a random value $t$. Evaluating $g(x)$ at $t$ gives:
$$g(t) = \sum_{i=0}^{m-1}r_i \frac {(f_i(t) - y_i)}{ (t -z_i)} = \sum_{i=0}^{m-1}r_i \frac {f_i(t)}{ (t -z_i)} - \sum_{i=0}^{m-1}r_i \frac {y_i}{ (t -z_i)}$$
Define the polynomial:
$$h(x) = \sum_{i=0}^{m-1}r_i \frac {f_i(x)}{ (t -z_i)}$$
Its commitment can be calculated from the commitments $C_i = [f_i(s)]_1$ as follows:
$$[h(s)]_1 = \sum_{i=0}^{m-1}r_i \frac {C_i}{ (t -z_i)}$$
Then the value of the polynomial $h(x) - g(x)$ at point $t$ is:
$$h(t) - g(t) = \sum_{i=0}^{m-1}r_i \frac {y_i}{ (t -z_i)}$$
Set $y = \sum_{i=0}^{m-1}r_i \frac {y_i}{ (t -z_i)}$.
Calculate the quotient polynomial $q(x) = (h(x) - g(x) - y) / (x - t)$.
Calculate the commitment $\pi = [q(s)]_1 = [(h(s) - g(s) - y) / (s - t)]_1$, and send it to the verifier.
Verifier performs the following verification:
1. Calculate
$$y = \sum_{i=0}^{m-1}r_i \frac {y_i}{ (t -z_i)}$$
$$[h(s)]_1 = \sum_{i=0}^{m-1}r_i \frac {C_i}{ (t -z_i)}$$
2. Verify
$$e( [h(s)]_1 - [g(s)]_1 -[y]_1, [1]_2) = e (\pi, [(s -t)]_2)$$
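The whole compression can be traced end to end with a toy model. The sketch below rests on several assumptions that are not part of the scheme itself: a small prime field, a secret $s$ that is known in the clear, a "commitment" that is simply the field value $f(s)$, and a pairing check replaced by the field identity $h(s) - g(s) - y = \pi \cdot (s - t)$. The $r_i$ and $t$ are drawn from a seeded random generator here; in practice they must not be chosen by the prover alone. The sketch shows that the algebra lines up and is not a secure implementation.

```python
import random

P_MOD = 2**31 - 1  # toy prime modulus (assumption)

def poly_eval(c, x):
    acc = 0
    for a in reversed(c):
        acc = (acc * x + a) % P_MOD
    return acc

def divide_by_linear(c, z):
    """Synthetic division of c(x) by (x - z); returns (quotient, remainder)."""
    quotient, carry = [0] * (len(c) - 1), 0
    for i in range(len(c) - 1, 0, -1):
        carry = (c[i] + carry * z) % P_MOD
        quotient[i - 1] = carry
    return quotient, (c[0] + carry * z) % P_MOD

def linear_combination(scalars, polys):
    """Return sum_i scalars[i] * polys[i] as a coefficient list."""
    out = [0] * max(len(p) for p in polys)
    for k, poly in zip(scalars, polys):
        for d, a in enumerate(poly):
            out[d] = (out[d] + k * a) % P_MOD
    return out

def weights_for(rs, zs, t):
    """The factors r_i / (t - z_i) shared by h(x) and y."""
    return [r * pow(t - z, -1, P_MOD) % P_MOD for r, z in zip(rs, zs)]

rng = random.Random(7)
s = rng.randrange(1, P_MOD)              # toy setup secret (known here!)
fs = [[rng.randrange(P_MOD) for _ in range(16)] for _ in range(3)]
zs = [5, 7, 10]                          # stand-ins for the opened indices
ys = [poly_eval(f, z) for f, z in zip(fs, zs)]
Cs = [poly_eval(f, s) for f in fs]       # stand-in for C_i = [f_i(s)]_1

# Prover
rs = [rng.randrange(1, P_MOD) for _ in fs]
qs = [divide_by_linear([(f[0] - y) % P_MOD] + f[1:], z)[0]
      for f, y, z in zip(fs, ys, zs)]
g = linear_combination(rs, qs)
D = poly_eval(g, s)                      # stand-in for [g(s)]_1
t = rng.randrange(11, P_MOD)             # differs from every z_i
w = weights_for(rs, zs, t)
h = linear_combination(w, fs)
y = sum(wi * yi for wi, yi in zip(w, ys)) % P_MOD
numerator = linear_combination([1, P_MOD - 1], [h, g])  # h(x) - g(x)
numerator[0] = (numerator[0] - y) % P_MOD
q, remainder = divide_by_linear(numerator, t)
pi = poly_eval(q, s)                     # stand-in for [q(s)]_1

# Verifier
def verify(Cs, zs, ys, rs, t, D, pi):
    w = weights_for(rs, zs, t)
    y = sum(wi * yi for wi, yi in zip(w, ys)) % P_MOD
    h_commit = sum(wi * C for wi, C in zip(w, Cs)) % P_MOD
    return (h_commit - D - y) % P_MOD == pi * (s - t) % P_MOD

print(remainder == 0, verify(Cs, zs, ys, rs, t, D, pi))
```

Whatever the number of polynomials $m$, the prover's message in this model consists of the two values `D` and `pi`, mirroring $[g(s)]_1$ and $\pi$ in the scheme above.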
## Key properties
1. Any number of points can be proved using this scheme without changing the Proof size. (For each commitment, there is a proof $\pi$.)
2. The values $y_i$ do not need to be provided explicitly, as each is the hash of the value in the next layer.
3. The values $z_i$ do not need to be provided explicitly, as they can be derived from the Key.
4. The public information used includes the key/value pair to be proved and the corresponding commitments from the basic level to the upper level.
## References
1. Dankrad Feist, "PCS multiproofs using random evaluation," https://dankradfeist.de/ethereum/2021/06/18/pcs-multiproofs.html, accessed: 2022-05-10.
2. Vitalik Buterin, "Verkle trees," https://vitalik.ca/general/2021/06/18/verkle.html, accessed: 2022-05-10.
3. John Kuszmaul, "Verkle Trees," https://math.mit.edu/research/highschool/primes/materials/2018/Kuszmaul.pdf, accessed: 2022-05-10.