Sin7Y Tech Review (23): Verkle Tree For ETH

As a critical part of the ETH2.0 upgrade, Verkle Tree greatly reduces the proof size compared to Merkle Tree. For a dataset of one billion elements, a Merkle Tree proof takes about 1 kB, while a Verkle Tree proof needs no more than 150 bytes.

The Verkle Tree concept was proposed by John Kuszmaul in 2018 (more details can be found in his paper, listed in the References). This 23rd tech review by Sin7Y demonstrates the principle of Verkle Tree.

Merkle Tree

Before digging into Verkle Tree, it is important to understand the Merkle Tree concept. A Merkle Tree is a common accumulator, which can be used to prove that an element exists in the accumulator, as shown in the following figure:


Figure 1. Merkle Tree

To prove that (key: value) = (06: 32) exists in the Tree (green-marked), the Proof must contain all the red-marked nodes in the figure. The verifier calculates the Root as shown in Figure 1 and then compares it with the expected Root (grey-marked).
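The following is a minimal Python sketch of this verification, not the exact tree in Figure 1. It assumes a binary tree over 8 leaves, SHA-256 as the hash, and a hypothetical leaf encoding `b"06:32"` for the pair (06: 32); the helper names `build_levels`, `prove`, and `verify` are likewise illustrative. `prove` collects the sibling (red-marked) hashes, and `verify` recomputes the Root bottom-up.

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 stands in for the tree's hash function (an assumption).
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """All tree levels, leaf hashes first and [root] last.
    Assumes len(leaves) is a power of two."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def prove(levels, index):
    """Collect the sibling hash at every level below the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # index ^ 1 flips between left and right child
        index //= 2
    return proof

def verify(root, leaf, index, proof):
    """Recompute the root from the leaf and its siblings, then compare."""
    node = h(leaf)
    for sibling in proof:
        # An even index means the current node is a left child.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Hypothetical data: key i maps to values[i]; key 06 holds 32.
values = [10, 11, 12, 13, 14, 15, 32, 17]
leaves = [f"{k:02d}:{v}".encode() for k, v in enumerate(values)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, 6)
print(len(proof), verify(root, b"06:32", 6, proof))  # 3 True
```

With 8 leaves the proof holds 3 sibling hashes, one per level, which is the $\log_2(n)$ growth discussed next.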

Predictably, as the depth and width of the Tree grow, the Proof size also grows: for a binary (2-ary) tree the proof contains $\log_2(n)$ nodes, while for a $k$-ary tree it contains $(k-1)\log_k(n)$ nodes. In addition, the verifier must perform many hash calculations from the bottom level up to the Root. Thus, a deeper or wider Tree increases the verifier's workload, which is unacceptable.

Verkle Tree - Concept

Simply increasing the Tree's width reduces its depth, but it does not reduce the proof size: the proof size changes from the original $\log_2(n)$ to $(k-1)\log_k(n)$, which is larger for every $k > 2$. That is, for each layer, the prover needs to provide $k-1$ sibling nodes. In the paper Verkle Trees, John Kuszmaul describes a scheme that reduces the proof complexity to $\log_k(n)$. If we set $k = 1024$, the proof is reduced by a factor of $\log_2(k) = 10$.
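The arithmetic behind these comparisons can be checked with a short sketch. It counts proof items per scheme rather than bytes, and assumes $n = 2^{30}$ (about one billion) leaves; the function names are hypothetical.

```python
def depth(n: int, k: int) -> int:
    """Smallest d with k**d >= n, i.e. ceil(log_k n), using exact integer arithmetic."""
    d, capacity = 0, 1
    while capacity < n:
        capacity *= k
        d += 1
    return d

def merkle_proof_nodes(n: int, k: int) -> int:
    # A k-ary Merkle proof carries k - 1 sibling nodes per level.
    return (k - 1) * depth(n, k)

def verkle_proof_items(n: int, k: int) -> int:
    # A k-ary Verkle proof carries one item per level.
    return depth(n, k)

n = 2 ** 30  # about one billion leaves
for k in (2, 16, 1024):
    print(k, merkle_proof_nodes(n, k), verkle_proof_items(n, k))
# 2 30 30
# 16 120 8
# 1024 3069 3
```

Widening a Merkle tree makes its proof larger (30, then 120, then 3069 nodes), while the Verkle count drops from 30 to 3, the factor of $\log_2(1024) = 10$ stated above.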

The Verkle Tree design is shown as follows:


Figure 2. Verkle Tree

For each node, there are two pieces of information: (1) a value; (2) an existence proof $\pi$. For example, the green-marked $(H(k, v), \pi_{03})$ shows that $H(k, v)$ exists in commitment $C_0$, and $\pi_{03}$ is the proof of this claim. Similarly, $(C_0, \pi_0)$ means that $C_0$ exists in commitment $C_{Root}$, and $\pi_0$ is the proof of this claim.

In the paper Verkle Trees, this kind of existence commitment is called a vector commitment. If a vector commitment scheme is applied directly to the original data, the proof size is $O(1)$, while the complexities of constructing and updating the proofs are $O(n^2)$ and $O(n)$, respectively.

Therefore, to strike a balance, the paper uses the K-ary Verkle Tree scheme (as shown in Figure 2), for which the complexities of constructing proofs, updating proofs, and the proof size are $O(kn)$, $O(k\log_k n)$, and $O(\log_k n)$, respectively. The specific performance comparison is shown in Table 1:

| Scheme | Construct | Update Proof | Proof Size |
| --- | --- | --- | --- |
| Merkle Tree | $O(n)$ | $O(\log_2 n)$ | $O(\log_2 n)$ |
| K-ary Merkle Tree | $O(n)$ | $O(k\log_k n)$ | $O(k\log_k n)$ |
| Vector Commitment | $O(n^2)$ | $O(n)$ | $O(1)$ |
| K-ary Verkle Tree | $O(kn)$ | $O(k\log_k n)$ | $O(\log_k n)$ |

In this article, we do not intend to introduce specific vector commitment schemes in detail, as John Kuszmaul has explained them well in his paper. Fortunately, we have a more efficient tool than vector commitments, called polynomial commitments. Given a set of coordinates $(c_0, c_1, \dots, c_{n-1})$ and a set of values $(y_0, y_1, \dots, y_{n-1})$, you can construct a polynomial by Lagrange interpolation such that $P(c_i) = y_i$, and then commit to this polynomial. KZG10 and IPA are common polynomial commitment schemes (here, the commitment is a point on an elliptic curve, typically 32 to 48 bytes in size).
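Here is a minimal sketch of the interpolation step. It assumes a toy prime field with modulus $2^{61} - 1$ chosen only for illustration, since real schemes work over the scalar field of the pairing curve. Polynomials are coefficient lists with the lowest degree first, and `lagrange_interpolate` is a hypothetical helper name.

```python
MODULUS = 2**61 - 1  # toy prime field modulus (an assumption)

def poly_mul(a, b):
    """Multiply two coefficient lists (lowest degree first) modulo MODULUS."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % MODULUS
    return out

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule
        acc = (acc * x + c) % MODULUS
    return acc

def lagrange_interpolate(xs, ys):
    """Coefficients of the unique polynomial of degree < len(xs) with P(xs[i]) == ys[i]."""
    result = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis, denom = [1], 1
        for j, xj in enumerate(xs):
            if j != i:
                basis = poly_mul(basis, [-xj % MODULUS, 1])  # multiply by (x - x_j)
                denom = denom * (xi - xj) % MODULUS
        scale = yi * pow(denom, -1, MODULUS) % MODULUS  # y_i / prod (x_i - x_j)
        for d, c in enumerate(basis):
            result[d] = (result[d] + scale * c) % MODULUS
    return result

cs = [0, 1, 2, 3]      # coordinates c_i
ys = [5, 7, 11, 13]    # values y_i
poly = lagrange_interpolate(cs, ys)
print([poly_eval(poly, c) for c in cs])  # [5, 7, 11, 13]
```

A polynomial commitment scheme then commits to `poly` as a whole, so one commitment covers the entire vector of values.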

Basis

KZG for single point

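Take KZG10 as an example. For a polynomial $P(x)$, we use $[P(s)]_1$ to denote its commitment, where $[a]_1$ means $a \cdot G_1$ for a generator $G_1$ of the first pairing group, and $s$ is the secret from the trusted setup. It is well known that if $P(z) = y$, then $(x - z) \mid (P(x) - y)$. That is to say, if we set $Q(x) = (P(x) - y) / (x - z)$, then $Q(x)$ is a polynomial (the division leaves no remainder).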
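The divisibility property can be seen concretely with synthetic division. This sketch again assumes the toy modulus $2^{61} - 1$ and uses a hypothetical helper `divide_by_linear`; the remainder of dividing $P(x)$ by $(x - z)$ is exactly $P(z)$, so subtracting the correct $y$ leaves no remainder.

```python
MODULUS = 2**61 - 1  # toy prime field modulus (an assumption)

def divide_by_linear(coeffs, z):
    """Synthetic division of a polynomial (coefficients lowest degree first)
    by (x - z). Returns (quotient, remainder); the remainder equals P(z)."""
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (carry * z + coeffs[i]) % MODULUS
        quotient[i - 1] = carry
    remainder = (carry * z + coeffs[0]) % MODULUS
    return quotient, remainder

# P(x) = x^2 + 2x + 3, so P(5) = 38.
p_coeffs = [3, 2, 1]
y = 38
# P(x) - y = x^2 + 2x - 35 = (x - 5)(x + 7): zero remainder and Q(x) = x + 7.
print(divide_by_linear([(p_coeffs[0] - y) % MODULUS] + p_coeffs[1:], 5))  # ([7, 1], 0)
# With a wrong claimed value y' = 39, (x - 5) no longer divides P(x) - y'.
print(divide_by_linear([(p_coeffs[0] - 39) % MODULUS] + p_coeffs[1:], 5)[1] != 0)  # True
```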

Now we generate a proof that $P(z) = y$. That is, the prover calculates $[Q(s)]_1$ and sends it to the verifier, who checks:

$$e([Q(s)]_1, [s - z]_2) = e([P(s)]_1 - [y]_1, H)$$

where $e$ is the pairing and $H = [1]_2$ is the generator of the second group.

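Because $s$ is a random point in the finite field $\mathbb{F}$ that is unknown to the prover, the probability that a cheating prover succeeds is negligible, on the order of $\deg(Q)/|\mathbb{F}|$, where $|\mathbb{F}|$ is the size of the field (Schwartz–Zippel lemma).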
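The toy below replays the single-point protocol without elliptic curves, to show only the algebra that the pairing enforces: $e([Q(s)]_1, [s - z]_2) = e([P(s)]_1 - [y]_1, H)$ holds exactly when $Q(s)(s - z) = P(s) - y$. It is a sketch under strong assumptions: the toy modulus $2^{61} - 1$, a "commitment" that is just the field element $P(s)$ (neither hiding nor binding), and a secret $s$ that stays in a global variable. The function names are hypothetical, and none of this is secure.

```python
import secrets

MODULUS = 2**61 - 1  # toy prime field modulus (an assumption)

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule, coefficients lowest degree first
        acc = (acc * x + c) % MODULUS
    return acc

def quotient_by_linear(coeffs, z):
    """Q(x) = (P(x) - P(z)) / (x - z) via synthetic division."""
    out, carry = [0] * (len(coeffs) - 1), 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (carry * z + coeffs[i]) % MODULUS
        out[i - 1] = carry
    return out

# Toy "trusted setup": real KZG publishes only [s^i]_1 and [s^i]_2 and discards s.
s = secrets.randbelow(MODULUS)

def commit(coeffs):
    # Stand-in for [P(s)]_1: the bare field element P(s).
    return poly_eval(coeffs, s)

def open_at(coeffs, z):
    """Prover: return y = P(z) and the proof [Q(s)]_1."""
    return poly_eval(coeffs, z), commit(quotient_by_linear(coeffs, z))

def verify(c_p, z, y, proof):
    # Exponent form of e([Q(s)]_1, [s - z]_2) == e([P(s)]_1 - [y]_1, H)
    return proof * (s - z) % MODULUS == (c_p - y) % MODULUS

p_coeffs = [3, 2, 1]                   # P(x) = x^2 + 2x + 3
c_p = commit(p_coeffs)
y, proof = open_at(p_coeffs, 5)
print(y, verify(c_p, 5, y, proof))     # 38 True
print(verify(c_p, 5, y + 1, proof))    # False
```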

KZG for multiple points

Now, we want to prove that the values of the polynomial $P(x)$ at $(z_0, z_1, \dots, z_{k-1})$ are $(y_0, y_1, \dots, y_{k-1})$, respectively. To do so, we define two polynomials:

$$I(x): \quad I(z_i) = y_i \text{ for all } i \in [0, k)$$

$$V(x) = (x - z_0)(x - z_1) \cdots (x - z_{k-1})$$

Here $I(x)$ is the interpolating polynomial of degree less than $k$, and $V(x)$ is the vanishing polynomial, which is zero at every $z_i$.

Following the single-point case, we need $V(x) \mid (P(x) - I(x))$. That is, there must exist a polynomial $Q(x)$ satisfying:

$$Q(x) \cdot V(x) = P(x) - I(x)$$

Therefore, the prover provides the commitments $[P(s)]_1$ and $[Q(s)]_1$ to $P(x)$ and $Q(x)$, and sends them to the verifier. The verifier calculates $[I(s)]_1$ and $[V(s)]_2$ locally and checks the equation:

$$e([Q(s)]_1, [V(s)]_2) = e([P(s)]_1 - [I(s)]_1, H)$$

Clearly, the proof size is constant no matter how many points are opened. If we choose the BLS12-381 curve, the proof $[Q(s)]_1$ is only 48 bytes, which is very efficient.
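The multi-point case can be exercised with the same toy setup. This sketch assumes the toy modulus $2^{61} - 1$, an arbitrary example polynomial, and evaluation points $z = 1, 2, 3$. It builds $I(x)$ by interpolation and $V(x)$ as a product of linear factors, divides $P(x) - I(x)$ by $V(x)$, and then checks $Q(s) \cdot V(s) = P(s) - I(s)$, the exponent form of the pairing equation. All helper names are hypothetical, and no elliptic-curve arithmetic is involved.

```python
import secrets

MODULUS = 2**61 - 1  # toy prime field modulus (an assumption)

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule, coefficients lowest degree first
        acc = (acc * x + c) % MODULUS
    return acc

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % MODULUS
    return out

def poly_sub(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x - y) % MODULUS for x, y in zip(a, b)]

def poly_divmod(num, den):
    """Long division num / den over the field; returns (quotient, remainder)."""
    num = num[:]
    inv_lead = pow(den[-1], -1, MODULUS)
    quotient = [0] * max(len(num) - len(den) + 1, 1)
    for i in range(len(num) - len(den), -1, -1):
        coef = num[i + len(den) - 1] * inv_lead % MODULUS
        quotient[i] = coef
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - coef * d) % MODULUS
    return quotient, num[:len(den) - 1]

def interpolate(xs, ys):
    """Lagrange interpolation: I(xs[i]) == ys[i], degree < len(xs)."""
    result = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis, denom = [1], 1
        for j, xj in enumerate(xs):
            if j != i:
                basis = poly_mul(basis, [-xj % MODULUS, 1])
                denom = denom * (xi - xj) % MODULUS
        scale = yi * pow(denom, -1, MODULUS) % MODULUS
        for d, c in enumerate(basis):
            result[d] = (result[d] + scale * c) % MODULUS
    return result

def vanishing(zs):
    """V(x) = (x - z_0)(x - z_1)...(x - z_{k-1})."""
    v = [1]
    for z in zs:
        v = poly_mul(v, [-z % MODULUS, 1])
    return v

p_coeffs = [7, 0, 3, 1, 4]                 # P(x) = 4x^4 + x^3 + 3x^2 + 7
zs = [1, 2, 3]
ys = [poly_eval(p_coeffs, z) for z in zs]  # claimed values y_i
i_coeffs = interpolate(zs, ys)
v_coeffs = vanishing(zs)
q_coeffs, rem = poly_divmod(poly_sub(p_coeffs, i_coeffs), v_coeffs)
print(all(r == 0 for r in rem))            # True: V(x) divides P(x) - I(x)

s = secrets.randbelow(MODULUS)             # toy trusted-setup secret
lhs = poly_eval(q_coeffs, s) * poly_eval(v_coeffs, s) % MODULUS
rhs = (poly_eval(p_coeffs, s) - poly_eval(i_coeffs, s)) % MODULUS
print(lhs == rhs)                          # True
```

However many points are in `zs`, the prover still sends only one value derived from `q_coeffs`; the extra work shifts to the verifier, who must compute $I(s)$ and $V(s)$.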

Verkle Tree - ETH

In a Merkle Tree, proving the existence of an element still requires a proof of size $O(\log_2 n)$. Verkle Tree greatly improves on this proof size.

Let's check out a simple example of Verkle Tree.


Figure 3. Verkle Tree for ETH

As the figure shows, nodes can be divided into three types, similar to the Merkle Patricia Tree: empty nodes, inner nodes, and leaf nodes. Each inner node has a width of 16, with child indices 0000 to 1111 in binary (0 to F in hexadecimal). To prove that the leaf node's state is (0101 0111 1010 1111 -> 1213), we need to open the commitments of the Root, Inner node A, and Inner node B:

Prove that the value committed by Inner node B at index 1010 is hash(0101 0111 1010 1111, 1213).
Prove that the value committed by Inner node A at index 0111 is hash(cm_B).
Prove that the value committed by the Root at index 0101 is hash(cm_A).

We use $C_0$ (Inner node B), $C_1$ (Inner node A), and $C_2$ (Root) to denote the commitments above, corresponding to the polynomials $f_0(x)$, $f_1(x)$, and $f_2(x)$, respectively. Here $w$ generates the evaluation domain, so child index $i$ corresponds to the point $w^i$. Therefore, the prover needs to prove:

$f_0(w^{0b1010}) = H(0101011110101111, 1213)$
$f_1(w^{0b0111}) = H(C_0)$
$f_2(w^{0b0101}) = H(C_1)$
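The opening indices come straight from the key, 4 bits per layer for width 16. This small sketch assumes the key is given as a binary string, and `child_indices` is a hypothetical helper. As in Figure 3, the leaf sits directly under Inner node B, so only the first three indices are opened.

```python
def child_indices(key: str, nibble_bits: int = 4):
    """Split a binary key into per-layer child indices (width 16 -> 4 bits each)."""
    bits = key.replace(" ", "")
    return [int(bits[i:i + nibble_bits], 2) for i in range(0, len(bits), nibble_bits)]

key = "0101 0111 1010 1111"
indices = child_indices(key)
print(indices)  # [5, 7, 10, 15]

# Root (f_2), Inner node A (f_1), and Inner node B (f_0) are opened top-down;
# zip stops after three names, so the last nibble is not used as an index here.
for name, idx in zip(["f_2 (Root)", "f_1 (Inner node A)", "f_0 (Inner node B)"], indices):
    print(f"{name} is opened at index {idx:04b} = {idx}")
```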

Compress for multiple polys

For simplicity, we use $z_i$ to denote the evaluation point of each index. The prover needs to prove that the polynomials $f_0(x), f_1(x), \dots, f_{m-1}(x)$ satisfy the following conditions at the points $z_0, z_1, \dots, z_{m-1}$, respectively:

$$f_0(z_0) = y_0, \quad f_1(z_1) = y_1, \quad \dots, \quad f_{m-1}(z_{m-1}) = y_{m-1}$$

According to the single-point case (KZG for single point), for each polynomial there exists a quotient polynomial:

$$q_i(x) = \frac{f_i(x) - y_i}{x - z_i}, \quad i = 0, 1, \dots, m-1$$

The prover commits to each original polynomial and its quotient polynomial, and sends the commitments to the verifier:

$$[f_0(s)]_1, [q_0(s)]_1, \quad [f_1(s)]_1, [q_1(s)]_1, \quad \dots, \quad [f_{m-1}(s)]_1, [q_{m-1}(s)]_1$$

The verifier then checks:

$$e([q_i(s)]_1, [s - z_i]_2) = e([f_i(s)]_1 - [y_i]_1, H), \quad i = 0, 1, \dots, m-1$$

Clearly, we do not want the verifier to perform so many pairing operations ($2m$ in total), since pairings are expensive. Therefore, we compress the proofs as follows.

Generate random numbers $r_0, r_1, \dots, r_{m-1}$ and combine the quotient polynomials above:

$$g(x) = r_0 q_0(x) + r_1 q_1(x) + \dots + r_{m-1} q_{m-1}(x)$$

We can assume that $g(x)$ is a polynomial if and only if every $q_i(x)$ is a polynomial: because the $r_i$ are random, the probability that the fractional parts of the $q_i(x)$ exactly cancel each other out is negligible.

The prover commits to the polynomial $g(x)$ and sends $[g(s)]_1$ to the verifier.

Next, the prover must convince the verifier that $[g(s)]_1$ is indeed a commitment to the polynomial $g(x)$. Observe that $g(x)$ can be written as:

$$g(x) = \sum_{i=0}^{m-1} r_i \frac{f_i(x) - y_i}{x - z_i} = \sum_{i=0}^{m-1} r_i \frac{f_i(x)}{x - z_i} - \sum_{i=0}^{m-1} r_i \frac{y_i}{x - z_i}$$

Choose a random value $t$; then:

$$g(t) = \sum_{i=0}^{m-1} r_i \frac{f_i(t) - y_i}{t - z_i} = \sum_{i=0}^{m-1} r_i \frac{f_i(t)}{t - z_i} - \sum_{i=0}^{m-1} r_i \frac{y_i}{t - z_i}$$

Define the polynomial:

$$h(x) = \sum_{i=0}^{m-1} r_i \frac{f_i(x)}{t - z_i}$$

Since $t$ is a fixed constant here, $h(x)$ is a linear combination of the $f_i(x)$, so its commitment can be computed from the commitments $C_i = [f_i(s)]_1$:

$$[h(s)]_1 = \sum_{i=0}^{m-1} r_i \frac{C_i}{t - z_i}$$

Then the value of the polynomial $h(x) - g(x)$ at the point $t$ is:

$$h(t) - g(t) = \sum_{i=0}^{m-1} r_i \frac{y_i}{t - z_i}$$

Set

$$y = \sum_{i=0}^{m-1} r_i \frac{y_i}{t - z_i}$$

Since $h(t) - g(t) - y = 0$, the prover can calculate the quotient polynomial

$$q(x) = \frac{h(x) - g(x) - y}{x - t}$$

and its commitment

$$\pi = [q(s)]_1 = \left[\frac{h(s) - g(s) - y}{s - t}\right]_1$$

and send $\pi$ to the verifier.

The verifier performs the following checks:

Calculate
$y = \sum_{i=0}^{m-1} r_i \frac{y_i}{t - z_i}$
$[h(s)]_1 = \sum_{i=0}^{m-1} r_i \frac{C_i}{t - z_i}$
Verify
$e([h(s)]_1 - [g(s)]_1 - [y]_1, [1]_2) = e(\pi, [s - t]_2)$
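The whole compression can be replayed end to end in a toy field. This sketch makes several assumptions. It uses the modulus $2^{61} - 1$. Its "commitments" are bare values $f(s)$ with a global secret $s$, so it has no security. It uses the example indices 0b1010, 0b0111, and 0b0101 directly as evaluation points instead of $w^i$, and it draws $r_i$ and $t$ at random. Real deployments would derive $r_i$ and $t$ by Fiat-Shamir hashing and fix $t$ only after $[g(s)]_1$ is sent. All function names are hypothetical. The check compares $h(s) - g(s) - y$ with $q(s)(s - t)$, the exponent form of the final pairing equation.

```python
import secrets

MODULUS = 2**61 - 1  # toy prime field modulus (an assumption)

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule, coefficients lowest degree first
        acc = (acc * x + c) % MODULUS
    return acc

def quotient_by_linear(coeffs, z):
    """(f(x) - f(z)) / (x - z) by synthetic division; the remainder f(z) is dropped."""
    out, carry = [0] * (len(coeffs) - 1), 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (carry * z + coeffs[i]) % MODULUS
        out[i - 1] = carry
    return out

def inv(a):
    return pow(a % MODULUS, -1, MODULUS)

s = secrets.randbelow(MODULUS)  # toy trusted-setup secret

def commit(coeffs):
    # Stand-in for [f(s)]_1; a real scheme outputs an elliptic-curve point.
    return poly_eval(coeffs, s)

def prove(fs, zs, rs, t):
    """Return ([g(s)]_1, pi). Assumes all f_i have the same number of coefficients."""
    qs = [quotient_by_linear(f, z) for f, z in zip(fs, zs)]   # q_i(x)
    g = [sum(r * q[d] for r, q in zip(rs, qs)) % MODULUS for d in range(len(qs[0]))]
    w = [r * inv(t - z) % MODULUS for r, z in zip(rs, zs)]    # r_i / (t - z_i)
    h = [sum(wi * f[d] for wi, f in zip(w, fs)) % MODULUS for d in range(len(fs[0]))]
    h_minus_g = [(hd - gd) % MODULUS for hd, gd in zip(h, g + [0])]
    # h(t) - g(t) = y, so this quotient is q(x) = (h(x) - g(x) - y) / (x - t).
    q = quotient_by_linear(h_minus_g, t)
    return commit(g), commit(q)

def verify(cs, zs, ys, rs, t, g_s, pi):
    w = [r * inv(t - z) % MODULUS for r, z in zip(rs, zs)]
    y = sum(wi * yi for wi, yi in zip(w, ys)) % MODULUS
    h_s = sum(wi * c for wi, c in zip(w, cs)) % MODULUS       # [h(s)]_1
    # Exponent form of e([h(s)]_1 - [g(s)]_1 - [y]_1, [1]_2) == e(pi, [s - t]_2)
    return (h_s - g_s - y) % MODULUS == pi * (s - t) % MODULUS

fs = [[5, 1, 4, 2], [9, 3, 0, 6], [1, 1, 1, 1]]  # f_0, f_1, f_2 (example coefficients)
zs = [0b1010, 0b0111, 0b0101]                     # indices used directly as points
ys = [poly_eval(f, z) for f, z in zip(fs, zs)]
cs = [commit(f) for f in fs]                      # C_i = [f_i(s)]_1
rs = [secrets.randbelow(MODULUS) for _ in fs]     # r_i
t = secrets.randbelow(MODULUS)
while t in zs:                                    # t - z_i must be invertible
    t = secrets.randbelow(MODULUS)
g_s, pi = prove(fs, zs, rs, t)
print(verify(cs, zs, ys, rs, t, g_s, pi))         # True
```

The verifier touches each $C_i$ only through one linear combination and performs a single final check, instead of one pairing check per polynomial.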

Key properties

Any number of points can be proved with this scheme without changing the proof size: the proof consists only of $[g(s)]_1$ and $\pi$, however many commitments are opened.
The values $y_i$ do not need to be provided explicitly, as each one is the hash of the value at the next layer.
The evaluation points $z_i$ do not need to be provided explicitly, as they can be derived from the key.
The public information used includes the key/value pair to be proved and the corresponding commitments from the bottom level to the top level.

References

Dankrad Feist, "PCS multiproofs using random evaluation," https://dankradfeist.de/ethereum/2021/06/18/pcs-multiproofs.html, accessed: 2022-05-10.
Vitalik Buterin, "Verkle trees," https://vitalik.ca/general/2021/06/18/verkle.html, accessed: 2022-05-10.
John Kuszmaul, "Verkle Trees," https://math.mit.edu/research/highschool/primes/materials/2018/Kuszmaul.pdf, accessed: 2022-05-10.
