Try   HackMD

Hi there, it's Agnish again. After racking my head over ALOT of topics, and even more github issues, I am finally able to narrow down, what I REALLY want to contribute and work on for this EPF Cohort.

So, this week I have dived deep into Verkle trees and how the roadmap of Stateless Ethereum is being planned. I'm doing an overall update of whatever I've learnt in this domain for the past week.

General Background

I started off with Vitalik's first Verkle Tree post, he repeatedly referred to the first by Verkle Tree paper by Kuszmaul back in 2018. He also discussed about the merits of VKT over Merkle trees with the most important ones being :

  • Merkle trees generate proofs that have data of each of the siblings in the particular node while Verkle trees don't. In practice, a merkle tree has 15 siblings for each node. Whereas, the average depth of a Verkle Tree is roughly ~4, VKTs specify more on the index of the of the particular child and it's path leading towards the root.

So here's a basic comparison table stating the exhaustive details of PROOF SIZES for both MKT and VKTs:

Trie Specifics in each proof Levels Proof Size Cryptographic Reqs
Merkle Patricia Trie leaf data + 15 siblings, 32 bytes for each level ~7 ~3.5 MB for 1,000 leaves Collision-resistant hash functions
Verkle Trie leaf data + Polynomial Comm + value + index of child, which is 32 bytes, another 32 bytes + 1 byte + small constant size data ~4 ~150 KB for 1,000 leaves Earlier (for Eth1 State): KZG Polynomial Comms (based on Billinear Pairings) with BLS12_381 curve, Now (for Eth2): Pedersen Commitments with Bandersnatch/Banderwagon

Mathematical Background

In real world scenarios, we can use a polynomial commitment as a vector commitment to generate proof for the root of a Verkle Trie.

Polynomial Commitments let's us hash a polynomial, and make a proof for it's evaluation and any point, usually we call that an opening.

First we need to agree upon a standardized set of coordinates, let us call it

[c1,c2,c3,....,cn], given a list of final points
[y1,y2,y3,...,yn]
, we can commit to the polynomial P such that
P(ci)=yi
for all
i∈[1...n]
. We can find this particular polynomial using Lagrange's Interpolation method.

Lagrange Polynomial Interpolation

This method is used to interpolate a given dataset and find the lowest possible degree polynomial that passes through the points of the dataset.

Formal Def:

The Lagrange Interpolation for points/nodes

[y0,y1,...,yk], then the linear combination of these for points
[yi]
should look like -

βˆ‘j=0kyjβˆ—lj(x)

where,

lj(x) is the Lagrangian Basis for the polynomial.

lj(x)=∏mβ‰ j,m>=0xβˆ’xmxjβˆ’xm

To understand this better, you can tweak with this, Desmos graphing calculator.

Here, I have plotted 4 arbitrary points,

a,b,c,d and have plotted the Lagrangian Basis for point a.

This was the given polynomial, with

La:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More β†’

And hence, the curve would look like this:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More β†’

This Lagrange Interpolation method can be used for computing polynomial commitments for both KZG and Bulletproof-style commitments (IPA).

Verkle Trie implementations for Eth1

VKT for Eth1 was largely based on KZG Polynomial Commitment Schemes. Initially, in a Merkle Tree of depth d, we could compute a commitment to a vector for elements in the Merkle Tree, which is,

a0,a1,....,a2dβˆ’1.

By the definition of Polynomial Commitments, we define a polynomial

p(X) of degree
n
is a function:

βˆ‘i=0npiXi where
pi
are the coefficients of the corresponding polynomial.

Also, here the degree of the polynomial is the depth of the tree

n=2dβˆ’1, and by assigning
ai=pi
and computing the proof of the coefficients, the prover wants to show the verifier that
p(z)=y
for some
z
. The prover can do this by sending the verifier all such
pi
and the verifier can compute indeed,
p(z)=y
.

Some features using KZG

  • The commitment is nothing but a group element of an elliptic curve group (usually BLS12_381) that is based on pairings, generally 48 bytes long.

Need for the Trusted Setup

A short notation intro to Elliptic Curves and Pairings

Let

G1 and
G2
be 2 elliptic curves with a pairing
e:G1Γ—G2βˆ’>GT
. Let
p
be the order of
G1
and
G2
, and
G
and
H
be generators of
G1
and
G2
. We can represent that in the following ways:

[x]1=xG∈G1 and
[x]2=xH∈G2

for any setup

x∈Fp.

Now, let us assume that we have a trusted setup, so that for some secret

s, the elements
[si]1
and
[si]2
are available to both the prover and the verifier for all
i=0,...,nβˆ’1
.

Now, with ecc it's impossible to find out what actually is from the trusted setup group elements. It's a number in the finite field, but the prover cannot actually know the original number. They can only perform a certain set of group operations.

Hence, given that if

p(X)=βˆ‘i=0npiXi is a polynomial, then the prover can compute the following:

[p(s)]1=[βˆ‘i=0npisi]=βˆ‘i=0npi[si]1
Hence this can be represented as a group element.

In this way, everyone can evaluate a polynomial without knowing really what

s is, they only get an elliptic curve point such that
[p(s)]1=p(s)G
.

Verkle tries for Eth2 - Pedersen Commitments and Bandersnatch Modulus.

Definition:

Let

g and
h
be elements of
Gq
such that nobody knows what
logg(h)
is. These elements can either be chosen by a trusted center, when the system is initialized, or by (some of) the participants using a randomizing protocol.

These entire process DOES NOT need a trusted setup ceremony like KZG.

Note that here

p and
q
denotes large primes such that
q
divides
pβˆ’1
, and
Gq
is a unique subgroup of
Zpβˆ—
of the order
q
, and
g
is a generator of
Gq
. It can easily be tested if an element
a∈Zq
is in
Gq
since,
a∈Gq<=>aq=1.

As any element
b
!= 1 in
Gq
generates the same group, the discrete log of
a∈Gq
with repsect to the base
b
is defined and it is denoted by
logb(a)
.

The commiter commits herself to an

s∈Zq by choosing
t∈Zq
at random and she computes the following,

E(s,t)=gsht.

Such a commitment can be again revealed later by revealing

s and
t
. Then the following theorem is very easy to prove and shows that
E(s,t)
reveals no information about
s
at all, and that the committer cannot open a commitment to
s
as
sβ€²!=s
unless she can find what
logg(h)
.

What I am left to cover so far:

I am yet to read more about the merits of Bandersnatch Modulus over BLS12_381 curve, how it is a 42% improvement over the Jubjub curve and how it's implemented using the GLV multi-scalar multiplication method, and the deeper math behind how Multiproofs are computed using Inner Product Arguments.

Hoping to cover all of that by this week! I hope to dive deeper and work on the Hyperledger Besu implementation Verkle Tries in this github repo.
I hope draft improvements of this Github repo in my project proposal as well. Also, looking forward to understanding more on State Expiry, Address Size Extensioning for several periods of data on Ethereum, and finally Portal Networks.

Happy Reading!