
Vector Commitment Scheme - High Level

Familiarity with binary merkle trees is assumed.

Commitment Scheme

Commitment schemes are at the heart of every scenario where you want to prove something to another person. Let's list two examples from our daily lives.

Lottery

Before you are able to see the winning results of a lottery, you must first commit to your choice of numbers. This commitment will allow you to prove that you did indeed choose these numbers before seeing the results. This commitment is often referred to as a lottery ticket.

We cannot trust people to be honest about their results or, more generously, we cannot trust people to attest to the truth; they could simply misremember.

If you trust everyone to tell the truth or if it is not advantageous for a rational actor to lie, then you might be able to omit the commitment scheme. This is not usually the case, especially in a scenario where it may be impossible to find out the truth.

Sometimes we cannot even assume that actors will behave rationally!

There are certain features that a lottery ticket must have, such as not being editable after the fact. Many of these features have a parallel in vector commitment schemes.

Registration and Login

A lot of social applications require you to prove your digital identity to use them. There are two stages:

  • Registration: This is where you put in your details such as your email address, name, password and phone number. You can think of this as a commitment to a particular identity.
  • Login: This is where you use the email address and password from registration to prove that you are the same person. Ideally, only you know these login details.

Without the registration phase, you would not be able to later prove your digital identity.

As you can see, commitment schemes are crucial where one needs to prove something after an event has happened. This analogy also carries over to the cryptographic settings we will consider.

Why do we need a commitment scheme?

  • For the lottery example, one could call it a ticket commitment scheme.
  • For the registration example, one could call it an identity commitment scheme.
  • For verkle trees and indeed merkle trees, we need a vector commitment scheme.

Analogously, this means that we need to commit to a vector and later attest to values in that vector.

As a spoiler, with verkle/merkle trees, when one is tasked with proving that a particular value is in the tree, we can reduce this to many instances of proving that particular values are in a vector.

Brief overview of a vector

Think of a vector as a list of items where the length of the vector and the position of each item are also part of the definition.

Example 1

v1=<a,b,c>

v2=<b,a,c>

Here the vectors v1 and v2 are not equal because the first and second items in the vectors are not equal. This may seem obvious, but it is not true for mathematical objects such as sets.

Example 2

v1=<1,2,3>

v2=<1,2,3,3>

Here the vectors are also not equal, because their lengths are not equal. Note also that as a set, they would be considered equal.

We will later see that vector commitment schemes must encode both of these properties (the position of each item and the length of the vector) when committing to a vector.
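The two examples above can be reproduced in a few lines of Python. This is only an illustration, assuming that tuples stand in for vectors and Python's built-in `set` stands in for mathematical sets; the names `w1` and `w2` are introduced here for the second example.

```python
# Vectors modelled as tuples: both order and length are part of their identity.
v1 = ("a", "b", "c")
v2 = ("b", "a", "c")
print(v1 == v2)            # False: same items, different positions
print(set(v1) == set(v2))  # True: sets ignore position

w1 = (1, 2, 3)
w2 = (1, 2, 3, 3)
print(w1 == w2)            # False: different lengths
print(set(w1) == set(w2))  # True: sets ignore repeated items
```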

Binary Merkle Tree as a vector commitment scheme


Figure 1: Image of a binary merkle tree

First bring your attention to Ha, Hb in Figure 1. One can define some function fc which takes both of these values as inputs and transforms them into a single output value Hab.

Encoding the position

We specify that fc(Ha,Hb) should not be equal to fc(Hb,Ha). This means that the function fc implicitly encodes the positions of its input values. In this case Hab conveys the fact that Ha is first and Hb is second.

Encoding the length

Another property of fc is that fc(Ha,Hb,k) should not equal fc(Ha,Hb), even if k has a value of 0. This means that fc should also encode the number of inputs, which is equivalently the length of the vector.

To elaborate: a two-item input should never produce the same output as a three-item input, no matter what the third item is.
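A minimal sketch of one way to get both properties from a hash function. The encoding here (a 4-byte count followed by length-prefixed items fed to sha256) is an assumption chosen for illustration, not an encoding taken from this document or from any standard.

```python
import hashlib

def fc(*items: bytes) -> bytes:
    """Commit to a vector of byte strings (illustrative encoding only)."""
    h = hashlib.sha256()
    h.update(len(items).to_bytes(4, "big"))     # encode the length of the vector
    for item in items:
        h.update(len(item).to_bytes(4, "big"))  # delimit each element
        h.update(item)                          # hashing in order encodes position
    return h.digest()

# Stand-in values for Ha, Hb and k.
Ha = hashlib.sha256(b"a").digest()
Hb = hashlib.sha256(b"b").digest()
k = b"\x00"

print(fc(Ha, Hb) != fc(Hb, Ha))     # swapping positions changes the commitment
print(fc(Ha, Hb, k) != fc(Ha, Hb))  # adding an element changes it, even a zero
```

Prefixing the count is what separates a two-element input from a three-element one even when the extra element is empty.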

Committing to a vector

We now ask the reader to view Ha and Hb as two elements in a vector, i.e. <Ha,Hb>. The function fc allows us to commit to such a vector, encoding the length of the vector and the position of each element in the vector. In the above merkle tree, one can repeatedly use fc until we arrive at the top of the tree. The final output at the top is denoted as the root.

By induction, we can argue that the root is a summary of all of the items below it. Whether the summary is succinct depends on fc.

Popular choices for fc include the following hash functions: sha256, blake2s and keccak. But one could just as easily define it to be the concatenation of the inputs.
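As a rough sketch of "repeatedly using fc until we arrive at the root", assuming fc is sha256 over the concatenation of two fixed-size inputs and that the number of leaves is a power of two. The helper name `merkle_root` and the leaf values are hypothetical.

```python
import hashlib

def fc(left: bytes, right: bytes) -> bytes:
    # Assumption: fc is sha256 over the ordered concatenation of two 32-byte inputs.
    return hashlib.sha256(left + right).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Apply fc to adjacent pairs, level by level, until one value remains."""
    n = len(leaves)
    assert n > 0 and n & (n - 1) == 0, "this sketch handles powers of two only"
    level = leaves
    while len(level) > 1:
        level = [fc(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Eight stand-in leaves, playing the role of Ha .. Hh in Figure 1.
leaves = [hashlib.sha256(x).digest() for x in (b"a", b"b", b"c", b"d", b"e", b"f", b"g", b"h")]
root = merkle_root(leaves)
print(root.hex())
```

With sha256 the root is always 32 bytes regardless of how many leaves sit below it; with plain concatenation it would grow with the tree, which is why succinctness depends on the choice of fc.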

Opening a value

Say we are given the root Habcdefgh in Figure 1 and we want to show that Hb is indeed a part of the tree that this root represents.

To show that Hb is in the tree with root Habcdefgh, we can do it by showing:

  • Habcd is the first element in the vector <Habcd,Hefgh> and applying fc to this vector yields Habcdefgh
  • Then we can show that Hab is the first element in the vector <Hab,Hcd> and applying fc to the vector yields Habcd
  • Finally, we can show that Hb is the second element in the vector <Ha,Hb> and applying fc to the vector yields Hab

We now define a new function fo to show that an element is in a certain position in a vector and that when fc is applied to said vector, it yields an expected value.

fo takes four arguments:

  • A commitment to a vector, Cv. This is the output of fc on a vector.
  • An index, i
  • An element in some vector, ev
  • A proof π attesting to the fact that Cv is the commitment to v, and ev is the element at index i of v.

fo returns true if, for some vector v:

  • Cv is the commitment of v, i.e. fc(v)=Cv
  • The i'th element in v is indeed ev, i.e. v[i]=ev

Example

Let's use fo to check the following statement:

Habcd is the first element in the vector <Habcd,Hefgh> and applying fc to this vector yields Habcdefgh.

Cv=Habcdefgh
i=0 (zero indicates the first element)
ev=Habcd

If fo(Habcdefgh,0,Habcd,π) returns true, then we can be sure that Habcdefgh commits to some vector v using fc and that at the first index of that vector, we have the value Habcd.

We must trust that Habcdefgh was computed correctly, i.e. it corresponds to the tree in question. This is outside the scope of verkle/merkle trees in general and is usually handled by some higher level protocol.

What is π?

For a binary merkle tree, π would be Hefgh. Now given Habcd and π, we can apply fc to check that Cv=fc(Habcd,π). This also allows us to check that Habcd is the first element in the vector.
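A minimal sketch of fo for a single two-element vector, assuming fc is sha256 over the ordered concatenation of its inputs and that π is simply the sibling value. The byte strings used to build Habcd and Hefgh are stand-ins, not real subtree hashes.

```python
import hashlib

def fc(left: bytes, right: bytes) -> bytes:
    # Assumption: fc is sha256 over the ordered concatenation of its two inputs.
    return hashlib.sha256(left + right).digest()

def fo(Cv: bytes, i: int, ev: bytes, proof: bytes) -> bool:
    """Check that ev sits at index i of a 2-element vector committed to by Cv.

    For a binary merkle tree the proof is just the other element of the vector.
    """
    if i == 0:
        return fc(ev, proof) == Cv
    if i == 1:
        return fc(proof, ev) == Cv
    return False  # a 2-element vector has no other index

Habcd = hashlib.sha256(b"abcd").digest()  # stand-in value
Hefgh = hashlib.sha256(b"efgh").digest()  # stand-in value
Habcdefgh = fc(Habcd, Hefgh)

print(fo(Habcdefgh, 0, Habcd, Hefgh))  # True: Habcd is the first element
print(fo(Habcdefgh, 1, Habcd, Hefgh))  # False: Habcd is not the second element
```

Placing ev before or after the proof according to i is how the check ties the element to its position.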

Proof cost For Binary Merkle Tree

For a binary merkle tree, our vectors have size 2 and so π only has to contain 1 extra element to show Cv=fc(a,π). If we had a hexary merkle tree, where our vector had 16 elements, π would need to contain 15 elements. Hence the proof grows in proportion to the vector sizes that we are using for merkle trees.

Even more discouraging is the fact that there is not just one π. In our case, three proofs π are needed to show that Hb is in the tree, one per level. The overall proof size thus also grows with the number of vectors on the path, that is, with the depth of the tree.

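In general, we can compute the overall proof size by first defining the number of items in the tree, also known as the tree width tw. We then define the size of our vectors, sometimes referred to as the node width nw. The proof size is then approximately:

log_nw(tw) · nw = depth · nw

where depth = log_nw(tw) is the number of levels. More precisely, each level contributes nw - 1 extra elements, as in the binary (1 element) and hexary (15 elements) cases above.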
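A rough sketch of this arithmetic, using hypothetical helper names. It reports both the depth · nw estimate and the count based on the nw - 1 extra elements per level described earlier for binary and hexary trees. Integer arithmetic is used instead of a floating point logarithm to avoid rounding issues.

```python
def tree_depth(tree_width: int, node_width: int) -> int:
    """Smallest depth d such that node_width ** d >= tree_width."""
    depth, capacity = 0, 1
    while capacity < tree_width:
        capacity *= node_width
        depth += 1
    return depth

def proof_estimate(tree_width: int, node_width: int) -> int:
    """The depth * nw estimate."""
    return tree_depth(tree_width, node_width) * node_width

def sibling_count(tree_width: int, node_width: int) -> int:
    """nw - 1 extra elements per level, summed over all levels."""
    return tree_depth(tree_width, node_width) * (node_width - 1)

print(tree_depth(8, 2), sibling_count(8, 2))            # 3 3   (the tree in Figure 1)
print(tree_depth(2**16, 2), sibling_count(2**16, 2))    # 16 16 (binary)
print(tree_depth(2**16, 16), sibling_count(2**16, 16))  # 4 60  (hexary)
```

For the same 65536 items, the hexary tree is four times shallower but its merkle proof is almost four times larger, which is the cost the next section sets out to remove.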

Verkle Tree Improvements

The problem with fc being a hash function like sha256 in the case of a merkle tree is that in order to attest to a single value that was hashed, we need to reveal everything in the hash. The main reason is that these functions by design do not preserve the structure of the input. For example, sha256(a) + sha256(b) != sha256(a + b).
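The inequality can be checked directly. This sketch makes two assumptions that are not in the text: the inputs are small integers encoded as 32 bytes, and "+" on digests means integer addition modulo 2^256. The helper `sha256_int` is hypothetical.

```python
import hashlib

def sha256_int(n: int) -> int:
    """sha256 of n's 32-byte big-endian encoding, read back as an integer."""
    digest = hashlib.sha256(n.to_bytes(32, "big")).digest()
    return int.from_bytes(digest, "big")

a, b = 3, 4
lhs = (sha256_int(a) + sha256_int(b)) % 2**256  # sha256(a) + sha256(b)
rhs = sha256_int(a + b)                         # sha256(a + b)
print(lhs == rhs)  # False: the hash does not preserve addition
```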

Fortunately, we only require a property known as collision resistance, and there are many other vector commitment schemes in the literature which are more efficient and do not require all values for the opening. Depending on the one you choose, there are different trade-offs to consider.

Some trade-offs to consider are:

  • Proof creation time: how long it takes to make π
  • Proof verification time: how long it takes to verify π

Moreover, with some of the schemes in the wider literature, it is possible to aggregate many proofs together so one only needs to verify a single proof π. With this in mind, it may be unsurprising that with verkle trees the node width/vector size has increased substantially, since the proof size in the chosen scheme does not grow linearly with the node width.

Summary

  • Merkle trees use a vector commitment scheme which is inefficient: the proof size grows with both the node width and the depth of the tree.
  • Verkle trees use a commitment scheme which has better efficiency for proof size and allows one to minimise the proof size using aggregation.
  • Verkle trees also increase the node width, which decreases the depth of the tree.