In the context of commitment schemes, vectors are simply arrays or ordered multi-sets (sets with possible repetition of elements), unlike the usual definition as elements of linear spaces. For example, $(a, c, c, b)$ is a vector in this context (read in left-to-right order). Between sets and vectors, we could also have considered commitments to multi-sets without a specified order, but this is fairly uncommon in the literature.
Vector commitments are similar to set commitments, but the values of the elements are encoded into the digest (and opening proofs) along with their particular order in the vector. This means that the digest (and opening proofs) must be specific not only to each element's value but also to its position; in other words, they must not be invariant under permutations of the input vector.
This requirement follows from the idea of positional binding, which we take as a central principle of vector commitments, in addition to the zero-knowledge partial openings we already required for set commitments. In some of the literature, a small (sublinear) opening proof size is also stated as a requirement for a scheme to be considered a vector commitment; we do not agree with this characterization and will ignore it.
The figure above shows the different types of vector commitments discussed in the literature. We will first focus on commitments based on collision-resistant cryptographic hash functions (CRHFs): first a simple scheme, and then Merkle trees, which build upon the simple scheme to improve performance.
After that we will discuss Verkle trees, which replace the CRHFs in a Merkle tree with something more efficient (usually pairing-based, such as polynomial commitments) in order to increase the arity of the tree and improve performance even further. (Pairing-based vector commitments are reductions of polynomial commitments and will be discussed in more depth in the next chapter.)
Finally, we will discuss an example vector commitment based on groups of unknown order and conclude with the reduction of vector commitments to set commitments. We will be skipping a discussion of lattice-based vector commitments for the present, although they are interesting in their own right.
In an earlier chapter, we focused on single-element commitment schemes based on cryptographic hash functions. These can commit to any type of message, as long as we accept that opening any part of the message requires revealing the entire message to the public (no zero-knowledge partial openings).
In this section, we discuss how we can commit to a vector using a single-element commitment scheme based on CRHFs. This is not a true vector commitment scheme in our classification, but it serves as a building block for Merkle trees, which are true vector commitments that fix the problem of partial openings (along with improving performance), as we will discuss shortly.
Modifying our previous discussion slightly to adapt it to vector inputs as messages, we can redefine the four algorithms of the commitment scheme as follows:
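As a concrete illustration (a minimal sketch rather than the exact definition boxes, with SHA-256 and byte-string messages as illustrative assumptions), the first, non-hiding version could look as follows:

```python
import hashlib

def H(*parts: bytes) -> bytes:
    """The CRHF; SHA-256 is used here purely for illustration.
    Setup is trivial: the hash function's parameters are public."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))  # length-prefix to avoid ambiguity
        h.update(p)
    return h.digest()

def commit(vector: list[bytes]) -> bytes:
    """Digest = one hash over all elements in order: a single hash output."""
    return H(*vector)

def open_at(vector: list[bytes], i: int):
    """Opening position i reveals the element *and* every other element,
    so nothing about the rest of the vector stays hidden."""
    return vector[i], [v for j, v in enumerate(vector) if j != i]

def verify(digest: bytes, i: int, element: bytes, proof: list[bytes]) -> bool:
    """Re-insert the claimed element at position i and recompute the digest."""
    reconstructed = proof[:i] + [element] + proof[i:]
    return H(*reconstructed) == digest

v = [b"alpha", b"beta", b"gamma", b"delta"]
d = commit(v)
elem, pi = open_at(v, 2)
assert verify(d, 2, elem, pi)
```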
A version with zero-knowledge partial openings (a true vector commitment):
The above commitment scheme is how vector commitments using CRHFs are usually presented in the literature, and we showed that it is not hiding in its partial openings. In order to make partial openings hiding and turn the above scheme into a true vector commitment in our classification, we can make a simple modification:
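Sticking with the same illustrative assumptions, a minimal sketch of this modification: every element is first hashed together with its own fresh random salt, and an opening for one position reveals only the salted hashes of the other positions, which hide the underlying values.

```python
import hashlib, secrets

def H(*parts: bytes) -> bytes:
    """Same length-prefixed SHA-256 as in the previous sketch."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))
        h.update(p)
    return h.digest()

def commit(vector: list[bytes]):
    """Hash each element with a fresh salt, then hash the salted hashes in order."""
    salts = [secrets.token_bytes(32) for _ in vector]
    leaves = [H(v, r) for v, r in zip(vector, salts)]
    return H(*leaves), salts                # digest, plus salts kept by the committer

def open_at(vector, salts, i):
    """Reveal the element and its salt; other positions are revealed only as
    salted hashes, which hide the underlying elements."""
    leaves = [H(v, r) for v, r in zip(vector, salts)]
    others = [leaf for j, leaf in enumerate(leaves) if j != i]
    return vector[i], salts[i], others

def verify(digest, i, element, salt, others) -> bool:
    leaves = others[:i] + [H(element, salt)] + others[i:]
    return H(*leaves) == digest

v = [b"alpha", b"beta", b"gamma", b"delta"]
d, salts = commit(v)
elem, r, pi = open_at(v, salts, 2)
assert verify(d, 2, elem, r, pi)
```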
This is a true vector commitment in our classification.
Efficiency: The space/communication complexity of the digest is a single hash output in both versions. The opening proof $\pi$, on the other hand, is of size $O(m)$ for a vector of $m$ elements (the units being message elements in the first version and hash outputs in the second version). This opening proof size is considered inefficient, and Merkle/Verkle trees are designed to improve upon the second version with sublinear opening proofs.
The complexities of the commitment algorithm and the opening algorithm are $O(m)$, with (usually expensive) hash function evaluations dominating the computation.
The scheme does not require a trusted-setup algorithm, as the parameters of the hash function are completely public.
Merkle trees are ubiquitous commitment schemes found across many real-world applications (e.g. blockchains like Ethereum).
Merkle trees build upon the simple vector hash commitment above (the second version) by introducing a divide & conquer approach through a tree structure, a common method throughout computer science for improving linear space/time requirements to sublinear ones. More specifically, Merkle trees improve opening proof sizes to $O(\log_2 m)$ from the $O(m)$ (specifically, $m-1$ hash outputs) of the simple vector hash commitment.
Merkle trees accomplish this by taking the simple vector hash commitment and using it in each of the nodes of a binary tree, such that at each node of the tree a commitment to a vector of two elements occurs. Let us give the formal definition as usual:
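In place of the formal definition box, here is a minimal sketch under the same illustrative assumptions (SHA-256, salted leaves for hiding, and a power-of-two vector length for simplicity):

```python
import hashlib, secrets

def H(*parts: bytes) -> bytes:
    """Same length-prefixed SHA-256 as in the earlier sketches."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))
        h.update(p)
    return h.digest()

def commit(vector: list[bytes]):
    """Build the tree bottom-up; the digest is the root, a single hash output."""
    salts = [secrets.token_bytes(32) for _ in vector]
    level = [H(v, r) for v, r in zip(vector, salts)]      # salted leaves (hiding)
    levels = [level]
    while len(level) > 1:
        level = [H(level[j], level[j + 1]) for j in range(0, len(level), 2)]
        levels.append(level)
    return levels[-1][0], (salts, levels)

def open_at(state, i):
    """Authentication path: the sibling hash at every level, log2(m) hashes."""
    salts, levels = state
    path, idx = [], i
    for level in levels[:-1]:
        path.append(level[idx ^ 1])                        # sibling at this level
        idx //= 2
    return salts[i], path

def verify(root, i, element, salt, path) -> bool:
    """Recompute the root from the leaf and its authentication path."""
    node, idx = H(element, salt), i
    for sibling in path:
        node = H(node, sibling) if idx % 2 == 0 else H(sibling, node)
        idx //= 2
    return node == root

v = [b"a", b"b", b"c", b"d", b"e", b"f", b"g", b"h"]       # m = 8, a power of two
root, state = commit(v)
salt, path = open_at(state, 5)
assert verify(root, 5, v[5], salt, path)
```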
Efficiency: The space/communication complexity of the digest is a single hash output. The opening proof $\pi$, on the other hand, is of size $\log_2 m$ hashes.
The complexities of the commitment algorithm and the opening algorithm are $O(m)$, with (usually expensive) hash function evaluations dominating the computation.
The scheme does not require a trusted-setup algorithm, as the parameters of the hash function are completely public.
A $k$-ary Merkle tree commitment?: An interesting way to think about Merkle tree vector commitments is to consider them as building recursively on simple vector hash commitments at each node to create a tree that incorporates larger and larger vectors. As a simple vector hash commitment over a vector of size $k$ has an $O(k)$-sized opening proof, and at each node of a binary Merkle tree $k$ is simply $2$, the opening proofs of individual nodes are of size $O(1)$.
Since the authentication path length for any vector element is $\log_2 m$, the opening proof for the entire Merkle tree is the product $O(1) \cdot \log_2 m = O(\log_2 m)$. However, this can still be quite large when the vectors to commit to have billions of elements. A more ideal opening proof size would be $O(\log_k m)$ with $k \gg 2$. This can seemingly be accomplished by increasing the arity $k$ of the tree (its width) to reduce its depth.
On the other hand, simply making a $k$-ary Merkle tree instead of a binary Merkle tree does not give us what we seek in terms of opening proof sizes. The reason is that each node now has to commit to a $k$-element vector using a simple vector hash commitment, which has an $O(k)$-sized opening proof, so the overall opening proof for the $k$-ary Merkle tree is of size $O(k \cdot \log_k m)$, since the authentication paths are of length $\log_k m$. This is not an improvement over $O(\log_2 m)$, since $k \cdot \log_k m$ keeps growing as $k$ increases.
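To make this concrete, a quick back-of-the-envelope comparison (the billion-element vector and the specific arities below are purely illustrative choices):

```python
def depth(m: int, k: int) -> int:
    """Levels needed for an m-leaf k-ary tree, i.e. ceil(log_k(m))."""
    d, capacity = 0, 1
    while capacity < m:
        capacity *= k
        d += 1
    return d

m = 2 ** 30                                   # roughly a billion elements
for k in (2, 4, 16, 256):
    d = depth(m, k)
    siblings_per_node = k - 1                 # simple vector hash commitment at each node
    print(f"k = {k:3d}: path length {d:2d}, proof ~ {d * siblings_per_node:4d} hashes")
# k =   2: path length 30, proof ~   30 hashes
# k =   4: path length 15, proof ~   45 hashes
# k =  16: path length  8, proof ~  120 hashes
# k = 256: path length  4, proof ~ 1020 hashes   <- wider tree, *larger* proof
```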
From this perspective, Merkle tree based vector commitments can be thought of as vector commitments that build upon a less efficient (w.r.t. opening proof sizes) vector commitment scheme by integrating it into a tree structure to improve performance. However, they cannot improve performance significantly beyond $O(\log_2 m)$: as long as they are stuck with simple vector hash commitments as their building block, increasing their arity degrades their opening proof performance.
Verkle tree commitments are the same as Merkle tree commitments with a crucial difference: they employ constant opening size vector commitments at each node instead of simple vector hash commitments, which allows them to achieve an overall opening proof size of $O(\log_k m)$ when they are configured as $k$-ary trees with $k \gg 2$. Usually the node-level commitment scheme that Verkle trees use is a reduction of a pairing-based polynomial commitment scheme (e.g. the Kate commitment) that we will discuss in the next chapter.
An important question to ask is: if we can already build vector commitment schemes with constant size opening proofs (i.e. $O(1)$), why would we ever use Verkle trees, which have opening proofs of size $O(\log_k m)$?
The answer is that there are multiple aspects to the performance of a commitment scheme, as we have been discussing. Aside from the opening proof size, the opening algorithm's time complexity can be a big issue when dealing with vectors of huge sizes. The constant opening proof commitment schemes used at the node level usually have opening algorithms with high asymptotic complexity and cannot handle vectors with very large sizes.
Therefore, if the arity $k$ of the tree is chosen to be much smaller than $m$ while still being significantly larger than $2$ in a Verkle tree, a sweet spot for performance can be found where the overall opening proofs are of size $O(\log_k m)$ while, at the same time, the opening algorithm complexity at each node depends on $k$ instead of $m$.
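Continuing the same illustrative numbers (and reusing the `depth` helper from the previous snippet): with a constant-size proof at each node, widening the tree now shrinks the overall proof, while each node-level opening algorithm only ever runs over $k$ elements instead of $m$.

```python
m = 2 ** 30                                   # roughly a billion elements
for k in (2, 16, 256, 1024):
    d = depth(m, k)                           # depth() from the previous snippet
    print(f"k = {k:4d}: ~{d} constant-size node proofs, "
          f"node-level openings over vectors of size {k} (not {m})")
# k =    2: ~30 constant-size node proofs, ...
# k =  256: ~4 constant-size node proofs, ...
# k = 1024: ~3 constant-size node proofs, ...
```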
The formal definitions of the commitment algorithms of Verkle trees are the same as the algorithms of the Merkle tree commitments, except that the hash function is replaced by the algorithms of the polynomial commitment scheme simulating the node-level vector commitment, through the reduction we will introduce in the next chapter.
Ethereum: Verkle trees are important in an upcoming update to the Ethereum blockchain, which currently uses high-arity (i.e. $k = 16$) Merkle Patricia tries to commit to state, storage, transaction and receipt data. Therefore, Merkle Patricia tries in Ethereum currently have high opening proof sizes and will benefit from using a Verkle tree commitment scheme (which will allow an even higher arity to find that performance sweet spot) once the update is finalized in the future.
Finally, we give an example of a constant size opening proof vector commitment that is based on groups of unknown order and uses the idea of accumulators introduced in the previous chapter on set commitments.
As before, we assume the maximum number of vector elements the scheme can handle is $m$. We will consider a set of $m$ unique primes $p_1, \ldots, p_m$ whose product is $P = \prod_{i=1}^{m} p_i$. Now consider the $m$ unique integers $e_i = P / p_i$, which represent positions/indices in an encoded manner.
By looking at the $e_i$ as numerical values, we can, in principle, computationally determine (using factorization) their index $i$, as each is missing one very particular prime ($p_i$) in its constitution.
Armed with this encoded representation, we can define the algorithms of the commitment scheme:
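The definition boxes are not reproduced here, but as a rough illustration, the following is a minimal sketch of one natural instantiation consistent with the encoding above. The toy RSA-style modulus, the tiny primes, and the restriction to small non-negative integer messages are illustrative assumptions (a real deployment would use a modulus of thousands of bits, or a class group to avoid generating a modulus at all), and the exact algorithms in the text may differ in their details.

```python
from math import prod

# Toy stand-in for a group of unknown order: Z_N^* with N = 1009 * 1013.
# In practice the factorization of N must be unknown (or a class group used).
N = 1022117
g = 3

PRIMES = [5, 7, 11, 13]                 # one unique prime per position (m = 4 here)
P = prod(PRIMES)
E = [P // p for p in PRIMES]            # e_i = P / p_i: missing exactly the prime p_i

def commit(msgs: list[int]) -> int:
    """Digest is a single group element g^(sum_i m_i * e_i) mod N."""
    return pow(g, sum(m_i * e_i for m_i, e_i in zip(msgs, E)), N)

def open_at(msgs: list[int], i: int) -> int:
    """Proof for position i: strip m_i's contribution and take the p_i-th root of
    the rest, possible because every e_j with j != i is divisible by p_i."""
    rest = sum(m_j * e_j for j, (m_j, e_j) in enumerate(zip(msgs, E)) if j != i)
    assert rest % PRIMES[i] == 0
    return pow(g, rest // PRIMES[i], N)

def verify(digest: int, i: int, m_i: int, proof: int) -> bool:
    """Check that proof^(p_i) * g^(m_i * e_i) equals the digest."""
    return (pow(proof, PRIMES[i], N) * pow(g, m_i * E[i], N)) % N == digest

msgs = [2, 9, 4, 1]
d = commit(msgs)
pi = open_at(msgs, 2)
assert verify(d, 2, msgs[2], pi)
```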
Efficiency: The space/communication complexity of the digest and any opening proof are single group elements (whose size depends on the group size and representation).
The complexity of the commitment algorithm is dominated by group exponentiations in $\mathbb{G}$ with very large exponents, computed using fast exponentiation such as the square & multiply algorithm. The opening algorithm has the same complexity.
The complexity of the verification algorithm is likewise measured in group operations in $\mathbb{G}$, again using fast exponentiation.
The scheme does not require a trusted-setup algorithm.
Given a vector commitment scheme, it is straightforward to simulate a set commitment through the following reduction:
This is again extremely simplistic, as we are just wrapping the set message as a vector and using the vector commitment algorithms to simulate the set commitment. Any ordering of the set elements works, as long as we keep it consistent.
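A minimal sketch of this reduction, reusing the `commit`/`open_at`/`verify` functions from the Merkle tree sketch above and assuming the set elements are byte strings that can be sorted into a canonical order (padding to a power-of-two length is omitted):

```python
def set_commit(elements: set[bytes]):
    """Fix any consistent order (here: sorted) and commit to the resulting vector."""
    vector = sorted(elements)
    root, state = commit(vector)              # vector (Merkle) commitment from above
    return root, (vector, state)

def set_open(state, element: bytes):
    """Open the position that the canonical order assigns to the element."""
    vector, vc_state = state
    i = vector.index(element)
    salt, path = open_at(vc_state, i)
    return i, salt, path

def set_verify(root, element: bytes, i: int, salt, path) -> bool:
    return verify(root, i, element, salt, path)
```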
In the next chapters, we move on to commitments to functions of various kinds, which can be thought of as vectors with infinitely many elements.
Intro: A Hierarchical View of Cryptographic Commitment Schemes
Previous: Set Commitments
Next: Univariate Polynomial Commitments