<style>
.markdown-body code,
.markdown-body tt {
    color: #eee;
    background-color: rgba(20, 230, 230, 0.36);
}
</style>

Vectors, in the context of commitment schemes, are simply arrays or ordered multi-sets (sets with possible repetition of elements), rather than elements of linear spaces as in the usual definition. For example, $[2, 3, 1, 2, 5]$ is a vector in this context (left-to-right order). Between sets and vectors, we could also have considered commitments to multi-sets without a specified order, but these are fairly uncommon in the literature.

Vector commitments are similar to set commitments, but the values of the elements are encoded into the digest (and opening proofs) along with their particular order in the vector. This means that the digest (and opening proofs) must be specific not only to each element value but also to its position; in other words, they must not be invariant to permutations of the input vector. This requirement is a consequence of the idea of *positional binding*, which we take as a central principle in vector commitments, in addition to zero-knowledge partial openings as was the case in set commitments. Some of the literature states $\mathrm{O}(1)$ opening proof size as a requirement for a commitment scheme to be considered a vector commitment; we do not agree with this characterization and will ignore it.

![](https://i.imgur.com/aESRNCy.jpg)

The figure above shows different types of vector commitments discussed in the literature. We will first focus on commitments based on collision-resistant hash functions (CRHFs), both a simple scheme and then *Merkle trees*, which build upon the simple scheme to improve performance. After that we will discuss *Verkle trees*, which replace the CRHFs in a Merkle tree with something more efficient (usually based on pairings, such as polynomial commitments) to increase the arity of the tree and improve performance even further (pairing-based vector commitments are reductions of polynomial commitments and will be discussed in the next chapter in more depth). Finally, we will discuss an example vector commitment based on groups of known order and conclude with the reduction of vector commitments to set commitments. We will be skipping a discussion of lattice-based vector commitments for the present, although they are interesting in their own right.

---

<span style="color: #0; font-family: Times New Roman; font-size: 1.1em;">Simple vector hash commitments </span>
---

In an [earlier chapter](/@petrichor/commitmentSingle), we focused on single-element commitment schemes based on cryptographic hash functions, which can be used to commit to any type of message as long as we accept that opening any part of the message requires us to reveal the entire message to the public (no zero-knowledge partial openings). In this section, we discuss how we can commit to a vector using a single-element commitment scheme based on CRHFs. This is not a *true* vector commitment scheme in our classification, but it serves as a building block for Merkle trees, which are true vector commitments that fix the problem of partial openings (along with improving performance), as we will discuss shortly.
Modifying our previous discussion slightly to adapt it to vector inputs $m=[m_1, m_2, ..., m_n]$ as messages, we can redefine the four algorithms of the commitment scheme as follows:

- The setup algorithm outputs the specific hash function to use based on some security parameters $s$, such as the number of bits the hash function will output, $\Phi(s)=\rho=H$.
- The commitment algorithm first concatenates the elements of the message vector in the order they appear, $m_1 || m_2 || ... || m_n$, and then computes the hash of the concatenated message as its digest to be published, $C(\rho, m)= d = H(m_1 || m_2 || ... || m_n)$. Note that if we reorder the elements of the input vector, the digest changes as well, i.e. the digest is sensitive to the order and not permutation-invariant.
- The opening algorithm is where this scheme falls short in our classification, as it makes the entire message public. The opening proof for the element $i(m) = m_i$ is $O(\rho, m, i) = \pi_i = (m_\text{pre}, m_\text{post})$ where $m_\text{pre} = m_1 || m_2 || ... || m_{i-1}$ is the prefix and $m_\text{post} = m_{i+1} || m_{i+2} || ... || m_{n}$ is the postfix of the specific part of the vector $m_i$. Clearly, a public $(m_\text{pre}, m_\text{post})$ is not zero-knowledge/hiding for the rest of the vector.
- The verification algorithm simply checks that a candidate message $m_i^*$, when combined with the opening proof prefix and postfix, has the same hash as the original message given in the public digest $d$: $V(\rho, d, i, \pi_i, m_i^*) = (H(m_\text{pre}||m_i^*||m_\text{post}) \stackrel{?}{=} d)$.

---

**A version with zero-knowledge partial openings (a true vector commitment)**: The above commitment scheme is how vector commitments using CRHFs are usually presented in the literature, and as we showed, its partial openings are not hiding. In order to make partial openings hiding and turn the above scheme into a true vector commitment in our classification, we can make a simple modification:

- Instead of producing a digest for the concatenated elements directly with the hash function, $d=H(m_1 || m_2 || ... ||m_n)$, we can first produce the hashes of all individual elements $H(m_i)$, and then produce the digest of *their* concatenation instead, $d=H(H(m_1) || H(m_2) || ... || H(m_n))$. This initial hash computation for each element hides them in the opening proof, and as we will see shortly, is exactly how Merkle trees deal with this problem as well.
- The opening algorithm for the $i$th element now produces the prefix and the postfix for the *hashes of the elements* instead of the elements themselves, $O(\rho, m, i) = \pi_i = (h_\text{pre}, h_\text{post})$ where $h_\text{pre} = H(m_1) || H(m_2) || ... || H(m_{i-1})$ and $h_\text{post} = H(m_{i+1}) || H(m_{i+2}) || ... || H(m_{n})$. As we can see, knowing $(h_\text{pre}, h_\text{post})$ does not reveal the rest of the message.
- The verification algorithm checks $V(\rho, d, i, \pi_i, m_i^*) = (H(h_\text{pre}||H(m_i^*)||h_\text{post}) \stackrel{?}{=} d)$. Note that only the value of the $i$th element is revealed to the verifier if the candidate is validated. This is a true vector commitment in our classification.
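A minimal Python sketch of this hiding version may help make the four algorithms concrete. It uses SHA-256 as a stand-in CRHF and 0-based indices; the function names are illustrative, not from any particular library.

```python
import hashlib

def H(data: bytes) -> bytes:
    """The CRHF chosen by the setup algorithm; SHA-256 stands in here."""
    return hashlib.sha256(data).digest()

def commit(m: list[bytes]) -> bytes:
    """Digest d = H(H(m_1) || H(m_2) || ... || H(m_n))."""
    return H(b"".join(H(e) for e in m))

def open_at(m: list[bytes], i: int) -> tuple[list[bytes], list[bytes]]:
    """Opening proof for m_i: the hashes of all the *other* elements."""
    hashes = [H(e) for e in m]
    return hashes[:i], hashes[i + 1:]   # (h_pre, h_post)

def verify(d: bytes, proof, candidate: bytes) -> bool:
    """Check H(h_pre || H(m_i*) || h_post) == d."""
    h_pre, h_post = proof
    return H(b"".join(h_pre) + H(candidate) + b"".join(h_post)) == d

# Usage
m = [b"2", b"3", b"1", b"2", b"5"]
d = commit(m)
pi = open_at(m, 2)                  # open the element at index 2 (0-based)
assert verify(d, pi, b"1")          # the correct value verifies
assert not verify(d, pi, b"7")      # a wrong value fails
```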
---

**Efficiency**: The space/communication complexity of the digest $d$ is a single hash output in both versions. The opening proof $\pi_i$, on the other hand, is of size $n-1$, or $\mathrm{O}(n)$ (the units being either the size of the message elements in the first version or the hash output size in the second version). This opening proof size is considered inefficient, and Merkle/Verkle trees are designed to improve upon the second version with sublinear opening proofs. The complexities of the commitment algorithm and the opening algorithm are $\mathrm{O}(n)$, with the (usually expensive) hash function evaluations dominating the computation. The scheme does not require a trusted-setup algorithm, as the parameters of the hash function are completely public.

---

<span style="color: #0; font-family: Times New Roman; font-size: 1.1em;">Merkle trees </span>
---

Merkle trees are ubiquitous commitment schemes found across many real-world applications (e.g. blockchains like Ethereum). Merkle trees build upon the simple vector hash commitment above (second version) by introducing a divide & conquer approach through a tree structure, a common method throughout computer science for improving linear space/time requirements to sublinear ones. More specifically, Merkle trees improve opening proof sizes to $\mathrm{O}(\log_2 n)$ from the $\mathrm{O}(n)$ (specifically, $n-1$) of the simple vector hash commitment. Merkle trees accomplish this by taking the simple vector hash commitment and using it in each of the nodes of a *binary tree*, such that at each node of the tree, a commitment to a vector of *two* elements occurs. Let us give the formal definition as usual:

- The setup algorithm outputs the specific hash function to use based on some security parameters $s$, such as the number of bits the hash function will output, as well as the maximum vector size allowed $n_\text{max}$, $\Phi(s)=\rho=(H,n_\text{max})$.

![](https://i.imgur.com/uFEDKxi.png)

- Once again, the commitment algorithm first produces the hashes of all individual elements of the vector $H(m_i)$ to protect the opening proofs from revealing information, and sets these hash values as the leaves of the tree (bottom layer). Then at each level above, nodes commit to their children's hash values by applying the hash function to their concatenation, $H(h_\text{left}||h_\text{right})$, which can be considered a commitment to a 2-element vector. The root hash is the digest of the Merkle tree commitment, $d = C(\rho, m)$. Note that if we reorder the elements of the input vector, the digest changes as well, i.e. the digest is sensitive to the order and not permutation-invariant.

![](https://i.imgur.com/zfQ09HK.png)

- The opening algorithm uses the index function $i$ to determine the location of a given item $m_i$ in the tree. This is then used to determine the *authentication path* of $m_i$, which consists of the nodes on the *path* from $H(m_i)$ to the root. For the example in the figure above, the authentication path for $m_3$ is indicated in blue, and its nodes need to be calculated in the verification algorithm to get to the digest at the top. In order to do so, we will need the *siblings* of the nodes in the authentication path in the opening proof for $m_3$, $O(\rho, m, 3) = \pi_3 = (h_4, h_9, h_{14})$, which are indicated in yellow. Note that these values can be calculated as long as we know the entire message, which is the case in the opening algorithm as usual. We write this as $O(\rho, m, i) = \pi_i = (v_1, v_2, ..., v_{\log_2 n_\text{max}})$ where $v_j$ corresponds to the sibling node value at level $j$ from the bottom.
- The verification algorithm checks whether a candidate message $m_i^*$ can be used to arrive at the top hash/digest when used in conjunction with the sibling node values in the opening proof $\pi_i$. This is accomplished by first computing its hash $H(m_i^*)$ and then using it to compute all the node values in the authentication path using the sibling values in sequence. This is a bit hard to write formally, but if we assume we concatenate on the right/left appropriately depending on the position $i$, we can write it as $V(\rho, d, i, \pi_i, m_i^*) = (H(... H(H(H(m_i^*)||v_1)||v_2) ... ||v_{\log_2 n_\text{max}}) \stackrel{?}{=} d)$.
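The following is a minimal Python sketch of a binary Merkle tree, again with SHA-256, 0-based indices, and the assumption that the number of leaves is a power of two (real schemes pad up to $n_\text{max}$); the names are illustrative.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def commit(m: list[bytes]) -> list[list[bytes]]:
    """Build the tree level by level; level 0 holds the leaf hashes H(m_i).
    Assumes len(m) is a power of two."""
    levels = [[H(e) for e in m]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[j] + prev[j + 1]) for j in range(0, len(prev), 2)])
    return levels                        # levels[-1][0] is the digest d

def open_at(levels: list[list[bytes]], i: int) -> list[bytes]:
    """Siblings of the authentication path of leaf i, bottom to top."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[i ^ 1])       # i ^ 1 flips the last bit: the sibling
        i //= 2                          # move up to the parent index
    return proof

def verify(d: bytes, i: int, proof: list[bytes], candidate: bytes) -> bool:
    """Recompute the authentication path from H(m_i*) and the siblings,
    concatenating on the right or left depending on the position."""
    node = H(candidate)
    for sibling in proof:
        node = H(node + sibling) if i % 2 == 0 else H(sibling + node)
        i //= 2
    return node == d

# Usage: 8 leaves -> opening proofs of log2(8) = 3 hashes
m = [bytes([x]) for x in range(8)]
levels = commit(m)
d = levels[-1][0]
assert verify(d, 3, open_at(levels, 3), m[3])
```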
---

**Efficiency**: The space/communication complexity of the digest $d$ is a single hash output. The opening proof $\pi_i$, on the other hand, is of size $\mathrm{O}(\log_2 n_\text{max})$ hashes. The complexities of the commitment algorithm and the opening algorithm are $\mathrm{O}(n)$, with the (usually expensive) hash function evaluations dominating the computation. The scheme does not require a trusted-setup algorithm, as the parameters of the hash function are completely public.

---

**A k-ary Merkle tree commitment?**: An interesting way to think about Merkle tree vector commitments is to consider them as building recursively on simple vector hash commitments at each node to create a tree that incorporates larger and larger vectors. As simple vector hash commitments have opening proofs of size $k-1$ for a vector of size $k$, and in the case of the individual nodes of a Merkle tree $k$ is simply $2$, the opening proofs of individual nodes are of size $k-1=1$. Since the authentication path length for any vector element is $\log_2 n_\text{max}$, the opening proof for the entire Merkle tree is the product $1 \cdot \log_2 n_\text{max}$. However, this can be quite large when the vectors to commit to have billions of elements. A more ideal opening proof size would be $\log_k n_\text{max}$ with $k \gg 2$. This can seemingly be accomplished by increasing the arity of the tree (width) to reduce its depth.

![](https://i.imgur.com/Luuak3z.png)

On the other hand, simply making a $k$-ary Merkle tree instead of a binary Merkle tree does not give us what we seek in terms of opening proof sizes. The reason is that each node now has to commit to a $k$-element ($k \gg 2$) vector using a simple vector hash commitment, which has an opening proof of size $k-1$, so the overall opening proof for the $k$-ary Merkle tree is of size $(k-1) \cdot \log_k n_\text{max}$, since the authentication paths are of size $\log_k n_\text{max}$. This is not an improvement over $\log_2 n_\text{max}$, since the ratio $\frac{(k-1)\cdot\log_k n_\text{max}}{\log_2 n_\text{max}} = \frac{k-1}{\log_2 k}$ keeps growing as $k$ increases (see the check below). From this perspective, Merkle tree based vector commitments can be thought of as vector commitments that build upon a less efficient (w.r.t. opening proof sizes) vector commitment scheme by integrating it into a tree structure to improve performance. However, they are not able to improve performance significantly beyond $\mathrm{O}(\log_2 n_\text{max})$ as long as they are stuck with simple vector hash commitments as their building block, since increasing the arity worsens their opening proof sizes.
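A quick numerical check of this trade-off, for an illustrative vector of about a billion elements:

```python
import math

n_max = 2**30                    # ~a billion elements
for k in (2, 4, 16, 256):
    depth = math.log(n_max, k)   # authentication path length: log_k(n_max)
    proof = (k - 1) * depth      # k-1 sibling hashes revealed per level
    print(f"k={k:3d}: depth={depth:5.1f}, proof size={proof:6.1f} hashes")

# k=  2: depth= 30.0, proof size=  30.0 hashes
# k=  4: depth= 15.0, proof size=  45.0 hashes
# k= 16: depth=  7.5, proof size= 112.5 hashes
# k=256: depth=  3.8, proof size= 956.2 hashes
```

The depth indeed shrinks as $k$ grows, but the per-level cost grows faster; this is exactly the gap that Verkle trees close with constant-size node-level openings.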
---

<span style="color: #0; font-family: Times New Roman; font-size: 1.1em;">Verkle trees </span>
---

Verkle tree commitments are the same as Merkle tree commitments with a crucial difference: they employ *constant* opening proof size vector commitments at each node instead of simple vector hash commitments, which allows them to achieve an overall opening proof size of $\log_k n_\text{max}$ when configured as $k$-ary trees with $k \gg 2$. Usually the node-level commitment schemes that Verkle trees use are reductions of pairing-based polynomial commitment schemes (e.g. the Kate commitment) that we will discuss in the next chapter.

An important question to ask is: if we can already build vector commitment schemes with constant size opening proofs (i.e. $\mathrm{O}(1)$), why would we ever use Verkle trees, which have opening proofs of size $\mathrm{O}(\log_k n_\text{max})$? The answer is that there are multiple aspects to the performance of a commitment scheme, as we have been discussing. Aside from the opening proof size, the time complexity of the opening algorithm can be a big issue when dealing with vectors of huge sizes. The constant opening proof commitment schemes used at the node level usually have opening algorithms of high asymptotic complexity and cannot handle vectors of very large sizes. Therefore, if the arity $k$ of a Verkle tree is chosen to be much smaller than $n_\text{max}$ while being significantly larger than $2$, a performance sweet spot can be found where the overall opening proofs are of size $\mathrm{O}(\log_k n_\text{max})$, while at the same time the opening algorithm complexity at each node depends on $k \ll n_\text{max}$ instead of $n_\text{max}$. The formal definition of the algorithms of Verkle trees is the same as that of the Merkle tree commitments, except that the hash function is replaced by the algorithms of the polynomial commitment scheme simulating the node-level vector commitment through the reduction we will introduce in the next chapter.

**Ethereum**: Verkle trees are important in an upcoming update to the Ethereum blockchain, which currently uses high-arity (i.e. $k=16$) Merkle Patricia tries to commit to state, storage, transaction and receipt data. The Merkle Patricia tries in Ethereum therefore have high opening proof sizes today, and will benefit from switching to a Verkle tree commitment scheme (which will also allow an even higher arity at that performance sweet spot) once the update is finalized.

---

<span style="color: #0; font-family: Times New Roman; font-size: 1.1em;">A vector commitment using groups of known order -- DL </span>
---

Finally, we give an example of a constant-size opening proof vector commitment based on groups of known order, which uses the idea of accumulators introduced in the previous chapter on set commitments. As before, we assume the maximum number of vector elements the scheme can handle is $n_\text{max}$. We will consider a set of $n_\text{max}$ unique primes $\{p_i\}_{i=1:n_\text{max}}$ whose product is $p=\prod_{j=1}^{n_\text{max}} p_j$. Now consider $n_\text{max}$ unique integers $[h_i]_{i=1:n_\text{max}}$ such that $h_i = p/p_i$, which represent the positions/indices $[1, ..., n_\text{max}]$ in an encoded manner. Looking at the $h_i$ as numerical values, we can, in principle, computationally determine (using factorization) their index $i$, as each is missing one very particular prime in its constitution. Armed with this encoded representation, we can define the algorithms of the commitment scheme:

- As usual with groups of known order, the setup algorithm outputs the group generator and the group order based on some security parameters $s$, such as the size of the group to be used, along with $[h_i]_{i=1:n_\text{max}}$: $\Phi(s)=\rho=(p, g, [h_i]_{i=1:n_\text{max}})$.
- The commitment algorithm forms an inner product between the input vector $m=[m_1, ..., m_n]$ and $[h_i]_{i=1:n}$ (the first $n$ of the $h_i$), which can be written as $\sum_{i=1}^n m_i h_i$. The digest is then the group-encoded version of this inner product, $d = C(\rho, m) = g^{\sum_{i=1}^n m_i h_i \mod p}$. Note that this is in contrast to the set commitment accumulator, where the exponents used in the commitment were products of primes mapped only from the values of the set elements and contained no index/position information (and were hence permutation-invariant).
- The opening algorithm outputs the proof for the element $i(m) = m_i$ as the digest without the contribution $g^{m_i h_i}$ of this $i$th element. This can be written as $O(\rho, m, i) = \pi_i = d/g^{m_i h_i} = g^{(\sum_{j=1}^n m_j h_j) - m_i h_i \mod p}$.
- The verification algorithm simply checks that the opening proof $\pi_i$, group-multiplied by the candidate message $m^*_i$ combined with the position encoding, $g^{m^*_i h_i}$, reproduces the digest: $V(\rho, d, i, \pi_i, m_i^*)=(\pi_i \, g^{m^*_i h_i} \stackrel{?}{=} d)$.

---

**Efficiency**: The space/communication complexity of the digest $d$ and any opening proof $\pi_i$ is a single group element (whose size depends on the group size and representation). The complexity of the commitment algorithm is $\mathrm{O}(\log (\sum_{j=1}^n m_j h_j \mod p))$ group operations in $G$, using fast exponentiation such as the square & multiply algorithm. The opening algorithm has the same complexity. The complexity of the verification algorithm is $\mathrm{O}(\log (m_i^* h_i \mod p))$ group operations in $G$, using fast exponentiation. The scheme does not require a trusted-setup algorithm.
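To make the algebra concrete, here is a toy Python sketch with tiny illustrative parameters: $n_\text{max}=5$, a small prime set, and the 25519 prime as a stand-in modulus for $G = \mathbb{Z}_P^*$ with an arbitrary generator. For simplicity the exponents are kept as plain integers rather than reduced $\bmod\ p$, and indices are 0-based; Python's built-in `pow` performs the square & multiply fast exponentiation mentioned above.

```python
from math import prod

# Setup: n_max unique primes; h_i = p / p_i encodes position i
primes = [2, 3, 5, 7, 11]                  # n_max = 5
p = prod(primes)                           # p = 2310
h = [p // q for q in primes]               # h_i is missing exactly the prime p_i

P = 2**255 - 19                            # toy prime modulus; G = Z_P^* here
g = 5                                      # toy generator, chosen arbitrarily

def commit(m: list[int]) -> int:
    """d = g^(sum_i m_i * h_i): the group-encoded inner product <m, h>."""
    return pow(g, sum(mi * hi for mi, hi in zip(m, h)), P)

def open_at(m: list[int], i: int) -> int:
    """pi_i = d / g^(m_i h_i): the digest without the i-th contribution."""
    return pow(g, sum(mj * hj for j, (mj, hj) in enumerate(zip(m, h)) if j != i), P)

def verify(d: int, i: int, pi: int, candidate: int) -> bool:
    """Check pi_i * g^(m_i* h_i) == d."""
    return (pi * pow(g, candidate * h[i], P)) % P == d

# Usage
m = [2, 3, 1, 2, 5]
d = commit(m)
pi = open_at(m, 2)
assert verify(d, 2, pi, 1)                 # the element at index 2 is 1
assert not verify(d, 2, pi, 4)             # a wrong candidate fails
```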
---

<span style="color: #0; font-family: Times New Roman; font-size: 1.1em;">Reduction to set commitments </span>
---

Given a vector commitment $(\Phi^+, C^+, O^+, V^+)$, it is straightforward to simulate a set commitment $(\Phi, C, O, V)$ through the following reduction:

- The setup algorithm is the same, i.e. $\Phi(s)= \Phi^+(s) = \rho$.
- The commitment algorithm creates a digest from a set message $m=\{m_1, m_2, ..., m_n\}$ by first creating a vector with an arbitrary order, $[m_1, m_2, ..., m_n]$, and then committing to it through $C^+$: $C(\rho, m) = C^+(\rho, [m_1, m_2, ..., m_n]) = d$.
- The opening algorithm $O$ for the $i$th element of the set $i(m) = m_i$ produces the opening proof for the $i$th element of the set/vector with $O^+$ (the order matching the one used in the commitment algorithm), $O(\rho, m, i) = O^+(\rho, [m_1, m_2, ..., m_n], i)= \pi_i$.
- The verification algorithm for a candidate message $m_i^*$ is simulated as $V(\rho, d, i, \pi_i, m_i^*) = V^+(\rho, d, i, \pi_i, m_i^*)$ without any changes.

This is again extremely simple, as we are just wrapping the set message $m$ as a vector and using the vector commitment algorithms to simulate the set commitment. Any ordering of the set elements works, as long as we keep it consistent.

---

<span style="color: #0; font-family: Times New Roman; font-size: 1.1em;">Remarks </span>
---

In the next chapters, we move on to commitments to functions of various kinds, which can be thought of as vectors with infinitely many elements.

[Intro: A Hierarchical View of Cryptographic Commitment Schemes](/@kullervo/commitmentHierarchy)

[Previous: Set Commitments](/@kullervo/commitmentSet)

[Next: Univariate Polynomial Commitments](/@kullervo/commitmentUnivariatePoly)