# Verification of batch IPA proofs in halo2-lib
*Thanks to Yi Sun and Jonathan Wang for the Axiom open source program and for substantial help with halo2 and halo2-lib, and to M.S. Krishnamoorthy for many helpful discussions about polynomial commitment schemes in general and IPA in particular.*
This document was written as part of the Axiom open source program. It contains a detailed explanation/description of a particular polynomial commitment scheme, the "Inner Product Argument", as well as notes on the implementation of a "circuit" (in the sense of Plonkish arithmetization) to verify batch IPA proofs.
## Motivation
As part of the Axiom open source program, I planned to write a circuit to verify membership proofs in a Verkle trie. Verkle tries are a data structure to commit to a dictionary (i.e., a store of key-value pairs), and they have been proposed as a replacement for Merkle Patricia tries for storing state in Ethereum ([BF21](https://blog.ethereum.org/2021/12/02/verkle-tree-structure), [But21a](https://vitalik.ca/general/2021/06/18/verkle.html), [Fei21a](https://dankradfeist.de/ethereum/2021/06/18/verkle-trie-for-eth1.html)).
The basic model, a [Verkle tree](https://math.mit.edu/research/highschool/primes/materials/2018/Kuszmaul.pdf), was invented by J. Kuszmaul in 2018. Verkle trees require an underlying *vector commitment scheme*. For reasons we indicate below, inner product arguments (IPA) have been proposed as the vector commitment scheme for the Verkle trie to store state. One of these reasons is that IPA, like other linear vector commitment schemes, permits *batch proofs*. IPA has found several other uses: it is currently used in Zcash and in the backend of the Halo2 proving system.
With this in mind, my first goal was to write a circuit that verified a batch IPA proof.
### Tries
Recall that a trie, a.k.a. a prefix tree, is a data structure that allows one to efficiently store *partially defined functions* (a.k.a. *associative arrays*, a.k.a. *dictionaries*). For instance, one could imagine a partially defined function: $$\text{State}\colon \{0,1\}^{256}\rightarrow \{0,1\}^{256}.$$
The trie roughly allows for the lazy allocation of nodes. See [here](https://en.wikipedia.org/wiki/Trie) for an introduction.
### Vector commitment schemes
**Throughout this article, all vectors will be row vectors.**
Fix a cryptographic prime $r$. A vector commitment scheme is a protocol to commit to vectors $\mathbb F_r^n\ni \vec{v}\mapsto \text{COM}(\vec{v})$. The commitment scheme should have the following properties:
1. $\text{COM}(\vec{v})$ is much smaller than $\vec{v}$.
2. The prover can efficiently compute a short (probabilistic) proof $\Pi$ that witnesses the claim "$v_i$ is the $i^{\text{th}}$ element of $\vec{v}$".
The word "short" in (2) implies that the proof can't reveal all the other elements of $\vec{v}$.
In other words, a vector commitment scheme allows a prover to commit to a long vector, such that if later in time someone asks "what was the $100^{\text{th}}$ element", the prover can produce both the element and a short probabilistic proof that the element is correct. (This means that if the verifier algorithm returns "TRUE" on input $\Pi$, then the verifier may rest assured that with overwhelming probability the prover is telling the truth.)
There are occasionally other desiderata of vector commitment schemes: for instance, that the verification algorithm be sub-linear, or that the protocol batches, etc.
### Comparison of Verkle tries with Merkle Patricia tries
In comparison to Merkle Patricia tries, Verkle tries have the following advantages.
1. They allow for much wider trees. Indeed, in a Merkle tree, one has to provide the hash values of all of the siblings of all of the nodes in a proof (i.e., in a path up to the root in the tree.) Using vector commitment schemes, this is unnecessary in a Verkle tree: one simply provides commitment proofs following the path from the leaf to the root. This means that membership proofs will be substantially smaller than MPT membership proofs.
To give a sense of scale in the case of Ethereum: with the current hexary tree, the worst case size of membership proof is ~3kB. With a binary Merkle tree, this improves to ~500 bytes. Under the current model of a Verkle trie to store state (32 byte keys, 32 byte values, branching of 256), membership proofs would be ~100 bytes in the worst case (31 internal nodes in the path). In practice, the membership proofs in a Verkle trie are expected to be even shorter. According to [Feist](https://www.youtube.com/watch?v=RGJOQHzg3UQ?t=24m53s), we can expect ~16,000 membership proofs that aren't completely random (i.e., corresponding to elements that aren't completely unrelated to each other) to only take a few hundred kilobytes.
2. As long as the underlying vector commitment scheme is linear, membership proofs may be *batched*. Concretely, this means the following. Suppose the prover wants to prove $1,000$ membership claims. Then she can produce a "batched proof", such that the cost to the verifier is not much more than the verification of a single proof.
3. In the context of Ethereum, Verkle tries are supposed to provide a path towards *weak statelessness*. This means that validating nodes will not need to store any state; they would just need to check a series of proofs. One model for this: block proposers would include a block of *proofs* that the state trie has been updated in a valid manner.
## Inner product arguments
There are a variety of excellent sources on IPA: [But21b](https://vitalik.ca/general/2021/11/05/halo.html), [Fei21b](https://dankradfeist.de/ethereum/2021/07/27/inner-product-arguments.html), [HALO19](https://eprint.iacr.org/2019/1021.pdf). As far as I know, the original inner product argument occurs in [BCCGP16](https://eprint.iacr.org/2016/263.pdf). In our implementation, we closely follow the numbering and details of the Halo paper.
Before we explain the protocol in detail, we summarize the goals and basic properties and features.
### Goal
As per the name, the inner product argument accomplishes the following: the prover can commit to a vector $\vec{a}\in \mathbb F_r^N$ and then prove they know a $\vec{b}\in \mathbb F_r^N$ such that $\vec{a}\cdot \vec{b}=y$. To reiterate: the advantage of IPA is that the proof is significantly shorter than revealing all of the elements of $\vec{b}$. In fact, IPA works by *jointly proving* that the prover knows $\vec{a}$ with the given commitment and $\vec{b}$ with $\vec{a}\cdot \vec{b} = y$.
In the bare IPA scheme (which is what the following is based on), we demand no privacy with respect to $\vec{b}$. Privacy can easily be ensured by adding some blinding factors.
We are most interested in IPA as a polynomial commitment scheme. This will be explained below.
### Basic features
1. We assume that the length $N=2^k$ of the vector is a power of $2$.
2. IPA requires a group of prime order $r$ in which the discrete logarithm is (conjectured to be) computationally hard. As is now well-understood, for certain elliptic curves $E/\mathbb F_p$ (over an *auxiliary* prime field $\mathbb F_p$), the groups $\mathbb G:=E(\mathbb F_p)$ have prime order and seem to be resistant to attacks on the discrete log problem. We briefly remind the reader what this means. While the group $\mathbb G$ is isomorphic to $\mathbb Z/r\mathbb Z$ (and in fact a generator $P$ of the group is public knowledge, i.e., there is an explicit isomorphism $\mathbb Z/r\mathbb Z\rightarrow \mathbb G$), given a random point $(x,y)=g\in \mathbb G$, it is conjectured to be hard to compute $\log_P(g)$ from the representation as $(x,y)$. In other words, it seems as though inverting the aforementioned isomorphism is computationally difficult.
3. IPA does not require a trusted setup, a.k.a. structured reference string. In particular, IPA **requires no toxic waste**. (In comparison, KZG does require an SRS, which comes along with toxic waste.) Instead, IPA requires an "unstructured reference string" $\vec{g}\in \mathbb G^N$, with the property that if $\vec{g}=(g_1,\ldots, g_N)$, then none of the pairwise discrete logarithms $\log_{g_i}(g_j)$ are known. Such a $\vec{g}$ may be generated deterministically, e.g., by feeding a public seed into a hash function.
4. IPA does not require pairings, unlike KZG.
5. Using a "Lagrange basis" (to be explained below), the commitment of a vector is (essentially) a Pedersen commitment. In particular, the commitment of $\vec{v}\in \mathbb F_r^N$ is an element of $\mathbb G$.
6. IPA is an interactive proof system, where at every stage there is randomness furnished by the verifier. At each stage, the prover supplies two elements $L_i,R_i\in \mathbb G$. Roughly speaking, IPA acts like a folding scheme, reducing a long statement (that two vectors of length $m$ have a given inner product) to a shorter statement (that two vectors of length $m/2$ have a given inner product). (As usual, this interaction can be removed by using a deterministic, pseudo-random procedure for generating the "verifier randomness", a.k.a. the Fiat-Shamir heuristic.) In particular, the length of an IPA proof is $\mathcal O(\log(N))$.
7. IPA verification is $\mathcal O(N)$, so it might seem that the main advantage is only that the proofs have size $\mathcal O(\log N)$. However, IPA proofs, coming from a *linear commitment scheme*, batch exceptionally well. **Punchline:** there is an efficient protocol for a prover to prove $m$ IPA proofs (with the same $\vec{g}$), with proof length $\mathcal O(m\log N)$, such that the verifier time is $\mathcal O(m\log(N) + N)$. More precisely, the verifier time is $$\mathcal O(m\log N) + \text{verifier time for a single IPA proof}.$$
8. Moreover, this batching may be done incrementally, i.e., on-the-fly; the prover can continually generate single IPA proofs, and then when the verifier wants to check them all, the prover can generate a single "merged" proof, which is again an IPA proof. (In fact, any other prover could do this computation equally well, just by looking at the list of single IPA proofs.) The verifier, inspecting the original proofs, can check that the merged proof was correctly computed in logarithmic time.
In particular, note that IPA does not require a trusted setup, has no toxic waste, and does not require pairings (and hence a pairing-friendly elliptic curve). Together with the aggregation properties, these are probably the most important reasons IPA has been favored as the vector commitment scheme underlying the Verkle trie to manage state in Ethereum.
### Lagrange basis
We quickly review Lagrange bases. Let $\omega\in \mathbb{F}_r$ be a primitive $N^{\text{th}}$ root of unity. We write $H\subset \mathbb{F}_r^{\times}$ for the subgroup generated by $\omega$. (Note that while $\omega$ is not uniquely determined, the subgroup $H$ is -- this comes from the beautiful fact that the multiplicative group of non-zero elements of $\mathbb{F}_r$ is cyclic.)
Let $\mathcal L_i\in \mathbb{F}_r[x]$ be the unique degree $N-1$ polynomial whose values on $H$ are given by:
$$\mathcal L_i(\omega^j)=\delta_{i,j}.$$
The $\{\mathcal L_i\}$ are called the *Lagrange basis* with respect to $H$. (They form a basis of the vector space $\text{Functions}(H,\mathbb F_r)$.) One utility of the Lagrange basis is that it gives an efficient way to build a vector commitment scheme from a polynomial commitment scheme, which we explain below.
Suppose we want to commit to a vector $\vec{v}\in \mathbb F_r^N$. There is a unique polynomial $f_{\vec{v}}\in \mathbb F_r[x]$ of degree no greater than $N-1$ such that:
$$f_{\vec{v}}(\omega^i) = v_i.$$
Explicitly, in the Lagrange basis, $f_{\vec{v}}=\sum v_i\mathcal L_i$.
Suppose we have a polynomial commitment scheme. Then we commit to $\vec{v}$ by committing to $f_{\vec{v}}$. The opening proof for $v_i$ will then be an evaluation proof for $f_{\vec{v}}$ at $\omega^i$.
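To make this concrete, here is a toy Rust sketch (all constants are illustrative stand-ins: $r=17$ plays the role of a cryptographic prime, and $\omega=4$ is a primitive $4^{\text{th}}$ root of unity mod $17$) checking that $f_{\vec{v}}=\sum v_i\mathcal L_i$ interpolates $\vec{v}$ on $H$:
```
// Toy parameters for illustration only: r = 17 stands in for a cryptographic prime.
const R: u64 = 17;

fn fmul(a: u64, b: u64) -> u64 { (a * b) % R }
fn fpow(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    while e > 0 {
        if e & 1 == 1 { acc = fmul(acc, b); }
        b = fmul(b, b);
        e >>= 1;
    }
    acc
}
fn finv(a: u64) -> u64 { fpow(a, R - 2) } // inverse via Fermat's little theorem

// Evaluate the i-th Lagrange basis polynomial L_i at x, over H = <omega>.
fn lagrange_eval(i: usize, x: u64, omega: u64, n: usize) -> u64 {
    let mut acc = 1u64;
    for j in 0..n {
        if j == i { continue; }
        let (wi, wj) = (fpow(omega, i as u64), fpow(omega, j as u64));
        // multiply acc by (x - omega^j) / (omega^i - omega^j)
        acc = fmul(acc, fmul((x + R - wj) % R, finv((wi + R - wj) % R)));
    }
    acc
}

fn main() {
    let (omega, n) = (4u64, 4usize); // 4 has multiplicative order 4 mod 17
    let v = [3u64, 1, 4, 1]; // the vector we wish to commit to
    for i in 0..n {
        // f_v(x) = sum_j v_j * L_j(x); check that f_v(omega^i) = v_i
        let x = fpow(omega, i as u64);
        let fx = (0..n).fold(0u64, |s, j| (s + fmul(v[j], lagrange_eval(j, x, omega, n))) % R);
        assert_eq!(fx, v[i]);
    }
    println!("f_v interpolates v on H");
}
```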
A final remark on computation time. As we have chosen $H$ to be a subgroup of $2$-primary roots of unity, the computation of the Lagrange basis may be done in $\mathcal O(N\log(N))$ time via FFT. However, for all applications in this note, the Lagrange basis is precomputed and hence does not contribute to any of the time analyses.
In particular, we could have chosen *any other subset $S\subset \mathbb F_r$* and built a Lagrange basis for this subset, and we would equally get a vector commitment scheme from a polynomial commitment scheme of the "same performance." We stick to the choice of $H$, both out of convention, and because in other applications it is important that $H\subset \mathbb F_r^{\times}$ is a subgroup.
### IPA as a polynomial commitment scheme.
By the above, we have reduced a vector commitment scheme to a polynomial commitment scheme, i.e., if we have a polynomial commitment scheme, we can derive a vector commitment scheme.
We will see that for single proofs, IPA may be used directly as a vector commitment scheme; however, the main advantage of *batching* comes from treating it as a PCS.
To derive a PCS from the inner product argument above, if $f=\sum a_ix^i$, set $\vec{a}=(a_0,\ldots, a_{N-1})$. To provide an evaluation proof at $z$, set $\vec{b} = (1,z,\ldots, z^{N-1})$.
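For example, the evaluation vector is just the powers of $z$ (a two-line sketch; `fmul` is the toy modular multiplication from the Lagrange example above):
```
// b = (1, z, z^2, ..., z^{N-1}): pairing b with the coefficient vector a of f
// gives a · b = f(z).
fn powers_of_z(z: u64, n: usize) -> Vec<u64> {
    std::iter::successors(Some(1u64), |&p| Some(fmul(p, z))).take(n).collect()
}
```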
As indicated above, as part of the setup of IPA, we require the following.
### Setup
* Our vectors are of length $N=2^k$.
* $\mathbb G$, a cryptographic group of size $r$ in which the discrete logarithm problem is hard.
* A URS $\vec{g}\in \mathbb G^N$, where the discrete logs between any two components of the vectors are unknown.
* A random element $u\in \mathbb G$, again with the assumption that $\log_u(g_i)$ is unknown.
* To make the protocol non-interactive, a cryptographic hash function $H\colon \mathbb G^{*}\rightarrow \mathbb F_r$ that eats strings of group elements and spits out a field element.
The Prover's initial commitment is $\textbf{COM}:=[\vec{a}]\cdot \vec{g}$. We pause to interpret what this means: recall that $r=|\mathbb G|$. Therefore, for any $\lambda \in \mathbb F_r$ and any $g\in \mathbb G$, the product
$$[\lambda]\cdot g:=\underbrace{g+\ldots +g}_{\lambda \text{ times}}$$is well-defined. (Note that some authors prefer *multiplicative* notation for $\mathbb{G}$. In this notation, the above, "iteratively apply the group operation to $\lambda$ copies of a group element $g\in \mathbb G$" would be written as $g^{\lambda}$. We will only use additive notation for $\mathbb G$ in this note.)
Then $[\vec{a}]\cdot \vec{g}$ is the "multi-scalar multiplication" (MSM), i.e., $\sum [a_i]\cdot g_i \in \mathbb G$. (This is sometimes known as a Pedersen commitment of $\vec{a}$.)
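A minimal sketch of this multi-scalar multiplication, written generically over an additive group; the `Group` trait and `Toy` instantiation below are ours, purely for illustration (in the real scheme $\mathbb G$ is an elliptic curve group, and the toy group is of course insecure):
```
// A minimal additive-group interface standing in for G = E(F_p).
trait Group: Copy {
    fn zero() -> Self;
    fn add(self, other: Self) -> Self;
    fn scalar_mul(self, k: u64) -> Self; // [k] · g
}

// Pedersen commitment of a: the multi-scalar multiplication sum_i [a_i] · g_i.
fn msm<G: Group>(a: &[u64], g: &[G]) -> G {
    a.iter().zip(g).fold(G::zero(), |acc, (&ai, &gi)| acc.add(gi.scalar_mul(ai)))
}

// Toy instantiation: Z/17 under addition. Discrete log is trivial here, so
// this only stands in for an elliptic curve group to make the sketch runnable.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Toy(u64);
impl Group for Toy {
    fn zero() -> Self { Toy(0) }
    fn add(self, o: Self) -> Self { Toy((self.0 + o.0) % 17) }
    fn scalar_mul(self, k: u64) -> Self { Toy(self.0 * (k % 17) % 17) }
}

fn main() {
    let (g, a) = ([Toy(2), Toy(3), Toy(5)], [7u64, 1, 2]);
    println!("COM = {:?}", msm(&a, &g)); // [7]·2 + [1]·3 + [2]·5 = 27 = 10 mod 17
}
```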
The prover will claim to know $\vec{a}$ with $\textbf{COM}=[\vec{a}]\cdot \vec{g}$ and $\vec{b}$ with $\vec{a}\cdot \vec{b}=y$. The associated *IPA claim* is:
$$[\vec{a}]\cdot \vec{g}+[y]u\in \mathbb G.$$
We consider this element to be a joint commitment to the claims $[\vec{a}]\cdot \vec{g}=\textbf{COM}$ and $\vec{a}\cdot \vec{b}=y$.
#### Joint commitments
A word on the nomenclature. When we say "joint commitment", we mean that the element $[\vec{a}]\cdot \vec{g}+[y]u\in \mathbb G$ computationally binds the two claims. This concretely means the following: if the prover knew another $\vec{\alpha}, \delta$ such that $[\vec{\alpha}]\cdot \vec{g} + [\delta] u = [\vec{a}]\cdot \vec{g} + [y]u\in \mathbb G$, then the prover would have found a linear relation between the $g_i$ and $u$, which we assume is computationally difficult.
Before we describe the rest of the protocol, we need two further notions.
### Folding and Folded Commitments.
#### Scalar folding
Let $\lambda\in \mathbb F_r$, let $m$ be an even number, and let $\vec{w}\in \mathbb F_r^m$. Then $\text{ScalFold}(\lambda, \vec{w}):=\lambda^{-1}\vec{w}_{\text{lo}}+\lambda \vec{w}_{\text{hi}}$ is in $\mathbb F_r^{m/2}$.
Abusing notation, we make a similar definition where $\vec{w}$ is replaced by a vector of group elements. Suppose $\vec{h}\in \mathbb G^{m}$. Then $\text{ScalFold}(\lambda, \vec{h}):=[\lambda^{-1}]\vec{h}_{\text{lo}}+[\lambda] \vec{h}_{\text{hi}}$ is in $\mathbb G^{m/2}$.
We choose the name "scalar folding" because in both cases, we are folding a vector of length $m$ into a vector of length $m/2$ with help from a single scalar $\lambda\in \mathbb F_r$.
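A sketch of scalar folding for field vectors (reusing the toy helpers `fmul`, `finv` from the Lagrange example; the group-element version is identical, with $[\lambda^{\pm 1}]$-scalar multiplications in place of field multiplications):
```
/// ScalFold(lambda, w) = lambda^{-1} * w_lo + lambda * w_hi, halving the length.
fn scal_fold(lambda: u64, w: &[u64]) -> Vec<u64> {
    let half = w.len() / 2;
    let (lo, hi) = w.split_at(half);
    let l_inv = finv(lambda);
    lo.iter()
        .zip(hi)
        .map(|(&x, &y)| (fmul(l_inv, x) + fmul(lambda, y)) % R)
        .collect()
}
// Example: scal_fold(2, &[1, 2, 3, 4]) = [9*1 + 2*3, 9*2 + 2*4] = [15, 9] mod 17,
// since 2^{-1} = 9 mod 17.
```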
#### Folded commitment.
Let $m$ be an even number and $\vec{\alpha}, \vec{\beta}$ be vectors in $\mathbb F_r^m$ and $\vec{\gamma}$ be a vector in $\mathbb G^m$. Then $\text{FoldCom}(\vec{\alpha},\vec{\beta},\vec{\gamma}):=(L,R)$, where
$L:=[\vec{\alpha}_{\text{lo}}]\cdot \vec{\gamma}_{ \text{hi}}+[(\vec{\alpha}_{\text{lo}}\cdot \vec{\beta}_{ \text{hi}})]u\in \mathbb G$ and $R:=[\vec{\alpha}_{\text{hi}}]\cdot \vec{\gamma}_{\text{lo}}+[(\vec{\alpha}_{\text{hi}}\cdot \vec{\beta}_{\text{lo}})]u\in \mathbb G$.
Note that both $L$ and $R$ are IPA-like statement commitments:
* $L$ commits to: $\vec{\alpha}_{\text{lo}}\in \mathbb F_r^{m/2}$ through a Pedersen commitment with respect to $\vec{\gamma}_{\text{hi}}$, as well as to $\vec{\alpha}_{\text{lo}}\cdot \vec{\beta}_{\text{hi}}\in \mathbb F_r$.
* $R$ commits to $\vec{\alpha}_{\text{hi}}\in \mathbb F_r^{m/2}$ through a Pedersen commitment with respect to $\vec{\gamma}_{\text{lo}}$, as well as to $\vec{\alpha}_{\text{hi}}\cdot \vec{\beta}_{\text{lo}}\in \mathbb F_r$.
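A sketch under the same toy conventions, with $\mathbb Z/17$ under addition standing in for $\mathbb G$ (so $|\mathbb G|=r$ as required; trivially insecure, but it makes the shapes visible):
```
// Toy conventions: group elements and scalars are both u64 mod R = 17.
fn ip(a: &[u64], b: &[u64]) -> u64 { // inner product over F_r
    a.iter().zip(b).fold(0, |s, (&x, &y)| (s + x * y) % R)
}
fn smul(k: u64, g: u64) -> u64 { (k * g) % R } // [k] · g in the toy group
fn msm(a: &[u64], g: &[u64]) -> u64 { // sum_i [a_i] · g_i
    a.iter().zip(g).fold(0, |s, (&ai, &gi)| (s + smul(ai, gi)) % R)
}

/// FoldCom(alpha, beta, gamma) = (L, R), the two cross-term commitments.
fn fold_com(alpha: &[u64], beta: &[u64], gamma: &[u64], u: u64) -> (u64, u64) {
    let h = alpha.len() / 2;
    let (a_lo, a_hi) = alpha.split_at(h);
    let (b_lo, b_hi) = beta.split_at(h);
    let (g_lo, g_hi) = gamma.split_at(h);
    let l = (msm(a_lo, g_hi) + smul(ip(a_lo, b_hi), u)) % R;
    let r = (msm(a_hi, g_lo) + smul(ip(a_hi, b_lo), u)) % R;
    (l, r)
}
```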
### The rough idea of folding
The prover claims that $\vec{a}\cdot \vec{b} = y$ and $[\vec{a}]\cdot \vec{g}=\textbf{COM}$. Using $u$, this claim may be packaged into one joint *claimed* value of:
$$[\vec{a}]\cdot \vec{g} + [\vec{a}\cdot\vec{b}]u$$
Using folding and folding commitments, the prover will "reduce" this claim to a claimed value of:
$$[\vec{a}']\cdot \vec{g}' + [\vec{a}'\cdot\vec{b}']u,$$
where $\vec{a}', \vec{b}', \vec{g}'$ all have half the length. (Equivalently, the prover reduces the original pair of claims to a pair of claimed values for $[\vec{a}']\cdot \vec{g}'$ and $\vec{a}'\cdot \vec{b}'$.) Note that for this reduction to be computationally sound, it must be hard to find non-trivial linear relations between the $g_i$ and $u$.
The verifier will respond with some randomness, which the prover will use to seed a new folding, reducing the claim to an MSM claim and an inner product claim where the constituent vectors have length $N/4$, etc. At the end of the protocol, the prover is making a claim about the "inner product" of vectors of length 1, i.e., the product. At that point, the prover may simply reveal $a_0$. (Depending on the details/the structure of $\vec{b}$, the verifier can compute the rest herself, sometimes quite quickly.)
### The protocol for the prover (following the Halo paper)
Set $\vec{a}_k = \vec{a}$, $\vec{b}_k=\vec{b}$, and $\vec{g}_k = \vec{g}$. At stage $k$, the prover computes $(L_k, R_k):=\text{FoldCom}(\vec{a}_k,\vec{b}_k,\vec{g}_k)$.
```sequence
Prover->Verifier: Claimed commitment of a, claimed evaluation y.
Prover->Verifier: Stage k: L_k, R_k in G
Note right of Verifier: Pick a random w_k
Verifier->Prover: w_k
Note left of Prover: Prover computes a_{k-1}, b_{k-1}, g_{k-1}
Note left of Prover: Prover computes L_{k-1}, R_{k-1}
Prover->Verifier: Stage k-1: L_{k-1}, R_{k-1} in G
Note right of Verifier: Pick a random w_{k-1}
Verifier->Prover: w_{k-1}
Prover->Verifier: ...
Verifier->Prover: ...
Note left of Prover: Prover computes L_1, R_1
Prover->Verifier: Stage 1: L_1, R_1 in G
Note right of Verifier: Pick a random w_1
Verifier->Prover: w_1
Note left of Prover: Compute a_0
Prover->Verifier: Stage 0: a_0 in F_r
Note right of Verifier: Check the proof!
```
For every $\ell \in \{k-1,\ldots, 1\}$, at the beginning of stage $\ell$, the prover computes the following elements (without sending to the verifier):
$$\vec{a}_{\ell}:=\text{ScalFold}(w_{\ell+1}^{-1}, \vec{a}_{\ell+1}),\ \vec{b}_{\ell}=\text{ScalFold}(w_{\ell+1}, \vec{b}_{\ell+1}),\ \vec{g}_{\ell}=\text{ScalFold}(w_{\ell+1}, \vec{g}_{\ell+1})$$
We note that $\vec{a}_{\ell},\vec{b}_{\ell}\in \mathbb F_r^{(2^{\ell})}$ are vectors of length $2^{\ell}$.
Then she computes $$(L_{\ell}, R_{\ell}):=\text{FoldCom}(\vec{a}_{\ell}, \vec{b}_{\ell}, \vec{g}_{\ell})\in \mathbb G^2$$ and sends these two elements of $\mathbb G$ to the verifier.
Finally, at stage 0, the prover simply sends $\vec{a}_0$, given by $\vec{a}_{0}:=\text{ScalFold}(w_{1}^{-1}, \vec{a}_{1})$, which, as it is a vector of length 1, we treat as a scalar.
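Putting the pieces together, here is a toy sketch of the prover's side of the loop (reusing `scal_fold`, `fold_com`, and `finv` from the sketches above; `challenges` holds $(w_k,\ldots,w_1)$, supplied by the verifier or by a Fiat-Shamir hash):
```
// Toy prover loop: everything lives in Z/17, with the additive toy group
// standing in for G.
fn prove(
    mut a: Vec<u64>, mut b: Vec<u64>, mut g: Vec<u64>, u: u64, challenges: &[u64],
) -> (Vec<(u64, u64)>, u64) {
    let mut lr = Vec::new(); // (L_k, R_k), ..., (L_1, R_1), in proof order
    for &w in challenges {
        lr.push(fold_com(&a, &b, &g, u)); // send (L, R), then receive w
        a = scal_fold(finv(w), &a); // a' = w · a_lo + w^{-1} · a_hi
        b = scal_fold(w, &b);       // b' = w^{-1} · b_lo + w · b_hi
        g = scal_fold(w, &g);       // g' folded with the same lambda as b
    }
    (lr, a[0]) // at stage 0, reveal the length-1 vector a_0 as a scalar
}
```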
### The implicit claim
The prover has two implicit claims at Stage $\ell$:
1. $\vec{a}_{\ell}\cdot \vec{b}_{\ell}=y_{\ell},$ where $y_{\ell}$ is defined inductively by: $$y_{\ell}=y_{\ell+1}+w_{\ell+1}^2(\vec{a}_{\ell+1,\text{lo}}\cdot \vec{b}_{\ell+1,\text{hi}})+w_{\ell+1}^{-2}(\vec{a}_{\ell+1,\text{hi}}\cdot \vec{b}_{\ell+1,\text{lo}}).$$
2. $[\vec{a}_{\ell}]\cdot \vec{g}_{\ell}=\textbf{COM}_{\ell},$ where the latter is defined inductively as: $$\textbf{COM}_{\ell} = \textbf{COM}_{\ell+1} + [w_{\ell+1}^2]\big([\vec{a}_{\ell+1,\text{lo}}]\cdot \vec{g}_{\ell+1,\text{hi}}\big)+[w_{\ell+1}^{-2}]\big([\vec{a}_{\ell+1,\text{hi}}]\cdot \vec{g}_{\ell+1,\text{lo}}\big)$$
In other words, this has the form of an original IPA claim: a commitment to a vector with respect to a known URS and a claimed inner product. Note further that the two equations above really look quite similar.
In the actual proof, we are able to package both of these claims into a single claim using $u$; as we assumed from the get-go that it is computationally difficult to find a linear relation between the elements of the vector $\vec{g}$ and $u$, both claims can be bundled together via $u$ into a joint claimed value of:
$$\textbf{COM}_{\ell}+[y_{\ell}]u \in \mathbb G.$$
We have an outstanding question: *why* are these the implicit claims, and what do the $L_j, R_j$ have to do with them?
#### Relation of $L_{\ell+1},R_{\ell+1}$ to the IPA claim.
If the prover is conducting the protocol honestly, then for any $\ell\in \{0,1,\ldots, k-1\}$, we have:
$$[\vec{a}_{\ell}]\cdot \vec{g}_{\ell} + [\vec{a}_{\ell}\cdot \vec{b}_{\ell}]u = \bigg([\vec{a}_{\ell+1}]\cdot \vec{g}_{\ell+1} + [\vec{a}_{\ell+1}\cdot \vec{b}_{\ell+1}]u\bigg) + \bigg(w_{\ell+1}^2L_{\ell+1} + w_{\ell+1}^{-2}R_{\ell+1}\bigg)$$
In other words, after sending $L_{\ell+1},R_{\ell+1}$ and receiving $w_{\ell+1}$, suppose the prover reveals $\vec{a}_{\ell}, \vec{b}_{\ell}$. Then the verifier has a simple compatibility check she can do, namely, the above equality.
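To see why this identity holds, write $w=w_{\ell+1}$ and expand the definitions of $\text{ScalFold}$. For the MSM part:
$$[\vec{a}_{\ell}]\cdot \vec{g}_{\ell} = \big[w\,\vec{a}_{\ell+1,\text{lo}} + w^{-1}\vec{a}_{\ell+1,\text{hi}}\big]\cdot \big([w^{-1}]\vec{g}_{\ell+1,\text{lo}} + [w]\vec{g}_{\ell+1,\text{hi}}\big) = [\vec{a}_{\ell+1}]\cdot \vec{g}_{\ell+1} + [w^{2}]\big([\vec{a}_{\ell+1,\text{lo}}]\cdot \vec{g}_{\ell+1,\text{hi}}\big) + [w^{-2}]\big([\vec{a}_{\ell+1,\text{hi}}]\cdot \vec{g}_{\ell+1,\text{lo}}\big).$$
The expansion of $\vec{a}_{\ell}\cdot \vec{b}_{\ell}$ produces the analogous scalar cross terms; adding the two (the latter scaled by $u$) regroups the cross terms into exactly $[w^{2}]L_{\ell+1}+[w^{-2}]R_{\ell+1}$.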
Conversely, we have the following claim.
**Claim:** Suppose that at stage $\ell$, the prover reveals $\vec{a}_{\ell},\vec{b}_{\ell}$ and the above check is satisfied. Then with overwhelming probability, the prover knew an original $\vec{\alpha}, \vec{\beta}$ with:
$$[\vec{\alpha}]\cdot \vec{g} = \textbf{COM},\ \vec{\alpha}\cdot \vec{\beta}=y.$$
This claim follows from the fact that $w_k,\ldots, w_{\ell+1}$ are all randomly (or even adversarially) chosen. The key is that with overwhelming probability, the implicit IPA claim the prover had at step $\ell+1$ is given by the element $[\vec{a}_{\ell}]\cdot \vec{g}_{\ell} + [\vec{a}_{\ell}\cdot \vec{b}_{\ell}]u - \bigg(w_{\ell+1}^2L_{\ell+1} + w_{\ell+1}^{-2}R_{\ell+1}\bigg)$.
In other words, the validity at stage $\ell+1$ has been reduced to the validity at stage $\ell$. Using induction, one concludes.
The actual verifier check practically jumps off the page at this point. We explain this in the next subsection.
For further intuition of how this implicit claim works, we strongly recommend the reader look at the diagrams in [But21b](https://vitalik.ca/general/2021/11/05/halo.html).
### Verifier's protocol
Now, the verifier has the following information:
* $z\in \mathbb F_r$
* The claimed value $y=f(z)=\vec{a}\cdot \vec{b}$
* The commitment $\textbf{COM}\in \mathbb G$ to $\vec{a}$ (equivalently, to $f$)
* $L_k,R_k,\ldots, L_1,R_1$, i.e., $2k$ elements of $\mathbb G$.
* The final value $a_0$.
As we have described the prover, we must now describe the verifier algorithm. This goes as follows.
1. Given the randomness $w_k,\ldots, w_1$, compute the following "binary counting vector" (a code sketch follows this list): $$\vec{s}:=\big(w_1^{-1}w_2^{-1}w_3^{-1}\cdots w_k^{-1},\ \ w_1w_2^{-1}w_3^{-1}\cdots w_k^{-1},\ \ w_1^{-1}w_2w_3^{-1}\cdots w_k^{-1},\ \ w_1w_2w_3^{-1}\cdots w_k^{-1},\ \ \ldots,\ \ w_1w_2\cdots w_k\big)\in \mathbb F_r^N,$$ where the pattern of exponents $\pm 1$ is governed by binary counting.
2. Compute $b_0=\vec{s}\cdot \vec{b}$. In particular, the claim is that $\text{ScalFold}(w_1,\vec{b}_1) = \vec{s}\cdot \vec{b}$. This claim holds for any starting vector $\vec{b}\in \mathbb F_r^N$; for general vectors $\vec{b}$, this computation requires a linear number of field operations.
However, in our case $\vec{b}=(1,z,\ldots, z^{N-1})$, and $\vec{s}\cdot \vec{b}$ can be computed very quickly from $w_k,\ldots, w_1$. (In other words, $\vec{b}$ is highly structured in a way that dramatically simplifies this computation.) In particular, we do not need to explicitly invoke $\vec{s}$ for this step, just the elements $w_k,\ldots, w_1$.
3. Compute $g_0=[\vec{s}]\cdot \vec{g}$. This is the only step in the process that necessarily takes linear time. (Note that scalar multiplication is also much more expensive than field multiplication.)
4. Compute $Q_1:=\sum_{j=1}^k[w_j^2]\cdot L_j + \sum_{j=1}^{k}[w_j^{-2}]\cdot R_j + \overbrace{(\textbf{COM} + [y]u)}^{\text{original IPA claim}}$ and $Q_2 =\underbrace{[a_0]\cdot (g_0 + [b_0]u)}_{\text{final IPA claim}}$. Check that $Q_1 = Q_2$.
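Before turning to point (2) in detail, here is a toy sketch of step (1): the entry of $\vec{s}$ at index $j$ uses $w_i$ when bit $i-1$ of $j$ is set and $w_i^{-1}$ otherwise (`fmul`, `finv` as in the earlier sketches):
```
// Compute s in F_r^N from (w_1, ..., w_k): the exponent of w_i in s[j] is +1
// exactly when bit (i-1) of j is set. Naive O(N log N) field multiplications;
// a table-doubling approach brings this to O(N).
fn binary_counting_vector(w: &[u64]) -> Vec<u64> {
    let k = w.len();
    let w_inv: Vec<u64> = w.iter().map(|&x| finv(x)).collect();
    (0..(1usize << k))
        .map(|j| {
            (0..k).fold(1u64, |acc, i| {
                fmul(acc, if (j >> i) & 1 == 1 { w[i] } else { w_inv[i] })
            })
        })
        .collect()
}
```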
We explain in greater detail point (2). Consider the following auxiliary polynomial of degree $N-1$:
$$T(x):=\prod_{j=1}^k (w_j^{-1}+ w_jx^{2^{j-1}})\in \mathbb F_r[x].$$
Then (2) follows from the fact that $T(z)=b_0$. (This follows from simply writing out the binary counting pattern.) In particular, although the polynomial $T$ has $N$ non-zero coefficients, one can evaluate it at any point in $\mathcal O(\log N)$ time because of this *compact product representation*.
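In code, the $\mathcal O(\log N)$ evaluation is just $k$ factors, squaring $z$ as we go (toy helpers as before):
```
// T(z) = prod_{j=1..k} (w_j^{-1} + w_j · z^{2^{j-1}}) in O(k) = O(log N) field
// operations; here w[0] plays the role of w_1.
fn eval_t(z: u64, w: &[u64]) -> u64 {
    let mut z_pow = z; // invariant: z_pow = z^{2^{j-1}} at the j-th factor
    let mut acc = 1u64;
    for &wj in w {
        acc = fmul(acc, (finv(wj) + fmul(wj, z_pow)) % R);
        z_pow = fmul(z_pow, z_pow);
    }
    acc
}
```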
Note that if $\vec{b}$ is simply a unit vector of the form $\vec{b}=(0,\ldots,0,1,0,\ldots,0)$, we can still compute $b_0$ quickly. (Exercise given the above!) This implies that IPA immediately furnishes a vector commitment scheme whose only linear cost is a single MSM. (Here, "immediately" means: without transforming $\vec{a}$ to the Lagrange basis and using $(1,\omega^i,\omega^{2i},\ldots,\omega^{(N-1)i})$ as $\vec{b}$, as explained in the subsection on the relation between vector commitment schemes and polynomial commitment schemes.) However, as far as I was able to work out, in order to obtain efficient batching, one must work with the polynomial commitment scheme form.
### Non-interactive version
As usual, this interactive proof system can be made non-interactive by applying the Fiat-Shamir heuristic. This means that the randomness $w_{\ell}$ will be some deterministic evaluation of a pseudorandom hash function of the transcript so far. As usual, to ensure security one needs that
* the original statement is committed in the randomness; and
* every element the prover sends is committed in the randomness.
Here, the phrase "committed in the randomness" simply means "is fed into the hash function."
### Batching
The crucial observation for batching is that $g_0$ is in fact the Pedersen commitment of the polynomial $T$. Indeed, this follows from the explicit expansion of $T$: as mentioned above, its coefficients, read from constant to top degree, are exactly $\vec{s}$, and $g_0=[\vec{s}]\cdot \vec{g}$. Moreover, the only step that is expensive in the above verification process is the MSM required to compute $g_0$ from $\vec{g}$.
We now explain how the prover can "batch" $m$ proofs together such that the verifier ends up doing $$\mathcal O(m\log N)+\text{single MSM of length }N$$
#### Prover protocol
Here, we assume that we have a *fixed* $\vec{g}$ and $u$, and we want to commit $m$ vectors $\{\ _i\vec{v}\}$, where $\ _i \vec{v}\in \mathbb F_r^N$. (We use left subscripts to avoid confusing the index inside a given vector with the index of the vector in my list of vectors.)
The prover generates individual IPA proofs for each of the $\ _i\vec{v}$, with the following addition: the prover includes a claimed value of $\ _ig_0\in \mathbb G$, the "$g_0$" value in the $i^{\text{th}}$ IPA proof. Alternatively, this element may also be described as:
$$\text{ScalFold}(_iw_1,(\text{ScalFold}(_iw_2, \ldots\text{ScalFold}(_iw_k, \vec{g})))),$$
where $_iw_j$ are the pseudo-random elements in the $i^{\text{th}}$ IPA proof.
By the above discussion, $_ig_0$ is the Pedersen commitment of $\ _iT$, where $\ _iT$ is the polynomial $T$ constructed above from the $i^{\text{th}}$ IPA claim. (Again, this polynomial only depends on the pseudorandom challenges.)
Suppose that the prover now receives pseudo-random challenges $\xi,\zeta\in \mathbb F_r$. (In our setting, the prover will generate these as some hash of all of the proofs.)
Then the prover will provide one additional IPA proof. The polynomial will be $\sum \xi^i\ _iT(x)\in \mathbb F_r[x]$, which will be evaluated at $\zeta$.
#### Verifier protocol
1. First of all, the verifier has to check the final IPA proof. This takes $\mathcal O(N)$ time, again dominated by the single MSM.
Then the verifier must check that the final "merged" proof actually corresponds to a correct merging of the original list of proofs.
2. The verifier computes the pseudorandomness $_iw_j$ at stage $j$ of the $i^{\text{th}}$ proof. (In other words, we assume that the pseudo-random challenges were generated deterministically by some series of functions of the transcript. The verifier can simply compute this herself.)
3. The verifier checks that the computation of $\xi, \zeta$ was done correctly. (Similarly, we assume that $\xi$ and $\zeta$ are computed deterministically given the transcript of all of the single proofs; the verifier can therefore look at all of these single proofs and compute $\xi$ and $\zeta$ herself.)
4. The verifier then uses the $_iw_j$ to check whether the claimed evaluation $\sum \xi^i\ _iT(\zeta)$ in the final proof is compatible with the earlier proofs: the verifier may evaluate each $_i T$ in $\mathcal O(\log(N))$ time due to the compact product representation explained above.
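For step (4), here is a toy sketch of the compatibility check, reusing `eval_t` from above (whether the weights start at $\xi^0$ or $\xi^1$ is a convention of the implementation):
```
// Verifier side of step (4): recompute sum_i xi^i · (_i T)(zeta) from the
// challenges alone and compare with the evaluation claimed in the merged
// proof. ws[i] holds (_i w_1, ..., _i w_k). O(m log N) field operations.
fn check_merged_evaluation(zeta: u64, xi: u64, ws: &[Vec<u64>], claimed: u64) -> bool {
    let mut xi_pow = 1u64; // xi^i, starting at i = 0
    let mut acc = 0u64;
    for w in ws {
        acc = (acc + fmul(xi_pow, eval_t(zeta, w))) % R;
        xi_pow = fmul(xi_pow, xi);
    }
    acc == claimed
}
```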
## Implementation
I implemented "IPA as a batched polynomial commitment scheme". This means two things: (1) a native Rust implementation of the prover and verifier, and (2) a circuit-level implementation to verify a batched IPA proof.
### Rust
The Rust implementation is straightforward and thoroughly commented. A few remarks on the code are in order.
The only real choice in implementation is that of the hash functions used to generate the pseudo-randomness. I initially had some trouble getting the PoseidonChip in halo2-lib to produce output compatible with my native implementation, so I decided against using it in this iteration. Instead, randomness is generated as follows.
#### Pseudorandomness in single proof generation
$w_k$ is set to simply be the claimed evaluation $y$. Then, for all $\ell\in \{1,\ldots,k-1\}$, we set:
$$w_{\ell} = y w_{\ell+1} + \text{hash_group_to_field}(L_{\ell})+\text{hash_group_to_field}(R_{\ell}),$$
where the function $\text{hash_group_to_field}\colon E(\mathbb F_p)\rightarrow \mathbb F_r$ takes a point $(x,y)\in \mathbb F_p^2$, reduces the coordinates mod $r$, and takes the sum.
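Here is a sketch of this rule in toy form; the `Point` struct is a hypothetical stand-in for an affine point whose coordinates have already been reduced mod $r$ (the real code works with coordinates in $\mathbb F_p$), and `R`, `fmul` are the toy helpers from earlier sketches:
```
// Hypothetical affine point, coordinates already reduced mod r (toy R = 17).
struct Point { x: u64, y: u64 }

fn hash_group_to_field(p: &Point) -> u64 {
    (p.x + p.y) % R // reduce the coordinates mod r and sum them
}

// w_k = y; then w_l = y · w_{l+1} + H(L_l) + H(R_l) for l = k-1, ..., 1.
// `lr` is in proof order: (L_k, R_k), ..., (L_1, R_1).
fn stage_randomness(y: u64, lr: &[(Point, Point)]) -> Vec<u64> {
    let mut ws = vec![y % R]; // w_k
    for (l, r) in lr.iter().skip(1) {
        let prev = *ws.last().unwrap();
        let w = (fmul(y % R, prev) + hash_group_to_field(l) + hash_group_to_field(r)) % R;
        ws.push(w);
    }
    ws // (w_k, ..., w_1)
}
```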
#### Pseudorandomness to generate $\xi, \zeta$
$\xi = \prod\ _iw_1$ and $\zeta = \xi^2$. In words: $\xi$ is the product of the "last randomness" of every proof, and $\zeta$ is the square of $\xi$. This at least means that the pseudo-random numbers $\xi, \zeta$ reflect iterated commitments to every part of every proof. In a production-level implementation, this would presumably have to be improved. However, we emphasize: if one believes that the $_iw_j$ are pseudo-random, essentially any function one picks of (all of) the $_iw_1$ will be secure, in the sense that the prover cannot give false proofs by carefully massaging the $L_{\ell}, R_{\ell}$.
#### Data structures
I built several data structures to store proofs, complete proofs, batched proofs, etc., so that it is easy to pass them both to the native Rust verifier and to the circuit verifier. I list each of these abstractions here, with their intended purpose.
##### ProofOfEvaluation
This struct contains a single IPA proof of evaluation.
* ``revealed_evaluation`` is the claimed evaluation $y$.
* ``stage_proof`` is the vector of $[L_i,R_i]$ as described in the IPA protocol. The only additional point of note is the order: stage $k$, stage $k-1$, ..., stage $1$.
* ``final_a`` is the final a value that the prover reveals.
* ``batching_helper_info`` optionally contains the claimed stage randomness as well as the claimed $g_0$. This is necessary when we try to batch.
The model here is that the other inputs needed for the proof, namely $\vec{g}=g_{\text{init}}$, $u$, and $z$, are assumed to be already known.
```
pub struct ProofOfEvaluation {
pub revealed_evaluation: Fr, // element claimed to be revealed
pub stage_proof: Vec<[G1Affine; 2]>,
pub final_a: Fr,
pub batching_helper_info: Option<BatchingHelperInfo>,
}
```
##### CompleteSingleIPAProof
This struct contains all of the information the verifier needs to verify a single IPA proof.
* ``commitment`` is the commitment to the polynomial, or equivalently to $\vec{a}$.
* $z$ is the point where we want to evaluate the polynomial.
* ``proof`` is of type ``ProofOfEvaluation``
* ``g_init`` is $\vec{g}$ in our explanation above, i.e., it is the URS.
* ``u`` is the random element $u$ that is used to combine two claims into a single joint claim.
```
pub struct CompleteSingleIPAProof{
pub commitment: G1Affine,
pub z: Fr,
pub proof: ProofOfEvaluation,
pub g_init: Vec<G1Affine>,
pub u: G1Affine,
}
```
##### BatchingHelperInfo
This struct contains auxiliary information in a single IPA proof to help with batching. The necessary extra information is ``g_0``, which is the claimed final folded $g$ value. I chose to add a tiny bit of extra information: the pseudo-randomness at each stage.
```
pub struct BatchingHelperInfo{
pub stage_randomness: Vec<Fr>,
pub g_0: G1Affine,
}
```
##### BatchProofOfEvaluation
This is analogous to ProofOfEvaluation.
* ``list_of_proofs`` is a vector of the claimed single proofs.
* ``commitment_to_weighted_poly`` is the claimed commitment to $\sum \xi^i\ _iT(x)$ (in the language above.) More vaguely: it is the claimed commitment to the "merged polynomial."
* ``proof_of_weighted_poly`` is an opening proof for the value of weighted_poly at $\zeta$ (the pseudo-random point generated from the proofs.)
```
pub struct BatchProofOfEvaluation{
pub list_of_proofs: Vec<ProofOfEvaluation>,
pub commitment_to_weighted_poly: G1Affine,
pub proof_of_weighted_poly: ProofOfEvaluation
}
```
##### CompleteBatchIPAProof
This struct contains everything the verifier needs to evaluate a batched IPA opening proof.
* ``commitments`` is a list of the commitments of the polynomials
* ``vec_z`` is a vector of the points where we want to evaluate the proofs.
* ``batch_proof`` is of type ``BatchProofOfEvaluation``
* ``g_init`` is the URS. (This is $\vec{g}$ in our explanation above.)
* ``u`` is the random element used to take two commitments and make them a joint commitment.
```
pub struct CompleteBatchIPAProof {
pub commitments: Vec<G1Affine>,
pub vec_z: Vec<Fr>,
pub batch_proof: BatchProofOfEvaluation,
pub g_init: Vec<G1Affine>, // g_init is the same for all batched proofs.
pub u: G1Affine
}
```
### Circuit
I wrote a circuit to take in a batched IPA proof and verify the proof. Here, I emphasize: I am treating IPA as a PCS; that means that each $\vec{b}=(1,z,z^2,\ldots, z^{N-1})$.
##### Loading the input
To test my circuit, I generated proofs via the Rust implementation and passed them into my circuit. To do this, one needs to "place all of the inputs into a table of numbers". This is accomplished with the following function:
```
pub fn load_complete_single_ipa_proof(
builder: &mut GateThreadBuilder<Fr>,
gate: &GateChip<Fr>,
range: &RangeChip<Fr>,
params: &MSMCircuitParams,
single_proof: &CompleteSingleIPAProof, // an IPA proof, generated in rust.
make_public: &mut Vec<AssignedValue<Fr>>,
) -> CircuitCompleteSingleProof
```
What is ``CircuitCompleteSingleProof``? This will be the input to my single IPA proof verifier circuit. Note that all of the parts of the struct are *circuit level*, i.e., already in my table of numbers.
One small note: ``left`` corresponds to $L$ and ``right`` corresponds to $R$. In other words, these are the group elements that the prover sends to the verifier at each stage.
```
pub struct CircuitCompleteSingleProof {
pub commitment: EcPoint<Fr, ProperCrtUint<Fr>>,
pub z: AssignedValue<Fr>,
pub g_init: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>,
pub u: EcPoint<Fr, ProperCrtUint<Fr>>,
pub revealed_evaluation: AssignedValue<Fr>,
pub left: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>,
pub right: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>,
pub final_a: AssignedValue<Fr>,
pub k: usize, // number such that 2^k is the length of the committed vector (so deg f <= 2^k - 1);
// equivalently, the number of folding stages in the proof.
}
```
We have similar functions for batched proofs.
```
pub fn load_complete_batch_ipa_proof(
builder: &mut GateThreadBuilder<Fr>,
gate: &GateChip<Fr>,
range: &RangeChip<Fr>,
params: &MSMCircuitParams,
proof: &CompleteBatchIPAProof,
make_public: &mut Vec<AssignedValue<Fr>>
)->CircuitCompleteBatchProof
```
and
```
pub struct CircuitCompleteBatchProof {
// the proofs of the individual claims
pub vec_commitment: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>,
pub vec_z: Vec<AssignedValue<Fr>>,
pub g_init: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>,
pub u: EcPoint<Fr, ProperCrtUint<Fr>>,
pub vec_revealed_evaluation: Vec<AssignedValue<Fr>>,
pub vec_left: Vec<Vec<EcPoint<Fr, ProperCrtUint<Fr>>>>,
pub vec_right: Vec<Vec<EcPoint<Fr, ProperCrtUint<Fr>>>>,
pub vec_a_0: Vec<AssignedValue<Fr>>,
pub vec_g_0: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>, // this is new for batching!
// the final "blended" proof.
pub final_commitment: EcPoint<Fr, ProperCrtUint<Fr>>,
pub final_left: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>,
pub final_right: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>,
pub final_a_0: AssignedValue<Fr>,
pub k: usize,
pub m: usize,
}
```
##### Hashing and randomness
Given some part of the proof, the verifier needs to compute the ``stage_randomness``. (This is where the verifier simulates the Fiat-Shamir transform.)
In particular, the output will be of the form: ``(<w_k, ..., w_1>, <w_k^{-1}, ..., w_1^{-1}>)``, with constraints forcing ``w_i * w_i^{-1} = 1``. We emphasize that the order is the same as that of the Halo paper: $w_{\ell}$ denotes the randomness at the (end of the) $\ell^{\text{th}}$ stage of the proof.
```
pub fn compute_stage_randomness_single_proof(
builder: &mut GateThreadBuilder<Fr>,
gate: &GateChip<Fr>,
z: AssignedValue<Fr>,
revealed_evaluation: AssignedValue<Fr>,
left: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>,
right: Vec<EcPoint<Fr, ProperCrtUint<Fr>>>,
k: usize,
)->(Vec<AssignedValue<Fr>>, Vec<AssignedValue<Fr>>)
```
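To illustrate the inverse constraint, here is a hedged sketch written against the halo2-lib v0.3-style API suggested by the signatures above (an assumption on our part, not verbatim code from the repository):
```
use halo2_base::gates::{builder::GateThreadBuilder, GateChip, GateInstructions};
use halo2_base::utils::ScalarField;
use halo2_base::AssignedValue;

// Sketch: witness w^{-1} out-of-circuit and constrain w * w_inv = 1 in-circuit.
fn constrain_inverse<F: ScalarField>(
    builder: &mut GateThreadBuilder<F>,
    gate: &GateChip<F>,
    w: AssignedValue<F>,
) -> AssignedValue<F> {
    let ctx = builder.main(0);
    // Compute the claimed inverse as an unconstrained witness ...
    let w_inv = ctx.load_witness(w.value().invert().unwrap());
    // ... then add the constraint w * w_inv = 1 to the circuit.
    let prod = gate.mul(ctx, w, w_inv);
    let one = ctx.load_constant(F::one());
    ctx.constrain_equal(&prod, &one);
    w_inv
}
```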
As a necessary piece of this, we need code to hash group elements to field elements. The signature is below:
```
pub fn hash_group_to_field_circuit(
ctx: &mut Context<Fr>,
gate: &GateChip<Fr>,
p: EcPoint<Fr, ProperCrtUint<Fr>>
)->AssignedValue<Fr>
```
Similarly, given many proofs, the verifier needs to independently compute the "batching randomness", which will correspond to $(\xi, \zeta)$ in the above.
```
pub fn compute_final_batching_randomness(
builder: &mut GateThreadBuilder<Fr>,
gate: &GateChip<Fr>,
vec_stage_randomness: Vec<Vec<AssignedValue<Fr>>>,
m: usize,
k: usize,
)->[AssignedValue<Fr>; 2]
```
##### Evaluate the compact polynomial
To compute, e.g., $b_0$ (and also to exploit the properties of $g_0$), we need the following function.
```
pub fn evaluate_folded_product_poly(
ctx: &mut Context<Fr>,
gate: &GateChip<Fr>,
z: AssignedValue<Fr>,
stage_randomness: Vec<AssignedValue<Fr>>,
stage_randomness_inv: Vec<AssignedValue<Fr>>,
)->AssignedValue<Fr>
```
Given ``stage_randomness`` = $(w_k, w_{k-1},...,w_1)$ and $z\in \mathbb{F}_r$, compute the following:
$$\prod_{i=1}^k (z^{2^{i-1}} w_i + w_i^{-1})\in \mathbb{F}_r.$$
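A hedged sketch of how the body of this function might look in the same halo2-lib v0.3-style API (again an assumption, not the verbatim implementation; the inverses are assumed to be constrained elsewhere, e.g., in ``compute_stage_randomness_single_proof``):
```
use halo2_base::gates::{GateChip, GateInstructions};
use halo2_base::utils::ScalarField;
use halo2_base::{AssignedValue, Context};

// Accumulate prod_{i=1..k} (z^{2^{i-1}} * w_i + w_i^{-1}). stage_randomness is
// ordered (w_k, ..., w_1), so we iterate in reverse to pair w_1 with z^{2^0}.
fn evaluate_folded_product_poly_sketch<F: ScalarField>(
    ctx: &mut Context<F>,
    gate: &GateChip<F>,
    z: AssignedValue<F>,
    stage_randomness: Vec<AssignedValue<F>>,
    stage_randomness_inv: Vec<AssignedValue<F>>,
) -> AssignedValue<F> {
    let mut z_pow = z; // holds z^{2^{i-1}}
    let mut acc = ctx.load_constant(F::one());
    for (w, w_inv) in stage_randomness.iter().rev().zip(stage_randomness_inv.iter().rev()) {
        let term = gate.mul_add(ctx, z_pow, *w, *w_inv); // z^{2^{i-1}} * w_i + w_i^{-1}
        acc = gate.mul(ctx, acc, term);
        z_pow = gate.mul(ctx, z_pow, z_pow); // square for the next factor
    }
    acc
}
```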
##### Binary counting
The verifier must generate the binary counting pattern given the ``stage_randomness``. The following are the relevant code snippets.
```
pub fn binary_counting<F: ScalarField>(
ctx: &mut Context<F>,
gate: &GateChip<F>,
input: binary_counting_input<F>,
// make_public: &mut Vec<AssignedValue<F>>
)-> Vec<AssignedValue<F>>
```
The input is $(w_1,...,w_k)$ (where each element is of type ``AssignedValue<F>``) and the output is $(w_1^{-1}w_2^{-1}\cdots w_k^{-1},\ w_1w_2^{-1}\cdots w_k^{-1},\ w_1^{-1}w_2\cdots w_k^{-1},\ \ldots,\ w_1w_2\cdots w_k)$ (which is of type ``Vec<AssignedValue<F>>``).
In fact, we use ``binary_counting_reverse`` in our circuit.
##### Evaluation of the proofs
This is straightforward given the constituent parts above and the description of the verification algorithm. The relevant functions are:
```
fn verify_single_ipa_proof(
builder: &mut GateThreadBuilder<Fr>,
single_proof: CompleteSingleIPAProof,
make_public: &mut Vec<AssignedValue<Fr>>,
)
```
and
```
pub fn verify_batch_ipa_proof(
builder: &mut GateThreadBuilder<Fr>,
complete_batch_ipa_proof: CompleteBatchIPAProof,
make_public: &mut Vec<AssignedValue<Fr>>,
)
```
### Specs
Currently, the circuit runs over BN-254, which is notably non-native. (This means that the proving system is designed for $\mathbb F_r$ circuits, but the elliptic curve is defined over $\mathbb F_p$, hence doing elliptic curve operations requires *simulating a different field*. This is expensive.)
Opening proofs for length-256 vectors, batched across 20 proofs:
* 29,074,563 advice cells;
* 3,529,821 lookup advice cells;
* 262 fixed cells;
* size of keys: pk ~12 GB, vk ~3.7 MB;
* a proof verifies in ~10 seconds on a 3.3 GHz x 4 processor with 32 GB of RAM.
Substituting a "native" elliptic curve should *dramatically* improve performance. (This means an elliptic curve defined over $\mathbb{F}_r$, the field in which Halo2 assumes arithmetic circuits are written.) One example of such a curve is [Grumpkin](https://hackmd.io/@aztec-network/ByzgNxBfd#2-Grumpkin---A-curve-on-top-of-BN-254-for-SNARK-efficient-group-operations), developed by Aztec to be in a "curve cycle" with BN-254.
## Next steps
Moving forward, my goal is to write a circuit to verify membership in a Verkle trie, following the specs provided in the above links by the Ethereum Foundation. This will involve variable-length byte-array concatenation checks. In a normal circuit, this would be difficult to achieve with any degree of efficiency: when naively implementing branching, circuits have to compute all branches!
Fortunately, Halo2's challenge API is almost designed for this; it allows you to generate in-circuit randomness and feed that into your circuit. This randomness may be used in conjunction with the Schwartz-Zippel lemma to test polynomial equality. I will most likely use Axiom's RlcChip. (RLC = "random linear combination".)