# Current system.
$\DeclareMathOperator{Im}{Im} \DeclareMathOperator{Ker}{Ker} \DeclareMathOperator{eq}{eq} \DeclareMathOperator{Fr}{Fr}$
1. We can do sumcheck of the form $(P \land Q)(x) \eq(x; q)$.
2. Formally, $(P \land Q)(x)$ is defined as $\sum b_i P_i(x) Q_i(x)$, where $P_i, Q_i$ are coordinate polynomials of $P, Q$. $P_i$ and $Q_i$ can be expressed through Frobenius twists of $P$ and $Q$, but it is only needed for verifier on the last stage, so prover does not keep twists of $P$ and $Q$ ever.
2.1. Prover does it in $2$ phases. In 1st phase, taking $c$ rounds, it works roughly normally, but operates over the quadratic polynomial $P \land Q$, computed on the subset $(0, 1, \infty)^{c+1} \times (0, 1)^{n-c-1}$.
On i-th round, $i \leq c$ it keeps the restriction $(P \land Q) (r_0,.., r_{i-1}, x_i, ..., x_{n-1})$.
When it needs to answer the query - i.e. send univariate polynomial $$U(t) = \underset{x_{>i}}\sum (P \land Q)(r_{<i}, t, x_{>i}) \eq(r_{<i}, t, x_{>i}; q)$$ it exploits the fact that $\eq(r_{<i}, t, x_{>i}; q)$ is split as $$\eq(r_{<i}, t, x_{>i}; q) = \eq(r_{<i}; q_{<i}) \eq(t; q_i) \eq(x_{>i}; q_i)$$
This allows us to not manipulate with polynomials of degree $3$ directly until the last stage - $\eq(r_{<i}; q_{<i}) \eq(t; q_i)$ terms do not depend on $x_{>i}$, and so can be evicted from the sum. I.e. we compute the sum $$W(t) = \underset{x_{>i}}\sum (P \land Q)(r_{<i}, t, x_{>i}) \eq(x_{>i}; q_i)$$
and then adjust it as $U(t) = \eq(r_{<i}; q_{<i}) \eq(t; q_i)W(t)$. This last stage is done in coefficient form, and the summation itself is done in evaluation form - i.e. it is just 3 sums
$$W(t) = \underset{x_{>i}}\sum (P \land Q)(r_{<i}, t, x_{>i}) \eq(x_{>i}; q_i)$$
for $t = 0, 1, \infty$ - which we can compute because we have the evaluations of $(P \land Q) (r_0,.., r_{i-1}, t, x_{i+1}, ..., x_{n-1})$ for $t = 0, 1, \infty$.
2.2 After $c$ phases, prover no longer has required data (as we have computed extension only on $3^{c+1} \times 2^{n-c-1}$ subset), and can not proceed further. This is when we change strategy, and compute $P_i(r_0, ..., r_c, x_{>c})$ and $Q_i(r_0, ..., r_c, x_{>c})$ (using RESTRICT functionality).
2.3 Optimality considerations. First phase requires work roughly proportional to $(3/2)^{c+1} N$. Second phase requires work roughly proportional to $\frac{128}{2^c} N$. By trial and error, I have found that the optimal inflection point is $c = 5$, however, $c = 4, c = 6$ are somewhat close.
3. Normal sumcheck would end with prover's claims about values of $P_i(r), Q_i(r)$. We employ an additional argument to convince verifier in this values. Namely, for any number a, its i-th coordinate $a_i$ satisfies $$a_i = \pi (\gamma_i a)$$ where $$\pi(x) = \underset{0\leq j < 128}\sum \Fr^j(x)$$, and $\gamma_i$ is a dual basis to our standard basis $b_i$ w.r.t. to the bilinear form $$\langle x, y \rangle = \pi(xy) = (xy)_0$$
Therefore, our prover actually communicates values of $(\Fr^j P) (r)$, and verifier locally recovers intended $P_i(r)$-s.
Moreover, as we assume that actually verifier only has access to $P$, we recall that $$(\Fr^j P) (r) = \Fr^j (P(\Fr^{-j} r))$$ and thus it is equivalent to querying polynomial $P$ in an inverse Frobenius orbit of $r$.
## Multiopening reduction
#### Naive approach
In order for our system to be useful, we need to provide an additional argument, which will, for a polynomial $P$, reduce the statements $P(\Fr^i r)$ to a single opening in some different point $r'$. There is a very classic, standard multiopening argument, working as follows:
1. For some points $q_0, ..., q_{k-1}$ (not necessarily Frobenius orbit), prover claims: $$P(q_i) = s_i$$
2. Verifier sends challenge $u$.
3. Now, prover, trivially, has the following folded claim: $$\underset{i}\sum u^i (P(q_i) - s_i) = 0$$
4. Replace with sumchecks and rearrage: $$\underset{i}\sum u^i ((\underset{x} \sum P(x) \eq(x; q_i)) - s_i) = 0$$ $$\underset{x}\sum P(x) (\underset{i}\sum u^i \eq(x; q_i)) = \underset{i} \sum u^i s_i$$
5. Run the sumcheck.
#### Improvement 1
Crucially, for arbitrary set of opening points $q_i$, this argument requires $kN$ from the prover in order to compute $\underset{i}\sum u^i \eq(x; q_i)$. In our case, however, this is unacceptable, as we have $128$ opening points. But they are not random, and $\eq(\Fr^i r; x)$ is actually $= \Fr^i \eq(r, x)$ (Frobenius is a homomorphism, and coefficients of eq are just some algebraic expressions of $r$). Therefore, we can write
$$\underset{i}\sum u^i \eq(x; \Fr^i r) = \underset{i} \sum u^i \Fr^i \eq(x; r)$$
This means, that we can just compute $\eq(x; r)$, and then apply to its values a single $128 \times 128$ boolean matrix $\underset{i}\sum u^i \Fr^i$. This is definitely cheaper than computing $\eq(x; \Fr^i r)$ separately! However, matrix application is still a heavy operation (much heavier than multiplication).
#### Improvement 2 (unimpl, likely unnecessary)
There is an alternative path, which suggests to instead operate with polynomials $$S_i(x) = \eq(x, r) P_i(x)$$
The main feature here is that $$ \eq(x; \Fr^j r) P_i(x) = \Fr^j S_i$$ because $P_i$ has values in $\mathbb{F}_2$, and hence is invariant under Frobenius mapping.
Now, for the reason which will be clear very soon, we will run the sumcheck in a *reverse* order of variables.
**Claim.** On $j$-th round of sumcheck, with previous challenges denoted as $r'_{n-1}, r'_{n-2}, ..., r'_{n-j-1}$ it is enough to compute sums $$\underset{x_{<n-j-1}}\sum S_i(x_{<n-j-1}, t, r'_{>{n-j-1}}) = \eq(x_{<n-j-1}, t, r'_{>{n-j-1}}; r) P_i(x_{<n-j-1}, t, r'_{>{n-j-1}})$$
The sum over $x$-s can then be compressed into $$\eq(t, r_{n-j-1}) P_i(r_{<n-j-1}, t, r'_{>{n-j-1}})$$
This is very efficient, because we *already know* polynomials $P_i(r_{<n-j-1}, x_{\geq n-j-1})$ from the second phase of our previous algorithm (we currently do not save them, but we can). This allows us to skip first few rounds almost completely - the remaining sum is proportional in size to $2^j$. Which is exactly the reason why we want to run this sumcheck in reverse order of variables!
After skipping first few rounds (and not later than when $n-j-1$ reaches $c$) we can just compute $$\sum u^i \eq(x_{<n-j-1}, r'_{\geq n-j-1}; \Fr^i r)$$ by replacing $\eq(x_{<n-j-1}, r'_{\geq n-j-1}; \Fr^i r) = \eq(x_{<n-j-1}; \Fr^i r_{<n-j-1}) \eq(r'_{\geq n-j-1}; \Fr^i r_{\geq n-j-1})$, and computing $$ \sum (u^i \eq(r'_{\geq n-j-1}; \Fr^i r_{\geq n-j-1})) \eq(x_{<n-j-1}; \Fr^i r_{<n-j-1})$$
Optimization from Improvement 1 can be used here, though it is not strictly necessary, as this is already very efficient.
### How to do XORs.
The system described above is enough to build ANDplonk - a version of PlonK, which supports, together with normal additions and multiplications, a bitwise AND operation. This, however, is not efficient for hashes, the main reason being that additions (XOR operations) will not be free in such arithmetization.
Sumchecks, however, do allow us to do very efficient linear operations.
Assume we are given some linear matrix $M(y; x)$ (with some efficiency assumptions, stated later). Then, application of this matrix to the vector of values is, by definition, a following sum: $$(MP)(y) = \underset{x}\sum M(y; x) P(x)$$
This allows us to reduce claims about $MP$ to claims about $P$, using the sumcheck:
$$(MP)(r) = \underset{x}\sum M(r; x) P(x)$$
Efficiency requirements:
1. Prover must be able to apply matrix: i.e. compute $MP$ from $P$ efficiently.
2. Prover must be able to apply transposed matrix: i.e. compute $M(r; x) = M^t \eq(r; x)$ efficiently.
In particular, let us assume that our linear transforms are slightly vectorized, i.e. matrix actually works on chunks of size $2^k$ and not whole $2^n$. Then, we can even compute the following reduced sumcheck:
$$\underset{x_{\geq k}}\sum M(r_{<k}; x_{< k})P(x, r_{\geq k})$$
which reduces the claim about evaluation of $MP(r)$ to the evaluation of $P(r'_{<k}, r_{\geq k})$.
This is somewhat better for verifier (less rounds), and much better for the prover *if it knows $P(x, r_{\geq k})$ in advance* - which, funnily enough, exactly what happens in our scenario, due to
1. $M$ represeneting linear part of the round of the keccak hash - and hence being invertible.
2. Us knowing $MP(x; r_{\geq k})$ from the previous sumcheck, and hence being able to compute $P(x, r_{\geq k})$ by applying $M^{-1}$.
----
# Hashcaster
We construct a succinct proof system with efficient vectorized boolean operations.
While it can, potentially, be used for a lot of different application, the most obvious one is an efficient non-arithmetic hashes. Considering that the resulting proofs are still quite large, we believe that in many cases it will be embedded into some sort of external SNARK.
This motivates our name choice - we call our system Hashcaster, for its ability to turn non-arithmetic hashes into arithmetic ones (required for its verification in the embedding SNARK).
## Idea
Recently, starting from Binius, there was a lot of attention related to extension fields of $\mathbb{F}_2$. Creating proof systems over such fields is challenging, however, they do, by default, provide us *at least* with a very efficient vectorized XOR operator (linear addition over $\mathbb{F}_2$).
Our main insight is the idea that we can also express AND operator (coordinate-wise multiplication) in appropriate basis through algebraic operations: additions, multiplications and, crucially, Frobenius morphism.
We adapt the sumcheck protocol for Frobenius twists of polynomials.
## Preliminaries
### Representing bitwise AND algebraically
In what follows, we will mainly consider the field $K = \mathbb{F}_2^{128}$. This is done mainly for simplicity, and we believe that our result extends to other extensions. This can be done if the need arises.
Let's consider a quadratic extension, $\mathbb{F}_{2}^{k+1} / \mathbb{F}_{2}^{k}$.
**Definition.** Artin-Schreier mapping $a_k$ is defined by formula $$a_k (x) = x^{2^k} - x$$.
**Lemma.** $$\Ker(a_k) = \mathbb{F}_{2}^{k}$$ $$\Im(a_k) = \mathbb{F}_{2}^k$$
**Proof.** Clearly, for every $x \in \mathbb{F}_{2}^{k}$ it holds that $x^{2^k} - x = 0$ (by generalized Fermat little theorem). Conversely, it no other element satisfies such property, because equation $x^{2^k} - x = 0$ can not have more than $2^k$ roots. This proves that the kernel of $a_k$ is indeed $\mathbb{F}_{2}^{k}$.
Now, the image of the mapping has dimension $2^{k+1} - 2^k = 2^k$, so to prove that it coincides with $\mathbb{F}_2^k$ it is enough to see that every element $s \in \mathbb{F}_2^k$ has a preimage: $x^{2^k} - x = s$. But this is a quadratic polynomial, so it has a root in a quadratic extension of $\mathbb{F}_2^k$, and there is a single quadratic extension for each finite field - in this case, $\mathbb{F}_{2}^{k+1}$.
$\blacksquare$
**Definition.** Now, we compose $a_k$-s through the tower of extensions: $$\pi(x) = a_{1} \circ \cdots \circ a_7 \circ a_{8} (x)$$
**Lemma.** $$\Im(\pi) = \mathbb{F}_2$$
Proof obvious (applying facts that we already know about $a_k$).
We will frequently abuse the notation and treat $\pi$ as $\mathbb{F}_2$-valued linear function on space $\mathbb{F}_2^{128}$.
Now, pick any $\mathbb{F}_2$-basis $r_0, ..., r_{127}$ in $\mathbb{F}_2^{128}$. We define linear functions $\pi_i(x) = \pi(r_i x)$.
Note that these are also linearly independent.
**Definition.** Chosen basis $b_0, ..., b_{127}$ is a dual basis to $\pi_i$-s, i.e. satisfying $\pi_i(b_j) = \delta_{ij}$.
**Lemma.** Actually, any basis can be chosen for appropriate choices of $r$-s, because the mapping $r \mapsto \pi(rx)$ is an isomorphism with dual space.
**Proof.** Assume converse. Then, there is $r \neq 0$ such that linear function $\pi(rx)$ is $0$. However, we know that $\pi(x)$ is nonzero, and substituting $x = y/r$ we recover original $\pi$. Therefore, it is an isomorphism, and so for any basis $b_i$ the corresponding collection $r_i$ can be found.
We will normally choose basis $b$ to correspond with our main computation basis, and find $r_i$-s using this procedure.
This allows us to express bitwise AND (in basis $b_i$) through an algebraic formula:
$$x \land y = \underset{i}\sum b_i \pi_i(x) \pi_i(y)$$
Moreover, this formula is actually quadratic in $x, y$ and their Frobenius twists.
### Linearized formula for $\pi$
We observe, that actually $\pi(x)$ admits a rather simple expression:
$$\pi(x) = \underset{i}\sum \Fr^i(x)$$
i.e. it is a sum over whole Frobenius orbit. One can show that composition of these mappings for a sequence of extensions is once again mapping of this form.
### Frobenius twists of polynomials
Consider a polynomial $P$ (potentially, multivariate), given in coefficient form. Let us denote $P_{>> k}$ the application of $k$-th power of Frobenius mapping to it. Then, it is well known that $$P_{>>k}(x) = \Fr^k(P(\Fr^{-k}(x)))$$ (or, explicitly, $(P(x^{1/p^k}))^{p^k}$).
Note, that this operation also applies Frobenius mapping to the *values* of a polynomial on points defined over $\mathbb{F}_2$. This is extremely important, as sumcheck will operate on a boolean hypercube $\mathbb{B}^n = \{0, 1\}^n$.
In what follows, we will frequently need to compute values of a polynomial in Frobenius orbit. We give an efficient prover for this subroutine:
### Efficently evaluating multilinear polynomial in Frobenius orbit
Suppose we are given a multilinear polynomial $P(x)$ in evaluation form (and we will denote corresponding table of values $P[i]$). We want to compute it in points $$r, r^2, r^{2^2}, ..., \Fr^i(r), ...$$
Using formulas from previous section, we know that $P(\Fr^{-i}(r)) = \Fr^{-i}((P_{>>i})(r))$.
Naively, this would require us to compute all Frobenius shifts of $P$, and evaluate them.
**Naive cost** (counting only asymptotic terms):
1. $N$ multiplications to compute the Lagrange kernel $\eq(r, x)$.
2. $128 N$ Frobeniuses to compute shifts of $P$.
3. $128 N$ multiplications and $128 N$ additions to compute inner products of shifts with Lagrange kernels.
This cost is very significant, and would render protocol completely impractical. We suggest the following improved method:
1. We split each $P$ into standard basis, obtaining 128 tables with values in $\mathbb{F}_2$: $$P(x) = \underset{i < 128}\sum b_i P_i(x)$$
2. We evaluate each $P_i(r)$, and compute their Frobenius orbits, as $P_i(\Fr^j(r)) = \Fr^j(P_i(r))$, due to its values living in $\mathbb{F}_2$.
3. We linearly combine the results back: $P(\Fr^i(r)) = \underset{i < 128}\sum b_i P_i(\Fr^i(r))$.
Then, we can estimate the **improved cost**:
1. $N$ multiplications to obtain $\eq(r, x)$.
2. $128 N$ additions and $128 N$ conditional choices (multiplication by bit) to compute all $P_i(r)$.
3. Constant work to combine the results back.
This makes such evaluation comparable in costs with single evaluations.
### Efficiently convincing the verifier about these evaluations (DEPRECATED, CHECK REAL METHOD ABOVE)
Assume that we are given a polynomial $P$, and are trying to convince verifier that $P(\Fr^i(r)) = c_i$. Polynomial is committed, so we can open it in any a single point. Then, there is a standard sumcheck argument, reducing multiple openings (in points $r_i$) to a single one:
Verifier sends challenge value $\gamma$, and we perform the sumcheck for the following equation:
$$P(x)(\underset{i}\sum \gamma^i \eq(x; r_i)) = \underset{i}\sum \gamma^i c_i$$
In the end of sumcheck, $x$ will be specialized to some single random value $r'$, and verifier will only need to check single opening $P(r')$, and compute some constant amount of values $\eq(r'; r_i)$.
So, this sumcheck is very efficient for the verifier, the question is how to make it efficient for the prover. Naively evaluating $\eq(x; r_i)$ is out of question.
We suggest the following improved method.
1. Split $r$ into $(r_{-} | r_{+})$ of half size.
2. Perform sumcheck $\underset{i}\sum \gamma^i((P(x_{-} | \Fr^i(r_{+})) \eq(x_{-}; \Fr^i (r_{-})))$
Note that we can evaluate the tables $P(x_{-} | \Fr^i(r_{+}))$ efficiently, using the argument from the previous section (for each $x_{-}$).
Tables $\eq(x_{-}; \Fr^i (r_{-}))$ will be evaluated naively, which requires $128 \sqrt{N}$ multiplications.
3. This sumcheck spits out specialization $x_{-} \rightarrow r'_{-}$, and now we are required to compute evaluations $P(r'_{-}, \Fr^i(r_{+}))$.
4. Which is normal naive protocol from before.
This method is efficient provided that $\sqrt{N} >> 128$, which happens in all realistic scenarios.
### Evaluating $P \land Q$ in a point
Important prover subroutine is evaluating a quadratic polynomial $P \land Q = \underset{i}\sum b_i \pi_i(P) \pi_i(Q)$ in a point. Simplest approach is just computing its coefficients (there are $3^n = N^{\log_2(3)}$ terms), and evaluating it directly. There are somewhat better approaches if $N$ is large enough: one can compute $P_i = \pi_i(P)$, $Q_i = \pi_i(Q)$, evaluate them in $r$ (which requires 256 N additions, because these polynomials are over $\mathbb{F}_2$, and compute $\underset{i}\sum b_i P_i(r) Q_i(r)$).
For very small $N$-s this method is worse, because it hides the constant $256$ multiplications to combine the evaluation $P_i(r)$ and $Q_i(r)$ into a single answer.
## Combining it together: ANDCHECK
We would like compute the following sumcheck on a boolean hypercube:
$$\underset{x \in \mathbb{B}^n}\sum (P(x) \land Q(x)) \times \eq(x; q)$$
which is treated as quadratic formula depending on Frobenius twists of $P$ and $Q$
Let's recall that the prover response is a univariate polynomial of degree $3$ defined as:
$$U_i(t) = \underset{x_{>i} \in \mathbb{B}^{n-i-1}}\sum (P \land Q)(r_{<i}, t, x_{>i}) \times \eq(r_{<i},t,x_{>i}; q)$$
Which can be reaaranged as
$$U_i(t) = \eq(r_{<i}, q_{<i}) \eq(t, q_i) \underset{x_{>i} \in \mathbb{B}^{n-i-1}}\sum (P \land Q)(r_{<i}, t, x_{>i}) \times \eq(x_{>i}, q_{>i})$$
which drops the degree in $t$ under summation to 2.
Our main problem now is computing the restriction of $(P \land Q)(r, _)$, for which we have discussed some approaches in the previous section.
We split our protocol in $2$ phases:
**Phase 1:**
This phase consists of $c$ rounds, and we start by computing the table of evaluations of $P$, $Q$ in the extended set of size $3^{c+1} * 2^{n-c-1}$, namely $(0, 1, \infty)^{c+1} \times (0, 1)^{n-c-1}$.
This requires us to compute $3^{c+1}2^{n-c-1} - 2^n$ additions (XORs).
Then, we also evaluate $P \land Q$ on this whole subset, which requires $3^{c+1}2^{n-c-1}$ evaluations of $\land$ (and $2^n$, presumably, were already done).
*Note:* this work will be insignificant compared to what is done next, however, this tradeoff might change if instead of $\land$ we are using a more involved arithmetic circuit with multiple inputs and multiple arithmetic operations.
Next, we are ready to respond to verifier challenges. Prover enters the first round and does the following:
1. Compute evaluations of (P \land Q)(t, x_{>0}), for $t \in \{0, 1, \infty\}$, and add them up multiplied by $\eq(x_{>0}, q_{>0})$. This requires $3 * 2^{n-1}$ multiplications.
2. Obtain challenge $r_0$ from verifier.
3. Restrict our table on $r_0$. This represents the bulk of work, taking $3^{c+1} 2^{n-c-1}$ multiplications.
**Phase 2**
In Phase 2, we explicitly maintain polynomials $P_i = \pi_i(P)$, $Q_i = \pi_i(Q)$. This slowdowns us a lot, so it makes sense to search for some balance. We believe $c = 5$ should provide a reasonable tradeoff.