Endoscaling - HackMD

# Endoscaling Often in proof systems, it is necessary to multiply a group element by a scalar that depends on a challenge. Since the challenge is random, what matters is only that the scalar retains that randomness; that is, it is acceptable to apply a 1-1 mapping to the scalar if that allows the multiplication to be done more efficiently. The Pasta curves we use for Halo 2 are equipped with an endomorphism that allows such efficient multiplication. By allowing a 1-1 mapping as described above, we can avoid having to "decompose" the input challenge using an algorithm such as [[Pornin2020]](https://eprint.iacr.org/2020/454) that requires lattice basis reduction. ## Definitions - The Lagrange basis polynomial $\ell_i(X)$ is such that $\ell_i(\omega^i) = 1$ and $\ell_i(\omega^j) = 0$ for $i \neq j$. - We consider curves over a base field $\mathbb{F}_p$ with a "cubic endomorphism" $\phi$ defined on $\mathbb{F}_p$-rational points by $\phi((x, y)) = (\zeta_p \cdot x, y)$ for $\zeta_p \in \mathbb{F}_p$. This is equivalent to $\phi(P) = [\zeta_q]P$ for some $\zeta_q \in \mathbb{F}_q$ of multiplicative order $3$. ## Endoscaling for public inputs In the Halo 2 proof system, this technique can optionally be used to commit to an instance column using bits that represent the public input. Each basis polynomial corresponds with a cell in the column. ## Computing an endoscaling commitment Let $N$ be the limit on the number of bits that can be input to endoscaling at once while avoiding collisions. For CM curves that have a cubic endomorphism $\phi$, such as the Pasta curves, this limit can be computed using the script [checksumsets.py in zcash/pasta](https://github.com/zcash/pasta/blob/master/checksumsets.py). Assume that $N$ is even. (For Pasta, $N = 248$.) Let $\text{Endoscale}$ be Algorithm 1 in the [Halo paper](https://eprint.iacr.org/2019/1021.pdf): $$ (\mathbf{r}, G) \mapsto [n(\mathbf{r})] G $$ Given $G_i = \text{Comm}(\ell_i(X))$, we compute an endoscaling instance column commitment by calculating the sum $P = \sum_{i = 0}^{m - 1} \text{Endoscale}(\mathbf{r}_i, G_i)$. ![Screenshot 2024-01-04 at 4.46.49 PM](https://hackmd.io/_uploads/Sk2vglSu6.png) Example: Let $\textbf{r} = 11001101 \in \{0, 1\}^8$, so $\lambda = 8$ then we need $\lambda / 2 = 4$ loops: $\begin{array}{rl} i = 3 &\rightarrow r_7 = 1, S_3 = \phi([2 r_6 - 1] P) = \phi(P)\\ i = 2 &\rightarrow r_5 = 0, S_2 = [2 r_4 - 1] P = - P \\ i = 1 &\rightarrow r_3 = 1, S_1 = \phi([2 r_2 - 1] P) = \phi(P) \\ i = 0 &\rightarrow r_1 = 0, S_0 = [2 r_0 - 1] P = P \end{array}$ :warning: Apart from saving constraints for each bit, we also process the double-and-add using 4 loops instead of 8. ![Screenshot 2024-01-04 at 4.48.02 PM](https://hackmd.io/_uploads/Bkzcxlrdp.png) ### Algorithm 1 (optimized) The input bits to endoscaling are $\mathbf{r}$. Split $\mathbf{r}$ into $m$ chunks $\mathbf{r}_0, \mathbf{r}_1, ..., \mathbf{r}_{m - 1} \in \{0, 1\}^N$. For now assume that all the $\mathbf{r}_i$ are the same length. let $S(i, j) = \begin{cases} [2\mathbf{r}_{i,2j} - 1] G_i,\text{ if } \mathbf{r}_{i,2j+1} = 0, \\ \phi([2\mathbf{r}_{i,2j} - 1] G_i),\text{ otherwise}. \end{cases}$ $P := [2] \sum_{i=0}^{m-1} (G_i + \phi(G_i))$ for $j$ in $0..N/2$: $$ \begin{array}{l} \mathrm{Inner} := S(0, j) \\ \text{for $i$ in $1..m$:} \\ \hspace{2em} \mathrm{Inner} := \mathrm{Inner} \;⸭\; S(i, j) \\ P := (P \;⸭\; \mathrm{Inner}) \;⸭\; P \\ \end{array} $$ Example: suppose we have two chunks $\textbf{r}_0, \textbf{r}_1$, each with bit length of $8$: $$r_{71} r_{61} \dots r_{21} r_{11} r_{01} | r_{70} r_{60} \dots r_{20} r_{10} r_{00}$$ - We first loop over $i = 3, 2, 1, 0$ to compute $S_i$ for each chunk $\textbf{r}_0, \textbf{r}_1$ - In each loop, we compute the accumulation of $S_i$ for all the chunks The above algorithm is equivalent to (using complete addition) $$ \begin{array}{l} P := \mathcal{O} \\ \text{for $i$ in $0..m$:} \\ \hspace{2em} P := [2] (G_i + \phi(G_i)) \\ \hspace{2em} \text{for $j$ in $0..N/2$:} \\ \hspace{4em} P := (P + S(i, j)) + P \\ \end{array} $$ #### Circuit cost We decompose each $\mathbf{r}_i$ chunk into two-bit chunks: $$ \mathbf{r} = c_0 + 4 \cdot c_1 + ... + 4^{N/2 - 1} \cdot c_{N/2 -1} $$ with a running sum $z_j, j \in [0..(N/2)).$ $z_0$ is initialized as $z_0 = \mathbf{r}$. Each subsequent $z_j$ is calculated as: $z_j = (z_{j-1} - c_{j-1}) \cdot 2^{-2}$. The final $z_{N/2} = 0$. Each $c_j$ is further broken down as $c_j = b_{j,0} + 2 \cdot b_{j,1}$. The tuple $(b_0, b_1)$ maps to the endoscaled points: typos: :warning: $\begin{array}{rl} (0, 0) &\rightarrow (G_x, -G_y) \\ (0, 1) &\rightarrow (\zeta \cdot G_x, -G_y) \\ (1, 0) &\rightarrow (G_x, G_y) \\ (1, 1) &\rightarrow (\zeta \cdot G_x, G_y) \end{array}$ corrections: :100: $\begin{array}{rl} (0, 0) &\rightarrow (x_G, -y_G) \\ (0, 1) &\rightarrow (\zeta \cdot x_G, -y_G) \\ (1, 0) &\rightarrow (x_G, y_G) \\ (1, 1) &\rightarrow (\zeta \cdot x_G, y_G) \end{array}$ Based on Algorithm 1, give a string $\textbf{r} = r_1 r_0 \in \{0, 1\}^2$, the above formulas are equivalent to $\begin{array}{rl} 00 &\rightarrow -G = (x_G, -y_G) \\ 10 &\rightarrow \phi(-G) = (\zeta \cdot x_G, -y_G) \\ 01 &\rightarrow G = (x_G, y_G) \\ 11 &\rightarrow \phi(G) = (\zeta \cdot x_G, y_G) \end{array}$ Let $r$ be the number of incomplete additions we're doing per row. For $r = 1$: $$ \begin{array}{|c|c|c|c|c|c|} \hline z & b_0 & b_1 & x_G & y_G & x_a & x_p & \lambda_1 & \lambda_2 & q_{endoscale\_base} & q_{init} & q_{double}& q_{add\_incomplete} & q_{dbl\_and\_add} & q_{final} \\\hline & & & & & x_G & y_G & x_{\phi(G)} = \zeta \cdot x_G & y_G & 0 & 0 & 0 & 1 & 0 & 0 \\\hline & & & & & x_{G + \phi(G)} & y_{G + \phi(G)} & x_{InitAcc} & y_{InitAcc} & 0 & 1 & 1 & 0 & 0 & 0 \\\hline z_{N/2} & b_{0,N/2 - 1} & b_{1,N/2 - 1} & x_G & y_G & x_{InitAcc} & x_{P, N/2 - 1} & \lambda_{1, N/2 - 1} & \lambda_{2, N/2 - 1} & 1 & 0 & 0 & 0 & 1 & 0 \\\hline z_{N/2 - 1} & b_{0,N/2 - 2} & b_{1,N/2 - 2} & x_G & y_G & x_{A, N/2 - 2} & x_{P, N/2 - 2} & \lambda_{1, N/2 - 2} & \lambda_{2, N/2 - 2} & 0 & 0 & 0 & 0 & 1 & 0 \\\hline ... & ... & ... & ... & ... & ... & ... & ... & ... & ... & ... & ... & ... & ... & ... \\\hline z_1 & b_{0, 0} & b_{1, 0} & x_G & y_G & x_{A,0} & x_{P,0} & \lambda_{1,0} & \lambda_{2,0} & 0 & 0 & 0 & 0 & 0 & 1 \\\hline z_0 & & & & & x_{A,final} & & y_{A,final} & & & & & & & \\\hline \end{array} $$ Recall that double-and-add uses a width-four circuit layout, where each row computes $$A_{i+1} := (A_i \;⸭\; P_i) \;⸭\; A_i \equiv R_i \;⸭\; A_i$$. $$ \begin{array}{|c|c|c|c|} \hline x_P & x_A & \lambda_1 & \lambda_2 \\\hline x_{P,0} & x_{A,init} & \lambda_{1,0} & \lambda_{2,0} \\\hline ... & ... & ... & ... \\\hline x_{P,i} & x_{A,i} & \lambda_{1,i} & \lambda_{2,i} \\\hline x_{P,i+1} & x_{A,i+1} & \lambda_{1,i+1}& \lambda_{2,i+1} \\\hline ... & ... & ... & ... \\\hline x_{P,n-1} & x_{A,n-1} & \lambda_{1,n-1} & \lambda_{2,n-1} \\\hline \end{array} $$ For each row $j$ from $N/2$ down to $0$, we check the decomposition and the endoscaling map. $$ \begin{array}{|c|l|} \hline \text{Degree} & \text{Constraint} \\\hline 3 & q_{endoscale\_base} \cdot \BoolCheck(b_0) = 0 \\\hline 3 & q_{endoscale\_base} \cdot \BoolCheck(b_1) = 0 \\\hline 2 & q_{endoscale\_base} \cdot [(z_{j - 1} - z_{j} \cdot 2^{-2}) - (b_0 + 2\cdot b_1)] = 0 \\\hline 3 & q_{endoscale\_base} \cdot b_0 \cdot (y_p - y_G) + (1 - b_0) \cdot (y_p + y_G) = 0 \\\hline 3 & q_{endoscale\_base} \cdot b_1 \cdot (x_p - \zeta \cdot x_G) + (1 - b_1) \cdot (x_p - x_G) = 0 \\\hline \end{array} $$ where $$ y_p = y_a - \lambda_1 \cdot (x_a - x_p) $$ The $q_{dbl\_and\_add}$ selector is also passed to the [double-and-add](../double-and-add.md) helper as $q_{gradient}$, which means that the double-and-add gradient check is activated on each row where $q_{dbl\_and\_add} = 1$: $$ \begin{array}{|c|l|} \hline \text{Degree} & \text{Constraint} \\\hline 3 & q_{dbl\_and\_add} \cdot \left(\lambda_{2,i} \cdot (x_{A,i} - x_{A,i-1}) - y_{A,i} - y_{A,i-1}\right) = 0 \\\hline \end{array} $$ This composite selector $q_{dbl\_and\_add} + q_{final}$ is passed to the [double-and-add](../double-and-add.md) helper as $q_{secant}$ which means that the double-and-add secant check is activated on each row where $q_{dbl\_and\_add} + q_{final} = 1$: $$ \begin{array}{|c|l|} \hline \text{Degree} & \text{Constraint} \\\hline 3 & (q_{dbl\_and\_add} + q_{final}) \cdot \left(\lambda_{2,i}^2 - x_{A,i-1} - x_{R,i} - x_{A,i}\right) = 0 \\\hline \end{array} $$ where $$ \begin{aligned} x_{R,i} &= \lambda_{1,i}^2 - x_{A,i} - x_T, \\ y_{A,i} &= \frac{(\lambda_{1,i} + \lambda_{2,i}) \cdot (x_{A,i} - (\lambda_{1,i}^2 - x_{A,i} - x_T))}{2},\\ y_{A,i-1}^\text{witnessed} &\text{ is witnessed.} \end{aligned} $$ ### Initialization To initialize the double-and-add, we set the accumulator to $InitAcc = [2](\phi(P) + P)$. In the case where $P$ is a fixed base, we copy it in from a fixed column; if $P$ is a variable base, we require it to be provided as a `NonIdentityEccPoint` from the ECC gadget, and copy in the $x$ and $y$ coordinates. (The initial section of the layout has been reproduced here for ease of reference.) $$ \begin{array}{|c|c|c|c|c|c|} \hline x_a & x_p & \lambda_1 & \lambda_2 & q\_init & q\_add\_incomplete & q\_double \\\hline x_P & y_P & x_{\phi(P)} & y_{\phi(P)} = y_P & 0 & 1 & 0 \\\hline x_{P + \phi(P)} & y_{P + \phi(P)} & x_{InitAcc} & y_{InitAcc} & 1 & 0 & 1 \\\hline x_{N/2 - 1} & x_{N/2 - 1,p} & \lambda_{N/2 - 1, 1} & \lambda_{N/2 - 1, 2} & 0 & 0 & 0 \\\hline \end{array} $$ We use the [incomplete addition](./ecc/addition.md#incomplete-addition) helper to perform the first incomplete addition of $\phi(P) + P$. The result is then passed into the doubling helper to get the initial accumulator $[2](\phi(P) + P)$. The $q\_{init}$ selector checks that: $$ \begin{array}{|c|l|} \hline \text{Degree} & \text{Constraint} \\\hline 2 & q\_init \cdot \left(\zeta \cdot x_P - x_{\phi(P)}\right) = 0 \\\hline 3 & q\_init \cdot (y_{InitAcc} - y_{A,N/2 - 1}) = 0 \\\hline \end{array} $$ where $$ y_{A,N/2 - 1} = \frac{(\lambda_{1,N/2 - 1} + \lambda_{2,N/2 - 1}) \cdot (x_{A,N/2 - 1} - (\lambda_{1,N/2 - 1}^2 - x_{InitAcc} - x_T))}{2} $$ #### Finalization In the final section of the double-and-add algorithm, we witness the $y$ coordinate of the accumulator and check that it is consistent with the previous values. (The final section of the layout has been reproduced here for ease of reference.) $$ \begin{array}{|c|c|c|c|c|c|} \hline z & b_0 & b_1 & x_G & y_G & x_a & x_p & \lambda_1 & \lambda_2 & q_{final} \\\hline z_1 & b_{0, 0} & b_{1, 0} & x_G & y_G & x_{A,0} & x_{P,0} & \lambda_{1,0} & \lambda_{2,0} & 1 \\\hline z_0 & & & & & x_{A,final} & & y_{A,final} & & \\\hline \end{array} $$ The $q\_final$ selector checks that: $$ \begin{array}{|c|l|} \hline \text{Degree} & \text{Constraint} \\\hline 2 & q\_final \cdot \left(\lambda_{2,0} \cdot (x_{A,0} - x_{A, final}) - y_{A,0} - y_{A, final}\right) = 0 \\\hline \end{array} $$ ### Algorithm 2 Split $\mathbf{r}$ into $K$-bit chunks $r_{0..=u-1}$. $\mathsf{Acc} := 2(\zeta + 1)$ for $i$ from $N/K - 1$ down to $0$: $\hspace{2em}$ look up $s = \mathsf{endoscale\_scalar}(r_i)$ $\hspace{2em}$ $\mathsf{Acc} := 2^{K/2} \cdot \mathsf{Acc} + s$ **Remark:** Suppose the bit length $K = 8$ as before, we have \begin{equation} \begin{split} s & = \mathsf{endoscale\_scalar}(r_i) \\ & = 2^3 \cdot S_3 + 2^2 \cdot S_2 + 2^1 \cdot S_1 + 2^0 \cdot S_0 \end{split} \end{equation} Hence the final value of $\mathsf{Acc}$ is exactly the result $n(\textbf{r})$ following Algorithm 1. #### Handling partial chunks Suppose that $\mathbf{r}$ is not a multiple of $K$ bits. In that case we will have a partial chunk $r_u$ of length $K' < K$ bits. The unoptimized algorithm for computing the table is: $(a, b) := (0, 0)$ for $i$ from $K/2 − 1$ down to $0$: $\hspace{2em}$ let $(\mathbf{c}_i, \mathbf{d}_i) = \begin{cases} (0, 2\mathbf{r}_{2i} − 1),&\text{if } \mathbf{r}_{2i+1} = 0 \\ (2\mathbf{r}_{2i} − 1, 0),&\text{otherwise} \end{cases}$ $(a, b) := (2a + \mathbf{c}_i, 2b + \mathbf{d}_i)$ Output $[a \cdot \zeta_q + b]\, P$. We want to derive the table output for $K'$ when $\mathbf{r} = r_u$ from the table output for $K$. Pad $r_u$ to $K$ bits on the right (high-order bits) with zeros. So the effect of running the above algorithm for the padding bits will be: $(a, b) := (0, 0)$ for $i$ from $0$ up to $(K-K')/2 − 1$: $\hspace{2em} b := 2b - 1$ (which is equivalent to $(a, b) := (0, 1 - 2^{(K-K')/2})$) for $i$ from $(K-K')/2$ up to $K/2 − 1$: $\hspace{2em}$ let $(\mathbf{c}_i, \mathbf{d}_i) = \begin{cases} (0, 2\mathbf{r}_{2i} − 1),&\text{if } \mathbf{r}_{2i+1} = 0 \\ (2\mathbf{r}_{2i} − 1, 0),&\text{otherwise} \end{cases}$ $\hspace{2em} (a, b) := (2a + \mathbf{c}_i, 2b + \mathbf{d}_i)$ Output $[a \cdot \zeta_q + b]\, P$. So now we need to adjust the result of the table **lookup** to take account that we initialized $(a, b)$ to $(0, 1 - 2^{(K-K')/2})$ instead of $(0, 0)$. The offset for $b$ will get multiplied by $2^{K'/2}$, which means that we need to subtract $(1 - 2^{(K-K')/2}) \cdot 2^{K'/2} = (2^{K'/2} - 2^{K/2})$. #### Circuit costs ##### Initial chunk In the case where the bitstring length is a multiple of $K$, we witness the first full chunk like so: $$ \begin{array}{|c|c|c|c|c|} \hline \texttt{z} & \texttt{acc} & \texttt{endoscalars\_copy} & \texttt{q\_init} & \texttt{q\_lookup} \\\hline z[u] & acc_1 & \texttt{endo}(r_u) & 1 & 1 \\\hline z[u-1] & & & 0 & 0 \\\hline \end{array} $$ with the following constraints: $$ \begin{array}{|c|l|} \hline \text{Degree} & \text{Constraint} \\\hline 2 & q_\text{init} \cdot [(\texttt{init\_acc} \cdot 2^{K / 2} + \texttt{endo}(r_u)) - acc_1] = 0 \\\hline \end{array} $$ where $\texttt{init\_acc} = 2 \cdot (\zeta + 1)$. :warning: As before, $q_{lookup}$ looks up the tuple $(z[u-1] - z[u] * 2^K, \texttt{endo}(r_u)).$ If the first chunk is a $K'$-bit partial chunk, it has been right-padded with $K - K'$ zeros. We constrain it in its own region: $$ \begin{array}{|c|c|c|c|c|} \hline \texttt{z} & \texttt{acc} & \texttt{endoscalars\_copy} & \texttt{q\_partial} & \texttt{q\_lookup} & \texttt{q\_short\_range\_check} \\\hline z[u] & r_u & \texttt{endo}(r_u) & 1 & 1 & 1 \\\hline z[u-1] & acc_1 & 2^{K'/2} & 0 & 0 & 0 \\\hline \end{array} $$ with the following constraints: $$ \begin{array}{|c|l|} \hline \text{Degree} & \text{Constraint} \\\hline 2 & q_\text{partial} \cdot [(z[u-1] - z[u] \cdot 2^K) - r_u] = 0 \\\hline 2 & q_\text{partial} \cdot [(\texttt{init\_acc} \cdot 2^{K' / 2} + \texttt{shifted\_endo}) - acc_1] = 0 \\\hline \end{array} $$ where $\texttt{init\_acc} = 2 \cdot (\zeta + 1),$ and $\texttt{shifted\_endo} = \texttt{endo}(r_u) - (2^{K'/2} - 2^{K/2})$. As before, $q_{lookup}$ looks up the tuple $(z[u-1] - z[u] * 2^K, \texttt{endo}(r_u)).$ Additionally, we do a $\texttt{q\_short\_range\_check}(r_u, K')$ to check that $r_u$ is indeed a $K'$-bit value. (see [Lookup short range check](./decomposition.md#short-range-check).) ##### Steady state After initializing the first chunk, we proceed with the remaining chunks in the steady state: $$ \begin{array}{|c|c|c|c|c|} \hline \texttt{z} & \texttt{acc} & \texttt{endoscalars\_copy} & \texttt{q\_endoscale} & \texttt{q\_lookup} \\ \hline z[i] & acc_{u-i+1} & \texttt{endo}(r_i) & 1 & 1 \\ \hline z[i-1] & acc_{u-i} & \texttt{endo}(r_{i-1}) & 1 & 1 \\ \hline z[i-2] & & & 0 & 0 \\ \hline \end{array} $$ with the following constraints: $$ \begin{array}{|c|l|} \hline \text{Degree} & \text{Constraint} \\\hline 2 & q_\text{endoscale} \cdot [(acc_{u-i+1} \cdot 2^{K / 2} + \texttt{endo}(r_i)) - acc_{u-i}] = 0 \\\hline \end{array} $$ As before, $q_{lookup}$ looks up the tuple $(z[i-1] - z[i] * 2^K, \texttt{endo}(r_i)).$