# Modexp

Previously, we did multiplication and modular reduction in two separate steps. See [here](https://hackmd.io/@ivokub/SyJRV7ye2) for a description.

At a high level, let's say we have non-native inputs
\begin{align}
a &= a_0 + 2^{B} a_1 + 2^{2B} a_2 + 2^{3B} a_3, \\
b &= b_0 + 2^{B} b_1 + 2^{2B} b_2 + 2^{3B} b_3.
\end{align}
We can also consider them as polynomials $a(X)$ and $b(X)$:
\begin{align}
a(X) &= a_0 + X a_1 + X^2 a_2 + X^3 a_3, \\
b(X) &= b_0 + X b_1 + X^2 b_2 + X^3 b_3.
\end{align}
For integer multiplication, we want to compute $c$ (alternatively $c(X)$) such that
$$
c(X) = c_0 + X c_1 + X^2 c_2 + X^3 c_3 + X^4 c_4 + X^5 c_5 + X^6 c_6,
$$
where $c_k = \sum_{i+j=k} a_i b_j$.

Instead of computing the $c_i$ in-circuit, we obtain them from a hint and then check for $r = 1, \ldots, 7$ that $a(r) \cdot b(r) = c(r)$. This is very cheap in R1CS because multiplication by a constant and addition are free, but we still have 7 equality checks (see the sketch below for an out-of-circuit model of this check).

:::warning
This is an integer comparison!
:::

Secondly, for modular reduction we have to find $r, k$ such that $c = r + k \cdot p$. We compute $t = k \cdot p$ similarly, using the multiplication algorithm. Then we are left to show that
$$
c - r = t.
$$
To compute $d = c - r$, we can perform the subtraction limb by limb. However, if we do this, then we may underflow in a limb when $r_i > c_i$ for some $i$. Because we work in the native field, an underflow wraps around and corrupts the integer value of the limbs. To avoid this, we add some padding to $c$. The padding is constructed so that the high bits of every limb are set and the padding $s$ is a multiple of $p$. So the check actually is (we reuse the notation $t = kp$, adjusting $k$ to account for the padding)
$$
c + s - r = t.
$$
Denote
$$
d = c + s - r.
$$
Now, we have to check
\begin{align}
d &= t \\
\sum_{i=0}^{6} 2^{iB} d_i &= \sum_{i=0}^{6} 2^{iB} t_i.
\end{align}
The overflows of the limbs $d_i$ and $t_i$ may be (and in general are) different, because of the padding, potential addition chains, etc. The bit length of $t_i$ is in general $2B$ (actually a bit more, but the exact value is not important right now), while the bit length of $d_i$ is in general more than $2B$. We can carry the overflows over:
\begin{align}
d'_0 &= \mathrm{MASK}(d_0, 2B), \\
e_0 &= \mathrm{RSH}(d_0 - d'_0, 2B), \\
&\ldots \\
d'_i &= \mathrm{MASK}(d_i + e_{i-1}, 2B), \\
e_i &= \mathrm{RSH}(d_i + e_{i-1} - d'_i, 2B).
\end{align}
Here, the $e_i$ are the extra overflows that we carry over to the next limb and the $d'_i$ are the limbs which have exactly width $2B$. Now we can do the limb-by-limb checks
$$
d'_i = t_i, \quad \forall i = 0, \ldots, 6,
$$
and additionally check that $e_6 = 0$.

:::info
In gnark we do RSH using a hint, which requires range checking the $e_i$.
:::
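To make the multiplication check concrete, here is a minimal out-of-circuit sketch in Go. It is not gnark API and all names and parameters (`nbBits`, `nbLimbs`, the sample inputs) are illustrative assumptions: it computes the product limbs $c_k = \sum_{i+j=k} a_i b_j$ the way a hint would, and then verifies $a(r) \cdot b(r) = c(r)$ over the integers for $r = 1, \ldots, 7$, which suffices because both sides are polynomials of degree at most 6.

```go
package main

import (
	"fmt"
	"math/big"
)

const nbBits = 64 // limb width B (assumption for this sketch)
const nbLimbs = 4 // number of limbs per input

// toLimbs splits x into base-2^B limbs, least significant first.
func toLimbs(x *big.Int) []*big.Int {
	mask := new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), nbBits), big.NewInt(1))
	out := make([]*big.Int, nbLimbs)
	t := new(big.Int).Set(x)
	for i := range out {
		out[i] = new(big.Int).And(t, mask)
		t.Rsh(t, nbBits)
	}
	return out
}

// productLimbs returns c_k = sum_{i+j=k} a_i*b_j; this is what the hint would provide.
func productLimbs(a, b []*big.Int) []*big.Int {
	c := make([]*big.Int, len(a)+len(b)-1)
	for k := range c {
		c[k] = new(big.Int)
	}
	for i := range a {
		for j := range b {
			c[i+j].Add(c[i+j], new(big.Int).Mul(a[i], b[j]))
		}
	}
	return c
}

// eval evaluates the limb polynomial at x using Horner's rule.
func eval(limbs []*big.Int, x *big.Int) *big.Int {
	res := new(big.Int)
	for i := len(limbs) - 1; i >= 0; i-- {
		res.Mul(res, x).Add(res, limbs[i])
	}
	return res
}

func main() {
	a := new(big.Int).Lsh(big.NewInt(123456789), 150)
	b := new(big.Int).Lsh(big.NewInt(987654321), 170)
	al, bl := toLimbs(a), toLimbs(b)
	cl := productLimbs(al, bl) // 7 limbs, i.e. a degree-6 polynomial

	// The circuit would enforce a(r)*b(r) == c(r) for r = 1..7; here we just verify it.
	for r := int64(1); r <= 7; r++ {
		x := big.NewInt(r)
		lhs := new(big.Int).Mul(eval(al, x), eval(bl, x))
		fmt.Printf("r=%d: a(r)*b(r) == c(r)? %v\n", r, lhs.Cmp(eval(cl, x)) == 0)
	}
}
```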
## Proposal

Using the commit API, we can speed up the multiplication check by evaluating $a(r) \cdot b(r) = c(r)$ at a single random $r$. For R1CS this alone makes no difference, as we have replaced multiplications by constants with multiplications by a variable (which cost constraints). But for modular reduction we still have to do the right shift (which costs one constraint) and the limb-by-limb check; this is approximately 13 constraints.

However, if we look at
\begin{align}
e &= \sum_{i=0}^{5} e_i \, 2^{i \cdot 2B}, \\
e' &= \sum_{i=1}^{6} e_{i-1} \, 2^{i \cdot 2B},
\end{align}
then we have $d = d' + 2^{2B} e - e'$. In polynomial form,
\begin{align}
d(X) &= d'(X) + 2^{2B} e(X) - e'(X) \\
&= d'(X) + (2^{2B} - X) \, e(X).
\end{align}
We can now also omit the padding: all carries $e_i$ are returned from a hint (and range checked), so we do not have to worry about underflow anymore.

:::danger
Rewriting, reordering etc. There is some error in notation carrying over from Sage.
:::

When we want to combine multiplication and modular reduction, a single check suffices (see the sketch at the end of this note):
$$
a(X) \cdot b(X) = r(X) + k(X) \cdot p(X) + (2^{2B} - X) \, e(X).
$$
At first this doesn't seem like a lot less, but if we defer the checks, then:

* we can compute $p(r)$ only once for all checks;
* we can compute $2^{2B} - r$ only once for all checks;
* usually the remainder is an input to the next multiplication, so its evaluation $r(r)$ can be cached when computing $a(r)$ or $b(r)$ of the next check; we do not have to compute $a(r)$ or $b(r)$ again.

So we are left with:

* $r(r)$, which is 3 muls;
* $k(r)$, which is 3 muls;
* $e(r)$, which is 3 muls;
* $a(r) \cdot b(r)$, which is 1 mul;
* $k(r) \cdot p(r)$, which is 1 mul;
* $(2^{2B} - r) \, e(r)$, which is 1 mul.

## Conclusion

This is 12 muls for mul+reduce. Previously it was 13+7, so a 40% saving. This doesn't account for the range checks (of the $e_i$ and $r_i$), but those are the same in both approaches. And we do not have to use the padding $s$ anymore! So this method saves constraints for fixed-modulus operations and enables variable-modulus operations.

## And more

I have an intuition that maybe we wouldn't have to enforce the polynomial check for every mul+reduce: if we have commitments to $a(X), b(X), r(X), e(X)$, then it may be sufficient to check only a single row. But I haven't figured it out yet. If it is possible, then non-native arithmetic would truly be free.
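Below is a minimal out-of-circuit sketch of the combined deferred check, to show the moving parts end to end. It is not gnark API and all names (`toLimbs`, `carries`, the sample modulus and inputs, the fixed challenge) are illustrative assumptions. The "hint" part computes the quotient $k$, the remainder $r$ and the carries $e_i$; the "circuit" part verifies the single evaluation identity at a challenge $\chi$ in the native field. For simplicity the carries here are taken relative to the limb radix, so the carry factor is $(2^w - \chi)$ where $w$ is the limb width; in the note's notation this shift is written as $2B$.

```go
package main

import (
	"fmt"
	"math/big"
)

const w = 64      // limb width in bits used for the carries (assumption for this sketch)
const nbLimbs = 4 // limbs per non-native element

// nativeMod stands in for the native (SNARK) field; here the BN254 scalar field.
var nativeMod, _ = new(big.Int).SetString(
	"21888242871839275222246405745257275088548364400416034343698204186575808495617", 10)

// toLimbs splits x into base-2^w limbs, least significant first.
func toLimbs(x *big.Int) []*big.Int {
	mask := new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), w), big.NewInt(1))
	out := make([]*big.Int, nbLimbs)
	t := new(big.Int).Set(x)
	for i := range out {
		out[i] = new(big.Int).And(t, mask)
		t.Rsh(t, w)
	}
	return out
}

// productLimbs returns the schoolbook product limbs c_k = sum_{i+j=k} a_i*b_j.
func productLimbs(a, b []*big.Int) []*big.Int {
	c := make([]*big.Int, len(a)+len(b)-1)
	for k := range c {
		c[k] = new(big.Int)
	}
	for i := range a {
		for j := range b {
			c[i+j].Add(c[i+j], new(big.Int).Mul(a[i], b[j]))
		}
	}
	return c
}

// evalMod evaluates the limb polynomial at x modulo m (Horner).
func evalMod(limbs []*big.Int, x, m *big.Int) *big.Int {
	res := new(big.Int)
	for i := len(limbs) - 1; i >= 0; i-- {
		res.Mul(res, x).Add(res, limbs[i]).Mod(res, m)
	}
	return res
}

// carries returns e_i such that f_i + e_{i-1} = 2^w * e_i, where f_i = (ab)_i - r_i - (kp)_i.
// If the hint values are correct, every intermediate sum is exactly divisible by 2^w.
func carries(abL, rl, kpL []*big.Int) []*big.Int {
	e := make([]*big.Int, len(abL))
	carry := new(big.Int)
	for i := range abL {
		f := new(big.Int).Set(abL[i])
		if i < len(rl) {
			f.Sub(f, rl[i])
		}
		f.Sub(f, kpL[i]).Add(f, carry)
		e[i] = new(big.Int).Rsh(f, w) // exact division by 2^w; carries may be negative
		carry = e[i]
	}
	return e
}

func main() {
	// Non-native modulus p (secp256k1 base field) and reduced inputs a, b.
	p, _ := new(big.Int).SetString("fffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f", 16)
	a := new(big.Int).Mod(new(big.Int).Lsh(big.NewInt(123456789), 200), p)
	b := new(big.Int).Mod(new(big.Int).Lsh(big.NewInt(987654321), 180), p)

	// Hint: quotient, remainder and carries for a*b = r + k*p.
	k, r := new(big.Int).DivMod(new(big.Int).Mul(a, b), p, new(big.Int))
	al, bl, pl, kl, rl := toLimbs(a), toLimbs(b), toLimbs(p), toLimbs(k), toLimbs(r)
	e := carries(productLimbs(al, bl), rl, productLimbs(kl, pl))

	// Circuit side: one multiplication each for a(chi)*b(chi), k(chi)*p(chi) and (2^w - chi)*e(chi);
	// p(chi) and (2^w - chi) would be shared across all deferred checks.
	chi := big.NewInt(0x1234567) // stand-in for the challenge from the commit API
	coef := new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), w), chi)
	lhs := new(big.Int).Mul(evalMod(al, chi, nativeMod), evalMod(bl, chi, nativeMod))
	rhs := new(big.Int).Mul(evalMod(kl, chi, nativeMod), evalMod(pl, chi, nativeMod))
	rhs.Add(rhs, evalMod(rl, chi, nativeMod))
	rhs.Add(rhs, new(big.Int).Mul(coef, evalMod(e, chi, nativeMod)))
	fmt.Println("deferred check passes:", lhs.Mod(lhs, nativeMod).Cmp(rhs.Mod(rhs, nativeMod)) == 0)
}
```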