Modexp

Previously, we did multiplication and modular reduction in first step. See here for description.

At a high level, let's say we have non-native inputs

\begin{aligned} a & = a_{0} + 2^{B} a_{1} + 2^{2 B} a_{2} + 2^{3 B} a_{3}, \\ b & = b_{0} + 2^{B} b_{1} + 2^{2 B} b_{2} + 2^{3 B} b_{3} . \end{aligned}

We can also consider them as polynomials $a (X)$ and $b (X)$:

\begin{aligned} a (X) & = a_{0} + X a_{1} + X^{2} a_{2} + X^{3} a_{3}, \\ b (X) & = b_{0} + X b_{1} + X^{2} b_{2} + X^{3} b_{3} . \end{aligned}

For integer multiplication, we want to compute $c$ (alternatively $c (X)$) such that

c (X) = c_{0} + X c_{1} + X^{2} c_{2} + X^{3} c_{3} + X^{4} c_{4} + X^{5} c_{5} + X^{6} c_{6},

where $c_{k} = \sum_{i + j = k} a_{i} b_{j}$ with $i, j \geq 0$. Instead of computing $c_{i}$ in-circuit, we provide the answer from a hint and then check for $r = 1, \dots, 7$ that $a (r) * b (r) = c (r)$. This is very cheap in R1CS because multiplication by a constant and addition are free. But we still have 7 equality checks.
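The evaluation check can be sketched in Python as follows. This is a minimal illustration, not the gnark implementation: the limb width `B = 16`, the sample operands and the helper names (`to_limbs`, `product_limbs`, `evaluate`, `check_product`) are assumptions made for the example, and the arithmetic is done over plain integers instead of the native field. Since $a (X) b (X)$ and $c (X)$ both have degree at most 6, agreement at 7 distinct points implies that the polynomials are equal.

```python
B = 16  # limb width in bits (illustrative choice)
N = 4   # limbs per operand


def to_limbs(x, n=N, b=B):
    """Split a non-negative integer into n little-endian limbs of b bits."""
    return [(x >> (b * i)) & ((1 << b) - 1) for i in range(n)]


def product_limbs(a, b):
    """Hint side: c_k = sum of a_i * b_j over i + j = k (limbs are not reduced)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def evaluate(limbs, x):
    """Evaluate limbs[0] + limbs[1] * x + ... with Horner's rule."""
    acc = 0
    for limb in reversed(limbs):
        acc = acc * x + limb
    return acc


def check_product(a, b, c):
    """Check a(r) * b(r) == c(r) at the points r = 1, ..., 7."""
    return all(evaluate(a, r) * evaluate(b, r) == evaluate(c, r)
               for r in range(1, 8))


a_int, b_int = 0x0123456789ABCDEF, 0xFEDCBA9876543210
a_limbs, b_limbs = to_limbs(a_int), to_limbs(b_int)
c_limbs = product_limbs(a_limbs, b_limbs)
ok = check_product(a_limbs, b_limbs, c_limbs)
```

Evaluating `c_limbs` at `2**B` recovers the integer product, and changing any single limb of `c_limbs` makes `check_product` fail.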

This is integer comparison!

Secondly, for modular reduction we have to find $r, k$ such that $c = r + k * p$. We compute $t = k * p$ similarly, using the multiplication algorithm. Then we are left to show that

c - r = t .

To compute $d = c - r$, we can perform the subtraction limb by limb. However, if we do this, a limb may underflow when $r_{i} > c_{i}$ for some $i$. Because we work in the native field, the result then wraps around and no longer represents the integer value of the limb. To avoid this, we add some padding $s$ to $c$. The padding is constructed so that the high bits of every limb are set and the padding is a multiple of $p$. So the actual check is (we reuse the notation $k p$ to account for the padding)

c + s - r = t .

Denote

d = c + s - r .
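To make the role of the padding concrete, here is a small Python sketch. The construction in `make_padding` is only one possible way to obtain limbs with a high bit set whose value is a multiple of $p$; it is a hypothetical helper and not necessarily how gnark builds its padding. The limb width, the modulus `p` and the limb values of `c_limbs` and `r_limbs` are arbitrary values chosen for illustration.

```python
B = 16  # limb width in bits (illustrative choice)
N = 4   # limbs of the modulus and of the remainder


def to_limbs(x, n=N, b=B):
    """Split a non-negative integer into n little-endian limbs of b bits."""
    return [(x >> (b * i)) & ((1 << b) - 1) for i in range(n)]


def from_limbs(limbs, b=B):
    """Recompose limbs (possibly overflowing or negative) into an integer."""
    return sum(limb << (b * i) for i, limb in enumerate(limbs))


def make_padding(p, n=N, b=B):
    """Hypothetical construction: every limb is >= 2^b and the value is 0 mod p."""
    assert p < 1 << (n * b)
    high = [1 << b] * n                    # set bit b in every limb
    delta = (-from_limbs(high, b)) % p     # top up to a multiple of p
    return [h + d for h, d in zip(high, to_limbs(delta, n, b))]


def sub_limbs(c, r, s=None):
    """Limb-wise c - r, or c + s - r when a padding s is given."""
    out = list(c)
    for i in range(len(r)):
        out[i] += (s[i] if s is not None else 0) - r[i]
    return out


p = 2**64 - 59                       # illustrative modulus
c_limbs = [3, 5, 0, 0, 0, 0, 1]      # arbitrary limbs; c_0 < r_0 below
r_limbs = [10, 2, 7, 0]

s_limbs = make_padding(p)
naive = sub_limbs(c_limbs, r_limbs)            # contains negative limbs
padded = sub_limbs(c_limbs, r_limbs, s_limbs)  # every limb is non-negative
```

Because the padding is a multiple of $p$, `padded` and `naive` represent the same value modulo $p$, but only `padded` has limbs that are safe to interpret as integers inside the native field.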

Now, we have to check

\begin{aligned} d & = t \\ \sum_{i = 0}^{6} 2^{i B} d_{i} & = \sum_{i = 0}^{6} 2^{i B} t_{i} \end{aligned}

Now, the overflows of the limbs $d_{i}$ and $t_{i}$ may be (and in general are) different. This is because of the padding, potential addition chains, etc. Assume that the bit length of $t_{i}$ is $2 B$ (actually a bit more, but right now the exact value is not important). The bit length of $d_{i}$ is in general more than $2 B$. We can carry the overflows over:

\begin{aligned} d_{0}^{'} & = \mathrm{MASK} (d_{0}, 2 B), \\ e_{0} & = \mathrm{RSH} (d_{0} - d_{0}^{'}, 2 B), \\ & \dots \\ d_{i}^{'} & = \mathrm{MASK} (d_{i} + e_{i - 1}, 2 B), \\ e_{i} & = \mathrm{RSH} (d_{i} + e_{i - 1} - d_{i}^{'}, 2 B) . \end{aligned}

Here, $e_{i}$ are the extra overflows that we carry over to the next limb and $d_{i}^{'}$ are the limbs which have width exactly $2 B$. Now, we can do the limb-by-limb checks

d_{i}^{'} = t_{i}, \forall i = 0, \dots, 6

and additionally check that $e_{6} = 0$.
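The carry chain can be illustrated with a short Python sketch. It is only meant to show the MASK/RSH recurrence: the function names and the sample limbs are made up, and the sketch uses a single `width` parameter both for the spacing of the limbs and for the mask, which is an assumption made so that the recomposed integer stays unchanged.

```python
def mask(x, width):
    """Keep the low `width` bits of x."""
    return x & ((1 << width) - 1)


def rsh(x, width):
    """Shift x right by `width` bits."""
    return x >> width


def carry_over(limbs, width):
    """Return limbs of exactly `width` bits plus the carry out of the last limb."""
    out, carry = [], 0
    for limb in limbs:
        total = limb + carry
        low = mask(total, width)          # d'_i
        carry = rsh(total - low, width)   # e_i
        out.append(low)
    return out, carry


def from_limbs(limbs, width):
    """Recompose limbs spaced `width` bits apart into an integer."""
    return sum(limb << (width * i) for i, limb in enumerate(limbs))


d = [70000, 3, 0]   # first limb overflows 16 bits
t = [4464, 4, 0]    # same integer with every limb below 2^16
d_norm, last_carry = carry_over(d, 16)
```

After the carries are propagated, `d_norm` can be compared with `t` limb by limb, and `last_carry` plays the role of the final overflow that has to be zero.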

In gnark we do RSH using a hint, which requires range checking $e_{i}$.

Proposal

Using the commit API, we can speed up the multiplication check by evaluating $a (r) * b (r) = c (r)$ at a single random $r$. For R1CS there is no difference, as we have replaced multiplication by a constant with multiplication by a variable (which costs a constraint). But for modular reduction we still have to do the right shift (which costs one constraint) and the limb-by-limb check. This is approximately 13 constraints.

But, if we look at polynomials

\begin{aligned} e & = \sum_{i = 0}^{5} e_{i} 2^{i 2 B} \\ e^{'} & = \sum_{i = 1}^{6} e_{i} 2^{i 2 B}, \end{aligned}

then we have $d = d^{'} + 2^{2 B} e - e^{'}$. In polynomial form, after rewriting and reordering,

\begin{aligned} d (X) & = d^{'} (X) + 2^{2 B} e (X) - e^{'} (X) \\ & = d^{'} (X) + (2^{2 B} - X) e (X) . \end{aligned}

We can now also omit the padding, as we return all carries $e_{i}$ from a hint and do not have to worry about underflow anymore.

There is some error in the notation, carried over from Sage.

When we want to combine multiplication and modular reduction, we only need a single check:

\begin{array}{r} a (X) * b (X) = r (X) + k (X) * p (X) + (2^{2 B} - X) e (X) \end{array}
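A minimal Python sketch of this single check follows. It is written under several assumptions that are not taken from the text: limbs are `B = 16` bits wide, `Q` is a stand-in for the native field modulus, the factor in front of $e (X)$ is taken as $2^{B} - X$ (the limb base, so that the example is self-consistent), and `make_hints` and `combined_check` are hypothetical names. The carries $e_{i}$ are computed outside the "circuit" by dividing $a (X) b (X) - r (X) - k (X) p (X)$, which vanishes at $X = 2^{B}$, by $2^{B} - X$; they may be negative.

```python
import random

B = 16          # limb width in bits (illustrative)
N = 4           # limbs per operand
Q = 2**61 - 1   # stand-in for the native field modulus (assumption)


def to_limbs(x, n=N, b=B):
    return [(x >> (b * i)) & ((1 << b) - 1) for i in range(n)]


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def poly_eval(f, x, mod=Q):
    acc = 0
    for coeff in reversed(f):
        acc = (acc * x + coeff) % mod
    return acc


def make_hints(a, b, p):
    """Hint side: limbs of r and k, and the carries e, for a, b < p."""
    k, r = divmod(a * b, p)
    a_l, b_l, p_l, r_l, k_l = (to_limbs(v) for v in (a, b, p, r, k))
    ab, kp = poly_mul(a_l, b_l), poly_mul(k_l, p_l)
    # h(X) = a(X) b(X) - r(X) - k(X) p(X) has a root at X = 2^B.
    h = [ab[i] - kp[i] - (r_l[i] if i < N else 0) for i in range(len(ab))]
    e, carry = [], 0
    for coeff in h[:-1]:
        carry, rem = divmod(coeff + carry, 1 << B)
        assert rem == 0          # exact because h(2^B) = 0
        e.append(carry)
    assert h[-1] + carry == 0    # top coefficient closes the chain
    return r_l, k_l, e


def combined_check(a_l, b_l, p_l, r_l, k_l, e, x):
    """a(x) b(x) == r(x) + k(x) p(x) + (2^B - x) e(x) in the native field."""
    lhs = poly_eval(a_l, x) * poly_eval(b_l, x) % Q
    rhs = (poly_eval(r_l, x) + poly_eval(k_l, x) * poly_eval(p_l, x)
           + ((1 << B) - x) * poly_eval(e, x)) % Q
    return lhs == rhs


p = 2**64 - 59   # illustrative modulus
a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
a_l, b_l, p_l = to_limbs(a), to_limbs(b), to_limbs(p)
r_l, k_l, e = make_hints(a, b, p)
x = random.randrange(Q)   # plays the role of the random challenge
valid = combined_check(a_l, b_l, p_l, r_l, k_l, e, x)
```

In this sketch the identity holds as a polynomial identity over the integers, so the check passes for every `x`; a wrong remainder limb changes the right-hand side and the check fails.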

At first this doesn't seem like much less, but if we defer the checks, then:

we can compute $p (r)$ only once for all checks
we can compute $2^{2 B} - r$ only once for all checks
usually, $r (r)$ is an input to the next multiplication, so we can cache it for when $a (r)$ or $b (r)$ is needed. So, we have to compute neither $a (r)$ nor $b (r)$.

So, we are left with:

$r (r)$ which is 3 muls
$k (r)$ which is 3 muls
$e (r)$ which is 3 muls
$a (r) * b (r)$ is 1 mul
$k (r) * p (r)$ is 1 mul
$(2^{2 B} - r) e (r)$ is 1 mul

Conclusion

This is 12 muls for mul+reduce. Previously it was 13+7, so this is a 40% saving. This doesn't account for the range checks on $e_{i}$ and $r_{i}$, but those are the same in both approaches.

But we do not have to use the padding $s$ anymore! So this method saves constraints for fixed-modulus operations and enables variable-modulus operations.

And more

I have an intuition that maybe we wouldn't have to enforce the polynomial check for every mul+reduce: if we have commitments to $a (X), b (X), r (X), e (X)$, then it may be sufficient to check only a single row. But I haven't figured it out yet. If it were possible, then non-native arithmetic would truly be free.
