Modexp

Previously, we did multiplication and modular reduction in first step. See here for description.

On high level, lets say we non-native have inputs

a=a0+2Ba1+22Ba2+23Ba3,b=b0+2Bb1+22Bb2+23Bb3.
We can also consider them as polynomials
a(X)
and
b(X)
:
a(X)=a0+Xa1+X2a2+X3a3,b(X)=b0+Xb1+X2b2+X3b3.

For integer multiplication, we want to compute
c
(alternatively
c(X)
) such that
c(X)=c0+Xc1+X2c2+X3c3+X4c4+X5c5+X6c6,

where
ck=i=0,j=0i+j=kaibj
. Instead of computing
ci
in-circuit, we provide the answer from the hint and then check for
r=1,...,7
that
a(r)b(r)=c(r)
. This is very cheap in R1CS because multiplication by constant and addition are free. But we still have 7 equality checks.

This is integer comparison!

Secondly, for modular reduction we have to find

q,k such that
c=r+kp
. We compute
t=kp
similarly using multiplication algorithm. Then, we are left to show that
cr=t.

To compute

d=cr, we can perform the subtraction limb-by-limb. However, if we do this, then we may reach underflow per limb when
ri>ci
for some
i
. Because we work in the native field, then it errors the integer value of the limbs. To avoid this we add some padding to
c
. The padding is constructed in a way that the high bits of every limbs are set and the padding is a multiple of
p
. So the check actually is (we reuse the notation
kp
to account for padding)
c+sr=t.

Denote
d=c+sr.

Now, we have to check

d=ti=062iBdi=i=062iBti
Now, the overflows of every limb
di
and
ti
may be (and in general are) different. This is because of the padding, potential addition chains etc. Assuming that bit length of
ti
is in general
2B
(actually a bit more, but right now the exact value is not important). In general the bit-length of
di
is generally more than
2B
. We can carry the overflows over:
d0=MASK(d0,2B),e0=RSH(d0d0,2B),di=MASK(di+ei1,2B),ei=RSH(di+ei1di,2B).

Here,
ei
are the extra overflows what we carry over to the next limb and
di
are the limbs which have exactly width
2B
. Now, we can do limb-by-limb checks
di=ti,i=0,...,6

and additionally that
e6
= 0.

In gnark we do RSH using a hint, we requires range checking

ei

Proposal

Using the commit API, we can speed up the multiplication check by evaluating

a(r)b(r)=c(r) at a random
r
. For R1CS there is no difference as we have replaced multiplication by a constant with a multiplication by a variable (which costs). But for modular reduction we still have to do right shift (which costs one constraint) and limb by limb check. This is approx 13 constraints.

But, if we look at polynomials

e=i=05ei2i2Be=i=16ei2i2B,
then we have
d=d+22Bee
. In polynomial form
d(X)=d(X)+22Be(X)e(X)=d(X)+(22BX)e(X).

Rewriting, reordering etc. We can now also omit the padding as we return all carries

ei from a hint and do not have to worry about the underflow anymore.

There is some error in notation carrying over from Sage

When we want to combine multiplication and modular reduction, we can check only a single check:

a(X)b(X)=r(X)+k(X)p(X)+(22BX)e(X)

At first it doesn't seem a lot less, but if we defer the checks, then:

  • we can compute
    p(r)
    only once for all checks
  • we can compute
    22Br
    only once for all checks
  • usually,
    r(r)
    is an input to next multiplication, so we can cache them when computing
    a(r)
    or
    b(r)
    . So, we do not have to compute
    a(r)
    neither
    b(r)
    .

So, we are left with:

  • r(r)
    which is 3 muls
  • k(r)
    which is 3 muls
  • e(r)
    which is 3 muls
  • a(r)b(r)
    is 1 mul
  • k(r)p(r)
    is 1 mul
  • (22Br)e(r)
    is 1 mul

Conclusion

This is 12 muls for mul+reduce. Previously 13+7, so 40% saving. This doesn't account for the range checks (

ei and
ri
, but is the same).

But we do not have to use the padding

s anymore! So this method saves constraints for fixed-modulus operations and enables variable-modulus operations.

And more

I have an intuition that maybe we wouldn't have to enforce the polynomial check for every mul+reduce, but if we have commitments to

a(X),b(X),r(X),e(X), then is sufficient if we check only a single row. But I haven't figured out yet. If it would be possible, then non-native arithmetic would truly be free.