Previously, we did multiplication and modular reduction in first step. See here for description.
On high level, lets say we non-native have inputs
We can also consider them as polynomials and :
For integer multiplication, we want to compute (alternatively ) such that
where . Instead of computing in-circuit, we provide the answer from the hint and then check for that . This is very cheap in R1CS because multiplication by constant and addition are free. But we still have 7 equality checks.
This is integer comparison!
Secondly, for modular reduction we have to find such that . We compute similarly using multiplication algorithm. Then, we are left to show that
To compute , we can perform the subtraction limb-by-limb. However, if we do this, then we may reach underflow per limb when for some . Because we work in the native field, then it errors the integer value of the limbs. To avoid this we add some padding to . The padding is constructed in a way that the high bits of every limbs are set and the padding is a multiple of . So the check actually is (we reuse the notation to account for padding)
Denote
Now, we have to check
Now, the overflows of every limb and may be (and in general are) different. This is because of the padding, potential addition chains etc. Assuming that bit length of is in general (actually a bit more, but right now the exact value is not important). In general the bit-length of is generally more than . We can carry the overflows over:
Here, are the extra overflows what we carry over to the next limb and are the limbs which have exactly width . Now, we can do limb-by-limb checks
and additionally that = 0.
In gnark we do RSH using a hint, we requires range checking
Using the commit API, we can speed up the multiplication check by evaluating at a random . For R1CS there is no difference as we have replaced multiplication by a constant with a multiplication by a variable (which costs). But for modular reduction we still have to do right shift (which costs one constraint) and limb by limb check. This is approx 13 constraints.
But, if we look at polynomials
then we have . In polynomial form
Rewriting, reordering etc. We can now also omit the padding as we return all carries from a hint and do not have to worry about the underflow anymore.
There is some error in notation carrying over from Sage
When we want to combine multiplication and modular reduction, we can check only a single check:
At first it doesn't seem a lot less, but if we defer the checks, then:
So, we are left with:
This is 12 muls for mul+reduce. Previously 13+7, so 40% saving. This doesn't account for the range checks ( and , but is the same).
But we do not have to use the padding anymore! So this method saves constraints for fixed-modulus operations and enables variable-modulus operations.
I have an intuition that maybe we wouldn't have to enforce the polynomial check for every mul+reduce, but if we have commitments to , then is sufficient if we check only a single row. But I haven't figured out yet. If it would be possible, then non-native arithmetic would truly be free.