owned this note changed 3 years ago
Linked with GitHub

ECC operation optimizations

Suppose the scalar field of the curve

\[y^2 = x^3 + b\]

is \(F_q\) where \(q\) has \(B\) bits and the baby Jubjub prime is \(p\) which is slightly less 254-bit. Choose \(n\) and \(k\) so that

  • \(nk >= B\)
  • \(3n + log_2(2k^2) + \epsilon < 254\)

where \(\epsilon\) is a small constant.

We represent BigInts with \(k\) registers of \(n\) bits each. For register size \(X = 2^n\), define addition and multiplication without carries by:

  • \(\sum_i a_i X^i +' \sum_i b_i X^i = \sum_i c_i X^i\) with \(c_i = a_i + b_i\)
  • \(\sum_i a_i X^i *' \sum_i b_i X^i = \sum_i d_i X^i\) with \(d_i = \sum_{j = 0}^i a_j * b_{i - j}\)

Tracking overflows

Define \(n_{z_i} = \lceil log_2(z_i) \rceil\) for a signal \(x\). This function represents the number of bits needed to represent \(x\).

For an array \(z = \sum_i z_i X^i\), define \(n_z = \max_i (n_{z_i})\). Tracking \(n_z\) allows us to ensure that we are never overflowing the maximum theoretical register size of 253 bits, even when representing numbers in overflow representation.

Additionally, define \(k_z\) to be the number of such \(n_z\)-bit registers used to represent \(k\).

Suppose that \(c = a +' b\). Then \(n_c \leq \max(n_a, n_b) + 1\), and \(k_c = \max(k_a, k_b)\).

Now suppose that \(d = a *' b\). Then \(n_d \leq n_a + n_b + \lceil log_2(\min(k_a, k_b)) \rceil\), and \(k_d = k_a + k_b - 1\).

AddUnequal

For input points \((x_1, y_1)\) and \((x_2, y_2)\) with sum \((x_3, y_3)\), we verify

  1. \(x_3 = \lambda^2 - x_2 - x_1\) where \(\lambda = \frac{y_2-y_1}{x_2-x_1}\)
    • \((x_1 + x_2 + x_3)(x_2-x_1)^2 - (y_2-y_1)^2 = 0\)
  2. \(\frac{y_3 + y_1}{x_3 - x_1} = \frac{y_3 + y_2}{x_3 - x_2}\), which is equivalent to:
    • \(y_1 * x_3 + (y_3 + y_2) * x_1 = y_2 * x_3 + (y_3 + y_1) * x_2\)
  3. \(x_3, y_3\) are 4x64 bit numbers

To verify these statement, we do the following:

  • Evaluate both sides using \(+'\) and \(*'\) so that no carrying takes place.
  • We now wish to verify the identity \(\sum_i a_i X^i = \sum_i b_i X^i\) mod \(q\) where \(0 \leq a_i, b_i < C_1X^3\) for a small constant \(C_1\)
  • Find a multiple of \(q\) of the form \(t = \sum_i t_i X^i\) where \(C_1X^3 < t_i < p - C_1 X^3\), where \(p\) is the babyjubjub prime. no longer needed because of negative carries
  • Verify that there is a bigint \(r = \sum_i r_i X^i\) for which

\[Carry(\sum_i (a_i + t_i - b_i) X^i) = Carry(\sum_i r_i X^i *' \sum_i q_i X^i)\]

Possibly the last two carries can be performed in a mixed fashion.

  • is it possible to do away with \(t\) by just adapting our carry algorithm to work with negative registers? Then we'd only need to do one carry
  • Is checking Carry(x) = Carry(y) same as checking Carry(x-y) so we only need to check Carry(x) == 0? Check carry to zero implementation
  • constrain \(x_3\) and \(y_3\) to be 64 bits per register and also in \([0,p-1]\) by rangechecking that all registers are in \([0,2^{64}-1]\) and if upper registers are all equal to \(2^{64}-1\) then checking lowest register is in \([0, 2^{64}-2^{32} - 2^9 -2^8 - 2^7 - 2^6 -2^4 -1 -1]\). Check range implementation

Double

For input points \((x_1, y_1)\) with double \((x_3, y_3)\), we verify

  1. \(y_3^2 = x_3^3 + b\)
  2. \(\frac{- y_3 - y_1}{x_3 - x_1} = \frac{3 x_1^2}{2y_1}\), which is equivalent to:
    • \(-2y_1\cdot (y_3+y_1) = 3x_1^2\cdot(x_3-x_1)\)
  3. \((x_3,y_3) \neq (x_1,-y_1)\)
    • only need to do \(x_3 \neq x_1\)
  4. \(x_3, y_3\) are 4x64 bit numbers and in the range [0, p-1]

[ignore] Picking \(n\), \(k\), \(t\)

How do we find \(t\) and pick our register sizes? We'll have to do some tracking. Assume that \(x_1, y_1, x_2, y_2, x_3, y_3, b\) all have \(k\) registers of \(n\) bits.

In the first equation:

  • \(k_{LHS} \leq 2k-1\)
  • \(n_{LHS} \leq 2n + \lceil log(k) \rceil\).
  • \(k_{RHS} \leq 3k-2\)
  • \(n_{RHS} \leq 3n + \lceil log(k) \rceil + \lceil log(2k-1) \rceil + 1 \leq 3n + 2 \lceil log(k) \rceil + 2\)

In the second equation:

  • \(k_{LHS} \leq 2k-1\)
  • \(n_{RHS} \leq 2n + \lceil log(k) \rceil + 2\)
  • \(k_{LHS} \leq 2k-1\)
  • \(n_{RHS} \leq 2n + \lceil log(k) \rceil + 2\)

All of the \(n\) values should be less than 254. Simultaneously, \(nk\) must be at least 256. Therefore \(k\) should be at least 4. So we arrive at \(n=64\), \(k=4\).

Now it is the case that we must pick a \(t\) such that all \(t_i\) are large enough to ensure that \(a_i + t_i - b_i \geq 0\) for all \(i\). Since \(a_i - b_i\) can be as low as \(-2^{3n + 2\lceil log(k) \rceil + 2} \geq -2^{192+4+2} = -2^{198}\), we should pick \(t\) such that all values of \(t_i \geq 2^{198}\).

We also have an upper bound on \(t\). Note that \(a_i + t_i - b_i < p\), so we should pick \(t < p - 2^{198}\) as well.

So now we must find an array \(t_0, t_1, \dots, t_9\) (length 10, since our max number of registers is \(3k-2 = 10\)) such that:

  • \(2^{198} \leq t_i < p - 2^{198}\)
  • \(q | \sum t_i X^i\) for \(X = 2^{64}\)

Prime trick

Consider \(X = 2^{64}\) and

\[S = \sum_{i=0}^9 a_i X^i\]

Where each of the \(a_i\) is up to \(192+\epsilon\) bits (probably fine to say \(\epsilon \leq 6\)).

We wish to find some \(S'\) that is congruent to \(S\) modulo \(p\) (secp256k1 prime) with few registers.

The most \(p = 2^{256} - k\) trick is:

\[2^{256} * Z \equiv k * Z \mod p\]

Since the secp256k1 prime has \(k = 2^{256} - 2^{32} - \delta\) for some small \(\delta \approx 2^{10}\), we can write the following congruence:

\[S\]
\[\equiv a_9X^9 + a_8X^8 + a_7X^7 + a_6X^6 + a_5X^5 + a_4X^4 + a_3X^3 + a_2X^2 + a_1X + a_0\]
\[\equiv (2^{64}+2^{33}\delta + \delta^2)a_9X + (2^{64}+2^{33}\delta + \delta^2)a_8 + (2^{32}+\delta)a_7X^3 + (2^{32}+\delta)a_6X^2 + (2^{32}+\delta)a_5X + (2^{32}+\delta)a_4 + a_3X^3 + a_2X^2 + a_1X + a_0\]

Since \((2^{32}+\delta)^2 = 2^{64} + 2^{33}\delta + \delta^2\).

Let \(C_i\) be the coefficient of \(X^i\) after terms are accumulated in this last equation. We have:

\[C_3 = (2^{32}+\delta)a_7 + a_3\]
\[C_2 = (2^{32}+\delta)a_6 + a_2\]
\[C_1 = (2^{64}+2^{33}\delta+\delta^2)a_9 + (2^{32}+\delta)a_5 + a_1\]
\[C_0 = (2^{64}+2^{33}\delta+\delta^2)a_8 + (2^{32}+\delta)a_4 + a_0\]

Let's replace \(2^{64}\) with \(X\) in the expressions for \(C_1\) and \(C_0\) and re-accumulate:

\[C_3' = (2^{32}+\delta)a_7 + a_3\]
\[C_2' = (2^{32}+\delta)a_6 + a_2 + a_9\]
\[C_1' = (2^{33}\delta+\delta^2)a_9 + (2^{32}+\delta)a_5 + a_1 + a_8\]
\[C_0' = (2^{33}\delta+\delta^2)a_8 + (2^{32}+\delta)a_4 + a_0\]

We add an additional 44 bits of overflow at most to the registers (the worst coefficient of an \(a_i\) involved in a sum is \(2^{33}\delta\)). Since our \(a_i\) are at most 200 bits approximately, the \(C_i\) do not overflow the babyjubjub prime (253 bits). So now we have a representation \(S' = C_3'X^3 + C_2'X^2 + C_1'X + C_0'\) of a number that is congruent to \(S\), such that all of the \(C_i'\) are at most 244 bits. After subtracting off some multiple of \(p\) we can check that after carries this is equivalent to \(0\) in \(244 * 4 \approx 1000\) constraints.

Select a repo