ECC operation optimizations

Suppose the scalar field of the curve

\[y^2 = x^3 + b\]

is \(F_q\) where \(q\) has \(B\) bits and the baby Jubjub prime is \(p\) which is slightly less 254-bit. Choose \(n\) and \(k\) so that

\(nk >= B\)
\(3n + log_2(2k^2) + \epsilon < 254\)

where \(\epsilon\) is a small constant.

We represent BigInts with \(k\) registers of \(n\) bits each. For register size \(X = 2^n\), define addition and multiplication without carries by:

\(\sum_i a_i X^i +' \sum_i b_i X^i = \sum_i c_i X^i\) with \(c_i = a_i + b_i\)
\(\sum_i a_i X^i *' \sum_i b_i X^i = \sum_i d_i X^i\) with \(d_i = \sum_{j = 0}^i a_j * b_{i - j}\)

Tracking overflows

Define \(n_{z_i} = \lceil log_2(z_i) \rceil\) for a signal \(x\). This function represents the number of bits needed to represent \(x\).

For an array \(z = \sum_i z_i X^i\), define \(n_z = \max_i (n_{z_i})\). Tracking \(n_z\) allows us to ensure that we are never overflowing the maximum theoretical register size of 253 bits, even when representing numbers in overflow representation.

Additionally, define \(k_z\) to be the number of such \(n_z\)-bit registers used to represent \(k\).

Suppose that \(c = a +' b\). Then \(n_c \leq \max(n_a, n_b) + 1\), and \(k_c = \max(k_a, k_b)\).

Now suppose that \(d = a *' b\). Then \(n_d \leq n_a + n_b + \lceil log_2(\min(k_a, k_b)) \rceil\), and \(k_d = k_a + k_b - 1\).

AddUnequal

For input points \((x_1, y_1)\) and \((x_2, y_2)\) with sum \((x_3, y_3)\), we verify

\(x_3 = \lambda^2 - x_2 - x_1\) where \(\lambda = \frac{y_2-y_1}{x_2-x_1}\)
- \((x_1 + x_2 + x_3)(x_2-x_1)^2 - (y_2-y_1)^2 = 0\)
\(\frac{y_3 + y_1}{x_3 - x_1} = \frac{y_3 + y_2}{x_3 - x_2}\), which is equivalent to:
- \(y_1 * x_3 + (y_3 + y_2) * x_1 = y_2 * x_3 + (y_3 + y_1) * x_2\)
\(x_3, y_3\) are 4x64 bit numbers

To verify these statement, we do the following:

Evaluate both sides using \(+'\) and \(*'\) so that no carrying takes place.
We now wish to verify the identity \(\sum_i a_i X^i = \sum_i b_i X^i\) mod \(q\) where \(0 \leq a_i, b_i < C_1X^3\) for a small constant \(C_1\)
~~Find a multiple of \(q\) of the form \(t = \sum_i t_i X^i\) where \(C_1X^3 < t_i < p - C_1 X^3\), where \(p\) is the babyjubjub prime.~~ no longer needed because of negative carries
Verify that there is a bigint \(r = \sum_i r_i X^i\) for which

\[Carry(\sum_i (a_i + t_i - b_i) X^i) = Carry(\sum_i r_i X^i *' \sum_i q_i X^i)\]

Possibly the last two carries can be performed in a mixed fashion.

is it possible to do away with \(t\) by just adapting our carry algorithm to work with negative registers? Then we'd only need to do one carry
Is checking Carry(x) = Carry(y) same as checking Carry(x-y) so we only need to check Carry(x) == 0? Check carry to zero implementation
constrain \(x_3\) and \(y_3\) to be 64 bits per register and also in \([0,p-1]\) by rangechecking that all registers are in \([0,2^{64}-1]\) and if upper registers are all equal to \(2^{64}-1\) then checking lowest register is in \([0, 2^{64}-2^{32} - 2^9 -2^8 - 2^7 - 2^6 -2^4 -1 -1]\). Check range implementation

Double

For input points \((x_1, y_1)\) with double \((x_3, y_3)\), we verify

\(y_3^2 = x_3^3 + b\)
\(\frac{- y_3 - y_1}{x_3 - x_1} = \frac{3 x_1^2}{2y_1}\), which is equivalent to:
- \(-2y_1\cdot (y_3+y_1) = 3x_1^2\cdot(x_3-x_1)\)
\((x_3,y_3) \neq (x_1,-y_1)\)
- only need to do \(x_3 \neq x_1\)
\(x_3, y_3\) are 4x64 bit numbers and in the range [0, p-1]

[ignore] Picking \(n\), \(k\), \(t\)

How do we find \(t\) and pick our register sizes? We'll have to do some tracking. Assume that \(x_1, y_1, x_2, y_2, x_3, y_3, b\) all have \(k\) registers of \(n\) bits.

In the first equation:

\(k_{LHS} \leq 2k-1\)
\(n_{LHS} \leq 2n + \lceil log(k) \rceil\).
\(k_{RHS} \leq 3k-2\)
\(n_{RHS} \leq 3n + \lceil log(k) \rceil + \lceil log(2k-1) \rceil + 1 \leq 3n + 2 \lceil log(k) \rceil + 2\)

In the second equation:

\(k_{LHS} \leq 2k-1\)
\(n_{RHS} \leq 2n + \lceil log(k) \rceil + 2\)
\(k_{LHS} \leq 2k-1\)
\(n_{RHS} \leq 2n + \lceil log(k) \rceil + 2\)

All of the \(n\) values should be less than 254. Simultaneously, \(nk\) must be at least 256. Therefore \(k\) should be at least 4. So we arrive at \(n=64\), \(k=4\).

Now it is the case that we must pick a \(t\) such that all \(t_i\) are large enough to ensure that \(a_i + t_i - b_i \geq 0\) for all \(i\). Since \(a_i - b_i\) can be as low as \(-2^{3n + 2\lceil log(k) \rceil + 2} \geq -2^{192+4+2} = -2^{198}\), we should pick \(t\) such that all values of \(t_i \geq 2^{198}\).

We also have an upper bound on \(t\). Note that \(a_i + t_i - b_i < p\), so we should pick \(t < p - 2^{198}\) as well.

So now we must find an array \(t_0, t_1, \dots, t_9\) (length 10, since our max number of registers is \(3k-2 = 10\)) such that:

\(2^{198} \leq t_i < p - 2^{198}\)
\(q | \sum t_i X^i\) for \(X = 2^{64}\)

Prime trick

Consider \(X = 2^{64}\) and

\[S = \sum_{i=0}^9 a_i X^i\]

Where each of the \(a_i\) is up to \(192+\epsilon\) bits (probably fine to say \(\epsilon \leq 6\)).

We wish to find some \(S'\) that is congruent to \(S\) modulo \(p\) (secp256k1 prime) with few registers.

The most \(p = 2^{256} - k\) trick is:

\[2^{256} * Z \equiv k * Z \mod p\]

Since the secp256k1 prime has \(k = 2^{256} - 2^{32} - \delta\) for some small \(\delta \approx 2^{10}\), we can write the following congruence:

\[S\]
\[\equiv a_9X^9 + a_8X^8 + a_7X^7 + a_6X^6 + a_5X^5 + a_4X^4 + a_3X^3 + a_2X^2 + a_1X + a_0\]
\[\equiv (2^{64}+2^{33}\delta + \delta^2)a_9X + (2^{64}+2^{33}\delta + \delta^2)a_8 + (2^{32}+\delta)a_7X^3 + (2^{32}+\delta)a_6X^2 + (2^{32}+\delta)a_5X + (2^{32}+\delta)a_4 + a_3X^3 + a_2X^2 + a_1X + a_0\]

Since \((2^{32}+\delta)^2 = 2^{64} + 2^{33}\delta + \delta^2\).

Let \(C_i\) be the coefficient of \(X^i\) after terms are accumulated in this last equation. We have:

\[C_3 = (2^{32}+\delta)a_7 + a_3\]
\[C_2 = (2^{32}+\delta)a_6 + a_2\]
\[C_1 = (2^{64}+2^{33}\delta+\delta^2)a_9 + (2^{32}+\delta)a_5 + a_1\]
\[C_0 = (2^{64}+2^{33}\delta+\delta^2)a_8 + (2^{32}+\delta)a_4 + a_0\]

Let's replace \(2^{64}\) with \(X\) in the expressions for \(C_1\) and \(C_0\) and re-accumulate:

\[C_3' = (2^{32}+\delta)a_7 + a_3\]
\[C_2' = (2^{32}+\delta)a_6 + a_2 + a_9\]
\[C_1' = (2^{33}\delta+\delta^2)a_9 + (2^{32}+\delta)a_5 + a_1 + a_8\]
\[C_0' = (2^{33}\delta+\delta^2)a_8 + (2^{32}+\delta)a_4 + a_0\]

We add an additional 44 bits of overflow at most to the registers (the worst coefficient of an \(a_i\) involved in a sum is \(2^{33}\delta\)). Since our \(a_i\) are at most 200 bits approximately, the \(C_i\) do not overflow the babyjubjub prime (253 bits). So now we have a representation \(S' = C_3'X^3 + C_2'X^2 + C_1'X + C_0'\) of a number that is congruent to \(S\), such that all of the \(C_i'\) are at most 244 bits. After subtracting off some multiple of \(p\) we can check that after carries this is equivalent to \(0\) in \(244 * 4 \approx 1000\) constraints.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.