# Babyjubjub Noir Specification
This document describes a specification for the desired operations over the babyjubjub curve for Noir to support, and how to implement them efficiently.
# Table of Contents
- [Definitions](#Definitions)
- [Point Arithmetic In Baby Jubjub](#Point-Arithmetic-In-Baby-Jubjub)
- [Desired Baby Jubjub Operations](#Desired-Baby-Jubjub-operations)
- [Proof System Assumptions](#Proof-system-assumptions)
- [Variable-Base Multi-Scalar Multiplication](#Variable-base-multi-scalar-multiplication)
- [Estimated Constraint Costs](#Estimated-constraint-costs)
# Definitions
Babyjubjub is a **twisted Edwards curve** defined over the field $\mathbb{F}_p$, where $p=21888242871839275222246405745257275088548364400416034343698204186575808495617$.
Let $E_M$ be the Baby-Jubjub Montgomery elliptic curve over $\mathbb{F}_p$ using the equation:
$$
E_M: v^2 = u^3 + 168698u^2 + u
$$
The order of $E_M$ (i.e. the number of points) is $8r$, where $r=2736030358979909402780800718157159386076813972158567259200215660948447373041$ is prime; i.e. the cofactor of the curve is 8.
See [ERC-2494](https://eips.ethereum.org/EIPS/eip-2494) for a full description of the baby jubjub parameters.
### Twisted Edwards Form
$E_M$ is equivalent to a twisted Edwards curve:
$$
E: x^2 + y^2 = 1 + dx^2y^2
$$
where $d = 9706598848417545097372247223557719406784115219466060233080913168975159366771$.
The map between the two representations is:
$$
(x, y) \rightarrow (u, v) = (\frac{1+y}{1-y},\frac{1+y}{(1-y)x})
$$
and
$$
(u,v) \rightarrow (x,y) = (\frac{u}{v}, \frac{u-1}{u+1})
$$
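As a concrete, out-of-circuit illustration, the following Rust sketch evaluates both maps, with the `num-bigint` crate standing in for $\mathbb{F}_p$ arithmetic. The helper names here are illustrative, not part of any existing library.
```rust
use num_bigint::BigUint;

/// Modular inverse via Fermat's little theorem: a^(p-2) mod p (p is prime).
fn inv(a: &BigUint, p: &BigUint) -> BigUint {
    a.modpow(&(p - &BigUint::from(2u32)), p)
}

/// (x, y) -> (u, v) = ((1+y)/(1-y), (1+y)/((1-y)x))
fn edwards_to_montgomery(x: &BigUint, y: &BigUint, p: &BigUint) -> (BigUint, BigUint) {
    let one_plus_y = (BigUint::from(1u32) + y) % p;
    // compute 1 - y as 1 + p - y to stay non-negative
    let one_minus_y = (BigUint::from(1u32) + p - y) % p;
    let u = &one_plus_y * inv(&one_minus_y, p) % p;
    let v = &one_plus_y * inv(&(&one_minus_y * x % p), p) % p;
    (u, v)
}

/// (u, v) -> (x, y) = (u/v, (u-1)/(u+1))
fn montgomery_to_edwards(u: &BigUint, v: &BigUint, p: &BigUint) -> (BigUint, BigUint) {
    let x = u * inv(v, p) % p;
    let y = (u + p - &BigUint::from(1u32)) % p
        * inv(&((u + &BigUint::from(1u32)) % p), p) % p;
    (x, y)
}
```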
# Point Arithmetic In Baby Jubjub
In Edwards representation, the sum of points $(x_1, y_1) + (x_2, y_2) = (x_3, y_3)$ is given by the formula:
$$
\begin{array}{l}
\lambda = dx_1x_2y_1y_2 \\
x_3 = \frac{(x_1y_2 + y_1x_2)}{1+\lambda} \\
y_3 = \frac{(y_1y_2 - x_1x_2)}{1-\lambda}
\end{array}
$$
The point at infinity is represented as $O = (0, 1)$ (the group identity). The inverse of a point $(x, y)$ is $(-x, y)$.
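The following Rust sketch (again using `num-bigint`, with illustrative names) evaluates the addition formula directly. Because the formula is complete for this curve, the same function handles doublings and additions involving the identity.
```rust
use num_bigint::BigUint;
use num_traits::One;

/// Modular inverse via Fermat's little theorem (p is prime).
fn inv(a: &BigUint, p: &BigUint) -> BigUint {
    a.modpow(&(p - &BigUint::from(2u32)), p)
}

/// Complete twisted Edwards addition: (x1, y1) + (x2, y2) = (x3, y3).
/// Completeness guarantees the denominators are non-zero for curve points.
fn edwards_add(
    x1: &BigUint, y1: &BigUint,
    x2: &BigUint, y2: &BigUint,
    d: &BigUint, p: &BigUint,
) -> (BigUint, BigUint) {
    // lambda = d * x1 * x2 * y1 * y2 mod p
    let lambda = d * x1 % p * x2 % p * y1 % p * y2 % p;
    // x3 = (x1*y2 + y1*x2) / (1 + lambda)
    let x3 = (x1 * y2 + y1 * x2) % p
        * inv(&((BigUint::one() + &lambda) % p), p) % p;
    // y3 = (y1*y2 - x1*x2) / (1 - lambda); add p*p before subtracting
    // so the intermediate value stays non-negative
    let y3 = (y1 * y2 + p * p - x1 * x2) % p
        * inv(&((BigUint::one() + p - &lambda) % p), p) % p;
    (x3, y3)
}
```
Doubling a point is just `edwards_add` with both arguments equal, which is what makes this representation attractive inside a circuit.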
In Montgomery representation, adding points is done via the formula:
$$
\lambda = \frac{y_2 - y_1}{x_2 - x_1} \\
x_3 = \lambda^2 - A - x_1 - x_2 \\
y_3 = \lambda(x_1 - x_3) - y_1
$$
## Evaluating group operations as constraints
The following applies to a width-4 PLONKish representation.
In the twisted Edwards representation, group addition is constrained as follows (`x3`, `y3` are witnesses computed out-of-circuit):
```
let t1 := x1*x2;
let t2 := y1*y2;
let lambda := d*t1*t2;
let t3 := x1*y2;
let t4 := y1*x2 + t3;
assert(x3 * lambda + x3 - t4 == 0);
assert(-y3 * lambda + y3 + t1 - t2 == 0);
```
i.e. group addition costs 7 constraints.
In Montgomery representation, group addition is the following:
```
let t1 := x2 - x1;
assert(t1 * lambda - y2 + y1 == 0);
assert(lambda * lambda - x1 - x2 - x3 - A == 0);
let t2 := x1 - x3;
assert(lambda * t2 - y1 - y3 == 0);
```
i.e. group addition costs 5 constraints.
While Montgomery representation is nominally cheaper, its point addition formula is not *complete*: it does not handle the point at infinity, nor the cases where the two input points are identical or are inverses of one another.
The additional constraints required to handle these edge cases make the Edwards representation the preferred solution.
# Desired Baby Jubjub operations
Any `BabyJubJub` class should have the following methods/operators:
* `+` operator
* `-` operator
* `*` operator
* `multi_scalar_mul` function
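As a sketch only, the interface might take the following shape in Rust trait form; the `Field` placeholder and all signatures are illustrative, not an existing Noir API.
```rust
// placeholder for the circuit field type; illustrative only
pub struct Field(pub [u64; 4]);

pub trait BabyJubJub: Sized {
    fn add(&self, other: &Self) -> Self;                      // `+` operator
    fn sub(&self, other: &Self) -> Self;                      // `-` operator
    fn mul(&self, scalar: &Field) -> Self;                    // `*` operator (scalar mul)
    fn multi_scalar_mul(points: &[Self], scalars: &[Field]) -> Self;
}
```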
# Proof system assumptions
This document assumes that circuits are being written for a cryptographic backend that supports *witness-defined lookup tables* (also referred to as ROM tables).
A witness-defined lookup table represents a fixed-size array of values. Unlike precomputed lookup tables, each table value is defined via circuit constraints.
We define a black-box subprotocol $\mathsf{table\_read}_k(A, b, c)$, where $k \in \mathbb{Z}$, $A \in \mathbb{F}^k$, $b \in \mathbb{F}$, $c \in \mathbb{F}$, that validates that $c = A_b$ and that the integer representation of $b$ is less than $k$.
In Noir, const arrays will compile into ROM tables.
We also assume the existence of a black-box range-check subprotocol `assert_max_bit_size(x)` that efficiently validates that a field element is less than $2^x$.
> It is assumed the cryptographic backend has an efficient method of performing range checks. The algorithms in this document focus on minimising the number of group operations, not range checks.
# Variable-base multi-scalar multiplication
We define a SNARK function
$\mathsf{scalar\_mul}_n(\vec{[P]}, \vec{x}) \rightarrow [R]$, where $\vec{[P]} \in \mathbb{G}^n, \vec{x} \in \mathbb{F}^n, [R] \in \mathbb{G}$. This function validates that $\sum_{i=1}^n x_i[P_i] = [R]$.
The core of the algorithm is a standard Straus multiscalar multiplication algorithm, where lookup tables are used to store small multiples of the input base points.
These lookup tables are then used within a double-and-add iterative algorithm to add points into an accumulator point.
The key difference in the algorithm described in this document is that the accumulator is initialized with an offset generator point $[g]$.
To obtain the final result point $[R]$, the inverse of the offset generator's accumulated contribution, $[g'] = 2^{252}[-g]$, is added to the accumulator.
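To see why the correction is $2^{252}[-g]$: the main loop below performs four doublings per iteration for 63 of its 64 iterations, scaling the initial accumulator value by $2^{252}$. The following integer model (group elements replaced by integers, so point addition becomes integer addition) checks this bookkeeping; it is a sketch of the structure only, not curve code.
```rust
use num_bigint::BigInt;

fn main() {
    let g = BigInt::from(7); // stand-in for the offset generator [g]
    let mut acc = g.clone();
    for round in 0..64 {
        if round != 0 {
            for _ in 0..4 {
                acc = acc * 2; // a doubling, in the integer model
            }
        }
        // window additions omitted: we only track g's contribution
    }
    // after 63 * 4 = 252 doublings, g's contribution is 2^252 * g,
    // which the final correction [g'] = 2^252 * [-g] removes exactly
    let correction = &g * (BigInt::from(1) << 252usize);
    assert_eq!(acc - correction, BigInt::from(0));
}
```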
To describe the multiscalar multiplication algorithm we define the following subalgorithms:
* $\mathsf{compute\_wnaf}(x) \rightarrow (\vec{w}, s)$
* $\mathsf{precompute\_table}([P]) \rightarrow T$
* $\mathsf{ecc\_add}([A], [B]) \rightarrow [C]$
### Notation
The following algorithms describe SNARK circuits, for which there is no canonical pseudocode representation. The following symbols are used to describe witness generation and constraint definition:
* The symbol $\rightarrow$ is used to define the return parameter of a function
* The symbol $\leftarrow$ is used to assign a value to a variable *without* generating constraints
* The symbol $:=$ is used to both assign a value to a variable and generate constraints that validate the correctness of the calculation
## $\mathsf{compute\_wnaf}(x) \rightarrow (\vec{w}, s)$
For a scalar $x \in \mathbb{F}$, $\mathsf{compute\_wnaf}$ computes its "windowed non-adjacent form" representation, described by $\vec{w} \in \mathbb{F}^{64}, s \in \mathbb{F}$.
Each window slice $w_i$ takes an odd value in $\{-15, -13, \ldots, 15\}$. The "skew" factor $s$ is in $\{0, 1\}$.
The following pseudocode converts an input scalar `x` (lifted to the integer `x + p`, as explained below) into an array of 64 4-bit windowed-non-adjacent-form slices `w` and a skew factor `s`.
The rust-pseudocode describes functions that compute witnesses, but do not define constraints.
```rust!
compute_wnaf_slices(x)
{
    // slice the lift x + p rather than x, so that the encoded integer
    // always exceeds p (the borrow check below relies on this)
    let t = x + p;
    w[0] = t & 0xf;
    t = t >> 4;
    // if the first slice is even, the skew s makes it odd
    let s = w[0] & 1 == 0;
    w[0] += s;
    for i in 1..64 {
        let slice = t & 0xf;
        t = t >> 4;
        // an even slice is made odd by borrowing 16 from the slice below
        if (slice & 1 == 0) {
            slice += 1;
            w[i - 1] -= 16;
        }
        w[i] = slice;
    }
    (w, s)
}
```
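A quick out-of-circuit sanity check (an assumed helper, not part of the spec) reconstructs the sliced integer from `(w, s)` via Horner's rule; for a well-formed decomposition it should reproduce the lifted value $x + p$.
```rust
use num_bigint::BigInt;

/// Reconstruct the integer encoded by (w, s): -s + sum_i 16^(i-1) * w_i.
fn wnaf_reconstruct(w: &[i32; 64], s: bool) -> BigInt {
    let mut acc = BigInt::from(0);
    // Horner's rule, most significant slice first
    for i in (0..64).rev() {
        acc = acc * 16 + BigInt::from(w[i]);
    }
    acc - BigInt::from(s as i32)
}
```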
The group order of the baby jubjub curve can be covered by a 251-bit integer. However, we want to support hashing arbitrary elements of $\mathbb{F}_p$, and so this section describes how to convert any field element into WNAF form.
(TODO: describe how to convert into an actual element of the babyjubjub curve order)
One challenge with decomposing an input scalar $x$ into a windowed non-adjacent form $(w, s)$ is that we need to validate that $-s + \sum_{i=1}^{64}16^{i-1}w_i = x + p$ when evaluated *over the integers* (we slice the lift $x + p$ rather than $x$, so that the encoded integer always exceeds $p$). However, our SNARK circuit evaluates expressions modulo $p$.
To resolve this, we evaluate a basic subtraction algorithm that validates the wnaf sum, minus $p$, does not underflow. We define a borrow factor $b \in \{0, 1\}$ and define witnesses $lo, hi$, where $lo = b \cdot 2^{129} - p_{lo} - s + \sum_{i=1}^{32}16^{i-1}w_i$ and $hi = -2b - p_{hi} + \sum_{i=33}^{64}16^{i-33}w_i$ (the borrow of $2^{129}$ in the low limb is repaid as $2b \cdot 2^{128}$ in the high limb).
Here $p_{lo}, p_{hi}$ are the low and high 128 bits of the circuit modulus $p$.
If $(w, s)$ is well-formed, both $lo$ and $hi$ will be less than $2^{129}$. Otherwise, one or both of $lo, hi$ will wrap around the modulus boundary and fail a 129-bit range check.
```rust!
compute_borrow_factor(w, s) {
    let b = 0;
    let sum: int256 = 0;
    // evaluate the low 128-bit limb of the wnaf sum (slices 1 to 32)
    for i in 0..32 {
        sum *= 16;
        sum += w[31 - i];
    }
    sum -= s;
    sum -= p_lo;
    // a borrow is required if the low limb underflows
    if (sum < 0) {
        b = 1;
    }
    b
}
```
$$
\begin{array}{l}
\mathsf{compute\_wnaf}(x) \rightarrow (w, s):\\
\ \ \text{let } w, s \leftarrow \mathsf{compute\_wnaf\_slices}(x) \\
\ \ \text{let } b \leftarrow \mathsf{compute\_borrow\_factor}(w, s) \\
\ \ \text{let } t_0 := \sum_{i=1}^{32}16^{i-1}w_i-s \\
\ \ \text{let } t_1 := \sum_{i=1}^{32}16^{i-1}w_{i+32}\\
\ \ \text{assert } (t_0 + t_1 \cdot 2^{128} == x)\\
\ \ \text{let } l_0 := 2^{129}b + t_0 - p_{lo} \\
\ \ \text{let } l_1 := t_1 - 2b - p_{hi} \\
\ \ \text{assert } \mathsf{bitrange}(l_0,{129}) \\
\ \ \text{assert } \mathsf{bitrange}(l_1,{129}) \\
\ \ \text{assert } \mathsf{bitrange}(b, 1) \\
\ \ \text{for } i \text{ in } [1, \ldots, 64]: \\
\ \ \ \ \text{assert } \mathsf{bitrange}(\frac{w_i + 15}{2}, 4) \\
\ \ \text{end for}
\end{array}
$$
### $\mathsf{precompute\_table}([P]) \rightarrow T$
This algorithm defines how to compute and constrain a lookup table containing small odd multiples of a base point $[P]$: $T = \{-15[P], -13[P], \ldots, 15[P] \}$.
$$
\begin{array}{l}
\mathsf{precompute\_table}([P]) \rightarrow T: \\
\text{let } [D] := \mathsf{ecc\_dbl}([P]) \\
\text{let } [Q] := [P] \\
\text{for } i \text{ in } [0, \ldots, 7]: \\
\ \ \text{if } i \ne 0:\\
\ \ \ \ \text{let } [Q] := [Q] + [D] \\
\ \ \text{end if} \\
\ \ \text{let } T_{x,7-i} := -[Q].x \\
\ \ \text{let } T_{y, 7-i} := [Q].y \\
\ \ \text{let } T_{x,8+i} := [Q].x \\
\ \ \text{let } T_{y, 8+i} := [Q].y \\
\text{end for} \\
\text{return } T
\end{array}
$$
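A note on the table layout above: entry $k$ (for $k \in [0, 15]$) holds the odd multiple $(2k - 15)[P]$, so a wnaf slice $w$ is read from index $(w + 15)/2$. A small illustrative helper:
```rust
/// Map an odd wnaf slice w in [-15, 15] to its table index (w + 15) / 2.
fn wnaf_to_index(w: i32) -> usize {
    debug_assert!(w % 2 != 0 && (-15..=15).contains(&w));
    ((w + 15) / 2) as usize
}

fn main() {
    assert_eq!(wnaf_to_index(-15), 0); // first entry: -15[P]
    assert_eq!(wnaf_to_index(-1), 7);  // entries 0..8 hold the negated multiples
    assert_eq!(wnaf_to_index(1), 8);
    assert_eq!(wnaf_to_index(15), 15); // last entry: 15[P]
}
```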
### $\mathsf{ecc\_add}((x_1, y_1), (x_2, y_2)) \rightarrow (x_3, y_3)$
$$
\begin{array}{l}
\mathsf{ecc\_add}((x_1, y_1), (x_2, y_2)) \rightarrow (x_3, y_3):\\
\ \ \text{let } t_1 := x_1 \cdot x_2 \\
\ \ \text{let } t_2 := y_1 \cdot y_2 \\
\ \ \text{let } \lambda := d \cdot t_1 \cdot t_2 \\
\ \ \text{let } t_3 := x_1 \cdot y_2 \\
\ \ \text{let } t_4 := y_1 \cdot x_2 + t_3 \\
\ \ \text{assert } (x_3 \cdot \lambda + x_3 - t_4 == 0) \\
\ \ \text{assert } (-y_3 \cdot \lambda + y_3 + t_1 - t_2 == 0) \\
\ \ \text{return } \{x_3, y_3 \}
\end{array}
$$
## $\mathsf{scalar\_mul}_n(\vec{[P]}, \vec{x}) \rightarrow [R]$
$$
\begin{array}{l}
\mathsf{scalar\_mul}_n(\vec{[P]}, \vec{x}) \rightarrow [R]: \\
\ \ \text{for } j \text{ in } [1, \ldots, n]: \\
\ \ \ \ \text{let } (\vec{w_j}, s_j) := \mathsf{compute\_wnaf}(x_j)\\
\ \ \ \ \text{let } T_j := \mathsf{precompute\_table}([P_j]) \\
\ \ \text{end for}\\
\ \ \text{let } [acc] := [g] \ \ \text{(offset generator)} \\
\ \ \text{for } i \text{ in } [1, \ldots, 64]:\\
\ \ \ \ \text{if } i \ne 1:\\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [acc]) \\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [acc]) \\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [acc]) \\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [acc]) \\
\ \ \ \ \text{end if}\\
\ \ \ \ \text{for } j \text{ in } [1, \ldots, n]:\\
\ \ \ \ \ \ \text{let } [Q] := \mathsf{table\_read}(T_j, (w_{j, 65-i} + 15)/2) \\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [Q]) \\
\ \ \ \ \text{end for} \\
\ \ \text{end for} \\
\ \ \text{for } j \text{ in } [1, \ldots, n]: \\
\ \ \ \ \text{let } [S_j] := (-s_j \cdot [P_j].x,\ 1 + s_j \cdot ([P_j].y - 1)) \ \ \text{(equals } -[P_j] \text{ if } s_j = 1 \text{, else } (0, 1)\text{)} \\
\ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [S_j]) \\
\ \ \text{end for} \\
\ \ \text{let } [R] := \mathsf{ecc\_add}([acc], [g']) \ \ \text{where } [g'] = 2^{252}[-g] \\
\ \ \text{return } [R]
\end{array}
$$
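As an end-to-end structure check, the following Rust sketch runs the same loop with integers in place of group elements (so "adding a point" becomes integer addition, and the tables hold small integer multiples). It uses a 16-slice toy wnaf over `u64` scalars; all names are illustrative.
```rust
// Toy 16-slice odd-wnaf of a u64 scalar (same recoding as compute_wnaf_slices)
fn wnaf16(x: u64) -> ([i32; 16], bool) {
    let mut w = [0i32; 16];
    let mut t = x;
    w[0] = (t & 0xf) as i32;
    t >>= 4;
    let s = w[0] & 1 == 0; // skew makes the first slice odd
    w[0] += s as i32;
    for i in 1..16 {
        let mut slice = (t & 0xf) as i32;
        t >>= 4;
        if slice & 1 == 0 {
            // make the slice odd by borrowing 16 from the slice below
            slice += 1;
            w[i - 1] -= 16;
        }
        w[i] = slice;
    }
    (w, s)
}

// Straus loop with integers standing in for group elements
fn straus_model(scalars: &[u64], bases: &[i128]) -> i128 {
    let wnafs: Vec<([i32; 16], bool)> = scalars.iter().map(|&x| wnaf16(x)).collect();
    // per-scalar "table": entry k holds the odd multiple (2k - 15) * base
    let tables: Vec<[i128; 16]> = bases
        .iter()
        .map(|&b| core::array::from_fn(|k| (2 * k as i128 - 15) * b))
        .collect();
    let mut acc: i128 = 0; // offset generator omitted in this model
    for i in (0..16).rev() {
        if i != 15 {
            acc *= 16; // four doublings
        }
        for j in 0..scalars.len() {
            let w = wnafs[j].0[i];
            acc += tables[j][((w + 15) / 2) as usize]; // table read + add
        }
    }
    // remove the skew contributions
    for j in 0..scalars.len() {
        if wnafs[j].1 {
            acc -= bases[j];
        }
    }
    acc
}

fn main() {
    let scalars = [123456789u64, 987654321u64];
    let bases = [3i128, 5i128];
    let expected: i128 = scalars.iter().zip(&bases).map(|(&x, &b)| x as i128 * b).sum();
    assert_eq!(straus_model(&scalars, &bases), expected);
}
```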
# Estimated Constraint Costs
To estimate the cost of each of the above functions, the following table of black-box costs is used:
| operation | number of gates |
| --- | --- |
| $x$-bit range check ($x \le 14$) | 0.5 |
| $x$-bit range check ($x > 14$) | $\lceil\frac{\lceil \frac{x}{14}\rceil}{3}\rceil + \lceil\frac{\lceil \frac{x}{14}\rceil}{2}\rceil$|
| init ROM table | 2 |
| read ROM table | 2 |
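For reference, the range-check rows above can be written as a small cost-model helper (illustrative only; gate counts are fractional because of batching):
```rust
/// Gate-count model for an x-bit range check, per the table above.
fn range_check_cost(bits: u32) -> f64 {
    if bits <= 14 {
        0.5
    } else {
        let limbs = (bits + 13) / 14; // ceil(bits / 14) 14-bit limbs
        (limbs as f64 / 3.0).ceil() + (limbs as f64 / 2.0).ceil()
    }
}

fn main() {
    // the 129-bit range checks used by compute_wnaf cost 9 gates each
    assert_eq!(range_check_cost(129), 9.0);
}
```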
### $\mathsf{ecc\_add}$
7 constraints
If the two input points are identical (i.e. a doubling), this should cost 6 constraints.
### $\mathsf{precompute\_table}$
Computing $-15[P], -13[P], \ldots, 15[P]$ requires one point doubling, 7 point additions and 8 point negations; rounding the doubling up to the cost of an addition gives $8 \cdot 7 + 8 = 64$ constraints.
In addition, a ROM table must be defined and initialized. This costs 2 constraints per table entry. There are two ROM tables per point (one for each x/y coordinate) each of size 16, costing 64 constraints.
Total cost: 128 constraints.
### $\mathsf{compute\_wnaf}$
Each wnaf slice $w_i$ is validated via the assertion $\mathsf{bitrange}(\frac{w_i + 15}{2}, 4)$, which costs one arithmetic gate plus one 4-bit range check, i.e. 1.5 gates per slice. Across 64 slices this costs 96 gates.
The skew factor must satisfy a 1-bit range check, i.e. 0.5 gates.
Computing $t_0$ requires a linear sum over 33 witnesses (32 slices plus the skew), which costs 9 constraints.
Similarly, $t_1$ is a linear sum over 32 witnesses, which costs 8 constraints.
Computing $l_0, l_1$ costs 2 constraints. Their 129-bit range checks cost 9 constraints each, for 18 constraints total.
| operation | cost |
| --- | --- |
| $w_i$ range checks + algebra | 96 |
| $s$ range check | 0.5 |
| $t_0, t_1$ | 17 |
| $l_0, l_1$ | 20 |
| assert $t_0 + 2^{128}t_1 = x$ | 1 |

Total number of constraints = 134.5
### Multiscalar Mul
For $n$ points, the cost can be modelled as follows:
Each point requires one call to $\mathsf{compute\_wnaf}$ and one call to $\mathsf{precompute\_table}$, for $134.5 + 128 = 262.5$ constraints (rounded up to 263).
In addition, for each point the MSM algorithm requires 64 point additions, as well as 128 ROM reads (reading the x/y coordinates at each iteration). These operations cost $(64 \cdot 7 + 128 \cdot 2) = 704$ constraints.
Finally, irrespective of $n$, there are 252 point doublings, which each cost 6 constraints, for 1,512 constraints.
Total cost: $967n + 1,512$ (the per-point skew corrections and the final offset-generator addition add a small further overhead, ignored here).
### TODO:
When computing witnesses in unconstrained functions, use the batch-inversion technique to reduce the number of field inversions required (field inversions are very expensive).