# Babyjubjub Noir Specification
This document describes a specification for the desired operations over the babyjubjub curve for Noir to support, and how to implement them efficiently.
# Table of Contents
- [Definitions](#Definitions)
- [Point Arithmetic In Baby Jubjub](#Point-Arithmetic-In-Baby-Jubjub)
- [Desired Baby Jubjub Operations](#Desired-Baby-Jubjub-operations)
- [Proof System Assumptions](#Proof-system-assumptions)
- [Variable-Base Multi-Scalar Multiplication](#Variable-base-multi-scalar-multiplication)
- [Estimated Constraint Costs](#Estimated-constraint-costs)
# Definitions
Babyjubjub is a **twisted Edwards curve** defined over the field $\mathbb{F}_p$, where $p=21888242871839275222246405745257275088548364400416034343698204186575808495617$.
Let $E_M$ be the Baby-Jubjub Montgomery elliptic curve over $\mathbb{F}_p$ using the equation:
$$
E_M: v^2 = u^3 + 168698u^2 + u
$$
The order of $E_M$ (i.e. the number of points) is $8r$, where $r=2736030358979909402780800718157159386076813972158567259200215660948447373041$ is prime; i.e. the cofactor of the curve is 8.
See [ERC-2494](https://eips.ethereum.org/EIPS/eip-2494) for a full description of the baby jubjub parameters.
### Twisted Edwards Form
$E_M$ is equivalent to a twisted Edwards curve:
$$
E: x^2 + y^2 = 1 + dx^2y^2
$$
where $d = 9706598848417545097372247223557719406784115219466060233080913168975159366771$.
The map between the two representations is:
$$
(x, y) \rightarrow (u, v) = (\frac{1+y}{1-y},\frac{1+y}{(1-y)x})
$$
and
$$
(u,v) \rightarrow (x,y) = (\frac{u}{v}, \frac{u-1}{u+1})
$$
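As a concrete, out-of-circuit illustration, the following Rust sketch evaluates both maps, with the `num-bigint` crate standing in for $\mathbb{F}_p$ arithmetic. The helper names here are illustrative, not part of any existing library.
```rust
use num_bigint::BigUint;

/// Modular inverse via Fermat's little theorem: a^(p-2) mod p (p is prime).
fn inv(a: &BigUint, p: &BigUint) -> BigUint {
    a.modpow(&(p - &BigUint::from(2u32)), p)
}

/// (x, y) -> (u, v) = ((1+y)/(1-y), (1+y)/((1-y)x))
fn edwards_to_montgomery(x: &BigUint, y: &BigUint, p: &BigUint) -> (BigUint, BigUint) {
    let one_plus_y = (BigUint::from(1u32) + y) % p;
    // compute 1 - y as 1 + p - y to stay non-negative
    let one_minus_y = (BigUint::from(1u32) + p - y) % p;
    let u = &one_plus_y * inv(&one_minus_y, p) % p;
    let v = &one_plus_y * inv(&(&one_minus_y * x % p), p) % p;
    (u, v)
}

/// (u, v) -> (x, y) = (u/v, (u-1)/(u+1))
fn montgomery_to_edwards(u: &BigUint, v: &BigUint, p: &BigUint) -> (BigUint, BigUint) {
    let x = u * inv(v, p) % p;
    let y = (u + p - &BigUint::from(1u32)) % p
        * inv(&((u + &BigUint::from(1u32)) % p), p) % p;
    (x, y)
}
```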
# Point Arithmetic In Baby Jubjub
In Edwards representation, the sum of points $(x_1, y_1) + (x_2, y_2) = (x_3, y_3)$ is given by the formula:
$$
\begin{array}{l}
\lambda = dx_1x_2y_1y_2 \\
x_3 = \frac{(x_1y_2 + y_1x_2)}{1+\lambda} \\
y_3 = \frac{(y_1y_2 - x_1x_2)}{1-\lambda}
\end{array}
$$
The point at infinity is represented as $O = (0, 1)$ (the group identity). The inverse of a point $(x, y)$ is $(-x, y)$.
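The following Rust sketch (again using `num-bigint`, with illustrative names) evaluates the addition formula directly. Because the formula is complete for this curve, the same function handles doublings and additions involving the identity.
```rust
use num_bigint::BigUint;
use num_traits::One;

/// Modular inverse via Fermat's little theorem (p is prime).
fn inv(a: &BigUint, p: &BigUint) -> BigUint {
    a.modpow(&(p - &BigUint::from(2u32)), p)
}

/// Complete twisted Edwards addition: (x1, y1) + (x2, y2) = (x3, y3).
/// Completeness guarantees the denominators are non-zero for curve points.
fn edwards_add(
    x1: &BigUint, y1: &BigUint,
    x2: &BigUint, y2: &BigUint,
    d: &BigUint, p: &BigUint,
) -> (BigUint, BigUint) {
    // lambda = d * x1 * x2 * y1 * y2 mod p
    let lambda = d * x1 % p * x2 % p * y1 % p * y2 % p;
    // x3 = (x1*y2 + y1*x2) / (1 + lambda)
    let x3 = (x1 * y2 + y1 * x2) % p
        * inv(&((BigUint::one() + &lambda) % p), p) % p;
    // y3 = (y1*y2 - x1*x2) / (1 - lambda); add p*p before subtracting
    // so the intermediate value stays non-negative
    let y3 = (y1 * y2 + p * p - x1 * x2) % p
        * inv(&((BigUint::one() + p - &lambda) % p), p) % p;
    (x3, y3)
}
```
Doubling a point is just `edwards_add` with both arguments equal, which is what makes this representation attractive inside a circuit.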
In Montgomery representation, adding points is done via the formula:
$$
\lambda = \frac{y_2 - y_1}{x_2 - x_1} \\
x_3 = \lambda^2 - A - x_1 - x_2 \\
y_3 = \lambda(x_1 - x_3) - y_1
$$
## Evaluating group operations as constraints
The following applies to a width-4 PLONKish representation.
In the twisted Edwards representation, group addition is constrained as follows (`x3`, `y3` are witnesses computed out-of-circuit):
```
let t1 := x1*x2;
let t2 := y1*y2;
let lambda := d*t1*t2;
let t3 := x1*y2;
let t4 := y1*x2 + t3;
assert(x3 * lambda + x3 - t4 == 0);
assert(-y3 * lambda + y3 + t1 - t2 == 0);
```
i.e. group addition costs 7 constraints.
In Montgomery representation, group addition is the following:
```
let t1 := x2 - x1;
assert(t1 * lambda - y2 + y1 == 0);
assert(lambda * lambda - x1 - x2 - x3 - A == 0);
let t2 := x1 - x3;
assert(lambda * t2 - y1 - y3 == 0);
```
i.e. group addition costs 5 constraints.
While Montgomery representation is nominally cheaper, its point addition formula is not *complete*: it does not handle the point at infinity, nor the cases where the two input points are identical or are inverses of one another.
The additional constraints required to handle these edge cases make the Edwards representation the preferred solution.
# Desired Baby Jubjub operations
Any `BabyJubJub` class should have the following methods/operators:
* `+` operator
* `-` operator
* `*` operator
* `multi_scalar_mul` function
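As a sketch only, the interface might take the following shape in Rust trait form; the `Field` placeholder and all signatures are illustrative, not an existing Noir API.
```rust
// placeholder for the circuit field type; illustrative only
pub struct Field(pub [u64; 4]);

pub trait BabyJubJub: Sized {
    fn add(&self, other: &Self) -> Self;                      // `+` operator
    fn sub(&self, other: &Self) -> Self;                      // `-` operator
    fn mul(&self, scalar: &Field) -> Self;                    // `*` operator (scalar mul)
    fn multi_scalar_mul(points: &[Self], scalars: &[Field]) -> Self;
}
```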
# Proof system assumptions
This document assumes that circuits are being written for a cryptographic backend that supports *witness-defined lookup tables* (also referred to as ROM tables).
A witness-defined lookup table represents a fixed-size array of values. Unlike precomputed lookup tables, each table value is defined via circuit constraints.
We define a black-box subprotocol $\mathsf{table\_read}_k(A, b, c)$, where $k \in \mathbb{Z}$, $A \in \mathbb{F}^k$, $b \in \mathbb{F}$, $c \in \mathbb{F}$, that validates that $c = A_b$ and that the integer representation of $b$ is less than $k$.
In Noir, const arrays will compile into ROM tables.
We also assume the existence of a black-box range-check subprotocol `assert_max_bit_size(x)` that efficiently validates that a field element is less than $2^x$.
> It is assumed the cryptographic backend has an efficient method of performing range checks. The algorithms in this document focus on minimising the number of group operations, not range checks.
# Variable-base multi-scalar multiplication
We define a SNARK function
$\mathsf{scalar\_mul}_n(\vec{[P]}, \vec{x}) \rightarrow [R]$, where $\vec{[P]} \in \mathbb{G}^n, \vec{x} \in \mathbb{F}^n, [R] \in \mathbb{G}$. This function validates that $\sum_{i=1}^n x_i[P_i] = [R]$.
The core of the algorithm is a standard Straus multiscalar multiplication algorithm, where lookup tables are used to store small multiples of the input base points.
These lookup tables are then used within a double-and-add iterative algorithm to add points into an accumulator point.
The key difference in the algorithm described in this document is that the accumulator is initialized with an offset generator point $[g]$.
To obtain the final result point $[R]$, the inverse of the offset generator's accumulated contribution, $[g'] = 2^{252}[-g]$, is added to the accumulator.
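To see why the correction is $2^{252}[-g]$: the main loop below performs four doublings per iteration for 63 of its 64 iterations, scaling the initial accumulator value by $2^{252}$. The following integer model (group elements replaced by integers, so point addition becomes integer addition) checks this bookkeeping; it is a sketch of the structure only, not curve code.
```rust
use num_bigint::BigInt;

fn main() {
    let g = BigInt::from(7); // stand-in for the offset generator [g]
    let mut acc = g.clone();
    for round in 0..64 {
        if round != 0 {
            for _ in 0..4 {
                acc = acc * 2; // a doubling, in the integer model
            }
        }
        // window additions omitted: we only track g's contribution
    }
    // after 63 * 4 = 252 doublings, g's contribution is 2^252 * g,
    // which the final correction [g'] = 2^252 * [-g] removes exactly
    let correction = &g * (BigInt::from(1) << 252usize);
    assert_eq!(acc - correction, BigInt::from(0));
}
```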
To describe the multiscalar multiplication algorithm we define the following subalgorithms:
* $\mathsf{compute\_wnaf}(x) \rightarrow (\vec{w}, s)$
* $\mathsf{precompute\_table}([P]) \rightarrow T$
* $\mathsf{ecc\_add}([A], [B]) \rightarrow [C]$
### Notation
The following algorithms describe SNARK circuits, for which there is no canonical pseudocode representation. The following symbols are used to describe witness generation and constraint definition:
* The symbol $\rightarrow$ is used to define the return parameter of a function
* The symbol $\leftarrow$ is used to assign a value to a variable *without* generating constraints
* The symbol $:=$ is used to both assign a value to a variable and generate constraints that validate the correctness of the calculation
## $\mathsf{compute\_wnaf}(x) \rightarrow (\vec{w}, s)$
For a scalar $x \in \mathbb{F}$, $\mathsf{compute\_wnaf}$ computes its "windowed non-adjacent form" representation, described by $\vec{w} \in \mathbb{F}^{64}, s \in \mathbb{F}$.
Each window slice $w_i$ takes an odd value in $\{-15, -13, \ldots, 15\}$. The "skew" factor $s$ is in $\{0, 1\}$.
The following pseudocode converts an input scalar `x` (lifted to the integer `x + p`, as explained below) into an array of 64 4-bit windowed-non-adjacent-form slices `w` and a skew factor `s`.
The rust-pseudocode describes functions that compute witnesses, but do not define constraints.
```rust!
compute_wnaf_slices(x)
{
    // slice the lift x + p rather than x, so that the encoded integer
    // always exceeds p (the borrow check below relies on this)
    let t = x + p;
    w[0] = t & 0xf;
    t = t >> 4;
    // if the first slice is even, the skew s makes it odd
    let s = w[0] & 1 == 0;
    w[0] += s;
    for i in 1..64 {
        let slice = t & 0xf;
        t = t >> 4;
        // an even slice is made odd by borrowing 16 from the slice below
        if (slice & 1 == 0) {
            slice += 1;
            w[i - 1] -= 16;
        }
        w[i] = slice;
    }
    (w, s)
}
```
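A quick out-of-circuit sanity check (an assumed helper, not part of the spec) reconstructs the sliced integer from `(w, s)` via Horner's rule; for a well-formed decomposition it should reproduce the lifted value $x + p$.
```rust
use num_bigint::BigInt;

/// Reconstruct the integer encoded by (w, s): -s + sum_i 16^(i-1) * w_i.
fn wnaf_reconstruct(w: &[i32; 64], s: bool) -> BigInt {
    let mut acc = BigInt::from(0);
    // Horner's rule, most significant slice first
    for i in (0..64).rev() {
        acc = acc * 16 + BigInt::from(w[i]);
    }
    acc - BigInt::from(s as i32)
}
```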
The group order of the baby jubjub curve can be covered by a 251-bit integer. However, we want to support hashing arbitrary elements of $\mathbb{F}_p$, and so this section describes how to convert any field element into WNAF form.
(TODO: describe how to convert into an actual element of the babyjubjub curve order)
One challenge with decomposing an input scalar $x$ into a windowed non-adjacent form $(w, s)$ is that we need to validate that $-s + \sum_{i=1}^{64}16^{i-1}w_i = x + p$ when evaluated *over the integers* (we slice the lift $x + p$ rather than $x$, so that the encoded integer always exceeds $p$). However, our SNARK circuit evaluates expressions modulo $p$.
To resolve this, we evaluate a basic subtraction algorithm that validates the wnaf sum, minus $p$, does not underflow. We define a borrow factor $b \in \{0, 1\}$ and define witnesses $lo, hi$, where $lo = b \cdot 2^{129} - p_{lo} - s + \sum_{i=1}^{32}16^{i-1}w_i$ and $hi = -2b - p_{hi} + \sum_{i=33}^{64}16^{i-33}w_i$ (the borrow of $2^{129}$ in the low limb is repaid as $2b \cdot 2^{128}$ in the high limb).
Here $p_{lo}, p_{hi}$ are the low and high 128 bits of the circuit modulus $p$.
If $(w, s)$ is well-formed, both $lo$ and $hi$ will be less than $2^{129}$. Otherwise, one or both of $lo, hi$ will wrap around the modulus boundary and fail a 129-bit range check.
```rust!
compute_borrow_factor(w, s) {
    let b = 0;
    let sum: int256 = 0;
    // evaluate the low 128-bit limb of the wnaf sum (slices 1 to 32)
    for i in 0..32 {
        sum *= 16;
        sum += w[31 - i];
    }
    sum -= s;
    sum -= p_lo;
    // a borrow is required if the low limb underflows
    if (sum < 0) {
        b = 1;
    }
    b
}
```
$$
\begin{array}{l}
\mathsf{compute\_wnaf}(x) \rightarrow (w, s):\\
\ \ \text{let } w, s \leftarrow \mathsf{compute\_wnaf\_slices}(x) \\
\ \ \text{let } b \leftarrow \mathsf{compute\_borrow\_factor}(w, s) \\
\ \ \text{let } t_0 := \sum_{i=1}^{32}16^{i-1}w_i-s \\
\ \ \text{let } t_1 := \sum_{i=1}^{32}16^{i-1}w_{i+32}\\
\ \ \text{assert } (t_0 + t_1 \cdot 2^{128} == x)\\
\ \ \text{let } l_0 := 2^{129}b + t_0 - p_{lo} \\
\ \ \text{let } l_1 := t_1 - 2b - p_{hi} \\
\ \ \text{assert } \mathsf{bitrange}(l_0,{129}) \\
\ \ \text{assert } \mathsf{bitrange}(l_1,{129}) \\
\ \ \text{assert } \mathsf{bitrange}(b, 1) \\
\ \ \text{for } i \text{ in } [1, \ldots, 64]: \\
\ \ \ \ \text{assert } \mathsf{bitrange}(\frac{w_i + 15}{2}, 4) \\
\ \ \text{end for}
\end{array}
$$
### $\mathsf{precompute\_table}([P]) \rightarrow T$
This algorithm defines how to compute and constrain a lookup table containing small odd multiples of a base point $[P]$: $T = \{-15[P], -13[P], \ldots, 15[P] \}$.
$$
\begin{array}{l}
\mathsf{precompute\_table}([P]) \rightarrow T: \\
\text{let } [D] := \mathsf{ecc\_dbl}([P]) \\
\text{let } [Q] := [P] \\
\text{for } i \text{ in } [0, \ldots, 7]: \\
\ \ \text{if } i \ne 0:\\
\ \ \ \ \text{let } [Q] := [Q] + [D] \\
\ \ \text{end if} \\
\ \ \text{let } T_{x,7-i} := -[Q].x \\
\ \ \text{let } T_{y, 7-i} := [Q].y \\
\ \ \text{let } T_{x,8+i} := [Q].x \\
\ \ \text{let } T_{y, 8+i} := [Q].y \\
\text{end for} \\
\text{return } T
\end{array}
$$
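A note on the table layout above: entry $k$ (for $k \in [0, 15]$) holds the odd multiple $(2k - 15)[P]$, so a wnaf slice $w$ is read from index $(w + 15)/2$. A small illustrative helper:
```rust
/// Map an odd wnaf slice w in [-15, 15] to its table index (w + 15) / 2.
fn wnaf_to_index(w: i32) -> usize {
    debug_assert!(w % 2 != 0 && (-15..=15).contains(&w));
    ((w + 15) / 2) as usize
}

fn main() {
    assert_eq!(wnaf_to_index(-15), 0); // first entry: -15[P]
    assert_eq!(wnaf_to_index(-1), 7);  // entries 0..8 hold the negated multiples
    assert_eq!(wnaf_to_index(1), 8);
    assert_eq!(wnaf_to_index(15), 15); // last entry: 15[P]
}
```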
### $\mathsf{ecc\_add}((x_1, y_1), (x_2, y_2)) \rightarrow (x_3, y_3)$
$$
\begin{array}{l}
\mathsf{ecc\_add}((x_1, y_1), (x_2, y_2)) \rightarrow (x_3, y_3):\\
\ \ \text{let } t_1 := x_1 \cdot x_2 \\
\ \ \text{let } t_2 := y_1 \cdot y_2 \\
\ \ \text{let } \lambda := d \cdot t_1 \cdot t_2 \\
\ \ \text{let } t_3 := x_1 \cdot y_2 \\
\ \ \text{let } t_4 := y_1 \cdot x_2 + t_3 \\
\ \ \text{assert } (x_3 \cdot \lambda + x_3 - t_4 == 0) \\
\ \ \text{assert } (-y_3 \cdot \lambda + y_3 + t_1 - t_2 == 0) \\
\ \ \text{return } \{x_3, y_3 \}
\end{array}
$$
## $\mathsf{scalar\_mul}_n(\vec{[P]}, \vec{x}) \rightarrow [R]$
$$
\begin{array}{l}
\mathsf{scalar\_mul}_n(\vec{[P]}, \vec{x}) \rightarrow [R]: \\
\ \ \text{for } j \text{ in } [1, \ldots, n]: \\
\ \ \ \ \text{let } (\vec{w_j}, s_j) := \mathsf{compute\_wnaf}(x_j)\\
\ \ \ \ \text{let } T_j := \mathsf{precompute\_table}([P_j]) \\
\ \ \text{end for}\\
\ \ \text{let } [acc] := [g] \ \ \text{(offset generator)} \\
\ \ \text{for } i \text{ in } [1, \ldots, 64]:\\
\ \ \ \ \text{if } i \ne 1:\\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [acc]) \\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [acc]) \\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [acc]) \\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [acc]) \\
\ \ \ \ \text{end if}\\
\ \ \ \ \text{for } j \text{ in } [1, \ldots, n]:\\
\ \ \ \ \ \ \text{let } [Q] := \mathsf{table\_read}(T_j, (w_{j, 65-i} + 15)/2) \\
\ \ \ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [Q]) \\
\ \ \ \ \text{end for} \\
\ \ \text{end for} \\
\ \ \text{for } j \text{ in } [1, \ldots, n]: \\
\ \ \ \ \text{let } [S_j] := (-s_j \cdot [P_j].x,\ 1 + s_j \cdot ([P_j].y - 1)) \ \ \text{(equals } -[P_j] \text{ if } s_j = 1 \text{, else } (0, 1)\text{)} \\
\ \ \ \ \text{let } [acc] := \mathsf{ecc\_add}([acc], [S_j]) \\
\ \ \text{end for} \\
\ \ \text{let } [R] := \mathsf{ecc\_add}([acc], [g']) \ \ \text{where } [g'] = 2^{252}[-g] \\
\ \ \text{return } [R]
\end{array}
$$
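As an end-to-end structure check, the following Rust sketch runs the same loop with integers in place of group elements (so "adding a point" becomes integer addition, and the tables hold small integer multiples). It uses a 16-slice toy wnaf over `u64` scalars; all names are illustrative.
```rust
// Toy 16-slice odd-wnaf of a u64 scalar (same recoding as compute_wnaf_slices)
fn wnaf16(x: u64) -> ([i32; 16], bool) {
    let mut w = [0i32; 16];
    let mut t = x;
    w[0] = (t & 0xf) as i32;
    t >>= 4;
    let s = w[0] & 1 == 0; // skew makes the first slice odd
    w[0] += s as i32;
    for i in 1..16 {
        let mut slice = (t & 0xf) as i32;
        t >>= 4;
        if slice & 1 == 0 {
            // make the slice odd by borrowing 16 from the slice below
            slice += 1;
            w[i - 1] -= 16;
        }
        w[i] = slice;
    }
    (w, s)
}

// Straus loop with integers standing in for group elements
fn straus_model(scalars: &[u64], bases: &[i128]) -> i128 {
    let wnafs: Vec<([i32; 16], bool)> = scalars.iter().map(|&x| wnaf16(x)).collect();
    // per-scalar "table": entry k holds the odd multiple (2k - 15) * base
    let tables: Vec<[i128; 16]> = bases
        .iter()
        .map(|&b| core::array::from_fn(|k| (2 * k as i128 - 15) * b))
        .collect();
    let mut acc: i128 = 0; // offset generator omitted in this model
    for i in (0..16).rev() {
        if i != 15 {
            acc *= 16; // four doublings
        }
        for j in 0..scalars.len() {
            let w = wnafs[j].0[i];
            acc += tables[j][((w + 15) / 2) as usize]; // table read + add
        }
    }
    // remove the skew contributions
    for j in 0..scalars.len() {
        if wnafs[j].1 {
            acc -= bases[j];
        }
    }
    acc
}

fn main() {
    let scalars = [123456789u64, 987654321u64];
    let bases = [3i128, 5i128];
    let expected: i128 = scalars.iter().zip(&bases).map(|(&x, &b)| x as i128 * b).sum();
    assert_eq!(straus_model(&scalars, &bases), expected);
}
```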
# Estimated Constraint Costs
To estimate the cost of each of the above functions, the following table of black-box costs is used:
| operation | number of gates |
| --- | --- |
| $x$-bit range check ($x \le 14$) | 0.5 |
| $x$-bit range check ($x > 14$) | $\lceil\frac{\lceil \frac{x}{14}\rceil}{3}\rceil + \lceil\frac{\lceil \frac{x}{14}\rceil}{2}\rceil$|
| init ROM table | 2 |
| read ROM table | 2 |
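For reference, the range-check rows above can be written as a small cost-model helper (illustrative only; gate counts are fractional because of batching):
```rust
/// Gate-count model for an x-bit range check, per the table above.
fn range_check_cost(bits: u32) -> f64 {
    if bits <= 14 {
        0.5
    } else {
        let limbs = (bits + 13) / 14; // ceil(bits / 14) 14-bit limbs
        (limbs as f64 / 3.0).ceil() + (limbs as f64 / 2.0).ceil()
    }
}

fn main() {
    // the 129-bit range checks used by compute_wnaf cost 9 gates each
    assert_eq!(range_check_cost(129), 9.0);
}
```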
### $\mathsf{ecc\_add}$
7 constraints
If the two input points are identical (i.e. a doubling), this should cost 6 constraints.
### $\mathsf{precompute\_table}$
Computing $-15[P], -13[P], \ldots, 15[P]$ requires one point doubling, 7 point additions and 8 point negations; rounding the doubling up to the cost of an addition gives $8 \cdot 7 + 8 = 64$ constraints.
In addition, a ROM table must be defined and initialized. This costs 2 constraints per table entry. There are two ROM tables per point (one for each x/y coordinate) each of size 16, costing 64 constraints.
Total cost: 128 constraints.
### $\mathsf{compute\_wnaf}$
Each wnaf slice $w_i$ is validated via the assertion $\mathsf{bitrange}(\frac{w_i + 15}{2}, 4)$, which costs one arithmetic gate plus one 4-bit range check, i.e. 1.5 gates per slice. Across 64 slices this costs 96 gates.
The skew factor must satisfy a 1-bit range check, i.e. 0.5 gates.
Computing $t_0$ requires a linear sum over 33 witnesses (32 slices plus the skew), which costs 9 constraints.
Similarly, $t_1$ is a linear sum over 32 witnesses, which costs 8 constraints.
Computing $l_0, l_1$ costs 2 constraints. Their 129-bit range checks cost 9 constraints each, for 18 constraints total.
| operation | cost |
| --- | --- |
| $w_i$ range checks + algebra | 96 |
| $s$ range check | 0.5 |
| $t_0, t_1$ | 17 |
| $l_0, l_1$ | 20 |
| assert $t_0 + 2^{128}t_1 = x$ | 1 |

Total number of constraints = 134.5
### Multiscalar Mul
For $n$ points, the cost can be modelled as follows:
Each point requires one call to $\mathsf{compute\_wnaf}$ and one call to $\mathsf{precompute\_table}$, for $134.5 + 128 = 262.5$ constraints (rounded up to 263).
In addition, for each point the MSM algorithm requires 64 point additions, as well as 128 ROM reads (reading the x/y coordinates at each iteration). These operations cost $(64 \cdot 7 + 128 \cdot 2) = 704$ constraints.
Finally, irrespective of $n$, there are 252 point doublings, which each cost 6 constraints, for 1,512 constraints.
Total cost: $967n + 1,512$ (the per-point skew corrections and the final offset-generator addition add a small further overhead, ignored here).
### TODO:
When computing witnesses in unconstrained functions, use the batch-inversion technique to reduce the number of field inversions required (field inversions are very expensive).