Fixing the Pedersen Hash Standard

## Table of Contents [Introduction](#Introduction) [Interface Specification](#External-Interface-Methods) # Introduction ## Goals of this specification document We want to define a clean **interface** for methods related to in-circuit native group operations. The inputs/outputs must be agnostic of the underlying proving system. In addition, we want the inputs/outputs to be clearly defined and easy to understand (e.g. none of this 2-bit windowed non-adjacent-form custom gate stuff where scalars are split unevenly over two generators) An additional requirement is that it is possible to write efficient implementations of these interface methods using lookup tables, given their likely heavy use in Aztec3. Of critical importance is that we have a Pedersen hash function that can hash two field elements very efficiently. i.e. similar ballpark numbers as our existing width-2 pedersen hash/compress function. Finally, we want to remove edge-cases where the pedersen methods don't work for certain combinations of inputs (e.g. not handling points at infinity properly). This specification describes an external interface (for barretenberg) for operations related Pedersen hashing and native group operations, and proposed Standard/Ultra Honk/Plonk implementations. In addition to methods relating to Pedersen hashes, native group operations have been included in this specification (e.g. fixed base scalar multiplication). We will need these operations in our external interface sooner-or-later. We don't want to implement them in a manner that is inconsistent with how we do Pedersen hashing, given their similarities. ## Preliminaries For a SNARK circuit defined over a prime field, let $\mathbb{F}_p$ be the native field all circuit arithmetic is evaluated over. Let $\mathbb{G}$ be the elliptic curve defined over $\mathbb{F}^2_p$. This curve has a cofactor of $1$ and the order of the elliptic curve sugroup is $q$. Let $[g_1], \ldots, [g_n]$ be $n$ linearly independent generator points of $\mathbb{G}$. The method for how to derive the points is outside the scope of this document, but is equivalent to how we currently derive generator points in barretenberg. $n$ is the maximum number of generators supported for a given circuit and, ideally, should not have an upper bound. ## Data Structures --- ### `cycle_group` A native element is a tuple of 3 witness values `(x, y, point_at_infinity)`. `point_at_infinity` is a boolean. --- ### `cycle_scalar` A group scalar is represented as a tuple of 2 witness values `(lo, hi)`. These are 128-bit scalars that represent a scalar multiplier: $scalar := lo + 2^{128} \cdot hi$. If an interface consumes `Group::scalar` objects , the implementation must validate that `lo, hi` are 128-bits. i.e. the entity calling the interface does _not_ have to perform range checks on `lo, hi` (we get them for free as part of most scalar multiplication implementatio). By representing scalars as 2 witnesses we can represent both members of $\mathbb{F}_p$ and $\mathbb{F}_q$ via this data structure (depending on context) # Generator Context The "generator context" defines the list of independent generator points used when computing Pedersen commitments and Pedesen hashes. The context contains a `domain_separator` string, which is used in a hash-to-curve algorithm to derive a list of generators, accessible via the context. If not specified, the generator context defaults to one where `domain_separator = "default domain separator"` # Hash to curve algorithm **Input parameters**: byte buffer represent a `seed` **Output**: a group element **High level lescription**: We derive 512-bits of entropy from the input seed, interpret this as an unsigned integer in big-endian form, and reduce modulo $p$ to produce an element of $\mathbb{F}_p$ - interpreted as an x-coordinate. The y-coordinate is then derived via the curve formula. 1. Initialize unsigned integer `attempt_count = 0` 2. Copy seed into a buffer whose size is 2 bytes greater than `seed` (initialized to 0) 3. Interpret `attempt_count` as a byte and write into buffer at [buffer.size() - 2] 4. Compute Blake3s hash of buffer 5. Set the end byte of the buffer to `1` 6. Compute Blake3s hash of buffer 7. Interpret the two hash outputs as the high / low 256 bits of a 512-bit integer (big-endian) 8. Derive x-coordinate of point by reducing the 512-bit integer modulo the curve's field modulus (Fq) 9. Compute y^2 from the curve formula y^2 = x^3 + ax + b (a, b are curve params. for BN254, a = 0, b = 3) 10. IF y^2 IS NOT A QUADRATIC RESIDUE 10a. increment `attempt_count` by 1 and go to step 2 11. IF y^2 IS A QUADRATIC RESIDUE 11a. derive y coordinate via y = sqrt(y) 11b. Interpret most significant bit of 512-bit integer as a 'parity' bit 11c. If parity bit is set AND y's most significant bit is not set, invert y 11d. If parity bit is not set AND y's most significant bit is set, invert y 11e. return (x, y) Note: Steps 11c, 11d exist to ensure the above algorithm is deterministic. The `sqrt` algorithm will return one of two potential values. The parity bit ensures that the y-coordinate is determinstic regardless of the `sqrt` algorithm used. # Deriving generators via hash-to-curve **Input parameters** 1. `n` - the number of generators to derive 2. `domain_separator` - a byte buffer that uniquely determines the generators returned **Output parameters** 1. `points` - a size-n list of generators The `seed` parameter of `hash_to_curve` is a 64-byte buffer. The first 32 bytes are the blake3s hash of `domain_separator`. The latter 32 bytes are a `count` variable, interpreted as a 256-bit big-endian integer. `count` is initialized to 0. `hash_to_curve` is iteratively called, each time incrementing `count` by 1 after the call is made. The points produced by `hash_to_curve` are returned as the `points` list. # External Interface Methods These methods are designed by be consumed by both internal `barretenberg` components and software that uses `barretenberg` as a 3rd party --- ### `Pedersen::Commit` **Inputs:** 1. Vector of _native field elements_ $\{ a_1, \ldots, a_n \}$, vector of generator indices that map to $\{ [g_1], \ldots, [g_n] \}$ 2. A `generator_context` pointer that contains points $\{ [g_1], \ldots, [g_n] \}$ **Output:** `Group::element` `Commit` returns the group element equal to $\sum_{i=1}^n a_i \cdot [g_i]$ Note: `Commit` enables commitments to data represented by native field elements (i.e. the simplest data structure available). If a commitment to a more complex data structure is required, the calling entity first converts to field elements. Note: `Commit` must validate that for each $a_i \in \{ a_1, \ldots, a_n \}$, $a_i < p$. --- ### `Pedersen::Hash` **Inputs:** 1. Vector of _native field elements_ $\{ a_1, \ldots, a_n \}$ 2. A `generator_context` pointer that contains points $\{ [g_1], \ldots, [g_n] \}$ **Output:** native field element `Hash` returns the x-coordinate of the group element equal to `Commit` returns the group element equal to $\sum_{i=1}^n a_i \cdot [g_i] + n\cdot [g_{length}]$ The group element $[g_{length}]$ is a generator uniquely associated with the length parameter (from `derive_generators` where `domain_separator="pedersen_hash_length"`) --- ### `stdlib::cycle_group::batch_mul` **Inputs** 1. Vector of `cycle_scalar` elements $\{ a_1, \ldots, a_n}\}$ 2. Vector of `cycle_group` elements $\{ [P_1], \ldots, [P_n] \}$ **Output** 1. A `cycle_group` element `batch_mul` returns $\sum_{i=1}^n a_i \cdot [P_i]$. The internal implementation in `barretenberg` uses 4 code paths to ensure constraints are generated **Path 1: `a_i, [P_i]` are both constants** The result `a_i * [P_i]` is computed and added to a constant accumulator, **Path 2: `[P_i]` is constant, `a_i` is a witness, `[P_i]` exists as an UltraPlonk lookup table** An optimized scalar multiplication algorithm, using precomputed lookup tables, is executed. The result is added to a fixed-base accumulator. **Path 3: `[P_i]` is constant, `a_i` is a witness, `[P_i]` does not exist as an UltraPlonk lookup table** A fixed-base scalar multiplication algorithm is executed, the result is added to a fixed-base accumulator **Path 4: both `a_i, [P_i]` are witnesses** A variable-base scalar multiplication algorithm is executed, the result is added to a variable-base accumulator The final returned point is the sum of the constant accumulator, the fixed-base accumulator and the variable-base accumulator. ### `cycle_group` and offset_generators Offset generators allow us to evaluate ecc arithmetic over short-Weierstrass curves, while securely using *incomplete* addition formula. Recap: when adding [P1], [P2]. If [P1] = [P2], doubling formulae must be used instead of addition formula. If [P1] = -[P2], the result is the point at infinity, which is not the result produced by the addition formula. Constraining complete addition over short Weierstrass curves in a SNARK is expensive due to the need to perform a conditional additions and a conditional point doubling operation, followed by a conditional assignment in case [P1] = [P2] or [P1] = -[P2] An `offset_generator` is a unique generator point that is added to an accumulator when evaluating an elliptic curve double-and-add scalar multiplication algorithm within a circuit. Offset generators are used to ensure that an honest Prover will never produce a witness that equals the point at infinity, during the evaluation of a scalar multiplicaiton (or multiscalar multiplication) algorithm. In the case where the input points are known at circuit-compile time, the circuit can skip edge-case checks for short Weierstrass curves (i.e. x-coordinate collisions), as finding a collision is equivalent to solving the discrete logarithm problem. In the case where the input points are witnesses, rememer that the probability of an *honest* Prover triggering a short-Weierstrass edge-case is equivalent to solving the discrete logarithm problem. The circuit does not have to constraint fully complete addition. The circuit can constrain incomplete addition, as well adding constraints requiring [P1].x == [P2].x (much cheaper) # Custom UltraPlonk elliptic curve gates ## Double gate Given an input point (x_1, y_1) and output point (x_3, y_3), the ecc double gate will evaluate that (x_3, y_3) == (x_1, y_1).dbl() across two rows. Double gates can be chained with double gates or addition gates in such a way that successive gates only add 1 row to the execution trace (input wires of the dbl gate = output wires of an adjacent dbl gate or add gate) ## Addition gate Given input points (x_1, y_1), (x_2, y_2), (x_3, y_3), the addition gate will constrain that either (x_1, y_1) + (x_2, y_2) = (x_3, y_3) *or* (x_1, y_1) - (x_2, y_2) = (x_3, y_3), depending on the value of a selector column. The addition gate is evaluated across two rows of the execution trace. Addition gates can be chained such that each successive gate only adds 1 row to the execution trace.