Pedersen Hashing in TurboPlonk

# Pedersen Hashing in TurboPlonk **Disclaimer:** This documentation was written on March, 2022. It is intended to give readers a high-level understanding. The codebase is the canonical source of truth, and over time this document might fall behind the implementation details of the code. ###### tags: `audit` ##### Authors: Suyash and Arijit The aim of this document is to chalk out all the details of how Pedersen hashing in TurboPlonk works. This should serve as a standalone document to understand Pedersen hash implementation in TurboPlonk. Credits to Cody, Ariel and Zac for previous documentations in different places. ### wNAF Representation The $w$NAF form is used to break a large scalar into smaller quads for enabling efficient scalar multiplication in Elliptic Curve Cryptography. In the context of Aztec, we define the following functions to compute the $w$NAF of a field scalar $b \in \mathbb{F}_r \equiv \{0,1,2, \dots, r-1\}$: $$ \begin{aligned} \textsf{wnaf_decompose}(b) &:= (\{q_{127}, q_{126}, \dots, q_0\}, s) \\ b = \textsf{wnaf_reconstruct}\Big(\{q_i\}_{i=0}^{127}, \ s\Big) &:= \left(\sum_{i=0}^{127}q_i4^i\right) - s. \end{aligned} $$ where we refer to $\{q_i\}_{i=0}^{127}$ as quads and $s$ as the skew. To understand how $w$NAF representations work in detail, we refer the reader to the $w$NAF [documentation](https://hackmd.io/TEjx7DC6T22g2W5ZJFKfog?view). A brief summary is as follows. In case of an even $b$, the function $\textsf{wnaf_decompose}$ computes decomposition of $b+1$ (the next odd number) and sets $s=1$. In case of an odd $b$, the function $\textsf{wnaf_decompose}$ computes decomposition of $b$ and sets $s=0$. In case of an odd $b$, the function $\textsf{wnaf_reconstruct}$ gets back original $b$ using the reconstruction formula. In case of an odd $b$, $\textsf{wnaf_reconstruct}$ gets back $b+1$, and obtains $b$ by subtracting $s=1$ from it. ### Scalar Multiplication Using Quads Before describing our pedersen hash in the next section, we describe a simpler and more straightforward primitive of scalar multiplication. Given a scalar $b \in \mathbb{F}_r$, we wish to compute $B = bP$. To do this efficiently, we use the quad representation of $b$. We do the following: 1. Input: $b \in \mathbb{F}_r$. 3. Compute the $w$NAF of the modified input: $$ \textsf{wnaf_decompose}(b) = \Big(\{q_i\}_{i=0}^{127}, s\Big) $$ where $\{q_i\}_{i=0}^{127}$ are the quads (for $b$ if $b$ is odd, for $b+1$ if $b$ is even) and $s$ is the skew ($s=1$ if the input $b$ is even). We have $$ b = \textsf{wnaf_reconstruct}\Big(\{q_i\}_{i=0}^{127}, s\Big) = \left(\sum_{i=0}^{127}4^iq_i\right) - s $$ Therefore, the scalar multiplication can be written as: $$ B = bP = \sum_{i = 0}^{127} q_i \cdot 4^{i}P -sP $$ It it worth noting an important observation from the $w$NAF [documentation](https://hackmd.io/TEjx7DC6T22g2W5ZJFKfog?view): The most significant quad $q_{127}=1$ for any scalar $b\in \mathbb{F}_r$. Thus, we can rewrite $B$ as: $$ B = bP = 4^{127} + \sum_{i = 0}^{126} q_i \cdot 4^{i}P - sP. $$ Now, if we have pre-computed the following group elements: $-3P_i, -P_i, P_i, 3P_i$ where $P_i = 4^iP$, then we can compute the scalar multiplication in our circuit by cumulative point additions. Notice that we only need to compute scalar multiplications of quad values $1$ and $3$, i.e. $P_i$ and $3P_i$ . The scalar multiplications for $-1$ and $-3$ can be obtained by negating the y-coordinates of that of $P_i$ and $3P_i$ respectively while keeping the x coordinates the same. We store these pre-computed group elements in the `hash_ladder` struct in the code. ### From scalar multiplication to Pedersen In a particular Pedersen hash (say for computing a commitment to a `value_note`), you need to hash together multiple field scalars each with a unique and uniformly generated set of group generators. This is simplified in the code using something called as `hash_index`. A `hash_index` is a tuple of integers which chooses the generator $P \in \mathbb{G}_1$ and auxiliary generator $P_{\text{aux}} \in \mathbb{G}_1$ for computing Pedersen hashes. We also have a `skew_generator` for a particular `hash_index`. But why do we need three generators for computing a Pedersen hash of a single scalar? The reason is we want a collision resistant hash function. Since we take only the x-coordinate of the Pedersen hash as output, we need to ensure that the input $b < p/2$ where $p=|\mathbb{G}_1|$ (For a proof see the answers of [this](https://crypto.stackexchange.com/questions/96275/two-elliptic-curve-points-having-the-same-x-coordinate?answertab=active#tab-top), collision resistance is formally proved below. See the exact value of $p$ (Grumpkin group order) in the [Aztec Yellow paper](https://hackmd.io/@aztec-network/ByzgNxBfd#◈-First-pairing-source-group)). In the current setting, the input value can exceed $p/2$. For instance, let $b_{max}$ denote the maximum value of the scalar you can represent using our $w$NAF form (taking $n=127$ below). $$ b_{max} = 4^{n} + 3(4^{n-1} + 4^{n-2} + \dots + 4 + 1)-s = 2\cdot 4^{n}-1-0 = 2^{255}-1. $$ Clearly, $b_{max}>p/2$. Let's drop the least significant quad $q_{0}$ and consider only the remaining 127 quads. In that case, we can have powers of $4$ till $4^{n-1}$, so $$ b_{max} = 4^{n-1} + 3(4^{n-2} + 4^{n-1} + \dots + 4 + 1)-s = 2\cdot 4^{n-1}-1-0 = 2^{253}-1. $$ and hence $b_{max}>p/2$. Similarly, let's drop the two least significant quads $q_{0},q_{1}$ and the skew $s$ and see what is the maximum value we can represent. $$ b_{max} = 4^{n-2} + 3(4^{n-3} + 4^{n-4} + \dots + 4 + 1) = 2\cdot 4^{n-2}-1 = 2^{251}-1. $$. In this case, we do have $b_{max}<p/2$. Therefore, we need to separate the original input $b$ in two parts: $\gamma$ which contains the quads $( q_{0}, q_{1})$ and $\lambda$ which contains $(q_2, q_3\dots, q_{126}, q_{127}=1)$. If the skew $s$ is 1, we use a `skew_generator` $P_{\text{skew}}$. Thus, we define the Pedersen hash of a scalar $b\in \mathbb{F}_r$ as: $$ \begin{align*} H_P(b) &= \underbrace{\left(4^{125} + q_{126}4^{124} + \dots + q_{3}4 + q_{2}\right)P}_{\lambda P} + \underbrace{\left( q_{1}4 + q_{0}\right)P_{\text{aux}}}_{\gamma P_{\text{aux}}} - sP_{\text{skew}} \\ &= \underbrace{(4^{125}P - sP_{\text{skew}})}_{\text{Gate }0} + \underbrace{(q_{126}4^{124} + \dots + q_{3}4 + q_{2})P}_{\text{Gates }1, 2, \dots, 125} + \underbrace{(4q_1 + q_0)P_{\text{aux}}}_{\text{Gate }126, 127} \end{align*} $$ From the above equation, it is evident that $1 \le \lambda < 2^{251} < \frac{p}{2}$, $-15 \le \gamma \le 15$, $\gamma \neq 0$). In the code, the powers of the generators are pre-computed and stored in a struct called `hash_ladder`. For generators $P,P_{\text{aux}}\in \mathbb{G}_1$ we define a `hash_ladder` as follows: | index | quad | $\texttt{one}$ | $\texttt{three}$ | | -------- | -------------- | ----------------- | ------------------------- | | 0 | $q_{127}(= 1)$ | $4^{125}P$ | $3 \cdot 4^{125}P$ | | 1 | $q_{126}$ | $4^{124}P$ | $3 \cdot 4^{124}P$ | | 2 | $q_{125}$ | $4^{123}P$ | $3 \cdot 4^{123}P$ | | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | | 124 | $q_{3}$ | $4P$ | $3 \cdot 4P$ | | 125 | $q_{2}$ | $1P$ | $3P$ | | 126 | $q_{1}$ | $4P_{\text{aux}}$ | $3 \cdot 4P_{\text{aux}}$ | | 127 | $q_{0}$ | $1P_{\text{aux}}$ | $3P_{\text{aux}}$ | To confirm the structure of the above table, we have a added a test called `hash_ladder_structure` in `pedersen.test.cpp`. ### Proving Collision Resistance We prove this by contradiction. Let $a,b \in \mathbb{F}_r$ be two distinct field scalars for which we get a collision (Pedersen hashes yield curve points whose x-coordinates are the same). Thus, we have: $$ \begin{aligned} H_P(a) &= \gamma_1 P_{\text{aux}}+\lambda_1P -s_1P_{\text{skew}} = A \implies (x,y_1) \\ H_P(b) &= \gamma_2 P_{\text{aux}}+\lambda_2P -s_2P_{\text{skew}} = B \implies (x,y_2). \end{aligned} $$ For collision to occur, we need $y^2_1 = y^2_2 \implies y_1 = \pm y_2$ (as both $A$ and $B$ satisfy the curve equation $y^2 = x^3 - 17$). If $y_1 = -y_2$, $A$ and $B$ are inverses of each other i.e. $$ \begin{aligned} A + B &= \mathcal{O} \\ \implies (\lambda_1+\lambda_2)P + (\gamma_1+\gamma_2) P_{\text{aux}} -(s_1+s_2)P_{\text{skew}} &= \mathcal{O} \end{aligned} \tag{1} $$ Since $1 \le \lambda_1, \lambda_2 < 2^{251} < \frac{p}{2}$, we know that $0 < (\lambda_1 + \lambda_2) < p$. Hence to satisfy equation (1) with uniform generators $P$, $P_{\text{aux}}$, and $P_{\text{skew}}$, an adversary needs to come up with non-zero scalar solution $(a_1,a_2,a_3)$ such that $$ a_1P+a_2P_{\text{aux}}+a_3P_{\text{skew}} = \mathcal{O} $$ holds. Here $a_1$ is constrained to be non-zero ensuring that the adversary needs to produce non-zero scalar solution. Coming up with such a solution is equivalent to solving DL problem as shown in this [document](https://hackmd.io/urZOnB1gQimMqsMdf7ZBvw?view). If $y_1=y_2$, we must have $$ \begin{aligned} &A = B \\ &\implies \lambda_1P + \gamma_1P_{\text{aux}} - s_1P_{\text{skew}} = \lambda_2P + \gamma_2P_{\text{aux}} - s_1P_{\text{skew}} \\ &\implies (\gamma_1-\gamma_2)P_{\text{aux}} +(\lambda_2-\lambda_1)P +(s_2-s_1)P_{\text{skew}} = \mathcal{O}. \end{aligned} \tag{2} $$ Let ${c = (\gamma_1-\gamma_2), d = (\lambda_2-\lambda_1)}$, and $e = (s_2-s_1)$. All $c$, $d$, and $e$ cannot be zero simultaneously since then $a = b$, a contradiction to our assumption of having two distinct scalars producing collision. Again, $e$ should ideally lie in the set $\{-1,0,1\}$. But we do not need to enforce that since that will make the adversary more powerful. Coming up with non-zero scalar solution such that equation (2) holds is equivalent to solving DL problem as shown in this [document](https://hackmd.io/urZOnB1gQimMqsMdf7ZBvw?view). Hence our Pedersen hash function is collision resistant. $\blacksquare$ ### Gate Structure: In this section, we describe the gate structure for computing a fixed-base scalar multiplication in circuits. We call this as `fixed_group_add_gate` in the code. For input $b = t + \sum_{i = 0}^{126} b_i \cdot 4^{i}$ and generator $P\in \mathbb{G}_1$, we have $n=127$ quads (excluding $q_{127}=1$) and we need a total of $n+1 = 128$ following gates. | $w_1$ | $w_2$ | $w_3$ | $w_4$ | | -------- | -------- | -------- | --- | | $x_0$ | $y_0$ | $4^{-n}$ | $a_0 = (1 - s4^{-n}) \in \{1, 1 - 4^{-n}\}$ | | $x_1$ | $y_1$ | $x_{\alpha, 0}$ | $a_1 = 4a_0 + q_{126}$ | | $x_2$ | $y_2$ | $x_{\alpha, 1}$ | $a_2 = 4^2a_0 + 4q_{126} + q_{125}$ | | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | | $x_i$ | $y_i$ | $x_{\alpha, i-1}$ | $a_{i} = 4^{i}a_0 + 4^{i-1}q_{126} + 4^{i-2}q_{125} + \dots + 4q_{127-i+1} + q_{127-i}$ | | $x_{i+1}$ | $y_{i+1}$ | $x_{\alpha, i}$ | $a_{i+1} = 4^{i+1}a_0 + 4^{i}q_{126} + 4^{i-1}q_{125} + \dots + 4q_{127-i} + q_{127-i-1}$ | | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | | $x_{n}$ | $y_{n}$ | $x_{\alpha, n-1}$ | $a_{n} = 4^{n}a_0 + 4^{n-1}q_{126} + 4^{n-2}q_{125} + \dots + 4^2q_2 + 4q_{1} + q_{0} =: b$ | For the $i$th gate, we have the wires values $w_1 = x_i, w_2 = y_i$ such that $(x_i,y_i) \in \mathbb{G}_1$ denotes the cumulative addition until that gate. In the third wire of the *next* gate, we store $x_{\alpha,i}$ - the x-coordinate of the next point to be added, i.e. $\{-3P_{124-i}, -P_{124-i}, P_{124-i}, 3P_{124-i}\}$. Thus, we have: $$ (x_{i+1},y_{i+1}) = (x_{i},\ y_{i}) \ +_{ecc} \ (x_{\alpha, i},\ y_{\alpha, i}) $$ The forth wire values are the cumulative sums of the quad values. We use cumulative sums for efficiency reasons. Therefore, the wire value $w_4$ of the last gate should be equal to the original input $b$. Note that in the last gate, we have $4^{n}a_0 = 4^n(1 - s4^{-n}) = 4^n - s$ gives $4^{n}$ when $b$ is odd and $4^{n}-1$ when $b$ is even, so the skew term is taken care of. We describe the constraints on the wire values (i.e. witnesses) in the next section. ### Constraints To prove that the fixed-base scalar multiplication is correctly computed, we need to enforce the following constraints on the $i$th gate for all $i \in \{0,1,\dots, n-1\}$: ![](https://hackmd.io/_uploads/Hy3G2KHOY.png) 1. Check if the quad value is indeed $\{-3,-1,1,3\}$. We have $\delta_i = q_{126-i} = a_{i+1} - 4 a_i$ $$ \boxed{(\delta_i+3)(\delta_i+1)(\delta_i-1)(\delta_i-3) = 0} \tag{a} $$ 2. Check if the x-coordinate of the point to be added next is correct. Now, we know the point to be added has to be $(x_{\alpha,i}, y_{\alpha,i}) \in \left\{(x_{\beta}, y_{\beta}), (x_{\beta}, -y_{\beta}), (x_{\gamma}, y_{\gamma}), (x_{\gamma}, -y_{\gamma})\right\}$ where $(x_{\beta}, y_{\beta}) \equiv P_{124-i}$ and $(x_{\gamma}, y_{\gamma}) \equiv 3P_{124-i}$ are pre-computed constants (note that $\delta_i^2$ can either be 9 or 1 below). $$ \begin{aligned} x_{\alpha,i} :=& \left(\frac{\delta_i^2 - 9}{-8}\right)x_\beta + \left(\frac{\delta_i^2 - 1}{8}\right)x_{\gamma} \\ =& \bigg(\underbrace{\frac{x_\gamma-x_\beta}{8}}_{q_{1,i}}\bigg) \delta_i^2 + \bigg(\underbrace{\frac{9x_\beta - x_\gamma}{8}}_{q_{2,i}}\bigg) \end{aligned} $$ $$ \therefore \quad \boxed{q_{1,i}\delta_i^2 + q_{2,i} - x_{\alpha,i} = 0} \tag{b} $$ 3. Check if x-coordinate of the resulting point addition is correct. For this, we use the elliptic curve point addition formula: $$ \begin{gather} x_{i+1} = \left(\frac{y_{\alpha,i} - y_i}{x_{\alpha,i} - x_i}\right)^2 - x_{\alpha,i} - x_i \\ \implies (x_{i+1} + x_{\alpha,i} + x_i)(x_{\alpha,i} - x_i)^2 - (y_{\alpha,i} - y_i)^2 = 0 \\ \implies (x_{i+1} + x_{\alpha,i} + x_i)(x_{\alpha,i} - x_i)^2 - y_{\alpha,i}^2 + 2y_{\alpha,i}y_i - y_i^2 = 0 \tag{2} \end{gather} $$ We can derive $y_{\alpha,i}$ from $x_{\alpha,i}$ as $$ \begin{aligned} y_{\alpha,i} :=& \ \frac{(x_{\alpha,i}-x_\gamma)\delta_i}{(x_\beta - x_\gamma)}y_\beta+ \frac{(x_{\alpha,i} - x_\beta)\delta_i}{3(x_\gamma - x_\beta)}y_\gamma \\ =& \ \bigg( \underbrace{\frac{3y_\beta - y_\gamma}{3(x_\beta - x_\gamma)}}_{q_{ecc,i}} \bigg) x_{\alpha,i}\delta_i + \bigg( \underbrace{\frac{x_\beta y_\gamma - 3x_\gamma y_\beta}{3(x_\beta - x_\gamma)}}_{q_{3,i}} \bigg)\delta_i \end{aligned} $$ Substituting $y_{\alpha,i} = q_{ecc,i}x_{\alpha,i}\delta_i + q_{3,i}$ and $y_{\alpha,i}^2 = x_{\alpha,i}^3 + b_{\text{grumpkin}}$ in $(2)$, we get: $$ \boxed{(x_{i+1} + x_{\alpha,i} + x_i)(x_{\alpha,i} - x_i)^2 - (x_{\alpha,i}^3 + b_{\text{grumpkin}}) + 2y_i(q_{ecc,i}x_{\alpha,i}\delta_i + q_{3,i}) - y_i^2 = 0} \tag{c} $$ 4. Check if the y-coordinate of the resulting point addition is correct. $$ \begin{gather} y_{i+1} = \bigg(\frac{y_{\alpha,i} - y_i}{x_{\alpha,i} - x_i}\bigg)\big(x_i - x_{i+1}\big) - y_i \\ \therefore \quad \boxed{(y_{i+1} + y_i)(x_{\alpha,i} - x_i) - (q_{ecc,i}x_{\alpha,i}\delta_i + q_{3,i} - y_i)(x_i - x_{i+1}) = 0} \tag{d} \end{gather} $$ 5. Initialisation for $i=0$: we need to check $a_0 \in \{1, 1-4^{-n}\}$. Note that the selector $q_{c,i} = 0$ for $i\neq 0$. $$ \begin{gather} \boxed{q_{c,i}(1 - a_i)(1 - 4^{-n} - a_i) = 0} \tag{e} \end{gather} $$ 5. Initialisation for $i=0$: $x_0$ must be the x-coordinate of either of $Q_0=4^{125}P$ or $Q_1=4^{125}P - P_{\text{skew}}$. Define selector values for $i=0$: $$ \begin{gather} q_{4,0}=Q_{0, x}, \quad q_{5,0}=(Q_{0,x}-Q_{1,x}) \\ q_{m,0}=Q_{0, y}, \quad q_{c,0}=(Q_{0,y}-Q_{1,y}) \\ \\ \boxed{q_{c,i} (w_{3,i}(q_{4,i} - x_i) + q_{5,i}(1 - a_i)) = 0} \tag{f} \end{gather} $$ 7. Initialisation for $i=0$: $y_0$ must be the y-coordinate of either of $Q_0=4^{125}P$ or $Q_1=4^{125}P- P_{\text{skew}}$. $$ \boxed{q_{c,i} (w_{3,i}(q_{m,i} - y_i) + q_{c,i}(1 - a_i)) = 0} \tag{g} $$ We combine the constraints in equations $(a,b,c,d,e,f, g)$ using powers of a challenge. Linear and non-linear terms are separately computed in `transition_widgets/turbo_fixed_base_widget.hpp`. ### Range Constraining the Input to Pedersen Hashes Suppose our input to the hash function is $b \in \mathbb{Z}_2^{254}$, we need to prove that $b \in \mathbb{F}_r$. For that, we can split the bit-wise representation of $b$ in two parts: $$ b = \big(\underbrace{b_{hi}}_{128} \ \| \ \underbrace{b_{lo}}_{126} \big) \in \mathbb{F}_r $$ Now, suppose we split the field modulus in a same manner as $r = (r_{hi} \ \| \ r_{lo})$. If we prove either of the following condition holds: 1. $r_{hi} - b_{hi} >0$ 2. $r_{hi} - b_{hi} =0$ and $r_{lo} - b_{lo} >0$ then we effectively prove that $b < r$. Only question remains to be answered is how do we compute $b_{hi}$ and $b_{lo}$ from the accumulating $w$NAF sums. #### Computing Upper Half $b_{hi}$ We can compute the upper half $b_{hi}$ from $a_{64}$ in the following way. For $i=64$, we have: $$ \begin{aligned} a_{64} &= 4^{64}a_0 + 4^{63}q_{126} + 4^{62}q_{125} + \dots + 4q_{64} + q_{63} \\ &= 4^{64}\left(1 - s \cdot 4^{-127}\right) + 4^{63}q_{126} + 4^{62}q_{125} + \dots + 4q_{64} + q_{63} \\ &= -s \cdot 4^{-63} + \underbrace{4^{64}\cdot 1 + 4^{63}q_{126} + 4^{62}q_{125} + \dots + 4q_{64} + q_{63}}_{b_{hi}} \\ &= -s \cdot 4^{-63} + b_{hi} \end{aligned} $$ Note that the most significant quad in $b_{hi}$ too is 1 for the same reasons why it was 1 for $b$. $$ \boxed{\therefore \quad b_{hi} = a_{64} + s \cdot 2^{-126}} $$ #### Computing Lower Half $b_{lo}$ We compute the lower half as follows $$ \begin{aligned} b_{lo} &= a_{127} - 2^{126} \cdot b_{hi} \\ &= a_{127} - 2^{126}\left(a_{64} + s \cdot 2^{-126}\right) \end{aligned} $$ $$ \boxed{\therefore \quad b_{lo} = a_{127} - 2^{126}a_{64} - s} $$ ### References 1. Barrtenberg implementation: a. [`stdlib/hash/pedersen/pedersen.cpp`](https://github.com/AztecProtocol/aztec2-internal/blob/defi-bridge-project/barretenberg/src/aztec/stdlib/hash/pedersen/pedersen.cpp) b. [`plonk/proof_system/widgets/transition_widgets/turbo_fixed_base_widget.hpp`](https://github.com/AztecProtocol/aztec2-internal/blob/defi-bridge-project/barretenberg/src/aztec/plonk/proof_system/widgets/transition_widgets/turbo_fixed_base_widget.hpp) 2. The Turbo-PLONK program syntax for specifying SNARK programs, Ariel G and Zac W. Link: https://docs.zkproof.org/pages/standards/accepted-workshop3/proposal-turbo_plonk.pdf 3. Fixed-base Multiplication gate, Cody G. Link: https://hackmd.io/MCmV2bipRYelT1WUNLj02g?view