# Scalar Multiplication Verification Reduction in DV-SNARK Garbled Cirsuits ## Introduction Multi-Scalar Multiplication (MSM) is a core operation in elliptic curve cryptography. It computes the sum: $$\sum_{i=1}^{n} [a_i] P_i$$ , where each $ a_i $ is a large integer (scalar), and each $ P_i$ is a point on an elliptic curve. In advanced zero-knowledge proof systems like SNARKs, verifying a statement such as $Q=[k]P$ (a large scalar multiplication on an elliptic curve) or $\sum_{i=1}^{n} [a_i] P_i$ is computationally expensive because the circuit must simulate the entire process, leading to a gate count that grows linearly with the bit-length of scalars (e.g., 256 bits). For small number of scalars $n$, decomposing large integer within a circuit (e.g., in a DV-SNARK with a garbled circuit) transforms this costly verification into a more efficient one. Consider $Q=[k]P$, the decomposition leverages a hint—auxiliary values $x$ and $z$ provided by the prover—to transform the verification task. The process follows these steps: - Hint Generation: The prover finds two smaller integers $x$ and $z$ (e.g., each ~128 bits for a 256-bit $k$) such that: $k⋅z≡x \mod r$, where $r$ is the order of the elliptic curve’s base point. This equation is derived from number-theoretic algorithms that solve for a linear relation. - Verification of the Hint: The circuit checks two conditions: -- Scalar Relation: $k⋅z ≡ x \mod r$. This is efficient because modular arithmetic with fixed $r$ can be implemented with low gate count. -- Point Relation: $[x]P−[z]Q=O$ (the identity point). Since $x$ and $z$ have roughly half the bit-length of $k$, the double scalar multiplication requires fewer point operations and constraints with precomputation of $P + Q$. Similar decomposition exists for other small number of scalars, see paper [Fast elliptic curve scalar multiplications in SN(T)ARK circuits]( https://eprint.iacr.org/2025/933.pdf) for a more detailed description. ## Application to GCs of DV-SNARK verifier in GOAT BitVM2-GC In DV-SNARK verifier of GOAT BitVM2-GC, for a proof $\pi = (P, Q, a) \quad \text{for statement } R$, the designated verifier verifies: $P = \left( a + \left( a^2 + i(\alpha) \right) \delta \right) G + \left( \tau \epsilon - \alpha \epsilon \right) Q$ , with $~~\alpha = h(R, P).$ Similarly, we can decompose MSM(2, N) into MSM(3, 2/3N). Which means: $$ \begin{align*} [k_1]P_1 + [k_2]P_2 &= Q \Leftrightarrow \quad [x_1]P_1 + [x_2]P_2 - [z]Q = 0 \end{align*} $$ The scalars $x_1, x_2$ and $z$ are expected to be bounded by $1.22r^{2/3}$. ## Computing $\sum_{i=1}^nk_iG_i$ Suppose $k_i$ are N bits values, to compute $\sum_{i=1}^nk_iG_i$, we precompute $\sum_i b_iG_i$ for $b_i \in \{0,1\}$. Then Initialize $R= 0 ,$Loop through the bits $b_i$ of $k_i$ from the most signzificant bit (MSB) to the least significant bit (LSB): $$ R = 2 R; R = R +\sum_i b_iG_i $$ **Complexity**: $MSM(n, N) = (2^n - n - 1 + N - 1) \cdot \text{Add} + (N - 1) \cdot \text{Dbl}$ Suppose we have $n$ scalars $k_1, k_2, \dots, k_n$, each represented as an $N$-bit integer. To efficiently compute the multi-scalar multiplication (MSM) $$ \sum_{i=1}^n k_i G_i $$ on an elliptic curve, we can use the following window-based method with precomputation. **Precomputation Phase** Precompute all possible subgroup sums of the form $$ \sum_{i=1}^n b_i G_i \quad \text{for all} \quad b_i \in \{0,1\}. $$ This results in $2^n$ precomputed points. **Evaluation Algorithm** - State Initialize the accumulator point $R = \mathcal{O}$ (the point at infinity) - Foreach bit position $j$ from MSB to LSB of scalars $k_i$ -- $R \leftarrow 2R$ (Point doubling) -- Let $b_{i,j}$ denote the $j$-th bit of scalar $k_i$, $R \leftarrow R + \sum_{i=1}^n b_{i,j} G_i$ (Add precomputed sum) **Complexity Analysis** The total cost of the algorithm is: $$ \text{MSM}(n, N) = (2^n - n - 1 + N - 1) \cdot \text{Add} + (N - 1) \cdot \text{Dbl} $$ , where $\textbf{Add}$ denotes an elliptic curve point addition and $\textbf{Dbl}$ a point doubling. This approach reduces the number of curve operations by leveraging precomputation and processing all scalars in a bit-sliced manner. ## Usage in DV-SNARK Verifier **1. Description** Designated verifier needs to verify: $$P = \left( a + \left( a^2 + i(\alpha) \right) \delta \right) G + \left( \tau \epsilon - \alpha \epsilon \right) Q \Leftrightarrow P = [k_1]G + [k_2]Q \quad (1)$$ In the implementation, we denote $u_0 = k_1, v_0 = k_2$. We can use hinted double scalar multiplication to decomposes the $MSM(2, N)$ into $MSM(3, 2N/3)$. It is equivalent to theses checks: - $[x_1]G + [x_2]Q - [z]P = 0$ - $k_1 = x_1/z \mod r$ - $k_2 = x_2/z \mod r$ for scalars $x_1,x_2,z$ bounded by $1.22r^{2/3}$. **2. Prover** Prover first sends 2 commitments: $Q, P$ to verifier, then receives the trapdoor values to compute $k_1, k_2$. Then, prover uses GLV reduction to find $x_1, x_2, z$ such that: $|x_1|, |x_2|, |z| < 2^{2N/3 + 1}$ , where $N= 232$ is the number of bits in the field in our case. Because the decomposition can find $x1, x2, z$ which may $\le 0$, so we need to caucluate the negation of them if necessary. **3. Verifier** Instead of verifying $[k_1]G + [k_2]Q$, the Verifier verifies - $[x_1]G + [x_2]Q - [z]P = 0$ , where each of $|x_1|, |x_2|, |z|$ has $156$ bits. - $k_1 = x_1/z \mod r \quad(3)$ - check: $k_2 = x_2/z \mod r \quad(4)$ Note that the `hinted_double_scalar_mul` can compute: $[a]P_1 + [b]P_2 + [c]P_3$ where $0 < a, b, c < 2^{156}$, so if $x_1, x_2 <0$ or $z > 0$, we need to first negates them. ## Test Benchmark **Tau-adic window scalar multiplication** We can also efficiently compute the scalar multiplication using the Frobenius endomorphism map to efficiently compute $\tau(P)$, and combining with the $\tau$-adic representation. It enables use to process multiple bits in batch (using a window size of $\omega = 5$) with a single point addition, completely eliminating the need for point doubling operations. (see [implementation for efficient arithmetic on Koblitz curves](https://github.com/VanhGer/dv-pari-circuit/blob/3ff4d6b86660e879f04ab743752d2d19af4a7b06/src/curve_scalar_mul_ckt.rs#L1) for more details.) Although this approach doubles the bit-length (from 232 to 470), it significantly reduces the number of point additions required (470/5 = 94). Together with the precomputation of batched bits and Frobenius transform, we derive the number of gates: - and variants: 8,796,030 - xor variants: 2,365,188,084 - total: 2,373,984,114 **Hinted double scalar multiplication** Using the decomposition functionality provided by the codebase at [decomposition for large intergers](https://github.com/yelhousni/scalarmul-in-snark/blob/main/sage/decompose.py), we derive the gate number of using decomposition methed as: - and variants: 6,707,171 - xor variants: 3,370,130,781 - total: 3,376,837,952 The gates involved in computing the MSM $[x_1]G + [x_2]Q - [z]P$ are: - and variants: 5,635,105 - xor variants: 3,364,141,160 - total: 3,369,776,265 **Comparasion** Hinted double scalar multiplication reduce the total non-free gates from 8.8M to 6.7M (about $24\%$ reduction).