# Scalar Multiplication Verification Reduction in DV-SNARK Garbled Cirsuits
## Introduction
Multi-Scalar Multiplication (MSM) is a core operation in elliptic curve cryptography. It computes the sum:
$$\sum_{i=1}^{n} [a_i] P_i$$
, where each $ a_i $ is a large integer (scalar), and each $ P_i$ is a point on an elliptic curve.
In advanced zero-knowledge proof systems like SNARKs, verifying a statement such as $Q=[k]P$ (a large scalar multiplication on an elliptic curve) or $\sum_{i=1}^{n} [a_i] P_i$ is computationally expensive because the circuit must simulate the entire process, leading to a gate count that grows linearly with the bit-length of scalars (e.g., 256 bits). For small number of scalars $n$, decomposing large integer within a circuit (e.g., in a DV-SNARK with a garbled circuit) transforms this costly verification into a more efficient one.
Consider $Q=[k]P$, the decomposition leverages a hint—auxiliary values $x$ and $z$ provided by the prover—to transform the verification task. The process follows these steps:
- Hint Generation: The prover finds two smaller integers $x$ and $z$ (e.g., each ~128 bits for a 256-bit $k$) such that: $k⋅z≡x \mod r$, where $r$ is the order of the elliptic curve’s base point. This equation is derived from number-theoretic algorithms that solve for a linear relation.
- Verification of the Hint: The circuit checks two conditions:
-- Scalar Relation: $k⋅z ≡ x \mod r$. This is efficient because modular arithmetic with fixed $r$ can be implemented with low gate count.
-- Point Relation: $[x]P−[z]Q=O$ (the identity point). Since $x$ and $z$ have roughly half the bit-length of $k$, the double scalar multiplication requires fewer point operations and constraints with precomputation of $P + Q$. Similar decomposition exists for other small number of scalars, see paper
[Fast elliptic curve scalar multiplications in SN(T)ARK circuits]( https://eprint.iacr.org/2025/933.pdf) for a more detailed description.
## Application to GCs of DV-SNARK verifier in GOAT BitVM2-GC
In DV-SNARK verifier of GOAT BitVM2-GC, for a proof $\pi = (P, Q, a) \quad \text{for statement } R$, the designated verifier verifies:
$P = \left( a + \left( a^2 + i(\alpha) \right) \delta \right) G + \left( \tau \epsilon - \alpha \epsilon \right) Q$ , with $~~\alpha = h(R, P).$
Similarly, we can decompose MSM(2, N) into MSM(3, 2/3N). Which means:
$$
\begin{align*}
[k_1]P_1 + [k_2]P_2 &= Q \Leftrightarrow \quad [x_1]P_1 + [x_2]P_2 - [z]Q = 0
\end{align*}
$$
The scalars $x_1, x_2$ and $z$ are expected to be bounded by $1.22r^{2/3}$.
## Computing $\sum_{i=1}^nk_iG_i$
Suppose $k_i$ are N bits values, to compute $\sum_{i=1}^nk_iG_i$, we precompute $\sum_i b_iG_i$ for $b_i \in \{0,1\}$. Then
Initialize $R= 0 ,$Loop through the bits $b_i$ of $k_i$ from the most signzificant bit (MSB) to the least significant bit (LSB):
$$
R = 2 R; R = R +\sum_i b_iG_i
$$
**Complexity**:
$MSM(n, N) = (2^n - n - 1 + N - 1) \cdot \text{Add} + (N - 1) \cdot \text{Dbl}$
Suppose we have $n$ scalars $k_1, k_2, \dots, k_n$, each represented as an $N$-bit integer. To efficiently compute the multi-scalar multiplication (MSM)
$$
\sum_{i=1}^n k_i G_i
$$
on an elliptic curve, we can use the following window-based method with precomputation.
**Precomputation Phase**
Precompute all possible subgroup sums of the form
$$
\sum_{i=1}^n b_i G_i \quad \text{for all} \quad b_i \in \{0,1\}.
$$
This results in $2^n$ precomputed points.
**Evaluation Algorithm**
- State Initialize the accumulator point $R = \mathcal{O}$ (the point at infinity)
- Foreach bit position $j$ from MSB to LSB of scalars $k_i$
-- $R \leftarrow 2R$ (Point doubling)
-- Let $b_{i,j}$ denote the $j$-th bit of scalar $k_i$, $R \leftarrow R + \sum_{i=1}^n b_{i,j} G_i$ (Add precomputed sum)
**Complexity Analysis**
The total cost of the algorithm is:
$$
\text{MSM}(n, N) = (2^n - n - 1 + N - 1) \cdot \text{Add} + (N - 1) \cdot \text{Dbl}
$$ , where $\textbf{Add}$ denotes an elliptic curve point addition and $\textbf{Dbl}$ a point doubling.
This approach reduces the number of curve operations by leveraging precomputation and processing all scalars in a bit-sliced manner.
## Usage in DV-SNARK Verifier
**1. Description**
Designated verifier needs to verify:
$$P = \left( a + \left( a^2 + i(\alpha) \right) \delta \right) G + \left( \tau \epsilon - \alpha \epsilon \right) Q \Leftrightarrow P = [k_1]G + [k_2]Q \quad (1)$$
In the implementation, we denote $u_0 = k_1, v_0 = k_2$.
We can use hinted double scalar multiplication to decomposes the $MSM(2, N)$ into $MSM(3, 2N/3)$.
It is equivalent to theses checks:
- $[x_1]G + [x_2]Q - [z]P = 0$
- $k_1 = x_1/z \mod r$
- $k_2 = x_2/z \mod r$
for scalars $x_1,x_2,z$ bounded by $1.22r^{2/3}$.
**2. Prover**
Prover first sends 2 commitments: $Q, P$ to verifier, then receives the trapdoor values to compute $k_1, k_2$. Then, prover uses GLV reduction to find $x_1, x_2, z$ such that:
$|x_1|, |x_2|, |z| < 2^{2N/3 + 1}$
, where $N= 232$ is the number of bits in the field in our case.
Because the decomposition can find $x1, x2, z$ which may $\le 0$, so we need to caucluate the negation of them if necessary.
**3. Verifier**
Instead of verifying $[k_1]G + [k_2]Q$, the Verifier verifies
- $[x_1]G + [x_2]Q - [z]P = 0$
, where each of $|x_1|, |x_2|, |z|$ has $156$ bits.
- $k_1 = x_1/z \mod r \quad(3)$
- check: $k_2 = x_2/z \mod r \quad(4)$
Note that the `hinted_double_scalar_mul` can compute: $[a]P_1 + [b]P_2 + [c]P_3$ where $0 < a, b, c < 2^{156}$, so if $x_1, x_2 <0$ or $z > 0$, we need to first negates them.
## Test Benchmark
**Tau-adic window scalar multiplication**
We can also efficiently compute the scalar multiplication using the Frobenius endomorphism map to efficiently compute $\tau(P)$, and combining with the $\tau$-adic representation. It enables use to process multiple bits in batch (using a window size of $\omega = 5$) with a single point addition, completely eliminating the need for point doubling operations. (see [implementation for efficient arithmetic on Koblitz curves](https://github.com/VanhGer/dv-pari-circuit/blob/3ff4d6b86660e879f04ab743752d2d19af4a7b06/src/curve_scalar_mul_ckt.rs#L1) for more details.)
Although this approach doubles the bit-length (from 232 to 470), it significantly reduces the number of point additions required (470/5 = 94). Together with the precomputation of batched bits and Frobenius transform, we derive the number of gates:
- and variants: 8,796,030
- xor variants: 2,365,188,084
- total: 2,373,984,114
**Hinted double scalar multiplication**
Using the decomposition functionality provided by the codebase at [decomposition for large intergers](https://github.com/yelhousni/scalarmul-in-snark/blob/main/sage/decompose.py), we derive the gate number of using decomposition methed as:
- and variants: 6,707,171
- xor variants: 3,370,130,781
- total: 3,376,837,952
The gates involved in computing the MSM $[x_1]G + [x_2]Q - [z]P$ are:
- and variants: 5,635,105
- xor variants: 3,364,141,160
- total: 3,369,776,265
**Comparasion**
Hinted double scalar multiplication reduce the total non-free gates from 8.8M to 6.7M (about $24\%$ reduction).