ZeroGC: A Garbled Circuit Scheme for BitVM3 with Zero Ciphertext per Gate

# ZeroGC: A Garbled Circuit Scheme for BitVM3 with Zero Ciphertext per Gate The primary goal of [BitVM3](https://bitvm.org/) is to compile a garbled circuit scheme that minimizes communication complexity between the operator and watcher in the Bitcoin network. In the BitVM3 framework, the garbler (operator) may share wire values with evaluators (challengers), provided that evaluators cannot compute both the 0-label and 1-label for any wire. Consequently, the privacy property required in classical garbled circuit constructions can be relaxed, opening significant opportunities for efficient design. This relaxation is known in the literature as privacy-free garbled circuits. In this post, we introduce **ZeroGC**, a new garbled circuit scheme for arbitrary Boolean circuits that achieves **zero ciphertext per gate**. While this property was previously realized in the elegant construction of [Kondi et al., 2017](https://eprint.iacr.org/2017/561), their approach was restricted to formulas (i.e., circuits with fan-out one). ZeroGC overcomes this limitation by employing a lattice-based method to extend zero-ciphertext garbling to general circuits. Building on ideas from [BitGC](https://eprint.iacr.org/2024/1988), ZeroGC eliminates the need for information transfer from the garbler to the evaluator by allowing the evaluator to derive the least significant bit of the 0-label during the evaluation algorithm. This approach also dispenses with the need for expensive pseudorandom generators under ciphertexts, making ZeroGC substantially faster than BitGC. Our performance analysis shows that ZeroGC reduces garbler-evaluator communication to **several megabytes** for SNARK circuits of arbitrary size. With engineering optimizations, the evaluation of circuits with up to **10 billion gates** can be completed **within a day** on a personal computer. ZeroGC leverages the Somewhat Homomorphic Encryption (SWHE) with distributed decryption as proposed in BitGC. We begin by recalling the SWHE with distributed decryption, then describe the ZeroGC scheme in detail. Next, we analyze the correctness of ZeroGC and present a standard reduction to argue its security. Finally, we provide a performance analysis, demonstrating the practicality of ZeroGC for BitVM3. ## SWHE with Distributed Decryption > More details of this section can be found in the BitGC paper. An SWHE with distributed decryption scheme must satisfy the following properties of message homomorphism and distributed decryption, which are informally defined below. **Message homomorphism.** Given the ciphertexts $\tau_1, \ldots, \tau_\ell$ on messages $m_1, \ldots, m_\ell$ and a low-depth circuit $f$, the following property holds: $$ \mathrm{Dec}\left(sk, \widetilde{f}(\tau_1, \ldots, \tau_\ell)\right) = f(m_1, \ldots, m_\ell) \cdot sk, $$ where $\widetilde{f}$ represents the homomorphic computation of $f$ over the ciphertexts and can be performed in polynomial time. **Distributed decryption.** For a ciphertext $\tau$ on a message $m$, the following properties hold: 1. Linear distributed decryption. Let $sk_0 \in \mathcal{R}_p$ be a uniform element such that $\mathrm{LSB}(sk_0) = 0$, and set $sk_1 = sk_0 + sk \bmod p$, where $\mathrm{LSB}(sk_1) = 1$ for an even $p$. Then, with overwhelming probability, we have $$ \begin{align} \mathrm{Dec}(sk_i, \tau) &= -\mathrm{Dec}(sk_i, -\tau)\ \text{for } i \in \{0, 1\}, \\ \mathrm{Dec}(sk_1, \tau) &= \mathrm{Dec}(sk_0, \tau) + \mathrm{Dec}(sk, \tau). \end{align} $$ The above two equations also imply $\mathrm{Dec}(sk, \tau) = -\mathrm{Dec}(sk, -\tau)$. 2. Correlated-key distributed decryption. Let $sk_0, sk_0' \in \mathcal{R}_p$ be two uniform elements such that $\mathrm{LSB}(sk_0) = \mathrm{LSB}(sk_0') = 0$. Let $sk_1 = sk_0 + sk \bmod p$ and $sk_1' = sk_0' + sk \bmod p$, where $\mathrm{LSB}(sk_1) = \mathrm{LSB}(sk_1') = 1$ given that $p$ is even. There exists a polynomial-time algorithm $\widehat{\mathrm{Dec}}$ such that with overwhelming probability, $$ \begin{align} \widehat{\mathrm{Dec}}(sk_0, sk_0', \tau) &= -\widehat{\mathrm{Dec}}(sk_0, sk_0', -\tau) \\ \widehat{\mathrm{Dec}}(sk_i, sk_j', \tau) - \widehat{\mathrm{Dec}}(sk_0, sk_0', \tau) &= i \cdot j \cdot \mathrm{Dec}(sk, \tau),\quad \text{for any } i, j \in \{0, 1\}. \end{align} $$ ## The Scheme Given security parameter $\lambda$ and the computation circuit $f$, the garbler executes the garble algorithm $\text{Garble}(1^\lambda, f)$ as follows: 1. Run $(\text{params}, \Delta) \leftarrow \text{Gen}(1^\lambda, 1^L)$ with $\text{LSB}(\Delta) = 1$, where $\Delta$ is the secret key. Randomly sample $K \leftarrow \{0,1\}^\lambda$ as a PRF key. 2. Run $E_0 = \textsf{Enc}(\Delta, 0)$ and $E_1 = \textsf{Enc}(\Delta, 1)$, which encrypt 0 and 1 separately. 3. For each $i \in \text{Inputs}(f)$, randomly sample a 0-label $W^0_i \leftarrow \mathcal{R}_p$, where $\mathcal{R}_p =\mathbb{Z}_p[X]/(X^n+1)$, set $\pi_i := \text{LSB}(W^0_i)$. 4. For each gate $g \in \text{GateIndex}(f)$ in topological order, set $(\alpha, \beta) \leftarrow \text{Gateinputs}(f,g)$ as the input index. The 1-label $W^1_\alpha := W^0_\alpha + (-1)^{\pi_\alpha} \cdot \Delta \in \mathcal{R}_p$ and $W^1_\beta := W^0_\beta + (-1)^{\pi_\beta} \cdot \Delta \in \mathcal{R}_p$ are under the relationship $W^{\pi_\alpha}_\alpha = W^0_\alpha - \pi_\alpha \cdot \Delta \in \mathcal{R}_p$ and $W^{\pi_\beta}_\beta = W^0_\beta - \pi_\beta \cdot \Delta \in \mathcal{R}_p$, then do the following: (a) For each $i, j \in \{0,1\}$, compute $z_{g, i, j} = G(g, \pi_\alpha \oplus i, \pi_\beta \oplus j)$ , where $G$ is the evaluation function of gate $g$, and homomorphically compute $\tilde{\tau}_{g,i,j} = G(g, \pi_\alpha \oplus E_i, \pi_\beta \oplus E_j)$, resulting in the ciphertext $\tau_{g,i,j}$ that encrypts $z_{g,i,j}$. (b) Homomorphically compute HE ciphertexts $\tilde{\tau}_{g,1} := \tilde{\tau}_{g,1,0} - \tilde{\tau}_{g,0,0}, \tilde{\tau}_{g,2} := \tilde{\tau}_{g,0,1} - \tilde{\tau}_{g,0,0}, \tilde{\tau}_{g,3} := \tilde{\tau}_{g,0,0} + \tilde{\tau}_{g,1,1} - \tilde{\tau}_{g,1,0} - \tilde{\tau}_{g,0,1}$. $c$ Run $\tilde{\mathcal{L}}_{g,0,0} \leftarrow \text{Dec}(W^{\pi_\alpha}_\alpha, \tilde{\tau}_{g,1}) + \text{Dec}(W^{\pi_\beta}_\beta, \tilde{\tau}_{g,2}) + \widehat{\text{Dec}}(W^{\pi_\alpha}_\alpha, W^{\pi_\beta}_\beta, \tilde{\tau}_{g,3}) + \text{PRF}(K, g)$. Then, compute $\pi_g := \text{LSB}(\tilde{\mathcal{L}}_{g,0,0}) \oplus z_{g,0,0}$. (d) Compute $\tau_{g,1} := (-1)^{\pi_g} \cdot \tilde{\tau}_{g,1}, \tau_{g,2} := (-1)^{\pi_g} \cdot \tilde{\tau}_{g,2}$ and $\tau_{g,3} := (-1)^{\pi_g} \cdot \tilde{\tau}_{g,3}$, and $\mathcal{L}_{g,0,0} \leftarrow \text{Dec}(W^0_\alpha, \tau_{g,1}) + \text{Dec}(W^0_\beta, \tau_{g,2}) + \widehat{\text{Dec}}(W^0_\alpha, W^0_\beta, \tau_{g,3}) + \text{PRF}(K, g)$. (e) Compute the 0-label on the output wire $W^0_g = \mathcal{L}_{g,0,0} - (-1)^{\pi_g} \cdot z_{g,0,0} \cdot \Delta$. 5. Output a garbled circuit $F = (\text{params}, K, \{\pi_i\}_{i \in \text{Inputs}(f)}, E_0, E_1)$ and encoding information $e = (\{W^0_i\}_{i \in \text{Inputs}(f)}, \Delta)$. > In the BitVM3 setting, the $\text{Garble}(1^\lambda, f)$ algorithm is executed by the operator during setup and outputs the commitment of the final labels of the output wire. The correctness of this algorithm is guaranteed by the cut-and-choose method. $\text{Encode}(e,x)$: The garbler computes $\pi_i := \text{LSB}(W^0_i)$ and $X_i := W^0_i + (-1)^{\pi_i} \cdot x_i \cdot \Delta$ for each $i \in \text{Inputs}(f)$ and outputs $X = \{X_i\}_{i \in \text{Inputs}(f)}$. > In the BitVM3 setting, the $\text{Encode}(e,x)$ algorithm is used by the operator during the assert stage, i.e., when a challenger submits an on-chain challenge transaction. $\text{Eval}(X)$: The evaluator receives $\text{params}, E_0, E_1, K, \{\pi_i\}_{i \in \text{Inputs}(f)},$ and input wire labels $X = \{X_i\}_{i \in \text{Inputs}(f)}$. Additionally, the evaluator could learn each wire value $\{m_i\}_{i \in \text{WireIndex}}$ through running the circuit locally. The evaluator performs the following steps: 1. For each wire index $i \in \text{Inputs}(f)$, set $W_i := X_i$. 2. For each gate $g \in \text{GateIndex}(f)$ in topological order, set $(\alpha, \beta) \leftarrow \text{Gateinputs}(f, g)$ as the input index, and then do the following: (a) For each $i, j \in \{0, 1\}$, homomorphically compute $\tilde{\tau}_{g,i,j} = G(g, \pi_\alpha \oplus E_i, \pi_\beta \oplus E_j)$. (b) Homomorphically compute HE ciphertexts $\tilde{\tau}_{g,1} := \tilde{\tau}_{g,1,0} - \tilde{\tau}_{g,0,0}, \tilde{\tau}_{g,2} := \tilde{\tau}_{g,0,1} - \tilde{\tau}_{g,0,0}, \tilde{\tau}_{g,3} := \tilde{\tau}_{g,0,0} + \tilde{\tau}_{g,1,1} - \tilde{\tau}_{g,1,0} - \tilde{\tau}_{g,0,1}$. $c$ Run $\tilde{W_g} \leftarrow \text{Dec}(W_\alpha, \tilde{\tau}_{g,1}) + \text{Dec}(W_\beta, \tilde{\tau}_{g,2}) + \widehat{\text{Dec}}(W_\alpha, W_\beta, \tilde{\tau}_{g,3}) + \text{PRF}(K, g)$. $\pi_g = \text{LSB}(\tilde{W_g}) \oplus m_g$, where $m_g$ is the value of the output wire. (d) Compute $\tau_{g,1} := (-1)^{\pi_g} \cdot \tilde{\tau}_{g,1}, \tau_{g,2} := (-1)^{\pi_g} \cdot \tilde{\tau}_{g,2}$ and $\tau_{g,3} := (-1)^{\pi_g} \cdot \tilde{\tau}_{g,3}$. Run $W_g \leftarrow \text{Dec}(W_\alpha, \tau_{g,1}) + \text{Dec}(W_\beta, \tau_{g,2}) + \widehat{\text{Dec}}(W_\alpha, W_\beta, \tau_{g,3}) + \text{PRF}(K, g)$. 3. Output $Y = \{W_i\}_{i \in \text{Outputs}(f)}$. > In BitVM3, $\text{Eval}(X)$ is executed by the challenger during the challenge stage. The output label commitment is set in the slash script during setup and is spendable only if the challenger computes the 0-label. ## Correctness Analysis In this section, we analyze the correctness of our scheme. Specifically, we show that for any input value $i, j \in \{0, 1\}$ and gate $g$, only two labels, 0-label $W^0$ and 1-label $W^1$, could be produced, and the following equation always holds: $$\tag1 W^1 = W^0 + (-1)^\pi \cdot \Delta $$ , where $\pi = \text{LSB}(W^0)$. The garbler algorithm trivially satisfies this property by construction. For the evaluator, the equation (1) is satisfied for input wires by definition. We prove by induction that if equation (1) holds for the input wires of every gate, then only two labels are produced on the output wire, and equation (1) continues to hold for the output wire as well. To analyze this, we rewrite the Line 2(d) of the $\text{Eval(X)}$ algorithm for any input $i, j \in \{0, 1\}$: $$ \begin{align} W_g^{z_{g, i\oplus\pi_\alpha, j\oplus\pi_\alpha}} &= \text{Dec}(W_\alpha^i, \tau_{g,1}) + \text{Dec}(W_\beta^j, \tau_{g,2}) + \widehat{\text{Dec}}(W_\alpha^i, W_\beta^j, \tau_{g,3}) + \text{PRF}(K, g) \\ \tag2 &= \text{Dec}(W_\alpha^{\pi_\alpha} + (i\oplus\pi_\alpha)\cdot\Delta, \tau_{g,1}) + \text{Dec}(W_\beta^{\pi_\beta} + (j\oplus\pi_\beta)\cdot\Delta, \tau_{g,2}) + \\ &\quad\quad \widehat{\text{Dec}}(W_\alpha^{\pi_\alpha} + (i\oplus\pi_\alpha)\cdot\Delta, W_\beta^{\pi_\beta} + (j\oplus\pi_\beta)\cdot\Delta, \tau_{g,3}) + \text{PRF}(K, g) \\ \end{align} \\ \\ $$ Equation (2) follows from the definition and the inductive assumption that equation (1) holds for the input wires. For example, if $i=1, \pi_\alpha=0$, then $W_\alpha^1 = W_\alpha^0 + \Delta$. The same logic applies for other cases. Now, consider two cases: - If $i\oplus\pi_\alpha =0$ and $j\oplus\pi_\beta = 0$, then $$ W_g^{z_{g, 0, 0}} = \text{Dec}(W_\alpha^{\pi_\alpha}, \tau_{g,1}) + \text{Dec}(W_\beta^{\pi_\beta}, \tau_{g,2}) + \widehat{\text{Dec}}(W_\alpha^{\pi_\alpha}, W_\beta^{\pi_\beta}, \tau_{g,3}) + \text{PRF}(K, g) $$. - If $i\oplus\pi_\alpha =1$ and $j\oplus\pi_\beta = 0$, then $$ \begin{align} W_g^{z_{g, 1, 0}} &= \text{Dec}(W_\alpha^{\pi_\alpha}+\Delta, \tau_{g,1}) + \text{Dec}(W_\beta^{\pi_\beta}, \tau_{g,2}) + \widehat{\text{Dec}}(W_\alpha^{\pi_\alpha}, W_\beta^{\pi_\beta}, \tau_{g,3}) + \text{PRF}(K, g) \\ \tag3 &= \text{Dec}(\Delta, \tau_{g,1}) + W_g^{z_{g, 0, 0}} \\ \tag4 &= \text{Dec}(\Delta, (-1)^{\pi_g}(\tilde{\tau}_{g,1,0} - \tilde{\tau}_{g,0,0})) + W_g^{z_{g, 0, 0}} \\ \tag5 &= (-1)^{\pi_g}(z_{g,1,0}-{z_{g,0,0}}) \cdot \Delta + W_g^{z_{g, 0, 0}} \\ \end{align} $$ . Equation (3) holds by the distributed decryption property of the SWHE scheme. Equations (4) and (5) follow by substituting the definition of $\tau_{g,1}$. For any $z_{g,1,0}, z_{g,0,0} \in \{0, 1\}$, it is straightforward to check that the difference between $W_g^{z_{g,1,0}}$ and $W_g^{z_{g,0,0}}$ is always either $\Delta$ or $0$ and the equation (1) holds for $z_{g,1,0}, z_{g,0,0}$. To conserve space, proofs for the remaining two cases are left to the reader. ## Security Argument To ensure the security of BitVM3, the garbled circuit (GC) must satisfy two key properties. First, the operator must be unable to modify the input labels after the setup stage. This is trivially guaranteed by committing to the labels within the assert script. Second, the evaluator must not learn both the 0-label ($W_0$) and the 1-label ($W_1$) for any wire $W$; otherwise, the evaluator could slash a correct input. We demonstrate that our scheme satisfies the second property via a standard reduction argument. Suppose, for contradiction, that an efficient adversary $\text{Adv}$ exists that can recover both $W_0$ and $W_1$ for any wire $W$. We show that this would imply an efficient algorithm for breaking the GLWE (Generalized Learning With Errors) assumption. Specifically, we construct a reduction algorithm $R$ that, given $(E_0, E_1)$, outputs $\Delta$. From the perspective of $\text{Adv}$, the distributions of auxiliary variables and those in the real protocol are identical or computationally indistinguishable. Algorithm $R$ proceeds as follows: 1. Randomly select $X_{i \in \text{Inputs}(f)} \leftarrow \mathcal{R}_q$, where the $R_q$ is ring polynomial space in the RLWE setting, and let $X = \{X_i\}_{i \in \text{Inputs}(f)}$. 2. Randomly select $m_{i \in \text{Inputs}(f)} \leftarrow \{0, 1\}$ as the input of the garbled circuit. 3. For $i \in \text{Inputs}(f)$, set $\pi_i = \text{LSB}(X_i) \oplus m_i$. Randomly select $K \leftarrow \{0, 1\}^\lambda$. 4. Run $(W_0, W_1) \leftarrow \text{Adv}(E_0, E_1, K, \{\pi_i\}_{i \in \text{Inputs}(f)}, X)$. 5. Output $\Delta = (1-2*\text{LSB}(W_0))(W_1 - W_0)$. If $\text{Adv}$ can efficiently recover both $W_0$ and $W_1$, then $R$ can extract the secret $\Delta$, thus breaking the GLWE assumption. Therefore, under the hardness of GLWE, no efficient evaluator can obtain both labels for any wire, thereby satisfying the required security property. ## Performance Analysis In this section, we analyze the computational and communication costs of the $\text{Garble}(1^\lambda, f)$ and $\text{Eval}(X)$ algorithms to demonstrate the practicality of our scheme. We consider a realistic scenario in which the SNARK verifier circuit comprises 10 billion gates, with a proof size of 128 bytes and a 20-byte public input. For parameter selection, we use the following definition with 80-bit security showing by [lattice-estimator](https://github.com/malb/lattice-estimator), as adopted by the Fast Fully Homomorphic Encryption Library [TFHE](https://github.com/tfhe/tfhe). According to the construction in the BitGC paper, the matrix dimension is set to $N = 2\ell = 2(\log(q) + 1) = 514$. ``` LWEParameters( n=8192, q=2**256, Xs=ND.DiscreteGaussian(1024.0), Xe=ND.DiscreteGaussian(1024.0), tag="ZeroGC", ) ``` **Communication Cost** The primary communication overhead comes from transmitting $E_0$ and $E_1$, each costing $N^2 \times n = 270$ MB, for a total of approximately $540$ MB. Additionally, each input wire of the circuit requires one bit to indicate the least significant bit (LSB) of the 0-label, and a random key $K$ is included. These items are negligible, so the overall communication cost remains about $540$ MB. Note that this communication cost is independent of the circuit size. If we do not rely on homomorphic multiplication, only one row of the ciphertext is needed, which provides a significant opportunity to reduce communication. However, the SWHE with distributed decryption scheme derived from BitGC requires transmitting GSW–eGSW ciphertexts, incurring roughly 10 GB of public data. We can avoid this by directly sending the switched form of the ciphertext $\tilde{\tau}_{g,i}$, where $i \in \{1,2,3\}$. Each element in $\mathbb{R}_q$ occupies $n \times \log q = 260$ KB. For both $E_0$ and $E_1$, we need 4 elements for correlated-key decryption and 2 elements for standard decryption. There are an additional 4 elements for the key-switching ciphertexts used by correlated-key decryption. In summary, the total communication is $(2 \times (4 + 2) + 4) \times n \times \log q = 4.1$ MB. **Computation Cost** We now analyze the computational complexity. The ciphertext $\tilde{\tau}_{g,i,j}$, $\tilde{\tau}_{g,k}$, where $i, j \in \{0,1\}, k\in \{1,2,3\}$, depends only on the ciphertext $E_0$, $E_1$ and gate type, making these ciphertext amenable to pre-computation rather than per-gate computation. Thus, the dominant computational cost is attributed to the $\text{Dec}$ and $\widehat{\text{Dec}}$ algorithms. Both the garbler and evaluator execute these algorithms at the same frequency: each party runs $\text{Dec}$ four times and $\widehat{\text{Dec}}$ two times per gate. The $\text{Dec}$ algorithm for the SWHE scheme consists of one polynomial multiplication. The $\widehat{\text{Dec}}$ algorithm comprises a maximum of four polynomial multiplications and two key-switching operations (each involving one polynomial multiplication), for a total of six polynomial multiplications in $\mathbb{Z}_q[X]/(X^n+1)$ per invocation. Let $T_q$ denote the time required for a polynomial multiplication in $\mathbb{Z}_q[X]/(X^n+1)$ on a single CPU core. The total per-gate computation time is therefore $4 \times T_q + 2 \times 6 \times T_q$. Using empirical benchmarks (Table 2) from a [high-performance library](https://link.springer.com/chapter/10.1007/978-3-319-29485-8_20) on an Intel Core i7-4650U (2013 MacBook Air), we set $T_q = 31.7\,\mu\text{s}$. The resulting computation time per gate is approximately $500 \mu\text{s}$ using a single CPU core. For a target circuit with 10 billion gates, the total computation time is roughly $56$ days. ### Practical Acceleration Significantly, leveraging GPU acceleration can dramatically speed up computation. Empirical results indicate that a 100x increase in performance is readily achievable, enabling the entire process to finish within a single day. For example, our [NTT-form polynomial multiplication benchmark](https://github.com/wz14/CGBN/blob/zerogc_benchmark/perf_tests/native_256mul.cu)—which multiplies elements in $\mathbb{Z}_q$—was run on a single RTX4090 GPU. The benchmark demonstrates an average cost of just $0.012$ nanoseconds per multiplication, leading to a total time $T_q = n \times 0.012 \text{ ns} = 98 \text{ ns}$, and an overall computation time of approximately 4.3 hours. The primary bottleneck in this computation is the read/write bandwidth of GPU memory. Therefore, further acceleration may be achieved by utilizing GPUs with higher bandwidth, such as the A100 or H100 series.