# Implementing Lookups in Groth: The second secret ingredient to reduce Bionetta’s constraints
[Illia Dovhopolyi](https://www.linkedin.com/in/illia-d-6118481b4/), [Artem Sdobnov](https://www.linkedin.com/in/artem-sdobnov-824768345/), [Dmytro Zakharov](https://x.com/ZamDmytro)
## Acknowledgement
The UltraGroth proving system explained here is a version of the protocol initially described in [Merlin404's article on HackMD](https://hackmd.io/@Merlin404/Hy_O2Gi-h), adapted for real-world use cases. Huge credit to their work, which inspired much of what follows.
## Prerequisites
We assume you already understand the basics of modern zero-knowledge (ZK) constructions. This article aims to build intuition around the UltraGroth proving system, and deep, formal knowledge of Groth16 isn't strictly necessary. However, familiarity with our previous blogs on the Bionetta framework will be beneficial, as examples here utilize Bionetta circuits.
## **Activation Functions are too expensive**
During the development of Bionetta, one of our most significant challenges was optimizing the constraint cost of activation functions like *ReLU*. Each *ReLU* operation costs at least $253$ constraints due to the bit decomposition of input signals. Since neural networks rely heavily on non-linear activations, this quickly becomes prohibitively expensive. As an example, consider our `LeakyReLU` implementation:
```rust
template LeakyReLUwithCutPrecision(...) {
    signal input inp;
    signal output out;

    var absBits[253] = abs_in_bits(inp);
    signal absInBits[253];

    var abs = 0;
    var pow = 1;
    for (var i = 0; i < 253; i++) {
        absInBits[i] <-- absBits[i];
        // We need to constrain that each bit is either 0 or 1 separately :(
        absInBits[i] * (1 - absInBits[i]) === 0;

        abs += pow * absInBits[i];
        pow += pow;
    }

    (abs - inp) * (abs + inp) === 0;
    // 3 more constraints for max(0, inp) and precision cut
}
```
Our "zkLiveness" model, which detects whether the input photo corresponds to a real person (and not, say, to a photo shown on someone's device), requires $3864$ non-linear activations, totaling at least $n \times b = 3864 \times 253 = 977592$ constraints.
Initially, reducing constraints here seemed impossible: we couldn't lower the activation count or simplify the bit checks. Further research showed that a prominent solution comes from PlonKish proving systems, where repeated computation of the same function is verified using lookup tables.
However, we’re sticking with Groth16 because of its other Bionetta-specific benefits like [“zero” constraint linear layers](https://hackmd.io/@rarimo/ByZ6ij2blx). To get the best of both worlds, we developed UltraGroth, a Groth16-based proving system that brings lookups to our circuits😎
## **Efficient Lookup Argument**
The idea behind lookup tables is straightforward. Rather than checking each bit individually, we can split activation function inputs into larger chunks and verify them against a predefined lookup table containing all values from $0$ to $2^\text{chunk_size} - 1$.
To do this, let’s first rewrite the `LeakyReLU` function from the previous example so that it splits the input signal into chunks and exposes them as outputs for a later range check:
```rust
template LookupReLU(CHUNK_NEW, CHUNK_OLD, CHUNK_SIZE) {
    assert(CHUNK_NEW <= CHUNK_OLD);

    signal input inp;
    signal output out;
    // Now the ReLU outputs chunks so we can handle the range check later
    signal output absInChunks[253 \ CHUNK_SIZE];

    // Recombine the bits of |inp| into CHUNK_SIZE-bit chunks
    var absBits[253] = abs_in_bits(inp);
    var absChunks[253 \ CHUNK_SIZE];
    for (var i = 0; i < 253 \ CHUNK_SIZE; i++) {
        absChunks[i] = 0;
        for (var j = 0; j < CHUNK_SIZE; j++) {
            absChunks[i] += absBits[i * CHUNK_SIZE + j] * (1 << j);
        }
    }

    var abs = 0;
    var pow = 1;
    for (var i = 0; i < 253 \ CHUNK_SIZE; i++) {
        // We no longer need to constrain each bit separately
        absInChunks[i] <-- absChunks[i];

        abs += pow * absInChunks[i];
        pow *= 1 << CHUNK_SIZE;
    }

    (abs - inp) * (abs + inp) === 0;
    // 3 more constraints for max(0, inp) and precision cut
}
```
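To build intuition outside the circuit, here is a minimal Python sketch of the same decomposition: splitting a value into 15-bit little-endian chunks and recombining them. The helper names and example value are ours, not part of Bionetta.

```python
# Out-of-circuit sketch of the chunk decomposition used by LookupReLU.
CHUNK_SIZE = 15
NUM_CHUNKS = 253 // CHUNK_SIZE  # mirrors 253 \ CHUNK_SIZE in circom

def to_chunks(value: int) -> list[int]:
    """Split a non-negative integer into little-endian CHUNK_SIZE-bit chunks."""
    mask = (1 << CHUNK_SIZE) - 1
    return [(value >> (i * CHUNK_SIZE)) & mask for i in range(NUM_CHUNKS)]

def from_chunks(chunks: list[int]) -> int:
    """Recombine chunks: abs = sum_i chunks[i] * 2^(i * CHUNK_SIZE)."""
    return sum(c << (i * CHUNK_SIZE) for i, c in enumerate(chunks))

value = 123_456_789
chunks = to_chunks(value)
assert all(0 <= c < (1 << CHUNK_SIZE) for c in chunks)  # each chunk in range
assert from_chunks(chunks) == value                     # decomposition is lossless
```

Each chunk is guaranteed to lie in $[0; 2^{15} - 1]$ by construction here; inside the circuit this is exactly the property the lookup argument must enforce.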
In the top-level circuit, we aggregate the chunks from all layers with non-linear activations into a single signal vector. Each activation produces `CHUNK_FOR_NUM` chunks, but their order doesn't matter here:
```rust
var CHUNK_SIZE = 15;
var CHUNK_FOR_NUM = 253 \ CHUNK_SIZE; // integer division: chunks per activation
var CHUNK_NUM = /* sum calculated based on network layers */;
var CHUNK_BUSY = 0;
signal chunks[CHUNK_NUM];

// Some layer with non-linearities
component layer1Comp = LookupEDLightConv2D(...);
layer1Comp.inp <== image;

// Aggregating chunks from activation functions
for (var i = 0; i < layer_size * CHUNK_FOR_NUM; i++) {
    chunks[i + CHUNK_BUSY] <== layer1Comp.chunks[i];
}
CHUNK_BUSY += layer_size * CHUNK_FOR_NUM;
```
To complete this optimization, we must ensure each aggregated chunk falls within the expected range. This verification is done through a lookup argument.
Consider a lookup table $L = (\ell_0, \ell_1, \ldots,\ell_n) = (0, 1, \dots, 2^{\text{chunk_size}} - 1)$ representing the range $[0; 2^{\text{chunk_size}} - 1]$. Our goal is to prove that each chunk $c_i$ in the vector $C = (c_0, c_1, \ldots, c_m)$ exists in $L$; this implies that all chunk values fall within the specified range.
To constrain this, we first count how many times each value appears in the chunks, creating a multiplicity vector $M = (m_0, m_1, \ldots, m_n)$, where $m_i$ is the occurrence count of value $i$ in $C$.
Finally, using a random challenge point $\mathsf{rand}$, we perform the following check:
$$
s_1 =\sum_{i=0}^{m} \frac{1}{\mathsf{rand} - c_i}, \quad s_2 = \sum_{i=0}^{n} \frac{m_i}{\mathsf{rand} - \ell_i}, \quad s_1 \stackrel{?}{=} s_2.
$$
If this equality holds, all chunk values are confirmed to be within the lookup table. Detailed reasoning and proof are available in [this paper](https://eprint.iacr.org/2022/1530.pdf).
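Here is a minimal Python sketch of this check with toy parameters (4-bit chunks, and a prime and challenge value chosen by us purely for illustration). It passes when every chunk lies in the table and fails when one falls outside:

```python
# Sketch of the fractional-sum lookup check over a prime field.
from collections import Counter

p = 2**255 - 19  # any large prime works for the demo
CHUNK_SIZE = 4
table = list(range(1 << CHUNK_SIZE))  # L = (0, 1, ..., 2^CHUNK_SIZE - 1)

def lookup_check(chunks, rand):
    m = Counter(chunks)  # multiplicity vector M
    # s1 = sum over chunks of 1 / (rand - c_i)
    s1 = sum(pow((rand - c) % p, -1, p) for c in chunks) % p
    # s2 = sum over table of m_i / (rand - l_i)
    s2 = sum(m[l] * pow((rand - l) % p, -1, p) for l in table) % p
    return s1 == s2

rand = 0x1234567890ABCDEF  # stands in for the derived challenge
assert lookup_check([3, 7, 7, 15, 0], rand)  # all chunks in range -> passes
assert not lookup_check([3, 16], rand)       # 16 is outside the table -> fails
```

The out-of-range case fails because the term $\frac{1}{\mathsf{rand} - 16}$ appears in $s_1$ but can never be matched by $s_2$, which only sums over table entries.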
Here is how this check looks inside the *Bionetta* circuit (the code uses $c_i + \text{rand}$ in place of $\text{rand} - c_i$; the argument is unaffected by this sign choice):
```rust
signal input rand;

signal chunks[CHUNK_NUM];
// Aggregating chunks

// Counting m_i coefficients
var m_temp[1 << CHUNK_SIZE];
for (var i = 0; i < CHUNK_NUM; i++) {
    m_temp[chunks[i]]++;
}
signal m[1 << CHUNK_SIZE];
m <-- m_temp;

var s1 = 0;
var s2 = 0;
signal inv1[CHUNK_NUM];
signal inv2[1 << CHUNK_SIZE];
signal prod[1 << CHUNK_SIZE];

// Counting s1
for (var i = 0; i < CHUNK_NUM; i++) {
    inv1[i] <-- (chunks[i] + rand) != 0 ? 1 / (chunks[i] + rand) : 0;
    inv1[i] * (chunks[i] + rand) === 1;
    s1 += inv1[i];
}

// Counting s2
for (var i = 0; i < 1 << CHUNK_SIZE; i++) {
    inv2[i] <-- (i + rand) != 0 ? 1 / (i + rand) : 0;
    inv2[i] * (i + rand) === 1;
    prod[i] <== inv2[i] * m[i];
    s2 += prod[i];
}

s1 === s2;
```
## Why UltraGroth?
The lookup argument introduced above hinges on a single field element $\text{rand}$. If the prover were free to choose $\text{rand}$, it could maliciously pick a value that makes $\sum_{i=0}^{m} \frac{1}{\text{rand} - c_i} \stackrel{?}{=} \sum_{i=0}^{n} \frac{m_i}{\text{rand} - \ell_i}$ hold even when some chunk $c_i$ is outside the table $L$.
UltraGroth prevents this by deriving the challenge from a previously committed part of the witness. We split the private witness into two rounds:
| **Round** | **Content** | **Purpose** |
| --- | --- | --- |
| round 1 indexes ($C_1$) | all chunk values $c_i$ (and other data independent of $\text{rand}$) | commitment used to bind randomness |
| round 2 indexes ($C_2$) | the remaining witness ($\text{rand}$, `inv1`, `inv2`) | lookup argument round |
After the prover computes the first commitment $C_1$, the challenge is derived as $\text{rand} = \text{Hash}(C_1)$; both parties can compute it, but the prover cannot choose it.
In Groth16, the verifier checks the proof using a single pairing equation:
$$
e(A, B) = e([\alpha]_1, [\beta]_2) \cdot e(C, [\delta]_2) \cdot e(IC, [\gamma]_2)
$$
In UltraGroth, this equation changes slightly. Instead of a single commitment $C$, the prover sends two commitments: $C_1$ and $C_2$. The verifier updates the pairing check to:
$$
e(A, B) = e([\alpha]_1, [\beta]_2) \cdot e(C_1, [\delta_1]_2) \cdot e(C_2, [\delta_2]_2) \cdot e(IC, [\gamma]_2)
$$
Additionally, the verifier does not accept a random value from the prover. Instead, it derives the randomness on its own: $\text{rand}=\text{Hash}(C_1)$. This derived value is then inserted into the public inputs commitment (i.e., into $IC$) so that it's part of the final verification check.
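As a rough illustration of the challenge derivation, here is a Python sketch. The hash function, point encoding, and reduction into the scalar field are our assumptions; the actual UltraGroth implementation hashes the serialized commitment point with its own encoding.

```python
# Hypothetical sketch: deriving rand = Hash(C_1) in the BN254 scalar field.
import hashlib

# BN254 scalar field order (the field Bionetta circuits are defined over)
BN254_R = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def derive_rand(commitment_bytes: bytes) -> int:
    """Derive the lookup challenge from the serialized round-1 commitment."""
    digest = hashlib.sha256(commitment_bytes).digest()
    return int.from_bytes(digest, "big") % BN254_R

# Both prover and verifier compute the same value from the same commitment,
# so the prover cannot pick rand after seeing the chunks it committed to.
c1 = b"\x01" * 64  # placeholder for a serialized G1 commitment point
assert derive_rand(c1) == derive_rand(c1)
assert 0 <= derive_rand(c1) < BN254_R
```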
To make the scheme practical, we modified the fastest open-source Groth16 stack rather than starting from scratch.
- https://github.com/rarimo/ultragroth — UltraGroth prover (fork of https://github.com/iden3/rapidsnark).
- https://github.com/rarimo/ultragroth-snarkjs — UltraGroth zkey and Solidity verifier generator (fork of https://github.com/iden3/snarkjs).
Note that *UltraGroth* is compatible with *Groth16*’s powers-of-tau, so you do not need a new trusted setup.
## **How Many Constraints Does This Save?**
Let's estimate the savings. Assume the model has $n = 3864$ activations, each with a $b = 253$-bit input. Instead of checking each bit separately, we split inputs into chunks of size $\ell = 15$ bits.
In the original method, every activation required $b$ constraints to perform a full bit decomposition. Thus, the total number of constraints was simply $n \times b = 3864 \times 253 = \color{blue}{977592}$.
In the lookup-based method, the constraint cost comes from two parts. First, we need $2 \times 2^{\ell}$ constraints to handle the sum of lookup-table inverses. Second, we split each activation into $\lfloor b / \ell \rfloor$ chunks, resulting in an additional $n \times \lfloor b / \ell \rfloor$ constraints for the sum of chunk inverses. Altogether, the new total constraint number is:
$$
2 \times 2^{\ell} + n \times \lfloor b / \ell \rfloor = 2 \times 2^{15} + 3864 \times \lfloor 253/15 \rfloor = 65536 + 61824 = \color{green}{127360}
$$
Thus, we go from $\color{blue}{977592}$ down to $\color{green}{127360}$ constraints. This is more than a $7 \times$ improvement, and the gain grows for heavier circuits.
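The arithmetic above is easy to double-check with a few lines of Python (the function names are ours):

```python
# Constraint-count estimate for the bit-decomposition vs. lookup-based methods.
def old_cost(n: int, b: int) -> int:
    """One constraint per bit, per activation."""
    return n * b

def new_cost(n: int, b: int, l: int) -> int:
    """Table inverses (2 * 2^l) plus chunk inverses (n * floor(b / l))."""
    return 2 * 2**l + n * (b // l)

n, b, l = 3864, 253, 15  # activations, input bits, chunk size
assert old_cost(n, b) == 977_592
assert new_cost(n, b, l) == 127_360
print(old_cost(n, b) / new_cost(n, b, l))  # improvement factor, ~7.7
```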
Figure 1 shows the number of constraints before and after applying lookup optimizations: the red graph represents the original Groth16 cost rising linearly with the number of activations, while the blue and green graphs correspond to chunk sizes of $14$ and $15$, respectively. Notice how both optimized graphs stay near the bottom of the chart, growing only slightly even as activations increase from $15000$ to $30000$.

**Figure 1.** Number of constraints before and after applying lookup optimization.
## **Benchmarking UltraGroth**
