# Goblin Plonk: lazy recursive proof composition Author: Zac (Aztec) ###### tags: recursion [Similar document with diagram is here](https://hackmd.io/@aztec-network/BkGNaHUJn/%2FdUsu57SOTBiQ4tS9KJMkMQ) This book is a specification for a method of performing efficient recursive proof composition in Plonk/Honk/PlonkISH proof systems. The core technique involves using cycles of curves to avoid having to perform expensive non-native group arithmetic ala Halo2. We extend this concept by introducing an Instruction Machine to delegate expensive non-native group operations. We use aggregation techniques to combine the non-native group operation instructions from multiple proofs into a single set of instructions. We utilize a curve transposition circuit to efficiently transform non-native group operation instructions into native group operation instructions. We define an Elliptic Curve VM circuit to prove the correctness of the instruction transcript. Finally, we define a Non-native field VM circuit to perform non-native field operations required by the curve transposition circuit. The main goal is to remove the need to bounce between curve cycles at each layer of recursion, and significantly reduce Prover costs. # What is Goblin Plonk useful for? 1. A resource-constrained Prover needs to construct a ZK SNARK proof containing multiple layers of recursion 2. The Verifier is less sensitive to final proof size (e.g. the output proof will be swallowed by another layer of recursion) 3. The Verifier is less sensitive to final (non-recursive) verification costs The above conditions are ideal for ZK rollup architectures. For example, for a privacy-preserving ZK rollup each "transaction" is a user-generated proof-of-correctness of a private transaction. Goblin Plonk can be used to enable multiple layers of user-side recursion. The resulting proof is passed off to a rollup Prover, who has sufficient computational resources to recursively verify the (relatively large) Goblin Plonk proof. # Estimated Costs For the Plonk/Honk (multilinear Plonk ala Hyperplonk. Paper coming soon™) verification algorithm, we can convert every non-native scalar multiplication into 5 non-native field multipliations and 1 native (variable-base) scalar multiplication. Conservative estimate is that our ECC VM can perform a native scalar mul in 64 width-4-UltraPlonk-equivalent constraints. Conservative estimate for the NNF VM is one non-native mul per 10 width-4-UltraPlonk-equivalent constraints. For a Honk proof that requires (conservatively) 64 scalar muls to verify, this equates to 7,296 constraints. Hopefully we can do much better than this. The equivalent cost to brute-force 64 scalar muls with UltraPlonk constraints is close to 300,000 - 400,000. # Notation This document assumes the curve cycle in question is the half-pairing BN254/grumpkin curve cycle. It's a little easier to explain the protocol using explicit curves/parameters. It is trivial to adapt the techniques in this doc to any curve cycle e.g. pasta/vella curves. We use additive notation for elliptic curve group operations e.g. $[A] + x \cdot [B] = [C]$. # Core Techniques --- ## Curve cycles An elliptic curve $A$ is defined over a field $\mathbb{F}_q$. The resulting elliptic curve subgroup has order $p$. If the cofactor of $A$ is $1$, there will always exist a corresponding curve $B$ defined over $\mathbb{F}_p$, whose subgroup has order $q$. A SNARK circuit defined over curve $A$ can efficiently evaluate group operations over curve $B$ and vice-versa. For our planned implementation of Honk, the BN254 and Grumpkin curves form this curve cycle. [See this halo2 article for more information](https://zcash.github.io/halo2/background/recursion.html). ## Drawbacks of curve cycles Verification of a BN254 proof inside a BN254 circuit requires non-native group operations. One solution is to define a Grumpkin circuit that performs BN254 scalar multiplications. The BN254 circuit can now verify the correctness of this Grumpkin proof instead of directly performing non-native group operations. However this approach is not perfect. One must now verify 2 proofs instead of 1 (assuming one is evaluating core protocol logic over 1 curve and uses the cycle curve purely for recursive verification). Each proof requires a nontrivial amount of hashing, particularly if the proof system's IPA uses a sumcheck protocol . In addition, the *field* operations required to verify the Grumpkin proof are non-native operations in the BN254 circuit. It is a non-trivial task to delegate these field operations to a future Grumpkin circuit in the recursion stack. At some point we must link curve A circuit witnesses to public inputs of a SNARK over the cycle curve. This will always require (many) non-native field operations. ## Computation delegation Consider the original problem; we need to verify a BN254 proof in a BN254 circuit. Instead of performing non-native group operations (or verifying the correctness of a Grumpkin proof), there is a third solution. We use a modified lookup table protocol (e.g. plookup) to read from a lookup table. The lookup table contains the inputs/outputs of the *precise* non-native group operations we require. We define the "precomputed" lookup table commitments to be $[Y_0]_{IM}, [Y_1]_{IM}, [Y_2]_{IM}, [Y_3]_{IM}$. These represent commitments that are independent of our existing Plookup protocol commitments. The assumption is that the number of lookup table commitments tracks the "width" of the underlying Proof system (our UltraPlonk/Honk impl has a width of 4) The commitments $[Y_0]_{IM}, [Y_1]_{IM}, [Y_2]_{IM}, [Y_3]_{IM}$ represent commitments to an *Instruction Machine Transcript*. The Prover must eventually produce a proof of correctness of the transcript. We will use a curve transposition protocol to convert these commitments into commitments over the *Grumpkin* curve. The resulting circuit that proves the transcript results (and evaluates the required group arithmetic) will subsequently be defined over the Grumpkin curve and only require native (variable-base) group arithmetic. ## Transcript aggregation To maximise Prover efficiency, we wish to have to execute the curve transposition protocol once for a set of recursive proofs. Additionally we only want to compute a proof of correctness of the transcript once. i.e. recursion should not require recursively verification of transcript proofs To acheive this, our recursion step must efficiently aggregate two transcript commitments. To better frame the problem, consider the recursive verification scenario. We have a proof $\pi_i$ representing step $i$ of a recursion stack. We assume $\pi_i$ contains commitments to the previous and current versions of the transcript: $[\vec{Y}]_{IM, i-1}, [\vec{Y}]_{IM, i}$. We also have the degree of the transcript at step $i-1$ ($c_{i-1}$) and the number of transcript entries at step $i$ ($c_{current}$) (e.g. it's part of the verificaiton key). Assuming we know inductively that the transcript at step $i-1$ is correct, we can derive a transcript commitment for proof $\pi_i$. The exact manner of how to do this depends on the representation of the transcript polynomial. To give an example we assume a univariate commitment over a coefficient basis (i.e. what happens in Honk and HyperPlonok). Assuming the first $c_{i-1}$ coefficients of both transcript polynomial sets are identical, the description of the two transcript polynomial sets is the following: $$ \text{For } j \in [0, \ldots, 3]: Y_{i - 1, j}(X) = \sum_{j=0}^{c_{i-1} - 1} y_j X^j \\ \text{For } j \in [0, \ldots, 3]: Y_{i, j}(X) = \sum_{j=0}^{c_{i-1} + c_{current} - 1} y_j X^j \\ $$ i.e. $$ \text{For } j \in [0, \ldots, 3]: Y_{current, j}(X) = \frac{Y_{i, j} - Y_{i-1, j}}{X^{c_{i-1}}} \\ $$ Additionally it must be proven that the degree of $Y_{current, j}(X)$ is $c_{current} - 1$ (or alternatively via in the SNARK IOP, one proves the row values past $c_{current}$ are zero). An example aggregation scheme when using a multilinear Proving system is described in the [aggregation chapter](/03EZcIU1Riq7t-H1uqwNVw). ## Curve transposition Once a set of recursive proofs have been computed, we will be left with aggregated transcript commitments $[Y_1]_{IM}, [Y_2]_{IM}, [Y_3]_{IM}, [Y_4]_{IM}$ over the BN254 curve. These commitments will represent instructions to perform BN254 elliptic curve operations. We wish to provide transcript commitments over the *Grumpkin* curve and prove the equivalence of the commitments over the two curves. We can achieve this using a relatively small amount of non-native field arithmetic. First we take the set of Grumpkin polynomial commitments $[z_0], ..., [z_4]$ (the Grumpkin transcript uses 5 columns, see the [Elliptic Curve VM](/Zs730vdURaOw0n4PsQCsNg) spec for more details). We evaluate the Grumpkin polynomials at a random challenge $\zeta$ and obtain polynomial evaluations $z_0(\zeta), z_1(\zeta), z_2(\zeta), z_3(\zeta), z_4(\zeta)$. There will be a trivial map between the witness encoded in the original transcript commitments $[Y_1]_{IM}, [Y_2]_{IM}, [Y_3]_{IM}, [Y_4]_{IM})$ and the witnesses encoded in the Grumpkin commitments. This requires all witnesses are less than the subgroup orders for both curves. This will require a range check. We can use a BN254 SNARK circuit to derive Grumpkin polynomial coefficients from the BN254 transcript commitments. We then *directly evaluate* the claimed Grumpkin polynomials in the Grumpkin field, within a BN254 SNARK circuit. This requires non-native field arithmetic. N.B. the range check to require the witnesses to be smaller than the group order can be acquired "for free" as the non-native field arithmetic component will apply appropriate range constraints on the inputs. ## Instruction Machines We require two bespoke circuits that perform non-native field operations and elliptic curve group operations respectively. The former will be constructed/proved over the BN254 curve. The latter over the Grumpkin curve. For efficiency reasons we want to architect these circuits as *virtual machines*. Instead of defining simple conditions that must be true (e.g. "$a * b == c \text{ mod p }$" or "$[P] == s * [Q]$"), it is more Prover-efficient to have a circuit with updateable inernal state. For example "compute $a * b \text{ mod p}$ and add result into the accumulator", or "compute $s * [Q]$ and add result into the accumulator". This approach reduces the amount of information required in the instruction transcript. This is valuable because each transcript polynomial degree requires a non-native field multiplication to evaluate. For Goblin recursion, the only instructions required by this "Instruction Machine" are BN254 group operations (including scalar mul ops). However we can extend the concept and add additional useful instructions into the IM that will improve Prover efficiency (e.g. SHA256, non-native field arithmetic not related to BN254 group operations). Each instruction type can then be evaluated by a single purpose-built bespoke circuit at the end of the recursion stack.