Introduction to Semacaulk

# Introduction to Semacaulk by Andrija Novakovic, Koh Wei Jie, and Kobi Gurkan Geometry Research (`{andrija, wj, kobi}@`[geometry.xyz](https://geometry.xyz)) ## What is Semacaulk? Semacaulk is a zero-knowledge set membership protocol that works on Ethereum. In Semacaulk, on-chain insertions are cheap (68k gas) and the gas cost of verification (356k gas) is comparable to that of [Semaphore](https://github.com/semaphore-protocol/semaphore/). The source code for Semacaulk can be found [here](https://github.com/geometryresearch/semacaulk). Semacaulk differs from Semaphore in two key ways. First, instead of accumulating identity commitments in a Merkle tree, it updates a KZG commitment. This operations involves BN254 point multiplication and addition, rather than expensive zk-friendly hash operations. Secondly, the underlying proof system is a polynomial interactive oracle proof that combines [Caulk+ lookups](https://eprint.iacr.org/2022/957) and [multipoint opening argument](https://zcash.github.io/halo2/design/proving-system/multipoint-opening.html). While we are currently working on a detailed cryptographic and technical specification of Semacaulk, preliminary notes can be found in these documents: - [The Semacaulk Specification](https://hackmd.io/5A1HcO6IS5qyg9tsROoz_A?view) - [The Multiopen Argument in Semacaulk](https://hackmd.io/D-bL6-oNSbSej7Ao_-9PLA) ## Motivation Semacaulk demnostrates how to use the Caulk techniques for membership proofs in a gas-efficient manner to enable privacy-related applications. We believe these techniques, as are other new proving systems and lookup arguments, can be highly useful, and we hope that Semacaulk shows how to practically use them. Note that the techniques used in Semacaulk do not have to involve an end-to-end custom prover such as used here. We have done this only because we wanted to experiment with improving prover efficiency further. With only small changes, the same gas-efficient membership proof construction can be used with more generic and expressive SNARK-based programs. ## Security The code base has not been audited. We do not recommend its use in production until it has been audited. ## Deployment Before deployment, the following steps must be taken: 1. A phase 1 trusted setup. 2. Computation of commitments to the Lagrange basis polynomials. ### Trusted Setup For the BN254 curve, the output of the [Perpetual Powers of Tau](https://github.com/privacy-scaling-explorations/perpetualpowersoftau) ceremony can be used. Note that the [Aztec Ignition ceremony output](https://github.com/AztecProtocol/ignition-verification/blob/master/Transcript_spec.md) is not sufficient for Semacaulk as only provides 1 `tauG2` point, while Semacaulk requires as many `tauG2` points as the maximum desired capacity of the accumulator. The prover needs to access the `tauG1` and `tauG2` points from the setup. If the capacity of the accumulator is `2 ^ n`, then at least `2 ^ n + 1` G1 points and at least `2 ^ n` G2 points are needed. As such, the [`powersOfTau28_hez_final_12.ptau`](https://github.com/iden3/snarkjs#7-prepare-phase-2) file from the Hermez Network Phase 1 trusted setup provides sufficient G1 and G2 points (4097 and 4096) respectively if the desired maximum capacity of the system is `2 ^ 12`. ### Computation of the commitments to the Lagrange basis polynomials This only has to be done once and takes a small amount of time. For a capacity of `2 ^ 20`, it takes about 5 minutes to generate these commitments. ## Insertions Like Semaphore (v1), an identity commitment is the MiMC7 hash of an identity nullifier and an identity trapdoor. The nullifier hash, which is used to prevent double-signalling on-chain, is the MiMC7 hash of the identity nullifier and the external nullifier. To perform an insertion at index $i$, the user does the following steps: 1. Obtain the KZG commitment to the $i$th Lagrange basis polynomial. 2. Calculate the Merkle path from the hash of this commitment to a Merkle root of all commitments to the Lagrange basis polynomials up to the maximum supported capacity. 3. Submit their identity commitment and Merkle path from (1) and (2) to the smart contract. The smart contract uses the given data (1), (2) and identity commitment to update the KZG commitment of all identity commitments in the system. It is very important to note that the KZG commitment to the identity commitment is not a Merkle tree. The system uses a Merkle tree and verifies the Merkle path from step (2) only to ensure that the value from (1) is valid. Only with valid commitments to the Lagrange basis polynomials is it able to correctly calculate the new KZG commitment to the identity commitments. Since the Merkle tree of commitments to the Lagrange basis polynomials is deterministic and constant, the user may safely use a centralised data provider to obtain the Merkle path if they do not wish to compute it themselves. ### Proving The prover needs the full SRS. For a maximum capacity of `2 ^ 20`, the SRS is 200+ MB compressed and 387MB uncompressed. ## Performance ### Proving time These benchmarks were performed on an Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz machine with 24 GB RAM. They represent the time taken for a native Rust binary (built for `release`) to perform the precomputation and proof generation steps. | Maximum capacity | Precomputation (ms) | Proof generation (ms) | Precomputation + proving | SRS size (uncompressed hex) (MB) | |-|-|-|-|-| | `2 ** 11 = 2048` | `103` | `63` | `166` | `0.78` | | `2 ** 12 = 4096` | `183` | `51` | `234` | `1.6` | | `2 ** 14 = 16384` | `668` | `53` | `721` | `6.1` | | `2 ** 16 = 65536` | `2126` | `50` | `2176` | `25` | | `2 ** 20 = 1048576` | `24333` | `42` | `24375` | `387` | ### Precomputation and updates Caulk employs a precomputation step in order to make the use of it sublinear in the group size. This allows neatly separating the membership proof part of Semacaulk from the nullifier computation, while allowing to blind the precomputed membership proof differently at each use. When a group is updated, precomputation is needed again, so there are a few options as to how to avoid many precomputations in frequently updated groups. These include storing a history of valid group commitments or batching insertions and updating the precomputation after some predefined time slot. This also opens the possibility to use a centralised, but verifiable, service to compute these for them, since a unique feature of the precomputation in Caulk is that a batch can be computed even more cheaply. There are, of course, privacy implications when using a service, but these can also be mitigated using several techniques and trade-offs. Additionally, the precomputation done in Caulk is amenable to efficient updates when the group changes, justifying a higher cost for the initial precomputation. ### On-chain costs An insertion costs around 68k gas. `broadcastSignal()`, which includes proof verification, costs 355k gas. By contrast, a Tornado Cash deposit (which involves inserting a leaf to a Merkle tree) costs [907787 gas](https://etherscan.io/tx/0x6f60a4aa7058dab153a859adfb139362d4bc395145528371ed90b127e528c7e7) and a withdrawal (which involves a Groth16 verification step) costs [327188 gas](https://etherscan.io/tx/0xf2eb3005bf1d1866b4778d6b3686aaed64f8c6b015d2e998855598226223b613). ## Future work We intend to release the Semacaulk source code as an open source (MIT-licensed) proof of concept, and encourage the community to adopt and develop it towards new directions. There is a great deal of room for improvement in various areas, including but not limited to: - Prover performance optimisations - Browser compatibility, such as via WASM - Gas optimisations - Use-case development - Tooling development