Try   HackMD

Semaphore: A Technical Overview

Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Abstract

Semaphore is a zero-knowledge (zk) protocol that allows users to prove their membership in a group and send messages such as votes or feedback without revealing their identity.

It also provides a simple mechanism to prevent double-signaling, which means that the same proof cannot be verified twice.

Motivation

Privacy remains a significant challenge in the digital world, with existing solutions often being limited, hard to extend, and overly complex. These limitations make it difficult to create privacy-preserving applications and prevent users from securely interacting without exposing their identities. Semaphore addresses these issues by enabling the sharing of anonymous messages, solving the critical need for privacy while maintaining transparency.

System Components

Implementations MUST provide:

1. Identity

An identity serves as a unique identifier for users.

2. Group

A group is a collection of identities represented in a structured format.

3. Anonymous Proof of Membership

An anonymous proof of membership enables users to prove their inclusion in a group without revealing their identity.

4. Nullifier

A value designed to be a unique identifier for the zk proof. It is generated using the identity.

5. Anonymous Message

After joining a group (an identity is added to a group) and proving that they are part of it, users can also send an anonymous message.

Interaction of System Components

Semaphore Flow:

  1. The user generates an identity.
  2. The identity is added to the group.
  3. An anonymous membership proof can be generated using the identity and group.
  4. A nullifier is generated to uniquely represent the user's zk proof while ensuring anonymity.
  5. The user can attach an anonymous message (such as a vote, text message).
Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Implementation

There are some Semaphore implementations such as Semaphore v3 and Semaphore v4 both developed by PSE. While they follow the same system components, they have differences. To learn more about the differences, you can read Appendix A: Semaphore v3 and v4 implementations.

The implementation section will refer to Semaphore v4 implemented by PSE. The same logic can be applied to implement Semaphore v4 with other tech stacks such as Noir or Rust.

Semaphore's zk functionality is implemented using Circom + Snarkjs + Groth16.

  • Circom: Used to write the circuit and generate its R1CS constraint system.
  • Snarkjs: Used to generate zk-artifacts as well as to create and verify zk-SNARK proofs.
  • Groth16: Used as a proving system.

Semaphore Identity

The identity of a user in the Semaphore protocol. A Semaphore identity consists of an EdDSA public/private key pair and a commitment. Semaphore uses an EdDSA implementation based on Baby Jubjub and Poseidon.

Identity Commitment

The Identity Commitment is the public Semaphore identity value used in Semaphore groups. Semaphore uses the Poseidon hash function to create the identity commitment from the Semaphore identity public key.

Semaphore Group

A Semaphore group is a Merkle tree in which each leaf is an identity commitment for a user.

Semaphore uses the LeanIMT implementation, which is an optimized binary incremental Merkle tree. The tree nodes are calculated using Poseidon. To learn more about it you can read the LeanIMT paper.

Circuit

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Private Inputs

  • merkleProofLength: Length of the Merkle Proof (Siblings Length) used to calculate the Merkle Root.
  • merkleProofSiblings: Merkle Proof Siblings used to calculate the Merkle Root.
  • merkleProofIndices: Merkle Proof Indices used to calculate the Merkle Root.
  • secret: The secret is the scalar generated from the EdDSA private key. Using the secret scalar instead of the private key allows this circuit to skip steps 1, 2, 3 in the generation of the public key defined in RFC 8032, making the circuit more efficient and simple. See the Semaphore identity package to know more about how the identity is generated.

Public Inputs

  • message: The value the user shares when voting, confirming, sending a text message, etc.
  • scope: A value used like a topic on which users can generate a valid proof only once. The scope is supposed to be used to generate the nullifier.

Circuit Operations

  1. EdDSA public key generation: The EdDSA public key is derived from the secret using Baby Jubjub. The public key is a point with two coordinates.

  2. Identity Commitment generation: Calculate the hash of the public key. This hash is the Identity Commitment.

  3. Proof of membership verification: The Merkle root passed as output must be equal to that calculated within the circuit through the inputs of the Merkle proof.

  4. Nullifier generation: The nullifier is generated by calculating the hash of the scope and the secret.

  5. Dummy Square: As the message is not really used within the circuit, the square applied to it is a way to force Circom's compiler to add a constraint and prevent its value from being changed by an attacker. More information in this article by the Geometry team. This dummy square is tied to Circom and may not be necessary when using other technologies.

Outputs

  • merkleRoot: Merkle Root of the LeanIMT.
  • nullifier: A value designed to be a unique identifier for the zk proof. It is used to prevent the same zk proof from being used twice. In Semaphore, the nullifier is the hash of the scope and secret value of the user's Semaphore identity.

When using the same scope for an identity, the resulting nullifier remains the same because the same hash is generated. To obtain different nullifiers for the same identity (allowing users to share multiple zk proofs) users must use a different scope each time.

Proof Generation

Snarkjs is used to generate the proof.

const { proof, publicSignals } = await groth16.fullProve(
  {
    secret: identity.secretScalar,
    merkleProofLength,
    merkleProofIndices,
    merkleProofSiblings,
    scope: hash(scope),
    message: hash(message)
  },
  wasm,
  zkey
)

return {
  merkleTreeDepth,
  merkleTreeRoot: merkleProof.root.toString(),
  nullifier: publicSignals[1],
  message: message.toString() as NumericString,
  scope: scope.toString() as NumericString,
  points: packGroth16Proof(proof)
}

To learn more about the proof generation, see the proof generation code.

Semaphore Proof Example

{
  merkleTreeDepth: 10,
  merkleTreeRoot: '4990292586352433503726012711155167179034286198473030768981544541070532815155',
  nullifier: '17540473064543782218297133630279824063352907908315494138425986188962403570231',
  message: '32745724963520510550185023804391900974863477733501474067656557556163468591104',
  scope: '37717653415819232215590989865455204849443869931268328771929128739472152723456',
  points: [
    '21668337069844646813015291115284438234607322052337623326830707330064154913250',
    '5484905467799538881631237123282286864306155680753671338313686933143657835972',
    '16129789229127169079253218689550197285028424883172925653046098078118792423164',
    '20777706122379854993524659601832014684665489694335277047215897593373874956681',
    '6697558559751679943942291885282718275907555268106795371542167431979105110434',
    '19709269142703129641057076037387702381970592578248722843989118216760760132874',
    '17493422037248079872314969622558990504818232868144223100447800353776555945950',
    '20398320346518400096197920333973312490517624241764728676355205902471625807914'
  ]
}

Proof Verification

Proof verification works both off-chain and on-chain (on EVM-compatible chains where the Groth16 Semaphore verifier can be deployed).

Semaphore v4 is currently deployed on many networks, the full list can be found in the Semaphore Docs.

Off-chain

Snarkjs is used to verify the proof.

return groth16.verify(
  verificationKey,
  [merkleTreeRoot, nullifier, hash(message), hash(scope)],
  unpackGroth16Proof(points)
)

To learn more about the proof verification in TypeScript/JavaScript, see the proof verification code.

On-chain

Snarkjs is used to generate a Solidity verifier per tree depth. Then, all verifiers are merged into a single one. To know more about the verifier see the Semaphore Verifier.

return
    verifier.verifyProof(
        [proof.points[0], proof.points[1]],
        [[proof.points[2], proof.points[3]], [proof.points[4], proof.points[5]]],
        [proof.points[6], proof.points[7]],
        [proof.merkleTreeRoot, proof.nullifier, _hash(proof.message), _hash(proof.scope)],
        proof.merkleTreeDepth
    );

Implementation Notes

These are the key components of Semaphore v4's implementation. When implementing Semaphore v4 with a different tech stack, these components should be rewritten, updated, or reused as needed.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →
  • The Identity implementation SHOULD be compatible across all the other components: Group, Circuit, Proof and Smart Contracts.
  • The Group implementation SHOULD be compatible across all the other components: Identity, Circuit, Proof and Smart Contracts.
  • Identity generation SHOULD be performed off-chain, in the user's device.
  • Semaphore proof generation SHOULD be performed off-chain, in the user's device.

For the implementation of Semaphore v4 it is RECOMMENDED to:

  • Use EdDSA in the Identity component because it is a zk-friendly signature scheme. This allows the creation and verification of signatures using Semaphore identities.
  • Use LeanIMT in the Group component because it provides an efficient way to prove and verify that a leaf is part of a tree (i.e., that an identity is part of a group).
  • Use the Poseidon hash function in the circuit because it is a zk-friendly hash function.

Limitations of the implementation

  • Scalability: As the number of members grows, generating a client-side Merkle proof is no longer feasible. It’s necessary to have a server to do it. This not only is time and data consuming, but could also allow the server to deanonymize the proofs.
  • On-chain costs: As the tree grows on-chain, the gas cost of insertions also increases.
    • Possible solution: A data structure similar to the Merkle Forest.
  • zk-artifacts: People need to download zk-artifacts to generate the zk proof and these have a couple of megabytes which can be an issue with very slow internet connection.
    • Possible solution: Use a proving system that does not require downloading the artifacts.

Benchmarks

Benchmarks for the LeanIMT can be found in the LeanIMT paper.

Benchmarks for Semaphore are available in the Semaphore documentation.

Privacy Guarantees

The protocol MUST guarantee:

  • Private Identity values remain private and controlled by their owners.
  • The identity cannot be linked to the message.

References

  1. Semaphore Whitepaper
  2. Semaphore GitHub organization
  3. Semaphore repository

Appendix A: Semaphore v3 and v4 implementations

Semaphore v4 introduces two significant protocol updates:

The main protocol components, libraries, and contracts remain largely unchanged, with some adjustments to parameters and functions. However, Semaphore v4 is not backward compatible with the Semaphore v3.

LeanIMT

The new data structure improves over the old one in terms of efficiency and storage with two major changes.

  • Zero Hashes:
    In the Incremental Merkle tree used in version 3, when a node has only one child (always the left one), a zero hash is used instead of the missing right node to compute the hash of the parent node.
    In the new data structure when the right node is missing the parent node simply takes the value of the left child node. This means that zero hashes are no longer needed, so it is no longer necessary to pass the tree an initial value to calculate the list of zero hashes to be used for each level of the tree. And the number of hashes to be calculated to insert or update a new leaf is smaller, as there can be nodes with only one child in the path.
  • Dynamic Depth:
    In the Incremental Merkle tree used in version 3, the depth of the tree is defined at the time of initialization, and when a node is inserted, a number of hashes equal to the depth of the tree is always computed.
    In the new data structure, the tree depth is updated as leaves are inserted. This means that when a node is inserted the number of hashes to be computed grows as the tree grows.
    Suppose in the old data structure the tree depth is 16. In that case, it means that from the first to the last leaf, the number of hashes to be computed for insertion is always 16. In contrast, in the new data structure, the first leaf will be the root itself and the tree depth will be 0, the second leaf will be used together with the first one to compute the parent node which will become the new root of the tree with depth 1, and so on.

These two changes together make adding new members to the group much more efficient, reducing both off-chain processing time and on-chain gas costs.

Glossary

Message

In Semaphore, the term message (also known as signal) refers to the value a user shares when performing actions such as voting, confirming, sending a text message, and more.

Signaling

The act of sharing a message (e.g., a text message or vote).

zk-Artifacts

zk-artifacts are the collection of files generated during:

  • Circuit compilation in Circom (e.g., .r1cs, .wasm for circuit definition).
  • Proving system setup (e.g., Groth16), including trusted setup ceremony files such as:
    • Proving and verification keys (.zkey)
    • The verification key in JSON format.

In this document, zk-artifacts specifically refer to:

  • .wasm
  • .zkey
  • The verification key file in JSON format.

These zk-artifacts are used for:

  • Generating zero-knowledge proofs (.wasm, .zkey).
  • Verifying zero-knowledge proofs (verification key in JSON format).

EVM-Compatible Chains

EVM-compatible chains are blockchain networks that support the Ethereum Virtual Machine (EVM), allowing them to run Ethereum-based smart contracts and decentralized applications (dApps) without modification. These chains maintain compatibility with Ethereum’s tooling, such as wallets, development frameworks, and infrastructure, enabling seamless deployment and interoperability across ecosystems.

zk-friendly

zk-friendly refers to functions or data structures optimized for efficient computation in zero-knowledge proofs. They reduce constraints in proving systems like zk-SNARKs, lowering costs and proof sizes. Examples include Poseidon (hash function) and LeanIMT (data structure).ˇ