# Scale Semaphore Project Proposal ### Team - [Brechy](https://github.com/brech1) (FT) - [Keewoo](https://keewoolee.github.io) (PT) ## Introduction [Semaphore](https://docs.semaphore.pse.dev/) is a zero-knowledge protocol that enables users to prove membership in a group without revealing their identity, allowing for anonymous messaging, voting, and other privacy-preserving applications. However, as group sizes scale to millions of users (such as [Worldcoin](https://world.org/blog/worldcoin/world-id-faqs)'s 11M+ users), the protocol becomes impractical for client-side implementation due to storage, computation, and bandwidth limitations. The current solution adopted by large-scale implementations like Worldcoin's [World ID](https://world.org/blog/worldcoin/world-id-faqs) is to [delegate the proving process to servers](https://worldcoin.org/blog/worldcoin/intro-zero-knowledge-proofs-semaphore-application-world-id), which unfortunately compromises user privacy by requiring the disclosure of user identities to these servers. This project aims to develop and evaluate practical solutions that maintain privacy while enabling the Semaphore protocol to scale to millions+ of users. A more in depth review and description of this problem and the justification of the current chosen path can be found [here](https://hackmd.io/@brech1/scale-semaphore). ## Planning ### Objective Our objective is to research, develop, and evaluate practical solutions for scaling the [Semaphore protocol](https://hackmd.io/@vplasencia/B1sCrsoFkg) to support large groups while preserving users' privacy, enabling client-side proving without prohibitive resource requirements. ### Rationale Privacy is a fundamental right that should not be compromised as applications scale. Current implementations of Semaphore for large groups force a tradeoff between privacy and scalability, which contradicts the core purpose of the protocol. #### Challenges - Storing full [Merkle trees](https://github.com/privacy-scaling-explorations/zk-kit/tree/main/papers/leanimt) for large groups (500MB+ for 16M users) is impractical for most clients - Requesting specific Merkle paths from servers reveals user identity - Current privacy-preserving alternatives like [FHE](https://crypto.stanford.edu/pir-library/) or [MPC](https://github.com/privacy-scaling-explorations/private-proof-delegation-docs) are computationally expensive #### Relevance - Addresses a real limitation in privacy-preserving technologies used by millions of users today - Could enable a new generation of applications with strong privacy guarantees at scale - Bridges the gap between theoretical privacy protocols and practical implementations ## Related Works ### LeanIMT [LeanIMT](https://github.com/privacy-scaling-explorations/zk-kit/tree/main/papers/leanimt) provides a memory-efficient structure for Semaphore's group representation, optimized for dynamic groups. Our solution builds upon this [Rust implementation](https://github.com/vplasencia/leanimt-rs). ### Respire [Respire](https://eprint.iacr.org/2024/1165) is a PIR scheme designed for small records. It supports batch queries with no offline phase, making it ideal for our Merkle node retrieval approach. A detailed overview was presented in this [PSE L&S session](https://www.youtube.com/watch?v=Nf4IZ2kTPN4). - [Implementation](https://github.com/AMACB/respire) ### Merkle Forests [Merkle Forests](https://github.com/samzkback/merkle-forest) reduce computation by organizing users into manageable sub-trees. Our approach combines this with [Private Information Retrieval (PIR)](https://crypto.stanford.edu/pir-library/) to maintain privacy while enhancing scalability. ## Exploration Strategies and Plans Our primary focus will be implementing and evaluating the **Merkle Forests + PIR** approach as outlined in the initial [exploration path](https://hackmd.io/@brech1/scale-semaphore). ### Milestone 1 - Blind Merkle Path Retrieval - Implement mechanism to determine merkle path indices without fetching the entire tree. - Develop functionality to recreate path indices knowing only the amount of leaves. #### Deliver - March 24th - Merged in the [LeanIMT rust implementation](https://github.com/brech1/zk-kit.rust). ### Milestone 2 - PIR - Implement Merkle paths batch retrievals based on the [Respire implementation](https://github.com/AMACB/respire). - Focus initially on trees of up to 2^20 leaves #### Deliver - April 8th - Merged in [`semaphore-rs`](https://github.com/brech1/semaphore-rs) fork. ### Milestone 3 - Merkle Forests - Design hierarchical group structure using sub-trees - Implement proof mechanism across sub-trees - Extend the approach to support up to 2^32 total leaves #### Deliver - April 23rd - Merged in [`semaphore-rs`](https://github.com/semaphore-protocol/semaphore-rs). ### Milestone 4 - Evaluation - Benchmarking of the complete solution - Optimization based on performance findings #### Deliver - May 8th - Document explaining solution and comparison with alternative paths such as TEEs ## Why Now? - Recent PIR advances now make efficient retrieval viable - There is a present need for this as seen by the [Worldcoin RFP](https://world.org/rfp) - Current progress in the [Semaphore Rust implementation](https://github.com/semaphore-protocol/semaphore) facilitate this work. ## Why PSE? PSE is ideally positioned to lead this research as creator and maintainer of Semaphore. This work would enhance Semaphore's utility and drive wider adoption. The project also aligns with PSE's core objectives of building privacy tools. ## Evaluation Criteria The project could be deemed as successful if the following is achieved: - **Privacy Preservation**: Maintain full privacy guarantees for all group members with zero identity leakage - **Scalability**: Support for groups of at least $2^{20}$ identities (over 1 million members) ### Performance - **Network**: Bandwidth usage compatible with standard mobile networks (< 10MB per proof) - **Computation**: Proof generation feasible on typical mobile devices - **Server Efficiency**: Resource requirements allowing for cost-effective deployment ## References - [World ID Implementation](https://world.org/blog/worldcoin/world-id-faqs) - [Worldcoin's Use of Semaphore](https://worldcoin.org/blog/worldcoin/intro-zero-knowledge-proofs-semaphore-application-world-id) - [Lean Incremental Merkle Tree (LeanIMT) Paper](https://github.com/privacy-scaling-explorations/zk-kit/tree/main/papers/leanimt) - [LeanIMT Rust Implementation](https://github.com/vplasencia/leanimt-rs) - [Semaphore Circuit](https://github.com/semaphore-protocol/semaphore/blob/main/packages/circuits/src/semaphore.circom) - [Respire PIR Scheme](https://eprint.iacr.org/2024/1165) - [Respire Implementation](https://github.com/AMACB/respire) - [Respire PSE Presentation](https://www.youtube.com/watch?v=Nf4IZ2kTPN4) - [Merkle Forests Implementation](https://github.com/samzkback/merkle-forest) - [Private Proof Delegation Documentation](https://github.com/privacy-scaling-explorations/private-proof-delegation-docs) - [Semaphore v4 Benchmarks](https://github.com/vplasencia/semaphore-benchmarks)