Scale Semaphore Project Proposal

Team

Introduction

Semaphore is a zero-knowledge protocol that enables users to prove membership in a group without revealing their identity, allowing for anonymous messaging, voting, and other privacy-preserving applications. However, as group sizes scale to millions of users (such as Worldcoin's 11M+ users), the protocol becomes impractical for client-side implementation due to storage, computation, and bandwidth limitations.

The current solution adopted by large-scale implementations like Worldcoin's World ID is to delegate the proving process to servers, which unfortunately compromises user privacy by requiring the disclosure of user identities to these servers. This project aims to develop and evaluate practical solutions that maintain privacy while enabling the Semaphore protocol to scale to millions of users and beyond.

A more in-depth review and description of this problem, along with the justification for the chosen path, can be found here.

Planning

Objective

Our objective is to research, develop, and evaluate practical solutions for scaling the Semaphore protocol to support large groups while preserving users' privacy, enabling client-side proving without prohibitive resource requirements.

Rationale

Privacy is a fundamental right that should not be compromised as applications scale. Current implementations of Semaphore for large groups force a tradeoff between privacy and scalability, which contradicts the core purpose of the protocol.

Challenges

  • Storing full Merkle trees for large groups (500MB+ for 16M users) is impractical for most clients
  • Requesting specific Merkle paths from servers reveals user identity
  • Current privacy-preserving alternatives like FHE or MPC are computationally expensive
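The storage figure in the first challenge can be sanity-checked with a short calculation. The sketch below assumes 32-byte nodes and rounds 16M users to 2^24 leaves of a perfect binary tree; both are assumptions made for illustration, not values stated in the proposal.

```rust
/// Assumed size of one tree node (a 32-byte hash or field element).
pub const NODE_BYTES: u64 = 32;

/// Bytes needed to store only the leaves.
pub fn leaf_storage_bytes(leaves: u64) -> u64 {
    leaves * NODE_BYTES
}

/// Bytes needed to store every node of a perfect binary tree.
/// A perfect binary tree with `leaves` leaves has `2 * leaves - 1` nodes.
/// `leaves` must be at least 1.
pub fn full_tree_storage_bytes(leaves: u64) -> u64 {
    (2 * leaves - 1) * NODE_BYTES
}
```

Under these assumptions, `leaf_storage_bytes(1 << 24)` is 536,870,912 bytes (512 MiB) for the leaves alone, and `full_tree_storage_bytes(1 << 24)` is just under 1 GiB for the whole tree.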

Relevance

  • Addresses a real limitation in privacy-preserving technologies used by millions of users today
  • Could enable a new generation of applications with strong privacy guarantees at scale
  • Bridges the gap between theoretical privacy protocols and practical implementations

LeanIMT

LeanIMT provides a memory-efficient structure for Semaphore's group representation, optimized for dynamic groups. Our solution builds upon this Rust implementation.

Respire

Respire is a PIR scheme designed for small records. It supports batch queries with no offline phase, making it ideal for our Merkle node retrieval approach. A detailed overview was presented in this PSE L&S session.

Merkle Forests

Merkle Forests reduce computation by organizing users into manageable sub-trees. Our approach combines this with Private Information Retrieval (PIR) to maintain privacy while enhancing scalability.

Exploration Strategies and Plans

Our primary focus will be implementing and evaluating the Merkle Forests + PIR approach as outlined in the initial exploration path.

Milestone 1 - Blind Merkle Path Retrieval

  • Implement a mechanism to determine Merkle path indices without fetching the entire tree.
  • Develop functionality to recreate path indices knowing only the number of leaves.

Deliver - March 24th
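A minimal sketch of the idea behind Milestone 1: given only a leaf index and the number of leaves, compute which sibling nodes make up the Merkle path. It assumes the LeanIMT rule that a node without a right sibling is promoted to the next level unchanged; the names `NodePos` and `sibling_positions` are hypothetical and are not taken from the existing Rust implementation.

```rust
/// Position of a node in the tree: level 0 holds the leaves.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodePos {
    pub level: u32,
    pub index: u64,
}

/// Returns the positions of the sibling nodes on the path from
/// `leaf_index` to the root, using only the leaf count.
/// Returns `None` if the leaf index is out of range.
pub fn sibling_positions(leaf_index: u64, leaf_count: u64) -> Option<Vec<NodePos>> {
    if leaf_index >= leaf_count {
        return None;
    }
    let mut siblings = Vec::new();
    let mut index = leaf_index;
    let mut level_size = leaf_count;
    let mut level = 0u32;
    while level_size > 1 {
        if index % 2 == 1 {
            // Right child: the sibling is the node to the left.
            siblings.push(NodePos { level, index: index - 1 });
        } else if index + 1 < level_size {
            // Left child with an existing right sibling.
            siblings.push(NodePos { level, index: index + 1 });
        }
        // Otherwise the node has no sibling and is promoted unchanged
        // (assumed LeanIMT behaviour), so nothing needs to be fetched.
        index /= 2;
        level_size = (level_size + 1) / 2;
        level += 1;
    }
    Some(siblings)
}
```

For a tree with 5 leaves, leaf 0 needs the siblings at (level 0, index 1), (level 1, index 1), and (level 2, index 1), while leaf 4 needs only (level 2, index 0) because it is promoted through the first two levels.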

Milestone 2 - PIR

  • Implement batch retrieval of Merkle paths based on the Respire implementation.
  • Focus initially on trees of up to 2^20 leaves

Deliver - April 8th
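To issue a batch PIR query, each tree node to be fetched has to be translated into a record index in the server's database. The sketch below shows one possible mapping, assuming a hypothetical flat layout that stores a fixed-capacity tree level by level with the leaves first. It does not use or describe Respire's actual API or database format.

```rust
/// Record index of node (`level`, `index`) in a flat database holding a
/// tree of `depth` levels above the leaves (hypothetical layout:
/// level 0 first, then level 1, and so on up to the root).
pub fn record_index(depth: u32, level: u32, index: u64) -> Option<u64> {
    if depth >= 63 || level > depth {
        return None;
    }
    let level_capacity = 1u64 << (depth - level);
    if index >= level_capacity {
        return None;
    }
    // Nodes stored before this level:
    // 2^depth + 2^(depth-1) + ... + 2^(depth-level+1)
    let offset = (1u64 << (depth + 1)) - (1u64 << (depth - level + 1));
    Some(offset + index)
}

/// Translates a list of (level, index) positions into the record
/// indices that a single batch query would request.
pub fn batch_record_indices(depth: u32, positions: &[(u32, u64)]) -> Option<Vec<u64>> {
    positions
        .iter()
        .map(|&(level, index)| record_index(depth, level, index))
        .collect()
}
```

With `depth = 20`, matching the 2^20-leaf target of this milestone, the database would hold 2^21 - 1 records and a full path would need at most 20 of them per query batch.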

Milestone 3 - Merkle Forests

  • Design hierarchical group structure using sub-trees
  • Implement proof mechanism across sub-trees
  • Extend the approach to support up to 2^32 total leaves

Deliver - April 23rd
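One way to picture the forest is as a split of a global identity index into a sub-tree number and a position inside that sub-tree. The sketch below assumes sub-trees of 2^20 leaves (the Milestone 2 target), 2^32 total leaves, and identities assigned sequentially; the sequential assignment and all names are assumptions for illustration, since the hierarchical design is itself a deliverable of this milestone.

```rust
/// Assumed sub-tree depth (2^20 leaves per sub-tree).
pub const SUBTREE_DEPTH: u32 = 20;
/// Assumed total capacity of the forest (2^32 leaves).
pub const FOREST_BITS: u32 = 32;

/// Location of an identity inside the forest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ForestPos {
    pub subtree: u64,
    pub local_leaf: u64,
}

/// Number of sub-trees needed to cover the whole forest.
pub fn subtree_count() -> u64 {
    1u64 << (FOREST_BITS - SUBTREE_DEPTH)
}

/// Splits a global identity index into (sub-tree, local leaf index).
/// Returns `None` if the index exceeds the forest capacity.
pub fn locate(global_index: u64) -> Option<ForestPos> {
    if global_index >= 1u64 << FOREST_BITS {
        return None;
    }
    Some(ForestPos {
        subtree: global_index >> SUBTREE_DEPTH,
        local_leaf: global_index & ((1u64 << SUBTREE_DEPTH) - 1),
    })
}
```

Under these assumptions the forest contains 4096 sub-trees, and a client would only need to retrieve a path within one 2^20-leaf sub-tree plus whatever links that sub-tree's root to the top-level structure.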

Milestone 4 - Evaluation

  • Benchmarking of the complete solution
  • Optimization based on performance findings

Deliver - May 8th

  • Document explaining solution and comparison with alternative paths such as TEEs

Why Now?

Why PSE?

PSE is ideally positioned to lead this research as creator and maintainer of Semaphore. This work would enhance Semaphore's utility and drive wider adoption. The project also aligns with PSE's core objectives of building privacy tools.

Evaluation Criteria

The project will be considered successful if the following criteria are met:

  • Privacy Preservation: Maintain full privacy guarantees for all group members with zero identity leakage
  • Scalability: Support for groups of at least 2^20 identities (over 1 million members)

Performance

  • Network: Bandwidth usage compatible with standard mobile networks (< 10MB per proof)
  • Computation: Proof generation feasible on typical mobile devices
  • Server Efficiency: Resource requirements allowing for cost-effective deployment

References