Scale Semaphore Project Proposal

Team

Brechy (FT)
Keewoo (PT)

Introduction

Semaphore is a zero-knowledge protocol that enables users to prove membership in a group without revealing their identity, allowing for anonymous messaging, voting, and other privacy-preserving applications. However, as group sizes scale to millions of users (such as Worldcoin's 11M+ users), the protocol becomes impractical for client-side implementation due to storage, computation, and bandwidth limitations.

The current solution adopted by large-scale implementations like Worldcoin's World ID is to delegate the proving process to servers, which compromises user privacy because user identities must be disclosed to those servers. This project aims to develop and evaluate practical solutions that maintain privacy while enabling the Semaphore protocol to scale to millions of users.

A more in-depth review of this problem, and the justification for the chosen path, can be found here.

Planning

Objective

Our objective is to research, develop, and evaluate practical solutions for scaling the Semaphore protocol to support large groups while preserving users' privacy, enabling client-side proving without prohibitive resource requirements.

Rationale

Privacy is a fundamental right that should not be compromised as applications scale. Current implementations of Semaphore for large groups force a tradeoff between privacy and scalability, which contradicts the core purpose of the protocol.

Challenges

Storing full Merkle trees for large groups (500MB+ for 16M users) is impractical for most clients
Requesting specific Merkle paths from servers reveals user identity
Current privacy-preserving alternatives like FHE or MPC are computationally expensive
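
The storage figure in the first challenge can be checked with back-of-the-envelope arithmetic. The sketch below assumes 32-byte nodes (a typical field-element size, which is an assumption and not a figure stated in this proposal) and a full binary tree with 2^24 leaves (about 16.7M). The function names are illustrative only.

```rust
/// Assumed size of one tree node in bytes (a 256-bit field element).
pub const NODE_BYTES: u64 = 32;

/// Bytes needed to store only the leaves of a tree.
pub fn leaf_storage_bytes(leaves: u64) -> u64 {
    leaves * NODE_BYTES
}

/// Bytes needed to store every node of a full binary tree
/// with `leaves` leaves (`2 * leaves - 1` nodes in total).
pub fn full_tree_storage_bytes(leaves: u64) -> u64 {
    (2 * leaves - 1) * NODE_BYTES
}
```

Under these assumptions, `leaf_storage_bytes(1 << 24)` is exactly 512 MiB, and storing every internal node as well roughly doubles that.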

Relevance

Addresses a real limitation in privacy-preserving technologies used by millions of users today
Could enable a new generation of applications with strong privacy guarantees at scale
Bridges the gap between theoretical privacy protocols and practical implementations

LeanIMT

LeanIMT provides a memory-efficient structure for Semaphore's group representation, optimized for dynamic groups. Our solution builds upon this Rust implementation.

Respire

Respire is a PIR scheme designed for small records. It supports batch queries with no offline phase, making it ideal for our Merkle node retrieval approach. A detailed overview was presented in this PSE L&S session.

Implementation

Merkle Forests

Merkle Forests reduce computation by organizing users into manageable sub-trees. Our approach combines this with Private Information Retrieval (PIR) to maintain privacy while enhancing scalability.

Exploration Strategies and Plans

Our primary focus will be implementing and evaluating the Merkle Forests + PIR approach as outlined in the initial exploration path.

Milestone 1 - LeanIMT

Implement a mechanism to determine Merkle path indices without fetching the entire tree.
Develop functionality to recreate path indices knowing only the number of leaves.

Deliver - March 24th

Merged into the LeanIMT Rust implementation.
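
As a rough illustration of recreating path indices from the number of leaves alone, the sketch below walks up the tree and records which sibling positions a client would need. It assumes the LeanIMT rule that a node with no right sibling is carried up to the next level unchanged. `NodePos` and `sibling_positions` are hypothetical names, not part of the LeanIMT crate's API.

```rust
/// Position of a node inside the tree; `level` 0 is the leaf level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodePos {
    pub level: usize,
    pub index: usize,
}

/// Returns the positions of the sibling nodes needed to prove membership of
/// the leaf at `leaf_index` in a tree holding `size` leaves, or `None` if
/// the leaf index is out of range.
///
/// Assumption: a node without a right sibling is promoted to the next
/// level unchanged, so that level contributes no sibling to the path.
pub fn sibling_positions(leaf_index: usize, size: usize) -> Option<Vec<NodePos>> {
    if leaf_index >= size {
        return None;
    }
    let mut siblings = Vec::new();
    let mut index = leaf_index;
    let mut level_size = size;
    let mut level = 0;
    // Stop once the current level holds only the root.
    while level_size > 1 {
        // Flipping the lowest bit gives the sibling's index at this level.
        let sibling = index ^ 1;
        if sibling < level_size {
            siblings.push(NodePos { level, index: sibling });
        }
        index >>= 1;
        // Each level has ceil(level_size / 2) parents.
        level_size = (level_size + 1) / 2;
        level += 1;
    }
    Some(siblings)
}
```

For a tree with 5 leaves, the last leaf (index 4) needs only one sibling, the node at level 2, index 0, because it is carried up unchanged through the first two levels.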

Milestone 2 - PIR

Implement batched retrieval of Merkle paths based on the Respire implementation.
Focus initially on trees of up to 2^20 leaves

Deliver - April 8th

Merged into the semaphore-rs fork.
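
A PIR scheme retrieves records by their index in a flat database, so the tree nodes need some agreed-upon layout. The sketch below shows one possible layout, assumed here purely for illustration: levels are stored back to back, starting with the leaves. It does not use or reflect Respire's actual API, and `record_index` and `batch_record_indices` are hypothetical helpers.

```rust
/// Flat record index of the node at (`level`, `index`) in a tree with `size`
/// leaves, assuming levels are stored back to back starting with the leaves.
/// Returns `None` if the position does not exist in the tree.
pub fn record_index(level: usize, index: usize, size: usize) -> Option<usize> {
    let mut offset = 0;
    let mut level_size = size;
    for _ in 0..level {
        // There is no level above the root.
        if level_size <= 1 {
            return None;
        }
        offset += level_size;
        // Each level has ceil(level_size / 2) parents.
        level_size = (level_size + 1) / 2;
    }
    if index < level_size {
        Some(offset + index)
    } else {
        None
    }
}

/// Maps a whole Merkle path, given as (level, index) pairs, to the record
/// indices that would make up a single batch query.
pub fn batch_record_indices(path: &[(usize, usize)], size: usize) -> Option<Vec<usize>> {
    path.iter()
        .map(|&(level, index)| record_index(level, index, size))
        .collect()
}
```

With this layout, a client that knows only the number of leaves can compute every record index in its path locally and request them in one batch.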

Milestone 3 - Merkle Forests

Design hierarchical group structure using sub-trees
Implement proof mechanism across sub-trees
Extend the approach to support up to 2^32 total leaves

Deliver - April 23rd

Merged into semaphore-rs.
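
One simple way to picture the forest is to split a global leaf index into a sub-tree number and a position inside that sub-tree. The sketch below assumes fixed-size sub-trees of 2^20 leaves filled in order, giving 2^12 sub-trees for 2^32 total leaves. The proposal does not fix the sub-tree size or the assignment rule, so both are assumptions, and `ForestPos` and `locate` are hypothetical names.

```rust
/// Assumed number of bits addressing a leaf inside one sub-tree (2^20 leaves).
pub const SUBTREE_BITS: u32 = 20;
/// Assumed number of bits addressing a leaf in the whole forest (2^32 leaves).
pub const TOTAL_BITS: u32 = 32;

/// Location of a leaf inside the forest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ForestPos {
    pub subtree: u64,
    pub local_index: u64,
}

/// Splits a global leaf index into (sub-tree, local index), or returns
/// `None` if the index exceeds the assumed forest capacity.
pub fn locate(global_index: u64) -> Option<ForestPos> {
    if global_index >> TOTAL_BITS != 0 {
        return None;
    }
    Some(ForestPos {
        // High bits select the sub-tree.
        subtree: global_index >> SUBTREE_BITS,
        // Low bits select the leaf inside it.
        local_index: global_index & ((1u64 << SUBTREE_BITS) - 1),
    })
}
```

Under this scheme a proof would cover the path inside one sub-tree plus whatever structure links the sub-tree roots together, which is the part Milestone 3 sets out to design.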

Milestone 4 - Evaluation

Benchmarking of the complete solution
Optimization based on performance findings

Deliver - May 8th

Document explaining the solution and comparing it with alternative approaches such as TEEs

Why Now?

Recent PIR advances now make efficient retrieval viable.
There is a present need for this, as shown by the Worldcoin RFP.
Current progress in the Semaphore Rust implementation facilitates this work.

Why PSE?

PSE is ideally positioned to lead this research as creator and maintainer of Semaphore. This work would enhance Semaphore's utility and drive wider adoption. The project also aligns with PSE's core objectives of building privacy tools.

Evaluation Criteria

The project will be deemed successful if the following criteria are met:

Privacy Preservation: Maintain full privacy guarantees for all group members with zero identity leakage
Scalability: Support for groups of at least $2^{20}$ identities (over 1 million members)

Performance

Network: Bandwidth usage compatible with standard mobile networks (< 10MB per proof)
Computation: Proof generation feasible on typical mobile devices
Server Efficiency: Resource requirements allowing for cost-effective deployment

References

Thore Hildebrandt

2025/03/13 13:27:42

Could you provide more detailed descriptions of the deliverables for each milestone, including expected outcomes and any additional information?

2025/03/13 13:31:13

the project

To what degree will the semaphore team help with integration and testing of the solution?

Brechy

2025/03/17 16:54:01

The Semaphore team will not be involved in either integration or testing. My main communication with them consists of asking for guidance on which approach would be most useful for them.

2025/03/13 13:32:07

the initial exploration path

I assume this is https://hackmd.io/@brech1/scale-semaphore. Can you add it as a link?

2025/03/13 13:34:10

Milestone

For each milestone, can you add an estimated delivery date?

2025/03/13 13:36:01

Is there a repo you have made public for this already? If not fine, but if so please add it.