Proof of Concept for a Decentralized, Zero-Knowledge Powered Data Marketplace

## Summary

This project aims to develop a proof of concept (POC) for a Zero-Knowledge (ZK) system, primarily targeted at scientific use cases. The focus is to partner with credible entities and deploy the system, creating useful ZK primitives along the way and laying a foundation for a diverse set of future use cases. This proposal details the steps, the technical and financial support needed, and the milestones to be achieved.

## Project Goals

- Develop a POC for a ZK system, with a focus on scientific datasets.
- Partner with reputable parties to deploy this system.
- Develop useful Zero-Knowledge primitives, such as a ZK stats library, during the process.
- Gather and observe usage data to inform further development and potential use cases.

## Proof of Concept

The POC is a platform where any user can interact with, analyze, and gain insights from a dataset through ZK functions, without acquiring details about the dataset itself. This could be useful in several scenarios:

- Allow data to be released, cross-checked, and examined alongside the publication of scientific papers
- Allow federated training in adversarial scenarios (e.g. trading firms sharing their proprietary data)
- Protect the privacy of individuals included in the data, as in the case of healthcare, genotype, or pathology data

The goal of the POC is to release a minimal ZK platform for a pilot dataset as fast as possible, with the minimal set of features required to gain interesting insights from the dataset or to demonstrate ZK tech’s potential. That includes the following features:

- Ability to select a subset of the data through a selector
- Ability to run arbitrary computation on a database consisting of N features, in the form of F(c1, c2, c3, …, cN) → y, where y is a number
- A “ZK stats” library, consisting of ZK circuits for common statistical tasks: descriptive statistics (mean, median, most frequent value…) and common operations such as correlations
- Ability to send computation as a Python program, ideally in a Jupyter notebook environment (though this is not strictly necessary for the POC). This is crucial for adoption by scientific users
- A user-friendly web platform where
  - Data providers can upload and manage their data, as well as monitor and respond to computing requests
  - Data consumers can browse data schemas, download test data, write Python programs, send those programs to data providers to run, and get the results back
- A standard for the interface between data providers and consumers, so other datasets can implement their own ZK service

## Future Vision

The vision goes beyond creating a single dataset. This project can generalize to a ZK data marketplace platform, allowing anyone to upload their datasets, "sell" their computations, or solicit contributors for their computation project and properly reward them without sharing the data itself. Examples of future directions include:

- A general-purpose ZK data marketplace, where anyone can “publish” their data
- A “matchmaking” engine to suggest collaborations between compatible datasets
- A system for open-source collaboration on a dataset that rewards meaningful data contributions or meaningful changes to a model (evaluated by the increase in prediction score)
- A platform to merge different data sources to draw meaningful inferences, such as genomic data from 23andMe and pathology data to study the connection between genotype and phenotype
- Ability to easily combine personal data from various sources such as hospitals and fitness tracking devices like Fitbit, Apple Watch, Strava, etc.
  and draw personal insights from them

## Timeline

The project is expected to span six months, with the following high-level milestones:

Start date: September 11, 2023

- Week 2 (September 25, 2023)
  - Tech research and evaluation (will produce a writeup)
  - Low-fidelity web app mockups to help the team align on the functionality of the POC
  - Low-fidelity data provider experience mockup (same purpose)
- Week 4 (October 9, 2023)
  - First draft of the ZK stats wrapper library (basic operations supported; more work to polish and complete the library may arise from here)
- Week 6 (October 23, 2023)
  - Create a backend with one API endpoint to test an end-to-end request
- Week 8 (November 6, 2023)
  - Functional web app for data users that includes
    - dataset list and information
    - workflow to submit a computation request
  - The web app will not be polished from a design perspective at this stage
- Week 10 (November 20, 2023)
  - Functional web app for data providers that includes
    - a way to view and execute computation requests
    - a way to upload, update, change their data, and submit the commitment for the new data
- Week 12 (December 4, 2023)
  - E2E POC from web app to data provider experience, applied to an example dataset
- Week 16 (January 1, 2024)
  - Working, polished demo for the front end and the interface standard between data providers and consumers
  - Finalize the data provider partner(s) for the POC
- Week 18 (January 15, 2024)
  - Agreement with data provider partner(s) about additional features and data we need to support
- Week 22 (February 12, 2024)
  - Complete additional work required for the data provider partner(s)
  - Onboard them to the front end and complete the remaining front-end features
- Week 24 (February 26, 2024)
  - Finalize all the remaining work required for launch
  - Launch strategy
  - Officially launch the POC for users

## Role and Support Needed

I will be handling the front-end development and the relationship management with partners.
However, to successfully execute this project, we need:

- Technical Support: An expert in ZK to lead the development of the ZK stats library and other ZK-related development.
- Mentorship: I am eager to contribute to the ZK development but also want to avoid being a bottleneck, so I would appreciate mentoring support to help me contribute to ZK stats (without leading it).
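
To make the computation interface from the Proof of Concept section more concrete, here is a minimal plain-Python sketch of a request of the form F(c1, …, cN) → y applied to a subset chosen by a selector. It is not a ZK implementation: it only shows the semantics that a ZK circuit would need to prove. The names `commit` and `run_request`, the sample columns, and the use of SHA-256 as a stand-in for the data commitment are all assumptions made for illustration, not part of the proposed design.

```python
import hashlib
import json
import statistics
from typing import Callable, Dict, List

Row = Dict[str, float]


def commit(rows: List[Row]) -> str:
    """Hypothetical commitment: SHA-256 over a canonical JSON encoding.

    A real deployment would use whatever commitment scheme the chosen
    ZK proving system requires; this is only a placeholder.
    """
    encoded = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_request(
    rows: List[Row],
    selector: Callable[[Row], bool],
    columns: List[str],
    f: Callable[..., float],
) -> Dict[str, object]:
    """Apply the selector, extract the requested columns, and compute y."""
    subset = [row for row in rows if selector(row)]
    # One list per requested column: c1, c2, ..., cN
    cols = [[row[name] for row in subset] for name in columns]
    return {"commitment": commit(rows), "y": f(*cols)}


# Made-up example data held by a data provider.
data = [
    {"age": 34.0, "dose": 1.0, "response": 2.0},
    {"age": 51.0, "dose": 2.0, "response": 4.0},
    {"age": 47.0, "dose": 3.0, "response": 6.0},
    {"age": 29.0, "dose": 4.0, "response": 8.0},
]

# A data consumer asks for the mean response among rows with age >= 30.
result = run_request(
    data,
    selector=lambda row: row["age"] >= 30,
    columns=["dose", "response"],
    f=lambda dose, response: statistics.mean(response),
)
print(result["y"])
```

In the actual system the data provider would return y together with a proof that it was computed from the committed data, so the consumer never sees the rows themselves; the sketch omits the proof entirely.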