# ZK x Bio
## Bio Problems
- Data cleanup: no standardization, every institution uses a different format
- Data fragmentation due to privacy: data gated by hospitals because they have to be privacy compliant
- Data fragmentation due to competitiveness: private companies gate their data because of trade secrets
- Not enough data overall: even if we sequence everyone's data there are only 8 billion data points for trillions of features
- Maybe AlphaFold is bottlenecked by compute? Not sure
## ZKP Ideas
- Verify computations to train models
- Anyone with computation can participate
- Genotype -> phenotype model?
- Public diagnosis models (like lung X-ray but general diagnosis)
- Typically trained only by private companies; now we can make them public
- Second opinion - upload your X-ray, get AI opinion
- A sane WebMD
- Verify data quality
- All data in column X conforms to Y format
- Data is linked to a verified source
- Federated training
- Train model locally, upload weights and proof
- PSI to join two databases in a ZK way
- Plaid for medical data: verify you're a member of One Medical, Kaiser, etc., and/or use PIR to pull your data from services like One Medical without the provider knowing that you connected to a third-party service
- Verify accurate donated data
- Donate emails, pull information from emails and verify that way
- Example use case: connect all your 23andMe, One Medical, Amazon purchase data and more together
- Safe data store for health/nutrition/DNA etc.
- Donors contribute their health data
- Donors get free insights from inference (only the donor gets this benefit, so no mafia going around collecting people's data)
- Researchers can make requests for features (e.g. "I want to know how many glasses of wine people that match criteria X drank last week") and donors can respond
- Faster and better way to collect data and draw correlations
- Public record database verified by ZKP
- Incentivize people to index and clean public records and make them discoverable and available
- Proofs that verify the source (fetch the public records site, verify that the correct data is returned) -> "Correctness guarantee"
- Marriage and divorce, obituary, criminal records...
- Data marketplace (sources verified with ZK)
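
To make the "Verify data quality" idea concrete, here is a minimal sketch of the statement a proof would attest to: every value in a column matches a format, and the column matches a published commitment. This is not a ZK proof, just the plain check a circuit would encode. The date pattern, the salted SHA-256 commitment, and the function names are all assumptions for illustration.

```python
import hashlib
import re

# Hypothetical "Y format": ISO-style dates such as 2024-01-31.
DATE_FORMAT = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def column_conforms(values, pattern=DATE_FORMAT):
    """Return True if every value in the column fully matches the pattern."""
    return all(pattern.fullmatch(v) is not None for v in values)


def commit_column(values, salt: bytes) -> str:
    """Salted SHA-256 commitment to the column contents (order-sensitive)."""
    h = hashlib.sha256()
    h.update(salt)
    for v in values:
        data = v.encode("utf-8")
        # Length prefix so ["ab", "c"] and ["a", "bc"] commit differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    column = ["2024-01-31", "1999-12-01"]
    print(column_conforms(column))
    print(commit_column(column, salt=b"example-salt"))
```

In a real system the data holder would publish only the commitment and a proof that `column_conforms` holds for the committed values, without revealing the column itself.
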
## MPC (multi-party computation)
- Comparison between data sets
- Size of intersection (e.g. between two hospitals)
- Are we pursuing the same target (e.g. between drug discovery companies) -> public tool to reduce redundant efforts?
- Federated training
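
The "size of intersection" item can be sketched with commutative blinding in the Diffie-Hellman style, one classic way to compute intersection cardinality. Each party raises hashed identifiers to its own secret exponent, and doubly blinded values match only for shared identifiers. This is a toy under stated assumptions: both parties run in one process, the modulus and function names are chosen for illustration, and the parameters are not a vetted secure group.

```python
import hashlib
import math
import secrets

# Toy modulus: 2**255 - 19 is prime. Not a vetted parameter set for this protocol.
P = 2**255 - 19


def hash_to_group(item: str) -> int:
    """Map an identifier to an integer in [1, P-1]."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen() -> int:
    """Pick a secret exponent coprime to P-1 so that blinding is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(items, key):
    """First pass: hash each identifier and raise it to the party's key."""
    return [pow(hash_to_group(x), key, P) for x in items]


def reblind(blinded, key):
    """Second pass: raise the other party's blinded values to our key."""
    return [pow(v, key, P) for v in blinded]


def intersection_size(a_items, b_items):
    """Simulate both hospitals locally and return |A ∩ B|."""
    a_key, b_key = keygen(), keygen()
    a_once = blind(set(a_items), a_key)   # hospital A sends this to B
    b_once = blind(set(b_items), b_key)   # hospital B sends this to A
    a_twice = reblind(a_once, b_key)      # B blinds A's values (and would shuffle them)
    b_twice = reblind(b_once, a_key)      # A blinds B's values
    # (h^a)^b == (h^b)^a mod P, so equal identifiers yield equal values.
    return len(set(a_twice) & set(b_twice))


if __name__ == "__main__":
    print(intersection_size({"p1", "p2", "p3"}, {"p2", "p3", "p4"}))
```

In a real deployment each party would keep its key private, shuffle the values it returns, and use a properly chosen prime-order group; the sketch only shows why the doubly blinded sets are comparable.
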
## MVP Ideas
- Find 2 hospitals, one with data and one with correlations --> calculate the correlation from the data
- Data integrity
- Minimal public record crawler
- Biomarker Consortium --> partner to implement ZKP with their member organizations
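
For the two-hospital MVP, one possible reading is that a claimed correlation gets checked against the underlying data. The sketch below shows only that plain computation (Pearson's r and a tolerance check), which is the statement a ZK or MPC protocol would later enforce without revealing the data. The function names and the tolerance are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def claim_holds(xs, ys, claimed_r, tolerance=1e-6):
    """Check a claimed correlation against the data, within a tolerance."""
    return abs(pearson_r(xs, ys) - claimed_r) <= tolerance


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]
    print(pearson_r(xs, ys))
    print(claim_holds(xs, ys, claimed_r=1.0))
```
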