# ZK x Bio

## Bio Problems

- Data cleanup: no standardization; every institution uses a different format
- Data fragmentation due to privacy: hospitals gate their data because they have to stay privacy-compliant
- Data fragmentation due to competitiveness: private companies gate their data to protect trade secrets
- Not enough data overall: even if we sequenced everyone, there would only be ~8 billion data points for trillions of features
- Maybe AlphaFold is bottlenecked by compute? Not sure

## ZKP Ideas

- Verify computations used to train models
  - Anyone with compute can participate
  - Genotype -> phenotype model?
  - Public diagnosis models (like lung X-ray models, but for general diagnosis)
    - Typically only trained by private companies; now we can make them public
    - Second opinion: upload your X-ray, get an AI opinion
    - A sane WebMD
- Verify data quality
  - All data in column X conforms to format Y
  - Data is linked to a verified source
- Federated training
  - Train the model locally, upload weights and a proof
- PSI to join two databases in a ZK way
- Plaid for medical data: verify you're a member of One Medical, Kaiser, etc., and/or use PIR to pull your data from services like One Medical without the provider knowing that you connected to a third-party service
- Verify accurate donated data
  - Donate emails, pull information from the emails, and verify it that way
  - Example use case: connect your 23andMe, One Medical, Amazon purchase data, and more together
- Safe data store for health/nutrition/DNA etc.
  - Donors contribute their health data
  - Donors get free insights from inference (only the donor reaps these benefits; no mafia going around collecting people's data)
  - Researchers can request features (e.g., "I want to know how many glasses of wine people matching criteria X drank last week"), and donors can respond
  - Faster and better way to collect data and draw correlations
- Public record database verified by ZKP
  - Incentivize people to index public records, clean them, and make them discoverable and available
  - Proofs that verify the source (fetch the public records site, verify that the correct data is returned) -> "correctness guarantee"
  - Marriage and divorce records, obituaries, criminal records...
- Data marketplace (sources verified in ZK)

## MPC (multi-party computation)

- Comparison between data sets
  - Size of the intersection (e.g., between two hospitals)
  - Are we working on the same target (e.g., between drug discovery companies)? -> public tool to reduce redundant effort?
- Federated training

## MVP Ideas

- Find 2 hospitals, one with data and one with correlations -> calculate the correlation from the data
- Data integrity
- Minimal public record crawler
- Biomarker Consortium -> partner to implement ZKP with their member organizations
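For the "verify data quality" and "data integrity" ideas, it helps to pin down the statement a proof would attest to: "I know a column whose Merkle root is R, and every entry matches format Y." The sketch below only evaluates that relation in the clear; it is not a proof system. A ZK circuit would encode the same check so the verifier learns R and the result but not the values. The date pattern is a hypothetical stand-in for "format Y", and all function names are illustrative. The 0x00/0x01 prefixes separate leaf hashes from internal-node hashes.

```python
import hashlib
import re

# Hypothetical "format Y": ISO-style dates such as 2023-01-05.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"


def leaf_hash(value: str) -> bytes:
    # 0x00 prefix marks a leaf so it can't be confused with an internal node.
    return hashlib.sha256(b"\x00" + value.encode()).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # 0x01 prefix marks an internal node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(values: list[str]) -> bytes:
    """Commit to a whole column with a single 32-byte root."""
    if not values:
        raise ValueError("empty column")
    level = [leaf_hash(v) for v in values]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is promoted unchanged
        level = nxt
    return level[0]


def column_statement_holds(values: list[str], pattern: str, claimed_root: bytes) -> bool:
    """The relation a ZK proof would show without revealing `values`."""
    return merkle_root(values) == claimed_root and all(
        re.fullmatch(pattern, v) is not None for v in values
    )
```

The data owner would publish only the root; a row in the wrong format, or a column that doesn't hash to the published root, makes the statement false.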
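For the federated training idea ("train the model locally, upload weights and a proof"), a common aggregation rule is federated averaging (FedAvg): average each client's weights, weighted by how many local samples it trained on. This is a minimal sketch assuming weights are flat lists of floats. The `commit_weights` hash is a hypothetical binding step, so a proof about local training could refer to exactly the weights that were uploaded. The proof itself is out of scope here.

```python
import hashlib
import struct


def commit_weights(weights: list[float]) -> str:
    """Hash the exact float64 bytes of a weight vector (hypothetical commitment)."""
    return hashlib.sha256(struct.pack(f"<{len(weights)}d", *weights)).hexdigest()


def federated_average(client_updates: list[tuple[list[float], int]]) -> list[float]:
    """FedAvg: sample-count-weighted mean of each client's weight vector."""
    if not client_updates:
        raise ValueError("no client updates")
    dim = len(client_updates[0][0])
    if any(len(w) != dim for w, _ in client_updates):
        raise ValueError("all clients must send weights of the same shape")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return [sum(w[i] * n for w, n in client_updates) / total for i in range(dim)]
```

For example, a client with 1 sample sending `[1.0, 2.0]` and a client with 3 samples sending `[3.0, 4.0]` average to `[2.5, 3.5]`.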
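For the MPC idea of learning only the size of the intersection between two hospitals' patient lists, one classic approach is Diffie-Hellman-style PSI-cardinality. Each side raises hashed identifiers to a secret exponent. Because $(H(x)^a)^b = (H(x)^b)^a$, shared identifiers collide after both exponents are applied. Hospital B shuffles A's doubly-blinded values, so A learns the count but not which records matched. This is a toy: both parties are simulated in one process, and the modulus $2^{127}-1$ (a Mersenne prime) is assumed only for illustration. This group is not a secure choice; a real deployment would use an elliptic-curve group with a standard hash-to-curve. All names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy modulus (Mersenne prime 2**127 - 1). Illustration only, NOT secure.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an identifier to a nonzero element of Z_P^*."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def secret_exponent() -> int:
    """Random exponent coprime to P-1, so x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(items, key: int) -> list[int]:
    return [pow(hash_to_group(x), key, P) for x in items]


def reblind(values, key: int) -> list[int]:
    return [pow(v, key, P) for v in values]


def psi_cardinality(hospital_a: list[str], hospital_b: list[str]) -> int:
    a_key, b_key = secret_exponent(), secret_exponent()
    # A -> B: A's identifiers blinded with A's key.
    a_blinded = blind(set(hospital_a), a_key)
    # B -> A: A's values re-blinded with B's key and shuffled (hides which matched),
    # plus B's own identifiers blinded with B's key.
    a_double = reblind(a_blinded, b_key)
    secrets.SystemRandom().shuffle(a_double)
    b_blinded = blind(set(hospital_b), b_key)
    # A applies its key to B's values: H(x)^(ab) == H(x)^(ba) for shared x.
    b_double = reblind(b_blinded, a_key)
    return len(set(a_double) & set(b_double))
```

With `["p1", "p2", "p3"]` and `["p2", "p3", "p4"]`, the function returns 2. Because the exponents are coprime to $P-1$, distinct hashes never collide after blinding.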
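For the "calculate the correlation from the data" MVP, suppose each hospital holds rows for different patients with the same two features. Pearson's $r$ then depends only on pooled sums: $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}$. Those sums can be combined with additive secret sharing, so only the totals are revealed. This is a minimal semi-honest sketch with all parties simulated in one process. It assumes nonnegative integer measurements (e.g., scaled and rounded) whose totals stay below the modulus $2^{61}-1$. The function names are hypothetical.

```python
import math
import secrets

Q = 2**61 - 1  # prime modulus for shares; all totals must stay below Q


def local_stats(xs: list[int], ys: list[int]) -> list[int]:
    """Sufficient statistics for Pearson's r over one hospital's rows."""
    return [len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(y * y for y in ys),
            sum(x * y for x, y in zip(xs, ys))]


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n additive shares mod Q; any n-1 shares look random."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def secure_totals(per_hospital_stats: list[list[int]]) -> list[int]:
    n = len(per_hospital_stats)
    k = len(per_hospital_stats[0])
    # inbox[j][s]: running sum of the shares of statistic s received by party j
    inbox = [[0] * k for _ in range(n)]
    for stats in per_hospital_stats:
        for s, value in enumerate(stats):
            for j, sh in enumerate(share(value, n)):
                inbox[j][s] = (inbox[j][s] + sh) % Q
    # Each party publishes only its partial sums; adding them yields the totals.
    return [sum(inbox[j][s] for j in range(n)) % Q for s in range(k)]


def pearson(totals: list[int]) -> float:
    n, sx, sy, sxx, syy, sxy = totals
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den
```

Hospital A with `x=[1, 2, 3], y=[2, 4, 6]` and hospital B with `x=[4, 5], y=[8, 10]` pool to totals `[5, 15, 30, 55, 220, 110]`, which gives $r = 1.0$ without either side sharing raw rows.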