For Yuriko

# Typos

_I'm not a native speaker, but I think these are potential typos_

```diff=
- To summerize, I prototyped a system that allows mutually
+ To summarize, I prototyped a system that allows mutually
```

```diff=
- I see a structual priviledge here. Data tends to
+ I see a structural privilege here. Data tends to
```

```diff=
- Use case 1: Crowdsourced heathcare data analysis platform
+ Use case 1: Crowdsourced healthcare data analysis platform
```

```diff=
- And here is a new idea to gain even more efficiency utilizng veirifiability:
+ And here is a new idea to gain even more efficiency utilizing verifiability:
```

# Commentary

Given the vast impact AI and machine learning have had on our everyday lives (over the past 15 years, and especially the past 5), it is important to "get ahead of the curve" when it comes to dataset availability and training for these models. Indeed, implicit biases in the datasets may yield results that, interpreted by an "expert" without much sociopolitical context, lead to conclusions that can harm less-represented communities.

There is not much I can add, other than to highlight that individuals must adopt a collaborative mindset with their data, given that we now have privacy-preserving (_push style_) technologies. Some may argue that these efforts are in vain, as there is a lack of incentives for this open collaboration. Nonetheless, non-incentive-based initiatives such as the Open Source Initiative (est. 1998) are very much alive and have significantly shaped software for the past 25 years. An open-data tailored initiative backed by the current privacy-preserving technologies is very much possible (and necessary!).

Biology (_this includes many subfields.
My expertise lies in molecular biology, hence what comes next is specific to this subfield, but I believe it can be applied to the entire field without loss of generality_) has had vast breakthroughs in the past 30 years, shifting it from a mere "collect, test and conclude" type of discipline into a data-generating monster. Scientists in the field do push data constantly into organized silos (https://www.ncbi.nlm.nih.gov/ , https://gisaid.org/ , https://www.genome.jp/kegg/ , https://parts.igem.org/Main_Page ...) without incentive _(well, maybe sometimes under the coercion of journals, which have open data availability as a prerequisite. But the way journals have twisted and monetized scientific research is a topic of its own ;) )_ but keep many details to themselves (to stay competitive in this "_publish-or-perish_" paradigm). Thus, there is no "true openness" when it comes to data, which reserves quality research for privileged research groups located in privileged countries.

Things can get worse when talking about more sensitive data, like **genetic data**. See, I believe every human being has the **right** to **genetic privacy**. This is difficult to achieve in practice because, as biological entities, we're constantly spewing our DNA around. The question goes more in the direction of: _If I get my genome sequenced, who stores it? Who owns it? Who can have access to it?_ Not having control over that is basically **exposing** your biological integrity to anyone (_diseases you may suffer, diseases you could develop in the future..._). In a dystopian reality, this could mean the end of your individual freedom, as society could systematically ostracise "unfit" individuals.

1997's blockbuster GATTACA (https://en.wikipedia.org/wiki/Gattaca) presented this idea.
Back then, the human genome sequencing project was still running (it didn't complete until 2003), and its costs were vast (measured in millions of \$); thus GATTACA was an interesting take on a _future that lies too far ahead_. Almost 30 years later, sequencing a human genome is affordable (less than \$1000), and some companies even offer "_ancestry services_" (heard of https://www.23andme.com/en-int/ ? _Would you send a DNA sample to a company? Who is ensuring they won't sell your data to someone else?_). We're there, in a reality that may or may not develop into the dystopia presented by GATTACA.

On the other hand, we **need** this information to be shared for the benefit of research; maybe you or I hold the key to curing infant leukemia in our genomes. And we may never know! The only way to enable this collaboration in an ethical and responsible way is through individual open contributions, backed by privacy-preserving technologies.

So, it is crucial that we coordinate initiatives of open collaboration while also preserving our individual privacy and integrity. We are shaping the world with data, and the world in turn shapes us, and those who will inherit it after we're long gone.