Here's [inspiration](http://bactra.org/notebooks/) for what this page (and more to come) could look like. This is an attempt at digitally documenting tangents and rabbit holes for posterity.
----
Recently, I joined a working group on campus focused on reforming scientific publication by improving knowledge search. I'm working on something tangential, but here are some things that were brought up or came to mind.
As a side note, I'm thinking of running a 'Distillathon' while I'm at B21. [Distill](https://distill.pub/about/), the ML journal, is an example of a really great effort in scientific communication. Unfortunately, it went on hiatus in 2021, but I think it could be cool to capture its essence on a smaller scale: get people together to distill technical topics, try to visualize them beautifully, and hopefully come away with a solid learning opportunity.
Edit: This is in fact happening now!
-----
This is a collection of things I've noted down before and a list of resources to look back on. I'm sharing it publicly because I prefer looking up links on the beautiful world wide web to searching for files on my laptop. I say this jokingly, but I take it as my truth for the most part.
****
- [AJ's Metascience Reading List](https://www.ajkourabi.com/writings/metascience-reading-list)
- [A Vision of Metascience](https://scienceplusplus.org/metascience/) (I need to write notes on this later, particularly how they look at funding risky research, etc.)
- [Andy Matuschak's takes on funding](https://notes.andymatuschak.org/z6yQo2XrLw1uNq8weAsVKEE?stackedNotes=z94Q2sJaKnPaYRgDwcNnJ4X)
**A collection of papers, projects and resources.**
Initiatives
- [IntechOpen Journals](https://www.intechopen.com/)
- [Federation of American Scientists' Day One Project Open Science proposals](https://fas.org/accelerator/open-science/)
- Institute for Replication: https://i4replication.org/games.html
- 50Y Progress Index: https://progress.fiftyyears.com
- Berkeley Initiative for Transparency in the Social Sciences: https://www.bitss.org
- Open Source Initiative: https://blog.opensource.org
- J-PAL's Data Publication Infrastructure: https://www.povertyactionlab.org/blog/4-5-23/j-pal-dataverse-turns-15-fifteen-uses-published-rct-data-packages
Research Navigation
* Elicit: https://elicit.com/
* Octopus: https://www.octopus.ac
* Exa: https://exa.ai/search
* So much to explore when it comes to search: embeddings, LLM-powered search, etc.
* Google Scholar (Had to mention lol)
What papers could look like:
- Distill Publication: https://distill.pub/about/
- World Models Paper (A model for what scientific papers could look like): https://worldmodels.github.io
Open-Access Journals
- Collective Intelligence (https://journals.sagepub.com/home/col)
- https://www.biorxiv.org
- http://arxiv.org
- https://pubmed.ncbi.nlm.nih.gov
Data and Research Archives
* NBER Public Data Archive: https://www.nber.org/research/data?page=1&perPage=50
* NERC Open Research Archive: https://nora.nerc.ac.uk
* Most government agencies also have public datasets
Building a philosophy
- [A Vision of Metascience](https://scienceplusplus.org/metascience/)
- [Augmenting Human Intellect: A Conceptual Framework](https://numinous.productions/ttft/assets/Engelbart1962.pdf)
- [Strategies for Knowledge Transfer](https://www.proquest.com/docview/2492796982?pq-origsite=gscholar&fromopenview=true)
- [Collaborative Research: Bringing End-to-End Provenance to Scientists](https://www.mtholyoke.edu/~blerner/DataProvenance/)
- [Underpinning EISB with Enterprise Interoperability Neighboring Scientific Domains](https://semanticscholar.org/paper/e98ff1607fc53c708fe203ab7bdbe2e48bd3a111)
- [Building a Better NIH](https://newscience.org/nih/#how-are-indirect-cost-rates-calculated)
### Improving Scientific Data Infrastructure in the U.S.
> [On 11 January 2023 the US White House - joined by 10 federal agencies, a coalition of more than 85 universities, and other organizations — declared 2023 to be a Year of Open Science.](https://www.whitehouse.gov/ostp/news-updates/2023/01/11/fact-sheet-biden-harris-administration-announces-new-actions-to-advance-open-and-equitable-research/)
>
Coordinated action is needed to build the data infrastructure required for science to reach its full potential and address societal challenges. With prudent policy changes and investments, the U.S. can become a global leader in open scientific data sharing.
Within metascience, data management appears to be neglected, especially the question of how to leverage technical expertise to improve it. Lessons from interoperability research are especially relevant to scientific data infrastructure.
In light of developments in software (especially AI) and tools that aim to support scientific discovery, we should think more about how to make our data infrastructure good enough to take full advantage of what automation can offer.
The scientific process generates a lot of data, from experimental results to patent records. For example, [Richard Gold at McGill](https://www.mcgill.ca/law/files/law/gold_baker_evidence-based_policy_jlis_online_version.pdf) looks at this. Overall, we lack standardization in how we capture, organize and share this data, and that should change.
**Premise**
A lack of coordinated data management impedes scientific progress. We can’t collaborate as efficiently, we can’t track what’s being worked on, and data science initiatives aren’t as well-resourced as they could be.
Immigration options (like the J-1 visa) emphasize exchange, but building out our digital infrastructure helps us think beyond physical exchange and leaves room for creativity in how we support cross-pollination and interdisciplinarity.
There’s a shift towards open science, and UNESCO has published its set of recommendations. There are policy concerns around dual-use technologies, effects on commercialization, and proprietorship when it comes to an open approach to data. However, laying out how we approach data management can make conversations about open science and open access more productive.
Historically, there have been data-intensive collaborative efforts we can look to. These were made possible by progress in data management, and future projects rely on further progress in the space.
- Human Genome Project
- Wikipedia
- arXiv, bioRxiv
- NBER Orange Book Dataset (Patent Data from the FDA)
**Why this should be a priority**
- Facilitates multi-institutional/interdisciplinary collaboration
- Allows for replication and validation of research
- Enables metascience, like evaluating intervention effectiveness through replication tests, real-world evidence (RWE) for drug approval, RCTs, etc.
- Supports crisis response by pooling resources
- Advances artificial intelligence applications in science
**A Vision for the Future**
- Open, linked datasets available to all approved researchers
- Secure real-time sharing of intermediate results
- Versioning and provenance tracking for published research
- Role-based access control and cybersecurity safeguards. Security shouldn’t be an afterthought, and we need to ask how to weigh it against the benefits of a more public, open ecosystem.
  - [The 23andMe data breach reveals the vulnerabilities of our interconnected data](https://theconversation.com/the-23andme-data-breach-reveals-the-vulnerabilities-of-our-interconnected-data-193615)
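To make the versioning, provenance and access-control items a bit more concrete, here is a rough sketch of how they could fit together. Everything in it is my own illustration rather than an existing system: the names `DatasetVersion`, `ProvenanceLog`, `ROLE_PERMISSIONS` and `is_allowed` are hypothetical, and the roles are made up. The idea is just that each dataset snapshot is identified by a hash of its contents and its lineage, so any change to the data or its history yields a new ID.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical role table: which actions each role may perform.
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "contributor": {"read", "commit"},
}


def is_allowed(role: str, action: str) -> bool:
    """Role-based access check; unknown roles get no permissions."""
    return action in ROLE_PERMISSIONS.get(role, set())


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable snapshot of a dataset plus where it came from."""
    content_hash: str       # SHA-256 of the raw data bytes
    parent: Optional[str]   # version_id this snapshot was derived from, if any
    created_by: str         # illustrative researcher identifier
    note: str               # free-text description of the change

    @property
    def version_id(self) -> str:
        # Hash the whole record so a change to data or lineage changes the ID.
        record = json.dumps(
            {
                "content_hash": self.content_hash,
                "parent": self.parent,
                "created_by": self.created_by,
                "note": self.note,
            },
            sort_keys=True,
        )
        return hashlib.sha256(record.encode("utf-8")).hexdigest()[:12]


class ProvenanceLog:
    """In-memory store of versions; a real system would persist these."""

    def __init__(self) -> None:
        self._versions: Dict[str, DatasetVersion] = {}

    def commit(self, data: bytes, created_by: str, role: str,
               note: str, parent: Optional[str] = None) -> str:
        if not is_allowed(role, "commit"):
            raise PermissionError(f"role {role!r} may not commit")
        if parent is not None and parent not in self._versions:
            raise KeyError(f"unknown parent version {parent!r}")
        version = DatasetVersion(
            content_hash=hashlib.sha256(data).hexdigest(),
            parent=parent,
            created_by=created_by,
            note=note,
        )
        self._versions[version.version_id] = version
        return version.version_id

    def lineage(self, version_id: str) -> List[DatasetVersion]:
        """Walk parent links from a version back to its original source."""
        chain = []
        current: Optional[str] = version_id
        while current is not None:
            version = self._versions[current]
            chain.append(version)
            current = version.parent
        return chain

    def verify(self, version_id: str, data: bytes) -> bool:
        """Check that some bytes really are the recorded snapshot."""
        expected = self._versions[version_id].content_hash
        return hashlib.sha256(data).hexdigest() == expected


if __name__ == "__main__":
    log = ProvenanceLog()
    raw = log.commit(b"sample,value\na,1\n", "alice", "contributor", "raw measurements")
    clean = log.commit(b"sample,value\na,1.0\n", "bob", "contributor",
                       "normalized values", parent=raw)
    print([v.note for v in log.lineage(clean)])
```

Running it prints the notes from the newest version back to the raw data. The point of hashing is that anyone holding a published version ID can check the data they were given against it, which is a small piece of what replication needs.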
**Policy Recommendations**
- Increased funding for data management R&D
- Cross-institutional working groups to develop best practices
- Outreach to promote cultural change and uptake in the scientific community
**Considerations for Implementation**
- Balancing openness with security, privacy and commercial interests
- Dual use concerns regarding certain types of data
- Existing commercial products and competition
**Additional**
Relevant researchers in the space.
- Max Langenkamp (has written about open-source ML and its economic implications)
- Margo Seltzer (interoperability researcher)
- Yuri Demchenko (interoperability researcher)
Cool Reads
- [How Interoperability Advances Data Sharing and Open Science](https://nlmdirector.nlm.nih.gov/2023/06/20/how-interoperability-advances-data-sharing-and-open-science/)
- [From the NIH: Creating a data resource: what will it take to build a medical information commons?](https://pubmed.ncbi.nlm.nih.gov/28938910/)
- [Identifying the challenges in implementing open science (version 1; peer review: 2 approved)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7845503/)
- [Sharing biological data: why, when, and how](https://pubmed.ncbi.nlm.nih.gov/33843054/)
- [The Collection, Analysis, and Distribution of Information and Materials - Mapping and Sequencing the Human Genome (NCBI Bookshelf)](https://www.ncbi.nlm.nih.gov/books/NBK218250/)
- [Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References](https://eprints.soton.ac.uk/412923/1/WD_sources_iswc_7_.pdf)
- [The Impact of Open-Access Scientific Knowledge (Brookings)](https://www.brookings.edu/articles/the-impact-of-open-access-scientific-knowledge/)
- [Dismantling the ivory tower’s knowledge boundaries (Brookings)](https://www.brookings.edu/articles/dismantling-the-ivory-towers-knowledge-boundaries-a-call-for-open-access-as-the-new-normal-in-the-social-sciences-post-covid/)
- [How Open Source Machine Learning Software Shapes AI](https://maxlangenkamp.me/posts/mloss_essay/)
- [Open Science (EPFL)](https://www.epfl.ch/research/open-science/)
- [UNESCO Recommendation on Open Science](https://www.unesco.org/en/open-science/about)
- [Opening Science: The Evolving Guide on How the Internet is Changing Research, Collaboration and Scholarly Publishing (SpringerLink)](https://link.springer.com/book/10.1007/978-3-319-00026-8)