# BioHackathon 2023 proposal

## Research interest:

- ☐ \[R1] Multi-omics analysis on human genotype to phenotype that includes genomic, transcriptomic, epigenomic, proteomic, protein structures, and biochemical data.
- ☐ \[R2] Automated data analysis of microorganisms including phylogenetic compositions, gene annotations, pathways, and growth conditions.
- ☑ \[R3] Data-driven interdisciplinary studies in public health, environment, agriculture, food, energy, and other fields.
- ☐ \[R4] Facilitating knowledge discovery and biological analysis of knowledge graphs and literature.

## Technical aspects:

- ☑ \[T1] Data
- ☐ \[T2] Algorithm
- ☐ \[T3] Analysis
- ☑ \[T4] Tool
- ☐ \[T5] Application
- ☐ \[T6] Workflow

## Abstract of your hacking plan:

Less than 500 words. In the abstract, please describe how you plan to contribute and collaborate by indicating your "Research interest" (please pick one from R1 to R4) and explain your expertise chosen from the "Technical aspects" (please pick one or more options from T1 to T6).

[[ Research and public health benefit enormously from the democratization of investigation. If one idea out of a thousand pays off, a health IT infrastructure that empowers researchers to ask as many questions as possible offers an immeasurable social good. The main obstacles to allowing a thousand questions to bloom are the information impedance imposed by crossing informatics system boundaries and the lack of shared semantics. Both RDF and SPARQL were designed to enable data integration with minimal plumbing costs and, as part of the semantic web ecosystem, can also facilitate the goal of semantic interoperability. In the ideal limit, SPARQL federated queries enable the spontaneous construction of queries that link data from diverse sources.

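
As a toy illustration of that idea (not SPARQL itself, and not part of HAPI or Jena), the sketch below joins triple patterns across two in-memory sources in Python. All IRIs, data, and function names are hypothetical and chosen only for this example. In SPARQL 1.1, the pattern evaluated against the second source would sit inside a `SERVICE` clause naming a remote endpoint.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical clinical source (made-up IRIs, not the FHIR/RDF vocabulary).
CLINICAL: List[Triple] = [
    ("ex:obs1", "ex:subject", "ex:patient1"),
    ("ex:obs1", "ex:code", "ex:codeA"),
    ("ex:obs2", "ex:subject", "ex:patient2"),
    ("ex:obs2", "ex:code", "ex:codeB"),
]

# Hypothetical terminology source held by a different system.
TERMINOLOGY: List[Triple] = [
    ("ex:codeA", "rdfs:label", "Blood glucose"),
    ("ex:codeB", "rdfs:label", "Hemoglobin"),
]


def match(pattern: Triple, source: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield extensions of `binding` for every triple in `source` that matches `pattern`.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    for triple in source:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound to another value.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def federated_query(
    steps: List[Tuple[Triple, List[Triple]]],
    binding: Optional[Binding] = None,
) -> List[Binding]:
    """Evaluate (pattern, source) steps left to right, joining on shared variables."""
    solutions: List[Binding] = [binding or {}]
    for pattern, source in steps:
        solutions = [ext for sol in solutions for ext in match(pattern, source, sol)]
    return solutions


# "Which labelled codes were observed for patient1?" spans both sources.
results = federated_query([
    (("?obs", "ex:subject", "ex:patient1"), CLINICAL),
    (("?obs", "ex:code", "?code"), CLINICAL),
    (("?code", "rdfs:label", "?label"), TERMINOLOGY),
])

for row in results:
    print(row["?obs"], row["?label"])
```

The join works only because both sources use the same identifier for the code (`ex:codeA`), which is the shared-semantics requirement described above.
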
The Fast Healthcare Interoperability Resources (FHIR) standard aims to address both the impedance mismatch and semantic interoperability by offering an API to facilitate system integration and a metamodel allowing for the definition of interoperable resources. RDF is a supported FHIR payload format, and the HAPI FHIR server currently supports three FHIR formats: XML, JSON and Turtle (RDF). On February 24, the latest specification for FHIR went to ANSI ballot with a revised specification for the RDF model. This included a complete specification of the representation of all FHIR Resources as an RDF graph. Yet querying for FHIR data currently relies on the FHIR REST API.

We believe that we can further facilitate the adoption of RDF within the FHIR community by adding a SPARQL query engine to the existing HAPI FHIR server. The FHIR/RDF specification fully defines the data surface to be matched by SPARQL queries. Support for the RDF format already requires the Jena RDF library to be integrated into HAPI, so additionally integrating ARQ, Jena's SPARQL query engine, would be a small increase in footprint. The real work will be in analyzing SPARQL query objects and efficiently executing those queries over one or more of HAPI’s back-end stores. Time permitting, this work product can be integrated into other hackathon products in time for the final reports.

The proposed participants have a track record of working effectively together and have contributed to numerous open-source projects in both clinical informatics and infrastructure:

Claude Nanjo is a senior clinical informaticist at the University of Utah and co-chair of the Health Level 7 (HL7) Clinical Information Model Initiative (CIMI). He was one of the initial FHIR contributors and contributed to the OpenEHR Java library. Mr. Nanjo is currently working as part of the ReimagineEHR initiative at the University of Utah, a project that aims to define a standard EHR API that enables an ecosystem of plug-n-play applications.
Claude Nanjo is currently working with Eric Prud’hommeaux and Iovka Boneva to update jena-shex to support ShEx 1.2, as well as to improve the efficiency and conformance of that implementation.

Eric Prud’hommeaux has been responsible for maintaining RDF support in the HAPI server. As stated above, he is working on jena-shex, and he has his own TypeScript/JavaScript implementation of ShEx. His contributions to ts-jison, as well as earlier projects mapping queries to relational databases in SWObjects, leave him well prepared to execute sound and complete query transformations. ]]

## Your relevant source code repository or website (if any) (Optional):

- FHIR/RDF: https://github.com/hapifhir/hapi-fhir/blob/7a31860376a623280d74f3695628c6e3826302b9/hapi-fhir-base/src/main/java/ca/uhn/fhir/parser/RDFParser.java#L662
- SWObjects: https://github.com/ericprud/SWObjects/blob/sparql11/lib/SQLizer.hpp#L36
- ts-jison: https://github.com/ericprud/ts-jison/

## Your relevant publications, talks, and posters (if any) (Optional):

## Financial support:

Since we would like to invite as many participants as possible within our limited budget for the BioHackathon, we would highly appreciate it if you could participate on your own budget. We have three travel support options for application proposal winners: full support, partial support, and no financial support.

- [*] Full support (flights to/from Japan, domestic transportation and accommodation)
- [ ] Partial support (accommodation only, without flights to/from Japan)
- [ ] No financial support is required (invitation letter only)

## Is your project funded by NBDC?

No

## This inquiry is mostly for domestic participants, funded by the Department of NBDC Program (NBDC) in Japan:

No

## City and country name of your (institutional) residence:

- Clermont-Ferrand, France
- San Diego, CA, USA

## Visa requirement:

If your passport requires a visa to visit Japan, please indicate it to prioritize your invitation process.

- [ ] Yes
- [*] No

## Comments to the organizers

If you have any questions or comments, please use this form or write to organizers at 2023-admin@biohackathon.org.