BioHackathon MENA 2023. mOWL: Python library for machine learning with biomedical ontologies

---
title: 'BioHackathon MENA 2023. mOWL: Python library for machine learning with biomedical ontologies'
tags:
  - ontologies
  - machine learning
  - software library
authors:
  - name: Fernando Zhapa-Camacho
    orcid: 0000-0002-0710-2259
    affiliation: 1
  - name: Maxat Kulmanov
    orcid: 0000-0003-1710-1820
    affiliation: 1
  - name: Sheikha Lardhi
    orcid: 0000-0001-9061-8397
    affiliation: 2
affiliations:
  - name: Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
    index: 1
  - name: KAUST Catalysis Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
    index: 2
date: 11 February 2023
bibliography: paper.bib
authors_short: Zhapa-Camacho et al. (2023) mOWL
group: BioHackrXiv
event: BioHackathon MENA 2023
---

# Introduction or Background

mOWL is a software library that provides methods to integrate ontologies as the main data structure for machine learning (ML) models. Recently, several works have used different components of ontologies (axioms, metadata, and relationships between entities) to improve or implement knowledge-aware ML models. These works have primarily targeted biomedical tasks such as protein-protein interaction prediction [@onto2graph], drug-target [@dtivoodoo] and gene-disease association prediction [@dl2vec], and protein function prediction [@deepgozero]. mOWL [@mowl] standardizes the different ways in which ontologies can be transformed for ML models and also provides tools to support the development of new methods. Since research on ontologies and ML is developing continuously, a library like mOWL needs to stay up to date with current developments and provide features that ease its use. Therefore, by participating in BioHackathon MENA 2023, we aimed to extend and improve mOWL's capabilities and to add recently developed methods.
# New contributions to mOWL

Our initial objective was to improve the library by extending the functionality of existing components and adding new methods. In this BioHackathon, we also focused on the use of mOWL in different downstream tasks by collaborating with and assisting other projects that decided to use the library in their workflows. Our contributions during the event fall into three areas:

## Local visualization of nodes

This feature concerns the graph representation of ontologies. Large ontologies can generate graphs that are too large to plot in full. Therefore, we added a method to the existing graph structure that generates a visual representation of the graph around a particular node.

```python
import mowl
mowl.init_jvm("10g")  # mOWL needs the JVM running before its modules are imported
from mowl.datasets.builtin import PPIYeastDataset
from mowl.projection import DL2VecProjector

ontology = PPIYeastDataset().ontology
projector = DL2VecProjector()
graph = Graph(projector.project(ontology))  # Graph: mOWL graph structure (import path not given here)
# New feature: plot the neighborhood of a single node
graph.look_at("GO_0000917")
```

![Local visualization around GO_0000917](https://i.imgur.com/mD3ivfI.jpg)

```python
graph.look_at("4932.YDR020C")
```

![Local visualization around 4932.YDR020C](https://i.imgur.com/794EYIv.jpg)

## Axiom scoring

In mOWL, several methods generate embeddings that can later be used in other machine learning tasks. Some of these models are trained to encode certain types of subsumption axioms by minimizing functions that represent the plausibility of those axioms. However, once such a model was trained, there was no method to generate scores for new query axioms. Therefore, we equipped some models in mOWL with a `model.score` method that receives a query axiom and produces a score.
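To illustrate what such a score can capture, the sketch below follows the geometric idea behind ELEmbeddings. Each class is embedded as an n-ball with a center and a radius, and a subsumption axiom C ⊑ D is plausible when the ball of C lies inside the ball of D. It assumes the simplified EL Embeddings loss max(0, ‖c_C − c_D‖ + r_C − r_D − γ) with margin γ and omits the norm regularization terms. The names `Ball`, `subsumption_loss`, and `score` are hypothetical; this is not mOWL's implementation of `model.score`, whose exact scoring function may differ.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Ball:
    """Hypothetical n-ball embedding of an OWL class: a center vector and a radius."""
    center: Sequence[float]
    radius: float


def subsumption_loss(sub: Ball, sup: Ball, margin: float = 0.0) -> float:
    """Simplified EL Embeddings-style loss for sub ⊑ sup (regularization omitted).

    The loss is zero when the ball of `sub` lies inside the ball of `sup`
    (up to the margin) and grows with how far it extends beyond it.
    """
    distance = math.dist(sub.center, sup.center)
    return max(0.0, distance + sub.radius - sup.radius - margin)


def score(sub: Ball, sup: Ball, margin: float = 0.0) -> float:
    """Hypothetical plausibility score: the negated loss, so higher means more plausible."""
    # 0.0 - loss (rather than -loss) avoids returning -0.0 for satisfied axioms
    return 0.0 - subsumption_loss(sub, sup, margin)


if __name__ == "__main__":
    big = Ball(center=(0.0, 0.0), radius=2.0)
    small = Ball(center=(0.5, 0.0), radius=1.0)
    print(score(small, big))  # 0.0: small ⊑ big is satisfied
    print(score(big, small))  # -1.5: big ⊑ small is violated
```

Negating the loss makes higher scores correspond to more plausible axioms, which is convenient for ranking candidate axioms; whether mOWL uses the same sign convention is an assumption.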
We show an example of how the new feature works:

```python
import mowl
mowl.init_jvm("10g")  # start the JVM before importing mOWL modules
from mowl.datasets.builtin import PPIYeastDataset
from mowl.models import ELEmbeddings

dataset = PPIYeastDataset()
model = ELEmbeddings(dataset)
model.train()
# New feature
owl_axiom = ...  # placeholder: some axiom defined in OWL
model.score(owl_axiom)
```

## Downstream tasks with mOWL

mOWL was involved in other projects in two ways: (i) members of Project 20 actively collaborated in the development of other projects, and (ii) members of Project 20 assisted other projects in using mOWL. Specifically, mOWL was involved in the following projects:

### Project 3

In collaboration with Project 3, we started developing an end-to-end, ontology-based conditional generative adversarial network (CGAN) for generating synthetic clinical and laboratory measurements. During the hackathon, we implemented a baseline model that learns embeddings with the mOWL implementations of OWL2Vec\* and OPA2Vec, and trained a vanilla CGAN that generates synthetic features. Next, we plan to evaluate the generated synthetic features using statistical methods and to implement an end-to-end approach that combines the CGAN with ELEmbeddings or Falcon models.

### Project 11

In collaboration with Project 11, we generated embeddings of phenotype and lifestyle ontologies for use in a causal model that aims to find causal relationships between phenotypic and lifestyle factors in diabetes.

### Other projects

We also assisted Project 13 in using mOWL.

# Discussion and/or Conclusion

Participating in BioHackathon MENA 2023 was valuable for the development of mOWL: not only were the library's methods improved, but other projects also found the library useful in their workflows and provided valuable feedback.

# Future work

For the features started during this BioHackathon, future work will consist of refining their implementation.
Also, since mOWL follows test-driven development, future work will include adding the necessary tests to ensure the correct behaviour of the implemented features.

# Jupyter notebooks, GitHub repositories and data repositories

* Features implemented in this BioHackathon will be available in version 0.2.0. We follow Semantic Versioning to keep track of mOWL versions.

# Acknowledgements

We thank the organizers of BioHackathon MENA and everyone who contributed to this project by implementing features, using mOWL in downstream tasks, detecting flaws, bugs, and bottlenecks, and providing valuable feedback.

# References
