---
title: 'BioHackathon MENA 2023. mOWL: Python library for machine learning with biomedical ontologies'
tags:
- ontologies
- machine learning
- software library
authors:
  - name: Fernando Zhapa-Camacho
    orcid: 0000-0002-0710-2259
    affiliation: 1
  - name: Maxat Kulmanov
    orcid: 0000-0003-1710-1820
    affiliation: 1
  - name: Sheikha Lardhi
    orcid: 0000-0001-9061-8397
    affiliation: 2
affiliations:
  - name: Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
    index: 1
  - name: KAUST Catalysis Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
    index: 2
date: 11 February 2023
bibliography: paper.bib
authors_short: Zhapa-Camacho et al. (2023) mOWL
group: BioHackrXiv
event: BioHackathon MENA 2023
---
# Introduction
mOWL is a software library that provides methods for using ontologies as the main data structure in machine learning (ML) models. Several recent works use different components of ontologies (axioms, metadata, relationships between entities) to improve or implement knowledge-aware ML models. These works have been designed primarily for biomedical tasks such as protein-protein interaction prediction [@onto2graph], drug-target interaction prediction [@dtivoodoo], gene-disease association prediction [@dl2vec], and protein function prediction [@deepgozero]. mOWL [@mowl] standardizes the different ways in which ontologies can be transformed for ML models and provides tools that support the development of new ones.
Because research on ontologies and ML is evolving continuously, a library like mOWL needs to keep pace with current developments and offer features that make it easier to use. By participating in BioHackathon MENA 2023, we therefore aimed to extend and improve the capabilities of mOWL and to add recently developed methods.
# New contributions to mOWL
Our initial objective was to improve the library by extending existing components and adding new methods. During this BioHackathon we also focused on the use of mOWL in downstream tasks, collaborating with and assisting other projects that decided to use the library in their workflows. Our work during the event falls into three main areas:
## Local visualization of nodes:
This feature concerns the graph representation of ontologies. Large ontologies produce large graphs that are impractical to plot in full. We therefore added a method to the existing graph structure that generates a visual representation of the graph around a particular node.
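```python
import mowl
mowl.init_jvm("10g")  # start the JVM before importing other mOWL modules
from mowl.datasets.builtin import PPIYeastDataset
from mowl.projection import DL2VecProjector

# `Graph` is the graph structure mentioned above; its import path is not given here.
ontology = PPIYeastDataset().ontology
projector = DL2VecProjector()
graph = Graph(projector.project(ontology))
# New feature: plot the neighbourhood of an ontology class
graph.look_at("GO_0000917")
```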

```python
graph.look_at("4932.YDR020C")
```
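
The idea behind a local view can be illustrated without mOWL. The following is a minimal sketch, assuming the projected graph is available as a list of `(source, relation, target)` triples; the function name `local_subgraph` and the `max_hops` parameter are hypothetical and do not describe how `look_at` is implemented.

```python
from collections import deque


def local_subgraph(edges, center, max_hops=1):
    """Return the edges whose endpoints both lie within `max_hops` of `center`.

    `edges` is an iterable of (source, relation, target) triples.
    Edge direction is ignored when measuring distance from `center`.
    """
    edges = list(edges)
    neighbours = {}
    for src, _, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    # Breadth-first search from the centre node, bounded by max_hops.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Keep only edges that connect two nodes inside the neighbourhood.
    return [(s, r, t) for s, r, t in edges if s in distance and t in distance]


# Toy edge list with made-up identifiers (not taken from a real projection).
toy_edges = [
    ("A", "subclassof", "B"),
    ("B", "subclassof", "C"),
    ("C", "subclassof", "D"),
    ("A", "part_of", "E"),
]
local_edges = local_subgraph(toy_edges, "A", max_hops=1)
```

Restricting the plot to such a neighbourhood keeps the number of drawn nodes small regardless of the size of the full ontology graph.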

## Axiom Scoring:
Several methods in mOWL generate embeddings that can be used later in other machine learning tasks. Some models, however, are trained to encode certain types of subsumption axioms by minimizing functions that represent the plausibility of those axioms. Once such a model was trained, there was no method to generate scores for new query axioms. We therefore equipped some models in mOWL with a `model.score` method that receives a query axiom and produces a score. The following example shows how the new feature would work:
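```python
import mowl
mowl.init_jvm("10g")  # start the JVM before importing other mOWL modules
from mowl.datasets.builtin import PPIYeastDataset
from mowl.models import ELEmbeddings

dataset = PPIYeastDataset()
model = ELEmbeddings(dataset)
model.train()
# New feature
owl_axiom = ...  # placeholder: some axiom defined in OWL
model.score(owl_axiom)
```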
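
To make the notion of an axiom score concrete, the sketch below scores a subsumption axiom `C SubClassOf D` in a ball-based geometric model, where each class is an n-ball with a centre and a radius and the axiom holds when the ball of `C` lies inside the ball of `D`. This is an illustration under those assumptions: the function name `subsumption_violation`, the `margin` parameter and the toy values are hypothetical, and the code is not necessarily what `model.score` computes.

```python
import math


def subsumption_violation(center_c, radius_c, center_d, radius_d, margin=0.0):
    """Score how strongly the axiom `C SubClassOf D` is violated.

    The ball of C is contained in the ball of D when
    ||center_c - center_d|| + radius_c - radius_d <= 0.
    The score is 0.0 when that holds (up to `margin`) and grows otherwise.
    """
    dist = math.dist(center_c, center_d)
    return max(0.0, dist + radius_c - radius_d - margin)


# Toy 2-dimensional balls with made-up values.
inner = ([0.0, 0.0], 1.0)  # (centre, radius) of a small ball
outer = ([0.0, 0.5], 2.0)  # (centre, radius) of a larger ball around it

contained = subsumption_violation(*inner, *outer)  # small ball inside large ball
reverse = subsumption_violation(*outer, *inner)    # large ball inside small ball
```

With this convention a lower score means a more plausible axiom, so candidate axioms can be ranked by sorting their scores in ascending order.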
## Downstream tasks with mOWL:
mOWL was part of other projects in two different ways: (i) members of Project 20 collaborated actively in the development of other projects, and (ii) members of Project 20 assisted other projects in their use of mOWL. More specifically, mOWL was involved in the following projects:
### Project 3:
In collaboration with Project 3, we started the development of an end-to-end, ontology-based conditional generative adversarial network (CGAN) for the synthetic generation of clinical and laboratory measurements. During the hackathon, we implemented a baseline model that learns embeddings with the OWL2Vec* and OPA2Vec methods (mOWL implementations) and trained a vanilla CGAN that generates synthetic features. Next, we plan to evaluate the generated features using statistical methods and to work on an end-to-end approach that combines the CGAN with ELEmbeddings or Falcon models.
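
In a conditional GAN, the generator receives a random noise vector together with a conditioning vector. The sketch below shows only that input-construction step, assuming the conditioning vector is an ontology embedding; the function name `generator_input`, the dimensions and the embedding values are hypothetical and are not taken from the Project 3 implementation.

```python
import random


def generator_input(condition_embedding, noise_dim, rng):
    """Concatenate a Gaussian noise vector with the conditioning embedding."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + list(condition_embedding)


rng = random.Random(0)  # fixed seed so the example is reproducible
# Hypothetical 4-dimensional ontology embedding of one class (made-up values).
class_embedding = [0.12, -0.40, 0.88, 0.05]
z = generator_input(class_embedding, noise_dim=8, rng=rng)
```

The resulting vector has `noise_dim + len(condition_embedding)` entries and would be fed to the first layer of the generator network.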
### Project 11:
In collaboration with Project 11, we generated embeddings of phenotype and lifestyle ontologies for use in a causal model that aims to find causal relationships between phenotypic and lifestyle factors in diabetes.
### Other projects:
We also assisted Project 13 in its use of mOWL.
# Conclusion
Participating in BioHackathon MENA 2023 was very useful for the development of mOWL: the library gained new methods, several other projects found it usable in their workflows, and we obtained valuable feedback.
# Future work
For the features started at this BioHackathon, future work will consist of refining the implementation. In addition, since mOWL follows test-driven development, we will add the tests needed to ensure the correct behaviour of the implemented features.
# Jupyter notebooks, GitHub repositories and data repositories
* Features implemented in this BioHackathon will be available in version 0.2.0. We follow Semantic Versioning to keep track of the versions of mOWL.
# Acknowledgements
We acknowledge the organizers of BioHackathon MENA and everyone who contributed to this project by implementing code features, using mOWL in different downstream tasks, detecting flaws, bugs and bottlenecks, and providing valuable feedback.
# References