FAIR Cookbook
# How to build an application ontology with ROBOT

**identifier:** **version:** [v0.1](v0.1)
___
**_Difficulty level:_** :triangular_flag_on_post: :triangular_flag_on_post: :triangular_flag_on_post: :white_circle: :white_circle:

**_Reading time:_** 60 minutes

**_Intended Audience:_**
> :heavy_check_mark: *Terminology Manager*
> :heavy_check_mark: *Ontologist*
> :heavy_check_mark: *Application Developer*
> :heavy_check_mark: *Data Scientist*

**_Recipe Type_**: Hands-on practical
**_Executable code_**: No

---

# How to build an application ontology with ROBOT

1. [Main FAIRification Objectives](#Main%20FAIRification%20Objectives)
2. [Graphical Overview of the FAIRification Recipe Objectives](#Graphical%20Overview%20of%20the%20FAIRification%20Recipe%20Objectives)
3. [FAIRification Objectives, Inputs and Outputs](#FAIRification%20Objectives,%20Inputs%20and%20Outputs)
4. [Capability & Maturity Table](#Capability%20&%20Maturity%20Table)
5. [Table of Data Standards](#Table%20of%20Data%20Standards)
6. [Executable Code in Notebook](#Executable%20Code%20in%20Notebook)
7. [How to create workflow figures](#How%20to%20create%20workflow%20figures)
8. [License](#License)

## Main Objectives

> The main purpose of this recipe is building an application ontology from source ontologies using `ROBOT`, via a sustainable, dynamic pipeline that allows seamless integration of source ontology updates.

An application ontology is a semantic artefact developed to answer the needs of a specific application or focus. It may therefore borrow terms from a number of reference ontologies, which can be extremely large but whose broad coverage may not be required by the application ontology. Yet it is critical to keep the `application ontology` synchronized with the `reference ontologies` from which the imports are made. We aim to document how a certain level of automation can be achieved.

[ROBOT](http://robot.obolibrary.org/) is an open-source tool for ontology development.
It supports ontology editing, ontology annotation, ontology format conversion, and other functions. It runs on the Java Virtual Machine (JVM) and can be used through command-line tools on Windows, macOS and Linux.

## Graphical Overview of the FAIRification Recipe Objectives

<div class="mermaid">
graph TD
  I1(fa:fa-university ontology source 1):::box -->|extract| M1(fa:fa-cube ontology module 1):::box
  I2(fa:fa-university ontology source 2):::box -->|extract| M2(fa:fa-cube ontology module 2):::box
  I3(fa:fa-university ontology source 3):::box -->|extract| M3(fa:fa-cube ontology module 3):::box
  M1 --> |merge| R1(fa:fa-cubes core):::box
  M2 --> |merge| R1(fa:fa-cubes core):::box
  M3 --> |merge| R1(fa:fa-cubes core):::box
  R1(fa:fa-cubes core):::box --> |materialize| R2(fa:fa-cubes reasoned & reduced core):::box
  R2(fa:fa-cubes reasoned & reduced core) -->|report| R3(fa:fa-tasks report):::box
  R2(fa:fa-cubes reasoned & reduced core) --> |edit| R4(fa:fa-cubes reasoned & reduced core):::box
  R3 -->|edit| R4
  R4 -->|annotate| R5(fa:fa-cubes fa:fa-tags fa:fa-cc annotated,reasoned,reduced core):::box
  R5 -->|convert| R6(fa:fa-star-o obo version):::box
  R5 -->|convert| R7(fa:fa-star owl version):::box
  classDef box font-family:avenir,font-size:14px,fill:#2a9fc9,stroke:#222,color:#fff,stroke-width:1px
</div>

<!--
-->

## Capability & Maturity Table

<!-- TO DO -->

| Capability | Initial Maturity Level | Final Maturity Level |
| :------------- | :------------- | :------------- |
| Interoperability | minimal | repeatable |

----

## FAIRification Objectives, Inputs and Outputs

| Actions.Objectives.Tasks | Input | Output |
| :------------- | :------------- | :------------- |
| [ontology building](http://edamontology.org/operation_XXXX) | tsv, OWL | OWL |

## Table of Data Standards

| Data Formats | Terminologies | Models |
| :------------- | :------------- | :------------- |
| [OWL](https://fairsharing.org/FAIRsharing.atygwy) | | |
| [OBO](https://fairsharing.org/FAIRsharing.aa0eat) | | |

___

## Ingredients

| Tool Name | Tool Location | Tool function |
| :------------- | :------------- | :------------ |
| ROBOT | [http://robot.obolibrary.org/](http://robot.obolibrary.org/) | ontology management CLI |
| Ontology Development Kit | [https://github.com/INCATools/ontology-development-kit](https://github.com/INCATools/ontology-development-kit) (comes with ROBOT included) | ontology management environment |
| Protégé or another ontology editor | [https://protege.stanford.edu/](https://protege.stanford.edu/) | ontology editor |
| SPARQL | [https://www.w3.org/TR/sparql11-query/](https://www.w3.org/TR/sparql11-query/) | ontology query language |
| ELK | [https://www.cs.ox.ac.uk/isg/tools/ELK/](https://www.cs.ox.ac.uk/isg/tools/ELK/) | ontology reasoner |
| HermiT | [http://www.hermit-reasoner.com/](http://www.hermit-reasoner.com/) | ontology reasoner |

## Step by step process

### Preliminary requirements

The development of an `application ontology` requires joint contributions from `domain experts`, `use case owners` and `ontology developers`. The domain experts provide essential domain knowledge, the use case owners define the `competency questions` of the application ontology, and the ontology developers are the `IT specialists` working on the construction of the application ontology.

### Step 1: Define the goal of the application ontology

The development of an `application ontology` is driven by specific use cases and target datasets. The first step in application ontology development is therefore to determine the subject and the aim of the application ontology. In this recipe, we demonstrate the workflow of building an application ontology for patient metadata and patient sequencing data investigations. The `competency questions` of this ontology are provided below:

| **Theme** | **Competency Questions** |
|:------------------------|:-------------------------|
| **General Questions** | |
| | :heavy_plus_sign: How to identify relevant public domain ontologies suiting our needs? |
| | :heavy_plus_sign: How to establish an ontology covering all terms that are included in the actual data to be represented? |
| | :heavy_plus_sign: How to remove terms from the resulting ontology that are not needed? |
| | :heavy_plus_sign: How to guarantee consistency of the final ontology? |
| | :heavy_plus_sign: How to identify differences in comparison to a previous version of the resulting ontology? |
| **Questions without specifying compounds or genes for the [example dataset](https://github.com/FAIRplus/the-fair-cookbook/blob/ontology_robot_recipe/docs/content/recipes/ontology-robot/ExternalStudiesKQ.xlsx)** | |
| | :heavy_plus_sign: Identify all data generated from tissues taken from patients suffering from a specific disease. |
| | :heavy_plus_sign: Identify all data generated from a specific tissue obtained from mouse models that are related to a specific disease. |
| | :heavy_plus_sign: Identify all data generated from lung tissue taken from patients suffering from a lung disease that is not related to oncology. |
| | :heavy_plus_sign: Identify all data generated from primary cells whose origin is the lung. |
| | :heavy_plus_sign: Identify all data generated from cell lines whose origin is the lung. |
| **Questions including single genes or gene sets** | |
| | :heavy_plus_sign: What is the expression of PPARg / growth factors in lung tissue from IPF patients? |
| | :heavy_plus_sign: What is the expression of PPARg / growth factors in primary cells obtained from lung tissue from healthy subjects? |
| | :heavy_plus_sign: What is the expression of PPARg / growth factors in all available tissues obtained from healthy subjects? |
| **Questions including compound or treatment information** | |
| | :heavy_plus_sign: Identify all data generated from primary cells treated with a kinase inhibitor. |
| | :heavy_plus_sign: Identify all data from patients treated with a specific medication. |
| | :heavy_plus_sign: Identify all data generated from cells / cell lines that have been treated with compounds targeting a member of a specific pathway. |
| | :heavy_plus_sign: What is the expression of PPARg in lung tissue upon treatment with a specific compound in patients suffering from a specific disease? |

_Table 1_ is a snapshot of the example dataset.
The complete patient metadata example dataset is available [here](https://github.com/FAIRplus/the-fair-cookbook/blob/ontology_robot_recipe/docs/content/recipes/ontology-robot/ExternalStudiesKQ.xlsx).

| Study | source_id | sample_description | tissue | source_tissue | cell | cellline | disease | gender | species |
|----------------------|-----------|---------------------|--------|---------------|-----------------------|----------|-------------------------------|--------|--------------|
| GSE52463Nance2014 | EX08_001 | Lung - Normal | Lung | | | | Normal | male | Homo sapiens |
| GSE52463Nance2014 | EX08_015 | Lung - IPF | Lung | | | | Idiopathic Pulmonary Fibrosis | male | Homo sapiens |
| GSE116987Marcher2019 | EX101_1 | HSC CCl4-treated w0 | | | Hepatic Stellate Cell | | NA | | Mus musculus |

_Table 1: Patient metadata example_

### Step 2: Select terms from reference ontologies

#### Step 2.1 Select source ontologies

To build an `application ontology` that supports communication between different data resources, it is recommended to reuse terms from existing reference ontologies instead of creating new terms.

> :bulb: _Reference ontology:_ a semantic artifact with a canonical and comprehensive representation of the entities in a specific domain.
>
> :bulb: _Source ontology:_ an ontology to be integrated into the application ontology, usually a reference ontology.

When selecting reference ontologies as source ontologies, the reusability and sustainability of the source ontology need to be considered. Many ontologies in [the OBO Foundry](http://www.obofoundry.org/) use the [CC-BY](https://creativecommons.org/licenses/by/2.0/) licence, which allows sharing and adaptation; such ontologies can be used directly as source ontologies. Reference ontologies with similar umbrella structures can be conveniently combined in the application ontology. The format and maintenance of the reference ontology should also be considered.
Specific use cases and requirements from the target dataset also affect the choice of source ontologies. For use cases focusing on extracting data from heterogeneous datasets with complicated data types and data relationships, reference ontologies with rich term relationships are preferred. The interpretation of each term also depends on the context and requirements of the target dataset. For example, _"hypertension"_ can either be interpreted as a _phenotype_ and mapped to phenotype ontologies, or as a _disease_ and mapped to disease ontologies.

The example dataset includes metadata related to disease, species, cell lines, tissues and biological sex. _Table 2_ lists some reference ontologies available in the corresponding domains. In this recipe, we selected _MONDO_ for the disease domain, _UBERON_ for anatomy, _NCBITaxon_ for species taxonomy and _PATO_ for biological sex.

| Domain | Reference Ontologies |
|----|----|
| Disease | [Mondo Disease Ontology, MONDO](http://www.obofoundry.org/ontology/mondo.html)<br> [Disease Ontology, DOID](https://disease-ontology.org/) |
| Species | [NCBI organismal classification, NCBITaxon](http://www.obofoundry.org/ontology/ncbitaxon.html) |
| Cell line | [Cell Ontology, CL](http://www.obofoundry.org/ontology/cl.html)<br> [Cell Line Ontology, CLO](http://www.obofoundry.org/ontology/clo.html) |
| Tissue | [NCI Thesaurus OBO Edition, NCIT](http://www.obofoundry.org/ontology/ncit.html)<br> [Ontology for MIRNA Target, OMIT](http://www.obofoundry.org/ontology/omit.html)<br> [Uberon multi-species anatomy ontology, UBERON](http://www.obofoundry.org/ontology/uberon.html) |
| Biological Sex | [Phenotype And Trait Ontology, PATO](http://www.obofoundry.org/ontology/pato.html) |

_Table 2: Available reference ontologies in selected domains_

#### Step 2.2 Select seed ontology terms

> :bulb: _Seed ontology terms_: a set of entities extracted from reference ontologies for the application ontology.
> This step identifies the seeds needed to perform the knowledge extraction from external sources, i.e., the set of entities to extract in order to integrate them into the application ontology.

The ontology developers can provide tools to ease and scale the identification of the seeds, while the domain experts can identify the right seeds for a given application ontology. Although it is always possible to identify the seed terms manually, a helper tool allows the task to run at scale. Below, automatable approaches based on two widely known life-science annotators, [Zooma](https://www.ebi.ac.uk/spot/zooma/) and the [NCBO Annotator](https://bioportal.bioontology.org/annotator), are depicted.

<div class="mermaid">
graph LR
  A(fa:fa-file-text-o Select seed terms):::box -->B1(fa:fa-user-o Manual selection):::box
  A -->B2(fa:fa-database Selection with ZOOMA):::box
  A -->B3(fa:fa-database Selection with NCBO Annotator):::box
  A -->B4(fa:fa-cogs Selection with SPARQL):::box
  classDef box font-family:avenir,font-size:14px,fill:#2a9fc9,stroke:#222,color:#fff,stroke-width:1px
</div>

_Figure 1: Ontology seed term selection approaches_

[ZOOMA](https://www.ebi.ac.uk/spot/zooma/) is a web service for discovering optimal ontology mappings, developed by the Samples, Phenotypes and Ontologies Team at the [European Bioinformatics Institute (EMBL-EBI)](https://www.ebi.ac.uk). It can be used to automatically annotate plain text about biological entities with ontology classes. Zooma allows limiting the sources used to perform the annotations; these sources are either curated datasources or ontologies from the [Ontology Lookup Service (OLS)](https://www.ebi.ac.uk/ols/index). Zooma contains a linked data repository of annotation knowledge and highly annotated data that has been fed with manually curated annotations derived from publicly available databases.
Because the source data has been curated by hand, it is a rich source of knowledge optimised for the semantic markup of keywords, in contrast with text-mining approaches.

The [NCBO Annotator](https://bioportal.bioontology.org/annotator) is an ontology-based web service that annotates public datasets with biomedical ontology concepts based on their textual metadata. It can be used to automatically tag data with ontology concepts. These concepts come from the Unified Medical Language System (UMLS) Metathesaurus and the National Center for Biomedical Ontology (NCBO) BioPortal ontologies.

Both Zooma and the NCBO Annotator provide a `web interface` and a `REST API` to identify the seed terms by annotating free text. Two scripts that automate the annotation of a set of free-text terms are shown below.

#### Seed term extraction with ZOOMA

The following sample script uses the `Zooma web service` to get a list of seed terms, i.e., the URIs of ontology classes. The service also states the level of confidence of every proposed seed.

```python3
# zooma-annotator-script.py
import json
from pprint import pprint

import requests


def get_annotations(propertyType, propertyValues, filters=""):
    """Get Zooma annotations for the values of a given property of a given type."""
    annotations = []
    no_annotations = []
    for value in propertyValues:
        ploads = {'propertyValue': value,
                  'propertyType': propertyType,
                  'filter': filters}
        r = requests.get('http://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate',
                         params=ploads)
        data = json.loads(r.text)
        if len(data) == 0:
            no_annotations.append(value)
        for item in data:
            annotations.append((f"{item['semanticTags'][0]} "
                                f"# {value}"
                                f"-Confidence:{item['confidence']}"))
    return annotations, no_annotations


if __name__ == "__main__":
    propertyType = 'gender'
    propertyValues = ['male', 'female', 'unknown']
    annotations, no_annotations = get_annotations(propertyType, propertyValues)
    pprint(annotations)
    pprint(no_annotations)
```

Running the above script to get the seeds for the terms `male`, `female`, and `unknown` generates the following results:

```python3
['http://purl.obolibrary.org/obo/PATO_0000384 # male-Confidence:HIGH',
 'http://purl.obolibrary.org/obo/PATO_0000383 # female-Confidence:HIGH',
 'http://www.orpha.net/ORDO/Orphanet_288 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0003850 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0003952 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0009471 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0000203 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0003863 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0000616 # unknown-Confidence:MEDIUM',
 'http://purl.obolibrary.org/obo/HP_0000952 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0003853 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_1001331 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0003769 # unknown-Confidence:MEDIUM',
 'http://purl.obolibrary.org/obo/HP_0000132 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0000408 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0008549 # unknown-Confidence:MEDIUM',
 'http://www.ebi.ac.uk/efo/EFO_0001642 # unknown-Confidence:MEDIUM']
[]
```

#### Seed term extraction with NCBO Annotator

The following sample script uses the NCBO Annotator web service to get a list of seed terms, i.e., the URIs of ontology classes.

```python3
# ncbo-annotator-script.py
import json
from pprint import pprint

import requests


def get_annotations(propertyValues, ontologies=''):
    """Get NCBO annotations for the values of a given property."""
    annotations = []
    no_annotations = []
    for value in propertyValues:
        ploads = {'apikey': '8b5b7825-538d-40e0-9e9e-5ab9274a9aeb',
                  'text': value,
                  'ontologies': ontologies,
                  'longest_only': 'true',
                  'exclude_numbers': 'false',
                  'whole_word_only': 'true',
                  'exclude_synonyms': 'false'}
        r = requests.get('http://data.bioontology.org/annotator', params=ploads)
        data = json.loads(r.text)
        if len(data) == 0:
            no_annotations.append(value)
        for item in data:
            annotations.append(f"{item['annotatedClass']['@id']} # {value}")
    return annotations, no_annotations


if __name__ == "__main__":
    propertyValues = ['male', 'female', 'unknown']
    annotations, no_annotations = get_annotations(propertyValues)
    pprint(annotations)
    pprint(no_annotations)
```

Running the above script to get the seeds for the terms `male`, `female`, and `unknown` generates the following results:

```python3
['http://purl.obolibrary.org/obo/UBERON_0003101 # male',
 'http://purl.obolibrary.org/obo/UBERON_0003100 # female']
['unknown']
```

#### Step 2.2.3 Seed term extraction with SPARQL

Instead of manually maintaining a list of seed terms to generate a module, a term list can be generated on the fly using a `SPARQL query`. Here, we generate a subset of `UBERON` terms which have a cross-reference to either `FMA` (human anatomy) or `MA` (mouse anatomy) terms, since our example datasets include human, mouse and rat data.
```sql
PREFIX scdo: <http://scdontology.h3abionet.org/ontology/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?uri WHERE {
  {
    { ?uri oboInOwl:hasDbXref ?xref . }
    UNION
    { ?axiom a owl:Axiom ;
             owl:annotatedSource ?uri ;
             oboInOwl:hasDbXref ?xref . }
  }
  ?uri a owl:Class
  FILTER isLiteral(?xref)
  FILTER regex( ?xref, "^FMA|^MA:", "i")
}
```

### Step 3: Extract ontology modules from source ontologies

Module extraction from ontologies can be run manually and in an ad hoc fashion. We recommend, however, collecting all steps into a script or `Makefile` to avoid missing steps. `ROBOT` steps can in theory be chained together into single large commands. Practical experience, however, teaches that this can have unexpected consequences and makes debugging difficult in the event of an issue. It is therefore advisable to split extractions and merges into individual steps with intermediate artefacts, which can be deleted at the end of the process chain.

#### Step 3.1 Get source ontology files

We recommend starting each (re)build of the application ontology with the latest versions of the source ontologies, unless there is a good reason not to update. Ideally, this should be done automatically, for example through a shell script that downloads all ontologies from their source URIs with `curl`, e.g.

```shell
curl -L http://purl.obolibrary.org/obo/uberon.owl > uberon.owl
```

#### Step 3.2 Choose ontology extraction algorithms

`ROBOT` supports two types of ontology module extraction: the `Syntactic Locality Module Extractor` (SLME) and `Minimum Information to Reference an External Ontology Term` (MIREOT). `SLME` extractions can be further subdivided into `TOP` (top module), `BOT` (bottom module) and `STAR` (fixpoint-nested module).
For full details of what these different options entail, see http://robot.obolibrary.org/extract.html. We recommend the use of `BOT` for comprehensive modules that preserve the links between ontology classes, and the use of `MIREOT` if relationships other than parent-child ones are less important.

#### Step 3.3 Extract modules using seed terms

Using the `robot` tool provided by the `OBO Foundry`, the creation of a module can be done with one command:

```bash
robot extract --method <some_selection> \
  --input <some_input.owl> \
  --term-file <list_of_classes_of_interest_in_ontology.txt> \
  --intermediates <choose_from_option> \
  --output ./ontology_modules/extracted_module.owl
```

The `robot extract` command takes several arguments:

* *method*: `ROBOT` uses 4 different algorithms to generate a module: `TOP`, `BOT` and `STAR` (all from the SLME method), and `MIREOT`. The first two create a module below or above the seed classes (the classes of interest in the target ontology), respectively. The `STAR` method creates a module by pulling all the properties and axioms of the seed classes but nothing else. `MIREOT` uses a different method and offers some more options, in particular when it comes to how many levels up or down (parents and children) are needed.
* *input*: the target ontology you want to extract a module from. It can be the original artefact or a filtered version of it.
* *imports*: whether or not to include imported ontologies. Note that the default is `include`; choose `exclude` otherwise.
* *term-file*: the text file holding the list of classes of interest, identified by their IRI (e.g. http://purl.obolibrary.org/obo/UBERON_0001235 # adrenal cortex).
* *intermediates*: specific to the `MIREOT` method; it tells the algorithm how much or how little to extract. It has 3 levels (`all`, `minimal`, `none`).
* *output*: the path to the OWL file holding the results of the module extraction.
* *copy-ontology-annotations*: a boolean value (true/false) determining whether to pull ontology annotations from the parent ontology; the default is `false`.
* *sources*: optional, a pointer to a file mapping.
* *annotate-with-source*: a boolean value (true/false); the default is `false`.

The SPARQL query shown earlier, saved as `select_anatomy_subset.sparql`, can be used to generate a dynamic seed list before performing a `BOT` extraction:

```shell
robot query --input uberon.owl --query select_anatomy_subset.sparql uberon_seed_list.txt
robot extract --method BOT --input uberon.owl --term-file uberon_seed_list.txt -o uberon_subset.owl
```

#### Step 3.4 Assess extracted modules

The extracted ontology module should include all seed terms and represent the term relationships correctly. It should also preserve the correct hierarchical structure of the source ontology and have consistent granularity.

### Step 4: Build the upper-level umbrella ontology

The umbrella ontology is the root structure of the ontology. Building the umbrella ontology aims to model the main entity of the use case and its relationships with the ontology classes extracted in the previous step. The main entity of the ontology and its relationships with the extracted modules can be identified by the domain experts, or specified by the use case owner.

`ROBOT` provides a template-driven ontology term generation system. A `ROBOT template` is a `tab-separated values` (TSV) or `comma-separated values` (CSV) file that depicts a set of patterns to build ontology entities (i.e., classes, properties, and individuals). These templates can be used for modular ontology development. After the _Domain Expert_ and the _Use Case/Scenario Owner_ specify the main entity of the use case and its relationships with the remaining ontology entities, the _Ontology Developer_ generates the `ROBOT` templates depicting the set of patterns used to build the _umbrella_ ontology.
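Since a `ROBOT` template is plain CSV, it can also be written by a script rather than by hand. The sketch below is a minimal illustration (the `ex:` class IDs and the `template.csv` file name are placeholders, not part of the recipe):

```shell
# Sketch: write a minimal ROBOT template CSV.
# Row 1 holds the human-readable headers; row 2 holds the ROBOT
# template strings (ID, LABEL, TYPE, EC %). The ex: IDs are placeholders.
cat > template.csv <<'EOF'
ID,Label,Entity Type,Equivalent Axioms
ID,LABEL,TYPE,EC %
ex:cl_2,Class_2,class,
ex:cl_3,Class_3,class,ex:cl_2
EOF
```

The generated `template.csv` can then be fed to the `robot template` command.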
A `ROBOT` command using a template to build an ontology is shown below:

```bash
robot template --template template.csv \
  --prefix "r4: https://fairplus-project.eu/ontologies/R4/" \
  --ontology-iri "https://fairplus-project.eu/ontologies/R4/" \
  --output ./templates/umbrella.owl
```

A sample template is presented below:

| ID | Label | Entity Type | Equivalent Axioms | Characteristic | Domain | Range |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| ID | LABEL | TYPE | EC % | CHARACTERISTIC | DOMAIN | RANGE |
| ex:cl_2 | Class_2 | class | - | - | - | - |
| ex:cl_3 | Class_3 | class | ex:cl_2 | - | - | - |
| ex:op_1 | Prop_1 | object property | - | - | Class_2 | Class_3 |
| ex:dp_1 | Prop_2 | data property | - | functional | Class_2 | xsd:string |

> :warning: **_Tip:_** The generated ontology can be visualized using Protégé or a [local deployment of OLS](https://hackmd.io/@FAIRcookbook/H1axYY3wU). The `OLS local deployment` option is recommended by this recipe, given that `Protégé` may crash when loading medium or large ontologies.

### Step 5: Merge ontology modules and the umbrella ontology

Merging the previously extracted ontology modules with the locally built umbrella ontology produces a first draft of the application ontology. The ROBOT `merge` command combines two or more separate input ontology modules into a single ontology.
The `ROBOT` command merging the umbrella ontology and the modules is shown below:

```bash
java -jar robot.jar merge \
  --input ./ontology_modules/iao_mireot_robot_module_1.owl \
  --input ./ontology_modules/obi_mireot_robot_module_2.owl \
  --input ./ontology_modules/duo_mireot_robot_module_3.owl \
  --input ./templates/umbrella.owl \
  --output ./results/r4_app_ontology.owl
```

### Step 6: Post-merge modifications

#### Step 6.1: Materialize and Reasoning

The command below infers superclass expressions for the merged terms and then reduces redundant axioms:

```bash
robot materialize \
  --reasoner ELK \
  --input merged.owl \
  reduce \
  --output materialized.owl
```

The ontology materialization uses `OWL reasoners`; `ROBOT` provides [several ontology reasoners](http://robot.obolibrary.org/reason). It is also possible to identify issues in the ontology with a set of `quality control SPARQL queries`:

```bash
robot report --input edit.owl --output report.tsv
```

#### Step 6.2: Annotate

Ontology annotation adds metadata to the OWL file. It is recommended to provide the `ontology IRI`, `version IRI`, `ontology title`, `description` and `license` to support future usage and management. Annotations can be added either one by one on the command line or provided in a Turtle (`.ttl`) file.

```bash
#ANNOTATE
robot annotate --input materialized.owl \
  --ontology-iri "https://github.com/ontodev/robot/examples/annotated.owl" \
  --version-iri "https://github.com/ontodev/robot/examples/annotated-1.owl" \
  --annotation rdfs:comment "Comment" \
  --annotation rdfs:label "Label" \
  --annotation-file annotations.ttl \
  --output results/annotated.owl
```

Below is an example annotation file.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix example: <https://github.com/ontodev/robot/examples/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
example:annotated.owl rdf:type owl:Ontology ;
  rdfs:comment "Comment from annotations.ttl file." ;
  dcterms:title "Example title" ;
  dcterms:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
  owl:versionIRI <https://github.com/ontodev/robot/examples/annotated-1.owl> .
```

#### Step 6.3: Convert

Besides the `OWL format`, the `Open Biomedical Ontologies (OBO) format` is also widely used in life-science ontologies. It is possible to convert an `.owl` file to an `.obo` file using:

```bash
#CONVERT:
robot convert \
  --input annotated.owl \
  --format obo \
  --output results/annotated.obo
```

### Step 7: Assess coverage of the ontology scope

The final step of the ontology construction is to assess the coverage of the ontology scope by verifying that it is able to answer the predefined competency questions. The competency questions can be implemented as a set of `SPARQL queries` and run against the developed ontology to check whether the answers/results are aligned with the scope of the ontology. The `use case owner` and `ontology developer` can also collaborate on the assessment of the competency question answers/results.

`ROBOT` provides the `query` command to run `SPARQL queries` against an ontology in order to verify and validate it. The query command runs SPARQL `ASK`, `SELECT`, and `CONSTRUCT` queries by using the `--query` option with two arguments: a query file and an output file. Instead of specifying one or more (query file, output file) pairs, it is also possible to specify a single `--output-dir` and use the `--queries` option to provide one or more queries of any type. Each output file is written to the output directory with the same base name as the query file that produced it. A pattern example of this command is shown below.
```bash
robot query --input <input_ontology_file> \
  --queries <query_file> \
  --output-dir results/
```

---

## Conclusions

> Creating an application ontology and semantic model to support knowledge discovery is an important process in the data management life cycle. This more advanced recipe has identified and described the different steps that one needs to consider to build such a resource. While this is important, one should bear in mind the costs associated with maintaining those artefacts and keeping them up to date. It is therefore also critical to understand the benefits of contributing to existing open efforts.
>
> #### What should I read next?
> * [How to establish a minimal metadata profile?]()
> * [How to submit/request terms to an ontology?]()

___

## References

- R.C. Jackson, J.P. Balhoff, E. Douglass, N.L. Harris, C.J. Mungall, and J.A. Overton. [_ROBOT: A tool for automating ontology workflows_](https://link.springer.com/epdf/10.1186/s12859-019-3002-3?author_access_token=bB8BLjFWrdh42vR6DjT-nG_BpE1tBhCbnbw3BuzI2RPCZ2BK7EeexaCNYfT-cCz8Q_mrZomT2_svoQf12CW661Sagzw6JGF9DhJq3Q3fTPdMGFMtais7MRgx8-kDhp6uC9g2qcVh5FumTsveV22XVQ%3D%3D). BMC Bioinformatics, vol. 20, July 2019
- Arp, Robert, Barry Smith, and Andrew D. Spear. _Building Ontologies with Basic Formal Ontology_. MIT Press, 2015.
## Supplementary material

- [Example Dataset](https://github.com/FAIRplus/the-fair-cookbook/blob/ontology_robot_recipe/docs/content/recipes/ontology-robot/ExternalStudiesKQ.xlsx)
- [Competency questions](https://github.com/FAIRplus/the-fair-cookbook/blob/ontology_robot_recipe/docs/content/recipes/ontology-robot/competency_questions.md)
- [Scripts and intermediate artifacts](https://github.com/FAIRplus/the-fair-cookbook/tree/ontology_robot_recipe/docs/content/recipes/ontology-robot)

## Authors

| Name | Affiliation | ORCID | CRediT role |
| :------------- | :------------- | :------------- | :------------- |
| Danielle Welter | LCSB, University of Luxembourg | [0000-0003-1058-2668](https://orcid.org/0000-0003-1058-2668) | Writing - Original Draft, Code |
| Philippe Rocca-Serra | UOXF | [0000-0001-9853-5668](https://orcid.org/0000-0001-9853-5668) | Writing - Review, Code |
| Fuqi Xu | EMBL-EBI | [0000-0002-5923-3859](https://orcid.org/0000-0002-5923-3859) | Writing - Draft, Review |
| Emiliano Reynares | Boehringer Ingelheim | [0000-0002-5109-3716](https://orcid.org/0000-0002-5109-3716) | Writing - Review, Code |
| Karsten Quast | Boehringer Ingelheim | [0000-0003-3922-5701](https://orcid.org/0000-0003-3922-5701) | Writing |

___

## License:

<a href="https://creativecommons.org/licenses/by/4.0/"><img src="https://mirrors.creativecommons.org/presskit/buttons/80x15/png/by.png" height="20"/></a>
