# What I achieved since I joined the IFB
I joined the IFB as an Open Data Developer to support the scientific community in its efforts to make bioinformatics resources more easily discoverable on the web and more reusable (FAIR principles).
The consistent sharing of bioinformatics data and tools requires the adoption of interoperable metadata standards such as Bioschemas (http://Bioschemas.org)[^1]. These standards make it possible to automatically index such resources and to power search engines (e.g. [Bio.tools](https://bio.tools/), [Orphanet](https://www.orpha.net/consor/cgi-bin/index.php), Galaxy, [Pippa](https://pippa.psb.ugent.be/), [CorkoakDB](https://corkoakdb.org/gene/), etc.) that leverage ontologies from the scientific community.
[^1]: Bioschemas is an international initiative that aims to make digital resources in the life sciences, such as databases, analysis software and training material, but also biological entities such as genes or proteins, more easily discoverable on the web.
These are the tasks from my job description, most of which I was able to achieve:
1. Develop a software tool for Bioschemas metadata extraction and for assembling open, queryable datasets following semantic web standards.
1. Propose a query catalog combining Bioschemas metadata of different nature (e.g. biological entities, training material, algorithms and workflows, etc.).
1. Implement the Bioschemas metadata standards within FAIDARE, Orphanet and Bio.tools, in close collaboration with the development teams.
1. Participate in the evolution of these standards.
1. Design, develop and extend automatic metadata harvesting tools (crawlers).
1. Propose queries combining several types of metadata for the IFB communities.
1. Document the source code and queries developed.
1. Propose training material and participate in the organization of training sessions on annotation and indexing of web resources in life sciences.
1. Participate in the life of the open source project Bioschemas (website, GitHub) and in the development of its standards in connection with Schema.org, in the framework of the European infrastructure Elixir.
---
**Technical environment:** Python, Flask, Jekyll, GraphDB, GitHub Actions, Bulma, Black, unittest, SPARQL, ChromeDriver, RDF, Conda environments
---
Concretely, these are all the achievements, in order (for a better overview, please consult this [Gantt chart](https://hackmd.io/eF1duWjUTCydl25OgNlzJA)):
1. First, I contacted the data-producing communities in France to get an overview of the data flow. The outcome of these conversations was an understanding of how far along the annotation of these resources was, and of whether a scraper or a dump would best fit each case.
Some of them even provided me with sitemaps, which I started investigating as sketched below.
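For illustration, a sitemap like the ones I received can be explored with a few lines of Python; the sitemap URL below is a hypothetical placeholder, not one of the actual community sitemaps:
```python
import requests
import xml.etree.ElementTree as ET

# Hypothetical sitemap URL standing in for the ones the communities provided
SITEMAP_URL = "https://example.org/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Fetch the sitemap and list the page URLs it exposes for crawling
root = ET.fromstring(requests.get(SITEMAP_URL, timeout=30).content)
urls = [loc.text for loc in root.findall(".//sm:loc", NS)]
print(f"{len(urls)} pages available for scraping; first ones: {urls[:3]}")
```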
2. Second, I wrote notebooks to scrape these resources and save them in RDF format in GraphDB (see the sketch below).
**GitHub Src:**
**Execution Screenshots:**
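The core of those notebooks looks roughly like this minimal sketch. It assumes the target pages embed Bioschemas JSON-LD, uses the `extruct` and `rdflib` libraries, and points at a hypothetical local GraphDB repository named `bioschemas`:
```python
import json

import extruct
import requests
from rdflib import Graph

# Example resource page carrying Bioschemas JSON-LD annotations
url = "https://bio.tools/bwa"
html = requests.get(url, timeout=30).text

# Pull the embedded JSON-LD blocks out of the HTML
jsonld_blocks = extruct.extract(html, syntaxes=["json-ld"])["json-ld"]

# Aggregate the blocks into a single RDF graph
g = Graph()
for block in jsonld_blocks:
    g.parse(data=json.dumps(block), format="json-ld")

# Save a Turtle dump and push the triples into the (hypothetical) repository;
# GraphDB accepts Turtle POSTed to /repositories/<repo>/statements
g.serialize(destination="bio_tools.ttl", format="turtle")
requests.post(
    "http://localhost:7200/repositories/bioschemas/statements",
    data=g.serialize(format="turtle"),
    headers={"Content-Type": "text/turtle"},
    timeout=30,
)
```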
3. Third, I developed a web application (the Bioschemas dashboard) to showcase the value of annotating resources and to display some statistics. The steps were the following (a minimal back-end sketch follows the links below):
* Define the specifications
* Choose the technical stack (Python and Flask for the back end, Bulma for the front end)
* Develop the interfaces
* Develop the API
* Write unit tests
* Clean the code with PSF/Black (the Python code formatter)
* Optimize the SPARQL queries
**GitHub Src:**
**Execution Screenshots:**
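To give an idea of the back end, here is a minimal sketch of what one dashboard API route can look like; the route name, repository URL and query are simplified assumptions, not the production code:
```python
from flask import Flask, jsonify
from SPARQLWrapper import SPARQLWrapper, JSON

app = Flask(__name__)

# Hypothetical GraphDB repository holding the harvested Bioschemas triples
ENDPOINT = "http://localhost:7200/repositories/bioschemas"

@app.route("/api/type-counts")
def type_counts():
    """Count the harvested resources per schema.org/Bioschemas type."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        PREFIX schema: <https://schema.org/>
        SELECT ?type (COUNT(?s) AS ?n)
        WHERE { ?s a ?type . }
        GROUP BY ?type
        ORDER BY DESC(?n)
    """)
    rows = sparql.query().convert()["results"]["bindings"]
    return jsonify({r["type"]["value"]: int(r["n"]["value"]) for r in rows})

if __name__ == "__main__":
    app.run(debug=True)
```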
4. The RDF dumps that feed the dashboard are the following (a quick sanity check is sketched after the list):
* Orphanet.ttl
* Workflow_Hub.ttl
* Pippa.ttl
* CorkoakDB.ttl
* Bio_tools.ttl
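Before loading these dumps into GraphDB, they can be sanity-checked by parsing them with `rdflib` and counting the triples; the `dumps/` folder name here is an assumption:
```python
from pathlib import Path

from rdflib import Graph

# Parse each Turtle dump and report its triple count
for ttl in sorted(Path("dumps").glob("*.ttl")):
    g = Graph()
    g.parse(str(ttl), format="turtle")
    print(f"{ttl.name}: {len(g)} triples")
```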
5. [FAIR-Checker](https://fair-checker.france-bioinformatique.fr/): I added a command-line tool to the application, along with a [notebook](https://github.com/IFB-ElixirFr/FAIR-checker/blob/Scrapper_CMD_Tool/notebooks/CMD_FC_Demo_Notebook.ipynb) explaining the role of each command.
**GitHub Src:** https://github.com/IFB-ElixirFr/FAIR-checker/blob/Scrapper_CMD_Tool
**Usage examples:**
```
python cli.py --evaluate --urls http://bio.tools/bwa
python cli.py --validate-bioschemas --url http://bio.tools/bwa
python cli.py --extract-metadata --urls http://bio.tools/bwa -o metadata_dump
python cli.py --extract-metadata --url-collection input_urls.txt
```
**Execution Screenshots:**
6. I wrote a workflow to render Bioschemas profiles (DDE -> Bioschemas website); it was approved and moved into production during the BioHackathon in Paris.
The goal of this work is to enable automated rendering of Bioschemas profiles on the website as soon as their JSON-LD serialization on GitHub is updated (the core transformation is sketched after the documentation links below).
**The GitHub Action Code:**
**The documentation:**
* [Profile Auto-Generation](https://hackmd.io/iA6B7MKRQ4aaWjkLN7QCDQ)
* [Bioschemas Profile Rendering Documentation](https://hackmd.io/nGcGVXgqSJiVjI7bO9Rlhg)
* [Steps to Render a Bioschemas Profile on the Bioschema Website](https://hackmd.io/zGOAxx-BRfi4rDiaW9Rk4Q)
* [Profiles Rendering - Documentation](https://hackmd.io/sxerRv3FSJOTk3sYxBxu6g)
* [HTML Bioschemas Profiles Rendering](https://hackmd.io/Q1hH-pebQOa_gpe-iqAdIw)
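At its core, the action turns a profile's JSON-LD serialization into a Jekyll page whose front matter drives the rendering layout. The sketch below illustrates that idea only; the file paths, layout name and field names are illustrative assumptions, not the actual workflow code:
```python
import json
from pathlib import Path

# Hypothetical input/output paths; the real workflow runs in GitHub Actions
# whenever a profile's JSON-LD file is updated in the repository.
profile = json.loads(Path("jsonld/ComputationalTool.json").read_text())
node = profile["@graph"][0] if "@graph" in profile else profile

# Emit a Jekyll page whose front matter feeds the profile rendering layout
page = "\n".join([
    "---",
    "layout: profile",  # assumed layout name
    f"name: {node.get('rdfs:label', '')}",
    f"version: {node.get('schema:schemaVersion', '')}",
    "---",
    "",
])
Path("pages/ComputationalTool.html").write_text(page)
```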
7. As part of knowledge sharing, Lucie Lamothe and I organized an EDAM-Bioschemas workshop in Nantes ([workshop notes](https://hackmd.io/ckn2ZCjGQ66jdlAyQOMf5w)).
Part of my job was also to participate in community events:
1. Tour de Gaule - Marseille, May 2022
2. Institut du Thorax annual meeting - Pornic, June 2022
3. JOBIM - Rennes, July 2022
4. BioHackathon - Paris, November 2022