---
title: 'WORKSHOP ON WORKFLOW PLATFORMS'
---
Workshop on Workflow Platforms
2-3 April 2020
===
![eosc-life](https://www.eosc-life.eu/wp-content/themes/eosc-life-v2/assets/images/eosclogo.png =200x)
## Table of Contents
[TOC]
## Logistics
:::info
Agenda
:::
https://docs.google.com/document/d/1Od0JzTduih7DIlaIoYS-MnfFUl1UVmy6DprqhBlqAV8
:::info
Zoom details
:::
> Download Zoom Client: https://zoom.us/download
> Zoom URL: https://zoom.us/j/931553045
> Meeting ID: 931 553 045
> Find your local number: https://zoom.us/u/aELms5eNV
Please make sure that __your name__ is shown in Zoom
:::info
Feedback survey
:::
https://www.surveymonkey.co.uk/r/KLWJ285
:::info
Markdown cheatsheet
:::
https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
## Session I: Problems/Solutions of RI
> ECRIN [name=Sergei Gorianin]
* MetaData Repository (MDR): web portal to maximize the discoverability of clinical data
* [Project Overview](http://ecrin-mdr.online/index.php/Project_Overview)
* [ppt 25.02.2020](https://www.corbel-project.eu/fileadmin/corbel/media/docs/Final_AGM_presentations/15_ECRIN_MDR_Gorianin.pdf)
* [zenodo](https://zenodo.org/record/3562911)
* At the moment the metadata uses [JSON Schema](http://json-schema.org/); in the future it may be mixed with [Bioschemas](https://bioschemas.org/) or [schema.org](https://schema.org/)
> ELIXIR [name=Laura Rodríguez-Navas, José María Fernández González]
* [ELIXIR Spain](https://elixir-europe.org/about-us/who-we-are/nodes/spain): [Barcelona Supercomputing Center - BSC](https://www.bsc.es/)
* [Workflow Hub](https://dev.workflowhub.eu/) by [SEEK](https://seek4science.org/) -- _'Created as part of the EOSC-Life WP2 Tools Collaboratory, the WorkflowHub is under development'_
* [Wetlab2Variations Workflow Demonstrator](https://dev.workflowhub.eu/workflows/34), local and HPC
* [OpenEBench](https://openebench.bsc.es/) -- _software that provides guidance and software infrastructure for the benchmarking and technical monitoring of bioinformatics tools, web servers and workflows. The work is part of Work Package 2 of the ELIXIR-EXCELERATE project._
* Sensitive Data:
    * encryption and authorization level?
    * reproducibility?
    * [EncFS - an Encrypted Filesystem](https://github.com/vgough/encfs)
> INFRAFRONTIER [name=Philipp Gormanns]
* Distributed infrastructure
* European Mouse Mutant Archive (EMMA)
* systemic phenotyping, IMPC (International Mouse Phenotyping Consortium)
* imaging data, X-ray
* normal imaging transformation steps, NN, image segmentation, feature extraction
* Data annotation with MPO
* General flow:
    * data provisioning and preparation
    * processing
    * ranking
    * visualisation
> INSTRUCT [name=Andrea Giachetti, Vincenzo Laveglia, Marco Fragai]
* [CWL workflow for NMR spectra Peak Picking](https://github.com/andreagia/CWL_dem1_NMR_Peak_Picking)
* integration with Galaxy
* Visualization in a cloud?
> MIRRI [name=Alexander Vasilenko]
* _Romano P, Smith D, Bunk B, Vasilenko A, Glöckner FO. 2017. Designing the MIRRI information system. PeerJ Preprints 5:e2815v1 https://doi.org/10.7287/peerj.preprints.2815v1_
* [mirri.org](https://www.mirri.org/) -- the pan-European Microbial Resource Research Infrastructure
* big data matrix
* inventory of microorganisms
* Paolo Romano, who is also part of MIRRI, mentioned several workflows under development (metagenomics, metabarcoding, identification of micro-organisms) and a Galaxy server being built to support the curation of strain-related information.
## Session II: Workflow Management Systems (WMS)
> Introduction to EOSC-Life roadmap [name=Frederik Coppens]
* Clouds, pets vs. cattle
* automatic deployments with Terraform, Ansible, etc.
* Conda - Bioconda
* Containers
* Automated Continuous Integration
* Registries: containers, workflows
* Bioschemas.org adopted to describe tools and make them searchable, e.g. through Google
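
As a rough illustration of the Bioschemas point above, a tool description is typically published as schema.org JSON-LD that search engines can index. The sketch below builds such a record in Python. The tool name, URL and licence are invented placeholders, and the chosen properties are an assumption, not a complete Bioschemas profile.

```python
import json


def tool_jsonld(name, url, description, license_url):
    """Return a schema.org SoftwareApplication record as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "url": url,
        "description": description,
        "license": license_url,
    }


# Hypothetical tool: none of these values come from the workshop.
record = tool_jsonld(
    name="example-aligner",
    url="https://example.org/example-aligner",
    description="Placeholder description of a sequence aligner.",
    license_url="https://spdx.org/licenses/MIT.html",
)

# The serialised record would be embedded in a web page inside a
# <script type="application/ld+json"> element.
print(json.dumps(record, indent=2))
```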
---
> Galaxy [name=Björn Grüning, Frederik Coppens]
* Open Source - everything on GitHub
* Different front-ends can use the underlying API
* Main components are tools, "building blocks" for pipelines
* Workflows can be created and run through the GUI, or by providing a YAML file
* Notebooks & interactive tools: live.usegalaxy.eu
* Multi-users system
* Domain specific tools and instances
* [toolshed](https://galaxyproject.org/toolshed/)
> Nextflow [name=Alexander Peltzer]
* [Slides for talk](https://slides.com/apeltzer/2020-04-03-nf-core?token=zb3qWibl)
* Dataflow Model:
* automated
* parallelizable
* reliable
* easy for others to run
* reproducible results
* Easy to learn - fast prototyping
* Self-contained
* Docker, Singularity, Bioconda
* Executor abstraction: independent of the platform
* [nf-core](https://github.com/nf-core) community; [pipelines](https://nf-co.re/pipelines); [tower.nf](https://tower.nf/) in dev.
* Interactive reports
* Legacy code can be run within a workflow
* ToDo: integrate with [biocontainers](https://biocontainers.pro/#/)
* _Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. nf-core: Community curated bioinformatics pipelines. bioRxiv. 2019. p. 610741. doi: 10.1101/610741._
* Q: licensing? - can be retrieved using `nf-core licences nf-core/<pipelinename>` for example (relies on anaconda API)
```
nf-core licences nf-core/eager
Package Name Version Licence
----------------------- ---------- ---------------------------
pygments 2.5.2 BSD 2-clause
pandas 1.0.1 BSD 3-clause
gatk4 4.1.4.1 BSD-3-Clause
markdown 3.1.1 BSD-3-Clause
biopython 1.76 Biopython License Agreement
rename 1.601 GNU GPLv3
preseq 2.0.3 GPL
...
```
> Snakemake [name=Johannes Köster, presented by Björn Grüning]
* [koesterlab.github.io](https://koesterlab.github.io/)
* [snakemake.readthedocs.io](https://snakemake.readthedocs.io/en/stable/)
* some slides at [johanneskoester.bitbucket.org](https://johanneskoester.bitbucket.org)
* DAG of jobs
* comments:
* example of a Snakemake workflow in use: [ARMOR](https://www.g3journal.org/content/9/7/2089)
* user comment - check [Simple comparison of Snakemake, Nextflow and Cromwell/WDL](https://github.com/grst/snakemake_nextflow_wdl)
* From Harshil Patel to Everyone: 11:28 AM
_As Björn mentioned, Snakemake has a lower barrier for entry for users that are already experienced in Python.
You don't actually need to know any Java/Groovy to use Nextflow.
Nextflow seems to have better cloud support._
* From Harshil Patel to Everyone: 11:30 AM
_You also have to remember that both Snakemake and Nextflow will allow you to run bash, R, Perl scripts embedded within the workflow too._
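
The "DAG of jobs" idea noted above can be sketched with Python's standard library: each job lists the jobs it depends on, and a topological sort yields a valid execution order. The job names are made up for illustration, and this is not a description of how Snakemake itself is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs, each mapped to the set of jobs it depends on.
jobs = {
    "trim_reads": set(),
    "align": {"trim_reads"},
    "call_variants": {"align"},
    "qc_report": {"trim_reads"},
    "summary": {"call_variants", "qc_report"},
}


def execution_order(dag):
    """Return one valid order in which every job follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(jobs)
print(order)
```

Jobs with no path between them, such as `qc_report` and `align` here, could run in parallel. A workflow engine would schedule them concurrently instead of following one linear order.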
## Session III: Breakouts
### Room 1 - Galaxy
__Participants__
1. + Björn Grüning
2. + Frederik Coppens
3. Laura del Cano
4. + Philipp G.
5. + Loraine Guéguen
7. + Paolo Romano
8. + Romain Dallet
9. + Jose Miguel Lopez
10. + Vincenzo Laveglia
11. + Marko Petek
12. + João Machado
13. + Sara Zullino
14. + Gianluca De Moro
15. + Laura Rodríguez-Navas
16. + Michele Maroni
17. + MajaZ ([NIB.SI](http://www.nib.si/eng/)) ~zagorGit
18. + Mireia Ferrer
19. + Maddalena Fratelli
20. + Evangelos Pafilis (hpc.hcmr.gr)
21. + Angelika Paluch
22. + Alexander Vasilenko
23. + Marc Portier
24. + Stelios Ninidakis
<b>Where to find tools?</b>
- [Toolshed](https://toolshed.g2.bx.psu.edu) is the main 'app store'
- If you want new tools: contact us
- Gitter channels (https://gitter.im/galaxyproject/Lobby/)
- Mailing list (https://galaxyproject.org/mailing-lists/)
- IUC maintains tools according to the Galaxy Best Practices
- Statistics per tool: [stats.galaxyproject.eu](https://stats.galaxyproject.eu)
- [usegalaxy.* tools](https://github.com/usegalaxy-eu/usegalaxy-eu-tools)
<b>How to integrate a Shiny app</b> (https://shiny.rstudio.com)
- Install the Shiny server in a Docker container and turn it into an Interactive Tool
- Connect to an existing RShiny server (with a dedicated tool)
<b>Validation of YAML format</b>
- [gxformat2](https://github.com/galaxyproject/gxformat2)
- FC: we might need to look into coupling this to the JSON validation work done in ELIXIR
<b>Back end</b>
- Kubernetes
- Singularity
- ...
<b>Computational resources distribution</b>
- administrator defined
<b>CWL in Galaxy?</b>
- focus on exporting 'Abstract CWL' (not executable): [CWL from Galaxy](https://github.com/ieguinoa/cwl-from-galaxy)
- there is also a project to export executable CWL, but the resulting CWL file is not readable
- importing CWL in Galaxy is not possible at this point (and not high on the priority list)
<b>Reproducibility and Traceability</b>
- a record is stored in the DB for every tool run
<b>Python library to interact with Galaxy</b>
https://bioblend.readthedocs.io/en/latest/
[BioBlend](https://github.com/galaxyproject/bioblend) -- a Python (3.5, 3.6, 3.7 and 3.8) library for interacting with Galaxy and CloudMan APIs
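
A minimal sketch of using BioBlend from Python, assuming the `bioblend` package is installed and you hold an API key for a Galaxy server. `names_of` and `list_histories_and_workflows` are hypothetical helpers written for this note, not part of BioBlend. A call such as `list_histories_and_workflows("https://usegalaxy.eu", "<your API key>")` would contact the server, so nothing is called at import time.

```python
def connect(url, key):
    """Open a connection to a Galaxy server (needs `pip install bioblend`)."""
    from bioblend.galaxy import GalaxyInstance  # imported lazily on first use
    return GalaxyInstance(url=url, key=key)


def names_of(items):
    """Hypothetical helper: pull the names out of the dicts BioBlend returns."""
    return [item["name"] for item in items]


def list_histories_and_workflows(url, key):
    """Return (history names, workflow names) visible to the given API key."""
    gi = connect(url, key)
    histories = gi.histories.get_histories()   # list of dicts
    workflows = gi.workflows.get_workflows()   # list of dicts
    return names_of(histories), names_of(workflows)
```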
<b>Data transfer docs</b>
[ftp-upload](https://galaxyproject.org/ftp-upload/)
<b>Tool update</b>
- communities (proteomics, chemoinformatics, ..)
- if the tool is a conda package, updates are automatic
<b>_Meta job scheduler_</b>
- HTCondor
<b>BioBlend API</b>: [link](https://crs4.github.io/Galaxy4Developers/lectures/10.bioblend_api/)
<b>Jupyter Notebook interaction with Galaxy</b>
- start tools/workflows from it
- e.g. for VMD: Galaxy as a back end, Notebook as a trigger
- [Use Jupyter notebooks in Galaxy tutorial](https://galaxyproject.github.io/training-material/topics/galaxy-ui/tutorials/galaxy-intro-jupyter/tutorial.html)
- do heavy data crunching outside Jupyter
---
### Room 2 - Nextflow
__Participants__
1. + Alexander Peltzer
3. + Harshil Patel
4. Phil Ewels
5. + Gisela Gabernet
6. + Maxime Garcia
7. + Raphael Michna
8. + Sergei Gorianin
9. + ricardo.gonzalo
10. + Andrea Giachetti
11. + Tom Hancocks
12. + Gildas Le Corguillé
13. + Manoj Kumar Chinnasamy
14. Rahini PannierSelvan
15. + arnold.knijn
16. + Alex Sanchez
17. + José Mª Fernández
* Example of PBS executor usage on `nf-core/configs`:
* https://github.com/nf-core/configs/search?q=pbs&unscoped_q=pbs
* Nextflow caching and resuming workflows:
* https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html
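
The resume mechanism described in the linked post can be pictured as hash-keyed caching: a task is skipped when a stored result already exists for the same script and inputs. The sketch below is a simplified illustration of that idea in Python. The function names and on-disk layout are hypothetical and do not reproduce Nextflow's actual implementation.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def task_hash(script, inputs):
    """Derive a stable key from the task script and its input values."""
    payload = json.dumps({"script": script, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(work_dir, script, inputs, func):
    """Run func(inputs) unless a result for the same hash is already stored.

    Returns a (result, was_cached) tuple.
    """
    result_file = Path(work_dir) / task_hash(script, inputs) / "result.json"
    if result_file.exists():
        return json.loads(result_file.read_text()), True
    result = func(inputs)
    result_file.parent.mkdir(parents=True, exist_ok=True)
    result_file.write_text(json.dumps(result))
    return result, False


def count_bases(sequences):
    """Toy task: total length of all input sequences."""
    return sum(len(s) for s in sequences)


work_dir = tempfile.mkdtemp()
first = run_cached(work_dir, "count_bases", ["ACGT", "GG"], count_bases)
second = run_cached(work_dir, "count_bases", ["ACGT", "GG"], count_bases)    # same key: reused
changed = run_cached(work_dir, "count_bases", ["ACGT", "GGA"], count_bases)  # new input: re-run
print(first, second, changed)
```

Changing either the script label or any input value produces a different hash, so only the affected task is recomputed while earlier results are reused.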
## Session IV: Summary & Discussion
* __[Feedback Survey](https://www.surveymonkey.co.uk/r/KLWJ285)__