Workshop on Workflow Platforms
2-3 April 2020

eosc-life

Logistics

Agenda

https://docs.google.com/document/d/1Od0JzTduih7DIlaIoYS-MnfFUl1UVmy6DprqhBlqAV8

Zoom details

Download Zoom Client: https://zoom.us/download

Zoom URL: https://zoom.us/j/931553045

Meeting ID: 931 553 045

Find your local number: https://zoom.us/u/aELms5eNV

Please make sure that you have your name in Zoom

Feedback survey

https://www.surveymonkey.co.uk/r/KLWJ285

Markdown cheatsheet

https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

Session I: Problems/Solutions of RI

ECRIN Sergey Goryanin

ELIXIR Laura Rodriguez-Navas, José María Fernández González

INFRAFRONTIER Philipp Gormanns

  • Distributed infrastructure
  • European Mouse Mutant Archive (EMMA)
  • Systemic phenotyping, IMPC (International Mouse Phenotyping Consortium)
  • imaging data, X-ray
  • normal imaging transformation steps, NN, image segmentation, feature extraction
  • Data annotation with MPO (Mammalian Phenotype Ontology)
  • general-flow
    • data provisioning and preparation
    • processing
    • ranking
    • Visualisation

INSTRUCT Andrea Giachetti, Vincenzo Laveglia, Marco Fragai

MIRRI Alexander Vasilenko

  • Romano P, Smith D, Bunk B, Vasilenko A, Glöckner FO. 2017. Designing the MIRRI information system. PeerJ Preprints 5:e2815v1 https://doi.org/10.7287/peerj.preprints.2815v1
  • mirri.org: the pan-European Microbial Resource Research Infrastructure
  • big data matrix
  • inventory of microorganisms
  • Paolo Romano, also part of MIRRI, mentioned several workflows under development (topics: metagenomics, metabarcoding, identification of microorganisms) and a Galaxy server being built to support curation of strain-related information.

Session II: Workflow Management Systems (WMS)

Introduction to EOSC-Life roadmap Frederik Coppens

  • Clouds, pets vs. cattle
    • automatic deployments with Terraform, Ansible, etc.
  • Conda - Bioconda
  • Containers
  • Automated Continuous Integration
  • Registries: containers, workflows
  • Bioschemas.org adopted to describe tools and make them searchable, e.g. through Google

Galaxy Björn Grüning, Frederik Coppens

  • Open Source - everything on GitHub
  • Different front-ends can use the underlying API
  • Main components are tools, "building blocks" for pipelines
  • Workflows can be created and run through the GUI, or by providing a YAML file
  • Notebooks & interactive tools: live.usegalaxy.eu
  • Multi-users system
  • Domain specific tools and instances
  • toolshed

Nextflow Alexander Peltzer

  • Slides for talk
  • Dataflow Model:
    • automated
    • parallelizable
    • reliable
    • easy for others to run
    • reproducible results
  • Easy to learn - fast prototyping
  • Self-contained
  • Docker, Singularity, Bioconda
  • Executor abstraction: independent of the platform
  • nf-core community; pipelines; tower.nf in development
  • Interactive reports
  • Legacy code can be run within a workflow
  • ToDo: integrate with biocontainers
  • Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. nf-core: Community curated bioinformatics pipelines. bioRxiv. 2019. p. 610741. doi: 10.1101/610741.
  • Q: How can licensing be checked? A: The licences of a pipeline's dependencies can be listed with nf-core licences nf-core/<pipelinename> (relies on the Anaconda API), for example:
nf-core licences nf-core/eager 

Package Name             Version     Licence
-----------------------  ----------  ---------------------------
pygments                 2.5.2       BSD 2-clause
pandas                   1.0.1       BSD 3-clause
gatk4                    4.1.4.1     BSD-3-Clause
markdown                 3.1.1       BSD-3-Clause
biopython                1.76        Biopython License Agreement
rename                   1.601       GNU GPLv3
preseq                   2.0.3       GPL
...
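
To follow up on the licensing question, the table above can be post-processed to see which packages fall under which licence. The following is only a minimal sketch: it assumes the plain-text layout shown above (columns separated by two or more spaces), and the function name group_by_licence is hypothetical, not part of nf-core.

```python
import re
from collections import defaultdict

# Sample rows copied from the `nf-core licences nf-core/eager` output above.
SAMPLE = """\
Package Name             Version     Licence
-----------------------  ----------  ---------------------------
pygments                 2.5.2       BSD 2-clause
pandas                   1.0.1       BSD 3-clause
gatk4                    4.1.4.1     BSD-3-Clause
markdown                 3.1.1       BSD-3-Clause
biopython                1.76        Biopython License Agreement
rename                   1.601       GNU GPLv3
preseq                   2.0.3       GPL
"""


def group_by_licence(table_text):
    """Map each licence string to the list of packages that declare it.

    Assumes columns are separated by at least two spaces, as in the
    table above; this is not an official nf-core output format contract.
    """
    groups = defaultdict(list)
    for line in table_text.splitlines():
        line = line.strip()
        # Skip blank lines, the header row and the dashed separator row.
        if not line or line.startswith("-") or line.startswith("Package Name"):
            continue
        fields = re.split(r"\s{2,}", line, maxsplit=2)
        if len(fields) != 3:
            continue  # e.g. a truncation marker such as "..."
        package, _version, licence = fields
        groups[licence].append(package)
    return dict(groups)


if __name__ == "__main__":
    for licence, packages in sorted(group_by_licence(SAMPLE).items()):
        print(f"{licence}: {', '.join(packages)}")
```

Licence strings are grouped exactly as reported, so "BSD 3-clause" and "BSD-3-Clause" end up in separate groups; normalising them would be a further step.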

Snakemake Johannes Köster, presented by Björn Grüning

Session III: Breakouts

Room 1 - Galaxy

Participants

    • Björn Grüning
    • Frederik Coppens
    • Laura del Cano
    • Philipp G.
    • Loraine Guéguen
    • Paolo Romano
    • Romain Dallet
    • Jose Miguel Lopez
    • Vincenzo Laveglia
    • Marko Petek
    • João Machado
    • Sara Zullino
    • Gianluca De Moro
    • Laura Rodríguez-Navas
    • Michele Maroni
    • Mireia Ferrer
    • Maddalena Fratelli
    • Angelika Paluch
    • Alexander Vasilenko
    • Marc Portier
    • Stelios Ninidakis

Where to find tools?

How to integrate a Shiny app (https://shiny.rstudio.com)

  • Install the Shiny server in a Docker container and turn it into an Interactive Tool
  • Connect to an existing RShiny server (with a dedicated tool)

Validation of YAML format

  • gxformat2
  • FC: we might need to look into coupling this to the JSON validation work done in ELIXIR
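
As an illustration of what a first-pass check on a YAML workflow could look like, here is a minimal sketch in Python. It is not gxformat2's own validator: it only assumes that a Format 2 document declares class: GalaxyWorkflow and has a non-empty steps section, and it works on an already parsed document (a dict). The function name check_workflow and the example workflow are hypothetical.

```python
# Hypothetical, already-parsed Format 2 workflow (what a YAML loader
# would return for a small one-step workflow). Illustrative only.
EXAMPLE = {
    "class": "GalaxyWorkflow",
    "inputs": {"the_input": "data"},
    "steps": {
        "cat": {
            "tool_id": "cat1",
            "in": {"input1": "the_input"},
        }
    },
}


def check_workflow(doc):
    """Return a list of problems found in a parsed workflow document.

    Only two assumed requirements are checked: the document class and
    the presence of at least one step. An empty list means both hold.
    """
    problems = []
    if doc.get("class") != "GalaxyWorkflow":
        problems.append("'class' must be 'GalaxyWorkflow'")
    steps = doc.get("steps")
    # Steps may be written as a mapping or as a list; either must be non-empty.
    if not isinstance(steps, (dict, list)) or not steps:
        problems.append("'steps' must be a non-empty mapping or list")
    return problems


if __name__ == "__main__":
    print(check_workflow(EXAMPLE) or "no problems found")
```

A schema-based approach, such as the JSON validation work mentioned above, would cover far more than this sanity check.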

Back end

  • Kubernetes
  • Singularity

Computational resources distribution

  • administrator defined

CWL in Galaxy?

  • focus on exporting 'Abstract CWL' (not executable): CWL from Galaxy
  • project to export executable CWL, but the CWL file is not readable
  • importing CWL in Galaxy is not possible at this point (and not high on the priority list)

Reproducibility and Traceability

  • A record is stored in the DB for every tool run

Python library to interact with Galaxy
https://bioblend.readthedocs.io/en/latest/
BioBlend is a Python (3.5, 3.6, 3.7 and 3.8) library for interacting with the Galaxy and CloudMan APIs.
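
A minimal sketch of scripted access with BioBlend, assuming the package is installed (pip install bioblend) and that a server URL and API key are supplied through the environment variables GALAXY_URL and GALAXY_API_KEY (both variable names are assumptions made for this example). The helper names connect and workflow_names are hypothetical.

```python
import os


def connect(url, api_key):
    """Open a connection to a Galaxy server using BioBlend."""
    # Imported lazily so workflow_names() can be tried without BioBlend installed.
    from bioblend.galaxy import GalaxyInstance

    return GalaxyInstance(url=url, key=api_key)


def workflow_names(gi):
    """Return the sorted names of the workflows visible to the current user."""
    return sorted(wf["name"] for wf in gi.workflows.get_workflows())


if __name__ == "__main__":
    url = os.environ.get("GALAXY_URL")
    key = os.environ.get("GALAXY_API_KEY")
    if url and key:
        gi = connect(url, key)
        print("Workflows:", workflow_names(gi))
        for history in gi.histories.get_histories():
            print("History:", history["name"])
    else:
        print("Set GALAXY_URL and GALAXY_API_KEY to run against a Galaxy server")
```

The same calls can be used from a Jupyter notebook, which is one way to combine notebooks with a Galaxy instance.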

Data transfer docs
ftp-upload

Tool update

  • communities (proteomics, chemoinformatics, ..)
  • if the tool is a Conda package, updates are automatic

Meta job scheduler

  • HTCondor

BioBlend API: link

Jupyter Notebook interaction with Galaxy


Room 2 - Nextflow

Participants

    • Alexander Peltzer
    • Harshil Patel
    • Phil Ewels
    • Gisela Gabernet
    • Maxime Garcia
    • Raphael Michna
    • Sergei Gorianin
    • ricardo.gonzalo
    • Andrea Giachetti
    • Tom Hancocks
    • Gildas Le Corguillé
    • Manoj Kumar Chinnasamy
    • Rahini PannierSelvan
    • arnold.knijn
    • Alex Sanchez
    • José Mª Fernández

Session IV: Summary & Discussion
