# Containers materials
Contributors: Alexander Botzki, Marko Vidak, Mateusz Kuzak, Geert van Geest, Pedro Fernandes
## Persona 1
### Learner's profiles
- Who are they?
- Researcher / RSE who wants to reuse others' analysis / pipeline / software that has been containerised.
- People who have recently discovered that they have to work more reproducibly (aiming at automation, mass processing)
- People who realise that others cannot reproduce their analysis (installation, dependencies, testing, example)
- People who want to tweak existing container recipes
<!--
- mybinder / Jupyter / docker file
- go from images to dockers
- people have heard of it and what to use it because they want to use pipeline
- people who want to reuse others’ software which have been containerized.
-->
- What challenges are they facing?
- It is hard to understand how Docker containers differ from conda environments
- They don’t know how to use or run containers, or how to troubleshoot issues when running them.
- They need to develop Docker images locally
- They need to use Singularity for deployment on HPC (command line / production pipeline)
<!--
- The incomplete documentation of containers they would like to use.
- aiming at automation for mass processing
-->
- How will the lesson/workshop help them?
- Being able to run a basic container from a registry
- Understand that containers do not solve the problem of correctness of analyses
- Create an isolated environment / share an image after tinkering with a live container
- Understand the documentation of Docker recipes
### Learner's objectives
- Prerequisites for the course/tutorial
- basic command line experience
- Understand basic concepts of system administration and software installation, depending on background
- Define exercises with learning outcomes (measurable)*
- check whether Docker is properly set up (`docker run hello-world`)
- Participants are able to check whether their environment is running
- running a container with `docker run`, outcomes:
- Participant can run a container from Docker Hub
- tagging
- run a command
- process listing
- pruning/cleaning
- attached + interactive mode (-it)
- detached mode
- running web services, mapping ports
- volumes, bind-mounts
- parameters (working dir, user)
- dockerfiles
- handling permissions
*Goals: “After following one of these tutorials, learners will be able to …” (Bloom's taxonomy)
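The exercises listed above could be walked through with a short script like the following. It is a minimal sketch, not a fixed lesson plan: the images `alpine:3.19` and `nginx:1.25`, the container name `web`, the host port `8080`, and the function names `check_docker` and `basic_tour` are all assumptions chosen for illustration.

```shell
# Succeeds only if the docker CLI is on PATH and the daemon can run a container.
check_docker() {
    command -v docker >/dev/null 2>&1 &&
        docker run --rm hello-world >/dev/null 2>&1
}

# Illustrative tour of the basic commands (image names and ports are assumptions).
basic_tour() {
    docker pull alpine:3.19                           # fetch a tagged image from Docker Hub
    docker run --rm alpine:3.19 cat /etc/os-release   # run a single command, remove the container afterwards
    docker run -d --name web -p 8080:80 nginx:1.25    # detached mode, host port 8080 mapped to container port 80
    docker ps                                         # list running containers
    docker stop web && docker rm web                  # stop and remove the web container
    docker container prune -f                         # remove all stopped containers
    docker image prune -f                             # remove dangling images
}

if check_docker; then
    echo "Docker is set up"
else
    echo "Docker is not available" >&2
fi
```

Calling `basic_tour` afterwards runs the tour. For the attached, interactive mode, `docker run --rm -it alpine:3.19 sh` opens a shell inside a container; it is left out of the function because it waits for input.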
### Training materials
Outline:
- intermediate example, not covered by the Carpentries lesson: running a Jupyter notebook
- Find an image on Docker Hub that contains bwa
(e.g. https://hub.docker.com/r/biocontainers/bwa)
- Exercise: Align the reads in file `x.fastq.gz` to the reference genome `y.fa` using a container with the latest BWA version from Docker Hub, and write the alignments to the file `aln-pe.sam`. The command follows the pattern `bwa mem ref.fa read1.fq read2.fq > aln-pe.sam`
- by analogy with this FastQC command:
`docker run --rm --name fastqc_albot -u="$(id -u):$(id -g)" -w="/data/" -v ~/workshop-janssen/data/:/data quay.io/biocontainers/fastqc:0.11.9--0 /bin/bash -c "fastqc WT*.fq.gz"`
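One possible shape for a solution to the BWA exercise is a small wrapper function around `docker run`. This is a sketch under assumptions: the tag `v0.7.17_cv1` is only an example and should be checked against the repository's tag list, the input files are assumed to sit in the current directory, and the names `BWA_IMAGE` and `bwa_in_docker` are hypothetical.

```shell
# Assumption: example tag only; look up the current tag on Docker Hub before use.
BWA_IMAGE="biocontainers/bwa:v0.7.17_cv1"

# Run bwa inside the container with the current directory mounted at /data.
bwa_in_docker() {
    docker run --rm \
        -u "$(id -u):$(id -g)" \
        -v "$PWD":/data \
        -w /data \
        "$BWA_IMAGE" \
        bwa "$@"
}
```

With this in place, `bwa_in_docker index ref.fa` builds the index and `bwa_in_docker mem ref.fa read1.fq read2.fq > aln-pe.sam` writes the alignments. The redirection happens on the host, so `aln-pe.sam` is owned by the user who ran the command.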
## Persona 2
- Who are they?
- Researcher / RSE who wants to make their analysis / pipeline / software more reusable and reproducible.
- People that want to create Docker containers or Singularity images
- What challenges are they facing?
- They find it difficult to understand containers conceptually.
- Current documentation from Docker/Singularity is not written for beginners
- How will the lesson/workshop help them?
- Make sure their documentation is clear and complete.
- Give a template on which they can build further
### Learner's objectives
- Prerequisites for the course/tutorial
- knowledge of the UNIX command line
- know how to change permissions / the working directory
- package managers
- use git
- account on docker hub
- understanding a Python script
- Define exercises with learning outcomes (measurable)*
- fetch script from git repo
- write a Dockerfile that creates an image with a specific version of [tool], built from source with resolved dependencies, so that others can run it as a command-line tool with parameters provided on the CLI
- put it on Docker Hub
- requires specific python dependencies
- default command and entrypoint
*Goals: “After following one of these tutorials, learners will be able to …” (Bloom's taxonomy)
### Training materials
Use [Carpentries Docker lesson](https://carpentries-incubator.github.io/docker-introduction/) as a starting point
```sh
docker run \
--rm \
--name fastqc0119 \
-u="$(id -u):$(id -g)" \
-w="/data/" \
--mount type=bind,source="$HOME/workshop-janssen/data/",target=/data \
quay.io/biocontainers/fastqc:0.11.9--0 \
fastqc WT1.fq.gz
```
Outline: executing bioinformatics tool on local file
- exercise
- download example files from s3 or anywhere else
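- this exercise is specific to Linux; macOS behaves differently
- `docker run --rm -u="$(id -u)" quay.io/biocontainers/fastqc:0.11.9--0 touch examplefile`
- on Linux, the container runs as `root` by default (unless the image or `-u` sets another user), so files it creates are owned by `root`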
- explain user and group ID
- see [Geert's course](https://sib-swiss.github.io/containers-introduction-training/course_material/managing_docker/#mounting-a-directory)
- introduce user option
- `docker run --rm -u="$(id -u)" quay.io/biocontainers/fastqc:0.11.9--0 touch file1`
- `docker run --rm -u="$(id -u):$(id -g)" quay.io/biocontainers/fastqc:0.11.9--0 touch file2`
- learn how to find the working directory of the image
- default is '/'
- `docker inspect quay.io/biocontainers/fastqc:0.11.9--0 | grep 'WorkingDir'`
- or `docker run --rm -it quay.io/biocontainers/fastqc:0.11.9--0`
- inside the container: `pwd`
- https://docs.docker.com/engine/reference/run
- comment on entrypoint/cmd
- introduce working directory option -w
- we recommend using the full command, e.g. `fastqc WT1.fq.gz`
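The points above about user IDs and the working directory can be explored with a sketch like this one. The helper names `show_image_defaults` and `compare_ownership` and the directory `ownership-demo` are hypothetical; the image is the FastQC image already used in this outline.

```shell
IMAGE="quay.io/biocontainers/fastqc:0.11.9--0"

# Host user and group IDs in the form expected by `docker run -u`.
host_ids="$(id -u):$(id -g)"
echo "host uid:gid = $host_ids"

# Print the image defaults that decide where and as whom a command runs.
show_image_defaults() {
    docker image inspect --format \
        'WorkingDir={{.Config.WorkingDir}} User={{.Config.User}} Entrypoint={{json .Config.Entrypoint}} Cmd={{json .Config.Cmd}}' \
        "$1"
}

# Create one file as the image's default user and one as the host user,
# then list numeric owners so the difference is visible.
compare_ownership() {
    mkdir -p ownership-demo
    docker run --rm -v "$PWD/ownership-demo":/data -w /data "$IMAGE" touch as_default_user
    docker run --rm -u "$host_ids" -v "$PWD/ownership-demo":/data -w /data "$IMAGE" touch as_host_user
    ls -ln ownership-demo
}
```

Running `show_image_defaults "$IMAGE"` followed by `compare_ownership` on a Linux host should make the effect of `-u` visible in the numeric owner columns. If the image's default user is `root`, the first file may need elevated rights to delete.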
Outline: containerize something, container Dockerfile
- exercises
- fetch script from git repo
- write a Dockerfile that creates an image with a specific version of [tool], built from source with resolved dependencies, so that others can run it as a command-line tool with parameters provided on the CLI
- put it on Docker Hub
- requires specific python dependencies
- default command and entrypoint
- Dockerfile: alpine, bwa, resolve dependencies, ...
- two person exercise:
- image on docker hub
- produce a visualisation which is exported to png
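As a starting template for the Dockerfile exercises, the snippet below writes a small Dockerfile with a pinned Python dependency, an entrypoint and a default command. Everything specific in it is an assumption for illustration: the base image `python:3.11-slim`, the pinned `matplotlib==3.8.4`, the directory `plot-tool`, and the script `plot.py`, which stands in for the script fetched from the git repository and must exist in `plot-tool` before building.

```shell
mkdir -p plot-tool

# Write the Dockerfile; the quoted 'EOF' keeps the shell from expanding anything inside.
cat > plot-tool/Dockerfile <<'EOF'
# Pinned base image and dependency versions keep rebuilds comparable.
FROM python:3.11-slim
RUN pip install --no-cache-dir matplotlib==3.8.4

# plot.py is a hypothetical script that writes a PNG file.
COPY plot.py /app/plot.py

# Data is expected to be bind-mounted at /data.
WORKDIR /data

# ENTRYPOINT fixes the program; CMD supplies default arguments
# that are replaced by whatever follows the image name in `docker run`.
ENTRYPOINT ["python", "/app/plot.py"]
CMD ["--help"]
EOF

echo "Wrote plot-tool/Dockerfile"
```

The image could then be built with `docker build -t <dockerhub-user>/plot-tool:0.1.0 plot-tool`, run with `docker run --rm -u "$(id -u):$(id -g)" -v "$PWD":/data <dockerhub-user>/plot-tool:0.1.0` followed by the script's own arguments, and published with `docker login` and `docker push <dockerhub-user>/plot-tool:0.1.0`. The placeholder `<dockerhub-user>` and the tag `0.1.0` are illustrative.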
## Notes
- need a short section on other container registries: BioContainers, quay.io (supports podman and rkt), etc. Also mention the [Open Container Initiative](https://opencontainers.org/)
- Discussion about reproducibility, tagging and export of images.
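For the discussion on reproducibility, tagging and export, a sketch along these lines might be useful. The archive name and the function names `export_image` and `import_image` are hypothetical; the image is the FastQC image from quay.io used earlier in this outline.

```shell
IMAGE="quay.io/biocontainers/fastqc:0.11.9--0"
ARCHIVE="fastqc_0.11.9.tar"   # hypothetical file name

# Pull a tagged image from a registry other than Docker Hub,
# print its content digest, and export it to a tar archive.
export_image() {
    docker pull "$IMAGE" &&
        docker image inspect --format '{{index .RepoDigests 0}}' "$IMAGE" &&
        docker save -o "$ARCHIVE" "$IMAGE"
}

# Load the archive again, for example on a machine without registry access.
import_image() {
    docker load -i "$ARCHIVE"
}
```

A tag can be moved to a different image by whoever maintains the repository, whereas the digest printed by `export_image` identifies one specific image, which is why recording it alongside an analysis helps reproducibility.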
## Contribution notes
This Markdown-file can be collaboratively and simultaneously edited from [this link](https://hackmd.io/@tmuylder/H1CSvMFZ_/edit).
Please push to the branch `containers` to avoid creating conflicting versions. This is done automatically when you push the HackMD file to GitHub (click on Settings... at the top right --> Versions and GitHub Sync --> Push). However, this might only be possible for @tmuylder.