RSH 015 internal: containers

## Planning

What do we want to convey to the audience?

* When to use a container ~~vs~~ and when to produce a reproducible environment using [conda, virtualenv, installable software, ...] <-- I would not put versus
* How to build a container [pick some engine(s)]
* Basics of how to create a container definition file

## Outline

- what is a container?
    - recipe -> image -> container
        - maybe show the figure: https://journals.plos.org/ploscompbiol/article/figure/image?size=large&id=10.1371/journal.pcbi.1008316.g002
    - how they differ from virtual machines ~~hypervisors~~ <-- may not be very relevant for our audience
- Basic example
    - docker pull something
    - docker images
    - docker run
    - open a shell inside the container (e.g. `docker run -it <image> sh`)
    - inside the shell: cat /etc/os-release
- Taxonomy of containers in science (what are the types of **use cases**)
    - simple code portability for a micro-task
    - workflows
    - whole development environment
    - "works on my computer"
    - transparency: documentation of dependencies
    - testing of dependencies in isolation
    - reproducibility
    - distributing data - not a use case
    - data cannot travel (too big, too sensitive), "computer" travels to the data
- how we use containers
    - containers as abstraction and isolation
- Lessons learned from "10 rules"
    - to have: https://upload.wikimedia.org/wikipedia/commons/thumb/d/dc/NZ_Defence_Force_assistance_to_OP_Rena.jpg/1280px-NZ_Defence_Force_assistance_to_OP_Rena.jpg rather than https://upload.wikimedia.org/wikipedia/commons/thumb/7/77/Rena_ship_07.jpg/800px-Rena_ship_07.jpg
    - https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008316
        - contains a very nice analogy for recipes, images, and containers
    - transparency/understandability vs performance (in space or time)
    - containers are built from text file recipes which are understandable for humans and computers
        - good for open-source software
    - use existing tools
        - repo2docker
    - build on top of existing images
        - official images
        - how to order layers
        - make sure you can inspect the recipe
        - let CI build the container from the recipe instead of building it locally
        - use version-specific tags, avoid "latest"
    - format for clarity
    - document within the Dockerfile
        - add comments
        - group related commands
        - add metadata
        - include usage instructions
    - specify software versions
        - pin versions
        - balance: specify in Dockerfile or in requirements.txt/environment.yml?
    - use version control
        - put Dockerfile into the project repo
    - mount datasets at runtime
    - make the image one-click runnable
        - define reasonable entrypoints and unsurprising default behavior
        - again: usage instructions
    - order the instructions
        - first those that change the least often
    - regularly use and rebuild containers
        - eat your own dog food: use the container for your work, not only at the end
- singularity vs docker
    - use cases
    - how to use docker images in singularity
        - there can be a short demo (Radovan)
- registries
    - dockerhub
    - quay
    - singularity hub vs the other one
    - gitlab
    - github
    - zenodo
- Risks with containers?
    - can invite practices which make it difficult to use a piece of software outside of a container
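
The Dockerfile lessons in the outline could be condensed into one small example for the session. This sketch is hypothetical. The base image tag, the `requirements.txt` and `analysis.py` files, the label values, and the `/data` mount point are placeholders chosen for illustration; none of them come from the outline. The sketch does the following:

* builds on a version-tagged official image
* documents itself with comments and OCI `LABEL` metadata
* keeps pinned dependencies in `requirements.txt`
* orders instructions from least to most frequently changing
* expects data to be mounted at runtime
* defines an entrypoint whose default only prints usage

```dockerfile
# Hypothetical example: file names, label values, and paths are placeholders.

# Build on an official image with a version-specific tag, never "latest".
# A full patch version or an image digest would pin it even more strictly.
FROM python:3.11-slim

# Metadata and usage instructions, visible with "docker inspect".
LABEL org.opencontainers.image.title="example-analysis" \
      org.opencontainers.image.version="0.1.0" \
      org.opencontainers.image.description="Hypothetical analysis; mount input data at /data, run with --help for usage"

WORKDIR /opt/app

# Dependencies change rarely, so install them first to keep this layer cached.
# requirements.txt (hypothetical) holds pinned versions, e.g. numpy==1.26.4.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Project code changes most often, so it is copied last.
COPY analysis.py .

# Declare the mount point only; datasets are supplied at runtime via "docker run -v".
VOLUME ["/data"]

# One-click runnable: fixed entrypoint plus an overridable default argument.
# Assumes the hypothetical analysis.py handles --help (e.g. via argparse).
ENTRYPOINT ["python", "/opt/app/analysis.py"]
CMD ["--help"]
```

Assuming the Dockerfile sits in the project repository next to those files, it could be built with `docker build -t example-analysis:0.1.0 .`. It could then be run with `docker run --rm -v "$PWD/data:/data" example-analysis:0.1.0 /data`. Arguments after the image name replace the default `CMD` and are passed to the entrypoint. Keeping pinned versions in `requirements.txt` rather than in the Dockerfile is one possible answer to the "balance" question in the outline, since that file also works outside the container.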