tags: seminar series
title: Singularity containerisation of SLURM orchestrators
- **Video connection details:**
- Zoom ID: 662 0907 5434
- Zoom password: rse
- Zoom invite link: https://uwasa.zoom.us/j/66209075434?pwd=VmRBaFRVOXNKNFRYb1NDRGY5SXZndz09
- **Date and time**: Tuesday, November 23rd 2021 13:00 CET
- **This page:** https://hackmd.io/@nordic-rse/singularity_containerization
# Singularity containerisation of SLURM orchestrators
## GitHub link
## Ice breaker question
- Do you use containers in your work and, if so, which ones?
- Do you think containers help with reproducibility?
- If there is no other way, yes. For me this is the last resort, not the first choice.
- Do you use workflow management tools and, if so, which ones?
- Or distributed computing abstractions such as Ray?
## About the series
This is the fourth event in the Nordic RSE seminar series.
* Reminder about starting recording
* Find out about future events:
* Check https://nordic-rse.org/events/seminar-series/.
* Videos of previous seminar talks are available on the [YouTube channel](https://www.youtube.com/channel/UC8OyVrmJEuT2lrH7zXoBrhQ)
* Follow [@nordic_rse](https://twitter.com/nordic_rse) on Twitter for announcements
* Join the [Nordic RSE stream](https://coderefinery.zulipchat.com/#narrow/stream/213720-nordic-rse) of the CodeRefinery chat
* Suggest speakers:
* on the [Nordic RSE stream](https://coderefinery.zulipchat.com/#narrow/stream/213720-nordic-rse)
* by creating an issue on the [Nordic RSE website repository](https://github.com/nordic-rse/nordic-rse.github.io/issues)
## About the Nordic RSE
* Represents Research Software Engineers in the Nordics.
* Check out [nordic-rse.org](https://nordic-rse.org/) for other activities.
* Registered as an association this fall.
* To become a member, fill in the [membership form](https://forms.gle/qCVVRGXPi3Hq7inW6).
## Speaker: Frankie Robertson
- Doctoral Researcher in Educational Technology at the University of Jyväskylä
- PhD project centers around using background knowledge, including massive corpora and language models, to model language learner knowledge and needs at an unprecedented level of detail
- I like making data munging reproducible!
## Short abstract
While [SLURM](https://slurm.schedmd.com) itself provides tools for job orchestration like job arrays, high-level tools like [Snakemake](https://snakemake.github.io/) and [Ray](https://www.ray.io/) are cluster agnostic and can either make use of SLURM or run on a laptop. To make Snakemake and Ray run within Singularity, I present [singreqrun](https://github.com/frankier/singreqrun), which works by requesting that the host run programs on behalf of the container.
The talk doubles as an introduction to Snakemake and Ray. After some brief background on the main tools (Singularity, SLURM, Snakemake and Ray), we proceed to a shell code-along to run the following examples:
* Snakemake for heterogeneous (mixture of CPU and GPU nodes) video corpus processing + quick porting across HPC clusters
* Snakemake for text corpus processing including using extra Singularity containers for utilities
* Ray for hyperparameter search
I end the talk by opening for discussion. Is this a good approach? Can we improve upon it?
If you would like to code along in a pre-prepared CSC environment during the talk, email username frrobert in the domain student.jyu.fi at least 24 hours before the talk, with a CSC username if possible (e.g. if you are at a Finnish university) or from an institutional (ideally university) email address.
## Long abstract
Singularity is a container platform for HPC. As well as addressing the security concerns of HPC administrators, its "convention over configuration" approach (e.g. binding the current working directory into the container by default) seems to dovetail well with the needs of software development for HPC environments. In particular, it encourages writing software which can both be run in a container in an HPC environment and tested uncontainerised on a laptop for a faster hack-test loop, as well as interoperating well with the typical SLURM + networked file system design of modern HPC clusters.
While SLURM provides some relatively high-level tools for job orchestration like job arrays, there are also tools such as Snakemake and Ray which are cluster agnostic but can make use of SLURM (with slurm-profile and yaspi), or run limited to a single laptop. However, SLURM connector plugins typically work by running SLURM utilities like squeue, which are only available on the host. While it is theoretically possible to bind host executables and libraries into the container, this tightly couples the container to the host's library versions. Therefore, I present singreqrun, a shim for requesting that the host run programs from within the container.
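The general idea of asking the host to run a program on the container's behalf can be illustrated with a minimal Python sketch. This is not singreqrun's actual implementation: it assumes a spool directory visible to both sides (for example, one bound into the container), and the function names `submit_request`, `serve_once` and `read_result` as well as the `.req`/`.res` file extensions are hypothetical.

```python
import json
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path


def submit_request(spool: Path, argv: list) -> str:
    """Container side (hypothetical): ask the host to run argv."""
    req_id = uuid.uuid4().hex
    tmp = spool / f"{req_id}.tmp"
    tmp.write_text(json.dumps({"argv": argv}))
    # Renaming within one directory is atomic on POSIX file systems,
    # so the host never picks up a half-written request.
    tmp.rename(spool / f"{req_id}.req")
    return req_id


def serve_once(spool: Path) -> int:
    """Host side (hypothetical): run all pending requests, write results."""
    handled = 0
    for req in sorted(spool.glob("*.req")):
        argv = json.loads(req.read_text())["argv"]
        proc = subprocess.run(argv, capture_output=True, text=True)
        result = {
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }
        tmp = spool / f"{req.stem}.tmp"
        tmp.write_text(json.dumps(result))
        tmp.rename(spool / f"{req.stem}.res")
        req.unlink()  # mark the request as handled
        handled += 1
    return handled


def read_result(spool: Path, req_id: str) -> dict:
    """Container side (hypothetical): read the result of a request."""
    return json.loads((spool / f"{req_id}.res").read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        spool = Path(d)
        # On a cluster, argv would be a host-only utility such as squeue;
        # the Python interpreter stands in here so the demo runs anywhere.
        cmd = [sys.executable, "-c", "print('hello from host')"]
        req_id = submit_request(spool, cmd)
        serve_once(spool)
        print(read_result(spool, req_id)["stdout"], end="")
```

In this sketch only the request and result files cross the container boundary, so the container does not need the host's SLURM binaries or libraries. A real shim would also have to wait for results, forward standard input and signals, and restrict which commands may be run.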
The talk begins with a quick roll call of the players: Singularity, SLURM, HPC, Snakemake and Ray.
In a live shell session (with all Python code preprepared), I then demonstrate the ways in which singreqrun can be used:
* Snakemake for heterogeneous (mixture of CPU and GPU nodes) video corpus processing which can be ported across HPC clusters
* Snakemake for text corpus processing including using extra Singularity containers for utilities
* Ray for hyperparameter search
I end the talk by asking for comments. In particular, is this the right direction or a hack too far? Are there better ways to combine general purpose container orchestrators + Singularity + SLURM? The current implementation is quite hacky, and more like a proof-of-concept. If it is a good idea, how can we stabilise and improve upon this approach?
The talk expands on some ideas I give in a blog post: https://frankie.robertson.name/research/effective-cluster-computing/#use-monolithic
## Ask your questions here
- is this a question?
- yes, and this is an answer
- There might be a license violation in what is shown, i.e. downloading YouTube videos. Thus the question: what are we allowed to distribute via containers?
- What does Snakemake bring that the Dockerfile cannot handle on its own? For which parts of what is currently on screen must we use Snakemake?
- Is it possible to run Julia code that can be installed with a Conda environment?
- One question I get during HPC basics teaching is what a container actually is, and how it differs from a module (not comparable, yes). How can we answer this without using terms like "kernel"? (A person asking the above question would have a hard time understanding what a kernel is.)
- When I got started, I found the difference between a virtual machine and a container pretty confusing.
- Do you have any recommendations regarding HPC tutorials or documentations?