General info
Do you use containers in your work and, if so, which containers?
Do you think containers help with reproducibility?
Do you use workflow management tools and, if so, which?
Or distributed computing abstractions such as Ray?
This is the fourth event in the Nordic RSE seminar series.
While SLURM itself provides tools for job orchestration such as job arrays, higher-level tools like Snakemake and Ray are cluster-agnostic and can either make use of SLURM or run on a laptop. To make Snakemake and Ray run within Singularity, I present singreqrun, which works by requesting that the host run programs on behalf of the container.
The talk doubles as an introduction to Snakemake and Ray. After some brief background on the main tools (Singularity, SLURM, Snakemake and Ray), we proceed to a shell code-along through several examples.
I end the talk by opening the floor for discussion. Is this a good approach? Can we improve upon it?
If you would like to code along in a pre-prepared CSC environment during the talk, email username frrobert in the domain student.jyu.fi at least 24 hours before the talk, with a CSC username if possible (e.g. if you are at a Finnish university) or from an institutional (ideally university) email address.
Singularity is a container platform for HPC. As well as addressing the security concerns of HPC administrators, its "convention over configuration" approach (e.g. binding the current working directory into the container by default) dovetails well with the needs of software development for HPC environments. In particular, it encourages writing software which can both run in a container in an HPC environment and be tested uncontainerised on a laptop for a faster hack-test loop, while interoperating well with the typical SLURM + networked file system design of modern HPC clusters.
While SLURM provides some relatively high-level tools for job orchestration, such as job arrays, tools like Snakemake and Ray are cluster-agnostic: they can make use of SLURM (via slurm-profile and yaspi) or run on a single laptop. However, SLURM connector plugins typically work by running SLURM utilities like squeue, which are only available on the host. While it is theoretically possible to bind host executables and libraries into the container, this tightly couples the library versions of the container to those of the host. Therefore, I present singreqrun, a shim for requesting, from within the container, that the host run programs on its behalf.
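To make the idea concrete, here is a minimal sketch of a request-run protocol of the kind singreqrun implements: the container drops a request file into a bind-mounted shared directory, and a host-side process picks it up, runs the program, and writes the result back. The function names (request_run, serve_one, read_result) and the JSON file format are illustrative assumptions, not the actual singreqrun interface.

```python
# Illustrative sketch of a host-run request shim (NOT the real
# singreqrun code). The "shared" directory stands in for a path
# bind-mounted into the container.
import json
import subprocess
from pathlib import Path


def request_run(shared: Path, req_id: str, argv: list) -> None:
    """Container side: ask the host to run `argv` on our behalf."""
    (shared / f"{req_id}.request").write_text(json.dumps({"argv": argv}))


def serve_one(shared: Path, req_id: str) -> None:
    """Host side: execute a pending request and record its output."""
    req = json.loads((shared / f"{req_id}.request").read_text())
    proc = subprocess.run(req["argv"], capture_output=True, text=True)
    out = {"returncode": proc.returncode, "stdout": proc.stdout}
    (shared / f"{req_id}.result").write_text(json.dumps(out))


def read_result(shared: Path, req_id: str) -> dict:
    """Container side: collect the result written by the host."""
    return json.loads((shared / f"{req_id}.result").read_text())


# Demo: round-trip one request through a temporary "shared" directory.
import tempfile

with tempfile.TemporaryDirectory() as d:
    shared_dir = Path(d)
    request_run(shared_dir, "r1", ["echo", "hello"])
    serve_one(shared_dir, "r1")
    result = read_result(shared_dir, "r1")
```

A real implementation also has to deal with waiting for the result asynchronously, streaming output, and propagating signals, which is where much of the complexity lies.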
The talk begins with a quick roll call of the players: Singularity, SLURM, HPC, Snakemake and Ray.
In a live shell session (with all Python code pre-prepared), I then demonstrate the ways in which singreqrun can be used:
- Snakemake for heterogeneous (mixture of CPU and GPU nodes) video corpus processing, portable across HPC clusters
- Snakemake for text corpus processing, including using extra Singularity containers for utilities
- Ray for hyperparameter search
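As a taste of the first example, a hypothetical Snakefile fragment (not taken from the talk; rule names, paths and commands are invented for illustration) shows how a rule can declare a GPU resource so that a SLURM profile routes it to GPU nodes while CPU-only rules run elsewhere:

```snakemake
VIDEOS = ["clip1", "clip2"]  # illustrative corpus

rule all:
    input:
        expand("features/{video}.npy", video=VIDEOS)

# CPU-only step: decode frames from each video.
rule extract_frames:
    input:
        "videos/{video}.mp4"
    output:
        directory("frames/{video}")
    shell:
        "ffmpeg -i {input} {output}/%06d.png"

# GPU step: the `gpu` resource lets a cluster profile request a GPU node.
rule extract_features:
    input:
        "frames/{video}"
    output:
        "features/{video}.npy"
    resources:
        gpu=1
    shell:
        "python extract_features.py {input} {output}"
```

Because resource requests are declared per rule rather than per cluster, the same Snakefile can run on different HPC clusters or on a laptop.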
I end the talk by asking for comments. In particular, is this the right direction or a hack too far? Are there better ways to combine general-purpose container orchestrators, Singularity and SLURM? The current implementation is quite hacky, more a proof-of-concept than a finished tool. If the idea is sound, how can we stabilise and improve upon this approach?
The talk expands on some ideas I give in a blog post: https://frankie.robertson.name/research/effective-cluster-computing/#use-monolithic