<style>
.reveal {
font-size: 18px;
}
.reveal pre {
font-size: 20px;
}
.reveal section p {
text-align: left;
font-size: 18px;
vertical-align: top;
}
.reveal section figcaption {
text-align: center;
font-size: 20px;
line-height: 1.2em;
vertical-align: top;
}
.reveal section h1 {
font-size: 26px;
vertical-align: top;
}
.reveal section h2 {
font-size: 24px;
line-height: 1.2em;
vertical-align: top;
}
.reveal section h3 {
font-size: 22px;
line-height: 1.2em;
vertical-align: top;
}
.reveal ul {
display: block;
}
.reveal ol {
display: block;
}
</style>
# Get Your Brain Straight 2022
<img src="https://github.com/InsightSoftwareConsortium/GetYourBrainStraight/blob/main/HCK01_2022_Virtual/logos/banner.png?raw=true" width="50%" />
<center>
Ivan Cao-Berg<br>
Pittsburgh Supercomputing Center<br>
Carnegie Mellon University
</center>
---
## Before we begin
- :warning: Have an issue? Send an email to the Help Desk at `bil-support@psc.edu`
- :computer: Where can I find the hackathon data? The data is located in `/bil/data/hackathon/2022_GYBS/` and is also available over HTTPS [here](https://download.brainimagelibrary.org/hackathon/2022_GYBS/).
- Where do I save my output? You can save your output in `/bil/data/hackathon/2022_GYBS/output/`.
- There is no quota on your home directory; however, be mindful when using this space. The same applies to the shared space in the hackathon folder.
- Where can I find the docs? You can find the BIL Hackathon documentation [here](https://hackmd.io/@biomed-apps/B1B8mQCb5).
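Since no quota is enforced, it helps to check your own usage from time to time; `du` works on any of the VMs:

```shell
# summarize how much space your home directory is using
du -sh "$HOME"
```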
---
## Resources available during this hackathon
* The virtual machine `workshop.brainimagelibrary.org` with 2.5 TB of memory, 56 cores, and an NVIDIA RTX 8000 GPU with 4608 CUDA cores and 48 GB of memory.
* The virtual machine `workshop2.brainimagelibrary.org` with 1.5 TB of memory, 144 cores (hyperthreaded), and two [NVIDIA V100](https://www.nvidia.com/en-us/data-center/v100/) GPUs, each with 5120 CUDA cores and 32 GB of memory, coupled with NVLink.
* 8 large-memory compute nodes that can be accessed using SLURM from within the virtual machine in a partition named `compute`.
* You can connect to the `workshop` and `workshop2` VMs over `ssh` or using `X2Go`.
* You can access the large-memory compute nodes from the `workshop` VM (not `workshop2`).
---
### Connecting to the `workshop` VM using X2Go
<figure>
<img src="https://i.imgur.com/0ItmP6u.png" width="65%" />
<figcaption>Figure 1. X2Go client.</figcaption></figure>
Instructions on how to install X2Go can be found [here](https://hackmd.io/@biomed-apps/B1B8mQCb5).
---
### Connecting to the `workshop` VM using X2Go
<figure>
<img src="https://i.imgur.com/wUSiAUu.png" width="65%" />
<figcaption>Figure 2. My desktop on the workshop VM.</figcaption></figure>
Instructions on how to install X2Go can be found [here](https://hackmd.io/@biomed-apps/B1B8mQCb5).
---
### Connecting to the `workshop` VM using a Terminal
<figure>
<img src="https://i.imgur.com/0PyrOv2.png" width="65%" />
<figcaption>Figure 3. You can connect to the VMs from anywhere.</figcaption></figure>
Detailed instructions on how to connect using Terminal can be found [here](https://hackmd.io/@biomed-apps/B1B8mQCb5).
---
### Connecting to the `workshop` VM using a Terminal
<figure>
<img src="https://i.imgur.com/ouyVDxu.png" width="70%" />
<figcaption>Figure 4. You can use a Terminal client to connect as well.</figcaption></figure>
Detailed instructions on how to connect using Terminal can be found [here](https://hackmd.io/@biomed-apps/B1B8mQCb5).
---
### Connecting to the `workshop` VM using a Terminal
:::info
:bulb: Useful Tips and Tricks
* On macOS you can download and use [iTerm2](https://iterm2.com/) or the built-in [Terminal](https://support.apple.com/guide/terminal/welcome/mac).
* There are many clients for Windows. Many users prefer [PuTTY](https://www.putty.org/).
* Most, if not all, Linux distros come with a terminal client; however, you might need to install an SSH client. Follow the instructions for your distro.
:::
---
<img src="https://lmod.readthedocs.io/en/latest/_static/Lmod-4color@2x.png" width="25%" />
Environment Modules provide a convenient way to dynamically change the users' environment through modulefiles.
```bash=
module avail #to list all available modules
module load <module-name> #to load module <module-name>
module unload <module-name> #to unload module <module-name>
```
These are the options you will most likely be using. Full documentation on how to use LMOD can be found [here](https://lmod.readthedocs.io/en/latest/index.html).
---
### LMOD - Example 1 - List all available modules
```bash=
module avail
------------------------- /bil/modulefiles -------------------------
ANARI-SDK/anari-sdk gnu_parallel/20210522
ITK/5.2.1 gotop/3.3.0
ImageMagick/7.1.0 graphviz/2.44.0
ImageMagick/7.1.0-2 (D) htslib/1.9
R/3.5.1 ilastik/1.3.3
R/3.6.3 (D) imagej-fiji/1.52p
Rust/1.58.1 itksnap/3.8.0
Scala/2.13.5 java/jdk8u201
TeraStitcher/1.10.18 java/jdk8u211
VisRTX/0.2.0 java/jdk8u241 (D)
anaconda/3.2019.7 julia/1.0.5
anaconda3/4.9.2 knime/4.3.2
anaconda3/4.11.0 (D) lazygit/0.22.9
aspera/3.9.6 matlab/2019a
aws-cli/2.4.17 matlab/2021a (D)
bcftools/1.9 md5deep/4.4
bioformats/6.0.1 ncdu/1.16
bioformats/6.1.1 nextflow/21.10.6
--More--
```
---
### LMOD - Example 2 - Module load `Matlab 2021a`
```bash=
module load matlab/2021a
matlab -nodesktop -nosplash
MATLAB is selecting SOFTWARE OPENGL rendering.
< M A T L A B (R) >
Copyright 1984-2021 The MathWorks, Inc.
R2021a Update 5 (9.10.0.1739362) 64-bit (glnxa64)
August 9, 2021
To get started, type doc.
For product information, visit www.mathworks.com.
>>
```
Every user needs to request access to Matlab. To request access, click [here](https://www.psc.edu/resources/software/matlab/permission-form/).
---
### LMOD - Example 3 - Module load `Anaconda3`
```bash=
module load anaconda3
ipython
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.29.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
```
---
### LMOD
:::info
:bulb: Useful Tips and Tricks
When building scripts, add as many calls to LMOD as needed.
For example,
```bash=
#!/bin/bash
module load bioformats/6.8.0
module load bioformats2raw/0.3.0
module load raw2ometiff/0.3.0
...
```
loads Bio-Formats as well as some other Glencoe tools.
:::
---
### In a nutshell :chestnut:
- `LMOD` is used to load software in the `workshop` VM and the L-nodes.
---
<img align="left" src="https://slurm.schedmd.com/slurm_logo.png" width="15%"/>
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
```bash=
sinfo #view information about Slurm nodes and partitions
squeue #view information about jobs located in the Slurm scheduling queue
scontrol #view or modify Slurm configuration and state
sbatch #submit a batch script to Slurm
```
The commands above are the ones you will most likely use during this hackathon. For full documentation about SLURM, click [here](https://slurm.schedmd.com/documentation.html).
---
### `sinfo` - Example 1
```bash=
sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
compute* up 2-00:00:00 1 drain l008
compute* up 2-00:00:00 7 idle l[001-007]
```
As a participant in this hackathon, you should have access to the partition `compute` using the reservation `hackathon`.
---
### `squeue` - Example 1
Use `squeue -u $(whoami)` to list your jobs and their status
```bash=
squeue -u $(whoami)
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
14243 compute script.s icaoberg R 15:34
```
---
### `sbatch` - Example 1
Consider the following file named `script.sh`
```bash=
cat script.sh
#!/bin/bash
module load anaconda3
pip install --user cowsay
cowsay "Hello, World"
```
`sbatch` is used to submit jobs to the scheduler. For more info on `sbatch`, click [here](https://slurm.schedmd.com/sbatch.html).
---
### `sbatch` - Example 1 (cont.)
:::info
:bulb: Remember to use the reservation `hackathon` when submitting a job to the scheduler.
:::
```bash
sbatch -p compute -A tra220018p --reservation=hackathon script.sh
Submitted batch job 82721
```
For more info on `sbatch`, click [here](https://slurm.schedmd.com/sbatch.html).
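The same options can also live inside the script itself as `#SBATCH` directives, which keeps the submission line short. A sketch (the account and reservation match this hackathon; the resource values are illustrative):

```shell
# write a batch script; #SBATCH lines are read by Slurm and ignored by bash
cat > script.sh <<'EOF'
#!/bin/bash
#SBATCH -p compute
#SBATCH -A tra220018p
#SBATCH --reservation=hackathon
#SBATCH -n 2
#SBATCH --mem=16Gb

module load anaconda3
cowsay "Hello, World"
EOF

# the submission line is now simply:
# sbatch script.sh
```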
---
### `sbatch` - Example 1 (cont.)
If you do not specify an output filename, the scheduler will create one automatically; in this example, `slurm-82721.out`
```bash
cat slurm-82721.out
____________
| Hello, World |
============
\
\
^__^
(oo)\_______
(__)\ )\/\
||----w |
|| ||
```
For more info on `sbatch`, click [here](https://slurm.schedmd.com/sbatch.html).
---
### `sbatch` - Example 2
```bash
sbatch -p compute -N1 script.sh #number of nodes - please avoid using!
sbatch -p compute -n1 script.sh #number of cores
sbatch -p compute --mem=64Gb script.sh #memory
sbatch -p compute -N1 -n10 --mem=128Gb script.sh #combine as needed
```
For more info on `sbatch`, click [here](https://slurm.schedmd.com/sbatch.html).
---
### `scancel` - Example 1
```bash
scancel -u $(whoami) #cancel all my jobs
scancel -u <username> #cancel username's jobs
scancel 1234 #cancel job 1234
```
For more info on `scancel`, click [here](https://slurm.schedmd.com/scancel.html).
---
## In a nutshell :chestnut:
- `LMOD` is used to load software in the `workshop` VM and the L-nodes.
- `SLURM` is used to submit jobs to the scheduler managing the large-memory nodes.
---
## `interact`
The `interact` command is an in-house script for starting interactive sessions.
```bash
> interact -h
Usage: interact [OPTIONS]
-d Turn on debugging information
--debug
--noconfig Do not process config files
-gpu Allocate 1 gpu in the GPU-shared partition
--gpu
--gres=<list> Specifies a comma delimited list of generic
consumable resources. e.g.: --gres=gpu:1
--mem=<MB> Real memory required per node in MegaBytes
...
```
---
### `interact` (cont.)
:::info
:bulb: Useful Tips and Tricks
- `interact` is a wrapper built in house.
- Use `interact` and avoid using `salloc` or `srun` on BIL hardware.
- The template is
```bash
interact -A tra220018p -p compute -R hackathon -n <number-of-cores> --mem=<memory>
```
- Remember to specify the account and reservation when using `interact`
- Account: `tra220018p`
- Reservation: `hackathon`
:::
---
### In a nutshell :chestnut:
- `LMOD` is used to load software in the `workshop` VM and the L-nodes.
- `SLURM` is used to submit jobs to the scheduler managing the large-memory nodes.
- `interact` is used to start interactive sessions on the large-memory nodes.
---
<img src="https://apptainer.org/static/hero-2a27cfd36994146df5eeb86652ea2e1d.png" width="25%" />
Singularity's command-line interface, `singularity`, allows you to build and interact with containers.
- [Apptainer/Singularity](https://apptainer.org/) is the most widely used container system for HPC.
- On Brain Image Library hardware, you can only build containers remotely, not locally. For detailed instructions, click [here](https://hackmd.io/@biomed-apps/B1B8mQCb5).
- The recipe used to build containers is often referred to as a `Singularity definition file`.
- The container file on disk is often referred to as a `Singularity image file`.
---
### Let's look at a definition file
The Singularity definition file contains instructions similar to those you would follow to install the software on your local system. Generally speaking, just follow the developers' instructions.
```bash=
Bootstrap: docker
From: debian:stretch
%environment
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
%post
apt update
apt install -y libblosc1 wget unzip openjdk-8-jdk
cd /opt/
wget -nc https://github.com/glencoesoftware/bioformats2raw/releases/download/v0.3.0/bioformats2raw-0.3.0.zip
unzip bioformats2raw-0.3.0.zip && rm -f bioformats2raw-0.3.0.zip
ln -s /opt/bioformats2raw-0.3.0/bin/bioformats2raw /usr/local/bin/bioformats2raw
apt remove -y wget unzip
apt clean
%runscript
/usr/local/bin/bioformats2raw
```
If you want to see the repository with all the recipes, then click [here](https://github.com/pscedu/singularity-bioformats2raw).
---
### Let's look at a definition file (cont.)
```bash=1
Bootstrap: docker
From: debian:stretch
```
- `Bootstrap` determines the bootstrap agent that will be used to create the base operating system you want to use.
- In this example, I am pulling a [Debian](https://www.debian.org/) image from DockerHub.
- For reference, you could have used an [Ubuntu](https://hub.docker.com/_/ubuntu) image and it would have worked too
```bash=1
Bootstrap: docker
From: ubuntu:18.04
```
---
### Let's look at a definition file (cont.)
```bash=4
%environment
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
```
- The `environment` section lets the user define environment variables needed by the containerized app to run properly.
- In this example, I had to set the variable `JAVA_HOME`.
---
### Let's look at a definition file (cont.)
```bash=7
%post
apt update
apt install -y libblosc1 wget unzip openjdk-8-jdk
cd /opt/
wget -nc https://github.com/glencoesoftware/bioformats2raw/releases/download/v0.3.0/bioformats2raw-0.3.0.zip
unzip bioformats2raw-0.3.0.zip && rm -f bioformats2raw-0.3.0.zip
ln -s /opt/bioformats2raw-0.3.0/bin/bioformats2raw /usr/local/bin/bioformats2raw
apt remove -y wget unzip
apt clean
```
- The `post` section is where you can install and configure your container.
- In the example above I am putting the binary `bioformats2raw` in `/opt` and then soft-linking to `/usr/local/bin/`.
---
### Let's look at a definition file (cont.)
```bash=17
%runscript
/usr/local/bin/bioformats2raw
```
- The `runscript` section defines the binary/script to be run when the container is invoked.
- If you are familiar with Docker, this section is similar to the `ENTRYPOINT` instruction in a Dockerfile.
---
### Building the Singularity image file
For simplicity, this `Singularity` definition file is available on GitHub
```bash=
git clone git@github.com:pscedu/singularity-bioformats2raw.git
cd singularity-bioformats2raw/3.0.0/
singularity build --remote bioformats2raw.sif Singularity
```
---
### Building the Singularity image file (cont.)
Running the previous command should build the image remotely
```bash=3
singularity build --remote bioformats2raw.sif Singularity
INFO: Remote "cloud.sylabs.io" added.
INFO: Access Token Verified!
INFO: Token stored in /root/.singularity/remote.yaml
INFO: Remote "cloud.sylabs.io" now in use.
INFO: Starting build...
Getting image source signatures
Copying blob sha256:0030cc4ce25ce472fe488839def15ec8f2227bb916461b518cf534073c019a86
Copying config sha256:d8d0f98475c05ca0009ed1c2c4bad86f243ef7f80788fad7f9d6dc0c9ca58d03
Writing manifest to image destination
...
INFO: Adding labels
INFO: Creating SIF file...
INFO: Build complete: /tmp/image-2973854208
WARNING: Skipping container verification
INFO: Uploading 377360384 bytes
INFO: Build complete: bioformats2raw.sif
```
---
### Building the Singularity image file (cont.)
If successful, the command will build the image remotely and download it from Sylabs.io.
```bash
ls -lta *.sif
-rwxr-xr-x 1 icaoberg pscstaff 377360384 Apr 2 02:10 bioformats2raw.sif
```
---
### Building the Singularity image file (cont.)
To test this simple container, let's invoke the `--help` option.
```bash=1
singularity exec -B /bil/ bioformats2raw.sif bioformats2raw --help
```
---
### Building the Singularity image file (cont.)
:thumbsup: It works!
```bash=1
singularity exec -B /bil/ bioformats2raw.sif bioformats2raw --help
Missing required parameters: '<inputPath>', '<outputLocation>'
Usage: <main class> [-p] [--no-hcs] [--[no-]nested] [--no-ome-meta-export]
[--no-root-group] [--overwrite]
[--use-existing-resolutions] [--version] [--debug
[=<logLevel>]] [--extra-readers[=<extraReaders>[,
<extraReaders>...]]]... [--options[=<readerOptions>[,
<readerOptions>...]]]... [-s[=<seriesList>[,
<seriesList>...]]]...
...
-w, --tile_width=<tileWidth>
Maximum tile width to read (default: 1024)
-z, --chunk_depth=<chunkDepth>
Maximum chunk depth to read (default: 1)
-p, --progress Print progress bars during conversion
--version Print version information and exit
```
---
### :bulb: What if an image exists in DockerHub?
![](https://i.imgur.com/2kmfYfX.png)
---
### You can pull it directly from DockerHub!
```bash=
# use this command to pull images from DockerHub and convert them
# to Singularity image files
singularity pull docker://openmicroscopy/bioformats2raw:0.4.0
```
:::warning
:warning: Be careful downloading or pulling random containers from the cloud. Only do so from trusted organizations, e.g. [PSC](https://github.com/pscedu/singularity), or trusted or official collaborators/companies.
:::
---
### You can pull it directly from DockerHub! (cont.)
```bash=
# Version 0.4.0 is released!
singularity exec -B /bil bioformats2raw_0.4.0.sif /opt/bioformats2raw/bin/bioformats2raw --help
Missing required parameters: '<inputPath>', '<outputLocation>'
Usage: <main class> [-p] [--no-hcs] [--[no-]nested] [--no-ome-meta-export]
[--no-root-group] [--overwrite]
[--use-existing-resolutions] [--version] [--debug
...
```
---
### You can pull it directly from DockerHub! (cont.)
- :warning: The main difference between my image and the official image built by Glencoe Software is the location of the binary file.
- :thumbsup: That is why, when calling the binary in the Glencoe image, I needed to use the full path to `bioformats2raw`, while in mine I didn't have to.
---
### Using the Singularity image file
Consider the file `example.sh`
```bash=
#!/bin/bash
shopt -s expand_aliases
alias bioformats2raw='singularity exec -B /bil bioformats2raw.sif bioformats2raw'
FILE=/bil/data/84/c1/84c11fe5e4550ca0/SW170711-02A/SW170711-02A_4_06.tif
OUTPUT=SW170711-02A_4_06.zarr
bioformats2raw $FILE $OUTPUT --resolutions 6 --tile_width 128 --tile_height 128
```
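The alias only works inside the script because of the `shopt -s expand_aliases` line; a shell function does the same job without that subtlety. A sketch of the alternative (not the approach used above):

```shell
# a shell function forwards its arguments to the containerized binary;
# unlike an alias, it needs no shopt setting to work inside scripts
bioformats2raw() {
  singularity exec -B /bil bioformats2raw.sif bioformats2raw "$@"
}
```

The function is used exactly like the alias, e.g. `bioformats2raw "$FILE" "$OUTPUT" --resolutions 6`, and also works in subshells.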
---
### Using the Singularity image (cont.)
```bash=
# remember to use the account and reservation
sbatch -p compute -A tra220018p --reservation=hackathon -n 2 --mem=16Gb example.sh
Submitted batch job 82803
```
---
### Using the Singularity image (cont.)
Let's take a look at the status of my job
```bash=
squeue -u icaoberg
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
82803 compute example. icaoberg R 0:03 1 l005
```
---
### Using the Singularity image (cont.)
Let's look at the output
```bash=
cat slurm-82803.out
2022-04-02 06:44:10,549 [main] WARN loci.formats.Memoizer - skipping memo: directory not writeable - /bil/data/84/c1/84c11fe5e4550ca0/SW170711-02A
2022-04-02 06:44:10,836 [main] WARN loci.formats.Memoizer - skipping memo: directory not writeable - /bil/data/84/c1/84c11fe5e4550ca0/SW170711-02A
...
```
and wait...
---
### Can I use more than one image per script?
Of course! You can use as many Singularity image files as needed. Consider this example:
```bash=
# copy raw2ometiff Singularity container to current working directory
cp /bil/data/hackathon/2022_GYBS/src/PSC/icaoberg/singularity/singularity-raw2ometiff-3.0.0.sif raw2ometiff.sif
```
Now I have 2 Singularity images in my current working directory
```bash=
ls -lta *.sif
-rwxr-xr-x 1 icaoberg pscstaff 258281472 Apr 2 02:48 raw2ometiff.sif
-rwxr-xr-x 1 icaoberg pscstaff 377360384 Apr 2 02:10 bioformats2raw.sif
```
one for `raw2ometiff` and another for `bioformats2raw`.
---
### Can I use more than one image per script? (cont.)
Consider the updated file, `example.sh`
```
#!/bin/bash
shopt -s expand_aliases
alias bioformats2raw='singularity exec -B /bil bioformats2raw.sif bioformats2raw'
alias raw2ometiff='singularity exec -B /bil raw2ometiff.sif raw2ometiff'
FILE=/bil/data/84/c1/84c11fe5e4550ca0/SW170711-02A/SW170711-02A_4_06.tif
OUTPUT=SW170711-02A_4_06.zarr
bioformats2raw $FILE $OUTPUT --resolutions 6 --tile_width 128 --tile_height 128
OUTPUT_IMAGE=SW170711-02A_4_06.ome.tiff
raw2ometiff $OUTPUT $OUTPUT_IMAGE
```
---
### Can I use more than one image per script? (cont.)
You can submit the script `example.sh` using the command
```
sbatch -p compute -A tra220018p --reservation=hackathon -n 2 --mem=16Gb example.sh
```
and wait....
---
### Can I use more than one image per script? (cont.)
When the script is done, you should find an OME-TIFF on disk
```bash=
file SW170711-02A_4_06.ome.tiff
SW170711-02A_4_06.ome.tiff: Big TIFF image data, big-endian
du -h SW170711-02A_4_06.ome.tiff
723M SW170711-02A_4_06.ome.tiff
```
---
### Singularity
:::info
:bulb: Useful Tips and Tricks
- You can either create one Singularity container with
- all the tools, or
- one container per tool.
- You can build a container locally, e.g. a laptop or desktop, and then upload the image to the home directory.
:::
---
### Singularity (cont.)
:::info
:bulb: Useful Tips and Tricks
- You can find official images on DockerHub that you can use as a base to install applications.
<img align="center" src="https://i.imgur.com/OjvOzEo.png" width="75%" />
:::
---
### `circos` - Example 1
```
Bootstrap: docker
From: perl:5.32.1
%environment
export LANGUAGE=en_US.UTF-8
export LC_ALL=C
%post
export DEBIAN_FRONTEND=noninteractive
apt update && apt-get install -y locales libipc-run3-perl libgd-dev
locale-gen en_US.UTF-8
cpan install Math::Round
cpan install Font::TTF::Font
cpan install Config::General
cpan install Clone
cpan install GD::Polyline
cpan install Math::Bezier
cpan install GD
cpan install List::MoreUtils
cpan install Params::Validate
cpan install Readonly
cpan install Math::VecStat
cpan install Statistics::Basic
cpan install Set::IntSpan
cpan install Regexp::Common
cpan install SVG
cpan install Text::Format
cd /opt
wget http://circos.ca/distribution/circos-0.69-9.tgz
tar -xvf circos-0.69-9.tgz && rm -f circos-0.69-9.tgz
ln -s $(pwd)/circos-0.69-9/bin/circos /usr/local/bin/circos
```
---
### `mc` - Example 2
```
Bootstrap: docker
From: alpine:edge
%post
apk update
apk add mc
```
---
### `cwltool` - Example 3
```
Bootstrap: docker
From: debian:buster
%post
apt update
apt install -y python3 python3-pip nodejs
pip3 install cwltool==3.1.20220210171524 cwlref-runner
```
---
### `spades` - Example 4
```
Bootstrap: docker
From: quay.io/biocontainers/spades:3.15.3--h95f258a_0
%labels
MAINTAINER icaoberg
EMAIL icaoberg@psc.edu
SUPPORT help@psc.edu
REPOSITORY http://github.com/pscedu/singularity-spades
COPYRIGHT Copyright © 2021 Pittsburgh Supercomputing Center. All Rights Reserved.
VERSION 3.15.3
```
---
### `visidata` - Example 5
```
Bootstrap: docker
From: python:3.8-alpine
%environment
export TERM="xterm-256color"
%post
apk update
apk add git
pip install requests python-dateutil wcwidth tabulate
mkdir -p /opt/visidata
git clone https://github.com/saulpw/visidata.git
cd visidata
sh -c 'yes | pip install -vvv .'
rm -rfv visidata
```
---
### Need more examples?
<img src="https://sylabs.io/_nuxt/singularity-logo.24b82a06.svg" width="10%" />
To see a list of curated Singularity definition files maintained by the Pittsburgh Supercomputing Center, click [here](https://github.com/pscedu/singularity).
---
## In a nutshell :chestnut:
- `LMOD` is used to load software in the `workshop` VM and the L-nodes.
- `SLURM` is used to submit jobs to the scheduler managing the large-memory nodes.
- `interact` is used to start interactive sessions on the large-memory nodes.
- `Singularity` allows you to create and run containers that package up pieces of software in a way that is portable and reproducible.
---
## A Gentle intro to CWL
- The Common Workflow Language (CWL) is a standard for describing computational data-analysis workflows.
- Workflows are written in YAML/JSON and define
- Tools (such as what container to download and how to use them)
- Inputs
- Outputs
- If designed properly, CWL workflows are very portable.
- You can find our complete gentle intro [here](https://hackmd.io/@biomed-apps/r1-PTOUb9).
---
<img src="https://raw.githubusercontent.com/common-workflow-language/media/main/CWL-Logo-VGA.png" width="50%"/>
```bash=
interact -A tra220018p -p compute -R hackathon -n 2 --mem=16Gb
module load anaconda3
pip install --user cwltool cwlref-runner
pip install --user cowsay #needed for this exercise
```
The files needed for this exercise can be found [here](https://github.com/pscedu/singularity-cowsay).
---
## `cowsay`
```
cowsay "Hello, World\!"
_______________
< Hello, World! >
---------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
```
---
## `cowsay` (cont.)
For a basic workflow that uses `cowsay`, create the file `cowsay.cwl`
```
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
baseCommand: cowsay
inputs:
message:
type: string
inputBinding:
position: 1
outputs: []
```
CWL documents are written either in YAML or JSON. For example, we can create the input file `message.cwl`
```
message: Hello world!
```
---
## `cowsay` (cont.)
```
cwltool cowsay.cwl message.cwl
INFO /bil/packages/anaconda3/4.11.0/bin/cwltool 3.1.20220210171524
INFO Resolved 'cowsay.cwl' to 'file:///bil/users/icaoberg/code/singularity-cowsay/3.04/cowsay.cwl'
INFO [job cowsay.cwl] /tmp/l7knmpt3$ cowsay \
'Hello world!'
____________
| Hello world! |
============
\
\
^__^
(oo)\_______
(__)\ )\/\
||----w |
|| ||
INFO [job cowsay.cwl] completed success
{}
INFO Final process status is success
```
---
## `cowsay` on Docker
:::warning
This step cannot run on Brain Image Library hardware since we do not support Docker.
:::
Consider the following `Dockerfile`
```
FROM ubuntu:latest
RUN apt-get update && apt-get install -y cowsay --no-install-recommends && rm -rf /var/lib/apt/lists/*
ENV PATH $PATH:/usr/games
CMD ["cowsay"]
```
---
## `cowsay` on Docker (cont.)
The file above creates a Docker image with the `cowsay` binary. It can be built and pushed using the commands
```
docker build -t icaoberg/cowsay .
docker push icaoberg/cowsay
```
---
## `cowsay` on Docker (cont.)
This is a toy example, but a container with `cowsay` now exists under my account on DockerHub. I can recycle the CWL workflow from before and have it pull the container from DockerHub by adding the lines
```
hints:
DockerRequirement:
dockerPull: icaoberg/cowsay
```
---
## `cowsay` on Docker (cont.)
The `cowsay.cwl` file can be updated to include the previous lines
```
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
requirements:
SubworkflowFeatureRequirement: {}
DockerRequirement:
dockerPull: icaoberg/cowsay
baseCommand: cowsay
inputs:
message:
type: string
inputBinding:
position: 1
outputs: []
```
---
## `cowsay` on uDocker
:pencil: Just like with Singularity, you cannot build Docker containers on BIL hardware. However, if the Docker containers were built properly, you can use uDocker to run containerized Docker apps on BIL.
```
module load anaconda3
pip install --user uDocker
cwltool --user-space-docker-cmd=udocker --debug cowsay.cwl message.yml
```
---
## `cowsay` on uDocker (cont.)
```
******************************************************************************
* *
* STARTING b8530cb4-5887-3c41-9a73-71cf189a90f4 *
* *
******************************************************************************
executing: cowsay
______________
< Hello world! >
--------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
INFO [job cowsay2.cwl] Max memory used: 17MiB
INFO [job cowsay2.cwl] completed success
DEBUG [job cowsay2.cwl] outputs {}
DEBUG [job cowsay2.cwl] Removing input staging directory /tmp/m5ppaxkq
DEBUG [job cowsay2.cwl] Removing temporary directory /tmp/npq32ztj
DEBUG Removing intermediate output directory /tmp/u5867sg7
```
---
### `cowsay` on Singularity
The main issue is that most HPC clusters do not support Docker and prefer Singularity/Apptainer. However, if the Docker image on DockerHub has a proper entrypoint, then you can simply use the `--singularity` option to ask `cwltool` to convert the Docker image to Singularity.
:::warning
:warning: If the Docker image does not have a proper entrypoint, this step might fail, especially if you are not aware of how the image was built.
:::
Using the option
```
cwltool --singularity cowsay2.cwl message.cwl
```
will run the workflow.
---
## In a nutshell :chestnut:
- `LMOD` is used to load software in the `workshop` VM and the L-nodes.
- `SLURM` is used to submit jobs to the scheduler managing the large-memory nodes.
- `interact` is used to start interactive sessions on the large-memory nodes.
- `Singularity` allows you to create and run containers that package up pieces of software in a way that is portable and reproducible.
- `cwltool` can be used to create complex workflows for data processing.
- These workflows can pull Docker images from public registries making them very portable.
---
### How many parameters can I use?
As many as needed. You can expose as many parameters as your tool accepts and set default values as well. Consider `cowsay`; it takes other input parameters
```
cowsay(6) Games Manual cowsay(6)
NAME
cowsay/cowthink - configurable speaking/thinking cow (and a bit more)
SYNOPSIS
cowsay [-e eye_string] [-f cowfile] [-h] [-l] [-n] [-T tongue_string] [-W column] [-bdg‐
pstwy]
```
---
### How many parameters can I use? (cont.)
```
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
requirements:
SubworkflowFeatureRequirement: {}
DockerRequirement:
dockerPull: icaoberg/cowsay
baseCommand: "cowsay"
inputs:
message:
type: string
inputBinding:
position: 2
format:
type: string
inputBinding:
position: 1
prefix: -f
default: "flaming-sheep"
outputs: []
```
---
### How many parameters can I use? (cont.)
```
cat message2.yml
message: Hello world!
format: flaming-sheep
cwltool --singularity cowsay3.cwl message2.yml
______________
< Hello world! >
--------------
\ . . .
\ . . . ` ,
\ .; . : .' : : : .
\ i..`: i` i.i.,i i .
\ `,--.|i |i|ii|ii|i:
UooU\.'@@@@@@`.||'
\__/(@@@@@@@@@@)'
(@@@@@@@@)
`YY~~~~YY'
|| ||
INFO [job cowsay3.cwl] completed success
{}
INFO Final process status is success
```
---
### More meaningful example
```
cwlVersion: v1.2
class: CommandLineTool
requirements:
SubworkflowFeatureRequirement: {}
DockerRequirement:
dockerPull: icaoberg/bioformats2raw:0.4.0
dockerOutputDirectory: /opt/bioformats2raw
inputs:
inputImage:
type: File
inputBinding:
position: 1
outputDirectory:
type: Directory
inputBinding:
position: 2
default: zarr
resolutions:
type: int
inputBinding:
position: 3
prefix: --resolutions
default: 6
tile_width:
type: int
inputBinding:
position: 4
prefix: --tile_width
default: 128
tile_height:
type: int
inputBinding:
position: 5
prefix: --tile_height
default: 128
outputs:
zarr_image:
type: Directory
outputBinding:
glob: $(inputs.outputDirectory)
baseCommand: ['bioformats2raw']
stdout: bioformats2raw.out
```
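To run a tool like this, you also supply a job file with the inputs. A sketch (the path comes from the earlier `example.sh`, the filenames are illustrative, and inputs with defaults can be omitted):

```shell
# write a CWL job file; inputs not listed here fall back to their defaults
cat > bioformats2raw-job.yml <<'EOF'
inputImage:
  class: File
  path: /bil/data/84/c1/84c11fe5e4550ca0/SW170711-02A/SW170711-02A_4_06.tif
resolutions: 6
EOF

# then run the tool with, e.g.:
# cwltool --singularity bioformats2raw.cwl bioformats2raw-job.yml
```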
---
### What is the plan?
- The plan is for the teams to create Singularity definition file(s) or Dockerfile(s) that other people can use to build and recreate pipelines, with the potential of sharing Singularity containers on BIL and Docker images in public registries.
- Design, create, and share a workflow using CWL so that your pipeline can be easily reproduced on BIL.