# Nextflow on HPC - best practices
> Applicable for HPC clusters in Sweden

=== Document in progress ===

## Overview

Depending on the particular HPC environment setup and workflow design, there are two common scenarios for starting a `nextflow` pipeline run:

- **on the login node**, in a `screen`/`tmux` session or with `nextflow run -bg ...`, using the [SLURM executor](https://www.nextflow.io/docs/latest/executor.html#slurm)
- **on a reserved compute node**, with or without the [SLURM executor](https://www.nextflow.io/docs/latest/executor.html#slurm)
  - **interactively**, starting the workflow in `screen`/`tmux` or with `nextflow run -bg ...`
  - **non-interactively**, starting the workflow with an sbatch script

## Running pipelines - general considerations

- **Keep your `.bashrc` clean.** Avoid automatic conda activation - it could interfere with the software modules provided by the center and cause unexpected problems.
- **Store all your research data in the allocated project folder.** Home folders cannot accommodate large amounts of data and are meant to store only common program configuration files and caches.
- **Make sure you have enough space.** Pipelines need space for the data, the containers, and the calculations themselves. Intermediate files from previous successful or failed runs can easily "pile up".
  https://deploy-preview-3644--nf-core-docs.netlify.app/docs/tutorials/storage_utilization/managing_work_directory_growth

## Running `nf-core` pipelines

Recent pipelines from the nf-core project are fully set up to run efficiently on [known HPC environments](https://nf-co.re/configs/). The configurations are maintained continuously by teams working closely with the users on these clusters, although some documentation might be outdated.

- On the login node, start `screen` or `tmux`.
- Load the necessary software modules - specific for each cluster:
  - `module load PDC nextflow apptainer` - for Dardel
  - `module load Nextflow/latest` - for Rackham/Bianca
  - `module load Nextflow` - for Pelle
- Follow the instructions to start the pipeline. Make sure to provide the institutional profile and the project. The institutional profile sets the paths to available databases, the container engine, the scratch location, time limits, cluster partitions, etc.

  ```bash
  nextflow run nf-core/sarek \
      -profile uppmax \
      --project naiss-XXXX-XX \
      -params-file workflow_parameters.yml \
      --input samplesheet.csv \
      --outdir <OUTDIR>
  ```

  - `-profile`
    - Dardel@PDC: `-profile pdc_kth`
    - Rackham, Pelle, Bianca, Maya@UPPMAX: `-profile uppmax`
  - `-params-file` - store user configurations in a file instead, [example](https://hackmd.io/@pmitev/nextflow-HPC-best-practices).

- **If the computer center does not allow you to run long jobs on the login node**, or the pipeline is **not designed to use a shared resource scheduler** (*like SLURM*), you **NEED** to start the pipeline on a reserved compute node.
- **Non-interactive job example** - please adapt the parameters. Small tasks will be performed locally (*on the reserved node*), so the main process needs some RAM and CPU resources.

  ```slurm
  #!/bin/bash -l
  #SBATCH -A naiss-XXXX-XX
  #SBATCH -p shared
  #SBATCH -t 1-00:00
  #SBATCH --mem=16GB
  #SBATCH -c 4
  ...

  module load PDC nextflow apptainer
  cd /project/folder
  nextflow run nf-core/sarek -profile pdc_kth --project naiss-XXXX-XX ...
  ...
  ```

- Alternatively, run the last 3 lines in an **interactive session** within `screen` or `tmux` to ensure your pipeline keeps running even if your connection to the computer is interrupted.

### Pros/Cons to consider

- Starting Nextflow on a **login node** allows the complete workflow to run longer than the maximum reservation time, which then applies only to the individual tasks. In contrast, when the pipeline is started on a compute node, the whole workflow is limited by the duration of the reservation.
- Starting Nextflow on a **compute node** requires reserving resources (CPU, RAM, and time) for the main process. These resources are allocated and accounted for the whole duration of the workflow run, and are used rather inefficiently.
- Keep in mind that some computer centers redirect logins to **different login nodes** to distribute user sessions, which might cause confusion about where your pipeline is running.
- On Bianca/Maya, the project login node is **automatically decommissioned** when not in use, which kills any processes running on it - you need to start the workflow on a reserved compute node/cores!

## Running other Nextflow pipelines

Some Nextflow pipelines are not adapted to run in an HPC environment and are often designed to run on a single computer without a workload scheduler like SLURM. The easiest way to run such pipelines is to run them, interactively or non-interactively, on a reserved compute node that satisfies the largest requirements with respect to CPU/GPU and RAM, even though this is not an efficient way to use the reserved resources - see the sketch below.
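A minimal, non-interactive sketch of such a run is shown below. The partition name, the placeholder pipeline name and the `singularity` profile are assumptions - adapt the reservation, the modules and the options so that the most demanding step of the pipeline fits on the node.

```slurm
#!/bin/bash -l
# Sketch only: run a pipeline that has no SLURM support entirely on one reserved node.
# All tasks use Nextflow's default "local" executor, so the node must be large enough
# for the heaviest step. Adapt account, partition, time, modules and pipeline options.
#SBATCH -A naiss-XXXX-XX
#SBATCH -p main            # whole-node partition; the name differs between centers
#SBATCH -N 1
#SBATCH -t 1-00:00

module load PDC nextflow apptainer   # Dardel; load the corresponding modules on your cluster
cd /project/folder

# "-profile singularity" assumes the pipeline defines such a profile for its containers;
# use whatever software/container mechanism the pipeline documents instead.
nextflow run <pipeline> \
    -profile singularity \
    -params-file workflow_parameters.yml \
    --outdir <OUTDIR>
```

Since no institutional profile and no SLURM executor are involved, the whole run is limited by the reservation time, and cores left idle during lighter steps are simply wasted - the trade-off for not adapting the pipeline.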
Alternatively, you might want to [adapt the pipeline to run with SLURM](https://nf-co.re/docs/tutorials/external_usage/nf-core_configs_outside_nf-core). This is not always easy, since many pipelines aim to utilize commercial cloud resources instead.

## Relevant documentation

- [Getting started](https://nf-co.re/docs/usage/getting_started/introduction)
- [`nf-core` documentation](https://hackmd.io/@pmitev/nfcore_pipelines_Bianca)
- [Nextflow, nf-core etc. on Dardel](/_ZReAhaHRve1TumdS1CUcA)
- UPPMAX related
  - [nextflow on Rackham and Bianca](/HjMsa37-QZm4-fA3t88o_g)
  - [`nf-core` pipelines on Bianca/Maya](https://hackmd.io/@pmitev/nfcore_pipelines_Bianca)
- [Running offline](https://nf-co.re/docs/usage/getting_started/offline)

## Troubleshooting

- https://nf-co.re/docs/usage/troubleshooting/overview

## Advanced

- [Adapting "non nf-core" pipelines to run with nf-core configs](https://nf-co.re/docs/tutorials/external_usage/nf-core_configs_outside_nf-core)
- [De Novo Assembly Project Template](https://github.com/NBISweden/assembly-project-template)

## Contacts

- [Pavlin Mitev](https://katalog.uu.se/profile/?id=N3-1425)
- [UPPMAX](https://www.uu.se/en/centre/uppmax)
- [AE@UPPMAX - related documentation](/8sqXISVRRquPDSw9o1DizQ)