# CBS SLURM server guide

This guide introduces how to use the CBS SLURM cluster for computational research. The cluster uses the Simple Linux Utility for Resource Management (SLURM), which allows multiple users to **share computing resources fairly and efficiently**. Work is submitted to SLURM in the form of **batch jobs** or **interactive sessions**, which are automatically scheduled and distributed to the compute nodes without interfering with other users.

The cluster includes **10 high-performance compute nodes** with varying CPU, memory, storage, and GPU configurations. These nodes are designed to support a wide range of compute-intensive and data-heavy research applications.

The CBS SLURM cluster consists of two partitions, **`hx`** and **`vx`**, which provide access to different node types optimized for high-performance and GPU-accelerated workloads:

| Node Type      | Nodes | CPU Model             | CPU Cores | RAM     | Local tmp storage | GPUs per Node   | Partition |
| -------------- | ----- | --------------------- | --------- | ------- | ----------------- | --------------- | --------- |
| L40 GPU Nodes  | 7     | Intel Xeon Gold 6448Y | 128       | 500+ GB | 7 TB              | 2 × NVIDIA L40  | hx        |
| A100 GPU Nodes | 3     | Intel Xeon Gold 6142  | 64        | 300 GB  | 1 TB              | 2 × NVIDIA A100 | vx        |

> [!Note]
> Only `vx` nodes and the login node have internet access.

This guide explains how to:

- Connect to the cluster and understand SLURM.
- Request CPU, GPU, and memory resources.
- Store and manage data across `/nfs/<pi_last_name>`, `/localscratch`, and `/nfs/scratch`.
- Submit batch and interactive jobs.
- Use modules to manage software.

If you are already familiar with SLURM servers, you can go directly to the [SLURM CBS server Notes and Best Practices](#SLURM-CBS-server-Notes-and-Best-Practices) section.

## 1. What is SLURM and a Job Scheduler?

- **SLURM** (Simple Linux Utility for Resource Management) is a **job scheduler**.
- A job scheduler manages **when, where, and how jobs run** on a shared computing cluster.
- Instead of running programs directly, you **submit jobs to SLURM**, which allocates resources (CPU, memory, GPU, time) and executes your workload on the compute nodes.

This ensures **fairness**, **efficiency**, and **stability** in a multi-user research environment.

## 2. Connecting to the Cluster

To log in to the **login node**:

```bash
ssh your_uwo_username@rri-cbs-slurm.fmd.uwo.pri
```

> [!Note]
> An automatic setup script runs every hour to configure new users. If this is your first time connecting to the SLURM server, please:
> - Connect and disconnect once.
> - Wait one hour.
> - Then reconnect.

> [!Caution]
> **NEVER** run computations directly on the **login node**.
> The login node is only for **submitting jobs, file management, and light setup**. All computational work should be submitted via SLURM.

> [!Important]
> - Direct ssh access is only possible through the CBS VDI servers or when you are connected to the UWO network.
> - For off-campus access, connect through the CBS VDI servers.

> [!Tip]
> You can use `-X` to enable **X11 forwarding** if you require GUI applications in interactive sessions (see [section 9](#9.-Interactive-Jobs)).

---

## 3. Submitting a Job

Jobs are submitted using:

```bash
sbatch my_job_script.sh
```

Here, `my_job_script.sh` is a shell script that describes:

1. The resources you need (CPUs, memory, GPUs, runtime).
2. The commands to run.

See [section 7](#7.-Job-Script-Template) for a full template of a job script.
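As a minimal sketch of such a script (the resource values here are placeholders; the full template in section 7 shows the recommended `/localscratch` workflow):

```bash
#!/bin/bash
#SBATCH --time=01:00:00       # maximum runtime (required for all jobs)
#SBATCH --cpus-per-task=2     # CPU cores for the job
#SBATCH --mem=8G              # memory for the job

# Replace with your actual commands
echo "Running on $(hostname)"
```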
> [!TIP] Best practice
> Jobs have a maximum runtime of **48 hours**. If your workflow is longer, design your code with checkpoints so you can restart from intermediate results.

## 4. SLURM Flags for Resource Requests

Add these to your job script as `#SBATCH` directives:

| Flag | Purpose | Example | Requirement |
| ----------------- | ------------------------------------------ | ------------------------------- | -------------------- |
| `--time`          | maximum runtime (required for all jobs)    | `#SBATCH --time=12:00:00`       | must                 |
| `--cpus-per-task` | number of CPU cores per task               | `#SBATCH --cpus-per-task=4`     | must                 |
| `--mem`           | memory required per node                   | `#SBATCH --mem=16G`             | must (if applicable) |
| `--gpus-per-node` | number of GPUs required                    | `#SBATCH --gpus-per-node=1`     | must (if GPU job)    |
| `--partition`     | select compute partition (`hx` or `vx`)    | `#SBATCH --partition=hx`        | should               |
| `--ntasks`        | number of tasks (useful for parallel jobs) | `#SBATCH --ntasks=1`            | optional             |
| `--job-name`      | job name for tracking                      | `#SBATCH --job-name=myjob`      | optional             |
| `--output`        | save standard output log                   | `#SBATCH --output=slurm-%j.out` | optional             |
| `--error`         | separate stderr log file                   | `#SBATCH --error=slurm-%j.err`  | optional             |

See the [Cluster partitions](#Cluster-partitions) section for guidance on when to use `hx` vs `vx`.

## 5. Managing Software with **Modules**

The cluster uses **modules** (Lmod) to manage software environments.

- See available software:
```bash
module avail
```
- Load a module:
```bash
module load freesurfer/7.4.1
```
- Check loaded modules:
```bash
module list
```
- Remove a module:
```bash
module unload freesurfer/7.4.1
```

Load modules in your script or interactive session to get access to the software (see the example in [section 7](#7.-Job-Script-Template)). For reproducibility, always specify exact module versions (e.g., `freesurfer/7.4.1`) and avoid relying on defaults.

## 6. Data Storage: `/nfs/<pi_last_name>`, `/nfs/scratch`, and `/localscratch`

> [!Caution] Home folder
> - Do not save data in your home folder.
> - The home folder is meant for scripts or small Python environments.
> - Data and large Python environments should be saved in a persistent lab share.

### 🔒 Lab Shares: `/nfs/<pi_last_name>`

- Labs can request a **dedicated fileshare** for storing and processing project data:
```
/nfs/<pi_last_name>
```
- These shares are provisioned on the **2025 OneFS fileserver** and are accessible from all **compute nodes**.
- **Note:** Older lab shares mounted via `/cifs/...` from the legacy **2018 fileserver** are **not accessible from the compute nodes**. All compute jobs should use the new `/nfs/<pi_last_name>` shares or `/nfs/scratch`.

> 💡 For details on CBS server storage, see the [CBS Server Storage Document](https://hackmd.io/@CompCore/cbs_storage).

### 📂 Shared Temporary Space: `/nfs/scratch`

- `/nfs/scratch` is a **shared, high-capacity (25 TB) space** accessible to all users on the compute nodes.
- Use it to **stage input/output data** before and after job runs, especially if your lab doesn't yet have a `/nfs/<pi_last_name>` share.

> [!Caution] Scratch Policy
> - Files older than **30 days** are **automatically deleted**.
> - This storage is **not backed up** and is intended **only for temporary use**.
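A possible staging pattern (a sketch only; the per-user subfolder under `/nfs/scratch` and the `/nfs/lab` path are placeholders, not documented conventions):

```bash
# Create a personal staging area on shared scratch
mkdir -p /nfs/scratch/$USER/myproject

# Stage input data from the lab share before submitting jobs
rsync -av /nfs/lab/myproject/input/ /nfs/scratch/$USER/myproject/
```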
### ⚡ Fast Local Storage: `/localscratch` on Compute Nodes

- Each compute node has a fast local SSD (**7 TB** on `hx` nodes, **1 TB** on `vx` nodes; see the table at the top of this guide).
- When your job starts, a **personal `/localscratch` folder** is automatically created **just for that job**.
- This folder is:
    - **Private to your job**
    - **Faster than network storage** (`/nfs/...`)
    - **Deleted at job completion/termination** (all contents are wiped)

> [!Tip] Best practice
> Copy your data to `/localscratch`, run your job there, and move outputs back to `/nfs/<pi_last_name>` or `/nfs/scratch` **before the job finishes**.

Note: `/tmp` is mapped to `/localscratch` on compute nodes.

### 💬 Requesting Storage

- Contact [support-cbs-server@uwo.ca](mailto:support-cbs-server@uwo.ca) to request a **new lab share** on `/nfs/<pi_last_name>`.
- For billing and quota details, refer to the [CBS Servers Document](https://hackmd.io/@CompCore/cbs_servers) or email [support-cbs-server@uwo.ca](mailto:support-cbs-server@uwo.ca).

## 7. Job Script Template

📋 **Example Workflow**

A typical script might:

1. **Copy data** from `/nfs/<pi_last_name>` or `/nfs/scratch` to `/localscratch`.
2. **Process data** in `/localscratch` and write logs locally.
3. **Checkpoint** outputs periodically (e.g., after each step in a pipeline) to `/nfs/scratch` (optional but recommended).
4. **Copy final results** from `/localscratch` to `/nfs/<pi_last_name>` or `/nfs/scratch` before the job exits.

Example: `my_job_script.sh`

```bash
#!/bin/bash
#SBATCH --job-name=myanalysis
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --output=slurm-%j.out
# Optional: select a partition (hx or vx). Default is 'all'
# #SBATCH --partition=hx

# Load software
module load software/1.0

# Copy input data to local scratch (/localscratch)
rsync -av /nfs/lab/myproject/data/input.csv /localscratch

# Run computation
software_command /localscratch/input.csv > /localscratch/results.txt

# Copy results back to lab share
rsync -av /localscratch/results.txt /nfs/lab/myproject
```

Submit it with:

```bash
sbatch my_job_script.sh
```

## 8. Monitoring and Managing Jobs

- List your jobs:
```bash
squeue -u <your_uwo_username>
```
- Cancel a job:
```bash
scancel <jobID>
```
- Show job history:
```bash
sacct -u <your_uwo_username>
```
- Detailed info on a running or completed job:
```bash
scontrol show job <jobID>
```
- List the cluster's nodes and their state (up or down):
```bash
sinfo
```

## 9. Interactive Jobs

Sometimes you need an **interactive session** (e.g., for debugging or setup).

> [!Caution]
> Interactive sessions are limited to **5 hours**.

- Request an interactive session:
```bash
salloc --time=05:00:00 --cpus-per-task=2 --mem=4G
```
- With GPUs:
```bash
salloc --time=05:00:00 --cpus-per-task=2 --mem=4G --gpus-per-node=1
```
- Selecting a partition (default is 'all'):
```bash
salloc --partition=vx --time=05:00:00 --cpus-per-task=2 --mem=4G
```

Once allocated, you will be placed inside a shell on a compute node where you can run commands interactively.

### GUI (X forwarding)

If you need a GUI or graphical windows, start your interactive session with X forwarding enabled (using the `--x11` flag). For example:

```bash
salloc --time=05:00:00 --cpus-per-task=2 --mem=8G --x11
```

Make sure you connect to the SLURM cluster with X forwarding enabled:

```bash
ssh -X your_uwo_username@rri-cbs-slurm.fmd.uwo.pri
```

# SLURM CBS server Notes and Best Practices

### Cluster partitions

The cluster provides two partitions (`hx` and `vx`) that correspond to different hardware environments. If no partition is specified, SLURM will assign the job to the default 'all' partition, which includes both `hx` and `vx`. This may result in different hardware being used across runs.
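To check which nodes and resources each partition provides, you can query SLURM directly (the format string below selects the partition, node list, CPUs, memory, and generic-resources columns):

```bash
sinfo -p hx,vx -o "%P %N %c %m %G"
```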
If hardware consistency is relevant for your job, explicitly setting `--partition` is strongly recommended to ensure reproducibility and correct hardware selection. The table at the beginning of this document provides the partition hardware details, but the key differences are:

#### `hx` partition:
- Newer **Intel Xeon Gold 6448Y** CPUs
- NVIDIA L40 GPUs
- More CPU cores and RAM per node
- Larger local SSD storage for `/localscratch`
- **No internet access**
- Best for large-scale CPU + GPU workloads and data-heavy pipelines

#### `vx` partition:
- **Intel Xeon Gold 6142** CPUs
- NVIDIA A100 GPUs
- **Has internet access**
- Best for:
    - Environment setup (pip/conda installs, container pulls)
    - GPU workloads optimized for A100
    - Workflows requiring external downloads

### Internet access and initial setup

Compute jobs should **not require internet access**. Only the **`vx` partition nodes** and the login node have internet connectivity. It is good practice to prepare your working environment beforehand (e.g., downloading datasets, building containers, or creating Python environments) so that jobs can run without external dependencies.

If a setup step requires both **internet access and compute resources**, it should be run in an interactive SLURM session on the **`vx` partition**, rather than on the login node, which is a small virtual machine not intended for workloads. You can request such an interactive session using `salloc`, explicitly selecting the `vx` partition:

```bash
salloc --time=05:00:00 --cpus-per-task=2 --mem=8G --partition=vx
```

This ensures you are placed on a node with:

* internet access (required for installations and downloads)
* appropriate compute resources for setup tasks

### Avoid running heavy jobs on the login node

The login node is intended for editing files, compiling code, and submitting jobs. Do not run computationally intensive workloads directly on the login node. Always submit jobs through SLURM.

### Use the `/localscratch` folder for computation

Reading and writing large amounts of data directly to the NFS filesystem during job execution can generate heavy network traffic and degrade performance for other users. Instead, **perform computations locally on the compute node using the `/localscratch` directory**, as described in the *Data Storage* section. Local disk access is typically much faster than NFS.

A common workflow is to copy input data to `/localscratch` at the start of the job, run all computations there, and copy the results back to `/nfs/<pi_last_name>` or `/nfs/scratch` before the job finishes. The `/localscratch` directory is automatically cleared after the job ends, so ensure all required outputs are copied before termination.

### Request realistic resources

Request only the resources required by your job. Over-requesting CPUs or memory can increase queue times and reduce cluster efficiency.

After a job finishes, you can check how many resources it actually used:

```bash
sacct -j <jobID> --format=JobID,Elapsed,MaxRSS,AllocCPUS
```

- Elapsed → total runtime
- MaxRSS → peak memory usage
- AllocCPUS → CPUs requested
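For jobs that are still running, `sstat` can report live usage of the batch step (a sketch; which fields are populated depends on the cluster's accounting configuration):

```bash
sstat -j <jobID>.batch --format=JobID,MaxRSS,AveCPU
```

Here MaxRSS is the peak memory used so far and AveCPU the average CPU time consumed per task, which together give an early signal of whether your request was realistic.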
### GUI and graphical interfaces

Although SLURM supports GUI applications through **X forwarding**, we recommend using the **CBS VDI** for visualization tasks or graphical applications. The VDI environment is generally more stable and provides a better user experience for GUI workflows.

### Containers

The CBS SLURM cluster uses **Apptainer** (formerly **Singularity**) for containerized workflows. Apptainer is available through the module system.

Commonly used containers are stored in:

```
/srv/containers
```

You are encouraged to reuse these images when possible to avoid unnecessary downloads. If a container is outdated or a commonly used tool is missing, please contact the administrators so it can be added for the community.

### Python

For recommended Python practices on the cluster, see the **Python guide**: [https://hackmd.io/@CompCore/python_cbs](https://hackmd.io/@CompCore/python_cbs) *(work in progress)*

### kslurm (Simplified SLURM Wrapper)

**kslurm** is a wrapper that simplifies SLURM commands. It provides two main commands:

* **kbatch** — submit batch jobs (non-interactive)
* **krun** — request interactive sessions

These commands use a simplified argument syntax, allowing you to request resources without writing a full SLURM script or specifying multiple `--option` flags.

Examples:

Schedule a **12-hour job** with **16 cores** and **24 GB of memory**:

```bash
kbatch 12:00 16 24G recon-all <recon-all-args>
```

Request an **interactive session** with **4 cores**, **3 hours**, **15 GB of memory**, and **a GPU**:

```bash
krun 4 3:00 15G gpu
```

`kslurm` will be available as a module *(TODO)*. See the full documentation for details: [https://kslurm.readthedocs.io/en/latest/](https://kslurm.readthedocs.io/en/latest/)

# General information

## Billing Rates

The CBS SLURM server is only available to Power users. Note that CBS SLURM only supports the OneFS storage datashares (`/nfs`). For further details and fees, please check the [CBS servers](https://hackmd.io/@CompCore/cbs_servers) wiki entry.

## Additional resources

For detailed explanations and further reading, we highly recommend the Digital Research Alliance of Canada documentation on SLURM:

- https://docs.alliancecan.ca/wiki/What_is_a_scheduler
- https://docs.alliancecan.ca/wiki/Running_jobs
- https://docs.alliancecan.ca/wiki/Using_GPUs_with_Slurm
- https://docs.alliancecan.ca/wiki/MATLAB

# Need help?

📧 Contact: [support-cbs-server@uwo.ca](mailto:support-cbs-server@uwo.ca)