Getting started with HPC (Hive) :bee:
===

```notes:
```

<div style="
  width: 100%;
  padding: 4rem 2rem;
  background: #f9fafb;
  text-align: center;
  font-family: 'Segoe UI', Arial, sans-serif;
  border-bottom: 4px solid #2563eb;
">
  <h1 style="
    margin: 0;
    font-size: 4rem;
    font-weight: 800;
    color: #111827;
    line-height: 1.1;
  ">
    Dennis Lab Meeting
  </h1>
  <p style="
    margin: 1rem 0 0;
    font-size: 2rem;
    font-weight: 500;
    color: #374151;
  ">
    Thu, Aug 14, 2025
  </p>
  <p style="
    margin: 0.5rem 0;
    font-size: 1.6rem;
    color: #4b5563;
  ">
    <strong>By:</strong> Mohamed Abuelanin
  </p>
  <p style="
    margin: 0.5rem 0;
    font-size: 1.4rem;
  ">
    <a href="mailto:mahussien@ucdavis.edu" style="
      color: #2563eb;
      text-decoration: none;
      border-bottom: 2px solid rgba(37, 99, 235, 0.3);
    "
    onmouseover="this.style.borderBottomColor='rgba(37,99,235,1)';"
    onmouseout="this.style.borderBottomColor='rgba(37,99,235,0.3)';">
      mahussien@ucdavis.edu
    </a>
  </p>
</div>

# 1. What is HPC?

## 1.1 Organization overview

![image](https://hackmd.io/_uploads/r1lYZUKuxx.png)

## 1.2 How does storage actually work? (Quobyte)

- **NFS, parallel I/O**

![image](https://hackmd.io/_uploads/H1h1U9tule.png)

## 1.3 Slurm

![image](https://hackmd.io/_uploads/B1yc2cFOgl.png)

## 1.4 SSH

![image](https://hackmd.io/_uploads/HklUljK_lg.png)

## 1.5 HIPPO

![image](https://hackmd.io/_uploads/B1CZJUtuee.png)

<hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;">

# 2. Understanding the Shell Environment

## 2.1 What is a Shell? 🐚

A **shell** is a program that takes your commands and gives them to the operating system to perform. It's the primary way you interact with a command-line interface.

**Analogy:** Think of the shell as a translator between you and the computer's core (the kernel).

* **You type:** `ls -l` (a command to list files).
* **The Shell interprets** this and asks the kernel for the file list.
* **The Kernel provides** the list, and the shell displays it for you.

The most common shell is **bash**.

```mermaid
graph TD
    subgraph "How a Command Works"
        User -- "Types 'ls -l'" --> Shell;
        Shell -- "Asks for file list" --> Kernel;
        Kernel -- "Returns file list" --> Shell;
        Shell -- "Displays output" --> User;
    end
```

## 2.2 What are Shell Configuration Files?

When a shell starts, it runs special scripts to set up your environment. These are called configuration files (or "dotfiles" because their names start with a dot, like `.bashrc`).

You use them to customize your command-line world:

* **Aliases:** Create shortcuts for long commands (e.g., `alias ll='ls -alF'`).
* **Environment Variables:** Set important variables like `$PATH`, which tells the shell where to find programs.

## 2.3 Login vs. Interactive Shells

To understand which file gets used, you need to know the difference between two types of shells:

1. **Login Shell:** This is the *first* shell session that starts after you authenticate.
   * **Example:** When you log in to a server with **SSH**.
2. **Interactive Non-Login Shell:** This is any *new* shell you start from an existing session.
   * **Example:** Opening a new terminal window on your desktop, or typing `bash` inside your SSH session.

This distinction is the key to why we have different configuration files.

```mermaid
graph TD
    subgraph "Which Shell Am I?"
        A(User Action) --> B{Is this the initial login?};
        B -- "Yes (e.g., SSH)" --> C[Starts a LOGIN shell];
        B -- "No (e.g., new terminal tab)" --> D[Starts an INTERACTIVE NON-LOGIN shell];
    end
```

## 2.4 `.bash_profile` vs.
`.bashrc` Bash uses two main files for interactive shells: ### `~/.bash_profile` * **When it runs:** **Only once**, when you start a **login shell** (e.g., `ssh`). * **Purpose:** For things that only need to be set up once per session. The most important is the `PATH` environment variable. ### `~/.bashrc` * **When it runs:** For **every new interactive non-login shell**. * **Purpose:** For things you want in every terminal, like aliases and prompt settings. **The Common Trick:** To make sure your aliases are available everywhere, most `.bash_profile` files are set up to automatically run `.bashrc`. ```bash # Inside ~/.bash_profile if [ -f ~/.bashrc ]; then source ~/.bashrc fi ``` ## 2.5 The HPC Login Process Here’s what happens step-by-step when you connect to an HPC cluster using Bash. ```mermaid %%{init: { "theme": "base", "themeVariables": { "lineColor": "#26A69A" }}}%% flowchart TB A["You — local terminal"]:::you --> B["ssh your_user\@hive.hpc.ucdavis.edu"]:::action B --> C["Login node<br>hive.hpc.ucdavis.edu"]:::login C --> D{"Authenticate<br>(key or password)"}:::decision D -->|ok| E["Bash starts<br>login shell"]:::bash E --> F["Run ~/.bash_profile"]:::file F --> G["Source ~/.bashrc<br>(from .bash_profile)"]:::file G --> H["Prompt ready"]:::done %% Alternative path (non-login shell) subgraph ALT["Non-login shell"] I["srun --pty bash<br>(or new terminal tab)"]:::action --> J["Bash starts<br>non-login shell"]:::bash J --> K["Run ~/.bashrc only"]:::file end K --> H %% Styles classDef you fill:#D1F2EB,stroke:#48C9B0,color:#0E6251,stroke-width:2px; classDef login fill:#E3F2FD,stroke:#42A5F5,color:#0D47A1,stroke-width:2px; classDef action fill:#FFF3E0,stroke:#FB8C00,color:#7D6608,stroke-width:2px; classDef decision fill:#FDEDEC,stroke:#E57373,color:#B71C1C,stroke-width:2px; classDef bash fill:#E8EAF6,stroke:#5C6BC0,color:#283593,stroke-width:2px; classDef file fill:#F3E5F5,stroke:#AB47BC,color:#4A148C,stroke-width:2px; classDef done fill:#E8F5E9,stroke:#66BB6A,color:#1B5E20,stroke-width:2px; ``` 1) **Connect** ```bash ssh your_user@hive.hpc.ucdavis.edu ``` 2) **Authenticate** on the **login node**. 3) **Login shell starts** (Bash is a *login* shell). 4) **`~/.bash_profile` runs.** 5) **`~/.bashrc` is sourced** (your `~/.bash_profile` should call it). 6) **Prompt ready** — you’re in a configured environment. ---- ## 2.6 Use Case: Relocating Your Home Environment **Problem:** Our default home directory (for example, my home directory is `/home/mabuelanin`) is on a small 20GB drive. Programs like Conda, VS Code Server, and package managers install lots of data there, causing it to fill up. You need these programs to use your group's large 400TB drive (`/quobyte/mydennisgrp/mabuelanin`) as their "home". **Solution:** Change the `$HOME` environment variable. This tells all programs where your home is. This requires careful setup to avoid breaking your login process. --- ### Step 1: Prepare Your New Home :::info **IMPORTANT – DO THIS ONCE:** Before changing your `.bash_profile`, you must copy your existing configuration files to the new location. This ensures your shell and other programs can find them after `$HOME` is changed. ::: Run this command directly in your terminal **once**: ```bash # Copies all dotfiles (like .bashrc, .vimrc, etc.) to your new home. # The -a flag preserves permissions and timestamps. 
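# (Note: the .[a-zA-Z0-9]* glob below matches entries whose names start with a dot
#  followed by a letter or digit, e.g. .bashrc or .conda; because -a implies a
#  recursive copy, directories such as .config come along with their contents.)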
cp -a /home/mabuelanin/.[a-zA-Z0-9]* /quobyte/mydennisgrp/mabuelanin/ ``` --- ### Step 2: Modify `.bash_profile` Open your **original** profile at `/home/mabuelanin/.bash_profile` and add/change the following lines: ```bash # Relocate HOME to the large project directory export HOME=/quobyte/mydennisgrp/mabuelanin # Source the .bashrc from the NEW home directory # "$HOME/.bashrc" now points to the new location if [ -f "$HOME/.bashrc" ]; then source "$HOME/.bashrc" fi # Change directory to the new home cd "$HOME" ``` --- ### Result - When you log in, your shell will set `$HOME` to the new, larger directory. - The `~` shortcut will now point to `/quobyte/mydennisgrp/mabuelanin`. - Programs like Conda and VS Code will install their files and caches in this new location, solving your storage problem. - You will start your session inside the new home directory. <hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;"> # 3. Slurm Overview Slurm is the job scheduler used on the cluster. You interact with it to submit jobs, request resources, and monitor their progress. Jobs can be run **interactively** (directly from the terminal) or as **batch jobs** (submitted scripts). --- ## 3.1 Interactive Jobs Interactive jobs are useful for debugging, testing small workloads, or running interactive programs like Jupyter Notebook. Example: ```bash srun --account=mydennisgrp --partition=high \ --cpus-per-task=4 --mem=16G --time=01:00:00 \ --pty bash ``` **Key points:** - `--pty bash` starts an interactive shell on the allocated node. - Release the node when done by typing `exit`. --- ## 3.2 Batch Jobs with `sbatch` Batch jobs run in the background without an active terminal connection. They are ideal for long or unattended jobs. **Example template:** ```bash #!/bin/bash #SBATCH --job-name=myjob #SBATCH --account=mydennisgrp #SBATCH --partition=high #SBATCH --cpus-per-task=4 #SBATCH --mem=16G #SBATCH --time=02:00:00 #SBATCH --output=logs/%x_%j.out # STDOUT #SBATCH --error=logs/%x_%j.err # STDERR #SBATCH --mail-type=END,FAIL # Email notifications #SBATCH --mail-user=youremail@ucdavis.edu # Optional: create logs directory if it doesn't exist mkdir -p logs # Load Conda properly inside Slurm # Auto-detect the base conda path for the current user CONDA_BASE=$(conda info --base) . "$CONDA_BASE/etc/profile.d/conda.sh" # Activate a specific environment (change 'myenv' as needed) conda activate myenv # Your command(s) here python myscript.py --input data.txt ``` **How to submit:** ```bash sbatch myjob.sh ``` --- ## 3.3 Background Terminals with `screen` Sometimes you need to keep a session alive after disconnecting (e.g., to monitor logs or run an interactive session). `screen` is a terminal multiplexer that lets you detach and reattach to sessions. **Basic commands:** ```bash # Start a new screen session screen -S mysession # Detach from the session (keep it running) Ctrl + A, then D # List existing sessions screen -ls # Reattach to a session screen -r mysession # Kill a session screen -X -S mysession quit ``` <hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;"> # 4. Different Types of Accounts and Partitions Below is an overview of the accounts and partitions available for **our lab**. 
```mermaid flowchart TD %% ========= NODES ========= A["Accounts"] --> B["mydennisgrp"] A --> C["publicgrp"] A --> D["genome-center-grp"] %% mydennisgrp (Dedicated) B --> B_high["high"] B_high --> B_cpu["128 CPUs<br>1000 GB RAM<br>Dedicated Tecate node"] %% publicgrp (Shared) C --> C_high["high"] C --> C_low["low"] %% (UPDATED) publicgrp: no GPU block under 'high' C_high --> C_cpu["128 CPUs<br>2000 GB RAM"] %% publicgrp: low still shows CPU + GPU (scavenged resources) C_low --> C_cpu_low["Unused cluster CPUs & RAM<br>(3 days max / 1 node)"] C_low --> C_gpu_low["Unused cluster GPUs"] %% genome-center-grp (Shared) D --> D_high["high"] D --> D_gpu_a100["gpu-a100"] D_high --> D_cpu["616 CPUs<br>9856 GB RAM"] D_gpu_a100 --> D_gpu_a100_cpu["128 CPUs<br>2000 GB RAM"] D_gpu_a100 --> D_gpu_a100_gpu["8 GPUs (A100)"] %% ========= STYLES ========= %% Accounts classDef acctPrivate fill:#E8F5E9,stroke:#66BB6A,stroke-width:2px,color:#1B5E20; classDef acctShared fill:#E3F2FD,stroke:#42A5F5,stroke-width:2px,color:#0D47A1; %% Partitions classDef part fill:#FFF3E0,stroke:#FB8C00,stroke-width:1.5px,color:#E65100; %% Resource nodes classDef cpu fill:#F3E5F5,stroke:#AB47BC,stroke-width:1.5px,color:#4A148C; classDef gpu fill:#FCE4EC,stroke:#EC407A,stroke-width:1.5px,color:#880E4F; %% Apply classes class B acctPrivate; class C,D acctShared; class B_high,C_high,C_low,D_high,D_gpu_a100 part; class B_cpu,C_cpu,C_cpu_low,D_cpu,D_gpu_a100_cpu cpu; class C_gpu_low,D_gpu_a100_gpu gpu; ``` ## Example `srun` Commands The following are interactive job examples for each **account → partition → resource** combination. Each example includes the **maximum per-job limits** allowed by Slurm for that combination. Default runtime here is short (`--time=00:05:00`) for quick testing. --- ### mydennisgrp → high <div style="display:flex; gap:8px; margin:6px 0;"> <div style="background-color:#D1F2EB; color:#0E6251; padding:8px 10px; border:1px solid #48C9B0; border-radius:8px; flex:1;"> <b>mydennisgrp (Dedicated)</b> </div> <div style="background-color:#FCF3CF; color:#7D6608; padding:8px 10px; border:1px solid #F7DC6F; border-radius:8px; flex:1;"> <b>Partition: high</b> </div> <div style="background-color:#E8DAEF; color:#4A235A; padding:8px 10px; border:1px solid #BB8FCE; border-radius:8px; flex:2;"> 128 CPUs · 1000 GB RAM · Dedicated Tecate node </div> </div> **Max per job:** 128 CPUs · 1000 GB RAM · No GPU ```bash srun --account=mydennisgrp --partition=high \ --cpus-per-task=4 --mem=16G --time=00:05:00 \ --pty bash ``` --- ### publicgrp → high <div style="display:flex; gap:8px; margin:6px 0;"> <div style="background-color:#D6EAF8; color:#154360; padding:8px 10px; border:1px solid #5DADE2; border-radius:8px; flex:1;"> <b>publicgrp (Shared)</b> </div> <div style="background-color:#FCF3CF; color:#7D6608; padding:8px 10px; border:1px solid #F7DC6F; border-radius:8px; flex:1;"> <b>Partition: high</b> </div> <div style="background-color:#E8DAEF; color:#4A235A; padding:8px 10px; border:1px solid #BB8FCE; border-radius:8px; flex:2;"> 128 CPUs · 2000 GB RAM </div> </div> **Max per job:** 8 CPUs · 128 GB RAM · No GPU ```bash srun --account=publicgrp --partition=high \ --cpus-per-task=8 --mem=128G --time=00:05:00 \ --pty bash ``` --- ### publicgrp → low <div style="display:flex; gap:8px; margin:6px 0;"> <div style="background-color:#D6EAF8; color:#154360; padding:8px 10px; border:1px solid #5DADE2; border-radius:8px; flex:1;"> <b>publicgrp (Shared)</b> </div> <div style="background-color:#FDEDEC; color:#641E16; padding:8px 10px; border:1px 
solid #F1948A; border-radius:8px; flex:1;"> <b>Partition: low</b> </div> <div style="background-color:#E8DAEF; color:#4A235A; padding:8px 10px; border:1px solid #BB8FCE; border-radius:8px; flex:2;"> Scavenged CPUs · Runtime max: 3 days · 1 node </div> </div> **Max per job:** 1 node · 3 days runtime · No guaranteed resources ```bash srun --account=publicgrp --partition=low \ --cpus-per-task=4 --mem=16G --time=01:00:00 \ --pty bash ``` --- ### genome-center-grp → high <div style="display:flex; gap:8px; margin:6px 0;"> <div style="background-color:#D6EAF8; color:#154360; padding:8px 10px; border:1px solid #5DADE2; border-radius:8px; flex:1;"> <b>genome-center-grp (Shared)</b> </div> <div style="background-color:#FCF3CF; color:#7D6608; padding:8px 10px; border:1px solid #F7DC6F; border-radius:8px; flex:1;"> <b>Partition: high</b> </div> <div style="background-color:#E8DAEF; color:#4A235A; padding:8px 10px; border:1px solid #BB8FCE; border-radius:8px; flex:2;"> 616 CPUs · 9856 GB RAM </div> </div> **Max per job:** (Check with `sacctmgr show qos format=name,grace,maxtres`) ```bash srun --account=genome-center-grp --partition=high \ --cpus-per-task=16 --mem=256G --time=00:30:00 \ --pty bash ``` --- ## GPU-Capable Partitions :::danger **Warning:** Running on a GPU partition **without** `--gres=gpu:<type>:<count>` will still use a GPU node but leave the GPUs idle, wasting resources. ::: --- ### genome-center-grp → gpu-a100 <div style="display:flex; gap:8px; margin:6px 0;"> <div style="background-color:#D6EAF8; color:#154360; padding:8px 10px; border:1px solid #5DADE2; border-radius:8px; flex:1;"> <b>genome-center-grp (Shared)</b> </div> <div style="background-color:#EBDEF0; color:#512E5F; padding:8px 10px; border:1px solid #AF7AC5; border-radius:8px; flex:1;"> <b>Partition: gpu-a100</b> </div> <div style="background-color:#FDEDEC; color:#641E16; padding:8px 10px; border:1px solid #F1948A; border-radius:8px; flex:2;"> 8 GPUs (A100) </div> </div> **CPU-only example:** ```bash srun --account=genome-center-grp --partition=gpu-a100 \ --cpus-per-task=4 --mem=16G --time=00:30:00 \ --pty bash ``` **GPU example:** ```bash srun --account=genome-center-grp --partition=gpu-a100 \ --gres=gpu:a100:1 \ --cpus-per-task=4 --mem=16G --time=00:30:00 \ --pty bash ``` <hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;"> # 5. OnDemand (web access) https://ondemand.hive.hpc.ucdavis.edu/ ## What OnDemand Provides - File browser and editor - Interactive applications (Jupyter, RStudio, desktop environments) - Job submission and monitoring interfaces - No SSH knowledge required ![image](https://hackmd.io/_uploads/rkSIs9cOxx.png) <hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;"> # 6. Preinstalled HPC Modules Most software on the cluster is provided through the **Environment Modules** system (`module` command). These modules let you load, unload, and switch between different versions of installed software without changing your shell configuration. --- **Listing available modules** ```bash module avail ``` This will print all available modules. You can narrow the search: ```bash module avail python module avail gcc ``` **Loading a module** ```bash module load python/3.11.4 ``` Once loaded, the software is available in your `$PATH`. 
To verify: ```bash which python python --version ``` **Unloading a module** ```bash module unload python/3.11.4 ``` **Switching versions** ```bash module swap python/3.9.16 python/3.11.4 ``` **Viewing loaded modules** ```bash module list ``` --- :::info **Tip:** Some core tools (e.g., `gcc`, `openmpi`, `cuda`) are only available after loading the corresponding module. If a command is “not found,” try searching with `module avail`. ::: <hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;"> # 7. Utilizing the Scratch Directory Heavy I/O (lots of small writes / temporary files) can overwhelm shared filesystems and slow everyone down. Use each compute node’s **local NVMe scratch** for temporary data during jobs, then copy results back to shared storage when done. --- **Why scratch?** - **NVMe local disks** → very high IOPS & throughput (perfect for many small temporary files or random access patterns). - **Reduces contention** on shared storage (your `$HOME`, group drives). - **Ephemeral**: you clean up at the end; nothing is backed up. --- ## 7.1 Architecture at a glance ```mermaid flowchart LR %% --- Nodes --- L["Login Node<br>Interactive prep"]:::login subgraph CA["Compute Node A"] SA["Local NVMe Scratch<br>/scratch/$USER or $TMPDIR"]:::scratch end subgraph CB["Compute Node B"] SB["Local NVMe Scratch<br>/scratch/$USER or $TMPDIR"]:::scratch end H["Shared Storage<br>$HOME / group shares"]:::shared %% --- Edges (minimal, non-overlapping) --- L --> CA L --> CB L --> H CA --> H CB --> H %% --- Styles --- classDef login fill:#D6EAF8,stroke:#5DADE2,stroke-width:2px,color:#154360; classDef shared fill:#E8EAF6,stroke:#5C6BC0,stroke-width:2px,color:#283593; classDef scratch fill:#FDEDEC,stroke:#E57373,stroke-width:2px,color:#B71C1C; ``` **Explanation** - Use **shared storage** to **stage inputs** and **save final outputs**. - During the job, write temp files to the **node-local NVMe scratch** (fast, isolated). - Each node has its **own** scratch; data on `Compute Node A` isn’t visible on `Compute Node B`. 
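If you want to see that isolation in practice, the short sketch below uses two interactive sessions. It assumes the `/scratch/$USER` convention shown above, that you can create that directory on the node, and that the second `srun` lands on a different node than the first:

```bash
# Session 1: allocate a node, note its name, and leave a marker file in local scratch
srun --account=mydennisgrp --partition=high --time=00:05:00 --pty bash
hostname                            # e.g., hive-dc-7-9-48
mkdir -p /scratch/$USER && touch /scratch/$USER/marker_from_A
exit

# Session 2: a second allocation on a *different* node cannot see that file
srun --account=mydennisgrp --partition=high --time=00:05:00 --pty bash
hostname                            # a different node name
ls /scratch/$USER/marker_from_A     # "No such file or directory": scratch is per node
exit
```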
--- ## 7.2 Practical usage patterns **Check what scratch env var you have** ```bash echo "TMPDIR=${TMPDIR:-unset}" echo "NODE scratch exists?"; ls -ld /scratch/$USER 2>/dev/null || echo "no /scratch/$USER" ``` **Create a per-job scratch directory and auto-cleanup** ```bash # Inside your job script or interactive srun session SCRATCH_BASE="${TMPDIR:-/scratch/$USER}" JOB_SCRATCH="$(mktemp -d "${SCRATCH_BASE}/job_${SLURM_JOB_ID:-manual}_XXXX")" echo "Using scratch: $JOB_SCRATCH" # Cleanup on exit (even if job fails) cleanup() { rm -rf "$JOB_SCRATCH"; } trap cleanup EXIT ``` **Stage inputs → run → stage outputs** ```bash # 1) Copy inputs from shared storage to scratch cp /path/to/shared/input.dat "$JOB_SCRATCH/" # 2) Run the workload from scratch cd "$JOB_SCRATCH" mytool --in input.dat --out result.bin # 3) Copy outputs back to shared storage mkdir -p /path/to/shared/results/${SLURM_JOB_ID:-manual} rsync -a result.bin /path/to/shared/results/${SLURM_JOB_ID:-manual}/ ``` **Monitor space / speed** ```bash df -h "$JOB_SCRATCH" ``` --- ## 7.3 Slurm examples that favor scratch **SBATCH template using node-local NVMe scratch** ```bash #!/bin/bash #SBATCH --job-name=scratch_demo #SBATCH --account=mydennisgrp #SBATCH --partition=high #SBATCH --cpus-per-task=8 #SBATCH --mem=32G #SBATCH --time=02:00:00 #SBATCH --output=logs/%x_%j.out #SBATCH --error=logs/%x_%j.err # Setup job scratch SCRATCH_BASE="${TMPDIR:-/scratch/$USER}" JOB_SCRATCH="$(mktemp -d "${SCRATCH_BASE}/job_${SLURM_JOB_ID}_XXXX")" cleanup(){ rm -rf "$JOB_SCRATCH"; } trap cleanup EXIT echo "Job scratch: $JOB_SCRATCH" cp /shared/inputs/data.tar "$JOB_SCRATCH/" cd "$JOB_SCRATCH" tar xf data.tar # Run your I/O-heavy workload here my_pipeline.sh > pipeline.log 2>&1 # Save results back to shared storage mkdir -p /shared/outputs/${SLURM_JOB_ID} rsync -a . /shared/outputs/${SLURM_JOB_ID}/ --include='*.out' --include='*.log' --exclude='*' # Cleanup happens via trap ``` --- ## 7.4 Best practices - **Write temp files to scratch**, not `$HOME` or group shares. - **One job → one scratch dir**; clean it up with `trap` to avoid leftovers. - **Copy back only what you need** (logs, final artifacts). - **Assume scratch is not backed up** and may be purged after the job or after inactivity. - **Scratch is per node**: multi-node jobs should **not** assume a single shared scratch unless the site provides one (e.g., parallel scratch/burst buffer). ## 7.5. Symbolic linking (symlink, or soft link) A symbolic link (symlink) is like a shortcut — it lets you access a file or folder from another location without copying it. This is handy on the HPC when you want to point software to a bigger storage location or organize files without moving them. **Example:** ```bash ln -s /quobyte/mydennisgrp/datasets/bigdata ~/bigdata ``` Now `~/bigdata` will open the data stored on the large drive. :::info Always use **absolute paths** (`/something/...`) when making symlinks. Relative paths can break if you run commands from a different directory. ::: To see where a symlink points: ```bash ls -l ~/bigdata ``` To remove a symlink (this won’t delete the real data): ```bash rm ~/bigdata ``` <hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;"> # 8 Miniconda (Miniforge) https://conda-forge.org/ Install a lightweight Conda (from **conda-forge**) in your home directory, hook it into Bash, then use **mamba** for fast solves. Keep `base` clean. 
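Because section 2.6 may have relocated `$HOME` to the group share, it is worth a quick check before installing, so that the Miniforge tree and its package caches land on the large drive rather than the small 20 GB `/home`. A minimal sanity check (the expected path assumes the relocation described earlier):

```bash
# Confirm where the installer (and later the package caches) will land
echo "$HOME"    # expected: /quobyte/mydennisgrp/<your_user> if you relocated HOME
df -h "$HOME"   # shows the filesystem backing $HOME and how much space is free
```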
---

## 8.1 Install Miniforge

```bash
# Download the right installer for your OS/arch
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"

# Make it executable and install (batch mode to $HOME/miniforge3)
chmod +x Miniforge3-$(uname)-$(uname -m).sh
./Miniforge3-$(uname)-$(uname -m).sh -b
```

**Add Conda to Bash (`~/.bashrc`)**

Let Conda set up the shell hook for you:

```bash
$HOME/miniforge3/bin/conda init bash
# reload your shell so changes take effect
source ~/.bashrc
```

**Verify**

```bash
conda --version
which conda
```

## 8.2 Create and use environments

```bash
# create a fresh env with Python 3.11
conda create -n py311 python=3.11 -y

# activate / deactivate
conda activate py311
conda deactivate
```

:::danger
**Best practice:** keep **base** clean — **don’t install project packages** into `base`. Create project-specific envs (`conda create -n <name> ...`) and install packages there.
:::

---

**Use `mamba` for speed**

```bash
# install mamba into base (tooling-only is OK;
# recent Miniforge releases already ship mamba, so this may be a no-op)
conda install -n base -c conda-forge mamba -y

# faster env & package operations
mamba create -n py311 python=3.11 -y
conda activate py311
mamba install numpy pandas -y
```

**Using Conda/Mamba inside Slurm (`sbatch`)**

```bash
#!/bin/bash
#SBATCH --job-name=conda_demo
#SBATCH --account=mydennisgrp
#SBATCH --partition=high
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --time=00:30:00
#SBATCH --output=logs/%x_%j.out
#SBATCH --error=logs/%x_%j.err

# Load Conda correctly in non-interactive shells
. "$HOME/miniforge3/etc/profile.d/conda.sh"
# or: CONDA_BASE=$(conda info --base); . "$CONDA_BASE/etc/profile.d/conda.sh"

conda activate py311
python -c "import sys,platform; print(sys.executable, platform.python_version())"
```

---

:::danger
**Don’t mix Miniforge with `module load conda`.** If your `~/.bashrc` sources Miniforge, it will **shadow** the site Conda module. In that shell, **do not** run `module load conda/anaconda`. Use **one** Conda per session.
:::

## 8.3 Conda channels (add **bioconda** by default)

**What are channels?** Conda installs packages from *channels* (package repositories).

- **conda-forge**: community-maintained, huge catalog, up-to-date.
- **bioconda**: bioinformatics packages (built on top of conda-forge).

**Order matters**, and using **strict priority** avoids mixing builds.

**Configure via commands (keeps existing settings)**

```bash
# Make sure conda-forge is present (Miniforge already has it)
conda config --add channels conda-forge

# Append bioconda after conda-forge (--append keeps conda-forge at the top)
conda config --append channels bioconda

# Prefer higher-priority channels strictly
conda config --set channel_priority strict
```

**Example:**

```bash
mamba create -n bio python fastqc samtools bedtools bwa bowtie2 -y
conda activate bio

# after creation, we can install more tools
mamba install bioconda::bcftools
```

:::info
With this setup you no longer need `-c bioconda -c conda-forge` on each install—the defaults cover it. Keep **base** clean; install tools into named envs.
:::

<hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;">

# 9 Port Forwarding

Port forwarding lets you open services (JupyterLab, VS Code Server, custom web apps) running on a **compute node** in your local browser. On Hive, connections **must go through the login node**: `hive.hpc.ucdavis.edu`.
## 9.1 How it works

```mermaid
flowchart LR
    A["Local Machine<br>Browser / VS Code"]:::local -- "ssh -L tunnel" --> B["Login Node<br>hive.hpc.ucdavis.edu"]:::login
    B -- "ssh" --> C["Compute Node<br>hive-dc-7-9-48"]:::compute

classDef local fill:#d0e8ff,stroke:#4a90e2,stroke-width:2px,color:#0f3a66
classDef login fill:#ffe7c2,stroke:#e69b1c,stroke-width:2px,color:#7a4b00
classDef compute fill:#d5f5e3,stroke:#25a25a,stroke-width:2px,color:#0e5a2e
```

Flow: your browser → `localhost:LOCAL_PORT` → SSH tunnel → **login node** → **compute node:REMOTE_PORT**.

---

## 9.2 Quick template (recommended)

> Replace the vars, then run the single SSH command **from your laptop**.

```bash
# --- set these on your LOCAL machine ---
LOCAL_PORT=8888              # any unused port on your laptop (e.g., 8888, 9000, 9999)
REMOTE_NODE=hive-dc-7-9-48   # copy verbatim from the compute-node prompt (see below)
REMOTE_PORT=3000             # the port your app uses on the compute node (e.g., 3000, 8888)

ssh -N -L ${LOCAL_PORT}:${REMOTE_NODE}:${REMOTE_PORT} mhussien@hive.hpc.ucdavis.edu
```

- Open your browser at: `http://localhost:${LOCAL_PORT}`
- The `-N` flag keeps the tunnel open without starting a remote shell.
- To run it in the background, add `-f` (optional): `ssh -f -N -L ...`

**How to get `REMOTE_NODE`:**
- When you start an interactive job (e.g., with `srun --pty bash`), your prompt usually shows the node.
- Or run `hostname` on the compute node; use that string as `REMOTE_NODE`.

---

## 9.3 Example: Simple HTTP file browser on a compute node (Python only)

**1) Start an interactive job (on Hive)**

```bash
srun --account=mydennisgrp --partition=high \
     --cpus-per-task=4 --mem=16G --time=02:00:00 \
     --pty bash
```

You’ll land on a node like `hive-dc-7-9-48`.

**2) Launch a file-listing HTTP server on that node (Python only)**

```bash
module load python   # only load Python
cd ~
python -m http.server 8000 --bind 0.0.0.0
```

Leave this running; it serves a **directory listing** on port `8000`.

**3) From your laptop, forward the port through the login node**

```bash
LOCAL_PORT=8000
REMOTE_NODE=hive-dc-7-9-48
REMOTE_PORT=8000

ssh -N -L ${LOCAL_PORT}:${REMOTE_NODE}:${REMOTE_PORT} username@hive.hpc.ucdavis.edu
```

Open in your browser:

```
http://localhost:8000
```

:::info
- Change `LOCAL_PORT` if 8000 is busy on your laptop (e.g., 9000).
- The server must listen on `0.0.0.0` (as above) so the **login node** can reach it through the tunnel; with `--bind 127.0.0.1` the tunnel in step 3 would fail, because the login node connects to the compute node over the network. Since `0.0.0.0` also makes the listing visible to other users on the cluster network, stop the server when you are done.
:::

---

## 9.4 Example: JBrowse use case (CHM13 + BAM on a compute node)

---

**1) Start an interactive job (on Hive)**

```bash
srun --account=mydennisgrp --partition=high \
     --cpus-per-task=4 --mem=16G --time=02:00:00 \
     --pty bash

# Note the node name (e.g., hive-dc-7-9-48)
```

**2) Create a Conda env and install JBrowse CLI**

```bash
mamba create -n jbrowse nodejs -y
conda activate jbrowse

# install JBrowse CLI globally inside this env
npm install -g @jbrowse/cli
```

**3) Download JBrowse Web and register the assembly + track**

```bash
# Create a working directory and fetch the JBrowse Web build
JBROOT=$HOME/jbrowse_web2
jbrowse create $JBROOT
cd $JBROOT

# Add CHM13 assembly (FASTA + index in the same directory)
jbrowse add-assembly \
  /quobyte/mydennisgrp/mabuelanin/ref/chm13v2.0.fa \
  --alias chm13 \
  --name "T2T-CHM13v2.0" \
  --load symlink \
  --out .
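# (The add-assembly step above assumes the FASTA index, chm13v2.0.fa.fai, sits next
#  to the FASTA, as noted in the comment; if it is missing, create it first,
#  e.g. with: samtools faidx /quobyte/mydennisgrp/mabuelanin/ref/chm13v2.0.fa)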
# Add a BAM track (expects .bai next to the .bam) jbrowse add-track \ "/quobyte/mydennisgrp/mabuelanin/projects/simon/2025-simon-autism/all_simon_bams/STOPPED_ADDING_chm13_SD_bams/SS0013029_proband_11144.sorted.bam" \ --name "SS0013029 proband (BAM)" \ --trackId ss0013029_bam \ --assemblyNames T2T-CHM13v2.0 \ --load symlink \ --out . # Serve the downloaded web app from the current directory npx serve -S . # leave this running ``` **4) From your laptop, forward a local port through the Hive login node** ```bash LOCAL_PORT=9000 REMOTE_NODE=hive-dc-7-9-48 REMOTE_PORT=3000 ssh -N -L ${LOCAL_PORT}:${REMOTE_NODE}:${REMOTE_PORT} YOUR_USERNAME@hive.hpc.ucdavis.edu ``` Open in your browser: ``` http://localhost:9000 ``` <hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;"> # 10. Containers: Apptainer and DeepVariant Apptainer (formerly Singularity) is preinstalled on Hive for running containerized applications without root access. The following example runs DeepVariant for variant calling. --- ## Example: DeepVariant with CHM13 Reference Create `run_deepvariant.sbatch`: ```bash #!/bin/bash #SBATCH --job-name=deepvariant_chr13 #SBATCH --account=mydennisgrp #SBATCH --partition=high #SBATCH --nodes=1 #SBATCH --ntasks=1 #SBATCH --cpus-per-task=32 #SBATCH --mem=100G #SBATCH --time=00-10:00:00 # Set up paths BASE_DIR="/quobyte/mydennisgrp/mabuelanin" REF="${BASE_DIR}/ref/chm13v2.0.fa" BAM="${BASE_DIR}/projects/simon/2025-simon-autism/all_simon_bams/STOPPED_ADDING_chm13_SD_bams/SS0013029_proband_11144.sorted.bam" PROBAND_ID="SS0013029_proband_11144" SIF_PATH="/quobyte/mydennisgrp/mabuelanin/apps/sifs/deepvariant_1.9.0.sif" # Create output directory in your scratch space SCRATCH_DIR="/scratch/${USER}/deepvariant_chr13_${SLURM_JOB_ID}" mkdir -p "${SCRATCH_DIR}" # Create tmp directory in scratch space mkdir -p "${SCRATCH_DIR}/tmp" # Define the target region REGION="chr13:56356433-56364371" echo "Working directory: ${SCRATCH_DIR}" echo "Reference: ${REF}" echo "Analyzing region: ${REGION}" echo "Starting DeepVariant analysis..." # Run DeepVariant on specific region (high-performance) apptainer run \ --bind "${BASE_DIR}:/mnt" \ --bind "${SCRATCH_DIR}:/output" \ --env TMPDIR="/output/tmp" \ "${SIF_PATH}" \ /opt/deepvariant/bin/run_deepvariant \ --model_type=WGS \ --ref="/mnt/ref/chm13v2.0.fa" \ --reads="/mnt/projects/simon/2025-simon-autism/all_simon_bams/STOPPED_ADDING_chm13_SD_bams/${PROBAND_ID}.sorted.bam" \ --regions="${REGION}" \ --sample_name="${PROBAND_ID}" \ --output_vcf="/output/${PROBAND_ID}_chr13.vcf.gz" \ --output_gvcf="/output/${PROBAND_ID}_chr13.g.vcf.gz" \ --intermediate_results_dir="/output/intermediate" \ --num_shards=8 echo "DeepVariant completed. Results in: ${SCRATCH_DIR}" ls -lh "${SCRATCH_DIR}" # Show any variants found echo "Variants found in region:" zcat "${SCRATCH_DIR}/${PROBAND_ID}_chr13.vcf.gz" | grep -v "^#" | head -10 # Create permanent output directory and copy results PERMANENT_DIR="/quobyte/mydennisgrp/mabuelanin/tmp" mkdir -p "${PERMANENT_DIR}" echo "Copying results to: ${PERMANENT_DIR}" rsync -av "${SCRATCH_DIR}/" "${PERMANENT_DIR}/" # Verify copy was successful echo "Results copied successfully:" ls -lh "${PERMANENT_DIR}" # Clean up scratch directory echo "Cleaning up scratch directory: ${SCRATCH_DIR}" rm -rf "${SCRATCH_DIR}" echo "Job completed! 
Results available at: ${PERMANENT_DIR}"
```

### Submit the job

```bash
sbatch run_deepvariant.sbatch
```

### Check job status

```bash
# Check job completion first
squeue -u $USER
```

<hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;">

# 11. Snakemake with SLURM Integration

## 11.1 Environment Setup

```bash
# Create conda environment for Snakemake
mamba create -n snakemake snakemake -y
conda activate snakemake
pip install snakemake-executor-plugin-slurm
```

## 11.2 Multi-Sample DeepVariant Workflow

Create `Snakefile`:

```python
# Configuration for multiple samples
SAMPLES = [
    "SSC00003_proband_11006",
    "SSC00004_father_11006"
]

BASE_DIR = "/quobyte/mydennisgrp/mabuelanin"
REFERENCE = "/quobyte/mydennisgrp/mabuelanin/ref/chm13v2.0.fa"
BAM_DIR = "/quobyte/mydennisgrp/mabuelanin/projects/simon/2025-simon-autism/all_simon_bams/STOPPED_ADDING_chm13_SD_bams"
SIF_PATH = "/quobyte/mydennisgrp/mabuelanin/apps/sifs/deepvariant_1.9.0.sif"
REGION = "chr13:56356433-56364371"
OUTPUT_DIR = "/quobyte/mydennisgrp/mabuelanin/tmp"
USER = "mhussien"

rule all:
    input:
        expand(OUTPUT_DIR + "/{sample}_chr13.vcf.gz", sample=SAMPLES)

rule deepvariant:
    input:
        ref = REFERENCE,
        bam = BAM_DIR + "/{sample}.sorted.bam"
    output:
        vcf = OUTPUT_DIR + "/{sample}_chr13.vcf.gz",
        gvcf = OUTPUT_DIR + "/{sample}_chr13.g.vcf.gz"
    log:
        OUTPUT_DIR + "/logs/{sample}_deepvariant.log"
    resources:
        mem_mb = 100000,
        runtime = 600,  # 10 hours in minutes
        cpus_per_task = 32,  # the slurm executor plugin reads cpus_per_task (a plain "cpus" resource is not passed to Slurm)
        slurm_account = "mydennisgrp",
        slurm_partition = "high",
        slurm_job_name = "deepvariant_{sample}"
    shell:
        """
        set -euo pipefail  # Exit on any error

        # Create scratch directory
        SCRATCH_DIR="/scratch/{USER}/deepvariant_{wildcards.sample}_${{SLURM_JOB_ID}}"
        mkdir -p "${{SCRATCH_DIR}}"
        mkdir -p "${{SCRATCH_DIR}}/tmp"

        echo "Processing sample: {wildcards.sample}"
        echo "Working directory: ${{SCRATCH_DIR}}"
        echo "Analyzing region: {REGION}"
        echo "Starting DeepVariant analysis..."

        # Run DeepVariant
        apptainer run \
            --bind "{BASE_DIR}:/mnt" \
            --bind "${{SCRATCH_DIR}}:/output" \
            --env TMPDIR="/output/tmp" \
            "{SIF_PATH}" \
            /opt/deepvariant/bin/run_deepvariant \
            --model_type=WGS \
            --ref="/mnt/ref/chm13v2.0.fa" \
            --reads="/mnt/projects/simon/2025-simon-autism/all_simon_bams/STOPPED_ADDING_chm13_SD_bams/{wildcards.sample}.sorted.bam" \
            --regions="{REGION}" \
            --sample_name="{wildcards.sample}" \
            --output_vcf="/output/{wildcards.sample}_chr13.vcf.gz" \
            --output_gvcf="/output/{wildcards.sample}_chr13.g.vcf.gz" \
            --intermediate_results_dir="/output/intermediate" \
            --num_shards=8

        echo "DeepVariant completed for {wildcards.sample}.
Results in: ${{SCRATCH_DIR}}" # Check if output files exist before proceeding if [[ -f "${{SCRATCH_DIR}}/{wildcards.sample}_chr13.vcf.gz" ]]; then echo "VCF file created successfully" # Show any variants found (with error handling) echo "Variants found in region for {wildcards.sample}:" zcat "${{SCRATCH_DIR}}/{wildcards.sample}_chr13.vcf.gz" | grep -v "^#" | head -5 || echo "No variants found or error reading file" # Copy results to permanent storage mkdir -p "{OUTPUT_DIR}" echo "Copying results for {wildcards.sample} to: {OUTPUT_DIR}" rsync -av "${{SCRATCH_DIR}}/" "{OUTPUT_DIR}/" # Verify copy was successful if [[ -f "{OUTPUT_DIR}/{wildcards.sample}_chr13.vcf.gz" ]]; then echo "Results copied successfully for {wildcards.sample}:" ls -lh "{OUTPUT_DIR}/{wildcards.sample}_chr13"* else echo "ERROR: Copy failed for {wildcards.sample}" exit 1 fi else echo "ERROR: DeepVariant failed to create output files" exit 1 fi # Clean up (with error handling) echo "Cleaning up scratch directory: ${{SCRATCH_DIR}}" if [[ -d "${{SCRATCH_DIR}}" ]]; then rm -rf "${{SCRATCH_DIR}}" || echo "Warning: Could not fully clean scratch directory" fi echo "Job completed successfully for {wildcards.sample}! Results available at: {OUTPUT_DIR}" """ ``` ## Execution Commands ```bash # Activate environment conda activate snakemake # Dry run to preview all jobs snakemake -n # Run all samples in parallel (max 2 jobs simultaneously) snakemake --executor slurm --jobs 2 ``` <hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;"> # 12. Monitoring Disk Usage with Duc > *Dude, where are my bytes?* > [Official Documentation](https://duc.zevv.nl/) **Duc** is a blazing-fast disk usage analyzer that works by first **indexing** the directories you want to inspect, and then letting you **browse** the index interactively. It’s an excellent way to explore large file systems without repeatedly scanning everything. ![Duc interface example](https://hackmd.io/_uploads/SylbraK3gx.png) --- ## Installation You can install Duc easily via **conda**: ```bash conda create -n duc duc -y ``` Then activate the environment as usual: ```bash conda activate duc ``` --- ## Indexing Large Directories When dealing with very large directories, it’s often useful to exclude unnecessary paths — such as hidden files, Conda environments, and working directories from Snakemake or Nextflow. 
Here’s an example of how to create an index efficiently:

```bash
duc index -p \
    --exclude=".*" \
    --exclude=miniconda \
    --exclude="*conda*" \
    --exclude="from-lssc0/programs" \
    --exclude=".nextflow" \
    --exclude=".snakemake" \
    --max-depth=10 \
    /quobyte/mydennisgrp/from-lssc0/users \
    /quobyte/mydennisgrp/from-lssc0/projects
```

> *The index is stored in your local `~/.cache/duc/` directory.*

To view any indexed directory, run, for example:

```bash
duc ui /quobyte/mydennisgrp/from-lssc0/projects
```

---

## Sharing the Index with Other Users

If you want others to be able to view the existing index **without installing Duc** or re-indexing the data, simply link the shared database file to their local cache:

```bash
mkdir -p $HOME/.cache/duc/
ln -s /quobyte/mydennisgrp/from-lssc0/duc.db $HOME/.cache/duc/
```

Then they can launch the Duc UI directly:

```bash
# View users directory
/quobyte/mydennisgrp/mabuelanin/miniforge3/envs/duc/bin/duc ui /quobyte/mydennisgrp/from-lssc0/users

# View projects directory
/quobyte/mydennisgrp/mabuelanin/miniforge3/envs/duc/bin/duc ui /quobyte/mydennisgrp/from-lssc0/projects
```

<hr style="border: 0; border-top: 6px solid currentColor; border-bottom: 2px solid currentColor; height: 0; margin: 1.25rem 0;">

# 13. Dennislab Hive documentation and management

https://github.com/mydennislab/hive

# Other Topics

- Termius
- VS Code
- GitHub