---
title: "QUEST: A guide to Northwestern's HPC"
breaks: false
tags: General
---

# A guide for using Quest

### Quick links
- **[Quest User Guide](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=505)**: Extensive manual on Quest
- **[Slurm Guide](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1964#section-partitions)**: In-depth guide for submitting jobs to Quest using Slurm
- **[Basic Research Computing Guide](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=2004#quest-fundamentals)**: Suite of various tutorial videos
- **[Globus](https://app.globus.org/file-manager)**: Globus file transfer service for up/downloading files to Quest
- **[Job Resource Logs](https://docs.google.com/spreadsheets/d/1P1zpt1aznNnXwMjUfynWioOXvGkjydFfhUBG_EppNMQ/edit?usp=sharing)**: List of jobs and their associated resources/efficiencies
- **[Jupyter Hub](https://jupyter.questanalytics.northwestern.edu)**: Link to Jupyter Hub (requires NU VPN)

---

## Applying for free access

To apply for free access on Quest, click **[this link](https://www.it.northwestern.edu/secure/forms/research/allocation-request-forms.html)** and, under "Request a Research Allocation", select *Research allocation 1 original*. This will bring up a long form; details about certain fields are below. **You will need a PI's email and netID.**

:::info
**Applying for access**
- Will members of your allocation be using compute resources or only storage space?
    - Compute and storage
- Please provide associated chart strings....
    - NA (you can provide chart strings if you have them, but it will not affect whether or not they approve your request)
- Will members of your allocation be using the Quest Analytics nodes?
    - Yes
- Statement of purpose
    - Not super serious, a few sentences about why you need compute resources
    - e.g. I plan on running a number of bioinformatics pipelines that involve assembling and analyzing Next Generation Sequencing data.
The pipelines I intend to use require high computational power, and the data I am analyzing requires large amounts of data storage.

As a **non-NU affiliate**:
- Your School: Weinberg College of Arts and Sciences
- Your Affiliation: HN put visiting scholar
- Is your research supported by any of the following? : HN put Other
:::

---

## Global Protect: VPN to NU

:::danger
I think you might not need a VPN to connect actually!
:::

In order to connect to Quest, you must be connected to Northwestern's internet. Of course, if you're not on campus, you will need to use a VPN. **[Here](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1818)** is NUIT's guide on setting up a VPN to NU called **Global Protect**.

---

## Connecting to Quest

### SSH into Quest on MacOS or Linux

Mac and Linux devices can simply open their terminal application and use **SSH** to connect to Quest. To do so, type:

```bash!
ssh <netID>@quest.northwestern.edu
#e.g. ssh jzy0986@quest.northwestern.edu
```

:::info
If you plan on using software that utilizes a GUI (a graphical display) through the command line or using R, add the `-X` flag to your **ssh** line.
```bash!
ssh -X <netID>@quest.northwestern.edu
```
:::

#### Optional setup below

If you would like to save time in the future and avoid typing the entire address every time you'd like to connect, you can edit ssh's config file by doing the following:

```bash!
#In your terminal window (before connecting to Quest)
nano ~/.ssh/config #This will open the file (no sudo needed; it's your own config) via the nano text editor

#Type the following into the file
Host quest
    HostName quest.northwestern.edu
    User <netID> #enter your netID here, such as jzy0986
```

![](https://i.imgur.com/bd1j3UQ.png)

Save the file by hitting "control" and "x", then "y" and lastly "enter". Now you can ssh into Quest simply by typing

```bash!
ssh quest
```

### Windows devices

For Windows devices, you may need to download some type of ssh-client.
NUIT suggests PuTTY, and more instructions can be found **[here](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1541)**. Windows also comes with a program called **Windows PowerShell** that should work as well.

### Disconnecting from Quest

To disconnect from Quest, simply type `exit`

---

## Quest Architecture

When referring to the architecture of a High Performance Cluster (HPC) like Quest, we are describing the physical and digital components of the computer system. See more details **[here](https://www.it.northwestern.edu/departments/it-services-support/research/computing/quest/specs.html)**.

### File & Folder Structure

Once your application to Quest has been approved, you will have access to 2 important folders. These are your **home** and **project** directories. Here is a concise diagram describing their locations.

#### Quest File Structure
![](https://i.imgur.com/h6tP6PT.png)

Note that your project folder is not in your home folder; in fact, they are on the same level.

#### Symlink

Every time you log on to Quest you will be sent to your home directory. You can create shortcuts (called symlinks) to easily navigate to other folders. If I want to add a shortcut in my home folder that takes me to my project folder, I can make a symlink:

```bash!
#ln -s <absolute path to folder> <where shortcut is located>
ln -s /projects/p31911 ~/project_shortcut
```

This is saying: add a shortcut to `/projects/p31911` in my home directory (shorthand `~/`) called "project_shortcut". Now when I type `ls` in my home directory, it will also list "project_shortcut" as a folder. If I `cd` into "project_shortcut", it takes me to `/projects/p31911`.

### Node Structure

You can think of nodes on an HPC as individual computers that are connected together. With general-access accounts, we have access to two types of nodes: **login** nodes and the **general-access compute** nodes. Every time you log on to Quest via `ssh`, you are accessing one of Quest's login nodes.
++Every login node has only 4 CPUs and 4 GB RAM++, which means the amount of computational power you have is relatively limited. You can perform simple tasks like moving files around, but more demanding tasks and analyses should be sent to compute nodes. The ++general-access compute nodes we have access to each have 28 CPUs and 84 GB of RAM available++.

---

## File Transfers to & from Quest

Once you have a Quest account, you will be able to transfer files to and from your Quest allocation. For transferring files to and from a personal device, you can use **[Globus](https://app.globus.org/file-manager)**. Transferring large amounts of data from Fabronia (or buxbaumia or another remote server not on Globus) requires the use of `rsync`.

> **[Here](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1962)** is NUIT's guide on using Globus
> [color=#907bf7]

### Transferring from Sequencing facility (Sharepoint or OneDrive) using Globus

Sequencing will share your raw reads using OneDrive/SharePoint.

> **[Here](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1969)** is NUIT's guide on using Globus to download directly from Sharepoint

### Transferring from local devices using Globus

1) You must **log in** at **[globus.org](https://www.globus.org/)** by selecting Northwestern University as your institution and entering your NU credentials when prompted.
![Globus login](https://i.imgur.com/PwoSd6K.png)
2) In the _Collection_ search bar, **search for "Northwestern Quest"** and click on it. There will be multiple options; make sure to click the one titled "Northwestern Quest" exactly.
![Globus collection search](https://i.imgur.com/7n2jU6X.png)
![Globus NU Quest entry](https://i.imgur.com/LFpjZ5G.png)
3) After clicking, you should now be able to see your home directory. Using the _Path_ bar or the GUI, you can **change directories/move files around**.
You can also **upload/download** files and perform other simple tasks using the right hand panel.
![Example of home directory](https://i.imgur.com/q79lIJT.png)
![Upload/download files](https://i.imgur.com/923QHfK.png)
4) You can **bookmark** Quest so that you do not need to search for it every time you open Globus.
![Bookmark Quest on Globus](https://i.imgur.com/de3yiyk.png)

### Transferring data from Sequencing facility

The sequencing facility will store the data in OneDrive and share a link that lets you download it to your drive, from which you can then transfer it to Quest:
1) You must **log in** at **[globus.org](https://www.globus.org/)** by selecting Northwestern University as your institution and entering your NU credentials when prompted.
![Globus login](https://i.imgur.com/PwoSd6K.png)
2) In the _Collection_ search bar, **search for "Northwestern Quest One drive Pilot"** and click on it. There will be multiple options; make sure to click the OneDrive collection with exactly that title.
![Globus collection search](https://i.imgur.com/7n2jU6X.png)
3) After clicking, you should now be able to see your home directory. Using the _Path_ bar or the GUI, you can **change directories/move files around**. The shared folder is usually "/Shared/". Below is an example of samples from run "Fant09".
![Link to shared folder](https://hackmd.io/_uploads/SyL3qiJu2.png)
You should be able to locate the files from the NUSeq core here. You can then open your project directory in the other Globus pane and drag and drop the files across.

### Transferring data from remote servers using Rsync

If you need to transfer a large amount of data from a remote server to your Quest allocation, use the command `rsync`, short for "remote sync".
**While connected to the remote server**, basic usage is as follows:

```bash
rsync <flags> <path to contents to transfer> <questID@quest.northwestern.edu>:<destination path>
```

If I wanted to transfer a folder of raw reads in my home directory on Fabronia to my project folder on Quest, I would type:

```bash
rsync -aivP /home/JZhang/rawreads/ jzy0986@quest.northwestern.edu:/projects/p31911/
```

:::warning
**Flags**
- `a` is for archive mode, meaning recurse through subdirectories and preserve things like permissions, ownership, last modification times, etc. for files.
- `i` is for itemize changes, meaning print out a summary of changes if a file/folder is updated
- `v` is for verbose output
- `P` is shorthand for `--partial --progress`: it shows transfer progress and keeps partially transferred files, so if the transfer is interrupted, re-running the original command will pick up from where the previous transfer left off
- If the destination server's SSH is not on the default port 22, you can add the flag `-e 'ssh -p xxxx'` where `xxxx` is the port
:::

:::warning
**Transfer Destination**
Remember that you have 2 separate spaces to work with in Quest. One is your home directory, which has an **80 GB** max, while your project allocation offers **1 TB** of storage.
:::

:::danger
**Buxbaumia issue**
Buxbaumia does not allow external connections to be made (to servers like Quest), so `rsync` and other commands like `scp` do not work for copying/moving files between servers. ++You may need to manually transfer the files (using Cyberduck).++
:::

---

## Software

### Modules

Quest has many preinstalled tools available through the use of **modules**. You can either search online **[here](https://www.it.northwestern.edu/departments/it-services-support/research/computing/quest-software-and-applications.html)** to see if a tool is preinstalled, or you can search within your terminal by typing:

```bash!
module -r spider <module_name>
##e.g.
## if I wanted to search for trimmomatic, I could type
[jzy0986@quser21~]$ module -r spider trim
--------------------------------------------------------
  trimmomatic:
    trimmomatic/0.39
--------------------------------------------------------
    This module can be loaded directly: module load trimmomatic/0.39

    Help:
      Module for trimmomatic 0.39
```

If the software you are looking for is available as a module, you can load the module, check to see which modules you already have loaded, and get rid of modules you no longer need.

```bash!
###Load
module load trimmomatic/0.39

###List all the modules you have loaded
module list

###Remove a single module
module unload trimmomatic/0.39

###Remove all modules
module purge

###To run trimmomatic in a bash script, use the following. Everything before PE
###must be used in order for it to work in a bash script; everything after PE is
###just example code for one sample. ALSO NOTE that you must give the full path
###to the adapter file for ILLUMINACLIP or have it in the working directory!!
java -jar /software/trimmomatic/0.39/trimmomatic-0.39.jar PE -phred33 \
    A_tomentosa_stenophylla_S10_R1_001.fastq.gz A_tomentosa_stenophylla_S10_R2_001.fastq.gz \
    A_tomentosa_stenophylla_S10_R1_001_paired.fastq.gz A_tomentosa_stenophylla_S10_R1_001_unpaired.fastq.gz \
    A_tomentosa_stenophylla_S10_R2_001_paired.fastq.gz A_tomentosa_stenophylla_S10_R2_001_unpaired.fastq.gz \
    ILLUMINACLIP:/software/trimmomatic/0.39/adapters/TruSeq3-PE.fa:2:30:10:2::true \
    LEADING:10 TRAILING:10 SLIDINGWINDOW:4:20 MINLEN:40
```

:::info
All modules will automatically be purged when you disconnect from Quest
:::

### Conda/Mamba

If the software isn't pre-installed on Quest, you need to install it yourself. **Conda** is a good package/tool manager that lets you easily install these tools (as long as they are available on the conda channels).
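Before reaching for conda at all, it's worth checking whether the tool is already on your `PATH` (e.g. already provided by a loaded module). A minimal sketch, using `samtools` purely as a placeholder name:

```bash
#Check whether a tool is already available before installing it yourself
#(samtools here is just a placeholder -- substitute the tool you need)
tool=samtools
if command -v "$tool" >/dev/null 2>&1; then
    msg="$tool already available at $(command -v "$tool")"
else
    msg="$tool not found: try 'module spider $tool' or install it via conda"
fi
echo "$msg"
```

Either way the message tells you where to look next, which beats installing a second copy of something a module already provides.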
https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=2064

### R

https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1556
https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1560

### Jupyter

https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1810

---

## Slurm: Submitting jobs to Quest

**Slurm** is the tool to use when submitting jobs to Quest's compute nodes (as opposed to the login nodes). This section will cover submitting jobs via bash scripts and also setting up an _interactive job_.

:::info
**What's an interactive job?**
When you ask for an interactive job, Quest essentially gives you a terminal/access to a compute node with resources that you've requested. Here you can also run programs with a GUI through the command line, as long as you've added the `-X` flag when you ssh'ed into Quest.
:::

:bulb: Some [Slurm commands](https://slurm.schedmd.com/quickstart.html)

### Creating bash jobs for Slurm

Let's say we have a bash script called `test_bash.sh` that we want to submit to Quest's compute nodes. To do so, we need to **create and format our own bash script for Slurm** so that our script will be successfully added as a job. The first lines of a Slurm bash script contain a number of arguments and will look as follows:

:::info
**Commenting in slurm scripts**
Note that the `#SBATCH` prefix at the beginning of each line is how Slurm recognizes these lines as arguments, even though conventionally a `#` marks a comment (aka ignored text).
:::

```bash
#!/bin/bash
#SBATCH --account=<ALLOCATION NAME> ## Required: your allocation/account name, i.e.
eXXXX, pXXXX or bXXXX
#SBATCH --partition=<PARTITION TYPE> ## Required: what type of partition (buyin, short, normal, long, gengpu, genhimem, etc)
#SBATCH --time=00:10:00 ## Required: how long the job will need to run, hours:minutes:seconds (remember different partitions have restrictions on this parameter)
#SBATCH --mem=1G ## Suggested: how much RAM you need per computer/node (this affects your FairShare score, so be careful not to ask for more than you need)
#SBATCH --nodes=1 ## Suggested: how many computers/nodes you need (no default value but most likely 1)
#SBATCH --ntasks-per-node=1 ## Suggested: how many cpus or processors you need per computer/node (default value 1)
#SBATCH --job-name=<JOB NAME> ## Name of job
#SBATCH --output=slurm_outlog.log ## File to log stdout
#SBATCH --error=slurm_errors.log ## File to log errors
#SBATCH --mail-type=<NOTIF ALERTS> ## When to notify user via email (begin, end, fail, all)
#SBATCH --mail-user=<email address> ## Email address for sending notifications
```

:::warning
**More details for slurm arguments**
- **partition**: The type of partition you select should depend on your needs:
![Quest partitions](https://i.imgur.com/D69NWZt.png)
- **time**: Instead of setting the time to the maximum value (based on the partition chosen), requesting the correct amount of time will get your job started sooner (it moves up the queue if it's a shorter task). However, if the timer runs out before the job is completed, the entire process will be killed.
- **nodes**: This should be 1 unless you're sure your job can utilize running on multiple nodes
- **ntasks-per-node** and **mem**: The maximum values you choose are based on the nodes you have access to. Similar to **partition**, hogging resources unnecessarily will likely put you further back in the queue and cause your _FairShare_ score to go down.
![Quest nodes](https://i.imgur.com/ZxIrJhL.png)
- **mail-type** and **mail-user**: these arguments must be used together.
mail-type options for job notifications include when the job "begins", "ends", "fails", or "all".
:::

:::info
**Fairshare Score**
Your FairShare score is a reflection of how efficiently you use the resources you call for. If you call for 64 GB of RAM but the job only utilizes 10 GB, your FairShare score will decrease. Those with lower scores have lower priority when submitting jobs in the future.
:::

### Writing the script

In most cases we will be working through the command line/bash. Once we have written out the slurm arguments, you can begin writing bash commands below them. Here's an example slurm submission script:

```bash
#!/bin/bash
#SBATCH --account=p31911
#SBATCH --partition=short
#SBATCH --time=00:05:00
#SBATCH --mem=1G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --job-name=slurm_test
#SBATCH --output=slurm_outlog.log
#SBATCH --error=slurm_errors.log

module purge ##remove previously loaded modules
module load example_module

eval "$(conda shell.bash hook)" ##activate conda
conda activate /home/jzy0986/.conda/envs/genomics ##specify environment

bash ./test_bash.sh ##run a bash script called test_bash
```

#### Modules and Conda

Software and tools that have been installed on Quest need to be loaded in if necessary. Modules and conda are two approaches for doing so (Dylan has more notes about this).

### Submitting and checking on the job

To **submit** the bash job via slurm, simply type `sbatch <slurm script>`. In our case, it would be `sbatch slurm_test.sh`. Once you submit a job, you will get a message either saying it was successfully submitted with a **job ID** or an error message. There are also a number of commands to **check** the status/view statistics about a job, shown below.
```bash
sbatch <script> #submit a bash script to slurm

scancel <job ID> #cancel a job

sacct -X #Shows the statuses of jobs you've run in the past 24 hours

seff <job ID> #Shows resource usage and other statistics about a COMPLETE job
#(if incomplete, statistics will be inaccurate)

squeue -u <netID> #Shows current jobs running for your user id
#R = Running
#PD = Pending

checkjob <job ID> #Prints out details of a job
```

### Requesting and using interactive jobs

An **interactive job** essentially gives you access to more resources within a terminal session. As an example, if you'd like to perform a number of tasks in a terminal session with more than the default 4 cores/4 GB RAM provided by login nodes, you can ask for an interactive session with more resources using the commands below.

```bash
#salloc
salloc -N <num nodes> -n <num cores> --account=<account> --mem=<memory>G --partition=<partition> --time=<hh:mm:ss>

#srun
srun -N <num nodes> -n <num cores> --account=<account> --mem=<memory>G --partition=<partition> --time=<hh:mm:ss>
```

:::info
The flags for `salloc` and `srun` are the same arguments that you would use when submitting a slurm batch job (described earlier).
:::

The **difference between** `salloc` and `srun`:
- `srun` will ++immediately open a new terminal session++ that grants the resources you ask for (once they are available)
- `salloc` will ++create a new terminal session that you can ssh into++ (within Quest). When you request an interactive session with `salloc`, the new node will be provided once the resources are available, and you can ssh into it: `ssh qnodexxxx`
- If you exit out of the `srun` terminal, ++it ends the interactive job++. If you exit out of the `salloc` terminal, ++it will still exist++ until the time you requested is reached or you cancel the job.
- If you intend to run a ++program with a GUI++ through the command line, you will need to use `srun` and add the `--x11` flag as one of the arguments, along with having added the `-X` flag when you ssh'd into Quest

Here's an example of requesting an interactive job with `salloc`:

```bash
[quser21 ~]$ salloc -N 1 -n 6 --account=p31911 --mem=10G --partition=short --time=00:15:00
salloc: Pending job allocation 276305
salloc: job 276305 queued and waiting for resources
salloc: job 276305 has been allocated resources
salloc: Granted job allocation 276305
salloc: Waiting for resource configuration
salloc: Nodes qnode8029 are ready for job #this is the qnode you will ssh into

[quser21 ~]$ ssh qnode8029
Warning: Permanently added 'qnode8029,172.20.134.29' (ECDSA) to the list of known hosts.
[qnode8029 ~]$ #note the new hostname in the prompt
```

:::danger
**Can't find qnode to ssh into**
If `salloc` doesn't show you a node ID to ssh into, you can use the `squeue -u <netID>` command to show any running jobs. The interactive job and its node ID will be displayed here.
![](https://i.imgur.com/KQhPW9u.png)
:::

---

## Allocations

### Joining another allocation

To gain access/permissions to another user's existing allocation, you must fill out a form found **[here](https://www.it.northwestern.edu/secure/forms/research/allocation-request-forms.html)**. Select "Join an Existing Allocation" and fill out the form. You will need to know the allocation ID.

### Existing allocations

```csvpreview {header="true"}
Name, NetID, Allocation
Alissa, amd7403, p31915
Andrew, amd9539, p31922
Andrea, , p32850
Brendan, bjc8165, p31939
Dylan (Amsonia), dcl9541, p31913
Dylan (IMLS), , p32416
Emma, ewf7555, p32037
Fernando, ?, p31923
Haley, ?, ?
Hilary, scx4713, p31934
Jackson, vkr9577, p32812
Jeremie, jbf420, p31927
Jianing, usk1776, p32630
Johnathan, jzy0986, p31911
Julie, jkp8741, p31853
Luciana, , p32835
Marne, imh4101, p32585
Nora, nsg2868, p31944
Nora (part 2), , p31984
Nyree, nzy820, p31924
Remek, oup0164, p32252
Rafael, ruc1765, p32080
Stephan, srg9103, p32038
Zoe, zdj0460, p31916
```

---

## Useful commands

### General commands

```bash
checkproject <alloc-id> #shows how much space is left in your project directory

homedu #shows how much space is left in your home directory

rm -r -- */ #remove all folders in the current folder (be careful!)

ls | wc -l #show # of items in folder

####A for loop to copy the first 10 files of a folder to another folder
for file in $(ls -p | grep -v / | head -10)
do
    cp "$file" <path to other folder>
done
```

### Slurm commands

```bash!
sacct -X #Shows the jobs you've run in the past 24 hours

scancel <job ID> #Cancels the job

seff <job ID> #Shows statistics about a COMPLETE job
#(if not complete statistics will be inaccurate)

squeue -u <netID> #Shows current jobs running
#R = Running
#PD = Pending

checkjob <job ID> #Prints details about a job
```

### Permissions

When you type `ls -l` the first column will show you permissions.

```bash
drwxrwsr-x 15 jzy0986 p31911     4096 Feb  8 15:58 EG30
-rw-rw-r--  1 jzy0986 p31911 16331121 Apr 28  2016 EG30_R1_test.fastq
-rw-rw-r--  1 jzy0986 p31911 13979559 Apr 28  2016 EG30_R2_test.fastq
```

It is 10 characters together, split as 1 descriptor character and then 3 groups of 3.
The order of the characters is as follows:
- Descriptor (1 character)
- User (3 characters)
- Group (3 characters)
- All/other (3 characters)

Each letter means:
- r = read
- w = write
- x = execute
- d = directory (in the descriptor position)

As an example, file `EG30_R1_test.fastq` has the following permissions:
- file | U: read write | G: read write | A: read

#### To change permissions for a file or folder use `chmod`

```bash
#who: u for user; g for the group; o for others
#what: + to add, - to remove permissions
#examples:
chmod o-x example_file #remove execute permissions from others for example_file
```

---

### Storage Space - Home, Project, Scratch

Space is limited, but luckily we all get 1 TB of space in our **project** directory. This is where you should STORE your raw and downstream data files. ***WARNING DO NOT STORE RAW DATA IN THE HOME DIRECTORY!!!***

More info on Storage: (https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1546)

---

**Home Directory**
What lives here: batch submission scripts, job log files, and local package and software installations
Disk space: 80 GB
Check space: ```homedu```
Navigate there: ```cd /home/<netid>```

---

**Project Directory**
What lives here: High-speed storage which should be used for computation Input/Output (IO) and/or data analyses. ***STORE THE MAJORITY OF YOUR FILES/DATA HERE***
Disk Space: 1 TB with a Research Allocation I and 2 TB with a Research Allocation II. (Pretty sure we all got the 1 TB allocation)
Check space: ```checkproject <allocation_id>```
Navigate there: ```cd /projects/<allocation_id>```

---

**Scratch Directory** (I don't think we have access to this, but we should figure out how to get access)
What lives here: High-speed storage which should be used for storing temporary files from running jobs, downloading data for processing, and short-term storage for large datasets.
Disk Space: 5 TB
Check space: ```checkscratch```
Navigate there: ```cd /scratch/<netid>```

---
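When `homedu` or `checkproject` says you're near quota, the next question is *which* folder is eating the space. A `du | sort` one-liner answers that; here is a minimal sketch, demoed on a throwaway directory (on Quest you would run the `du`/`sort` line inside `~/` or `/projects/<allocation_id>`):

```bash
#Build a small throwaway directory just to demonstrate the pattern
demo=$(mktemp -d)
mkdir -p "$demo/raw_reads" "$demo/scripts"
head -c 100000 /dev/zero > "$demo/raw_reads/sample.fastq" #a "large" file
head -c 100    /dev/zero > "$demo/scripts/run.sh"         #a small file

#The useful part: per-directory sizes, smallest first,
#so the biggest space hog is the last line printed
du -s "$demo"/*/ | sort -n
largest=$(du -s "$demo"/*/ | sort -n | tail -1)
echo "Largest directory: $largest"

rm -rf "$demo"
```

If the biggest directory turns out to be raw data sitting in your home directory, move it to your project allocation before you hit the 80 GB cap.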