# Tips and tricks for working in Unity
Helpful resources:
* [Transferring data to Unity](https://georgiastuart.gitlab.io/shared-cluster-to-unity/)
* [Job submission and SLURM](https://georgiastuart.gitlab.io/shared-cluster-to-unity-2/)
## Running jobs in the Unity space:
Following discussions with Unity staff, the MEC Lab now has 30TB in the `/nese/` partition, located at `/nese/meclab/`. Jobs can be run from here and run just as efficiently as they do from the work space or on the MGHPCC. Make a folder here and run jobs from this location.
- Add an additional command in your scripts to monitor resource usage:
```
seff ${SLURM_ARRAY_JOB_ID} >> ./logs/${SLURM_ARRAY_JOB_ID}.log
# Appends a resource-usage summary to ./logs/${SLURM_ARRAY_JOB_ID}.log - adjust the path to match your own log naming
```
- You can also do this after the job has completed, as long as you know the job ID number (equivalent to `SLURM_ARRAY_JOB_ID` in the above script): `seff [jobID]`
### Submitting a job
[Translating LSF to SLURM cheatsheet](https://slurm.schedmd.com/rosetta.pdf)
Main commands:

**Running an interactive job**
If you are running smaller bash scripts, want to see output as you go, or in general are planning to use any commands other than simple navigation commands like `ls` or `cd`, then it's good practice to start an interactive job.
When we are logged into the cluster, we're working on the "head node", which is what everybody uses to navigate around the Unity cluster. When people perform tasks on the head node that require a lot of resources, it slows things down for everybody. Instead, if we want to have the freedom to navigate around the cluster and also run commands that might take a while (e.g. moving or copying big files, unzipping/zipping large files), then it's best to open an interactive job.
An interactive job just moves you away from the head node onto your own separate node. It is functionally equivalent to working on the head node, but what you do there will not affect other users on the cluster. An interactive job can be started by running this from the head node:
`srun -c 6 -p cpu --pty bash`
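If the defaults aren't enough, `srun` accepts the same resource flags as `sbatch`; the values below are only illustrative:
```
# Interactive session with 6 cores, 8 GB of memory, and a 2-hour limit on the cpu partition
srun -c 6 --mem=8G -p cpu -t 02:00:00 --pty bash
```
Exit the session with `exit` when you're done so the resources are released.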
**Running a large job**
Submitting a large job through slurm can be accomplished with the command `sbatch` (which is equivalent to `bsub` on the MGHPCC). Similar to `bsub`, parameters for `sbatch` (i.e. resources requested) can be included at the top of a script. This is a reasonable generic group of parameters to use at the top of an sbatch script:
```
#!/bin/bash
#SBATCH -c 4 # Number of Cores per Task
#SBATCH --mem=8192 # Requested Memory
#SBATCH -p gpu # Partition
#SBATCH -G 1 # Number of GPUs
#SBATCH -t 01:00:00 # Job time limit
#SBATCH -o slurm-%j.out # %j = job ID
```
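For reference, a complete submission script is just that header followed by the commands you want to run. A minimal sketch (the module, version, and command are placeholders, not specific recommendations):
```
#!/bin/bash
#SBATCH -c 4                 # Number of cores per task
#SBATCH --mem=8192           # Requested memory (MB)
#SBATCH -p cpu               # Partition
#SBATCH -t 01:00:00          # Job time limit
#SBATCH -o slurm-%j.out      # %j = job ID

module load samtools/1.14    # placeholder module/version; check `module av`
samtools --version           # replace with your actual command

seff ${SLURM_JOB_ID}         # optional: print a resource-usage summary into the job log
```
Submit it with `sbatch myscript.sh`; SLURM prints the job ID it assigns, which you can then use with `squeue`, `seff`, and `scancel`.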
Each partition (queue) has its own maximum job time limit; check the Unity documentation for the current list of partitions and their limits.
**Checking on your job**
`squeue -u [username]` will show you whether the job is running or not.
There is no equivalent to `bpeek` on Unity, but the log and error files are updated in real time, so you can monitor them with something like `tail [job].log` (or `tail -f [job].log` to follow the output live).
**Killing a job**
Check your job to see the job ID, then run `scancel [jobID]`
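For example (the job ID here is made up):
```
squeue -u username_umass_edu   # the JOBID column shows the ID
scancel 1234567                # cancel that specific job
```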
### Tips for working efficiently
#### Aliases
Usernames for Unity are longer than usual (e.g. `username_umass_edu`) and can be laborious to type out when specifying directory paths. We recommend setting up aliases early on to get around this.
1. Create a `.bashrc` file in your home directory: `touch ~/.bashrc`
2. Edit the [.bashrc](https://www.digitalocean.com/community/tutorials/bashrc-file-in-linux) file using vi or a text editor
3. Add your aliases (see below for some examples) to the file
4. Save the file and run `source ~/.bashrc` to initiate the aliases.
- Do this each time an alias is added to load the aliases in the current session. The `.bashrc` file loads automatically each time you log into the cluster, so the aliases will automatically load in the future.
5. [Use conda environments](https://docs.unity.rc.umass.edu/software/conda.html) to install packages without going through the Unity admin
**Useful examples:**
```
alias proj='cd /project/MEC_lab/username/'
alias work='cd /work/username_umass_edu/'
alias sq='squeue -u username_umass_edu'
```
#### Helpful commands
`module av` will show the programs available to load, which can be done with `module load [program]`
`module av [text]` will show all modules whose names contain `[text]`. For instance, `module av vcf` will show all modules with "vcf" in their name. This can be very useful if you know what software you want to run but don't remember which version is on the cluster.
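For example (the version shown here is just illustrative; use whatever the list actually shows):
```
module av vcf                 # list all modules with "vcf" in the name
module load vcftools/0.1.16   # load a specific version from that list
```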
#### Useful resources
Slurm job monitoring: https://help.jasmin.ac.uk/article/4892-how-to-monitor-slurm-jobs
### Submitting an array
**Notes from Blair via Slack**
So I learned the array through googling "how to convert LSF to SLURM". There's an SBATCH header for the array in SLURM which is actually much more intuitive than LSF: `#SBATCH --array=1-299` tells the sbatch handler how many jobs to split the array into (in this case, 299). The `SLURM_ARRAY_TASK_ID` variable is the index of the array task - for JOBID_1 it's 1, for JOBID_2 it's 2, and so on.
You can then use this variable to cycle through a file list, which essentially acts like a loop where each iteration is run as an independent job.
`file=$(ls 03_clone_filter_out/*1.fq.gz | sed -n ${SLURM_ARRAY_TASK_ID}p)` creates a new variable, `file`: it lists all the forward read files in the clone_filter directory, then selects the line number that corresponds to the array task - so the first job in the array runs the command on the first file in that directory, the second job runs on the second file, and so on.
The `%a` in the error/log output names also corresponds to the `SLURM_ARRAY_TASK_ID` number, so you get an independent log file for each array task rather than having the file overwrite itself every time.
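Putting those pieces together, a minimal array script might look like the sketch below (the directory name, file pattern, and array size come from the example above; the resources and per-file command are placeholders):
```
#!/bin/bash
#SBATCH -c 2                  # cores per array task (placeholder)
#SBATCH --mem=4096            # memory per task in MB (placeholder)
#SBATCH -p cpu                # partition
#SBATCH -t 04:00:00           # time limit per task (placeholder)
#SBATCH --array=1-299         # one task per forward-read file
#SBATCH -o logs/%A_%a.out     # %A = array job ID, %a = array task ID

# Pick the Nth forward-read file, where N is this task's index
file=$(ls 03_clone_filter_out/*1.fq.gz | sed -n ${SLURM_ARRAY_TASK_ID}p)

echo "Task ${SLURM_ARRAY_TASK_ID} processing ${file}"
# <run your per-file command on "$file" here>
```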
Another good resource for understanding how to submit a job: https://slurm.schedmd.com/sbatch.html
### Setting up a conda environment
Conda environments on Unity let you install commonly used software packages yourself instead of emailing IT. It can take a moment to get used to the conda lingo. Here are a couple of good resources:
Conda user manual: https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html
Unity tips on using conda: https://docs.unity.rc.umass.edu/software/conda.html
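As a quick orientation, creating and using an environment looks roughly like this (the environment name and package are placeholders; see the space-related notes below before creating anything):
```
module load conda             # load conda as in the sbatch example further down
conda create -n myenv -c bioconda -c conda-forge samtools   # "myenv" and samtools are examples
conda activate myenv
samtools --version
conda deactivate
```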
#### **Notes for conda environments given space limitations:**
* By default, Conda installs files for environments into your home directory - but the quota for `$HOME` is currently (2023-04-10) only 50GB, which gets exhausted very quickly
* Best practice is to redirect Conda to install environment packages to your work directory (you may or may not have a personal work directory - if you don't then you'll have access to Lisa's PI work directory).
* To do this, add the below lines to the `.condarc` file in your home directory:
```
envs_dirs:
- $WORK/.conda/envs
pkgs_dirs:
- $WORK/.conda/pkgs
- /modules/apps/miniconda/4.8.3/pkgs
```
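You can confirm conda is using the new locations before installing anything (these are standard conda commands, not Unity-specific):
```
conda config --show envs_dirs   # should list your work directory first
conda info                      # "envs directories" and "package cache" should point there too
```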
NOTE that if you are trying to use conda from your directory in the /nese/ space, you need to export the conda path in addition to changing the file above. For example, if I were to submit a script that uses my conda environment using sbatch, I would include in the script:
```
module load conda
export CONDA_ENVS_PATH=/nese/meclab/Jamie/.conda/envs/:/home/jstoll_umass_edu/.conda/envs/
conda activate test
<SCRIPT CONTENTS>
conda deactivate
```
You must export the path you want to use for environments instead of the default home directory. If you do not do this, your script will not run and conda will throw many errors.
You can also add additional conda channels in the .condarc file above if needed.
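For example, to make conda-forge and bioconda packages available by default, the `.condarc` could also include (channel choice is up to you):
```
channels:
  - conda-forge
  - bioconda
  - defaults
```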
## GPU vs CPU
CPUs are solid for most tasks and fast at data transfer, but have relatively few cores and limited ability to multi-task.
GPUs are better for parallelizing because they have many cores (thousands), but they are slower at data transfer.
gpu-preempt has the most cores and the shortest wait time, but jobs on it can be preempted (kicked off), so don't run big jobs there.
gpu and gpu-long can run bigger jobs, but with longer wait times.
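In practice the choice comes down to the partition line in your `#SBATCH` header; a sketch (keep only one `-p` line active in a real script - the `##SBATCH` lines are ignored by SLURM):
```
#SBATCH -p gpu-preempt   # shortest wait, but the job may be preempted
##SBATCH -p gpu          # longer wait; job will not be preempted
##SBATCH -p gpu-long     # for jobs that need a long time limit
#SBATCH -G 1             # number of GPUs
```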
## Outdated information that might be useful in the future if things change
There are 4 primary spaces that will be useful/required for job submissions on Unity (see https://docs.unity.rc.umass.edu/technical/storage.html for details):
- `/work/username/`: this path is where you should run jobs from. There is a 3TB limit per user, and the MEC lab also has 3TB of space for file sharing (not currently active 07/26/2022).
- `/project/MEC_lab/`: this space is for long-term storage. Total capacity is 30TB and it houses data from all lab members. Create a directory for yourself within this space.
- `/scratch/`: every time you submit a job to a node, a directory gets created in the scratch space. The directory created is `/scratch/[nodeid]/[jobid]/` and is assigned to `$TMP`. This is primarily used for large volumes of intermediate files. Per-user capacity = 40TB. The Unity website states that these files will be deleted at the completion of the job, so do not use this space for storage.
- Note that ALL files in scratch are purged every 24 hours.
### Recommended procedure for running jobs on the cluster
- If the outputs are expected to be relatively small (<3TB):
- recommend running out of your own workspace and then transferring across to the project space for long-term storage.
- If the outputs are expected to be large (>=3TB):
- run the job out of your work space, but write outputs to scratch (i.e. `$TMP`), and then transfer these across to the project space for long-term storage (see the sketch below).
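A rough sketch of that second pattern inside a job script (the results directory and destination path are placeholders):
```
# Work in the per-job scratch directory, which SLURM assigns to $TMP
cd $TMP
mkdir -p results

# <run your analysis here, writing outputs into $TMP/results/>

# Copy results to long-term project storage before the job ends (scratch is purged)
mkdir -p /project/MEC_lab/username/my_analysis
cp -r results /project/MEC_lab/username/my_analysis/
```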