---
tags: datalab,farm,2023
---
# A hands-on introduction to the farm HPC at UC Davis
[toc]
Have you ever felt like you needed more disk space? Or memory? Or CPUs? Or access to GPUs? Is your laptop just not cutting it any more? Or you just want to, like, give your laptop a vacation? Boy are you in luck!
UC Davis has several compute clusters, aka "high performance computers", that let you log into remote computers and use them for bigger compute. But they are not SUPER easy to use… and there are lots of details!
In this session, we'll cover some or all of:
* logging into farm with ssh from Mac, Windows, and Linux computers.
* the basics of working on a multiuser system: disk space, processes, other users, and permissions.
* using slurm to request computing resources (memory and CPU)
* the basics of using the module system and also conda to install your own software
* how to run R and Python scripts on remote systems in "headless" mode
* various and sundry other topics that Titus is blanking on at the moment - but trust me, they will be fascinating
This will be a precursor to another hands-on session - using and abusing RStudio Server on the cluster.
Join me on Friday (Jan 27th) at 2pm! I'll plan on an hour.
Remote attendees welcome! And I'll record stuff.
### Important! You will want a farm account! SIGN UP NOW PLEASE.
If you already have a farm account, you should be ready to go!
If you need one, you can get one! Please follow the instructions [here](https://github.com/dib-lab/farm-notes/blob/latest/getting-started.md) but instead of indicating 'Brown' as supervisor, please use 'Datalab (Brown, Titus)'.
---
Watch this space for location and zoom connection info! You can subscribe to this issue to get updates about just this workshop. Or you can e-mail me at ctbrown@ucdavis.edu :).
## Notes
### Logging in with ssh
See: [Connecting to remote computers with ssh](https://ngs-docs.github.io/2021-august-remote-computing/connecting-to-remote-computers-with-ssh.html)
Typically you need:
* an ssh program (`ssh` on Linux, Mac; something like MobaXterm on Windows)
* a public/private key pair
Then you use your private key to log into your account on farm.
Hostname: `farm.cse.ucdavis.edu`
Username: (your account name)
Password: none - you authenticate with your private key instead.
Typical shell command line:
```
ssh username@farm.cse.ucdavis.edu -i /path/to/pem/file
```
You can omit `-i /path/to/pem/file` if you are using a "standard" location like `~/.ssh/id_rsa`.
See [Using SSH private/public key pairs](https://ngs-docs.github.io/2021-august-remote-computing/running-programs-on-remote-computers-and-retrieving-the-results.html#using-ssh-privatepublic-key-pairs) for some background info.
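If you don't have a key pair yet, you can make one on your own machine (Linux/Mac; MobaXterm has its own key generator on Windows). A minimal sketch - the key type and file locations below are just the `ssh-keygen` defaults:
```
# generate a new key pair; press enter to accept the default location
ssh-keygen -t ed25519

# the private key stays on your machine (~/.ssh/id_ed25519 - keep it secret!);
# the public key is the part you hand out, e.g. in the farm account request form
cat ~/.ssh/id_ed25519.pub
```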
### Working on a multiuser system
See [Working on farm](https://ngs-docs.github.io/2021-august-remote-computing/running-programs-on-remote-computers-and-retrieving-the-results.html#working-on-farm) and [File systems, directories, and shared systems](https://ngs-docs.github.io/2021-august-remote-computing/running-programs-on-remote-computers-and-retrieving-the-results.html#file-systems-directories-and-shared-systems).
`whoami` gives you your username.
`hostname` gives you the name of the computer you're logged into - should be `farm`.
`top` tells you what is currently running on this computer - `q` to exit.
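A few more standard Linux commands that are handy on a shared system (nothing farm-specific here):
```
# how much free space is left on the filesystem you're currently in
df -h .

# how much space your own home directory is using (can take a minute)
du -sh ~

# who else is logged into this machine right now
who
```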
### Allocating compute resources
By default you log into the farm head node.
This is for editing files, copying files, moving files, and setting up jobs.
You shouldn't run big compute jobs here, though! If you run `top` on the head node, it shouldn't show anything compute-heavy.
You should instead be using the queue, which runs things on other computers on the farm cluster - generally referred to as 'nodes'.
Try running `squeue` - that shows you the jobs in the queue!
Ref: [Executing large analyses on HPC clusters with slurm](https://ngs-docs.github.io/2021-august-remote-computing/executing-large-analyses-on-hpc-clusters-with-slurm.html) for a more detailed Slurm tutorial.
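Two standard Slurm commands worth knowing right away - `sinfo` lists the partitions (the named queues, like the 'high2' used below), and `squeue -u $USER` narrows the job list down to just your own jobs:
```
# list the partitions (queues) and their current state
sinfo

# show only your jobs
squeue -u $USER
```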
### Allocating an interactive session:
Run:
```
srun -p high2 --time=1:00:00 --nodes 1 --cpus-per-task 1 --mem 5GB --pty /bin/bash
```
This asks for an allocation for 1 hour, of one CPU on one node, with 5 GB of RAM; using the 'high2' high priority queue.
The result will be to log you into a compute node (check out `hostname`!) with those resources.
If you go over those resources (time, memory, CPUs) your srun will be terminated.
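Once you're in the interactive session, it's worth sanity-checking what you actually got; Slurm records the allocation in environment variables. For example:
```
# you should now be on a compute node, not the head node
hostname

# details of your allocation
echo $SLURM_JOB_ID
echo $SLURM_CPUS_ON_NODE

# when you're done, exit to release the allocation and return to the head node
exit
```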
### Running scripts via the queue
You can use `sbatch` to run scripts!
Put the following in a file named something like `runme.sh`:
```
#! /bin/bash -login
#SBATCH -J test
#SBATCH -t 0:01:00
#SBATCH -p high2
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH --mem=5gb
### run your commands here!
sleep 60
# Print out values of the current job's SLURM environment variables
env | grep SLURM
```
and then execute:
```
sbatch runme.sh
```
and then run
```
squeue -u $USER
```
and you should see your job running!
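By default, `sbatch` writes the job's output to a file named `slurm-<jobid>.out` in the directory you submitted from. Once the job finishes (or if you want to get rid of it early):
```
# look at the job's output (replace 12345 with the job ID sbatch printed)
cat slurm-12345.out

# cancel a job you no longer want
scancel 12345
```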
### Module subsystem
There's a lot of software available via the module system: e.g.,
```
module avail R
```
Load one with:
```
module load spack/R/4.111
```
then run R, or type:
```
type R
```
to see what it runs.
You can unload with:
```
module unload spack/R
type R
```
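Two other module commands that come in handy:
```
# see which modules you currently have loaded
module list

# unload all loaded modules in one go
module purge
```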
### Conda/Mamba
The conda/mamba system is an alternative package manager that lets you
install lots of software into your own account, without needing admin rights.
Installer: https://github.com/conda-forge/miniforge#mambaforge
You can install it like so:
```
bash ~ctbrown/shared-conda-on-farm/Miniforge3-Linux-x86_64.sh -b -p $HOME/miniforge3
```
and then log out, log back in - you should have a prompt that looks like
this:
```
(base) ctbrown@farm: ~/$
```
If not, you might need to run:
```
eval "$(~/miniforge3/bin/conda shell.bash hook)"
```
(which activates conda for your current shell session) and/or
```
~/miniforge3/bin/conda init
```
(which edits your `~/.bashrc` so conda is activated automatically on future logins).
Give installing something a try -
```
conda install -y mamba
mamba install -y tree
```
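You can check where the new software came from and what environments you have; for example:
```
# 'tree' should now point into your ~/miniforge3 install
type tree

# list your conda environments; 'base' is the default
conda env list
```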
### Running R scripts
You can run any R code as a script like so:
```
#! /usr/bin/env Rscript
# <add R code below>
```
and then `chmod +x` the file.
This is now something you can run from an sbatch command.
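For example, here's a toy script - the filename and contents are just placeholders, and this assumes `Rscript` is on your PATH (e.g. via `module load` or a conda environment, see below):
```
# create a tiny executable R script
cat > hello.R <<'EOF'
#! /usr/bin/env Rscript
message("hello from R on ", Sys.info()[["nodename"]])
EOF

chmod +x hello.R

# run it directly, or put ./hello.R in the commands section of an sbatch script
./hello.R
```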
### R isolated environments in conda
```
mamba create -n r_env -y r-base
conda activate r_env
```
and run R, or do:
```
type R
```
This is now your very own conda environment with R running in it.
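Most common R packages are on conda-forge with an `r-` prefix; for example (the package choices here are just illustrations):
```
# install R packages into the r_env environment
mamba install -n r_env -y r-tidyverse r-data.table
```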
### sbatch with conda environments
To activate conda environments in an sbatch script:
```
#! /bin/bash -login
#SBATCH -J conda-test
#SBATCH -t 0:01:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 4
#SBATCH --mem=5gb
# activate conda in general (adjust the path if your miniforge3 lives elsewhere)
. "$HOME/miniforge3/etc/profile.d/conda.sh"
# activate a specific conda environment, if you so choose
conda activate somename
### run your commands here!
# Print out values of the current job's SLURM environment variables
env | grep SLURM
```
So now you could:
* install the R version & packages you want via conda
* write an R script to do the thing you want to do
* write an sbatch script to activate the conda environment and run the R code
* fun, profit
This will also work with Python!
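For example, a minimal Python version of the same recipe might look like this - the environment name, Python version, and script are all just placeholders:
```
# make an environment with Python in it
mamba create -n py_env -y python=3.11

# a toy script - replace with your real analysis
cat > hello.py <<'EOF'
#! /usr/bin/env python
import platform
print("hello from Python on", platform.node())
EOF
chmod +x hello.py
```
Then, in your sbatch script, `conda activate py_env` and run `./hello.py` in the commands section.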