# MEC Lab Guidelines for working on the MGHPCC

<font size=2>For information about working on the cluster in general, see our [Intro to using the MGHPCC](https://hackmd.io/dbB1u8JuT3qJDqp5fkE9dw?view) and the [official MGHPCC wiki](http://wiki.umassrc.org/wiki/index.php/Main_Page). Personal work spaces on the MGHPCC have only 50 GB of storage. Therefore, we often work in a shared cluster space where we have a **total** of 20 TB of storage. The shared space is located here: `/project/uma_lisa_komoroske`.</font>

## Basic rules and considerations

- **Monitor your space usage**
    - <font size=2>Because the project space is shared, other lab members' jobs could fail or be delayed if one person uses a disproportionate amount of space. Clean out your directory frequently (for example, check it once per week).
    - The command `du -sh [directory]` reports the total space used by a directory.
    - The command `du --max-depth=1 -h /project/uma_lisa_komoroske/` reports space usage for each top-level directory in the project space, which helps you see how much of the shared total is in use and how much is still available.</font>
- **Be EXTREMELY careful with the `rm` and `mv` commands**.
    - <font size=2>You should only use these commands within your own folder, and even then with caution. Removing files through FileZilla can be a more cautious option.
    - Use caution with the `mv` command when moving or renaming files: by default it overwrites an existing file at the destination without asking.</font>
- **Do not navigate into other people's directories within the project space**.
    - <font size=2>You should only move within your own folders or within the project-specific folders that you are working with.
    - This is different, of course, if you have a prior arrangement and/or are collaborating.</font>
- **Compare md5sums when downloading or transferring files to and from the cluster**
    - <font size=2>Files may become corrupted during transfer. This can be a very big problem if data are lost and we aren't aware of it.</font>
- **If running any commands that take time (e.g. >30s)**
    - <font size=2>You should submit a request for an interactive job, so as not to pull resources from others on the head node.</font>
    - <font size=2>The command below will start an interactive job, and the parameters can be edited to suit: `bsub -Is -q interactive -R "rusage[mem=8192] span[hosts=1]" -n 1 -W 4:00 /bin/bash`</font>
- **If you have technical issues**, <font size=2>the cluster IT people are *very* helpful (and nice!). They can be reached at hpcc-support@umassmed.edu</font>

## Learning command line and basic commands

**Before using our project space, you need to have proficiency with the command line**. This includes: 1) Navigating through directories, 2) Making directories, 3) Adding and safely removing files, 4) Editing text with a text editor, 5) Writing scripts. <font size=2>You should be comfortable with all [commands in this bootcamp](http://korflab.ucdavis.edu/bootcamp.html). Practice these in your own space (`/home/[user_name]`), and you will be given access to the project space once you have displayed proficiency in these basic commands. Refer to the <a href="#cheatsheet"> Unix command cheatsheet </a> at the bottom of this page for help.</font>

## Installing packages

If you need to install a package, you can install it to your personal directory using commands like `git clone [repository]`, `apt-get`, or `pip install` (note that `apt-get` normally requires root privileges, so it may not work for regular users). <font size=2>However, sometimes there are loads of dependencies and it's much easier to have a package made available globally. In this case, email the cluster IT people at hpcc-support@umassmed.edu. Send them a link to the software/package and ask if they'd kindly be willing to install it. They will make the package globally available to all users via `module load`.

Another fantastic option for installing packages, one that gives flexibility and ease, is to create and use a [conda environment](https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf).

**For more info about installing/activating packages, accessing the cluster, submitting jobs, and transferring files, see our [Intro to using the MGHPCC](https://hackmd.io/dbB1u8JuT3qJDqp5fkE9dw?view)**</font>

## Tips and tricks

### <p id="cheatsheet">Unix command cheatsheet</p>

#### Basic navigation:
`ssh` - secure shell, log into a machine
 `clear` - clears the terminal of previous commands
 `bash` - enters the bash shell (default on ND machines is tcsh)
 `echo` - echo typed words, or a variable (which has $ before it)
 `declare` - defines a variable, e.g. `declare myvar=hello`. No spaces permitted without escape or quotes.
 `pwd` - displays the absolute path to the current directory from root (/)
 `ls` - listing or "let's see". lists the files in the current folder.
 `man COMMAND` - displays the manual pages for the command
 `cd` - change current directory
 `mkdir` - make a new directory
 `less FILE` - opens a readable file, does not allow you to edit it.
 `nano` - opens a file (or creates a new one) in a word pad style editor
 `mv FILE DESTINATION` - moves a file from one point to another
 `mv FILE NEWNAME` - renames a file
 `cp FILE DESTINATION` - copies a file from one point to another
 `rm FILE` - removes a file (-r flag needed to remove a directory)
 `scp FILE COMPUTER:DESTINATION` - copies a file from the current computer to another computer, into the location designated after the :
 `scp COMPUTER:FILE DESTINATION` - copies a file from another computer to the current computer.
 `head` - lists the first ten lines of a file to the terminal
 `tail` - list the last ten lines of a file to the terminal
 `info COMMAND` - more extensive info than on the man pages
 `date` - outputs date and time
 `which COMMAND` - tells you where a program exists if it is within your PATH variable
 `ln -s DESTINATION LINKNAME` - creates a symbolic link named LINKNAME (e.g. linktodropbox) that points to the destination (e.g. a dropbox folder in the course directory). Deleting a softlink does not affect the actual directory (unless you use `rm -r`).
 `gzip FILE` - compresses a file
 `gzip -d FILE` - decompress a file, file must end in .gz
 `tar cf ARCHIVE.tar FILES` - bundles files together into a tar archive
 `tar xfz FILE` - untars files: e(x)tract (f)rom g(z)ip

#### File System/System Information:
`free -g` - gives you the amount of RAM on the current machine in GB (-g)
 `du -h` - gives you disk usage information (in human readable format)
 `df -h` - gives you info about amount of disk free (in human readable format)
 `top` - shows all processes running and who owns them, their CPU usage, memory usage, and other things (q to quit)

#### Obtaining Data and Software:
`wget http://...` - downloads the file at an http link (can also be used with an ftp link)
 `gzip -d file.gz` - decompresses file.gz (can be a tar.gz file)
 `bunzip2 file.bz2` - decompresses file.bz2 (can be tarred)
 `unzip file.zip` - decompresses file.zip (can be tarred)
 `tar -xf file.tar` - unpacks file.tar
 `tar -zxf file.tar.gz` - decompresses and unpacks a tar.gz file
 `tar -zcvf file.tar.gz targetdir` - makes file.tar.gz out of targetdir (z = gzip, c = create)

#### Installing Software:
`./configure --prefix=/path/to/install` - configures your source to install. Leave off the prefix if you have root and want to install to the default system location (usually /usr/local)
 `make` - creates binaries
 `make install` - installs to the path specified in prefix
 `arch` - checks whether we're running a 32- or 64-bit machine

#### Special Characters:
`/` - separates directory names in a file path (ex: MyDocuments/MyMovies/movie.mov). Also refers to the "root" directory (/)
 `\` - "escape character", causes the single character following it to be read literally (ex: echo \\ echoes a single \)
 `$` - signals a variable. Inside double quotes it still signifies a variable unless it is escaped with `\` (e.g. `echo "$HOME is my home"`); inside single quotes it is read literally.
 `*` - used to match any number of characters, e.g. `test*.txt`, `test*`, `*`
 `?` - used to match any single character, e.g. `test1?.txt`, `test?.txt`
 `;` - used to signal the end of a command on a single line.
 `#` - designates a comment in bash. Anything following it on the line will not be read by the computer.
 `#!` - hash bang or shebang, tells the computer you are about to give it a program to use to interpret the code you are providing. i.e. #!/bin/bash
 `:` - used to separate directory locations in PATH variable.
 `>` - used to capture output that would go to the screen and put it into a file.

#### Monitoring jobs
To check status of jobs: `bjobs`
To "peek" into log and error files of a specific running job: `bpeek [jobID]`
To kill a specific job: `bkill [jobID]`
To kill all running jobs: `bkill 0`
List the users on the machine and the resources in use: `top`

#### Loading modules
To check modules available on cluster: `module avail`
To load a module for the current session: `module load [package_name]`

#### Submitting jobs
Generic beginning of script for submitting to the **long queue**:
```
#!/bin/bash
#Purpose of script
#BSUB -q long
#BSUB -W 40:00
#BSUB -R rusage[mem=16000]
#BSUB -n 4
#BSUB -R span[hosts=1]
#BSUB -e error_file.err
#BSUB -oo log_file.log

#Load required modules - examples
module load star/2.7.0e
module load gcc/8.1.0
module load samtools/1.9
module load RSEM/1.3.0
```

Generic beginning of script for submitting to the **short queue**:
```
#!/bin/bash
#run fastqc
#BSUB -q short
#BSUB -W 4:00
#BSUB -R rusage[mem=1000]
#BSUB -n 8
#BSUB -R span[hosts=1]
#BSUB -e error_file.err
#BSUB -oo log_file.log

#Load required modules, e.g.
module load fastqc/0.11.5
```

#### Hints:
 - Tab will complete names to the next point of ambiguity
 - Hit the up key to go back through your previous commands
 - You can chain together multiple directories for commands. Ex. `cd ../../..` - goes up 3 directories
 - `Ctrl-c` kills a running program
 - Add directories to your PATH with `declare PATH=$PATH:`
 - `.bashrc` is the setup file for your bash terminal. You can add any code to this file that you wish to be executed upon entering bash (such as adding things to your path, aliases, variable declarations, etc).
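
The md5sum rule in the basic rules can be made concrete with a short sketch. It assumes GNU `md5sum` and `awk` are available, as they are on most Linux systems. The helper name `same_md5` and the file names are hypothetical, chosen only for illustration, and the demo copies a throwaway file locally to stand in for a real transfer.

```shell
#!/bin/bash
# Sketch only: same_md5 is a hypothetical helper, not a cluster-provided command.
# It succeeds (exit status 0) only when both files exist and have the same MD5 checksum.
same_md5() {
    local sum_a sum_b
    # md5sum prints "<checksum>  <filename>"; awk keeps only the checksum
    sum_a=$(md5sum "$1" 2>/dev/null | awk '{print $1}')
    sum_b=$(md5sum "$2" 2>/dev/null | awk '{print $1}')
    # An empty checksum means the file could not be read, so treat that as a mismatch
    [ -n "$sum_a" ] && [ -n "$sum_b" ] && [ "$sum_a" = "$sum_b" ]
}

# Demo with throwaway files standing in for the original and the transferred copy
workdir=$(mktemp -d)
printf 'ACGT\n' > "$workdir/original.fastq"
cp "$workdir/original.fastq" "$workdir/transferred.fastq"

if same_md5 "$workdir/original.fastq" "$workdir/transferred.fastq"; then
    echo "checksums match"
else
    echo "checksums differ: transfer the file again"
fi
```

For a real transfer the two files live on different machines, so you would run `md5sum` once on each side and compare the printed checksums.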