---
title: "Marne's Attempt at Bioinformatics"
breaks: false
tags: General
---
# Trial and error
### Quick links
- **[Powerpoint from Suzy](https://github.com/CBG-Conservation-Genomics/BotanyWorkshop2024/blob/main/2.BaselineSkills/2.BaseSkills.pptx)**
- **[Linux commands from Suzy](https://github.com/CBG-Conservation-Genomics/BotanyWorkshop2024/blob/main/2.BaselineSkills/2.Bioinfo_unix_command_cheatsheet.pdf)**
- **[Quest User Guide](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=505)**: Extensive manual on Quest
- **[Slurm Guide](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=1964#section-partitions)**: In depth guide for submitting jobs to Quest using Slurm
- **[Basic Research Computing Guide](https://services.northwestern.edu/TDClient/30/Portal/KB/ArticleDet?ID=2004#quest-fundamentals)**: Suite of various tutorial videos
- **[Globus](https://app.globus.org/file-manager)**: Globus file transfer service for up/downloading files to Quest
- **[Job Resource Logs](https://docs.google.com/spreadsheets/d/1P1zpt1aznNnXwMjUfynWioOXvGkjydFfhUBG_EppNMQ/edit?usp=sharing)**: List of jobs and their associated resources/efficiencies
- **[Jupyter Hub](https://jupyter.questanalytics.northwestern.edu)**: Link to Jupyter Hub (requires NU VPN)
---
### Logging in to quest
1) Open the terminal (Command Prompt) on the laptop.
2) Type the command below, then enter your password. Nothing shows up on screen while you type the password, but it is being entered.
```bash!
C:\Users\mquig>ssh imh4101@quest.northwestern.edu
Password: ends in *
```
3) A bunch of stuff will pop up.
4) Type **exit** to log out.
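Optional: to save typing the full hostname every time, you can add a shortcut to `~/.ssh/config` on your laptop. This file and the alias name `quest` are my own addition, not from the workshop:

```bash!
# ~/.ssh/config  (on your laptop, not on Quest)
Host quest
    HostName quest.northwestern.edu
    User imh4101
```

After that, `ssh quest` does the same thing as the full command.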
---
### Globus
**What is globus????**
Globus is a file-transfer service: it shows you where your files live and lets you move them between systems (like your laptop and Quest). So exciting.
You can access this with your NU username and password.
**[Link to Globus](https://www.globus.org/)**
#### How to use Globus...
1) Go to file manager.
2) Search for Northwestern Quest under Collections.
3) View your project folder by typing **/projects/p32585/** in the path.
4) Jeremie put the elm files in there.
---
### Next, let's talk about modules
Ok, so modules are pre-installed software packages (like R) that you load into your session when you need them. They are tools.
```bash!
#load a module
module load name_of_module/version
#list the modules currently loaded in your session
module list
#see all the modules available on Quest
module avail
#unload one module (purge doesn't take a module name -- use unload)
module unload name_of_module/version
#unload ALL modules at once
module purge
```
These are some examples of modules you can install:
- R and Rstudio
- Conda/Mamba
- Jupyter
So Brenden used conda a lot in his analysis, so let's try loading that.
```bash!
#conda is an environment manager for data science packages like R and Python
#(it's the tool that ships with the Anaconda/Miniconda distributions)
module purge
module load python-miniconda3/4.12.0
#loading the module prints nothing -- that's normal
#now initialize your shell so the conda command works
conda init bash
source ~/.bashrc
#now conda is ready to use every time I log in!
```
#### Loaded modules are cleared every time you close the terminal!!!
---
### In the slurm, we all fam
**So what is a slurm???**
- Slurm is the scheduler you use to submit jobs to the compute nodes
- basically it puts your job in the queue to run your analysis
```bash!
#SBATCH --account=<ALLOCATION NAME> ## Required: your allocation/account name, i.e. eXXXX, pXXXX or bXXXX
#SBATCH --partition=<PARTITION TYPE> ## Required: what type of partition (buyin, short, normal, long, gengpu, genhimem, etc)
#SBATCH --time=00:10:00 ## Required: How long will the job need to run (remember different partitions have restrictions on this parameter) hours:minutes:seconds
#SBATCH --mem=1G ## Suggested: how much RAM do you need per computer/node (this affects your FairShare score so be careful to not ask for more than you need)
#SBATCH --nodes=1 ## Suggested: how many computers/nodes do you need (no default value but most likely 1)
#SBATCH --ntasks-per-node=1 ## Suggested: how many CPUs/processors do you need per computer/node (default value 1)
#SBATCH --job-name=<JOB NAME> ## Name of job
#SBATCH --output=slurm_outlog.log ## File to log stdout
#SBATCH --error=slurm_errors.log ## File to log errors
#SBATCH --mail-type=<NOTIF ALERTS> ## When to notify user via email (begin, end, fail, all)
#SBATCH --mail-user=<email address> ## Email address for sending notifications
```
**NOTE: your FairShare score is kinda like a credit score... if you ask for 64 GB of RAM but only use 10 GB, your score drops, and jobs from lower-scoring users get lower priority when Slurm decides the run order**
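To make the FairShare idea concrete, memory efficiency is just used/requested. A quick back-of-envelope check with the numbers above (my own arithmetic; `seff` reports the same idea as its "Memory Efficiency" line):

```bash!
# Hypothetical job: requested 64 GB of RAM, actually used 10 GB
requested_gb=64
used_gb=10
# integer percentage of requested memory actually used
efficiency=$(( 100 * used_gb / requested_gb ))
echo "Memory efficiency: ${efficiency}%"   # prints 15%
```

15% efficiency means you should request a lot less memory next time.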
---
### Lets put everything together and write a script!
```bash!
#!/bin/bash
#SBATCH --account=p31911
#SBATCH --partition=short
#SBATCH --time=00:05:00
#SBATCH --mem=1G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --job-name=slurm_test
#SBATCH --output=slurm_outlog.log
#SBATCH --error=slurm_errors.log
module purge ##remove previously loaded modules
module load example_module
eval "$(conda shell.bash hook)" ##activate conda
conda activate /home/imh4101/.conda/envs/genomics ##specify environment
bash ./test_bash.sh ##run a bash script called test_bash
```
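The last line of the script calls `test_bash.sh` -- here's a minimal placeholder version you could use to confirm the whole pipeline works before swapping in a real analysis (the contents are my own sketch):

```bash!
#!/bin/bash
# test_bash.sh -- throwaway script just to prove the Slurm setup works
echo "Running on node: $(hostname)"   # which compute node did we land on?
echo "Started at: $(date)"            # timestamp for the log
echo "test_bash.sh finished OK"       # look for this line in slurm_outlog.log
```

If `slurm_outlog.log` ends with the "finished OK" line, the job ran.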
Great! Now you have a script... but how do you make it go???
```bash!
sbatch <script> OR sbatch slurm_test.sh #submit a bash script to slurm
scancel <job ID> #cancel a job
sacct -X #Shows the statuses of jobs you've run in the past 24 hours
seff <job ID> #Shows resource usage and other statistics for a COMPLETED job (if still running, the numbers will be inaccurate)
squeue -u <netID> #Shows current jobs running for your user id
#R = Running
#PD = Pending
checkjob <job ID> #Prints out details of a job
```
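If you have a lot of jobs queued, you can count them by state. The real command would be `squeue -u <netID> -h -o "%T" | sort | uniq -c` (`%T` prints the job state, `-h` drops the header); here I pipe in made-up states so the example is self-contained:

```bash!
# stand-in for: squeue -u <netID> -h -o "%T"
printf "RUNNING\nPENDING\nPENDING\n" | sort | uniq -c
# prints:
#   2 PENDING
#   1 RUNNING
```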
Jobs can take HOURS to get through the queue and run, so most people submit them to run overnight and hope for the best.