# Lab 5

1. `ssh` onto bi278 like usual, then `mkdir labwk5`.

# Command line BLAST

2. To run a BLAST search from bi278, I need to tell BLAST+ where to look for my databases. This can be checked by running `echo $BLASTDB`. This didn't work the first time, so to tell BLAST+ where my databases are I used the command `export BLASTDB="$BLASTDB:/courses/bi278/Course_Materials/blastdb"`.
3. I then checked that this worked by running `echo $BLASTDB` again. This time I got the result `/courses/bi278/Course_Materials/blastdb`, so I know that it worked. Note that the `export` command will need to be used every time I start a new session unless/until I modify the `.bash_profile`.
4. Explore how BLAST+ works using the commands `blastn -help` and `blastp -help`.

## **Run a Command Line BLASTP**

5. My FASTA files from the last lab were not complete, so I am using the FASTA files from lab_04: `cd /courses/bi278/Course_Materials/lab_04/` and `ls`, which yielded `bbqs395 bbqs433 bhqs69`, so `cd bbqs395` and then `ls`. To figure out which of these three files I needed, I used the command `head PRO*`. Using these heads I determined I would need `PROKKA_09162022.faa` and `PROKKA_09162022.ffn`.
6. I then needed to copy these into the labwk5 directory that I had made, using the commands `cp PROKKA_09162022.faa ~/labwk5` and `cp PROKKA_09162022.ffn ~/labwk5`.
7. I always check to make sure this works: `cd ~`, `cd labwk5`. Once I've made sure the PROKKA files are there, I can finally move back to separating the records in a multi-FASTA file.
8. `awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">"$0}' ORS="" PROKKA_inputfile | head` In this case I left `hypothetical protein` as the search pattern, and the PROKKA_inputfile was `PROKKA_09162022.faa` and `PROKKA_09162022.ffn`.
9. To write the hypothetical proteins to their own FASTA file (just the proteins): `awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">"$0}' ORS="" PROKKA_09162022.faa > hypPROKKA`
10.
    `awk 'BEGIN {RS=">"; srand()} (rand()<=.002) {print ">"$0}' ORS="" hypPROKKA` This prints a random subset of the records (each record is kept with probability 0.002). Then I'm moving this into another FASTA file.
11. `awk 'BEGIN {RS=">"; srand()} (rand()<=.002) {print ">"$0}' ORS="" hypPROKKA > smPROKKA`
12. Run a protein BLAST: `blastp -task blastp-fast -num_threads 2 -db database_name -evalue 1e-6 -query faa_file` In this case, use `refseq_select_prot` as the database and `smPROKKA` as the faa file. Originally this didn't work, so I got rid of `-evalue 1e-6` and added the full path to the database, so the command looked like `blastp -task blastp-fast -num_threads 2 -db /courses/bi278/Course_Materials/blastdb/refseq_select_prot -query smPROKKA`
13. Next we wanted to try adding the specifications `-outfmt "6 std ppos qcovs stitle sscinames staxid" -max_target_seqs 10` or `-outfmt 3 -num_descriptions 10 -num_alignments 10`. In this case I used the first one, so my command looked like `blastp -task blastp-fast -num_threads 2 -db /courses/bi278/Course_Materials/blastdb/refseq_select_prot -outfmt "6 std ppos qcovs stitle sscinames staxid" -max_target_seqs 10 -query smPROKKA > subsetblast`
14. While I was waiting for this to run, I went back to this week's folder in bi278 to look at the example output files: `cd ~`, `cd /courses/bi278/Course_Materials/lab_05`, `ls`, `head example*blastp*outfmt*txt`
15. Then we can save the subject sequence IDs as a list: `awk '$1=="BO395_01177" {print $2}' example_blastp_outfmt6.txt | sort | uniq > example_blastp_BO395_01177.sseqids.txt`
16. Then we can use `blastdbcmd` to find entries from the search target database that match the subject sequence IDs, using `blastdbcmd -db refseq_select_prot -entry_batch example_blastp_BO395_01177.sseqids.txt`

## **Run Command Line BLASTN**

17. Now to use the nucleotide BLAST search, mainly using `blastn` instead of `blastp` (proteins).
18.
    The basic command is `blastn -task megablast -num_threads 2 -query ffn_file -db nt -evalue 1e-6`. In my case I modified this a little so it looked like `blastn -task megablast -num_threads 2 -query ffn_file -db /courses/bi278/Course_Materials/blastdb/nt`
19. To run a nucleotide BLAST search use the command ``. Since this takes a while, I skipped ahead to use the output files in the lab folder. However, I didn't see these files, so I am skipping ahead and ignoring question 1.

## **Parse the Output of BLAST Results**

20.
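
The column filter from step 15 is a typical first move when parsing tabular (`-outfmt 6`) results, so here is a minimal sketch of it on a toy table. Everything in the table is made up for illustration (the `WP_...` IDs, the scores, the second query ID), and the helper name `unique_sseqids` is hypothetical. It assumes the standard 12 tab-separated `std` columns, with the query ID in column 1 and the subject ID in column 2.

```shell
#!/usr/bin/env bash
# Toy outfmt 6 table: tab-separated, column 1 = qseqid, column 2 = sseqid.
# Every ID and number below is invented for illustration only.
toy=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  BO395_01177 WP_000002.1 91.2 120 10 0 1 120 1 120 1e-60 210 \
  BO395_01177 WP_000001.1 88.0 118 14 0 1 118 3 120 2e-55 195 \
  BO395_01177 WP_000002.1 45.0 60 33 0 61 120 200 259 1e-10 50 \
  BO395_00042 WP_000003.1 99.0 300 3 0 1 300 1 300 0.0 600 \
  > "$toy"

# Hypothetical helper: print the unique subject IDs hit by one query.
# $1 = query ID, $2 = outfmt 6 file. `sort -u` does the same job as `sort | uniq`.
unique_sseqids() {
  awk -F'\t' -v q="$1" '$1 == q {print $2}' "$2" | sort -u
}

result=$(unique_sseqids BO395_01177 "$toy")
echo "$result"
rm -f "$toy"
```

The repeated hit to the same subject collapses to one line, which is what makes the resulting list usable as an `-entry_batch` file for `blastdbcmd` in step 16.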