# Lab 5
1. `ssh` onto bi278 like usual, `mkdir labwk5`
# Command line BLAST
2. To run a BLAST search from bi278 we need to tell BLAST+ where to look for the databases. This can be checked by running the following command:
`echo $BLASTDB`
This didn't work the first time, so to tell BLAST+ where the databases are I used the command
`export BLASTDB="$BLASTDB:/courses/bi278/Course_Materials/blastdb"`
3. I then checked that this worked by running the `echo $BLASTDB` command again, and this time I got the result `/courses/bi278/Course_Materials/blastdb`, so I know that it worked.
Note that this command will need to be used every time I start a new session unless/until I modify the `.bash_profile`.
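
To avoid retyping the export, a guarded version of it could go in `~/.bash_profile`. The lines below are only a sketch, assuming the same database path as in the export command above; the variable name `blast_dir` is my own. The guard keeps the path from being added twice and avoids a leading colon when `BLASTDB` starts out empty.

```shell
# Directory holding the course BLAST databases (path from the export above).
blast_dir="/courses/bi278/Course_Materials/blastdb"

# Append blast_dir to BLASTDB only if it is not already listed,
# and skip the leading colon when BLASTDB is empty or unset.
case ":${BLASTDB:-}:" in
  *":${blast_dir}:"*) ;;
  *) BLASTDB="${BLASTDB:+${BLASTDB}:}${blast_dir}" ;;
esac
export BLASTDB

# Show the resulting search path.
echo "$BLASTDB"
```

Running these lines more than once in the same session leaves `BLASTDB` unchanged after the first time.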
4. Explore how BLAST+ works using the commands `blastn -help` and `blastp -help`
## **Run a Command Line BLASTP**
5. My own FASTA files from the lab were not complete, so I am using the FASTA files from lab_04
`cd /courses/bi278/Course_Materials/lab_04/`
`ls`
which then yielded `bbqs395 bbqs433 bhqs69`
so `cd bbqs395` and then `ls`
To figure out which of the files I needed, I used the command `head PRO*`
Using these heads I determined I would need `PROKKA_09162022.faa` and `PROKKA_09162022.ffn`
6. So then I needed to copy these into the labwk5 directory that I had made, using the commands
`cp PROKKA_09162022.faa ~/labwk5` and `cp PROKKA_09162022.ffn ~/labwk5`
7. I always check to make sure this worked
`cd ~` `cd labwk5`
and once I've made sure the PROKKA files are there I can finally move back to separating the records in a multi-FASTA file
8. `awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">"$0}' ORS="" PROKKA_inputfile | head`
In this case I kept `hypothetical protein` as the search pattern and ran the command with `PROKKA_09162022.faa` and then `PROKKA_09162022.ffn` in place of `PROKKA_inputfile`
9. to write the hypothetical proteins to their own fasta file (just the proteins):
`awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">"$0}' ORS="" PROKKA_09162022.faa > hypPROKKA`
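
To see what the `RS=">"` trick does, the same awk command can be tried on a tiny made-up file first. This is just a sketch: the `TOY_` IDs, sequences, and file names are invented, and everything happens in a temporary directory so nothing in labwk5 is touched.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)

# Toy multi-FASTA file; IDs and sequences are invented for illustration.
cat > "$workdir/toy.faa" <<'EOF'
>TOY_00001 hypothetical protein
MKTAYIAKQR
>TOY_00002 DNA polymerase III subunit beta
MKFTVEREHL
>TOY_00003 hypothetical protein
MSLNDFGK
EOF

# RS=">" makes each FASTA entry (header plus sequence lines) one awk record.
# ORS="" stops awk from adding an extra newline after each printed record.
awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">"$0}' ORS="" \
    "$workdir/toy.faa" > "$workdir/toy_hyp.faa"

# Count the headers before and after filtering.
total=$(grep -c '^>' "$workdir/toy.faa")
kept=$(grep -c '^>' "$workdir/toy_hyp.faa")
echo "kept $kept of $total records"
```

The same `grep -c '^>'` count is a quick way to check how many records ended up in `hypPROKKA`.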
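10. To take a small random subset of the hypothetical proteins (each record is kept with probability 0.002): `awk 'BEGIN {RS=">"; srand()} (rand()<=.002) {print ">"$0}' ORS="" hypPROKKA`
Then I wrote this subset to another FASTA file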
11. `awk 'BEGIN {RS=">"; srand()} (rand()<=.002) {print ">"$0}' ORS="" hypPROKKA > smPROKKA`
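
Because `srand()` with no argument seeds from the clock, every run gives a different subset. If a repeatable subset is wanted, one option is to pass a fixed seed. The sketch below is my own variation, not the lab's command: the function name `sample_records`, the seed `42`, the fraction `0.1`, and the toy file are all assumptions. The added `NF > 0` test skips the empty record that sits before the first `>`, which the original command could occasionally print as a lone `>`.

```shell
workdir=$(mktemp -d)

# Build a toy FASTA file with 200 invented records.
awk 'BEGIN { for (i = 1; i <= 200; i++) printf ">TOY_%05d hypothetical protein\nMKT\n", i }' \
    > "$workdir/toy_hyp.faa"

# sample_records SEED FRACTION FILE
# Keeps each non-empty record with probability FRACTION, using a fixed seed.
sample_records() {
    awk -v seed="$1" -v frac="$2" \
        'BEGIN {RS=">"; srand(seed)} NF > 0 && rand() <= frac {print ">"$0}' \
        ORS="" "$3"
}

sample_records 42 0.1 "$workdir/toy_hyp.faa" > "$workdir/sample_a.faa"
sample_records 42 0.1 "$workdir/toy_hyp.faa" > "$workdir/sample_b.faa"

# The same seed should give the same subset on the same awk implementation.
cmp -s "$workdir/sample_a.faa" "$workdir/sample_b.faa" && echo "samples are identical"

# How many records were sampled (depends on the seed and the awk version).
sampled=$(grep -c '^>' "$workdir/sample_a.faa" || true)
echo "sampled $sampled of 200 records"
```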
12. Run a protein blast
`blastp -task blastp-fast -num_threads 2 -db database_name -evalue 1e-6 -query faa_file`
in this case, use `refseq_select_prot` as the database and `smPROKKA` as the faa file
Originally this didn't work, so I got rid of `-evalue 1e-6` and gave the full path to the database, so the command looked like
`blastp -task blastp-fast -num_threads 2 -db /courses/bi278/Course_Materials/blastdb/refseq_select_prot -query smPROKKA`
13. Next we wanted to try adding the output format specifications
` -outfmt "6 std ppos qcovs stitle sscinames staxid" -max_target_seqs 10`
or `-outfmt 3 -num_descriptions 10 -num_alignments 10`
in this case I used the first one, so my command looked like
``blastp -task blastp-fast -num_threads 2 -db /courses/bi278/Course_Materials/blastdb/refseq_select_prot -outfmt "6 std ppos qcovs stitle sscinames staxid" -max_target_seqs 10 -query smPROKKA > subsetblast``
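
The tabular `-outfmt 6` output has no header line, so it can help to add one when reading the file later. The sketch below assumes the field list used above, where `std` expands to the twelve standard columns; the result row, the accession, and the file names are invented for illustration.

```shell
workdir=$(mktemp -d)

# The 12 "std" columns followed by the five extra fields requested above.
cols="qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore ppos qcovs stitle sscinames staxid"

# One invented result row, tab-separated like -outfmt 6 output.
printf 'TOY_00001\tWP_000000001.1\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410\t99.00\t100\thypothetical protein [Toy species]\tToy species\t12345\n' \
    > "$workdir/toy_blast.tsv"

# Write a tab-separated header line, then append the results.
{ printf '%s\n' "$cols" | tr ' ' '\t'; cat "$workdir/toy_blast.tsv"; } \
    > "$workdir/toy_blast_with_header.tsv"

# Sanity check: list the distinct field counts across all lines.
awk -F'\t' '{print NF}' "$workdir/toy_blast_with_header.tsv" | sort -u
```

If the last command prints more than one number, the header and the data rows do not line up. Splitting on tabs matters here because `stitle` can contain spaces.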
14. While I was waiting for this to run, I went back to this week's folder in bi278 to look at the example output files
`cd ~` `cd /courses/bi278/Course_Materials/lab_05` `ls`
`head example*blastp*outfmt*txt`
15. then we can save the subject sequence ids as a list
`awk '$1=="BO395_01177" {print $2}' example_blastp_outfmt6.txt | sort | uniq > example_blastp_BO395_01177.sseqids.txt`
16. Then we can use `blastdbcmd` to find entries from the search target database that match the subject sequence ids using `blastdbcmd -db refseq_select_prot -entry_batch example_blastp_BO395_01177.sseqids.txt`
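
The extraction in step 15 can be tried on a small made-up table to check that duplicates really collapse. In this sketch the query ID `BO395_01177` is the one from the example file, but the subject accessions, scores, and file names are invented, and the shell variable `query` is my own addition so the ID only has to be typed once.

```shell
workdir=$(mktemp -d)

# Invented tabular hits: query ID in column 1, subject ID in column 2.
{
    printf 'BO395_01177\tWP_000000001.1\t98.5\n'
    printf 'BO395_01177\tWP_000000002.1\t91.0\n'
    printf 'BO395_01177\tWP_000000001.1\t75.2\n'
    printf 'BO395_00042\tWP_000000003.1\t88.8\n'
} > "$workdir/toy_outfmt6.txt"

# Keep column 2 for rows whose query ID matches, then drop duplicates.
query="BO395_01177"
awk -v q="$query" '$1 == q {print $2}' "$workdir/toy_outfmt6.txt" | sort -u \
    > "$workdir/toy_${query}.sseqids.txt"

cat "$workdir/toy_${query}.sseqids.txt"
```

Here `sort -u` does the same job as `sort | uniq`. A list file like this is the kind of input that `-entry_batch` takes in step 16.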
## **Run Command Line BLASTN**
17. Now to use the nucleotide BLAST search, mainly using `blastn` instead of `blastp` (proteins)
18. The basic command is `blastn -task megablast -num_threads 2 -query ffn_file -db nt -evalue 1e-6`
In my case I modified this a little so it looked like
`blastn -task megablast -num_threads 2 -query ffn_file -db /courses/bi278/Course_Materials/blastdb/nt`
19. to run a nucleotide BLAST search use the command ``
Since this takes a while, I skipped ahead to use the output files in the lab folder.
However, I didn't see these files, so I am skipping ahead and ignoring question 1
## **Parse the Output of BLAST Results**
20.