# BI278 Lab #5 Notes
## Exercise 1
First, I ran `echo $BLASTDB` to check where BLAST currently looks for its databases.
Then I ran `export BLASTDB="$BLASTDB:/courses/bi278/Course_Materials/blastdb"`, then ran `echo $BLASTDB` again to confirm that the course database directory was now on the BLAST database search path.
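
One small caveat with the `export` line: if `BLASTDB` starts out empty, the result begins with a stray `:`, and running the line twice adds the directory twice. A minimal sketch of a more defensive version, assuming a POSIX-style shell (the variable name `db_dir` is just for illustration):

```shell
# Directory taken from the export command above.
db_dir="/courses/bi278/Course_Materials/blastdb"

case ":${BLASTDB:-}:" in
  *":${db_dir}:"*)
    ;;  # already on the search path, nothing to do
  *)
    # ${BLASTDB:+$BLASTDB:} expands to "old value plus colon" only when
    # BLASTDB is non-empty, which avoids a leading ":" when it starts unset.
    export BLASTDB="${BLASTDB:+$BLASTDB:}${db_dir}"
    ;;
esac

echo "$BLASTDB"
```

Running this more than once leaves the directory listed only once.
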
I ran `awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">" $0}' ORS="" lab_04/PROKKA_09242022.faa` which does:
* Reads the file named at the end of the command, splitting it into records at the record separator you specify (RS=">")
* Matches the pattern you give it (/pattern/) against each record, here the phrase ‘hypothetical protein’
* When a record contains that pattern, prints the symbol ‘>’ followed by the record ($0)
* When it prints, adds no extra output record separator (ORS=""), since each FASTA record already ends in a newline
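
To see the steps above on something small, here is a sketch that runs the same `awk` command on a toy FASTA file. The sequence names and residues are made up, and the temporary file names are only for illustration.

```shell
# Toy FASTA file; the sequence names and residues are invented.
tmp=$(mktemp -d)
cat > "$tmp/toy.faa" <<'EOF'
>seq1 hypothetical protein
MKTAYIAK
>seq2 DNA polymerase III subunit beta
MAAGLLQR
>seq3 hypothetical protein
MGGSWTPE
EOF

# Same command as above, pointed at the toy file.
awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">" $0}' ORS="" "$tmp/toy.faa" > "$tmp/hypothetical.faa"

# Count the headers that were kept.
kept=$(grep -c '^>' "$tmp/hypothetical.faa")
echo "records kept: $kept"
```

Only seq1 and seq3 should end up in the output file, because seq2's record does not contain the phrase.
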
I ran the same command on the .ffn file as well.
I made a directory for this week, lab_05, with `mkdir lab_05`, then ran `awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">" $0}' ORS="" lab_04/PROKKA_09242022.faa > lab_05/protein_keeper`, which wrote the output to the new file protein_keeper.
Then, I ran `awk 'BEGIN {RS=">"; srand()} (rand()<=.002) {print ">"$0}' ORS="" lab_05/protein_keeper` to print a random subset of my data (each record is kept with probability 0.002).
I saved a subset as its own FASTA file using the same redirect operator `>` as above: `awk 'BEGIN {RS=">"; srand()} (rand()<=.002) {print ">"$0}' ORS="" lab_05/protein_keeper > lab_05/subset_01`
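
Because `srand()` with no argument seeds from the clock, each run of the command above picks a different subset, so the records printed to the screen and the ones saved to subset_01 are not the same set. A sketch of a repeatable variant, assuming a fixed seed is acceptable (`subset_fasta`, the seed 42, and the toy file are all hypothetical, not part of the lab):

```shell
tmp=$(mktemp -d)

# Build a toy FASTA file with 500 made-up records.
awk 'BEGIN { for (i = 1; i <= 500; i++) printf ">seq%d hypothetical protein\nMKT\n", i }' > "$tmp/toy.faa"

# Hypothetical helper: arguments are file, keep-probability, seed.
subset_fasta() {
  awk -v p="$2" -v seed="$3" '
    BEGIN { RS = ">"; srand(seed) }
    # NR > 1 skips the empty record that sits before the first ">"
    NR > 1 && rand() <= p { print ">" $0 }
  ' ORS="" "$1"
}

subset_fasta "$tmp/toy.faa" 0.1 42 > "$tmp/subset_a.faa"
subset_fasta "$tmp/toy.faa" 0.1 42 > "$tmp/subset_b.faa"

# Same seed, same input: the two subsets should be identical.
cmp -s "$tmp/subset_a.faa" "$tmp/subset_b.faa" && echo "subsets match"
```

The `NR > 1` guard is there because a file that starts with `>` yields an empty first record, which the random test could otherwise select and print as a lone `>`.
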
Then, I ran `blastp -task blastp-fast -num_threads 2 -db refseq_select_prot -evalue 1e-6 -outfmt "6 std ppos qcovs stitle sscinames staxid" -max_target_seqs 10 -query lab_05/subset_01 > lab_05/subset_blastp_fmt3.txt` to save the tabular results in the file subset_blastp_fmt3.txt.
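Then, I ran `awk '$1=="BO395_01177" {print $2}' lab_05/subset_blastp_fmt3.txt | sort | uniq > subset_blastp_BO395_01177.sseqid.txt` to save the unique subject IDs for that query as a list.
Then, I ran `blastdbcmd -db refseq_select_prot -entry_batch lab_05/subset_blastp_BO395_01177.sseqid.txt`, which showed me nothing. Note that the list was written to the current directory, while `blastdbcmd` was pointed at lab_05/, so the two paths do not match; that is the first thing to check.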
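
A sketch of the same ID-extraction step with a check that the list is not empty before it is handed to `blastdbcmd`. The table here is a toy two-column file with invented IDs, and `query`, `ids`, and the temporary paths are placeholders; the `blastdbcmd` call is left as a comment because it needs the real database.

```shell
tmp=$(mktemp -d)
query="q1"   # stand-in for a real locus tag such as BO395_01177

# Toy two-column table (query ID, subject ID); the IDs are invented.
printf 'q1\tsubjA\nq1\tsubjB\nq1\tsubjA\nq2\tsubjC\n' > "$tmp/hits.txt"

# Write the list to one known path and reuse that same variable later.
ids="$tmp/${query}.sseqid.txt"
awk -v q="$query" '$1 == q {print $2}' "$tmp/hits.txt" | sort -u > "$ids"

if [ -s "$ids" ]; then
  n=$(wc -l < "$ids" | tr -d ' ')
  echo "$n unique subject IDs written to $ids"
  # With the real database available, the same path would then go to:
  #   blastdbcmd -db refseq_select_prot -entry_batch "$ids"
else
  echo "no hits for $query, nothing to look up" >&2
fi
```

Keeping the output path in one variable means the file that gets written and the file that gets read cannot drift apart.
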
## Exercise 1.3
I ran `awk -F "\t" '{print $1,$2,$15,$16,$13,$14,$11}' OFS="\t" lab_05/subset_blastp_fmt3.txt` to pull selected columns out of the output file and reorder them.
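
For reference, `std` in the `-outfmt` string expands to 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), so ppos, qcovs, stitle, sscinames, and staxid land in columns 13 to 17. The sketch below runs the same `awk` column selection on a single made-up hit line to show what comes out; every value in the line is invented.

```shell
tmp=$(mktemp -d)

# One made-up hit line with the 17 tab-separated columns requested by
# -outfmt "6 std ppos qcovs stitle sscinames staxid".
printf 'q1\tWP_000001.1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\t99.0\t100\ttoy protein [Toy species]\tToy species\t12345\n' > "$tmp/toy_hits.txt"

# Same column selection as above: qseqid, sseqid, stitle, sscinames, ppos, qcovs, evalue.
picked=$(awk -F "\t" '{print $1,$2,$15,$16,$13,$14,$11}' OFS="\t" "$tmp/toy_hits.txt")
printf '%s\n' "$picked"
```

The `-F "\t"` matters because stitle and sscinames contain spaces; with the default whitespace splitting the column numbers would shift.
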
Q. **What are the default values of the options matrix, evalue, max_target_seqs, num_descriptions, num_alignments for each type of BLAST? Under what circumstances would you use those defaults vs. when would you consider changing them?**
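For the protein-based searches the default matrix is BLOSUM62, except for the blastp-short task, which uses PAM30; blastn scores with match/mismatch rewards and penalties rather than a substitution matrix. The default evalue is 10. max_target_seqs defaults to 500, num_descriptions also defaults to 500, and num_alignments defaults to 250. The defaults are reasonable for a first exploratory search where I do not want to miss weak hits. I would change them when they return too much or too little: for example, lowering evalue (as with 1e-6 above) to cut spurious matches, or lowering max_target_seqs (as with 10 above) to keep the output manageable.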