# BI278 Lab #5 Notes

## Exercise 1

First, I ran `echo $BLASTDB` to check whether the BLAST database path was already set. Then I ran `export BLASTDB="$BLASTDB:/courses/bi278/Course_Materials/blastdb"` and ran `echo $BLASTDB` again to confirm where BLAST will look for its databases.

I ran `awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">" $0}' ORS="" lab_04/PROKKA_09242022.faa`, which does the following:

* Reads the file you designate at the end of the command, splitting it into records by whatever you tell it to use as a record separator (RS=">")
* Matches a particular pattern (/pattern/) you give it against each record, such as the phrase ‘hypothetical protein’
* When a record contains that pattern, prints the symbol ‘>’ followed by the record ($0)
* When it prints, does not add an output record separator (ORS="")

I ran the same command on the .ffn file as well.

I made a directory for this week, lab_05, with `mkdir lab_05`, then ran `awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">" $0}' ORS="" lab_04/PROKKA_09242022.faa > lab_05/protein_keeper`, which wrote the output to the new file protein_keeper.

Then I ran `awk 'BEGIN {RS=">"; srand()} (rand()<=.002) {print ">"$0}' ORS="" lab_05/protein_keeper` to take a random subset of my data (each record is kept with probability 0.002). I saved the subset as its own FASTA file using the same redirect operator `>` as above: `awk 'BEGIN {RS=">"; srand()} (rand()<=.002) {print ">"$0}' ORS="" lab_05/protein_keeper > lab_05/subset_01`

Then I ran `blastp -task blastp-fast -num_threads 2 -db refseq_select_prot -evalue 1e-6 -outfmt "6 std ppos qcovs stitle sscinames staxid" -max_target_seqs 10 -query lab_05/subset_01 > lab_05/subset_blastp_fmt3.txt` to write the results to the file subset_blastp_fmt3.txt.

Then I ran `awk '$1=="BO395_01177" {print $2}' lab_05/subset_blastp_fmt3.txt | sort | uniq > subset_blastp_BO395_01177.sseqid.txt` to save the subject sequence IDs for that query as a list.

Then I ran `blastdbcmd -db refseq_select_prot -entry_batch lab_05/subset_blastp_BO395_01177.sseqid.txt`, which showed me nothing. This may be because the previous command saved the list in the current directory rather than in `lab_05/`, so the two paths do not match.
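
To see what the `awk` record-separator filter in this exercise does, here is a small sketch on made-up data. The file names (`toy.faa`, `toy_keeper.faa`), the IDs, and the sequences are all invented for illustration and are not from the lab.

```shell
# Work in a throwaway directory so nothing in lab_05/ is touched.
tmpdir=$(mktemp -d)

# Build a tiny, made-up protein FASTA file (hypothetical IDs and sequences).
cat > "$tmpdir/toy.faa" <<'EOF'
>toy_0001 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>toy_0002 DNA polymerase III subunit beta
MKFTVEREHL
>toy_0003 hypothetical protein
MSLNDKQKAL
EOF

# Same filter as in the notes: split the file into records on ">",
# keep records containing "hypothetical protein", and print each one
# with the ">" put back in front. ORS="" avoids adding extra newlines.
awk 'BEGIN {RS=">"} /hypothetical protein/ {print ">" $0}' ORS="" \
    "$tmpdir/toy.faa" > "$tmpdir/toy_keeper.faa"

# Show the filtered FASTA and count the headers that were kept.
cat "$tmpdir/toy_keeper.faa"
kept=$(grep -c '^>' "$tmpdir/toy_keeper.faa")
echo "records kept: $kept"
```

Because each record runs from one `>` to the next, a multi-line sequence stays attached to its header, which a line-by-line `grep` for the header would not do.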
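## Exercise 1.3

I ran `awk -F "\t" '{print $1,$2,$15,$16,$13,$14,$11}' OFS="\t" lab_05/subset_blastp_fmt3.txt` to pull selected columns out of the output file and reorder them. With the `-outfmt` string used above, these columns are qseqid, sseqid, stitle, sscinames, ppos, qcovs, and evalue.

Q. **What are the default values of the options matrix, evalue, max_target_seqs, num_descriptions, num_alignments for each type of BLAST? Under what circumstances would you use those defaults vs. when would you consider changing them?**

For the BLAST programs that use a scoring matrix, the default is BLOSUM62, except for blastp-short, where it is PAM30. The default evalue is 10.0, max_target_seqs is 500, num_descriptions is also 500, and num_alignments is 250. I would keep the defaults for a first, broad search and change them when they don't work well, for example lowering evalue (like `-evalue 1e-6` above) to keep only strong hits, or lowering max_target_seqs (like `-max_target_seqs 10` above) to keep the output small.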
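
Going back to the column-reordering command in this exercise, here is a minimal sketch of what it does to a single hit line. The file names and every value in the row are invented; only the column layout (the 12 `std` columns followed by ppos, qcovs, stitle, sscinames, staxid) follows the `-outfmt` string from the notes.

```shell
tmpdir=$(mktemp -d)

# One made-up, tab-separated hit line with 17 columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send
# evalue bitscore ppos qcovs stitle sscinames staxid
printf 'toy_q1\ttoy_s1\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-50\t180\t92.0\t99\ttoy protein [Toy species]\tToy species\t12345\n' \
    > "$tmpdir/toy_hits.txt"

# Same command as in the notes: read tab-separated fields, print
# qseqid, sseqid, stitle, sscinames, ppos, qcovs, evalue, tab-separated.
awk -F "\t" '{print $1,$2,$15,$16,$13,$14,$11}' OFS="\t" \
    "$tmpdir/toy_hits.txt" > "$tmpdir/toy_summary.txt"

cat "$tmpdir/toy_summary.txt"
```

Setting `-F "\t"` matters here because stitle and sscinames contain spaces; with the default whitespace splitting, `$15` would be only the first word of the title.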