# BI278 Lab#6 Notes
## Exercise 1
First, I ran `export BLASTDB="/courses/bi278/Course_Materials/blastdb"` and `echo $BLASTDB` to establish where the system should look for the BLAST database.
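Next, I ran rpsblast with `rpsblast -query lab_04/PROKKA_09242022.faa -db Cog -evalue 0.01 -max_target_seqs 25 -outfmt "6 qseqid sseqid evalue qcovs stitle" > lab_06/PROKKA_09242022.rpsblast.table`, which searches each predicted protein against the COG database and writes the hits as a tab-separated table.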
Then, I ran `awk -F'[\t,]' '!x[$1]++ && $4>=70 {print $1,$5}' OFS="\t" lab_06/PROKKA_09242022.rpsblast.table > lab_06/PROKKA_09242022.cog.table`. Fields are split on tabs or commas, so `$5` is the COG number at the start of the hit title. This command keeps only the first (best) COG match for each query sequence, and only if that match has a query coverage of at least 70%.
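
A minimal sketch of what this filter does, run on a made-up table rather than real rpsblast output. The gene IDs, CDD IDs, e-values, and titles are all hypothetical, and the temporary directory is only there to keep the demo self-contained.

```shell
# Build a toy hit table: qseqid, sseqid, evalue, qcovs, stitle (all values made up)
demo_dir=$(mktemp -d)
table="$demo_dir/demo.rpsblast.table"
printf 'gene1\tCDD:1\t1e-50\t95\tCOG0001, made-up title A\n' >  "$table"
printf 'gene1\tCDD:2\t1e-10\t80\tCOG0002, made-up title B\n' >> "$table"
printf 'gene2\tCDD:3\t1e-30\t40\tCOG0003, made-up title C\n' >> "$table"
printf 'gene2\tCDD:4\t1e-05\t90\tCOG0004, made-up title D\n' >> "$table"
printf 'gene3\tCDD:5\t1e-20\t70\tCOG0005, made-up title E\n' >> "$table"

# Same filter as in the notes: look only at the first hit per query,
# and keep it only if its query coverage (field 4) is at least 70
best_hits=$(awk -F'[\t,]' '!x[$1]++ && $4>=70 {print $1,$5}' OFS="\t" "$table")
printf '%s\n' "$best_hits"

# Clean up the demo files
rm -r "$demo_dir"
```

On this toy data the filter prints gene1 with COG0001 and gene3 with COG0005. gene2 is dropped entirely: its first hit has only 40% coverage, and its later hit with 90% coverage is never considered because only the first hit per query is examined.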
Then, I ran `awk -F "\t" 'NR==FNR {a[$1]=$2;next} {if ($2 in a){print $1, $2, a[$2]} else {print $0}}' OFS="\t" /courses/bi278/Course_Materials/blastdb/cognames2003-2014.tab lab_06/PROKKA_09242022.cog.table > lab_06/temp`. This takes the first file, cognames2003-2014.tab, and builds a look-up table that maps each COG number to its categories. It then goes through the second file, PROKKA_09242022.cog.table, row by row, looks up the category for the COG number in that row, and writes the row plus that category to a new file. Rows whose COG number is not in the look-up table are written unchanged.
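
The two-file look-up pattern (`NR==FNR`) can be sketched on toy files. The file names `names.tab` and `cog.table` and every value in them are hypothetical stand-ins; the real look-up file is assumed here to have the COG number in column 1 and the category letters in column 2, which is what the command relies on.

```shell
demo_dir=$(mktemp -d)

# Hypothetical look-up file: COG number, category letters, name
printf 'COG0001\tH\tmade-up name 1\nCOG0005\tKT\tmade-up name 2\n' > "$demo_dir/names.tab"

# Hypothetical cog.table: gene ID, COG number (COG9999 is not in the look-up file)
printf 'gene1\tCOG0001\ngene3\tCOG0005\ngene4\tCOG9999\n' > "$demo_dir/cog.table"

# NR==FNR is true only while awk reads the first file, so a[] is filled there;
# for the second file, each row gets its category appended if the COG is known
joined=$(awk -F "\t" 'NR==FNR {a[$1]=$2;next} {if ($2 in a){print $1, $2, a[$2]} else {print $0}}' OFS="\t" "$demo_dir/names.tab" "$demo_dir/cog.table")
printf '%s\n' "$joined"

rm -r "$demo_dir"
```

In this sketch gene1 gains the category H, gene3 gains KT, and gene4 is printed with only its two original columns because COG9999 has no entry in the look-up table.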
Next, I ran `awk -F "\t" '{if ( length($3)>1 ) { $3 = substr($3, 1, 1) } else { $3 = $3 }; print}' OFS="\t" lab_06/temp > lab_06/temp2`. This command keeps only the first category (letter) in the third field. We do this because a COG can belong to multiple functional categories, but the first one is its primary category.
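
A small sketch of the first-letter step on two made-up rows. It uses a shortened form of the command (always taking one character, which leaves single-letter categories unchanged) and a start position of 1, since awk strings are indexed from 1 and awk implementations differ in how they treat a start of 0.

```shell
# Two hypothetical rows: one with a single category, one with two
first_letter=$(printf 'gene1\tCOG0001\tH\ngene3\tCOG0005\tKT\n' |
  awk -F "\t" '{ $3 = substr($3, 1, 1); print }' OFS="\t")
printf '%s\n' "$first_letter"
```

Here the H row is unchanged and the KT row is reduced to K.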
Then, I ran `awk -F "\t" 'NR==FNR {a[$1]=$2;next} {if ($3 in a){print $0, a[$3]} else {print $0}}' OFS="\t" /courses/bi278/Course_Materials/blastdb/fun2003-2014.tab lab_06/temp2 > lab_06/PROKKA_09242022.cog.categorized`. This command appends the full description of each functional category as an additional column.
Finally, I ran `rm lab_06/temp lab_06/temp2` to remove the temp and temp2 files, which we don't need anymore.
## Exercise 2
I first made a new directory, myblastdb, to hold my own BLAST databases.
Then, I ran `export BLASTDB="/courses/bi278/Course_Materials/blastdb:/home2/khirat24/myblastdb"` so that BLAST also looks in my own directory, then `makeblastdb -in lab_04/PROKKA_09242022.faa -dbtype prot -title genomename_prot -out myblastdb/genomename_prot -parse_seqids` to make a protein database.
I ran `makeblastdb -in lab_04/PROKKA_09242022.ffn -dbtype nucl -title genomename_nucl -out myblastdb/genomename_nucl -parse_seqids` to make a nucleotide database.
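
The `BLASTDB` value used in this exercise holds two directories separated by a colon. A quick sketch for listing them one per line, assuming the same two paths as in the `export` command (the variable name `db_dirs` is just for illustration):

```shell
# Same value as in the export command; the second path is account-specific
BLASTDB="/courses/bi278/Course_Materials/blastdb:/home2/khirat24/myblastdb"

# Split the colon-separated list into one directory per line
db_dirs=$(printf '%s\n' "$BLASTDB" | tr ':' '\n')
printf '%s\n' "$db_dirs"
```

This only splits the string; it does not check that the directories exist.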
`blastdb_aliastool -dbtype prot -title 4burk_prot -out myblastdb/4burk_prot -dblist "bb395_prot bb433_prot bh155_prot bh69_prot"` This would make a custom database that combines four existing protein databases, but she told me I don't have to do it.
## Exercise 3
I first ran `mkdir lab_06_3`, then `cp /courses/bi278/Course_Materials/lab_06/*.faa lab_06_3`, which copied all the .faa files into my directory. I ran `orthofinder -f lab_06_3/ -X`, which ran OrthoFinder on those protein files and produced my own results (`-X` tells it not to add species names to the sequence IDs). Finally, I ran `mv lab_06_3/OrthoFinder/Results_Oct25/ .` to move the results to my home directory.