# Bi278 Lab Num 6
### By Lee Ferenc 10/25/2022

## Exercise 1. Reverse position-specific BLAST

1. Establish where the system looks for BLAST databases.

```
export BLASTDB="/courses/bi278/Course_Materials/blastdb"
```

2. Run rpsblast on your group of proteins. Start with a fasta file.

```
rpsblast -query lab_05/PROKKA_09242022Hypothetical.faa -db Cog -evalue 0.01 -max_target_seqs 25 -outfmt "6 qseqid sseqid evalue qcovs stitle" > bh69hypo.rpsblast.table
```

This gave weird output, and Prof. Noh traced it back to a stray leading space in the fasta file. To fix it:

```
# strip the single leading space from any line that starts with one
sed 's/^ //' lab_05/PROKKA_09242022Hypothetical.faa > temp
# write the corrected file over the current file
mv temp lab_05/PROKKA_09242022Hypothetical.faa
# rerun rpsblast on the corrected file
rpsblast -query lab_05/PROKKA_09242022Hypothetical.faa -db Cog -evalue 0.01 -max_target_seqs 25 -outfmt "6 qseqid sseqid evalue qcovs stitle" > bh69newhypo.rpsblast.table
```

3. Keep *only* the best COG match for each query sequence, and only if it has a query coverage of at least 70.

```
awk -F'[\t,]' '!x[$1]++ && $4>=70 {print $1,$5}' OFS="\t" bh69newhypo.rpsblast.table > bh69hypo.cog.table
```

Now the table has one COG per protein, and every match has a query coverage of 70% or more. Note that the command only looks at the first (best) hit for each query: if that hit is below 70, the protein is dropped even if a later hit has higher coverage.

4. Look up the functional categories for the COG numbers you have. This info is stored in a table: cognames2003-2014.tab

```
awk -F "\t" 'NR==FNR {a[$1]=$2;next} {if ($2 in a){print $1, $2, a[$2]} else {print $0}}' OFS="\t" /courses/bi278/Course_Materials/blastdb/cognames2003-2014.tab bh69hypo.cog.table > temp
```

According to the lab document:

> This command takes the first file (cognames2003-2014.tab) and makes a look-up table that can access the categories by COG number. It then takes the second file (*X*.cog.table) and goes through it row by row and looks up what category the COG number in that row belongs to. It then writes the row along with the COG category that it looked up and stores it in a new file.

5. Keep the first category for each COG number. COGs can belong to multiple functional categories, but the first one is the primary category.

```
awk -F "\t" '{if ( length($3)>1 ) { $3 = substr($3, 1, 1) } else { $3 = $3 }; print}' OFS="\t" temp > temp2
```

This command keeps only the first category (letter) from the third field. awk strings are indexed from 1, so `substr($3, 1, 1)` is the portable way to take the first character.

6. Get the full description of the functional category. It can be looked up in another table: fun2003-2014.tab

```
awk -F "\t" 'NR==FNR {a[$1]=$2;next} {if ($3 in a){print $0, a[$3]} else {print $0}}' OFS="\t" /courses/bi278/Course_Materials/blastdb/fun2003-2014.tab temp2 > bh69hypo.cog.categorized
```

7. Remove the temp files (clean-up because we aren't animals).

```
rm temp temp2
```

#### Final File Glimpse:

```
head bh69hypo.cog.categorized
```

> BH69_00012 COG3328 X Mobilome: prophages, transposons
> BH69_00068 COG1792 D Cell cycle control, cell division, chromosome partitioning
> BH69_00079 COG3247 S Function unknown
> BH69_00092 COG3159 S Function unknown
> BH69_00098 COG5285 Q Secondary metabolites biosynthesis, transport and catabolism
> BH69_00107 COG4585 T Signal transduction mechanisms
> BH69_00109 COG3568 R General function prediction only
> BH69_00123 COG0748 P Inorganic ion transport and metabolism
> BH69_00130 COG0792 L Replication, recombination and repair
> BH69_00175 COG3310 S Function unknown

## Exercise 2. Make and use a custom BLAST database

Warning: I moved everything out of colbyhome, which I had been using because of a weird display issue on Windows and because it made things easier to access. (I forgot why I did it until I opened my laptop. I thought it was due to ITS, where I have access to some extra files on filer that you can only see on Windows, but nope. I still copied everything back, though, because that's the only easy way to back up quickly on my computer.)

8. Make a personal directory for BLAST databases in your bi278 home.

```
mkdir enfblastDB
```

9. Add this directory to your $BLASTDB.

```
export BLASTDB="/courses/bi278/Course_Materials/blastdb:/home2/enfere24/Genomics/enfblastDB"
```

10. Make a protein database.

```
makeblastdb -in Genomics/lab_06/bhqs69_prokka.faa -dbtype prot -title bhqs69_prot -out enfblastDB/bhqs69_prot -parse_seqids
```

11. Make a nucleotide database.

```
makeblastdb -in /courses/bi278/Course_Materials/lab_04/bhqs69/PROKKA_09242022.ffn -dbtype nucl -title bhqs69_nucl -out enfblastDB/bhqs69_nucl -parse_seqids
```

12. Combine any BLAST databases of the same type. Example below:

```
blastdb_aliastool -dbtype prot -title 4burk_prot -out enfblastDB/4burk_prot -dblist "bbqs395_prot bbqs433_prot bhqs155_prot bhqs69_prot"
```

13. Test the combined database with a query.

```
blastp -query Genomics/lab_06/tssH_bpseudo.fasta -db 4burk_prot -outfmt 6
```

## Exercise 3. Orthofinder

1. Collect all the predicted protein sequences in one directory and run orthofinder from inside that folder.

```
#orthofinder -f ./ -X
```

> In this command, -f specifies where my protein fasta files are (./ means here because I was in the same directory as these files) and -X specifies that I don’t want orthofinder to add species names to my gene ids because I already did this when running prokka.
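
#### Side note: trying the Exercise 1 awk steps on toy data

The awk commands in Exercise 1 (best hit per query, COG category look-up, first category letter) are easier to follow on a tiny input. This is only a sketch: the protein IDs, COG numbers, category letters, and file names below are all made up for illustration and are not from the real Cog database or the course tables.

```shell
# Toy data only: every ID, COG number, and category letter here is invented.
workdir=$(mktemp -d)

# Fake rpsblast table with the same columns as the real one:
# qseqid, sseqid, evalue, qcovs, stitle (tab-separated).
printf 'p1\tCDD:1\t1e-20\t95\tCOG0001, first toy protein\n'   > "$workdir/hits.table"
printf 'p1\tCDD:2\t1e-05\t80\tCOG0002, second toy protein\n' >> "$workdir/hits.table"
printf 'p2\tCDD:3\t1e-10\t40\tCOG0003, third toy protein\n'  >> "$workdir/hits.table"
printf 'p3\tCDD:2\t1e-08\t75\tCOG0002, second toy protein\n' >> "$workdir/hits.table"

# Fake COG look-up table: COG number, category letters, name.
printf 'COG0001\tKL\ttoy name 1\n'  > "$workdir/cognames.tab"
printf 'COG0002\tS\ttoy name 2\n'  >> "$workdir/cognames.tab"

# Step 3: keep the first hit per query, and only if its coverage is >= 70.
# Splitting on tab OR comma makes field 5 the COG number at the start of stitle.
awk -F'[\t,]' '!x[$1]++ && $4>=70 {print $1,$5}' OFS="\t" \
    "$workdir/hits.table" > "$workdir/cog.table"

# Step 4: read the look-up table first (NR==FNR), then append the category
# to each row of the second file when its COG number is in the table.
awk -F "\t" 'NR==FNR {a[$1]=$2;next} {if ($2 in a){print $1, $2, a[$2]} else {print $0}}' OFS="\t" \
    "$workdir/cognames.tab" "$workdir/cog.table" > "$workdir/categorized"

# Step 5: keep only the first category letter (awk strings are 1-indexed).
awk -F "\t" '{ $3 = substr($3, 1, 1); print }' OFS="\t" \
    "$workdir/categorized" > "$workdir/primary"

cat "$workdir/primary"
```

On this toy input, the second p1 hit is dropped because only the first hit per query is kept, p2 is dropped because its first hit has coverage 40, and the two remaining rows are p1 with COG0001 and category K, and p3 with COG0002 and category S.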