# Bi278 Lab Num 3
### By Lee Ferenc 9/27/2022
## Exercise 1. Annotate a genome with Prokka
### Annotate a genome
```
mkdir colbyhome/enfere24/Genomics/lab_03
cp /courses/bi278/Course_Materials/lab_03/P.bonniea_bbqs395.nanopore.fasta colbyhome/enfere24/Genomics/lab_03/
# the prokka command below is A SINGLE LINE
# (oops, I chose bbqs395, which is the same strain as the lab manual, but I'd rather not switch)
prokka --force --outdir colbyhome/enfere24/Genomics/lab_03/bbqs395 --proteins /courses/bi278/Course_Materials/lab_03/Burkholderia_pseudomallei_K96243_132.gbk --locustag BB395 --genus Paraburkholderia --species bonniea --strain bbqs395 colbyhome/enfere24/Genomics/lab_03/P.bonniea_bbqs395.nanopore.fasta
```
### Question 1: Different output files and what they contain.
* .faa: "Protein FASTA file of the translated CDS sequences."
* .ffn: "Nucleotide FASTA file of all the prediction transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA)"
* .fna: "Nucleotide FASTA file of the input contig sequences."
* .gff: The master annotation in .gff3 "contains both sequences and annotations. It can be viewed directly in Artemis or IGV."
* .txt: Text file with statistics relating to the annotated features
* .tsv: "tab-separated file of all features: locus_tag, ftype, len_bp, gene, EC_number, COG, product"
### Question 2: Then, explore and use the best file from above to answer each question:
#### How many CDS do you have?
3427
#### How many tRNAs?
56
```
cat colbyhome/enfere24/Genomics/lab_03/bbqs395/PROKKA_09272022.txt
```
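The same counts can probably be cross-checked from the .tsv file, since it has one row per feature. This is only a sketch: `count_ftypes` is a helper name made up here, and it assumes the first line is a header and the second column is `ftype`, as in the column list quoted in Question 1.

```shell
# Tally feature types (CDS, tRNA, rRNA, ...) in a Prokka .tsv file.
# Assumes line 1 is a header and column 2 is ftype.
count_ftypes() {
    tail -n +2 "$1" | cut -f2 | sort | uniq -c
}

# Example call, using the output path from above:
# count_ftypes colbyhome/enfere24/Genomics/lab_03/bbqs395/PROKKA_09272022.tsv
```

If the assumptions hold, the CDS and tRNA rows of this tally should agree with the numbers in the .txt file.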
#### How many 'ribosomal proteins'?
53
```
grep -c "ribosomal protein" colbyhome/enfere24/Genomics/lab_03/bbqs395/PROKKA_09272022.faa
```
#### How many 'hypothetical proteins'?
1461
```
grep -c "hypothetical protein" colbyhome/enfere24/Genomics/lab_03/bbqs395/PROKKA_09272022.faa
```
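`grep -c` counts matching lines anywhere in the file. For a protein FASTA that should be the same as counting headers, but a slightly more careful sketch restricts the search to the `>` header lines first. `count_product` is a made-up helper name, not part of Prokka.

```shell
# Count FASTA header lines that contain a given product keyword.
# $1 = keyword (matched as a fixed string), $2 = protein FASTA file
count_product() {
    # "|| true" keeps the exit status at 0 when the count is 0
    grep '^>' "$2" | grep -cF -- "$1" || true
}

# Example calls, using the .faa path from above:
# count_product "ribosomal protein" colbyhome/enfere24/Genomics/lab_03/bbqs395/PROKKA_09272022.faa
# count_product "hypothetical protein" colbyhome/enfere24/Genomics/lab_03/bbqs395/PROKKA_09272022.faa
```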
### Question 3: Which file type is the best for annotation?
The .tsv is really easy to read: it gives the locus tag, gene, product, and more. So .tsv (but .gff is a close second).
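As a rough illustration of why the .tsv is convenient, here is a sketch that pulls out just the locus tag, gene name, and product for features that have a gene name. `named_genes` is a hypothetical helper, and the column positions (gene in column 4, product in column 7) are assumed from the column order quoted in Question 1.

```shell
# Print locus_tag, gene, and product for rows with a non-empty gene column.
# Assumes a header on line 1, gene in column 4, product in column 7.
named_genes() {
    awk -F'\t' 'NR > 1 && $4 != "" { print $1 "\t" $4 "\t" $7 }' "$1"
}

# Example call, using the output path from Exercise 1:
# named_genes colbyhome/enfere24/Genomics/lab_03/bbqs395/PROKKA_09272022.tsv | head
```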
### Question 4:
1) P. bonniea was somewhat close. In reality it has 35550 genes, which is more than the 3427 predicted. There are only 88 pseudogenes compared to the 1461 hypothetical ones. It has the same number of tRNAs (56), but the number of ribosomal proteins wasn't listed.
2) I also ran P. hayleyella: it has 3607 genes compared to the predicted 3584, and 115 hypothetical ones compared to the predicted 1500 (wow! I hope pseudogene means the same as hypothetical, because otherwise this is wrong). Also, it has 57 tRNAs, which is 6 more than the 57 tRNAs.
```
cp /courses/bi278/Course_Materials/lab_03/P.hayleyella_bhqs69.pacbio.fasta colbyhome/enfere24/Genomics/lab_03/
# the prokka command below is A SINGLE LINE
prokka --force --outdir colbyhome/enfere24/Genomics/lab_03/bhq69 --proteins /courses/bi278/Course_Materials/lab_03/Burkholderia_pseudomallei_K96243_132.gbk --locustag Bhq69 --genus Paraburkholderia --species hayleyella --strain bhqs69 colbyhome/enfere24/Genomics/lab_03/P.hayleyella_bhqs69.pacbio.fasta
cat colbyhome/enfere24/Genomics/lab_03/bhq69/PROKKA_09272022.txt
grep -c "ribosomal protein" colbyhome/enfere24/Genomics/lab_03/bhq69/PROKKA_09272022.faa
grep -c "hypothetical protein" colbyhome/enfere24/Genomics/lab_03/bhq69/PROKKA_09272022.faa
```
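Since the same counts were run for both genomes, they could be wrapped in one helper so the two annotations print in the same format. This is a sketch under assumptions: `summarize_prokka` is a made-up name, the .tsv is assumed to have `ftype` in column 2, and the file prefix is passed in by hand.

```shell
# Print CDS, tRNA, and hypothetical-protein counts for one Prokka run.
# $1 = Prokka output directory, $2 = file prefix (e.g. PROKKA_09272022)
summarize_prokka() {
    prefix="$1/$2"
    # -x matches whole lines only, so "CDS" does not match other feature types
    cds=$(cut -f2 "$prefix.tsv" | grep -cx 'CDS' || true)
    trna=$(cut -f2 "$prefix.tsv" | grep -cx 'tRNA' || true)
    hypo=$(grep -c 'hypothetical protein' "$prefix.faa" || true)
    printf '%s\tCDS=%s\ttRNA=%s\thypothetical=%s\n' "$1" "$cds" "$trna" "$hypo"
}

# Example calls, using the two output directories from this lab:
# summarize_prokka colbyhome/enfere24/Genomics/lab_03/bbqs395 PROKKA_09272022
# summarize_prokka colbyhome/enfere24/Genomics/lab_03/bhq69 PROKKA_09272022
```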