# Notebook 3 Genomics Lab
```
prokka --force --outdir PATH/bbqs395 \
  --proteins /courses/bi278/Course_Materials/lab_03/Burkholderia_pseudomallei_K96243_132.gbk \
  --locustag BB395 --genus Paraburkholderia --species bonniea --strain bbqs395 \
  PATH/P.bonniea_bbqs395.nanopore.fasta
```
Run the whole block as a single command to annotate a FASTA file with Prokka (in this example, `P.bonniea_bbqs395.nanopore.fasta`).
### Q1
#### .faa
Protein FASTA file of the translated CDS sequences. Each record is the amino acid sequence of one predicted protein.
#### .ffn
Nucleotide FASTA file of all the predicted transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA). For the CDS entries, it contains the nucleotide sequences of the same proteins listed in the .faa file.
#### .fna
Nucleotide FASTA file of the input contig sequences. It contains two ">" header lines, one for each input contig, and each is followed by the nucleotide sequence of that contig.
#### .gff
This is the master annotation in GFF3 format, containing both sequences and annotations. It can be viewed directly in Artemis or IGV.
#### .txt
Statistics relating to the annotated features found. It lists the organism name and counts such as the number of contigs, bases, CDS, and tRNA.
#### .tsv
Tab-separated file of all features: locus_tag,ftype,len_bp,gene,EC_number,COG,product
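A quick way to check the descriptions above is to count the records in each FASTA output. This is a minimal sketch: `count_fasta_records` is a helper name I made up, and it only assumes that every record starts with a ">" header line.

```shell
# Count FASTA records in a file by counting header lines (lines starting with ">").
count_fasta_records() {
  # $1 = path to a FASTA file (.faa, .ffn, or .fna)
  grep -c '^>' "$1"
}
```

For example, `count_fasta_records PATH/bbqs395/PROKKA_*.fna` should print the number of contigs, assuming the outputs use Prokka's default `PROKKA_` file prefix.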
### Q2
#### How many CDS do you have?
Use the .txt file. It has 3427 CDS.
#### How many tRNA do you have?
Use the .txt file. It has 56 tRNA.
#### How many 'ribosomal proteins' do you have?
Use the .tsv file. It has 53 ribosomal proteins.
#### How many 'hypothetical proteins' do you have?
Use the .tsv file. It has 1461 hypothetical proteins.
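The counts above can be pulled out on the command line. This is a sketch under two assumptions: the .txt file has one `key: value` pair per line (for example `CDS: 3427`), and the product description is the last tab-separated column of the .tsv file. The helper names `get_stat` and `count_product` are my own, not part of Prokka.

```shell
# Print the value for one summary key (e.g. CDS, tRNA) from the .txt file.
get_stat() {
  # $1 = key, $2 = path to .txt
  awk -F': ' -v key="$1" '$1 == key { print $2 }' "$2"
}

# Count rows in the .tsv file whose product column matches a pattern
# (case-insensitive). The header row is skipped.
count_product() {
  # $1 = pattern, $2 = path to .tsv
  awk -F'\t' -v pat="$1" '
    NR > 1 && tolower($NF) ~ tolower(pat) { n++ }
    END { print n + 0 }
  ' "$2"
}
```

Usage would look like `get_stat CDS PATH/bbqs395/PROKKA_*.txt` or `count_product "hypothetical protein" PATH/bbqs395/PROKKA_*.tsv`. A pattern match counts any product containing the phrase, so the result for "ribosomal protein" can include related enzymes and is worth checking by eye.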
### Q3
I think the .gff file is the most useful for finding the genomic location of any type of annotated feature, because each row gives the contig, start and end coordinates, and strand of a feature, along with attributes such as its product.
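To show how locations can be read from the .gff file, here is a small sketch that lists the features of one type. It relies on the standard GFF3 column order (seqid, source, type, start, end, score, strand, phase, attributes) and stops at the `##FASTA` line where the embedded sequences begin. `gff_locations` is a hypothetical helper name.

```shell
# Print contig, start, end, strand, and attributes for features of one type.
gff_locations() {
  # $1 = feature type (e.g. CDS, tRNA), $2 = path to .gff
  awk -F'\t' -v OFS='\t' -v ftype="$1" '
    /^##FASTA/ { exit }   # the rest of the file is sequence, not annotation
    /^#/       { next }   # skip comment and directive lines
    $3 == ftype { print $1, $4, $5, $7, $9 }
  ' "$2"
}
```

For example, `gff_locations tRNA PATH/bbqs395/PROKKA_*.gff` would list every tRNA with its coordinates, again assuming the default `PROKKA_` file prefix.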
### Q4
The metrics are very similar. Practicing on this genome has made me more confident.