Koltin Htut Lab 3 Notes
log in using ssh
```
ssh kkhtut24@bi278
password
```
make a new directory for lab 3
```
mkdir lab_3
ls
```
run prokka to annotate the genome
```
prokka --force --outdir ~/lab_3/bbqs395 \
  --proteins /courses/bi278/Course_Materials/lab_03/Burkholderia_pseudomallei_K96243_132.gbk \
  --locustag BB395 --genus Paraburkholderia --species bonniea --strain bbqs395 \
  PATH/P.bonniea_bbqs395.nanopore.fasta
```
Q1:
Viewing the .faa file with cat shows one sequence for each protein-coding gene of P. bonniea; these are the amino acid (protein) sequences translated from each gene. Using cat on the .ffn file shows a nucleotide sequence for each of the same gene loci that appear in the .faa file, so the two files are the protein and nucleotide versions of the same annotated genes (the .ffn file also includes the RNA features). The .fna file shows the full nucleotide sequence of the genome when opened with cat. The .gff file is long, so it is necessary to use the head command to view the first portion. It has two parts: the first is an index of the annotated loci, and the second is the genomic sequence of P. bonniea. The .txt file contains a useful summary of the annotation, including the number of contigs, bases, CDS, rRNA, tRNA, and tmRNA. Lastly, the .tsv file is a table of all 3496 annotated loci in P. bonniea, including the locus tag, ftype (whether it is a CDS, tRNA, rRNA, etc.), the length of the locus in base pairs, the gene name, the EC number, and the product of the gene.
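A quick way to check what a FASTA-style file holds is to count its header lines and look at the first record. This is only a minimal sketch on a toy file: the name toy.faa and its two records are made up for illustration, and with real output the same commands would be pointed at the .faa, .ffn, or .fna file instead.

```shell
# Toy protein FASTA standing in for a Prokka .faa file (contents are invented).
tmpdir=$(mktemp -d)
cat > "$tmpdir/toy.faa" <<'EOF'
>BB395_00001 hypothetical protein
MKTLLVAGA
>BB395_00002 Chromosomal replication initiator protein DnaA
MSLSLWQQC
EOF

# Every record starts with ">", so counting those lines counts the sequences.
n_records=$(grep -c '^>' "$tmpdir/toy.faa")
echo "records: $n_records"

# Show the first record (header plus sequence) to see whether the letters
# are amino acids (.faa) or nucleotides (.ffn, .fna).
head -n 2 "$tmpdir/toy.faa"
```

Using head instead of cat keeps the screen from filling up on a file with thousands of records.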
Q2:
Using cat on the .txt file shows that there are 3427 CDS loci and 56 tRNAs encoded in the genome.
```
grep -c hypothetical PROKKA_10042022.tsv
```
This command counts the lines in the .tsv file that contain the word "hypothetical", which gives the number of hypothetical proteins. The same can be done for "ribosomal". This shows that there are 1461 hypothetical proteins and 66 ribosomal proteins in the genome.
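Because grep -c counts any line containing the word, a stricter count can be made by matching specific columns with awk. The sketch below assumes the .tsv columns are locus_tag, ftype, length_bp, gene, EC_number, COG, product (check the header line of the real file first), and it runs on a made-up four-row table, not the real annotation.

```shell
# Toy table standing in for a Prokka .tsv file (rows are invented).
tmpdir=$(mktemp -d)
tsv="$tmpdir/toy.tsv"
printf 'locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n' > "$tsv"
printf 'BB395_00001\tCDS\t1500\tdnaA\t\t\tChromosomal replication initiator protein DnaA\n' >> "$tsv"
printf 'BB395_00002\tCDS\t300\t\t\t\thypothetical protein\n' >> "$tsv"
printf 'BB395_00003\ttRNA\t76\t\t\t\ttRNA-Ala(tgc)\n' >> "$tsv"
printf 'BB395_00004\tCDS\t450\t\t\t\thypothetical protein\n' >> "$tsv"

# Count features of each type from column 2, skipping the header line.
awk -F'\t' 'NR > 1 { n[$2]++ } END { for (t in n) print t, n[t] }' "$tsv" | sort

# Count CDS rows, and rows whose product (column 7) is exactly "hypothetical protein".
n_cds=$(awk -F'\t' 'NR > 1 && $2 == "CDS" { c++ } END { print c + 0 }' "$tsv")
n_hypo=$(awk -F'\t' 'NR > 1 && $7 == "hypothetical protein" { c++ } END { print c + 0 }' "$tsv")
echo "CDS: $n_cds, hypothetical: $n_hypo"
```

Matching on a column avoids counting a line just because the search word happens to appear somewhere else in it.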
Q3: by viewing the first 3501 lines of the .gff file using
```
head -3501 PROKKA_10042022.gff
```
you can look at an index of every gene locus, including the specific contig or chromosome and the base pair position of every annotated part of the P. bonniea genome.
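Instead of working out the exact line count for head, the annotation half of a .gff file can be cut off at the marker line that starts the sequence half. This sketch assumes that marker is a line reading ##FASTA, as in Prokka's GFF3 output, and it uses a tiny invented file so the idea can be tried without the real annotation.

```shell
# Toy GFF3 file standing in for a Prokka .gff file (features and sequence are invented).
tmpdir=$(mktemp -d)
gff="$tmpdir/toy.gff"
printf '##gff-version 3\n' > "$gff"
printf '##sequence-region contig_1 1 60\n' >> "$gff"
printf 'contig_1\ttoy\tCDS\t1\t30\t.\t+\t0\tID=BB395_00001;product=hypothetical protein\n' >> "$gff"
printf 'contig_1\ttoy\ttRNA\t40\t55\t.\t-\t.\tID=BB395_00002;product=tRNA-Ala(tgc)\n' >> "$gff"
printf '##FASTA\n>contig_1\nACGTACGTACGT\n' >> "$gff"

# Keep everything before the ##FASTA line: this is the annotation index.
awk '/^##FASTA/ { exit } { print }' "$gff" > "$tmpdir/annotations.gff"

# Feature lines are the ones that do not start with "#".
n_features=$(grep -vc '^#' "$tmpdir/annotations.gff")
echo "features: $n_features"

# Print sequence ID, feature type, start, and end for each feature.
awk -F'\t' '!/^#/ { print $1, $3, $4, $5 }' "$tmpdir/annotations.gff"
```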
Q4: Looking at the metrics of my selected genome and the two reference genomes provided, the base pair lengths and the numbers of proteins and RNAs are fairly similar. Because these metrics are so similar, I am confident in the sequencing quality of the new draft genome.
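One possible way to line the metrics up side by side is to pull the same keys out of each summary file. This is a sketch under assumptions: it supposes each genome has a Prokka-style .txt summary with "key: value" lines, the helper name get is made up, and the two files and all their numbers below are placeholders, not the real lab values.

```shell
# Two toy summary files (all numbers are placeholders for illustration).
tmpdir=$(mktemp -d)
printf 'organism: Toy genome A\ncontigs: 2\nbases: 4000\nCDS: 30\ntRNA: 5\n' > "$tmpdir/draft.txt"
printf 'organism: Toy genome B\ncontigs: 2\nbases: 4100\nCDS: 32\ntRNA: 5\n' > "$tmpdir/reference.txt"

# get KEY FILE: print the value on the "KEY: value" line of FILE (hypothetical helper).
get() { awk -F': ' -v k="$1" '$1 == k { print $2 }' "$2"; }

# Print one row per metric: name, draft value, reference value.
for key in contigs bases CDS tRNA; do
  printf '%s\t%s\t%s\n' "$key" "$(get "$key" "$tmpdir/draft.txt")" "$(get "$key" "$tmpdir/reference.txt")"
done

draft_bases=$(get bases "$tmpdir/draft.txt")
ref_cds=$(get CDS "$tmpdir/reference.txt")
```

Seeing the values in one table makes it easier to judge whether the draft genome is in the same range as the references.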