# BIT 150 LAB 9

## Part 1

### 1. Choose two of the options listed and explain their purpose, their default value, and why you might want to modify them.

Annotations: `--kingdom [X]`

The kingdom annotation option defines the type of organism to annotate. The options are Archaea, Bacteria, Mitochondria, and Viruses, and the default mode is Bacteria. I would change it to Viruses when annotating viral genomes.

Outputs: `--outdir [X]`

This option sets the output directory. By default Prokka writes to an automatically named output folder; I would modify it to keep my files organized. In the lab we set it to prokka_annotation.

### 2. When Prokka finishes, look carefully at the end of the output. Note that the last part of the command ran an error. What caused the error? Use Google to figure out what caused the error and how it could be solved (you don't need to actually solve the problem, just report what caused it).

The error is “could not run command tbl2asn”. It occurred because the installed version of Prokka shipped with an outdated tbl2asn binary, and the author had not yet released a version with the newest tbl2asn binaries. It can be solved by updating the program (or the tbl2asn binary itself), or by asking the authors to release a new version.

### 3. The first several features in the GFF output file are all labeled as CDS, meaning they are protein-coding genes. But there are other annotated features in the output file. Find one and describe it.

`repeat_region` marks a stretch of DNA in which a short sequence is repeated many times. In Prokka output these features are typically CRISPR arrays. Repeated sequences of this kind can also be associated with target-site duplications.

### 4. PROKKA annotates protein coding sequences using BLAST. What modifications to the prokka command would you have to make in order to increase the specificity of annotation assignments in your E. coli O104:H4 strain TY-2482 genome assembly?

I would modify the expect value (e-value) cutoff with the `--evalue` option. The default here is 1e-06; to increase specificity I would lower the e-value, so that only stronger BLAST hits are used to assign annotations.

### 5. Find a gene or two that you think might be relevant to the virulence of the organism. Explain your reasoning.
apxIB is a gene that codes for the toxin RTX-I translocation ATP-binding protein. RTX-family toxins are pore-forming exotoxins, and they are thought to aid bacterial proliferation by killing or incapacitating host cells.

stxA is a gene that codes for the Shiga toxin subunit A precursor. The toxin inhibits protein synthesis by inactivating the 60S ribosomal subunit. After cleavage, one of the protein fragments participates in holotoxin assembly with the Shiga toxin B subunit. Shiga toxin is closely associated with the 2011 outbreak caused by the pathogenic E. coli O104:H4 strain TY-2482.

## Part 2: RAST Annotation

With the RAST annotation (Figure 1) more genes are annotated than with Prokka (Figure 2): 5,750 versus 5,396. The diagrams show 852 genes labeled virulent with RAST (versus 498 with Prokka) and 637 labeled avirulent (versus 541).

![](https://i.imgur.com/hlkhobW.png)

Figure 1. Venn diagram made using RStudio scripts. Data obtained from the RAST annotation and the Roary pan-genome analysis. Virulent corresponds to genes found in E. coli O104:H4 strain TY-2482.

![](https://i.imgur.com/neEEVHY.png)

Figure 2. Venn diagram made using RStudio scripts. Data obtained from the PROKKA annotation and the Roary pan-genome analysis. Virulent corresponds to genes found in E. coli O104:H4 strain TY-2482.
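
As a quick check of the counts reported in Part 2, the following is a minimal Python sketch that tabulates the difference between the RAST and Prokka numbers. The counts are copied from the text above; the dictionary layout and the helper names `differences` and `remainder` are illustrative assumptions and are not part of the lab's RStudio scripts.

```python
# Counts copied from the Part 2 text (RAST versus Prokka).
# The table layout and category keys are illustrative assumptions.
counts = {
    "total": {"RAST": 5750, "Prokka": 5396},
    "virulent": {"RAST": 852, "Prokka": 498},
    "avirulent": {"RAST": 637, "Prokka": 541},
}


def differences(table):
    """Return the RAST count minus the Prokka count for each category."""
    return {name: row["RAST"] - row["Prokka"] for name, row in table.items()}


def remainder(table, tool):
    """Return the total count minus the 'virulent' count for one tool."""
    return table["total"][tool] - table["virulent"][tool]


if __name__ == "__main__":
    for name, diff in differences(counts).items():
        print(f"{name}: RAST - Prokka = {diff}")
    for tool in ("RAST", "Prokka"):
        print(f"{tool}: total - virulent = {remainder(counts, tool)}")
```

With these numbers the total and the virulent count both rise by 354, and the total minus the virulent count is 4,898 for both tools. That would be consistent with the additional RAST genes all falling in the virulent category, although the report does not state how the totals were defined, so this reading is an inference.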