IBA - Genome annotation and cgMLST

<center><img src="https://i.imgur.com/BWehQrf.png" alt="drawing" width="700"/></center>

# IBA - Genome annotation and cgMLST

In this hands-on exercise, you will work on sequence data using the Galaxy platform. Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research, and usegalaxy.no is a national Galaxy server for life science data hosted and supported by ELIXIR Norway.

**I:** Investigate the **Shovill** log. Which steps did Shovill perform? Try to identify them in the log (https://github.com/tseemann/shovill).

**II:** To get a better visual picture of your genome stats, run **Quast** (Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D., & Gurevich, A. (2018). Versatile genome assembly evaluation with QUAST-LG. Bioinformatics, 34(13), i142–i150. https://doi.org/10.1093/bioinformatics/bty266). Inspect the output of **Quast** by downloading the HTML file and opening it in a browser. Can you summarize:

* How many contigs?
* What is N50?

**III:** In this section we will use a software tool called Prokka to annotate a draft genome sequence. Prokka finds and annotates features (both protein-coding regions and RNA genes, i.e. tRNA, rRNA) present on a sequence. Note that Prokka uses a two-step process for the annotation of protein-coding regions: first, protein-coding regions on the genome are identified using Prodigal; second, the function of the encoded protein is predicted by similarity to proteins in one of many protein or protein domain databases. Prokka can be used to annotate bacterial, archaeal and viral genomes quickly, generating standard output files in GenBank, EMBL and GFF formats. More information about Prokka can be found at https://github.com/tseemann/prokka.

**IV:** Examine the output

Once Prokka has finished, examine each of its output files. The GFF and GBK files contain all of the information about the features annotated (in different formats).
The `.txt` file contains a summary of the number of features annotated. The `.faa` file contains the protein sequences of the genes annotated. The `.ffn` file contains the nucleotide sequences of the genes annotated.

</div>
</html>

<br/>

:::info
:information_source: **Note:** Names of files, folders, commands, and other functionality elements will be highlighted throughout the exercises in the following way: `file_name_example.txt`. User-specific (and thus variable) character strings, for example user_name, will be written between `<` and `>` as `<user_name>` to indicate the meaning. Further, information about software is written in *italics*, and questions in ***bold italics***.
:::

<center><img src="https://i.imgur.com/gxolTZS.png" alt="drawing" width="800"/></center>

<br/>

**IX:** Now it's time to trim away adapters and low-quality sequences and then perform the assembly. We will do this using **Shovill** (see above). Find Shovill under Tools and choose batch mode. Choose the correct forward and reverse reads, turn on `Trim reads` (default is OFF), and under Advanced options, write 1.6 under `Estimated genome size` and deselect `Disable post-assembly correction` (you want the correction).

Run.

Type `Done` in the chat when you are finished (this is the end of today's hands-on).

* Inspect Shovill log - find what it has done and how
* Annotation Prokka
* View annotated features in JBrowse
* Search for AMR genes using Abricate
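
To make the two Quast questions from step **II** (number of contigs and N50) more concrete, here is a minimal Python sketch of how these numbers can be computed from a contigs FASTA file. It assumes the common definition of N50: the length of the contig at which, going from the longest contig downwards, the running total first reaches at least half of the total assembly length. The function names and the toy contigs are hypothetical and only for illustration. Quast itself ignores contigs below a minimum length by default, so its report may differ from a naive count like this one.

```python
def read_fasta_lengths(lines):
    """Return the length of each sequence found in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the sequence being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous sequence, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Contig length at which the cumulative sum (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no contigs at all


# Toy example (hypothetical contigs of length 8, 6, 4 and 2).
toy_fasta = """>contig1
ACGTACGT
>contig2
ACGTAC
>contig3
ACGT
>contig4
AC
""".splitlines()

lengths = read_fasta_lengths(toy_fasta)
print("Contigs:", len(lengths))
print("N50:", n50(lengths))
```

For a real assembly you could pass an open file handle instead of the toy lines, for example `read_fasta_lengths(open("<contigs_file>.fasta"))`, and compare the result with the Quast report.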