<center><img src="https://i.imgur.com/rPIZUIq.png" alt="drawing" width="700"/></center>

# ACEIDHA-SV: Quality control of assemblies - short read

###### Relies heavily on the sources listed at the end of this page

* **Put the assemblies from *Campylobacter* in a separate History.** If you don't have any contigs, import them from https://usegalaxy.eu/u/allarena/h/campylobacter-assemblies

## Assess Assembly quality with Quast

Quast ([Gurevich et al. 2013](https://training.galaxyproject.org/training-material/topics/assembly/tutorials/unicycler-assembly/tutorial.html#Gurevich2013)) is a tool that provides quality metrics for assemblies and can also be used to compare multiple assemblies. It outputs the assembly metrics as an HTML report with tables and graphs.

**I.** Run Quast on the assemblies.

**TASK** Can you summarize:

* How long are the assemblies?
* How many contigs have been built?
* What are the median and maximum lengths of the contigs?
* What is N50 and what does it tell you?
* How does the GC% content match what you expect?

**II.** Even if everything looks fine, a quick species check should be standard before you continue with further analysis. We will estimate the average nucleotide identity (ANI) between our assemblies and the reference genome of *Campylobacter jejuni* (NCTC 11168, accession number NC_002163.1) using a program called **FastANI** (https://github.com/ParBLiSS/FastANI). ANI is defined as the mean nucleotide identity of orthologous gene pairs shared between two microbial genomes.

- Import the **[history with the reference file](https://usegalaxy.eu/u/allarena/h/reference-genome-c-jejuni)**.
- Move the reference file to the history with the *Campylobacter* assemblies.
- For FastANI, the inputs are the query sequences (YOUR genomes in FASTA format) and the reference sequence (the reference strain in FASTA format). Read the instructions on Galaxy and run the analysis.
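As a side note on the Quast metrics from part I, the summary numbers can be cross-checked by hand. The following is a minimal Python sketch and not part of the Galaxy workflow; the function names (`parse_fasta`, `n50`, `assembly_stats`) and the toy contigs are illustrative assumptions. Quast excludes very short contigs from most statistics by default, so values computed this way on a real assembly may differ slightly from the report.

```python
from statistics import median


def parse_fasta(text):
    """Return a dict mapping contig name -> sequence from FASTA-formatted text."""
    contigs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in contigs.items()}


def n50(lengths):
    """Largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def assembly_stats(contigs):
    """Basic assembly metrics, computed over all contigs (no length filter)."""
    lengths = [len(seq) for seq in contigs.values()]
    total = sum(lengths)
    gc = sum(seq.count("G") + seq.count("C") for seq in contigs.values())
    return {
        "total_length": total,
        "num_contigs": len(lengths),
        "median_length": median(lengths),
        "max_length": max(lengths),
        "n50": n50(lengths),
        "gc_percent": 100 * gc / total,
    }


# Toy assembly with four made-up contigs (hypothetical data).
toy = ">c1\nGGCCGGCC\n>c2\nATATAT\n>c3\nATGC\n>c4\nAT\n"
stats = assembly_stats(parse_fasta(toy))
print(stats)
```

To try this on a real assembly, download the contigs FASTA from your Galaxy history, read the file into a string, and pass it to `parse_fasta`.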
**FastANI** produces a tabular output with the following columns: Query Genome, Reference Genome, ANI Value, Count of Bidirectional Fragment Mappings, and Total Query Fragments.

Questions:

* What is the ANI?
* Do our genomes belong to the species *C. jejuni*?

Hint: Check the original paper for intra-species ANI variation: [High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Jain et al. (2018)](https://doi.org/10.1038/s41467-018-07641-9)

> ###### Anton Nekrutenko, Delphine Lariviere, Simon Gladman, 2022. Unicycler Assembly (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/assembly/tutorials/unicycler-assembly/tutorial.html Online; accessed Tue Oct 18 2022

> ###### Batut et al., 2018. Community-Driven Data Analysis Training for Biology. Cell Systems. 10.1016/j.cels.2018.05.012

> ###### Simon Gladman, Helena Rasche, Saskia Hiltemann, 2022. De Bruijn Graph Assembly (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/assembly/tutorials/debruijn-graph-assembly/tutorial.html Online; accessed Wed Oct 19 2022