<center><img src="https://i.imgur.com/rPIZUIq.png" alt="drawing" width="700"/></center>
# ACEIDHA-SV - Rapid genotyping and pangenome definition - the *Campylobacter* dataset
The goal of our analysis is to check whether there are circulating clones of *Campylobacter* in Luxembourg, i.e. whether our isolates are of the same type. "Type" can be defined in different ways. For example, a bacterial type may be *Staphylococcus aureus* with resistance to penicillin. We can also “type” a bacterium by focusing on several genes and checking which allele is present for each of them. Each bacterial species has its own scheme, which is the set of genes that are looked at. Overall, this process is called multi-locus (= several genes) sequence typing, or MLST. If you have clinical isolates, their AMR pattern is also relevant. Here we will again use `Staramr` to perform both MLST and AMR in silico characterization.
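To make the idea of a scheme concrete, here is a minimal Python sketch of how an allele profile is turned into a sequence type (ST). The seven loci are those of the *Campylobacter jejuni/coli* MLST scheme as I know it from PubMLST; the allele numbers and the ST labels (`ST-A`, `ST-B`) are made up for illustration and do not correspond to real STs. `Staramr` does this lookup for you against the real database.

```python
# Loci of the seven-gene Campylobacter jejuni/coli MLST scheme.
LOCI = ("aspA", "glnA", "gltA", "glyA", "pgm", "tkt", "uncA")

# Toy lookup table: allele profile -> sequence type.
# ASSUMPTION: these profiles and ST labels are invented for this example.
TOY_PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 2, 1, 1, 3, 1, 1): "ST-B",
}


def assign_st(alleles):
    """Return the ST for a dict of locus -> allele number, or 'novel'."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return TOY_PROFILES.get(profile, "novel")


# Three hypothetical isolates; the third differs from ST-B at one locus only.
isolates = {
    "isolate_1": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
    "isolate_2": dict(zip(LOCI, (1, 2, 1, 1, 3, 1, 1))),
    "isolate_3": dict(zip(LOCI, (1, 2, 1, 1, 3, 1, 9))),
}

for name, alleles in isolates.items():
    print(name, assign_st(alleles))
```

Note that a single different allele is enough to give a different (here: unknown) ST, which is why the ST alone is a fairly coarse measure of relatedness.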
**I.**
* Select your genome contigs (in FASTA format).
* If you don't have contigs for some reason, import them from here: https://usegalaxy.eu/u/allarena/h/campylobacter-assemblies
* Select whether or not you wish to scan your genome for point mutations conferring antimicrobial resistance using the PointFinder database. This requires you to specify the organism you are scanning.
* Run the tool.
Inspect the results:
- Summarize your genomes' genotypes, plasmids and AMR genes
- Which sequence types (STs) do you have? Do any isolates share the same ST?
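
If you prefer to answer the last question outside Galaxy, the following Python sketch groups isolates by ST from a tab-separated summary table. It assumes the table has columns named `Isolate ID` and `Sequence Type`, as I expect from the `Staramr` summary output; check the header of your own file. The rows below are invented placeholders, not real results.

```python
import csv
import io
from collections import defaultdict

# ASSUMPTION: made-up stand-in for a staramr summary table (tab-separated).
SUMMARY_TSV = (
    "Isolate ID\tGenotype\tPlasmid\tSequence Type\n"
    "isolate_1\tgeneA\tNone\t10\n"
    "isolate_2\tgeneB\tNone\t20\n"
    "isolate_3\tgeneA, geneB\tNone\t10\n"
)


def group_by_st(tsv_text):
    """Map each sequence type to the list of isolates that carry it."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        groups[row["Sequence Type"]].append(row["Isolate ID"])
    return dict(groups)


groups = group_by_st(SUMMARY_TSV)
# Keep only the STs found in more than one isolate.
shared = {st: ids for st, ids in groups.items() if len(ids) > 1}
print(shared)
```

To use it on a downloaded file, read the file contents into a string and pass that to `group_by_st`.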
**II.**
We know that we have isolates of the same ST, but that does not mean that they belong to the same clone. We need to compare them at the whole-genome level to figure that out. This can be done in a range of ways; here we will start by performing a core genome alignment and investigating the phylogeny based upon it. To do so, we need to annotate the entire genome (as for the conjugate dataset), compare which genes the different isolates have, and then compare the genes they have in common (core genes). Only these can be aligned.
- Find **Prokka** under Annotation Section
- Select two contigs to annotate (you will import the rest from here: https://usegalaxy.eu/u/allarena/h/camplobacter-gff-files)
- Fill out *Species name*, make sure the *Multiple datasets* mode is selected for *Contigs to annotate*, and make sure that *Kingdom* is set to *Bacteria*. Adjust the outputs so that you only get the annotations as gff and gbk files plus the statistics (otherwise you will get a lot of files).
- Press execute
Try to answer the following:
- How many CDS does one of the strains have? Is that normal for this species?
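
Prokka reports the number of CDS in its statistics file, but you can also count them directly from the gff file. The sketch below is a minimal GFF3 reader that counts features whose type column is `CDS` and stops at the `##FASTA` section that Prokka appends. The example records (contig name, coordinates, IDs) are made up.

```python
def count_cds(gff_text):
    """Count CDS features in GFF3 text, ignoring comments and the FASTA part."""
    count = 0
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this line is sequence, not annotation
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        # Column 3 of a GFF3 record holds the feature type.
        if len(fields) >= 3 and fields[2] == "CDS":
            count += 1
    return count


# ASSUMPTION: tiny invented gff file with two CDS and one tRNA.
TOY_GFF = (
    "##gff-version 3\n"
    "contig_1\tProdigal\tCDS\t10\t400\t.\t+\t0\tID=toy_00001\n"
    "contig_1\tAragorn\ttRNA\t500\t575\t.\t-\t.\tID=toy_00002\n"
    "contig_1\tProdigal\tCDS\t600\t1500\t.\t-\t0\tID=toy_00003\n"
    "##FASTA\n"
    ">contig_1\n"
    "ATGC\n"
)

print(count_cds(TOY_GFF))
```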
**RENAME YOUR FILES - this is super important!**
**III.** Next we will use the `prokka`-generated `.gff` files from the *Campylobacter* strains of similar ST to determine which genes are present in all genomes (core genome) and which are present in only some of them (accessory genome). In addition, by aligning the genes making up the core genome, we can estimate the divergence between the strains and then build a phylogenetic tree. This tree will tell us how closely related the strains are: if the human and animal isolates are (nearly) identical, the humans may have been infected from the animals.
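
As a rough illustration of what "divergence" means here, the sketch below counts the positions at which two aligned sequences differ (a simple SNP distance). This is not what Roary or a tree-building tool does internally; it only shows the kind of information a core genome alignment carries. The isolate names and sequences are invented.

```python
from itertools import combinations

# ASSUMPTION: toy "core genome alignment"; all sequences have equal length.
TOY_ALIGNMENT = {
    "human_1": "ATGACCGTTA",
    "chicken_1": "ATGACCGTTA",
    "cattle_1": "ATGATCGTCA",
}


def snp_distance(seq_a, seq_b):
    """Count positions where both sequences have A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    valid = set("ACGT")  # gaps and ambiguous bases are ignored
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def pairwise_distances(alignment):
    """Return {(name_a, name_b): distance} for every pair of isolates."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


for pair, dist in pairwise_distances(TOY_ALIGNMENT).items():
    print(pair, dist)
```

In this toy example the human and chicken isolates are identical in the alignment, while the cattle isolate differs from both at two positions.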
**Roary** is a commonly used pangenome pipeline which quickly estimates the core and accessory genome and constructs a core genome alignment from gff3 files. It also produces summary statistics and a table of gene presence and absence, which we can visualize later.
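
The core/accessory split itself is simple once you know which isolate carries which gene. The sketch below classifies genes from a toy presence table; the gene and isolate names are hypothetical, and the 99 % threshold mirrors what I understand to be Roary's default core definition (an assumption you can change through `core_fraction`). The hard part that Roary actually solves, deciding which genes in different genomes are "the same gene", is not shown.

```python
def split_pangenome(presence, n_isolates, core_fraction=0.99):
    """Split genes into (core, accessory) lists.

    presence: dict mapping gene name -> set of isolates carrying the gene.
    A gene is core if it occurs in at least core_fraction of all isolates.
    """
    core, accessory = [], []
    for gene in sorted(presence):
        fraction = len(presence[gene]) / n_isolates
        if fraction >= core_fraction:
            core.append(gene)
        else:
            accessory.append(gene)
    return core, accessory


# ASSUMPTION: invented presence/absence data for three isolates.
TOY_PRESENCE = {
    "geneA": {"iso_1", "iso_2", "iso_3"},
    "geneB": {"iso_1", "iso_2"},
    "geneC": {"iso_3"},
}

core, accessory = split_pangenome(TOY_PRESENCE, n_isolates=3)
print("core:", core)
print("accessory:", accessory)
```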
Roary *only works with genomes of the same species* which are similar to each other. If you want to compare more distantly related genomes, other tools, such as Mashtree, could be more useful.
Run Roary with the gff files as input.
> Now that you have used Galaxy and its tools for a while - maybe it's time to try filling it out without me helping?
**IV. Rename file entries.** Roary **might** change the names of the files to an internal filesystem name `(Dataset_xxxxxx)`. This is a problem because you will no longer be able to deduce which sample is which. To handle this, you can do some text manipulation. *Kjetil Klepper from NTNU* has written a script that will swap the dataset names for the filenames. Find it under `General Text Tools - Text Manipulation - Rename file entries`, and execute it. Try to figure out which file goes where before you ask us.
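
To show what such a renaming does, here is a small Python sketch of the idea. It is not the Galaxy tool's actual code: the function name `rename_entries`, the mapping and the Newick tree are all made up. It replaces every `Dataset_<number>` label that appears in the mapping and leaves unknown labels untouched.

```python
import re


def rename_entries(text, name_map):
    """Replace Dataset_<digits> labels with sample names from name_map."""
    pattern = re.compile(r"Dataset_\d+")
    # Matching the whole label avoids Dataset_10 matching inside Dataset_101.
    return pattern.sub(lambda m: name_map.get(m.group(0), m.group(0)), text)


# ASSUMPTION: invented tree and mapping; Dataset_103 has no known name.
TOY_TREE = "((Dataset_101:0.001,Dataset_102:0.001):0.01,Dataset_103:0.02);"
NAME_MAP = {"Dataset_101": "human_1", "Dataset_102": "chicken_1"}

print(rename_entries(TOY_TREE, NAME_MAP))
```

The same function works on any text output, for example the header line of a presence/absence table, as long as the labels follow the `Dataset_<number>` pattern.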