<center><img src="https://i.imgur.com/rPIZUIq.png" alt="drawing" width="700"/></center>
# ACEIDHA-SV - Making a phylogenetic tree and exploring a visualisation tool for a *Campylobacter* dataset
**I**: Inspecting the *Summary statistics* output of **Roary**
* How many genes are present altogether, counting every gene found in any of the strains?
* How many are present in all genomes?
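
Both questions can also be answered programmatically. The sketch below assumes that the Roary summary statistics file is tab-separated with the category in the first column and the gene count in the last column; the function name `parse_roary_summary` is hypothetical and the numbers in the example text are placeholders, not real results.

```python
def parse_roary_summary(text):
    """Return {category: gene_count} from the content of a Roary summary statistics file."""
    counts = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        category, count = fields[0].strip(), fields[-1].strip()
        if count.isdigit():
            counts[category] = int(count)
    return counts


# Illustrative content only: the counts are placeholders, not real results.
example = (
    "Core genes\t(99% <= strains <= 100%)\t10\n"
    "Soft core genes\t(95% <= strains < 99%)\t0\n"
    "Shell genes\t(15% <= strains < 95%)\t5\n"
    "Cloud genes\t(0% <= strains < 15%)\t0\n"
    "Total genes\t(0% <= strains <= 100%)\t15\n"
)
summary = parse_roary_summary(example)
print("Genes found in any strain:", summary["Total genes"])
print("Genes found in (nearly) all strains:", summary["Core genes"])
```

With only seven strains, the 99% to 100% "core" category is the same as "present in all genomes".
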
**II**: Making a phylogenomic/evolutionary tree from a core gene alignment using **IQ-TREE**. **IQ-TREE** takes a multiple sequence alignment as input and reconstructs the evolutionary tree that best explains the input data. The input alignment can be in various common formats, such as PHYLIP, FASTA, NEXUS, and CLUSTALW.
`Inspect your alignment - what filetype do you have?`
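If you prefer to check the file type with a script rather than by eye, the following is a minimal sketch that guesses the format from the first non-empty line. The function name `guess_alignment_format` is hypothetical, and the rules are simple heuristics (FASTA starts with `>`, NEXUS with `#NEXUS`, CLUSTALW with `CLUSTAL`, PHYLIP with two integers), not a full validator.

```python
def guess_alignment_format(text):
    """Guess the alignment format from the first non-empty line (heuristic only)."""
    first = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")
    if first.startswith(">"):
        return "FASTA"
    if first.upper().startswith("#NEXUS"):
        return "NEXUS"
    if first.upper().startswith("CLUSTAL"):
        return "CLUSTALW"
    parts = first.split()
    # A PHYLIP header holds the number of sequences and the alignment length.
    if len(parts) == 2 and all(p.isdigit() for p in parts):
        return "PHYLIP"
    return "unknown"


print(guess_alignment_format(">strain_A\nACGT\n>strain_B\nACGA\n"))
```

In practice you would pass in the first few lines of the core gene alignment downloaded from the Roary output.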
Run **IQ-TREE** with default settings. Use the core genome alignment from the Roary output as input to **IQ-TREE**. When done, inspect the Report and Final Tree to understand which tree you should visualise.
Download this tree file to your local computer. Store the text file with a .nwk ending (Newick format).
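The leaf names in the Newick file are what any metadata must match later, so it can help to list them. The sketch below uses a simple regular expression and assumes unquoted leaf names without spaces; the helper name `newick_leaf_names` and the example tree are made up for illustration.

```python
import re


def newick_leaf_names(newick):
    """Return leaf names from a Newick string (assumes unquoted names without spaces)."""
    # A leaf name directly follows "(" or ","; internal node labels follow ")".
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


# Hypothetical tree with three leaves and one internal support value (95).
tree = "((strain_A:0.01,strain_B:0.02)95:0.005,strain_C:0.03);"
print(newick_leaf_names(tree))
```

For trees with quoted labels, a dedicated phylogenetics library would be a safer choice than a regular expression.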
**III.** [Microreact](https://microreact.org/) is a visualisation tool for phylogenetic trees that can also display metadata such as GPS locations and dates. For it to work best, add metadata such as AMR genes, ST type and year of isolation. Details on how to make compatible datasets can be found [here](https://docs.microreact.org/instructions/data/supported-file-formats).
**IV.** The metadata file can be made in Excel and saved as a CSV (comma-separated values) file.
> Required columns
> Only an identifier for your data rows is required. The ID column must be unique (i.e. each row has a unique ID value) - same as leaf names.
> Column 2 header: ST__autocolour
> Column 3 AMR (pick one)
Metadata could be for instance resistance data and ST type.
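
As an alternative to Excel, the metadata file can be written with a short script. This is a sketch under assumptions: the ID column is taken to be named `id`, the rows hold placeholder values rather than real typing results, and the helper names `build_metadata_csv` and `ids_missing_from_metadata` are hypothetical.

```python
import csv
import io


def build_metadata_csv(rows, fieldnames=("id", "ST__autocolour", "AMR")):
    """Return CSV text for the metadata rows, refusing duplicate IDs."""
    ids = [row["id"] for row in rows]
    duplicates = {i for i in ids if ids.count(i) > 1}
    if duplicates:
        raise ValueError(f"Duplicate IDs: {sorted(duplicates)}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def ids_missing_from_metadata(leaf_names, rows):
    """Return tree leaf names that have no matching metadata row."""
    known = {row["id"] for row in rows}
    return [name for name in leaf_names if name not in known]


# Placeholder values only: replace with your own strain names, STs and AMR genes.
rows = [
    {"id": "strain_A", "ST__autocolour": "ST-1", "AMR": "gene_x"},
    {"id": "strain_B", "ST__autocolour": "ST-2", "AMR": "gene_y"},
]
csv_text = build_metadata_csv(rows)
print(csv_text)
print(ids_missing_from_metadata(["strain_A", "strain_B", "strain_C"], rows))
```

Any leaf name reported as missing would appear in the tree without metadata, so it is worth fixing before uploading.
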
Try it out.
As you see, the seven strains are very similar. But how similar, and where do they vary? Let's do a variant calling exercise to find out!
If you don't want to set up Microreact yourself, you can view the finished project here:
https://microreact.org/project/c7GiLn1KvLjYcwy2gvR2Fg-roarycampylobacterset
### Variant calling
Variant calling is the process of identifying differences between two genome samples. Usually differences are limited to single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). Larger structural variation such as inversions, duplications and large deletions are not typically covered by “variant calling”.
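To make the variant types concrete, here is a toy sketch that compares two sequences that are already aligned and labels each difference as a SNP, insertion or deletion. It is only an illustration of the concept: real variant callers work from mapped reads and base qualities, and the function name `compare_aligned` and the example sequences are made up.

```python
def compare_aligned(ref, query):
    """List differences between two pre-aligned sequences ('-' marks a gap)."""
    if len(ref) != len(query):
        raise ValueError("Aligned sequences must have the same length")
    variants = []
    for column, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if r == q:
            continue
        if r == "-":
            kind = "ins"  # base present in the query but not in the reference
        elif q == "-":
            kind = "del"  # base present in the reference but not in the query
        else:
            kind = "snp"  # single base substitution
        variants.append((column, kind, r, q))
    return variants


# Made-up sequences; positions are alignment columns, not genome coordinates.
for variant in compare_aligned("ACGTAC-GT", "ACCTACAG-"):
    print(variant)
```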
In this tutorial, we will use the tool “Snippy” to find high confidence differences (indels or SNPs) between our known genome and our reads.
For the read alignment (read mapping) step, Snippy uses BWA MEM with a custom set of settings that are well suited to aligning reads from microbial data. For the variant calling step, Snippy uses FreeBayes with a custom set of settings. SnpEff is then used to describe the predicted effect of each change on the genes themselves.
**V.** Import the following **[Workflow](https://usegalaxy.eu/u/allarena/w/imported-microbial-variant-calling-imported-from-url)**
**VI.** **Choose two strains** that cluster close together in the phylogenetic tree. Use one of them as the reference strain and the other as the query. The reference strain should be provided as a GFF, a GenBank (gbk) and a FASTA file. The query is provided only as paired-end (PE) reads (forward and reverse).
**VII.**
##### Examine the Snippy output
Snippy produces quite a bit of output: there can be up to [10 output files](https://github.com/tseemann/snippy#output-files).
Have a look at the contents of the SNP table file (snippy on data XX, data XX and data XX table):
1. Which types of variants have been found?
2. How many SNPs do we find between the isolates?
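
Both questions can be answered by tallying one column of the SNP table. The sketch below assumes a tab-separated table with a header row containing a `TYPE` column, as described in the Snippy documentation; the function name `count_variant_types` is hypothetical and the example rows are placeholders, not real results.

```python
import csv
import io
from collections import Counter


def count_variant_types(tab_text):
    """Count rows per variant type in a tab-separated SNP table with a TYPE column."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return Counter(row["TYPE"] for row in reader)


# Placeholder rows for illustration only; paste in your own table content.
example_table = (
    "CHROM\tPOS\tTYPE\tREF\tALT\n"
    "contig_1\t100\tsnp\tA\tG\n"
    "contig_1\t250\tdel\tCT\tC\n"
    "contig_2\t42\tsnp\tT\tC\n"
)
type_counts = count_variant_types(example_table)
print(dict(type_counts))
print("Number of SNPs:", type_counts["snp"])
```

To use it on your own data, download the table from Galaxy and read the file content into a string first.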