If you are here and you didn't get the link from me, please go to the official Anvio tutorial here and here. Meren, his team, and Mike Lee do a fantastic job on their websites, and their tutorials will be much better than this page.
This page for anvio
We'll also need a companion set of scripts from Mike Lee:
We run the pangenomic analysis with our Bakta-annotated genomes.
First, place the names of the genomes of interest in a Gene.list.txt file.
Our genomes:
Our GenBank files are not compatible with the default Anvio pipeline, so we need to hack our way in. Thankfully, Mike Lee did all the work for us with his bit environment.
All Bakta gbk files are copied into the GBK folder, and we create a clean folder to store the corrected GenBank files.
In brief, the script renames the contigs in each Bakta GenBank file so that every contig name is unique to its genome.
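To make the renaming idea concrete, here is a minimal Python sketch. It is not Mike Lee's script, only an illustration of the principle: prefix each contig with its genome name and a running number so that no two genomes share a contig name. The function name `unique_contig_names` and the `_contig_0001` naming pattern are assumptions made for this example.

```python
def unique_contig_names(genome: str, contigs: list[str]) -> dict[str, str]:
    """Map each original contig name to a genome-prefixed, numbered name.

    Hypothetical helper for illustration; the real renaming is done by the
    script mentioned above.
    """
    return {
        old: f"{genome}_contig_{i:04d}"
        for i, old in enumerate(contigs, start=1)
    }


# Two genomes whose annotation files reuse the same contig names.
genome_a = unique_contig_names("genome_A", ["contig_1", "contig_2"])
genome_b = unique_contig_names("genome_B", ["contig_1", "contig_2"])

for old, new in genome_a.items():
    print(old, "->", new)
```

After renaming, `genome_A` and `genome_B` no longer collide even though both started with `contig_1` and `contig_2`.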
Anvio is a very comprehensive pipeline with many modules. We'll barrel through the pipeline here; for more information, go to the Anvio website.
Import all the information we have from our GenBank files into a compatible file format
Create a contigs database with all the gene call information
Look for single-copy marker genes:
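As a rough sketch of the last two steps, the Python snippet below only builds and prints the command lines for one genome; it does not run them. It assumes the import step left a `<genome>-contigs.fa` and a `<genome>-external-gene-calls.txt` file per genome. Those file names, the function name and the thread count are assumptions for this example, so check each program's `--help` for your Anvio version before running anything.

```python
import shlex


def contigs_db_commands(genome: str, threads: int = 4) -> list[list[str]]:
    """Build (but do not run) the commands for one genome.

    File names are assumed for illustration and may differ in your setup.
    """
    fasta = f"{genome}-contigs.fa"
    gene_calls = f"{genome}-external-gene-calls.txt"
    db = f"{genome}-contigs.db"
    return [
        # Contigs database built from our own gene calls.
        ["anvi-gen-contigs-database",
         "--contigs-fasta", fasta,
         "--output-db-path", db,
         "--external-gene-calls", gene_calls,
         "--project-name", genome],
        # HMM search for single-copy marker genes in that database.
        ["anvi-run-hmms",
         "--contigs-db", db,
         "--num-threads", str(threads)],
    ]


for cmd in contigs_db_commands("genome_A"):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check paths for every genome in the list before launching the real run.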
Create a collection file
Get the genome database and file ready:
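The "file" here is typically an external genomes table: a tab-delimited text file with a `name` column and a `contigs_db_path` column, one row per genome. The sketch below writes such a table from a list of genome names and builds the matching genomes storage command without running it. The helper names, the `<genome>-contigs.db` naming and the `MyPan` project name are assumptions for this example.

```python
import csv
import io


def external_genomes_table(genomes: list[str], db_dir: str = ".") -> str:
    """Return the text of a tab-delimited external genomes table."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "contigs_db_path"])
    for genome in genomes:
        # Assumed naming: one contigs database per genome in db_dir.
        writer.writerow([genome, f"{db_dir}/{genome}-contigs.db"])
    return buf.getvalue()


def genomes_storage_command(project: str, table: str = "external-genomes.txt") -> list[str]:
    """Build (but do not run) the genomes storage command."""
    return ["anvi-gen-genomes-storage",
            "--external-genomes", table,
            "--output-file", f"{project}-GENOMES.db"]


print(external_genomes_table(["genome_A", "genome_B"]), end="")
print(" ".join(genomes_storage_command("MyPan")))
```

In practice, the genome names would come from the same list file used at the start of the page, so the table and the databases stay in sync.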
Finally, run the pangenome comparison
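For orientation, the pangenome step usually comes down to one call on the genomes storage database. The Python sketch below only assembles that command line so it can be inspected; the project name `MyPan`, the `MyPan-GENOMES.db` file name and the thread count are assumptions for this example, and the options should be checked against `anvi-pan-genome --help` for your version.

```python
import shlex


def pan_genome_command(project: str, threads: int = 4) -> list[str]:
    """Build (but do not run) the pangenome command for a project."""
    return ["anvi-pan-genome",
            "--genomes-storage", f"{project}-GENOMES.db",
            "--project-name", project,
            "--num-threads", str(threads)]


print(shlex.join(pan_genome_command("MyPan")))
```

This step is the slow one on a large genome set, so the thread count is worth adjusting to the machine at hand.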
Get everything exported for downstream analysis
Voila