---
tags: anvio
title: Anvi'o pangenomics with GenBank annotations
---

# Anvi'o pangenomics with GenBank annotations

[toc]

## Conda envs

### bit
We use [bit](https://github.com/AstrobioMike/bioinf_tools#bioinformatics-tools-bit) to download things from NCBI and make sure the LOCUS names in any input GenBank files won't cause problems:

```bash
mamba create -y -n bit -c conda-forge -c bioconda -c defaults -c astrobiomike bit=1.8.47
```

### Anvi'o
Anvi'o 7.1 was installed as described here: https://osf.io/8sy2a/wiki/3.%20Pangenomics/

## Getting GenBank references
From a file holding the NCBI assembly accessions we want, e.g.:

```bash
printf "GCF_000013425.1\nGCA_006094375.1\n" > target-refs.txt
cat target-refs.txt
```

```
GCF_000013425.1
GCA_006094375.1
```

Downloading their GenBank files:

```bash
conda activate bit

bit-dl-ncbi-assemblies -w target-refs.txt -f genbank -j 2

gunzip *.gb.gz
```

## Putting all GenBank files (ours and refs) into one place

```bash
mkdir genbank-files
mv *.gb genbank-files
```

## Cleaning LOCUS names just to be sure they don't cause problems later

```bash
# the GenBank files were moved into genbank-files above, so working from in there
cd genbank-files

bit-genbank-locus-clean-slate -i GCF_000013425.1.gb -w GCF_000013425.1 -o GCF_000013425.1-clean.gb
# renaming back to the original name so it's easier to work with
mv GCF_000013425.1-clean.gb GCF_000013425.1.gb

bit-genbank-locus-clean-slate -i GCA_006094375.1.gb -w GCA_006094375.1 -o GCA_006094375.1-clean.gb
mv GCA_006094375.1-clean.gb GCA_006094375.1.gb
```

Done with bit now:

```bash
conda deactivate
```

## Anvi'o
Installed as described here: https://osf.io/8sy2a/wiki/3.%20Pangenomics/

> **NOTE**
> This page is the template for what will be a new anvi'o tutorial. It is not done yet; following along from [here](https://hackmd.io/@astrobiomike/ISS-Staph-paper-pangenomics-notes#Processing-each-genome-into-contigs-and-profile-dbs) should help for now.

```bash
conda activate anvio
```

### Making input files for anvi'o from the GenBank files

```bash
cd ../

# ignoring that the annotation version will be different for those annotated
# with JCVI's PGAP and NCBI's PGAP, not important here
mkdir input-files-for-anvio

anvi-script-process-genbank -i genbank-files/OBSA1.gb -O input-files-for-anvio/OBSA1
anvi-script-process-genbank -i genbank-files/OBSA2.gb -O input-files-for-anvio/OBSA2
```
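
With more than a couple of genomes, typing one `anvi-script-process-genbank` line per file gets tedious. Below is a minimal sketch of looping over every `.gb` file instead. It assumes the GenBank files sit in the `genbank-files` directory created earlier and that each output prefix should match the input file name. The function name `process_all_genbank` and its optional third argument are made up for this example and are not part of anvi'o or bit.

```bash
# Hypothetical helper (not part of anvi'o): runs anvi-script-process-genbank
# on every .gb file in a directory, using the file's base name as the prefix.
#   $1 = directory holding the GenBank files
#   $2 = output directory for the anvi'o input files
#   $3 = optional command prefix, e.g. "echo" to only print the commands
process_all_genbank() {
    local gb_dir="$1" out_dir="$2" prefix="${3:-}"
    local gb name

    mkdir -p "${out_dir}"

    for gb in "${gb_dir}"/*.gb; do
        # if no .gb files exist the glob stays unexpanded, so skip it
        [ -e "${gb}" ] || continue

        # e.g. genbank-files/OBSA1.gb -> OBSA1
        name=$(basename "${gb}" .gb)

        # ${prefix} is left unquoted on purpose so an empty value disappears
        ${prefix} anvi-script-process-genbank -i "${gb}" -O "${out_dir}/${name}"
    done
}

# preview the commands first; drop the trailing "echo" to actually run them
process_all_genbank genbank-files input-files-for-anvio echo
```

Checking the printed commands before running them for real is a cheap way to catch a wrong directory or an unexpected file name.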