Quick MD note on how to run the GTDB-Tk pipeline on a Google Cloud machine. GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
This is useful for estimating which taxonomic unit a bin from a metagenomic assembly belongs to.
I started a Google E2 standard machine with 16 CPUs, 64 GB RAM, and a 200 GB HDD to run the pipeline.
The Google CLI is a very convenient way to interact with the Google VMs.
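As a rough sketch, a machine like the one described above could be created and reached from the CLI as follows. The instance name, zone, and Ubuntu image family are placeholders I picked for illustration, not values from the original setup. The snippet only prints the commands so you can review them first.

```shell
# Hypothetical placeholders: adjust the instance name and zone to your project.
VM_NAME="gtdbtk-vm"
ZONE="europe-west1-b"

# e2-standard-16 provides 16 vCPUs and 64 GB RAM; pd-standard is an HDD-backed disk.
# The Ubuntu 22.04 image family is an assumption.
CREATE_CMD="gcloud compute instances create $VM_NAME \
--zone=$ZONE --machine-type=e2-standard-16 \
--boot-disk-size=200GB --boot-disk-type=pd-standard \
--image-family=ubuntu-2204-lts --image-project=ubuntu-os-cloud"

# Open a shell on the VM once it is running.
SSH_CMD="gcloud compute ssh $VM_NAME --zone=$ZONE"

# Print both commands for review; run one with e.g.: eval "$CREATE_CMD"
echo "$CREATE_CMD"
echo "$SSH_CMD"
```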
More details TBD
There is already extensive documentation here
Nothing like a fresh Ubuntu install to mess up!
Python3 is already there; just symlink it
Same for pip
And a quick check to see if it worked
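The three steps above (python, pip, check) might look like the sketch below. It links into a user-writable directory, `BIN_DIR`, which is a hypothetical choice of mine; linking into `/usr/bin` with `sudo` would work just as well. It also assumes `pip3` is already installed, which may not be true on a minimal image.

```shell
# Assumption: BIN_DIR is a hypothetical user-writable directory to put on PATH.
BIN_DIR="${BIN_DIR:-$HOME/.local/bin}"
mkdir -p "$BIN_DIR"

# Link python -> python3 and pip -> pip3 when the targets exist.
for tool in python pip; do
  target="$(command -v "${tool}3" || true)"
  if [ -n "$target" ]; then
    ln -sf "$target" "$BIN_DIR/$tool"
  else
    echo "${tool}3 not found; on Ubuntu, the python3-pip package provides pip3" >&2
  fi
done

# Quick check: both should report a Python 3 version.
PATH="$BIN_DIR:$PATH"
python --version || true
pip --version || true
```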
To keep it simple, get the mamba version
And activate it
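One way to do this is with the Miniforge installer, which bundles mamba. This is a sketch under my own assumptions: the install prefix `$HOME/miniforge3` is a placeholder, and the original setup may have used a different installer. The function is only defined here; call `install_mamba` on the VM to run it.

```shell
# Assumptions: Miniforge as the mamba distribution, hypothetical install prefix.
MINIFORGE_URL="https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
MAMBA_PREFIX="${MAMBA_PREFIX:-$HOME/miniforge3}"

install_mamba() {
  wget -O /tmp/miniforge.sh "$MINIFORGE_URL"     # fetch the installer
  bash /tmp/miniforge.sh -b -p "$MAMBA_PREFIX"   # -b: batch mode, -p: install prefix
  . "$MAMBA_PREFIX/bin/activate"                 # activate the base environment
  mamba --version                                # confirm mamba is available
}

# Run on the VM with: install_mamba
```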
This step will take a while :/
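The slow step is presumably installing GTDB-Tk and downloading its reference data, which is a large download. A minimal sketch follows, assuming the bioconda package, an environment named `gtdbtk` (my placeholder), and mamba installed under a hypothetical prefix. As before, the function is only defined; run `install_gtdbtk` on the VM.

```shell
# Assumptions: hypothetical mamba prefix and environment name; GTDB-Tk from bioconda.
MAMBA_PREFIX="${MAMBA_PREFIX:-$HOME/miniforge3}"

install_gtdbtk() {
  . "$MAMBA_PREFIX/bin/activate"                               # load conda/mamba into this shell
  mamba create -y -n gtdbtk -c conda-forge -c bioconda gtdbtk  # install GTDB-Tk
  conda activate gtdbtk
  download-db.sh          # helper shipped with the conda package; fetches the reference data
  gtdbtk check_install    # verify that the reference data is found and intact
}

# Run on the VM with: install_gtdbtk
```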
If you are using the pipeline often, it might be faster to make an image of it.
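Making an image could look roughly like this. The instance, zone, and image names are placeholders, and the sketch assumes the boot disk has the same name as the instance, which is the default when an instance is created with gcloud. The commands are printed rather than executed.

```shell
# Hypothetical placeholders: adjust to your own instance, zone, and image name.
VM_NAME="gtdbtk-vm"
ZONE="europe-west1-b"
IMAGE_NAME="gtdbtk-image"

# Stop the VM first so the disk is in a consistent state.
STOP_CMD="gcloud compute instances stop $VM_NAME --zone=$ZONE"

# Assumption: the boot disk is named after the instance.
IMAGE_CMD="gcloud compute images create $IMAGE_NAME \
--source-disk=$VM_NAME --source-disk-zone=$ZONE"

# Print for review; run with e.g.: eval "$STOP_CMD" && eval "$IMAGE_CMD"
echo "$STOP_CMD"
echo "$IMAGE_CMD"
```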
You did the most difficult part. From now on, there is nothing extraordinary; just follow the manual
We want to place our genome in a reference folder.
On the gcloud VM side:
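For example, with a hypothetical folder name (`~/genomes` is my placeholder, not necessarily the one used originally):

```shell
# Assumption: GENOME_DIR is a hypothetical location for the input genomes.
GENOME_DIR="${GENOME_DIR:-$HOME/genomes}"
mkdir -p "$GENOME_DIR"   # create the folder if it does not exist yet
ls -ld "$GENOME_DIR"     # confirm it is there
```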
If you are using the Google Cloud CLI on your local machine, you can run:
If you are using the web SSH window, you can use the interactive upload option.
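For example, a sketch using `gcloud compute scp`, where the instance name, zone, file name, and remote folder are all hypothetical placeholders. The command is printed so you can check it before running it.

```shell
# Hypothetical placeholders: instance, zone, local FASTA file, remote folder.
VM_NAME="gtdbtk-vm"
ZONE="europe-west1-b"
LOCAL_GENOME="./my_bin.fa"

# Copy the genome from the local machine into the folder on the VM.
SCP_CMD="gcloud compute scp $LOCAL_GENOME $VM_NAME:~/genomes/ --zone=$ZONE"

# Print for review; run with: eval "$SCP_CMD"
echo "$SCP_CMD"
```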
The --mash option will calculate the Mash database the first time you run it; after that, runs should go faster.
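A classification run could then look like the sketch below. I am assuming that the option meant above is `--mash_db` from recent GTDB-Tk releases, which takes a path where the Mash sketch is saved and later reused. The folder names and the `.fa` extension are placeholders. The command is printed for review rather than executed.

```shell
# Hypothetical placeholders: input folder, output folder, Mash sketch path.
GENOME_DIR="$HOME/genomes"
OUT_DIR="$HOME/gtdbtk_out"
MASH_DB="$HOME/gtdbtk_mash/gtdb_ref.msh"

# Assumption: --mash_db is the option in question; 16 CPUs matches the VM above.
CLASSIFY_CMD="gtdbtk classify_wf --genome_dir $GENOME_DIR --out_dir $OUT_DIR \
--extension fa --cpus 16 --mash_db $MASH_DB"

# Print for review; run inside the GTDB-Tk environment with: eval "$CLASSIFY_CMD"
echo "$CLASSIFY_CMD"
```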
There are more details on how the pipeline works on this page