---
tags: BRAILLE
title: BRAILLE update (31-Mar-2021)
---

[toc]

---

>**Interactive function plots and tables are [here](https://astrobiomike.shinyapps.io/braille-mg/)**.

---

# BRAILLE update (31-Mar-2021)

> Previous update docs
> - [https://hackmd.io/@astrobiomike/BRAILLE-notes-17-Mar-2021](https://hackmd.io/@astrobiomike/BRAILLE-notes-17-Mar-2021)
> - [https://hackmd.io/@astrobiomike/BRAILLE-update-24-Feb-2021](https://hackmd.io/@astrobiomike/BRAILLE-update-24-Feb-2021)
> - [https://hackmd.io/@astrobiomike/BRAILLE-update-3-Feb-2021](https://hackmd.io/@astrobiomike/BRAILLE-update-3-Feb-2021)
> - [https://hackmd.io/@astrobiomike/BRAILLE-notes-12-Dec-2020](https://hackmd.io/@astrobiomike/BRAILLE-notes-12-Dec-2020)

Metagenomic processing repository with data and code is at [OSF here](https://osf.io/uhk48/wiki/home/). All data are there, but not necessarily all of the processing and figures in these note docs.

---

# Summary

* 3 of our 7 Actinobacteria that can fix carbon were in the order Rubrobacterales ([tree here](https://hackmd.io/@astrobiomike/BRAILLE-notes-17-Mar-2021#Phylogenomic-tree-of-our-Actinobacteria))
* Diana and/or Duane, I think, noted that *Rubrobacter* aren't typically known for C-fixation, so I started poking at this
    * Of the 46 *Rubrobacter* genomes in NCBI (searched on 23-Mar-2021), only 3 had C-fixation capabilities
    * Of the 70 in the order Rubrobacterales, just 1 more
* So only 4 of the 70 Rubrobacterales in NCBI are capable of C-fixation, and all 4 were recovered from desert environment metagenomes
    * Antarctica: Mackay Glacier regions | [PRJNA630822](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA630822/)
        * soil metagenome, polar desert biome
        * [GCA_013695915.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_013695915.1/) | [SAMN15052114](https://www.ncbi.nlm.nih.gov/biosample/SAMN15052114/)
        * [GCA_013697825.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_013697825.1/) | [SAMN15052026](https://www.ncbi.nlm.nih.gov/biosample/SAMN15052026/)
    * Chile: Atacama Desert | [PRJNA665391](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA665391/)
        * soil metagenome, desert biome
        * [GCA_016781705.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_016781705.1/) | [SAMN16295052](https://www.ncbi.nlm.nih.gov/biosample/SAMN16295052/)
    * Israel: Negev Desert | [PRJEB36534](https://www.ncbi.nlm.nih.gov/bioproject/PRJEB36534/)
        * soil **biocrust** metagenome, subtropical desert biome
        * [GCA_902806105.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_902806105.1/) | [SAMEA6526607](https://www.ncbi.nlm.nih.gov/biosample/SAMEA6526607/)

> So it seems Rubrobacterales Actinos might be primary producers in desert and cave environments

---

***Rubrobacter* tree**
<a href="https://i.imgur.com/r3yH9Rp.png"><img src="https://i.imgur.com/r3yH9Rp.png"></a>

[Interactive tree](https://itol.embl.de/tree/1566850116437001616542516)

---

**Rubrobacterales tree**
<a href="https://i.imgur.com/sAvopfv.png"><img src="https://i.imgur.com/sAvopfv.png"></a>

[Interactive tree](https://itol.embl.de/tree/1566850110140451617147117)

---

# Current scope thoughts

* Nitrogen-focused metagenomics component added to the work Molly is leading. She and I are going to chat to get me on the same page and start working out the details of that
* Paper focused on C-fixation, with an Actino/*Rubrobacter* highlight given Maggie's group's work, plus general characterization of the rest of the high-quality MAGs (deep pangenomics on the Rubros?)
* Metagenomes as a whole, including specific targets, e.g. silica-related (in an explicit astrobio direction)

# Work/code

## Getting Rubrobacter genomes in NCBI

`esearch` is included in `bit`:

```
esearch -db assembly -query 'Rubrobacter[Organism]' | esummary | xtract -pattern DocumentSummary -def "NA" -element AssemblyAccession > rubrobacter_accessions.txt
```

```
mkdir -p genomes
cd genomes/
bit-dl-ncbi-assemblies -w ../rubrobacter_accessions.txt -f fasta
gunzip *.gz
cd ../
```

## C-fixation capability screening

Based on KEGG modules, with summarization and grouping help from kegg-decoder.

```
cat target-C-fixation-KOs.txt
K00134
K00150
K00615
K00855
K00927
K01601
K01602
K01621
K01623
K01624
K01783
K01807
K02446
K03841
K05298
K11532
K11645
```

I put those HMMs into their own directory (`target_HMM_profiles/`) and made a `target-ko-list` of just them, then ran the scan against our Actinobacteria MAGs and the 46 genomes pulled from NCBI.

```bash
cat scan-for-C-fixation-KOs.sh
```

```bash
# usage: bash scan-for-C-fixation-KOs.sh <genome-ID-list> <target-KO-list>
set -u

mkdir -p genes results counts

# starting the combined table with a first column holding the KO IDs
cat <( printf "KO_ID\n" ) "$2" > Combined-counts.tsv

for genome in $(cat "$1")
do

    echo "$genome"

    # calling genes
    prodigal -q -c -a genes/${genome}-genes.faa -d genes/${genome}-genes.fa -f gff -o genes/${genome}.gff -i genomes/${genome}.fa

    # kofamscan
    exec_annotation -p target_HMM_profiles/ -k target-ko-list --cpu 5 -f detail-tsv -o ${genome}-targets.tmp genes/${genome}-genes.faa

    # filtering results
    bit-filter-KOFamScan-results -i ${genome}-targets.tmp -o ${genome}-C-fixation-KO-annots.tmp
    grep -v -w "NA" ${genome}-C-fixation-KO-annots.tmp > results/${genome}-C-fixation-KO-annots.tsv

    # clearing intermediate files
    rm -rf tmp/ ${genome}-targets.tmp ${genome}-C-fixation-KO-annots.tmp

    # counting each KO (first line of the counts file is the genome ID, which becomes the column header)
    printf "%s\n" "${genome}" > counts/${genome}-KO-counts.txt

    for KO in $(cat "$2")
    do
        grep -c -w "${KO}" results/${genome}-C-fixation-KO-annots.tsv >> counts/${genome}-KO-counts.txt
    done

    # adding this genome's counts as a new column of the combined table
    paste Combined-counts.tsv counts/${genome}-KO-counts.txt > building.tmp && mv building.tmp Combined-counts.tsv

done
```

```bash
bash scan-for-C-fixation-KOs.sh all-genomes.txt target-C-fixation-KOs.txt
```

## 3 of the 46 in NCBI have C-fixation capabilities

```
library(tidyverse)

tab <- read.table("Combined-counts.tsv", sep = "\t", header=TRUE, check.names = FALSE, row.names = 1)

# transposing so genomes are rows and KOs are columns
tab <- t(tab) %>% data.frame()

# keeping genomes with at least one RuBisCO subunit KO (K01601 large chain, K01602 small chain)
tab %>% filter(K01601 > 0 | K01602 > 0)
```

```
                K00134 K00150 K00615 K00855 K00927 K01601 K01602 K01621 K01623 K01624 K01783 K01807 K02446 K03841 K05298 K11532 K11645
GCA_013695915.1      1      0      1      1      2      0      1      0      1      2      1      1      0      0      0      0      0
GCA_013697825.1      1      0      2      1      1      1      1      0      1      0      2      1      0      0      0      0      0
GCA_016781705.1      1      0      1      0      1      1      0      0      1      1      1      1      0      0      0      0      0
VAL_B04-MAG-70       2      0      1      2      2      1      1      0      1      0      1      0      1      0      0      0      0
VAL_D01-MAG-140      1      0      2      1      1      1      1      0      0      0      1      1      0      0      0      0      0
VAL_E02-MAG-166      1      0      2      1      1      2      1      0      1      2      2      1      0      0      0      0      0
VAL_E02-MAG-88       1      0      1      1      1      1      1      1      1      3      1      0      1      0      0      0      0
YEL_B04-MAG-195      2      0      1      1      1      1      1      0      1      0      2      0      1      0      0      0      0
YEL_C01-MAG-155      1      0      2      1      1      1      1      0      1      0      2      1      0      0      0      0      0
YEL_D03-MAG-109      0      0      1      1      0      1      1      0      1      0      0      0      2      0      0      0      0
```

---

* [GCA_013695915.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_013695915.1/)
    * BioProject: [PRJNA630822](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA630822/)
    * BioSample: [SAMN15052114](https://www.ncbi.nlm.nih.gov/biosample/SAMN15052114/)
    * soil metagenome, polar desert biome
    * Antarctica: Mackay Glacier regions
* [GCA_013697825.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_013697825.1/)
    * BioProject: [PRJNA630822](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA630822/)
    * BioSample: [SAMN15052026](https://www.ncbi.nlm.nih.gov/biosample/SAMN15052026/)
    * soil metagenome, polar desert biome
    * Antarctica: Mackay Glacier regions

---

* [GCA_016781705.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_016781705.1/)
    * BioProject: [PRJNA665391](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA665391/)
    * BioSample: [SAMN16295052](https://www.ncbi.nlm.nih.gov/biosample/SAMN16295052/)
    * soil metagenome, desert biome
    * Chile: Atacama Desert

---

## Treeing Rubrobacter

Ours and the 3 above with C-fixation capabilities are noted in the tree.

```
GToTree -f input-fastas.txt -H Bacteria -o Rubrobacter-gtotree -j 20
```

***Rubrobacter* tree**
<a href="https://i.imgur.com/r3yH9Rp.png"><img src="https://i.imgur.com/r3yH9Rp.png"></a>

[Interactive tree](https://itol.embl.de/tree/1566850116437001616542516)

## Getting GTDB taxonomy for those labeled Rubrobacter in NCBI

```bash
gtdbtk classify_wf --genome_dir genomes/ -x fa --out_dir gtdbtd-out --cpus 40
```

```
cut -f 1,2 gtdbtd-out/gtdbtk.bac120.summary.tsv | sed 's/-m//' | sort -k 2 | column

user_genome      classification
GCA_013695915.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781585.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781605.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781625.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781675.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781705.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781775.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781805.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCF_011492965.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
VAL_D01-MAG-140  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCF_001029505.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__Rubrobacter_A aplysinae
GCF_000014185.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_B;s__Rubrobacter_B xylanophilus
GCF_007164525.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_B;s__Rubrobacter_B xylanophilus_A
GCF_004337705.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_C;s__Rubrobacter_C taiwanensis
GCF_003568865.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter indicoceani
GCF_000661895.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter radiotolerans
GCF_900175965.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter radiotolerans
GCA_013695475.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013697315.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013697825.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013817025.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_014534485.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_016781635.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
VAL_E02-MAG-166  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013695335.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013696585.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013812245.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013812275.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013813465.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013815485.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013815575.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013820785.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013821095.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_014534105.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_014534615.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781545.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781725.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781855.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781885.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCF_011492945.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
YEL_C01-MAG-155  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781665.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781735.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781765.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781825.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781845.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781905.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781945.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781965.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
```

## Doing Rubrobacterales order

## Getting Rubrobacterales genomes in NCBI

`esearch` is included in `bit`:

```
esearch -db assembly -query 'Rubrobacterales[Organism]' | esummary | xtract -pattern DocumentSummary -def "NA" -element AssemblyAccession > rubrobacterales_accessions.txt
```

```
cd genomes/
bit-dl-ncbi-assemblies -w ../rubrobacterales_accessions.txt -f fasta -j 5
gunzip *.gz
cd ../
```

Copied the needed files from above to this new location and ran the C-fixation screening on all of them:

```
bash scan-for-C-fixation-KOs.sh all-genomes.txt target-C-fixation-KOs.txt
```

Only one more than in the *Rubrobacter*-only set above:

```
                K00134 K00150 K00615 K00855 K00927 K01601 K01602 K01621 K01623 K01624 K01783 K01807 K02446 K03841 K05298 K11532 K11645
GCA_013695915.1      1      0      1      1      2      0      1      0      1      2      1      1      0      0      0      0      0
GCA_013697825.1      1      0      2      1      1      1      1      0      1      0      2      1      0      0      0      0      0
GCA_016781705.1      1      0      1      0      1      1      0      0      1      1      1      1      0      0      0      0      0
GCA_902806105.1      1      0      2      0      1      1      0      0      0      1      2      2      0      0      0      0      0
VAL_D01-MAG-140      1      0      2      1      1      1      1      0      0      0      1      1      0      0      0      0      0
VAL_E02-MAG-166      1      0      2      1      1      2      1      0      1      2      2      1      0      0      0      0      0
YEL_C01-MAG-155      1      0      2      1      1      1      1      0      1      0      2      1      0      0      0      0      0
```

The additional one is:

* [GCA_902806105.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_902806105.1/)
    * BioProject: [PRJEB36534](https://www.ncbi.nlm.nih.gov/bioproject/PRJEB36534/)
    * BioSample: [SAMEA6526607](https://www.ncbi.nlm.nih.gov/biosample/SAMEA6526607/)
    * soil biocrust metagenome, subtropical desert biome
    * Israel: Negev Desert

---

> In the order Rubrobacterales (according to NCBI), there are 70 genomes, and 4 of them have C-fixation capabilities.

**Rubrobacterales tree**
<a href="https://i.imgur.com/sAvopfv.png"><img src="https://i.imgur.com/sAvopfv.png"></a>

[Interactive tree](https://itol.embl.de/tree/1566850110140451617147117)

Order-level taxonomy based on GTDB:

```bash
cut -f 1,2 gtdbtk-rubrobacterales-out/gtdbtk.bac120.summary.tsv | sed 's/-m//' | sort -k 2 | column

user_genome      classification
GCA_013695475.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013697315.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013697825.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013817025.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_014534435.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_014534485.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_016781635.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902805955.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902806015.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902806055.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902806135.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902806155.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
VAL_E02-MAG-166  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCF_003568865.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter indicoceani
GCF_000661895.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter radiotolerans
GCF_900175965.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter radiotolerans
GCA_013695915.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781585.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781605.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781625.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781675.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781705.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781775.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781805.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_902806105.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCF_011492965.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
VAL_D01-MAG-140  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCF_001029505.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__Rubrobacter_A aplysinae
GCF_000014185.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_B;s__Rubrobacter_B xylanophilus
GCF_007164525.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_B;s__Rubrobacter_B xylanophilus_A
GCF_004337705.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_C;s__Rubrobacter_C taiwanensis
GCA_013695335.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013696435.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013696585.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013698015.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013698525.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013812245.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013812275.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013813465.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013813545.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013815485.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013815575.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013816455.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013816955.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013817085.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013820785.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013821095.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_014534105.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_014534615.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781545.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781725.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781855.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781885.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902805975.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902805985.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806025.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806035.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806045.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806065.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806075.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806085.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806095.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCF_011492945.1  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
YEL_C01-MAG-155  d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCF_007970665.1  d__Bacteria;p__Actinobacteriota;c__Thermoleophilia;o__Solirubrobacterales;f__Solirubrobacteraceae;g__Conexibacter_A;s__
GCA_016781665.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781735.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781765.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781825.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781845.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781905.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781945.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781965.1  d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
```

# Working with GTDB

Using GToTree v1.6.11, bit v1.8.30, and KOFamScan v1.3.0.

```bash
gtt-get-accessions-from-GTDB -t Rubrobacterales
```

```
Reading in the GTDB info table...
Using GTDB v202: Released April 27, 2021
The rank 'order' has 31 Rubrobacterales entries.
The targeted NCBI accessions were written to:
    GTDB-Rubrobacterales-order-accs.txt
A subset GTDB table of these targets was written to:
    GTDB-Rubrobacterales-order-metadata.tsv
```

```bash
mkdir genomes
cd genomes/
bit-dl-ncbi-assemblies -w ../GTDB-Rubrobacterales-order-accs.txt -f fasta -j 5
gunzip *.gz
cd ../

bash scan-for-C-fixation-KOs.sh GTDB-Rubrobacterales-order-accs.txt target-C-fixation-KOs.txt
mv Combined-counts.tsv GTDB-combined-counts.tsv
```

**Script used**

```bash
cat scan-for-C-fixation-KOs.sh
```

```bash
# usage: bash scan-for-C-fixation-KOs.sh <genome-ID-list> <target-KO-list>
set -u

mkdir -p genes results counts

# starting the combined table with a first column holding the KO IDs
cat <( printf "KO_ID\n" ) "$2" > Combined-counts.tsv

for genome in $(cat "$1")
do

    echo "$genome"

    # calling genes
    prodigal -q -c -a genes/${genome}-genes.faa -d genes/${genome}-genes.fa -f gff -o genes/${genome}.gff -i genomes/${genome}.fa

    # kofamscan
    exec_annotation -p target_HMM_profiles/ -k target-ko-list --cpu 5 -f detail-tsv -o ${genome}-targets.tmp genes/${genome}-genes.faa

    # filtering results
    bit-filter-KOFamScan-results -i ${genome}-targets.tmp -o ${genome}-C-fixation-KO-annots.tmp
    grep -v -w "NA" ${genome}-C-fixation-KO-annots.tmp > results/${genome}-C-fixation-KO-annots.tsv

    # clearing intermediate files
    rm -rf tmp/ ${genome}-targets.tmp ${genome}-C-fixation-KO-annots.tmp

    # counting each KO (first line of the counts file is the genome ID, which becomes the column header)
    printf "%s\n" "${genome}" > counts/${genome}-KO-counts.txt

    for KO in $(cat "$2")
    do
        grep -c -w "${KO}" results/${genome}-C-fixation-KO-annots.tsv >> counts/${genome}-KO-counts.txt
    done

    # adding this genome's counts as a new column of the combined table
    paste Combined-counts.tsv counts/${genome}-KO-counts.txt > building.tmp && mv building.tmp Combined-counts.tsv

done
```
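
The RuBisCO filter applied in R earlier (`K01601 > 0 | K01602 > 0`) could also be run directly on `GTDB-combined-counts.tsv` from the shell. The sketch below is one possible way to do that, not something that was run for these notes. It assumes the table layout the scan script writes (tab-delimited, KOs as rows, genomes as columns). The function name `list_rubisco_genomes` and the toy table, including its genome names and counts, are made up for illustration.

```bash
#!/usr/bin/env bash
set -eu

# Print the genomes (columns) with at least one hit to K01601 or K01602.
# Assumed layout, as written by scan-for-C-fixation-KOs.sh: tab-delimited,
# header row starting with KO_ID, one row per KO, one column per genome.
list_rubisco_genomes() {
    awk -F'\t' '
        # header row: remember the genome name of each column
        NR == 1 { n = NF; for (i = 2; i <= n; i++) name[i] = $i; next }
        # RuBisCO rows: flag every column with a count above zero
        $1 == "K01601" || $1 == "K01602" {
            for (i = 2; i <= NF; i++) if ($i + 0 > 0) hit[i] = 1
        }
        # print flagged genomes in their original column order
        END { for (i = 2; i <= n; i++) if (i in hit) print name[i] }
    ' "$1"
}

# toy table for illustration only (genome names and counts are made up)
toy_table=$(mktemp)
printf 'KO_ID\tgenomeA\tgenomeB\tgenomeC\n' >  "$toy_table"
printf 'K00134\t1\t1\t2\n'                  >> "$toy_table"
printf 'K01601\t0\t0\t1\n'                  >> "$toy_table"
printf 'K01602\t1\t0\t0\n'                  >> "$toy_table"

list_rubisco_genomes "$toy_table"
```

On the real output this would be called as `list_rubisco_genomes GTDB-combined-counts.tsv`. Working column-wise avoids the transpose step needed in the R version, at the cost of a slightly less readable filter.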