---
tags: BRAILLE
title: BRAILLE update (31-Mar-2021)
---
[toc]
---
>**Interactive function plots and tables are [here](https://astrobiomike.shinyapps.io/braille-mg/)**.
---
# BRAILLE update (31-Mar-2021)
> Previous update docs
> - [https://hackmd.io/@astrobiomike/BRAILLE-notes-17-Mar-2021](https://hackmd.io/@astrobiomike/BRAILLE-notes-17-Mar-2021)
> - [https://hackmd.io/@astrobiomike/BRAILLE-update-24-Feb-2021](https://hackmd.io/@astrobiomike/BRAILLE-update-24-Feb-2021)
> - [https://hackmd.io/@astrobiomike/BRAILLE-update-3-Feb-2021](https://hackmd.io/@astrobiomike/BRAILLE-update-3-Feb-2021)
> - [https://hackmd.io/@astrobiomike/BRAILLE-notes-12-Dec-2020](https://hackmd.io/@astrobiomike/BRAILLE-notes-12-Dec-2020)
Metagenomic processing repository with data and code is at [OSF here](https://osf.io/uhk48/wiki/home/). All data are there, but not necessarily all of the processing steps and figures shown in these notes.
---
# Summary
* 3 of our 7 Actinobacteria that can fix carbon were in the order Rubrobacterales ([tree here](https://hackmd.io/@astrobiomike/BRAILLE-notes-17-Mar-2021#Phylogenomic-tree-of-our-Actinobacteria))
* Diana and/or Duane (I think) noted that *Rubrobacter* aren't typically known for C-fixation, so I started poking at this
* Of the 46 *Rubrobacter* genomes in NCBI (searched on 23-Mar-2021), only 3 had C-fixation capabilities
* Of the 70 in the order Rubrobacterales, just 1 more
* So only 4 of the 70 Rubrobacterales in NCBI are capable of C-fixation, and all 4 were recovered from desert-environment metagenomes
    * Antarctica: Mackay Glacier regions | [PRJNA630822](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA630822/)
        * soil metagenome, polar desert biome
        * [GCA_013695915.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_013695915.1/) | [SAMN15052114](https://www.ncbi.nlm.nih.gov/biosample/SAMN15052114/)
        * [GCA_013697825.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_013697825.1/) | [SAMN15052026](https://www.ncbi.nlm.nih.gov/biosample/SAMN15052026/)
    * Chile: Atacama Desert | [PRJNA665391](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA665391/)
        * soil metagenome, desert biome
        * [GCA_016781705.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_016781705.1/) | [SAMN16295052](https://www.ncbi.nlm.nih.gov/biosample/SAMN16295052/)
    * Israel: Negev Desert | [PRJEB36534](https://www.ncbi.nlm.nih.gov/bioproject/PRJEB36534/)
        * soil **biocrust** metagenome, subtropical desert biome
        * [GCA_902806105.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_902806105.1/) | [SAMEA6526607](https://www.ncbi.nlm.nih.gov/biosample/SAMEA6526607/)
> So it seems Rubrobacterales Actinos might be primary producers in desert and cave environments.
---
***Rubrobacter* tree**
<a href="https://i.imgur.com/r3yH9Rp.png"><img src="https://i.imgur.com/r3yH9Rp.png"></a>
[Interactive tree](https://itol.embl.de/tree/1566850116437001616542516)
---
**Rubrobacterales tree**
<a href="https://i.imgur.com/sAvopfv.png"><img src="https://i.imgur.com/sAvopfv.png"></a>
[Interactive tree](https://itol.embl.de/tree/1566850110140451617147117)
---
# Current scope thoughts
* Nitrogen-focused metagenomics component added to the work Molly is leading. She and I are going to chat to get me on the same page and start working out the details.
* Paper focused on C-fixation, with an Actino/*Rubrobacter* highlight given Maggie's group's work, plus general characterization of the rest of the high-quality MAGs (maybe a pangenomics deep-dive on the Rubros?)
* Metagenomes as a whole, including specific targets, e.g. silica-related ones (in an explicitly astrobiology direction)
# Work/code
## Getting Rubrobacter genomes in NCBI
`esearch` (part of NCBI's Entrez Direct) comes installed with `bit`:
```
esearch -db assembly -query 'Rubrobacter[Organism]' | esummary | xtract -pattern DocumentSummary -def "NA" -element AssemblyAccession > rubrobacter_accessions.txt
```
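Because `xtract -def "NA"` writes `NA` for any record that lacks the requested element, a small cleanup before downloading can drop those placeholders and any repeated accessions. This is a minimal sketch; `clean_accessions` is a hypothetical helper, not part of `bit` or Entrez Direct:

```bash
# Hypothetical helper (not part of bit or EDirect): drop the "NA" placeholders
# written by `xtract -def "NA"`, skip blank lines, and de-duplicate while
# keeping the order in which accessions first appear.
clean_accessions() {
    local in_file="$1"
    awk 'NF && $1 != "NA" && !seen[$1]++ { print $1 }' "$in_file"
}

# example usage:
# clean_accessions rubrobacter_accessions.txt > rubrobacter_accessions.clean.txt
```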
```bash
mkdir -p genomes
cd genomes/
bit-dl-ncbi-assemblies -w ../rubrobacter_accessions.txt -f fasta
gunzip *.gz
cd ../
```
## C-fixation capability screening
Based on KEGG modules, with summarization and grouping help from KEGG-Decoder.
```
cat target-C-fixation-KOs.txt
K00134
K00150
K00615
K00855
K00927
K01601
K01602
K01621
K01623
K01624
K01783
K01807
K02446
K03841
K05298
K11532
K11645
```
I put those HMMs into their own directory and made a target ko list of just them, then ran the scan against our Actinobacteria MAGs and the 46 genomes pulled from NCBI.
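A minimal sketch of how that subset could be built, assuming the standard KOFam download layout (a tab-delimited `ko_list` with a header row and KO IDs in column 1, plus `profiles/K#####.hmm`). `subset_kofam` and the `kofam_db/` path are hypothetical; the output names match what the screening script passes to `exec_annotation` (`-p target_HMM_profiles/`, `-k target-ko-list`):

```bash
# Hypothetical helper: build target_HMM_profiles/ and target-ko-list from a
# full KOFam download. Assumes <db_dir>/ko_list (tab-delimited, header row,
# KO in column 1) and <db_dir>/profiles/K#####.hmm.
subset_kofam() {
    local db_dir="$1" target_kos="$2" out_dir="${3:-.}"
    mkdir -p "${out_dir}/target_HMM_profiles"
    # keep the ko_list header plus only the rows for the target KOs
    awk -F'\t' 'NR == FNR { if (NF) keep[$1]; next }
                FNR == 1 || ($1 in keep)' \
        "$target_kos" "${db_dir}/ko_list" > "${out_dir}/target-ko-list"
    # copy just the matching HMM profiles
    local ko
    while read -r ko || [ -n "$ko" ]; do
        if [ -n "$ko" ]; then
            cp "${db_dir}/profiles/${ko}.hmm" "${out_dir}/target_HMM_profiles/"
        fi
    done < "$target_kos"
}

# example usage (kofam_db/ is a hypothetical path to the unpacked KOFam files):
# subset_kofam kofam_db target-C-fixation-KOs.txt .
```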
```bash
cat scan-for-C-fixation-KOs.sh
```
```bash
# usage: bash scan-for-C-fixation-KOs.sh <genome-IDs.txt> <target-KOs.txt>
# expects genomes/<ID>.fa, target_HMM_profiles/, and target-ko-list in the working directory
set -u
mkdir -p genes results counts
# first column of the combined table holds the target KO IDs
cat <( printf "KO_ID\n" ) "$2" > Combined-counts.tsv
for genome in $(cat "$1")
do
    echo "$genome"
    # calling genes
    prodigal -q -c -a genes/${genome}-genes.faa -d genes/${genome}-genes.fa -f gff -o genes/${genome}.gff -i genomes/${genome}.fa
    # kofamscan
    exec_annotation -p target_HMM_profiles/ -k target-ko-list --cpu 5 -f detail-tsv -o ${genome}-targets.tmp genes/${genome}-genes.faa
    # filtering results
    bit-filter-KOFamScan-results -i ${genome}-targets.tmp -o ${genome}-C-fixation-KO-annots.tmp
    grep -v -w "NA" ${genome}-C-fixation-KO-annots.tmp > results/${genome}-C-fixation-KO-annots.tsv
    # clearing intermediate files (KOFamScan writes its working files to tmp/ by default)
    rm -rf tmp/ ${genome}-targets.tmp ${genome}-C-fixation-KO-annots.tmp
    # counting each KO (grep -c prints 0 when a KO has no hits)
    printf "%s\n" "${genome}" > counts/${genome}-KO-counts.txt
    for KO in $(cat "$2")
    do
        grep -c -w "${KO}" results/${genome}-C-fixation-KO-annots.tsv >> counts/${genome}-KO-counts.txt
    done
    # adding this genome's counts as a new column
    paste Combined-counts.tsv counts/${genome}-KO-counts.txt > building.tmp && mv building.tmp Combined-counts.tsv
done
```
```bash
bash scan-for-C-fixation-KOs.sh all-genomes.txt target-C-fixation-KOs.txt
```
## 3 of the 46 in NCBI have C-fixation capabilities
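```r
library(tidyverse)
# Combined-counts.tsv has KOs as rows and genomes as columns
tab <- read.table("Combined-counts.tsv", sep = "\t", header = TRUE, check.names = FALSE, row.names = 1)
# transpose so each row is a genome and each column a KO
tab <- t(tab) %>% data.frame()
# keep genomes with at least one RuBisCO subunit hit (K01601 large chain, K01602 small chain)
tab %>% filter(K01601 > 0 | K01602 > 0)
```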
```
K00134 K00150 K00615 K00855 K00927 K01601 K01602 K01621 K01623 K01624 K01783 K01807 K02446 K03841 K05298 K11532 K11645
GCA_013695915.1 1 0 1 1 2 0 1 0 1 2 1 1 0 0 0 0 0
GCA_013697825.1 1 0 2 1 1 1 1 0 1 0 2 1 0 0 0 0 0
GCA_016781705.1 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0 0 0
VAL_B04-MAG-70 2 0 1 2 2 1 1 0 1 0 1 0 1 0 0 0 0
VAL_D01-MAG-140 1 0 2 1 1 1 1 0 0 0 1 1 0 0 0 0 0
VAL_E02-MAG-166 1 0 2 1 1 2 1 0 1 2 2 1 0 0 0 0 0
VAL_E02-MAG-88 1 0 1 1 1 1 1 1 1 3 1 0 1 0 0 0 0
YEL_B04-MAG-195 2 0 1 1 1 1 1 0 1 0 2 0 1 0 0 0 0
YEL_C01-MAG-155 1 0 2 1 1 1 1 0 1 0 2 1 0 0 0 0 0
YEL_D03-MAG-109 0 0 1 1 0 1 1 0 1 0 0 0 2 0 0 0 0
```
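The same RuBisCO filter can be sketched directly on `Combined-counts.tsv` in `awk`, without transposing first. A minimal sketch; `genomes_with_rubisco` is a hypothetical helper, and it assumes the layout the screening script writes (a `KO_ID` header column, one row per KO, one column per genome):

```bash
# Hypothetical helper: print the IDs of genomes (columns) with a K01601
# (RuBisCO large chain) or K01602 (small chain) count above 0.
genomes_with_rubisco() {
    local counts="$1"
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        $1 == "K01601" || $1 == "K01602" {
            for (i = 2; i <= NF; i++) if ($i > 0) hit[i] = 1
        }
        END { for (i = 2; i <= ncol; i++) if (i in hit) print name[i] }
    ' "$counts"
}

# example usage:
# genomes_with_rubisco Combined-counts.tsv
```

Genomes are printed in their column order, so the output lines up with the header of `Combined-counts.tsv`.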
---
* [GCA_013695915.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_013695915.1/)
    * BioProject: [PRJNA630822](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA630822/)
    * BioSample: [SAMN15052114](https://www.ncbi.nlm.nih.gov/biosample/SAMN15052114/)
    * soil metagenome, polar desert biome
    * Antarctica: Mackay Glacier regions
* [GCA_013697825.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_013697825.1/)
    * BioProject: [PRJNA630822](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA630822/)
    * BioSample: [SAMN15052026](https://www.ncbi.nlm.nih.gov/biosample/SAMN15052026/)
    * soil metagenome, polar desert biome
    * Antarctica: Mackay Glacier regions
---
* [GCA_016781705.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_016781705.1/)
    * BioProject: [PRJNA665391](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA665391/)
    * BioSample: [SAMN16295052](https://www.ncbi.nlm.nih.gov/biosample/SAMN16295052/)
    * soil metagenome, desert biome
    * Chile: Atacama Desert
---
## Treeing Rubrobacter
Our MAGs and the 3 NCBI genomes above with C-fixation capabilities are highlighted on the tree.
```
GToTree -f input-fastas.txt -H Bacteria -o Rubrobacter-gtotree -j 20
```
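The `input-fastas.txt` passed to GToTree's `-f` is a file listing fasta paths, one per line. A sketch of building it from genome IDs, assuming each genome sits at `genomes/<ID>.fa` as in the screening script; `make_gtotree_fasta_list` and the `all-genomes.txt` input in the usage line are hypothetical:

```bash
# Hypothetical helper: write one fasta path per line for GToTree's -f option.
# Assumes each genome is at <genome_dir>/<ID>.fa; IDs without a (non-empty)
# fasta are reported on stderr and left out of the list.
make_gtotree_fasta_list() {
    local id_file="$1" genome_dir="${2:-genomes}"
    local id
    while read -r id || [ -n "$id" ]; do
        if [ -z "$id" ]; then continue; fi
        if [ -s "${genome_dir}/${id}.fa" ]; then
            echo "${genome_dir}/${id}.fa"
        else
            echo "missing: ${id}" >&2
        fi
    done < "$id_file"
}

# example usage:
# make_gtotree_fasta_list all-genomes.txt genomes > input-fastas.txt
```

Reporting missing IDs on stderr keeps the list clean while still flagging any failed download before the tree run starts.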
***Rubrobacter* tree**
<a href="https://i.imgur.com/r3yH9Rp.png"><img src="https://i.imgur.com/r3yH9Rp.png"></a>
[Interactive tree](https://itol.embl.de/tree/1566850116437001616542516)
## Getting GTDB taxonomy for those labeled Rubrobacter in NCBI
```bash
gtdbtk classify_wf --genome_dir genomes/ -x fa --out_dir gtdbtd-out --cpus 40
```
```
cut -f 1,2 gtdbtd-out/gtdbtk.bac120.summary.tsv | sed 's/-m//' | sort -k 2 | column
user_genome classification
GCA_013695915.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781585.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781605.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781625.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781675.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781705.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781775.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781805.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCF_011492965.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
VAL_D01-MAG-140 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCF_001029505.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__Rubrobacter_A aplysinae
GCF_000014185.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_B;s__Rubrobacter_B xylanophilus
GCF_007164525.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_B;s__Rubrobacter_B xylanophilus_A
GCF_004337705.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_C;s__Rubrobacter_C taiwanensis
GCF_003568865.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter indicoceani
GCF_000661895.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter radiotolerans
GCF_900175965.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter radiotolerans
GCA_013695475.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013697315.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013697825.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013817025.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_014534485.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_016781635.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
VAL_E02-MAG-166 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013695335.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013696585.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013812245.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013812275.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013813465.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013815485.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013815575.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013820785.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013821095.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_014534105.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_014534615.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781545.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781725.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781855.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781885.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCF_011492945.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
YEL_C01-MAG-155 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781665.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781735.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781765.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781825.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781845.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781905.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781945.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781965.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
```
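A table like the one above can be summarized by tallying the GTDB phylum and genus per genome. A minimal sketch, assuming the `classification` string is in column 2 of `gtdbtk.bac120.summary.tsv` (as the `cut -f 1,2` above implies) with `;`-separated ranks in the order d, p, c, o, f, g, s; `tally_gtdb_genera` is hypothetical:

```bash
# Hypothetical helper: count genomes per GTDB phylum + genus from a gtdbtk
# summary table (header row skipped). Empty genus ranks ("g__") are shown
# as "g__(unassigned)". Output: count<TAB>phylum genus, largest first.
tally_gtdb_genera() {
    local summary="$1"
    awk -F'\t' 'NR > 1 {
        n = split($2, rank, ";")
        phylum = rank[2]
        genus = (n >= 6) ? rank[6] : "g__"
        if (genus == "g__") genus = "g__(unassigned)"
        count[phylum " " genus]++
    }
    END { for (k in count) print count[k] "\t" k }' "$summary" | sort -k1,1nr -k2
}

# example usage:
# tally_gtdb_genera gtdbtd-out/gtdbtk.bac120.summary.tsv
```

On the table above, this would separate the 8 Chloroflexota (Thermobaculaceae) genomes from the Rubrobacteraceae ones.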
## Doing the Rubrobacterales order
### Getting Rubrobacterales genomes in NCBI
`esearch` (part of NCBI's Entrez Direct) comes installed with `bit`:
```
esearch -db assembly -query 'Rubrobacterales[Organism]' | esummary | xtract -pattern DocumentSummary -def "NA" -element AssemblyAccession > rubrobacterales_accessions.txt
```
```bash
mkdir -p genomes
cd genomes/
bit-dl-ncbi-assemblies -w ../rubrobacterales_accessions.txt -f fasta -j 5
gunzip *.gz
cd ../
```
Copied the needed files from above to this new location and ran the C-fixation screening on all of them:
```
bash scan-for-C-fixation-KOs.sh all-genomes.txt target-C-fixation-KOs.txt
```
Only one more genome with C-fixation capabilities than in the *Rubrobacter*-only search:
```
K00134 K00150 K00615 K00855 K00927 K01601 K01602 K01621 K01623 K01624 K01783 K01807 K02446 K03841 K05298 K11532 K11645
GCA_013695915.1 1 0 1 1 2 0 1 0 1 2 1 1 0 0 0 0 0
GCA_013697825.1 1 0 2 1 1 1 1 0 1 0 2 1 0 0 0 0 0
GCA_016781705.1 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0 0 0
GCA_902806105.1 1 0 2 0 1 1 0 0 0 1 2 2 0 0 0 0 0
VAL_D01-MAG-140 1 0 2 1 1 1 1 0 0 0 1 1 0 0 0 0 0
VAL_E02-MAG-166 1 0 2 1 1 2 1 0 1 2 2 1 0 0 0 0 0
YEL_C01-MAG-155 1 0 2 1 1 1 1 0 1 0 2 1 0 0 0 0 0
```
The additional one is:
* [GCA_902806105.1](https://www.ncbi.nlm.nih.gov/assembly/GCA_902806105.1/)
    * BioProject: [PRJEB36534](https://www.ncbi.nlm.nih.gov/bioproject/PRJEB36534/)
    * BioSample: [SAMEA6526607](https://www.ncbi.nlm.nih.gov/biosample/SAMEA6526607/)
    * soil biocrust metagenome, subtropical desert biome
    * Israel: Negev Desert
---
> In the order Rubrobacterales (according to NCBI), there are 70 genomes, and 4 of them have C-fixation capabilities.
**Rubrobacterales tree**
<a href="https://i.imgur.com/sAvopfv.png"><img src="https://i.imgur.com/sAvopfv.png"></a>
[Interactive tree](https://itol.embl.de/tree/1566850110140451617147117)
GTDB taxonomy of the NCBI Rubrobacterales genomes (plus our MAGs):
```bash
cut -f 1,2 gtdbtk-rubrobacterales-out/gtdbtk.bac120.summary.tsv | sed 's/-m//' | sort -k 2 | column
user_genome classification
GCA_013695475.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013697315.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013697825.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_013817025.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_014534435.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_014534485.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_016781635.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902805955.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902806015.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902806055.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902806135.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCA_902806155.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
VAL_E02-MAG-166 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__;s__
GCF_003568865.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter indicoceani
GCF_000661895.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter radiotolerans
GCF_900175965.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter;s__Rubrobacter radiotolerans
GCA_013695915.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781585.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781605.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781625.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781675.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781705.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781775.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_016781805.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCA_902806105.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCF_011492965.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
VAL_D01-MAG-140 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__
GCF_001029505.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_A;s__Rubrobacter_A aplysinae
GCF_000014185.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_B;s__Rubrobacter_B xylanophilus
GCF_007164525.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_B;s__Rubrobacter_B xylanophilus_A
GCF_004337705.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__Rubrobacter_C;s__Rubrobacter_C taiwanensis
GCA_013695335.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013696435.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013696585.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013698015.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013698525.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013812245.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013812275.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013813465.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013813545.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013815485.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013815575.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013816455.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013816955.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013817085.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013820785.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_013821095.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_014534105.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_014534615.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781545.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781725.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781855.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_016781885.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902805975.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902805985.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806025.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806035.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806045.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806065.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806075.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806085.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCA_902806095.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCF_011492945.1 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
YEL_C01-MAG-155 d__Bacteria;p__Actinobacteriota;c__Rubrobacteria;o__Rubrobacterales;f__Rubrobacteraceae;g__SIRX01;s__
GCF_007970665.1 d__Bacteria;p__Actinobacteriota;c__Thermoleophilia;o__Solirubrobacterales;f__Solirubrobacteraceae;g__Conexibacter_A;s__
GCA_016781665.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781735.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781765.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781825.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781845.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781905.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781945.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
GCA_016781965.1 d__Bacteria;p__Chloroflexota;c__Chloroflexia;o__Thermobaculales;f__Thermobaculaceae;g__;s__
```
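To pull out just the genomes where NCBI and GTDB disagree at the order level (here, the 8 Chloroflexota and the one Solirubrobacterales genome), a filter on the fourth rank of the classification string is enough. A minimal sketch under the same column and rank-order assumptions as above; `not_in_gtdb_order` is hypothetical:

```bash
# Hypothetical helper: print genome<TAB>classification for every row of a
# gtdbtk summary table whose order rank (4th ";"-separated field of the
# classification column) is not the given order.
not_in_gtdb_order() {
    local summary="$1" order="${2:-o__Rubrobacterales}"
    awk -F'\t' -v ord="$order" 'NR > 1 {
        n = split($2, rank, ";")
        if (n < 4 || rank[4] != ord) print $1 "\t" $2
    }' "$summary"
}

# example usage:
# not_in_gtdb_order gtdbtk-rubrobacterales-out/gtdbtk.bac120.summary.tsv
```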
# Working with GTDB
Using GToTree v1.6.11, bit v1.8.30, and KOFamScan v1.3.0.
```bash
gtt-get-accessions-from-GTDB -t Rubrobacterales
```
```
Reading in the GTDB info table...
Using GTDB v202: Released April 27, 2021
The rank 'order' has 31 Rubrobacterales entries.
The targeted NCBI accessions were written to:
GTDB-Rubrobacterales-order-accs.txt
A subset GTDB table of these targets was written to:
GTDB-Rubrobacterales-order-metadata.tsv
```
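GTDB places 31 genomes in Rubrobacterales, versus the 70 that NCBI labels that way, so it may help to see how the two accession lists overlap. A minimal sketch; `compare_accession_lists` is hypothetical, and it assumes that paired GenBank (`GCA_`) and RefSeq (`GCF_`) accessions share the same numeric part, so it matches on that and ignores the version suffix:

```bash
# Hypothetical helper: compare two accession lists on their numeric core
# (GCA_000014185.1 and GCF_000014185.1 count as the same assembly, an
# assumption). Prints counts only in the first, only in the second, and in both.
compare_accession_lists() {
    local a="$1" b="$2" ta tb
    ta=$(mktemp); tb=$(mktemp)
    sed -E 's/^GC[AF]_([0-9]+)\..*$/\1/' "$a" | awk 'NF' | LC_ALL=C sort -u > "$ta"
    sed -E 's/^GC[AF]_([0-9]+)\..*$/\1/' "$b" | awk 'NF' | LC_ALL=C sort -u > "$tb"
    printf 'only_in_first\t%s\n' "$(LC_ALL=C comm -23 "$ta" "$tb" | wc -l | tr -d ' ')"
    printf 'only_in_second\t%s\n' "$(LC_ALL=C comm -13 "$ta" "$tb" | wc -l | tr -d ' ')"
    printf 'in_both\t%s\n' "$(LC_ALL=C comm -12 "$ta" "$tb" | wc -l | tr -d ' ')"
    rm -f "$ta" "$tb"
}

# example usage:
# compare_accession_lists GTDB-Rubrobacterales-order-accs.txt rubrobacterales_accessions.txt
```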
```bash
mkdir -p genomes
cd genomes/
bit-dl-ncbi-assemblies -w ../GTDB-Rubrobacterales-order-accs.txt -f fasta -j 5
gunzip *.gz
cd ../
bash scan-for-C-fixation-KOs.sh GTDB-Rubrobacterales-order-accs.txt target-C-fixation-KOs.txt
mv Combined-counts.tsv GTDB-combined-counts.tsv
```
**Script used**
```bash
cat scan-for-C-fixation-KOs.sh
```
```bash
set -u
mkdir -p genes results counts
cat <( printf "KO_ID\n" ) $2 > Combined-counts.tsv
for genome in $(cat $1)
do
echo "$genome"
# calling genes
prodigal -q -c -a genes/${genome}-genes.faa -d genes/${genome}-genes.fa -f gff -o genes/${genome}.gff -i genomes/${genome}.fa
# kofamscan
exec_annotation -p target_HMM_profiles/ -k target-ko-list --cpu 5 -f detail-tsv -o ${genome}-targets.tmp genes/${genome}-genes.faa
# filtering results
bit-filter-KOFamScan-results -i ${genome}-targets.tmp -o ${genome}-C-fixation-KO-annots.tmp
grep -v -w "NA" ${genome}-C-fixation-KO-annots.tmp > results/${genome}-C-fixation-KO-annots.tsv
# clearing intermediete files
rm -rf tmp/ ${genome}-targets.tmp ${genome}-C-fixation-KO-annots.tmp
# counting each KO
printf "${genome}\n" > counts/${genome}-KO-counts.txt
for KO in $(cat $2)
do
grep -c -w "${KO}" results/${genome}-C-fixation-KO-annots.tsv >> counts/${genome}-KO-counts.txt
done
paste Combined-counts.tsv counts/${genome}-KO-counts.txt > building.tmp && mv building.tmp Combined-counts.tsv
done
```