
GO slimming and summarizing example


UPDATE

As this work got closer to publication, I split the programs related to the CoV-IRT microbial subgroup into their own conda package. The custom programs on this page that start with bit- will be replaced by versions that start with cov-, which are included in that conda package. Install that package with conda as shown on that page, and ignore the bit install instructions below.


Installing bit package in a conda environment

conda create -y -n bit -c conda-forge -c bioconda -c defaults -c astrobiomike bit

conda activate bit
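
Before moving on, it can help to confirm that the programs used later on this page are actually on the PATH of the activated environment. This is a minimal sketch: have_tool is a hypothetical helper name introduced here, and the two program names are simply the ones this page uses.

```shell
# Hypothetical helper: succeed if the named program is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Program names are the ones used later on this page.
for tool in bit-slim-down-go-terms bit-summarize-go-annotations; do
    if have_tool "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool (is the bit environment activated?)"
    fi
done
```

If either program is reported as missing, re-running conda activate bit in the current shell is the first thing to try.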

Getting example data

curl -L -o SRR10903401_1_fast_functional_results_subset.tsv https://ndownloader.figshare.com/files/22858676

Cutting down to just the gene ID and GO annotation columns. The slimmer needs this format: 2 tab-delimited columns, where the second column can hold multiple GO IDs separated by semicolons:

cut -f 1,6 SRR10903401_1_fast_functional_results_subset.tsv > example-GO-annotations.tsv
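
The 2-column layout can be sanity-checked before slimming. The sketch below runs on a toy file so that it is self-contained. The gene IDs and filler columns are made up, and it assumes, as the cut command above does, that the gene ID is in column 1 and the GO IDs are in column 6.

```shell
# Build a toy 6-column, tab-delimited table (gene IDs and filler values are made up).
printf 'gene_1\ta\tb\tc\td\tGO:0003677;GO:0005524\n' > toy-results.tsv
printf 'gene_2\ta\tb\tc\td\tGO:0016020\n' >> toy-results.tsv

# Keep only column 1 (gene ID) and column 6 (GO IDs), as in the command above.
cut -f 1,6 toy-results.tsv > toy-GO-annotations.tsv

# Count lines that do not have exactly 2 tab-delimited fields (0 means the layout is as expected).
bad_lines=$(awk -F'\t' 'NF != 2 { bad++ } END { print bad + 0 }' toy-GO-annotations.tsv)
echo "lines without exactly 2 columns: $bad_lines"
```

The same awk check can be pointed at example-GO-annotations.tsv instead of the toy file.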

This looks like this:

Slimming the GO terms down

bit-slim-down-go-terms -a example-GO-annotations.tsv -o example-GO-slimmed-annotations.tsv

Which looks like this:

Summarizing

bit-summarize-go-annotations -i example-GO-slimmed-annotations.tsv -o example-GO-summary-all.tsv

GO has 3 "namespaces": molecular function, biological process, and cellular component. By default this program outputs just one combined table covering all three; to also get a separate table for each namespace, add the --by-namespace flag. Here's a look at the combined one:

Summarizing the full GO annotation file

The same summary program can generate a table of all the annotated GO terms with their counts and percentages, without slimming, if we point it at a different reference "obo" file than the default. Here the input is the 2-column annotation file that we gave to the slimming program above. Instead of slimming it, we give it straight to the summary program and add -g go_basic to use the full GO reference obo file:

bit-summarize-go-annotations -g go_basic -i example-GO-annotations.tsv -o example-FULL-GO-summary-all.tsv

By default this program outputs one summary table holding all GO terms detected. Adding the --by-namespace flag additionally outputs an individual table for each GO "namespace", and adding the --keep-zeroes flag returns all GO terms, including those not detected.

bit-summarize-go-annotations --keep-zeroes -g go_basic -i example-GO-annotations.tsv -o example-FULL-GO-summary-with-zeroes.tsv
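
To make the idea of per-term counts and percentages concrete, here is a rough sketch of that kind of tally using awk on a toy 2-column annotation file. It is not how bit-summarize-go-annotations is implemented, and this page does not describe that program's output columns or how it computes percentages. The gene IDs are made up, and using the number of annotated genes as the denominator is an assumption made only for this illustration.

```shell
# Toy 2-column annotation file (gene IDs are made up).
printf 'gene_1\tGO:0003677;GO:0005524\n' > toy-annotations.tsv
printf 'gene_2\tGO:0005524\n' >> toy-annotations.tsv
printf 'gene_3\tGO:0016020;GO:0005524\n' >> toy-annotations.tsv

# Split column 2 on semicolons, count each GO ID, and report its share of
# annotated genes (assumed denominator, see the note above).
awk -F'\t' '
    $2 != "" {
        genes++
        n = split($2, terms, ";")
        for (i = 1; i <= n; i++) count[terms[i]]++
    }
    END {
        for (t in count) printf "%s\t%d\t%.1f\n", t, count[t], 100 * count[t] / genes
    }
' toy-annotations.tsv | sort > toy-summary.tsv
cat toy-summary.tsv
```

In the toy input GO:0005524 is attached to all three genes, so it gets a count of 3, while the other two terms each get a count of 1. A term listed twice for the same gene would be counted twice by this sketch.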