In getting closer to publication, I split these CoV-IRT microbial subgroup related programs into their own conda package. The custom programs on this page that start with bit- will be replaced by versions that start with cov-, which are included in that conda install. That package should be installed with conda as shown on that page, and the instructions below for installing bit should be ignored.
Cutting down to just the gene IDs and GO annotations (this is the format the slimmer needs: 2 columns, tab-delimited, where the second column can hold multiple GO IDs delimited by semicolons):
This looks like this:
Which looks like this:
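To make the 2-column slimmer input format concrete, here is a minimal Python sketch. It assumes the annotations are already available as one (gene ID, GO ID) pair per record; the layout of the upstream annotation table is an assumption here, and the gene IDs below are hypothetical. It groups GO IDs per gene, drops repeats, and writes tab-delimited rows with the GO IDs joined by semicolons.

```python
import csv
import io


def collapse_gene_go_pairs(pairs):
    """Group (gene_id, go_id) pairs into {gene_id: [go_id, ...]}.

    Keeps first-seen order and drops repeated GO IDs for the same gene.
    """
    genes = {}
    for gene_id, go_id in pairs:
        go_ids = genes.setdefault(gene_id, [])
        if go_id not in go_ids:
            go_ids.append(go_id)
    return genes


def write_slimmer_input(genes, handle):
    """Write 2-column, tab-delimited rows: gene ID, then GO IDs joined by ';'."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for gene_id, go_ids in genes.items():
        writer.writerow([gene_id, ";".join(go_ids)])


# Hypothetical example input: gene_1 has a duplicated annotation.
pairs = [
    ("gene_1", "GO:0005524"),
    ("gene_1", "GO:0016887"),
    ("gene_2", "GO:0003677"),
    ("gene_1", "GO:0005524"),
]
buf = io.StringIO()
write_slimmer_input(collapse_gene_go_pairs(pairs), buf)
print(buf.getvalue(), end="")
# gene_1<TAB>GO:0005524;GO:0016887
# gene_2<TAB>GO:0003677
```

Writing to any text handle (a StringIO here, or an open file in practice) keeps the formatting step separate from where the output goes.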
GO has 3 "namespaces": molecular function, biological process, and cellular component. This program now by default outputs just the combined table, and if we also want individual tables for each namespace we can add the --by-namespace flag. Here's a look at the combined one:
We can generate an output table that has all the annotated GO terms and their counts/percentages with the same summary program, but by specifying a reference "obo" file other than the default. Here our example input is the 2-column annotation file that we gave to the slimming program above. Rather than slimming it, here we give it straight to the summary program and specify the full GO reference obo file by adding -g go_basic as follows:
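Conceptually, summarizing the 2-column annotation file means counting how many genes carry each GO term. The following is a minimal Python sketch of that idea, not the summary program's own code. Two assumptions are labeled in the code: each gene counts once per GO term, and percentages are taken relative to the number of genes in the input (the real program may normalize differently).

```python
import csv
import io
from collections import Counter


def summarize_go_terms(handle):
    """Return (go_id, gene_count, percent_of_genes) rows from a 2-column gene/GO file.

    Assumption: each gene counts once per GO term, and percentages are
    relative to the number of genes (rows) in the input.
    """
    counts = Counter()
    n_genes = 0
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        n_genes += 1
        # Use a set so a GO ID listed twice for one gene is counted only once.
        counts.update({go.strip() for go in row[1].split(";") if go.strip()})
    # Sort by descending count, then GO ID, so the output order is deterministic.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(go_id, n, round(100 * n / n_genes, 2)) for go_id, n in rows]


# Hypothetical 2-column input with two genes.
example = io.StringIO("gene_1\tGO:0005524;GO:0016887\ngene_2\tGO:0005524\n")
for go_id, n, pct in summarize_go_terms(example):
    print(go_id, n, pct, sep="\t")
```

Which terms appear in such a table depends only on the annotations; the reference obo file matters for things like namespace assignment and for listing terms that were never detected.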
This program by default outputs one summary table holding all GO terms detected. It can additionally output individual tables for each GO "namespace" by adding the --by-namespace flag. And it can return all GO terms (including those not detected) by adding the --keep-zeroes flag.
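As a rough illustration of what those two options involve, here is a minimal Python sketch (not the program's implementation). It reads each term's namespace from the namespace: tag of [Term] stanzas in an OBO file such as go-basic.obo, then builds a combined table plus, optionally, one table per namespace and zero-count rows for undetected reference terms. The parameter names by_namespace and keep_zeroes are hypothetical, chosen only to mirror the flags, and the "unknown" bucket for IDs missing from the reference is an assumption.

```python
from collections import defaultdict


def parse_obo_namespaces(lines):
    """Map each [Term] ID in an OBO file to its namespace; [Typedef] stanzas are skipped."""
    namespaces = {}
    in_term = False
    term_id = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas describe GO terms.
            in_term = line == "[Term]"
            term_id = None
            continue
        if not in_term or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key == "id":
            term_id = value.strip()
        elif key == "namespace" and term_id is not None:
            namespaces[term_id] = value.strip()
    return namespaces


def summary_tables(term_counts, namespaces, by_namespace=False, keep_zeroes=False):
    """Build {table_name: [(go_id, count), ...]} from detected GO term counts.

    by_namespace (hypothetical) adds one table per namespace next to the combined one;
    keep_zeroes (hypothetical) adds every reference term never detected, with count 0.
    """
    counts = dict(term_counts)
    if keep_zeroes:
        for go_id in namespaces:
            counts.setdefault(go_id, 0)
    grouped = defaultdict(dict)
    grouped["combined"] = counts
    if by_namespace:
        for go_id, count in counts.items():
            # Assumption: IDs absent from the reference go into an "unknown" table.
            grouped[namespaces.get(go_id, "unknown")][go_id] = count
    # Sort each table by descending count, then GO ID, for a stable order.
    return {
        name: sorted(table.items(), key=lambda kv: (-kv[1], kv[0]))
        for name, table in grouped.items()
    }


# A tiny hand-written OBO excerpt standing in for a full reference file.
example_obo = """format-version: 1.2

[Term]
id: GO:0003677
name: DNA binding
namespace: molecular_function

[Term]
id: GO:0006260
name: DNA replication
namespace: biological_process

[Term]
id: GO:0005737
name: cytoplasm
namespace: cellular_component

[Typedef]
id: part_of
name: part of
"""
namespaces = parse_obo_namespaces(example_obo.splitlines())
tables = summary_tables({"GO:0003677": 4, "GO:0006260": 2}, namespaces,
                        by_namespace=True, keep_zeroes=True)
for name, rows in tables.items():
    print(name, rows)
```

Filling zeroes before splitting means each per-namespace table also lists its own undetected terms, which matches the idea of returning all GO terms rather than only detected ones.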