---
tags: cov-irt
---

# GO slimming and summarizing example

---

# UPDATE

> **In getting closer to publication, I split these CoV-IRT microbial-subgroup-related programs into their [own conda package](https://github.com/AstrobioMike/CoV-IRT-Micro). The custom programs on this page that start with `bit-` will be replaced by versions that start with `cov-`, which are included in that conda install. That package should be installed with conda as shown on [that page](https://github.com/AstrobioMike/CoV-IRT-Micro), and the `bit` install instructions below should be ignored.**

---

[TOC]

## Installing [bit](https://github.com/AstrobioMike/bioinf_tools#bioinformatics-tools-bit) package in a conda environment

```bash
conda create -y -n bit -c conda-forge -c bioconda -c defaults -c astrobiomike bit
conda activate bit
```

## Getting example data

```bash
curl -L -o SRR10903401_1_fast_functional_results_subset.tsv https://ndownloader.figshare.com/files/22858676
```

Cutting down to just the gene ID and GO annotation columns (this is the format needed for the slimmer: 2 columns, tab-delimited, where the second column can hold multiple GO IDs delimited by semicolons):

```bash
cut -f 1,6 SRR10903401_1_fast_functional_results_subset.tsv > example-GO-annotations.tsv
```

That looks like this:

![](https://i.imgur.com/U7GUchu.png)

## Slimming the GO terms down

```bash
bit-slim-down-go-terms -a example-GO-annotations.tsv -o example-GO-slimmed-annotations.tsv
```

Which looks like this:

![](https://i.imgur.com/rp01KwW.png)

## Summarizing

```bash
bit-summarize-go-annotations -i example-GO-slimmed-annotations.tsv -o example-GO-summary-all.tsv
```

GO has 3 "[namespaces](http://geneontology.org/docs/ontology-documentation/)": molecular function, biological process, and cellular component. By default this program now outputs just the combined summary table, and if we also want individual tables for each namespace we can add the `--by-namespace` flag.
Here's a look at the combined one:

![](https://i.imgur.com/FkiTnjv.png)

# Summarizing the full GO annotation file

We can generate an output table that has all the annotated GO terms and their counts/percentages with the same summary program by specifying a reference "obo" file other than the default. Here our example input is the 2-column annotation file that we gave to the slimming program above. Rather than slimming, here we are giving it straight to the summary program and telling it to use the full GO reference obo file by adding `-g go_basic` as follows:

```bash
bit-summarize-go-annotations -g go_basic -i example-GO-annotations.tsv -o example-FULL-GO-summary-all.tsv
```

By default this program outputs one summary table holding all GO terms detected. It can additionally output individual tables for each GO "[namespace](http://geneontology.org/docs/ontology-documentation/)" if we add the `--by-namespace` flag, and it can return all GO terms (including those not detected) if we add the `--keep-zeroes` flag.

```bash
bit-summarize-go-annotations --keep-zeroes -g go_basic -i example-GO-annotations.tsv -o example-FULL-GO-summary-with-zeroes.tsv
```
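
Since both the slimming and summary programs take the same 2-column annotation file, it can be worth a quick format check before running them on your own data. The following is only a rough sketch and not part of `bit`: the `check_go_annotations` function name, the `sample-GO-annotations.tsv` file, and the gene/GO IDs in it are all made up for illustration. It also assumes there is no header row and that genes without annotations have an empty second column, which may not match the real file.

```bash
# Hypothetical helper (not part of bit): flags lines that don't look like
# "gene ID <tab> semicolon-delimited GO IDs"
check_go_annotations() {
    awk -F '\t' '
        # every line should have exactly 2 tab-separated fields
        NF != 2 { printf "line %d: expected 2 fields, found %d\n", NR, NF; bad++; next }
        # a non-empty second field should be GO IDs joined by semicolons
        $2 != "" && $2 !~ /^GO:[0-9]+(;GO:[0-9]+)*$/ {
            printf "line %d: unexpected GO field: %s\n", NR, $2; bad++
        }
        END { printf "%d problem line(s) in %d total\n", bad + 0, NR }
    ' "$1"
}

# Made-up sample: line 4 uses a comma instead of a semicolon on purpose
printf 'gene_1\tGO:0008150;GO:0003674\ngene_2\tGO:0005575\ngene_3\t\ngene_4\tGO:0005575,GO:0003674\n' > sample-GO-annotations.tsv

check_go_annotations sample-GO-annotations.tsv
```

On the made-up sample this reports line 4 and a total of 1 problem line out of 4. A flagged line isn't necessarily an error: a header row, or a placeholder for genes with no annotation, would also be reported, so treat the output as a prompt to look at those lines rather than a verdict.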