---
tags: Genomics
title: KOFamScan setup and example
---
# KOFamScan setup and example
---
> **This short page demonstrates how to set up and run [KOFamScan](https://github.com/takaram/kofam_scan).**
---
[toc]
## Env creation
> Note: `bit` is not required, but it has a filtering function I run on the KO annotations to keep only significant hits. If more than one significant KO is assigned to a given protein (which is extremely rare, but it happens), it keeps only the most significant one. There are examples of both outputs near the end of the page.
```bash
mamba create -n kofamscan -c conda-forge -c bioconda -c defaults -c astrobiomike kofamscan bit
```
## Downloading required KOFamScan HMM profiles and ref file
This only needs to be done once. Afterwards, we point to these files each time we run the program.
The profiles directory is 1.3 GB compressed and 6.5 GB uncompressed.
```bash
# this takes seconds to download
curl -L -O ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
gunzip ko_list.gz
# this usually takes about 5 minutes,
# but it can sometimes be much slower
# (possibly due to traffic on their end)
curl -L -O ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
tar -xzvf profiles.tar.gz && rm profiles.tar.gz
```
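
Before moving on, it can be worth confirming that the download and unpacking worked. The sketch below is a hypothetical helper (`check_kofam_db` is not part of KOFamScan), and it assumes the unpacked `profiles/` directory holds per-KO files named like `K*.hmm`:

```bash
# Hypothetical helper: check that the KOfam reference files look complete.
# Usage: check_kofam_db [directory containing ko_list and profiles/]
check_kofam_db() {
    local dir="${1:-.}"
    # ko_list must exist and be non-empty
    if [ ! -s "${dir}/ko_list" ]; then
        echo "ko_list is missing or empty in ${dir}" >&2
        return 1
    fi
    # Assumption: the profiles directory holds one K*.hmm file per KO
    local n_hmms
    n_hmms=$(find "${dir}/profiles" -maxdepth 1 -name 'K*.hmm' 2>/dev/null | wc -l | tr -d ' ')
    if [ "${n_hmms}" -eq 0 ]; then
        echo "no K*.hmm files found in ${dir}/profiles" >&2
        return 1
    fi
    echo "ko_list lines: $(wc -l < "${dir}/ko_list" | tr -d ' '); HMM profiles: ${n_hmms}"
}
```

Running `check_kofam_db .` from the directory that holds both items prints the two counts, or an error message if something is missing.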
## Example running
The input is an amino acid fasta file. Here we download one to use for the example:
```bash
curl -L https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/018/455/GCF_003018455.1_ASM301845v1/GCF_003018455.1_ASM301845v1_protein.faa.gz | gunzip - > GCF_003018455.1.faa
```
And here is an example running KOFamScan:
```bash
conda activate kofamscan
# takes ~5 minutes with this example
exec_annotation -p profiles/ -k ko_list --cpu 8 -f detail-tsv \
--tmp-dir GCF_003018455.1-ko-tmp \
-o GCF_003018455.1-ko-annotations.tsv \
GCF_003018455.1.faa
```
> **Breakdown**
> - `exec_annotation` - the main command
> - `-p` - the path to the HMM profiles reference directory that we downloaded and unpacked
> - `-k` - the path to the "ko_list" file that we downloaded and gunzipped
> - `--cpu` - how many jobs we want to run in parallel
> - `-f` - the output format we want
> - `--tmp-dir` - the temp directory where intermediate files are written while it's running
>     - it's **essential** we set this to something unique for each run if we plan to run more than one at the same time in the same place, because otherwise the runs overwrite each other's temp files
> - `-o` - the output file we want to create
> - the positional argument at the end is the input amino acid fasta file (it can't be gzipped)
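
When annotating several genomes, one way to keep the temp directories from colliding is to derive both the temp directory and the output name from each input file. This is only a sketch: `annotate_all` is a hypothetical wrapper, and it assumes the inputs end in `.faa` with distinct base names and that `profiles/` and `ko_list` are in the current directory, as in the example above:

```bash
# Hypothetical wrapper: annotate several protein FASTA files one after another,
# giving each run its own temp directory and output file.
# Usage: annotate_all genomeA.faa genomeB.faa ...
annotate_all() {
    local faa base
    for faa in "$@"; do
        # e.g. GCF_003018455.1.faa -> GCF_003018455.1
        base=$(basename "${faa}" .faa)
        exec_annotation -p profiles/ -k ko_list --cpu 8 -f detail-tsv \
            --tmp-dir "${base}-ko-tmp" \
            -o "${base}-ko-annotations.tsv" \
            "${faa}" || return 1
    done
}
```

Because the names come from the input file, the same pattern stays safe if the runs are instead launched at the same time from separate shells.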
That output looks like this:
```
# gene name KO thrshld score E-value "KO definition"
# --------- ------ ------- ------ --------- -------------
* WP_000002283.1 K06163 330.10 556.6 8.1e-168 "alpha-D-ribose 1-methylphosphonate 5-phosphate C-P lyase [EC:4.7.1.1]"
WP_000002446.1 K05798 246.67 164.4 1.2e-48 "LysR family transcriptional regulator, transcriptional activator for leuABCD operon"
WP_000002446.1 K18297 276.63 123.8 3.1e-36 "LysR family transcriptional regulator, mexEF-oprN operon transcriptional activator"
WP_000002446.1 K14657 277.50 115.6 8.2e-34 "LysR family transcriptional regulator, nod-box dependent transcriptional activator"
WP_000002446.1 K11921 261.23 100.5 3e-29 "LysR family transcriptional regulator, cyn operon transcriptional activator"
WP_000002446.1 K21900 288.97 97.4 2.9e-28 "LysR family transcriptional regulator, transcriptional activator of the cysJI operon"
WP_000002446.1 K14057 280.90 93.8 2.9e-27 "LysR family transcriptional regulator, regulator of abg operon"
WP_000002446.1 K09681 284.23 82.0 1.3e-23 "LysR family transcriptional regulator, transcription activator of glutamate synthase operon"
```
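
In this format, a leading `*` flags hits whose score meets the KO's threshold, and unflagged rows fall below it. For a quick look at just the flagged rows without installing anything else, something like the following could work. It is a rough sketch, not a replacement for the `bit` filtering described on this page: `significant_hits` is a hypothetical helper, and it does not resolve proteins that have more than one flagged KO.

```bash
# Hypothetical helper: print gene and KO for rows flagged with a leading "*".
# Usage: significant_hits GCF_003018455.1-ko-annotations.tsv
significant_hits() {
    # awk's default whitespace splitting makes "*" field 1 on flagged rows;
    # header rows start with "#" and unflagged rows start with the gene name,
    # so neither matches
    awk '$1 == "*" { print $2 "\t" $3 }' "$1"
}
```

Piping the result through `wc -l` gives the number of flagged hits.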
## Optional filtering with bit
This takes the KOFamScan output in the `detail-tsv` format, as specified in the example command above, and: 1) retains only significant hits; and 2) if an input gene has more than one KO significantly assigned to it (which is extremely rare, but it happens), keeps only the one with the highest significance.
```bash
bit-filter-KOFamScan-results -i GCF_003018455.1-ko-annotations.tsv \
-o GCF_003018455.1-ko-annotations-filtered.tsv
```
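
Once the filtered table is written, a quick tally of how many proteins received a KO can be a useful sanity check. The sketch below assumes the table has one header row followed by one row per gene, with the KO ID (or `NA` when nothing significant was found) in the second column. `summarize_filtered` is a hypothetical helper, not part of `bit`:

```bash
# Hypothetical helper: count genes with and without a KO in the filtered table.
# Usage: summarize_filtered GCF_003018455.1-ko-annotations-filtered.tsv
summarize_filtered() {
    # skip the header row, then count on the second column (KO ID or NA)
    awk 'NR > 1 { if ($2 == "NA") na++; else ko++ }
         END { printf "annotated: %d\nunannotated: %d\n", ko, na }' "$1"
}
```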
The filtered output table from `bit` looks like this:
```
gene_ID KO_ID KO_function
WP_000002283.1 K06163 alpha-D-ribose 1-methylphosphonate 5-phosphate C-P lyase [EC:4.7.1.1]
WP_000002446.1 NA NA
WP_000002542.1 K03100 signal peptidase I [EC:3.4.21.89]
WP_000002907.1 NA NA
WP_000002953.1 K09893 regulator of ribonuclease activity B
WP_000003086.1 K04567 lysyl-tRNA synthetase, class II [EC:6.1.1.6]
WP_000003178.1 NA NA
WP_000003317.1 K06941 23S rRNA (adenine2503-C2)-methyltransferase [EC:2.1.1.192]
WP_000003377.1 K03071 preprotein translocase subunit SecB
```