# DRAM setup (23-Mar-2022)
https://github.com/shafferm/DRAM
[toc]
## Install
### `DRAM-setup.py prepare_databases`
```bash
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env create -f environment.yaml -n DRAM
conda activate DRAM
DRAM-setup.py prepare_databases --output_dir DRAM_data --skip_uniref --threads 30 --verbose
```
Finished in about 73 minutes. It printed the warning below, but I think setup took care of it, since the description database is reported as populated right afterwards (full output is at the bottom of this page):
```
1:12:59.944019: DRAM databases and forms downloaded
1:12:59.959355: Files moved to final destination
/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py:51: UserWarning: Database does not exist at path None
warnings.warn('Database does not exist at path %s' % self.description_loc)
1:13:00.031347: DRAM description database populated
1:13:20.526467: Database preparation completed
```
### `DRAM-setup.py print_config`
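The databases appear to be set (KEGG and UniRef are `None`, as expected when running with `--skip_uniref` and without supplying KEGG files):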
```bash
DRAM-setup.py print_config
```
```
Processed search databases
KEGG db: None
KOfam db: /media/executor/mlee/dram/DRAM_data/kofam_profiles.hmm
KOfam KO list: /media/executor/mlee/dram/DRAM_data/kofam_ko_list.tsv
UniRef db: None
Pfam db: /media/executor/mlee/dram/DRAM_data/pfam.mmspro
dbCAN db: /media/executor/mlee/dram/DRAM_data/dbCAN-HMMdb-V10.txt
RefSeq Viral db: /media/executor/mlee/dram/DRAM_data/refseq_viral.20220323.mmsdb
MEROPS peptidase db: /media/executor/mlee/dram/DRAM_data/peptidases.20220323.mmsdb
VOGDB db: /media/executor/mlee/dram/DRAM_data/vog_latest_hmms.txt
Descriptions of search database entries
Pfam hmm dat: /media/executor/mlee/dram/DRAM_data/Pfam-A.hmm.dat.gz
dbCAN family activities: /media/executor/mlee/dram/DRAM_data/CAZyDB.07292021.fam-activities.txt
VOG annotations: /media/executor/mlee/dram/DRAM_data/vog_annotations_latest.tsv.gz
Description db: /media/executor/mlee/dram/DRAM_data/description_db.sqlite
DRAM distillation sheets
Genome summary form: /media/executor/mlee/dram/DRAM_data/genome_summary_form.20220323.tsv
Module step form: /media/executor/mlee/dram/DRAM_data/module_step_form.20220323.tsv
ETC module database: /media/executor/mlee/dram/DRAM_data/etc_mdoule_database.20220323.tsv
Function heatmap form: /media/executor/mlee/dram/DRAM_data/function_heatmap_form.20220323.tsv
AMG database: /media/executor/mlee/dram/DRAM_data/amg_database.20220323.tsv
```
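To go one step further than reading the listing by eye, the paths can be checked on disk. This is only a sketch: `check_config_paths` is a helper name made up here (not part of DRAM), and it assumes the `name: path` line layout shown above.

```bash
# Read `DRAM-setup.py print_config` output on stdin; report entries that are
# unset (None) or whose path does not exist. Returns 1 if any path is missing.
# NOTE: check_config_paths is a hypothetical helper, not a DRAM command.
check_config_paths() {
    status=0
    while IFS= read -r line; do
        case "$line" in
            *": "*) ;;          # "name: value" entry
            *) continue ;;      # section header, skip
        esac
        name=${line%%: *}       # text before the first ": "
        path=${line#*: }        # text after the first ": "
        if [ "$path" = "None" ]; then
            echo "UNSET: $name"
        elif [ ! -e "$path" ]; then
            echo "MISSING: $name -> $path"
            status=1
        fi
    done
    return "$status"
}

# Usage (assumes the DRAM conda environment is active):
#   DRAM-setup.py print_config | check_config_paths
```

With the configuration above, this would be expected to list only the KEGG and UniRef entries as unset, provided every other file is still where the config says it is.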
## Testing
### Getting a test genome
```bash
# download the E. coli K-12 MG1655 RefSeq assembly (GCF_000005845.2) and decompress it
curl -L -o GCF_000005845.2.fasta.gz https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz
gunzip GCF_000005845.2.fasta.gz
```
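Before handing the file to DRAM, it may be worth confirming that the download and decompression produced a usable FASTA. A minimal sketch, where `count_fasta_records` is a helper name invented for this note:

```bash
# Print the number of FASTA records (header lines starting with ">") in a file.
# NOTE: count_fasta_records is a hypothetical helper, not part of DRAM.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage:
#   count_fasta_records GCF_000005845.2.fasta
```

A count of 0 would suggest a failed or truncated download (for example an HTML error page saved in place of the genome).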
### `DRAM.py annotate`
```bash
# running annotate
DRAM.py annotate -i GCF_000005845.2.fasta -o test-DRAM-output
```
Failed after ~23 minutes (most of that was the KOfam search), while fetching peptidase descriptions. Full output:
```
1 fastas found
2022-03-23 22:33:00.825213: Annotation started
0:00:00.009530: Retrieved database locations and descriptions
0:00:00.009586: Annotating GCF_000005845.2
0:01:11.657135: Turning genes from prodigal to mmseqs2 db
0:01:14.437307: Getting hits from kofam
0:23:17.054402: Getting forward best hits from peptidase
0:23:30.591247: Getting reverse best hits from peptidase
0:23:31.874700: Getting descriptions of hits from peptidase
/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this MER0295850 look like an id from peptidase_description
warnings.warn("No descriptions were found for your id's. Does this %s look like an id from %s" % (list(ids)[0],
Traceback (most recent call last):
File "/media/executor/mlee/miniconda3/envs/DRAM/bin/DRAM.py", line 189, in <module>
args.func(**args_dict)
File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1039, in annotate_bins_cmd
annotate_bins(list(set(fasta_locs)), output_dir, min_contig_size, prodigal_mode, trans_table, bit_score_threshold,
File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1078, in annotate_bins
all_annotations = annotate_fastas(fasta_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1012, in annotate_fastas
annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 920, in annotate_fasta
annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 820, in annotate_orfs
annotation_list.append(do_blast_style_search(query_db, db_handler.db_locs['peptidase'], tmp_dir,
File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 683, in do_blast_style_search
hits = formater(hits, header_dict)
File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 187, in get_peptidase_description
header = header_dict[peptidase_hit]
KeyError: 'MER0295850'
```
This might be related to the following issue: https://github.com/WrightonLabCSU/DRAM/issues/158
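
The warning and the `KeyError` both say that `MER0295850` came back as a peptidase hit but has no entry in the description database. One way to start narrowing this down is to check whether the ID appears in the headers of the MEROPS FASTA that setup downloaded. The sketch below assumes a copy of that FASTA is still on disk; the path in the usage comment is taken from the setup log and may no longer exist after the files were moved. `id_in_fasta_headers` is a helper name invented here.

```bash
# Succeed (exit 0) if the ID appears in any FASTA header line of the file.
# $1 = sequence ID (matched as a literal substring), $2 = FASTA file.
# NOTE: id_in_fasta_headers is a hypothetical helper, not part of DRAM.
id_in_fasta_headers() {
    awk -v id="$1" '
        /^>/ && index($0, id) { found = 1; exit }   # stop at the first match
        END { exit !found }
    ' "$2"
}

# Usage (path is an assumption based on the setup log):
#   id_in_fasta_headers MER0295850 DRAM_data/database_files/merops_peptidases_nr.faa \
#       && echo "ID present in MEROPS headers" \
#       || echo "ID not found in MEROPS headers"
```

If the ID is present in the source FASTA, the gap would more likely be in how the description database was populated than in the MEROPS download itself.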
## `DRAM-setup.py prepare_databases` full output
```bash
DRAM-setup.py prepare_databases --output_dir DRAM_data --skip_uniref --threads 30 --verbose
```
```
2022-03-23 21:13:33.139184: Database preparation started
Downloading dbCAN family activities from : https://bcb.unl.edu/dbCAN2/download/Databases/V10/CAZyDB.07292021.fam-activities.txt
downloading https://bcb.unl.edu/dbCAN2/download/Databases/V10/CAZyDB.07292021.fam-activities.txt
--2022-03-23 21:13:33-- https://bcb.unl.edu/dbCAN2/download/Databases/V10/CAZyDB.07292021.fam-activities.txt
Resolving bcb.unl.edu (bcb.unl.edu)... 129.93.147.58
Connecting to bcb.unl.edu (bcb.unl.edu)|129.93.147.58|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68035 (66K) [text/plain]
Saving to: ‘DRAM_data/CAZyDB.07292021.fam-activities.txt’
DRAM_data/CAZyDB.07292021.fam-activities.txt 100%[=============================================================================================>] 66.44K --.-KB/s in 0.09s
2022-03-23 21:13:33 (710 KB/s) - ‘DRAM_data/CAZyDB.07292021.fam-activities.txt’ saved [68035/68035]
downloading ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
--2022-03-23 21:13:33-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
=> ‘DRAM_data/Pfam-A.hmm.dat.gz’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/Pfam/current_release ... done.
==> SIZE Pfam-A.hmm.dat.gz ... 514890
==> PASV ... done. ==> RETR Pfam-A.hmm.dat.gz ... done.
Length: 514890 (503K) (unauthoritative)
Pfam-A.hmm.dat.gz 100%[=============================================================================================>] 502.82K 622KB/s in 0.8s
2022-03-23 21:13:36 (622 KB/s) - ‘DRAM_data/Pfam-A.hmm.dat.gz’ saved [514890]
Downloading dbCAN from: http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
downloading http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
--2022-03-23 21:13:36-- http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
Resolving bcb.unl.edu (bcb.unl.edu)... 129.93.147.58
Connecting to bcb.unl.edu (bcb.unl.edu)|129.93.147.58|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt [following]
--2022-03-23 21:13:36-- https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
Connecting to bcb.unl.edu (bcb.unl.edu)|129.93.147.58|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100147232 (96M) [text/plain]
Saving to: ‘./dbCAN-HMMdb-V10.txt’
./dbCAN-HMMdb-V10.txt 100%[=============================================================================================>] 95.51M 48.9MB/s in 2.0s
2022-03-23 21:13:38 (48.9 MB/s) - ‘./dbCAN-HMMdb-V10.txt’ saved [100147232/100147232]
0:00:08.197517: dbCAN database processed
downloading ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz
--2022-03-23 21:13:41-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz
=> ‘DRAM_data/database_files/Pfam-A.full.gz’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/Pfam/current_release ... done.
==> SIZE Pfam-A.full.gz ... 15188156081
==> PASV ... done. ==> RETR Pfam-A.full.gz ... done.
Length: 15188156081 (14G) (unauthoritative)
Pfam-A.full.gz 100%[=============================================================================================>] 14.14G 10.5MB/s in 25m 20s
2022-03-23 21:39:03 (9.53 MB/s) - ‘DRAM_data/database_files/Pfam-A.full.gz’ saved [15188156081]
1:02:48.169201: PFAM database processed
downloading ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz
--2022-03-23 22:16:21-- ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz
=> ‘DRAM_data/database_files/viral.1.protein.faa.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 130.14.250.11, 2607:f220:41e:250::10, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /refseq/release/viral ... done.
==> SIZE viral.1.protein.faa.gz ... 14008795
==> PASV ... done. ==> RETR viral.1.protein.faa.gz ... done.
Length: 14008795 (13M) (unauthoritative)
viral.1.protein.faa.gz 100%[=============================================================================================>] 13.36M 8.17MB/s in 1.6s
2022-03-23 22:16:24 (8.17 MB/s) - ‘DRAM_data/database_files/viral.1.protein.faa.gz’ saved [14008795]
downloading ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.protein.faa.gz
--2022-03-23 22:16:24-- ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.protein.faa.gz
=> ‘DRAM_data/database_files/viral.2.protein.faa.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 130.14.250.7, 2607:f220:41e:250::10, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /refseq/release/viral ... done.
==> SIZE viral.2.protein.faa.gz ... 32804181
==> PASV ... done. ==> RETR viral.2.protein.faa.gz ... done.
Length: 32804181 (31M) (unauthoritative)
viral.2.protein.faa.gz 100%[=============================================================================================>] 31.28M 15.8MB/s in 2.0s
2022-03-23 22:16:27 (15.8 MB/s) - ‘DRAM_data/database_files/viral.2.protein.faa.gz’ saved [32804181]
1:03:02.211988: RefSeq viral database processed
downloading ftp://ftp.ebi.ac.uk/pub/databases/merops/current_release/pepunit.lib
--2022-03-23 22:16:35-- ftp://ftp.ebi.ac.uk/pub/databases/merops/current_release/pepunit.lib
=> ‘DRAM_data/database_files/merops_peptidases_nr.faa’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/merops/current_release ... done.
==> SIZE pepunit.lib ... 448367206
==> PASV ... done. ==> RETR pepunit.lib ... done.
Length: 448367206 (428M) (unauthoritative)
pepunit.lib 100%[=============================================================================================>] 427.60M 22.0MB/s in 20s
2022-03-23 22:16:57 (20.9 MB/s) - ‘DRAM_data/database_files/merops_peptidases_nr.faa’ saved [448367206]
1:03:44.736818: MEROPS database processed
downloading http://fileshare.csb.univie.ac.at/vog/latest/vog.hmm.tar.gz
--2022-03-23 22:17:17-- http://fileshare.csb.univie.ac.at/vog/latest/vog.hmm.tar.gz
Resolving fileshare.csb.univie.ac.at (fileshare.csb.univie.ac.at)... 131.130.65.128
Connecting to fileshare.csb.univie.ac.at (fileshare.csb.univie.ac.at)|131.130.65.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 295237694 (282M) [application/x-gzip]
Saving to: ‘DRAM_data/database_files/vog.hmm.tar.gz’
DRAM_data/database_files/vog.hmm.tar.gz 100%[=============================================================================================>] 281.56M 3.11MB/s in 2m 0s
2022-03-23 22:19:18 (2.35 MB/s) - ‘DRAM_data/database_files/vog.hmm.tar.gz’ saved [295237694/295237694]
1:07:16.261940: VOGdb database processed
downloading ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
--2022-03-23 22:20:49-- ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
=> ‘DRAM_data/database_files/kofam_profiles.tar.gz’
Resolving ftp.genome.jp (ftp.genome.jp)... 133.103.200.25
Connecting to ftp.genome.jp (ftp.genome.jp)|133.103.200.25|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/db/kofam ... done.
==> SIZE profiles.tar.gz ... 1347507136
==> PASV ... done. ==> RETR profiles.tar.gz ... done.
Length: 1347507136 (1.3G) (unauthoritative)
profiles.tar.gz 100%[=============================================================================================>] 1.25G 23.5MB/s in 77s
2022-03-23 22:22:08 (16.7 MB/s) - ‘DRAM_data/database_files/kofam_profiles.tar.gz’ saved [1347507136]
1:12:54.163111: KOfam database processed
downloading ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
--2022-03-23 22:26:27-- ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
=> ‘DRAM_data/database_files/kofam_ko_list.tsv.gz’
Resolving ftp.genome.jp (ftp.genome.jp)... 133.103.200.25
Connecting to ftp.genome.jp (ftp.genome.jp)|133.103.200.25|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/db/kofam ... done.
==> SIZE ko_list.gz ... 797744
==> PASV ... done. ==> RETR ko_list.gz ... done.
Length: 797744 (779K) (unauthoritative)
ko_list.gz 100%[=============================================================================================>] 779.05K 910KB/s in 0.9s
2022-03-23 22:26:30 (910 KB/s) - ‘DRAM_data/database_files/kofam_ko_list.tsv.gz’ saved [797744]
1:12:57.305246: KOfam ko list processed
1:12:57.305320: PFAM hmm dat processed
1:12:57.305353: dbCAN fam activities processed
downloading http://fileshare.csb.univie.ac.at/vog/latest/vog.annotations.tsv.gz
--2022-03-23 22:26:30-- http://fileshare.csb.univie.ac.at/vog/latest/vog.annotations.tsv.gz
Resolving fileshare.csb.univie.ac.at (fileshare.csb.univie.ac.at)... 131.130.65.128
Connecting to fileshare.csb.univie.ac.at (fileshare.csb.univie.ac.at)|131.130.65.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 172023 (168K) [application/x-gzip]
Saving to: ‘DRAM_data/vog_annotations_latest.tsv.gz’
DRAM_data/vog_annotations_latest.tsv.gz 100%[=============================================================================================>] 167.99K 218KB/s in 0.8s
2022-03-23 22:26:31 (218 KB/s) - ‘DRAM_data/vog_annotations_latest.tsv.gz’ saved [172023/172023]
1:12:58.520111: VOGdb annotations processed
downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/genome_summary_form.tsv
--2022-03-23 22:26:31-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/genome_summary_form.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 571894 (558K) [text/plain]
Saving to: ‘DRAM_data/database_files/genome_summary_form.20220323.tsv’
DRAM_data/database_files/genome_summary_form 100%[=============================================================================================>] 558.49K --.-KB/s in 0.04s
2022-03-23 22:26:32 (13.8 MB/s) - ‘DRAM_data/database_files/genome_summary_form.20220323.tsv’ saved [571894/571894]
downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/module_step_form.tsv
--2022-03-23 22:26:32-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/module_step_form.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 579664 (566K) [text/plain]
Saving to: ‘DRAM_data/database_files/module_step_form.20220323.tsv’
DRAM_data/database_files/module_step_form.20 100%[=============================================================================================>] 566.08K --.-KB/s in 0.04s
2022-03-23 22:26:32 (13.5 MB/s) - ‘DRAM_data/database_files/module_step_form.20220323.tsv’ saved [579664/579664]
downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/etc_module_database.tsv
--2022-03-23 22:26:32-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/etc_module_database.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2378 (2.3K) [text/plain]
Saving to: ‘DRAM_data/database_files/etc_mdoule_database.20220323.tsv’
DRAM_data/database_files/etc_mdoule_database 100%[=============================================================================================>] 2.32K --.-KB/s in 0s
2022-03-23 22:26:32 (20.1 MB/s) - ‘DRAM_data/database_files/etc_mdoule_database.20220323.tsv’ saved [2378/2378]
downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/function_heatmap_form.tsv
--2022-03-23 22:26:32-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/function_heatmap_form.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10917 (11K) [text/plain]
Saving to: ‘DRAM_data/database_files/function_heatmap_form.20220323.tsv’
DRAM_data/database_files/function_heatmap_fo 100%[=============================================================================================>] 10.66K --.-KB/s in 0s
2022-03-23 22:26:32 (33.8 MB/s) - ‘DRAM_data/database_files/function_heatmap_form.20220323.tsv’ saved [10917/10917]
downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/amg_database.tsv
--2022-03-23 22:26:32-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/amg_database.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21569 (21K) [text/plain]
Saving to: ‘DRAM_data/database_files/amg_database.20220323.tsv’
DRAM_data/database_files/amg_database.202203 100%[=============================================================================================>] 21.06K --.-KB/s in 0.001s
2022-03-23 22:26:33 (14.3 MB/s) - ‘DRAM_data/database_files/amg_database.20220323.tsv’ saved [21569/21569]
1:12:59.944019: DRAM databases and forms downloaded
1:12:59.959355: Files moved to final destination
/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py:51: UserWarning: Database does not exist at path None
warnings.warn('Database does not exist at path %s' % self.description_loc)
1:13:00.031347: DRAM description database populated
1:13:20.526467: Database preparation completed
```
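
To see where the ~73 minutes went, the elapsed-time stamps DRAM prints (`H:MM:SS.ffffff: message`) can be turned into per-step durations. This is a rough sketch that assumes the log format shown above; `step_durations` and the file name `setup.log` are made up for illustration.

```bash
# Read a DRAM log on stdin and print, for every line that starts with an
# elapsed-time stamp (H:MM:SS.ffffff:), the seconds since the previous such
# line, a tab, and the message. Wall-clock lines ("2022-03-23 ...") are ignored.
# NOTE: step_durations is a hypothetical helper, not part of DRAM.
step_durations() {
    awk '
    /^[0-9]+:[0-9][0-9]:[0-9][0-9]\.[0-9]+: / {
        split($1, t, ":")                  # t[1]=hours, t[2]=minutes, t[3]=seconds
        now = t[1] * 3600 + t[2] * 60 + t[3]
        msg = $0
        sub(/^[^ ]+ /, "", msg)            # drop the timestamp field
        printf "%.1f\t%s\n", now - prev, msg
        prev = now
    }'
}

# Usage (assumes the output above was saved to a file named setup.log):
#   step_durations < setup.log | sort -rn | head
```

In the log above, the gap between "dbCAN database processed" and "PFAM database processed" is a little over an hour, so the Pfam download and processing account for most of the run.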