# DRAM setup (23-Mar-2022)

https://github.com/shafferm/DRAM

[toc]

## Install

### `DRAM-setup.py prepare_databases`

```bash
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env create -f environment.yaml -n DRAM
conda activate DRAM

DRAM-setup.py prepare_databases --output_dir DRAM_data --skip_uniref --threads 30 --verbose
```

Finished in about 73 minutes. It printed this warning, but I think it took care of it, since the description database was populated on the next log line (full output is at the bottom of this page):

```
1:12:59.944019: DRAM databases and forms downloaded
1:12:59.959355: Files moved to final destination
/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py:51: UserWarning: Database does not exist at path None
  warnings.warn('Database does not exist at path %s' % self.description_loc)
1:13:00.031347: DRAM description database populated
1:13:20.526467: Database preparation completed
```

### `DRAM-setup.py print_config`

The databases seem to be set:

```bash
DRAM-setup.py print_config
```

```
Processed search databases
KEGG db: None
KOfam db: /media/executor/mlee/dram/DRAM_data/kofam_profiles.hmm
KOfam KO list: /media/executor/mlee/dram/DRAM_data/kofam_ko_list.tsv
UniRef db: None
Pfam db: /media/executor/mlee/dram/DRAM_data/pfam.mmspro
dbCAN db: /media/executor/mlee/dram/DRAM_data/dbCAN-HMMdb-V10.txt
RefSeq Viral db: /media/executor/mlee/dram/DRAM_data/refseq_viral.20220323.mmsdb
MEROPS peptidase db: /media/executor/mlee/dram/DRAM_data/peptidases.20220323.mmsdb
VOGDB db: /media/executor/mlee/dram/DRAM_data/vog_latest_hmms.txt

Descriptions of search database entries
Pfam hmm dat: /media/executor/mlee/dram/DRAM_data/Pfam-A.hmm.dat.gz
dbCAN family activities: /media/executor/mlee/dram/DRAM_data/CAZyDB.07292021.fam-activities.txt
VOG annotations: /media/executor/mlee/dram/DRAM_data/vog_annotations_latest.tsv.gz

Description db: /media/executor/mlee/dram/DRAM_data/description_db.sqlite

DRAM distillation sheets
Genome summary form: /media/executor/mlee/dram/DRAM_data/genome_summary_form.20220323.tsv
Module step form: /media/executor/mlee/dram/DRAM_data/module_step_form.20220323.tsv
ETC module database: /media/executor/mlee/dram/DRAM_data/etc_mdoule_database.20220323.tsv
Function heatmap form: /media/executor/mlee/dram/DRAM_data/function_heatmap_form.20220323.tsv
AMG database: /media/executor/mlee/dram/DRAM_data/amg_database.20220323.tsv
```

## Testing

### Getting a test genome

```bash
# getting a test genome
curl -L -o GCF_000005845.2.fasta.gz https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz

gunzip GCF_000005845.2.fasta.gz
```

### `DRAM.py annotate`

```bash
# running annotate
DRAM.py annotate -i GCF_000005845.2.fasta -o test-DRAM-output
```

Failed after ~23 minutes, while fetching descriptions for the peptidase hits. Full output:

```
1 fastas found
2022-03-23 22:33:00.825213: Annotation started
0:00:00.009530: Retrieved database locations and descriptions
0:00:00.009586: Annotating GCF_000005845.2
0:01:11.657135: Turning genes from prodigal to mmseqs2 db
0:01:14.437307: Getting hits from kofam
0:23:17.054402: Getting forward best hits from peptidase
0:23:30.591247: Getting reverse best hits from peptidase
0:23:31.874700: Getting descriptions of hits from peptidase
/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py:81: UserWarning: No descriptions were found for your id's. Does this MER0295850 look like an id from peptidase_description
  warnings.warn("No descriptions were found for your id's. Does this %s look like an id from %s" % (list(ids)[0],
Traceback (most recent call last):
  File "/media/executor/mlee/miniconda3/envs/DRAM/bin/DRAM.py", line 189, in <module>
    args.func(**args_dict)
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1039, in annotate_bins_cmd
    annotate_bins(list(set(fasta_locs)), output_dir, min_contig_size, prodigal_mode, trans_table, bit_score_threshold,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1078, in annotate_bins
    all_annotations = annotate_fastas(fasta_locs, output_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 1012, in annotate_fastas
    annotate_fasta(fasta_loc, fasta_name, fasta_dir, db_handler, min_contig_size, prodigal_mode, trans_table,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 920, in annotate_fasta
    annotations = annotate_orfs(gene_faa, db_handler, tmp_dir, start_time, custom_db_locs, custom_hmm_locs,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 820, in annotate_orfs
    annotation_list.append(do_blast_style_search(query_db, db_handler.db_locs['peptidase'], tmp_dir,
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 683, in do_blast_style_search
    hits = formater(hits, header_dict)
  File "/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/annotate_bins.py", line 187, in get_peptidase_description
    header = header_dict[peptidase_hit]
KeyError: 'MER0295850'
```

This issue might be related: https://github.com/WrightonLabCSU/DRAM/issues/158

## `DRAM-setup.py prepare_databases` full output

```bash
DRAM-setup.py prepare_databases --output_dir DRAM_data --skip_uniref --threads 30 --verbose
```

```=
2022-03-23 21:13:33.139184: Database preparation started
Downloading dbCAN family activities from : https://bcb.unl.edu/dbCAN2/download/Databases/V10/CAZyDB.07292021.fam-activities.txt
downloading https://bcb.unl.edu/dbCAN2/download/Databases/V10/CAZyDB.07292021.fam-activities.txt
--2022-03-23 21:13:33-- https://bcb.unl.edu/dbCAN2/download/Databases/V10/CAZyDB.07292021.fam-activities.txt
Resolving bcb.unl.edu (bcb.unl.edu)... 129.93.147.58
Connecting to bcb.unl.edu (bcb.unl.edu)|129.93.147.58|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68035 (66K) [text/plain]
Saving to: ‘DRAM_data/CAZyDB.07292021.fam-activities.txt’

DRAM_data/CAZyDB.07292021.fam-activities.txt 100%[=============================================================================================>] 66.44K --.-KB/s in 0.09s

2022-03-23 21:13:33 (710 KB/s) - ‘DRAM_data/CAZyDB.07292021.fam-activities.txt’ saved [68035/68035]

downloading ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
--2022-03-23 21:13:33-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
           => ‘DRAM_data/Pfam-A.hmm.dat.gz’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/Pfam/current_release ... done.
==> SIZE Pfam-A.hmm.dat.gz ... 514890
==> PASV ... done. ==> RETR Pfam-A.hmm.dat.gz ... done.
Length: 514890 (503K) (unauthoritative)

Pfam-A.hmm.dat.gz 100%[=============================================================================================>] 502.82K 622KB/s in 0.8s

2022-03-23 21:13:36 (622 KB/s) - ‘DRAM_data/Pfam-A.hmm.dat.gz’ saved [514890]

Downloading dbCAN from: http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
downloading http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
--2022-03-23 21:13:36-- http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
Resolving bcb.unl.edu (bcb.unl.edu)... 129.93.147.58
Connecting to bcb.unl.edu (bcb.unl.edu)|129.93.147.58|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt [following]
--2022-03-23 21:13:36-- https://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V10.txt
Connecting to bcb.unl.edu (bcb.unl.edu)|129.93.147.58|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100147232 (96M) [text/plain]
Saving to: ‘./dbCAN-HMMdb-V10.txt’

./dbCAN-HMMdb-V10.txt 100%[=============================================================================================>] 95.51M 48.9MB/s in 2.0s

2022-03-23 21:13:38 (48.9 MB/s) - ‘./dbCAN-HMMdb-V10.txt’ saved [100147232/100147232]

0:00:08.197517: dbCAN database processed
downloading ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz
--2022-03-23 21:13:41-- ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz
           => ‘DRAM_data/database_files/Pfam-A.full.gz’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/Pfam/current_release ... done.
==> SIZE Pfam-A.full.gz ... 15188156081
==> PASV ... done. ==> RETR Pfam-A.full.gz ... done.
Length: 15188156081 (14G) (unauthoritative)

Pfam-A.full.gz 100%[=============================================================================================>] 14.14G 10.5MB/s in 25m 20s

2022-03-23 21:39:03 (9.53 MB/s) - ‘DRAM_data/database_files/Pfam-A.full.gz’ saved [15188156081]

1:02:48.169201: PFAM database processed
downloading ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz
--2022-03-23 22:16:21-- ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.protein.faa.gz
           => ‘DRAM_data/database_files/viral.1.protein.faa.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 130.14.250.11, 2607:f220:41e:250::10, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /refseq/release/viral ... done.
==> SIZE viral.1.protein.faa.gz ... 14008795
==> PASV ... done. ==> RETR viral.1.protein.faa.gz ... done.
Length: 14008795 (13M) (unauthoritative)

viral.1.protein.faa.gz 100%[=============================================================================================>] 13.36M 8.17MB/s in 1.6s

2022-03-23 22:16:24 (8.17 MB/s) - ‘DRAM_data/database_files/viral.1.protein.faa.gz’ saved [14008795]

downloading ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.protein.faa.gz
--2022-03-23 22:16:24-- ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.2.protein.faa.gz
           => ‘DRAM_data/database_files/viral.2.protein.faa.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.10, 130.14.250.7, 2607:f220:41e:250::10, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.10|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /refseq/release/viral ... done.
==> SIZE viral.2.protein.faa.gz ... 32804181
==> PASV ... done. ==> RETR viral.2.protein.faa.gz ... done.
Length: 32804181 (31M) (unauthoritative)

viral.2.protein.faa.gz 100%[=============================================================================================>] 31.28M 15.8MB/s in 2.0s

2022-03-23 22:16:27 (15.8 MB/s) - ‘DRAM_data/database_files/viral.2.protein.faa.gz’ saved [32804181]

1:03:02.211988: RefSeq viral database processed
downloading ftp://ftp.ebi.ac.uk/pub/databases/merops/current_release/pepunit.lib
--2022-03-23 22:16:35-- ftp://ftp.ebi.ac.uk/pub/databases/merops/current_release/pepunit.lib
           => ‘DRAM_data/database_files/merops_peptidases_nr.faa’
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.197.74
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/databases/merops/current_release ... done.
==> SIZE pepunit.lib ... 448367206
==> PASV ... done. ==> RETR pepunit.lib ... done.
Length: 448367206 (428M) (unauthoritative)

pepunit.lib 100%[=============================================================================================>] 427.60M 22.0MB/s in 20s

2022-03-23 22:16:57 (20.9 MB/s) - ‘DRAM_data/database_files/merops_peptidases_nr.faa’ saved [448367206]

1:03:44.736818: MEROPS database processed
downloading http://fileshare.csb.univie.ac.at/vog/latest/vog.hmm.tar.gz
--2022-03-23 22:17:17-- http://fileshare.csb.univie.ac.at/vog/latest/vog.hmm.tar.gz
Resolving fileshare.csb.univie.ac.at (fileshare.csb.univie.ac.at)... 131.130.65.128
Connecting to fileshare.csb.univie.ac.at (fileshare.csb.univie.ac.at)|131.130.65.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 295237694 (282M) [application/x-gzip]
Saving to: ‘DRAM_data/database_files/vog.hmm.tar.gz’

DRAM_data/database_files/vog.hmm.tar.gz 100%[=============================================================================================>] 281.56M 3.11MB/s in 2m 0s

2022-03-23 22:19:18 (2.35 MB/s) - ‘DRAM_data/database_files/vog.hmm.tar.gz’ saved [295237694/295237694]

1:07:16.261940: VOGdb database processed
downloading ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
--2022-03-23 22:20:49-- ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
           => ‘DRAM_data/database_files/kofam_profiles.tar.gz’
Resolving ftp.genome.jp (ftp.genome.jp)... 133.103.200.25
Connecting to ftp.genome.jp (ftp.genome.jp)|133.103.200.25|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/db/kofam ... done.
==> SIZE profiles.tar.gz ... 1347507136
==> PASV ... done. ==> RETR profiles.tar.gz ... done.
Length: 1347507136 (1.3G) (unauthoritative)

profiles.tar.gz 100%[=============================================================================================>] 1.25G 23.5MB/s in 77s

2022-03-23 22:22:08 (16.7 MB/s) - ‘DRAM_data/database_files/kofam_profiles.tar.gz’ saved [1347507136]

1:12:54.163111: KOfam database processed
downloading ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
--2022-03-23 22:26:27-- ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
           => ‘DRAM_data/database_files/kofam_ko_list.tsv.gz’
Resolving ftp.genome.jp (ftp.genome.jp)... 133.103.200.25
Connecting to ftp.genome.jp (ftp.genome.jp)|133.103.200.25|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/db/kofam ... done.
==> SIZE ko_list.gz ... 797744
==> PASV ... done. ==> RETR ko_list.gz ... done.
Length: 797744 (779K) (unauthoritative)

ko_list.gz 100%[=============================================================================================>] 779.05K 910KB/s in 0.9s

2022-03-23 22:26:30 (910 KB/s) - ‘DRAM_data/database_files/kofam_ko_list.tsv.gz’ saved [797744]

1:12:57.305246: KOfam ko list processed
1:12:57.305320: PFAM hmm dat processed
1:12:57.305353: dbCAN fam activities processed
downloading http://fileshare.csb.univie.ac.at/vog/latest/vog.annotations.tsv.gz
--2022-03-23 22:26:30-- http://fileshare.csb.univie.ac.at/vog/latest/vog.annotations.tsv.gz
Resolving fileshare.csb.univie.ac.at (fileshare.csb.univie.ac.at)... 131.130.65.128
Connecting to fileshare.csb.univie.ac.at (fileshare.csb.univie.ac.at)|131.130.65.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 172023 (168K) [application/x-gzip]
Saving to: ‘DRAM_data/vog_annotations_latest.tsv.gz’

DRAM_data/vog_annotations_latest.tsv.gz 100%[=============================================================================================>] 167.99K 218KB/s in 0.8s

2022-03-23 22:26:31 (218 KB/s) - ‘DRAM_data/vog_annotations_latest.tsv.gz’ saved [172023/172023]

1:12:58.520111: VOGdb annotations processed
downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/genome_summary_form.tsv
--2022-03-23 22:26:31-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/genome_summary_form.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 571894 (558K) [text/plain]
Saving to: ‘DRAM_data/database_files/genome_summary_form.20220323.tsv’

DRAM_data/database_files/genome_summary_form 100%[=============================================================================================>] 558.49K --.-KB/s in 0.04s

2022-03-23 22:26:32 (13.8 MB/s) - ‘DRAM_data/database_files/genome_summary_form.20220323.tsv’ saved [571894/571894]

downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/module_step_form.tsv
--2022-03-23 22:26:32-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/module_step_form.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 579664 (566K) [text/plain]
Saving to: ‘DRAM_data/database_files/module_step_form.20220323.tsv’

DRAM_data/database_files/module_step_form.20 100%[=============================================================================================>] 566.08K --.-KB/s in 0.04s

2022-03-23 22:26:32 (13.5 MB/s) - ‘DRAM_data/database_files/module_step_form.20220323.tsv’ saved [579664/579664]

downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/etc_module_database.tsv
--2022-03-23 22:26:32-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/etc_module_database.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2378 (2.3K) [text/plain]
Saving to: ‘DRAM_data/database_files/etc_mdoule_database.20220323.tsv’

DRAM_data/database_files/etc_mdoule_database 100%[=============================================================================================>] 2.32K --.-KB/s in 0s

2022-03-23 22:26:32 (20.1 MB/s) - ‘DRAM_data/database_files/etc_mdoule_database.20220323.tsv’ saved [2378/2378]

downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/function_heatmap_form.tsv
--2022-03-23 22:26:32-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/function_heatmap_form.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10917 (11K) [text/plain]
Saving to: ‘DRAM_data/database_files/function_heatmap_form.20220323.tsv’

DRAM_data/database_files/function_heatmap_fo 100%[=============================================================================================>] 10.66K --.-KB/s in 0s

2022-03-23 22:26:32 (33.8 MB/s) - ‘DRAM_data/database_files/function_heatmap_form.20220323.tsv’ saved [10917/10917]

downloading https://raw.githubusercontent.com/shafferm/DRAM/master/data/amg_database.tsv
--2022-03-23 22:26:32-- https://raw.githubusercontent.com/shafferm/DRAM/master/data/amg_database.tsv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21569 (21K) [text/plain]
Saving to: ‘DRAM_data/database_files/amg_database.20220323.tsv’

DRAM_data/database_files/amg_database.202203 100%[=============================================================================================>] 21.06K --.-KB/s in 0.001s

2022-03-23 22:26:33 (14.3 MB/s) - ‘DRAM_data/database_files/amg_database.20220323.tsv’ saved [21569/21569]

1:12:59.944019: DRAM databases and forms downloaded
1:12:59.959355: Files moved to final destination
/media/executor/mlee/miniconda3/envs/DRAM/lib/python3.9/site-packages/mag_annotator/database_handler.py:51: UserWarning: Database does not exist at path None
  warnings.warn('Database does not exist at path %s' % self.description_loc)
1:13:00.031347: DRAM description database populated
1:13:20.526467: Database preparation completed
```
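
Since `print_config` only reports where the databases are supposed to be, it might be worth confirming that each listed path actually exists on disk. Below is a minimal sketch, assuming the listing keeps the `name: value` layout seen in the `print_config` output on this page, with `None` marking an unset database. `check_config_paths` is a hypothetical helper written for this note, not part of DRAM.

```bash
# Hypothetical helper (not part of DRAM): reads a `print_config` listing on
# stdin and prints one tab-separated status line per "name: value" entry.
check_config_paths() {
    while IFS= read -r line; do
        # Section titles carry no "name: value" pair, so skip them.
        case "$line" in
            *": "*) ;;
            *) continue ;;
        esac
        name="${line%%: *}"    # text before the first ": "
        value="${line#*: }"    # text after the first ": "
        if [ "$value" = "None" ]; then
            printf 'UNSET\t%s\n' "$name"
        elif [ -e "$value" ]; then
            printf 'OK\t%s\n' "$name"
        else
            printf 'MISSING\t%s\t%s\n' "$name" "$value"
        fi
    done
}
```

It could be used as `DRAM-setup.py print_config | check_config_paths`. With the config listed on this page, `KEGG db` and `UniRef db` would be expected to come back as `UNSET`, which matches the `--skip_uniref` install without KEGG. This only checks that files are present; it would not catch the missing peptidase description behind the `KeyError`.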