
# Trunk River metagenome-assembled genomes (MAGs) workflow

### Step 1: Sequence QC using PRINSEQ
### Step 2: Sequence assembly using SPAdes
### Step 3: Contig statistics using MetaQUAST
### Step 4: Binning using Anvi'o and MetaBAT
### Step 5: Bin statistics (completeness and contamination) using CheckM

# PRINSEQ

### Commands to calculate summary statistics

> `prinseq-lite.pl -verbose -fastq <forward read> -fastq2 <reverse read> -stats_all > <sample_stats>.txt`

> `prinseq-lite.pl -verbose -fastq sample1_R1.fastq -fastq2 sample1_R2.fastq -graph_data sample1.graph.gd -qual_noscale -exact_only -no_qual_header`

### Commands to remove low-quality, ambiguous, duplicate and low-complexity reads

##### Run once per sample. Reads with a mean quality below 25, any `N`, duplicates (`-derep 12345`) or low complexity (entropy below 50) are removed, and bases with quality below 20 are trimmed from both ends. Passing pairs are written in FASTQ format (`-out_format 3`) to `<sample>_good_1.fastq` and `<sample>_good_2.fastq`.

> `prinseq-lite.pl -fastq <sample>_R1.fastq -fastq2 <sample>_R2.fastq -min_qual_mean 25 -ns_max_n 0 -derep 12345 -lc_method entropy -lc_threshold 50 -trim_qual_left 20 -trim_qual_right 20 -trim_qual_type min -trim_qual_rule lt -trim_qual_window 1 -trim_qual_step 1 -out_format 3 -out_good <sample>_good -out_bad <sample>_bad`

# SPAdes

#### Step 1: Concatenate the filtered forward reads and the filtered reverse reads of all samples

##### Both globs expand in the same sorted order, so the read pairs stay in sync.

> `cat *_good_1.fastq > concat_1.fastq`

> `cat *_good_2.fastq > concat_2.fastq`

#### Step 2: Run SPAdes in metagenome mode (co-assembly)

> `spades.py -1 concat_1.fastq -2 concat_2.fastq -o TrunkMeta_assembly --meta`

# MetaBAT

##### The contigs are in `TrunkMeta_assembly/contigs.fasta`. Repeat Steps 2 to 4 for every sample.

#### Step 1: Build a bowtie2 index of the contigs

> `bowtie2-build contigs.fasta spades-contigs-bowtie2-index`

#### Step 2: Map the reads to the contigs

> `bowtie2 -x spades-contigs-bowtie2-index -1 7A3_good_1.fastq -2 7A3_good_2.fastq -S sample_1.sam`

#### Step 3: Convert SAM to BAM using Samtools (do not `clusterize`)

> `samtools view -bS sample_1.sam > sample_1.bam`

#### Step 4: Sort the BAM file

> `samtools sort sample_1.bam -o sample_1.sorted.bam`

#### Step 5: Summarize the mapping files (calculate contig coverage)

##### Pass only the sorted BAM files; `*.bam` would also match the unsorted files from Step 3.

> `jgi_summarize_bam_contig_depths --outputDepth depth.txt --pairedContigs paired.txt *.sorted.bam`

#### Step 6: Run the binning step

##### https://bitbucket.org/berkeleylab/metabat

##### `-m 2000` ignores contigs shorter than 2,000 bp.

> `metabat -i contigs.fasta -a depth.txt -m 2000 --saveCls -o output-file`

> `mkdir bin_folder`

> `mv output-file.*.fa bin_folder`

# CheckM

### https://github.com/Ecogenomics/CheckM/wiki/Genome-Quality-Commands

#### Step 1: Place bins in the reference genome tree

> `checkm tree <bin folder> <output folder>`

> `checkm tree bin_folder tree_folder -x fa`

#### Step 2: Assess phylogenetic markers found in each bin

> `checkm tree_qa <tree folder>`

> `checkm tree_qa tree_folder -o 1`

#### Step 3: Infer lineage-specific marker sets for each bin

> `checkm lineage_set <tree folder> <marker file>`

> `checkm lineage_set tree_folder marker_file`

#### Step 4: List available taxonomic-specific marker sets

> `checkm taxon_list`

#### Step 5: Generate a taxonomic-specific marker set

> `checkm taxon_set <rank> <taxon> <marker file>`

> `checkm taxon_set domain Bacteria marker_file`

##### Steps 3 and 5 both write `marker_file`: run only one of them, or give the two marker files different names.

#### Step 6: Identify marker genes in bins

> `checkm analyze <marker file> <bin folder> <output folder>`

> `checkm analyze marker_file bin_folder -x fa out_folder`

#### Step 7: Assess bins for completeness and contamination

> `checkm qa marker_file out_folder -o 1`

# MetaQUAST

> `metaquast.py contigs.fasta -o metaquast_output`

> `metaquast.py contigs.fasta -o metaquast_output --max-ref-number 0`

##### `--max-ref-number 0` skips the automatic search and download of reference genomes, so only reference-free statistics are reported.

# Anvi'o

### http://merenlab.org/2016/06/22/anvio-tutorial-v2/#creating-an-anvio-contigs-database

#### Step 1: Create the Anvi'o contigs database

> `anvi-gen-contigs-database -f contigs.fasta -o contigs.db`

#### Step 2: Run hidden Markov models (search for archaeal and bacterial single-copy core genes)

> `anvi-run-hmms -c contigs.db`

#### Step 3: Display contig statistics

> `anvi-display-contigs-stats contigs.db`

##### Open your browser and connect to localhost port 8080: http://localhost:8080

#### Step 4: Annotate genes with functions (NCBI COGs)

##### Requires a one-time `anvi-setup-ncbi-cogs`.

> `anvi-run-ncbi-cogs -c contigs.db --num-threads 20`

#### Step 5: Import taxonomy

> `anvi-get-sequences-for-gene-calls -c contigs.db -o gene-calls.fa`

> `centrifuge -f -x /blastdb/centrifuge/p+h+v/p+h+v gene-calls.fa -S centrifuge_hits.tsv`

##### Centrifuge also writes `centrifuge_report.tsv` to the working directory by default; both files are imported.

> `anvi-import-taxonomy-for-genes -c contigs.db -i centrifuge_report.tsv centrifuge_hits.tsv -p centrifuge`

#### Step 6: Sort and index BAM files

> `anvi-init-bam sample_1.bam -O SAMPLE-01`

> `anvi-init-bam sample_2.bam -O SAMPLE-02`

> `anvi-init-bam sample_3.bam -O SAMPLE-03`

> `anvi-init-bam sample_4.bam -O SAMPLE-04`

#### Step 7: Create Anvi'o profiles

##### `--sample-name`: this name appears in the display. Sample names cannot start with a digit. Repeat for each sample with its own name.

> `anvi-profile -i SAMPLE-01.bam -c contigs.db --output-dir PROFILES/SAMPLE_01_Profile --sample-name A1`

#### Step 8: Merge Anvi'o profiles

> `anvi-merge PROFILES/*/PROFILE.db -o SAMPLES-MERGED -c contigs.db`

> `anvi-merge PROFILES/SAMPLE_01_Profile/PROFILE.db PROFILES/SAMPLE_02_Profile/PROFILE.db PROFILES/SAMPLE_03_Profile/PROFILE.db PROFILES/SAMPLE_04_Profile/PROFILE.db -o SAMPLES-MERGED -c contigs.db --sample-name samples_merged`

##### To skip the default CONCOCT clustering, add the `--skip-concoct-binning` flag

> `anvi-merge PROFILES/SAMPLE_01_Profile/PROFILE.db PROFILES/SAMPLE_02_Profile/PROFILE.db PROFILES/SAMPLE_03_Profile/PROFILE.db PROFILES/SAMPLE_04_Profile/PROFILE.db -o SAMPLES-MERGED4 -c contigs.db --sample-name Trunk_River --skip-concoct-binning`

#### Step 9: Visualize and refine bins with anvi-interactive

> `anvi-interactive -p SAMPLES-MERGED4/PROFILE.db -c contigs.db --server-only -P 8080`

##### Open your browser and connect to localhost port 8080: http://localhost:8080
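The CheckM `qa` step reports completeness and contamination for every bin. The shell function below is a minimal sketch for shortlisting bins from that report. It assumes the table was saved as tab-separated text (for example with `checkm qa marker_file out_folder -o 1 --tab_table -f checkm_qa.tsv`) and that its header contains the columns `Bin Id`, `Completeness` and `Contamination`. The function name `filter_bins` and its default cutoffs (at least 50% completeness, less than 10% contamination) are hypothetical choices for illustration, not part of the workflow above.

```shell
#!/usr/bin/env bash
# Sketch: print the IDs of bins that pass completeness/contamination cutoffs.
# Assumes a tab-separated CheckM qa table (checkm qa ... --tab_table -f FILE)
# whose header includes "Bin Id", "Completeness" and "Contamination".
# filter_bins and its default cutoffs are hypothetical, chosen for illustration.
filter_bins() {
    local table="$1"
    local min_comp="${2:-50}"   # keep bins with completeness >= this (%)
    local max_cont="${3:-10}"   # keep bins with contamination < this (%)

    awk -F'\t' -v min_comp="$min_comp" -v max_cont="$max_cont" '
        NR == 1 {
            # Locate the required columns by header name, not by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "Bin Id") id = i
                else if ($i == "Completeness") comp = i
                else if ($i == "Contamination") cont = i
            }
            if (!id || !comp || !cont) {
                print "filter_bins: expected CheckM columns not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Numeric comparisons (+ 0 forces numbers) against the cutoffs.
        ($comp + 0) >= (min_comp + 0) && ($cont + 0) < (max_cont + 0) { print $id }
    ' "$table"
}
```

For example, `filter_bins checkm_qa.tsv 90 5` would list only bins with at least 90% completeness and under 5% contamination. Looking the columns up by header name rather than by position should keep the sketch usable with other table layouts, as long as those three header names are present.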