# Analysis of fungal ITS amplicons from Illumina sequencing with QIIME 2

MSc. Kelly J. Hidalgo Martinez
Microbiologist, Ph.D. student in Genetics and Molecular Biology
Division of Microbial Resources, Research Center for Agriculture, Biology and Chemistry
University of Campinas, Brazil
Phone: +55 19 98172 1510

---

### Requirements

1. Putty [for windows users](https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html)
2. [Qiime2](https://docs.qiime2.org/2019.4/), installed via conda (qiime2-2019.4 environment)
3. FastQC, installed via conda (bioinfo environment)
4. Trimmomatic, installed via conda (bioinfo environment) - [manual](http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf)
5. Filezilla - [windows users](https://filezilla-project.org/download.php?platform=win64) / [linux users](https://filezilla-project.org/download.php)
![](https://i.imgur.com/ROGltvY.jpg)
6. [Download](https://github.com/khidalgo85/qiime2/raw/master/workflow_its_qiime2.pdf) the schematic workflow

## Important tips before you start

1. When typing a file name or directory path, you can use **tab completion**
2. Press the **up arrow** to bring back previous commands you typed
3. Do not store commands in a word-processing program
4. Shell commands are **case-sensitive**
5. Every command has a **help** menu (`qiime --help`, `qiime dada2 --help`, `qiime dada2 denoise-paired --help`)

### 1. Join the server

```coffeescript=
## Server IP address (IP address goes in the Host box and bioinfo in the Username box in Filezilla)
ssh -x bioinfo@143.106.82.118
## Password (goes in the Password box in Filezilla; Port box: 22)
bioinfo@15
## As root
su root
## Root password
20141117
```

### 2. Working Directory

```coffeescript=
## Change directory
cd /data/treinamento/its/
## Make a new directory for the raw data
mkdir 00.RawData
## Change directory
cd 00.RawData/
```

Put the samples in the 00.RawData directory.

***Download the dataset*** (not necessary if you have your own samples)

```coffeescript=
## sample 1 Forward
wget https://github.com/USDA-ARS-GBRU/itsxpress-tutorial/raw/master/data/sample1_r1.fq.gz
## sample 1 Reverse
wget https://github.com/USDA-ARS-GBRU/itsxpress-tutorial/raw/master/data/sample1_r2.fq.gz
## sample 2 Forward
wget https://github.com/USDA-ARS-GBRU/itsxpress-tutorial/raw/master/data/sample2_r1.fq.gz
## sample 2 Reverse
wget https://github.com/USDA-ARS-GBRU/itsxpress-tutorial/raw/master/data/sample2_r2.fq.gz
```

---

## *First option: Quality control with FastQC and trimming with Trimmomatic (outside QIIME 2)*

---

### 3A. Trim primers with cutadapt

Cutadapt is installed in the qiime2-2019.4 virtual environment. It screens out reads that do not begin with the primer sequences and removes the primer sequences from the reads.

```coffeescript=
## To activate the virtual environment
conda activate qiime2-2019.4
## Make a new directory for the samples without primers
mkdir 01.PrimersTrim
```

The primers below target the fungal ITS2 region:

-g GCATCGATGAAGAACGCAGC \
-G TCCTCCGCTTATTGATATGC \

You have to run the next command for each sample (change the sample name in the input, output and log paths):

```coffeescript=
parallel --link --jobs 4 \
  'cutadapt \
  --pair-filter any \
  --no-indels \
  --discard-untrimmed \
  -g GCATCGATGAAGAACGCAGC \
  -G TCCTCCGCTTATTGATATGC \
  -o 01.PrimersTrim/sample1_r1.fq.gz \
  -p 01.PrimersTrim/sample1_r2.fq.gz \
  {1} {2} \
  > 01.PrimersTrim/sample1_cutadapt_log.txt' \
  ::: 00.RawData/sample1_r1.fq.gz ::: 00.RawData/sample1_r2.fq.gz
```

Download the microbiome_helper package:

```coffeescript=
git clone https://github.com/LangilleLab/microbiome_helper.git
```

Create a single log.txt file with all the sequences that passed the primer trimming, using a microbiome_helper script.
```coffeescript=
microbiome_helper/parse_cutadapt_logs.py -i 01.PrimersTrim/*log.txt
## See the txt file
nano cutadapt_log.txt
## Erase all the intermediate files
rm 01.PrimersTrim/*_log.txt
```

Create a .xlsx (Excel) file to track the number of reads at each step:

| SampleID | Raw_reads | Length | Post_trim_primers | Difference | Post-trim | Difference | itsXpress | Difference | % Lost | Denoised | Difference | Merged | non-chimeric | Difference | Final difference | Final % lost |
| -------- | --------- | ------ | ----------------- | ---------- | --------- | ---------- | --------- | ---------- | ------ | -------- | ---------- | ------ | ------------ | ---------- | ---------------- | ------------ |

Like this! (*summary_stats.xlsx* - this template is the working material)

![TruncLef](https://i.imgur.com/FswMySN.jpg)

### 4A. Inspect read quality

```coffeescript=
## Make a new directory for the quality reports
mkdir 02.FastqcReports
## FastQC is installed in the bioinfo virtual environment. Activate it!
conda activate bioinfo
## Run FastQC
fastqc -t 10 01.PrimersTrim/* -o 02.FastqcReports/
## Make one report with all the samples
multiqc 02.FastqcReports/* -o 02.FastqcReports/
```

Review the multiqc output `multiqc_report.html` in `02.FastqcReports/`; download it with Filezilla and open it in a web browser. The most important plot is the per-base quality, which should be Q > 30.

### 5A. Filter out low-quality reads

```coffeescript=
## Make a new directory for the samples after quality control
mkdir 03.CleanData
## Make a new directory for the unpaired reads (this is trash)
mkdir unpaired
```

Run Trimmomatic (adjust the trimming parameters for your samples).

#### Parameters

*LEADING* Remove low-quality bases from the beginning. Specifies the minimum quality required to keep a base.
*TRAILING* Remove low-quality bases from the end. Specifies the minimum quality required to keep a base.
*SLIDINGWINDOW* Perform a sliding-window trimming, cutting once the average quality within the window falls below a threshold. windowSize:requiredQuality.
*MAXINFO* Performs an adaptive quality trim, balancing the benefits of retaining reads against the cost of retaining bases with errors. targetLength:strictness.
*MINLEN* Removes reads that fall below the specified minimum length.

For more details see the [manual](http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf)

```coffeescript=
for i in 01.PrimersTrim/*1.fq.gz
do
BASE=$(basename $i 1.fq.gz)
## The number of threads depends on your machine
trimmomatic PE -threads 12 $i 01.PrimersTrim/${BASE}2.fq.gz 03.CleanData/${BASE}1_paired.fq.gz unpaired/${BASE}1_unpaired.fq.gz 03.CleanData/${BASE}2_paired.fq.gz unpaired/${BASE}2_unpaired.fq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MAXINFO:80:0.5 MINLEN:200
done
```

Re-check the quality:

```coffeescript=
fastqc -t 10 03.CleanData/* -o 02.FastqcReports/
multiqc 02.FastqcReports/*paired* -o 02.FastqcReports/
## Download with Filezilla
```

## 6A. Importing the FASTQ files as an artifact

Create the ManifestFile.txt:

```coffeescript=
nano ManifestFile.txt
## Columns separated by tab
sample-id	forward-absolute-filepath	reverse-absolute-filepath
sample1	$PWD/03.CleanData/sample1_r1_paired.fq.gz	$PWD/03.CleanData/sample1_r2_paired.fq.gz
sample2	$PWD/03.CleanData/sample2_r1_paired.fq.gz	$PWD/03.CleanData/sample2_r2_paired.fq.gz
## To exit: Ctrl + x
## To save: Y (S if nano is in Portuguese)
## To confirm: Enter
```

Like this!
![](https://i.imgur.com/KGIktLJ.jpg)

```coffeescript=
## Deactivate the bioinfo environment, to go back to the qiime2-2019.4 environment
conda deactivate
## Make a new directory for the artifacts
mkdir 04.ImportedReads
## Import as an artifact in qiime2
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path ManifestFile.txt \
  --output-path 04.ImportedReads/reads_trimmed.qza \
  --input-format PairedEndFastqManifestPhred33V2
## Visualization
qiime demux summarize \
  --i-data 04.ImportedReads/reads_trimmed.qza \
  --o-visualization 04.ImportedReads/reads_trimmed.qzv
```

Use [Qiime2 View](https://view.qiime2.org) to open the visualizations.

Generate a table with the number of sequences at each step using a microbiome_helper script:

```coffeescript=
microbiome_helper/qiime2_fastq_lengths.py 04.ImportedReads/reads_trimmed.qza --proc 4 -o read_counts.tsv
## See the file
nano read_counts.tsv
```

## 7A. OPTIONAL! Extracting the fungal ITS with [itsXpress](https://github.com/USDA-ARS-GBRU/q2_itsxpress) (only for fungal analysis)

This qiime2 plugin extracts ITS1 and ITS2, as well as full-length ITS sequences, from high-throughput sequencing datasets.

```coffeescript=
## Make a new directory for the itsXpress output
mkdir 05.ItsXpress
## For help
qiime itsxpress trim-pair-output-unmerged --help
## Run (the region depends on your sequenced region)
qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences 04.ImportedReads/reads_trimmed.qza \
  --p-region ITS2 \
  --p-taxa F \
  --p-threads 10 \
  --o-trimmed 05.ItsXpress/readstrimmed_itsxpress_out.qza
```

Generate a table with the number of sequences at each step using a microbiome_helper script:

```coffeescript=
microbiome_helper/qiime2_fastq_lengths.py 04.ImportedReads/reads_trimmed.qza 05.ItsXpress/readstrimmed_itsxpress_out.qza --proc 4 -o read_counts.tsv
## See the file
nano read_counts.tsv
## Complete your summary_stats.xlsx file!!
```

## 8A. Denoising, read joining and chimera removal with DADA2

If you skipped step 7A, change the input file in the next command (04.ImportedReads/reads_trimmed.qza instead of 05.ItsXpress/readstrimmed_itsxpress_out.qza).

```coffeescript=
## Run DADA2
qiime dada2 denoise-paired --i-demultiplexed-seqs 05.ItsXpress/readstrimmed_itsxpress_out.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --output-dir 06.Dada2Output
## Convert the denoising stats from .qza to .tsv
qiime tools export --input-path 06.Dada2Output/denoising_stats.qza --output-path 06.Dada2Output
## See the file
nano 06.Dada2Output/stats.tsv
## Complete your summary_stats.xlsx file!!
```

## 9. Download and fit the database

For fungal classification the database is UNITE. You can fit the database yourself, or download it ready to use [here](http://kronos.pharmacology.dal.ca/public_files/taxa_classifiers/qiime2-2019.7_classifiers/classifier_sh_refs_qiime_ver8_99_s_02.02.2019_ITS.qza)

```coffeescript=
## Download the database
wget https://files.plutof.ut.ee/public/orig/51/6F/516F387FC543287E1E2B04BA4654443082FE3D7050E92F5D53BA0702E4E77CD4.zip
## Change the name
mv 516F387FC543287E1E2B04BA4654443082FE3D7050E92F5D53BA0702E4E77CD4.zip unite_02_02_2019.zip
## Unzip
unzip unite_02_02_2019.zip
## Remove all the files outside the developer directory
rm unite_02_02_2019.zip *.fasta *.txt
## Format the fasta file
awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' developer/sh_refs_qiime_ver8_dynamic_02.02.2019_dev.fasta | sed -e '/^>/!s/\(.*\)/\U\1/;s/[[:blank:]]*$//' > developer/sh_refs_qiime_ver8_dynamic_02.02.2019_uppercase.fasta
```

```coffeescript=
## Make a new directory for the database
mkdir database
## Import the sequences as an artifact
qiime tools import \
  --type FeatureData[Sequence] \
  --input-path developer/sh_refs_qiime_ver8_dynamic_02.02.2019_uppercase.fasta \
  --output-path database/UNITE.qza
## Import the taxonomy file as an artifact
qiime tools import \
  --type FeatureData[Taxonomy] \
  --input-path developer/sh_taxonomy_qiime_ver8_dynamic_02.02.2019_dev.txt \
  --output-path database/UNITE_tax.qza \
  --input-format HeaderlessTSVTaxonomyFormat
## Fit the database
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads database/UNITE.qza \
  --i-reference-taxonomy database/UNITE_tax.qza \
  --o-classifier database/UNITE_classifier.qza
```

If your analysis isn't of fungal sequences, below you can download a database based on the sequenced region:

* [16S V4/V5 region](http://kronos.pharmacology.dal.ca/public_files/taxa_classifiers/qiime2-2019.7_classifiers/classifier_silva_132_99_16S_V4.V5_515F_926R.qza) `(classifier_silva_132_99_16S_V4.V5_515F_926R.qza)`
* [16S V6/V8 region](http://kronos.pharmacology.dal.ca/public_files/taxa_classifiers/qiime2-2019.7_classifiers/classifier_silva_132_99_16S_V6.V8_B969F_BA1406R.qza) `(classifier_silva_132_99_16S_V6.V8_B969F_BA1406R.qza)`
* [16S V6/V8 region targeting archaea](http://kronos.pharmacology.dal.ca/public_files/taxa_classifiers/qiime2-2019.7_classifiers/classifier_silva_132_99_16S_V6.V8_A956F_A1401R.qza) `(classifier_silva_132_99_16S_V6.V8_A956F_A1401R.qza)`
* [18S V4 region](http://kronos.pharmacology.dal.ca/public_files/taxa_classifiers/qiime2-2019.7_classifiers/classifier_silva_132_99_18S_V4_E572F_E1009R.qza) `(classifier_silva_132_99_18S_V4_E572F_E1009R.qza)`
* [Full ITS - all eukaryotes](http://kronos.pharmacology.dal.ca/public_files/taxa_classifiers/qiime2-2019.7_classifiers/classifier_sh_refs_qiime_ver8_99_s_all_02.02.2019_ITS.qza) `(classifier_sh_refs_qiime_ver8_99_s_all_02.02.2019_ITS.qza)`

## 10. Assign taxonomy to ASVs

```coffeescript=
## Make a new directory for the taxonomy classification
mkdir 07.TaxonomyClassification
## Run
qiime feature-classifier classify-sklearn \
  --i-classifier database/UNITE_classifier.qza \
  --i-reads 06.Dada2Output/representative_sequences.qza \
  --o-classification 07.TaxonomyClassification/taxonomyclassification_dynamic.qza
```

It is recommended to compare the taxonomic assignments with the top BLASTn hits for a few (~5) ASVs.

```coffeescript=
qiime feature-table tabulate-seqs --i-data 06.Dada2Output/representative_sequences.qza \
  --o-visualization 06.Dada2Output/representative_sequences.qzv
```

representative_sequences.qzv
![](https://i.imgur.com/k52uLkB.jpg)

## BONUS (for 16S analysis)

### Build a tree with the FastTree and MAFFT qiime2 plugins

```coffeescript=
## Make a new directory for the phylogenetic tree
mkdir 12.PhylogeneticTree
## Multiple alignment with MAFFT
qiime alignment mafft --i-sequences 06.Dada2Output/representative_sequences.qza \
  --p-n-threads 4 \
  --o-alignment 12.PhylogeneticTree/rep_seqs_aligned.qza
qiime alignment mask --i-alignment 12.PhylogeneticTree/rep_seqs_aligned.qza \
  --o-masked-alignment 12.PhylogeneticTree/rep_seqs_aligned_masked.qza
## FastTree
qiime phylogeny fasttree --i-alignment 12.PhylogeneticTree/rep_seqs_aligned_masked.qza \
  --p-n-threads 4 \
  --o-tree 12.PhylogeneticTree/rep_seqs_aligned_masked_tree.qza
# Root the tree
qiime phylogeny midpoint-root --i-tree 12.PhylogeneticTree/rep_seqs_aligned_masked_tree.qza \
  --o-rooted-tree 12.PhylogeneticTree/rep_seqs_aligned_masked_tree_rooted.qza
```

## 11. Barplot

```coffeescript=
## Make a new directory for the graphs
mkdir 08.Graphs
## Create a sample-metadata.txt file.
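## (Sketch, not part of the original workflow: the same metadata file can be written
## non-interactively; printf '\t' guarantees real tab separators. The nano route
## below does the same thing by hand.)
printf 'sample-id\tSampleName\tSampleType\tLocal\tDate\n' > sample-metadata.txt
printf '#q2:types\tcategorical\tcategorical\tcategorical\tcategorical\n' >> sample-metadata.txt
printf 'sample1\tsample1\twater\tSãoPaulo\tApril-18\n' >> sample-metadata.txt
printf 'sample2\tsample2\tSoil\tCampinas\tApril-19\n' >> sample-metadata.txt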
## Here you can put all the variables that describe your samples
nano sample-metadata.txt
sample-id	SampleName	SampleType	Local	Date
#q2:types	categorical	categorical	categorical	categorical
sample1	sample1	water	SãoPaulo	April-18
sample2	sample2	Soil	Campinas	April-19
## The second row classifies each variable according to its nature (e.g. categorical or numerical)
```

Like this! Separate the columns by tabs.

![](https://i.imgur.com/94lVeFk.jpg)

```coffeescript=
## Make a barplot
qiime taxa barplot \
  --i-table 06.Dada2Output/table.qza \
  --i-taxonomy 07.TaxonomyClassification/taxonomyclassification_dynamic.qza \
  --m-metadata-file sample-metadata.txt \
  --o-visualization 08.Graphs/taxa-bar-plots-dynamic.qzv
```

Barplot on qiime2
![](https://i.imgur.com/c7cgLoV.jpg)

## 12. Rarefaction curves

See the table from the DADA2 output to determine the depth:

```coffeescript=
qiime feature-table summarize \
  --i-table 06.Dada2Output/table.qza \
  --o-visualization 06.Dada2Output/table_summary.qzv \
  --m-sample-metadata-file sample-metadata.txt
```

table_summary.qzv
![](https://i.imgur.com/buO38M6.jpg)

```coffeescript=
## Make a new directory for the rarefaction curves
mkdir 09.RarefactionCurves
## Run
qiime diversity alpha-rarefaction --i-table 06.Dada2Output/table.qza \
  --p-max-depth 6782 \
  --p-steps 10 \
  --p-metrics shannon \
  --p-metrics observed_otus \
  --p-metrics simpson \
  --p-metrics chao1 \
  --m-metadata-file sample-metadata.txt \
  --o-visualization 09.RarefactionCurves/rarefaction_curves.qzv
```

rarefaction_curves.qzv
![](https://i.imgur.com/CCrAzrF.jpg)

## 13. Alpha and beta diversity analysis

Choose the sampling depth based on the smallest library (see table_summary.qzv), so that no sample is dropped. For fungal analysis it is not recommended to use phylogenetic distances.
For 16S analysis, use the command `qiime diversity core-metrics-phylogenetic` to include the phylogenetic analyses, with the `12.PhylogeneticTree/rep_seqs_aligned_masked_tree_rooted.qza` tree.

```coffeescript=
qiime diversity core-metrics --i-table 06.Dada2Output/table.qza \
  --p-sampling-depth 6089 \
  --m-metadata-file sample-metadata.txt \
  --p-n-jobs 4 \
  --output-dir 10.AlphaBetaDiversity
qiime diversity alpha --i-table 06.Dada2Output/table.qza \
  --p-metric chao1 \
  --o-alpha-diversity 10.AlphaBetaDiversity/chao1_vector.qza
qiime diversity alpha --i-table 06.Dada2Output/table.qza \
  --p-metric simpson \
  --o-alpha-diversity 10.AlphaBetaDiversity/simpson_vector.qza
# To see all the vectors (alpha diversity indexes and estimators) in one table
qiime metadata tabulate --m-input-file sample-metadata.txt \
  --m-input-file 10.AlphaBetaDiversity/shannon_vector.qza \
  --m-input-file 10.AlphaBetaDiversity/observed_otus_vector.qza \
  --m-input-file 10.AlphaBetaDiversity/simpson_vector.qza \
  --m-input-file 10.AlphaBetaDiversity/chao1_vector.qza \
  --o-visualization 10.AlphaBetaDiversity/alfadiversidade_all.qzv
```

## 14. Exporting

```coffeescript=
## Make a new directory for the exported files
mkdir 11.ExportFiles
qiime tools export --input-path 06.Dada2Output/table.qza \
  --output-path 11.ExportFiles/
## .biom to .tsv
biom convert -i 11.ExportFiles/feature-table.biom \
  -o 11.ExportFiles/Otu_Table.tsv \
  --to-tsv \
  --table-type "OTU table"
```

Open `Otu_Table.tsv` and change #OTU ID to OTUID.
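This header edit can also be scripted rather than done by hand in an editor. A small sketch using GNU `sed` (the `if` guard is only there so the command silently skips if the table has not been exported yet):

```coffeescript=
## Rename the "#OTU ID" header to "OTUID" in place (GNU sed)
if [ -f 11.ExportFiles/Otu_Table.tsv ]; then
  sed -i 's/^#OTU ID/OTUID/' 11.ExportFiles/Otu_Table.tsv
fi
```

The same pattern works later for `taxonomy.tsv` (replacing `Feature ID` instead of `#OTU ID`).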
```coffeescript=
nano 11.ExportFiles/Otu_Table.tsv
```

Export the taxonomy:

```coffeescript=
qiime tools export --input-path 07.TaxonomyClassification/taxonomyclassification_dynamic.qza \
  --output-path 11.ExportFiles/taxonomy
```

Open `taxonomy.tsv` and change Feature ID to OTUID:

```coffeescript=
nano 11.ExportFiles/taxonomy/taxonomy.tsv
```

# Final `summary_stats.xlsx`

| SampleID | Raw_reads | Length | Post_trim_primers | Difference | Post-trim | Difference | itsXpress | Difference | % Lost | Denoised | Difference | Merged | non-chimeric | Difference | Final difference | Final % lost |
| -------- | --------- | ------ | ----------------- | ---------- | --------- |:----------:| --------- | ---------- | ------ | -------- | ---------- | ------ | ------------ | ---------- | ---------------- | ------------ |
| sample1 | 10000 | 48-251 | 10000 | 0 | 7755 | 2245 | 6415 | 3585 | 22.45 | 6354 | 3646 | 6089 | 3911 | 6089 | 3911 | 39.11 |
| sample2 | 10000 | 48-251 | 10000 | 0 | 7414 | 2614 | 7386 | 2614 | 25.86 | 7296 | 2704 | 6782 | 3218 | 6782 | 3218 | 32.18 |

---

## *Second option: Quality control and trimming with QIIME 2 plugins*

---

## 2B. Importing the FASTQ files as an artifact

Create the ManifestFile.txt:

```coffeescript=
nano ManifestFile.txt
## Columns separated by tab
sample-id	forward-absolute-filepath	reverse-absolute-filepath
sample1	$PWD/00.RawData/sample1_r1.fq.gz	$PWD/00.RawData/sample1_r2.fq.gz
sample2	$PWD/00.RawData/sample2_r1.fq.gz	$PWD/00.RawData/sample2_r2.fq.gz
## To exit: Ctrl + x
## To save: Y (S if nano is in Portuguese)
## To confirm: Enter
```

Like this!
![](https://i.imgur.com/zeGiK0b.jpg)

```coffeescript=
## Make a new directory for the artifacts
mkdir 01.ImportedReads
## Import as an artifact in qiime2
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path ManifestFile.txt \
  --output-path 01.ImportedReads/reads_raw.qza \
  --input-format PairedEndFastqManifestPhred33V2
## Visualization
qiime demux summarize \
  --i-data 01.ImportedReads/reads_raw.qza \
  --o-visualization 01.ImportedReads/reads_raw.qzv
```

Use [Qiime2 View](https://view.qiime2.org/) to open the visualizations.

Download the microbiome_helper package:

```coffeescript=
git clone https://github.com/LangilleLab/microbiome_helper.git
```

Generate a table with the number of sequences at each step using a microbiome_helper script:

```coffeescript=
microbiome_helper/qiime2_fastq_lengths.py 01.ImportedReads/reads_raw.qza --proc 4 -o read_counts.tsv
## See the file
nano read_counts.tsv
```

## 3B. Trim primers with the cutadapt qiime2 plugin

Cutadapt is also available as a qiime2 plugin. It screens out reads that do not begin with the primer sequences and removes the primer sequences from the reads.

```coffeescript=
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences 01.ImportedReads/reads_raw.qza \
  --p-cores 10 \
  --p-front-f GCATCGATGAAGAACGCAGC \
  --p-front-r TCCTCCGCTTATTGATATGC \
  --p-discard-untrimmed \
  --p-no-indels \
  --o-trimmed-sequences 01.ImportedReads/reads_primerstrim.qza
# Visualization
qiime demux summarize \
  --i-data 01.ImportedReads/reads_primerstrim.qza \
  --o-visualization 01.ImportedReads/reads_primerstrim.qzv
# Generate a table with the number of sequences in each step using a microbiome_helper script
microbiome_helper/qiime2_fastq_lengths.py 01.ImportedReads/reads_raw.qza 01.ImportedReads/reads_primerstrim.qza --proc 4 -o read_counts.tsv
```

## 4B. Extracting the fungal ITS with itsXpress (optional, and only for fungal analysis)

This qiime2 plugin extracts ITS1 and ITS2, as well as full-length ITS sequences, from high-throughput sequencing datasets.

```coffeescript=
## Install the itsXpress plugin
conda install -c bioconda itsxpress
## Make a new directory for the itsXpress output
mkdir 02.ItsXpress
## For help
qiime itsxpress trim-pair-output-unmerged --help
## Run (the region depends on your sequenced region)
qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences 01.ImportedReads/reads_primerstrim.qza \
  --p-region ITS2 \
  --p-taxa F \
  --p-threads 10 \
  --o-trimmed 02.ItsXpress/readstrimmed_itsxpress_out.qza
```

Generate a table with the number of sequences at each step using a microbiome_helper script:

```coffeescript=
microbiome_helper/qiime2_fastq_lengths.py 01.ImportedReads/reads_raw.qza 01.ImportedReads/reads_primerstrim.qza 02.ItsXpress/readstrimmed_itsxpress_out.qza --proc 4 -o read_counts.tsv
## See the file
nano read_counts.tsv
## Complete your summary_stats.xlsx file!!
```

## 5B. Quality control, filtering, denoising, read joining and chimera removal with DADA2

```coffeescript=
## See the read quality
qiime demux summarize \
  --i-data 02.ItsXpress/readstrimmed_itsxpress_out.qza \
  --o-visualization 02.ItsXpress/readstrimmed_itsxpress_out.qzv
```

![TruncLef](https://i.imgur.com/VDCMwTu.png)

```coffeescript=
## For help
qiime dada2 denoise-paired --help
## Run DADA2. Replace XX with the values for your samples, according to what you observed in readstrimmed_itsxpress_out.qzv
qiime dada2 denoise-paired --i-demultiplexed-seqs 02.ItsXpress/readstrimmed_itsxpress_out.qza \
  --p-trunc-len-f XX \
  --p-trunc-len-r XX \
  --p-trim-left-f XX \
  --p-trim-left-r XX \
  --output-dir 03.Dada2Output
## Convert the denoising stats from .qza to .tsv
qiime tools export --input-path 03.Dada2Output/denoising_stats.qza --output-path 03.Dada2Output
## See the file
nano 03.Dada2Output/stats.tsv
## Complete your summary_stats.xlsx file!!
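## Optional sketch (assumed stats.tsv layout: sample-id in column 1, input reads in
## column 2, non-chimeric reads in the last column - check your header): print the
## percentage of input reads that survived, per sample, for the summary table.
## The guard skips the command if DADA2 has not produced the file yet.
if [ -f 03.Dada2Output/stats.tsv ]; then
  awk -F'\t' 'NR>2 {printf "%s\t%.2f%%\n", $1, 100*$NF/$2}' 03.Dada2Output/stats.tsv
fi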
```

### Continue to step 9 of the first option.

## ATTENTION!!!

If you followed the second option, before continuing to step 9 pay attention to the names of the directories and files: you will have to change them.

## IMPORTANT TOOLS FOR DOWNSTREAM ANALYSIS

[RAW GRAPHS:](https://rawgraphs.io/) to construct graphs, you only need your formatted tables.
[Microbiome Analyst:](https://www.microbiomeanalyst.ca/) an online tool for microbiome analysis. You need the qiime2 outputs (otu_table, tax_table, sample_metadata, phy_tree - optional).
[I want hue:](http://medialab.github.io/iwanthue/) generate and refine palettes of optimally distinct colors for your graphs in R.

## Some Citations

* **Qiime2:** Bolyen, Evan, et al. "QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science." No. e27295v1. PeerJ Preprints, 2018.
* **FastQC:** Andrews, Simon. "FastQC: a quality control tool for high throughput sequence data." (2010).
* **Trimmomatic:** Bolger, Anthony M., Marc Lohse, and Bjoern Usadel. "Trimmomatic: a flexible trimmer for Illumina sequence data." Bioinformatics 30.15 (2014): 2114-2120.
* **Cutadapt:** Martin, Marcel. "Cutadapt removes adapter sequences from high-throughput sequencing reads." EMBnet.journal 17.1 (2011): 10-12.
* **ITSxpress:** Rivers AR, Weber KC, Gardner TG, et al. "ITSxpress: Software to rapidly trim internally transcribed spacer sequences with quality scores for marker gene analysis [version 1; peer review: 2 approved]." F1000Research 2018, 7:1418 (https://doi.org/10.12688/f1000research.15704.1)
* **DADA2:** Callahan, Benjamin J., et al. "DADA2: high-resolution sample inference from Illumina amplicon data." Nature Methods 13.7 (2016): 581.
* **UNITE:** Abarenkov, Kessy, et al. "The UNITE database for molecular identification of fungi - recent updates and future perspectives." New Phytologist 186.2 (2010): 281-285.
* **SILVA:** Quast, Christian, et al.
"The SILVA ribosomal RNA gene database project: improved data processing and web-based tools." Nucleic acids research 41.D1 (2012): D590-D596.