GEN 725/825 METABARCODING ANALYSIS

# GEN 725/825 METABARCODING ANALYSIS - On a mac, find and open the program 'terminal' ``` ssh jtmiller@ron.sr.unh.edu ``` - I will log you in to my account. Once you are logged in, do not run anything from the command line without my supervision. - Note: To cancel something you started running, use ctrl+c ### Move into our class folder ``` cd GEN725_825 ``` ### See what files/folders are in the class folder ``` ls ``` ### activate the environment for analysis ``` conda activate qiime2R ``` ### Move into your groups folder ### Replace the X with the number that you are ``` cd GroupX ``` ### Make a folder in your groups folder to contain your cleaned up reads ``` mkdir poly-G-trimmed ``` ### Lets make a list of samples to copy into excel so that we can make a metadata file. Copy the output into excel ``` find ../data -type f -name *R1_001.fastq.gz -exec basename {} \; | sed "s/_L00.*//" | sed "s/_S.*//" ``` ## Begin the analysis ### Step 1a-b. Perform quality trimming with the program called 'fastp' ### Step 1a. Test your 'for loop' ``` for f in ../data/*R1_001.fastq.gz do echo $f ## this is for the paired read file r=$(echo $f | sed 's/_R1_/_R2_/g') ## lets print them to screen to make sure it looks ok echo "$f $r" done ``` ### Step 1b. Run your fastp 'for loop' ``` for f in ../data/*R1_001.fastq.gz do r=$(echo $f | sed 's/_R1_/_R2_/g') rout="$(basename $r)" fout="$(basename $f)" fastp \ --in1 $f \ --in2 $r \ --html html/${fout%_L002_R1_001.fastq.gz}.html \ --out1 poly-G-trimmed/$fout \ --out2 poly-G-trimmed/$rout \ -l 150 \ -g \ -Q echo "done cleaning up sample $f" done >> trim.report 2>&1 & ``` ### While the above runs, we will move on to the next step to get ready to run it. Dont run anything, just write it out ## Step 2. Trim off the primer ### activate a different environment ``` conda activate qiime2-2021.11 ``` ### import ``` qiime tools import \ --type "SampleData[PairedEndSequencesWithQuality]" \ --input-format CasavaOneEightSingleLanePerSampleDirFmt \ --input-path poly-G-trimmed \ --output-path MYGROUPNAME_demux ``` ## Trim amplicon primers #### forward primer is 'GTCGGTAAAACTCGTGCCAGC' #### reverse primer is 'CATAGTGGGGTATCTAATCCCAGTTTG' ### We are trimming off the primer sequence (wont need it for taxonomy) ``` qiime cutadapt trim-paired \ --i-demultiplexed-sequences MYGROUPNAME_demux.qza \ --p-cores 16 \ --p-front-f PUT_FORWARD_PRIMER_HERE \ --p-front-r PUT_REVERSE_PRIMER_HERE \ --o-trimmed-sequences MYGROUPNAME_trimmed_reads.qza \ --verbose \ &> primer_trimming.log ``` # This is the important 'de-noising' step. Talk about this in class and include in your report ``` qiime dada2 denoise-paired \ --i-demultiplexed-seqs MYGROUPNAME_trimmed_reads.qza \ --p-trunc-len-f 150 \ --p-trunc-len-r 150 \ --p-n-threads 10 \ --o-denoising-stats dns_stats \ --o-table MYGROUPNAME_table \ --o-representative-sequences MYGROUPNAME_rep-seqs ``` the reference reads are here: ../reference_taxonomy/12S-seqs-derep-uniq.qza the reference taxonomy is here: ../reference_taxonomy/12S-tax-derep-uniq.qza ### This may take awhile. Prepare your sample sheets in excel ``` qiime feature-classifier classify-consensus-vsearch \ --i-query MYGROUPNAME_rep-seqs.qza \ --i-reference-reads REF_READS \ --i-reference-taxonomy REF_TAX \ --p-maxaccepts 10 \ --p-query-cov 0.80 \ --p-perc-identity 0.97 \ --p-weak-id 0.80 \ --p-threads 10 \ --o-classification MYGROUPNAME_vsearch_taxonomy & ``` ### Finally, lets make a barplot to import at https://view.qiime2.org ``` qiime taxa barplot \ --i-table MYGROUPNAME_table.qza \ --i-taxonomy MYGROUPNAME_vsearch_taxonomy.qza \ --o-visualization MYGROUPNAME_taxa-barplot \ ```