# GEN 725/825 METABARCODING ANALYSIS
- On a mac, find and open the program 'terminal'
```
ssh jtmiller@ron.sr.unh.edu
```
- I will log you in to my account. Once you are logged in, do not run anything from the command line without my supervision.
- Note: To cancel something you started running, use ctrl+c
### Move into our class folder
```
cd GEN725_825
```
### See what files/folders are in the class folder
```
ls
```
### activate the environment for analysis
```
conda activate qiime2R
```
### Move into your groups folder
### Replace the X with the number that you are
```
cd GroupX
```
### Make a folder in your groups folder to contain your cleaned up reads
```
mkdir poly-G-trimmed
```
### Lets make a list of samples to copy into excel so that we can make a metadata file. Copy the output into excel
```
find ../data -type f -name *R1_001.fastq.gz -exec basename {} \; | sed "s/_L00.*//" | sed "s/_S.*//"
```
## Begin the analysis
### Step 1a-b. Perform quality trimming with the program called 'fastp'
### Step 1a. Test your 'for loop'
```
for f in ../data/*R1_001.fastq.gz
do
echo $f
## this is for the paired read file
r=$(echo $f | sed 's/_R1_/_R2_/g')
## lets print them to screen to make sure it looks ok
echo "$f $r"
done
```
### Step 1b. Run your fastp 'for loop'
```
for f in ../data/*R1_001.fastq.gz
do
r=$(echo $f | sed 's/_R1_/_R2_/g')
rout="$(basename $r)"
fout="$(basename $f)"
fastp \
--in1 $f \
--in2 $r \
--html html/${fout%_L002_R1_001.fastq.gz}.html \
--out1 poly-G-trimmed/$fout \
--out2 poly-G-trimmed/$rout \
-l 150 \
-g \
-Q
echo "done cleaning up sample $f"
done >> trim.report 2>&1 &
```
### While the above runs, we will move on to the next step to get ready to run it. Dont run anything, just write it out
## Step 2. Trim off the primer
### activate a different environment
```
conda activate qiime2-2021.11
```
### import
```
qiime tools import \
--type "SampleData[PairedEndSequencesWithQuality]" \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--input-path poly-G-trimmed \
--output-path MYGROUPNAME_demux
```
## Trim amplicon primers
#### forward primer is 'GTCGGTAAAACTCGTGCCAGC'
#### reverse primer is 'CATAGTGGGGTATCTAATCCCAGTTTG'
### We are trimming off the primer sequence (wont need it for taxonomy)
```
qiime cutadapt trim-paired \
--i-demultiplexed-sequences MYGROUPNAME_demux.qza \
--p-cores 16 \
--p-front-f PUT_FORWARD_PRIMER_HERE \
--p-front-r PUT_REVERSE_PRIMER_HERE \
--o-trimmed-sequences MYGROUPNAME_trimmed_reads.qza \
--verbose \
&> primer_trimming.log
```
# This is the important 'de-noising' step. Talk about this in class and include in your report
```
qiime dada2 denoise-paired \
--i-demultiplexed-seqs MYGROUPNAME_trimmed_reads.qza \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
--p-n-threads 10 \
--o-denoising-stats dns_stats \
--o-table MYGROUPNAME_table \
--o-representative-sequences MYGROUPNAME_rep-seqs
```
the reference reads are here: ../reference_taxonomy/12S-seqs-derep-uniq.qza
the reference taxonomy is here: ../reference_taxonomy/12S-tax-derep-uniq.qza
### This may take awhile. Prepare your sample sheets in excel
```
qiime feature-classifier classify-consensus-vsearch \
--i-query MYGROUPNAME_rep-seqs.qza \
--i-reference-reads REF_READS \
--i-reference-taxonomy REF_TAX \
--p-maxaccepts 10 \
--p-query-cov 0.80 \
--p-perc-identity 0.97 \
--p-weak-id 0.80 \
--p-threads 10 \
--o-classification MYGROUPNAME_vsearch_taxonomy &
```
### Finally, lets make a barplot to import at https://view.qiime2.org
```
qiime taxa barplot \
--i-table MYGROUPNAME_table.qza \
--i-taxonomy MYGROUPNAME_vsearch_taxonomy.qza \
--o-visualization MYGROUPNAME_taxa-barplot \
```