---
title: "Cownose Ray Feeding Ecology"
subtitle: "Targeted high-throughput DNA sequencing reveals the microbiome and trophic ecology of the cownose ray *Rhinoptera bonasus*"
author: "Lyndell Bade, MacKenzie Scherr, David R. Angelini,"
affiliation: "Colby College"
date: 2018-04-06
output:
pdf_document:
highlight: tango
toc: yes
toc_depth: 3
---
> [Main page](https://hackmd.io/s/BJ4FiV_W7)
# Cownose Ray Feeding Ecology
Wokring title: **Targeted high-throughput DNA sequencing reveals the microbiome and trophic ecology of the cownose ray** ***Rhinoptera bonasus***
Dave Angelini
April 30, 2018
| |
|:--:|
|  |
| *Rhinoptera bonasus* (Photo by juan.aguere from La Laguna, Tenerife, Spain, [CC BY 2.0](https://commons.wikimedia.org/w/index.php?curid=26294464)) |
# Running Qiime2
[Qiime2](https://qiime2.org/) should be run via a screen on node 26 of **nscc**. An alias calls the qiime docker container with all of its parameters, including the local path.
```bash
alias qiime 'docker run -t -i -v /export/groups/drangeli/Rbon.htseq/16s/:/data qiime2/core:2018.2 qiime'
```
Confirm that it works.
```bash
qiime
```
# Preparing the data
The fastq files must be gzipped and named according to the Casava convention given below as a python regular expression `.+_.+_L[0-9][0-9][0-9]_R[12]_001\\.fastq\\.gz` Be sure there are no other files are in the data folder, or qiime will chock.
```bash
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path raw.reads \
--source-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path Rbon.16s.demux.qza
qiime demux summarize \
--i-data Rbon.16s.demux.qza \
--o-visualization Rbon.16s.demux.qzv
```
This can be used to estimate an appropriate point for trimming of reads.
```bash
qiime dada2 denoise-paired \
--i-demultiplexed-seqs Rbon.16s.demux.qza \
--p-trunc-len-f 295 \
--p-trunc-len-r 225 \
--p-trim-left-f 34 \
--p-trim-left-r 34 \
--o-representative-sequences Rbon.16s.rep.seqs.qza \
--o-table Rbon.16s.feature.table.qza \
--p-n-threads 30
```
The feature table is worth exporting to a format that can be read by R.
```bash
qiime tools export \
Rbon.16s.feature.table.qza \
--output-dir .
mv feature-table.biom Rbon.16s.feature.table.biom
docker run -t -i -v /export/groups/drangeli/Rbon.htseq/16s/:/data qiime2/core:2018.2 \
biom convert \
-i Rbon.16s.feature.table.biom \
-o Rbon.16s.feature.table.tsv \
--to-tsv
```
The resulting tab-separated values (tsv) file is a table of counts for each feature in each sample. This can be imported into R for custom statistical tests, plots, or other analyses. The features in this table are named by their UUIDs, but their taxonomy assignments can be matched up to the ID using `Rbon.16s.taxonomy.tsv`, which is generated later in this workflow.
```bash
qiime feature-table tabulate-seqs \
--i-data Rbon.16s.rep.seqs.qza \
--o-visualization Rbon.16s.rep.seqs.qzv
```
A list of features with links to BLAST search. This isn't so immediately useful, it can be used as a tool to follow up on the individual identity of particular features/taxa later.
```bash
qiime feature-table summarize \
--i-table Rbon.16s.feature.table.qza \
--o-visualization Rbon.16s.feature.table.qzv \
--m-sample-metadata-file Rbon.sample.metadata.quality.16S.tsv
```
# Diversity analysis
## Determining rarefaction level
```bash
qiime diversity alpha-rarefaction \
--i-table Rbon.16s.feature.table.qza \
--i-phylogeny Rbon.16s.rooted.tree.qza \
--p-max-depth 10000 \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--o-visualization alpha-rarefaction.qzv
```
Use the last two qzv outputs to decide on the level of rarefaction for core metrics of community diversity. **Optimal rarefaction looks to be about 3800.**
## Tree building
Generate a tree for phylogenetic diversity analyses.
```bash
qiime alignment mafft \
--i-sequences Rbon.16s.rep.seqs.qza \
--o-alignment Rbon.16s.aligned.rep.seqs.qza
```
Mask (filter) the alignment to remove positions that are highly variable. These positions add noise to a resulting phylogenetic tree, and removing them speeds up the phylogenetic reconstruction.
```bash
qiime alignment mask \
--i-alignment Rbon.16s.aligned.rep.seqs.qza \
--o-masked-alignment Rbon.16s.masked.aligned.rep.seqs.qza
```
Generate a phylogenetic tree from the masked alignment.
```bash
qiime phylogeny fasttree \
--i-alignment Rbon.16s.masked.aligned.rep.seqs.qza \
--o-tree Rbon.16s.unrooted.tree.qza
```
Mid-point rooting.
```bash
qiime phylogeny midpoint-root \
--i-tree Rbon.16s.unrooted.tree.qza \
--o-rooted-tree Rbon.16s.rooted.tree.qza
```
## Core diversity metrics
Qiime2's core diversity metrics include:
**Alpha diversity**
* Shannon’s diversity index (a quantitative measure of community richness)
* Observed OTUs (a qualitative measure of community richness)
* Faith’s Phylogenetic Diversity (a qualitiative measure of community richness that incorporates phylogenetic relationships between the features)
* Evenness (or Pielou’s Evenness; a measure of community evenness)
**Beta diversity**
* Jaccard distance (a qualitative measure of community dissimilarity)
* Bray-Curtis distance (a quantitative measure of community dissimilarity)
* unweighted UniFrac distance (a qualitative measure of community dissimilarity that incorporates phylogenetic relationships between the features)
* weighted UniFrac distance (a quantitative measure of community dissimilarity that incorporates phylogenetic relationships between the features)
``` bash
qiime diversity core-metrics-phylogenetic \
--i-phylogeny Rbon.16s.rooted.tree.qza \
--i-table Rbon.16s.feature.table.qza \
--p-sampling-depth 3800 \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--output-dir optimal.rarefaction
```
Test for associations between discrete metadata categories and metrics.
```bash
qiime diversity alpha-group-significance \
--i-alpha-diversity optimal.rarefaction/observed_otus_vector.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--o-visualization optimal.rarefaction/observed-otus-group-significance.qzv
qiime diversity alpha-group-significance \
--i-alpha-diversity optimal.rarefaction/faith_pd_vector.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--o-visualization optimal.rarefaction/faith-pd-group-significance.qzv
qiime diversity alpha-group-significance \
--i-alpha-diversity optimal.rarefaction/shannon_vector.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--o-visualization optimal.rarefaction/shannon-group-significance.qzv
```
## Alpha diversity for continuous metadata
```bash
qiime diversity alpha-correlation \
--i-alpha-diversity optimal.rarefaction/shannon_vector.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--o-visualization optimal.rarefaction/shannon-correlation.qzv
qiime diversity alpha-correlation \
--i-alpha-diversity optimal.rarefaction/faith_pd_vector.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--o-visualization optimal.rarefaction/faith-pd-correlation.qzv
```
## Beta diversity
For categorical metadata, this uses PERMANOVA
```bash
qiime diversity beta-group-significance \
--i-distance-matrix optimal.rarefaction/unweighted_unifrac_distance_matrix.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column organ \
--o-visualization optimal.rarefaction/uu.organ.pAOV.qzv
--p-pairwise
```
## PCA with a covariate
```bash
qiime emperor plot \
--i-pcoa optimal.rarefaction/unweighted_unifrac_pcoa_results.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--p-custom-axes DNAconc \
--o-visualization optimal.rarefaction/uu.DNAconc.emperor.qzv
```
# Classification of OTUs
The qiime2 developers have provided classifiers for the [SILVA release 132](https://www.arb-silva.de/documentation/release-132/) dataset.
## Silva full-length classifier
The Qiime developers have provided a read-to-use classifier for the silva 132 release.
```bash
cd /export/groups/drangeli/marker.ref.DBs/silva132
curl -L -O "https://www.dropbox.com/s/5tckx2vhrmf3flp/silva-132-99-nb-classifier.qza"
```
This classifier can be used directly with our dataset.
```bash
qiime feature-classifier classify-sklearn \
--i-reads Rbon.16s.rep.seqs.qza \
--i-classifier silva-132-99-nb-classifier.qza \
--o-classification Rbon.16s.silva132.nb.taxonomy.calls.qza
qiime metadata tabulate \
--m-input-file Rbon.16s.silva132.nb.taxonomy.calls.qza \
--o-visualization Rbon.16s.silva132.nb.taxonomy.calls.qzv
qiime tools export \
Rbon.16s.silva132.nb.taxonomy.calls.qza \
--output-dir .
mv taxonomy.tsv Rbon.16s.silva132.nb.taxonomy.calls.tsv
awk '{ total += $3 } END { print total/NR }' Rbon.16s.silva132.nb.taxonomy.calls.tsv
```
The mean posterior support for taxonomy assignments is 0.805.
```bash
qiime taxa barplot \
--i-table Rbon.16s.feature.table.qza \
--i-taxonomy Rbon.16s.silva132.nb.taxonomy.calls.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--o-visualization Rbon.16s.silva132.nb.taxa.barplots.qzv
```
The taxa barplot `qzv` file is fun to look at. It shows the proportion of reads from each feature/taxon in each sample, and it can be organized by taxonomic level, and the samples can be sorted by metadata categories.
## Silva 16Sv4 515-806 classifier
```bash
qiime feature-classifier classify-sklearn \
--i-reads Rbon.16s.rep.seqs.qza \
--i-classifier silva-132-99-515-806-nb-classifier.qza \
--o-classification Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.qza
qiime tools export \
Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.qza \
--output-dir .
mv taxonomy.tsv Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv
awk '{ total += $3 } END { print total/NR }' Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv
```
Narrowing the classifier's sequence range did not seem to help. With the full length classifier only 93 of 1581 features were classified only to "Bacteria". With the position 515-806 classifier, this was the case for 1186 features, while the mean posterior support for calls is only 0.677.
## Notes on GreenGenes
The other major database for microbial marker genes is [GreenGenes](http://greengenes.secondgenome.com/). However the last release was in 2013 (release 13_8). While the Qiime developers provide a classifier for this dataset, it has two problems. First it only covers 16S positions 515 - 806. Second, it was made with an older version of SciKit Learn. This causes qiime to throw an error that suggests re-building the classifier. Given these limitations, I think using the current Silva release is by far the easier and better approach.
## Silva custom classifier
Taxonomic identification works best when the classifier is trained on the region of the target gene that was sequenced in the dataset. Instructions to do this are available with the Qiime2 documentation at https://docs.qiime2.org/2018.2/tutorials/feature-classifier/
After a request I made on the [qiime forum](https://forum.qiime2.org/), [Greg Caporaso](https://forum.qiime2.org/u/gregcaporaso) shared the artifact files he used to create the full length Silva 132 classifier, which I used above.
```bash
cd /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts
curl -OL "https://www.dropbox.com/s/5tv5uk95pk3ukwf/7_level_taxonomy.qza?dl=0"
mv 7_level_taxonomy.qza?dl=0 silva132.7level.taxonomy.qza
curl -OL "https://www.dropbox.com/s/ce6cj4m24pi7doy/99-otus.qza?dl=0"
mv 99-otus.qza?dl=0 silva132.99.otus.qza
```
Our data are produced by Andre Comeau at [Dalhousie's IMR](http://cgeb-imr.ca/protocols.html) using primers for 16S v4-v5, which span positions the consensus positions 515 to 926. The product size varies by species, so I would not use `--p-trunc-len` in this application.
* 515F `GTGyCAGCmGCCGCGGTAA`
* 926R `CCGyCAATTymTTTrAGTTT`
Citations for these primers:
> Parada AE, Needham DM, Fuhrman JA. 2015. Every base matters:
assessing small subunit rRNA primers for marine microbiomes with mock
communities, time series and global field samples. *Environ Microbiol*
[doi:10.1111/1462-2920.13023](http://dx.doi.org/10.1111/1462-2920.13023).
> Walters W, Hyde ER, Berg-Lyons D, Ackermann G, Humphrey G, Parada A, Gilbert JA, Jansson JK, Caporaso JG, Fuhrman JA, Apprill A, Knight B. 2015. Improved bacterial 16S rRNA gene (V4 and V4-5) and fungal internal transcribed spacer marker gene primers for microbial community surveys. *mSystems* 1(1):e00009-15. [doi:10.1128/mSystems.00009-15](http://dx.doi.org/10.1128/mSystems.00009-15).
```bash
cd /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts
alias qiime 'docker run -t -i -v /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts/:/data qiime2/core:2018.2 qiime'
qiime feature-classifier extract-reads \
--i-sequences silva132.99.otus.qza \
--p-f-primer GTGYCAGCMGCCGCGGTAA \
--p-r-primer CCGYCAATTYMTTTRAGTTT \
--o-reads silva132.99.16Sv4.515F.926R.qza
cd /export/groups/drangeli/Rbon.htseq/16s
cp /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts/silva132.99.16Sv4.515F.926R.qza .
cp /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts/silva132.7level.taxonomy.qza .
alias qiime 'docker run -t -i -v /export/groups/drangeli/Rbon.htseq/16s/:/data qiime2/core:2018.2 qiime'
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads silva132.99.16Sv4.515F.926R.qza \
--i-reference-taxonomy silva132.7level.taxonomy.qza \
--o-classifier classifier.silva132.99.16Sv4.515F.926R.qza
```
This ran for about 7.5 hours on one thread of **nscc**, which is half the time it took my previous attempts. (I'm not sure if or how this qiime function can be multi-threaded).
Now use the classifier to apply taxonomy to the OTU sequences.
```bash
qiime feature-classifier classify-sklearn \
--i-reads Rbon.16s.rep.seqs.qza \
--i-classifier classifier.silva132.99.16Sv4.515F.926R.qza \
--o-classification Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.qza
```
This run took only about 40 minutes.
```bash
qiime metadata tabulate \
--m-input-file Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.qza \
--o-visualization Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.qzv
qiime tools export \
Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.qza \
--output-dir .
mv taxonomy.tsv Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv
awk '{ total += $3 } END { print total/NR }' Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv
```
**NO GOOD!** Of 1581 features, 1277 are listed as "Unassigned", 302 only as "Bacteria". The supports average only 0.435. Only one OTU had a match that went beyond the domain rank. Strangely this went all the way to the species level. It's the clam *Adrana scaphoides* with 0.82 posterior support. And ironically this species was not identified in the analysis based on the full-length Silva database.
## A complete, full-length Silva classifier
After my efforts to customize a classifier for our 16S region produced poor results, I was fairly frustrated. However, I wanted to know if there was some technical obstacle to building *any* classifier in-house that would work well. So I reasoned that I should eb able to repeat the construction of a classifer similar to the one provided by the Qiime developers.
I started from the `SILVA_132_SSURef_Nr99_tax_dna.fa` fasta file. This appears to contain *more* sequences than the Qiime classifier was made with. Going into this, I wasn't sure if or how this might affect the training or the performance of a classifier.
```bash
cd /export/groups/drangeli/marker.ref.DBs/silva132/
grep '^>' SILVA_132_SSURef_Nr99_tax_dna.fa | sed -e 's/^>//' -e 's/ /\t/' > SILVA_132_SSURef_Nr99_tax.full.ids.txt
alias qiime 'docker run -t -i -v /export/groups/drangeli/marker.ref.DBs/silva132/:/data qiime2/core:2018.2 qiime'
qiime tools import \
--type 'FeatureData[Taxonomy]' \
--source-format HeaderlessTSVTaxonomyFormat \
--input-path SILVA_132_SSURef_Nr99_tax.full.ids.txt \
--output-path SILVA_132_SSURef_Nr99_tax.full.ids.qza
qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path SILVA_132_SSURef_Nr99_tax_dna.fa \
--output-path SILVA_132_SSURef_Nr99_tax_dna.qza
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads SILVA_132_SSURef_Nr99_tax_dna.qza \
--i-reference-taxonomy SILVA_132_SSURef_Nr99_tax.full.ids.qza \
--o-classifier classifier.silva132.in.house.qza
```
This job took 59.5 hours. It appears that classifier training time has a nearly linear relationship to the size of the qza read file.
```bash
cp classifier.silva132.in.house.qza ../../Rbon.htseq/16s/
cd ../../Rbon.htseq/16s/
alias qiime 'docker run -t -i -v /export/groups/drangeli/Rbon.htseq/16s/:/data qiime2/core:2018.2 qiime'
qiime feature-classifier classify-sklearn \
--i-reads Rbon.16s.rep.seqs.qza \
--i-classifier classifier.silva132.in.house.qza \
--o-classification Rbon.16s.silva132.in.house.taxonomy.calls.qza
qiime metadata tabulate \
--m-input-file Rbon.16s.silva132.in.house.taxonomy.calls.qza \
--o-visualization Rbon.16s.silva132.in.house.taxonomy.calls.qzv
qiime tools export \
Rbon.16s.silva132.in.house.taxonomy.calls.qza \
--output-dir .
mv taxonomy.tsv Rbon.16s.silva132.in.house.taxonomy.calls.tsv
awk '{ total += $3 } END { print total/NR }' Rbon.16s.silva132.in.house.taxonomy.calls.tsv
qiime taxa barplot \
--i-table Rbon.16s.feature.table.qza \
--i-taxonomy Rbon.16s.silva132.in.house.taxonomy.calls.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--o-visualization Rbon.16s.silva132.in.house.taxa.barplots.qzv
```
The results look very good. Most features are annotated with meaningful taxonomy, and the mean posterior support is 0.868 -- that's higher than for the classifier provided by the Qiime developers. My only guess for why they may have used fewer sequences is that Silva includes eukaryotes, which they may have chosen to omit.
## Conclusions on classification methods
So among all the classifiers tried, the full-length Silva 132 small subunit classifiers appear to work best. I'm inclined to use our in-house classifier, since it will include all sequences from Silva, including eukaryotes. The one possible limitation is that it uses rank-free taxonomy, and while most entries have a 7-level system, some include intermediary taxon names.
### Some comparisons of classifier assignments
```
grep '26d495a601b91173185fbd11f24e5e08' *taxonomy.calls.tsv
```
* `Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv`: D_0__Bacteria 0.7539273545475154
* `Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv`: Unassigned 0.3691258975401842
* `Rbon.16s.silva132.in.house.taxonomy.calls.tsv`: Bacteria;Acidobacteria;Subgroup 22 0.9832846623962529
* `Rbon.16s.silva132.nb.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Acidobacteria;D_2__Subgroup 22 0.9836703149560827
The both full-length classifiers provide the same assignment.
```
grep 'b5b6bed3f76ee539bcb34b7c2114ef71' *taxonomy.calls.tsv
```
* `Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Vibrionales;D_4__Vibrionaceae;D_5__Vibrio;D_6__Vibrionaceae bacterium O6 0.7131749561096695
* `Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv`: Unassigned 0.31234089741539794
* `Rbon.16s.silva132.in.house.taxonomy.calls.tsv`: Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Photobacterium 0.9949968851654118
* `Rbon.16s.silva132.nb.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Vibrionales;D_4__Vibrionaceae;D_5__Photobacterium 0.9487619762137101
The both full-length classifiers provide the same assignment, the in-house built, full, complete classifier does so with a stronger posterior support.
```
grep '9edd3da2230ffbb71467fb945a010dcb' *taxonomy.calls.tsv
```
* `Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Betaproteobacteriales 0.9111723821159405
* `Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv`: Unassigned 0.4227193314226422
* `Rbon.16s.silva132.in.house.taxonomy.calls.tsv`: Bacteria;Proteobacteria;Gammaproteobacteria;Betaproteobacteriales;Burkholderiaceae 0.9999969848292178
* `Rbon.16s.silva132.nb.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Betaproteobacteriales;D_4__Burkholderiaceae 0.9999892474465235
Here both the full-length classifiers provide better taxonomic resolution than the narrower one.
```
grep '36d03f255959ac0365d8d0abc73feb9c' *taxonomy.calls.tsv
```
* `Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv`: D_0__Bacteria 0.746289287673006
* `Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv`: Unassigned 0.32227498644551766
* `Rbon.16s.silva132.in.house.taxonomy.calls.tsv`: Bacteria;Cyanobacteria;Oxyphotobacteria;Synechococcales;Cyanobiaceae;Cyanobium PCC-6307;uncultured bacterium 0.7355707797244803
* `Rbon.16s.silva132.nb.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Synechococcales;D_4__Cyanobiaceae;D_5__Cyanobium PCC-6307;D_6__uncultured bacterium 0.8763101553425945
Here both full-length classifiers provide the same assignment, though the Qiime-provided classifier has higher support for this call.
# Differential abundance analysis
First, we'll need to calculate pseudocounts (counts +1) to avoid log-of-zero errors.
```bash
qiime composition add-pseudocount \
--i-table Rbon.16s.feature.table.qza \
--o-composition-table Rbon.16s.comp.table.qza
```
## ANCOM
[ANCOM](https://www.ncbi.nlm.nih.gov/pubmed/26028277) (ANalysis of COMposition) is a statistical method for the comparisons of feature abundance in sample groups. It should be robust to distributional assumptions and scaling to large sample sizes. ANCOM also assumes that relatively few (< ~ 25%) of the features are changing between groups.
### By organ
```bash
qiime composition ancom \
--i-table Rbon.16s.comp.table.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column organ \
--o-visualization Rbon.16s.ancom.by.organ.qzv
```
No OTUs show significant association with the stomach vs. intestine.
### By year
```bash
qiime composition ancom \
--i-table Rbon.16s.comp.table.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column collYear \
--o-visualization Rbon.16s.ancom.by.year.qzv
```
Interestly, two OTUs show significant association by year. Both were over-represented in 2011, compared by 2012. In fact, no reads were found from this *Vibrio* sp. in 2012.
| feature UUID | W | taxonomy call | confidence |
|--|--|--|--|
| 144dba7a99079c9775774544af80ca42 | 1528 | *Vibrio* sp. | 0.9995 |
| 34d9042dcf9c529bf357aac6355abe06 | 1505 | uncultured Chthoniobacteraceae LD29 | 0.9982 |
### By month
```bash
qiime composition ancom \
--i-table Rbon.16s.comp.table.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column collMonth \
--o-visualization Rbon.16s.ancom.by.month.qzv
```
| feature UUID | W | taxonomy call | confidence |
|---|---|---|---|
| c491836ac2179cae363ba311a9e98ead | 1562 | *Photobacterium damselae* | 0.7819 |
| e74a51e9552452d32ea36b457758e12d | 1579 | *Enterobacter* sp. | 0.9173 |
| 0ffb185d5281f23a82a09b383427053f | 1569 | *Flavobacterium* sp. Aav03 | 0.8691 |
| 6961f6313f71b2f07aaa252f9c58e0f8 | 1560 | *Flavobacterium* sp. Aav03 | 0.8727 |
| a0deaf35796444578dfce55727b7f911 | 1565 | unidentified Bacterium | 0.9875 |
| ae0bdd8654b14be31d5979276f1a4020 | 1448 | Actinobacteria PeM15 | 0.9999 |
<center>

</center>
> **Fig. 1.** Fluctuations in the abdundance of species with significant differences by month.
### By day-of-year
```bash
qiime composition ancom \
--i-table Rbon.16s.comp.table.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column collDay \
--o-visualization Rbon.16s.ancom.by.day.qzv
```
Twelve OTUs differ significantly. Check annotations against UUIDs of significant features, e.g.
```bash
grep 'c491836ac2179cae363ba311a9e98ead' *.taxonomy.calls.tsv
```
### day-of-year, adults only
```bash
qiime feature-table filter-samples \
--i-table Rbon.16s.feature.table.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--p-where "age='A'" \
--o-filtered-table Rbon.16s.feature.table.adult.qza
qiime composition add-pseudocount \
--i-table Rbon.16s.feature.table.adult.qza \
--o-composition-table Rbon.16s.comp.table.adult.qza
qiime composition ancom \
--i-table Rbon.16s.comp.table.adult.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column collDay \
--o-visualization Rbon.16s.ancom.adult.by.day.qzv
```
### By sex
```bash
qiime composition ancom \
--i-table Rbon.16s.comp.table.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column sex \
--o-visualization Rbon.16s.ancom.by.sex.qzv
```
No significant differences.
### By age
```bash
qiime composition ancom \
--i-table Rbon.16s.comp.table.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column age \
--o-visualization Rbon.16s.ancom.by.age.qzv
```
Four OTUs differed significantly by age and were found only in young-of-year.
| feature UUID | taxonomy call | confidence | W | note |
|---|---|---|---|---|
| ecd9bf4303feb1b02ee495d7e16979b8 | Photobacterium damselae | 0.7383756928 | 1073 | same one that's significantly over-represented on day 246 |
| a0deaf35796444578dfce55727b7f911 | unidentified Bacteria | 0.9875211043 | 1126 | same one that's significantly over-represented on the last collection day |
| 013c17335a9a0d5a40db08476e3fb627 | unidentified Gammaproteobacteria | 0.8109698091 | 1116 | |
| 481ed3c020440361f375cf61654dbc45 | unidentified Bacteria | 0.9851697847 | 1230 | |
### By capture gear
```bash
qiime composition ancom \
--i-table Rbon.16s.comp.table.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column gearType \
--o-visualization Rbon.16s.ancom.by.gear.qzv
```
Eight OTUs differed significantly by capture method.
### By state, adults only
```bash
qiime composition ancom \
--i-table Rbon.16s.comp.table.adult.qza \
--m-metadata-file Rbon.sample.metadata.quality.16S.tsv \
--m-metadata-column collState \
--o-visualization Rbon.16s.ancom.adults.by.state.qzv
```
----
*Dave Angelini, 2018 --- [bugsinourbackyard.org](http://www.bugsinourbackyard.org/)*