--- title: "Cownose Ray Feeding Ecology" subtitle: "Targeted high-throughput DNA sequencing reveals the microbiome and trophic ecology of the cownose ray *Rhinoptera bonasus*" author: "Lyndell Bade, MacKenzie Scherr, David R. Angelini," affiliation: "Colby College" date: 2018-04-06 output: pdf_document: highlight: tango toc: yes toc_depth: 3 --- > [Main page](https://hackmd.io/s/BJ4FiV_W7) # Cownose Ray Feeding Ecology Wokring title: **Targeted high-throughput DNA sequencing reveals the microbiome and trophic ecology of the cownose ray** ***Rhinoptera bonasus*** Dave Angelini April 30, 2018 | | |:--:| | ![Rhinoptera bonasus](https://upload.wikimedia.org/wikipedia/commons/thumb/3/33/Rhinoptera_bonasus_2.jpg/640px-Rhinoptera_bonasus_2.jpg) | | *Rhinoptera bonasus* (Photo by juan.aguere from La Laguna, Tenerife, Spain, [CC BY 2.0](https://commons.wikimedia.org/w/index.php?curid=26294464)) | # Running Qiime2 [Qiime2](https://qiime2.org/) should be run via a screen on node 26 of **nscc**. An alias calls the qiime docker container with all of its parameters, including the local path. ```bash alias qiime 'docker run -t -i -v /export/groups/drangeli/Rbon.htseq/16s/:/data qiime2/core:2018.2 qiime' ``` Confirm that it works. ```bash qiime ``` # Preparing the data The fastq files must be gzipped and named according to the Casava convention given below as a python regular expression `.+_.+_L[0-9][0-9][0-9]_R[12]_001\\.fastq\\.gz` Be sure there are no other files are in the data folder, or qiime will chock. ```bash qiime tools import \ --type 'SampleData[PairedEndSequencesWithQuality]' \ --input-path raw.reads \ --source-format CasavaOneEightSingleLanePerSampleDirFmt \ --output-path Rbon.16s.demux.qza qiime demux summarize \ --i-data Rbon.16s.demux.qza \ --o-visualization Rbon.16s.demux.qzv ``` This can be used to estimate an appropriate point for trimming of reads. ```bash qiime dada2 denoise-paired \ --i-demultiplexed-seqs Rbon.16s.demux.qza \ --p-trunc-len-f 295 \ --p-trunc-len-r 225 \ --p-trim-left-f 34 \ --p-trim-left-r 34 \ --o-representative-sequences Rbon.16s.rep.seqs.qza \ --o-table Rbon.16s.feature.table.qza \ --p-n-threads 30 ``` The feature table is worth exporting to a format that can be read by R. ```bash qiime tools export \ Rbon.16s.feature.table.qza \ --output-dir . mv feature-table.biom Rbon.16s.feature.table.biom docker run -t -i -v /export/groups/drangeli/Rbon.htseq/16s/:/data qiime2/core:2018.2 \ biom convert \ -i Rbon.16s.feature.table.biom \ -o Rbon.16s.feature.table.tsv \ --to-tsv ``` The resulting tab-separated values (tsv) file is a table of counts for each feature in each sample. This can be imported into R for custom statistical tests, plots, or other analyses. The features in this table are named by their UUIDs, but their taxonomy assignments can be matched up to the ID using `Rbon.16s.taxonomy.tsv`, which is generated later in this workflow. ```bash qiime feature-table tabulate-seqs \ --i-data Rbon.16s.rep.seqs.qza \ --o-visualization Rbon.16s.rep.seqs.qzv ``` A list of features with links to BLAST search. This isn't so immediately useful, it can be used as a tool to follow up on the individual identity of particular features/taxa later. ```bash qiime feature-table summarize \ --i-table Rbon.16s.feature.table.qza \ --o-visualization Rbon.16s.feature.table.qzv \ --m-sample-metadata-file Rbon.sample.metadata.quality.16S.tsv ``` # Diversity analysis ## Determining rarefaction level ```bash qiime diversity alpha-rarefaction \ --i-table Rbon.16s.feature.table.qza \ --i-phylogeny Rbon.16s.rooted.tree.qza \ --p-max-depth 10000 \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --o-visualization alpha-rarefaction.qzv ``` Use the last two qzv outputs to decide on the level of rarefaction for core metrics of community diversity. **Optimal rarefaction looks to be about 3800.** ## Tree building Generate a tree for phylogenetic diversity analyses. ```bash qiime alignment mafft \ --i-sequences Rbon.16s.rep.seqs.qza \ --o-alignment Rbon.16s.aligned.rep.seqs.qza ``` Mask (filter) the alignment to remove positions that are highly variable. These positions add noise to a resulting phylogenetic tree, and removing them speeds up the phylogenetic reconstruction. ```bash qiime alignment mask \ --i-alignment Rbon.16s.aligned.rep.seqs.qza \ --o-masked-alignment Rbon.16s.masked.aligned.rep.seqs.qza ``` Generate a phylogenetic tree from the masked alignment. ```bash qiime phylogeny fasttree \ --i-alignment Rbon.16s.masked.aligned.rep.seqs.qza \ --o-tree Rbon.16s.unrooted.tree.qza ``` Mid-point rooting. ```bash qiime phylogeny midpoint-root \ --i-tree Rbon.16s.unrooted.tree.qza \ --o-rooted-tree Rbon.16s.rooted.tree.qza ``` ## Core diversity metrics Qiime2's core diversity metrics include: **Alpha diversity** * Shannon’s diversity index (a quantitative measure of community richness) * Observed OTUs (a qualitative measure of community richness) * Faith’s Phylogenetic Diversity (a qualitiative measure of community richness that incorporates phylogenetic relationships between the features) * Evenness (or Pielou’s Evenness; a measure of community evenness) **Beta diversity** * Jaccard distance (a qualitative measure of community dissimilarity) * Bray-Curtis distance (a quantitative measure of community dissimilarity) * unweighted UniFrac distance (a qualitative measure of community dissimilarity that incorporates phylogenetic relationships between the features) * weighted UniFrac distance (a quantitative measure of community dissimilarity that incorporates phylogenetic relationships between the features) ``` bash qiime diversity core-metrics-phylogenetic \ --i-phylogeny Rbon.16s.rooted.tree.qza \ --i-table Rbon.16s.feature.table.qza \ --p-sampling-depth 3800 \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --output-dir optimal.rarefaction ``` Test for associations between discrete metadata categories and metrics. ```bash qiime diversity alpha-group-significance \ --i-alpha-diversity optimal.rarefaction/observed_otus_vector.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --o-visualization optimal.rarefaction/observed-otus-group-significance.qzv qiime diversity alpha-group-significance \ --i-alpha-diversity optimal.rarefaction/faith_pd_vector.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --o-visualization optimal.rarefaction/faith-pd-group-significance.qzv qiime diversity alpha-group-significance \ --i-alpha-diversity optimal.rarefaction/shannon_vector.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --o-visualization optimal.rarefaction/shannon-group-significance.qzv ``` ## Alpha diversity for continuous metadata ```bash qiime diversity alpha-correlation \ --i-alpha-diversity optimal.rarefaction/shannon_vector.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --o-visualization optimal.rarefaction/shannon-correlation.qzv qiime diversity alpha-correlation \ --i-alpha-diversity optimal.rarefaction/faith_pd_vector.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --o-visualization optimal.rarefaction/faith-pd-correlation.qzv ``` ## Beta diversity For categorical metadata, this uses PERMANOVA ```bash qiime diversity beta-group-significance \ --i-distance-matrix optimal.rarefaction/unweighted_unifrac_distance_matrix.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column organ \ --o-visualization optimal.rarefaction/uu.organ.pAOV.qzv --p-pairwise ``` ## PCA with a covariate ```bash qiime emperor plot \ --i-pcoa optimal.rarefaction/unweighted_unifrac_pcoa_results.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --p-custom-axes DNAconc \ --o-visualization optimal.rarefaction/uu.DNAconc.emperor.qzv ``` # Classification of OTUs The qiime2 developers have provided classifiers for the [SILVA release 132](https://www.arb-silva.de/documentation/release-132/) dataset. ## Silva full-length classifier The Qiime developers have provided a read-to-use classifier for the silva 132 release. ```bash cd /export/groups/drangeli/marker.ref.DBs/silva132 curl -L -O "https://www.dropbox.com/s/5tckx2vhrmf3flp/silva-132-99-nb-classifier.qza" ``` This classifier can be used directly with our dataset. ```bash qiime feature-classifier classify-sklearn \ --i-reads Rbon.16s.rep.seqs.qza \ --i-classifier silva-132-99-nb-classifier.qza \ --o-classification Rbon.16s.silva132.nb.taxonomy.calls.qza qiime metadata tabulate \ --m-input-file Rbon.16s.silva132.nb.taxonomy.calls.qza \ --o-visualization Rbon.16s.silva132.nb.taxonomy.calls.qzv qiime tools export \ Rbon.16s.silva132.nb.taxonomy.calls.qza \ --output-dir . mv taxonomy.tsv Rbon.16s.silva132.nb.taxonomy.calls.tsv awk '{ total += $3 } END { print total/NR }' Rbon.16s.silva132.nb.taxonomy.calls.tsv ``` The mean posterior support for taxonomy assignments is 0.805. ```bash qiime taxa barplot \ --i-table Rbon.16s.feature.table.qza \ --i-taxonomy Rbon.16s.silva132.nb.taxonomy.calls.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --o-visualization Rbon.16s.silva132.nb.taxa.barplots.qzv ``` The taxa barplot `qzv` file is fun to look at. It shows the proportion of reads from each feature/taxon in each sample, and it can be organized by taxonomic level, and the samples can be sorted by metadata categories. ## Silva 16Sv4 515-806 classifier ```bash qiime feature-classifier classify-sklearn \ --i-reads Rbon.16s.rep.seqs.qza \ --i-classifier silva-132-99-515-806-nb-classifier.qza \ --o-classification Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.qza qiime tools export \ Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.qza \ --output-dir . mv taxonomy.tsv Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv awk '{ total += $3 } END { print total/NR }' Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv ``` Narrowing the classifier's sequence range did not seem to help. With the full length classifier only 93 of 1581 features were classified only to "Bacteria". With the position 515-806 classifier, this was the case for 1186 features, while the mean posterior support for calls is only 0.677. ## Notes on GreenGenes The other major database for microbial marker genes is [GreenGenes](http://greengenes.secondgenome.com/). However the last release was in 2013 (release 13_8). While the Qiime developers provide a classifier for this dataset, it has two problems. First it only covers 16S positions 515 - 806. Second, it was made with an older version of SciKit Learn. This causes qiime to throw an error that suggests re-building the classifier. Given these limitations, I think using the current Silva release is by far the easier and better approach. ## Silva custom classifier Taxonomic identification works best when the classifier is trained on the region of the target gene that was sequenced in the dataset. Instructions to do this are available with the Qiime2 documentation at https://docs.qiime2.org/2018.2/tutorials/feature-classifier/ After a request I made on the [qiime forum](https://forum.qiime2.org/), [Greg Caporaso](https://forum.qiime2.org/u/gregcaporaso) shared the artifact files he used to create the full length Silva 132 classifier, which I used above. ```bash cd /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts curl -OL "https://www.dropbox.com/s/5tv5uk95pk3ukwf/7_level_taxonomy.qza?dl=0" mv 7_level_taxonomy.qza?dl=0 silva132.7level.taxonomy.qza curl -OL "https://www.dropbox.com/s/ce6cj4m24pi7doy/99-otus.qza?dl=0" mv 99-otus.qza?dl=0 silva132.99.otus.qza ``` Our data are produced by Andre Comeau at [Dalhousie's IMR](http://cgeb-imr.ca/protocols.html) using primers for 16S v4-v5, which span positions the consensus positions 515 to 926. The product size varies by species, so I would not use `--p-trunc-len` in this application. * 515F `GTGyCAGCmGCCGCGGTAA` * 926R `CCGyCAATTymTTTrAGTTT` Citations for these primers: > Parada AE, Needham DM, Fuhrman JA. 2015. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. *Environ Microbiol* [doi:10.1111/1462-2920.13023](http://dx.doi.org/10.1111/1462-2920.13023). > Walters W, Hyde ER, Berg-Lyons D, Ackermann G, Humphrey G, Parada A, Gilbert JA, Jansson JK, Caporaso JG, Fuhrman JA, Apprill A, Knight B. 2015. Improved bacterial 16S rRNA gene (V4 and V4-5) and fungal internal transcribed spacer marker gene primers for microbial community surveys. *mSystems* 1(1):e00009-15. [doi:10.1128/mSystems.00009-15](http://dx.doi.org/10.1128/mSystems.00009-15). ```bash cd /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts alias qiime 'docker run -t -i -v /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts/:/data qiime2/core:2018.2 qiime' qiime feature-classifier extract-reads \ --i-sequences silva132.99.otus.qza \ --p-f-primer GTGYCAGCMGCCGCGGTAA \ --p-r-primer CCGYCAATTYMTTTRAGTTT \ --o-reads silva132.99.16Sv4.515F.926R.qza cd /export/groups/drangeli/Rbon.htseq/16s cp /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts/silva132.99.16Sv4.515F.926R.qza . cp /export/groups/drangeli/marker.ref.DBs/silva132/qiime.artifacts/silva132.7level.taxonomy.qza . alias qiime 'docker run -t -i -v /export/groups/drangeli/Rbon.htseq/16s/:/data qiime2/core:2018.2 qiime' qiime feature-classifier fit-classifier-naive-bayes \ --i-reference-reads silva132.99.16Sv4.515F.926R.qza \ --i-reference-taxonomy silva132.7level.taxonomy.qza \ --o-classifier classifier.silva132.99.16Sv4.515F.926R.qza ``` This ran for about 7.5 hours on one thread of **nscc**, which is half the time it took my previous attempts. (I'm not sure if or how this qiime function can be multi-threaded). Now use the classifier to apply taxonomy to the OTU sequences. ```bash qiime feature-classifier classify-sklearn \ --i-reads Rbon.16s.rep.seqs.qza \ --i-classifier classifier.silva132.99.16Sv4.515F.926R.qza \ --o-classification Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.qza ``` This run took only about 40 minutes. ```bash qiime metadata tabulate \ --m-input-file Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.qza \ --o-visualization Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.qzv qiime tools export \ Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.qza \ --output-dir . mv taxonomy.tsv Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv awk '{ total += $3 } END { print total/NR }' Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv ``` **NO GOOD!** Of 1581 features, 1277 are listed as "Unassigned", 302 only as "Bacteria". The supports average only 0.435. Only one OTU had a match that went beyond the domain rank. Strangely this went all the way to the species level. It's the clam *Adrana scaphoides* with 0.82 posterior support. And ironically this species was not identified in the analysis based on the full-length Silva database. ## A complete, full-length Silva classifier After my efforts to customize a classifier for our 16S region produced poor results, I was fairly frustrated. However, I wanted to know if there was some technical obstacle to building *any* classifier in-house that would work well. So I reasoned that I should eb able to repeat the construction of a classifer similar to the one provided by the Qiime developers. I started from the `SILVA_132_SSURef_Nr99_tax_dna.fa` fasta file. This appears to contain *more* sequences than the Qiime classifier was made with. Going into this, I wasn't sure if or how this might affect the training or the performance of a classifier. ```bash cd /export/groups/drangeli/marker.ref.DBs/silva132/ grep '^>' SILVA_132_SSURef_Nr99_tax_dna.fa | sed -e 's/^>//' -e 's/ /\t/' > SILVA_132_SSURef_Nr99_tax.full.ids.txt alias qiime 'docker run -t -i -v /export/groups/drangeli/marker.ref.DBs/silva132/:/data qiime2/core:2018.2 qiime' qiime tools import \ --type 'FeatureData[Taxonomy]' \ --source-format HeaderlessTSVTaxonomyFormat \ --input-path SILVA_132_SSURef_Nr99_tax.full.ids.txt \ --output-path SILVA_132_SSURef_Nr99_tax.full.ids.qza qiime tools import \ --type 'FeatureData[Sequence]' \ --input-path SILVA_132_SSURef_Nr99_tax_dna.fa \ --output-path SILVA_132_SSURef_Nr99_tax_dna.qza qiime feature-classifier fit-classifier-naive-bayes \ --i-reference-reads SILVA_132_SSURef_Nr99_tax_dna.qza \ --i-reference-taxonomy SILVA_132_SSURef_Nr99_tax.full.ids.qza \ --o-classifier classifier.silva132.in.house.qza ``` This job took 59.5 hours. It appears that classifier training time has a nearly linear relationship to the size of the qza read file. ```bash cp classifier.silva132.in.house.qza ../../Rbon.htseq/16s/ cd ../../Rbon.htseq/16s/ alias qiime 'docker run -t -i -v /export/groups/drangeli/Rbon.htseq/16s/:/data qiime2/core:2018.2 qiime' qiime feature-classifier classify-sklearn \ --i-reads Rbon.16s.rep.seqs.qza \ --i-classifier classifier.silva132.in.house.qza \ --o-classification Rbon.16s.silva132.in.house.taxonomy.calls.qza qiime metadata tabulate \ --m-input-file Rbon.16s.silva132.in.house.taxonomy.calls.qza \ --o-visualization Rbon.16s.silva132.in.house.taxonomy.calls.qzv qiime tools export \ Rbon.16s.silva132.in.house.taxonomy.calls.qza \ --output-dir . mv taxonomy.tsv Rbon.16s.silva132.in.house.taxonomy.calls.tsv awk '{ total += $3 } END { print total/NR }' Rbon.16s.silva132.in.house.taxonomy.calls.tsv qiime taxa barplot \ --i-table Rbon.16s.feature.table.qza \ --i-taxonomy Rbon.16s.silva132.in.house.taxonomy.calls.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --o-visualization Rbon.16s.silva132.in.house.taxa.barplots.qzv ``` The results look very good. Most features are annotated with meaningful taxonomy, and the mean posterior support is 0.868 -- that's higher than for the classifier provided by the Qiime developers. My only guess for why they may have used fewer sequences is that Silva includes eukaryotes, which they may have chosen to omit. ## Conclusions on classification methods So among all the classifiers tried, the full-length Silva 132 small subunit classifiers appear to work best. I'm inclined to use our in-house classifier, since it will include all sequences from Silva, including eukaryotes. The one possible limitation is that it uses rank-free taxonomy, and while most entries have a 7-level system, some include intermediary taxon names. ### Some comparisons of classifier assignments ``` grep '26d495a601b91173185fbd11f24e5e08' *taxonomy.calls.tsv ``` * `Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv`: D_0__Bacteria 0.7539273545475154 * `Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv`: Unassigned 0.3691258975401842 * `Rbon.16s.silva132.in.house.taxonomy.calls.tsv`: Bacteria;Acidobacteria;Subgroup 22 0.9832846623962529 * `Rbon.16s.silva132.nb.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Acidobacteria;D_2__Subgroup 22 0.9836703149560827 The both full-length classifiers provide the same assignment. ``` grep 'b5b6bed3f76ee539bcb34b7c2114ef71' *taxonomy.calls.tsv ``` * `Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Vibrionales;D_4__Vibrionaceae;D_5__Vibrio;D_6__Vibrionaceae bacterium O6 0.7131749561096695 * `Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv`: Unassigned 0.31234089741539794 * `Rbon.16s.silva132.in.house.taxonomy.calls.tsv`: Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Photobacterium 0.9949968851654118 * `Rbon.16s.silva132.nb.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Vibrionales;D_4__Vibrionaceae;D_5__Photobacterium 0.9487619762137101 The both full-length classifiers provide the same assignment, the in-house built, full, complete classifier does so with a stronger posterior support. ``` grep '9edd3da2230ffbb71467fb945a010dcb' *taxonomy.calls.tsv ``` * `Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Betaproteobacteriales 0.9111723821159405 * `Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv`: Unassigned 0.4227193314226422 * `Rbon.16s.silva132.in.house.taxonomy.calls.tsv`: Bacteria;Proteobacteria;Gammaproteobacteria;Betaproteobacteriales;Burkholderiaceae 0.9999969848292178 * `Rbon.16s.silva132.nb.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Betaproteobacteriales;D_4__Burkholderiaceae 0.9999892474465235 Here both the full-length classifiers provide better taxonomic resolution than the narrower one. ``` grep '36d03f255959ac0365d8d0abc73feb9c' *taxonomy.calls.tsv ``` * `Rbon.16s.silva132.99.16Sv4.515F.806R.taxonomy.calls.tsv`: D_0__Bacteria 0.746289287673006 * `Rbon.16s.silva132.99.16Sv4.515F.926R.taxonomy.calls.tsv`: Unassigned 0.32227498644551766 * `Rbon.16s.silva132.in.house.taxonomy.calls.tsv`: Bacteria;Cyanobacteria;Oxyphotobacteria;Synechococcales;Cyanobiaceae;Cyanobium PCC-6307;uncultured bacterium 0.7355707797244803 * `Rbon.16s.silva132.nb.taxonomy.calls.tsv`: D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Synechococcales;D_4__Cyanobiaceae;D_5__Cyanobium PCC-6307;D_6__uncultured bacterium 0.8763101553425945 Here both full-length classifiers provide the same assignment, though the Qiime-provided classifier has higher support for this call. # Differential abundance analysis First, we'll need to calculate pseudocounts (counts +1) to avoid log-of-zero errors. ```bash qiime composition add-pseudocount \ --i-table Rbon.16s.feature.table.qza \ --o-composition-table Rbon.16s.comp.table.qza ``` ## ANCOM [ANCOM](https://www.ncbi.nlm.nih.gov/pubmed/26028277) (ANalysis of COMposition) is a statistical method for the comparisons of feature abundance in sample groups. It should be robust to distributional assumptions and scaling to large sample sizes. ANCOM also assumes that relatively few (< ~ 25%) of the features are changing between groups. ### By organ ```bash qiime composition ancom \ --i-table Rbon.16s.comp.table.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column organ \ --o-visualization Rbon.16s.ancom.by.organ.qzv ``` No OTUs show significant association with the stomach vs. intestine. ### By year ```bash qiime composition ancom \ --i-table Rbon.16s.comp.table.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column collYear \ --o-visualization Rbon.16s.ancom.by.year.qzv ``` Interestly, two OTUs show significant association by year. Both were over-represented in 2011, compared by 2012. In fact, no reads were found from this *Vibrio* sp. in 2012. | feature UUID | W | taxonomy call | confidence | |--|--|--|--| | 144dba7a99079c9775774544af80ca42 | 1528 | *Vibrio* sp. | 0.9995 | | 34d9042dcf9c529bf357aac6355abe06 | 1505 | uncultured Chthoniobacteraceae LD29 | 0.9982 | ### By month ```bash qiime composition ancom \ --i-table Rbon.16s.comp.table.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column collMonth \ --o-visualization Rbon.16s.ancom.by.month.qzv ``` | feature UUID | W | taxonomy call | confidence | |---|---|---|---| | c491836ac2179cae363ba311a9e98ead | 1562 | *Photobacterium damselae* | 0.7819 | | e74a51e9552452d32ea36b457758e12d | 1579 | *Enterobacter* sp. | 0.9173 | | 0ffb185d5281f23a82a09b383427053f | 1569 | *Flavobacterium* sp. Aav03 | 0.8691 | | 6961f6313f71b2f07aaa252f9c58e0f8 | 1560 | *Flavobacterium* sp. Aav03 | 0.8727 | | a0deaf35796444578dfce55727b7f911 | 1565 | unidentified Bacterium | 0.9875 | | ae0bdd8654b14be31d5979276f1a4020 | 1448 | Actinobacteria PeM15 | 0.9999 | <center> ![](16s/significant.seasonal.OTUs.pdf) </center> > **Fig. 1.** Fluctuations in the abdundance of species with significant differences by month. ### By day-of-year ```bash qiime composition ancom \ --i-table Rbon.16s.comp.table.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column collDay \ --o-visualization Rbon.16s.ancom.by.day.qzv ``` Twelve OTUs differ significantly. Check annotations against UUIDs of significant features, e.g. ```bash grep 'c491836ac2179cae363ba311a9e98ead' *.taxonomy.calls.tsv ``` ### day-of-year, adults only ```bash qiime feature-table filter-samples \ --i-table Rbon.16s.feature.table.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --p-where "age='A'" \ --o-filtered-table Rbon.16s.feature.table.adult.qza qiime composition add-pseudocount \ --i-table Rbon.16s.feature.table.adult.qza \ --o-composition-table Rbon.16s.comp.table.adult.qza qiime composition ancom \ --i-table Rbon.16s.comp.table.adult.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column collDay \ --o-visualization Rbon.16s.ancom.adult.by.day.qzv ``` ### By sex ```bash qiime composition ancom \ --i-table Rbon.16s.comp.table.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column sex \ --o-visualization Rbon.16s.ancom.by.sex.qzv ``` No significant differences. ### By age ```bash qiime composition ancom \ --i-table Rbon.16s.comp.table.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column age \ --o-visualization Rbon.16s.ancom.by.age.qzv ``` Four OTUs differed significantly by age and were found only in young-of-year. | feature UUID | taxonomy call | confidence | W | note | |---|---|---|---|---| | ecd9bf4303feb1b02ee495d7e16979b8 | Photobacterium damselae | 0.7383756928 | 1073 | same one that's significantly over-represented on day 246 | | a0deaf35796444578dfce55727b7f911 | unidentified Bacteria | 0.9875211043 | 1126 | same one that's significantly over-represented on the last collection day | | 013c17335a9a0d5a40db08476e3fb627 | unidentified Gammaproteobacteria | 0.8109698091 | 1116 | | | 481ed3c020440361f375cf61654dbc45 | unidentified Bacteria | 0.9851697847 | 1230 | | ### By capture gear ```bash qiime composition ancom \ --i-table Rbon.16s.comp.table.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column gearType \ --o-visualization Rbon.16s.ancom.by.gear.qzv ``` Eight OTUs differed significantly by capture method. ### By state, adults only ```bash qiime composition ancom \ --i-table Rbon.16s.comp.table.adult.qza \ --m-metadata-file Rbon.sample.metadata.quality.16S.tsv \ --m-metadata-column collState \ --o-visualization Rbon.16s.ancom.adults.by.state.qzv ``` ---- *Dave Angelini, 2018 --- [bugsinourbackyard.org](http://www.bugsinourbackyard.org/)*