# Microbial Ecology Tools [Breakout Session]
#### [Mike Lee](https://astrobiomike.github.io/) ([@AstrobioMike](https://twitter.com/AstrobioMike)) and [Sabah Ul-Hasan](https://github.com/sul-hasan) (@s_ulhasan) <br> July 10th, 2018 [~2 hours]
This is a process-oriented, rather than a tool/command line-oriented, discussion. Here we will (1) take a broad-level view of amplicon sequencing and metagenomics as approaches for microbial ecology, (2) discuss the strengths and weaknesses of both, and (3) outline the general workflows for each.
___
## What is microbial ecology?
Microbial ecology adds environmental context to microbiology. Characterizing the microbial ecology of the marine environment, for example, can give us insight into biogeochemical processes such as the carbon cycle.
![](https://i.imgur.com/lu5LAlO.jpg)
[Worden et al. 2015](http://science.sciencemag.org/content/347/6223/1257594.figures-only)
Here we are focusing on two commonly used high-throughput sequencing methods (of many -omics types) for characterizing these microbial communities. **(a)** One approach, which arose in the early 2000s, is targeted (amplicon) sequencing. **(b)** The other, initially used in the late 2000s and gaining popularity in the early 2010s, is metagenomic (shotgun) sequencing.
___
## (1) A broad-level view
#### (a) Amplicon sequencing
Amplicon sequencing of marker genes (e.g., 16S, 18S, ITS) is one of the first tools in the microbial ecologist's toolkit. It is a broad-level survey of community composition and is most often used for hypothesis generation. It's better to think of it as the beginning of a process rather than the end.
![](https://i.imgur.com/45ZAoct.png)
**[Question 1] In what ways have you utilized amplicon sequencing and/or have found it useful?**
#### (b) Metagenomics
Shotgun metagenomic sequencing provides a way to access all the DNA of a mixed community. Rather than amplifying one targeted gene with specific primers, it sequences randomly fragmented DNA, and it therefore suffers much less from PCR bias.
**[Question 2] What could still be a bias of metagenomic sequencing?**
![](https://i.imgur.com/tzgqgGx.png)
[Ji and Nielsen 2014](https://www.dovepress.com/new-insight-into-the-gut-microbiome-through-metagenomics-peer-reviewed-fulltext-article-AGG)
___
## (2) Strengths and weaknesses of each approach
What does each of these approaches tell us? What are their limitations?
#### (a) Amplicon sequencing
* **Useful for**
    * one *metric* of community composition
        * recovered gene copies ≠ counts of organisms
            * (because of gene-copy number and PCR bias)
        * you're getting a snapshot of, e.g., "16S gene fragment copy numbers recovered"
    * can track changes in community structure in response to a treatment and/or across environmental gradients
    * can provide strong support for further investigation, particularly when using single-nucleotide-resolution methods
        * e.g. *Trichodesmium–Alteromonas* story ([paper here](https://www.nature.com/articles/ismej201749))
    * relatively cheap, and requires less processing power and time than metagenomics
* **Not useful for**
    * abundance of organisms (or relative abundance)
        * recovered gene copies ≠ counts of organisms
            * gene-copy number
            * PCR bias (small scale) -> under/over-representation
            * PCR bias (large scale) -> "universal" primers, only looking for what we know
    * function
        * Even if an amplicon sequence resolves an organism's taxonomy well, it is still only one fragment of one gene; extrapolating from it to a genome's functional potential is not sound because most microbes' accessory genomes are highly variable.
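As a toy illustration of why recovered gene copies ≠ counts of organisms, here is a minimal Python sketch with made-up read counts and assumed 16S copy numbers per genome (the taxon names and all numbers are hypothetical). Dividing reads by copy number changes which taxon looks dominant. This only illustrates the copy-number problem: real copy numbers are often unknown or only estimated, and the PCR biases listed above still remain.

```python
# Toy example: hypothetical 16S read counts and assumed 16S copy numbers per genome.
reads = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
copies_16s = {"taxon_A": 6, "taxon_B": 1, "taxon_C": 2}


def to_fractions(values: dict) -> dict:
    """Convert raw values to fractions that sum to 1."""
    total = sum(values.values())
    return {name: v / total for name, v in values.items()}


# Fraction of recovered 16S reads (what the amplicon data directly report).
read_fractions = to_fractions(reads)

# Naive copy-number-adjusted estimate: reads divided by assumed copies per genome.
adjusted = to_fractions({name: reads[name] / copies_16s[name] for name in reads})

for name in reads:
    print(f"{name}: {read_fractions[name]:.2f} of reads, "
          f"~{adjusted[name]:.2f} after copy-number adjustment")
```

Here taxon_A makes up 0.60 of the reads but drops to about 0.22 once its six assumed gene copies are accounted for, while taxon_B rises from 0.30 to about 0.67.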
#### (b) Metagenomics
* **Useful for**
    * functional potential
    * insights into the "unculturables"
    * much better for "relative" abundance (not true abundance)
        * still some caveats, like cell lysis efficiencies
* **_Not_ useful for**
    * "activity"
        * neither is transcriptomics or proteomics, for that matter; life is complicated 🙂
    * distinguishing between active, dormant, and dead cells
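Part of why metagenomic read counts still are not "true" abundance is that, at the same sequencing depth, a larger genome yields more reads per cell. Below is a minimal Python sketch of length-normalized relative abundance; the genome names, mapped-read counts, and genome sizes are hypothetical, and it ignores lysis efficiency, ambiguous read mapping, and the other caveats above.

```python
# Hypothetical reads mapped to each recovered genome, and genome lengths in bp.
mapped_reads = {"genome_A": 50_000, "genome_B": 50_000}
genome_length_bp = {"genome_A": 5_000_000, "genome_B": 2_000_000}


def length_normalized_abundance(reads: dict, lengths: dict) -> dict:
    """Relative abundance after dividing each read count by its genome length."""
    reads_per_bp = {g: reads[g] / lengths[g] for g in reads}
    total = sum(reads_per_bp.values())
    return {g: v / total for g, v in reads_per_bp.items()}


abundance = length_normalized_abundance(mapped_reads, genome_length_bp)
print(abundance)
```

Both genomes recruit the same number of reads, but after normalization genome_B comes out at about 0.71 and genome_A at about 0.29, because genome_B's reads are spread over a smaller genome.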
**[Question 3] Should we expect relative abundance of amplicon sequencing to match up with metagenomics?**
A recent assessment from [Tessler et al. 2017](https://www.nature.com/articles/s41598-017-06665-3)
![](https://i.imgur.com/gKI5XyD.png)
### Best practice?
Select the practices best suited to addressing your research question(s). Microbial informatics may be sufficient, traditional microbiology and culturing methods may be more suitable, or a combination of both may be needed. Available resources and time may also narrow your choices.
See, for example, [Knight et al. 2018](https://www.nature.com/articles/s41579-018-0029-9)
___
## (3) General workflows
#### (a) Amplicon sequencing
You can download this overview PDF [here](https://ndownloader.figshare.com/files/12367181) if you'd like
**OTUs** vs **ASVs**
> All sequencing technologies make mistakes, and, to a much lesser extent, so do polymerases. These mistakes artificially inflate the number of unique sequences in a sample, by a lot. Clustering similar sequences together (generating OTUs) emerged as one way to mitigate error and summarize data, though often at the cost of resolution.
* OTUs (operational taxonomic units)
    1. cluster sequences into groups based on percent similarity
    2. choose a representative sequence for that group
    * closed reference (sequences are clustered against a reference database)
        * **\+** can compare across studies
        * **\-** reference biased and constrained
    * de novo (sequences are clustered against each other)
        * **\+** can capture novel diversity
        * **\-** not comparable across studies
        * **\-** diversity of the sample affects which OTUs are generated
* ASVs (amplicon sequence variants)
    1. attempt to identify the original biological sequences while accounting for error
    * **\+** enables single-nucleotide resolution
    * **\+** can compare across studies
    * **\+** can capture novel diversity
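To make the OTU/ASV contrast concrete, here is a toy Python sketch on made-up 40-bp reads: two real variants differing at a single position, plus a few low-abundance error reads. `greedy_otus` does abundance-ordered de novo clustering at 97% identity. `toy_denoise` uses a simple abundance-skew rule loosely inspired by UNOISE-style denoising; its threshold function and `alpha=2` are illustrative assumptions, not the actual error models of dada2 or usearch. Both functions are hypothetical helpers that assume equal-length, pre-aligned sequences and count mismatches instead of doing real alignments.

```python
from collections import Counter


def mutate(seq: str, pos: int, base: str) -> str:
    """Return seq with a single substitution at pos (used to build toy data)."""
    return seq[:pos] + base + seq[pos + 1:]


def mismatches(a: str, b: str) -> int:
    """Count differing positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("toy example assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def greedy_otus(counts: Counter, min_identity: float = 0.97) -> dict:
    """De novo greedy clustering: the most abundant unassigned sequence seeds an OTU."""
    centroids = {}  # centroid sequence -> total reads in the OTU
    for seq, n in counts.most_common():
        for centroid in centroids:
            if 1 - mismatches(seq, centroid) / len(seq) >= min_identity:
                centroids[centroid] += n
                break
        else:  # no centroid was similar enough, so this sequence starts a new OTU
            centroids[seq] = n
    return centroids


def toy_denoise(counts: Counter, alpha: float = 2.0) -> dict:
    """Absorb a sequence into a more abundant one when it is rare relative to it;
    the allowed abundance ratio shrinks as the mismatch distance grows."""
    variants = {}  # inferred variant -> total reads assigned to it
    for seq, n in counts.most_common():
        for parent in variants:
            d = mismatches(seq, parent)
            if n / counts[parent] <= 1 / 2 ** (alpha * d + 1):
                variants[parent] += n
                break
        else:  # too abundant to be an error of any existing variant
            variants[seq] = n
    return variants


true_a = "ACGT" * 10                 # hypothetical 40-bp variant
true_b = mutate(true_a, 0, "T")      # a second real variant, 1 nt different
counts = Counter({
    true_a: 1000,
    true_b: 400,
    mutate(true_a, 20, "G"): 8,      # sequencing-error reads derived from true_a
    mutate(true_a, 33, "T"): 5,
})

print("unique sequences:", len(counts))
print("97% OTUs:", len(greedy_otus(counts)))
print("toy denoised variants:", len(toy_denoise(counts)))
```

With these numbers, the four unique sequences collapse into a single 97% OTU (the two real variants are lumped together, since one mismatch in 40 bp is 97.5% identity), while the toy denoiser keeps `true_a` and `true_b` apart and folds the two error reads into `true_a`.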
#### (b) Metagenomics
You can download this overview PDF [here](https://ndownloader.figshare.com/files/12367187) if you'd like
**Recovering genomes from metagenomes**
* you can download the [Keynote slide here](https://ndownloader.figshare.com/files/12367211) and a [PowerPoint version here](https://ndownloader.figshare.com/files/12367226)
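Many genome-binning approaches group assembled contigs using sequence composition (such as tetranucleotide frequencies) together with coverage across samples. As a minimal sketch of just the composition part, the Python below computes 4-mer frequency profiles for hypothetical toy contigs and compares them by Euclidean distance. Real tools typically also merge reverse-complement k-mers, require much longer contigs, and combine composition with coverage; none of that is modeled here.

```python
from itertools import product
from math import dist  # Python 3.8+

# All 256 possible 4-mers (reverse complements are NOT merged in this sketch).
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(seq: str) -> list:
    """Frequency of each 4-mer in seq, skipping windows with non-ACGT bases."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero on very short input
    return [counts[k] / total for k in KMERS]


# Hypothetical toy contigs: contig_2 is compositionally similar to contig_1.
contigs = {
    "contig_1": "ACGT" * 50,
    "contig_2": "ACGT" * 30 + "ACGA" + "ACGT" * 19,
    "contig_3": "AATT" * 50,
}
profiles = {name: tetranucleotide_profile(s) for name, s in contigs.items()}

for other in ("contig_2", "contig_3"):
    print(other, round(dist(profiles["contig_1"], profiles[other]), 3))
```

Contigs from the same genome tend to have similar composition, so a smaller distance (contig_1 vs contig_2 here) is one line of evidence for placing them in the same bin.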
**[Question 5] What else might you add to these workflows, and what would you like to focus on in the tutorials later this week?**
___
### Example tutorials for amplicon data
* **dada2** (ASVs)
    * [Developer tutorial](https://benjjneb.github.io/dada2/tutorial.html)
    * Mike's [dada2 tutorial](https://astrobiomike.github.io/amplicon/dada2_workflow_ex), with a separate section for dealing with 18S mixed in with 16S
* **usearch/vsearch** (ASVs and OTUs)
    * [usearch](https://www.drive5.com/usearch/) is not entirely free, but it has some very useful tools and approaches (e.g. good calculation of hybrid quality scores after merging overlapping reads). There is a lightweight free version that can still do many things.
    * [vsearch](https://github.com/torognes/vsearch/wiki/VSEARCH-pipeline) is completely free and open source, and was made in response to usearch not being completely free. It does not, however, have all of the capabilities of usearch.
    * Mike's [usearch/vsearch tutorial](https://astrobiomike.github.io/amplicon/workflow_ex)
* **mothur** (currently OTUs only)
    * [Developer tutorial](https://www.mothur.org/wiki/MiSeq_SOP)
* **qiime2**
    * qiime2 provides an environment that "wraps" (employs) other processing tools (like those above) and also provides convenient visualization capabilities
    * they have [extensive documentation](https://docs.qiime2.org/2018.6/) and an [amplicon tutorial here](https://docs.qiime2.org/2018.6/tutorials/moving-pictures/)
### Example tutorials for metagenomic data
As one might guess, this is not as straightforward as the amplicon data tutorials. A nice workflow leading up to and including recovering genomes can be found [here](http://merenlab.org/tutorials/infant-gut/) at the [anvi'o site](http://merenlab.org/software/anvio/) (along with other very informative/helpful tutorials and blogs).