# Some short notes on OTUs vs ASVs

> **Sorry this isn't more extensive, as this is a pretty nuanced topic unfortunately, but here are some short notes on this.**

I have a small section about this on a [page on my site here](https://astrobiomike.github.io/misc/amplicon_and_metagen#a-note-on-otus-vs-asvs) (screenshot below), with links to some relevant papers.

[![](https://i.imgur.com/PymuweN.png)](https://astrobiomike.github.io/misc/amplicon_and_metagen#a-note-on-otus-vs-asvs)

<br>

Beyond the text above and the practical side of things bulleted there, here are some main points that come to mind at the moment:

* With traditional OTU clustering we can get hundreds or even thousands of spurious OTUs from a mock community expected to hold about 20 organisms (e.g. [Edgar 2017](https://peerj.com/articles/3889/), [Prodan et al. 2020](https://doi.org/10.1371/journal.pone.0227434)).
    * Worth noting, Edgar specifically singles out QIIME a lot in that 2017 paper, to which Caporaso (QIIME developer) responded [here](https://forum.qiime2.org/t/a-response-to-accuracy-of-microbial-community-diversity-by-r-edgar-for-qiime-users/1456). In there Caporaso explicitly states (point 4) that they do not recommend OTU clustering anymore, so the disagreement is not there, to be clear.
    * These spurious OTUs are due to sequencing errors and chimeras (e.g. [Edgar and Flyvbjerg 2015](https://academic.oup.com/bioinformatics/article/31/21/3476/194979), [Edgar 2017](https://peerj.com/articles/3889/)).
* We typically get fewer representative sequences from any single-nucleotide resolution approach (e.g. minimum entropy decomposition, Deblur, DADA2, UNOISE) than we get from a traditional OTU-clustering approach, because all of them are better at dealing with the errors that would otherwise lead to spurious OTUs.
* Singletons (sequences appearing only once across the entire dataset) should be removed no matter what.
    * There is no way to distinguish true singletons from those introduced by errors.
    * The number of error-caused singletons (with typical, current Illumina sequencing) will always vastly outnumber the number of true, biologically sourced singletons.
        * Even with 99.99% accuracy (and ignoring polymerase error rates), we still get many erroneous singletons from each true sequence. Edgar lays this out pretty nicely [here](https://drive5.com/usearch/manual/tolstoy.html).
    * Even for the few (if any) singletons that are real, their utility or biological interpretation would be questionable (but again, we can't distinguish them from the much greater amount of noise we know is introduced through errors anyway).
* Even with singleton removal, traditional OTU-clustering methods generate many spurious OTUs.
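
To put rough numbers on the 99.99% accuracy point above, here is a minimal back-of-the-envelope sketch in Python. Everything specific in it is an assumption made for illustration and not taken from the linked pages: the read length (250 bp), the number of reads from one true sequence (10,000), and a toy error model where errors are independent substitutions spread uniformly over positions and bases. Real Illumina errors are not uniform, and this ignores chimeras and PCR error entirely, so treat the output as an order-of-magnitude illustration only.

```python
def expected_error_reads(n_reads, read_length, per_base_accuracy):
    """Expected number of reads carrying at least one error."""
    p_error_free = per_base_accuracy ** read_length
    return n_reads * (1 - p_error_free)


def expected_single_error_singletons(n_reads, read_length, per_base_accuracy):
    """Expected number of distinct one-substitution variants seen exactly once.

    Toy model (assumption): each base is miscalled independently, and a
    miscall is equally likely to be any of the 3 other bases.
    """
    per_base_error = 1 - per_base_accuracy
    # Each position can be miscalled as 3 other bases.
    n_variants = 3 * read_length
    # Probability that a given read is one specific single-substitution variant.
    p_variant = (per_base_error / 3) * per_base_accuracy ** (read_length - 1)
    # Probability that a given variant shows up exactly once among n_reads
    # (binomial with k = 1), summed over all possible variants.
    p_exactly_once = n_reads * p_variant * (1 - p_variant) ** (n_reads - 1)
    return n_variants * p_exactly_once


# Hypothetical example values, chosen only for illustration.
n_reads = 10_000
read_length = 250
accuracy = 0.9999

error_reads = expected_error_reads(n_reads, read_length, accuracy)
singletons = expected_single_error_singletons(n_reads, read_length, accuracy)

print(f"Expected reads with >= 1 error: {error_reads:.0f}")
print(f"Expected single-error singletons: {singletons:.0f}")
```

Under these assumptions, a single true sequence yields on the order of a couple of hundred erroneous unique sequences, most of them seen only once, which is why error-derived singletons swamp any real ones.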