Handling statistics properly in my large data set analyses is always a difficult topic for me. Here's a collection of papers and discussions on the best ways to approach it. This page is a work in progress, assembled from pasted notes.
Found two Stack Exchange discussions on how to handle p-value corrections with genomic datasets:
The Benjamini-Yekutieli (BY) method was designed to handle correlated test results: it controls the FDR under arbitrary dependence between tests. The trade-off is that it is more conservative than the Benjamini-Hochberg (BH) procedure, which assumes independence or positive dependence.
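To make the BH/BY difference concrete, here is a minimal pure-Python sketch of both step-up adjustments (BY is just BH with an extra harmonic-sum penalty c(m)); function names and the toy p-values are mine:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def by_adjust(pvals):
    """Benjamini-Yekutieli: BH scaled by c(m) = sum_{i=1}^{m} 1/i, capped at 1."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    return [min(1.0, p * c_m) for p in bh_adjust(pvals)]

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("BH:", [round(p, 4) for p in bh_adjust(pvals)])
print("BY:", [round(p, 4) for p in by_adjust(pvals)])
```

Every BY-adjusted value is at least its BH counterpart, which is the "more conservative" behavior described above.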
[…] I would suggest relying on FDR-based or maxT tests as implemented in the multtest R package (see mt.maxT), but the definitive guide to resampling strategies for genomic applications is Multiple Testing Procedures with Applications to Genomics, by Dudoit & van der Laan (Springer, 2008). See also Andrea Foulkes's book on genetics with R, which is reviewed in the JSS. She has great material on multiple testing procedures.
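For intuition about what a maxT resampling adjustment does, here is an illustrative sketch of a Westfall-Young-style maxT permutation procedure for a two-group comparison across many features. This is my own sketch in Python, not the actual code or API of multtest's mt.maxT; names like `maxT_adjusted_pvalues` and `n_perm` are assumptions:

```python
import numpy as np

def maxT_adjusted_pvalues(X, labels, n_perm=2000, seed=0):
    """X: (n_samples, n_features) matrix; labels: binary group membership.
    Returns maxT-adjusted p-values based on absolute t-like statistics."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)

    def abs_stats(lab):
        a, b = X[lab], X[~lab]
        # Welch-style t statistic per feature (no small-sample correction).
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
        return np.abs((a.mean(axis=0) - b.mean(axis=0)) / se)

    t_obs = abs_stats(labels)
    # Null distribution of the MAXIMUM statistic across all features,
    # which is what gives maxT its strong family-wise error control.
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        max_null[i] = abs_stats(rng.permutation(labels)).max()
    # Adjusted p: how often the permutation max reaches each observed stat.
    return (1 + (max_null[:, None] >= t_obs[None, :]).sum(axis=0)) / (n_perm + 1)

# Toy usage: 40 samples, 50 features, one planted group effect.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 50))
X[:20, 0] += 3.0  # real effect in the first feature only
p = maxT_adjusted_pvalues(X, [True] * 20 + [False] * 20, n_perm=500)
print(p[0], p[1:].min())
```

Because the permutations preserve the correlation structure across features, this kind of adjustment adapts to dependence rather than assuming independence, which is why the answer above recommends it for genomic data.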
An interesting paper showing that ANCOM and ALDEx2 are the most robust differential abundance analysis (DAA) methods.
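A minimal sketch of the compositional idea underlying ALDEx2-style DAA: sequencing counts are relative, so tests are run on centered log-ratio (CLR) transformed values rather than raw counts. The pseudocount and function name here are illustrative choices of mine, not ALDEx2's actual API:

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Row-wise centered log-ratio transform of a sample-by-taxon count matrix."""
    logx = np.log(counts + pseudocount)
    # Subtracting each row's mean log puts samples on a common relative scale.
    return logx - logx.mean(axis=1, keepdims=True)

counts = np.array([[10, 0, 90], [20, 5, 75]])
print(clr(counts))
```

Each CLR-transformed row sums to zero by construction, which is what makes downstream per-taxon tests comparable across samples with different sequencing depths.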
This one is interesting because it states:
Based on the benchmarking study, we conclude that none of the existing DAA methods evaluated can be applied blindly to any real microbiome dataset. The applicability of an existing DAA method depends on specific settings, which are usually unknown a priori. To circumvent the difficulty of selecting the best DAA tool in practice, we design ZicoSeq, which addresses the major challenges in DAA and remedies the drawbacks of existing DAA methods. ZicoSeq can be applied to microbiome datasets from diverse settings and is a useful DAA tool for robust microbiome biomarker discovery.