Metagenomes and ARGs

## References:

### [Evaluation of Metagenomic-Enabled Antibiotic Resistance Surveillance at a Conventional Wastewater Treatment Plant](https://www.frontiersin.org/articles/10.3389/fmicb.2021.657954/full)

#### Sequence Analysis methods

1. Assemble Reads --> Contigs
   > Reads were assembled in MetaStorm using the IDBA-UD de novo assembler (Peng et al., 2012) according to default parameters to generate contigs for gene contextualization and clinically relevant pathogen-ARG screening.
2. Filter contigs by length and predict open reading frames (proteins) with Prodigal
   > Contigs were filtered for sequences ≥1000 bps then protein-coding open reading frames (ORFs) were predicted using Prodigal version 2.6.3 with the “-p meta” option (Hyatt et al., 2010).
3. Annotate predicted ORFs with ARG databases
   > Predicted ORFs were annotated with CARD and an in-house constructed MGE dataset (Arango-Argoty et al., 2019) using blastp in Diamond version 0.9.24 (Buchfink et al., 2015). Diamond alignments were filtered for stringent ARG and MGE annotation (80% identity, aa length ≥ 100, e-value ≤ 1e-10, bitscore ≥ 50).

   - **ARG Databases used**:
     - Comprehensive Antibiotic Resistance Database (CARD) version 2.0.1 (Jia et al., 2017) - but manually curated
     - Metagenomic Phylogenetic Analysis 2 (MetaPhlAn2) (Truong et al., 2015)
4. Taxonomic annotation of contigs using `Kraken2`
   > Each contig was assigned taxonomy using Kraken2 version 2.0.7 (Wood and Salzberg, 2014) with the Kraken2 standard database of complete bacterial, archaeal, and viral genomes in RefSeq.
5. Use [ExtrARG](https://github.com/gaarangoa/ExtrARG) to identify discriminatory ARGs.
   > The core resistome of the influent and secondary effluent was determined as any ARG with a non-zero value relative abundance detected across all sampling events. ExtrARG (Gupta et al., 2019), established based on the extremely randomized tree algorithm, was utilized to identify discriminatory ARGs (i.e., ARGs that collectively distinguish different wastewater samples) taking relative abundance into account.

### [Wastewater treatment plant resistomes are shaped by bacterial composition, genetic exchange, and upregulated expression in the effluent microbiomes](https://www.nature.com/articles/s41396-018-0277-8)

#### Sequence Analysis methods

> The bioinformatics and statistical analysis of metagenomes, metatranscriptomes, and 16S rRNA gene amplicon data are described in detail in the SI (Fig. S2). Identification of antibiotic, biocide and metal resistance genes was based on similarity search against a concatenated protein database of The NCBI Reference Sequence Database (RefSeq release 78) [30], The Comprehensive Antibiotic Resistance Database (CARD v1.0.1) [29, 31], Structured Antibiotic Resistance Genes Database (ARDB v1.1) [32], Antibacterial Biocide and Metal Resistance Genes Database (BacMet v1.1) [24] and functionally validated ARGs [33,34,35], followed by cross validation using hmmscan search against Resfams (v1.2) [36], keyword match, and manual inspection.

> ![](https://hackmd.io/_uploads/HyDUj-lo2.png)

- 16S workflow
  > QIIME --> get OTUs + do diversity analyses --> stats

  Good alternative: DADA2 workflow
- MGX workflow
  > Assemble contigs (IDBA-UD); predict Open Reading Frames (Prodigal); map reads to contigs (Bowtie2); Annotate contigs with multiple databases; stats

## Potential MGX workflow:

- Assemble reads --> contigs (e.g. MEGAHIT, then try other assemblers if you want)
- Map reads to contigs
  - Bowtie2, Minimap2, etc
- Predict open reading frames (proteins) with Prodigal
- Annotate contigs
  - protein annotation --> ARG databases
  - taxonomic annotation (w/ e.g. GTDB)
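
For the protein annotation step, here is a rough Python sketch of two ideas quoted from the first paper: the stringent Diamond hit filter (80% identity, aa length ≥ 100, e-value ≤ 1e-10, bitscore ≥ 50) and the core resistome definition (ARGs with non-zero relative abundance in every sampling event). It is not the authors' code. It assumes Diamond was run with the default 12-column tabular output (`--outfmt 6`) and that "aa length" means the alignment length column. The function names `filter_hits` and `core_resistome` and all example values are made up for illustration.

```python
import csv
import io

# Default column order of DIAMOND/BLAST tabular output (--outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_identity=80.0, min_length=100,
                max_evalue=1e-10, min_bitscore=50.0):
    """Keep alignments that pass all four thresholds quoted in the notes.

    Assumption: `length` (alignment length, in amino acids for blastp)
    is what the paper means by "aa length".
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if (float(hit["pident"]) >= min_identity
                and int(hit["length"]) >= min_length
                and float(hit["evalue"]) <= max_evalue
                and float(hit["bitscore"]) >= min_bitscore):
            kept.append(hit)
    return kept


def core_resistome(abundance_by_sample):
    """Return ARGs with non-zero relative abundance in every sample.

    `abundance_by_sample` maps sample name -> {ARG name: relative abundance}.
    An ARG missing from a sample's dict is treated as zero abundance.
    """
    samples = list(abundance_by_sample.values())
    if not samples:
        return set()
    core = {arg for arg, value in samples[0].items() if value > 0}
    for sample in samples[1:]:
        core &= {arg for arg, value in sample.items() if value > 0}
    return core


if __name__ == "__main__":
    # Made-up rows: the second fails on identity, the third on length.
    demo = (
        "orf1\tARG_A\t95.0\t150\t7\t0\t1\t150\t1\t150\t1e-50\t300\n"
        "orf2\tARG_B\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-40\t250\n"
        "orf3\tARG_C\t99.0\t40\t0\t0\t1\t40\t1\t40\t1e-12\t80\n"
    )
    for hit in filter_hits(io.StringIO(demo)):
        print(hit["qseqid"], hit["sseqid"])

    # Made-up relative abundances for three sampling events.
    print(core_resistome({
        "event1": {"ARG_A": 0.02, "ARG_B": 0.00},
        "event2": {"ARG_A": 0.01, "ARG_B": 0.03},
        "event3": {"ARG_A": 0.05},
    }))
```

In practice the `handle` would be an open Diamond output file, and the abundance table would come from read mapping (e.g. Bowtie2 counts normalized however you choose), which this sketch does not cover.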