---
title: Metagenomics/AMR
description: Information and notes on my studies and research of deep learning techniques in the context of antimicrobial resistance in metagenomics.
---
# AMR Studies
## Efforts to reproduce AMR++ v1
- still some differences when running my pipeline steering script.
- Nele and Judith are keeping intermediate files from some Galaxy runs
- need to use those to figure out where the differences come from.
- annotations differ, but only in that the "new" column with the RequiresSNP info is missing.
## AMR++ Additional Implementations
- DeepARG
- should take removed host reads as input
- metaxa2 workflow (after kraken run)
- kraken2
- figure out how to make kraken faster:
- maybe rather than:
- `kraken2 --memory-map S1 ; kraken2 --memory-map S2`
- do
- `kraken2 S1 S2`
- such that kraken loads the entire db into memory once and runs all samples in one process/call.
## Meeting w/ Judith, Nele, Steffen
- ToDo
- [ ] retrain DeepARG with MegaRes v3.0 database
- [ ] AMR++ v1 vs v3 validation
- sequence data located:
- on synology `/data_syn/sequenzdaten/`
- on server `/home/admin/Seqs/`
- start/continue common structure
## IOW / Christiane Hassenrück / OTC Genomic
- Christiane is asking whether I could potentially get access to their data and test our tool on it.
- links:
- https://www.oceantechnologycampus.com/
- https://www.io-warnemuende.de/otcg-home-de.html
- https://github.com/tseemann/abricate (Christiane has only used abricate so far)
- not a fan of analysis with short reads as the only input; would rather run it after assembly, i.e. contigs as input.
- does the FLI cooperate with the DSMZ in Braunschweig?
- can we (Fabian, I and the FLI) tie in with established structures?
## Reference Dump
- PhD thesis https://vtechworks.lib.vt.edu/items/4bd67860-7df9-41c5-893f-0b0e284a5eb1
- blasting:
- https://doi.org/10.1186/1471-2105-10-421
- https://doi.org/10.1016/S0022-2836(05)80360-2
- Follow-up articles from DeepARG secondary authors:
- DeepMRG: https://doi.org/10.1101/2023.11.14.566903
- ARGEM: https://doi.org/10.3389/fgene.2023.1219297
- HT-ARGfinder: https://doi.org/10.3389/fenvs.2022.901917
- PhD thesis https://vtechworks.lib.vt.edu/handle/10919/106511
- BRENDA enzyme db https://www.brenda-enzymes.org/enzyme.php?ecno=3.5.2.6
- article https://towardsdatascience.com/building-machine-learning-models-for-predicting-antibiotic-resistance-7640046a91b6
## Pipelines/Workflows
### funcscan (Pipeline)
- https://github.com/nf-core/funcscan
- includes AMRFinderPlus, ABRicate, DeepARG, RGI, fARGene
### ResistoXplorer (Pipeline)
- https://www.resistoxplorer.no/ResistoXplorer/faces/docs/FaqView.xhtml#input2
- uses both AMR++ v2 and DeepARG ?
### AMRFinderPlus (Workflow)
- https://github.com/ncbi/amr
### AMR++ (Workflow)
- https://www.meglab.org/amrplusplus/
- version 3 https://github.com/Microbial-Ecology-Group/AMRplusplus/tree/master
- version 1 (credentials user:admin@galaxy.org pw:admin)
- Galaxy shed [link](https://toolshed.g2.bx.psu.edu/repository?repository_id=f249d27395ea9e5b&changeset_revision=c9fbf44a96f7)
- github repo [link](https://github.com/cdeanj/galaxytools/tree/master/workflows/amrplusplus)
<details>
<summary>AMR++ version comparison</summary>

Taking the test input from v3:

- reads: `data/raw/*_R{1,2}.fastq.gz`
- host: `data/host/chr21.fasta.gz`
- host_index: `data/host/chr21.fasta.gz`
- amr_index: `data/amr/megares_database_v3.00.fasta`
- amr: `data/amr/megares_database_v3.00.fasta`
- annotation: `data/amr/megares_annotation_v3.00.csv`

v3 defaults, with the v1 value in parentheses:

- trimmomatic
  - leading = 3 (3 - not applied)
  - trailing = 3 (3 - not applied)
  - slidingwindow = 4:15 (4:20)
  - minlen = 36 (20 - not applied)
- resistome
  - threshold = 80 (1)
- rarefaction
  - min = 5
  - max = 100
  - threshold = 80
  - skip = 5
  - samples = 1

- datasets in S3 Galaxy queue
- 1126 -> ch21.fasta
- 1124 -> S3_test_R1.fastq
- 1125 -> S3_test_R2.fastq
- 1127 -> megares_annotation_v3.00.csv
- 1128 -> megares_database_v3.00.fasta
- 1244 -> R1.paired.fastq
- 1245 -> R2.paired.fastq
- step 10 Map with BWA
- 1248 -> alignment_sorted.bam
- step 11 Filter by flag
- in v1 `samtools view ... -f 0x0004 -f 0x0008 -f 0x0001 ...`
- in v3 `samtools view ... -f 12 ...` equivalent to `samtools view ... -f 0x4 -f 0x8 ...`
- 1249 -> alignment_sorted_filtered.
- step 12 Sort again
- 1250 -> alignment_sorted_filtered_sorted.bam
- step 13
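
The flag filters from step 11 can be compared with a bit of flag arithmetic. This is only a sketch: it assumes that repeated `-f` options are OR-ed into one mask (as the note above implies) and that `-f` keeps a read only when all bits of the mask are set. The helper names are made up for illustration.

```python
# SAM flag bits (values as defined in the SAM format specification).
PAIRED = 0x1         # read is paired in sequencing
UNMAPPED = 0x4       # read itself is unmapped
MATE_UNMAPPED = 0x8  # mate is unmapped


def combine(*bits: int) -> int:
    """OR several flag bits into one mask (assumption: this is how repeated -f behaves)."""
    mask = 0
    for bit in bits:
        mask |= bit
    return mask


def passes(flag: int, mask: int) -> bool:
    """Mimic `samtools view -f mask`: keep the read only if all mask bits are set."""
    return flag & mask == mask


V1_MASK = combine(UNMAPPED, MATE_UNMAPPED, PAIRED)  # -f 0x0004 -f 0x0008 -f 0x0001
V3_MASK = combine(UNMAPPED, MATE_UNMAPPED)          # -f 12

# Flag 77 = paired + unmapped + mate unmapped + first in pair.
both_unmapped_pair = 77
# Flag 73 = paired + mate unmapped + first in pair (the read itself is mapped).
only_mate_unmapped = 73

print(V1_MASK, V3_MASK)
print(passes(both_unmapped_pair, V1_MASK), passes(both_unmapped_pair, V3_MASK))
print(passes(only_mate_unmapped, V1_MASK), passes(only_mate_unmapped, V3_MASK))
```

Under these assumptions the two filters only differ for reads without the paired bit, so on purely paired-end input they should select the same reads.
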
#### Conclusion
Two major differences between v1 and v3:
1. v1 only uses the sliding window (`4:20`), while v3 also applies leading, trailing and minimum-length trimming and uses a more lenient sliding-window quality threshold (`4:15`).
1. v1 uses a different alignment method: `bwa aln` instead of `bwa mem`.
Recreation of v1 in v3 nextflow framework: [stalbrec/AMRplusplus](https://github.com/Microbial-Ecology-Group/AMRplusplus/compare/master...stalbrec:AMRplusplus:amrpp_v1_recreation)
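
To get a feeling for what the two sliding-window settings do to a read, here is a simplified sketch. It is not Trimmomatic's actual implementation: it just cuts the read at the start of the first window whose mean quality falls below the threshold. The function name and the quality values are made up for illustration.

```python
from typing import Sequence


def sliding_window_keep(quals: Sequence[int], window: int, min_mean: float) -> int:
    """Return how many leading bases are kept.

    Simplified rule: scan from the 5' end and cut at the start of the first
    window whose mean quality is below `min_mean`. Reads shorter than the
    window are left untouched (an assumption of this sketch).
    """
    if len(quals) < window:
        return len(quals)
    for start in range(len(quals) - window + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / window < min_mean:
            return start
    return len(quals)


# Hypothetical per-base Phred qualities of one read, degrading towards the 3' end.
quals = [30, 30, 30, 30, 18, 18, 18, 18, 10, 10]

kept_v1 = sliding_window_keep(quals, window=4, min_mean=20)  # v1: 4:20
kept_v3 = sliding_window_keep(quals, window=4, min_mean=15)  # v3: 4:15
print(kept_v1, kept_v3)
```

For this toy read the `4:20` setting keeps fewer bases than `4:15`, which fits the reading that the v3 threshold is the more lenient one.
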
</details>
### DeepARG
<details>
<summary>Workflow steps</summary>
1. Trimmomatic
- trim & QC on paired end input
- input: forward and reverse read e.g. `F.fq.gz` and `R.fq.gz`
- output:
- `F.fq.gz.paired` surviving pairs
- `R.fq.gz.unpaired` surviving reverse reads whose mate was dropped
- `F.fq.gz.unpaired` surviving forward reads whose mate was dropped
- rest discarded
- e.g. 25k read pairs
- 24246 pairs passing in both F and R
- 519 passing in F only
- 216 passing in R only
- (29 failing in both F and R)
2. merge `F.paired` and `R.paired`
- vsearch
- input `R.paired` and `F.paired`
- out `F.unmerged`, `R.unmerged` and `merged` $\rightarrow$ combined into `reads.clean`
3. DeepARG Inference
- Diamond $\rightarrow$ database with 12279 features (gene sequences)
- short reads are aligned against these features
- outputs `.tsv` file $\rightarrow$ hits with metrics
```
X = [
           feature 1   feature 2   ...     (columns: bit-score per feature)
  read 1       .           .
  read 2       .           .
  ...
]
```
- DeepARG predict
- score $\geq 0.9 \rightarrow$ single hit
- score $<0.9 \rightarrow$ first two leading hits
- read-id; predicted class; prediction score (probability)
- output files
- mapping $\rightarrow$ Diamond best hit info + prediction
:::danger
The best hit might be the wrong one: the selection relies on the hits being sorted with bit-score as the primary sort key, but Diamond seems to sort them in descending alphabetical order (I think!) $\rightarrow$ check this!!
:::
4. Quantification of results
- pair the best hit from the alignment that matches the predicted class
5. Normalization
1. `bowtie2` alignment of the reads against gg13
2. Take $N_\mathrm{16s}$ from reads and normalize ARG counts using $N_\mathrm{16s}/L_\mathrm{16s}$, where $L_\mathrm{16s} = 1432$ (**why?**)
3. ARG counts are normalized: $N_\mathrm{sub-type} = (N_\mathrm{sub-type, raw}/L_\mathrm{sub-type~gene})/(N_\mathrm{16s}/L_\mathrm{16s})$
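
The normalization from step 5 written out as a small function. This is a minimal sketch that only encodes the formula above; the function name and the example counts are made up, and $L_\mathrm{16s} = 1432$ is simply taken from the notes.

```python
L_16S = 1432  # 16S gene length used in the notes above


def normalize_arg_count(n_raw: float, gene_length: float,
                        n_16s: float, l_16s: float = L_16S) -> float:
    """Length-normalized ARG count relative to the length-normalized 16S count.

    Implements (N_raw / L_gene) / (N_16s / L_16s).
    """
    if gene_length <= 0 or n_16s <= 0 or l_16s <= 0:
        raise ValueError("gene_length, n_16s and l_16s must be positive")
    return (n_raw / gene_length) / (n_16s / l_16s)


# Hypothetical numbers: 50 reads on a 1000 bp ARG, 716 reads on 16S.
normalized = normalize_arg_count(n_raw=50, gene_length=1000, n_16s=716)
print(normalized)
```

With these made-up counts the ARG coverage per base (0.05) is a tenth of the 16S coverage per base (0.5), so the normalized value comes out at about 0.1 ARG copies per 16S copy.
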
</details>
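
A sketch of the two bookkeeping parts of step 3 (DeepARG inference) in the workflow above: building the read-by-feature bit-score matrix `X` from alignment hits, and applying the 0.9 rule to the predicted class probabilities. This is not DeepARG's own code. The hit tuples, feature names and class names are hypothetical, and filling missing hits with 0 and keeping the maximum bit-score per read/feature pair are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Hit = Tuple[str, str, float]  # (read id, feature id, bit-score)


def build_bitscore_matrix(hits: Iterable[Hit],
                          features: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Return (read ids, X) with one row per read and one column per feature."""
    column = {feature: j for j, feature in enumerate(features)}
    rows: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(features))
    for read_id, feature_id, bitscore in hits:
        j = column[feature_id]
        # Assumption: keep the highest bit-score if a pair occurs more than once.
        rows[read_id][j] = max(rows[read_id][j], bitscore)
    read_ids = sorted(rows)
    return read_ids, [rows[read_id] for read_id in read_ids]


def select_predictions(class_probs: Dict[str, float],
                       threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Top class if its score is >= threshold, otherwise the two leading classes."""
    ranked = sorted(class_probs.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    return ranked[:1] if ranked[0][1] >= threshold else ranked[:2]


# Hypothetical hits, e.g. parsed from the alignment `.tsv`.
features = ["geneA", "geneB", "geneC"]
hits = [("r1", "geneA", 55.0), ("r1", "geneB", 40.0),
        ("r2", "geneB", 61.5), ("r1", "geneA", 50.0)]
read_ids, X = build_bitscore_matrix(hits, features)

confident = select_predictions({"beta-lactam": 0.95, "tetracycline": 0.03})
ambiguous = select_predictions({"beta-lactam": 0.6, "tetracycline": 0.3, "mls": 0.1})
print(read_ids, X, confident, ambiguous)
```

The model that turns a row of `X` into class probabilities is left out on purpose; only the input layout and the reporting rule are shown.
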
### hAMRonization
- https://github.com/pha4ge/hAMRonization
- harmonizes output format of various AMR detection tools
- ABRicate https://github.com/tseemann/abricate
- ARIBA https://github.com/sanger-pathogens/ariba
- c-SSTAR https://github.com/chrisgulvik/c-SSTAR
- fargene https://github.com/fannyhb/fargene
- GROOT https://github.com/will-rowe/groot
- kmerresistance https://bitbucket.org/genomicepidemiology/kmerresistance/src/master/
- mykrobe https://github.com/Mykrobe-tools/mykrobe
- PointFinder (now in ResFinder) (https://doi.org/10.1093/jac/dkx217, https://bitbucket.org/genomicepidemiology/pointfinder, https://bitbucket.org/genomicepidemiology/resfinder)
- ResFams https://github.com/dantaslab/resfams
- ResFinder https://bitbucket.org/genomicepidemiology/resfinder http://genepi.food.dtu.dk/resfinder
- The Resistance Gene Identifier https://github.com/arpcard/rgi
- sraX https://github.com/lgpdevtools/sraX
- SRST2 https://github.com/katholt/srst2
- staramr https://github.com/phac-nml/staramr
- TBProfiler https://tbdr.lshtm.ac.uk/ https://github.com/jodyphelan/TBProfiler
### Other Tools/collections
- https://github.com/topics/antimicrobial-resistance-genes
- https://github.com/topics/antimicrobial-resistance
- https://nf-co.re/deepvariant/1.0