[TOC]
# Bit 150 Project 3 - Identifying the source of virulence
## INTRODUCTION
Since the 2011 outbreak of pathogenic E. coli, different bioinformatic tools have been used to study the molecular epidemiology of these microorganisms. The Comprehensive Antibiotic Resistance Database (CARD) is an active bioinformatic database of resistance genes, their products, and associated phenotypes. The Resistance Gene Identifier (RGI) is a bioinformatic tool offered by CARD that predicts resistomes from protein or nucleotide data based on homology and single nucleotide polymorphism (SNP) models. RGI can be run from the command line on a cluster after loading it through Conda. RGI analyzes nucleotide or protein sequences under three paradigms: Perfect, Strict, and Loose. The Perfect algorithm detects perfect matches to the reference sequences and mutations curated in CARD. The Strict algorithm detects previously unknown variants of known antimicrobial resistance (AMR) genes, using detection models with CARD's curated similarity cut-offs to make sure that a detected variant is likely a functional AMR gene. The Loose algorithm works outside of the detection model cut-offs to look for new AMR genes. RGI writes output files such as JSON (JavaScript Object Notation) files, which can be used as input to web applications. The RGI heatmap command creates images for a color-coded comparison of the genomes. In this project, CARD and RGI were used to compare the virulent E. coli O104:H4 strain TY-2482 with the avirulent O104:H4 reference strain Ec55989.
## METHODS
The following steps were used to identify the AMR genes that are unique to TY-2482 compared with the reference strain:
1. Log in to the FARM cluster using your credentials
```ssh user-id@farm.cse.ucdavis.edu ```
2. Request resources for an interactive job
``` srun -p bit150h -t 4:00:00 --mem=8000 -N 1 -n 3 --pty bash -l```
3. Create the Project 3 folder and change into it
```mkdir P3```
```cd P3```
4. Link the input FASTA files Ec_TY2482.contigs.fa and Ec_55989.contigs.fa into the P3 folder on the cluster
```ln -s /group/bit150/Project_3/Ec_55989.contigs.fa .```
```ln -s /group/bit150/Project_3/Ec_TY2482.contigs.fa .```
5. Load Conda and activate the CARD environment
```module load conda3```
```source activate CARD```
6. Run RGI on the Ec_TY2482 and Ec_55989 strains to report Perfect and Strict hits only (no Loose hits)
```rgi main --input_sequence Ec_TY2482.contigs.fa --output_file Ec_TY2482 --input_type contig --clean```
```rgi main --input_sequence Ec_55989.contigs.fa --output_file Ec_55989 --input_type contig --clean```
7. Run RGI on the Ec_TY2482 and Ec_55989 strains again, this time including Loose hits in addition to Perfect and Strict hits
```rgi main --input_sequence Ec_TY2482.contigs.fa --output_file Ec_TY2482_loose --input_type contig --include_loose --clean```
```rgi main --input_sequence Ec_55989.contigs.fa --output_file Ec_55989_loose --input_type contig --include_loose --clean```
8. List the folder contents to check that the RGI output files were created
```ls```
9. Create folders to hold the JSON files: json_d for the default analysis (without loose hits) and json_l for the analysis with loose hits
```mkdir json_d```
```mkdir json_l```
10. Move the JSON files into their respective folders
```mv Ec_TY2482.json ./json_d```
```mv Ec_55989.json ./json_d```
```mv Ec_TY2482_loose.json ./json_l```
```mv Ec_55989_loose.json ./json_l```
11. Create heatmap visualizations with the RGI heatmap command
```rgi heatmap --input ./json_d --output ./heatmap```
```rgi heatmap --input ./json_l --output ./heatmap_l```
12. Analyze the JSON files with RGI in the CARD web portal
https://card.mcmaster.ca/rgi/results/VrfMogcNZmW9iXWdSz8sdRgRfj9jkexVjoE0js1c
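
The four `rgi main` runs in steps 6 and 7 differ only in the strain name and the `--include_loose` flag, so they could be generated from a small loop. The following is a minimal sketch, not the procedure that was actually run; `build_rgi_cmd` is a hypothetical helper that only prints each command so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: print the `rgi main` command lines from steps 6 and 7.
# build_rgi_cmd is a hypothetical helper; it prints a command, it does not run it.
build_rgi_cmd() {
    strain="$1"    # e.g. Ec_TY2482
    mode="$2"      # "default" (Perfect/Strict only) or "loose"
    out="$strain"
    extra=""
    if [ "$mode" = "loose" ]; then
        out="${strain}_loose"
        extra=" --include_loose"
    fi
    printf 'rgi main --input_sequence %s.contigs.fa --output_file %s --input_type contig%s --clean\n' \
        "$strain" "$out" "$extra"
}

# Print one command per strain and mode (four lines in total).
for strain in Ec_TY2482 Ec_55989; do
    for mode in default loose; do
        build_rgi_cmd "$strain" "$mode"
    done
done
```

Once the printed commands look right, the script output can be piped to `bash` inside the activated CARD environment to run them.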
## RESULTS

Figure 1. RGI wheel showing the AMR genes in the virulent Ec_TY2482 strain, analyzed without loose hits. Green represents perfect hits, yellow represents strict hits and red represents loose hits.

Figure 2. RGI wheel showing the AMR genes in the avirulent Ec_55989 strain, analyzed without loose hits. Green represents perfect hits, yellow represents strict hits and red represents loose hits.
Table 1. Summary of AMR hits from the RGI analysis without loose hits

| Strain | # Perfect hits | # Strict hits | # Loose hits |
|---|---|---|---|
| Ec_TY2482 | 15 | 51 | 0 |
| Ec_55989 | 13 | 45 | 0 |
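
Hit counts like those in Table 1 could also be tallied on the command line from the tab-delimited `.txt` file that `rgi main` writes next to each JSON file. This is a minimal sketch under the assumption that the file has a header row with a column named `Cut_Off`; `count_cutoffs` is a hypothetical helper name.

```shell
# Sketch: count rows per Cut_Off value (Perfect, Strict, Loose) in an RGI
# tab-delimited output file. The column is located by its header name,
# which is assumed to be "Cut_Off".
count_cutoffs() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Cut_Off") col = i; next }
        col     { n[$col]++ }
        END     { for (k in n) print k "\t" n[k] }
    ' "$1" | sort
}

# Example calls (file names follow from the --output_file values used in Methods):
#   count_cutoffs Ec_TY2482.txt
#   count_cutoffs Ec_55989.txt
```

If the header name differs in the installed RGI version, the function prints nothing, which makes a mismatch easy to notice.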

Figure 3. Heatmap showing genes of Ec_TY2482 and Ec_55989 strains. The figure was created with RGI using json files without loose hits analysis. Yellow color represents perfect hits, green color represents strict hits and purple color represents no hits.

Figure 4. Heatmap showing genes of Ec_TY2482 and Ec_55989 strains. The figure was created with RGI using json files with loose hits analysis. Yellow color represents perfect hits, green color represents strict hits and purple color represents no hits.
## DISCUSSION
According to the results, there are AMR genes in the virulent Ec_TY2482 strain that are not present in the avirulent Ec_55989 E. coli O104:H4 strain. These genes can be recognized by comparing the colors in the heatmaps. The virulent strain has more AMR genes with perfect hits.
There are 51 and 45 AMR genes with strict hits for Ec_TY2482 and Ec_55989, respectively. Interestingly, 7 AMR genes were found only in the virulent strain (including APH3, APH6, DfrA7, Sul1, Sul2 and TetA; Figure 3). There are also 15 genes with perfect hits for Ec_TY2482 and 13 with perfect hits for Ec_55989. Of these, 12 genes had perfect hits in both strains, and 3 genes had perfect hits only in the virulent strain: CTX-M-15, TEM-1 and qacEdelta1.
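
The strain-specific genes above were read off the heatmaps. As a cross-check, the same comparison could be made from the tab-delimited RGI output files. The sketch below assumes those files have a header row with a `Best_Hit_ARO` column holding the gene name; `unique_to` is a hypothetical helper name.

```shell
# Sketch: list genes reported in file A but not in file B.
# Usage: unique_to A.txt B.txt
# The gene-name column is located by its header, assumed to be "Best_Hit_ARO".
unique_to() {
    awk -F'\t' '
        # Header row of each file: find the gene-name column.
        FNR == 1 { col = 0; for (i = 1; i <= NF; i++) if ($i == "Best_Hit_ARO") col = i; next }
        col == 0 { next }
        # First file read is B: remember every gene it contains.
        FILENAME == ARGV[1] { in_b[$col] = 1; next }
        # Second file read is A: print each gene missing from B once.
        !($col in in_b) && !printed[$col]++ { print $col }
    ' "$2" "$1" | sort
}

# Example call: genes found only in the virulent strain.
#   unique_to Ec_TY2482.txt Ec_55989.txt
```

Swapping the two arguments lists the genes found only in the reference strain.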
Rohde et al. (2011) found that the outbreak was caused by the acquisition of a prophage and the plasmid pESBL TY2482 by the E. coli O104:H4 strain. The pESBL TY2482 plasmid encodes a CTX-M-15 extended-spectrum beta-lactamase (ESBL) enzyme, as well as a beta-lactamase from the TEM class. Frank et al. (2011) found that the outbreak strain produces an ESBL (CTX-M-15) and a specific beta-lactamase, TEM-1, resulting in resistance to several antibiotics. A blaCTX-M-15 gene was detected in two bla/CTX isolates as well as in the 2011 isolates. All isolates classified as the outbreak strain are resistant to beta-lactam antibiotics. Resistance to β-lactams can occur by: 1) the production of β-lactamases or 2) the production of altered penicillin-binding proteins with a lower affinity for most β-lactam antibiotics (Worthington and Melander, 2013). The qacEdelta1 gene codes for a protein that confers resistance to quaternary ammonium compounds, a type of disinfectant. It is not mentioned in the papers reviewed and might be interesting to study.
The JSON files created in the loose hit analysis could not be uploaded to the CARD/RGI web analyzer because their size exceeded 20 MB (the maximum allowed by the web site). These files might be further analyzed by converting the JSON files into CSV files.
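
One possible route to a CSV file that avoids the JSON files altogether is to convert the tab-delimited `.txt` file that `rgi main` writes alongside each JSON file. The following is a minimal sketch, not a tested part of this project; `tsv_to_csv` is a hypothetical helper name.

```shell
# Sketch: convert a tab-delimited file to CSV.
# Every field is wrapped in double quotes and embedded quotes are doubled,
# so commas inside a field do not split it.
tsv_to_csv() {
    awk -F'\t' '{
        line = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            gsub(/"/, "\"\"", field)
            line = line (i > 1 ? "," : "") "\"" field "\""
        }
        print line
    }' "$1"
}

# Example call (file name follows from the --output_file value used in Methods):
#   tsv_to_csv Ec_TY2482_loose.txt > Ec_TY2482_loose.csv
```

The resulting file can be opened in a spreadsheet program and filtered on the cut-off column to separate Loose hits from Perfect and Strict hits.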
## REFERENCES
Rohde, H., Qin, J., Cui, Y., Li, D., Loman, N. J., Hentschke, M., ... & E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium. (2011). Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. New England Journal of Medicine, 365(8), 718-724.
Ferdous, M., Zhou, K., de Boer, R. F., Friedrich, A. W., Kooistra-Smid, A., & Rossen, J. W. (2015). Comprehensive characterization of Escherichia coli O104:H4 isolated from patients in the Netherlands. Frontiers in Microbiology, 6, 1348.
Frank, C., Werber, D., Cramer, J. P., Askar, M., Faber, M., an der Heiden, M., ... & HUS Investigation Team. (2011). Epidemic profile of Shiga-toxin-producing Escherichia coli O104:H4 outbreak in Germany. New England Journal of Medicine, 365(19), 1771-1780.
Alcock, B. P., Raphenya, A. R., Lau, T. T. Y., Tsang, K. K., Bouchard, M., Edalatmand, A., Huynh, W., et al. (2020). CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Research, 48(D1), D517-D525. https://github.com/arpcard/rgi
Worthington, R. J., & Melander, C. (2013). Overcoming resistance to β-lactam antibiotics. The Journal of Organic Chemistry, 78(9), 4207-4213.