# NGS HIVDR Mutation analysis
- Obtain FASTQ files
- Convert to AAVF format using quasitools
- Query Stanford HIVdb for mutation analysis (Sierrapy client)
- Return HIVDR report
- Write pseudo-code for what has been done
### 1. Converting FASTQ to AAVF using hydra from quasitools
**Installation of quasitools**
Create conda environment
`conda create -n hiv python='3.7'`
Activate the environment
`conda activate hiv`
Install quasitools
`conda install quasitools -y`
Run hydra from quasitools (decompress the FASTQ files before running the command)
`quasitools hydra -o outdir R1.fastq R2.fastq`
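Since the reads must be decompressed before hydra is run, the step can be scripted. The following is a minimal sketch using only the Python standard library; the helper name `gunzip_fastq` is hypothetical and not part of quasitools, and it assumes the inputs are gzip-compressed files ending in `.gz`.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_fastq(gz_path, out_dir=None):
    """Decompress one gzip-compressed FASTQ file and return the output path.

    Hypothetical helper (not part of quasitools): R1.fastq.gz -> R1.fastq.
    """
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    out_dir = Path(out_dir) if out_dir else gz_path.parent
    # with_suffix("") drops only the trailing ".gz"
    out_path = out_dir / gz_path.with_suffix("").name
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks; FASTQ files can be large
    return out_path
```

Calling `gunzip_fastq("R1.fastq.gz")` and `gunzip_fastq("R2.fastq.gz")` would leave `R1.fastq` and `R2.fastq` next to the compressed files, ready for the hydra command above.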
The output folder contains:
- mutation_report.aavf
- combined_reads.fastq
- coverage_file.csv
- filtered.fastq
- dr_report.csv
- hydra.vcf
- stats.txt
- log.txt
- align.bam, align.bam.bai
`quasitools hydra` options


A portion of the aavf file is pasted below for our reference.
```
##fileformat=AAVFv1.0
##fileDate=20210321
##source=quasitools:hydra
##reference=hxb2_pol.fas
##INFO=<ID=RC,Number=1,Type=String,Description="Reference Codon">
##INFO=<ID=AC,Number=.,Type=String,Description="Alternate Codon">
##INFO=<ID=ACF,Number=.,Type=Float,Description="Alternate Codon Frequency,for each Alternate Codon,in the same order aslisted.">
##INFO=<ID=CAT,Number=.,Type=String,Description="Drug Resistance Category">
##INFO=<ID=SRVL,Number=.,Type=String,Description="Drug Resistance Surveillance">
##FILTER=<ID=af0.01,Description="Set if True; alt_freq<0.01">
#CHROM GENE POS REF ALT FILTER ALT_FREQ COVERAGE INFO
hxb2_pol PR 3 V I PASS 1.0000 2202 RC=gtc;AC=Atc;ACF=1.0000;CAT=.;SRVL=.
hxb2_pol PR 13 I V PASS 0.9973 2618 RC=ata;AC=Gta;ACF=0.9973;CAT=.;SRVL=.
hxb2_pol PR 20 K R PASS 1.0000 3127 RC=aag;AC=aGA,aGg;ACF=0.9994,0.0006;CAT=Other;SRVL=No
hxb2_pol PR 33 L F PASS 0.9993 4245 RC=tta;AC=ttT;ACF=0.9993;CAT=PIMinor;SRVL=No
hxb2_pol PR 35 E D PASS 0.9990 3817 RC=gaa;AC=gaC;ACF=0.9990;CAT=.;SRVL=.
hxb2_pol PR 36 M I PASS 0.9971 3743 RC=atg;AC=atA;ACF=0.9971;CAT=.;SRVL=.
hxb2_pol PR 37 S N PASS 1.0000 3743 RC=agt;AC=aAt;ACF=1.0000;CAT=.;SRVL=.
hxb2_pol PR 39 P A PASS 0.9946 3678 RC=cca;AC=Gca;ACF=0.9946;CAT=.;SRVL=.
hxb2_pol PR 41 R K PASS 0.9967 3594 RC=aga;AC=aAa;ACF=0.9967;CAT=.;SRVL=.
```
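To work with these records programmatically, the AAVF lines can be parsed into dictionaries. The sketch below assumes the nine whitespace-separated columns shown in the header line above and that the `INFO` field is a `;`-separated list of `key=value` pairs; the function names `parse_aavf` and `drug_resistance_records` are hypothetical and not part of quasitools.

```python
AAVF_COLUMNS = ["CHROM", "GENE", "POS", "REF", "ALT",
                "FILTER", "ALT_FREQ", "COVERAGE", "INFO"]


def parse_aavf(lines):
    """Parse AAVF text lines into a list of dicts (one per mutation)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, ## metadata and the #CHROM header
        fields = line.split()
        if len(fields) != len(AAVF_COLUMNS):
            raise ValueError(f"unexpected number of columns: {line!r}")
        rec = dict(zip(AAVF_COLUMNS, fields))
        rec["POS"] = int(rec["POS"])
        rec["ALT_FREQ"] = float(rec["ALT_FREQ"])
        rec["COVERAGE"] = int(rec["COVERAGE"])
        # INFO looks like "RC=gtc;AC=Atc;ACF=1.0000;CAT=.;SRVL=."
        rec["INFO"] = dict(item.split("=", 1) for item in rec["INFO"].split(";"))
        records.append(rec)
    return records


def drug_resistance_records(records):
    """Keep only mutations with a drug resistance category (CAT is not '.')."""
    return [r for r in records if r["INFO"].get("CAT", ".") != "."]
```

With the excerpt above, `drug_resistance_records` would keep only the mutations at PR positions 20 and 33, since the other rows have `CAT=.`.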
### 2. Converting FASTQ to CodFreq file format using *fastq2codfreq*
Useful links:
- https://github.com/hivdb/codfreq
- https://hivdb.stanford.edu/page/codfreq/
The `fastq2codfreq` program ships in the `codfreq` Docker container from Stanford HIVDB, so Docker must be installed and running on your system before you can use it.
- Installation
If Docker is not yet installed, install Docker CE first (https://docs.docker.com/install/). Then download the wrapper script and make it executable:
```
sudo curl -sL https://raw.githubusercontent.com/hivdb/codfreq/master/bin/fastq2codfreq-docker -o /usr/local/bin/fastq2codfreq
sudo chmod +x /usr/local/bin/fastq2codfreq
```
- Running the script.
Call the `fastq2codfreq` script with the directory containing the FASTQ files. It automatically detects whether the files are paired-end or single-end.
`fastq2codfreq /path/to/folders/containing/fastq/files`
A portion of the generated codfreq file is pasted below. The columns are:
- gene (PR, RT, or IN)
- position
- total number of reads of this position
- codon nucleotide triplet
- total number of reads of this codon
```
PR 1 2118 CCT 2112
PR 1 2118 CCC 5
PR 1 2118 TCT 1
PR 2 2175 CAA 2174
PR 2 2175 CAG 1
PR 3 2226 ATC 2223
PR 3 2226 ATT 2
PR 3 2226 ACC 1
PR 4 2218 ACT 2215
PR 4 2218 ATT 2
PR 4 2218 GCT 1
PR 5 2339 CTT 2338
PR 5 2339 CCT 1
PR 6 2343 TGG 2343
PR 7 2408 CAA 2405
```
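Because each row carries both the codon count and the total read count at that position, per-codon frequencies follow directly by division. The sketch below assumes the five whitespace-separated columns described above; the function name `codon_frequencies` is hypothetical and not part of the `codfreq` package.

```python
from collections import defaultdict


def codon_frequencies(lines):
    """Map (gene, position) -> {codon: fraction of reads at that position}."""
    freqs = defaultdict(dict)
    for line in lines:
        fields = line.split()
        # Skip blank lines and any header row (position must be numeric)
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        gene, pos, total, codon, count = fields[:5]
        total, count = int(total), int(count)
        if total == 0:
            continue  # avoid division by zero on uncovered positions
        freqs[(gene, int(pos))][codon] = count / total
    return dict(freqs)
```

For the first position in the excerpt, the CCT entry would be 2112 / 2118, and the three codons listed for PR 1 account for all 2118 reads.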
A CodFreq file can also be generated from **SAM** files after alignment. The commands below install `sam2codfreq`, which does this.
```
sudo curl -sL https://raw.githubusercontent.com/hivdb/codfreq/master/bin/sam2codfreq-docker -o /usr/local/bin/sam2codfreq
sudo chmod +x /usr/local/bin/sam2codfreq
```
### 3. Querying Stanford HIVDB
**Approach one: Using sierrapy's seqreads**
Provide a CodFreq file to sierrapy's `seqreads` command, which outputs a JSON file.
- Installation of sierrapy
```
pip install sierrapy
```
- Running the tool

```
sierrapy seqreads DR-077-20_S19_L001.codfreq -o DR-077-20_S19_L001.json
```
Process the JSON file to obtain mutation annotations and predicted resistance scores.
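One possible way to pull drug scores out of the report is sketched below. It assumes the JSON follows the shape of Sierra's GraphQL `drugResistance` field (a list of entries with `gene.name` and a `drugScores` list whose items carry `drug.displayAbbr`, `score`, and `text`); the actual keys depend on the query sierrapy used, so check them against your own file. The function names `load_reports` and `drug_scores` are hypothetical.

```python
import json


def load_reports(path):
    """Load the sierrapy JSON output; always return a list of report dicts."""
    with open(path) as fh:
        data = json.load(fh)
    return data if isinstance(data, list) else [data]


def drug_scores(report):
    """Flatten one report into rows of gene, drug, score and resistance level.

    Assumed keys (verify against your file): drugResistance, gene.name,
    drugScores, drug.displayAbbr / drug.name, score, text.
    """
    rows = []
    for dr in report.get("drugResistance") or []:
        gene = (dr.get("gene") or {}).get("name")
        for ds in dr.get("drugScores") or []:
            drug = ds.get("drug") or {}
            rows.append({
                "gene": gene,
                "drug": drug.get("displayAbbr") or drug.get("name"),
                "score": ds.get("score"),
                "level": ds.get("text"),
            })
    return rows
```

For example, iterating over `load_reports("DR-077-20_S19_L001.json")` and calling `drug_scores` on each entry would give one row per gene and drug, provided the assumed keys are present; missing keys yield an empty list rather than an error.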