NGS HIVDR Mutation analysis

# NGS HIVDR Mutation analysis

- Obtain FASTQ files
- Convert to AAVF format using quasitools
- Query Stanford HIVdb for mutation analysis (Sierrapy client)
- Return HIVDR report
- Write pseudo-code for what has been done

### 1. Converting FASTQ to AAVF using hydra from quasitools

**Installation of quasitools**

Create a conda environment:
`conda create -n hiv python='3.7'`

Activate the environment:
`conda activate hiv`

Install quasitools (it is distributed through the bioconda channel, so that channel needs to be configured):
`conda install quasitools -y`

Run hydra from quasitools (unzip the FASTQ files before running the command):
`quasitools hydra -o outdir R1.fastq R2.fastq`

The output folder contains:
- mutation_report.aavf
- combined_reads.fastq
- coverage_file.csv
- filtered.fastq
- dr_report.csv
- hydra.vcf
- stats.txt
- log.txt
- align.bam, align.bam.bai

`quasitools hydra` options

![](https://i.imgur.com/lC02SNt.png)
![](https://i.imgur.com/HHCMWID.png)

A portion of the AAVF file is pasted below for reference.

```
##fileformat=AAVFv1.0
##fileDate=20210321
##source=quasitools:hydra
##reference=hxb2_pol.fas
##INFO=<ID=RC,Number=1,Type=String,Description="Reference Codon">
##INFO=<ID=AC,Number=.,Type=String,Description="Alternate Codon">
##INFO=<ID=ACF,Number=.,Type=Float,Description="Alternate Codon Frequency,for each Alternate Codon,in the same order aslisted.">
##INFO=<ID=CAT,Number=.,Type=String,Description="Drug Resistance Category">
##INFO=<ID=SRVL,Number=.,Type=String,Description="Drug Resistance Surveillance">
##FILTER=<ID=af0.01,Description="Set if True; alt_freq<0.01">
#CHROM GENE POS REF ALT FILTER ALT_FREQ COVERAGE INFO
hxb2_pol PR 3 V I PASS 1.0000 2202 RC=gtc;AC=Atc;ACF=1.0000;CAT=.;SRVL=.
hxb2_pol PR 13 I V PASS 0.9973 2618 RC=ata;AC=Gta;ACF=0.9973;CAT=.;SRVL=.
hxb2_pol PR 20 K R PASS 1.0000 3127 RC=aag;AC=aGA,aGg;ACF=0.9994,0.0006;CAT=Other;SRVL=No
hxb2_pol PR 33 L F PASS 0.9993 4245 RC=tta;AC=ttT;ACF=0.9993;CAT=PIMinor;SRVL=No
hxb2_pol PR 35 E D PASS 0.9990 3817 RC=gaa;AC=gaC;ACF=0.9990;CAT=.;SRVL=.
hxb2_pol PR 36 M I PASS 0.9971 3743 RC=atg;AC=atA;ACF=0.9971;CAT=.;SRVL=.
hxb2_pol PR 37 S N PASS 1.0000 3743 RC=agt;AC=aAt;ACF=1.0000;CAT=.;SRVL=.
hxb2_pol PR 39 P A PASS 0.9946 3678 RC=cca;AC=Gca;ACF=0.9946;CAT=.;SRVL=.
hxb2_pol PR 41 R K PASS 0.9967 3594 RC=aga;AC=aAa;ACF=0.9967;CAT=.;SRVL=.
```

### 2. Converting FASTQ to CodFreq file format using *fastq2codfreq*

Useful links:
- https://github.com/hivdb/codfreq
- https://hivdb.stanford.edu/page/codfreq/

The `fastq2codfreq` program is contained in the `codfreq` Docker container from Stanford HIVDB, so Docker must be installed and running on your system before you use it.

- Installation

If you need to install Docker first, install Docker CE from https://docs.docker.com/install/.

```
sudo curl -sL https://raw.githubusercontent.com/hivdb/codfreq/master/bin/fastq2codfreq-docker -o /usr/local/bin/fastq2codfreq
sudo chmod +x /usr/local/bin/fastq2codfreq
```

- Running the script

Call the `fastq2codfreq` script and provide the directory containing the FASTQ files. It automatically detects whether the files are paired-end or single-end.

`fastq2codfreq /path/to/folders/containing/fastq/files`

A portion of the generated CodFreq file is pasted below. The columns are:
- gene (PR, RT, or IN)
- position
- total number of reads at this position
- codon nucleotide triplet
- total number of reads of this codon

```
PR 1 2118 CCT 2112
PR 1 2118 CCC 5
PR 1 2118 TCT 1
PR 2 2175 CAA 2174
PR 2 2175 CAG 1
PR 3 2226 ATC 2223
PR 3 2226 ATT 2
PR 3 2226 ACC 1
PR 4 2218 ACT 2215
PR 4 2218 ATT 2
PR 4 2218 GCT 1
PR 5 2339 CTT 2338
PR 5 2339 CCT 1
PR 6 2343 TGG 2343
PR 7 2408 CAA 2405
```

Note that a CodFreq file can also be generated from **SAM** files after alignment. The commands below install `sam2codfreq`, which does this.

```
sudo curl -sL https://raw.githubusercontent.com/hivdb/codfreq/master/bin/sam2codfreq-docker -o /usr/local/bin/sam2codfreq
sudo chmod +x /usr/local/bin/sam2codfreq
```

### 3. Querying Stanford HIVDB

**Approach one: Using sierrapy's seqreads**

Provide a CodFreq file to sierrapy's `seqreads` command, which outputs a JSON file.

- Installation of sierrapy

```
pip install sierrapy
```

- Running the tool

![](https://i.imgur.com/4gnQ1tS.png)

```
sierrapy seqreads DR-077-20_S19_L001.codfreq -o DR-077-20_S19_L001.json
```

Process the JSON file to obtain mutation annotations and predicted scores.
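
As a starting point for that processing step, here is a minimal Python sketch that flattens the drug scores in a `seqreads` report into one row per drug. The JSON layout is an assumption, not something confirmed by these notes: the sketch assumes a list of per-sample objects, each with a `name` and a `drugResistance` list whose entries carry `gene.name` and `drugScores` (with `drug.displayAbbr`, `score`, and `text`). The function names `summarize_drug_scores` and `load_report` are hypothetical helpers. Check the keys against your own output file and adjust as needed.

```python
import json
from pathlib import Path


def summarize_drug_scores(report):
    """Flatten a parsed report into one dict per (sample, gene, drug).

    All key names below are assumptions about the JSON layout.
    Missing keys are tolerated rather than raising an error.
    """
    # Accept either a single analysis object or a list of them.
    analyses = report if isinstance(report, list) else [report]
    rows = []
    for analysis in analyses:
        sample = analysis.get("name", "unknown")
        for gene_result in analysis.get("drugResistance") or []:
            gene = (gene_result.get("gene") or {}).get("name", "unknown")
            for drug_score in gene_result.get("drugScores") or []:
                drug = drug_score.get("drug") or {}
                rows.append({
                    "sample": sample,
                    "gene": gene,
                    "drug": drug.get("displayAbbr") or drug.get("name", "unknown"),
                    "score": drug_score.get("score"),
                    "level": drug_score.get("text"),
                })
    return rows


def load_report(path):
    """Read a JSON report from disk."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Hypothetical, hand-written record used only to exercise the function;
    # it is not real HIVdb output.
    example = [{
        "name": "sample-1",
        "drugResistance": [{
            "gene": {"name": "PR"},
            "drugScores": [{
                "drug": {"name": "ATV", "displayAbbr": "ATV/r"},
                "score": 5.0,
                "text": "Susceptible",
            }],
        }],
    }]
    for row in summarize_drug_scores(example):
        print(row["sample"], row["gene"], row["drug"],
              row["score"], row["level"], sep="\t")
```

For a real run, `summarize_drug_scores(load_report("DR-077-20_S19_L001.json"))` would return the rows for the file produced above, provided the layout assumption holds. Mutation annotations are not covered by this sketch; they would need a similar loop over whichever key holds the mutation list in your report.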