# Work Flow Overview

## Overview

This document outlines the steps you will need to analyze genome sequence data. The steps are listed in order and will link to more detailed information (including an explanation of each step and the script we used).

1. [Download `.fastq` files onto cluster](https://github.com/Noor-WGS-data/Genome_sequence_data/blob/main/How_to_sftp.md)
2. [Run FastQC on your sequence data](https://github.com/Noor-WGS-data/Genome_sequence_data/blob/main/How_to_QC_fastq.md)
3. [Download and set up the reference genome using BWA](https://github.com/Noor-WGS-data/Genome_sequence_data/blob/main/how_to_ref_genome_bwa.md)
4. [Align raw `.fastq` files with the reference genome using BWA MEM](https://github.com/Noor-WGS-data/Genome_sequence_data/blob/main/Aligned_SAM_file_from_raw%20fastq_data.md)
5. Getting your SAM file ready:
   a. Add read groups
   b. SortSam (by query name)
   c. Mark Illumina adapters (input needs to be sorted by query name)
   d. MarkDuplicatesSpark (works best on files sorted by query name, but outputs a coordinate-sorted file)
   e. Convert SAM to BAM
6. Variant calling and filtration
   a. Index the BAM to prep for HaplotypeCaller
   b.

## Our project directories
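
## Example: steps 2 to 6a as commands

To make the order of the steps in the overview concrete, here is a minimal sketch that builds one possible command line per step and prints them without running anything. It is not the script we used (those are in the linked pages). The file naming scheme, the read group values (`lib1`, `unit1`, `ILLUMINA`), the intermediate file extensions, and the `build_plan` helper itself are all assumptions made for illustration. It also assumes paired-end reads and that `fastqc`, `bwa`, `gatk`, and `samtools` are on your `PATH`.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_plan(sample: str, ref: str, workdir: str = ".") -> list[list[str]]:
    """Return the commands for steps 2 to 6a, in order, as argv lists."""
    w = Path(workdir)

    def f(suffix: str) -> str:
        # Hypothetical naming scheme: <sample><suffix> inside workdir.
        return str(w / f"{sample}{suffix}")

    r1, r2 = f("_R1.fastq"), f("_R2.fastq")
    return [
        # 2. Quality-check the raw reads.
        ["fastqc", r1, r2],
        # 3. Build the BWA index for the reference genome.
        ["bwa", "index", ref],
        # 4. Align the paired reads; -o writes the SAM output to a file.
        ["bwa", "mem", "-o", f(".sam"), ref, r1, r2],
        # 5a. Add read groups (every RG value here is a placeholder).
        ["gatk", "AddOrReplaceReadGroups", "-I", f(".sam"), "-O", f(".rg.sam"),
         "--RGID", sample, "--RGLB", "lib1", "--RGPL", "ILLUMINA",
         "--RGPU", "unit1", "--RGSM", sample],
        # 5b. Sort by query name, as the next two steps expect.
        ["gatk", "SortSam", "-I", f(".rg.sam"), "-O", f(".qsort.sam"),
         "--SORT_ORDER", "queryname"],
        # 5c. Mark Illumina adapters and write a metrics file.
        ["gatk", "MarkIlluminaAdapters", "-I", f(".qsort.sam"),
         "-O", f(".adapt.sam"), "-M", f(".adapt_metrics.txt")],
        # 5d. Mark duplicates; the output is coordinate sorted.
        ["gatk", "MarkDuplicatesSpark", "-I", f(".adapt.sam"),
         "-O", f(".dedup.sam")],
        # 5e. Convert SAM to BAM.
        ["samtools", "view", "-b", "-o", f(".dedup.bam"), f(".dedup.sam")],
        # 6a. Index the BAM so HaplotypeCaller can read it.
        ["samtools", "index", f(".dedup.bam")],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in build_plan("sampleA", "reference.fasta"):
        print(shlex.join(cmd))
```

Building the plan as a list keeps the ordering constraints visible in one place, in particular that SortSam by query name comes before MarkIlluminaAdapters and MarkDuplicatesSpark. To actually run a step, each entry could be passed to `subprocess.run(cmd, check=True)` once the paths and read group values match your data.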