# Somatic Variant Calling Tutorial

## Context

Somatic variant calling is a crucial process in cancer genomics, allowing researchers to identify mutations that arise specifically in tumor cells. These mutations occur due to DNA damage or replication errors and are not inherited. This tutorial is adapted from the Galaxy Training Network and condensed into a 2-hour session, focusing on key steps: quality control, read alignment, somatic variant calling, and visualization. The workflow uses publicly available sequencing data and open-source bioinformatics tools for hands-on learning.

Full Galaxy Training material: https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/somatic-variants/tutorial.html

## Introduction

Somatic variant calling helps identify genetic differences between normal and tumor tissues, which is essential for understanding cancer progression and treatment. This tutorial will guide users through detecting somatic variants using the Galaxy platform.

## Objectives

By the end of this tutorial, participants will be able to:

- Perform quality control on sequencing data
- Align sequencing reads to a reference genome
- Call somatic variants from tumor-normal pairs
- Visualize and interpret results

## Step-by-Step Approach

### Step 1: Data Preparation and Quality Control

**Tools used:** FastQC, Trimmomatic

1. **Upload Data:**
   - Download the sequencing data from the following links:
     - [Normal sample Read 1](https://zenodo.org/record/2582555/files/SLGFSK-N_231335_r1_chr5_12_17.fastq.gz)
     - [Normal sample Read 2](https://zenodo.org/record/2582555/files/SLGFSK-N_231335_r2_chr5_12_17.fastq.gz)
     - [Tumor sample Read 1](https://zenodo.org/record/2582555/files/SLGFSK-T_231336_r1_chr5_12_17.fastq.gz)
     - [Tumor sample Read 2](https://zenodo.org/record/2582555/files/SLGFSK-T_231336_r2_chr5_12_17.fastq.gz)
   - Upload these FASTQ files to Galaxy using the **Upload Data** tool.
   - Download the reference genome and upload it the same way:
     - [Reference Genome - hg19](https://zenodo.org/record/2582555/files/hg19.chr5_12_17.fa.gz)

2. **Quality Check with FastQC:**
   - Run FastQC on the uploaded FASTQ files.
   - Examine the reports for quality issues such as adapter contamination.

3. **Trimming Low-Quality Reads with Trimmomatic:**
   - Use Trimmomatic to remove adapters and low-quality sequences.
   - Choose appropriate parameters based on the FastQC reports.

4. **Re-run FastQC:**
   - Run FastQC on the trimmed reads and compare the reports to check whether trimming improved data quality.

### Step 2: Read Alignment

**Tools used:** BWA-MEM, Samtools

1. **Align reads to the reference genome using BWA-MEM:**
   - Use the reference genome **hg19 (Human Genome build 19)** or the latest available human reference genome in Galaxy.
   - Provide the paired reads of the tumor sample and of the normal sample as inputs.

2. **Convert to BAM format and sort:**
   - Use Samtools to convert the SAM output to sorted BAM.

3. **Index the sorted BAM files:**
   - Use Samtools index for efficient access.

### Step 3: Somatic Variant Calling

**Tools used:** Mutect2

1. **Run Mutect2:**
   - Use the tumor and normal BAM files as inputs.
   - Set the reference genome to the same build used for alignment (**hg19** or latest available).
   - Generate a VCF file containing somatic variants.

2. **Assess variant quality:**
   - Review the VCF file to check for potential issues.

### Step 4: Variant Visualization (self-study; follow the Galaxy Training material)

**Tools used:** Integrative Genomics Viewer (IGV)

1. **Load BAM and VCF files into IGV:**
   - Ensure the reference genome is correctly set.

2. **Inspect key somatic variants in the tumor sample:**
   - Look for high-confidence variants in coding regions.
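
For readers who want to see roughly what the Galaxy tools do underneath, Steps 1 to 3 can be approximated on the command line. The Python sketch below only assembles and prints the commands; it does not run anything. It is not part of the Galaxy tutorial and rests on several assumptions: FastQC, Trimmomatic (via a `trimmomatic` wrapper script), BWA, Samtools, and GATK4 are installed; the reference has been decompressed to `hg19.chr5_12_17.fa`; the `results` output directory already exists; and the trimming thresholds (`SLIDINGWINDOW:4:20`, `MINLEN:36`) and read-group labels are illustrative choices rather than values taken from this tutorial.

```python
import shlex
from pathlib import Path

# Assumed local file names (decompressed reference, FASTQ files as downloaded).
REF = "hg19.chr5_12_17.fa"
SAMPLES = {
    "normal": ("SLGFSK-N_231335_r1_chr5_12_17.fastq.gz",
               "SLGFSK-N_231335_r2_chr5_12_17.fastq.gz"),
    "tumor": ("SLGFSK-T_231336_r1_chr5_12_17.fastq.gz",
              "SLGFSK-T_231336_r2_chr5_12_17.fastq.gz"),
}


def reference_commands(ref=REF):
    """Index the reference for BWA, Samtools, and GATK (Galaxy does this for you)."""
    return [
        ["bwa", "index", ref],
        ["samtools", "faidx", ref],
        ["gatk", "CreateSequenceDictionary", "-R", ref],
    ]


def sample_commands(name, r1, r2, ref=REF, outdir="results"):
    """Steps 1 and 2 for one sample: QC, trimming, alignment, sorting, indexing."""
    out = Path(outdir)
    # Trimmomatic PE writes four files: r1 paired/unpaired, then r2 paired/unpaired.
    trimmed = [str(out / f"{name}_r{i}_{kind}.fastq.gz")
               for i in (1, 2) for kind in ("paired", "unpaired")]
    sam = str(out / f"{name}.sam")
    bam = str(out / f"{name}.sorted.bam")
    # Mutect2 needs read groups; the sample name (SM) here is an assumed label.
    read_group = f"@RG\\tID:{name}\\tSM:{name}\\tPL:ILLUMINA"
    return [
        ["fastqc", "-o", str(out), r1, r2],
        # Assumed thresholds; pick real ones from the FastQC reports.
        ["trimmomatic", "PE", r1, r2, *trimmed, "SLIDINGWINDOW:4:20", "MINLEN:36"],
        ["fastqc", "-o", str(out), trimmed[0], trimmed[2]],
        ["bwa", "mem", "-R", read_group, "-o", sam, ref, trimmed[0], trimmed[2]],
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
    ]


def mutect2_command(ref=REF, outdir="results", tumor="tumor", normal="normal"):
    """Step 3: tumor-normal calling; -normal takes the normal sample's SM name."""
    out = Path(outdir)
    return [
        "gatk", "Mutect2", "-R", ref,
        "-I", str(out / f"{tumor}.sorted.bam"),
        "-I", str(out / f"{normal}.sorted.bam"),
        "-normal", normal,
        "-O", str(out / "somatic.vcf.gz"),
    ]


if __name__ == "__main__":
    commands = reference_commands()
    for name, (r1, r2) in SAMPLES.items():
        commands += sample_commands(name, r1, r2)
    commands.append(mutect2_command())
    for cmd in commands:
        print(shlex.join(cmd))
```

Building each command as a list of arguments keeps quoting explicit: the read-group string contains literal `\t` sequences that BWA expands itself, and `shlex.join` prints a form that can be pasted into a shell. The resulting `somatic.vcf.gz` and sorted BAM files correspond to the inputs loaded into IGV in Step 4.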