# Variant Calling workflow
Assignees: Allan and Joyce
### Background
Variant calling is widely used in human genetics to identify variants associated with a specific trait, population, or hereditary disease. It uses next-generation sequencing data to identify two main types of variants within a genome of interest, single nucleotide variants/polymorphisms (SNVs/SNPs) and small insertions or deletions (INDELs), and to infer their phenotypic impact on the organism.
Workflow languages such as Nextflow and Snakemake, and container platforms such as Docker and Singularity, have become central to automating bioinformatics pipelines and thus to ensuring reproducibility in research.
### Aim
This mini-project aims to review existing variant calling pipelines, identify the strongest ones, and extend the workflows where there are gaps, especially to make them useful for insect and pathogen data. Review the workflows using the following criteria:
1. How easy are they to set up and use? Do they provide accessible documentation and tutorials?
2. Are they fast and easily scalable based on available compute resources?
3. Can they scale to the cloud?
4. Can they be used on a variety of data, including insects, pathogens, and microbes (viruses and bacteria)?
5. Are they implemented using the latest specifications and versions of the tools? For example, does the pipeline use Nextflow DSL2 syntax and Docker or Singularity containers?
6. Are they well and regularly maintained? When were they last updated?
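One way to keep the reviews comparable is to record a score per criterion for each pipeline and rank the totals. The sketch below is only a suggestion, not part of the brief: the criterion keys, the 0 to 2 scale, the `PipelineReview` class, and the helper functions are all assumptions made for illustration, and the pipeline names and scores in the usage example are placeholders rather than real assessments.

```python
from dataclasses import dataclass, field

# Hypothetical keys, one per review criterion (1-6) listed above.
CRITERIA = (
    "ease_of_use",
    "speed_scalability",
    "cloud_ready",
    "data_variety",
    "current_tooling",
    "maintenance",
)


@dataclass
class PipelineReview:
    """One reviewer's scores for one pipeline (assumed scale: 0 poor, 1 fair, 2 good)."""
    name: str
    scores: dict = field(default_factory=dict)
    notes: str = ""

    def total(self) -> int:
        # Refuse to rank a pipeline that has not been scored on every criterion.
        missing = [c for c in CRITERIA if c not in self.scores]
        if missing:
            raise ValueError(f"{self.name}: unscored criteria {missing}")
        return sum(self.scores[c] for c in CRITERIA)


def rank(reviews):
    """Return the reviews sorted from highest to lowest total score."""
    return sorted(reviews, key=lambda r: r.total(), reverse=True)


def to_markdown(reviews):
    """Render the ranked reviews as a Markdown table for a wiki or GitHub Pages."""
    header = "| Pipeline | " + " | ".join(CRITERIA) + " | Total |"
    divider = "|" + " --- |" * (len(CRITERIA) + 2)
    rows = [
        "| " + r.name + " | "
        + " | ".join(str(r.scores[c]) for c in CRITERIA)
        + f" | {r.total()} |"
        for r in rank(reviews)
    ]
    return "\n".join([header, divider] + rows)


if __name__ == "__main__":
    # Placeholder entries only; replace with the pipelines actually reviewed.
    demo = [
        PipelineReview("pipeline-a", {c: 1 for c in CRITERIA}),
        PipelineReview("pipeline-b", {c: 2 for c in CRITERIA}),
    ]
    print(to_markdown(demo))
```

A plain sum weights every criterion equally. If some criteria matter more for insect and pathogen data, a per-criterion weight could be added, but that choice should be agreed on and documented in the roadmap first.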
### Some existing pipelines
- https://github.com/mbbu/variant-calling-pipeline/tree/main/GATK
- https://h3abionet.github.io/H3ABionet-SOPs/Variant-Calling-1-0.html
- https://github.com/CRG-CNAG/CalliNGS-NF
- https://github.com/nf-core/sarek
- https://github.com/nf-core/viralrecon
- https://github.com/nf-core/rnavar
- https://github.com/nf-core/vipr
- https://github.com/h3abionet/h3avarcall
### Tasks
1. Create a roadmap for the mini-project.
2. Review existing variant calling pipelines.
3. Work towards addressing the gaps identified in the review.
4. Document your work clearly on GitHub using wikis and GitHub Pages.
5. Document the papers you are reading: include a link to each paper and a sentence or two on why you included it.
**NB:** You must clearly demonstrate collaborative research skills, informative visualization, and report writing.
### Useful resources and references
1. [Identification of pathogen genomic variants through an integrated pipeline](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-63)
2. [Microbial Variant Calling](https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/microbial-variants/tutorial.html)
3. [Standards and Guidelines for the Interpretation of Sequence Variants: A Joint Consensus Recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4544753/)