# Reading: 1000 Genomes project

## What was the primary goal of this project?

To understand how genetic variation across human populations contributes to diseases with a genetic component.

## What major challenges did they face in completing this dataset?

The main challenges were identifying large and complex structural variants and shorter indels in regions of low complexity, a high false discovery rate (FDR) in the indel call sets, and ambiguous and inconsistent results when characterizing variation in low-complexity genomic regions.

## Find 1 paper that cites this paper (or dataset) and briefly describe what that paper found.

Auton, A., Abecasis, G., Altshuler, D. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015). https://doi.org/10.1038/nature15393

This is follow-up research by part of the same scientific group, with improved methods and a broader scope of data collection and analysis that covers multi-allelic SNPs, indels, and a diverse set of structural variants (SVs). They studied 2,504 genomes from 26 populations using low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

They characterized more than 88 million variants, including single nucleotide polymorphisms (SNPs), short insertions/deletions, and structural variants. They found that a typical genome differs from the reference human genome at about 5.0 million sites, and that most of these variants are SNPs. They also detected 149 to 182 sites per genome with protein-truncating variants, multiple sites with peptide-sequence-altering variants, and multiple variant sites overlapping known regulatory regions.

Finally, they discussed potential causes of genetic variation and of its uneven distribution between populations, such as geographical characteristics, number of generations, and admixture events.