Other Databases
===
###### tags: `Genomics/Tertiary Analysis/Databases`
###### tags: `Genomics`, `SNP`, `Variant`, `dbSNP`

<br>

[TOC]

<br>

## dbSNP - subdatabases
> Subdatabase types

- [[UCSC] Short Genetic Variants from dbSNP release 155](https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=dbSnp155Composite)
    - **All dbSNP (155)** (==all of dbSNP==)
      the entire set (1.02 billion for hg19, 1.06 billion for hg38)
    - **Common dbSNP (155)** (==all of dbSNP, filtered to MAF >= 1%==)
      approximately 15 million variants with a minor allele frequency (MAF) of at least 1% (0.01) in the 1000 Genomes Phase 3 dataset. Variants in the Mult. subset (below) are excluded.
    - **ClinVar dbSNP (155)** (==dbSNP variants referenced by ClinVar==)
      approximately 820,000 variants mentioned in ClinVar. Note that this includes benign and pathogenic (as well as uncertain) variants. Variants in the Mult. subset (below) are excluded.
    - **Mult. dbSNP (155)** (==the sequence maps back to more than one chromosome==)
      variants that have been mapped to multiple chromosomes

<br>
<hr>
<br>

## Common
- [Annotating Variant Calls with Information from Databases](https://docs.nvidia.com/clara/parabricks/3.8.0/How-Tos/WholeGenomeGermlineSmallVariants.html#annotating-variant-calls-with-information-from-databases)
  ```bash
  ## Download the 1000Genomes VCF (and .tbi) on hg38
  $ wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20181203_biallelic_SNV/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz
  $ wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20181203_biallelic_SNV/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz.tbi

  ## Download the gnomAD VCF (and .tbi).
  ## We'll use the v2.1.1 exomes release, lifted over to GRCh38
  $ wget https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/liftover_grch38/vcf/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz
  $ wget https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/liftover_grch38/vcf/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz.tbi

  ## Download the ClinVar VCF and index
  $ wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
  $ wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi

  ## Download the dbSNP VCF and index. Make sure to use the proper version;
  ## some versions of dbSNP use alternative naming conventions for contig names
  ## that will prevent proper annotation.
  $ wget https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-All.vcf.gz
  $ wget https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-All.vcf.gz.tbi
  ```
- **1000Genomes**: A database of variants called in 2,504 healthy individuals
- **ClinVar**: An NCBI database of variants associated with disease
- **gnomAD**: The Genome Aggregation Database, containing population variants from 125,748 exomes and 15,708 whole genomes
    - [[Google][Browser] gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz](https://console.cloud.google.com/storage/browser/_details/gcp-public-data--gnomad/release/2.1.1/liftover_grch38/vcf/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz)
- **COSMIC**: Coding and non-coding variants from the Catalogue of Somatic Mutations in Cancer
- **dbSNP**: SNPs, MNPs and indels for both common and clinical mutations

<br>
<hr>
<br>

## Others
### [[UCSC] Short Genetic Variants from dbSNP release 155](https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=dbSnp155Composite)
- [1000Genomes](https://www.internationalgenome.org/): The 1000 Genomes dataset contains data for 2,504 individuals from 26 populations.
- [dbGaP_PopFreq](https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/): The new source of dbGaP aggregated frequency data (>1 Million Subjects) provided by dbSNP.
- [TOPMED](https://www.nhlbiwgs.org/): The TOPMED dataset contains freeze 8 panel that includes about 158,000 individuals. The approximate ethnic breakdown is European (41%), African (31%), Hispanic or Latino (15%), East Asian (9%), and unknown (4%) ancestry.
- [KOREAN](https://academic.oup.com/database/article/doi/10.1093/database/baz146/5775747): The Korean Reference Genome Database contains data for 1,465 Korean individuals.
- [SGDP_PRJ](https://www.simonsfoundation.org/simons-genome-diversity-project/): The Simons Genome Diversity Project dataset contains 263 C-panel fully public samples and 16 B-panel fully public samples for a total of 279 samples.
- [Qatari](https://geneticmedicine.weill.cornell.edu/research/population-genetics): The dataset contains initial mappings of the genomes of more than 1,000 Qatari nationals.
- [NorthernSweden](https://swefreq.nbis.se/dataset/SweGen): The dataset contains 300 whole-genome sequenced human samples from the county of Vasterbotten in northern Sweden.
- [Siberian](https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA267856): The dataset contains paired-end whole-genome sequencing data of 28 modern-day humans from Siberia and Western Russia.
- [TWINSUK](https://twinsuk.ac.uk/): The UK10K - TwinsUK project contains 1854 samples from the Department of Twin Research and Genetic Epidemiology (DTR). The dataset contains data obtained from the 11,000 identical and non-identical twins between the ages of 16 and 85 years old.
- [TOMMO](https://jmorp.megabank.tohoku.ac.jp/201905/downloads/): The Tohoku Medical Megabank Project contains an allele frequency panel of 3552 Japanese individuals, including the X chromosome.
- [ALSPAC](https://www.bristol.ac.uk/alspac/): The UK10K - Avon Longitudinal Study of Parents and Children project contains 1,927 samples, including individuals obtained from the ALSPAC population. This population contains more than 14,000 mothers enrolled during pregnancy in 1991 and 1992.
- [GENOME_DK](https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB19794): The dataset contains the sequencing of Danish parent-offspring trios to determine genomic variation within the Danish population.
- [GnomAD](https://gnomad.broadinstitute.org/): The gnomAD genome dataset includes a catalog containing 602M SNVs and 105M indels based on the whole-genome sequencing of 71,702 samples mapped to the GRCh38 build of the human reference genome.
- [GoNL](https://www.rug.nl/research/genetics/databases/genomeofthenetherlands/): The Genome of the Netherlands (GoNL) Project characterizes DNA sequence variation, common and rare, for SNVs, short insertions and deletions (indels), and large deletions in 769 individuals of Dutch ancestry selected from five biobanks under the auspices of the Dutch hub of the Biobanking and Biomolecular Research Infrastructure (BBMRI-NL).
- [Estonian](https://www.geenivaramu.ee/en): The dataset contains genetic variation in the Estonian population, from a pharmacogenomics study of adverse drug effects using electronic health records.
- [Vietnamese](http://genomes.vn/): The Kinh Vietnamese database contains 24.81 million variants (22.47 million single nucleotide polymorphisms (SNPs) and 2.34 million indels), of which 0.71 million variants are novel.
- [Korea1K](https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA609628): The dataset contains 1,094 Korean personal genomes with clinical information.
- [HapMap](https://hapmap.ncbi.nlm.nih.gov/): (HapMap is being retired.) The International HapMap Project contains samples from African, Asian, or European populations.
- [PRJEB36033](https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB36033): The dataset contains ancient Sardinia genome-wide 1240k capture data from 70 ancient Sardinians.
- [HGDP_Stanford](https://www.hagsc.org/hgdp/): The Stanford HGDP SNP genotyping data consists of ~660,918 tag SNPs in autosomes, chromosome X and Y, the pseudoautosomal region, and mitochondrial DNA, typed across 1043 individuals from all panel populations.
- [Daghestan](https://www.ncbi.nlm.nih.gov/bioproject/576826): The dataset contains genotypes of >550,000 autosomal single-nucleotide polymorphisms (SNPs) in a set of 14 population isolates speaking Nakh-Daghestanian (ND) languages.
- [PAGE_STUDY](https://www.pagestudy.org/): The PAGE Study: How Genetic Diversity Improves Our Understanding of the Architecture of Complex Traits.
- [Chileans](https://www.ncbi.nlm.nih.gov/bioproject/577585): The dataset covers genetic variation in Chileans, using genotype data on ~685,944 SNPs from 313 individuals across the whole continental country.
- [MGP](https://www.clinbioinfosspa.es/content/medical-genome-project): MGP contains aggregated information on 267 healthy individuals, representative of the Spanish population, who were used as controls in the MGP (Medical Genome Project).
- [PRJEB37584](https://www.ncbi.nlm.nih.gov/bioproject/PRJEB37584): The dataset contains a genome-wide genotype analysis that identified copy number variations in cranial meningiomas in Chinese patients, and demonstrated diverse CNV burdens among individuals with diverse clinical features.
- [GoESP (Go-ESP)](https://esp.gs.washington.edu/): The NHLBI Grand Opportunity Exome Sequencing Project (GO-ESP) dataset contains 6503 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data.
- [ExAC](https://exac.broadinstitute.org): The Exome Aggregation Consortium (ExAC) dataset contains 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies.
  Individuals affected by severe pediatric disease have been removed.
- [GnomAD_exomes](https://gnomad.broadinstitute.org/): The gnomAD v2.1 exome dataset comprises a total of 16 million SNVs and 1.2 million indels from 125,748 exomes in 14 populations.
- [FINRISK](https://thl.fi/en/web/thlfi-en/research-and-development/research-and-projects/the-national-finrisk-study): The FINRISK cohorts comprise the respondents of representative, cross-sectional population surveys that have been carried out every 5 years since 1972 to assess the risk factors of chronic diseases (e.g. CVD, diabetes, obesity, cancer) and health behavior in the working-age population.
- [PharmGKB](https://www.pharmgkb.org): The dataset contains aggregated frequency data for all PharmGKB submissions.
- [PRJEB37766](https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB37766): The Mexican Genomic Database for Addiction Research.

<br>

### Misc
> Databases mentioned elsewhere

- ### [The Gene Partnership (TGP)](https://ega-archive.org/studies/phs000495)
    > Mentioned by: 1000genomes

    [Translated from a Google Translate rendering]
    The Gene Partnership (TGP) is a prospective longitudinal registry at Boston Children's Hospital (BCH) that studies genetic and environmental influences on childhood health and disease, collects genetic information on a large number of phenotyped children, and implements the Informed Cohort and the Informed Cohort Oversight Board (ICOB). The term "The Gene Partnership" reflects the partnership between researchers and participants. Children seen at BCH, as well as their parents and siblings, can enroll. DNA is collected from all enrollees. BCH has a comprehensive electronic medical record (EMR) system, and data for nearly all inpatients and outpatients are captured electronically. Clinical data in the BCH EMR are loaded into an i2b2 data warehouse that is available to investigators. Cases, phenotypes, and covariates are ascertained using the i2b2 database. BCH participants in TGP have consented to receive any research results and/or incidental findings arising from uses that are approved by the Informed Cohort Oversight Board (ICOB) and in accord with ...

- ### [Database of Pathogenic Variants](https://dpv.cmg.med.keio.ac.jp/dpv-pub/top)
    Pathogenic variant database
    - [9856](https://dpv.cmg.med.keio.ac.jp/dpv-pub/variants/9856)

- ### [[台基盟生技] Reference databases used by Congenica, p. 4](https://cghdpt.cgmh.org.tw/files/news/40d24ba6-ee49-4109-8aa0-c3c8d105d7ca.pdf)
    - 1000 Genomes: the 1000 Genomes Project
    - ACMG/AMP: American College of Medical Genetics and Genomics / Association for Molecular Pathology
    - ClinVar: database of diseases and genetic tests (20210201 version)
    - dbSNP: single nucleotide polymorphism data
    - GRCh38: human reference genome, build 38
    - gnomAD_genome EAS: variant-site frequency database from whole-genome data, East Asian population
    - PolyPhen2: software that predicts how protein sequence variants relate to function (VEP90 (GRCh38))
    - SIFT: prediction software for nonsynonymous variants (VEP90 (GRCh38))
    - HPO: human phenotype classification database (2020-12-07)
    - VEP: variant annotation software (v.81)
    - Ensembl: genome annotation database (v.81)
    - DecipherCNV: molecular cytology database (SNV, 2020-12)
    - Decipher: Decipher SNV 2021-01
    - Mastermind CVR-2: comprehensive literature reference database (2021-01)
    - Exomiser: variant scoring and ranking system (12.1.0)

<br>

### [NCBI's interactions with Locus-Specific Data Bases (LSDB)](https://www.ncbi.nlm.nih.gov/refseq/rsg/lsdb/)
- dbGaP, database of Genotypes and Phenotypes.
- dbSNP, database of short genetic variation.
- dbVar, database of genomic structural variation.
- ClinVar, database of medically-related variation.
- OMIM, Online Mendelian Inheritance in Man.
- RefSeqGene, this site.

<br>

### ESP6500
- [Genetic Resource Database Feature: ESP6500 (遗传资源数据库专题-ESP6500)](https://posts.careerengine.us/p/607fd7f58230c505cfdf15ac)
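
<br>

### Contig naming check (sketch)
The dbSNP download note in the Common section warns that some dbSNP versions name contigs differently, which prevents proper annotation. Below is a minimal Python sketch of how contig names could be brought to one style before comparing VCFs. It assumes three common styles: bare names (`1`, `X`, `MT`), UCSC-style names (`chr1`), and RefSeq accessions (`NC_000001.11`). The function `normalize_contig` is hypothetical and is not part of Parabricks or any dbSNP tool; names it does not recognize are returned unchanged.

```python
import re

# RefSeq chromosome accessions look like NC_000001.11 (accession.version).
_REFSEQ = re.compile(r"^NC_(\d+)\.\d+$")


def normalize_contig(name: str) -> str:
    """Map a contig name to UCSC style (chr1..chr22, chrX, chrY, chrM).

    Hypothetical helper for illustration; unrecognized names pass through.
    """
    match = _REFSEQ.match(name)
    if match:
        number = int(match.group(1))
        if 1 <= number <= 22:
            return f"chr{number}"
        if number == 23:
            return "chrX"
        if number == 24:
            return "chrY"
        if number == 12920:  # NC_012920 is the human mitochondrial genome
            return "chrM"
        return name
    if name in ("M", "MT", "chrMT"):
        return "chrM"
    if name.startswith("chr"):
        return name
    if name in ("X", "Y") or (name.isdigit() and 1 <= int(name) <= 22):
        return f"chr{name}"
    return name


def shared_contigs(names_a, names_b):
    """Return the normalized contig names present in both collections."""
    return {normalize_contig(n) for n in names_a} & {normalize_contig(n) for n in names_b}
```

For example, `shared_contigs(["1", "2", "X"], ["NC_000001.11", "chrX"])` yields the two contigs both files have in common. For rewriting the files themselves, `bcftools annotate --rename-chrs` with a two-column mapping file is a commonly used option.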