1000genome
===
###### tags: `Genomics/Tertiary analysis/Databases`
###### tags: `Genomics`, `SNP`, `dbSNP`, `Variant`

<br>

[TOC]

<br>

## diff (no dbsnp)
> Annotated with pbrun vcfanno
> from /workspace/datasets/variants/1000genomes/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz

### screenshot
> The field order was rearranged to make the diff easier to read
> [![](https://i.imgur.com/ROsaUB6.png)](https://i.imgur.com/ROsaUB6.png)

<br>

### Original HEADER information
- ### FILTER fields
    | ID | Description |
    |----|-------------|
    | LowQual | Low quality |

- ### INFO fields
    | ID | Number | Type | Description |
    |----|--------|------|-------------|
    | AC | A | Integer | Allele count in genotypes, for each ALT allele, in the same order as listed |
    | AF | A | Float | Allele Frequency, for each ALT allele, in the same order as listed |
    | AN | 1 | Integer | Total number of alleles in called genotypes |
    | BaseQRankSum | 1 | Float | Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities |
    | DP | 1 | Integer | Approximate read depth; some reads may have been filtered |
    | ExcessHet | 1 | Float | Phred-scaled p-value for exact test of excess heterozygosity |
    | FS | 1 | Float | Phred-scaled p-value using Fisher's exact test to detect strand bias |
    | InbreedingCoeff | 1 | Float | Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation |
    | MLEAC | A | Integer | Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed |
    | MLEAF | A | Float | Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed |
    | MQ | 1 | Float | RMS Mapping Quality |
    | MQRankSum | 1 | Float | Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities |
    | QD | 1 | Float | Variant Confidence/Quality by Depth |
    | ReadPosRankSum | 1 | Float | Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias |
    | SOR | 1 | Float | Symmetric Odds Ratio of 2x2 contingency table to detect strand bias |

- ### FORMAT fields
    | ID | Number | Type | Description |
    |----|--------|------|-------------|
    | AD | R | Integer | Allelic depths for the ref and alt alleles in the order listed |
    | DP | 1 | Integer | Approximate read depth (reads with MQ=255 or with bad mates are filtered) |
    | GQ | 1 | Integer | Genotype Quality |
    | GT | 1 | String | Genotype |
    | PL | G | Integer | Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification |

<br>

## Annotation fields
> - Annotated with pbrun vcfanno
> - Source file
>   /workspace/datasets/variants/1000genomes/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz
> - Version
>   ```
>   ##fileformat=VCFv4.3
>   ##FILTER=<ID=PASS,Description="All filters passed">
>   ##fileDate=05122018_15h52m43s
>   ##source=IGSRpipeline
>   ##reference=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
>   ```

- ### Added FILTER fields
    | ID | Description |
    |----|-------------|
    | PASS | All filters passed |

- ### Added contig fields
    | ID | length |
    |----|--------|
    | (omitted) | (omitted) |

    - Example: `##contig=<ID=chr1,length=248956422>`

- ### Added INFO fields (`1000g_` is a custom prefix)
    | ID | Number / Type<br>Description |
    |----|------------------------------|
    | 1000g_AC | A / Integer<br>Total number of alternate alleles in called genotypes |
    | 1000g_AF | A / Float<br>Estimated allele frequency in the range (0,1) |
    | 1000g_AFR_AF | A / Float<br>Allele frequency in the AFR populations calculated from AC and AN, in the range (0,1) |
    | 1000g_AMR_AF | A / Float<br>Allele frequency in the AMR populations calculated from AC and AN, in the range (0,1) |
    | 1000g_AN | 1 / Integer<br>Total number of alleles in called genotypes |
    | 1000g_DP | 1 / Integer<br>Approximate read depth; some reads may have been filtered |
    | <span style="color: red">1000g_EAS_AF</span> | A / Float<br>Allele frequency in the EAS populations calculated from AC and AN, in the range (0,1)<br>EAS: East Asian population |
    | 1000g_EUR_AF | A / Float<br>Allele frequency in the EUR populations calculated from AC and AN, in the range (0,1)<br>EUR: European population |
    | 1000g_EX_TARGET | 0 / Flag<br>indicates whether a variant is within the exon pull down target boundaries |
    | 1000g_NS | 1 / Integer<br>Number of samples with data |
    | 1000g_SAS_AF | A / Float<br>Allele frequency in the SAS populations calculated from AC and AN, in the range (0,1) |
    | 1000g_VT | . / String<br>indicates what type of variant the line represents |

- ### Other added header lines
    - `##bcftools_normVersion=1.7+htslib-1.9`
    - `##bcftools_normCommand=norm -m- -o 0f6003ef32484436467db03a203b30af74d60c4a.norm.vcf /workspace/datasets/germline/HG002/wes-output/output.vcf; Date=Thu Dec 29 13:50:51 2022`

<br>

## Annotation example
> chr1 942451 . T C 472.1 . AC=2;AF=1;AN=2;DP=16;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=29.5;SOR=1.609;1000g_AF=1;1000g_AC=5096;1000g_NS=2548;1000g_AN=5096;1000g_EAS_AF=1;1000g_EUR_AF=1;1000g_AFR_AF=1;1000g_AMR_AF=1;1000g_SAS_AF=1;1000g_VT=SNP;1000g_EX_TARGET;1000g_DP=15856 GT:AD:DP:GQ:PL 1/1:0,16:16:48:486,48,0

> `1000g_` is a custom prefix that keeps the annotated fields apart from the original ones, for example AC vs 1000g_AC:
> - AC: from the original VCF
> - 1000g_AC: from the 1000genomes sites VCF

| Column | Value |
|--------|-------|
| #CHROM | chr1 |
| POS | 942451 |
| ID | . |
| REF | T |
| ALT | C |
| QUAL | 472.1 |
| FILTER | . |
| INFO | (expanded in the table below) |
| FORMAT | GT:AD:DP:GQ:PL |
| sample | 1/1:0,16:16:48:486,48,0 |

| INFO | Value | Meaning |
|------|-------|---------|
| AC | 2 | Allele count in genotypes, for each ALT allele, in the same order as listed |
| AF | 1 | Allele Frequency, for each ALT allele, in the same order as listed |
| AN | 2 | Total number of alleles in called genotypes |
| DP | 16 | Approximate read depth; some reads may have been filtered |
| ExcessHet | 3.0103 | Phred-scaled p-value for exact test of excess heterozygosity |
| FS | 0 | Phred-scaled p-value using Fisher's exact test to detect strand bias |
| MLEAC | 2 | Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed |
| MLEAF | 1 | Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed |
| MQ | 60 | RMS Mapping Quality |
| QD | 29.5 | Variant Confidence/Quality by Depth |
| SOR | 1.609 | Symmetric Odds Ratio of 2x2 contingency table to detect strand bias |
| 1000g_AF | 1 | Estimated allele frequency in the range (0,1) |
| 1000g_AC | 5096 | Total number of alternate alleles in called genotypes |
| 1000g_NS | 2548 | Number of samples with data |
| 1000g_AN | 5096 | Total number of alleles in called genotypes |
| 1000g_EAS_AF | 1 | Allele frequency in the EAS populations calculated from AC and AN, in the range (0,1) |
| 1000g_EUR_AF | 1 | Allele frequency in the EUR populations calculated from AC and AN, in the range (0,1) |
| 1000g_AFR_AF | 1 | Allele frequency in the AFR populations calculated from AC and AN, in the range (0,1) |
| 1000g_AMR_AF | 1 | Allele frequency in the AMR populations calculated from AC and AN, in the range (0,1) |
| 1000g_SAS_AF | 1 | Allele frequency in the SAS populations calculated from AC and AN, in the range (0,1) |
| 1000g_VT | SNP | indicates what type of variant the line represents |
| 1000g_EX_TARGET | | indicates whether a variant is within the exon pull down target boundaries |
| 1000g_DP | 15856 | Approximate read depth; some reads may have been filtered |

<br>

## Detailed field descriptions

| INFO | Example value<br>(type) | Description |
|------|----------------|-----|
| 1000g_EX_TARGET | no value (flag) | indicates whether a variant is within the exon pull down target boundaries<br>As a flag it carries no value: its presence on a record means the variant lies inside those target boundaries |
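
To make the split between original and `1000g_`-prefixed INFO keys concrete, here is a minimal Python sketch that parses the INFO string of the example record above. The helper names `parse_info` and `split_by_prefix` are hypothetical (they are not part of pbrun or vcfanno), and the sketch assumes one ALT allele per record, which the `bcftools norm -m-` step in the header suggests. It also cross-checks that `1000g_AF` agrees with `1000g_AC / 1000g_AN` for this record.

```python
# INFO column copied from the example record (chr1:942451 T>C).
RECORD_INFO = (
    "AC=2;AF=1;AN=2;DP=16;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=29.5;"
    "SOR=1.609;1000g_AF=1;1000g_AC=5096;1000g_NS=2548;1000g_AN=5096;"
    "1000g_EAS_AF=1;1000g_EUR_AF=1;1000g_AFR_AF=1;1000g_AMR_AF=1;1000g_SAS_AF=1;"
    "1000g_VT=SNP;1000g_EX_TARGET;1000g_DP=15856"
)
PREFIX = "1000g_"  # custom prefix chosen for the 1000genomes annotation


def parse_info(info):
    """Turn a VCF INFO string into a dict; flags (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # empty INFO column
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def split_by_prefix(fields, prefix=PREFIX):
    """Separate original keys from annotated keys (prefix stripped)."""
    original = {k: v for k, v in fields.items() if not k.startswith(prefix)}
    annotated = {
        k[len(prefix):]: v for k, v in fields.items() if k.startswith(prefix)
    }
    return original, annotated


fields = parse_info(RECORD_INFO)
original, kg = split_by_prefix(fields)

# Allele frequency recomputed from the 1000genomes counts (single ALT assumed).
kg_af_from_counts = int(kg["AC"]) / int(kg["AN"])

print(original["AC"], kg["AC"])  # sample-level AC vs 1000genomes AC
print(kg["EX_TARGET"])           # flag is present, so True
print(kg_af_from_counts)
```

Values are kept as strings on purpose, because the declared `Number` and `Type` of each key live in the VCF header and this sketch does not read it.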