1000genome
===
###### tags: `Genomics/Tertiary Analysis/Database`
###### tags: `Genomics`, `SNP`, `dbSNP`, `Variant`
<br>
[TOC]
<br>
## diff (no dbSNP)
> Annotated with pbrun vcfanno
> Source: /workspace/datasets/variants/1000genomes/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz
### screenshot
> The field order was rearranged to make the diff easier to read
>
![](https://i.imgur.com/ROsaUB6.png)
<br>
### Original HEADER information
- ### FILTER fields
| ID | Description |
|----|-------------|
| LowQual | Low quality |
- ### INFO fields
| ID | Number | Type | Description |
|----|--------|------|-------------|
| AC | A | Integer | Allele count in genotypes, for each ALT allele, in the same order as listed |
| AF | A | Float | Allele Frequency, for each ALT allele, in the same order as listed |
| AN | 1 | Integer | Total number of alleles in called genotypes |
| BaseQRankSum | 1 | Float | Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities |
| DP | 1 | Integer | Approximate read depth; some reads may have been filtered |
| ExcessHet | 1 | Float | Phred-scaled p-value for exact test of excess heterozygosity |
| FS | 1 | Float | Phred-scaled p-value using Fisher's exact test to detect strand bias |
| InbreedingCoeff | 1 | Float | Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation |
| MLEAC | A | Integer | Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed |
| MLEAF | A | Float | Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed |
| MQ | 1 | Float | RMS Mapping Quality |
| MQRankSum | 1 | Float | Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities |
| QD | 1 | Float | Variant Confidence/Quality by Depth |
| ReadPosRankSum | 1 | Float | Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias |
| SOR | 1 | Float | Symmetric Odds Ratio of 2x2 contingency table to detect strand bias |
- ### FORMAT fields
| ID | Number | Type | Description |
|----|--------|------|-------------|
| AD | R | Integer | Allelic depths for the ref and alt alleles in the order listed |
| DP | 1 | Integer | Approximate read depth (reads with MQ=255 or with bad mates are filtered) |
| GQ | 1 | Integer | Genotype Quality |
| GT | 1 | String | Genotype |
| PL | G | Integer | Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification |
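
Tables like the ones above can be produced from the `##INFO` and `##FORMAT` lines of a VCF header. The following is a minimal Python sketch, assuming each header line lists `ID`, `Number`, `Type`, and `Description` in that order. The sample lines are reconstructed from the tables rather than copied from the actual file, and `header_rows` is a hypothetical helper name.

```python
import re

# Matches structured header lines such as
# ##INFO=<ID=AN,Number=1,Type=Integer,Description="...">
# Assumption: keys appear in the order ID, Number, Type, Description.
HEADER_RE = re.compile(
    r'^##(?P<kind>INFO|FORMAT)=<ID=(?P<id>[^,]+),Number=(?P<number>[^,]+),'
    r'Type=(?P<type>[^,]+),Description="(?P<desc>.*?)"'
)


def header_rows(lines):
    """Return (kind, id, number, type, description) tuples for INFO/FORMAT lines."""
    rows = []
    for line in lines:
        m = HEADER_RE.match(line)
        if m:  # FILTER, contig, and other header lines are skipped
            rows.append((m["kind"], m["id"], m["number"], m["type"], m["desc"]))
    return rows


# Sample lines reconstructed from the tables (not copied from the real header).
sample_header = [
    '##FILTER=<ID=LowQual,Description="Low quality">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##INFO=<ID=AN,Number=1,Type=Integer,'
    'Description="Total number of alleles in called genotypes">',
]

rows = header_rows(sample_header)
for kind, id_, number, type_, desc in rows:
    # Emit one markdown table row per header line
    print(f"| {kind} | {id_} | {number} | {type_} | {desc} |")
```

Grouping the rows by `kind` gives one table per header category, which makes it easy to verify that per-sample fields such as GT land under FORMAT and site-level fields such as AN land under INFO.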
<br>
## Annotation fields
> - Annotated with pbrun vcfanno
> - Source file
> /workspace/datasets/variants/1000genomes/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz
> - Version
> ```
> ##fileformat=VCFv4.3
> ##FILTER=<ID=PASS,Description="All filters passed">
> ##fileDate=05122018_15h52m43s
> ##source=IGSRpipeline
> ##reference=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
> ```
- ### Added FILTER fields
| ID | Description |
|----|-------------|
| PASS | All filters passed |
- ### Added contig fields
| ID | length |
|----|--------|
| (omitted) | (omitted) |
- Example: `##contig=<ID=chr1,length=248956422>`
- ### Added INFO fields (`1000g_` is a user-defined prefix)
| ID | Number / Type<br>Description |
|----|------------------------------|
| 1000g_AC | A / Integer<br>Total number of alternate alleles in called genotypes |
| 1000g_AF | A / Float<br>Estimated allele frequency in the range (0,1) |
| 1000g_AFR_AF | A / Float<br>Allele frequency in the AFR populations calculated from AC and AN, in the range (0,1) |
| 1000g_AMR_AF | A / Float<br>Allele frequency in the AMR populations calculated from AC and AN, in the range (0,1) |
| 1000g_AN | 1 / Integer<br>Total number of alleles in called genotypes |
| 1000g_DP | 1 / Integer<br>Approximate read depth; some reads may have been filtered |
| <span style="color: red">1000g_EAS_AF</span> | A / Float<br>Allele frequency in the EAS populations calculated from AC and AN, in the range (0,1)<br>EAS: East Asian population |
| 1000g_EUR_AF | A / Float<br>Allele frequency in the EUR populations calculated from AC and AN, in the range (0,1)<br>EUR: European population |
| 1000g_EX_TARGET | 0 / Flag<br>indicates whether a variant is within the exon pull down target boundaries |
| 1000g_NS | 1 / Integer<br>Number of samples with data |
| 1000g_SAS_AF | A / Float<br>Allele frequency in the SAS populations calculated from AC and AN, in the range (0,1) |
| 1000g_VT | . / String<br>indicates what type of variant the line represents |
- ### Other added header lines
- `##bcftools_normVersion=1.7+htslib-1.9`
- `##bcftools_normCommand=norm -m- -o 0f6003ef32484436467db03a203b30af74d60c4a.norm.vcf /workspace/datasets/germline/HG002/wes-output/output.vcf; Date=Thu Dec 29 13:50:51 2022`
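
The `1000g_`-prefixed INFO fields listed above are the kind of output an annotation config with renamed fields produces. The sketch below builds such a config in Python, assuming pbrun vcfanno accepts the same TOML layout as the open-source vcfanno tool (`[[annotation]]` blocks with `file`, `fields`, `names`, and `ops`). The actual config used for this run is not recorded here, so the choice of ops (`self` to copy a value, `flag` for the Flag-typed EX_TARGET) is an assumption, and `build_config` is a hypothetical helper.

```python
# Source INFO IDs in the 1000 Genomes sites VCF (from the added INFO fields table).
FIELDS = ["AC", "AF", "AFR_AF", "AMR_AF", "AN", "DP",
          "EAS_AF", "EUR_AF", "EX_TARGET", "NS", "SAS_AF", "VT"]
PREFIX = "1000g_"  # user-defined prefix used in this note
SITES_VCF = ("/workspace/datasets/variants/1000genomes/"
             "ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz")


def toml_list(items):
    """Render a list of strings as a TOML array of quoted strings."""
    return "[" + ", ".join(f'"{x}"' for x in items) + "]"


def build_config(vcf_path, fields, prefix):
    """Build one vcfanno-style [[annotation]] block as TOML text."""
    names = [prefix + f for f in fields]
    # Assumption: "flag" for the Flag-typed field, "self" to copy values unchanged.
    ops = ["flag" if f == "EX_TARGET" else "self" for f in fields]
    return "\n".join([
        "[[annotation]]",
        f'file = "{vcf_path}"',
        f"fields = {toml_list(fields)}",
        f"names = {toml_list(names)}",
        f"ops = {toml_list(ops)}",
    ]) + "\n"


config_text = build_config(SITES_VCF, FIELDS, PREFIX)
print(config_text)
```

Keeping the prefix in one constant means every annotated key gets the same namespace, so fields such as AC and DP from the sample VCF are never overwritten by the database values.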
<br>
## Annotation example
> chr1 942451 . T C 472.1 . AC=2;AF=1;AN=2;DP=16;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=29.5;SOR=1.609;1000g_AF=1;1000g_AC=5096;1000g_NS=2548;1000g_AN=5096;1000g_EAS_AF=1;1000g_EUR_AF=1;1000g_AFR_AF=1;1000g_AMR_AF=1;1000g_SAS_AF=1;1000g_VT=SNP;1000g_EX_TARGET;1000g_DP=15856 GT:AD:DP:GQ:PL 1/1:0,16:16:48:486,48,0
> `1000g_` is a user-defined prefix, e.g. AC vs 1000g_AC
> - AC: from the original VCF
> - 1000g_AC: from the 1000 Genomes data
| Column | value |
|--------|-------|
| #CHROM | chr1 |
| POS | 942451 |
| ID | . |
| REF | T |
| ALT | C |
| QUAL | 472.1 |
| FILTER | . |
| INFO | (expanded in the table below) |
| FORMAT | GT:AD:DP:GQ:PL |
| sample | 1/1:0,16:16:48:486,48,0 |
| INFO | Value | Meaning |
|------|-------|---------|
| AC | 2 | Allele count in genotypes, for each ALT allele, in the same order as listed |
| AF | 1 | Allele Frequency, for each ALT allele, in the same order as listed |
| AN | 2 | Total number of alleles in called genotypes |
| DP | 16 | Approximate read depth; some reads may have been filtered |
| ExcessHet | 3.0103 | Phred-scaled p-value for exact test of excess heterozygosity |
| FS | 0 | Phred-scaled p-value using Fisher’s exact test to detect strand bias |
| MLEAC | 2 | Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed |
| MLEAF | 1 | Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed |
| MQ | 60 | RMS Mapping Quality |
| QD | 29.5 | Variant Confidence/Quality by Depth |
| SOR | 1.609 | Symmetric Odds Ratio of 2x2 contingency table to detect strand bias |
| 1000g_AF | 1 | Estimated allele frequency in the range (0,1) |
| 1000g_AC | 5096 | Total number of alternate alleles in called genotypes |
| 1000g_NS | 2548 | Number of samples with data |
| 1000g_AN | 5096 | Total number of alleles in called genotypes |
| 1000g_EAS_AF | 1 | Allele frequency in the EAS populations calculated from AC and AN, in the range (0,1) |
| 1000g_EUR_AF | 1 | Allele frequency in the EUR populations calculated from AC and AN, in the range (0,1) |
| 1000g_AFR_AF | 1 | Allele frequency in the AFR populations calculated from AC and AN, in the range (0,1) |
| 1000g_AMR_AF | 1 | Allele frequency in the AMR populations calculated from AC and AN, in the range (0,1) |
| 1000g_SAS_AF | 1 | Allele frequency in the SAS populations calculated from AC and AN, in the range (0,1) |
| 1000g_VT | SNP | indicates what type of variant the line represents |
| 1000g_EX_TARGET | (flag, no value) | indicates whether a variant is within the exon pull down target boundaries |
| 1000g_DP | 15856 | Approximate read depth; some reads may have been filtered |
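
The split between the sample's own INFO keys and the `1000g_` keys can be reproduced programmatically. This is a minimal Python sketch using the INFO string of the example record. `parse_info` and `split_by_prefix` are hypothetical helpers, values are kept as strings, and a key without `=` is treated as a Flag.

```python
# INFO column of the example record (chr1:942451 T>C).
INFO = ("AC=2;AF=1;AN=2;DP=16;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;"
        "QD=29.5;SOR=1.609;1000g_AF=1;1000g_AC=5096;1000g_NS=2548;1000g_AN=5096;"
        "1000g_EAS_AF=1;1000g_EUR_AF=1;1000g_AFR_AF=1;1000g_AMR_AF=1;"
        "1000g_SAS_AF=1;1000g_VT=SNP;1000g_EX_TARGET;1000g_DP=15856")


def parse_info(info):
    """Parse a VCF INFO string; Flag entries (no '=') map to True."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def split_by_prefix(parsed, prefix="1000g_"):
    """Separate the sample's own keys from the prefixed annotation keys."""
    own = {k: v for k, v in parsed.items() if not k.startswith(prefix)}
    annotated = {k[len(prefix):]: v
                 for k, v in parsed.items() if k.startswith(prefix)}
    return own, annotated


own, kg = split_by_prefix(parse_info(INFO))

print(own["AC"], kg["AC"])                  # sample AC vs 1000 Genomes AC
print(kg["EX_TARGET"])                      # True: the flag is present
print(int(kg["AC"]) / int(kg["AN"]))        # AF recomputed from AC and AN
print(int(kg["AN"]) == 2 * int(kg["NS"]))   # two alleles per sample at this site
```

For this record the recomputed frequency 5096 / 5096 agrees with `1000g_AF=1`, and `1000g_AN` equals twice `1000g_NS`, as expected for a diploid autosomal site.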
<br>
## Detailed field descriptions
| INFO | Example value<br>(type) | Description |
|------|----------------|-----|
| 1000g_EX_TARGET | No value (flag) | indicates whether a variant is within the exon pull down target boundaries |