Tertiary Analysis / vcfanno
===
###### tags: `Genomics/Tertiary Analysis`
###### tags: `Bioinformatics`, `Genomics`, `Tertiary Analysis`, `vcfanno`
<br>
[TOC]
<br>
## Overview
| Item | Description |
|---|---|
| Purpose | Annotate a VCF file using the given variant databases |
| License | MIT |
| GitHub | https://github.com/brentp/vcfanno |
| Docs | http://brentp.github.io/vcfanno/ |
| Paper | https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5 |
<br>
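## How it works
- ### Annotation concept
![](https://i.imgur.com/1ftgxxu.png)
- ### How vcfanno works
![](https://i.imgur.com/3sljNXk.png)
- Part B
- Configure the annotation sources
- For each source, pick the fields you want
- For each field, choose an aggregation operation (`op`)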
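| op | Meaning |
|----|-----|
| self | Use the original value as-is |
| mean | Mean of the field's values |
| max | Maximum of the field's values |
| min | Minimum of the field's values |
| sum | Sum of the field's values |
| ... | ... |
| lua:$lua | Custom function (Lua) |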
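
To make the table concrete, here is a minimal Python sketch of what an aggregation op does when several annotation records overlap one variant. It is not vcfanno's implementation: the `aggregate` function is hypothetical, and the comma-joined behaviour for `self` is an assumption made only for this illustration.

```python
from statistics import mean


def aggregate(op, values):
    """Reduce the values collected from all overlapping annotation records."""
    if op == "self":
        # Assumption for this sketch: keep every value, joined by commas.
        return ",".join(str(v) for v in values)
    # The numeric ops from the table map directly onto Python built-ins.
    reducers = {"mean": mean, "max": max, "min": min, "sum": sum}
    return reducers[op](values)


# Hypothetical values of one field taken from three overlapping records.
overlapping = [10, 20, 30]
for op in ("self", "mean", "max", "min", "sum"):
    print(op, aggregate(op, overlapping))
```

The point is only that every field gets exactly one op, and the op decides how multiple overlapping values collapse into a single INFO entry.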
<br>
## Installation
- Download the binary directly
- https://github.com/brentp/vcfanno/releases/
- [vcfanno_linux64](https://github.com/brentp/vcfanno/releases/download/v0.3.5/vcfanno_linux64) (8.93 MB)
- Via conda
```bash
# vcfanno is distributed through the bioconda channel
$ conda install -c bioconda vcfanno -y
```
<br>
## Usage
### Prepare `conf.toml`
```toml
[[annotation]]
file="/workspace/datasets/variants/ncbi/clinvar/clinvar.vcf.gz"
# ID and FILTER are special fields that pull the ID and FILTER columns from the VCF
fields=["CLNHGVS", "GENEINFO", "MC"]
ops=["self", "self", "self"]
names=["clinvar_CLNHGVS", "clinvar_GENEINFO", "clinvar_MC"]
[[annotation]]
file="/workspace/datasets/variants/1000genomes/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz"
# ID and FILTER are special fields that pull the ID and FILTER columns from the VCF
fields=["EAS_AF"]
ops=["self"]
# output name matching the 1000g_EAS_AF key shown in the annotated example
names=["1000g_EAS_AF"]
```
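
In each `[[annotation]]` block, `fields`, `ops`, and `names` are parallel lists, so a length mismatch is an easy mistake to make. The following is a rough Python sketch of a sanity check, assuming each block has already been loaded into a dict (for example with `tomllib` on Python 3.11+). `check_annotation` is a hypothetical helper, not part of vcfanno, and the file name in the example is shortened for illustration.

```python
def check_annotation(block):
    """Return a list of problems in one [[annotation]] block given as a dict."""
    problems = []
    fields = block.get("fields", [])
    ops = block.get("ops", [])
    names = block.get("names")
    if "file" not in block:
        problems.append("missing 'file'")
    if len(ops) != len(fields):
        problems.append(f"{len(fields)} fields but {len(ops)} ops")
    # Assumption: when names are given, there is one output name per field.
    if names is not None and len(names) != len(fields):
        problems.append(f"{len(fields)} fields but {len(names)} names")
    return problems


# The ClinVar block from conf.toml, with a shortened (hypothetical) file name.
clinvar = {
    "file": "clinvar.vcf.gz",
    "fields": ["CLNHGVS", "GENEINFO", "MC"],
    "ops": ["self", "self", "self"],
    "names": ["clinvar_CLNHGVS", "clinvar_GENEINFO", "clinvar_MC"],
}
print(check_annotation(clinvar))  # prints []
```

Running a check like this before a long annotation job catches list-length typos early.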
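### Run the annotation
```bash
# output.vcf is the query VCF to annotate; the annotated VCF is written to stdout
$ vcfanno conf.toml output.vcf > annotated.vcf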
```
<br>
## Before and after annotation
- ### Original VCF:
<span style="font-family: courier new">chr1 942451 . T C 472.06 . AC=2;AF=1.00;AN=2;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.50;SOR=1.609 GT:AD:DP:GQ:PL 1/1:0,16:16:48:486,48,0
</span>
- ### Annotated with vcfanno:
<span style="font-family: courier new">chr1 942451 . T C 472.1 . AC=2;AF=1.00;AN=2;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.50;SOR=1.609;<span style="background: rgb(255 240 128);">clinvar_CLNHGVS=NC_000001.11:g.942451T>C;clinvar_GENEINFO=SAMD11:148398;clinvar_MC=SO:0001583|missense_variant;</span><span style="background: rgb(200 255 240);">1000g_EAS_AF=1</span> GT:AD:DP:GQ:PL 1/1:0,16:16:48:486,48,0</span>
- ### Annotated with pbrun vcfanno:
<span style="font-family: courier new">chr1 942451 . T C 472.1 . AC=2;AF=1;AN=2;DP=16;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=29.5;SOR=1.609;==clinvar_ALLELEID=1153723;clinvar_CLNDN=not_provided;clinvar_CLNDISDB=MedGen:CN517202;clinvar_CLNHGVS=NC_000001.11:g.942451T>C;clinvar_CLNREVSTAT=criteria_provided,_single_submitter;clinvar_CLNSIG=Benign;clinvar_CLNVC=single_nucleotide_variant;clinvar_CLNVCSO=SO:0001483;clinvar_GENEINFO=SAMD11:148398;clinvar_MC=SO:0001583|missense_variant;clinvar_ORIGIN=1== GT:AD:DP:GQ:PL 1/1:0,16:16:48:486,48,0</span>
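
To check which INFO keys an annotation run added without reading the records by eye, the INFO strings can be compared programmatically. A small Python sketch, using the INFO values of the original and vcfanno-annotated records above; `info_keys` and `added_keys` are hypothetical helper names.

```python
def info_keys(info):
    """Split a VCF INFO string into its ordered list of keys."""
    return [entry.split("=", 1)[0] for entry in info.split(";") if entry]


def added_keys(before, after):
    """Keys present in the annotated INFO but not in the original one."""
    seen = set(info_keys(before))
    return [key for key in info_keys(after) if key not in seen]


before = ("AC=2;AF=1.00;AN=2;DP=16;ExcessHet=3.0103;FS=0.000;"
          "MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.50;SOR=1.609")
after = (before + ";clinvar_CLNHGVS=NC_000001.11:g.942451T>C;"
         "clinvar_GENEINFO=SAMD11:148398;"
         "clinvar_MC=SO:0001583|missense_variant;1000g_EAS_AF=1")

print(added_keys(before, after))
# prints ['clinvar_CLNHGVS', 'clinvar_GENEINFO', 'clinvar_MC', '1000g_EAS_AF']
```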
<br>
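## Advanced options
- [Performance Tips](http://brentp.github.io/vcfanno/performance_tips/): `-p` sets the number of processes to use, and a larger `GOGC` makes Go's garbage collector run less often at the cost of more memory.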
```
GOGC=2000 vcfanno -p 12 a.conf a.vcf
```
<br>
## Reference sequence limitations
### vcfanno only handles chromosome names of the form `chr1` or `1`
- [inconsistent chromosome labeling (“chr1” versus “1”)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5)

- Converting other naming schemes
- [ChromosomeMappings](https://github.com/dpryan79/ChromosomeMappings)
- [GRCh37_NCBI2UCSC.txt](https://github.com/dpryan79/ChromosomeMappings/blob/master/GRCh37_NCBI2UCSC.txt)
- [VCF: Replacing RefSeq ID to chr in #CHROM](https://www.biostars.org/p/410789/)
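
As a rough sketch of such a conversion, the Python below renames chromosomes in VCF lines from a two-column, tab-separated mapping (source name, target name), the layout used by files like `GRCh37_NCBI2UCSC.txt`. The helper names are hypothetical, and skipping entries with an empty target column is an assumption of this sketch. Both the `##contig` header lines and the `#CHROM` column of data lines are rewritten so the header stays consistent with the records.

```python
import re

# Matches a contig header line and captures the ID up to the first ',' or '>'.
CONTIG = re.compile(r"(##contig=<ID=)([^,>]+)(.*)", re.DOTALL)


def load_mapping(lines):
    """Parse 'source<TAB>target' lines into a dict."""
    mapping = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Assumption: an empty target column means "no equivalent name".
        if len(parts) == 2 and parts[1]:
            mapping[parts[0]] = parts[1]
    return mapping


def rename_vcf_line(line, mapping):
    """Rename the chromosome in one VCF line; unknown names are kept as-is."""
    match = CONTIG.match(line)
    if match:
        prefix, name, rest = match.groups()
        return prefix + mapping.get(name, name) + rest
    if line.startswith("#"):
        return line  # other header lines are left untouched
    chrom, sep, tail = line.partition("\t")
    return mapping.get(chrom, chrom) + sep + tail


mapping = load_mapping(["NC_000001.11\tchr1\n", "NC_012920.1\tchrM\n"])
print(rename_vcf_line("NC_000001.11\t942451\t.\tT\tC\n", mapping), end="")
```

Bgzip-compressed files would need to be decompressed, rewritten, then re-compressed and re-indexed; that step is left out here.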
<br>
## Test sets
### HG002
- HG002.novaseq.wes_agilent.100x.R1.fastq.gz
- 35111 - 22(in head)
- HG002.hiseqx.pcr-free.30x.R1.fastq.gz
- 40598 - 22(in head)