Tertiary analysis / vcfanno
===
###### tags: `Genomics/Tertiary analysis`
###### tags: `Bioinformatics`, `Genomics`, `Tertiary analysis`, `vcfanno`

<br>

[TOC]

<br>

## Introduction

| Q | A |
|---|---|
| Purpose | Annotate a VCF file against one or more variant databases |
| License | MIT |
| GitHub | https://github.com/brentp/vcfanno |
| Docs | http://brentp.github.io/vcfanno/ |
| Paper | https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5 |

<br>

## How it works

- ### Annotation concept
  [![](https://i.imgur.com/1ftgxxu.png)](https://i.imgur.com/1ftgxxu.png)

- ### How vcfanno operates
  [![](https://i.imgur.com/3sljNXk.png)](https://i.imgur.com/3sljNXk.png)
  - Part B
    - Configure the annotation sources
    - For each source, pick the fields you want
    - For each field, choose how overlapping values are aggregated

      | op | Meaning |
      |----|-----|
      | self | Copy the original value as is |
      | mean | Mean of the field's values |
      | max | Maximum of the field's values |
      | min | Minimum of the field's values |
      | sum | Sum of the field's values |
      | ... | ... |
      | lua:$lua | Custom function |

<br>

## Installation

- Download the binary directly
  - https://github.com/brentp/vcfanno/releases/
  - [vcfanno_linux64](https://github.com/brentp/vcfanno/releases/download/v0.3.5/vcfanno_linux64) (8.93 MB)

- Via conda
  ```bash
  #$ conda install vcfanno
  $ conda install -c bioconda vcfanno -y
  ```

<br>

## Usage

### Prepare `conf.toml`
```toml
[[annotation]]
file="/workspace/datasets/variants/ncbi/clinvar/clinvar.vcf.gz"
# ID and FILTER are special fields that pull the ID and FILTER columns from the VCF
fields=["CLNHGVS", "GENEINFO", "MC"]
ops=["self", "self", "self"]
names=["clinvar_CLNHGVS", "clinvar_GENEINFO", "clinvar_MC"]

[[annotation]]
file="/workspace/datasets/variants/1000genomes/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz"
# ID and FILTER are special fields that pull the ID and FILTER columns from the VCF
fields=["EAS_AF"]
ops=["self"]
# names sets the INFO key written to the output (1000g_EAS_AF in the example below)
names=["1000g_EAS_AF"]
```

### Run the annotation
```
$ vcfanno conf.toml output.vcf > annotated.vcf
```

Here `output.vcf` is the VCF to be annotated (for example, the output of a variant caller), and the annotated result is written to `annotated.vcf`.

<br>

## Before and after annotation

- ### Original VCF:
  <span style="font-family: courier new">chr1 942451 . T C 472.06 . AC=2;AF=1.00;AN=2;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.50;SOR=1.609 GT:AD:DP:GQ:PL 1/1:0,16:16:48:486,48,0 </span>

- ### Annotated with vcfanno:
  <span style="font-family: courier new">chr1 942451 . T C 472.1 . AC=2;AF=1.00;AN=2;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.50;SOR=1.609;<span style="background: rgb(255 240 128);">clinvar_CLNHGVS=NC_000001.11:g.942451T>C;clinvar_GENEINFO=SAMD11:148398;clinvar_MC=SO:0001583|missense_variant;</span><span style="background: rgb(200 255 240);">1000g_EAS_AF=1</span> GT:AD:DP:GQ:PL 1/1:0,16:16:48:486,48,0</span>

- ### Annotated with pbrun vcfanno:
  <span style="font-family: courier new">chr1 942451 . T C 472.1 . AC=2;AF=1;AN=2;DP=16;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=29.5;SOR=1.609;==clinvar_ALLELEID=1153723;clinvar_CLNDN=not_provided;clinvar_CLNDISDB=MedGen:CN517202;clinvar_CLNHGVS=NC_000001.11:g.942451T>C;clinvar_CLNREVSTAT=criteria_provided,_single_submitter;clinvar_CLNSIG=Benign;clinvar_CLNVC=single_nucleotide_variant;clinvar_CLNVCSO=SO:0001483;clinvar_GENEINFO=SAMD11:148398;clinvar_MC=SO:0001583|missense_variant;clinvar_ORIGIN=1== GT:AD:DP:GQ:PL 1/1:0,16:16:48:486,48,0</span>

<br>

## Advanced parameters

- [Performance Tips](http://brentp.github.io/vcfanno/performance_tips/)
  ```
  GOGC=2000 vcfanno -p 12 a.conf a.vcf
  ```

<br>

## Reference sequence limitations

### vcfanno only supports chromosome names of the form `chr1` or `1`
- [inconsistent chromosome labeling (“chr1” versus “1”)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5)
  ![](https://i.imgur.com/xtEm6Wh.png)

- Ways to convert chromosome names
  - [ChromosomeMappings](https://github.com/dpryan79/ChromosomeMappings)
    - [GRCh37_NCBI2UCSC.txt](https://github.com/dpryan79/ChromosomeMappings/blob/master/GRCh37_NCBI2UCSC.txt)
  - [VCF: Replacing RefSeq ID to chr in #CHROM](https://www.biostars.org/p/410789/)

<br>

## Test sets

### HG002
- HG002.novaseq.wes_agilent.100x.R1.fastq.gz
  - 35111
  - 22 (in head)
- HG002.hiseqx.pcr-free.30x.R1.fastq.gz
  - 40598
  - 22 (in head)
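
<br>

## Appendix: chromosome renaming sketch

When a VCF uses RefSeq accessions such as `NC_000001.11` in `#CHROM`, the names have to be converted before vcfanno can match them against annotation files that use `chr1` or `1`. The following is a minimal Python sketch of that conversion. It assumes a two-column, tab-separated mapping file in the style of the ChromosomeMappings repository (source name, then target name) and an uncompressed VCF read line by line. The function names `load_mapping` and `rename_line` are illustrative and not part of vcfanno or any library.

```python
import re
import sys

# Matches the ID at the start of a "##contig=<ID=...>" header line.
_CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def load_mapping(lines):
    """Parse 'source<TAB>target' lines into a dict.

    Rows without a target name (assumed to mean "no equivalent") are skipped.
    """
    mapping = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2 and parts[0] and parts[1]:
            mapping[parts[0]] = parts[1]
    return mapping


def rename_line(line, mapping):
    """Rename the chromosome in one VCF line; unknown names pass through."""
    if line.startswith("##contig="):
        # Rewrite only the contig ID, keep length and other attributes.
        return _CONTIG_ID.sub(
            lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)),
            line,
            count=1,
        )
    if line.startswith("#"):
        # Other header lines carry no chromosome name.
        return line
    chrom, sep, rest = line.partition("\t")
    return mapping.get(chrom, chrom) + sep + rest


# Small in-memory demonstration (hypothetical two-record input).
demo_mapping = load_mapping(["NC_000001.11\tchr1\n"])
demo_vcf = [
    "##contig=<ID=NC_000001.11,length=248956422>\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "NC_000001.11\t942451\t.\tT\tC\n",
]
for vcf_line in demo_vcf:
    sys.stdout.write(rename_line(vcf_line, demo_mapping))
```

For real data, `bcftools annotate --rename-chrs` performs the same renaming directly on compressed files, as discussed in the Biostars thread linked above; the sketch only shows what the conversion amounts to.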