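## Sequence Variant Nomenclature
Sequence variant nomenclature is the discipline of naming variants. The rules of the Human Genome Variation Society (HGVS) are the naming convention currently accepted across the academic community[[1]](https://kknews.cc/news/53oeqvk.html).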
---
## Level
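According to the [central dogma](https://zh.wikipedia.org/zh-tw/%E4%B8%AD%E5%BF%83%E6%B3%95%E5%89%87) of molecular biology, i.e. <font color="red">"DNA → RNA → protein"</font>, the same mutation can be written in several ways depending on the level at which it is described: both the choice of reference sequence and the level of description (DNA, RNA, or protein) change how the variant is expressed.
The reference sequences in common use are: ==genomic reference sequences (prefix "g.")==, ==coding DNA reference sequences (prefix "c.")==, non-coding DNA reference sequences (prefix "n."), RNA reference sequences (prefix "r."), and ==protein reference sequences (prefix "p.")==.
In everyday use, the prefixes ==g.==, ==c.==, and ==p.== are the most common.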
### genomic reference (g.)
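Every base is numbered linearly along each chromosome, from the p-arm (short arm) to the q-arm (long arm). The coordinate depends on the reference assembly: the same variant can have different coordinates in the human genome builds hg19 and hg38.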
### coding sequence (c.)
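Positions are numbered along each gene's coding sequence: numbering starts at the first nucleotide of the translation initiation codon (start codon) and ends at the last nucleotide of the translation termination codon (stop codon). A gene can have several coding transcripts, so the same variant may receive different coding sequence positions depending on the transcript used.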
### protein sequence (p.)
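The coding sequence of each gene can be translated into its protein sequence according to the [genetic code](https://zh.wikipedia.org/zh-hant/%E9%81%97%E4%BC%A0%E5%AF%86%E7%A0%81).
A gene with several coding transcripts can therefore produce different proteins, and the protein sequence numbering changes accordingly.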
---
## Numbering
The figure below is a simple HGVS illustration of the numbering rules at three levels, genomic reference (g.), coding sequence (c.), and protein sequence (p.), across different regions (exon, intron, UTR)[[2]](https://varnomen.hgvs.org/bg-material/numbering/).

### Exonic
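Setting the other regions aside and taking an exon as the example, the variant marked with the red box in the figure above is named as follows at each level: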
:::info
* Genomic reference sequence: **g.306C>A**.
* Coding DNA reference sequence: **c.6C>A**.
* Protein reference sequence: **p.D2E**.
:::
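
The link between the c. and p. names can be sketched in a few lines of Python. This is only an illustration: `TOY_CDS` is a hypothetical coding sequence chosen so that codon 2 is GAC (Asp), and `protein_change` is a made-up helper name, not part of any HGVS tool.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical toy coding sequence: Met-Asp-Gln-stop.
TOY_CDS = "ATGGACCAGTAA"


def protein_change(cds, c_pos, ref, alt):
    """Return the one-letter p. name for the substitution c.<c_pos><ref>><alt>."""
    index = c_pos - 1                      # c. positions are 1-based
    if cds[index] != ref:
        raise ValueError("reference base does not match the coding sequence")
    codon_number = index // 3 + 1          # codon 1 covers c.1 to c.3
    start = (codon_number - 1) * 3
    old_codon = cds[start:start + 3]
    within = index - start                 # 0, 1 or 2 inside the codon
    new_codon = old_codon[:within] + alt + old_codon[within + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return f"p.{old_aa}{codon_number}="   # synonymous change
    return f"p.{old_aa}{codon_number}{new_aa}"


print(protein_change(TOY_CDS, 6, "C", "A"))
```

Position c.6 is the third base of codon 2, so changing GAC to GAA turns Asp into Glu, which matches the p.D2E name above.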
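In practice, a variant name follows the rules above and also carries information such as the chromosome accession, the coding DNA reference sequence version, and the protein sequence version.
Take **BRAF V600E**, a variant common in cancer, as an example: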
:::info
* Genomic reference sequence: **NC_000007.13:g.140453136A>T**
> Nucleotide A at position 140453136 of chromosome 7 is replaced by T (GRCh37).
* Coding DNA reference sequence: **NM_004333.6(BRAF):c.1799T>A**
> Nucleotide T at coding sequence position 1799 of the BRAF coding transcript *NM_004333.6* is replaced by A.
* Protein reference sequence: **NP_004324.2:p.V600E**
> Amino acid V (valine) at protein sequence position 600 of the BRAF protein reference *NP_004324.2* is replaced by E (glutamate).
:::
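
A full name of this kind can be split into its parts with a small regular expression. The sketch below is an assumption-laden illustration: `split_hgvs` and `HGVS_PATTERN` are hypothetical names, and the pattern only covers the five prefixes listed in this note and accession styles like the ones in the BRAF example.

```python
import re

# Accession.version, optional (GENE), a one-letter level prefix, then the description.
HGVS_PATTERN = re.compile(
    r"^(?P<reference>[A-Z]{2}_\d+\.\d+)"   # e.g. NM_004333.6
    r"(?:\((?P<gene>[^)]+)\))?"            # optional gene symbol, e.g. (BRAF)
    r":(?P<level>[gcnrp])\."               # g., c., n., r. or p.
    r"(?P<description>.+)$"                # e.g. 1799T>A
)


def split_hgvs(name):
    """Split a full variant name into reference, gene, level and description."""
    match = HGVS_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not recognised by this simplified pattern: {name!r}")
    return match.groupdict()


for name in (
    "NC_000007.13:g.140453136A>T",
    "NM_004333.6(BRAF):c.1799T>A",
    "NP_004324.2:p.V600E",
):
    print(split_hgvs(name))
```

The `gene` field is `None` when the name carries no gene symbol in parentheses.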
### Intronic
>* nucleotides at the 5’ end of an intron are numbered relative to the last nucleotide of the directly upstream exon, followed by a “+” (plus) and their position in to the intron, like c.87+1, c.87+2, c.87+3, …
>* nucleotides at the 3’ end of an intron are numbered relative to the first nucleotide of the directly downstream exon, followed by a “-” (minus) and their position out of the intron, like …, c.88-3, c.88-2, c.88-1.
>
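When a variant lies in an intron, it is named by its position relative to the nearest exon: positions near the 5' end of the intron use "+", and positions near the 3' end use "-".
Taking the variants marked with the green and blue boxes in the figure above as examples:
* The green-box variant is named at each level as:
>g.315G>C, c.12+3G>C, p.?
* The blue-box variant is named at each level as:
>g.411A>G, c.13-2A>G, p.?

Positions very close to an exon (a commonly used window is ±3) may affect RNA splicing, and these regions are usually referred to separately as splice sites.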
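
The "+" and "-" rule can be sketched as a small function. The exon boundaries used here (last exon base at g.312 = c.12, next exon starting at g.413 = c.13) are inferred from the two example names above rather than read from the figure, and `intronic_c_position` is a hypothetical helper that assumes a plus-strand transcript.

```python
def intronic_c_position(g_pos, last_exon_g, last_exon_c, next_exon_g):
    """Name an intronic base relative to the closer exon boundary.

    last_exon_g / last_exon_c: g. and c. position of the last base of the upstream exon.
    next_exon_g: g. position of the first base of the downstream exon.
    """
    if not last_exon_g < g_pos < next_exon_g:
        raise ValueError("position is not inside this intron")
    intron_length = next_exon_g - last_exon_g - 1
    offset_from_5prime = g_pos - last_exon_g
    # The 5' half uses "+"; for an odd-length intron the central base also uses "+".
    if offset_from_5prime <= (intron_length + 1) // 2:
        return f"c.{last_exon_c}+{offset_from_5prime}"
    return f"c.{last_exon_c + 1}-{next_exon_g - g_pos}"


print(intronic_c_position(315, 312, 12, 413))  # green box
print(intronic_c_position(411, 312, 12, 413))  # blue box
```

With these inferred boundaries the two calls reproduce c.12+3 and c.13-2.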
### UTR (untranslated region)
> * nucleotides upstream (5’) of the ATG-translation initiation codon (start) are marked with a “-” (minus) and numbered c.-1, c.-2, c.-3, etc. (i.e. going further upstream)
>* nucleotides downstream (3’) of the translation termination codon (stop) are marked with a “*” (asterisk) and numbered c.*1, c.*2, c.*3, etc. (i.e. going further downstream)
>* there is no nucleotide c.0.
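When a variant lies in a UTR, it is named by its position relative to the translation initiation codon or the translation termination codon: positions in the 5' UTR use "-", and positions in the 3' UTR use "*".
Taking the variants marked with the orange and purple boxes in the figure above as examples:
* The orange-box variant is named at each level as:
> g.299C>G, c.-2C>G, p.?
* The purple-box variant is named at each level as:
> g.1633A>T, c.*3A>T, p.?

The effect of intronic and UTR variants on the protein sequence cannot be derived from the reference sequence alone, so the protein-level name is usually written as "p.?".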
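
A minimal sketch of the UTR numbering, including the "no c.0" rule. The anchor positions (A of the ATG at g.301, last base of the stop codon at g.1630) are inferred from the example names above, and `utr_c_position` is a hypothetical helper that assumes a plus-strand transcript with no intron between the queried base and the anchoring codon.

```python
def utr_c_position(g_pos, atg_first_g, stop_last_g):
    """Name a UTR base relative to the start codon (5' UTR) or stop codon (3' UTR)."""
    if g_pos < atg_first_g:
        # c.1 is the A of the ATG, so the base just upstream is c.-1: there is no c.0.
        return f"c.-{atg_first_g - g_pos}"
    if g_pos > stop_last_g:
        # The base just downstream of the stop codon is c.*1.
        return f"c.*{g_pos - stop_last_g}"
    raise ValueError("position lies between the start and stop codon, not in a UTR")


print(utr_c_position(299, 301, 1630))   # orange box
print(utr_c_position(1633, 301, 1630))  # purple box
```

Under these assumptions the two calls reproduce c.-2 and c.*3.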
---
## Variant type (DNA level)
Each variant type (Substitution, Deletion, Insertion, Duplication, Deletion-insertion, and so on) has its own naming rule, as shown in the figure below:

### Substitution
> * Definition: a sequence change where, compared to a reference sequence, ==one nucleotide is replaced by one other nucleotide==.
> * Format: "prefix" "position_substituted" "reference_nucleotide" ">" "new_nucleotide", e.g. g.123A>G, <font color="red">c.6T>C</font>
### Deletion
> * Definition: a sequence change where, compared to a reference sequence, ==one or more nucleotides are not present (deleted)==.
> * Format: “prefix” “position(s)_deleted” “del”, e.g. g.123_127del, <font color="red">c.4del</font>
### Insertion
> * Definition: a sequence change where, compared to the reference sequence, ==one or more nucleotides are inserted and where the insertion is not a copy of a sequence== immediately 5'.
> * Format: “prefix”“positions_flanking”“ins”“inserted_sequence”, e.g. g.123_124insAGC, <font color="red">c.3_4insC</font>
### Duplication
> * Definition: a sequence change where, compared to a reference sequence, ==a copy of one or more nucleotides are inserted directly 3' of the original copy of that sequence==.
> * Format: “prefix”“position(s)_duplicated”“dup”, e.g. g.123_345dup, <font color="red">c.6dup</font>
### Deletion-insertion
> * Definition: a sequence change where, compared to a reference sequence, ==one or more nucleotides are replaced by one or more other nucleotides== and which is not a substitution, inversion or conversion.
> * Format: “prefix”“position(s)_deleted”“delins”“inserted_sequence”, e.g. g.123_127delinsAG, <font color="red">c.4delinsAC</font>
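
The five DNA-level formats above can be told apart with simple patterns. The sketch below is a simplification: `classify_dna_variant` is a hypothetical helper that only handles plain numeric positions (no intronic offsets, UTR positions, or reference accessions).

```python
import re

# One anchored pattern per format listed above.
_PATTERNS = [
    ("substitution", re.compile(r"^\d+[ACGT]>[ACGT]$")),
    ("deletion-insertion", re.compile(r"^\d+(_\d+)?delins[ACGT]+$")),
    ("deletion", re.compile(r"^\d+(_\d+)?del$")),
    ("duplication", re.compile(r"^\d+(_\d+)?dup$")),
    ("insertion", re.compile(r"^\d+_\d+ins[ACGT]+$")),
]


def classify_dna_variant(name):
    """Return the variant type for a name such as 'g.123_127del'."""
    prefix, _, description = name.partition(".")
    if prefix not in {"g", "c", "n"} or not description:
        raise ValueError(f"expected a g., c. or n. name, got {name!r}")
    for label, pattern in _PATTERNS:
        if pattern.match(description):
            return label
    return "unrecognized"


for example in ("g.123A>G", "c.4del", "c.3_4insC", "c.6dup", "c.4delinsAC"):
    print(example, "->", classify_dna_variant(example))
```

Because every pattern is anchored at both ends, "delins" is never mistaken for a plain "del" or "ins".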
## Variant type (Protein level)
Each variant type (Substitution, Deletion, Insertion, Duplication, Frameshift, and so on) has its own naming rule, as shown in the figure below:

### Substitution
> * Definition: a sequence change where, compared to a reference sequence, ==one amino acid is replaced by one other amino acid==.
> * Format: “prefix”“amino_acid”“position”“new_amino_acid”, e.g. p.Arg54Ser
>### Missense
>> p.D3E / p.Asp3Glu
>### Nonsense
>> p.Q4X / p.Q4* / p.Gln4Ter
>### Synonymous
>> p.D3= / p.Asp3=
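
The one-letter and three-letter spellings above describe the same change, so one can be derived from the other. This is a small sketch with a hypothetical helper name, `to_three_letter`; it only handles single-residue substitutions and treats the older "X" stop symbol as equivalent to "*".

```python
import re

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def to_three_letter(name):
    """Convert e.g. 'p.D3E' to 'p.Asp3Glu'; '=' (synonymous) is kept as is."""
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z*=])", name)
    if match is None:
        raise ValueError(f"not a simple one-letter substitution: {name!r}")
    ref, position, alt = match.groups()
    if alt == "X":          # older spelling of a stop codon
        alt = "*"
    new = "=" if alt == "=" else ONE_TO_THREE[alt]
    return f"p.{ONE_TO_THREE[ref]}{position}{new}"


for example in ("p.D3E", "p.Q4X", "p.Q4*", "p.D3="):
    print(example, "->", to_three_letter(example))
```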
### Non-frameshift deletion/duplication/insertion/deletion-insertion
> * Definition: a sequence change between the translation initiation (start) and termination (stop) codon where, compared to a reference sequence, one or more amino acids are deleted/duplicated/inserted.
> * Format: “prefix”“amino_acid(s)+position(s)_deleted”“del”, e.g. p.Cys76_Glu79del
> * Format: “prefix”“amino_acid(s)+position(s)_duplicated”“dup”, e.g. p.Cys76_Glu79dup
> * Format: “prefix”“amino_acids+positions_flanking”“ins”“inserted_sequence”, e.g. p.Lys23_Leu24insArgSerGln
> * Format: “prefix”“amino_acid(s)+position(s)_deleted”“delins”“inserted_sequence”, e.g. p.Arg123_Lys127delinsSerAsp
> * AKA ==in-frame== deletion/duplication/insertion/deletion-insertion, which doesn't lead to the shifting of reading frame.
### Frameshift insertion/deletion/duplication
> * Definition: a sequence change between the translation initiation (start) and termination (stop) codon where, compared to a reference sequence, translation shifts to another reading frame.
> * Format: “prefix”“amino_acid”“position”“new_amino_acid”“fs”“Ter”“position_termination_site”, e.g. p.(Arg123LysfsTer34)
> * The shortest frame shift variant possible contains “fsTer2”
> * Example: p.Arg97ProfsTer23 (short p.Arg97fs) / p.Arg97Profs*23: a variant with Arg97 as the first amino acid changed, shifting the reading frame, replacing it for a Pro and terminating at position Ter23.
### Alleles
Square brackets indicate whether variants are in-cis or in-trans
> NM_004006.2:c.[2376G>C;3103del] => in-cis
> NP_003997.1:p.[Ser68Arg;Asn594del] => in-cis
> NM_004006.2:c.[2376G>C];[3103del] => in-trans
> NP_003997.1:p.[Ser68Arg];[Asn594del] => in-trans
==Parentheses== indicate protein changes that are ==not experimentally confirmed==
> Predicted consequences, i.e. without experimental evidence (no RNA or protein sequence analysed), should be given in parentheses inside the square brackets.
> NP_003997.1:p.[(Ser68Arg;Asn594del)] => in-cis, predicted consequence
> NP_003997.1:p.[(Ser68Arg)];[(Asn594del)] => in-trans, predicted consequence
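
The bracket conventions above can be read mechanically: variants separated by ";" inside one pair of brackets share an allele, while separate bracket pairs are separate alleles. The sketch below uses hypothetical helper names (`parse_alleles`, `phase`) and only covers the forms shown in this section.

```python
import re


def parse_alleles(description):
    """Return one dict per bracketed allele: its variants and a 'predicted' flag."""
    groups = re.findall(r"\[([^\]]*)\]", description)
    if not groups:
        raise ValueError("no square brackets found")
    alleles = []
    for group in groups:
        # Parentheses inside the brackets mark a predicted consequence.
        predicted = group.startswith("(") and group.endswith(")")
        inner = group[1:-1] if predicted else group
        alleles.append({"variants": inner.split(";"), "predicted": predicted})
    return alleles


def phase(description):
    """One bracket pair means in-cis; two or more bracket pairs mean in-trans."""
    return "in-trans" if len(parse_alleles(description)) > 1 else "in-cis"


print(phase("NM_004006.2:c.[2376G>C;3103del]"))
print(phase("NM_004006.2:c.[2376G>C];[3103del]"))
print(parse_alleles("NP_003997.1:p.[(Ser68Arg;Asn594del)]"))
```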
### 3 prime rule

https://www.sophiagenetics.com/science-hub/hgvs-nomenclature/
### Reference:
https://kknews.cc/news/53oeqvk.html
https://varnomen.hgvs.org/
https://onlinelibrary.wiley.com/doi/epdf/10.1002/%28SICI%291098-1004%28200001%2915%3A1%3C7%3A%3AAID-HUMU4%3E3.0.CO%3B2-N
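## Supplement
1. The reference sequence is numbered (left to right, short arm to long arm, p to q) in the 5' to 3' direction.
2. The reference sequence is the plus strand.
3. If the transcribed sequence (coding sequence) is identical to the reference sequence, the template strand is the minus strand.
4. If the transcribed sequence is the reverse complement of the reference sequence, the template strand is the plus strand.
5. Transcription proceeds 5' to 3' along the synthesized strand (i.e. the transcribed sequence, the coding sequence).
6. The 5' side of the transcribed sequence is upstream, and the 3' side is downstream.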
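
These strand rules explain why the BRAF example reads g.140453136A>T on the genome but c.1799T>A on the transcript: BRAF is transcribed from the minus strand, so each base is complemented. A minimal sketch, with hypothetical helper names:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def substitution_on_other_strand(change):
    """Rewrite a substitution such as 'T>A' as seen from the opposite strand."""
    ref, alt = change.split(">")
    return f"{reverse_complement(ref)}>{reverse_complement(alt)}"


print(substitution_on_other_strand("T>A"))  # coding-strand T>A seen on the plus strand
print(reverse_complement("ATGC"))
```

Only the bases are converted here; mapping the position itself between c. and g. coordinates needs the transcript's exon structure.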
###### tags: `genomics`