## Sequence Variant Nomenclature
Sequence variant nomenclature is the practice of naming sequence variants. The rules of the Human Genome Variation Society (HGVS) are the naming standard currently accepted in the field [[1]](https://kknews.cc/news/53oeqvk.html).

---

## Level
According to the [central dogma](https://zh.wikipedia.org/zh-tw/%E4%B8%AD%E5%BF%83%E6%B3%95%E5%89%87) of molecular biology, i.e. the flow <font color="red">“DNA → RNA → protein”</font>, the same mutation can be written in several ways depending on the level at which it is described. Both the reference sequence used and the level of description (DNA, RNA, or protein) change how a variant is written.

The commonly used reference sequences are: ==the genomic reference sequence (prefix “g.”)==, ==the cDNA (coding DNA) reference sequence (prefix “c.”)==, the non-coding DNA reference sequence (prefix “n.”), the RNA reference sequence (prefix “r.”), and ==the protein reference sequence (prefix “p.”)==.
In everyday use, ==g.==, ==c.== and ==p.== are by far the most common prefixes.

### genomic reference (g.)
Every nucleotide of a chromosome is numbered linearly, from the end of the p-arm (short arm) to the end of the q-arm (long arm). The numbering depends on the reference assembly, so the same variant can have different coordinates in the human genome builds hg19 (GRCh37) and hg38 (GRCh38).

### coding sequence (c.)
Nucleotides are numbered along the coding sequence of a specific transcript. c.1 is the A of the ATG translation start codon, and the numbering ends at the last nucleotide of the stop codon. A gene can have several coding transcripts, so the same variant can receive different c. numbers depending on which transcript is used.

### protein sequence (p.)
Each coding sequence can be translated into a protein sequence using the [genetic code](https://zh.wikipedia.org/zh-hant/%E9%81%97%E4%BC%A0%E5%AF%86%E7%A0%81). If a gene has several coding transcripts, they may encode different proteins, so the p. numbering can change as well.

---

## Numbering
The figure below summarizes the HGVS numbering rules at three levels: genomic reference (g.), coding sequence (c.) and protein sequence (p.). It also covers the different regions: exon, intron and UTR [[2]](https://varnomen.hgvs.org/bg-material/numbering/).

![](https://i.imgur.com/qBEKm8q.png =150%x)

### Exonic
Leaving the other regions aside and taking the exon as an example, the variant in the red box of the figure above is named at each level as follows:
:::info
* Genomic reference sequence: **g.306C>A**
* cDNA reference sequence: **c.6C>A**
* Protein reference sequence: **p.D2E**
:::

In practice, a full variant description also carries extra information on top of these rules. This includes the chromosome and the versions of the cDNA and protein reference sequences.
Taking **BRAF V600E**, a common variant in cancer, as an example:
:::info
* Genomic reference sequence: **NC_000007.13:g.140453136A>T**
  > The nucleotide A at position 140453136 of chromosome 7 is replaced by T (GRCh37).
* cDNA reference sequence: **NM_004333.6(BRAF):c.1799T>A**
  > In the BRAF coding transcript *NM_004333.6*, the nucleotide T at coding position 1799 is replaced by A.
* Protein reference sequence: **NP_004324.2:p.V600E**
  > In the BRAF protein *NP_004324.2*, the amino acid V (valine) at position 600 is replaced by E (glutamate).
:::

### Intronic
> * nucleotides at the 5’ end of an intron are numbered relative to the last nucleotide of the directly upstream exon, followed by a “+” (plus) and their position in to the intron, like c.87+1, c.87+2, c.87+3, …
> * nucleotides at the 3’ end of an intron are numbered relative to the first nucleotide of the directly downstream exon, followed by a “-” (minus) and their position out of the intron, like …, c.88-3, c.88-2, c.88-1.

An intronic variant is named by its position relative to the nearest exon. Positions near the 5' end of the intron use "+", and positions near the 3' end use "-".
Taking the variants in the green and blue boxes of the figure above as examples:
* Green box, named at each level:
  > g.315G>C, c.12+3G>C, p.?
* Blue box, named at each level:
  > g.411A>G, c.13-2A>G, p.?

A position very close to the exon (a common cutoff is within ±3 nucleotides) may affect RNA splicing. Such regions are usually referred to separately as splice sites.

### UTR (untranslated region)
> * nucleotides upstream (5’) of the ATG-translation initiation codon (start) are marked with a “-” (minus) and numbered c.-1, c.-2, c.-3, etc. (i.e. going further upstream)
> * nucleotides downstream (3’) of the translation termination codon (stop) are marked with a “*” (asterisk) and numbered c.*1, c.*2, c.*3, etc. (i.e. going further downstream)
> * there is no nucleotide c.0.

A UTR variant is named by its position relative to the translation start codon or the stop codon. Positions in the 5'UTR use "-", and positions in the 3'UTR use "*".
Taking the variants in the orange and purple boxes of the figure above as examples:
* Orange box, named at each level:
  > g.299C>G, c.-2C>G, p.?
* Purple box, named at each level:
  > g.1633A>T, c.*3A>T, p.?

The effect of an intronic or UTR variant on the protein cannot be read from the reference sequence alone. For this reason the protein-level description is usually written as **p.?** (effect unknown).

---

## Variant type (DNA level)
Different variant types, such as substitution, deletion, insertion, duplication and deletion-insertion, follow different naming rules, as shown in the figure below:

![](https://i.imgur.com/7KHAMjB.png =50%x)

### Substitution
> * Definition: a sequence change where, compared to a reference sequence, ==one nucleotide is replaced by one other nucleotide==.
> * Format: “prefix”“position_substituted”“reference_nucleotide”“>”“new_nucleotide”, e.g. g.123A>G, <font color="red">c.6T>C</font>

### Deletion
> * Definition: a sequence change where, compared to a reference sequence, ==one or more nucleotides are not present (deleted)==.
> * Format: “prefix”“position(s)_deleted”“del”, e.g. g.123_127del, <font color="red">c.4del</font>

### Insertion
> * Definition: a sequence change where, compared to the reference sequence, ==one or more nucleotides are inserted and where the insertion is not a copy of a sequence== immediately 5'.
> * Format: “prefix”“positions_flanking”“ins”“inserted_sequence”, e.g. g.123_124insAGC, <font color="red">c.3_4insC</font>

### Duplication
> * Definition: a sequence change where, compared to a reference sequence, ==a copy of one or more nucleotides are inserted directly 3' of the original copy of that sequence==.
> * Format: “prefix”“position(s)_duplicated”“dup”, e.g. g.123_345dup, <font color="red">c.6dup</font>

### Deletion-insertion
> * Definition: a sequence change where, compared to a reference sequence, ==one or more nucleotides are replaced by one or more other nucleotides== and which is not a substitution, inversion or conversion.
> * <font color="red">Variants separated by one or more nucleotides should be described individually, not as a "delins," with the exception that two variants separated by a single nucleotide, and together affecting one amino acid, should be described as a "delins."</font>
> * Format: “prefix”“position(s)_deleted”“delins”“inserted_sequence”, e.g. g.123_127delinsAG, <font color="red">c.4delinsAC</font>
> * Examples:
>   * (X) c.[399T>C;400G>A]
>   * (O) c.399_400delinsCA
>   * (O) c.235_237delinsTAT, p.(Lys79Tyr)
>   * (X) c.[235A>T;237G>T], p.[(Lys79*;Lys79Asn)]

## Variant type (Protein level)
Different variant types, such as substitution, deletion, insertion, duplication and frameshift, follow different naming rules, as shown in the figure below:

![](https://i.imgur.com/lB0wKZW.png)

### Substitution
> * Definition: a sequence change where, compared to a reference sequence, ==one amino acid is replaced by one other amino acid==.
> * Format: “prefix”“amino_acid”“position”“new_amino_acid”, e.g. p.Arg54Ser
>### Missense
>> p.D3E / p.Asp3Glu
>### Nonsense
>> p.Q4* / p.Gln4Ter (the older form p.Q4X is no longer recommended, because X denotes an unknown amino acid)
>### Synonymous
>> p.D3= / p.Asp3=

### Non-frameshift deletion/duplication/insertion/deletion-insertion
> * Definition: a sequence change between the translation initiation (start) and termination (stop) codon where, compared to a reference sequence, one or more amino acids are deleted/duplicated/inserted.
> * Format (deletion): “prefix”“amino_acid(s)+position(s)_deleted”“del”, e.g. p.Cys76_Glu79del
> * Format (duplication): “prefix”“amino_acid(s)+position(s)_duplicated”“dup”, e.g. p.Cys76_Glu79dup
> * Format (insertion): “prefix”“amino_acids+positions_flanking”“ins”“inserted_sequence”, e.g. p.Lys23_Leu24insArgSerGln
> * Changes involving two or more consecutive amino acids are described as a deletion/insertion variant (delins).
> * Format (deletion-insertion): “prefix”“amino_acid(s)+position(s)_deleted”“delins”“inserted_sequence”, e.g. p.Arg123_Lys127delinsSerAsp
>   <font color="red">NOTE: this does not mean that on the DNA or RNA level the variant is described as a "delins" variant as well; on DNA level, other rules may apply.</font>
>   * (O) p.Arg76_Cys77delinsSerTrp
>   * (X) p.[Arg76Ser;Cys77Trp]
> * AKA ==in-frame== deletion/duplication/insertion/deletion-insertion: the change does not shift the reading frame.

### Frameshift insertion/deletion/duplication
> * Definition: a sequence change between the translation initiation (start) and termination (stop) codon where, compared to a reference sequence, translation shifts to another reading frame.
> * Format: “prefix”“amino_acid”“position”“new_amino_acid”“fs”“Ter”“position_termination_site”, e.g. p.(Arg123LysfsTer34)
> * The shortest frameshift variant possible contains “fsTer2”.
> * Example: p.Arg97ProfsTer23 (short form p.Arg97fs) / p.Arg97Profs*23. Arg97 is the first amino acid changed, and it is replaced by Pro. The reading frame then shifts, and the new frame ends in a stop codon at position 23, counted from the first changed amino acid (Pro97 = position 1).
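The DNA-level delins exception above depends on whether two in-cis changes fall in the same codon. Codon numbers follow directly from c. positions (codon = ⌈c/3⌉, so c.1799 in BRAF V600E lies in codon 600). The Python sketch below illustrates both points. It assumes simple exonic substitutions written as `c.<pos><ref>><alt>`, with no intronic or UTR offsets. The names `codon_number`, `parse_sub` and `describe_cis_pair` are hypothetical helpers and do not come from any HGVS library.

```python
import re

# Assumption: simple coding-DNA substitutions only, e.g. "c.399T>C" (no +/- offsets, no UTR).
SUB_RE = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def codon_number(c_pos: int) -> int:
    """Return the codon (amino-acid) number that coding position c.<c_pos> falls in."""
    if c_pos < 1:
        raise ValueError("only positions c.1 onward lie inside the coding sequence")
    return (c_pos + 2) // 3  # c.1-3 -> codon 1, c.4-6 -> codon 2, ...


def parse_sub(desc: str) -> tuple[int, str, str]:
    """Split "c.399T>C" into (399, "T", "C")."""
    m = SUB_RE.match(desc)
    if m is None:
        raise ValueError(f"not a simple coding substitution: {desc}")
    return int(m.group(1)), m.group(2), m.group(3)


def describe_cis_pair(desc1: str, desc2: str, ref_between: str = "") -> str:
    """Describe two substitutions on the same allele following the delins rule.

    ref_between is the single reference base between the two changes; it is only
    needed when they are one nucleotide apart and inside the same codon.
    """
    (p1, r1, a1), (p2, r2, a2) = sorted([parse_sub(desc1), parse_sub(desc2)])
    gap = p2 - p1 - 1  # number of unchanged nucleotides between the two changes
    if gap < 0:
        raise ValueError("the two substitutions must be at different positions")
    if gap == 0:
        # Adjacent changes are not a single-nucleotide substitution: merge them.
        return f"c.{p1}_{p2}delins{a1}{a2}"
    if gap == 1 and codon_number(p1) == codon_number(p2):
        # Exception: one nucleotide apart and together affecting one amino acid.
        if len(ref_between) != 1:
            raise ValueError("ref_between must be the one reference base between the changes")
        return f"c.{p1}_{p2}delins{a1}{ref_between}{a2}"
    # Otherwise each change is described on its own, grouped in cis with [ ; ].
    return f"c.[{p1}{r1}>{a1};{p2}{r2}>{a2}]"


print(codon_number(1799))                                            # -> 600
print(describe_cis_pair("c.399T>C", "c.400G>A"))                     # -> c.399_400delinsCA
print(describe_cis_pair("c.235A>T", "c.237G>T", ref_between="A"))    # -> c.235_237delinsTAT
print(describe_cis_pair("c.1309G>T", "c.1313A>T"))                   # -> c.[1309G>T;1313A>T]
```

The middle base "A" for the Lys79 example is inferred here from the `delinsTAT` description. In real use it would come from the reference transcript sequence. The same `codon_number` helper places c.1309 and c.1313 in codons 437 and 438, the two amino acids changed in brainstorming question 2.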
### Alleles
Square brackets group variants by allele. Variants separated by ";" inside one pair of brackets are in cis (on the same allele). Variants in separate bracket pairs, written "];[", are in trans (on different alleles).
> NM_004006.2:c.[2376G>C;3103del] => in cis
> NP_003997.1:p.[Ser68Arg;Asn594del] => in cis
> NM_004006.2:c.[2376G>C];[3103del] => in trans
> NP_003997.1:p.[Ser68Arg];[Asn594del] => in trans

Parentheses mark protein changes that are predicted rather than confirmed by experiment:
> Predicted consequences, i.e. without experimental evidence (no RNA or protein sequence analysed), should be given in parentheses inside the square brackets.
> NP_003997.1:p.[(Ser68Arg;Asn594del)] => in cis, predicted consequence
> NP_003997.1:p.[(Ser68Arg)];[(Asn594del)] => in trans, predicted consequence

### 3 prime rule
Sometimes a change in a repeated stretch of sequence could be placed at more than one position, for example deleting one T from TTT. In that case HGVS assigns it to the most 3' position possible.

![](https://hackmd.io/_uploads/ryn0z-Hth.png)
https://www.sophiagenetics.com/science-hub/hgvs-nomenclature/

## Brainstorming (2022 CAP NGSB dry challenge)
### 1. A patient has the high-quality sequencing result as shown in the online image.
![image](https://hackmd.io/_uploads/SyLXj6Uhyl.png)
Review of ClinVar reveals that the c.399T>C variant has been reported by multiple laboratories as benign and the c.400G>A variant has been reported by one laboratory as a variant of uncertain significance (VUS). How should the laboratory report the findings?
A) [c.399T>C;c.400G>A]
B) c.400G>A
C) c.399_400delinsCA
D) c.396_397dup
E) c.399_400delTGinsCA
:::danger
C
:::

---

### 2. A patient has the high-quality sequencing result as shown in the online image.
![image](https://hackmd.io/_uploads/H1vObn02kl.png)
A 2-month-old female is seen in a genetics clinic for a suspicion of nemaline myopathy. A trio exome is performed and multiple reportable variants in the KLHL41 gene are detected. One of the alleles is shown in the IGV image. How would you likely report this? Use NM_006063.3 as the reference transcript.
A) Two DNA changes: c.1309G>T, c.1313A>T; two protein changes: p.Val437Phe, p.Tyr438Phe
B) Two DNA changes: c.[1309G>T;c.1313A>T]; two protein changes: p.[Val437Phe;p.Tyr438Phe]
C) Two DNA changes: c.[1309G>T;c.1313A>T]; one protein change: p.Val437_Tyr438delinsPhePhe
D) One DNA change: c.1309_1313delinsTTCTT; two protein changes: [p.Val437Phe;p.Tyr438Phe]
E) One DNA change: c.1309_1313delinsTTCTT; one protein change: p.Val437_Tyr438delinsPhePhe
:::danger
C
:::

### Reference:
- https://kknews.cc/news/53oeqvk.html
- https://varnomen.hgvs.org/
- https://onlinelibrary.wiley.com/doi/epdf/10.1002/%28SICI%291098-1004%28200001%2915%3A1%3C7%3A%3AAID-HUMU4%3E3.0.CO%3B2-N

## For CNV (unfinished):
E.g. a deletion c.(4005+1_4006-1)\_(*1_?)del is detected in gene CRB1.
- **`(4005+1_4006-1)`**: indicates an ==uncertain start point== located between c.4005+1 and c.4006-1 (i.e. intron 11).
- **`(*1_?)`**: indicates an ==uncertain end point== located between the first nucleotide after the stop codon (c.*1) and an unknown position further downstream.

###### tags: `genomics`
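The bracketed positions in the CRB1 deletion combine several notations described earlier: intronic offsets (+/-), the 5'UTR (-) and 3'UTR (*) prefixes, and "?" for an unknown position. Below, the example is written with the `del` suffix that HGVS uses for deletions. The rough Python sketch that follows pulls apart the uncertain start and end ranges. It assumes only the `c.(a_b)_(c_d)del` / `dup` shape, and `CPos`, `parse_pos` and `parse_uncertain_cnv` are hypothetical names used for illustration, not a real HGVS library API.

```python
import re
from dataclasses import dataclass
from typing import Optional


# One c. position (hypothetical model for illustration only).
@dataclass(frozen=True)
class CPos:
    region: str            # "cds" (c.N), "5utr" (c.-N), "3utr" (c.*N) or "unknown" (?)
    base: Optional[int]    # N, or None when the position is "?"
    offset: int = 0        # intronic offset: +k after an exon's last base, -k before an exon's first base


POS_RE = re.compile(r"^(?P<prefix>[-*]?)(?P<base>\d+)(?P<offset>[+-]\d+)?$")
CNV_RE = re.compile(
    r"^c\.\((?P<s1>[^_()]+)_(?P<s2>[^_()]+)\)_\((?P<e1>[^_()]+)_(?P<e2>[^_()]+)\)(?P<kind>del|dup)$"
)


def parse_pos(text: str) -> CPos:
    """Parse one position such as "4005+1", "4006-1", "-2", "*1" or "?"."""
    if text == "?":
        return CPos("unknown", None)
    m = POS_RE.match(text)
    if m is None:
        raise ValueError(f"unsupported position: {text}")
    base = int(m.group("base"))
    if base == 0:
        raise ValueError("there is no nucleotide c.0")
    region = {"": "cds", "-": "5utr", "*": "3utr"}[m.group("prefix")]
    offset = int(m.group("offset")) if m.group("offset") else 0
    return CPos(region, base, offset)


def parse_uncertain_cnv(desc: str):
    """Split c.(s1_s2)_(e1_e2)del into its start range, end range and variant type."""
    m = CNV_RE.match(desc)
    if m is None:
        raise ValueError(f"not an uncertain-range del/dup: {desc}")
    start = (parse_pos(m.group("s1")), parse_pos(m.group("s2")))
    end = (parse_pos(m.group("e1")), parse_pos(m.group("e2")))
    return start, end, m.group("kind")


start, end, kind = parse_uncertain_cnv("c.(4005+1_4006-1)_(*1_?)del")
print(kind, start, end)
```

The start range comes back as two intronic positions anchored on c.4005 and c.4006. The end range is c.*1 in the 3'UTR followed by an unknown position. Mapping these ranges to genomic coordinates would additionally need the transcript's exon structure.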