## What is a pseudogene?
> A pseudogene is a genomic region that ==has high sequence similarity (homology) to a known gene== but is ==nonfunctional== (i.e., does not produce a functional final protein product). Usually, the DNA sequences of a pseudogene and of its functional parent gene are about 65% to 100% identical.
>
> Pseudogenes tend to accumulate more variants than their parent genes as they are not often under selective pressure.
> For example, STRC and STRCP1.
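
To make the "65% to 100% identical" figure concrete, here is a minimal sketch of how percent identity can be computed from two already-aligned sequences. The sequences are made-up toy strings, not real *STRC* or *STRCP1* sequence, and skipping gap columns is only one of several conventions for defining identity.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of gap-free aligned columns where both sequences carry the same base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy sequences (hypothetical), differing at a single position
parent = "ATGGCCTTAGGCTA"
pseudo = "ATGGCTTTAGGCTA"
print(f"{percent_identity(parent, pseudo):.1f}% identical")
```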
## What is a paralogue?
>Paralogs are genes that have evolved from a common ancestral gene through duplication within the same organism. They often ==retain some functional capacity== and may diverge in their roles over time.
>For example, SCN1A, SCN9A, and the other paralogues of SCN5A. [Ref](https://www.cardiodb.org/paralogue_annotation/gene.php?name=SCN5A).
[perplexity](https://www.perplexity.ai/search/what-does-paralogue-mean-in-th-rE_0r_hJQp6pT_2plCSdCA)
## Be aware of pseudogenes when ordering genetic testing
* Segmental duplications can be ==indistinguishable== from their parent region if a laboratory is using short-read NGS methods.
* High levels of sequence similarity complicate accurate read alignment (mapping), as shown in the figure below. ==Sequence reads that map equally well to several genomic positions get a low mapping quality and are discarded in the analysis==, which causes gaps in the sequence coverage.
* If sequence reads containing a pseudogene-derived variant are mis-mapped to the parent gene, it may result in a ==false positive variant call==.
* If sequence reads containing a parent gene-derived variant are mis-mapped to the pseudogene, it may result in a ==false negative result==.
* Due to the high degree of sequence similarity, ==it can be difficult to design parent gene-specific Sanger sequencing primers==.
* We manually design all Sanger primers when confirming variants in regions with high homology and develop custom confirmation methods utilizing long-range PCR when necessary.
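
One common way to make a primer specific to the parent gene is to place its 3' end on a base that differs between the gene and the pseudogene, since a 3' mismatch hinders extension from the pseudogene template. The sketch below illustrates only that idea: the sequences are toy strings, the function names are hypothetical, and the `window` of 3 bases is an assumption for illustration. Real primer design also has to consider melting temperature, secondary structure, and both strands.

```python
def differing_positions(parent: str, pseudo: str) -> list[int]:
    """0-based positions where two gap-free, equal-length aligned sequences differ."""
    if len(parent) != len(pseudo):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(parent, pseudo)) if a != b]


def primer_is_parent_specific(primer_start: int, primer_len: int,
                              diffs: list[int], window: int = 3) -> bool:
    """True if a differing base lies within the last `window` bases of a forward primer."""
    three_prime_end = primer_start + primer_len  # exclusive end coordinate
    return any(three_prime_end - window <= d < three_prime_end for d in diffs)


# Toy sequences (hypothetical), differing only at index 11
parent = "ACGTACGTTAGCCATGGA"
pseudo = "ACGTACGTTAGTCATGGA"
diffs = differing_positions(parent, pseudo)

print(primer_is_parent_specific(2, 10, diffs))  # 3' end sits on the differing base
print(primer_is_parent_specific(0, 8, diffs))   # primer lies entirely in identical sequence
```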

==Confidence in read alignment decreases when sequence homology between the regions increases.== Sequence reads are discarded when they align equally well to several genomic positions. The use of longer read length and paired-end sequencing improves read mapping.
[reference](https://blueprintgenetics.com/pseudogene/)
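
The effect of discarding ambiguously mapped reads can be sketched with a toy coverage calculation. Mapping quality (MAPQ) is defined in the SAM specification as -10·log10 of the probability that the mapping position is wrong, so a read that fits two locations equally well ends up with a MAPQ at or near 0. The reads, the region length, and the cutoff of 20 below are all made-up values for illustration; a real pipeline would read them from a BAM file.

```python
from dataclasses import dataclass


@dataclass
class Read:
    start: int   # 0-based start position within the region
    length: int  # aligned length in bases
    mapq: int    # mapping quality reported by the aligner


def coverage(reads, region_len, min_mapq):
    """Per-base depth, counting only reads with MAPQ >= min_mapq."""
    depth = [0] * region_len
    for r in reads:
        if r.mapq < min_mapq:
            continue  # read is discarded, as described above
        for pos in range(max(r.start, 0), min(r.start + r.length, region_len)):
            depth[pos] += 1
    return depth


def gaps(depth):
    """Half-open (start, end) intervals where depth is zero."""
    out, start = [], None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d != 0 and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(depth)))
    return out


# Hypothetical reads: the two MAPQ 0 reads sit in a stretch shared with a pseudogene
reads = [Read(0, 10, 60), Read(5, 10, 60), Read(12, 10, 0),
         Read(15, 10, 0), Read(22, 8, 60)]

print(gaps(coverage(reads, 30, min_mapq=0)))   # no filtering
print(gaps(coverage(reads, 30, min_mapq=20)))  # ambiguous reads discarded
```

Without filtering the toy region is fully covered; with the cutoff, the homologous stretch drops out and shows up as a coverage gap.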
## Real case *STRC* vs *STRCP1*
This is a case of congenital hearing loss. A homozygous G>A variant in `STRC` was detected by GATK HaplotypeCaller.
When the variant is viewed in IGV without a mapping quality threshold, reads with low mapping quality produce a heterozygous-like distribution (left figure). After setting the mapping quality cutoff to 1, only reads carrying the alternative allele remain, and the variant looks homozygous (right figure).
On Sanger sequencing of this case, a reference allele signal was also present. It probably comes from the pseudogene.
Exons 19-29 of *STRC* (NM_153700.2) have high sequence similarity (>98%) with *STRCP1*. The detected variant is located in exon 28, so the true genotype should be confirmed with another method.
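
The IGV observation can be reproduced with a small calculation of the alternative allele fraction at one position, with and without a mapping quality cutoff. The pileup below is invented for illustration (12 alt reads at MAPQ 60, 10 ref reads at MAPQ 0) and is not the data from this case; `alt_fraction` is a hypothetical helper name.

```python
from typing import Optional


def alt_fraction(pileup, ref: str, alt: str, min_mapq: int = 0) -> Optional[float]:
    """Fraction of alt-supporting reads among ref/alt reads with MAPQ >= min_mapq."""
    kept = [base for base, mapq in pileup
            if mapq >= min_mapq and base in (ref, alt)]
    if not kept:
        return None  # no usable reads at this position
    return kept.count(alt) / len(kept)


# Hypothetical pileup for a G>A site: (observed base, MAPQ of the read)
pileup = [("A", 60)] * 12 + [("G", 0)] * 10

print(alt_fraction(pileup, "G", "A", min_mapq=0))  # looks heterozygous
print(alt_fraction(pileup, "G", "A", min_mapq=1))  # looks homozygous
```

In this toy setup the genotype impression depends entirely on the cutoff, which is why the note concludes that another method is needed to settle the true genotype.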

## *PMS2* vs *PMS2CL*
Exons 11-14 of *PMS2* (NM_000535.6) have high sequence similarity (>98%) with *PMS2CL*.
https://onlinelibrary.wiley.com/doi/full/10.1002/gcc.23193
## How to overcome pseudogene-related problems when sequencing PMS2 using NGS?
To effectively overcome the challenges posed by pseudogenes when sequencing the PMS2 gene using next-generation sequencing (NGS), several strategies can be employed. The presence of the pseudogene PMS2CL, which shares significant homology with PMS2, complicates variant calling due to potential misalignment and ambiguous mapping of sequencing reads. Here are key approaches to address these issues:
### **1. Target Enrichment Strategies**
- **PCR-based Enrichment**: This method allows for selective amplification of the target gene (PMS2) while minimizing pseudogene interference. However, it may introduce biases and is less effective in highly homologous regions.
- **Capture-based Enrichment**: Utilizing hybridization capture techniques can improve specificity by selectively enriching for the target sequences. This approach can be more effective in distinguishing between homologous regions compared to PCR methods[1][3].
### **2. Advanced Mapping Algorithms**
- **MRJD Algorithm**: A computational strategy called MRJD (Multi-Region Joint Detection) has been developed to enhance small variant detection in regions of high homology. It improves sensitivity and specificity in variant calling by addressing misalignment issues associated with PMS2 and its pseudogene[2][4].
- **DRAGEN**: DRAGEN 4.3 refines variant detection specifically for PMS2, enabling accurate identification of pathogenic variants in clinical samples[4].
### **3. Long-range PCR Techniques**
- Implementing long-range PCR can help amplify larger segments of the PMS2 gene, which may include exons where pseudogene interference is problematic. This technique allows for better resolution and confirmation of variants by producing longer amplicons that are less likely to misalign with the pseudogene[6][8].
### **4. Bioinformatics Enhancements**
- Employing customized bioinformatics pipelines that account for high homology regions can significantly improve read mapping accuracy. This includes setting stringent mapping quality thresholds and utilizing paired-end sequencing to enhance alignment confidence[5][6].
- It is also crucial to maintain awareness of problematic genomic regions during the design phase of NGS assays, allowing for adjustments in analysis protocols as needed[6].
### **5. Manual Confirmation Methods**
- When dealing with variants detected in high-homology regions, manual confirmation using Sanger sequencing or additional targeted approaches may be necessary to validate findings and mitigate false positives or negatives resulting from pseudogene interference[5][6].
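
A pipeline can route variants to manual confirmation automatically by checking whether they fall in a known high-homology region. The sketch below is a minimal illustration that uses only the exon ranges quoted in this note (*STRC* exons 19-29, *PMS2* exons 11-14). The table and function name are hypothetical, and a production pipeline would more likely work from genomic coordinates than exon numbers.

```python
# Exon ranges with high pseudogene homology, taken from this note only
HIGH_HOMOLOGY_EXONS = {
    "STRC": range(19, 30),  # exons 19-29 (vs STRCP1)
    "PMS2": range(11, 15),  # exons 11-14 (vs PMS2CL)
}


def needs_orthogonal_confirmation(gene: str, exon: int) -> bool:
    """True if the variant lies in an exon listed as highly homologous to a pseudogene."""
    return exon in HIGH_HOMOLOGY_EXONS.get(gene, ())


print(needs_orthogonal_confirmation("STRC", 28))  # the exon from the real case
print(needs_orthogonal_confirmation("PMS2", 5))   # outside the listed range
```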
By integrating these strategies, researchers and clinicians can enhance the reliability of NGS for PMS2 sequencing, ultimately improving diagnostic accuracy for conditions such as Lynch syndrome.
Citations:
[1] https://pubmed.ncbi.nlm.nih.gov/34165726/
[2] https://www.tempus.com/publications/overcoming-the-challenges-of-variant-calling-in-pms2-high-homology-regions-for-improved-lynch-syndrome-diagnosis-using-whole-genome-sequencing/
[3] https://pubmed.ncbi.nlm.nih.gov/24823787/
[4] https://www.illumina.com/science/genomics-research/articles/PMS2-small-variant-detection.html
[5] https://blueprintgenetics.com/pseudogene/
[6] https://www.nature.com/articles/gim201658
[7] https://www.nature.com/articles/s41467-022-28115-z
[8] https://www.jmdjournal.org/article/S1525-1578(15)00113-0/pdf
result from [perplexity](https://www.perplexity.ai/search/how-to-overcome-the-problem-re-D3CoI.GJQn2Qe5Bv_J8GiA)