# Comparison of Sequence-Based and Array-Based Genotyping Methods for QTL Analysis
## Introduction
Quantitative trait locus (QTL) mapping is a crucial technique in genetics for understanding the relationship between genetic variants and phenotypic traits. This study compares two approaches: sequence-based genotyping and traditional array-based genotyping, the latter analyzed using QTLreaper, which implements Haley-Knott (H-K) regression. While both methods aim to identify genetic variants associated with traits of interest, sequence-based genotyping offers higher marker density and better coverage of genetic variation. This increased resolution is particularly important for identifying cis-expression QTLs (cis-eQTLs), where genetic variants near a gene affect its expression. With whole-genome sequencing, we expect to detect more cis-eQTLs than with array-based methods, because local genetic variation is captured better and variants in traditionally poorly covered regions can be detected.
## Methods
#### QTLreaper and H-K Regression
QTLreaper is an implementation of the Haley-Knott regression method specifically designed for efficient QTL analysis. It uses a regression-based approach to test for associations between genetic markers and phenotypic traits. The H-K method estimates genotype probabilities at regular intervals between markers, making it computationally efficient while still providing good statistical power for QTL detection. In our analysis, QTLreaper was used to process the array-based genotype data.
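The regression step can be illustrated with a minimal sketch. This is not QTLreaper's code: the function name `hk_lrs`, the -1/+1 genotype coding, and the toy data are assumptions made for illustration. It regresses a phenotype on the expected additive genotype at one position and reports the likelihood ratio statistic, LRS = n ln(RSS0 / RSS1), where RSS0 and RSS1 are the residual sums of squares without and with the genotype term.

```python
import math

def hk_lrs(phenotypes, expected_genotypes):
    """Likelihood ratio statistic at one position from simple linear regression.

    expected_genotypes holds the expected additive genotype per strain
    (assumed coding: -1 and +1 for the two parental alleles, intermediate
    values where the genotype is inferred between markers).
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    mean_x = sum(expected_genotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in expected_genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(expected_genotypes, phenotypes))
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)  # intercept only
    slope = sxy / sxx
    rss_full = rss_null - slope * sxy                      # intercept + genotype
    return n * math.log(rss_null / rss_full)

# Toy data (hypothetical): six strains, two of them with inferred genotypes.
y = [9.1, 9.4, 10.2, 10.5, 9.0, 10.3]
x = [-1, -1, 1, 1, -0.6, 0.8]
lrs = hk_lrs(y, x)
lod = lrs / (2 * math.log(10))
print(f"LRS = {lrs:.2f}, LOD = {lod:.2f}")
```

Using expected genotypes rather than a full mixture model is what keeps this kind of scan cheap: each position costs one ordinary least-squares fit.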
#### Data Collection and Processing
Two distinct datasets were analyzed:
- Spleen dataset (UTHSC_SPL_RMAEx_1210)
- Retina dataset (Illum_Retina_BXD_RankInv0410)
For each dataset, genotyping was performed using:
- Sequence-based genotyping
- Array-based genotyping
Distance calculations were performed using the difference between the marker position (Mb) and peak position (Peak Mb) for each identified QTL.
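The distance calculation can be sketched as follows. The field names (`Chr`, `Mb`, `Peak Chr`, `Peak Mb`) follow the JSON records shown later in this document. Taking the absolute value and returning `None` when the peak lies on a different chromosome are assumptions of this sketch, not confirmed details of the original analysis.

```python
def peak_distance_mb(record):
    """Absolute distance in Mb between the probe position and the QTL peak.

    Returns None when the peak is on a different chromosome, because a
    Mb difference across chromosomes has no physical meaning.
    """
    if record["Chr"] != record["Peak Chr"]:
        return None
    return abs(record["Mb"] - record["Peak Mb"])

# Positions taken from the Qdpr (sequence genotypes) and Homer2 (classic
# genotypes) records in this document.
qdpr_seq = {"Chr": "5", "Mb": 45.434293, "Peak Chr": "5", "Peak Mb": 44.121135}
homer2_classic = {"Chr": "7", "Mb": 81.600969, "Peak Chr": "15", "Peak Mb": 75.107}

print(peak_distance_mb(qdpr_seq))        # about 1.31 Mb
print(peak_distance_mb(homer2_classic))  # None: peak on another chromosome
```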
## Results
The comparison between sequence and array genotypes revealed fundamental differences in QTL detection capability, with sequence-based genotyping demonstrating clear advantages for genetic mapping.

#### Higher Resolution and Sensitivity
Sequence genotypes exhibited higher marker density, particularly at distances less than 1 Mb from QTL peaks, as shown by the dense cloud of points in the plots. This provides better coverage of genetic variants near genes of interest, enables more precise localization of QTLs (especially cis-acting elements), and increases the likelihood of detecting causal variants.
#### Improved Detection of Local Effects
Higher LOD scores (>10) consistently correspond to smaller distances (<1 Mb). This pattern suggests better detection of local regulatory elements and cis-effects.
#### Limitations of Array Genotyping
In contrast, array genotypes showed:
- A sparse, linear distribution of markers
- Large gaps in coverage, particularly evident in the scattered distribution pattern
- Less resolution for detecting local effects, as shown by the more uniform distribution across distances
The visualization clearly demonstrates that sequence-based genotyping provides superior resolution and detection capability compared to array-based approaches, particularly for identifying and characterizing local genetic effects that may influence gene expression.
For the Retina dataset with sequence genotypes:
```
QTL Statistics:
Mean LOD for significant QTLs: 8.94
Median LOD for significant QTLs: 4.61
Distance Statistics by LOD threshold:
LOD > 3.5:
Median distance: 0.91 Mb
Number of QTLs: 8616
LOD > 4:
Median distance: 0.83 Mb
Number of QTLs: 5706
LOD > 5:
Median distance: 0.74 Mb
Number of QTLs: 3841
```
For the Retina dataset with array genotypes:
```
QTL Statistics:
Mean LOD for significant QTLs: 8.39
Median LOD for significant QTLs: 4.60
Distance Statistics by LOD threshold:
LOD > 3.5:
Median distance: 24.92 Mb
Number of QTLs: 8499
LOD > 4:
Median distance: 24.36 Mb
Number of QTLs: 5653
LOD > 5:
Median distance: 23.88 Mb
Number of QTLs: 3752
```
For the Spleen dataset with sequence genotypes:
```
QTL Statistics:
Mean LOD for significant QTLs: 6.19
Median LOD for significant QTLs: 4.15
Distance Statistics by LOD threshold:
LOD > 3.5:
Median distance: 1.13 Mb
Number of QTLs: 35840
LOD > 4:
Median distance: 0.96 Mb
Number of QTLs: 20506
LOD > 5:
Median distance: 0.83 Mb
Number of QTLs: 10963
```
For the Spleen dataset with array genotypes:
```
QTL Statistics:
Mean LOD for significant QTLs: 6.65
Median LOD for significant QTLs: 4.27
Distance Statistics by LOD threshold:
LOD > 3.5:
Median distance: 24.78 Mb
Number of QTLs: 28593
LOD > 4:
Median distance: 24.28 Mb
Number of QTLs: 17316
LOD > 5:
Median distance: 23.87 Mb
Number of QTLs: 10196
```
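Summaries of this shape (median distance and QTL count above each LOD threshold) could be produced along the following lines. This is a sketch under assumptions: the function name `summarize_qtls`, the input layout of `(lod, distance_mb)` pairs, and the toy values are hypothetical and are not taken from the original analysis scripts.

```python
from statistics import median

def summarize_qtls(qtls, thresholds=(3.5, 4, 5)):
    """Median peak distance and QTL count above each LOD threshold.

    qtls is a list of (lod, distance_mb) pairs, one per QTL.
    """
    summary = {}
    for threshold in thresholds:
        distances = [dist for lod, dist in qtls if lod > threshold]
        summary[threshold] = {
            "median_distance_mb": median(distances) if distances else None,
            "n_qtls": len(distances),
        }
    return summary

# Toy input (hypothetical values, for illustration only).
toy_qtls = [(3.6, 1.2), (4.2, 0.9), (5.5, 0.4), (12.0, 0.1), (3.0, 30.0)]
for threshold, stats in summarize_qtls(toy_qtls).items():
    print(f"LOD > {threshold}: median distance {stats['median_distance_mb']} Mb, "
          f"{stats['n_qtls']} QTLs")
```

The thresholds are strict (`lod > threshold`) to match the "LOD > 3.5" wording of the reported tables.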
#### Search for specific probeIDs manually
###### Example 1
I focused on chromosome 5, with particular attention to ProbeID 10529896, corresponding to the Qdpr gene (quinoid dihydropteridine reductase).
- In the spleen dataset, search for ProbeID 10529896.

- WGS-sequence data:

Query against the JSON file I was working with:
```
jq '.[] | select(.Name == "10529896")' UTHSC_SPL_RMAEx_1210_seqgeno.json
{
"Additive": -0.243564827127659,
"Aliases": "Qdpr; 2610008L04Rik; D5Ertd371e; Dhpr; PKU2",
"Chr": "5",
"Description": "quinoid dihydropteridine reductase",
"Id": 8439598,
"LRS": 50.3400630206968,
"Locus": "rsm10000002332",
"Mb": 45.434293,
"Mean": 9.894440335964937,
"Name": "10529896",
"Peak Chr": "5",
"Peak Mb": 44.121135,
"Symbol": "Qdpr"
}
```
The positions are the same in the file and in GN, but the LOD is slightly different.
- Classic genotype data:

```
{
"Additive": -0.2561574893368342,
"Name": "10529896",
"Mb": 45.434293,
"Peak Chr": "5",
"Alias": "Qdpr; 2610008L04Rik; D5Ertd371e; Dhpr; PKU2",
"Chr": "5",
"Peak Mb": 31.889,
"Description": "quinoid dihydropteridine reductase",
"Symbol": "Qdpr",
"Locus": "rs13478217",
"LRS": 53.81956481067991,
"Id": 8439598
}
```
The positions are the same in the file and in GN, but the LOD is slightly different.
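The records above store LRS rather than LOD, so a conversion is needed before comparing LOD values. A minimal sketch, assuming the usual relation LOD = LRS / (2 ln 10), i.e. roughly LRS / 4.61 (the helper name `lrs_to_lod` is hypothetical):

```python
import math

def lrs_to_lod(lrs):
    """Convert a likelihood ratio statistic to a LOD score."""
    return lrs / (2 * math.log(10))

# LRS values copied from the two Qdpr records above.
qdpr_lrs = {"WGS": 50.3400630206968, "classic": 53.81956481067991}
for label, lrs in qdpr_lrs.items():
    print(f"{label}: LRS {lrs:.2f} -> LOD {lrs_to_lod(lrs):.2f}")
```

Under this conversion the two Qdpr records correspond to LOD scores of roughly 10.9 (WGS) and 11.7 (classic).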
###### Example 2
- WGS genotype

```
{
"Additive": -0.0765618279569901,
"Aliases": "Homer2; 9330120H11Rik; AW539445; CPD; Vesl-2",
"Chr": "7",
"Description": "homer homolog 2 (Drosophila)",
"Id": 8470703,
"LRS": 10.7922176884059,
"Locus": "rsm10000004986",
"Mb": 81.600969,
"Mean": 5.714091751553597,
"Name": "10565153",
"Peak Chr": "7",
"Peak Mb": 81.519056,
"Symbol": "Homer2"
}
```
- Classic genotype

```
{
"Additive": 0.0771401808785529,
"Name": "10565153",
"Mb": 81.600969,
"Peak Chr": "15",
"Alias": "Homer2; 9330120H11Rik; AW539445; CPD; Vesl-2",
"Chr": "7",
"Peak Mb": 75.107,
"Description": "homer homolog 2 (Drosophila)",
"Symbol": "Homer2",
"Locus": "rs13482731",
"LRS": 11.443635237320898,
"Id": 8470703
}
```
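Interestingly, in this case the classic genotype file places the peak on a different chromosome from the probe (Chr 7 vs. Peak Chr 15), whereas with the WGS genotypes the peak is on chromosome 7, about 0.08 Mb from the probe.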
#### Notes on LOD
https://gist.github.com/sens/32429e9f8cd876e4347e21175bd97cd4
One can see that logp and the LOD score agree only when df=2. When df=1 (as is the case in many GWAS), logp is greater than LOD.
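This relationship can be checked numerically with a short sketch that is independent of the linked gist. It assumes the LRS follows a chi-square distribution with df degrees of freedom under the null, takes logp to mean -log10 of the resulting p-value, and uses LOD = LRS / (2 ln 10). The function names are hypothetical. For df=2 the chi-square tail is exp(-LRS/2), so -log10(p) reduces exactly to the LOD; for df=1 the tail is erfc(sqrt(LRS/2)), which is smaller, so logp is larger.

```python
import math

def lod_from_lrs(lrs):
    """LOD score from a likelihood ratio statistic."""
    return lrs / (2 * math.log(10))

def logp_from_lrs(lrs, df):
    """-log10 of the chi-square upper-tail p-value, for df = 1 or 2 only."""
    if df == 2:
        p = math.exp(-lrs / 2)             # closed-form tail for df = 2
    elif df == 1:
        p = math.erfc(math.sqrt(lrs / 2))  # tail of a squared standard normal
    else:
        raise ValueError("this sketch handles only df=1 and df=2")
    return -math.log10(p)

for lrs in (10.0, 20.0, 50.0):
    print(f"LRS {lrs:5.1f}: LOD {lod_from_lrs(lrs):.3f}, "
          f"logp(df=2) {logp_from_lrs(lrs, 2):.3f}, "
          f"logp(df=1) {logp_from_lrs(lrs, 1):.3f}")
```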