# MR: Mendelian randomization
## Assumptions

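Assumptions of MR analysis (G: genetic variant, X: risk factor, Y: outcome, U: confounders):
* **IV1 Relevance**: The variant is associated with the risk factor X.

* **IV2 Exchangeability**: The variant is independent of the confounders U.

Population stratification is one way IV2 can fail: if ancestry influences both allele frequencies and the outcome, it confounds the G-Y association.

* **IV3 Exclusion restriction**: The variant is independent of the outcome Y conditional on the risk factor X and confounders U. The only path from G to Y runs through X.
Horizontal pleiotropy (G affects Y through a pathway that bypasses X) violates IV3, whereas vertical pleiotropy (G affects other traits downstream of X, on the path to Y) does not.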

## One-sample MR
### Using a single IV: the ratio estimate

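For strong instruments, the first-order and second-order standard errors of the ratio estimate are similar. The weaker the instrument, the larger the second-order standard error is relative to the first-order one, because only the second-order version accounts for uncertainty in the G-X association.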
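
A minimal sketch of the ratio estimate θ = β_Y / β_X and its two standard errors. It assumes the usual delta-method formulas with the covariance between β_X and β_Y omitted for simplicity:
* SE1 = se(β_Y) / |β_X|
* SE2 = sqrt(se(β_Y)² / β_X² + β_Y² se(β_X)² / β_X⁴)

Their ratio is sqrt(1 + z_Y² / F), where z_Y = β_Y / se(β_Y) and F = (β_X / se(β_X))². So for a given G-Y association, the gap grows as the instrument gets weaker. The summary statistics below are hypothetical.

```r
# Ratio (Wald) estimate for one variant, with first- and second-order SEs.
# Assumption: the G-X and G-Y estimates are independent (covariance term omitted).
ratio_estimate <- function(beta_x, se_x, beta_y, se_y) {
  theta <- beta_y / beta_x
  se_first <- se_y / abs(beta_x)
  se_second <- sqrt(se_y^2 / beta_x^2 + beta_y^2 * se_x^2 / beta_x^4)
  c(theta = theta, se_first = se_first, se_second = se_second)
}

# Hypothetical summary statistics: same G-Y association, different instrument strength
strong <- ratio_estimate(beta_x = 0.5, se_x = 0.05, beta_y = 0.1, se_y = 0.04)  # F = 100
weak   <- ratio_estimate(beta_x = 0.1, se_x = 0.05, beta_y = 0.1, se_y = 0.04)  # F = 4

strong["se_second"] / strong["se_first"]  # about 1.03
weak["se_second"] / weak["se_first"]      # about 1.60
```
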

### Using multiple IVs: two-stage least squares


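Fitting the two stages by hand (regress X on G, then regress Y on the fitted values of X) gives the correct point estimate but the wrong standard error. Dedicated two-stage least squares functions in R handle this automatically:
* `ivreg()` in the AER package: `summary(ivreg(Y ~ X | G), diagnostics = TRUE)`
* `tsls()` in the sem package: `tsls(Y ~ X, ~ G)`
* `ivreg()` in the ivpack package: `ivreg(Y ~ X, ~ G)`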
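
A small base-R simulation of the manual two-stage procedure. Everything in it is hypothetical: three SNPs, an unmeasured confounder U, and a true causal effect of 0.5.

The manual second stage recovers the causal effect. However, its reported standard error uses residuals based on the fitted X. The homoskedastic 2SLS standard error instead uses residuals based on the observed X, and should agree with `summary(ivreg(Y ~ X | G))` from AER.

```r
set.seed(42)
n <- 10000
G <- matrix(rbinom(n * 3, size = 2, prob = 0.3), ncol = 3)  # 3 hypothetical SNPs coded 0/1/2
U <- rnorm(n)                                               # unmeasured confounder
X <- drop(G %*% c(0.3, 0.2, 0.1)) + U + rnorm(n)            # risk factor
Y <- 0.5 * X - U + rnorm(n)                                 # outcome; true causal effect = 0.5

# Stage 1: regress X on the instruments and keep the fitted values
X_hat <- fitted(lm(X ~ G))

# Stage 2: regress Y on the fitted values of X
stage2 <- lm(Y ~ X_hat)
theta_2sls <- unname(coef(stage2)["X_hat"])

# Naive SE reported by stage 2 (residuals use X_hat, so it is not valid)
se_naive <- summary(stage2)$coefficients["X_hat", "Std. Error"]

# Homoskedastic 2SLS SE: residuals use the observed X
res <- Y - unname(coef(stage2)[1]) - theta_2sls * X
sigma2 <- sum(res^2) / (n - 2)
Z <- cbind(1, X_hat)
se_2sls <- sqrt(sigma2 * solve(crossprod(Z))[2, 2])

# Confounded observational estimate for comparison
theta_ols <- unname(coef(lm(Y ~ X))["X"])
```
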

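### Using multiple IVs: allele scores
Allele scores: an unweighted allele score sums, for each individual, the number of exposure-increasing alleles across all variants. The score is then used as a single instrument.

Weighted allele scores: each variant's allele count is multiplied by its estimated effect on the exposure (ideally taken from an external dataset) before summing.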
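
A base-R sketch of both scores and of using the weighted score as a single instrument in a ratio estimate. The SNPs, the external weights `w` and the simulated exposure and outcome (true causal effect 0.4) are all hypothetical.

```r
set.seed(1)
n <- 20000
G <- matrix(rbinom(n * 4, size = 2, prob = 0.25), ncol = 4)  # 4 hypothetical SNPs coded 0/1/2
w <- c(0.20, 0.15, 0.10, 0.05)   # hypothetical G-X effects from an external study

score_unweighted <- rowSums(G)    # count of exposure-increasing alleles
score_weighted <- drop(G %*% w)   # allele counts weighted by their G-X effect

# Simulated exposure and outcome with an unmeasured confounder U
U <- rnorm(n)
X <- score_weighted + U + rnorm(n)
Y <- 0.4 * X - U + rnorm(n)

# Ratio estimate with the weighted score as the single instrument
theta_score <- unname(coef(lm(Y ~ score_weighted))[2] / coef(lm(X ~ score_weighted))[2])
```
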

### Bi-directional MR using one-sample MR
Bi-directional MR explores both directions, X-->Y and Y-->X, using separate instruments for each trait. Example:
* Does more education make students short-sighted?
* Do short-sighted students stay longer in education?

Instrument strength:
* The allele score for education explains 0.7% of the variance in years of education (F = 464).
* The allele score for myopia explains 4.3% of the variance in refractive error.
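
Instrument strength is usually summarised by the F-statistic. This is a minimal sketch assuming the common approximation F = (R² / k) / ((1 - R²) / (n - k - 1)) for k instruments and n individuals. The sample sizes below are hypothetical and are not those of the education/myopia study. They only show that the same R² gives a larger F in a larger sample.

```r
# Approximate first-stage F-statistic from the proportion of variance in the
# exposure explained (r2) by k instruments in n individuals
f_stat <- function(r2, n, k = 1) {
  (r2 / k) / ((1 - r2) / (n - k - 1))
}

f_stat(0.007, n = 10000)   # education score, R^2 = 0.7%: about 70
f_stat(0.007, n = 50000)   # same R^2, larger hypothetical sample
f_stat(0.043, n = 10000)   # myopia score, R^2 = 4.3%: about 449
```
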

## Two-sample MR


Summary-level data: ratio estimate

Summary-level data: likelihood-based method

Summary-level data: inverse variance weighting
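* The IVW estimate $\hat\theta_{IVW}$ is a weighted average of the variant-specific ratio estimates $\hat\theta_j = \hat\beta_{Yj}/\hat\beta_{Xj}$, $j = 1, \dots, J$, with weights $\hat\beta_{Xj}^2/\mathrm{se}(\hat\beta_{Yj})^2$
* Equivalent to fitting a weighted regression model $\hat\beta_{Yj} = \theta\,\hat\beta_{Xj} + \varepsilon_j$ with weights $\mathrm{se}(\hat\beta_{Yj})^{-2}$
* The regression coefficient $\hat\theta_{reg}$ is an estimate of the causal effect θ: $\hat\theta_{reg} = \hat\theta_{IVW} = \frac{\sum_j \hat\beta_{Xj}\hat\beta_{Yj}\,\mathrm{se}(\hat\beta_{Yj})^{-2}}{\sum_j \hat\beta_{Xj}^2\,\mathrm{se}(\hat\beta_{Yj})^{-2}}$
* Note that there is no intercept, which is a consequence of the IV assumptions (e.g. no horizontal pleiotropy): a variant with no effect on X should have no effect on Y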
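
A minimal base-R check, using hypothetical summary statistics for five independent variants, that IVW is equivalent to a weighted regression. The closed-form weighted average of the ratio estimates and the no-intercept weighted regression give the same estimate.

The fixed-effect IVW standard error, sqrt(1 / Σ w_j), equals the regression SE divided by the residual standard error. This is because `lm()` scales its SE by the estimated residual standard error.

```r
# Hypothetical summary statistics for 5 independent variants
bx   <- c(0.12, 0.08, 0.15, 0.05, 0.10)       # G-X associations
by   <- c(0.030, 0.018, 0.041, 0.010, 0.022)  # G-Y associations
se_y <- c(0.008, 0.007, 0.010, 0.006, 0.009)  # SEs of the G-Y associations

# Closed form: weighted average of ratio estimates, weights bx^2 / se_y^2
theta_j <- by / bx
w <- bx^2 / se_y^2
theta_ivw <- sum(w * theta_j) / sum(w)

# Same estimate from a weighted regression of by on bx without an intercept
fit <- lm(by ~ bx - 1, weights = 1 / se_y^2)
theta_reg <- unname(coef(fit)["bx"])

# Fixed-effect (first-order) IVW standard error
se_ivw_fixed <- sqrt(1 / sum(w))
```
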
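## Robust MR methods with summary-level data
### Problems
Bias due to pleiotropy: the IVW estimate assumes all variants are valid IVs, so variants that affect Y through pathways other than X bias it.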

### MR-Egger regression


### Median-based estimates
* Suppose that the majority of variants (i.e. >50%) are valid IVs
* In a large sample, the variant-specific ratio estimates from the valid instruments will all estimate the true causal effect
* So the median of the ratio estimates can be used as an estimate of θ
* The median estimate is less influenced by outlying variants than the IVW estimate (which is a weighted mean of the variant-specific ratio estimates)
* No assumption is needed about the invalid variants
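
A sketch of the simple median described above, plus a weighted median. The weighted median is an extension not described in these notes; it needs more than 50% of the *weight*, rather than of the variants, to come from valid IVs. It interpolates the 50th percentile of the weighted distribution of ratio estimates, with weights bx² / se_y². All data are hypothetical, with variants 6 and 7 invalid.

```r
weighted_median <- function(theta, w) {
  o <- order(theta)
  theta <- theta[o]
  w <- w[o] / sum(w[o])
  p <- cumsum(w) - w / 2          # percentile assigned to each ordered estimate
  below <- max(which(p < 0.5))    # last estimate below the 50th percentile
  theta[below] + (theta[below + 1] - theta[below]) *
    (0.5 - p[below]) / (p[below + 1] - p[below])
}

bx <- c(0.10, 0.12, 0.08, 0.15, 0.09, 0.11, 0.07)
by <- c(0.25, 0.24, 0.26, 0.25, 0.23, 0.90, 1.10) * bx  # last two: pleiotropic
se_y <- rep(0.01, 7)

theta_j <- by / bx
w <- bx^2 / se_y^2

simple_median <- median(theta_j)          # 0.25
wm <- weighted_median(theta_j, w)         # 0.25
theta_ivw <- sum(w * theta_j) / sum(w)    # about 0.40, pulled by the outliers
```
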

### Mode-based estimates
* Zero Modal Pleiotropy Assumption (ZEMPA): the most common horizontal pleiotropy value across IVs is zero
* In other words, the most common (i.e. modal) variant-specific ratio estimate is a consistent estimate of the true causal effect, even if the majority of IVs are invalid
* So the mode of the ratio estimates can be used as an estimate of θ
* The mode estimate is less influenced by outlying variants than the IVW estimate
* No further assumption is needed about the invalid variants
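
A simplified mode-based estimate takes the peak of a weighted kernel density of the ratio estimates, computed here with base R's `density()`.

The bandwidth is a tuning parameter. The value 0.1 used below is an arbitrary illustrative choice, and published versions of the method use their own bandwidth rule. The data are hypothetical: 3 valid variants near 0.25 and 4 invalid ones. The median therefore fails, but the mode does not.

```r
# Peak of a weighted Gaussian kernel density of the ratio estimates
mode_estimate <- function(theta, w, bw = bw.nrd0(theta)) {
  d <- density(theta, bw = bw, weights = w / sum(w))
  d$x[which.max(d$y)]
}

bx <- c(0.10, 0.12, 0.11, 0.09, 0.10, 0.12, 0.08)
by <- c(0.24, 0.25, 0.26, 0.60, 0.90, 1.20, 1.50) * bx  # 3 valid, 4 invalid
se_y <- rep(0.01, 7)

theta_j <- by / bx
w <- bx^2 / se_y^2

median(theta_j)                       # 0.6: most variants are invalid
mode_estimate(theta_j, w, bw = 0.1)   # close to 0.25
```
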
## Violation of assumptions
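**Weak instrument bias**: The genetic variant is not a strong predictor of the exposure. In one-sample MR this biases the estimate towards the confounded observational association; in two-sample MR with non-overlapping samples, it biases the estimate towards the null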



**Linkage disequilibrium**: the observed effect is due to a nearby genetic variant in LD with the instrument that affects the outcome independently of the exposure (a particular concern when using cis-pQTLs); this acts like pleiotropy


**Pleiotropy**: genetic instruments affect the outcome through pathways other than the exposure



**Population stratification**: if the population consists of subpopulations, then associations may arise that do not have a biological interpretation
**Canalization**: the genetic profile (e.g. a disruptive mutation) has triggered a compensatory mechanism that acts in the opposite direction to, or independently of, the exposure
**Winner’s curse**
* Variant-exposure effect estimates are likely to be overestimated in the discovery GWAS in which the variants were selected (winner’s curse)
* This happens because variants are selected for passing a significance threshold, so variants whose estimates were inflated by chance are more likely to be selected
* As a result, instruments may be weaker in practice than the discovery estimates suggest
* Where possible, use variant-exposure estimates from replication studies

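
A small simulation of winner's curse. The number of SNPs, the effect-size distribution and the standard error are hypothetical. Variants are selected at genome-wide significance (p < 5e-8) in a discovery sample. Among the selected variants, the discovery estimates are inflated relative to both the true effects and independent replication estimates.

```r
set.seed(7)
m <- 10000                                   # hypothetical number of SNPs tested
beta_true <- rnorm(m, mean = 0, sd = 0.01)   # true per-allele effects on X
se <- 0.005                                  # SE of each estimate
beta_disc <- beta_true + rnorm(m, sd = se)   # discovery GWAS estimates
beta_rep <- beta_true + rnorm(m, sd = se)    # independent replication estimates

# Genome-wide significant in the discovery sample
selected <- 2 * pnorm(-abs(beta_disc / se)) < 5e-8

mean(abs(beta_disc[selected]))   # inflated
mean(abs(beta_rep[selected]))    # close to the truth
mean(abs(beta_true[selected]))
```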
**Additional assumptions for causal estimation**
* What we calculate in MR is an average causal effect, which requires further assumptions such as:
  * Homogeneity: the effect of the exposure on the outcome is the same in all individuals
  * Linearity: the effect of the exposure on the outcome is linear
  * No effect modification: the effect of the exposure on the outcome does not vary across other variables (e.g. sex, age, …)
* Alternatively, MR estimates a local average causal effect (the average effect among individuals whose exposure is changed by the genetic instrument), which requires:
  * Monotonicity: the genetic instrument changes X in the same direction for everyone in the population
## Outlier detection


**Global test:**
Concerned with the overall features of the data:
* Is there in general more heterogeneity than expected by chance in all data (considering all instruments together)?
**Local test:**
Focusing on individual observations (genetic variants used as IVs) and detecting specific variants as potential outliers:
* Are there specific instrumental variables that are more heterogeneous than expected by chance?



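A larger Cochran's Q statistic means more heterogeneity between the variant-specific ratio estimates than expected by chance (under homogeneity, Q follows a χ² distribution with J - 1 degrees of freedom, where J is the number of variants). This indicates potential pleiotropy of the genetic variants.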
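
A sketch of the global and local tests for the IVW model, using hypothetical data in which variants 6 and 7 are pleiotropic outliers.

For the global test, Q is computed with first-order weights bx² / se_y² and compared with χ² on J - 1 degrees of freedom. For the local test, each variant's contribution to Q is compared with χ² on 1 degree of freedom at a Bonferroni-corrected level. This is one simple choice assumed here, not the only option.

```r
bx   <- c(0.10, 0.12, 0.08, 0.15, 0.09, 0.11, 0.07)
by   <- c(0.25, 0.24, 0.26, 0.25, 0.23, 0.90, 1.10) * bx  # last two: outliers
se_y <- rep(0.01, 7)

theta_j <- by / bx
w <- bx^2 / se_y^2                       # first-order IVW weights
theta_ivw <- sum(w * theta_j) / sum(w)

# Global test: Cochran's Q against chi-squared with J - 1 df
q_j <- w * (theta_j - theta_ivw)^2       # each variant's contribution to Q
Q <- sum(q_j)
J <- length(theta_j)
p_global <- pchisq(Q, df = J - 1, lower.tail = FALSE)

# Local test: contributions against chi-squared(1), Bonferroni-corrected
flagged <- which(q_j > qchisq(1 - 0.05 / J, df = 1))
flagged   # variants 6 and 7
```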
**Influential points**
* In addition to outliers, we can also look at influential points
* When using multiple instruments, an MR study is reliable if the causal effect is supported by the majority of the genetic variants.
* Still, in some applications a few genetic variants can have a disproportionate influence on the analysis.
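
Leave-one-out IVW is one common way (not covered in these notes) to spot influential variants. It re-estimates the IVW effect without each variant in turn and checks which omission moves the estimate most. The data are hypothetical: variant 5 has a much larger G-X association and therefore carries most of the IVW weight.

```r
bx   <- c(0.05, 0.06, 0.04, 0.05, 0.30)
by   <- c(0.20, 0.22, 0.18, 0.21, 0.60) * bx
se_y <- rep(0.01, 5)

ivw <- function(bx, by, se_y) sum(bx * by / se_y^2) / sum(bx^2 / se_y^2)

theta_all <- ivw(bx, by, se_y)
theta_loo <- sapply(seq_along(bx), function(j) ivw(bx[-j], by[-j], se_y[-j]))
shift <- theta_loo - theta_all   # change in the estimate when variant j is dropped
which.max(abs(shift))            # variant 5
```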
**Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO)**
MR-PRESSO has three components:
(1) Detection of horizontal pleiotropy (the MR-PRESSO global test)
(2) Correction for horizontal pleiotropy via outlier removal (the MR-PRESSO local test for outliers)
(3) Testing of significant differences in the causal estimates before and after correction for outliers (the MR-PRESSO distortion test)


## Multivariable MR
Multivariable and multi-response Mendelian randomization: used when there is pleiotropy in the model, i.e. the genetic variants are associated with more than one risk factor.
### Sensitivity analysis

### Multivariable by design

Assumptions

Direct and total effects

1. Individual-level data: two-stage least squares

2. Summary-level data: inverse variance weighting


3. Summary-level data: MR-Egger
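
For summary-level data, multivariable IVW can be sketched as a single weighted regression, with no intercept and weights se(β_Y)^-2. The G-Y associations are regressed on the G-X associations of all risk factors jointly, and the coefficients estimate the direct effects.

The data are hypothetical and noiseless: β_Y is built from direct effects 0.3 (X1) and 0.1 (X2), so the fit recovers these values exactly.

```r
# Hypothetical associations of 6 variants with two risk factors X1 and X2
bx1  <- c(0.10, 0.08, 0.12, 0.00, 0.05, 0.07)
bx2  <- c(0.02, 0.09, 0.00, 0.11, 0.06, 0.04)
se_y <- c(0.010, 0.012, 0.011, 0.009, 0.010, 0.013)
by   <- 0.3 * bx1 + 0.1 * bx2   # noiseless G-Y associations

fit <- lm(by ~ bx1 + bx2 - 1, weights = 1 / se_y^2)
coef(fit)   # direct effects of X1 and X2 on Y: 0.3 and 0.1
```
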

### Multiple response MR (MR2)

Seemingly unrelated regression framework


## cis-MR and colocalization
cis-MR


colocalization

How to measure colocalization


**cis-MR**: assessing the genetic associations with the outcome for independent variants that strongly associate with the exposure within the cis-region
**Colocalization**: comparing the genetic associations of the exposure biomarker and the outcome for all available variants within the cis-region. Used as a sensitivity analysis; especially useful for protecting against confounding by LD.


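## Minor allele frequency (MAF)
MAF = `(sum(g == 1) + 2 * sum(g == 2)) / (2 * length(g))`, for genotypes g coded as 0/1/2 copies of the minor allele (if g counts the other allele, take 1 minus this value).
A smaller MAF means a rarer variant. For the same per-allele effect, a rarer variant explains less variance in the exposure, making it a weaker instrument.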
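
A base-R sketch of the MAF calculation. In R, the denominator `2 * length(g)` needs parentheses; otherwise the expression divides by 2 and then multiplies by `length(g)`.

The function also handles the case where g counts the major allele. The variance-explained helper assumes Hardy-Weinberg equilibrium, under which Var(g) = 2f(1 - f). The genotypes and effect sizes are hypothetical.

```r
maf <- function(g) {
  g <- g[!is.na(g)]                                       # drop missing genotypes
  f <- (sum(g == 1) + 2 * sum(g == 2)) / (2 * length(g))  # frequency of the counted allele
  min(f, 1 - f)                                           # minor allele frequency
}

g <- c(0, 0, 1, 2, 0, 1, 0, 0, 1, 0)   # hypothetical genotypes
maf(g)   # (3 + 2 * 1) / (2 * 10) = 0.25

# Share of Var(X) explained by one variant with per-allele effect beta
var_explained <- function(f, beta, var_x = 1) 2 * f * (1 - f) * beta^2 / var_x
var_explained(0.05, beta = 0.1)   # 0.00095 (rare variant)
var_explained(0.40, beta = 0.1)   # 0.0048 (common variant)
```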
Pruning and clumping both remove correlated SNPs (correlation measured by the squared correlation r2). Pruning does this without regard to association p-values, whereas clumping retains, from each group of correlated SNPs, the one with the lowest p-value for the exposure.
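
A greedy clumping sketch in base R on an individual-level genotype matrix. It repeatedly keeps the SNP with the lowest exposure p-value and drops every remaining SNP whose r² with it exceeds a threshold.

The function name `clump`, the r² threshold of 0.1 and the data are hypothetical. Real tools also restrict the comparison to a physical window, for example PLINK's `--clump` with `--clump-r2` and `--clump-kb`.

```r
clump <- function(G, p, r2_threshold = 0.1) {
  r2 <- cor(G)^2                 # pairwise LD (r^2) between SNP columns
  remaining <- order(p)          # SNP indices, strongest association first
  kept <- integer(0)
  while (length(remaining) > 0) {
    index <- remaining[1]        # index SNP: lowest remaining p-value
    kept <- c(kept, index)
    remaining <- remaining[-1]
    # drop SNPs in LD with the index SNP (assumes no monomorphic SNPs)
    remaining <- remaining[r2[index, remaining] < r2_threshold]
  }
  kept
}

set.seed(11)
n <- 1000
g1 <- rbinom(n, size = 2, prob = 0.3)
g2 <- g1
g2[1:20] <- rbinom(20, size = 2, prob = 0.3)   # SNP 2: strong LD with SNP 1
g3 <- rbinom(n, size = 2, prob = 0.4)          # SNP 3: independent
G <- cbind(g1, g2, g3)
p <- c(1e-10, 1e-12, 1e-9)                     # hypothetical exposure p-values

clump(G, p)   # keeps SNP 2 (lowest p) and SNP 3; SNP 1 is removed
```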