# Wald, Score, Likelihood Ratio test, odds ratio
###### tags: `Categorical data analysis`
## Hypothesis test
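To test the hypothesis $H_0:\pi = \pi_0$ for a binomial proportion, three test statistics can be used:
1. Wald test
$Z = \frac{\hat{\pi} - \pi_0}{\sqrt{\hat{\pi}(1-\hat{\pi})/n}}$
where $\hat{\pi}$ is the sample proportion and $Z \sim N(0,1)$ approximately under $H_0$
Then the $100(1-\alpha)\%$ confidence interval can be defined as
$\hat{\pi} \pm z_{\alpha/2}\sqrt{\hat{\pi}(1-\hat{\pi})/n}$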
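2. Score test
$Z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$
where $Z \sim N(0,1)$ approximately under $H_0$
The difference is that the Wald test plugs the estimate $\hat{\pi}$ (the sample proportion) into the variance, whereas the score test uses the null value $\pi_0$. The score test generally behaves better in small samples, but the Wald test is more convenient in practice because it only needs the estimate and its standard error.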
3. Likelihood ratio test
$L_0$ : The maximum of the likelihood function when $H_0$ is true
$L_1$ : The maximum of the likelihood function over the whole parameter space $H_0 \cup H_{a}$
Then the ratio is defined as $\Lambda = \frac{L_0}{L_1}$
**Since $L_1 \geq L_0$, $\Lambda \leq 1$**
Illustration for one-tailed test

And the test statistic $-2\log\Lambda$ has a limiting chi-square distribution (with 1 degree of freedom here, since $H_0$ fixes the single parameter $\pi$)
Illustration of Bernoulli trial

**The three methods generally yield different results in finite samples**
**When $\pi \approx 0$ the Wald test may be invalid, because $\hat{\pi}$ can equal 0 and the estimated standard error is then zero**
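
The three statistics can be compared numerically with a short Python sketch. The function names and the example data (9 successes in 10 trials, $\pi_0 = 0.5$) are illustrative assumptions rather than part of these notes; the formulas are the standard textbook forms for a single binomial proportion.

```python
import math

def binomial_tests(y, n, pi0):
    """Wald z, score z and likelihood-ratio statistic for H0: pi = pi0."""
    pi_hat = y / n
    # Wald: the variance is evaluated at the estimate pi_hat.
    se_wald = math.sqrt(pi_hat * (1 - pi_hat) / n)
    z_wald = (pi_hat - pi0) / se_wald if se_wald > 0 else float("nan")
    # Score: the variance is evaluated at the null value pi0.
    z_score = (pi_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

    # Likelihood ratio: -2 log(L0 / L1); a zero count contributes nothing.
    def term(count, expected):
        return count * math.log(count / expected) if count > 0 else 0.0

    lr = 2 * (term(y, n * pi0) + term(n - y, n * (1 - pi0)))
    return z_wald, z_score, lr

def two_sided_p_normal(z):
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def p_chi2_1df(x):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical data: 9 successes out of 10 trials, H0: pi = 0.5
z_wald, z_score, lr = binomial_tests(9, 10, 0.5)
print(f"Wald  z  = {z_wald:.2f}, p = {two_sided_p_normal(z_wald):.4f}")
print(f"Score z  = {z_score:.2f}, p = {two_sided_p_normal(z_score):.4f}")
print(f"LR stat  = {lr:.2f}, p = {p_chi2_1df(lr):.4f}")
```

With these inputs the Wald, score and likelihood ratio statistics come out at roughly 4.22, 2.53 and 7.36, so the three tests disagree noticeably in a small sample. Calling `binomial_tests(0, 10, 0.5)` returns `nan` for the Wald statistic, which is the breakdown described above when the estimate sits on the boundary.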
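## Comparisons between Pearson $\chi^2$ and likelihood ratio test
1. In large samples, $\chi^2$ and $G^2$ are close to each other
2. For a given number of categories $c$, $\chi^2$ converges to the chi-square distribution faster than $G^2$
3. When $\frac{n}{c} < 5$, the chi-square approximation performs poorly
4. Both treat $X, Y$ as nominal
5. Neither is affected when the rows or columns are reordered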
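
As a rough check of points 1 and 5, the sketch below computes the Pearson statistic and $G^2$ for a table of counts in plain Python. The function name `pearson_and_g2` is an assumption made for illustration, and the counts are the ones quoted in the odds ratio example at the end of these notes.

```python
import math

def pearson_and_g2(table):
    """Pearson X^2 and likelihood-ratio G^2 for independence in a table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    x2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            x2 += (obs - exp) ** 2 / exp
            if obs > 0:                        # an empty cell adds nothing to G^2
                g2 += 2 * obs * math.log(obs / exp)
    return x2, g2

counts = [[189, 10845], [104, 10933]]
x2, g2 = pearson_and_g2(counts)
print(f"X^2 = {x2:.2f}, G^2 = {g2:.2f}")

# Reordering rows and columns leaves both statistics unchanged.
swapped = [row[::-1] for row in counts[::-1]]
x2_swapped, g2_swapped = pearson_and_g2(swapped)
```

Because the sample here is large, the two statistics should land close together, and the swapped table should reproduce them up to floating-point rounding.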
## Contingency Table
| $X$ / $Y$ | Affected (1) | Not affected (2) | Total |
| ---------------- |:-----------:|:--------------:| ------------------------ |
| With Medigen(1) | $\pi_{11}$ | $\pi_{12}$ | $\pi_{1+}$ |
| No Medigen(2) | $\pi_{21}$ | $\pi_{22}$ | $\pi_{2+}$ |
| | $\pi_{+1}$ | $\pi_{+2}$ | $\sum_{ij} \pi_{ij} = 1$ |
**$\pi_{i+}$ and $\pi_{+j}$ denote the marginal distributions of $X$ and $Y$**
**Sensitivity: $P(Y=1 | X = 1)$**
**Specificity: $P(Y=2|X=2)$**
For the two-group contingency table above, given that $X = i$, the conditional distribution of $Y$ satisfies
$\pi_{2|i} = 1 - \pi_{1|i}$
So for simplicity we write
$\pi_{1|i} = \pi_{i}$
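
A small sketch of how the joint, marginal and conditional probabilities relate, written in Python. It assumes the table is available as counts (the counts below are the ones quoted in the odds ratio example later in these notes) and that sample proportions stand in for the $\pi_{ij}$; the variable names are illustrative.

```python
counts = [[189, 10845], [104, 10933]]  # rows: X = 1, 2; columns: Y = 1, 2
n = sum(sum(row) for row in counts)

joint = [[c / n for c in row] for row in counts]  # estimates of pi_ij
row_marg = [sum(row) for row in joint]            # pi_{i+}
col_marg = [sum(col) for col in zip(*joint)]      # pi_{+j}

# Conditional distribution of Y given X = i: pi_{j|i} = pi_ij / pi_{i+}
cond = [[p / row_marg[i] for p in row] for i, row in enumerate(joint)]

for i, row in enumerate(cond, start=1):
    print(f"pi_{i} = P(Y=1 | X={i}) = {row[0]:.4f}")
```

Each row of `cond` sums to 1, which is the relation $\pi_{2|i} = 1 - \pi_{1|i}$.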
### Type of sampling
For this kind of experiment, where we want to assess whether Medigen is efficacious, there are three ways of sampling.
1. Randomly sample 200 people and record both $X$ and $Y$ for each, so that only the total is fixed
------>Multinomial distribution
2. Sample 100 people who were affected and 100 people who weren't affected
------>Two binomial distributions
Note: This is called a **case-control study**
Note: In a case-control study, $\pi_1$ and $\pi_2$ are hard to define. Since the design fixes the split between affected and unaffected subjects, quantities such as $P[X=1,Y=1]$ are controlled by the investigator rather than estimated from the data; only the distribution of $X$ within each $Y$ group can be estimated.
3. Fix a time interval and record everyone who passes through during it
------> Poisson distribution
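
To make concrete what each scheme holds fixed, here is a minimal simulation sketch using only the Python standard library. The cell probabilities in `PROBS`, the function names and the seed are all hypothetical values chosen for illustration; they do not come from the Medigen data.

```python
import math
import random

# Hypothetical joint probabilities for the cells (X, Y)
CELLS = [(1, 1), (1, 2), (2, 1), (2, 2)]
PROBS = [0.10, 0.40, 0.20, 0.30]

def multinomial_sample(n, rng):
    """Only the total n is fixed; all four cell counts are random."""
    counts = dict.fromkeys(CELLS, 0)
    for cell in rng.choices(CELLS, weights=PROBS, k=n):
        counts[cell] += 1
    return counts

def case_control_sample(n_per_column, rng):
    """Each Y column total is fixed; only the split over X is random."""
    counts = dict.fromkeys(CELLS, 0)
    for y in (1, 2):
        p1 = PROBS[CELLS.index((1, y))]
        p2 = PROBS[CELLS.index((2, y))]
        p_x1_given_y = p1 / (p1 + p2)
        x1 = sum(rng.random() < p_x1_given_y for _ in range(n_per_column))
        counts[(1, y)] = x1
        counts[(2, y)] = n_per_column - x1
    return counts

def poisson_draw(mean, rng):
    """One Poisson draw by Knuth's multiplication method (fine for modest means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_sample(expected_total, rng):
    """Nothing is fixed: each cell is an independent Poisson count."""
    return {cell: poisson_draw(expected_total * p, rng)
            for cell, p in zip(CELLS, PROBS)}

rng = random.Random(0)
multi = multinomial_sample(200, rng)
cc = case_control_sample(100, rng)
pois = poisson_sample(200, rng)
print(multi, cc, pois, sep="\n")
```

The multinomial draw always totals 200 and the case-control draw always has 100 per column, while the Poisson total varies from run to run.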
### Type of studies
1. **Retrospective study**:
The data are obtained from events that have already happened
2. **Prospective study**:
The data are obtained by following subjects and observing future events
### Comparisons between $\pi_1$ and $\pi_2$
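1. Risk Difference: $\pi_2 - \pi_1$
Properties: takes values in $[-1, 1]$
Inference: If $\pi_2 = \pi_1$, then $Y$ is independent of $X$
Drawback: The same difference matters more when both $\pi_{i}$ are close to 0 or 1 than when they are near 0.5, so the size of $\pi_2 - \pi_1$ alone can be misleading. For example, 0.010 vs 0.001 and 0.410 vs 0.401 both differ by 0.009, yet the first pair differs by a factor of 10
2. Relative risk (RR):
Def: $RR = \frac{\pi_1}{\pi_2}$
Properties: $RR > 0$
Inference: if $Y$ is independent of $X$, then $RR = 1$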
3. Odds ratio(OR):
Def: $\Omega = \frac{\pi}{1-\pi}$ is the odds, where $\pi$ is the probability of success. Then $OR = \theta = \frac{\Omega_1}{\Omega_2} = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}$
In the $2 \times 2$ contingency table, $OR = \frac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{21}\pi_{12}}$
Note: Since the calculation uses the probability in every cell, it can be used in a case-control study, where $\pi_1$ and $\pi_2$ are problematic. Furthermore, the value of $OR$ does not change if we swap the roles of rows and columns, that is
$\frac{P(Y=1|X=1)/P(Y=2|X=1)}{P(Y=1|X=2)/P(Y=2|X=2)} = \frac{P(X=1|Y=1)/P(X=2|Y=1)}{P(X=1|Y=2)/P(X=2|Y=2)} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}$
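### Relation between RR and OR
$OR = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} = \frac{\pi_1}{\pi_2} \times \frac{1-\pi_2}{1-\pi_1} = RR \times \frac{1-\pi_2}{1-\pi_1}$
So as both $\pi_{i}$ get close to zero, the factor $\frac{1-\pi_2}{1-\pi_1}$ approaches 1 and OR and RR get closer to each other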
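
The comparison measures can be computed directly from the four cell counts. The Python sketch below uses the counts quoted in the odds ratio example later in these notes; the function name `association_measures` and the factor of 10 used to mimic oversampling of cases are assumptions made for illustration. It also checks the identity $OR = RR \times \frac{1-\pi_2}{1-\pi_1}$ numerically.

```python
def association_measures(n11, n12, n21, n22):
    """Risk difference, relative risk and odds ratio (rows = X, columns = Y)."""
    pi1 = n11 / (n11 + n12)  # P(Y=1 | X=1)
    pi2 = n21 / (n21 + n22)  # P(Y=1 | X=2)
    rd = pi2 - pi1           # same sign convention as the notes: pi_2 - pi_1
    rr = pi1 / pi2
    odds_ratio = (n11 * n22) / (n12 * n21)
    return pi1, pi2, rd, rr, odds_ratio

pi1, pi2, rd, rr, odds_ratio = association_measures(189, 10845, 104, 10933)
print(f"RD = {rd:.4f}, RR = {rr:.4f}, OR = {odds_ratio:.4f}")

# Identity linking the two ratios
identity_gap = abs(odds_ratio - rr * (1 - pi2) / (1 - pi1))

# Oversampling the Y = 1 column by a factor of 10 (as a case-control
# design might) changes RR but leaves OR untouched.
_, _, _, rr_cc, or_cc = association_measures(189 * 10, 10845, 104 * 10, 10933)
print(f"after rescaling: RR = {rr_cc:.4f}, OR = {or_cc:.4f}")
```

Since both risks here are below 2%, RR and OR should be close, and the rescaled table illustrates why the odds ratio remains usable when the column totals are fixed by design.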
### Asymptotic distribution for RR
Delta method:
For an estimator $\hat{X}$ with $\sqrt{n}(\hat{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$,
a transformation $f$ that is differentiable at $\mu$ with $f^{'}(\mu) \neq 0$ is also asymptotically normal:
$\sqrt{n}(f(\hat{X}) - f(\mu)) \xrightarrow{d} N(0, [f^{'}(\mu)]^2\sigma^2)$
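
As a worked application to the relative risk, one common route is to apply the delta method with $f = \log$ to each sample proportion. The sketch below assumes two independent binomial samples of sizes $n_1$ and $n_2$ (symbols introduced here for illustration) with sample proportions $\hat{\pi}_1$ and $\hat{\pi}_2$.

$$
\begin{aligned}
\operatorname{Var}(\hat{\pi}_i) &= \frac{\pi_i(1-\pi_i)}{n_i}, \qquad f(x) = \log x, \quad f^{'}(x) = \frac{1}{x} \\
\operatorname{Var}(\log \hat{\pi}_i) &\approx \frac{1}{\pi_i^2} \cdot \frac{\pi_i(1-\pi_i)}{n_i} = \frac{1-\pi_i}{n_i \pi_i} \\
\operatorname{Var}(\log \widehat{RR}) &= \operatorname{Var}(\log \hat{\pi}_1) + \operatorname{Var}(\log \hat{\pi}_2) \approx \frac{1-\pi_1}{n_1 \pi_1} + \frac{1-\pi_2}{n_2 \pi_2}
\end{aligned}
$$

Under these assumptions $\log \widehat{RR}$ is approximately normal, so an approximate confidence interval for $RR$ is obtained by exponentiating $\log \widehat{RR} \pm z_{\alpha/2}\,\widehat{SE}$, with the $\pi_i$ in the variance replaced by their estimates.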

### SAS code
```sas=
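/* d1 holds one row per cell: ismedigen, isaffected, count */
proc freq data = d1;
    weight count;  /* treat count as the cell frequency */
    tables ismedigen * isaffected / riskdiff relrisk;  /* risk difference, RR and OR */
run;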
```
### SAS implementation



### Inference of odds ratio
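To interpret an odds ratio, we first need to define what counts as a "success". In the example above, $X = 1$ means vaccinated with Medigen and $X = 2$ means not vaccinated; $Y = 1$ means infected and $Y = 2$ means not infected.\
First, if "infected" ($Y = 1$) is defined as the success, then
$\Omega_1 = \frac{0.0171}{0.9829} = 0.0174$ \
$\Omega_2 = \frac{0.0094}{0.9906} = 0.00949$
so that
$\theta = \frac{189 \times 10933}{104 \times 10845} \approx \frac{0.0174}{0.00949} \approx 1.83$
The interpretation is that the odds of infection for those vaccinated with Medigen are 1.83 times the odds for those not vaccinated.
\
If instead "not infected" ($Y = 2$) is defined as the success, then
$\Omega_1 = \frac{0.9829}{0.0171} = 57.48$ \
$\Omega_2 = \frac{0.9906}{0.0094} = 105.4$ \
so that
$\theta = \frac{104 \times 10845}{189 \times 10933} \approx \frac{57.48}{105.4} \approx 0.545 \approx \frac{1}{1.83}$
The interpretation is that the odds of not being infected for those vaccinated with Medigen are 0.545 times the odds for those not vaccinated.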
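
The arithmetic above can be reproduced from the raw counts with a few lines of Python. The variable names are illustrative; working from unrounded proportions gives values that differ slightly in the third decimal from the hand calculation with rounded odds.

```python
def odds(p):
    """Odds of an event that has probability p."""
    return p / (1 - p)

# Rows: X = 1 (Medigen), X = 2 (no Medigen); columns: Y = 1 (infected), Y = 2 (not)
n11, n12, n21, n22 = 189, 10845, 104, 10933
pi1 = n11 / (n11 + n12)  # P(infected | Medigen)
pi2 = n21 / (n21 + n22)  # P(infected | no Medigen)

theta_infected = odds(pi1) / odds(pi2)              # success = infected
theta_not_infected = odds(1 - pi1) / odds(1 - pi2)  # success = not infected

print(f"theta (success = infected)     = {theta_infected:.3f}")
print(f"theta (success = not infected) = {theta_not_infected:.3f}")
```

The two odds ratios are reciprocals of each other, so switching the definition of success inverts the ratio without changing the strength of the association.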