# Three-way tables, link functions
###### tags: `Categorical data analysis`
## Three-way table
### Definitions:
For an X x Y x Z table:
<b>Partial table</b>: the X-Y table at one fixed level of Z (one partial table per level of Z)
<b>Marginal table</b>: the X-Y table obtained by summing over Z, i.e. ignoring Z
<b>Marginal association</b>: the X-Y association in the marginal table
<b>Conditional association</b>: the X-Y association in the partial tables (controlling for Z)

---
### Conditional odds ratio:
:::success
<font size = 5.5>
$\theta_{XY(k)} = \frac{\mu_{11k}\mu_{22k}}{\mu_{12k}\mu_{21k}}, \quad k = 1, \dots, z$
</font>
:::
### Marginal odds ratio:
:::success
<font size = 5.5>
$\theta_{XY} = \frac{\mu_{11+}\mu_{22+}}{\mu_{12+}\mu_{21+}}$
</font>
:::
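
The two definitions above can be checked numerically by plugging observed counts in place of $\mu$ (i.e. sample odds ratios). This is a minimal stdlib-only Python sketch, not SAS; the nested-list layout `table[k][i][j]` and the counts are hypothetical.

```python
# Conditional and marginal odds ratios for a 2 x 2 x K table.
# Layout (an assumption for this sketch): table[k][i][j] is the count for
# X = i + 1, Y = j + 1 at level k + 1 of Z.


def odds_ratio(t):
    """Odds ratio of a 2 x 2 table [[n11, n12], [n21, n22]] = n11*n22 / (n12*n21)."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])


def conditional_odds_ratios(table):
    """theta_XY(k) for each level k of Z, one per partial table."""
    return [odds_ratio(partial) for partial in table]


def marginal_odds_ratio(table):
    """theta_XY from the marginal table n_ij+ = sum over k of n_ijk."""
    marginal = [[sum(partial[i][j] for partial in table) for j in range(2)]
                for i in range(2)]
    return odds_ratio(marginal)


if __name__ == "__main__":
    table = [
        [[20, 10], [10, 20]],  # Z = 1 (hypothetical counts)
        [[30, 30], [10, 10]],  # Z = 2
    ]
    print(conditional_odds_ratios(table))  # [4.0, 1.0]
    print(marginal_odds_ratio(table))      # 1.875
```

Note that the marginal odds ratio (1.875) is not simply an average of the conditional ones; it also depends on how the counts are distributed across the levels of Z.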
### Conditional independence:
:::success
<font size = 5.5>
$\theta_{XY(1)} = \theta_{XY(2)} = ... = \theta_{XY(z)} = 1$
</font>
:::
### Marginal independence:
:::success
<font size = 5.5>
$\theta_{XY} = 1$
</font>
:::
Conditional independence and marginal independence do not imply each other: either one can hold while the other fails.

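
As a small illustration of one direction, the hypothetical counts below make X and Y independent within each level of Z, yet strongly associated once Z is ignored (a Simpson's-paradox-type effect). This Python sketch only shows that conditional independence does not imply marginal independence; the counts are made up for the example.

```python
# Conditional independence (odds ratio 1 in every partial table) does not
# imply marginal independence. Counts are hypothetical.


def odds_ratio(t):
    """n11*n22 / (n12*n21) for a 2 x 2 table [[n11, n12], [n21, n22]]."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])


partial_tables = [
    [[90, 10], [9, 1]],   # Z = 1: mostly X = 1 and Y = 1
    [[1, 9], [10, 90]],   # Z = 2: mostly X = 2 and Y = 2
]
# Marginal table: sum the partial tables over Z.
marginal = [[sum(t[i][j] for t in partial_tables) for j in range(2)]
            for i in range(2)]

print([odds_ratio(t) for t in partial_tables])  # [1.0, 1.0]
print(marginal)                                  # [[91, 19], [19, 91]]
print(round(odds_ratio(marginal), 2))            # 22.94
```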
---
### Test
First, test whether Z affects the association between X and Y, i.e. test for homogeneity. Since the association is measured by the odds ratio, the hypothesis is
:::success
<font size = 5.5>
$H_0 : \theta_{XY(1)} = \theta_{XY(2)} = .... = \theta_{XY(z)}$
</font>
:::
This is the <b>Breslow-Day</b> test, with test statistic
:::success
<font size = 5.5>
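$T = \sum_{k}\sum_{i,j} \frac{(n_{ijk} - \hat{\mu}_{ijk})^2}{\hat{\mu}_{ijk}} \rightarrow \chi_{z - 1}^2$
where $\hat{\mu}_{ijk}$ are the expected counts under a common odds ratio (estimated by the Mantel-Haenszel estimator), with the row and column totals of each partial table held fixed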
</font>
:::
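
A minimal Python sketch of the statistic above, assuming each stratum is stored as `((n11, n12), (n21, n22))`; the helper names `mantel_haenszel_or`, `fitted_n11`, and `breslow_day` are hypothetical. Within each stratum, the fitted $\hat{\mu}_{11k}$ is the value that keeps the margins fixed while making the odds ratio equal to the Mantel-Haenszel estimate; bisection is used here for simplicity instead of the closed-form quadratic root. It returns only the statistic and its degrees of freedom (no p-value, to stay within the standard library), and it is not SAS's implementation.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio: sum(n11 n22 / n) / sum(n12 n21 / n)."""
    num = den = 0.0
    for (n11, n12), (n21, n22) in strata:
        n = n11 + n12 + n21 + n22
        num += n11 * n22 / n
        den += n12 * n21 / n
    return num / den


def fitted_n11(r1, r2, c1, psi, iters=200):
    """Expected n11 given the stratum margins and a common odds ratio psi.

    Solves a * (r2 - c1 + a) = psi * (r1 - a) * (c1 - a) by bisection.
    f(a) below is increasing on the feasible interval, so the root is unique.
    """
    lo, hi = max(0.0, c1 - r2), min(r1, c1)
    f = lambda a: a * (r2 - c1 + a) - psi * (r1 - a) * (c1 - a)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def breslow_day(strata):
    """Pearson-form Breslow-Day statistic; approx. chi-square with K - 1 df."""
    psi = mantel_haenszel_or(strata)
    stat = 0.0
    for (n11, n12), (n21, n22) in strata:
        r1, r2, c1 = n11 + n12, n21 + n22, n11 + n21
        a = fitted_n11(r1, r2, c1, psi)
        fitted = [a, r1 - a, c1 - a, r2 - c1 + a]  # mu11, mu12, mu21, mu22
        observed = [n11, n12, n21, n22]
        stat += sum((o - e) ** 2 / e for o, e in zip(observed, fitted))
    return stat, len(strata) - 1


if __name__ == "__main__":
    # Hypothetical strata with odds ratios 4 and 0.25 (clearly heterogeneous).
    strata = [((20, 10), (10, 20)), ((10, 20), (20, 10))]
    stat, df = breslow_day(strata)
    print(round(stat, 3), df)  # 13.333 1
```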
Next, we can test conditional independence (for a 2 x 2 x z table):
:::success
<font size = 5.5>
$H_0: \theta_{XY(1)} = \theta_{XY(2)} = .... = \theta_{XY(z)} = 1$
</font>
:::
This is the <b>Cochran-Mantel-Haenszel (CMH)</b> test, with test statistic
:::success
<font size = 5.5>
$CMH = \frac{[\sum_{k}(n_{11k} - \mu_{11k})]^2}{\sum_k Var(n_{11k})} \rightarrow \chi^2_{1}$
where
$\mu_{11k} = \frac{n_{1+k}n_{+1k}}{n_{++k}}$
$Var(n_{11k}) = \frac{n_{1+k} n_{2+k} n_{+1k} n_{+2k}}{n_{++k}^2(n_{++k} - 1)}$
</font>
:::
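Note: CMH is designed for the case where the differences $(n_{11k} - \mu_{11k})$ have the same sign in every stratum; if they have opposite signs, they cancel in the numerator and the test has little power.
Note: It can also be extended to general I x J x K contingency tables.
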
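
The CMH formula translates directly into code. This is a minimal Python sketch without a continuity correction, assuming the same `((n11, n12), (n21, n22))` layout per stratum; `cmh_test` is a hypothetical name. The p-value uses the identity $P(\chi^2_1 > s) = \text{erfc}(\sqrt{s/2})$. The second call in the usage block illustrates the sign caveat from the first note: deviations of opposite sign cancel, and the statistic drops to zero even though each stratum shows a strong association.

```python
import math


def cmh_test(strata):
    """CMH statistic (no continuity correction) and p-value, df = 1.

    Each stratum is ((n11, n12), (n21, n22)).
    """
    dev = 0.0  # sum over k of (n11k - mu11k)
    var = 0.0  # sum over k of Var(n11k)
    for (n11, n12), (n21, n22) in strata:
        r1, r2 = n11 + n12, n21 + n22  # n_{1+k}, n_{2+k}
        c1, c2 = n11 + n21, n12 + n22  # n_{+1k}, n_{+2k}
        n = r1 + r2                    # n_{++k}
        dev += n11 - r1 * c1 / n
        var += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    stat = dev ** 2 / var
    # Upper tail of chi-square(1): P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    same = [((20, 10), (10, 20)), ((20, 10), (10, 20))]      # OR = 4 in both strata
    opposite = [((20, 10), (10, 20)), ((10, 20), (20, 10))]  # OR = 4, then OR = 0.25
    print(cmh_test(same))      # large statistic, p-value well below 0.05
    print(cmh_test(opposite))  # (0.0, 1.0): the deviations +5 and -5 cancel
```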
---
### SAS code
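```sas=
/* <data>, <count>, <stratum>, <explanatory>, <response> are placeholders */
proc freq data = <data>;
  weight <count>;                                      /* variable holding the cell counts */
  tables <stratum> * <explanatory> * <response> / cmh; /* first variable = stratifying variable Z */
run;
```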
### SAS implementation
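Suppose we have the following data (death-penalty verdicts by defendant's race, stratified by victim's race):

Race: White = 0, Black = 1
Death penalty: no = 0, yes = 1
First, look at the homogeneity test: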

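$H_0:$ Z does not affect the association between X and Y (the conditional odds ratios are equal)
$H_a:$ Z affects the association between X and Y
Since the p-value > 0.05, we fail to reject homogeneity and can go on to test conditional independence.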
<br/>

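The CMH output lists three statistics: the first (Nonzero Correlation) treats X and Y as ordinal, the second (Row Mean Scores Differ) treats X as nominal and Y as ordinal, and the third (General Association) treats both as nominal.
The one to read here is the third (for a 2 x 2 x z table all three coincide anyway).
Its p-value is < 0.05, so we reject conditional independence.
Since we failed to reject a common odds ratio earlier, we can go on to interpret the Mantel-Haenszel common odds ratio. Here it means:
given the victim's race, the odds of a white defendant being sentenced to death are 0.4119 times those of a black defendant. The logit-based estimate reported below it adds 0.5 to the cell counts, because a log odds ratio is undefined when a cell is zero, and this table does contain a zero cell. The Mantel-Haenszel estimate itself remains well defined with a zero cell, so it does not need that correction.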
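
The reported numbers can be reproduced by hand. The data table itself is not shown here, so the counts below are an assumption: they are the classic Florida death-penalty table from Agresti's *Categorical Data Analysis*, which gives the same 0.4119 as above. Rows are defendant race (white, black) and columns are the verdict (yes, no), so the Mantel-Haenszel estimate is directly the odds of a death sentence for white versus black defendants. This is a plain Python sketch, not SAS output.

```python
import math

# ASSUMPTION: counts from the Florida death-penalty table in Agresti's textbook;
# the original data table is not reproduced in these notes.
# Each stratum is one victim race; rows = defendant race (white, black),
# columns = death penalty (yes, no).
strata = {
    "white victim": ((53, 414), (11, 37)),
    "black victim": ((0, 16), (4, 139)),
}

num = den = 0.0  # Mantel-Haenszel sums
dev = var = 0.0  # CMH sums
for (n11, n12), (n21, n22) in strata.values():
    n = n11 + n12 + n21 + n22
    num += n11 * n22 / n
    den += n12 * n21 / n
    r1, r2, c1, c2 = n11 + n12, n21 + n22, n11 + n21, n12 + n22
    dev += n11 - r1 * c1 / n
    var += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))

or_mh = num / den                        # common odds ratio across victim races
cmh = dev ** 2 / var                     # CMH statistic, df = 1
p_value = math.erfc(math.sqrt(cmh / 2))  # upper tail of chi-square(1)
print(round(or_mh, 4))  # 0.4119
print(cmh, p_value)
```

The zero cell (white defendant, black victim, death penalty yes) only contributes a zero term to the Mantel-Haenszel sums, which is why that estimate needs no 0.5 correction.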

## Generalized linear model
### General linear model
:::success
<font size = 5.5>
Let $X$ be an $n \times p$ design matrix ($n$ observations, $p$ predictors) <br/>
Let $\beta$ be a $p \times 1$ vector of regression parameters; then <br/>
$Y_{n \times 1} = X\beta + \varepsilon, \;\;\; \varepsilon \sim N(0, \sigma^2 I_n)$
</font>
:::
is called the general linear model
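
A minimal sketch of fitting this model by least squares, $\hat{\beta} = (X^\top X)^{-1}X^\top Y$, which is also the maximum-likelihood estimate of $\beta$ under normal errors. It is written in plain Python with small hypothetical helpers (`solve`, `ols`) so that it has no dependencies; in practice a numerical library would be used instead.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a x = b for square a by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares estimate via the normal equations (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


if __name__ == "__main__":
    # Hypothetical data: an intercept column plus one predictor.
    X = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4]]
    y = [1.1, 2.9, 5.2, 6.8, 9.0]
    print(ols(X, y))  # [intercept, slope]
```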
### Generalized linear model
If the pdf (or pmf) of a distribution, $f(x|\theta)$, can be written as
:::success
<font size = 5.5>
$f(x|\theta) = \alpha(\theta)\beta(x)e^{xT(\theta)}$
</font>
:::
then the distribution belongs to the natural exponential family, and $T(\theta)$ is called the natural parameter.
For example, the Bernoulli distribution:
:::success
<font size = 5.5>
$f(x|\theta) = \theta^{x}(1 - \theta)^{1-x} = (1-\theta)e^{x \ln\frac{\theta}{1 - \theta}}$
</font>
:::
and the Poisson distribution:
::: success
<font size = 5.5>
$f(x|\theta) = \frac{\theta^xe^{-\theta}}{x!} = e^{-\theta}\frac{1}{x!}e^{x\ln\theta}$
</font>
:::
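
The two factorizations can be checked numerically. This Python sketch compares each pmf with its exponential-family form $\alpha(\theta)\beta(x)e^{xT(\theta)}$ at a few arbitrary parameter values; the function names are illustrative.

```python
import math


def bernoulli_pmf(x, theta):
    return theta ** x * (1 - theta) ** (1 - x)


def bernoulli_expfam(x, theta):
    # alpha(theta) = 1 - theta, beta(x) = 1, T(theta) = ln(theta / (1 - theta))
    return (1 - theta) * math.exp(x * math.log(theta / (1 - theta)))


def poisson_pmf(x, theta):
    return theta ** x * math.exp(-theta) / math.factorial(x)


def poisson_expfam(x, theta):
    # alpha(theta) = e^{-theta}, beta(x) = 1 / x!, T(theta) = ln(theta)
    return math.exp(-theta) * (1 / math.factorial(x)) * math.exp(x * math.log(theta))


for theta in (0.2, 0.5, 0.9):
    for x in (0, 1):
        assert math.isclose(bernoulli_pmf(x, theta), bernoulli_expfam(x, theta))
for theta in (0.5, 3.0, 10.0):
    for x in range(15):
        assert math.isclose(poisson_pmf(x, theta), poisson_expfam(x, theta))
print("both factorizations match the pmfs")
```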
<br/>
<br/>
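With the natural exponential family form, we can derive the link function, denoted by $g(\cdot)$. The link function connects the range of the distribution's parameter with the range of the regression equation, so that the two can be equated.
For example, for the Bernoulli distribution the parameter satisfies $0 \leq p \leq 1$, with mean $\mu = p$,
whereas any regression equation $f(X) = x_1\beta_1 + x_2\beta_2 + ...x_n\beta_n + \varepsilon$ ranges over $(-\infty, \infty)$.
Applying the natural-parameter transformation above gives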
:::success
<font size = 5.5>
$g(\mu) = \ln\frac{\mu}{1-\mu} = x_1\beta_1 + x_2\beta_2 + \cdots + x_n\beta_n$
</font>
:::
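This links the parameter to the regression equation.
Note that this equation has no random error term. The randomness is carried by the response itself, $Y \sim \text{Bernoulli}(p)$, and the equation only models how the mean $p$ depends on the predictors, so no separate error term is needed.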
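
To make this concrete, here is a minimal Python sketch that fits $\ln\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i$ by Newton-Raphson on the Bernoulli log-likelihood. The data and function names are hypothetical. The code never draws or estimates an error term; it only models the mean $p_i$ through the inverse link.

```python
import math


def logit(mu):
    """Link g(mu) = ln(mu / (1 - mu)): maps (0, 1) onto the whole real line."""
    return math.log(mu / (1 - mu))


def inv_logit(eta):
    """Inverse link: maps a linear predictor eta back into (0, 1)."""
    return 1 / (1 + math.exp(-eta))


def fit_logistic(x, y, iters=25):
    """Newton-Raphson for logit(p_i) = b0 + b1 * x_i with y_i ~ Bernoulli(p_i)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [inv_logit(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        s0 = sum(yi - pi for yi, pi in zip(y, p))
        s1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix X^T W X with weights w_i = p_i (1 - p_i).
        w = [pi * (1 - pi) for pi in p]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 ** 2
        # Newton step: beta += I^{-1} * score (2 x 2 inverse written out).
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (-i01 * s0 + i00 * s1) / det
    return b0, b1


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical predictor values
    y = [0, 0, 1, 0, 1, 0, 1, 1]  # hypothetical 0/1 responses (not separable)
    b0, b1 = fit_logistic(x, y)
    p = [inv_logit(b0 + b1 * xi) for xi in x]
    print(b0, b1)
    print(sum(yi - pi for yi, pi in zip(y, p)))  # score is ~0 at the MLE
```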
<br/>
Hence different parameter spaces call for different link functions to map their ranges onto $(-\infty, \infty)$:
1. logit link: $g(\theta) = \ln \frac{\theta}{1-\theta}$
from $(0, 1)$ to $(-\infty, \infty)$
Ex. Bernoulli
2. logit link: $g(\theta) = \ln \frac{\theta}{n-\theta}$, where $\theta = n\pi$ is the binomial mean
from $(0, n)$ to $(-\infty, \infty)$
Ex. Binomial
3. identity link: $g(\theta) = \theta$
from $(-\infty, \infty)$ to $(-\infty, \infty)$
Ex. Normal
4. log link: $g(\theta) = \ln\theta$
from $(0, \infty)$ to $(-\infty, \infty)$
Ex. Poisson
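If the link function equals the natural parameter $T(\theta)$ of the exponential-family form, it is called the <b>canonical link</b>, which is the most commonly used link. Each of the four links above is the canonical link for the distribution listed with it.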
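
A compact Python sketch of the links listed above with their inverses, plus a check that the logit and log links coincide with the natural parameters $\ln\frac{\theta}{1-\theta}$ and $\ln\theta$ from the Bernoulli and Poisson factorizations (which is what makes them canonical). The dictionary layout and function names are an illustrative choice, not a standard API.

```python
import math

# Link g and inverse link g^{-1} for the cases listed above.
# The dictionary layout and names are illustrative, not a library API.
LINKS = {
    "logit (Bernoulli)": (lambda mu: math.log(mu / (1 - mu)),
                          lambda eta: 1 / (1 + math.exp(-eta))),
    "identity (Normal)": (lambda mu: mu, lambda eta: eta),
    "log (Poisson)": (math.log, math.exp),
}


def binomial_logit(mu, n):
    """Logit link for the binomial mean mu = n * pi, mapping (0, n) onto the real line."""
    return math.log(mu / (n - mu))


def binomial_inverse_logit(eta, n):
    """Inverse of binomial_logit: returns a mean in (0, n)."""
    return n / (1 + math.exp(-eta))


if __name__ == "__main__":
    theta = 0.3
    # Natural parameters from the exponential-family forms above.
    bernoulli_natural = math.log(theta / (1 - theta))
    poisson_natural = math.log(4.0)
    logit, _ = LINKS["logit (Bernoulli)"]
    log_link, _ = LINKS["log (Poisson)"]
    print(math.isclose(logit(theta), bernoulli_natural))   # True (mean mu = theta)
    print(math.isclose(log_link(4.0), poisson_natural))    # True (mean mu = theta)
    # Binomial: the logit of the mean n * pi equals the logit of pi.
    print(math.isclose(binomial_logit(10 * theta, 10), bernoulli_natural))  # True
    # Round trips g^{-1}(g(mu)) for a mean inside each range.
    for name, (g, g_inv) in LINKS.items():
        print(name, g_inv(g(0.3)))
```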