# Three-way tables and link functions
###### tags: `Categorical data analysis`

## Three-way table
### Definitions:
For an X x Y x Z table:

<b>Partial table</b>: the X-Y table at a fixed level of Z

<b>Marginal table</b>: the X-Y table obtained by summing over the levels of Z, ignoring the effect of Z

<b>Marginal association</b>: the association described by the marginal table

<b>Conditional association</b>: the association described by the partial tables

---

### Conditional odds ratio:
:::success
<font size = 5.5>

$\theta_{XY(k)} = \frac{\mu_{11k}\mu_{22k}}{\mu_{12k}\mu_{21k}}$, for level $k = 1, \ldots, z$ of Z

</font>
:::

### Marginal odds ratio:
:::success
<font size = 5.5>

$\theta_{XY} = \frac{\mu_{11+}\mu_{22+}}{\mu_{12+}\mu_{21+}}$

</font>
:::

### Conditional independence:
:::success
<font size = 5.5>

$\theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(z)} = 1$

</font>
:::

### Marginal independence:
:::success
<font size = 5.5>

$\theta_{XY} = 1$

</font>
:::

Conditional independence and marginal independence do not imply each other.

---

### Test
First we test whether Z affects the relationship between X and Y, that is, we test homogeneity. The quantity being tested is the odds ratio, so

:::success
<font size = 5.5>

$H_0 : \theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(z)}$

</font>
:::

The test is called the <b>Breslow-Day</b> test, with test statistic

:::success
<font size = 5.5>

$T = \sum_{ijk} \frac{(n_{ijk} - \hat{\mu}_{ijk})^2}{\hat{\mu}_{ijk}} \rightarrow \chi_{z - 1}^2$

where $n_{ijk}$ are the observed counts and $\hat{\mu}_{ijk}$ are the expected counts fitted under a common odds ratio

</font>
:::

After that we can test conditional independence (for a 2 x 2 x z table)

<font size = 5.5>

$H_0: \theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(z)} = 1$

</font>

The test is called the <b>Cochran-Mantel-Haenszel (CMH)</b> test, with test statistic

:::success
<font size = 5.5>

$CMH = \frac{[\sum_{k}(n_{11k} - \mu_{11k})]^2}{\sum_k Var(n_{11k})} \rightarrow \chi^2_{1}$

where

$\mu_{11k} = \frac{n_{1+k}n_{+1k}}{n_{++k}}$

$Var(n_{11k}) = \frac{n_{1+k} n_{2+k} n_{+1k} n_{+2k}}{n_{++k}^2(n_{++k} - 1)}$

</font>
:::

Note: the CMH test should only be used when the differences $(n_{11k} - \mu_{11k})$ have the same sign across the levels of Z. If the signs differ, the terms cancel in the sum and the test loses its power.

Note: it can also be extended to a general I x J x K contingency table.

---

### SAS code
```sas=
proc freq data = <data>;
    weight <count>;  /* cell counts, when each data row is one cell of the table */
    /* the first variable is the stratifying variable Z; the last two form the X-Y table */
    tables <stratum> * <explanatory> * <response> / cmh;
run;
```

### SAS implementation
Suppose we have the following data

![](https://i.imgur.com/j71kPyP.jpg)

White = 0, Black = 1

Death penalty no = 0, yes = 1

First, look at the homogeneity test

![](https://i.imgur.com/J4JzFoZ.jpg)

$H_0:$ Z does not affect the relationship between X and Y

$H_a:$ Z affects the relationship between X and Y

$p\text{-value} > 0.05$

So we fail to reject homogeneity and can go on to test conditional independence.

<br/>

![](https://i.imgur.com/obTdoXa.jpg)

The first statistic assumes that both X and Y are ordinal.

The second statistic assumes that X is nominal and Y is ordinal.

The third statistic assumes that both X and Y are nominal.

Here we should read the third statistic.

$p\text{-value} < 0.05$, so we reject conditional independence.

Because we failed to reject a common odds ratio earlier, we can interpret the Mantel-Haenszel odds ratio. Here the interpretation is: given the victim's race, the odds of a white defendant receiving the death penalty are 0.4119 times those of a black defendant. The logit-based odds ratio listed below it adds 0.5 to the cells to guard against zero cells. Since this table has a zero cell, that estimate is probably the more accurate one to read.

![](https://i.imgur.com/nry2DEh.jpg =60%x)

## Generalized linear model
### General linear model
:::success
<font size = 5.5>

Let $X$ be an n x p design matrix, holding n observations on p predictors
<br/>
Let $\beta$ be a p x 1 vector, holding the parameters of the linear model, then
<br/>
$Y_{n,1} = X\beta + \varepsilon \;\;\; \varepsilon \sim N(0, \sigma^2 I_n)$

</font>
:::

is called the general linear model.

### Generalized linear model
If the pdf of a distribution, $f(x|\theta)$, can be written as

:::success
<font size = 5.5>

$f(x|\theta) = \alpha(\theta)\beta(x)e^{xT(\theta)}$

</font>
:::

then the distribution belongs to the natural exponential family, and $T(\theta)$ is called the natural parameter.

For example, the Bernoulli distribution

:::success
<font size = 5.5>

$f(x|\theta) = \theta^{x}(1 - \theta)^{1-x} = (1-\theta)e^{x \ln\frac{\theta}{1 - \theta}}$

</font>
:::

and the Poisson distribution

:::success
<font size = 5.5>

$f(x|\theta) = \frac{\theta^xe^{-\theta}}{x!} = e^{-\theta}\frac{1}{x!}e^{x\ln\theta}$

</font>
:::

<br/>
<br/>

With the natural exponential family form we can derive the link function, denoted by $g(\mu)$. Its main role is to match the range of the regression equation to the range of the distribution's parameter, so that the model is mathematically well defined.

For example, the parameter of the Bernoulli distribution satisfies $0 \leq p \leq 1, \;\;\mu = p$

whereas any regression equation $f(X) = x_1\beta_1 + x_2\beta_2 + \cdots + x_n\beta_n + \varepsilon$ ranges over $(-\infty, \infty)$.

Applying the natural parameter transformation above gives

:::success
<font size = 5.5>

$g(\mu) = \ln\frac{\mu}{1-\mu} = x_1\beta_1 + x_2\beta_2 + \cdots + x_n\beta_n$

</font>
:::

This links the parameter to the regression equation.

Note that no random error term appears in this equation. The randomness is already carried by the distribution of the response, whose mean is $p$, so the equation models the mean directly and the separate random error term can be dropped.

<br/>

Therefore, different parameter spaces call for different link functions to transform the range:

1. logit link: $g(\theta) = \ln \frac{\theta}{1-\theta}$ from $(0, 1)$ to $(-\infty, \infty)$ Ex. Bernoulli
2. logit link: $g(\theta) = \ln \frac{\theta}{n-\theta}$ from $(0, n)$ to $(-\infty, \infty)$, where $\theta$ is the mean of the count Ex. Binomial
3. identity link: $g(\theta) = \theta$ from $(-\infty, \infty)$ to $(-\infty, \infty)$ Ex. Normal
4. log link: $g(\theta) = \ln\theta$ from $(0, \infty)$ to $(-\infty, \infty)$ Ex. Poisson

If the link function is exactly the natural parameter in the exponential family form, the link is called the <b>canonical link</b>, which is the most commonly used link.
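
To make the range argument concrete, here is a minimal Python sketch of the logit and log links and their inverses. The function names, the coefficients `beta`, and the covariate vector `x` are all hypothetical values chosen for illustration. They are not fitted to any data in these notes.

```python
import math

def logit(mu):
    """Logit link: maps a mean in (0, 1) to the whole real line."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse logit: maps any real linear predictor back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_link(mu):
    """Log link: maps a mean in (0, inf) to the whole real line."""
    return math.log(mu)

def inv_log_link(eta):
    """Inverse log link: maps any real linear predictor back into (0, inf)."""
    return math.exp(eta)

# Hypothetical coefficients and covariates (assumption, for illustration only).
beta = [-1.0, 0.5]
x = [1.0, 3.0]  # the leading 1.0 plays the role of an intercept term

# Linear predictor x_1*beta_1 + x_2*beta_2: free to take any real value.
eta = sum(b * xi for b, xi in zip(beta, x))

p = inv_logit(eta)        # a valid Bernoulli mean, strictly between 0 and 1
lam = inv_log_link(eta)   # a valid Poisson mean, strictly positive

print(eta, p, lam)
```

Whatever value the linear predictor takes, the inverse link returns a mean inside the parameter space, which is the point of choosing a link that matches the distribution.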
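
Going back to the CMH statistic in the Test section, the formula can also be checked by hand with a short Python sketch. The function names and the toy counts below are made up for illustration. They are not the death penalty data shown in the figures, and SAS `proc freq` remains the tool used in these notes.

```python
import math

def cmh_statistic(tables):
    """CMH statistic for a list of 2x2 tables [[n11, n12], [n21, n22]],
    one table per level of Z."""
    diff_sum = 0.0  # sum over k of (n_11k - mu_11k)
    var_sum = 0.0   # sum over k of Var(n_11k)
    for (n11, n12), (n21, n22) in tables:
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        n = row1 + row2
        mu11 = row1 * col1 / n
        var11 = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
        diff_sum += n11 - mu11
        var_sum += var11
    return diff_sum ** 2 / var_sum

def chi2_1_sf(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Toy counts (assumption): two levels of Z, association in the same direction.
toy_tables = [
    [[10, 5], [5, 10]],
    [[8, 4], [4, 8]],
]

stat = cmh_statistic(toy_tables)
p_value = chi2_1_sf(stat)
print(stat, p_value)
```

Because the numerator sums the differences before squaring, tables whose differences have opposite signs would cancel each other, which is why the note in the Test section restricts the test to same-sign differences.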