
# Three-way tables and link functions
###### tags: `Categorical data analysis`

## Three-way table
### Definitions:
For an X x Y x Z table:
Partial table: the X-Y table at one fixed level of Z
Marginal table: the X-Y table obtained by summing over Z, ignoring the effect of Z
Marginal association: the association described by the marginal table
Conditional association: the association described by the partial tables

---
### Conditional odds ratio:
:::success
$\theta_{XY(k)} = \frac{\mu_{11k}\mu_{22k}}{\mu_{12k}\mu_{21k}}$, the X-Y odds ratio at level $k$ of Z
:::
### Marginal odds ratio:
:::success
$\theta_{XY} = \frac{\mu_{11+}\mu_{22+}}{\mu_{12+}\mu_{21+}}$
:::
### Conditional independence:
:::success
$\theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(z)} = 1$, where $z$ is the number of levels of Z
:::
### Marginal independence:
:::success
$\theta_{XY} = 1$
:::
Neither conditional independence nor marginal independence implies the other.

---
### Test
First test whether Z changes the association between X and Y, that is, test homogeneity of the conditional odds ratios across the levels of Z:
:::success
$H_0 : \theta_{XY(1)} = \theta_{XY(2)} = \cdots = \theta_{XY(z)}$
:::
This is the Breslow-Day test, with test statistic
:::success
$T = \sum_{i,j,k} \frac{(n_{ijk} - \hat{\mu}_{ijk})^2}{\hat{\mu}_{ijk}} \rightarrow \chi_{z - 1}^2$
where $n_{ijk}$ is the observed count and $\hat{\mu}_{ijk}$ is the fitted count under $H_0$
:::
Next, test conditional independence (for a 2 x 2 x z table):
$H_0: \theta_{XY(1)} = \theta_{XY(2)} = \cdots
= \theta_{XY(z)} = 1$
This is the Cochran-Mantel-Haenszel (CMH) test, with test statistic
:::success
$CMH = \frac{[\sum_{k}(n_{11k} - \mu_{11k})]^2}{\sum_k Var(n_{11k})} \rightarrow \chi^2_{1}$
where
$\mu_{11k} = \frac{n_{1+k}n_{+1k}}{n_{++k}}$
$Var(n_{11k}) = \frac{n_{1+k} n_{2+k} n_{+1k} n_{+2k}}{n_{++k}^2(n_{++k} - 1)}$
:::
Note: CMH should only be used when the differences $(n_{11k} - \mu_{11k})$ have the same sign in every stratum; otherwise positive and negative terms cancel in the sum.
Note: It can also be extended to a general i x j x k contingency table

---
### SAS code
```sas=
proc freq data = <data>;
    weight <var>;  /* cell counts, when each data row is one cell */
    /* the first variable is treated as the stratification variable Z */
    table <explanatory1> * <explanatory2> * <response> / cmh;
run;
```
### SAS implementation
Suppose we have the following data:
![](https://i.imgur.com/j71kPyP.jpg)
White = 0, Black = 1
Death penalty no = 0, yes = 1

First look at the homogeneity test:
![](https://i.imgur.com/J4JzFoZ.jpg)
$H_0:$ Z does not affect the association between X and Y
$H_a:$ Z affects the association between X and Y
$p\text{-value} > 0.05$, so we fail to reject homogeneity and can go on to test conditional independence.
![](https://i.imgur.com/obTdoXa.jpg)
The first statistic assumes X and Y are both ordinal.
The second assumes X is nominal and Y is ordinal.
The third assumes X and Y are both nominal.
The one to read here should be the third.
$p\text{-value} < 0.05$, so we reject conditional independence.

Because we failed to reject a common odds ratio earlier, we can interpret the Mantel-Haenszel odds ratio. Here the interpretation is: given victim race, the odds of receiving the death penalty for white defendants are 0.4119 times the odds for Black defendants. The logit-based odds ratio listed below it adds 0.5 to each cell to guard against cells equal to 0. One cell here is 0, so that estimate is probably the more accurate one to read.
![](https://i.imgur.com/nry2DEh.jpg =60%x)

## Generalized linear model
### General linear model
:::success
Let $X$ be an $n \times p$ design matrix ($n$ observations on $p$ explanatory variables)
Let $\beta$ be a $p \times 1$ vector holding the parameters of the linear model, then
$Y_{n,1} = X\beta + \varepsilon \;\;\; \varepsilon \sim N(0, \sigma^2)$
:::
is called the general linear model
### Generalized linear model
If the pdf of a distribution, $f(x|\theta)$, can be written as
:::success
$f(x|\theta) = \alpha(\theta)\beta(x)e^{xT(\theta)}$
:::
then the distribution belongs to the natural exponential family, and $T(\theta)$ is called the natural parameter.
For example, the Bernoulli distribution
:::success
$f(x|\theta) = \theta^{x}(1 - \theta)^{1-x} = (1-\theta)e^{x \ln\frac{\theta}{1 - \theta}}$
:::
and the Poisson distribution
:::success
$f(x|\theta) = \frac{\theta^x e^{-\theta}}{x!} =
e^{-\theta}\frac{1}{x!}e^{x\ln\theta}$
:::
From this natural exponential family form we can derive the link function, denoted $g(\mu)$. Its job is to connect the range of the regression equation with the range of the distribution's parameter, so that the model is mathematically workable. For example, the parameter of the Bernoulli distribution satisfies
$0 \leq p \leq 1, \;\;\mu = p$
whereas any regression equation $f(X) = x_1\beta_1 + x_2\beta_2 + \cdots + x_n\beta_n + \varepsilon$ ranges over $(-\infty, \infty)$.
Applying the natural parameter transformation above gives
:::success
$g(\mu) = \ln\frac{\mu}{1-\mu} = x_1\beta_1 + x_2\beta_2 + \cdots + x_n\beta_n$
:::
This connects the parameter to the regression equation.
There is no random error term in this equation. The equation models the mean $\mu = p$, and the randomness is carried by the distribution of the response around that mean, so a separate random error term can be dropped.

Therefore, different parameter spaces call for different link functions to transform the range:
1. logit link: $g(\theta) = \ln \frac{\theta}{1-\theta}$, from $(0, 1)$ to $(-\infty, \infty)$. Ex. Bernoulli
2. logit link: $g(\theta) = \ln \frac{\theta}{n-\theta}$, from $(0, n)$ to $(-\infty, \infty)$. Ex. Binomial, with $\theta$ the mean of $n$ trials
3. identity link: $g(\theta) = \theta$, from $(-\infty, \infty)$ to $(-\infty, \infty)$. Ex. Normal
4. log link: $g(\theta) = \ln\theta$, from $(0, \infty)$ to $(-\infty, \infty)$. Ex. Poisson

If the link function is the natural parameter of the exponential family form, the link is called the canonical link, which is the most commonly used link.
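
As a rough numerical check of the range argument above, the following Python sketch applies the inverse of the logit and log links to a few arbitrary values of the linear predictor and confirms that the resulting means stay inside the parameter space. The function names (`logit`, `inv_logit`, `log_link`, `inv_log_link`) and the sample values in `etas` are illustrative assumptions, not part of the original notes, and only the standard library is used.

```python
import math

def logit(mu, n=1):
    """Logit link ln(mu / (n - mu)); n=1 is the Bernoulli case."""
    return math.log(mu / (n - mu))

def inv_logit(eta, n=1):
    """Inverse of the logit link: maps any real eta into (0, n)."""
    return n / (1.0 + math.exp(-eta))

def log_link(mu):
    """Log link ln(mu), the canonical link for the Poisson mean."""
    return math.log(mu)

def inv_log_link(eta):
    """Inverse of the log link: maps any real eta into (0, inf)."""
    return math.exp(eta)

# Hypothetical values of the linear predictor x_1*b_1 + ... + x_n*b_n;
# a linear predictor can be any real number.
etas = [-5.0, -1.0, 0.0, 1.0, 5.0]

# Pushing them through the inverse links gives valid means.
bernoulli_means = [inv_logit(eta) for eta in etas]       # each in (0, 1)
binomial_means = [inv_logit(eta, n=10) for eta in etas]  # each in (0, 10)
poisson_means = [inv_log_link(eta) for eta in etas]      # each in (0, inf)

for eta, p, lam in zip(etas, bernoulli_means, poisson_means):
    print(f"eta={eta:+.1f}  Bernoulli mean={p:.4f}  Poisson mean={lam:.4f}")
```

Applying the link to these means recovers the original linear predictor, which is the sense in which the link ties the two ranges together.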