{%hackmd 8nGPjMOiTy-0DWU2xy0q0Q %}

# Categorical Predictors

Resource: [Stat 501](https://online.stat.psu.edu/stat501/lesson/8)

[Home Page](/_9v1g3C3TXmUfBbxkjnn0A)

[toc]

## Preparation

Categorical variables cannot be used in a regression model directly; they must first be transformed into numeric indicators.

### Binary predictor

Example: gender (male, female)

Use an indicator

$$
x_{i1} =
\begin{cases}
1 & \text{ if person } i \text{ is male} \\
0 & \text{ if person } i \text{ is female}
\end{cases}
$$

| Gender |     | $x_1$ |
| ------ |:--- | ----- |
| M      |     | 1     |
| F      |     | 0     |
| F      |     | 0     |
| ...    |     | ...   |
| M      |     | 1     |

### Multinomial predictor

Example: education level (primary school, junior high school, university)

Use one indicator per category, also called *one-hot encoding*

$$
\begin{align*}
x_{i1} & =
\begin{cases}
1 & \text{ if person } i \text{ graduated from primary school} \\
0 & \text{ otherwise}
\end{cases} \\
x_{i2} & =
\begin{cases}
1 & \text{ if person } i \text{ graduated from junior high school} \\
0 & \text{ otherwise}
\end{cases} \\
x_{i3} & =
\begin{cases}
1 & \text{ if person } i \text{ graduated from university} \\
0 & \text{ otherwise}
\end{cases}
\end{align*}
$$

| Education level    |     | Primary school | Junior high school | University |
| ------------------ | --- | -------------- | ------------------ |:---------- |
| Primary school     |     | 1              | 0                  | 0          |
| Junior high school |     | 0              | 1                  | 0          |
| University         |     | 0              | 0                  | 1          |
| ...                |     | ...            | ...                | ...        |
| Junior high school |     | 0              | 1                  | 0          |

### Why One-hot Encoding

One-hot encoding solves three problems that arise when categories are coded as a single number:

1. Not binary: education level has more than two categories (primary, junior, university)
2. Not ratio: the gaps between levels are not equal (university - junior $\ne$ junior - primary)
3. Not ordinal: some categories have no order at all, e.g. color (red $\not >$ green)

It creates one problem:

1. Sparse: each row contains a single 1 and zeros elsewhere, and the number of columns grows with the number of categories

## **Additive Effects**

### Binary Example

Does education level affect salary?

- Salary $(y)$: dollars (numerical)
- Seniority $(x_1)$: years (numerical)
- University graduate $(x_2)$: true or false (binary)

![000064](https://hackmd.io/_uploads/ry1Dzj-Dp.png)

Consider the model:

$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_{2} x_{i2} + \varepsilon_i
$$

![000078](https://hackmd.io/_uploads/HJkdGsbPp.png)

In fact, $\beta_2$ only shifts the intercept: both groups share the slope $\beta_1$, so their regression lines are parallel.

$$
\begin{align*}
y_i =
\begin{cases}
\beta_0 + \beta_1 x_{i1} + \varepsilon_i & \text{ if } x_{i2} = 0 \\
(\beta_0 + \beta_2) + \beta_1 x_{i1} + \varepsilon_i & \text{ if } x_{i2} = 1
\end{cases}
\end{align*}
$$

![000079](https://hackmd.io/_uploads/SJAtfi-D6.png)

### Multinomial Example

Does education level affect salary?

- Salary $(y)$: dollars (numerical)
- Seniority $(x_1)$: years (numerical)
- Education level $(x_2)$: university, junior or primary (multinomial)

![000012](https://hackmd.io/_uploads/SJbD7iZPT.png)

Apply one-hot encoding to $x_2$:

$$
\begin{align*}
x_{i3} & =
\begin{cases}
1 & \text{ if person } i \text{ graduated from primary school} \\
0 & \text{ otherwise}
\end{cases} \\
x_{i4} & =
\begin{cases}
1 & \text{ if person } i \text{ graduated from junior high school} \\
0 & \text{ otherwise}
\end{cases} \\
x_{i5} & =
\begin{cases}
1 & \text{ if person } i \text{ graduated from university} \\
0 & \text{ otherwise}
\end{cases}
\end{align*}
$$

Note that

$$
x_{i3} + x_{i4} + x_{i5} = 1
$$

i.e. $x_5$ is completely determined by $x_3$ and $x_4$:

$$
x_{i5} = 1 - x_{i3} - x_{i4}
$$

Including all three indicators together with the intercept would make the predictors perfectly collinear, so $x_5$ is dropped and university becomes the baseline level.

Consider the model:

$$
\begin{align*}
y_i & = \beta_0 + \beta_1 x_{i1} + \beta_{3} x_{i3} + \beta_{4} x_{i4} + \varepsilon_i \\
& =
\begin{cases}
(\beta_0 + \beta_3) + \beta_1 x_{i1} + \varepsilon_i & \text{ if } x_{i2} = \text{"primary"} \\
(\beta_0 + \beta_4) + \beta_1 x_{i1} + \varepsilon_i & \text{ if } x_{i2} = \text{"junior"} \\
\beta_0 + \beta_1 x_{i1} + \varepsilon_i & \text{ if } x_{i2} = \text{"university"}
\end{cases}
\end{align*}
$$

![0000a4](https://hackmd.io/_uploads/Hyc3zobPp.png)

## Interaction

Does education level affect salary?

- Salary $(y)$: dollars (numerical)
- Seniority $(x_1)$: years (numerical)
- University graduate $(x_2)$: true or false (binary)

![000010](https://hackmd.io/_uploads/rkZTfsZwT.png)

Consider the additive model:

$$
\begin{align*}
y_i & = \beta_0 + \beta_1 x_{i1} + \beta_{2} x_{i2} + \varepsilon_i \\
& =
\begin{cases}
\beta_0 + \beta_1 x_{i1} + \varepsilon_i & \text{ if } x_{i2} = 0 \\
(\beta_0 + \beta_2) + \beta_1 x_{i1} + \varepsilon_i & \text{ if } x_{i2} = 1
\end{cases}
\end{align*}
$$

![000036](https://hackmd.io/_uploads/HysaGobvT.png)

The additive model forces both groups to share one slope. Adding the interaction term $\beta_{12} x_{i1} x_{i2}$ lets the slope differ between groups, but a slope parameter can then no longer be interpreted as the change in the mean response per unit increase in its predictor while the other predictor is held constant.

$$
\begin{align*}
y_i & = \beta_0 + \beta_1 x_{i1} + \beta_{2} x_{i2} + \beta_{12} x_{i1} x_{i2} + \varepsilon_i \\
& =
\begin{cases}
\beta_0 + \beta_1 x_{i1} + \varepsilon_i & \text{ if } x_{i2} = 0 \\
(\beta_0 + \beta_2) + (\beta_1 + \beta_{12}) x_{i1} + \varepsilon_i & \text{ if } x_{i2} = 1
\end{cases}
\end{align*}
$$

![00004e](https://hackmd.io/_uploads/H1XRMiZw6.png)

### Interpreting Interactions

Never interpret a main effect on its own in the presence of an interaction: the effect of $x_1$ depends on $x_2$ (the slope is $\beta_1$ when $x_{i2} = 0$ and $\beta_1 + \beta_{12}$ when $x_{i2} = 1$).

![00005e](https://hackmd.io/_uploads/HkGyQsbPa.png)

## Hypothesis Test

### Limit and Benefit

Limit: fitting a single model to all groups relies on the usual error assumptions holding for the combined data.

Check the normality assumption:

![Untitled](https://hackmd.io/_uploads/rksJXoZva.png)

![Untitled 1](https://hackmd.io/_uploads/H1017jZDp.png)

Violation of constant variance:

![Untitled 2](https://hackmd.io/_uploads/HkWWms-vT.png)

![Untitled 3](https://hackmd.io/_uploads/B1--7o-wT.png)

Benefit: combining the groups in one model increases the sample size, which makes effects easier to detect in hypothesis tests and gives narrower confidence intervals.

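
As a rough illustration of this benefit, the sketch below simulates salaries for two groups and compares the standard error of the seniority slope when only one group is fitted against the pooled additive model $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$. Everything here is an assumption made for illustration: the true coefficients, the noise level, the group size, and the helper name `ols_standard_errors` are hypothetical, and the simulated errors satisfy the normality and constant variance assumptions by construction.

```python
import numpy as np


def ols_standard_errors(X, y):
    """Least-squares fit of y = X @ beta + error.

    Returns the coefficient estimates and their estimated standard errors.
    """
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    mse = residuals @ residuals / (n - p)      # SSE / (n - p)
    cov = mse * np.linalg.inv(X.T @ X)         # estimated covariance of beta_hat
    return beta_hat, np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
n_per_group = 100
seniority = np.linspace(0, 20, n_per_group)

# Hypothetical true parameters (not taken from the notes).
beta0, beta1, beta2, sigma = 30.0, 2.0, 10.0, 5.0
y_non_grad = beta0 + beta1 * seniority + rng.normal(0, sigma, n_per_group)
y_grad = beta0 + beta2 + beta1 * seniority + rng.normal(0, sigma, n_per_group)

# Fit using the non-graduates only: intercept + seniority.
X_sep = np.column_stack([np.ones(n_per_group), seniority])
beta_sep, se_sep = ols_standard_errors(X_sep, y_non_grad)

# Pooled additive fit: intercept + seniority + graduate indicator.
x1 = np.concatenate([seniority, seniority])
x2 = np.concatenate([np.zeros(n_per_group), np.ones(n_per_group)])
y = np.concatenate([y_non_grad, y_grad])
X_pool = np.column_stack([np.ones(2 * n_per_group), x1, x2])
beta_pool, se_pool = ols_standard_errors(X_pool, y)

print("slope SE, one group only:", se_sep[1])
print("slope SE, pooled model:  ", se_pool[1])
```

With twice as many observations contributing to the common slope, the pooled standard error is expected to be smaller, which is what shortens the confidence interval. The comparison is only fair when the pooled model's assumptions (common slope, common variance) actually hold.
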
![Image](https://hackmd.io/_uploads/HydSl2vDT.png)

### One Parameter Test

In the model

$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_{2} x_{i2} + \varepsilon_i
$$

![000078](https://hackmd.io/_uploads/SJdbmj-va.png)

To test:

$$
H : \beta_2 = \beta_{2,0} \quad \text{against} \quad A : \beta_2 \ne \beta_{2,0}
$$

use the statistic, which follows a $t$ distribution under $H$:

$$
t^* = \frac{\hat{\beta}_2 - \beta_{2,0}}{s \{ \hat{\beta}_2 \}} \sim t_{n - p}
$$

where $\hat{\beta}_2$ is the least-squares estimate, $s \{ \hat{\beta}_2 \}$ is its estimated standard error, and $p$ is the number of regression parameters (here $p = 3$).

At significance level $\alpha \in (0, 1)$, since the alternative is two-sided:

- If $|t^*| \leq t_{n - p} (1 - \alpha / 2)$, conclude $H$
- If $|t^*| > t_{n - p} (1 - \alpha / 2)$, conclude $A$

### Multi Parameter Test

Treat $x_1 x_2 = x_3$ as a new predictor

$$
\begin{align*}
y_i & = \beta_0 + \beta_1 x_{i1} + \beta_{2} x_{i2} + \beta_{12} x_{i1} x_{i2} + \varepsilon_i \\
& = \beta_0 + \beta_1 x_{i1} + \beta_{2} x_{i2} + \beta_{12} x_{i3} + \varepsilon_i
\end{align*}
$$

![00004e](https://hackmd.io/_uploads/HJkGQjWP6.png)

To test:

$$
\begin{align*}
& H : \beta_2 = \beta_{12} = 0 \quad \text{(Reduced model)} \\
& A : \beta_2 \text{ and } \beta_{12} \text{ are not both } 0 \quad \text{(Full model)}
\end{align*}
$$

Under $H$ the reduced model is $y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$, so $df_R = n - 2$ and $df_F = n - 4$.

By the general linear test approach

$$
F^* = \frac{SSE_R - SSE_F}{df_R - df_F} / \frac{SSE_F}{df_F} \sim F_{df_R - df_F, df_F}
$$

At significance level $\alpha \in (0, 1)$

- If $F^* \leq F_{df_R - df_F, df_F} (1 - \alpha)$, conclude $H$
- If $F^* > F_{df_R - df_F, df_F} (1 - \alpha)$, conclude $A$

## Piecewise Linear Regression Models

As in classification and regression trees (CART), we need to find a good split point.

### Intercept Piecewise

We need to determine the cut point $c$ that defines the indicator

$$
x_{i2} = I (x_{i1} > c)
$$

Only the intercept term is considered:

$$
\begin{align*}
y_i & = \beta_0 + \beta_2 x_{i2} \\
& =
\begin{cases}
\beta_0 & \text{ if } x_{i1} \leq c \\
(\beta_0 + \beta_2) & \text{ if } x_{i1} > c
\end{cases}
\end{align*}
$$

![cut](https://hackmd.io/_uploads/HyDzQs-va.gif)

### Linear Piecewise

The data are piecewise linear.

We need to determine the cut point $c$ that separates the data into two regression lines

$$
x_{i2} = I (x_{i1} > c)
$$

The model is fitted by

$$
\begin{align*}
y_i & = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} \\
& =
\begin{cases}
\beta_0 + \beta_1 x_{i1} & \text{ if } x_{i1} \leq c \\
(\beta_0 + \beta_2) + (\beta_1 + \beta_{12}) x_{i1} & \text{ if } x_{i1} > c
\end{cases}
\end{align*}
$$

![Regression_Tree](https://hackmd.io/_uploads/HkTMXiWwp.gif)
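
One way to choose $c$ can be sketched as follows: for each candidate cut point, build the indicator $x_{i2} = I(x_{i1} > c)$, fit the model with the interaction term by least squares, and keep the $c$ with the smallest SSE. The grid of midpoints between neighbouring $x$ values, the noiseless toy data, and the function names `fit_piecewise` and `best_cut_point` are assumptions made for illustration; the notes do not prescribe a search procedure.

```python
import numpy as np


def fit_piecewise(x, y, c):
    """Fit y = b0 + b1*x + b2*I(x > c) + b12*x*I(x > c) by least squares.

    Returns the coefficients (b0, b1, b2, b12) and the error sum of squares.
    """
    x2 = (x > c).astype(float)
    X = np.column_stack([np.ones_like(x), x, x2, x * x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return beta, float(residuals @ residuals)


def best_cut_point(x, y, candidates):
    """Return the candidate cut point with the smallest SSE, and that SSE."""
    sse, c = min((fit_piecewise(x, y, c)[1], float(c)) for c in candidates)
    return c, sse


# Toy data (hypothetical): two exact lines with a jump after x = 5.
x = np.linspace(0, 10, 21)
y = np.where(x <= 5, 1 + 2 * x, 21 - 0.5 * x)

# Candidate cut points: midpoints between neighbouring x values.
candidates = (x[:-1] + x[1:]) / 2

c_hat, sse_hat = best_cut_point(x, y, candidates)
beta_hat, _ = fit_piecewise(x, y, c_hat)
print("chosen cut point:", c_hat)
print("coefficients (b0, b1, b2, b12):", beta_hat)
```

The left line is $\beta_0 + \beta_1 x$ and the right line is $(\beta_0 + \beta_2) + (\beta_1 + \beta_{12}) x$, matching the two cases above. With noisy data, several neighbouring cut points may give similar SSE, so the chosen $c$ should be read as an estimate and not as an exact break.
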