# Regression modeling

Regression is an activity that leads to a mathematical description of a process in terms of a set of associated variables:

- $Y$: the response variable
- $X$: one or more explanatory variables

```mermaid
graph LR
    X --> Y
```

The arrow stands for a mathematical description of the relationship: a model, or law. In statistics, the unknown constants in such a model are called **parameters**. The values of the parameters are usually determined by collecting data and using the data to estimate them.

# Regression Assumptions

- Function Form
  $\begin{equation} \begin{split} E(Y_i) &= E(\beta_0 +\beta_1 x_i + \varepsilon_i)\\ &= E(\beta_0 +\beta_1 x_i) + E(\varepsilon_i)\\ &= E(\beta_0) +\beta_1E(x_i) = \beta_0 +\beta_1x_i \end{split} \end{equation}$
  The last step uses $E(\varepsilon_i) = 0$ and treats $\beta_0$, $\beta_1$, and $x_i$ as fixed.
- Homogeneity of variance
  $Var(\varepsilon_i) = \sigma^2$
- Independency
  The errors $\varepsilon_i$ are independent of each other.
- Normality
  $\varepsilon_i \sim N(0,\sigma^2)$

## Function Form

If the assumption is met, the residuals will be randomly scattered around the center line of zero, with no obvious pattern. They will look like an unstructured cloud of points centered at zero.

![RFunction Form](https://hackmd.io/_uploads/BJwlhEGe0.png)

### Test of Function Form

- Ramsey RESET test [linked](https://www.aptech.com/resources/tutorials/econometrics/ols-diagnostics-model-specification/)
  $H_0$: the model is adequate
- Rainbow test [linked](http://math.furman.edu/~dcs/courses/math47/R/library/lmtest/html/raintest.html)
  $H_0$: the model is adequate

## Homogeneity of variance

If the residuals fan out as the predicted values increase, we have what is known as heteroscedasticity: the variability in the response changes as the predicted value increases. This is a problem, in part, because the observations with larger errors will have more pull, or influence, on the fitted model.
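
The fan-out pattern can also be checked numerically. The following is a minimal sketch, not a formal test: it assumes simulated data whose error spread grows with $x$ (the sample size, coefficients, and noise scale are all made up for illustration), fits a straight line with NumPy, and compares the residual spread in the lower and upper halves of the fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption for illustration): error SD grows with x.
n = 500
x = np.linspace(1.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)

# Ordinary least-squares fit of y = b0 + b1 * x.
b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x
residuals = y - fitted

# Split the residuals by fitted value and compare their spread.
order = np.argsort(fitted)
low = residuals[order[: n // 2]]
high = residuals[order[n // 2:]]
spread_ratio = high.std(ddof=1) / low.std(ddof=1)

print(f"residual SD, low fitted values:  {low.std(ddof=1):.2f}")
print(f"residual SD, high fitted values: {high.std(ddof=1):.2f}")
print(f"ratio (high / low): {spread_ratio:.2f}")
```

A ratio well above 1 suggests that the residuals fan out. Under homogeneous variance the two spreads should be roughly equal, so the ratio would sit near 1.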

![Homogeneity](https://hackmd.io/_uploads/r1gH_HGx0.png)

### Test of Homogeneity

- Breusch-Pagan test [linked](https://www.statology.org/breusch-pagan-test/)
  $H_0$: homoscedasticity is present
- Brown–Forsythe test [linked](https://www.graphpad.com/guides/prism/latest/statistics/interpreting_welch_browne-forsythe_tests.htm)
  One-way ANOVA compares three or more unmatched groups, based on the assumption that the populations are Gaussian. The Welch and Brown-Forsythe versions of one-way ANOVA do not assume that all the groups were sampled from populations with equal variances.

## Independency

![Independency](https://hackmd.io/_uploads/SkyEgIfgA.png)

The residuals are treated as independent only if the ACF plot shows an autocorrelation of 1 at the first bar (lag 0, where it is always 1) and all other lags stay below 0.2 in absolute value.
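
The ACF check can be sketched in code as well. This is a rough illustration under stated assumptions: `sample_acf` is a hypothetical helper written here, not a library function, and both residual series are simulated. It computes the sample autocorrelation for lags 0 to 10 and applies the 0.2 threshold to every lag except lag 0, which equals 1 by construction.

```python
import numpy as np

def sample_acf(values, max_lag):
    """Sample autocorrelation of `values` for lags 0..max_lag (hypothetical helper)."""
    centered = np.asarray(values, dtype=float) - np.mean(values)
    denom = np.sum(centered ** 2)
    size = len(centered)
    return np.array([
        np.sum(centered[lag:] * centered[: size - lag]) / denom
        for lag in range(max_lag + 1)
    ])

rng = np.random.default_rng(1)

# Simulated residuals (assumptions for illustration):
# independent noise versus a series where each value depends on the previous one.
independent = rng.normal(size=400)
correlated = np.zeros(400)
for t in range(1, 400):
    correlated[t] = 0.8 * correlated[t - 1] + rng.normal()

acf_ind = sample_acf(independent, max_lag=10)
acf_cor = sample_acf(correlated, max_lag=10)

# Skip lag 0 (always 1) and apply the 0.2 rule of thumb to the remaining lags.
print("independent series passes:", bool(np.all(np.abs(acf_ind[1:]) < 0.2)))
print("correlated series passes: ", bool(np.all(np.abs(acf_cor[1:]) < 0.2)))
```

The 0.2 cutoff is the rule of thumb from these notes. Plotting functions usually draw confidence bands that depend on the sample size, which can be stricter or looser than a fixed threshold.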