# Multiple Linear Regression
###### tags: multiple linear regression, linear regression, statistics
## Summary
- $\hat{B}=(X^TX)^{-1}X^TY$ is unbiased
- Variance of $\hat{B} = \Sigma_{\hat{B}\hat{B}} = \sigma^2(X^TX)^{-1}$.
- Distribution of $\hat{B}$
### WARM UP! Important Theorems
#### :rocket: Theorem 1: Expectation of Vector Form
If $Y = c + AX$ where X is a random vector, A is a fixed matrix and c is a fixed vector, then
$$
E[Y] = c + AE[X]
$$
<span style="color:grey">
Proof:
$$
\begin{bmatrix}
y_1\\
y_2 \\
\vdots \\
y_n\\
\end{bmatrix} =
\begin{bmatrix}
c_1\\
c_2 \\
\vdots \\
c_n\\
\end{bmatrix} +
\begin{bmatrix}
a_{11} & \cdots & a_{1n}\\
a_{21} & \cdots & a_{2n}\\
\vdots & \ddots &\vdots\\
a_{n1}& \cdots & a_{nn}\\
\end{bmatrix}
\times
\begin{bmatrix}
x_1\\
x_2 \\
\vdots \\
x_n\\
\end{bmatrix}
$$
ith component of Y: $y_i = c_i + \sum_{j=1}^{n}a_{ij}x_j$.
$E[Y_i] = c_i + \sum_{j=1}^{n} E[a_{ij}X_j] = c_i + \sum_{j=1}^{n} a_{ij}E[X_j]$ since $a_{ij}$ is an entry of a fixed matrix and is not a random variable.
</span>
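Below is a minimal simulation sketch (not part of the course material) that checks Theorem 1 numerically; the particular $c$, $A$, and distribution of X are arbitrary choices for illustration.

```python
import numpy as np

# Monte Carlo check of Theorem 1: E[c + AX] = c + A E[X].
rng = np.random.default_rng(0)

n = 3
c = np.array([1.0, -2.0, 0.5])           # fixed vector
A = rng.normal(size=(n, n))              # fixed matrix
mu_X = np.array([0.0, 1.0, 2.0])         # true mean of X

X = rng.normal(loc=mu_X, scale=1.0, size=(100_000, n))  # draws of the random vector X
Y = c + X @ A.T                                          # Y = c + AX for every draw

print(Y.mean(axis=0))    # empirical E[Y]
print(c + A @ mu_X)      # theoretical c + A E[X]; the two should be close
```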
#### :rocket: Covariance Matrix
If X is a $p \times 1$ random vector, then its covariance matrix $\Sigma_{XX}$ collects the covariances between all pairs of components. In other words, the (i,j) element of $\Sigma_{XX}$ is cov($x_i,x_j$) = $\sigma_{ij}$.
<span style="color:grey">
Since X is a random vector, we need to measure how the components of X vary together. Personally, I overlooked the fact that X is a random vector, which is why I was confused about why we even need the covariance matrix in the first place.
</span>
Another important fact is that the covariance matrix $\Sigma_{XX}$ is **symmetric**, and the diagonal elements represent the **variance** of each $x_i$.
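A quick numerical illustration of these two facts, with an arbitrarily chosen covariance matrix (this example is mine, not from the lecture):

```python
import numpy as np

# The sample covariance matrix of a random vector X is symmetric,
# and its diagonal holds the variances Var(x_i).
rng = np.random.default_rng(1)

p = 4
X = rng.multivariate_normal(mean=np.zeros(p),
                            cov=np.diag([1.0, 2.0, 3.0, 4.0]),
                            size=50_000)

S = np.cov(X, rowvar=False)    # estimate of Sigma_XX
print(np.allclose(S, S.T))     # symmetric: True
print(np.diag(S))              # close to the variances 1, 2, 3, 4
```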
#### :rocket: Covariance Matrix of Y
If $Y = c + AX$ where X is a random vector, A is a fixed matrix, c is a fixed vector, and $\Sigma_{XX}$ is the covariance matrix of X, then the covariance matrix of Y is
$$
\Sigma_{YY} = A\Sigma_{XX}A^T
$$
<span style="color:grey">
Why is this theorem even important? Y is sort of like a linear transformation of X. If we know the covariance matrix of X, then we can find the covariance matrix of Y.
</span>
Proof:
$cov(y_i,y_j) = cov(c_i+\sum_{k=1}^{n}a_{ik}x_k,\; c_j+\sum_{l=1}^{n}a_{jl}x_l)$
Since adding constants does not affect the covariance,
$=cov(\sum_{k=1}^{n}a_{ik}x_k, \sum_{l=1}^{n}a_{jl}x_l)$
Applying the bilinearity of covariance, $cov(A+B, C+D) = cov(A,C+D) + cov(B, C+D) = cov(A,C) + cov(A,D) + cov(B,C) + cov(B,D)$,
$=\sum_{k=1}^{n}\sum_{l=1}^{n}cov(a_{ik}x_k,a_{jl}x_l)$
Since matrix A is a fixed matrix,
$=\sum_{k=1}^{n}\sum_{l=1}^{n}a_{ik}a_{jl}cov(x_k,x_l)$
$=\sum_{k=1}^{n}\sum_{l=1}^{n}a_{ik}\sigma_{kl}a_{jl}$
which is exactly the (i,j) element of $A\Sigma_{XX}A^T$, so $\Sigma_{YY}=A\Sigma_{XX}A^T$.
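A rough simulation check of $\Sigma_{YY}=A\Sigma_{XX}A^T$; the matrix A, the vector c, and $\Sigma_{XX}$ below are made-up values for illustration only:

```python
import numpy as np

# Check that the covariance of Y = c + AX matches A Sigma_XX A^T.
rng = np.random.default_rng(2)

n = 3
A = rng.normal(size=(n, n))                      # fixed matrix
c = np.ones(n)                                   # fixed vector
Sigma_XX = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])           # chosen covariance of X

X = rng.multivariate_normal(np.zeros(n), Sigma_XX, size=200_000)
Y = c + X @ A.T                                  # Y = c + AX for every draw

print(np.cov(Y, rowvar=False))                   # empirical Sigma_YY
print(A @ Sigma_XX @ A.T)                        # theoretical A Sigma_XX A^T
```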
#### :rocket: Expectation of a Quadratic Form
The **trace** of an $n \times n$ square matrix is the sum of the diagonal entries of the matrix.
Theorem: If X is a random $n \times 1$ vector with mean $\mu$ and covariance matrix $\Sigma$, and A is a fixed $n \times n$ matrix, then
$$
E[X^TAX] = trace(A\Sigma) + \mu^T A\mu$$
Proof:
$Cov(x_i,x_j) = E[x_ix_j] - E[x_i]E[x_j]$ -> $E[x_ix_j] = \sigma_{ij} + \mu_i\mu_j$
$X^TAX = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}x_ix_j$
$E[X^TAX] = E[\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}x_ix_j] = \sum_{i=1}^{n} \sum_{j=1}^{n}E[x_ix_j]a_{ij} = \sum_{i=1}^{n} \sum_{j=1}^{n}(\sigma_{ij} + \mu_i \mu_j)a_{ij}$
$=\sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} a_{ij} + \sum_{i=1}^{n}\sum_{j=1}^{n} \mu_i \mu_j a_{ij}$
Since the ith diagonal entry of $A\Sigma$ is $(A\Sigma)_{ii} = \sum_{j=1}^{n}a_{ij}\sigma_{ji} = \sum_{j=1}^{n} a_{ij}\sigma_{ij}$ (using the symmetry of $\Sigma$),
$=\sum_{i=1}^{n}$ (ith diagonal entry of $A\Sigma$) $+ \mu^TA\mu = trace(A\Sigma) + \mu^TA\mu$, where $\mu^TA\mu$ is a quadratic form.
*Quadratic form:*
If x is a $p \times 1$ column vector and A is a $p \times p$ symmetric matrix, then the quadratic form is: $x^TAx = \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij}x_ix_j$
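Here is a small simulation sketch of $E[X^TAX] = trace(A\Sigma) + \mu^TA\mu$; the values of $\mu$, $\Sigma$, and A are chosen arbitrarily for the example:

```python
import numpy as np

# Simulation check of E[X^T A X] = trace(A Sigma) + mu^T A mu.
rng = np.random.default_rng(3)

n = 3
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.3],
                  [0.0, 0.3, 0.5]])
A = rng.normal(size=(n, n))

X = rng.multivariate_normal(mu, Sigma, size=500_000)
quad = np.einsum('ij,jk,ik->i', X, A, X)        # x^T A x for each draw

print(quad.mean())                              # empirical E[X^T A X]
print(np.trace(A @ Sigma) + mu @ A @ mu)        # trace(A Sigma) + mu^T A mu
```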
### Variance of $\hat{B}$
$$Y = XB + e$$
B is a vector!
Theorem: If the errors have mean zero, are uncorrelated, and are homoskedastic with Var($e_i$) = $\sigma^2$, then
$$
\Sigma_{\hat{B}\hat{B}} = \sigma^2(X^TX)^{-1}
$$
Proof:
1. Covariance matrix of the $n \times 1$ error vector e: $\Sigma_{ee} = \sigma^2 I$
2. If Y = C + AX, then $\Sigma_{yy} = A\Sigma_{xx}A^T$
3. $Y = XB + e$ -> Since $XB$ is a fixed vector, $\Sigma_{YY} = \Sigma_{ee}$
$\hat{B} = (X^TX)^{-1}X^TY$, so applying step 2 with $A = (X^TX)^{-1}X^T$:
$\Sigma_{\hat{B}\hat{B}} = (X^TX)^{-1}X^T \Sigma_{YY} ((X^TX)^{-1}X^T)^T$
= $(X^TX)^{-1}X^T \Sigma_{YY} X((X^TX)^{-1})^T$
= $(X^TX)^{-1}X^T \Sigma_{YY} X(X^TX)^{-1}$ since $X^TX$ is symmetric
$\Sigma_{\hat{B}\hat{B}} = (X^TX)^{-1}X^T \Sigma_{ee} X(X^TX)^{-1} = (X^TX)^{-1}X^T\sigma^2 I X(X^TX)^{-1}$
=$\sigma^2 (X^TX)^{-1}X^TX(X^TX)^{-1}$
=$\sigma^2(X^TX)^{-1}$
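To tie the derivation together, here is a short simulation sketch (with a made-up design matrix, coefficients, and $\sigma$) that compares the empirical covariance of $\hat{B}$ against $\sigma^2(X^TX)^{-1}$:

```python
import numpy as np

# Simulate Y = XB + e many times with homoskedastic, uncorrelated errors
# and compare the sampling covariance of B_hat with sigma^2 (X^T X)^{-1}.
rng = np.random.default_rng(4)

n, p = 200, 3
sigma = 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # fixed design with intercept
B = np.array([1.0, 0.5, -2.0])                                  # true coefficients

XtX_inv = np.linalg.inv(X.T @ X)
B_hats = []
for _ in range(5_000):
    e = rng.normal(scale=sigma, size=n)        # errors with Var(e_i) = sigma^2
    Y = X @ B + e
    B_hats.append(XtX_inv @ X.T @ Y)           # B_hat = (X^T X)^{-1} X^T Y

print(np.cov(np.array(B_hats), rowvar=False))  # empirical covariance of B_hat
print(sigma**2 * XtX_inv)                      # theoretical sigma^2 (X^T X)^{-1}
```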