# Multiple Linear Regression
###### tags: multiple linear regression, linear regression, statistics
## Summary
- $\hat{B}=(X^TX)^{-1}X^TY$ is unbiased
- Variance of $\hat{B} = \Sigma_{\hat{B}\hat{B}} = \sigma^2(X^TX)^{-1}$.
- Distribution of $\hat{B}$
### WARM UP! Important Theorems
#### :rocket: Theorem 1: Expectation of Vector Form
If $Y = c + AX$ where X is a random vector, A is a fixed matrix and c is a fixed vector, then
$$
E[Y] = c + AE[X]
$$
<span style="color:grey">
Proof:
$$
\begin{bmatrix}
y_1\\
y_2 \\
\vdots \\
y_n\\
\end{bmatrix} =
\begin{bmatrix}
c_1\\
c_2 \\
\vdots \\
c_n\\
\end{bmatrix} +
\begin{bmatrix}
a_{11} & \cdots & a_{1n}\\
a_{21} & \cdots & a_{2n}\\
\vdots & \ddots &\vdots\\
a_{n1}& \cdots & a_{nn}\\
\end{bmatrix}
\times
\begin{bmatrix}
x_1\\
x_2 \\
\vdots \\
x_n\\
\end{bmatrix}
$$
ith component of Y: $y_i = c_i + \sum_{j=1}^{n}a_{ij}x_j$.
$E[Y_i] = c_i + \sum_{j=1}^{n} E[a_{ij}X_j] = c_i + \sum_{j=1}^{n} a_{ij}E[X_j]$ since $a_{ij}$ is an entry of a fixed matrix and is not a random variable.
</span>
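Below is a minimal simulation sketch (not part of the course material) that checks Theorem 1 numerically; the particular $c$, $A$, and distribution of X are arbitrary choices for illustration.

```python
import numpy as np

# Monte Carlo check of Theorem 1: E[c + AX] = c + A E[X].
rng = np.random.default_rng(0)

n = 3
c = np.array([1.0, -2.0, 0.5])           # fixed vector
A = rng.normal(size=(n, n))              # fixed matrix
mu_X = np.array([0.0, 1.0, 2.0])         # true mean of X

X = rng.normal(loc=mu_X, scale=1.0, size=(100_000, n))  # draws of the random vector X
Y = c + X @ A.T                                          # Y = c + AX for every draw

print(Y.mean(axis=0))    # empirical E[Y]
print(c + A @ mu_X)      # theoretical c + A E[X]; the two should be close
```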
#### :rocket: Covariance Matrix
If X is a $p \times 1$ random vector, then its covariance matrix $\Sigma_{XX}$ collects the covariances between all pairs of components. In other words, the (i,j) element of $\Sigma_{XX}$ is cov($x_i,x_j$) = $\sigma_{ij}$.
<span style="color:grey">
Since X is a random vector, we need to measure how the components of X vary together. Personally, I overlooked the fact that X is a random vector, which is why I was confused about why we even need the covariance matrix in the first place.
</span>
Another important fact is that the covariance matrix $\Sigma_{XX}$ is **symmetric**, and the diagonal elements represent the **variance** of each $x_i$.
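A quick numerical illustration of these two facts, with an arbitrarily chosen covariance matrix (this example is mine, not from the lecture):

```python
import numpy as np

# The sample covariance matrix of a random vector X is symmetric,
# and its diagonal holds the variances Var(x_i).
rng = np.random.default_rng(1)

p = 4
X = rng.multivariate_normal(mean=np.zeros(p),
                            cov=np.diag([1.0, 2.0, 3.0, 4.0]),
                            size=50_000)

S = np.cov(X, rowvar=False)    # estimate of Sigma_XX
print(np.allclose(S, S.T))     # symmetric: True
print(np.diag(S))              # close to the variances 1, 2, 3, 4
```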
#### :rocket: Covariance Matrix of Y
If $Y = c + AX$ where X is a random vector, A is a fixed matrix, c is a fixed vector, and $\Sigma_{XX}$ is the covariance matrix of X, then the covariance matrix of Y is
$$
\Sigma_{YY} = A\Sigma_{XX}A^T
$$
<span style="color:grey">
Why is this theorem even important? Y is sort of like a linear transformation of X. If we know the covariance matrix of X, then we can find the covariance matrix of Y.
</span>
Proof:
$cov(y_i,y_j) = cov(c_i+\sum_{k=1}^{n}a_{ik}x_k,\; c_j+\sum_{l=1}^{n}a_{jl}x_l)$
Since adding constants does not affect the covariance,
$=cov(\sum_{k=1}^{n}a_{ik}x_k, \sum_{l=1}^{n}a_{jl}x_l)$
Applying the bilinearity of covariance, $cov(A+B, C+D) = cov(A,C+D) + cov(B, C+D) = cov(A,C) + cov(A,D) + cov(B,C) + cov(B,D)$,
$=\sum_{k=1}^{n}\sum_{l=1}^{n}cov(a_{ik}x_k,a_{jl}x_l)$
Since matrix A is a fixed matrix,
$=\sum_{k=1}^{n}\sum_{l=1}^{n}a_{ik}a_{jl}cov(x_k,x_l)$
$=\sum_{k=1}^{n}\sum_{l=1}^{n}a_{ik}\sigma_{kl}a_{jl}$
which is exactly the (i,j) element of $A\Sigma_{XX}A^T$, so $\Sigma_{YY}=A\Sigma_{XX}A^T$.
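A rough simulation check of $\Sigma_{YY}=A\Sigma_{XX}A^T$; the matrix A, the vector c, and $\Sigma_{XX}$ below are made-up values for illustration only:

```python
import numpy as np

# Check that the covariance of Y = c + AX matches A Sigma_XX A^T.
rng = np.random.default_rng(2)

n = 3
A = rng.normal(size=(n, n))                      # fixed matrix
c = np.ones(n)                                   # fixed vector
Sigma_XX = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])           # chosen covariance of X

X = rng.multivariate_normal(np.zeros(n), Sigma_XX, size=200_000)
Y = c + X @ A.T                                  # Y = c + AX for every draw

print(np.cov(Y, rowvar=False))                   # empirical Sigma_YY
print(A @ Sigma_XX @ A.T)                        # theoretical A Sigma_XX A^T
```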
#### :rocket: Expectation of a Quadratic Form
The **trace** of an $n \times n$ square matrix is the sum of the diagonal entries of the matrix.
Theorem: If X is a random $n \times 1$ vector with mean $\mu$ and covariance matrix $\Sigma$, and A is a fixed $n \times n$ matrix, then
$$
E[X^TAX] = trace(A\Sigma) + \mu^T A\mu$$
Proof:
$Cov(x_i,x_j) = E[x_ix_j] - E[x_i]E[x_j]$ -> $E[x_ix_j] = \sigma_{ij} + \mu_i\mu_j$
$X^TAX = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}x_ix_j$
$E[X^TAX] = E[\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}x_ix_j] = \sum_{i=1}^{n} \sum_{j=1}^{n}E[x_ix_j]a_{ij} = \sum_{i=1}^{n} \sum_{j=1}^{n}(\sigma_{ij} + \mu_i \mu_j)a_{ij}$
$=\sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} a_{ij} + \sum_{i=1}^{n}\sum_{j=1}^{n} \mu_i \mu_j a_{ij}$
Since the ith diagonal entry of $A\Sigma$ is $(A\Sigma)_{ii} = \sum_{j=1}^{n}a_{ij}\sigma_{ji} = \sum_{j=1}^{n} a_{ij}\sigma_{ij}$ (using the symmetry of $\Sigma$),
$=\sum_{i=1}^{n}$ (ith diagonal entry of $A\Sigma$) $+ \mu^TA\mu = trace(A\Sigma) + \mu^TA\mu$, where $\mu^TA\mu$ is a quadratic form.
*Quadratic form:*
If x is a $p \times 1$ column vector and A is a $p \times p$ symmetric matrix, then the quadratic form is: $x^TAx = \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij}x_ix_j$
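Here is a small simulation sketch of $E[X^TAX] = trace(A\Sigma) + \mu^TA\mu$; the values of $\mu$, $\Sigma$, and A are chosen arbitrarily for the example:

```python
import numpy as np

# Simulation check of E[X^T A X] = trace(A Sigma) + mu^T A mu.
rng = np.random.default_rng(3)

n = 3
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.3],
                  [0.0, 0.3, 0.5]])
A = rng.normal(size=(n, n))

X = rng.multivariate_normal(mu, Sigma, size=500_000)
quad = np.einsum('ij,jk,ik->i', X, A, X)        # x^T A x for each draw

print(quad.mean())                              # empirical E[X^T A X]
print(np.trace(A @ Sigma) + mu @ A @ mu)        # trace(A Sigma) + mu^T A mu
```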
### Variance of $\hat{B}$
$$Y = XB + e$$
B is a vector!
Theorem: If the errors have mean zero, are uncorrelated, and are homoskedastic with Var($e_i$) = $\sigma^2$, then
$$
\Sigma_{\hat{B}\hat{B}} = \sigma^2(X^TX)^{-1}
$$
Proof:
1. Covariance matrix of the $n \times 1$ error vector e: $\Sigma_{ee} = \sigma^2 I$
2. If Y = C + AX, then $\Sigma_{yy} = A\Sigma_{xx}A^T$
3. $Y = XB + e$ -> Since $XB$ is a fixed vector, $\Sigma_{YY} = \Sigma_{ee}$
$\hat{B} = (X^TX)^{-1}X^TY$, so applying step 2 with $A = (X^TX)^{-1}X^T$:
$\Sigma_{\hat{B}\hat{B}} = (X^TX)^{-1}X^T \Sigma_{YY} ((X^TX)^{-1}X^T)^T$
= $(X^TX)^{-1}X^T \Sigma_{YY} X((X^TX)^{-1})^T$
= $(X^TX)^{-1}X^T \Sigma_{YY} X(X^TX)^{-1}$ since $X^TX$ is symmetric
$\Sigma_{\hat{B}\hat{B}} = (X^TX)^{-1}X^T \Sigma_{ee} X(X^TX)^{-1} = (X^TX)^{-1}X^T\sigma^2 I X(X^TX)^{-1}$
=$\sigma^2 (X^TX)^{-1}X^TX(X^TX)^{-1}$
=$\sigma^2(X^TX)^{-1}$
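To tie the derivation together, here is a short simulation sketch (with a made-up design matrix, coefficients, and $\sigma$) that compares the empirical covariance of $\hat{B}$ against $\sigma^2(X^TX)^{-1}$:

```python
import numpy as np

# Simulate Y = XB + e many times with homoskedastic, uncorrelated errors
# and compare the sampling covariance of B_hat with sigma^2 (X^T X)^{-1}.
rng = np.random.default_rng(4)

n, p = 200, 3
sigma = 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # fixed design with intercept
B = np.array([1.0, 0.5, -2.0])                                  # true coefficients

XtX_inv = np.linalg.inv(X.T @ X)
B_hats = []
for _ in range(5_000):
    e = rng.normal(scale=sigma, size=n)        # errors with Var(e_i) = sigma^2
    Y = X @ B + e
    B_hats.append(XtX_inv @ X.T @ Y)           # B_hat = (X^T X)^{-1} X^T Y

print(np.cov(np.array(B_hats), rowvar=False))  # empirical covariance of B_hat
print(sigma**2 * XtX_inv)                      # theoretical sigma^2 (X^T X)^{-1}
```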