# Linear regression
<!-- Put the link to this slide here so people can follow -->
slide: https://hackmd.io/@ccornwell/Linear-regression
---
<h3>Idea of Linear regression</h3>

- <font size=+3>Consider underlying relation between $x$ and $y$ as: $y=wx+b$ for some slope & intercept...</font>
- <font size=+3>the $y$ values we observe have some *noise* influencing them (see the sketch below).</font>
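
<font size=+2>A minimal numpy sketch of this model (not from the slides): the "true" slope, intercept, and noise scale below are hypothetical choices, just to make the idea concrete.</font>

```python
import numpy as np

# Sketch of the model y = w*x + b plus noise.
# The "true" w, b and the noise scale are hypothetical choices.
rng = np.random.default_rng(0)
w_true, b_true = 2.0, 1.0

x = np.linspace(0, 4, 20)
noise = rng.normal(scale=0.5, size=x.shape)
y = w_true * x + b_true + noise   # observed y's: a line, plus noise
```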
---
<h3>Line of "Best Fit"</h3>
- <font size=+3>Write $x_1,x_2,\ldots,x_m$ for $x$-coords of points. (and similarly for $y$-coords)</font>
- <font size=+3>Would like to have $w, b$ so that $y_i=wx_i+b$, for $i=1,\ldots,m$. But points not on a line: impossible.</font>
- <font size=+2>Let $A = \begin{bmatrix}1&x_1\\ \vdots&\vdots\\ 1&x_m\end{bmatrix}$. Then, saying there is no solution to $$A\begin{bmatrix}b\\ w\end{bmatrix} = \begin{bmatrix}y_1\\ \vdots\\ y_m\end{bmatrix}$$</font>
---
<h3>Line of "Best Fit"</h3>
- <font size=+2>Let $A = \begin{bmatrix}1&x_1\\ \vdots&\vdots\\ 1&x_m\end{bmatrix}$. Then, saying there is no solution to $$A\begin{bmatrix}b\\ w\end{bmatrix} = \begin{bmatrix}y_1\\ \vdots\\ y_m\end{bmatrix} = {\bf y}$$</font>
- <font size=+3>means: the vector ${\bf y}$ is not in the column space of $A$.</font>
- <font size=+3>Linear regression: replace ${\bf y}$ with its orthogonal projection onto the column space of $A$, call it $\hat{\bf y}$; then solve $$A\begin{bmatrix}b\\ w\end{bmatrix}=\hat{\bf y}$$ (sketch below).</font>
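
<font size=+2>A small numpy sketch of this setup, on hypothetical data points: build $A$ and check that ${\bf y}$ is not in its column space, so the system has no exact solution.</font>

```python
import numpy as np

# Hypothetical data: m = 5 points that are not exactly collinear.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
m = len(x)

# Design matrix A: a column of ones (for b) and the column of x's (for w).
A = np.column_stack([np.ones(m), x])

# y lies in the column space of A exactly when appending it keeps the rank at 2.
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, y])))  # 3, so no exact solution
```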
----
<h3>Line of "Best Fit"</h3>
- <font size=+3>How do you find $\hat{\bf y}$?</font>
- <font size=+3>Do Gram-Schmidt on the 2 columns of $A$: get orthonormal basis of col. space: ${\bf u_1}, {\bf u_2}$; find $$\alpha_i = {\bf u_i}\cdot{\bf y};$$ Then $\hat{\bf y} = \alpha_1{\bf u_1} + \alpha_2{\bf u_2}$.</font>
- <font size=+3>Then solve the equation; the solution is $(\hat b, \hat w)$ (sketch below).</font>
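
<font size=+2>Continuing the hypothetical data from the earlier sketch: Gram-Schmidt on the columns of $A$, project ${\bf y}$, then solve for $(\hat b,\hat w)$.</font>

```python
import numpy as np

# Same hypothetical data as before.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
A = np.column_stack([np.ones_like(x), x])

# Gram-Schmidt on the two columns of A -> orthonormal basis u1, u2.
a1, a2 = A[:, 0], A[:, 1]
u1 = a1 / np.linalg.norm(a1)
v2 = a2 - (u1 @ a2) * u1           # remove the component of a2 along u1
u2 = v2 / np.linalg.norm(v2)

# Orthogonal projection of y onto the column space of A.
y_hat = (u1 @ y) * u1 + (u2 @ y) * u2

# A [b, w]^T = y_hat now has an exact solution; recover it with lstsq.
b_hat, w_hat = np.linalg.lstsq(A, y_hat, rcond=None)[0]
print(b_hat, w_hat)
```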
---
<h3>Discussion</h3>
<br />
<br />
<br />
<br />
<br />
<br />
---
<h3>
Solving the normal equation
</h3>
- <font size=+3>Just heard about using the *normal equation* (1) to find the slope and intercept of the least squares regression line.
$(A^\top A)\begin{bmatrix}b\\ w\end{bmatrix} = A^\top{\bf y} \qquad\qquad(1)$
</font>
- <font size=+3>Why does this give the same line as the orthogonal projection approach? (See the sketch below.)</font>
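
<font size=+2>A sketch of solving equation (1) directly, on the same hypothetical data as before; it returns the same $(\hat b, \hat w)$ as the projection sketch.</font>

```python
import numpy as np

# Same hypothetical data as in the earlier sketches.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
A = np.column_stack([np.ones_like(x), x])

# Normal equation (1): (A^T A) [b, w]^T = A^T y.
b_hat, w_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(b_hat, w_hat)   # matches (b_hat, w_hat) from the projection sketch
```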
---
<h3>
Relating the two approaches
</h3>
> <font size=+2>**Fact 1:** Null space of $A^\top =\{$vectors orthog. to col. space of $A\}$.</font>
<!-- <font sizse=+2>**Fact 2:** $\dim$ of col. space of $A$ is number of pivots (the *rank* of $A$); $\dim$ of null space of $A^\top$ is
$m-($number of pivots$)$.</font>-->
<br />
<br />
<br />
<br />
<br />
----
<h3>
Relating the two approaches
</h3>
> <font size=+2>**Fact 1:** Null space of $A^\top =\{$vectors orthog. to col. space of $A\}$.</font>
> <font size=+2>**Fact 2:** $\dim$ of col. space of $A$ is number of pivots (the *rank* of $A$); $\dim$ of null space of $A^\top$ is
$m-($number of pivots$)$.</font>
----
<h3>
Relating the two approaches
</h3>
- <font size=+2>**Consequence:** A basis of the column space of $A$, together with a basis of the null space of $A^\top$, gives a basis of $\mathbb R^m$;
- so every ${\bf y} = \hat{\bf y} + {\bf q}$, where ${\bf q}$ is in the null space of $A^\top$.</font>
<span style="color:#181818;">
Now: solution to normal eq'n is unique$^*$;
but if $\hat{\bf x}$ makes $A\hat{\bf x}=\hat{\bf y}$, then
$$(A^\top A)\hat{\bf x} = A^\top\hat{\bf y} = A^\top{\bf y}.$$
</span>
----
<h3>
Relating the two approaches
</h3>
- <font size=+2>**Consequence:** A basis of the column space of $A$, together with a basis of the null space of $A^\top$, gives a basis of $\mathbb R^m$;
- so every ${\bf y} = \hat{\bf y} + {\bf q}$, where ${\bf q}$ is in the null space of $A^\top$.</font>
- <font size=+2>Now: solution to normal eq'n is unique$^*$;
but if $\hat{\bf x}$ makes $A\hat{\bf x}=\hat{\bf y}$, then
$$(A^\top A)\hat{\bf x} = A^\top\hat{\bf y} = A^\top{\bf y}.$$</font>
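
<font size=+2>A quick numerical check of this argument on the hypothetical data: the residual ${\bf q}={\bf y}-\hat{\bf y}$ lies in the null space of $A^\top$, which is why $A^\top\hat{\bf y}=A^\top{\bf y}$.</font>

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
A = np.column_stack([np.ones_like(x), x])

# x_hat solves the normal equation, so A @ x_hat is the projection y_hat.
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
y_hat = A @ x_hat

# Fact 1: q = y - y_hat is orthogonal to the column space, i.e. A^T q = 0,
# which is exactly why A^T y_hat = A^T y in the displayed equation above.
q = y - y_hat
print(A.T @ q)   # numerically ~ [0, 0]
```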
---
<h3>
The Mean Squared Error
</h3>
- <font size=+2>Given *fixed* data ($m$ points); take any slope and intercept $w, b$.</font>
<font size=+2>Define $\operatorname{MSE}(w,b) = \frac{1}{m}\sum_{i=1}^m(h(x_i)-y_i)^2$, where $h(x_i)=wx_i + b$.</font>
<span style="color:#181818;">
- <font size=+2>For a linear model $h$ with parameters $w,b$, this $\operatorname{MSE}(w,b)$ measures how well $h$ fits the data.</font>
- <font size=+2>Say we don't know the least-squares regression line. What could we do to minimize $\operatorname{MSE}(w,b)$?</font>
</span>
----
<h3>
The Mean Squared Error
</h3>
- <font size=+2>Given *fixed* data ($m$ points); take any slope and intercept $w, b$.</font>
<font size=+2>Define $\operatorname{MSE}(w,b) = \frac{1}{m}\sum_{i=1}^m(h(x_i)-y_i)^2$, where $h(x_i)=wx_i + b$.</font>
- <font size=+2>For a linear model $h$ with parameters $w,b$, this $\operatorname{MSE}(w,b)$ measures how well $h$ fits the data.</font>
- <font size=+2>Say we don't know the least-squares regression line. What could we do to minimize $\operatorname{MSE}(w,b)$?</font>
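
<font size=+2>A sketch of evaluating $\operatorname{MSE}(w,b)$ on the same hypothetical data; the least-squares $(\hat w, \hat b)$ attains the minimum, so any other choice scores at least as high.</font>

```python
import numpy as np

# Hypothetical data; h(x) = w*x + b is the candidate line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

def mse(w, b):
    """MSE(w, b) = (1/m) * sum_i (w*x_i + b - y_i)^2."""
    return np.mean((w * x + b - y) ** 2)

# The least-squares line minimizes MSE over all (w, b).
A = np.column_stack([np.ones_like(x), x])
b_hat, w_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(mse(w_hat, b_hat))   # the minimum value of MSE for this data
print(mse(2.0, 1.0))       # any other candidate is at least as large
```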
---
<h3>Discussion</h3>
<br />
<br />
<br />
<br />
<br />
<br />
{"metaMigratedAt":"2023-06-15T19:53:05.291Z","metaMigratedFrom":"YAML","title":"Linear regression","breaks":true,"description":"View the slide with \"Slide Mode\".","contributors":"[{\"id\":\"da8891d8-b47c-4b6d-adeb-858379287e60\",\"add\":8191,\"del\":2825}]"}