# Linear regression
<!-- Put the link to this slide here so people can follow -->
slide: https://hackmd.io/@ccornwell/Linear-regression
---
<h3>Idea of Linear regression</h3>

- <font size=+3>Consider underlying relation between $x$ and $y$ as: $y=wx+b$ for some slope & intercept...</font>
- <font size=+3>the $y$ values we observe have some *noise* influencing them (see the sketch below).</font>
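
<font size=+2>A minimal numpy sketch of this model (not from the slides): the "true" slope, intercept, and noise scale below are hypothetical choices, just to make the idea concrete.</font>

```python
import numpy as np

# Sketch of the model y = w*x + b plus noise.
# The "true" w, b and the noise scale are hypothetical choices.
rng = np.random.default_rng(0)
w_true, b_true = 2.0, 1.0

x = np.linspace(0, 4, 20)
noise = rng.normal(scale=0.5, size=x.shape)
y = w_true * x + b_true + noise   # observed y's: a line, plus noise
```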
---
<h3>Line of "Best Fit"</h3>
- <font size=+3>Write $x_1,x_2,\ldots,x_m$ for $x$-coords of points. (and similarly for $y$-coords)</font>
- <font size=+3>Would like to have $w, b$ so that $y_i=wx_i+b$, for $i=1,\ldots,m$. But points not on a line: impossible.</font>
- <font size=+2>Let $A = \begin{bmatrix}1&x_1\\ \vdots&\vdots\\ 1&x_m\end{bmatrix}$. Then, saying there is no solution to $$A\begin{bmatrix}b\\ w\end{bmatrix} = \begin{bmatrix}y_1\\ \vdots\\ y_m\end{bmatrix}$$</font>
---
<h3>Line of "Best Fit"</h3>
- <font size=+2>Let $A = \begin{bmatrix}1&x_1\\ \vdots&\vdots\\ 1&x_m\end{bmatrix}$. Then, saying there is no solution to $$A\begin{bmatrix}b\\ w\end{bmatrix} = \begin{bmatrix}y_1\\ \vdots\\ y_m\end{bmatrix} = {\bf y}$$</font>
- <font size=+3>means: the vector ${\bf y}$ is not in the column space of $A$.</font>
- <font size=+3>Linear regression: replace ${\bf y}$ with its orthogonal projection onto the column space of $A$, call it $\hat{\bf y}$; then solve $$A\begin{bmatrix}b\\ w\end{bmatrix}=\hat{\bf y}$$ (sketch below).</font>
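
<font size=+2>A small numpy sketch of this setup, on hypothetical data points: build $A$ and check that ${\bf y}$ is not in its column space, so the system has no exact solution.</font>

```python
import numpy as np

# Hypothetical data: m = 5 points that are not exactly collinear.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
m = len(x)

# Design matrix A: a column of ones (for b) and the column of x's (for w).
A = np.column_stack([np.ones(m), x])

# y lies in the column space of A exactly when appending it keeps the rank at 2.
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, y])))  # 3, so no exact solution
```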
----
<h3>Line of "Best Fit"</h3>
- <font size=+3>How do you find $\hat{\bf y}$?</font>
- <font size=+3>Do Gram-Schmidt on the 2 columns of $A$: get orthonormal basis of col. space: ${\bf u_1}, {\bf u_2}$; find $$\alpha_i = {\bf u_i}\cdot{\bf y};$$ Then $\hat{\bf y} = \alpha_1{\bf u_1} + \alpha_2{\bf u_2}$.</font>
- <font size=+3>Then solve the equation; the solution is $(\hat b, \hat w)$ (sketch below).</font>
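
<font size=+2>Continuing the hypothetical data from the earlier sketch: Gram-Schmidt on the columns of $A$, project ${\bf y}$, then solve for $(\hat b,\hat w)$.</font>

```python
import numpy as np

# Same hypothetical data as before.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
A = np.column_stack([np.ones_like(x), x])

# Gram-Schmidt on the two columns of A -> orthonormal basis u1, u2.
a1, a2 = A[:, 0], A[:, 1]
u1 = a1 / np.linalg.norm(a1)
v2 = a2 - (u1 @ a2) * u1           # remove the component of a2 along u1
u2 = v2 / np.linalg.norm(v2)

# Orthogonal projection of y onto the column space of A.
y_hat = (u1 @ y) * u1 + (u2 @ y) * u2

# A [b, w]^T = y_hat now has an exact solution; recover it with lstsq.
b_hat, w_hat = np.linalg.lstsq(A, y_hat, rcond=None)[0]
print(b_hat, w_hat)
```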
---
<h3>Discussion</h3>
<br />
<br />
<br />
<br />
<br />
<br />
---
<h3>
Solving the normal equation
</h3>
- <font size=+3>Just heard about using the *normal equation* (1) to find the slope and intercept of the least squares regression line.
$(A^\top A)\begin{bmatrix}b\\ w\end{bmatrix} = A^\top{\bf y} \qquad\qquad(1)$
</font>
- <font size=+3>Why does this give the same line as the orthogonal projection approach? (See the sketch below.)</font>
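
<font size=+2>A sketch of solving equation (1) directly, on the same hypothetical data as before; it returns the same $(\hat b, \hat w)$ as the projection sketch.</font>

```python
import numpy as np

# Same hypothetical data as in the earlier sketches.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
A = np.column_stack([np.ones_like(x), x])

# Normal equation (1): (A^T A) [b, w]^T = A^T y.
b_hat, w_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(b_hat, w_hat)   # matches (b_hat, w_hat) from the projection sketch
```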
---
<h3>
Relating the two approaches
</h3>
> <font size=+2>**Fact 1:** Null space of $A^\top =\{$vectors orthog. to col. space of $A\}$.</font>
<!-- <font sizse=+2>**Fact 2:** $\dim$ of col. space of $A$ is number of pivots (the *rank* of $A$); $\dim$ of null space of $A^\top$ is
$m-($number of pivots$)$.</font>-->
<br />
<br />
<br />
<br />
<br />
----
<h3>
Relating the two approaches
</h3>
> <font size=+2>**Fact 1:** Null space of $A^\top =\{$vectors orthog. to col. space of $A\}$.</font>
> <font size=+2>**Fact 2:** $\dim$ of col. space of $A$ is number of pivots (the *rank* of $A$); $\dim$ of null space of $A^\top$ is
$m-($number of pivots$)$.</font>
----
<h3>
Relating the two approaches
</h3>
- <font size=+2>**Consequence:** A basis of the column space of $A$, together with a basis of the null space of $A^\top$, gives a basis of $\mathbb R^m$;
- so every ${\bf y} = \hat{\bf y} + {\bf q}$, where ${\bf q}$ is in the null space of $A^\top$.</font>
<span style="color:#181818;">
Now: solution to normal eq'n is unique$^*$;
but if $\hat{\bf x}$ makes $A\hat{\bf x}=\hat{\bf y}$, then
$$(A^\top A)\hat{\bf x} = A^\top\hat{\bf y} = A^\top{\bf y}.$$
</span>
----
<h3>
Relating the two approaches
</h3>
- <font size=+2>**Consequence:** A basis of the column space of $A$, together with a basis of the null space of $A^\top$, gives a basis of $\mathbb R^m$;
- so every ${\bf y} = \hat{\bf y} + {\bf q}$, where ${\bf q}$ is in the null space of $A^\top$.</font>
- <font size=+2>Now: solution to normal eq'n is unique$^*$;
but if $\hat{\bf x}$ makes $A\hat{\bf x}=\hat{\bf y}$, then
$$(A^\top A)\hat{\bf x} = A^\top\hat{\bf y} = A^\top{\bf y}.$$</font>
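
<font size=+2>A quick numerical check of this argument on the hypothetical data: the residual ${\bf q}={\bf y}-\hat{\bf y}$ lies in the null space of $A^\top$, which is why $A^\top\hat{\bf y}=A^\top{\bf y}$.</font>

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
A = np.column_stack([np.ones_like(x), x])

# x_hat solves the normal equation, so A @ x_hat is the projection y_hat.
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
y_hat = A @ x_hat

# Fact 1: q = y - y_hat is orthogonal to the column space, i.e. A^T q = 0,
# which is exactly why A^T y_hat = A^T y in the displayed equation above.
q = y - y_hat
print(A.T @ q)   # numerically ~ [0, 0]
```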
---
<h3>
The Mean Squared Error
</h3>
- <font size=+2>Given *fixed* data ($m$ points); take any slope and intercept $w, b$.</font>
<font size=+2>Define $\operatorname{MSE}(w,b) = \frac{1}{m}\sum_{i=1}^m(h(x_i)-y_i)^2$, where $h(x_i)=wx_i + b$.</font>
<span style="color:#181818;">
- <font size=+2>For a linear model $h$ with parameters $w,b$, this $\operatorname{MSE}(w,b)$ measures how well $h$ fits the data.</font>
- <font size=+2>Say we don't know the least-squares regression line. What could we do to minimize $\operatorname{MSE}(w,b)$?</font>
</span>
----
<h3>
The Mean Squared Error
</h3>
- <font size=+2>Given *fixed* data ($m$ points); take any slope and intercept $w, b$.</font>
<font size=+2>Define $\operatorname{MSE}(w,b) = \frac{1}{m}\sum_{i=1}^m(h(x_i)-y_i)^2$, where $h(x_i)=wx_i + b$.</font>
- <font size=+2>For a linear model $h$ with parameters $w,b$, this $\operatorname{MSE}(w,b)$ measures how well $h$ fits the data.</font>
- <font size=+2>Say we don't know the least-squares regression line. What could we do to minimize $\operatorname{MSE}(w,b)$?</font>
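
<font size=+2>A sketch of evaluating $\operatorname{MSE}(w,b)$ on the same hypothetical data; the least-squares $(\hat w, \hat b)$ attains the minimum, so any other choice scores at least as high.</font>

```python
import numpy as np

# Hypothetical data; h(x) = w*x + b is the candidate line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

def mse(w, b):
    """MSE(w, b) = (1/m) * sum_i (w*x_i + b - y_i)^2."""
    return np.mean((w * x + b - y) ** 2)

# The least-squares line minimizes MSE over all (w, b).
A = np.column_stack([np.ones_like(x), x])
b_hat, w_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(mse(w_hat, b_hat))   # the minimum value of MSE for this data
print(mse(2.0, 1.0))       # any other candidate is at least as large
```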
---
<h3>Discussion</h3>
<br />
<br />
<br />
<br />
<br />
<br />
{"metaMigratedAt":"2023-06-15T19:53:05.291Z","metaMigratedFrom":"YAML","title":"Linear regression","breaks":true,"description":"View the slide with \"Slide Mode\".","contributors":"[{\"id\":\"da8891d8-b47c-4b6d-adeb-858379287e60\",\"add\":8191,\"del\":2825}]"}