# Gradient descent for Logistic regression
<!-- Put the link to this slide here so people can follow -->
slide: https://hackmd.io/@ccornwell/grad-desc-logistic-regression
---
<h3>The (negative) log-loss function</h3>
- <font size=+2>In the blog post, the logistic regression model $h({\bf x}) = \sigma_k({\bf w}\cdot{\bf x}+b)$ has a parameter $k$. Here, just use $k=1$ (for simplicity): $h({\bf x}) = 1/(1+e^{-({\bf w}\cdot{\bf x}+b)})$.</font>
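- <font size=+2>A minimal NumPy sketch of this model (the names `h`, `w`, and `b` mirror the notation above; the array shapes are an assumption):</font>

```python
import numpy as np

def h(x, w, b):
    """Logistic model h(x) = 1 / (1 + exp(-(w·x + b))), i.e. sigma with k = 1."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
```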
---
<h3>The (negative) log-loss function</h3>
- <font size=+2>Given *instance* data $S=[{\bf x_1},\ldots,{\bf x_m}]^\top$, corresponding labels $y_1,\ldots,y_m$ (in $\{1,-1\}$), and given parameters ${\bf w}$ and $b$, want loss function for model $h$ determined by ${\bf w}, b$.</font>
<span style="color:#181818;">
- <font size=+2>Loss function should: (i) be affected by each ${\bf x_i}$; (ii) have lower value when $h$ is more often correct; (iii) have lower value if $h$ is more "certain" about points labeled correctly.</font>
</span>
----
<h3>The (negative) log-loss function</h3>
- <font size=+2>Given *instance* data $S=[{\bf x_1},\ldots,{\bf x_m}]^\top$, corresponding labels $y_1,\ldots,y_m$ (in $\{1,-1\}$), and given parameters ${\bf w}$ and $b$, want loss function for model $h$ determined by ${\bf w}, b$.</font>
- <font size=+2>Loss function should: (i) be affected by each ${\bf x_i}$; (ii) have lower value when $h$ is more often correct; (iii) have lower value if $h$ is more "certain" about points labeled correctly.</font>
---
<h3>The (negative) log-loss function</h3>
- <font size=+2>A loss function that achieves (i), (ii), and (iii):</font>
> $\frac{1}{m}\sum_{i=1}^m\log\left(1 + e^{-y_i({\bf w}\cdot{\bf x_i}+b)}\right).$
- <font size=+2>On each instance, $\log\left(1 + e^{-y_i({\bf w}\cdot{\bf x_i}+b)}\right) = \ldots$</font>
<span style="color:#181818;">
- <font size=+2>When $y_i=1$: get $-\log(h({\bf x_i}))$</font>
- <font size=+2>When $y_i=-1$: get $-\log(1 - h({\bf x_i}))$</font>
</span>
<br />
<br />
<br />
<br />
----
<h3>The (negative) log-loss function</h3>
- <font size=+2>A loss function that achieves (i), (ii), and (iii):</font>
> $\frac{1}{m}\sum_{i=1}^m\log\left(1 + e^{-y_i({\bf w}\cdot{\bf x_i}+b)}\right).$
- <font size=+2>On each instance, $\log\left(1 + e^{-y_i({\bf w}\cdot{\bf x_i}+b)}\right) = \ldots$</font>
- <font size=+2>When $y_i=1$, get: $-\log(h({\bf x_i}))$</font>
- <font size=+2>When $y_i=-1$, get: $-\log(1 - h({\bf x_i}))$</font>
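- <font size=+2>A rough NumPy sketch of computing this loss (the function name `log_loss` and the array shapes are illustrative assumptions):</font>

```python
import numpy as np

def log_loss(S, y, w, b):
    """Average of log(1 + exp(-y_i (w·x_i + b))) over the m instances.

    S : (m, n) array of instances x_i (rows), y : (m,) labels in {1, -1}.
    """
    z = S @ w + b                                # vector of w·x_i + b
    return np.mean(np.logaddexp(0.0, -y * z))    # logaddexp(0, t) = log(1 + e^t)
```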
<br />
<br />
<br />
<br />
---
<h3>The (negative) log-loss function</h3>
- <font size=+2>Since $-\log(t)\to 0$ as $t\to 1^-$ and $-\log(t)\to \infty$ as $t\to 0^+$,</font>
- <font size=+2>if model $h$ is very certain and correct (so, $h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=1$, or $1-h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=0$, where $\overset{\circ}{y_i}\in\{0,1\}$ denotes the 0/1 version of the label $y_i$), little contribution to loss;</font>
<span style="color:#181818;">
- <font size=+2>if model $h$ is very certain, but wrong ($h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=0$, for example), big contribution to loss: $-\log(1 - h({\bf x_i}))$</font>
</span>
----
<h3>The (negative) log-loss function</h3>
- <font size=+2>Since $-\log(t)\to 0$ as $t\to 1^-$ and $-\log(t)\to \infty$ as $t\to 0^+$,</font>
- <font size=+2>if model $h$ is very certain and correct (so, $h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=1$, or $1-h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=0$, where $\overset{\circ}{y_i}\in\{0,1\}$ denotes the 0/1 version of the label $y_i$), little contribution to loss;</font>
- <font size=+2>if model $h$ is very certain, but wrong ($h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=0$, for example), big contribution to loss: $-\log(1 - h({\bf x_i}))$</font>
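- <font size=+2>A quick numeric check, taking $h({\bf x_i}) = 0.99$ as an example value:</font>
> $-\log(0.99)\approx 0.01\ \text{(certain and correct)},\qquad -\log(1-0.99)\approx 4.6\ \text{(certain but wrong)}.$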
---
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Need to find $\partial/\partial w_j$ of log-loss, $j=1,\ldots,n$, and partial w.r.t. $b$.</font>
- <font size=+2>First, what is</font>
$\frac{\partial}{\partial w_j}({\bf w}\cdot{\bf x_i}+b)$?
- <font size=+2>And, what about</font>
$\frac{\partial}{\partial b}({\bf w}\cdot{\bf x_i}+b)$?
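- <font size=+2>For reference, these partials work out to the values used on the next slides:</font>
> $\frac{\partial}{\partial w_j}({\bf w}\cdot{\bf x_i}+b) = x_{i,j},\qquad \frac{\partial}{\partial b}({\bf w}\cdot{\bf x_i}+b) = 1.$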
---
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>So, write $z = {\bf w}\cdot{\bf x_i}+b$. We have the partials of $z$.</font>
- <font size=+2>Now, what is</font>
$\frac{\partial}{\partial z}\log(1 + e^{-z})$?
<br />
<br />
<br />
<br />
<br />
<br />
---
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Bring it all together: say $z = {\bf w}\cdot{\bf x_i}+b$, and ${y_i}=1$.</font>
<span style="color:#181818;">
$\frac{\partial}{\partial w_j}\log(1 + e^{-z}) = \frac{\partial}{\partial z}\log(1 + e^{-z})\frac{\partial}{\partial w_j}(z)$
$=\frac{-e^{-z}}{1+e^{-z}}(x_{i,j}) = \left(\frac{1}{1+e^{-z}} - 1\right)x_{i,j}$
$= x_{i,j}(h({\bf x_i}) - 1) = x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$.
- <font size=+2>What happens when ${y_i}=-1$ (that is, $\overset{\circ}{y_i}=0$)?</font>
Hint: when you encounter $\frac{e^z}{1+e^z}$, multiply by $\frac{e^{-z}}{e^{-z}}$.
</span>
----
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Bring it all together: say $z = {\bf w}\cdot{\bf x_i}+b$, and ${y_i}=1$.</font>
$\frac{\partial}{\partial w_j}\log(1 + e^{-z}) = \frac{\partial}{\partial z}\log(1 + e^{-z})\frac{\partial}{\partial w_j}(z)$
<span style="color:#181818;">
$=\frac{-e^{-z}}{1+e^{-z}}(x_{i,j}) = \left(\frac{1}{1+e^{-z}} - 1\right)x_{i,j}$
$= x_{i,j}(h({\bf x_i}) - 1) = x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$.
- <font size=+2>What happens when ${y_i}=-1$ (that is, $\overset{\circ}{y_i}=0$)?</font>
Hint: when you encounter $\frac{e^z}{1+e^z}$, multiply by $\frac{e^{-z}}{e^{-z}}$.
</span>
----
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Bring it all together: say $z = {\bf w}\cdot{\bf x_i}+b$, and ${y_i}=1$.</font>
$\frac{\partial}{\partial w_j}\log(1 + e^{-z}) = \frac{\partial}{\partial z}\log(1 + e^{-z})\frac{\partial}{\partial w_j}(z)$
$=\frac{-e^{-z}}{1+e^{-z}}(x_{i,j}) = \left(\frac{1}{1+e^{-z}} - 1\right)x_{i,j}$
$= x_{i,j}(h({\bf x_i}) - 1) = x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$.
<span style="color:#181818;">
- <font size=+2>What happens when ${y_i}=-1$ (that is, $\overset{\circ}{y_i}=0$)?</font>
Hint: when you encounter $\frac{e^z}{1+e^z}$, multiply by $\frac{e^{-z}}{e^{-z}}$.
</span>
----
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Bring it all together: say $z = {\bf w}\cdot{\bf x_i}+b$, and ${y_i}=1$.</font>
$\frac{\partial}{\partial w_j}\log(1 + e^{-z}) = \frac{\partial}{\partial z}\log(1 + e^{-z})\frac{\partial}{\partial w_j}(z)$
$=\frac{-e^{-z}}{1+e^{-z}}(x_{i,j}) = \left(\frac{1}{1+e^{-z}} - 1\right)x_{i,j}$
$= x_{i,j}(h({\bf x_i}) - 1) = x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$.
- <font size=+2>What happens when ${y_i}=-1$ (that is, $\overset{\circ}{y_i}=0$)?</font>
Hint: when you encounter $\frac{e^z}{1+e^z}$, multiply by $\frac{e^{-z}}{e^{-z}}$.
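- <font size=+2>Putting the pieces together in code: a rough NumPy sketch of one gradient descent step built from the per-instance partials $x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$ above (the function name `gradient_step` and the learning rate `lr` are illustrative):</font>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(S, y01, w, b, lr=0.1):
    """One gradient descent step on the average log-loss.

    S   : (m, n) array of instances x_i (rows)
    y01 : (m,) labels written in {0, 1} (the y-with-circle labels above)
    """
    m = S.shape[0]
    preds = sigmoid(S @ w + b)       # h(x_i) for every instance
    residual = preds - y01           # h(x_i) - 0/1 label
    grad_w = S.T @ residual / m      # entries (1/m) * sum_i x_{i,j} * residual_i
    grad_b = residual.mean()         # (1/m) * sum_i residual_i
    return w - lr * grad_w, b - lr * grad_b
```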
---
<h3>Discussion</h3>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
{"metaMigratedAt":"2023-06-15T20:25:35.822Z","metaMigratedFrom":"YAML","title":"Gradient descent & Logistic regression","breaks":true,"description":"View the slide with \"Slide Mode\".","contributors":"[{\"id\":\"da8891d8-b47c-4b6d-adeb-858379287e60\",\"add\":11676,\"del\":4887}]"}