# Gradient descent for Logistic regression
<!-- Put the link to this slide here so people can follow -->
slide: https://hackmd.io/@ccornwell/grad-desc-logistic-regression
---
<h3>The (negative) log-loss function</h3>
- <font size=+2>In the blog post, the logistic regression model $h({\bf x}) = \sigma_k({\bf w}\cdot{\bf x}+b)$ has a parameter $k$. Here, just use $k=1$ (for simplicity): $h({\bf x}) = 1/(1+e^{-({\bf w}\cdot{\bf x}+b)})$.</font>
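- <font size=+2>A minimal NumPy sketch of this model (the names `h`, `w`, and `b` mirror the notation above; the array shapes are an assumption):</font>

```python
import numpy as np

def h(x, w, b):
    """Logistic model h(x) = 1 / (1 + exp(-(w·x + b))), i.e. sigma with k = 1."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
```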
---
<h3>The (negative) log-loss function</h3>
- <font size=+2>Given *instance* data $S=[{\bf x_1},\ldots,{\bf x_m}]^\top$, corresponding labels $y_1,\ldots,y_m$ (in $\{1,-1\}$), and given parameters ${\bf w}$ and $b$, want loss function for model $h$ determined by ${\bf w}, b$.</font>
<span style="color:#181818;">
- <font size=+2>Loss function should: (i) be affected by each ${\bf x_i}$; (ii) have lower value when $h$ is more often correct; (iii) have lower value if $h$ is more "certain" about points labeled correctly.</font>
</span>
----
<h3>The (negative) log-loss function</h3>
- <font size=+2>Given *instance* data $S=[{\bf x_1},\ldots,{\bf x_m}]^\top$, corresponding labels $y_1,\ldots,y_m$ (in $\{1,-1\}$), and given parameters ${\bf w}$ and $b$, want loss function for model $h$ determined by ${\bf w}, b$.</font>
- <font size=+2>Loss function should: (i) be affected by each ${\bf x_i}$; (ii) have lower value when $h$ is more often correct; (iii) have lower value if $h$ is more "certain" about points labeled correctly.</font>
---
<h3>The (negative) log-loss function</h3>
- <font size=+2>A loss function that achieves (i), (ii), and (iii):</font>
> $\frac{1}{m}\sum_{i=1}^m\log\left(1 + e^{-y_i({\bf w}\cdot{\bf x_i}+b)}\right).$
- <font size=+2>On each instance, $\log\left(1 + e^{-y_i({\bf w}\cdot{\bf x_i}+b)}\right) = \ldots$</font>
<span style="color:#181818;">
- <font size=+2>When $y_i=1$: get $-\log(h({\bf x_i}))$</font>
- <font size=+2>When $y_i=-1$: get $-\log(1 - h({\bf x_i}))$</font>
</span>
<br />
<br />
<br />
<br />
----
<h3>The (negative) log-loss function</h3>
- <font size=+2>A loss function that achieves (i), (ii), and (iii):</font>
> $\frac{1}{m}\sum_{i=1}^m\log\left(1 + e^{-y_i({\bf w}\cdot{\bf x_i}+b)}\right).$
- <font size=+2>On each instance, $\log\left(1 + e^{-y_i({\bf w}\cdot{\bf x_i}+b)}\right) = \ldots$</font>
- <font size=+2>When $y_i=1$, get: $-\log(h({\bf x_i}))$</font>
- <font size=+2>When $y_i=-1$, get: $-\log(1 - h({\bf x_i}))$</font>
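- <font size=+2>A rough NumPy sketch of computing this loss (the function name `log_loss` and the array shapes are illustrative assumptions):</font>

```python
import numpy as np

def log_loss(S, y, w, b):
    """Average of log(1 + exp(-y_i (w·x_i + b))) over the m instances.

    S : (m, n) array of instances x_i (rows), y : (m,) labels in {1, -1}.
    """
    z = S @ w + b                                # vector of w·x_i + b
    return np.mean(np.logaddexp(0.0, -y * z))    # logaddexp(0, t) = log(1 + e^t)
```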
<br />
<br />
<br />
<br />
---
<h3>The (negative) log-loss function</h3>
- <font size=+2>Since $-\log(t)\to 0$ as $t\to 1^-$ and $-\log(t)\to \infty$ as $t\to 0^+$,</font>
- <font size=+2>if model $h$ is very certain and correct (so, $h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=1$, or $1-h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=0$, where $\overset{\circ}{y_i}\in\{0,1\}$ denotes the 0/1 version of the label $y_i$), little contribution to loss;</font>
<span style="color:#181818;">
- <font size=+2>if model $h$ is very certain, but wrong ($h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=0$, for example), big contribution to loss: $-\log(1 - h({\bf x_i}))$</font>
</span>
----
<h3>The (negative) log-loss function</h3>
- <font size=+2>Since $-\log(t)\to 0$ as $t\to 1^-$ and $-\log(t)\to \infty$ as $t\to 0^+$,</font>
- <font size=+2>if model $h$ is very certain and correct (so, $h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=1$, or $1-h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=0$, where $\overset{\circ}{y_i}\in\{0,1\}$ denotes the 0/1 version of the label $y_i$), little contribution to loss;</font>
- <font size=+2>if model $h$ is very certain, but wrong ($h({\bf x_i})\approx 1$ when $\overset{\circ}{y_i}=0$, for example), big contribution to loss: $-\log(1 - h({\bf x_i}))$</font>
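- <font size=+2>A quick numeric check, taking $h({\bf x_i}) = 0.99$ as an example value:</font>
> $-\log(0.99)\approx 0.01\ \text{(certain and correct)},\qquad -\log(1-0.99)\approx 4.6\ \text{(certain but wrong)}.$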
---
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Need to find $\partial/\partial w_j$ of log-loss, $j=1,\ldots,n$, and partial w.r.t. $b$.</font>
- <font size=+2>First, what is</font>
$\frac{\partial}{\partial w_j}({\bf w}\cdot{\bf x_i}+b)$?
- <font size=+2>And, what about</font>
$\frac{\partial}{\partial b}({\bf w}\cdot{\bf x_i}+b)$?
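- <font size=+2>For reference, these partials work out to the values used on the next slides:</font>
> $\frac{\partial}{\partial w_j}({\bf w}\cdot{\bf x_i}+b) = x_{i,j},\qquad \frac{\partial}{\partial b}({\bf w}\cdot{\bf x_i}+b) = 1.$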
---
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>So, write $z = {\bf w}\cdot{\bf x_i}+b$. We have the partials of $z$.</font>
- <font size=+2>Now, what is</font>
$\frac{\partial}{\partial z}\log(1 + e^{-z})$?
<br />
<br />
<br />
<br />
<br />
<br />
---
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Bring it all together: say $z = {\bf w}\cdot{\bf x_i}+b$, and ${y_i}=1$.</font>
<span style="color:#181818;">
$\frac{\partial}{\partial w_j}\log(1 + e^{-z}) = \frac{\partial}{\partial z}\log(1 + e^{-z})\frac{\partial}{\partial w_j}(z)$
$=\frac{-e^{-z}}{1+e^{-z}}(x_{i,j}) = \left(\frac{1}{1+e^{-z}} - 1\right)x_{i,j}$
$= x_{i,j}(h({\bf x_i}) - 1) = x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$.
- <font size=+2>What happens when ${y_i}=-1$ (that is, $\overset{\circ}{y_i}=0$)?</font>
Hint: when you encounter $\frac{e^z}{1+e^z}$, multiply by $\frac{e^{-z}}{e^{-z}}$.
</span>
----
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Bring it all together: say $z = {\bf w}\cdot{\bf x_i}+b$, and ${y_i}=1$.</font>
$\frac{\partial}{\partial w_j}\log(1 + e^{-z}) = \frac{\partial}{\partial z}\log(1 + e^{-z})\frac{\partial}{\partial w_j}(z)$
<span style="color:#181818;">
$=\frac{-e^{-z}}{1+e^{-z}}(x_{i,j}) = \left(\frac{1}{1+e^{-z}} - 1\right)x_{i,j}$
$= x_{i,j}(h({\bf x_i}) - 1) = x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$.
- <font size=+2>What happens when ${y_i}=-1$ (that is, $\overset{\circ}{y_i}=0$)?</font>
Hint: when you encounter $\frac{e^z}{1+e^z}$, multiply by $\frac{e^{-z}}{e^{-z}}$.
</span>
----
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Bring it all together: say $z = {\bf w}\cdot{\bf x_i}+b$, and ${y_i}=1$.</font>
$\frac{\partial}{\partial w_j}\log(1 + e^{-z}) = \frac{\partial}{\partial z}\log(1 + e^{-z})\frac{\partial}{\partial w_j}(z)$
$=\frac{-e^{-z}}{1+e^{-z}}(x_{i,j}) = \left(\frac{1}{1+e^{-z}} - 1\right)x_{i,j}$
$= x_{i,j}(h({\bf x_i}) - 1) = x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$.
<span style="color:#181818;">
- <font size=+2>What happens when ${y_i}=-1$ (that is, $\overset{\circ}{y_i}=0$)?</font>
Hint: when you encounter $\frac{e^z}{1+e^z}$, multiply by $\frac{e^{-z}}{e^{-z}}$.
</span>
----
<h3>Partial derivatives of log-loss</h3>
- <font size=+2>Bring it all together: say $z = {\bf w}\cdot{\bf x_i}+b$, and ${y_i}=1$.</font>
$\frac{\partial}{\partial w_j}\log(1 + e^{-z}) = \frac{\partial}{\partial z}\log(1 + e^{-z})\frac{\partial}{\partial w_j}(z)$
$=\frac{-e^{-z}}{1+e^{-z}}(x_{i,j}) = \left(\frac{1}{1+e^{-z}} - 1\right)x_{i,j}$
$= x_{i,j}(h({\bf x_i}) - 1) = x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$.
- <font size=+2>What happens when ${y_i}=-1$ (that is, $\overset{\circ}{y_i}=0$)?</font>
Hint: when you encounter $\frac{e^z}{1+e^z}$, multiply by $\frac{e^{-z}}{e^{-z}}$.
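- <font size=+2>Putting the pieces together in code: a rough NumPy sketch of one gradient descent step built from the per-instance partials $x_{i,j}(h({\bf x_i}) - \overset{\circ}{y_i})$ above (the function name `gradient_step` and the learning rate `lr` are illustrative):</font>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(S, y01, w, b, lr=0.1):
    """One gradient descent step on the average log-loss.

    S   : (m, n) array of instances x_i (rows)
    y01 : (m,) labels written in {0, 1} (the y-with-circle labels above)
    """
    m = S.shape[0]
    preds = sigmoid(S @ w + b)       # h(x_i) for every instance
    residual = preds - y01           # h(x_i) - 0/1 label
    grad_w = S.T @ residual / m      # entries (1/m) * sum_i x_{i,j} * residual_i
    grad_b = residual.mean()         # (1/m) * sum_i residual_i
    return w - lr * grad_w, b - lr * grad_b
```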
---
<h3>Discussion</h3>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
{"metaMigratedAt":"2023-06-15T20:25:35.822Z","metaMigratedFrom":"YAML","title":"Gradient descent & Logistic regression","breaks":true,"description":"View the slide with \"Slide Mode\".","contributors":"[{\"id\":\"da8891d8-b47c-4b6d-adeb-858379287e60\",\"add\":11676,\"del\":4887}]"}