# Finding a hyperplane (when linearly separable)
slide: https://hackmd.io/@ccornwell/Perceptron
---
**The Perceptron Algorithm**
- Recall: ${\bf X_i}$ is the vector in $\mathbb R^{n+1}$ whose first $n$ components are those of ${\bf x_i}$ (the data) and whose last component is $1$.
- ${\bf W}$ is the $(n+1)$-vector with ${\bf w}$ as first $n$ components and $b$ as last component.
:bulb: The sign of ${\bf W}\cdot{\bf X_i}$ is the same as that of $y_i$ if and only if $y_i{\bf W}\cdot{\bf X_i} > 0$.
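
As a concrete sketch of this setup (the names `augment` and `margins` are mine, not from the slides), the augmented vectors and the sign test can be written as:
```python
import numpy as np

def augment(data):
    """Append a final component 1 to each row: X_i = (x_i, 1)."""
    return np.hstack([data, np.ones((len(data), 1))])

# Point i is classified correctly exactly when y_i * (W . X_i) > 0,
# i.e. when W . X_i and y_i have the same sign.
def margins(W, X, y):
    return np.array(y) * (X @ W)
```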
----
**The Perceptron Algorithm: the algorithm**
- Initialize ${\bf W}$ however you want.
- Go through the vectors ${\bf X_i}$; if you find an "error," meaning $y_i{\bf X_i}\cdot{\bf W} \le 0$ for some $i$, update ${\bf W}$ by adding $y_i{\bf X_i}$ to it:
${\bf W^{(t+1)}} = {\bf W^{(t)}} + y_i{\bf X_i}$
- At some point, there will be no errors (assuming the data is linearly separable); a runnable sketch of the whole loop is below.
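
Here is a minimal sketch of that loop (the function name `perceptron` and the safety cap `max_passes` are mine; `X` is the array of augmented vectors):
```python
import numpy as np

def perceptron(X, y, W0, max_passes=1000):
    """Cycle through the points; on an error, add y_i*X_i to W.
    Stops after a full pass with no errors, which is guaranteed to
    happen when the data is linearly separable."""
    W = np.array(W0, dtype=float)
    for _ in range(max_passes):
        errors = 0
        for X_i, y_i in zip(X, y):
            if y_i * np.dot(W, X_i) <= 0:   # misclassified (or on the line)
                W = W + y_i * X_i           # the update step
                errors += 1
        if errors == 0:
            break
    return W
```
Run on the toy data of the next slides with ${\bf W^{(0)}} = (0, 0.1, 0)$, this makes the same six updates traced below and returns ${\bf W} = (2.7, -1.9, 2)$.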
---
**Visualization**
A "toy" dataset in $\mathbb R^2$:
```python=
import numpy as np

data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```
This data looks like :arrow_down_small:

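One way to produce the scatter plot, continuing from the block above (a sketch assuming matplotlib; the coloring convention is mine):
```python
import matplotlib.pyplot as plt

# Color each point by its label: the +1 class red, the -1 class blue.
colors = ['r' if label > 0 else 'b' for label in y]
plt.scatter(data[:, 0], data[:, 1], c=colors)
plt.show()
```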
----
**A "toy" dataset in $\mathbb R^2$:**
```python=
data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```
- Initialize ${\bf W^{(0)}} = (0,0.1,0)$. This ${\bf W^{(0)}}$ (in red) and its orthogonal line look like

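To reproduce this picture, draw ${\bf w} = (w_1, w_2)$ as an arrow and the orthogonal line $w_1x + w_2y + b = 0$ (a sketch assuming matplotlib and $w_2 \neq 0$; `plot_boundary` is my name):
```python
import numpy as np
import matplotlib.pyplot as plt

def plot_boundary(W, xs=np.linspace(-1.5, 3.5, 50)):
    """Draw w as an arrow from the origin, plus the line w.x + b = 0."""
    w1, w2, b = W
    plt.arrow(0, 0, w1, w2, color='r', head_width=0.05)
    plt.plot(xs, -(w1 * xs + b) / w2, 'g--')  # y solved from w1*x + w2*y + b = 0

plot_boundary(np.array([0, 0.1, 0]))  # for W^(0), the line is the x-axis
plt.show()
```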
----
**A "toy" dataset in $\mathbb R^2$:**
```python=
data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```
- Now: $y_0{\bf X_0}\cdot{\bf W^{(0)}} = -0.3$. So ${\bf W^{(1)}} = {\bf W^{(0)}} + y_0{\bf X_0} = (1,-2.9,-1)$. The new ${\bf W^{(1)}}$ and orthogonal line look like

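These numbers are easy to verify directly (continuing the earlier sketches; `X` is the array of augmented vectors):
```python
X = np.hstack([data, np.ones((len(data), 1))])  # rows are the X_i
W = np.array([0, 0.1, 0])

print(y[0] * np.dot(X[0], W))  # -0.3: an error at i = 0
W = W + y[0] * X[0]            # the update
print(W)                       # [ 1.  -2.9 -1. ]
```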
----
**A "toy" dataset in $\mathbb R^2$:**
```python=
data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```
- From here check $y_i{\bf X_i}\cdot{\bf W^{(1)}}$ with $i=1,2$; both are positive. But, $y_3{\bf X_3}\cdot{\bf W^{(1)}} = -1.17$. Updating: add $y_3{\bf X_3}$ to get ${\bf W^{(2)}} = (1.7,-2.6,0)$. Now things look like:

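In code, the checks for $i = 1, 2, 3$ and the resulting update look like this (continuing the snippet above):
```python
for i in (1, 2, 3):
    margin = y[i] * np.dot(X[i], W)
    print(i, margin)             # 1.13, 4.9, then -1.17 at i = 3
    if margin <= 0:
        W = W + y[i] * X[i]      # update on the error at i = 3
print(W)                         # [ 1.7 -2.6  0. ]
```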
----
**A "toy" dataset in $\mathbb R^2$:**
```python=
data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```
- Proceeding, there are 4 more updates (on reaching the end of the list, restart from the beginning, and stop only after checking all points once through with no errors). The rest plays out as:
```python
# X holds the augmented vectors; W is currently W^(2) = (1.7, -2.6, 0)
np.dot(X[4], W)*y[4] # outputs -1.49; update, get new W
np.dot(X[5], W)*y[5] # outputs -0.02; update, get new W
np.dot(X[6], W)*y[6] # outputs -0.13; update, get new W
# no errors at X[0], X[1], X[2], X[3]
np.dot(X[4], W)*y[4] # outputs -0.42; update, get new W
# At this point no errors: for X[0],...,X[6], every product
# y_i*(X_i . W) with the current W is positive
```
----
**A "toy" dataset in $\mathbb R^2$:**
The final ${\bf W} = (2.7, -1.9, 2)$. The plot looks like:

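A quick check (continuing the earlier snippets) confirms that no errors remain with this ${\bf W}$:
```python
W = np.array([2.7, -1.9, 2])
print(np.array(y) * (X @ W))              # every entry is positive
print(np.all(np.array(y) * (X @ W) > 0))  # True: no errors remain
```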
---
There is a guarantee that you eventually reach a ${\bf W}$ with no errors.

- A sketch of the proof is given in the blog post.
- Roughly: the overlap of ${\bf W^{(t)}}$ with a fixed separating vector grows linearly in the number of updates (at a rate governed by $B$), while whenever an update is needed you can check that $|{\bf W^{(t+1)}}| \le \sqrt{t+1}\,R$. Combining the two with the Cauchy–Schwarz inequality gives $t \le (BR)^2$.
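
Spelling the sketch out a bit, under the normalization usual for this bound (and presumably the one in the blog post: ${\bf W^{(0)}} = {\bf 0}$, $R = \max_i|{\bf X_i}|$, and a separator ${\bf W^*}$ with $|{\bf W^*}| \le B$ and $y_i{\bf W^*}\cdot{\bf X_i} \ge 1$ for all $i$), after $t$ updates:
$$
\begin{aligned}
{\bf W^*}\cdot{\bf W^{(t)}} &\ge t &&\text{each update adds } y_i{\bf W^*}\cdot{\bf X_i} \ge 1,\\
|{\bf W^{(t)}}| &\le \sqrt{t}\,R &&\text{each update adds at most } R^2 \text{ to } |{\bf W}|^2, \text{ since } y_i{\bf W}\cdot{\bf X_i}\le 0,\\
t &\le B\sqrt{t}\,R &&\text{Cauchy–Schwarz on the two lines above, so } t \le (BR)^2.
\end{aligned}
$$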
---
**Discussion**
{"metaMigratedAt":"2023-06-15T19:46:38.960Z","metaMigratedFrom":"YAML","title":"Perceptron algorithm and Half-space model","breaks":true,"description":"View the slide with \"Slide Mode\".","contributors":"[{\"id\":\"da8891d8-b47c-4b6d-adeb-858379287e60\",\"add\":8589,\"del\":4788}]"}