# Finding a hyperplane (when linearly separable)

<!-- Put the link to this slide here so people can follow -->
slide: https://hackmd.io/@ccornwell/Perceptron

---

**The Perceptron Algorithm**

- Recall: ${\bf X_i}$ is the vector in $\mathbb R^{n+1}$ whose first $n$ components are those of ${\bf x_i}$ (the data) and whose last component is $1$.
- ${\bf W}$ is the $(n+1)$-vector with ${\bf w}$ as its first $n$ components and $b$ as its last component.

:bulb: The sign of ${\bf W}\cdot{\bf X_i}$ is the same as that of $y_i$ if and only if $y_i{\bf W}\cdot{\bf X_i} > 0$.

----

**The Perceptron Algorithm: the algorithm**

- Initialize ${\bf W}$ however you want.
- Go through the vectors ${\bf X_i}$; if you find an "error" ($y_i{\bf X_i}\cdot{\bf W} \le 0$ for some $i$), update ${\bf W}$ by adding $y_i{\bf X_i}$ to it:
$${\bf W^{(t+1)}} = {\bf W^{(t)}} + y_i{\bf X_i}$$
- At some point there will be no errors (assuming the data is linearly separable).

(A short NumPy sketch of this loop appears near the end of these slides.)

---

**Visualization**

A "toy" dataset in $\mathbb R^2$:

```python=
import numpy as np

data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```

This data looks like :arrow_down_small:

![](https://i.imgur.com/Yszor78.png)

----

**A "toy" dataset in $\mathbb R^2$:**

```python=
data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```

- Initialize ${\bf W^{(0)}} = (0,0.1,0)$.

This ${\bf W^{(0)}}$ (in red) and its orthogonal line look like

![](https://i.imgur.com/tMsAXuG.png)

----

**A "toy" dataset in $\mathbb R^2$:**

```python=
data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```

- Now: $y_0{\bf X_0}\cdot{\bf W^{(0)}} = -0.3$. So ${\bf W^{(1)}} = {\bf W^{(0)}} + y_0{\bf X_0} = (1,-2.9,-1)$.

The new ${\bf W^{(1)}}$ and its orthogonal line look like

![](https://i.imgur.com/QGtDKBg.png)

----

**A "toy" dataset in $\mathbb R^2$:**

```python=
data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```

- From here, check $y_i{\bf X_i}\cdot{\bf W^{(1)}}$ for $i=1,2$; both are positive. But $y_3{\bf X_3}\cdot{\bf W^{(1)}} = -1.17$. Updating: add $y_3{\bf X_3}$ to get ${\bf W^{(2)}} = (1.7,-2.6,0)$.

Now things look like:

![](https://i.imgur.com/cPiGfSr.png)

----

**A "toy" dataset in $\mathbb R^2$:**

```python=
data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = [-1,-1,1,1,1,1,-1]
```

- Proceeding, there are 4 more updates (when you reach the end of the list, restart from the beginning; stop once a full pass through all the points has no errors).

The rest plays out as:

```python
np.dot(X[4], W)*y[4]  # outputs -1.49; update, get new W
np.dot(X[5], W)*y[5]  # outputs -0.02; update, get new W
np.dot(X[6], W)*y[6]  # outputs -0.13; update, get new W
# no errors for X[0], X[1], X[2], X[3]
np.dot(X[4], W)*y[4]  # outputs -0.42; update, get new W
# At this point there are no errors: checking X[0],...,X[6],
# y[i]*np.dot(X[i], W) is positive for every i
```

----

**A "toy" dataset in $\mathbb R^2$:**

The final ${\bf W} = (2.7, -1.9, 2)$. The plot looks like:

![](https://i.imgur.com/yGcLPYt.png)

---

Here is a guarantee that, at some point, there are no errors.

![](https://i.imgur.com/Hd2QIAD.png)

- A sketch of the proof is given in the blog post.
- Roughly: the dot product of ${\bf W^{(t)}}$ with a separating vector increases as updates occur (in a way that depends on $B$). However, if an update was needed at step $t$, you can check that $|{\bf W^{(t+1)}}| \le \sqrt{t}R$. Combining these two facts with the Cauchy-Schwarz inequality gives $t \le (BR)^2$.
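----

**A minimal implementation (sketch)**

Before the discussion, here is a minimal NumPy sketch of the loop described above; it is not taken from the blog post or the slides, and the function name `perceptron` and the `max_passes` safety cap are my own additions. With the toy data and the initialization ${\bf W^{(0)}} = (0,0.1,0)$ used in the walkthrough, it retraces the same updates and stops at ${\bf W} = (2.7,-1.9,2)$.

```python
import numpy as np

def perceptron(X, y, W, max_passes=1000):
    """Sweep through the data, updating W on each error, until a full
    pass has no errors (assumes the labels y are +1/-1)."""
    for _ in range(max_passes):          # safety cap; a clean pass is guaranteed if the data is linearly separable
        made_update = False
        for Xi, yi in zip(X, y):
            if yi * np.dot(Xi, W) <= 0:  # an "error": X_i is on the wrong side of (or on) the hyperplane
                W = W + yi * Xi          # the update W^(t+1) = W^(t) + y_i X_i
                made_update = True
        if not made_update:              # a clean pass: every point is classified correctly
            return W
    return W

data = np.array([(-1,3),(-1,-0.3),(3,-1), (0.7,0.3), (0.5,0.9),(0,0.6), (0,1.7)])
y = np.array([-1,-1,1,1,1,1,-1])

# Form the vectors X_i by appending a last component equal to 1 to each x_i
X = np.hstack([data, np.ones((len(data), 1))])

W = perceptron(X, y, W=np.array([0.0, 0.1, 0.0]))
print(np.round(W, 2))  # [ 2.7 -1.9  2. ] for this data and initialization
```

The stopping rule mirrors the slides: keep cycling through the points and only stop after a pass in which no update was made.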
---

**Discussion**

<br /> <br /> <br /> <br /> <br /> <br /> <br />
{"metaMigratedAt":"2023-06-15T19:46:38.960Z","metaMigratedFrom":"YAML","title":"Perceptron algorithm and Half-space model","breaks":true,"description":"View the slide with \"Slide Mode\".","contributors":"[{\"id\":\"da8891d8-b47c-4b6d-adeb-858379287e60\",\"add\":8589,\"del\":4788}]"}
    205 views