# Creating Decision Trees
<!-- Put the link to this slide here so people can follow -->
slide: https://hackmd.io/@ccornwell/decision-trees
---
<h3>Decision stumps</h3>
- <font size=+2>A decision stump is a particular half-space model, with ${\bf w} = \pm{\bf e_j}$ for some $j$ and the usual bias $b$ (as in half-space models).</font>
- <font size=+2 style="color:#181818;">Say $\upsilon=\pm1$ and ${\bf w} = \upsilon{\bf e_j}$. The decision stump, given ${\bf x_i}$, has output $$\operatorname{sign}(\upsilon x_{i,j} + b).$$</font>
- <font size=+2 style="color:#181818;">Outputs $1$ if $\upsilon x_{i,j} \ge -b$; outputs $-1$ if $\upsilon x_{i,j} < -b$ (deciding that $\operatorname{sign}(0)=1$ ). 3 parameters determine decision stump: $b, \upsilon, j$.</font>
- <font size=+2 style="color:#181818;">On their own, stumps are not good classifiers. They are basic unit in a decision tree, as single neurons were in a neural network.</font>
----
<h3>Decision stumps</h3>
- <font size=+2>A decision stump is a particular half-space model, with ${\bf w} = \pm{\bf e_j}$ for some $j$ and the usual bias $b$ (as in half-space models).</font>
- <font size=+2>Say $\upsilon=\pm1$ and ${\bf w} = \upsilon{\bf e_j}$. The decision stump, given ${\bf x_i}$, has output $$\operatorname{sign}(\upsilon x_{i,j} + b).$$</font>
- <font size=+2>It outputs $1$ if $\upsilon x_{i,j} \ge -b$ and $-1$ if $\upsilon x_{i,j} < -b$ (with the convention $\operatorname{sign}(0)=1$). Three parameters determine a decision stump: $b$, $\upsilon$, and $j$.</font>
- <font size=+2 style="color:#181818;">On their own, stumps are not good classifiers. They are the basic unit of a decision tree, much as single neurons are in a neural network.</font>
----
<h3>Decision stumps</h3>
- <font size=+2>A decision stump is a particular half-space model, with ${\bf w} = \pm{\bf e_j}$ for some $j$ and the usual bias $b$ (as in half-space models).</font>
- <font size=+2>Say $\upsilon=\pm1$ and ${\bf w} = \upsilon{\bf e_j}$. The decision stump, given ${\bf x_i}$, has output $$\operatorname{sign}(\upsilon x_{i,j} + b).$$</font>
- <font size=+2>It outputs $1$ if $\upsilon x_{i,j} \ge -b$ and $-1$ if $\upsilon x_{i,j} < -b$ (with the convention $\operatorname{sign}(0)=1$). Three parameters determine a decision stump: $b$, $\upsilon$, and $j$ (see the sketch below).</font>
- <font size=+2>On their own, stumps are not good classifiers. They are the basic unit of a decision tree, much as single neurons are in a neural network.</font>
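
A minimal sketch of the stump's output rule, assuming a NumPy feature vector `x`; the function name `stump_predict` is illustrative, not from the slides.

```python
import numpy as np

def stump_predict(x, j, v, b):
    """Output of the decision stump with parameters (b, v, j) on input x."""
    # sign(v * x_j + b), with the convention sign(0) = 1
    return 1 if v * x[j] + b >= 0 else -1

# Example: in R^2, split on coordinate j = 0 at the threshold x_0 = 2
x = np.array([3.0, -1.0])
print(stump_predict(x, j=0, v=1, b=-2))  # prints 1, since x_0 = 3 >= 2
```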
---
<h3>Finding best "split"</h3>
<h4>(Given training data)</h4>
- <font size=+2>One approach ("empirical risk minimization"): find the $b$, $\upsilon$, $j$ that minimize the number of errors on the training data.</font>

----
<h3>Finding best "split"</h3>
<h4>(Given training data)</h4>

- <font size=+2>Fix $\upsilon$ and $j$. Each candidate hyperplane: the set of points with $j^{th}$ coordinate equal to $-b$.</font>
- <font size=+2>As $b$ decreases past a value where the hyperplane crosses a data point, the number of errors changes by exactly $1$.</font>
- <font size=+2>Order the points by their $j^{th}$ coordinate. Then the number of errors can be computed quickly for each stump with $-b$ equal to the average of two consecutive values of $x_{i,j}$ (see the sketch below).</font>
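
A sketch of this search, assuming labels in $\{-1,+1\}$ and an $n \times d$ array `X`; the name `best_stump` and the sentinel thresholds beyond the extreme points are illustrative choices. For simplicity it recomputes the error count at each threshold rather than updating it by $\pm 1$ along the sorted points, as the last bullet suggests one could.

```python
import numpy as np

def best_stump(X, y):
    """ERM search: return the (j, v, b) with the fewest training errors."""
    n, d = X.shape
    best_err, best_params = n + 1, None
    for j in range(d):
        xs = np.sort(X[:, j])
        # Candidate thresholds: midpoints of consecutive sorted coordinates,
        # plus one threshold below all points and one above all points.
        mids = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
        for v in (1, -1):
            for m in mids:
                b = -v * m  # places the split at x_j = m for either value of v
                pred = np.where(v * X[:, j] + b >= 0, 1, -1)  # sign(0) = 1
                err = int(np.sum(pred != y))
                if err < best_err:
                    best_err, best_params = err, (j, v, b)
    return best_params, best_err
```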
---
<h3>Decision trees</h3>
- <font size=+2>To build a decision tree from training data: first find the best split.</font>
- <font size=+2>Restrict to *only the points on one side* and find the best split for those; do the same on the other side.</font>

- <font size=+2 style="color:#181818;">Continue this way, stop when the nested splits place points into subsets with only one label.</font>
----
<h3>Decision trees</h3>
- <font size=+2>To build a decision tree from training data: first find the best split.</font>
- <font size=+2>Restrict to *only the points on one side* and find the best split for those; do the same on the other side.</font>

- <font size=+2>Continue this way, stopping when the nested splits place the points into subsets that each contain only one label (see the sketch below).</font>
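
A recursive sketch of this procedure, assuming a `best_stump(X, y)` routine like the one sketched earlier; the dictionary node representation and the majority-label fallback (for subsets that no stump can split further) are illustrative choices, not from the slides.

```python
import numpy as np

def grow_tree(X, y):
    """Recursively split the data until every subset has a single label."""
    if np.all(y == y[0]):
        return {"leaf": True, "label": int(y[0])}   # pure subset: stop
    (j, v, b), _ = best_stump(X, y)
    side = v * X[:, j] + b >= 0                     # positive side of the split
    if side.all() or (~side).all():                 # split makes no progress:
        maj = 1 if np.sum(y == 1) >= np.sum(y == -1) else -1
        return {"leaf": True, "label": maj}         # fall back to majority label
    return {"leaf": False, "j": j, "v": v, "b": b,
            "pos": grow_tree(X[side], y[side]),     # recurse on each side
            "neg": grow_tree(X[~side], y[~side])}

def tree_predict(node, x):
    """Follow splits from the root down to a leaf."""
    while not node["leaf"]:
        node = node["pos"] if node["v"] * x[node["j"]] + node["b"] >= 0 else node["neg"]
    return node["label"]
```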
---
<h3>Finding best "split" (alternate method)</h3>
- <font size=+2>The earlier method for choosing the split can perform poorly on unseen data when there is a significant imbalance between the two classes.</font>
- <font size=+2>Another method: maximize the *Information Gain* ($IG$), a number computed as follows.</font>
- <font size=+2 style="color:#181818;">"Entropy": $e(r) = -r\log(r)-(1-r)\log(1-r)$, for $0<r<1$.</font>
- <font size=+2 style="color:#181818;">Define: $r$ = the proportion of all points with label $+1$; $r_+$ = the proportion of points on the positive side of the split with label $+1$; $r_-$ defined in the same way for the negative side.</font>
- <font size=+2 style="color:#181818;">If $p_+$ is the proportion of all points on the positive side, and $p_-$ similarly, then $IG = e(r) - (p_+e(r_+)+p_-e(r_-))$.</font>
----
<h3>Finding best "split" (alternate method)</h3>
- <font size=+2>The earlier method for choosing the split can perform poorly on unseen data when there is a significant imbalance between the two classes.</font>
- <font size=+2>Another method: maximize the *Information Gain* ($IG$), a number computed as follows.</font>
- <font size=+2>"Entropy": $e(r) = -r\log(r)-(1-r)\log(1-r)$, for $0<r<1$.</font>
- <font size=+2 style="color:#181818;">Define: $r$ = the proportion of all points with label $+1$; $r_+$ = the proportion of points on the positive side of the split with label $+1$; $r_-$ defined in the same way for the negative side.</font>
- <font size=+2 style="color:#181818;">If $p_+$ is the proportion of all points on the positive side, and $p_-$ similarly, then $IG = e(r) - (p_+e(r_+)+p_-e(r_-))$.</font>
----
<h3>Finding best "split" (alternate method)</h3>
- <font size=+2>The earlier method for choosing the split can perform poorly on unseen data when there is a significant imbalance between the two classes.</font>
- <font size=+2>Another method: maximize the *Information Gain* ($IG$), a number computed as follows.</font>
- <font size=+2>"Entropy": $e(r) = -r\log(r)-(1-r)\log(1-r)$, for $0<r<1$.</font>
- <font size=+2>Define: $r$ = the proportion of all points with label $+1$; $r_+$ = the proportion of points on the positive side of the split with label $+1$; $r_-$ defined in the same way for the negative side.</font>
- <font size=+2>If $p_+$ is the proportion of all points on the positive side, and $p_-$ similarly, then $IG = e(r) - (p_+e(r_+)+p_-e(r_-))$ (see the sketch below).</font>
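
A sketch of the $IG$ computation for one candidate split, following the definitions above and using the standard convention $e(0)=e(1)=0$ for pure subsets; the function names are illustrative.

```python
import numpy as np

def entropy(r):
    """e(r) = -r log(r) - (1-r) log(1-r), with e(0) = e(1) = 0."""
    if r <= 0.0 or r >= 1.0:
        return 0.0
    return -r * np.log(r) - (1 - r) * np.log(1 - r)

def information_gain(X, y, j, v, b):
    """IG of the split sign(v*x_j + b) on data X with labels y in {-1, +1}."""
    side = v * X[:, j] + b >= 0                  # points on the positive side
    r = np.mean(y == 1)                          # proportion of +1 labels overall
    p_pos, p_neg = np.mean(side), np.mean(~side)
    r_pos = np.mean(y[side] == 1) if side.any() else 0.0
    r_neg = np.mean(y[~side] == 1) if (~side).any() else 0.0
    return entropy(r) - (p_pos * entropy(r_pos) + p_neg * entropy(r_neg))
```

Choosing the $(j, \upsilon, b)$ that maximizes this quantity, in place of minimizing the raw error count, gives the alternate splitting rule.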
{"metaMigratedAt":"2023-06-15T22:26:44.937Z","metaMigratedFrom":"YAML","title":"Decision trees","breaks":true,"description":"View the slide with \"Slide Mode\".","contributors":"[{\"id\":\"da8891d8-b47c-4b6d-adeb-858379287e60\",\"add\":6972,\"del\":643}]"}