Kernels and Decision Trees

###### tags: `one-offs` `kernels` `trees` `circuits` # Kernels and Decision Trees **Overview**: In this note, I discuss some explorations on connections between kernel methods and decision trees. ## Intersections and Nestings of Concepts In the past, as an exercise, I devoted a bit of time to thinking about { how, whether } one could nest kernel methods inside decision trees, and vice versa. The outcome of this exercise was to conclude that there are some relatively simple ways of doing both, which I detail roughly here. For embedding kernels within decision trees, a useful starting point is to note that the usual $\left\{ x:x_{i}<a_{i}\right\}$ -type splits which are used in decision trees can readily be expression as hyperplane cuts, i.e. as the solutions to $\left\langle x-a\cdot\mathrm{e}_{i},\mathrm{e}_{i}\right\rangle <0$. Now, it is well-understood that once you express a classical technique in terms of inner products only, then you are half way to deriving the corresponding kernel method. I have not retained any of the details for how the rest of this story unfolds, but it should be a relatively simple exercise. For the other direction, i.e. embedding decision trees within kernels, it seemed less obvious to me what one can or should do. One natural concept was to use a tree structure to induce a feature map, i.e. once you have a decision tree topology constructed from nested hyperplane cuts, you can use it to build a kernel. More precisely, for $z$ in your space, define \begin{align} \mathrm{path}\left(z\right):=\left\{ \text{nodes on the path from }\mathrm{Root}\leadsto\mathrm{Leaf}\left(z\right)\right\} . \end{align} Defining then \begin{align} K\left(x,y\right):=\mathrm{card}\left(\mathrm{path}\left(x\right)\cap\mathrm{path}\left(y\right)\right), \end{align} one obtains that $K$ is a kernel. To me, this was compelling because the resulting functions would be piecewise constant in the same way as the original tree, which felt intuitively nice. However, it is also not entirely satisfying because you have to supply the tree structure a priori. In some sense, this might be an unavoidable restriction. Anyways, probably there are some other interesting solutions. This is all a "what if you had to" exercise, rather than "we probably should", and so I am not under the illusion that these ideas are practical. Still, this was nice to work through at the time.