# KD-Tree Discovery
## Introduction
This document is mainly about solving a problem from the Algo2 class on Feb 14th, namely the following question.
*Question: What happens to KD-tree building and querying when all the points have the same $x$-coordinate, given that we do not know in advance whether the points share a coordinate?*
This problem is actually not hard. I will first explain it using the theory from our classes. Then, to make my conclusion more convincing, let's see what the industry does about such problems. The industry has been using the KD-tree technique for over 10 years, and one of its most famous applications is nearest-neighbor searching.
## Theoretical Analysis
### A Notice
In the lecture notes, we can see a point "in" each rectangle. But that point actually lies on the cutting line, so it sits on the edge of its rectangle, and it represents the cutting line. Consider the following example:

And we are searching on the tree shown on the right, not the left one. Thinking about it this way, we can explain the "all points have the same $x$-coordinate" problem.
### Solve Duplicate Problem in BST
Since the KD tree and the BST are both trees, we can first try to get some intuition from the duplicate-key problem in the BST.
There are two classical ways to handle duplicates in a BST (a combined sketch of both follows this list):
- The idea from CLRS (which does not care much about duplicates): let $k_2$ be the key of the current node and $k_1$ the key to insert. If $k_1 < k_2$, put $k_1$ in the left subtree; otherwise ($k_1 \ge k_2$), put it in the right subtree. But this rule introduces an imbalance problem, shown in the following figure:

- Idea from the "Goodrich and Tamassia" (similar to AVL): adding another weighted factor to those values. This tree will still be balanced:

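Here is a minimal BST sketch contrasting the two strategies. The function names and the secondary `tag` key are my own illustration, not notation from either textbook, and the balancing rotations are omitted:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert_clrs(root, key):
    """CLRS-style insert: duplicates (key >= node.key) always go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert_clrs(root.left, key)
    else:
        root.right = insert_clrs(root.right, key)
    return root

def insert_tiebreak(root, key, tag):
    """Tie-breaking insert: the effective key (key, tag) is unique, so the
    usual balancing machinery (AVL rotations, omitted here) keeps working."""
    if root is None:
        return Node((key, tag))
    if (key, tag) < root.key:
        root.left = insert_tiebreak(root.left, key, tag)
    else:
        root.right = insert_tiebreak(root.right, key, tag)
    return root

root = None
for _ in range(5):
    root = insert_clrs(root, 42)   # five equal keys -> a right-leaning chain
```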
### Transfer those ideas from BST to KD Tree
#### The "CLRS" Idea (Not recommended)
Since all the points have the same $x$-coordinate, every node on an $x$ layer has only one child, so the tree becomes unbalanced. The small experiment below confirms this.
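Below is a minimal sketch of a point-by-point 2-d tree insert that sends ties to the right (my own toy implementation, not the lecture's build procedure). With all $x$-coordinates equal, every comparison at an $x$ level yields "go right":

```python
import random

class Node:
    def __init__(self, point):
        self.point, self.left, self.right = point, None, None

def insert(root, point, depth=0):
    """Insert into a 2-d tree; ties go right, as in the "CLRS" rule."""
    if root is None:
        return Node(point)
    axis = depth % 2                        # 0: compare x, 1: compare y
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:                                   # equal keys go right
        root.right = insert(root.right, point, depth + 1)
    return root

def x_nodes_with_left_child(node, depth=0):
    """Count x-level nodes that actually use their left child."""
    if node is None:
        return 0
    hit = 1 if depth % 2 == 0 and node.left is not None else 0
    return (hit + x_nodes_with_left_child(node.left, depth + 1)
                + x_nodes_with_left_child(node.right, depth + 1))

root = None
for y in random.sample(range(100), 20):     # twenty points, all with x = 0
    root = insert(root, (0, y))
print(x_nodes_with_left_child(root))        # always 0: every x-level node
                                            # has only a right child
```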

#### The "Goodrich and Tamassia" Idea(Recommended)
If the points' $x$ coordinates are duplicated, we distinguish those points by the $y$ coordinates. Well, if the $y$ coordinates are the same, they are the same points.
Thus, we have the new "order" of those points for $x$ coordinates:
$$
A < B < C < D < E < F < G < H < I.
$$

This tree is still balanced, and it is no different from a $1$-dimensional KD tree.
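A minimal sketch of this composite comparison, assuming points are $(x, y)$ tuples (the code and names here are my own illustration):

```python
def less_at_x_level(p, q):
    """Order points at an x-splitting level: compare x first, break ties
    with y. Python compares tuples lexicographically, which is exactly
    this rule. With all x equal, it reduces to the 1-d order on y."""
    return (p[0], p[1]) < (q[0], q[1])

points = [(0, y) for y in range(9)]          # A < B < ... < I, all with x = 0
assert all(less_at_x_level(points[i], points[i + 1]) for i in range(8))
```

Building on the median of this total order splits the points in half at every level, so the tree stays balanced.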
## Some Views from the Industry
KD trees are widely used in nearest-neighbor searching (including KD-tree insertion, deletion, and querying), and millions of people on Earth are using this algorithm :)
Here I refer to the two most widely used Python packages in data science (a quick sanity check with both follows this list):
- Sklearn. https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html#sklearn.neighbors.KDTree.
- Scipy. https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html.
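As a quick sanity check (my own experiment, not an example from either package's documentation), both libraries happily build and query a tree in which every point shares the same $x$-coordinate:

```python
import numpy as np
from scipy.spatial import KDTree as ScipyKDTree
from sklearn.neighbors import KDTree as SklearnKDTree

pts = np.array([[0.0, y] for y in range(9)])     # all points have x = 0

d, i = ScipyKDTree(pts).query([0.0, 4.4], k=1)
print(i)                                         # 4, the index of (0, 4)

dist, ind = SklearnKDTree(pts).query([[0.0, 4.4]], k=1)
print(ind)                                       # same nearest neighbor
```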
Paper references:
- Sklearn: https://dl.acm.org/doi/10.1145/361002.361007
- Scipy: https://arxiv.org/abs/cs/9901013
The paper referenced by Sklearn proves that a random KD tree performs similarly to a random BST. Its way of handling equal $x$-coordinates is closer to the "CLRS" idea, and the package does not enforce the balanced-tree condition.
The paper referenced by Scipy is based on a balanced KD tree, which is closer to the version in our lecture notes. Still, it is not exactly the same: instead of splitting at the median, it uses something called the "sliding midpoint" rule to improve performance.
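Here is a minimal sketch of the sliding-midpoint split, written from the paper's description (this is not Scipy's actual source code). Note that with all $x$-coordinates equal, the cell has zero width in $x$, so the longest-side rule never even tries to cut on $x$:

```python
import numpy as np

def sliding_midpoint_split(pts, lo, hi):
    """Choose a split axis and value for the cell [lo, hi] containing pts:
    cut the longest side of the cell at its midpoint; if all points land
    on one side, slide the cut to the nearest point so the split is never
    trivial."""
    axis = int(np.argmax(hi - lo))           # longest side of the cell
    split = (lo[axis] + hi[axis]) / 2.0      # start at the geometric midpoint
    vals = pts[:, axis]
    if vals.min() > split:                   # everything right of the cut?
        split = vals.min()                   # slide right to the nearest point
    elif vals.max() < split:                 # everything left of the cut?
        split = vals.max()                   # slide left to the nearest point
    return axis, split

pts = np.array([[0.0, y] for y in range(9)])
axis, split = sliding_midpoint_split(pts, pts.min(axis=0), pts.max(axis=0))
print(axis, split)                           # axis 1 (y): x has zero extent
```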