# HL Topic 5 Binary trees
:::info
IB asks the students in the exam to traverse in different ways data trees and be able to discuss its propierties. The students don't have to implement binary trees in pseudocode.
:::
https://turnoff.us/geek/binary-tree/

## Trees as data structure (in general)
We have already covered linked list. A couple of concepts that are important to the concept of linked list is that we don't have just the naked data anymore, but we have *nodes*, elements that have data (the payload) and one (or more) links.
In a single linked list would look like this:

_credit Y.C._
The concept of tree (or data tree) comes when we ask ourselves the question, can we make this nodes connect in a different way? Can we connect one node to 2, 3, 5 different nodes? The answer was **yes** and the results are data trees.
Here we have a data tree:

_(credit Y.C.)_
Usually data trees, unlike linked list are represented with the entry point (_the root node_) on top instead that on the left. In _some_ places the also put the entry point (_the root node_) on the bottom.

_[data tree graph in Grasshopper](https://bimcorner.com/a-beginners-guide-to-data-trees-in-grasshopper/)_
## Data tree lexicon

_[source](https://www.geeksforgeeks.org/introduction-to-tree-data-structure-and-algorithm-tutorials/)_
* Root node is the entry node of a tree (or a subtree)
* The parent of a node B is the node that links to it (A in the image). All the nodes except the root have a parent. In trees a node cannot have more than one parent.
* A child node of a node B is a node that the node B links to. A node can have any number of childs, but up to 2 in binary trees.
* A leaf node is a node that doesn't link to any other node of the tree.
* Height of the tree. The number of links from the root note to the farther leaf.
* Levels of the tree are the distance (in links) from the root.
* Subtree. Any node with their descendants.
* Visit (a node). Arrive to the node for doing some operation with their data (that includes just read it). When passing through a node to get to other note is not considered "visiting"
* Traversing. Visit all the nodes
## Types of trees
The basics are kind of specifics versions of each other
* A **data tree** is when we have nodes with any number of children.
* **Binary tree** is the subset of data trees where we specify that any node can have only up to 2 childs.(we call them left and right)
* **Binary search tree** is the subset of binary data trees with a condition that any left child has a value that is less than its parent and the right child is more than the parent.
:::info
Since binary search trees have the assumption of nodes having to be bigger or smaller, they cannot hold the same values. _(data trees or binary trees don't have this restriction)_
:::
Here you have a Venn diagram to clarify

And a UML diagram using inheritance for people doing OOPç
## Traversing trees:
When we have a binary tree (such as this binary search tree)

If we want to tell all the elements that we have in the tree we can have different orders. For example (not meaningful orders) we can go
34, 32, 39, 20, 33, 35 , 41, 18, 22, 7 ,19, 21, 24
Or
7, 19, 21, 24, 35, 41, 18, 22, 39, 20, 33, 32, 39, 34
To make a simplification we have a subtree of this tree.

We need to understand
* preorder traversal
* inorder traversal
* postorder traversal
### Preorder traversal
We visit the root node
Then we visit the left node (or left subtree)
Finally we visit the right node (or right subtree)
So the subtree inorder would be
39, 35, 41
The whole tree would be:
34, 32, 20, 18, 7, 19, 22, 21, 24, 33, 39, 35, 41
One way to do it is creating this _cloud_ around the binary tree

This shape should be as fit as you can. And from the start on the left.

### Inorder traversal
Then we visit the left node (or left subtree)
We visit the root node
Finally we visit the right node (or right subtree)
### Postorder traversal
Then we visit the left node (or left subtree)
Finally we visit the right node (or right subtree)
We visit the root node


### Exercise of traversing
Traverse this tree in preorder, inorder, postorder

### Exercise of traversing (2)
Traverse this tree in preorder, inorder, postorder

_credit Y.C._
Solution in the spoiler
:::spoiler
Preorder: Potato, Lettuce, Banana, Cucumber, Carrot, Tomato, Lemon, Apple, Orange, Onion
Inorder: Banana, Lettuce, Cucumber, Potato, Lemon, Tomato, Carrot, Orange, Apple, Onion.
Postorder: Banana, Cucumber, Lettuce, Lemon, Tomato, Orange, Onion, Apple, Carrot, Potato
:::
### Memonic rule for traversing trees.
Look at the three ways of ordering.
Preorder: **Root**, left, right
Inorder: Left, **root**, right
Postorder: Left, right, **root**
* Since this is a western centered notation, left goes before right always.
* If you see closely the root is prior to left and right in pre-order, is in the middle in inorder and is after left and right in post order.
This can be done in simple draft in the exam to remember (it's not in the pseudocode approbed notation)
## Uses of Binary trees and binary search trees
Reference:
https://stackoverflow.com/questions/2130416/what-are-the-applications-of-binary-trees
* Binary Search Tree - Used in many search applications where data is constantly entering/leaving, such as the map and set objects in many languages' libraries.
* Binary Space Partition - Used in almost every 3D video game to determine what objects need to be rendered.
* Binary Tries - Used in almost every high-bandwidth router for storing router-tables.
* Hash Trees - Used in torrents and specialized image-signatures in which a hash needs to be verified, but the whole file is not available. Also used in blockchains for eg. Bitcoin.
* Heaps - Used in implementing efficient priority-queues, which in turn are used for scheduling processes in many operating systems, Quality-of-Service in routers, and A* (path-finding algorithm used in AI applications, including robotics and video games). Also used in heap-sort.
* Huffman Coding Tree (Chip Uni) - Used in compression algorithms, such as those used by the .jpeg and .mp3 file-formats.
* GGM Trees - Used in cryptographic applications to generate a tree of pseudo-random numbers.
* Syntax Tree - Constructed by compilers and (implicitly) calculators to parse expressions.
* Treap - Randomized data structure used in wireless networking and memory allocation.
* T-tree - Though most databases use some form of B-tree to store data on the drive, databases which keep all (most) their data in memory often use T-trees to do so.
The reason that binary trees are used more often than n-ary trees for searching is that n-ary trees are more complex, but usually provide no real speed advantage.
In a (balanced) binary tree with m nodes, moving from one level to the next requires one comparison, and there are $log_2(m)$ levels, for a total of $log_2(m)$comparisons.
In contrast, an n-ary tree will require $log_2(n)$ comparisons (using a binary search) to move to the next level. Since there are $log_n(m)$ total levels, the search will require $log_2(n)*log_n(m) = log_2(m)$ comparisons total. So, though n-ary trees are more complex, they provide no advantage in terms of total comparisons necessary.
(However, n-ary trees are still useful in niche-situations. The examples that come immediately to mind are quad-trees and other space-partitioning trees, where divisioning space using only two nodes per level would make the logic unnecessarily complex; and B-trees used in many databases, where the limiting factor is not how many comparisons are done at each level but how many nodes can be loaded from the hard-drive at once)
Other reference (not quoted here)
https://www.studysmarter.co.uk/explanations/computer-science/data-structures/binary-tree/
### Recommended Summary video
https://www.youtube.com/watch?v=oSWTXtMglKE
{%youtube oSWTXtMglKE%}
### Extra: B-trees
https://www.youtube.com/watch?v=K1a2Bk8NrYQ&ab_channel=SpanningTree
{%youtube K1a2Bk8NrYQ%}
### Extra: heaps
https://www.youtube.com/watch?v=t0Cq6tVNRBA
{%youtube t0Cq6tVNRBA%}
In this video deeps dive on the topic and goes with also high priority queues at the end
https://www.youtube.com/watch?v=HqPJF2L5h9U
{%youtube HqPJF2L5h9U%}
I like this example of why using heaps from Stack overflow
https://stackoverflow.com/questions/749199/when-would-i-want-to-use-a-heap
```!
Heaps are structures meant to allow quick access to the min or the max.
But why would you want that? You could just check every entry on add to see if it's the smallest or the biggest. This way you always have the smallest or the biggest in constant time O(1).
The answer is because heaps allow you to pull the smallest or the biggest and quickly know the NEXT smallest or biggest. That's why it's called a Priority Queue.
Real world example (not very fair world, though):
Suppose you have a hospital in which patients are attended based on their ages. The oldest are always attended first, no matter when he/she got in the queue.
You can't just keep track of the oldest one because if you pull he/she out, you don't know the next oldest one. In order to solve this hospital problem, you implement a max heap. This heap is, by definition, partially ordered. This means you cannot sort the patients by their age, but you know that the oldest ones are always in the top, so you can pull a patient out in constant time O(1) and re-balance the heap in log time O(log N).
More sophisticated example:
Suppose you have a sequence of integers and you want to keep track of the median. The median is the number that is in the middle of an ordered array.
Example:
[1, 2, 5, 7, 23, 27, 31]
In the above case, 7 is the median because the array containing the smaller numbers [1, 2, 5] is of the same size of the one containing the bigger numbers [23, 27, 31]. Normally, if the array has an even number of elements, the median is the arithmetic average of the 2 elements in the middle, e.g (5 + 7)/2.
Now, how do you keep track of the median? By having 2 heaps, one min heap containing the numbers smaller than the current median and a max heap containing the numbers bigger than the current median. Now, if these heaps are always balanced, the 2 heaps will contain the same number of elements or one will have 1 element more than the other, the most.
When you add a new element to the sequence, if the number is smaller than the current median, you add it to the min heap, otherwise, you add it to the max heap. Now, if the heaps are unbalanced (one heap has more than 1 element more than the other), you pull an element from the biggest heap and add to the smallest. Now they're balanced.
```
### Revers polish notation
Reverse Polish notation is something that happened in some (actually, one (1) exam) exams and you can see in the book. It's kind of abstract and when I talk to the students about it I see some very different faces trying to undertand it's use. Here you have some reference:
https://mathworld.wolfram.com/ReversePolishNotation.html
### Surnames and Abstract Data Structures (ADT)
Something I was thinking it's interesting is thinking about names and surnames as payloads and pointers of ABT.
Let's see different ways of having surnames in different parts of the world
:::info
:warning:
This part is a stub that I need to consult again with people of the different backgrounds of the different surnames to double check it
:::
#### Spanish case
I'm going to start with the one that I'm familiar with. The Spanish (also seen in Latin America) one. If your name is "Laura Fernández González" you can consider that your surnames are _pointers_ to other nodes.
Let's draw this as a node:

Fernández refers to the surname of Laura's father and González to the surname of Laura's mother (this can be changed and there are some nuances but this is the default)
So let's represent this:
:::success
I'm going to start all these diagrams as being the last (the youngest) person on the family the root and from there I'm going to go up as in regular family trees. Remember that the notation in CS of DT is with the root on top
:::

This can go on

Also it's interesting because, in Spain even if the registered surnames are only two, you can tell the rest of the surnames following the procedure of writing the next surnames of your fathers. Following the example of Laura these would be their surnames:

(Of course, for this purpose, I'm going to omit contractions like Garcia-Quintero or the Catalan version Puig i Vidal)
I like to think of these as pointers to the grandparents.

You can go on and write all your eight (if you know your grandparents 2 surnames) or as much as you can find.
This is why there is a comedy called "ocho apellidos vascos" (8 basque surnames) and a sequel called "ocho apellidos catalanes" (8 catalan surnames)

If we think of this as traversing a data tree we can consider that this system kind of omits the middle nodes and focus on the leaves (the farther parent you know) and it's going to omit the middle parts.
#### Anglo saxon style
In the case of the anglo-saxon system (where ladies lose their maiden surname) the same Surname refers to both parents. The mother may (still) have a maiden name that it's her link to their parents.

This for the purposes of Abstract Data Structures is a mess since we have a pointer pointing to more than one thing place to the same place! (we can consither that it's pointing to the family)
If we continue doing the family tree we would get something like this:

#### Arab system
//to-do
#### Russian system
//to-do