Try   HackMD

HL Topic 5 Binary trees

IB asks the students in the exam to traverse in different ways data trees and be able to discuss its propierties. The students don't have to implement binary trees in pseudocode.

https://turnoff.us/geek/binary-tree/

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Trees as data structure (in general)

We have already covered linked list. A couple of concepts that are important to the concept of linked list is that we don't have just the naked data anymore, but we have nodes, elements that have data (the payload) and one (or more) links.

In a single linked list would look like this:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

credit Y.C.

The concept of tree (or data tree) comes when we ask ourselves the question, can we make this nodes connect in a different way? Can we connect one node to 2, 3, 5 different nodes? The answer was yes and the results are data trees.

Here we have a data tree:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

(credit Y.C.)

Usually data trees, unlike linked list are represented with the entry point (the root node) on top instead that on the left. In some places the also put the entry point (the root node) on the bottom.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

data tree graph in Grasshopper

Data tree lexicon

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

source

  • Root node is the entry node of a tree (or a subtree)
  • The parent of a node B is the node that links to it (A in the image). All the nodes except the root have a parent. In trees a node cannot have more than one parent.
  • A child node of a node B is a node that the node B links to. A node can have any number of childs, but up to 2 in binary trees.
  • A leaf node is a node that doesn't link to any other node of the tree.
  • Height of the tree. The number of links from the root note to the farther leaf.
  • Levels of the tree are the distance (in links) from the root.
  • Subtree. Any node with their descendants.
  • Visit (a node). Arrive to the node for doing some operation with their data (that includes just read it). When passing through a node to get to other note is not considered "visiting"
  • Traversing. Visit all the nodes

Types of trees

The basics are kind of specifics versions of each other

  • A data tree is when we have nodes with any number of children.

  • Binary tree is the subset of data trees where we specify that any node can have only up to 2 childs.(we call them left and right)

  • Binary search tree is the subset of binary data trees with a condition that any left child has a value that is less than its parent and the right child is more than the parent.

Since binary search trees have the assumption of nodes having to be bigger or smaller, they cannot hold the same values. (data trees or binary trees don't have this restriction)

Here you have a Venn diagram to clarify

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

And a UML diagram using inheritance for people doing OOPç

Traversing trees:

When we have a binary tree (such as this binary search tree)

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

If we want to tell all the elements that we have in the tree we can have different orders. For example (not meaningful orders) we can go

34, 32, 39, 20, 33, 35 , 41, 18, 22, 7 ,19, 21, 24

Or

7, 19, 21, 24, 35, 41, 18, 22, 39, 20, 33, 32, 39, 34

To make a simplification we have a subtree of this tree.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

We need to understand

  • preorder traversal
  • inorder traversal
  • postorder traversal

Preorder traversal

We visit the root node
Then we visit the left node (or left subtree)
Finally we visit the right node (or right subtree)

So the subtree inorder would be
39, 35, 41

The whole tree would be:
34, 32, 20, 18, 7, 19, 22, 21, 24, 33, 39, 35, 41

One way to do it is creating this cloud around the binary tree

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

This shape should be as fit as you can. And from the start on the left.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Inorder traversal

Then we visit the left node (or left subtree)
We visit the root node
Finally we visit the right node (or right subtree)

Postorder traversal

Then we visit the left node (or left subtree)
Finally we visit the right node (or right subtree)
We visit the root node

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Exercise of traversing

Traverse this tree in preorder, inorder, postorder

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Exercise of traversing (2)

Traverse this tree in preorder, inorder, postorder

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

credit Y.C.

Solution in the spoiler

Preorder: Potato, Lettuce, Banana, Cucumber, Carrot, Tomato, Lemon, Apple, Orange, Onion
Inorder: Banana, Lettuce, Cucumber, Potato, Lemon, Tomato, Carrot, Orange, Apple, Onion.
Postorder: Banana, Cucumber, Lettuce, Lemon, Tomato, Orange, Onion, Apple, Carrot, Potato

Memonic rule for traversing trees.

Look at the three ways of ordering.

Preorder: Root, left, right
Inorder: Left, root, right
Postorder: Left, right, root

  • Since this is a western centered notation, left goes before right always.
  • If you see closely the root is prior to left and right in pre-order, is in the middle in inorder and is after left and right in post order.

This can be done in simple draft in the exam to remember (it's not in the pseudocode approbed notation)

Uses of Binary trees and binary search trees

Reference:

https://stackoverflow.com/questions/2130416/what-are-the-applications-of-binary-trees

  • Binary Search Tree - Used in many search applications where data is constantly entering/leaving, such as the map and set objects in many languages' libraries.
  • Binary Space Partition - Used in almost every 3D video game to determine what objects need to be rendered.
  • Binary Tries - Used in almost every high-bandwidth router for storing router-tables.
  • Hash Trees - Used in torrents and specialized image-signatures in which a hash needs to be verified, but the whole file is not available. Also used in blockchains for eg. Bitcoin.
  • Heaps - Used in implementing efficient priority-queues, which in turn are used for scheduling processes in many operating systems, Quality-of-Service in routers, and A* (path-finding algorithm used in AI applications, including robotics and video games). Also used in heap-sort.
  • Huffman Coding Tree (Chip Uni) - Used in compression algorithms, such as those used by the .jpeg and .mp3 file-formats.
  • GGM Trees - Used in cryptographic applications to generate a tree of pseudo-random numbers.
  • Syntax Tree - Constructed by compilers and (implicitly) calculators to parse expressions.
  • Treap - Randomized data structure used in wireless networking and memory allocation.
  • T-tree - Though most databases use some form of B-tree to store data on the drive, databases which keep all (most) their data in memory often use T-trees to do so.

The reason that binary trees are used more often than n-ary trees for searching is that n-ary trees are more complex, but usually provide no real speed advantage.

In a (balanced) binary tree with m nodes, moving from one level to the next requires one comparison, and there are

log2(m) levels, for a total of
log2(m)
comparisons.

In contrast, an n-ary tree will require

log2(n) comparisons (using a binary search) to move to the next level. Since there are
logn(m)
total levels, the search will require
log2(n)logn(m)=log2(m)
comparisons total. So, though n-ary trees are more complex, they provide no advantage in terms of total comparisons necessary.

(However, n-ary trees are still useful in niche-situations. The examples that come immediately to mind are quad-trees and other space-partitioning trees, where divisioning space using only two nodes per level would make the logic unnecessarily complex; and B-trees used in many databases, where the limiting factor is not how many comparisons are done at each level but how many nodes can be loaded from the hard-drive at once)

Other reference (not quoted here)

https://www.studysmarter.co.uk/explanations/computer-science/data-structures/binary-tree/

https://www.youtube.com/watch?v=oSWTXtMglKE

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Extra: B-trees

https://www.youtube.com/watch?v=K1a2Bk8NrYQ&ab_channel=SpanningTree

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Extra: heaps

https://www.youtube.com/watch?v=t0Cq6tVNRBA

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

In this video deeps dive on the topic and goes with also high priority queues at the end

https://www.youtube.com/watch?v=HqPJF2L5h9U

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

I like this example of why using heaps from Stack overflow

https://stackoverflow.com/questions/749199/when-would-i-want-to-use-a-heap


Heaps are structures meant to allow quick access to the min or the max.

But why would you want that? You could just check every entry on add to see if it's the smallest or the biggest. This way you always have the smallest or the biggest in constant time O(1).

The answer is because heaps allow you to pull the smallest or the biggest and quickly know the NEXT smallest or biggest. That's why it's called a Priority Queue.
Real world example (not very fair world, though):

Suppose you have a hospital in which patients are attended based on their ages. The oldest are always attended first, no matter when he/she got in the queue.

You can't just keep track of the oldest one because if you pull he/she out, you don't know the next oldest one. In order to solve this hospital problem, you implement a max heap. This heap is, by definition, partially ordered. This means you cannot sort the patients by their age, but you know that the oldest ones are always in the top, so you can pull a patient out in constant time O(1) and re-balance the heap in log time O(log N).
More sophisticated example:

Suppose you have a sequence of integers and you want to keep track of the median. The median is the number that is in the middle of an ordered array.

Example:

[1, 2, 5, 7, 23, 27, 31]

In the above case, 7 is the median because the array containing the smaller numbers [1, 2, 5] is of the same size of the one containing the bigger numbers [23, 27, 31]. Normally, if the array has an even number of elements, the median is the arithmetic average of the 2 elements in the middle, e.g (5 + 7)/2.

Now, how do you keep track of the median? By having 2 heaps, one min heap containing the numbers smaller than the current median and a max heap containing the numbers bigger than the current median. Now, if these heaps are always balanced, the 2 heaps will contain the same number of elements or one will have 1 element more than the other, the most.

When you add a new element to the sequence, if the number is smaller than the current median, you add it to the min heap, otherwise, you add it to the max heap. Now, if the heaps are unbalanced (one heap has more than 1 element more than the other), you pull an element from the biggest heap and add to the smallest. Now they're balanced.

Revers polish notation

Reverse Polish notation is something that happened in some (actually, one (1) exam) exams and you can see in the book. It's kind of abstract and when I talk to the students about it I see some very different faces trying to undertand it's use. Here you have some reference:

https://mathworld.wolfram.com/ReversePolishNotation.html

Surnames and Abstract Data Structures (ADT)

Something I was thinking it's interesting is thinking about names and surnames as payloads and pointers of ABT.

Let's see different ways of having surnames in different parts of the world

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

This part is a stub that I need to consult again with people of the different backgrounds of the different surnames to double check it

Spanish case

I'm going to start with the one that I'm familiar with. The Spanish (also seen in Latin America) one. If your name is "Laura Fernández González" you can consider that your surnames are pointers to other nodes.

Let's draw this as a node:

imagen

Fernández refers to the surname of Laura's father and González to the surname of Laura's mother (this can be changed and there are some nuances but this is the default)

So let's represent this:

I'm going to start all these diagrams as being the last (the youngest) person on the family the root and from there I'm going to go up as in regular family trees. Remember that the notation in CS of DT is with the root on top

imagen

This can go on

imagen

Also it's interesting because, in Spain even if the registered surnames are only two, you can tell the rest of the surnames following the procedure of writing the next surnames of your fathers. Following the example of Laura these would be their surnames:

imagen

(Of course, for this purpose, I'm going to omit contractions like Garcia-Quintero or the Catalan version Puig i Vidal)

I like to think of these as pointers to the grandparents.

imagen

You can go on and write all your eight (if you know your grandparents 2 surnames) or as much as you can find.

This is why there is a comedy called "ocho apellidos vascos" (8 basque surnames) and a sequel called "ocho apellidos catalanes" (8 catalan surnames)

imagen

If we think of this as traversing a data tree we can consider that this system kind of omits the middle nodes and focus on the leaves (the farther parent you know) and it's going to omit the middle parts.

Anglo saxon style

In the case of the anglo-saxon system (where ladies lose their maiden surname) the same Surname refers to both parents. The mother may (still) have a maiden name that it's her link to their parents.

imagen

This for the purposes of Abstract Data Structures is a mess since we have a pointer pointing to more than one thing place to the same place! (we can consither that it's pointing to the family)

If we continue doing the family tree we would get something like this:

imagen

Arab system

//to-do

Russian system

//to-do