--- tags: fall 2018 cs61b --- # Heap Notes [TOC] ## Properties Heaps are just binary trees, but they have additional properties that make them pretty nice: 1) They are always as **complete** as possible. What this means is, the only 'empty' spaces in the tree should be at the very bottom with the leaf nodes. The tree is also 'filled in' from left to right - so when you add in nodes, the left most empty spot is filled in first. 2) They have a 'heap invariant' either for min or max heap known as **max-heap property** or **min-heap property**. With a min heap, this means every parent node is smaller than its children. With a max heap, this means every parent node is larger than its children. ![](https://i.imgur.com/eOJoPWi.png) ## Underlying Structure This idea of completeness allows us to actually represent a heap with an array because it guarantees that the tree fills up from the left! If we want to represent a heap with n nodes, we can create an array of size n + 1. We skip the 0th element in the array, and put the root value at index 1. Then, we fill traverse the heap by level order (from left to right) and fill in the empty spots of the array. **Max-Heap** [X 8 7 4 3 1 2 0], **Min-Heap** [X 0 1 2 3 7 4 8] If a node was represented by index **x**: * Parent of the node is **x/2** * Left child of the node is **x*2** * Right child of the node is **x*2+1** Technically, we can also start storing elements at the 0th index, but it makes everything a tad bit harder to keep track of. **Max-Heap** [8 7 4 3 1 2 0], **Min-Heap** [0 1 2 3 7 4 8] * Left child of the node is **x*2+1** * Right child of the node is **x*2+2** * Parent of the node (if left child) is **x/2** * Parent of the node (if right child) is **x/2 - 1** ## Heapification We can build a heap from an arbitrary, unsorted array, in a process called **heapification**. There are 4 ways to heapify, but only 2 ways actually work: * **Bottom-Up Heapification (Reverse Level-Order Bubbling Down)** * **Top-Down Heapification (Level-Order Bubbling Up)** The broad intuition for why this is the case is because only these two methods preserve the heap invariant (i.e. that every node is larger/smaller than all of its children). ## Bottom-Up Heapification Let’s specifically look at the example of heapifying a **min heap**. Bottom-Up Heapification means we swap each node downwards with its smaller child if the childern nodes are smaller than its parent *(bubbling down)*. Another way to refer to the process of bubbling down is **sinking**. Notice that because we start at the bottom, and bubble downwards from each level onward, at each successive level we have to re-evaluate all the children below each node. ![](https://i.imgur.com/57Ritta.png) **Why is this a viable method?** If the true minimum is at the bottom-most level of the tree, as it gets swapped upwards towards the top, we know for sure that each parent it swaps positions with is larger than it, which means as the node moves up we preserve the invariant that each children nodes is smaller than its children. **Runtime Explanation** Recall that sinking takes **O(log N)** in worst case because the root node needs to sink **log N** (the height of the tree) times. However, not all nodes do that. In fact, approximately half of the nodes don't even need to sink - the most bottom nodes. They have no children, which means they are a valid heap structure by themselves. *(Steps 1-5 demonstrate in the diagram above demonstrate this!)* Now, the parents of these bottom nodes do need to sink, but once max. The grandparents, twice max. etc. And finally the root, **log N** max (from the height of the tree). For the analysis now on, the most bottom level level will be 0, the level above level 1, etc. There are at most $\frac{N+1}{2^{h+1}}$ nodes at level $h$. The amount of sinking required for each level is $\frac{h(N+1)}{2^{h+1}}$. The total is $\sum^{logN}_{h=0}\frac{h(N+1)}{2^{h+1}}$. Now, let's manipulate the expression: $$ \sum^{log N}_{h=0}\frac{h(N+1)}{2^{h+1}} = \frac{N+1}{2}\sum^{log N}_{h=0}\frac{h}{2^h} < \frac{N+1}{2}\sum^{\infty}_{h=0}\frac{h}{2^h} = \frac{N+1}{2}(2) = N $$ Although the calculation is not shown, the step from the summation to 2 relies on an infinite sum formula. Finally, we can conclude that bottom-up heapification has a runtime of **O(N)**. ## Top-Down Heapification Top-Down Heapification means we start at the top of our tree, and swap each node upwards with its parent if it is smaller than its parent *(bubbling upwards)*. Bubbling nodes upward can also be referred to as **swimming**. Notice that because we start at the top, and bubble upwards from each level, at each successive level we have to re-evaluate all the parents above each node. ![](https://i.imgur.com/98QxeaQ.png) **Why is this a viable method?** The reason is the same as it is for bottom-up heapfication. If the true minimum is at the bottom-most level of the tree, as we swap it upwards towards the top, we know for sure that each parent it swaps positions with is larger than it, which means as the node moves up we preserve the invariant that each children nodes is smaller than its children. **Runtime Explanation** For our implementation of PriorityQueue in Monday's lab, we had to insert new nodes at the bottom and bubble them up to the correct spot. This is the same thing that top-down heapification is doing! Therefore, we can think of top-down heapification of a heap with $N$ nodes as making a new Priority Queue and inserting $N$ nodes. Each insertion will take **O(log N)** and there will be $N$ insertions. Therefore, the runtime should be **O(N log N)**. ## Why the Other Methods Don't Work The other methods are **level order bubbling down** and **reverse level order bubbling up**. Now let's contrast **level order bubbling down** in particular. The reason it doesn’t work/doesn’t preserve the heap invariant is because we start at the top and bubble 'downwards', so nodes that we've already seen/swapped don't get re-evaluated. Try to run through the same thought process with **reverse level order bubbling up** (why it doesn’t work). I would also practice the mechanics of heapifying a tree into both a min heap/max heap! ## High Level Overview of Heap Operations ### Insert a node Insert node with new value at the leftmost open space in the lowest level. Bubble up the node to maintain heap property. ### Remove Max/Min node in a Max/Min Heap Swap the value of max/min node (the root) with the value last node in the heap. Remove the last node. Then, bubble down the max/min node to maintain heap property, if needed. ## Resources/Credits [Heap Visualizer](https://visualgo.net/en/heap?slide=1): visualizes heap operations Diagrams credited to Matthew Soh