# Topic 6 Exercises

## Exercise 1

Use a hashmap that stores key-position pairs (the position of the key's entry in the min-heap) and a min-heap ordered on the value of each key-value pair. Each heap node contains the value, a counter value, and two child nodes (initially null).

* create(map of key-value pairs):
    * First create the hashmap and the heap, each initialized with null values.
    * Iterate through the map; for each key-value pair, insert the value into the heap:
        * If the root is null, insert the value at the root, set the counter value to 0, and set the position to 0.
        * Otherwise insert into the heap:
            * If the new value is less than the root, make it the new root and update the key-position hashmap to reflect the new positions.
            * Otherwise insert the value into a new node and keep track of its position in the heap.
    * Then insert the key and its position into the hashmap.
* set(key, value):
    * Get the position from the hashmap, then find the node in the tree.
    * Set that node's new value and, if needed, restructure the heap so that it remains a min-heap.
    * This in turn may require updating positions in the hashmap to reflect the new layout.
* get(key):
    * Get the position from the hashmap, then find the node in the tree.
    * Return that node's value.
* delete(key):
    * Get the position in the heap from the hashmap, then delete the key-position pair from the hashmap.
    * Find the node in the tree and set it to null, leaving all other nodes in place.
* peek(): return the value of the root node of the heap.
* extract(): return and remove the value of the root node, then repeatedly compare the children of the vacated position and promote the smaller one until heap order is restored.

Analysis of runtimes:

* create: $O(n \log n)$, since we iterate through the map, each hashmap insertion is constant time, and each heap insertion is $O(\log n)$. (Building the heap bottom-up with heapify would reduce this to $O(n)$.)
* set: $O(\log n)$, since hashmap access is constant time and restoring heap order is $O(\log n)$.
* get: $O(\log n)$, since hashmap access is constant time and walking the tree down to the stored position is $O(\log n)$.
* delete: $O(\log n)$, for the same reason as get.
* peek: $O(1)$, since the lowest value is stored at the root.
* extract: $O(\log n)$, since after removing the root the smaller child must be promoted level by level, one comparison per level, down a tree of height $O(\log n)$.
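A minimal Python sketch of this design, assuming an array-backed heap (children of index $i$ at $2i+1$ and $2i+2$) in place of the pointer-based nodes described above; all names are illustrative. The position map is updated inside every swap, and `delete` uses the standard swap-with-last trick rather than nulling the node out:

```python
class KeyedMinHeap:
    """Hashmap of key -> heap position, plus a min-heap of (value, key)."""

    def __init__(self, mapping):
        # create(): insert every pair, one O(log n) sift per insertion.
        self.heap = []    # array-backed min-heap of (value, key) pairs
        self.pos = {}     # key -> index of its entry in self.heap
        for key, value in mapping.items():
            self.set(key, value)

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        # Every move inside the heap is mirrored in the position map.
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        while True:
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < len(self.heap) and self.heap[c][0] < self.heap[smallest][0]:
                    smallest = c
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def set(self, key, value):
        # O(log n): overwrite in place (or append), then restore order.
        if key in self.pos:
            i = self.pos[key]
            self.heap[i] = (value, key)
            self._sift_up(i)
            self._sift_down(i)
        else:
            self.heap.append((value, key))
            self.pos[key] = len(self.heap) - 1
            self._sift_up(len(self.heap) - 1)

    def get(self, key):
        # O(1) here: the stored position indexes the array directly.
        return self.heap[self.pos[key]][0]

    def delete(self, key):
        # O(log n): move the last entry into the hole and re-sift it.
        i = self.pos.pop(key)
        last = self.heap.pop()
        if i < len(self.heap):
            self.heap[i] = last
            self.pos[last[1]] = i
            self._sift_up(i)
            self._sift_down(i)

    def peek(self):
        return self.heap[0][0]    # O(1): the minimum sits at the root

    def extract(self):
        # O(log n): delete() re-sifts the displaced last element.
        value, key = self.heap[0]
        self.delete(key)
        return value
```

With the array layout `get` is actually $O(1)$; the $O(\log n)$ bound in the analysis above comes from walking a pointer-based tree down to the stored position.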
## Exercise 2

(a) If the input file has character frequencies such that after each step of the algorithm the newly created node is again selected as a minimum element, then the tree grows to its maximum height. One such input is frequencies matching the Fibonacci sequence (i.e. 1, 2, 3, 5, 8, 13, ...). Starting with the first two items, the algorithm grabs 1 and 2 as the two minimum elements, creates a new tree with a root value of 3, and reinserts it into the list. On the next pass, the newly inserted node of value 3 and the original input frequency 3 are chosen as the two minimum elements and branched together under a root node of value 6. This process continues similarly for all $n$ items in the alphabet and only grows the Huffman tree through one subtree, giving the maximum height of $n - 1$ (checked empirically in the sketch after part (b)).

(b) Suppose the input file's length is bounded by some polynomial $O(n^d)$ for some constant $d$ and that the smallest frequency of any of the $n$ keys is $f$. Then at the maximum depth of the tree there are $b$ nodes (normally $b = 2$, but allowing an arbitrary $b$ to be our base) with frequency $\geq f$ that share a parent, and therefore that parent node holds frequency $\geq bf$. Because $f$ is the smallest frequency, there must be $b$ nodes at the level of this parent that hold frequency $\geq bf$, and the parent of these nodes holds frequency $\geq b \cdot (bf) = b^2 f$. Every level we go up the tree multiplies the frequency held by the parent by another factor of $b$ until we reach the root, so a tree of height $h$ holds frequency $\geq b^h f$ at its root. If the height of the tree is bounded by $O(\log n)$, the root holds a frequency of $b^{O(\log n)} \cdot f = O(f \cdot n^c)$ for some constant $c$, which is polynomial. If the height of the tree is worse than logarithmic, say linear $O(n)$, then the frequency held at the root is $b^n f = \Omega(b^n)$, which is clearly exponential, and therefore the length of the input (the total frequency collected at the root) is not bounded by a polynomial. This is a contradiction. Therefore, the height of the Huffman tree must be $O(\log n)$.
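Part (a)'s worst case is easy to check empirically. The sketch below is illustrative, running the Huffman merge loop with Python's `heapq` while tracking each subtree's height (the tiebreak counter only keeps tuple comparison well defined when frequencies collide):

```python
import heapq

def huffman_height(freqs):
    """Height of the Huffman tree built over the given frequencies."""
    # Heap entries are (frequency, tiebreak, subtree height).
    heap = [(f, i, 0) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    tiebreak = len(freqs)
    while len(heap) > 1:
        f1, _, h1 = heapq.heappop(heap)   # the two minimum elements
        f2, _, h2 = heapq.heappop(heap)
        tiebreak += 1
        heapq.heappush(heap, (f1 + f2, tiebreak, max(h1, h2) + 1))
    return heap[0][2]

fib = [1, 2]
while len(fib) < 12:
    fib.append(fib[-1] + fib[-2])

# The merged node is always a minimum again, so the tree is a path:
# n Fibonacci frequencies give the maximum possible height of n - 1.
assert huffman_height(fib) == len(fib) - 1
```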
## Exercise 3

(a) To efficiently implement this algorithm, use a max-heap of key-value pairs, where the key is the space left on the disk it refers to and the value is a reference to the disk. Finding the disk with the most space then takes $O(1)$ time, as it is always at the top of the heap, i.e. the root of the tree. The algorithm sends the file to the selected disk. If this disk does not have enough space, a new node referencing a new disk is added at the end of the heap, with a key equal to the amount of space left after the current file is added; this node then swims up the tree until heap order is restored, which takes $O(\log n)$ time. If the top disk did have enough space, its key changes to the new amount of space left and the node sinks down the tree until heap order is restored, also $O(\log n)$ in the worst case. Hence, this implementation loads all $n$ files in $O(n \log n)$ time overall (see the sketch after part (c)).

(b) For this implementation, use a self-balancing binary search tree instead of a heap, but still with the key-value pairs described above, where the available space is the key of the node and the disk it refers to is the value. To find the smallest disk that still has enough space for the file, start at the root and iteratively check node keys against the file size: if the file fits under the current key, record this node as the best fit so far and move to the left child to look for a tighter fit; if the file is larger than the key, move to the right child. When the walk falls off the tree, the last recorded node is the disk to save the file to. Since at worst one node per level is examined, this search runs in $O(\log n)$ time. If no disk has enough space, a new node is created with a key equal to how much space is left after the file is inserted. Whether a new node is created or the file is put onto an existing disk, the affected key changes and the node must be removed and re-inserted, descending from the root to find where the new key now fits. This is a classic BST deletion and insertion and also runs in $O(\log n)$, so handling one file takes $O(\log n)$ time and all $n$ files take $O(n \log n)$ overall (see the sketch after part (c)).

(c) For the algorithm in part (a), consider an input stream of $3n$ files of 0.3 TB followed by $3n$ files of 0.6 TB (assuming 1 TB disks). Ideally, each 0.3 TB file would be paired up with a 0.6 TB file, for a total of $3n$ disks. However, the algorithm will pack all of the 0.3 TB files together, filling $n$ disks to 0.9 TB each. Then each 0.6 TB file will use its own disk, another $3n$ disks, so $4n$ disks are used in all, $1.33$ times the optimal number of disks.

For the algorithm in part (b), consider a stream of $6n$ files of 0.15 TB, followed by $6n$ files of 0.3 TB, followed by $6n$ files of 0.55 TB. Ideally, each disk would hold one file of each size ($0.15 + 0.3 + 0.55 = 1$ TB exactly), for a total of $6n$ disks. However, the algorithm instead groups all of the 0.15 TB files onto $n$ disks, all of the 0.3 TB files onto $2n$ disks, and gives each 0.55 TB file its own disk, $6n$ more, for a total of $9n$ disks, $1.5$ times the optimal number of disks.
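Here is a minimal sketch of part (a), assuming 1 TB disks and using Python's `heapq` with negated keys to simulate a max-heap (`worst_fit` and `disk_size` are illustrative names):

```python
import heapq

def worst_fit(files, disk_size=1.0):
    """Part (a): always offer the file to the disk with the most free space."""
    heap = []        # entries are (-free_space, disk_id): a max-heap on space
    ndisks = 0
    for size in files:
        if heap and -heap[0][0] >= size:
            # The roomiest disk fits the file: decrease its key and
            # let the node settle back into heap order in O(log n).
            neg_free, disk = heapq.heappop(heap)
            heapq.heappush(heap, (neg_free + size, disk))
        else:
            # Not enough room on any disk: open a fresh one.
            ndisks += 1
            heapq.heappush(heap, (-(disk_size - size), ndisks))
    return ndisks
```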
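Part (b) calls for a self-balancing BST, which Python's standard library lacks, so the sketch below keeps a sorted list of free-space keys and mirrors the tree descent with `bisect`; the search is the same $O(\log n)$, though `insort` on a plain list pays $O(n)$ for shifting where a real balanced tree would pay $O(\log n)$:

```python
from bisect import bisect_left, insort

def best_fit(files, disk_size=1.0):
    """Part (b): pick the smallest free space that still fits the file."""
    free = []        # sorted free-space keys, one entry per disk
    for size in files:
        i = bisect_left(free, size)   # leftmost disk with free space >= size
        if i < len(free):
            # Found a fit: delete the key and re-insert its new value,
            # the BST deletion + classic insertion described above.
            insort(free, free.pop(i) - size)
        else:
            # No disk fits the file: open a new one.
            insort(free, disk_size - size)
    return len(free)
```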
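Running both sketches on the adversarial streams from part (c) reproduces the $4n$-versus-$3n$ and $9n$-versus-$6n$ disk counts, for example:

```python
n = 10
assert worst_fit([0.3] * (3 * n) + [0.6] * (3 * n)) == 4 * n
assert best_fit([0.15] * (6 * n) + [0.3] * (6 * n) + [0.55] * (6 * n)) == 9 * n
```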