# CS61B Final Study Guide
Last updated 5/15/19, created by Matthew Tang and Christianna Xu

## Topological Sort
* Visit all nodes, but a node `n` can only be visited once all nodes with edges directed into `n` have been visited
* Perform a DFS postorder from every vertex with indegree 0 and then reverse the list (reverse postorder is a topological order)
* The graph needs to be a DAG (Directed Acyclic Graph)
* Need to restart the DFS from every indegree-0 vertex
* Runtime: $O(V+E)$, Space: $\Theta(V)$

## Shortest Paths with Negative Edges
* DAG SPT algorithm: relax vertices in topological order instead of using a fringe
* Runtime: $O(V+E)$, Space: $\Theta(V)$
* DAG longest paths: flip the sign of every edge weight, run DAG shortest paths, then flip the sign of all distTo values
* Reduction: since "DAG-SPT" can be used to solve "DAG-LPT", "DAG-LPT" reduces to "DAG-SPT"

## Sorting
* Comparison sorts use a `compareTo` or `compare` method for ordering relations
* Inversion: a pair of elements that is out of order with respect to `<`
* Sorting reduces the number of inversions to 0
* In the limit of very large N, radix sorts beat comparison sorts (runtime linear in N for fixed key width W, vs. $\Theta(N \log N)$)

| Algorithm | Best | Average | Worst | Memory |
| -------- | -------- | -------- | ------- | ------- |
| MergeSort | $\Theta(N \log N)$ | $\Theta(N \log N)$ | $\Theta(N \log N)$ | $\Theta(N)$ |
| HeapSort | $\Theta(N)$ | $\Theta(N \log N)$ | $\Theta(N \log N)$ | $\Theta(1)$ |
| Selection Sort | $\Theta(N^2)$ | $\Theta(N^2)$ | $\Theta(N^2)$ | $\Theta(1)$ |
| Insertion Sort | $\Theta(N)$ | $\Theta(N^2)$ | $\Theta(N^2)$ | $\Theta(1)$ |
| QuickSort | $\Theta(N \log N)$ | $\Theta(N \log N)$ | $\Theta(N^2)$ | $\Theta(\log N)$ |
| LSD Sort | $\Theta(WN+WR)$ | $\Theta(WN+WR)$ | $\Theta(WN+WR)$ | $\Theta(N+R)$ |
| MSD Sort | $\Theta(N+R)$ | | $\Theta(WN+WR)$ | $\Theta(N+WR)$ |

Source: MT2 Cheatsheet. Amend as necessary.

| Algorithm | Memory | Runtime | Notes | Stable? |
| --------- | ------ | ------- | ----- | ------- |
| HeapSort | $\Theta(1)$ | $\Theta(N \log N)^*$ | Bad caching (61C) | No |
| Insertion | $\Theta(1)$ | $\Theta(N^2)^*$ | Small N, almost sorted | Yes |
| Mergesort | $\Theta(N)$ | $\Theta(N \log N)^*$ | Fastest stable sort | Yes |
| Random Quicksort | $\Theta(\log N)$ | $\Theta(N \log N)^*$ expected | Fastest compare sort | No |
| Counting Sort | $\Theta(N+R)$ | $\Theta(N+R)$ | Alphabet keys only | Yes |
| LSD Sort | $\Theta(N+R)$ | $\Theta(WN+WR)$ | Strings of alphabet keys only | Yes |
| MSD Sort | $\Theta(N+WR)$ | $\Theta(N+R)$ (best case), $\Theta(WN+WR)$ (worst case) | Bad caching (61C) | Yes |

N: number of keys. R: size of alphabet. W: width of longest key. *: assumes constant `compareTo` time.

### Selection Sort
* Find the smallest item in the unsorted portion of the array
* Move that item to the end of the sorted portion of the array
* Repeat on the unfixed items until all items are fixed
* Runtime: $\Theta(N^2)$

### Heapsort (Naive)
* Insert all items into a max heap
* Create an output array
* Delete the largest item and place it at the end of the unused part of the output array (do this N times)
* Runtime: $\Theta(N \log N)$, Space: $\Theta(N)$

### Heapsort (In-place)
* Bottom-up heapify the input array:
  * Sink nodes in reverse level order; once a node is sunk, the tree rooted at that node is guaranteed to be a heap
  * When this is finished, the input array is a valid max heap
* Run heapsort (see the sketch below):
  * Repeatedly swap the max to the end and sink the new root; sorted items go at the end of the input array (the finished part)
* Runtime: $\Theta(N \log N)$, Space: $\Theta(1)$ (only operates on the input array)
* Bottom-up heapification alone is $\Theta(N)$
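To make the two phases concrete, here is a minimal in-place heapsort sketch on an `int[]`, using a 0-indexed heap (children of `i` at `2i+1` and `2i+2`) rather than the 1-indexed convention from lecture; the class and helper names are ours:

```java
import java.util.Arrays;

public class HeapSort {
    /** Sorts arr in place: bottom-up heapify, then repeatedly
     *  swap the max to the end and shrink the heap. */
    public static void sort(int[] arr) {
        int n = arr.length;
        // Phase 1: bottom-up heapification. Sink every non-leaf
        // node in reverse level order; Theta(N) total work.
        for (int i = n / 2 - 1; i >= 0; i -= 1) {
            sink(arr, i, n);
        }
        // Phase 2: move the max (root) into the finished region at
        // the end of the array, then restore the heap invariant.
        for (int end = n - 1; end > 0; end -= 1) {
            swap(arr, 0, end);
            sink(arr, 0, end);
        }
    }

    /** Sinks the item at index i within a heap of the given size. */
    private static void sink(int[] arr, int i, int size) {
        while (2 * i + 1 < size) {
            int child = 2 * i + 1;
            if (child + 1 < size && arr[child + 1] > arr[child]) {
                child += 1;  // pick the larger child
            }
            if (arr[i] >= arr[child]) {
                break;       // heap property holds; done sinking
            }
            swap(arr, i, child);
            i = child;
        }
    }

    private static void swap(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 9, 3, 7};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 3, 5, 7, 9]
    }
}
```

Because the heap shrinks as the sorted region grows, no auxiliary array is needed, which is where the $\Theta(1)$ space bound comes from.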
### Merge Sort (Top-Down)
* Split the items into 2 roughly even pieces
* Recursively mergesort each half, then merge the 2 sorted halves to create the final result
* Merging uses a separate array and repeatedly compares the front items of the 2 halves
* Runtime: $\Theta(N \log N)$, Space: $\Theta(N)$

### Insertion Sort (In-place)
* Naive version:
  * Start with an empty output array
  * Add each item from the input, inserting it into the output at the right spot
* In-place version:
  * Swap the traveling item backwards until it is in its correctly sorted place among the previously sorted items
* Runtime: $\Omega(N)$, $O(N^2)$
* Very good for almost-sorted arrays
* Number of inversions = number of swaps

## Quicksort
* Partitioning on a pivot x at position i:
  * x moves to position j (which can be the same as i)
  * All entries left of x are <= x; all entries right of x are >= x
  * Once a pivot has moved, it is in its final place
* Algorithm:
  * Partition on the leftmost item
  * Quicksort the left half
  * Quicksort the right half
* On average, runtime: $\Theta(N \log N)$, space: $\Theta(\log N)$
* Performance depends on the choice of pivot and on how you partition
  * An almost-sorted array, or one where all items are the same, will be bad!
* Avoiding worst-case behavior:
  * Use the median as the pivot
  * Randomly swap 2 indices (or shuffle the array entirely)
* Quicksort is the same as BST sort (insert all pivots into a BST; the exact same comparisons are made)

### Hoare Partitioning
Create L and G pointers at the left and right ends.
* The L pointer is a friend to small items, and hates large or equal items.
* The G pointer is a friend to large items, and hates small or equal items.
* Walk the pointers towards each other, stopping on a hated item.
* When both pointers have stopped, swap their items and move each pointer by one.
* When the pointers cross, you are done walking.
* Swap the pivot with G, then repeat on the lower and upper halves of the array.

## Sorting Theory
To sort N objects, there are N! possible orderings, so the decision tree of any comparison sort needs at least N! leaves and hence height at least $\lceil \log_2(N!) \rceil$. Since $\log_2(N!) \in \Theta(N \log N)$, any comparison sort requires $\Omega(N \log N)$ comparisons in the worst case.

## Small-Alphabet Sorts

### Counting Sort
* Count the number of occurrences of each item.
* Iterate through the list, using the count array (converted to starting positions) to decide where to put each item.
* If N (number of keys) is >= R (size of alphabet), we expect reasonable performance.

## Radix Sorts
* Sort objects using a fixed alphabet
* When sorting integers, treat them as strings of digits in some base

### LSD Radix Sort
* Uses counting sort to sort on each digit independently, from the rightmost digit towards the left (see the sketch below)
* With keys of unequal length, treats empty positions as less than all other characters
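Here is a minimal LSD sketch for fixed-width lowercase strings, showing one stable counting sort per digit; the class name and the `a`-to-`z` alphabet are assumptions for the example:

```java
import java.util.Arrays;

public class LSDSort {
    /** LSD radix sort on strings of width w over 'a'..'z':
     *  one stable counting sort per digit, rightmost first. */
    public static void sort(String[] a, int w) {
        int R = 26;                      // alphabet size
        int n = a.length;
        String[] aux = new String[n];
        for (int d = w - 1; d >= 0; d -= 1) {
            // Counting sort on digit d.
            int[] count = new int[R + 1];
            for (String s : a) {         // tally occurrences
                count[s.charAt(d) - 'a' + 1] += 1;
            }
            for (int r = 0; r < R; r += 1) {
                count[r + 1] += count[r]; // counts -> start positions
            }
            for (String s : a) {         // distribute, stably
                aux[count[s.charAt(d) - 'a']++] = s;
            }
            System.arraycopy(aux, 0, a, 0, n);
        }
    }

    public static void main(String[] args) {
        String[] words = {"dab", "cab", "fad", "bad", "dad", "ebb"};
        sort(words, 3);
        System.out.println(Arrays.toString(words));
        // [bad, cab, dab, dad, ebb, fad]
    }
}
```

The stability of the per-digit counting sort is what makes LSD correct: ties on the current digit preserve the order established by the earlier, less significant passes.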
### MSD Radix Sort
* Just like LSD, but sorts from the leftmost digit towards the right
* Make sure to sort each subproblem separately, otherwise the array won't sort properly

### Just-In-Time Compiler
* Secretly optimizes code as it executes
* As your code runs, the "interpreter" is watching everything that happens
* If some segment of code is called many times, the interpreter studies and re-implements your code based on what it learned by watching, WHILE IT'S RUNNING
* With JIT, MSD sort is slower than MergeSort
* Without JIT, MSD sort is way faster than MergeSort

## Compression
* Make common bitstreams more compact.
* General idea: exploit redundancy and existing order inside the sequence.
* Sequences with no existing redundancy or order may actually get enlarged.
* Shannon-Fano codes (extra)

### Prefix-Free Codes
* Encoding (bitstream to compressed bitstream) data structures/approaches:
  * An array `BitSequence[]`; to retrieve, use the character as the index.
  * A `HashMap<Character, BitSequence>`.
  * Arrays are faster than HashMaps (just get the item from the array), but use more memory if some characters in the alphabet are unused.
* Decoding (compressed bitstream back to the original bitstream):
  * Tries
  * Use `longestPrefixOf()`

### Huffman Compression
* Calculate relative frequencies.
* Assign each symbol to a node with weight = relative frequency.
* Take the two smallest nodes and merge them into a super node with weight equal to the sum of their weights.
* Repeat until everything is part of a single tree.
* Two possible philosophies for using Huffman compression:
  1. For each input type (English text, Chinese text, images, Java source code, etc.), assemble huge numbers of sample inputs for that category. Use each corpus to create a standard code for English, Chinese, etc.
  2. For every possible input file, create a unique code just for that file. Send the code along with the compressed file.
* Pros/cons:
  * Approach 1 will result in suboptimal encoding.
  * Approach 2 requires extra space for the codeword table in the compressed bitstream.
  * For very large inputs, the cost of including the codeword table becomes insignificant, so Approach 2 is used in the real world.

Huffman compression:
1. Count frequencies.
2. Build the encoding array and decoding trie.
3. Write the decoding trie to the output file.
4. Write the codeword for each symbol to the output file.

Huffman decompression:
1. Read in the decoding trie.
2. Use the codeword bits to walk down the trie, outputting a symbol every time you reach a leaf.

## Misc. Info
Queue Documentation (a usage sketch follows below):

```java
Queue<Node> fringe = new Queue<>();
fringe.enqueue(node);          // add node to the back of the queue
Node next = fringe.dequeue();  // remove and return the least recently added item
fringe.peek();                 // return, without removing, the least recently added item
```
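For context, a minimal sketch of how this Queue is typically used as a BFS fringe; `Node`, `neighbors()`, and the `marked` set are hypothetical names for illustration, and `isEmpty()` is assumed to exist alongside the three methods documented above:

```java
import java.util.HashSet;
import java.util.Set;

/** BFS sketch; Node and neighbors() are hypothetical. */
static void bfs(Node start) {
    Queue<Node> fringe = new Queue<>();
    Set<Node> marked = new HashSet<>();
    fringe.enqueue(start);
    marked.add(start);
    while (!fringe.isEmpty()) {          // isEmpty() assumed
        Node v = fringe.dequeue();       // FIFO: least recently added
        for (Node w : v.neighbors()) {   // hypothetical adjacency method
            if (!marked.contains(w)) {
                marked.add(w);           // mark before enqueueing
                fringe.enqueue(w);
            }
        }
    }
}
```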