# CS61B MT2 Cheat Sheet
Last updated 5/7/23 by Emily Hsi, created by Matthew Tang and Christianna Xu

## Asymptotics Runtime
* $\Theta$: there exist positive constants $k_1$ and $k_2$ such that $k_1f(N)\leq R(N)\leq k_2f(N)$ for all large $N$
* Important sums:
    * Sum of first $n$ natural numbers $= \frac{n(n+1)}{2} \implies \Theta(n^2)$
    * Sum of the powers of 2 up to $Q$: $1 + 2 + 4 + \dots + Q = 2Q - 1 \implies \Theta(Q)$
    * $\sum_{i=0}^{N-1} 2^i = 2^N - 1 \implies \Theta(2^N)$
    * $1 + \dots + (\frac{n}{2})^2 + n^2 \implies \Theta(n^2)$
    * $1 + 2 + 4 + 8 + \dots + N^2 \implies \Theta(N^2)$
* Recursion: number of nodes $= \text{branches}^{\text{height}}$
    * Ex. $2$ branches, $\log_2 N$ height $\implies \Theta(N)$ nodes
    * General formula: $\boxed{\sum_{\text{layers}} \frac{\text{nodes}}{\text{layer}} \cdot \frac{\text{work}}{\text{node}} = \sum_{\text{layers}} \frac{\text{work}}{\text{layer}}}$
* Any polynomial dominates any power of a log ($\log^k N$)
* Amortized = average runtime per operation over a sequence of calls (ex. when resizing arrays)

## Sorting and Search: Binary Search and Mergesort
* Binary Search
    * Repeatedly divide the search interval in half until the value is found or the interval is empty
```java=
// Search sorted array a for value v within the interval [lo, hi].
static boolean binarySearch(int[] a, int v, int lo, int hi) {
    if (lo > hi) { return false; }      // empty interval: not found
    int m = lo + (hi - lo) / 2;         // index of middle element
    if (a[m] == v) { return true; }
    if (a[m] > v) { return binarySearch(a, v, lo, m - 1); }  // search lower half
    return binarySearch(a, v, m + 1, hi);                    // search upper half
}
```
* Mergesort
    * Break the array down into separate one-element arrays, then merge them back together to form a sorted array

| Algorithm | Best | Average | Worst |
| -------- | -------- | -------- | ------- |
| Binary Search | $\Theta(1)$ | $O(\text{log }N)$ | $O(\text{log }N)$ |
| QuickSort | $O(N \text{ log }N)$ | $O(N \text{ log }N)$ | $O(N^2)$ |
| MergeSort | $O(N \text{ log }N)$ | $O(N \text{ log }N)$ | $O(N \text{ log }N)$ |
| HeapSort | $O(N \text{ log }N)$ | $O(N \text{ log }N)$ | $O(N \text{ log }N)$ |
| Selection Sort | $O(N^2)$ | $O(N^2)$ | $O(N^2)$ |

## Disjoint Sets
* Operations
    * `connect(int p, int q)`, also called `union()`
    * `isConnected(int p, int q)`, also called `connected()`
    * `find(int p)`: finds the root of the set p belongs to
* Goal: support a large number of elements $N$ and method calls $M$
* Structure: store an array of integers. Indices represent IDs, values represent the parent.
* Quick find (obsolete): indices represent IDs, values represent the set number
* Quick union `connect(p, q)`: set the root of p's tree to point to the root of q's tree.
    * Parent = $-1$ if the element has no parent
* Weighted quick union `connect(p, q)`: connect the root of the smaller tree to the root of the larger tree.
    * Parent = $-k$ where k is the number of nodes in the tree (including itself)
    * When calling `connect(p, q)`, update the root node sizes appropriately
    * To break ties, connect p's tree to q's tree
* Path compression:
    * Point each node visited along the way of `find()` directly at the root

| Implementation | Constructor | `connect` | `isConnected` |
| -------- | -------- | -------- | ------- |
| QuickFindDS | $\Theta(N)$ | $\Theta(N)$ | $\Theta(1)$ |
| QuickUnionDS | $\Theta(N)$ | $O(N)$ | $O(N)$ |
| Weighted QU | $\Theta(N)$ | $O(\text{log }N)$ | $O(\text{log }N)$ |

Path compression: $O(N + M \text{ log*}N)$ total for $M$ operations on $N$ elements

<!--
## Stack
* Operations:
    * `push(int x)`: Puts x on top of the stack.
    * `int pop()`: Removes and returns the top item from the stack.
-->

## Types of collections
* List - Linked List, ArrayList (ordered list)
* Set - HashSet, TreeSet
* Map - HashMap, TreeMap
* Abstract - Weighted QU, Priority Queue, Trie, K-d tree

## Runtimes (overall)

| Data Structure | `add` | `remove` | `contains` | `getSmallest` | `removeSmallest` |
| -------------- | ----------- | ----------- |:-----------------:|:-------------:|:----------------:|
| Ordered Array | $O(N)$ | $\Theta(N)$ | $O(\text{log }N)$ | $\Theta(1)$ | $\Theta(N)$ |
| BST | $O(N)$ | $O(N)$ | $O(N)$ | $O(N)$ | $O(N)$ |
| B-Tree/LLRB | $O(\text{log }N)$ | $O(\text{log }N)$ | $O(\text{log }N)$ | $O(\text{log }N)$ | $O(\text{log }N)$ |
| Hash Table | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ | $O(N)$ | $O(N)$ |
| Heap | $O(\text{log }N)$ | N/A | $O(N)$ | $\Theta(1)$ | $O(\text{log }N)$ |
| Trie | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ | N/A | N/A |

* Trie runtimes are in terms of the number of keys $N$; each operation actually depends on the length of the key, not on $N$.
* BSTs are $\Theta(\text{log }N)$ on average but $\Theta(N)$ in the worst case.
* We assume hash tables have evenly spread objects
* Heap operations are $\Theta(\text{log }N)$ time **AMORTIZED** (not counting resizing)
    * Resize = $\Theta(N)$

<!--
| | Ordered Array | Bushy BST | Hash Table | Heap |
| -------- | -------- | -------- | ------- | ------- |
| `add` | $\Theta(N)$ | $\Theta(\text{log }N)$ | $\Theta(1)$ | $\Theta(\text{log }N)$ |
| `getSmallest` | $\Theta(1)$ | $\Theta(\text{log }N)$ | $\Theta(N)$ | $\Theta(1)$ |
| `removeSmallest` | $\Theta(N)$ | $\Theta(\text{log }N)$ | $\Theta(N)$ | $\Theta(\text{log }N)$ |

Height of BST: $\Omega(\text{log }N)$, $O(N)$
Height of B-Tree: $O(\text{log }N)$
-->

## Trees
### BST
* Every key in the left subtree is less than X's key; every key in the right subtree is greater.
* Worst case height/`contains`/`add` is $\Theta(N)$, best case $\Theta(\text{log }N)$
* Operations
    * `insert()`: go left if less than X's key, right if greater
    * `delete()`: promote the rightmost element of the left subtree or the leftmost element of the right subtree into the deleted node's position.
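The BST operations above can be sketched as follows. This is a minimal illustration, not the course's actual implementation; the class and method names are chosen to match the bullets.

```java
// Minimal BST sketch (illustrative, not the official CS61B implementation).
public class BST {
    private Node root;

    private static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** insert(): go left if less than the current key, right if greater. */
    public void insert(int key) {
        root = insert(root, key);
    }

    private Node insert(Node x, int key) {
        if (x == null) { return new Node(key); }
        if (key < x.key) { x.left = insert(x.left, key); }
        else if (key > x.key) { x.right = insert(x.right, key); }
        return x;  // key already present: nothing to do
    }

    /** contains(): same descent; O(height), i.e. Theta(N) worst case. */
    public boolean contains(int key) {
        Node x = root;
        while (x != null) {
            if (key == x.key) { return true; }
            x = (key < x.key) ? x.left : x.right;
        }
        return false;
    }
}
```

Note that nothing here keeps the tree bushy; inserting keys in sorted order produces a spindly tree, which is why the worst case is $\Theta(N)$ and why B-Trees/LLRBs exist.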
### B-Tree / 2-3 Tree / 2-3-4 Tree
* Add new items horizontally into a leaf
* Once a node exceeds its item limit (2 or 3), send a middle item up to the parent node
* Height only increases when the root node splits
* Invariants:
    * All leaves must be the same distance from the root.
    * A non-leaf node with k items must have exactly k+1 children.

### Red-Black Tree (LLRB)
* Represent a 2-3 Tree as a BST using "red links" as glue
* Balanced by construction
* Invariants:
    * LLRBs can only have red links on the left side (no node may touch 2 red links)
    * All leaves have the same number of black links to the root
* Operations are $O(\text{log }N)$
* Operations:
    * `insert()`: insert as usual in a BST, then rotate accordingly
        * Always use a red link when inserting
        * If there is a red link on the right, `rotateLeft` the parent
        * If there are 2 consecutive red left links, `rotateRight` the parent of the top red link
        * If a node has 2 red children, `colorFlip` all its edges (to emulate splitting)

![](https://i.imgur.com/bcqUR02.png)

## Hashing
* Integers overflow at $2{,}147{,}483{,}647$, so hash collisions are inevitable
* Store items in buckets (an `ArrayList` or `LinkedList`)
    * If bucket `h` is empty, create a new list holding element `x`
    * Otherwise add it to the list at bucket `h`

### Hashtable
* Use the modulo operator `%` to reduce the hash code to a bucket index
    * Use `Math.floorMod()` to avoid negative indices
    * Or `(key.hashCode() & 0x7FFFFFFF) % buckets;`
* With `N` items and `M` buckets, once the load factor $\frac{N}{M}$ exceeds a threshold (e.g. $1.5$), double M
* Runtime for all operations is $\Theta(\frac{N}{M}) = \Theta(1)$ on average, assuming evenly spread items
    * Resizing takes $\Theta(N)$ time for some `add` operations
* All objects implement a `.hashCode()` method
* Warnings
    * Never store keys that mutate, or their hashCode will change and the entry will be lost
    * Never override `.equals()` without overriding `.hashCode()`

## Heaps and PQs
### Priority Queues:
* Allow tracking and removal of the smallest item
```java=
public interface MinPQ<Item> {
    public void add(Item x);        // Adds the item to the priority queue.
    public Item getSmallest();      // Returns the smallest item in the priority queue.
    public Item removeSmallest();   // Removes and returns the smallest item.
    public int size();              // Returns the size of the priority queue.
}
```

### Heaps:
* Represented as complete binary trees (not BSTs) stored in an array
* Keys are stored in an array with the zeroth element blank, the first being the root of the tree, the second and third elements being its children, and so on
    * `leftChildIndex = 2k`
    * `rightChildIndex = 2k + 1`
* The binary tree must be **complete** and obey the **min-heap property**
    * Min-heap: every node is less than or equal to both of its children
    * Complete: missing items only at the bottom level (if any), all nodes as far left as possible
* Operations:
    * `add(x)`: add to the end of the heap temporarily, then swim up the hierarchy to its rightful place
    * `removeSmallest()`: remove the root, put the last item in the heap into the root, then sink down the hierarchy to its rightful place
* A heap is the standard implementation of a priority queue

## Tries
* Basic idea: store each letter of the string as a node in a tree.
    * Mark the node that ends each key/string
* Trie Implementations:
    * DataIndexedCharMap: very fast, but memory hungry.
        * Search runtime: $\Theta(1)$
    * Hash Table: almost as fast, uses less memory.
        * Search runtime: $O(R)$ where R is the size of the alphabet
    * Balanced BST: a little slower than a Hash Table and uses a similar amount of memory
        * Search runtime: $O(\text{log }R)$ where R is the size of the alphabet
    * Don't worry much about these runtimes: the alphabet size is fixed, so all three can be treated as $\Theta(1)$
* Trie Operations:
    * `collect()`: collects all keys in a trie
    * `keysWithPrefix(str)`: collects all keys starting with prefix `str`
    * `longestPrefixOf(key)`: returns the longest prefix of `key` that is in the trie

## K-d Trees
* The root node partitions left-right (by x)
* Alternate the partitioning axis at every depth level
* Nearest-neighbor search: explore the better side first
    * If the worse side could contain a point closer than the current best, explore that side too
        * Compare the appropriate axis value (x if partitioned by x) to see if that distance is < best
* Tie break: if equal in the partitioning dimension, put the node as the right child

## Graphs
Graph API:
```java=
public Graph(int V)                // Create empty graph with V vertices.
public void addEdge(int v, int w)  // Add an edge v-w.
Iterable<Integer> adj(int v)       // Vertices adjacent to v.
int V()                            // Number of vertices.
int E()                            // Number of edges.
```

### Traversals
* Preorder (DFS): root, left, right
* Inorder: left, root, right
* Postorder (DFS): left, right, root
* Level order (BFS): top to bottom, left to right

### DFS
* Find a path from s to every reachable vertex.
* Recursively search unmarked neighbors until there is nowhere left to go, then backtrack
* Constructor: $O(V + E)$ time, $\Theta(V)$ space
```java=
private boolean[] marked;
private int[] edgeTo;

private void dfs(Graph G, int v) {
    marked[v] = true;
    for (int w : G.adj(v)) {
        if (!marked[w]) {
            edgeTo[w] = v;
            dfs(G, w);
        }
    }
}
```

### BFS
* Find a shortest path from s to every reachable vertex.
* Initialize a queue (the fringe) with a starting vertex `s` and mark that vertex.
* Use an adjacency list (array of lists, index = vertex number)
* Distance to all items on the queue is always $k$ or $k + 1$ for some $k$
* Runtime
    * Constructor: $O(V + E)$ time, $\Theta(V)$ space
    * Printing all connected vertices: $\Theta(V + E)$
    * Space used: $\Theta(V + E)$
```java=
private boolean[] marked;  // marked[v] is true iff v is connected to s
private int[] edgeTo;      // edgeTo[v] is the previous vertex on the path from s to v
private int[] distTo;      // distTo[v] is the number of edges from s to v

private void bfs(Graph G, int s) {
    Queue<Integer> fringe = new Queue<Integer>();
    fringe.enqueue(s);
    marked[s] = true;
    while (!fringe.isEmpty()) {
        int v = fringe.dequeue();
        for (int w : G.adj(v)) {
            if (!marked[w]) {
                fringe.enqueue(w);
                marked[w] = true;
                edgeTo[w] = v;
                distTo[w] = distTo[v] + 1;  // if keeping track of distance
            }
        }
    }
}
```

## Shortest Paths
* BFS vs DFS: both find paths correctly with the same time efficiency, but only BFS guarantees shortest paths
    * DFS is worse for spindly graphs, BFS is worse for bushy graphs
    * BFS does not use edge weights
* Terminology
    * All shortest paths from a source vertex `s` together form a tree
    * Edge relaxation = use the edge only if it yields a better distance; if so, add it to the SPT

### Dijkstra's Algorithm
* Visit vertices in order of best-known distance from the source and relax every edge of the visited vertex.
    * Relaxing is $O(\text{log }V)$ because of the PriorityQueue
* Guaranteed to give the best result assuming non-negative edge weights
* Use to find the shortest paths from 1 vertex to all others
```java=
// Dijkstra's Algorithm Pseudocode
PQ.add(source, 0)
for v in all other vertices:
    PQ.add(v, infinity)
while PQ is not empty:
    p = PQ.removeSmallest()
    relax all outgoing edges from p

// Relaxing an edge p -> q with weight w:
if distTo[p] + w < distTo[q]:
    distTo[q] = distTo[p] + w
    edgeTo[q] = p
    PQ.changePriority(q, distTo[q])
```
* Invariants and properties
    * `edgeTo[v]` is the best known predecessor of `v`.
    * `distTo[v]` is the best known total distance from the source to `v`.
    * The PQ contains all unvisited vertices in order of `distTo`.
    * Always visits vertices in order of total distance from the source.
    * Relaxation always fails on edges to visited (white) vertices.
* Runtime
    * PQ `add`: $O(V \text{ log }V)$
    * PQ `removeSmallest`: $O(V \text{ log }V)$
    * PQ `changePriority`: $O(E \text{ log }V)$
    * Overall: $O(V \text{ log }V) + O(V \text{ log }V) + O(E \text{ log }V) = O((V + E)\text{ log }V)$

### A* Algorithm
* Use a heuristic to guide the algorithm in the right direction
* Visit vertices in order of `d(source, v) + h(v, goal)`
    * `h` is a heuristic (does not change while the algorithm runs)
    * A constant heuristic means it just does Dijkstra's
* Want `h` to be admissible (never overestimates the true distance to the goal) and consistent: $h(v) \leq \text{weight}(v, w) + h(w)$ for each neighbor $w$ of $v$
* Not every vertex is visited
* The result is not a shortest paths tree, only a shortest path to the goal

## MST
* An MST T is a subgraph of G.
    * Connected, acyclic, includes all vertices
    * Spanning tree whose edge weights have the minimum sum
* Similar to shortest path trees (SPTs) but not always the same
    * An SPT depends on the source vertex while an MST does not
* Edge weights not unique $\implies$ possibly multiple different MSTs
* Cut property:
    * Divide the vertices into 2 sets arbitrarily.
    * The minimum crossing edge for ANY cut is in the MST
* Prim's Algorithm
    * Choose an arbitrary source node
    * Repeatedly add the shortest edge that connects a new vertex to the tree
    * Very similar to Dijkstra's, except consider the distance from the entire tree (any vertex in the tree) instead of from the source
    * Runtime: $O(E\text{ log }V)$ (assuming $E > V$)
* Kruskal's Algorithm
    * Add the lowest-weight edge to the tree as long as it doesn't create a cycle
    * Runtime: $O(E\text{ log }E)$; with a pre-sorted list of edges: $O(E\text{ log* }V)$ (using a WQUPC)
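Kruskal's algorithm ties back to the Disjoint Sets section: a union-find detects cycles. Below is a hedged sketch, not the course's classes; `Edge`, `find`, and `mstWeight` are illustrative names, and the union-find here uses path compression but not weighting for brevity.

```java
import java.util.*;

// Kruskal's algorithm sketch: sort edges by weight, then add each edge
// whose endpoints are not already connected (i.e., it creates no cycle).
public class Kruskal {
    record Edge(int u, int v, int weight) {}  // illustrative edge type

    static int[] parent;

    // find with path halving: point nodes nearer the root as we walk up.
    static int find(int p) {
        while (parent[p] != p) {
            parent[p] = parent[parent[p]];
            p = parent[p];
        }
        return p;
    }

    /** Returns the total weight of an MST of the given connected graph. */
    static int mstWeight(int numVertices, List<Edge> edges) {
        parent = new int[numVertices];
        for (int i = 0; i < numVertices; i++) { parent[i] = i; }
        edges.sort(Comparator.comparingInt(Edge::weight));  // the O(E log E) step
        int total = 0;
        for (Edge e : edges) {
            int ru = find(e.u()), rv = find(e.v());
            if (ru != rv) {          // endpoints in different sets: no cycle
                parent[ru] = rv;     // union the two trees
                total += e.weight();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>(List.of(
            new Edge(0, 1, 4), new Edge(1, 2, 2),
            new Edge(0, 2, 5), new Edge(2, 3, 1)));
        System.out.println(mstWeight(4, edges));  // keeps weights 1, 2, 4; prints 7
    }
}
```

Sorting dominates the runtime, which is where the $O(E \text{ log }E)$ bound comes from; with pre-sorted edges only the near-constant union-find work remains.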