# CS61B MT2 Cheat Sheet
Last updated 5/7/23 by Emily Hsi, created by Matthew Tang and Christianna Xu
## Asymptotics Runtime
* $\Theta$: there exist positive constants $k_1$ and $k_2$ such that $k_1f(N)\leq R(N)\leq k_2f(N)$ for all large $N$
* Important sums:
* Sum of first $n$ natural numbers = $\frac{n(n+1)}{2} \implies \Theta(n^2)$
* Sum of powers of 2 up to largest term $Q$: $1 + 2 + 4 + ... + Q = 2Q - 1 \implies \Theta(Q)$
* $\Sigma_{i=0}^{N-1} 2^i = 2^N - 1 \implies \Theta(2^N)$
* $1 + ... + (\frac{n}{2})^2 + n^2 \implies \Theta(n^2)$
* $1 + 2 + 4 + 8 + ... N^2 \implies \Theta(n^2)$
* Recursion: number of nodes $\approx \text{branches}^{\text{height}}$
* Ex. $2$ branches, $\text{log}_2 N$ height $\implies 2^{\text{log}_2 N} = \Theta(N)$
* General formula: $\boxed{\Sigma_{layers} \frac{\text{nodes}}{\text{layer}} \frac{\text{work}}{\text{node}} = \Sigma_{layers} \frac{\text{work}}{\text{layer}}}$
* Any polynomial ($N^c$, $c > 0$) > any power of log ($\text{log}^k(N)$) for large $N$
* Amortized = average runtime (ex. when resizing arrays)
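Array resizing is the canonical amortized example. A minimal sketch (class and field names are ours, not the course's `AList`): doubling capacity makes `add` amortized $\Theta(1)$, even though any single resize is $\Theta(N)$, because $N$ adds trigger only $1 + 2 + 4 + ... < 2N$ total copies.

```java=
// Resizing-array sketch: geometric (doubling) resizing => amortized Theta(1) add.
class ResizingArray {
    private int[] items = new int[1];
    private int size = 0;
    private int totalCopies = 0; // counts elements copied by all resizes

    public void add(int x) {
        if (size == items.length) {
            int[] bigger = new int[items.length * 2]; // double, don't add a constant
            System.arraycopy(items, 0, bigger, 0, size);
            totalCopies += size;
            items = bigger;
        }
        items[size] = x;
        size += 1;
    }

    public int size() { return size; }
    public int totalCopies() { return totalCopies; }
}
```

After 1000 adds the resizes copy $1 + 2 + ... + 512 = 1023$ elements total, i.e. about one copy per add on average.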
## Sorting and Search: Binary search and Mergesort
* Binary Search
* Repeatedly divide the search interval in half until the value is found or the interval is empty
```java=
// Iterative binary search on a sorted array; returns true iff v is present.
static boolean binarySearch(int[] a, int v) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi) {                 // interval not yet empty
        int m = lo + (hi - lo) / 2;    // middle element (overflow-safe)
        if (a[m] == v) {
            return true;
        } else if (a[m] > v) {
            hi = m - 1;                // search lower half
        } else {
            lo = m + 1;                // search upper half
        }
    }
    return false;
}
```
* Mergesort
* Break down array into separate arrays with one element each then merge them back together to form a sorted array
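The split-then-merge idea above can be sketched as follows (helper names are ours, not the course starter code):

```java=
import java.util.Arrays;

class MergeSortExample {
    // Recursively sort: split into halves, sort each half, then merge.
    static int[] sort(int[] a) {
        if (a.length <= 1) { return a; } // a one-element array is already sorted
        int[] left = sort(Arrays.copyOfRange(a, 0, a.length / 2));
        int[] right = sort(Arrays.copyOfRange(a, a.length / 2, a.length));
        return merge(left, right);
    }

    // Merge two sorted arrays by repeatedly taking the smaller front element.
    static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0;
        for (int k = 0; k < out.length; k++) {
            if (j >= right.length || (i < left.length && left[i] <= right[j])) {
                out[k] = left[i++];
            } else {
                out[k] = right[j++];
            }
        }
        return out;
    }
}
```

Each merge is linear in the elements merged, and there are $\text{log }N$ levels of splitting, giving the $O(N \text{ log }N)$ bound in the table below.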
| Algorithm | Best | Average | Worst |
| -------- | -------- | -------- | ------- |
| Binary Search | $\Theta(1)$ | $O (\text{log }N)$ | $O (\text{log }N)$ |
| QuickSort | $O (N \text{ log }N)$ | $O (N \text{ log }N)$ | $O(N^2)$ |
| MergeSort | $O (N \text{ log }N)$ | $O (N \text{ log }N)$ | $O (N \text{ log }N)$ |
| HeapSort | $O (N \text{ log }N)$ | $O (N \text{ log }N)$ | $O (N \text{ log }N)$ |
| Selection Sort | $O(N^2)$ | $O(N^2)$ | $O(N^2)$ |
## Disjoint Sets
* Operations
* `connect(int p, int q)` also called `union()`
* `isConnected(int p, int q)` also called `connected()`
* `find(int p)` finds the root of the set p belongs to
* Goal:
* Support large number of elements $N$ and method calls $M$
* Structure:
* Store an array of integers. Indices represent IDs, values represent the parent.
* Quick find (obsolete): Indices represent IDs, values represent set number
* Quick union `connect(p, q)`: Set the parent of p's root to be q's root.
* Parent = $-1$ if the element has no parent
* Weighted quick union `connect(p, q)`: Connect the smaller branch to the larger branch.
* Parent = $-k$ where k is the number of nodes (including itself)
* When calling `connect(p, q)`, modify the parent node sizes appropriately
* To break ties, connect p to q
* Path compression:
* Point each node along the way of `find()` to the root
| Implementation | Constructor | `connect` | `isConnected` |
| -------- | -------- | -------- | ------- |
| QuickFindDS | $\Theta(N)$ | $\Theta(N)$ | $\Theta(1)$ |
| QuickUnionDS | $\Theta(N)$ | $O(N)$ | $O(N)$ |
| Weighted QU | $\Theta(N)$ | $O (\text{log }N)$ | $O (\text{log }N)$ |
Weighted QU with path compression: $O( N + M \text{ log*}N)$ total for $M$ operations on $N$ elements
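The whole section condenses into one sketch of weighted quick union with path compression (a hypothetical `WeightedQuickUnion`, not the course skeleton):

```java=
// parent[i] stores the parent of i, or -(size of i's tree) if i is a root.
class WeightedQuickUnion {
    private int[] parent;

    public WeightedQuickUnion(int n) {
        parent = new int[n];
        java.util.Arrays.fill(parent, -1); // everyone starts as a root of size 1
    }

    public int find(int p) {
        int root = p;
        while (parent[root] >= 0) { root = parent[root]; }
        while (parent[p] >= 0) {     // path compression: point every node
            int next = parent[p];    // along the way directly at the root
            parent[p] = root;
            p = next;
        }
        return root;
    }

    public boolean isConnected(int p, int q) { return find(p) == find(q); }

    public void connect(int p, int q) {
        int rp = find(p), rq = find(q);
        if (rp == rq) { return; }
        if (parent[rp] < parent[rq]) {  // rp's tree is strictly larger
            parent[rp] += parent[rq];   // update size, then attach smaller tree
            parent[rq] = rp;
        } else {                        // ties fall here: connect p's tree to q's
            parent[rq] += parent[rp];
            parent[rp] = rq;
        }
    }
}
```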
<!--
## Stack
* Operations:
* `push(int x)`: Puts x on top of the stack.
* `int pop()`: Removes and returns the top item from the stack
-->
## Types of collections
* List - Linked List, ArrayList (ordered list)
* Set - HashSet, TreeSet
* Map - HashMap, TreeMap
* Abstract - Weighted QU, Priority Queue, Trie, Kd-tree
## Runtimes (overall)
| Data Structure | `add` | `remove` | `contains` | `getSmallest` | `removeSmallest` |
| -------------- | ----------- | ----------- |:-----------------:|:-------------:|:----------------:|
| Ordered Array | $O(N)$ | $\Theta(N)$ | $O(\text{log }N)$ | $\Theta(1)$ | $\Theta(N)$ |
| BST | $O(N)$ | $O(N)$ | $O(N)$ | $O(N)$ | $O(N)$ |
| B-Tree/LLRB | $O(\text{log }N)$ | $O(\text{log }N)$ | $O(\text{log }N)$ | $O(\text{log }N)$ | $O(\text{log }N)$ |
| Hash Table | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ | $O(N)$ | $O(N)$ |
| Heap | $O(\text{log }N)$ | N/A | $O(N)$ | $\Theta(1)$ | $O(\text{log }N)$ |
| Trie | $\Theta(1)$ | $\Theta(1)$ | $\Theta(N)$ | N/A | N/A |
* BSTs are $\Theta(\text{log }N)$ on average but $\Theta(N)$ in the worst (spindly) case.
* We assume hash tables have evenly spread objects
* Heap is $\Theta(\text{log }N)$ time **AMORTIZED** (not including resizing)
* Resize = $\Theta(N)$
<!--
| | Ordered Array | Bushy BST | Hash Table | Heap
| -------- | -------- | -------- | ------- | ------- |
| `add` | $\Theta(N)$ | $\Theta(\text{log }N)$ | $\Theta(1)$ | $\Theta(\text{log }N)$ |
| `getSmallest` | $\Theta(1)$ | $\Theta(\text{log }N)$ | $\Theta(N)$ | $\Theta(1)$ |
| `removeSmallest` | $\Theta(N)$ | $\Theta(\text{log }N)$ | $\Theta(N)$ | $\Theta(\text{log }N)$ |
Height of BST: $\Omega(\text{log N}), O(N)$
Height of B-Tree: $O(\text{log }N)$
-->
## Trees
### BST
* Every key in the left subtree is less than X's key, right is greater.
* Worst case height/contains/add is $\Theta(N)$, best case $\Theta (\text{log }N)$
* Operations
* `insert()` go left if less than X's key, right if greater
* `delete()` replace the deleted node with the rightmost element of its left subtree or the leftmost element of its right subtree (Hibbard deletion)
### B-Tree / 2-3 Tree / 2-3-4 Tree
* Always insert new items into an existing leaf node (never create a new leaf)
* Once a node exceeds its item limit (2 items for a 2-3 tree, 3 for a 2-3-4 tree), send a middle element up to the parent node
* Height only increases when the root node splits
* Invariants:
* All leaves must be the same distance from the root.
* A non-leaf node with k items must have exactly k+1 children.
### Red-Black Tree (LLRB)
* Represent a 2-3 Tree as a BST using "red links" as glue
* Balanced by construction
* Invariants:
* LLRBs can only have red links on the left side, and no node may touch two red links
* All leaves have same number of black links to root
* Operations are $O (\text{log }N)$
* Operations:
* `insert()`: insert as usual in BST, then rotate accordingly
* Always use red link when inserting
* If there is a red link on right, `rotateLeft` the parent
* If there are 2 consecutive red left links, `rotateRight` the parent of the top red link
* If a node has 2 red children, `colorFlip` all edges (to emulate splitting)
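The two rotations can be sketched on a hypothetical node class (field names are ours); each rotation preserves BST ordering while moving a red link:

```java=
// Minimal LLRB node sketch; isRed is the color of the link from this node's parent.
class LLRBNode {
    int key;
    LLRBNode left, right;
    boolean isRed;

    LLRBNode(int key, boolean isRed) { this.key = key; this.isRed = isRed; }

    // rotateLeft(h): h's right child takes h's place; h becomes its left child.
    // Used when a red link appears on the right.
    static LLRBNode rotateLeft(LLRBNode h) {
        LLRBNode x = h.right;
        h.right = x.left;
        x.left = h;
        x.isRed = h.isRed; // x inherits h's old link color
        h.isRed = true;    // the rotated link is red, now on the left
        return x;
    }

    // rotateRight(h): mirror image, used for two consecutive left red links.
    static LLRBNode rotateRight(LLRBNode h) {
        LLRBNode x = h.left;
        h.left = x.right;
        x.right = h;
        x.isRed = h.isRed;
        h.isRed = true;
        return x;
    }
}
```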

## Hashing
* `hashCode()` returns an `int` (max $2,147,483,647$), so with more possible keys than hash values, collisions are inevitable (pigeonhole)
* Store items in `buckets` (an `ArrayList` or `LinkedList`)
* If bucket `h` is empty, create a new list with element `x`
* Otherwise add it to the list at bucket `h`
### Hashtable
* Use the modulo operator `%` to reduce a hash code to a bucket index in a `HashTable`
* Use `Math.floorMod()` to avoid negative indices
* Or `(key.hashCode() & 0x7FFFFFFF) % buckets;`
* When $\frac{N}{M}$ ($N$ items, $M$ buckets) exceeds a threshold such as $1.5$, double $M$
* $\frac{N}{M}$ is the load factor
* All operations take $\Theta (\frac{N}{M}) = \Theta (1)$ on average, assuming evenly spread items
* Resizing takes $\Theta (N)$ time for some `add` operations
* All objects implement a `.hashCode()` method
* Warnings
* Never store keys that change or their hashCode will change and get lost
* Never override `.equals()` without overriding `.hashCode()`
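A sketch of reducing a `hashCode()` to a valid bucket index, showing both options mentioned above (class name is ours):

```java=
class HashIndex {
    // floorMod keeps the result in [0, M) even when hashCode() is negative.
    static int bucketOf(Object key, int M) {
        return Math.floorMod(key.hashCode(), M);
    }

    // Alternative: clear the sign bit first, then take the remainder.
    static int bucketOfMasked(Object key, int M) {
        return (key.hashCode() & 0x7FFFFFFF) % M;
    }
}
```

Plain `%` would be wrong here: in Java, `-3 % 4` is `-3`, an invalid index, while `Math.floorMod(-3, 4)` is `1`.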
## Heaps and PQs
### Priority Queues:
* Allow tracking and removal of the smallest item
```java=
public interface MinPQ<Item> {
    public void add(Item x);       // Adds the item to the priority queue.
    public Item getSmallest();     // Returns the smallest item in the priority queue.
    public Item removeSmallest();  // Removes and returns the smallest item.
    public int size();             // Returns the size of the priority queue.
}
```
### Heaps:
* Represented by complete binary trees (not BSTs: no left/right ordering)
* Keys are stored in an array with zeroth element blank, first being the root of the tree with the second and third element being its children and so on
* Binary tree must be **complete** and obey the **min-heap property**
* Min-heap: Every node is less than or equal to both of its children
* Complete: Missing items only at the bottom level (if any), all nodes are as far left as possible.
* `leftChildIndex = 2k`
* `rightChildIndex = 2k + 1`
* Operations:
* `add(x)`: Add to end of heap temporarily then swim up hierarchy to rightful place
* `removeSmallest()`: Remove the root, put the last item in the heap into the root, then sink down the hierarchy to its rightful place
* Heap Implementation of Priority Queue
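The `add`/`removeSmallest` operations above can be sketched on an array-backed min-heap (resizing omitted; class name is ours):

```java=
// Array-backed min-heap: index 0 unused, root at 1, children of k at 2k and 2k+1.
class MinHeapExample {
    private int[] heap = new int[16]; // fixed capacity for brevity
    private int size = 0;

    public void add(int x) {
        size += 1;
        heap[size] = x;                          // place at the end...
        int k = size;
        while (k > 1 && heap[k] < heap[k / 2]) { // ...then swim up past larger parents
            swap(k, k / 2);
            k = k / 2;
        }
    }

    public int removeSmallest() {
        int smallest = heap[1];
        heap[1] = heap[size];                    // move last item to the root...
        size -= 1;
        int k = 1;
        while (2 * k <= size) {                  // ...then sink it down
            int child = 2 * k;
            if (child < size && heap[child + 1] < heap[child]) { child += 1; }
            if (heap[k] <= heap[child]) { break; } // min-heap property restored
            swap(k, child);
            k = child;
        }
        return smallest;
    }

    private void swap(int i, int j) {
        int t = heap[i]; heap[i] = heap[j]; heap[j] = t;
    }
}
```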
## Tries
* Basic idea: Store each letter of the string as a node in a tree.
* Mark the end of each key/string
* Trie Implementations:
* DataIndexedCharMap: Very fast, but memory hungry.
* Search runtime: $\Theta(1)$
* Hash Table: Almost as fast, uses less memory.
* Search runtime: $O(R)$ where R is size of alphabet
* Balanced BST: A little slower than Hash Table and uses a similar amount of memory as Hash Table
* Search runtime: $O (\text{log }R)$ where R is size of alphabet
* Don't have to worry very much about runtime because alphabet size is fixed: can think of all three as $\Theta(1)$
* Trie Operations:
* `collect()`: Collects all keys in a trie
* `keysWithPrefix(str)`: Collects all keys starting with prefix `str`
* `longestPrefixOf(key)`: Returns the longest prefix of a key in the Trie
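A minimal trie sketch with `add`/`contains`, using a `HashMap`-based node per the "Hash Table" implementation above (class and field names are ours):

```java=
import java.util.HashMap;
import java.util.Map;

// One node per character; isKey marks the end of a stored string.
class TrieExample {
    private static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isKey = false;
    }

    private final Node root = new Node();

    public void add(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            cur = cur.children.computeIfAbsent(c, ch -> new Node());
        }
        cur.isKey = true; // mark the end of the key
    }

    public boolean contains(String key) {
        Node cur = root;
        for (char c : key.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) { return false; }
        }
        return cur.isKey; // must end exactly on a marked node
    }
}
```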
## K-d Trees
* Root nodes partition by left-right (by x)
* Alternates axis partitioning at every depth level
* Explore the better side first
* If the worse side has a possible point closer than current best, explore that
* Compare the appropriate axis value(x if partition by x) to see if that distance is < best
* Tie break: if equal in the partitioning dimension, insert as the right child
## Graphs
Graph API:
```java=
public Graph(int V)               // Create empty graph with V vertices
public void addEdge(int v, int w) // Add an edge v-w
Iterable<Integer> adj(int v)      // Vertices adjacent to v
int V()                           // Number of vertices
int E()                           // Number of edges
```
### Traversals
* Preorder/DFS: Root, left, right
* Inorder: Left, root, right
* Postorder/DFS: Left, right, root
* Level order/BFS: Top to bottom, left to right
### DFS
* Find a path from s to every reachable vertex.
* Recursively search children until you reach the end and return T/F
* Constructor: $O(V +E)$ time, $\Theta (V)$ space
```java=
private boolean[] marked; // marked[v] is true iff v has been visited
private int[] edgeTo;     // edgeTo[v] is the previous vertex on the path from s to v

private void dfs(Graph G, int v) {
    marked[v] = true;
    for (int w : G.adj(v)) {
        if (!marked[w]) {
            edgeTo[w] = v;
            dfs(G, w);
        }
    }
}
```
### BFS
* Find a shortest path from s to every reachable vertex.
* Initialize a queue (the fringe) with a starting vertex `s` and mark that vertex.
* Use an adjacency list (array of lists, index = vertex number)
* Distance to all items on queue is always $k$ or $k + 1$ for some $k$
* Runtime
* Constructor: $O(V +E)$ time, $\Theta (V)$ space
* Printing all connected vertices: $\Theta (V + E)$
* Space used: $\Theta (V + E)$
```java=
private boolean[] marked; // marked[v] is true iff v connected to s
private int[] edgeTo;     // edgeTo[v] is previous vertex on path from s to v
private int[] distTo;     // distTo[v] is number of edges from s to v

private void bfs(Graph G, int s) {
    Queue<Integer> fringe = new Queue<Integer>();
    fringe.enqueue(s);
    marked[s] = true;
    while (!fringe.isEmpty()) {
        int v = fringe.dequeue();
        for (int w : G.adj(v)) {
            if (!marked[w]) {
                fringe.enqueue(w);
                marked[w] = true;
                edgeTo[w] = v;
                distTo[w] = distTo[v] + 1; // if keeping track of distance
            }
        }
    }
}
```
## Shortest Paths
* BFS vs DFS: Both accurate, same time efficiency, BFS guarantees shortest paths
* DFS is worse for spindly graphs, BFS worse for bushy graphs
* BFS does not use edge weights
* Terminology
* All shortest paths from a source vertex `s` will be a tree
* Edge relaxation = use the edge only if it yields a better distance, add that to the SPT
### Dijkstra's Algorithm
* Visit vertices in order of best-known distance from source and relax every edge of visited vertex.
* Relaxing an edge is $O(\text{log }V)$ because of the PQ `changePriority`
* Guaranteed to give best result assuming non-negative edges
* Use to find shortest path of 1 vertex to all others
```java=
// Dijkstra's Algorithm Pseudocode
PQ.add(source, 0)
for v in other vertices:
    PQ.add(v, infinity)
while PQ is not empty:
    p = PQ.removeSmallest()
    relax all outgoing edges from p

// Relaxing an edge p → q with weight w:
if distTo[p] + w < distTo[q]:
    distTo[q] = distTo[p] + w
    edgeTo[q] = p
    PQ.changePriority(q, distTo[q])
```
* Invariants and properties
* `edgeTo[v]` is the best known predecessor of `v`.
* `distTo[v]` is the best known total distance from source to `v`.
* PQ contains all unvisited vertices in order of `distTo`.
* Always visits vertices in order of total distance from source.
* Relaxation always fails on edges to already-visited vertices.
* Runtime
* PQ `add`: $O( V \text{ log } V)$
* PQ `removeSmallest`: $O( V \text{ log } V)$
* PQ `changePriority`: $O( E \text{ log } V)$
* Overall: $O( V \text{ log } V) + O( V \text{ log } V) + O( E \text{ log } V) = O((V + E) \text{ log } V)$, i.e. $O(E \text{ log } V)$ when $E > V$
### A* Algorithm
* Use a heuristic to guide algorithm in the right direction
* Visit vertices in order of `d(source, v) + h(v, goal)`
* `h` is a heuristic (does not change while algorithm runs)
* Constant heuristics means it just does Dijkstra's
* Want `h` to be admissible (never overestimates the true distance to the goal) and consistent: $h(v) \leq \text{ weight}(v,w) + h(w)$ for each neighbor $w$ of $v$
* Not every vertex is visited
* Result is not a shortest paths tree, only shortest path to goal
## MST
* An MST $T$ is a subgraph of $G$.
* Connected, acyclic, includes all vertices
* Spanning tree whose edge weights have the minimum sum
* Similar to shortest path trees (SPT) but not always the same
* SPT depends on source vertex while MST does not
* Edge weights not unique $\implies$ multiple different MSTs
* Cut property:
* Divide vertices into 2 sets arbitrarily.
* The minimum crossing edge for ANY cut is in the MST
* Prim's Algorithm
* Choose an arbitrary source node
* Repeatedly add the shortest edge to connect a new vertex
* Very similar to Dijkstra's except consider distance from entire tree (any vertex in tree) instead of from source
* Runtime: $O(E\text{ log }V)$ (Assuming $E > V$)
* Kruskal's Algorithm
* Add the lowest weight edge to the tree as long as it doesn't create a cycle
* Runtime: $O(E\text{ log }E)$, if pre-sorted list of edges: $O(E\text{ log* }V)$ (using a WQUPC)