# CS170 MT1 Notes
1. MT1 Notes can be found [here](https://hackmd.io/@matthewtang/B1js2XPrS)
2. MT2 Notes can be found [here](https://hackmd.io/@matthewtang/B1NCuHWdB)
3. Final Notes can be found [here](https://hackmd.io/@matthewtang/r1PACERcH)
# Lecture 1: 8/28/19
## Cycles
* Two pointers, advance p1 twice, advance p2 once. If they ever end up at the same place, report cycle.
* If no cycle, the fast pointer runs off the end and the pointers never meet
* If there is a cycle, both pointers eventually enter it, and the distance between them shrinks on every step, so they must meet
* Runtime: $O(n)$. n steps to cycle, n steps to catch up.
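A minimal Python sketch of the two-pointer check, assuming singly linked nodes with a `next` field (the `Node` class here is just for illustration):

```python
class Node:
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def has_cycle(head):
    """Advance fast 2 steps and slow 1 step per iteration;
    they meet iff the list has a cycle."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:   # pointers collided inside the cycle
            return True
    return False           # fast ran off the end: no cycle
```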
# Lecture 2: 8/30/19
## Multiplication
* Given n-bit numbers $x, y$.
* $k$th place of $xy$ = coefficient of $2^k$
* $a_k = \sum\limits_{i \leq k} x_i y_{k-i}$
* $xy = \sum\limits^{2n}_{k=0} 2^k a_k$
* Number of basic operations: $\sum\limits_{k \leq 2n} \min(k, 2n-k) = \Theta(n^2)$
### Recursive Algorithm
* Divide $x$ into 2 halves $x_L, x_R$: $x = 2^{n/2} x_L + x_R$
* Similarly, $y = 2^{n/2} y_L + y_R$
* Multiplying and collecting terms: $xy = 2^n x_L y_L + 2^{n/2} (x_Ly_R + x_Ry_L) + x_Ry_R$
* Write a recurrence relation: $T(n) = 4T(\frac{n}{2}) + O(n)$ so $T(n) = \Theta(n^2)$
### Comparing to Python
* Python's built-in multiplication runs in $O(n^{\log_2 3})$ (it uses Karatsuba for large integers)
### Gauss's Trick
* $(a+ bi)(c + di) = (ac-bd) + (ad+bc)i$
* 4 multiplications: $ac, bd, ad, bc$
* $P_1 = (a+b)(c+d) = ac + ad + bc + bd$
* All 4 products summed, from a single multiplication
* $P_2 = ac, P_3 = bd$
* $(ac - bd) = P_2 - P_3$
* $(ad + bc) = P_1 - P_2 - P_3$
### Gauss's Trick with Multiplication
* Just need 3 terms: $x_Ly_L, x_Ly_R + x_Ry_L, x_Ry_R$
* This makes it $T(n) = 3T(\frac{n}{2}) + O(n)$ so $T(n) = \Theta(n^{\log_2 3})$
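A sketch of the three-multiplication recursion for nonnegative Python ints, using bit shifts for the $2^{n/2}$ split:

```python
def karatsuba(x, y):
    """Multiply nonnegative ints with 3 recursive products:
    T(n) = 3T(n/2) + O(n) = Theta(n^log2(3))."""
    if x < 10 or y < 10:                       # small base case
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    xL, xR = x >> half, x & ((1 << half) - 1)  # x = 2^half * xL + xR
    yL, yR = y >> half, y & ((1 << half) - 1)
    P1 = karatsuba(xL + xR, yL + yR)           # (xL + xR)(yL + yR)
    P2 = karatsuba(xL, yL)
    P3 = karatsuba(xR, yR)
    mid = P1 - P2 - P3                         # xL*yR + xR*yL
    return (P2 << (2 * half)) + (mid << half) + P3
```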
### Exponents and Logs
* $(a^b)^c = (a^c)^b$
* $a = b^{\log_b a}$ (definition of $\log_b$)
* $a^{\log_b n} = n^{\log_b a}$
* For $T(n) = 4T(\frac{n}{2}) + cn$
* Depth: $d = \log_2 n$
* Total work = $cn + 2cn + 4cn + \ldots + cn^2 = O(n^2)$ (geometric series)
* For $T(n) = 3T(\frac{n}{2}) + O(n)$
* Depth: $d = \log_2 n$
* $3^{\log_2 n} = n^{\log_2 3}$ base-case subproblems
* Total work: $cn + \frac{3}{2}cn + \ldots + cn^{\log_2 3} = O(n^{\log_2 3})$
## Master Theorem
* For $T(n) = aT(\frac{n}{b}) + O(n^d)$
* Depth: $k = \log_b n$
* Level $i$ work: $(\frac{a}{b^d})^i n^d$
* Total: $n^d \sum\limits^{\log_b n}_{i=0} (\frac{a}{b^d})^i$
* If $d > \log_b a$, the first term dominates: $O(n^d)$
* If $d < \log_b a$, the last term dominates: $O(n^{\log_b a})$
* If $d = \log_b a$, every level contributes equally: $O(n^d \log n)$
* Branching by $a$, shrinking by $b$, doing $O(n^d)$ work per subproblem
* Number of leaves: $n^{\log_b a}$
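A toy Python classifier, only to make the comparison of $d$ against $\log_b a$ concrete (`isclose` papers over float error in the equality case; this is a sketch, not part of the theorem):

```python
from math import isclose, log

def master(a, b, d):
    """Classify T(n) = a*T(n/b) + O(n^d)."""
    crit = log(a, b)                    # log_b(a)
    if isclose(d, crit):
        return f"Theta(n^{d} log n)"    # all levels contribute equally
    if d > crit:
        return f"Theta(n^{d})"          # root level dominates
    return f"Theta(n^log_{b}({a}))"     # leaves dominate

# master(4, 2, 1) -> leaves dominate: Theta(n^2)
# master(3, 2, 1) -> leaves dominate: Theta(n^log_2(3))
# master(2, 2, 1) -> Theta(n^1 log n), i.e. mergesort
```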
## Matrix Multiplication
* Recurrence: $T(n) = 8T(\frac{n}{2}) + O(n^2)$
* The Master theorem gives $O(n^3)$
* Strassen divides into subproblems so that 7 multiplications suffice instead of 8: $T(n) = 7T(\frac{n}{2}) + O(n^2)$
* The Master theorem gives $O(n^{\log_2 7})$
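A sketch of the 7-multiplication scheme (these are Strassen's standard products, which the notes don't list), assuming square numpy matrices whose side is a power of two:

```python
import numpy as np

def strassen(A, B):
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2                              # split into 4 blocks each
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    P1 = strassen(A11 + A22, B11 + B22)     # 7 recursive products
    P2 = strassen(A21 + A22, B11)
    P3 = strassen(A11, B12 - B22)
    P4 = strassen(A22, B21 - B11)
    P5 = strassen(A11 + A12, B22)
    P6 = strassen(A21 - A11, B11 + B12)
    P7 = strassen(A12 - A22, B21 + B22)
    return np.block([[P1 + P4 - P5 + P7, P3 + P5],
                     [P2 + P4, P1 - P2 + P3 + P6]])
```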
# Lecture 3: 9/4/19
## Recursive Mergesort
* Splitting is $O(n)$; merging two sorted halves is also $O(n)$
* $T(n) = 2T(\frac{n}{2}) + O(n)$
### Iterative Mergesort
* Bottom up, use queues
* Make each element into a singleton list, put the lists into a queue
* Merge the first 2 lists, put the result at the end of the queue
* Each element touched once in a pass through so $O(n)$ time
* Each pass halves the number of lists
* $O(\log n)$ passes so $O(n \log n)$ total
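A Python sketch of the queue-based version described above:

```python
from collections import deque

def merge(a, b):
    """Merge two sorted lists in O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def mergesort_iterative(xs):
    """Singleton lists go into a queue; merge the front two, requeue."""
    q = deque([x] for x in xs)
    if not q:
        return []
    while len(q) > 1:
        q.append(merge(q.popleft(), q.popleft()))
    return q[0]
```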
## Sorting lower bound
* Comparison sort requires $\Omega (n \log n)$ comparisons
* Proof
* Input: $a_1, a_2, \ldots a_n$
* Represent output as permutation of $[1, \ldots ,n]$
* $n!$ possible outputs, so the algorithm must be able to output any of the $n!$ permutations
* After a sequence of comparisons, the algorithm terminates with exactly 1 consistent permutation
* Each comparison splits the set $S$ of still-consistent permutations: the survivors are $S_1$ if $a_i > a_j$, else $S_2$
* $S_1 \cup S_2 = S \implies \max(|S_1|, |S_2|) \geq |S|/2$
* Need at least $\log_2 n!$ comparisons to get 1 permutation
* $n! \geq (\frac{n}{e})^n \implies \log n! = \Omega (n \log n)$
## Median finding
* Find median element of a set of elements: $a_1, \ldots, a_n$
* Finding average:
* Takes $O(n)$ time
* Finding median:
* Sort and find median: $O(n \log n)$ time
* Better approach: selection
* Select the kth smallest item
* Median: select the $(\lfloor n/2 \rfloor + 1)$th smallest
* Define Select(k, S)
* Pseudocode
* Base case: $|S| = 1$ (so $k = 1$), return the single element
* Choose a random pivot from S
* Form $S_L$ (elements $< v$), $S_v$ (elements $= v$), $S_R$ (elements $> v$)
* If $k \leq |S_L|$, Select (k, $S_L$)
* elseif $k \leq |S_L| + |S_v|$, return v
* else Select$(k - |S_L| -|S_v|, S_R)$
* Worst case runtime: $\Theta(n^2)$
* Expected runtime: a random pivot lands in the middle half with probability $\frac{1}{2}$, so
* $T(n) \leq T(\frac{3}{4}n) + O(n) = O(n)$ in expectation (see the sketch below)
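A direct Python transcription of the Select pseudocode above, with a random pivot and the three-way partition $S_L, S_v, S_R$:

```python
import random

def select(k, S):
    """Return the k-th smallest element of list S (1-indexed)."""
    if len(S) == 1:
        return S[0]                       # base case: k = 1, |S| = 1
    v = random.choice(S)                  # random pivot
    SL = [x for x in S if x < v]
    Sv = [x for x in S if x == v]
    SR = [x for x in S if x > v]
    if k <= len(SL):
        return select(k, SL)
    elif k <= len(SL) + len(Sv):
        return v
    else:
        return select(k - len(SL) - len(Sv), SR)
```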
### Deterministic Selection
* Instead of a random pivot, deterministically find an element guaranteed to be near the middle (median of medians)
* Recurrence: $T(n) \leq T(\frac{n}{5}) + T(\frac{7n}{10}) + O(n) = O(n)$
# Lecture 4: 9/6/19
## Polynomial Review
* $P(x) = p_0 + p_1x + p_2 x^2 + \ldots p_d x^d$
* Representation: $<p_0 \ldots p_d>$, $x \in \mathbb{R}$
* Compute $p(\alpha) = \sum\limits^d_{i=0} p_i \alpha^i$
* Recall that $d+1$ points uniquely define a degree $d$ polynomial
## Horner's Method
* For $p(x) = 1 + 2x + 3x^2 + 4x^3$: $p(\alpha) = 1 + \alpha(2 + \alpha(3 + \alpha \cdot 4))$
* In general, $p(\alpha) = p_0 + \alpha(p_1 + \alpha(p_2 + \ldots))$: $d$ multiplications and $d$ additions (see the sketch after the table below)
* Addition:
* Given $p = <p_0 \ldots p_d>$, $q = <q_0 \ldots q_d>$
* $(p+q)(x) = <p_0 + q_0, p_1 + q_1, \ldots>$
* $\Theta(n)$ time
* Can also add pointwise in the evaluation representation, at $n+1$ points
* Multiplication:
* $p(x)*q(x)$
* Naive alg: Multiply all pairs and simplify
* $\Theta(d^2)$ time
* Goal: $\Theta(d \log d)$
* Using evaluations:
* $pq = <p(\alpha_1)q(\alpha_1), \ldots , p(\alpha_n)q(\alpha_n)>$ (need $n \geq 2d+1$ points, since $pq$ has degree $2d$)
* This takes $\Theta(n)$ time
| | Coefficients | Evaluations |
|---|---|---|
| Evaluation | $O(d)$ | — |
| Addition | $O(d)$ | $O(n)$ |
| Multiplication | $O(d^2)$ | $O(n)$ |
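The Horner evaluation referenced above, as a Python sketch (evaluation in the coefficient representation):

```python
def evaluate(p, alpha):
    """Evaluate p(x) = p[0] + p[1]x + ... + p[d]x^d at alpha with
    d multiplications and d additions (Horner's method)."""
    result = 0
    for coeff in reversed(p):
        result = coeff + alpha * result
    return result

# evaluate([1, 2, 3, 4], 2) == 1 + 2*2 + 3*4 + 4*8 == 49
```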
## Fast Polynomial Multiplication
* Convert coefficient form to evaluations
* $\Theta(d \log d)$
* Multiply the corresponding set of $n$ evaluations
* $\Theta(n)$
* Find coefficients of $p(x)q(x)$
* $\Theta(d \log d)$
### Convert from Coefficients to Evaluations
* Split
* $p_{odd}(z) = p_1 + p_3 z + p_5 z^2 + \ldots$
* $p_{even}(z) = p_0 + p_2 z + p_4 z^2 \ldots$
* $p(x) = p_{even}(x^2) + x * p_{odd}(x^2)$
* Evaluate
* $(p_{odd}, \{\alpha_1^2, \ldots \alpha_n^2\})$
* $(p_{even}, \{\alpha_1^2, \ldots \alpha_n^2\})$
* $\forall i = 1, \ldots n$
* $p(\alpha_i) = p_{even}(\alpha_i^2) + \alpha_i p_{odd}(\alpha_i^2)$
* $T[n,d] = 2T[n, d/2] + \Theta(n) = \Theta(nd)$ — no savings yet, because the set of $n$ evaluation points never shrinks
* $T[n, 1] = \Theta(n)$
* Fix: choose the points in $\pm$ pairs, so squaring halves the point set (roots of unity, next lecture)
# Lecture 6: 9/11/19
## FFT
* $A(x) = A_e(x^2) + x(A_o(x^2))$
* Roots of Unity
* $n = 2^k$
* Points are $1, \omega, \omega^2, \ldots, \omega^{n-1}$ where $\omega = e^{2\pi i/n}$
### Complex numbers
* Using a polar representation, multiply 2 numbers as follows:
* $(r_1r_2)e^{i(\theta_1 + \theta_2)} = (r_1r_2, \theta_1 + \theta_2)$
* $e^{\pi i} = -1$
* The point at angle $\theta$ pairs with the one at angle $\theta + \pi$: they are negatives of each other, so they have the same square
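A recursive FFT sketch of the $A_e/A_o$ split above, assuming the coefficient list's length is a power of two:

```python
import cmath

def fft(a):
    """Evaluate the polynomial with coefficients a at all n-th roots
    of unity, where n = len(a) is a power of two."""
    n = len(a)
    if n == 1:
        return a
    even = fft(a[0::2])                        # A_e at the (n/2)-th roots
    odd = fft(a[1::2])                         # A_o at the (n/2)-th roots
    out = [0] * n
    for j in range(n // 2):
        w = cmath.exp(2j * cmath.pi * j / n)   # omega^j
        out[j] = even[j] + w * odd[j]          # A(omega^j)
        out[j + n // 2] = even[j] - w * odd[j] # A(-omega^j)
    return out
```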
## Graphs
* 4 color theorem
* Graph representations
* Matrix representation
* Adjacency list
* Search (explore)
* Use a stack
* Once you visit, set the visited[v] to true
* for each edge (v,w) in E
* if not visited[w] explore (w)
* Property:
* All and only nodes reachable from $a$ are visited
* Only: when u is visited, the stack contains the nodes of a path from a to u
* All: if a node u were reachable but unvisited, some edge on the a–u path would cross from a visited node to an unvisited one and would have been explored
* Proof by contradiction
* Runtime
* Let $n = |V|, m = |E|$
* A recurrence like $T(n,m) \leq d \cdot T(n-1,m) + O(d)$ (for degree $d$) is too loose
* Don't use a recurrence; count the work directly
* Every node is visited at most once: $O(n)$
* Each edge is processed twice (once from each endpoint), so $O(m)$
* Total: $O(n + m)$
# Lecture 7: 9/13/19
## Explore
* Set visited[v] = true
* previsit(v)
* for each edge (v,w) in E
* if not visited[w], explore(w)
* postvisit(v)
* Runtime: $O(V + E)$
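A runnable version of explore/DFS over an adjacency-list dict (`G = {v: [neighbors]}` is an assumed representation), with the pre/post clock from the next section standing in for previsit/postvisit:

```python
def dfs(G):
    """Full DFS; returns pre/post clock values for every vertex."""
    visited, pre, post = set(), {}, {}
    clock = 1
    def explore(v):
        nonlocal clock
        visited.add(v)
        pre[v] = clock; clock += 1     # previsit
        for w in G[v]:
            if w not in visited:
                explore(w)
        post[v] = clock; clock += 1    # postvisit
    for v in G:
        if v not in visited:
            explore(v)
    return pre, post
```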
## Previsit and Postvisit
* Number the nodes with their connected component number (ccnum)
* Can put a clock on each pre/post call to count calls
* The clock ends at $2|V|$: one previsit and one postvisit per vertex
* For nodes $u, v$, the intervals [pre(u), post(u)] and [pre(v), post(v)] are either disjoint (never on the stack at the same time) or nested (one is on the stack while the other enters and leaves)
* Due to stack properties
* Clock interval properties (Undirected)
* Tree edge iff $[pre[v],post[v]] \subseteq [pre[u],post[u]]$
* u on stack before v
* Back edge iff $[pre[u],post[u]] \subseteq [pre[v],post[v]]$.
* Creates a cycle
* No edge between $u,v$ if disjoint intervals
* Clock interval properties (Directed)
* Tree edge(u,v):
* Edge exists in the DFS tree
* Forward edge(u,v): int(u) contains int(v)
* Edge to a descendant
* Back edge(u,v): int(v) contains int(u)
* Edge to ancestor
* Cross edge (u,v): int(v) before int(u)
* v is already explored before u is visited
* A graph has a cycle iff DFS finds a back edge
* Run DFS: $O(V+E)$ time to check whether the graph is acyclic (see the sketch below)
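A sketch of the back-edge cycle test for a directed adjacency-list dict: a back edge points at a vertex that has been previsited but not yet postvisited (i.e., still on the recursion stack):

```python
def has_cycle(G):
    """Directed graph has a cycle iff DFS finds a back edge."""
    visited, finished = set(), set()
    def explore(v):
        visited.add(v)
        for w in G[v]:
            if w not in visited:
                if explore(w):
                    return True
            elif w not in finished:    # back edge: w is still on the stack
                return True
        finished.add(v)                # postvisit
        return False
    return any(explore(v) for v in G if v not in visited)
```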
## Topological Sort
* Every edge(u,v) in a DAG has post(u) > post(v)
* DAG has no back edges
* Output in reverse post order number
* Another algorithm:
* Source: node w/o incoming arcs
* Sink: Node w/o outgoing arcs
* Every DAG has at least 1 source and sink
* Find a source, output, repeat
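The reverse-postorder algorithm as a Python sketch (adjacency-list dict assumed):

```python
def topological_sort(G):
    """Output the vertices of a DAG in reverse postorder."""
    visited, order = set(), []
    def explore(v):
        visited.add(v)
        for w in G[v]:
            if w not in visited:
                explore(w)
        order.append(v)                # postvisit: append when finished
    for v in G:
        if v not in visited:
            explore(v)
    return order[::-1]                 # reverse postorder
```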
# Lecture 8: 9/16/19
## Directed Graphs
* [pre(u), post(u)] is the interval during which u is on the stack
* Older (containing) interval = ancestor
* No back edges $\iff$ no cycles
* Topological order - edge(u,v) means u before v
* Cannot do topological sort if cycles
* Reverse post-order is topological order
* Strongly connected - u and v are strongly connected when there is a path from u to v and a path from v to u
* Nodes are strongly connected to themselves
* Collapsing strongly connected components (SCCs) yields a DAG
* Every directed graph is a DAG of strongly connected components
* Sink component - no outgoing arcs
* Run explore on node in sink component
* Output visited nodes
* Repeat
* But, we don't know what the sink component is
* The node with the highest post order number is in a source component
* If C and C' are SCCs with an edge from C to C', highest post# of a node in C larger than post# of any node in C'
* Only true for highest post# node in C, not all
* Reversing edges makes the source component the sink component
## SCC Algorithm
* DFS on $G^R$ to compute post()
* The highest post # vertex v of $G^R$ is in a sink component of G
* Output nodes visited in explore(v)
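A sketch of the whole algorithm (Kosaraju-style, adjacency-list dict assumed): DFS the reverse graph for a postorder, then explore G in decreasing post order; each explore() emits one SCC:

```python
def scc(G):
    GR = {v: [] for v in G}            # reverse graph G^R
    for u in G:
        for v in G[u]:
            GR[v].append(u)
    visited, order = set(), []
    def explore(H, v, out):
        visited.add(v)
        for w in H[v]:
            if w not in visited:
                explore(H, w, out)
        out.append(v)                  # postorder
    for v in GR:
        if v not in visited:
            explore(GR, v, order)
    visited, components = set(), []
    for v in reversed(order):          # highest post number in G^R first
        if v not in visited:
            comp = []
            explore(G, v, comp)        # one explore = one SCC of G
            components.append(comp)
    return components
```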
# Lecture 9: 9/18/19
## Paths
* DFS - uses a stack, but does not find shortest paths
* BFS - use queue
* Alarm-clock algorithm
* Subdivide edges with unit-length dummy nodes; set an alarm for when each real node will be reached instead of stepping through the dummies
* Dijkstra's Algorithm: $O((V+E) \log V)$ with a binary heap
* O(V) deletemins, O(E) inserts/decreasekeys
* $O(V \log V + E)$ using a Fibonacci heap, a win for dense graphs
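A binary-heap Dijkstra sketch; Python's `heapq` has no decreasekey, so this pushes duplicate entries and skips stale ones on pop, which matches the $O((V+E)\log V)$ bound up to constants:

```python
import heapq

def dijkstra(G, s):
    """G: dict {u: [(v, length)]} with nonnegative lengths."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                    # stale heap entry
        for v, length in G[u]:
            if d + length < dist.get(v, float("inf")):
                dist[v] = d + length    # "decreasekey" = push a new entry
                heapq.heappush(pq, (dist[v], v))
    return dist
```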
## D-ary Heap
* Degree $d$, Depth $\log_d n$
* Insert/DecreaseKey: $O(\frac{\log n}{\log d})$
* Deletemin: $O(\frac{d \log n}{\log d})$
# Lecture 10: 9/20/19
## Dijkstra's
* Correctness by induction: each deletemin removes a vertex whose distance is already correct
## Bellman-Ford
* Negative weights
* Update edge $(u,v): d(v) = \min(d(v), d(u)+l(u,v))$
* If u is the second-to-last node on a shortest path to v and $d(u)$ is correct, the update on $(u,v)$ makes $d(v)$ correct
* Updates never make $d(v)$ too small
* Update all edges $|V| -1$ times
* Shortest paths using at most $k$ edges are correct after iteration $k$
* Time: $O(|V||E|)$
* This really isn't clever at all
* Can we do better..? Nobody knows
### Negative Cycles
* Assumes some shortest path uses at most $n-1$ edges
* True if there is no negative cycle: cycles only add length, unless the cycle is negative
* Shortest paths aren't well-defined when a negative cycle is reachable
* Detection: run one extra round of updates; if any $d(v)$ still decreases, there is a negative cycle (see the sketch below)
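A Bellman-Ford sketch over an edge list `[(u, v, length)]` (a representation chosen here for convenience), including the extra detection round:

```python
def bellman_ford(edges, s):
    """|V|-1 rounds of edge updates, then one round to detect a
    negative cycle."""
    vertices = {u for u, v, l in edges} | {v for u, v, l in edges}
    dist = {v: float("inf") for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices) - 1):
        for u, v, l in edges:
            dist[v] = min(dist[v], dist[u] + l)   # update edge (u, v)
    for u, v, l in edges:
        if dist[u] + l < dist[v]:       # still improving: negative cycle
            raise ValueError("negative cycle reachable from s")
    return dist
```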
## DAGs
* Dijkstra: Directed graph with positive edge lengths
* $O(M+N \log N)$
* Bellman-Ford: Directed graph with arbitrary edge lengths
* $O(MN)$ time
* $O(MN)$ time to detect a negative cycle
# Lecture 11: 9/23/19
## Tree Definitions
* $n-1$ edges, connected, no cycles, all pairs of vertices connected by unique path
## Minimum Spanning Trees
* With positive weights, a minimum-weight connected subgraph is automatically a tree; with negative edge weights, restrict to trees explicitly
* The MST is not a shortest-paths tree
* Cut Property
* Divide vertices into 2 sets arbitrarily.
* The minimum crossing edge for ANY cut is in the MST
* Kruskal's Algorithm
* Add the lowest weight edge to the tree as long as it doesn’t create a cycle
* Sort edges first
* Runtime: $O(E \log V)$
* Use disjoint set
## Union-Find Data Structure
* Pointer implementation $\pi(u)$
* makeset(x) - $\pi(x) = x$
* Make a singleton set containing x
* find(x) - returns root of pointer structure
* Finds set containing x
* union(x,y) - $\pi(\text{find}(x)) = \text{find}(y)$
* Merge sets containing x and y (add edge)
* Union by rank keeps trees $O(\log n)$ deep, so find/union run in $O(\log n)$ (see the sketch below)
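A sketch of the pointer implementation with union by rank (no path compression, matching the notes), plus Kruskal's loop on top of it:

```python
class UnionFind:
    def __init__(self):
        self.pi, self.rank = {}, {}
    def makeset(self, x):
        self.pi[x], self.rank[x] = x, 0     # singleton set
    def find(self, x):
        while self.pi[x] != x:              # walk up to the root
            x = self.pi[x]
        return x
    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:   # union by rank:
            rx, ry = ry, rx                 # shorter tree under taller
        self.pi[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

def kruskal(vertices, edges):
    """edges: [(weight, u, v)]. Add cheapest edges that don't close a cycle."""
    uf = UnionFind()
    for v in vertices:
        uf.makeset(v)
    mst = []
    for w, u, v in sorted(edges):           # sort edges first
        if uf.find(u) != uf.find(v):        # endpoints in different sets
            uf.union(u, v)
            mst.append((u, v, w))
    return mst
```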
# Lecture 12: 9/25/19
## Prim's Algorithm
* Find smallest edge to connect to existing MST
* Choose an arbitrary source node
* Repeatedly add the shortest edge to connect a new vertex
* Very similar to Dijkstra's except consider distance from entire tree (any vertex in tree) instead of from source
* Use a priority queue
* Time: $O((V+E) \log V)$
* Or $O(m + n \log n)$ using a Fibonacci heap
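A binary-heap Prim sketch, assuming an undirected adjacency-list dict `{u: [(v, weight)]}` with every edge listed in both directions:

```python
import heapq

def prim(G, s):
    """Grow the tree from s, always adding the lightest edge
    that reaches a new vertex."""
    in_tree, mst = {s}, []
    pq = [(w, s, v) for v, w in G[s]]       # candidate edges out of the tree
    heapq.heapify(pq)
    while pq:
        w, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                        # both endpoints already in tree
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in G[v]:
            if x not in in_tree:
                heapq.heappush(pq, (wx, v, x))
    return mst
```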
## Huffman Coding
* Symbols s with frequencies
* Prefix free code
* Morse can be ambiguous without pauses
* No letter code is a prefix of another
* Expected length of a fixed-length encoding of N chars (4 symbols, 2 bits each): 2N bits
* Expected length using a prefix-free code tuned to the frequencies: e.g. 1.9N bits
### Prefix Code Tree
* Binary tree with symbols at leaves
* Given a prefix free code $S = \{s_1, s_2 \ldots s_n \}$, symbols $\{c_1, \ldots, c_n\}$
* Left: All strings that start with 0, remove first bit and recurse
* Right: All strings that start with 1, remove first bit and recurse
* Every symbol/codeword corresponds to a leaf
* Want $\min \sum\limits_i f_i \cdot (\text{depth of } i)$
* Cost: $\sum\limits_s \text{depth}(s) \cdot \text{freq}(s)$
* Alternative cost: sum of frequencies of internal nodes (recursive)
* Algorithm: merge lowest frequency symbols, recurse
* Put all the nodes into a PQ and recursively merge lowest symbols
* Exchange argument $\implies$ exists optimal tree with this structure
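A sketch of the merge loop with a priority queue, assuming symbols are strings (the tiebreak counter keeps the heap from ever comparing tree nodes directly):

```python
import heapq

def huffman(freqs):
    """freqs: dict symbol -> frequency. Repeatedly merge the two
    lowest-frequency nodes; returns a codeword dict."""
    if not freqs:
        return {}
    # (frequency, tiebreak counter, node); internal nodes are (left, right)
    pq = [(f, i, s) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(pq)
    count = len(pq)
    while len(pq) > 1:
        f1, _, a = heapq.heappop(pq)        # two lowest-frequency nodes
        f2, _, b = heapq.heappop(pq)
        heapq.heappush(pq, (f1 + f2, count, (a, b)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse
            walk(node[0], prefix + "0")      # left subtree: append 0
            walk(node[1], prefix + "1")      # right subtree: append 1
        else:
            codes[node] = prefix or "0"      # leaf = symbol
    walk(pq[0][2], "")
    return codes
```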