# CS170 MT2 Notes

1. MT1 Notes can be found [here](https://hackmd.io/@matthewtang/B1js2XPrS)
2. MT2 Notes can be found [here](https://hackmd.io/@matthewtang/B1NCuHWdB)
3. Final Notes can be found [here](https://hackmd.io/@matthewtang/r1PACERcH)

# Lecture 13: 9/27/19

## Horn SAT
* SAT formula = satisfiability formula
    * Any formula with clauses that you must satisfy
* A Horn SAT formula consists of implications and negative clauses
    * Implications
        * An AND of positive literals implies 1 positive literal
        * $x \land y \implies z$
    * Negative clauses
        * An OR of negative literals
        * $\overline{u} \lor \overline{v}$
* Greedy solution: only assign a variable to true if you have to
    * Deal with negative clauses by "not looking at them" while satisfying the implications; at the end, leave all remaining variables false and check that every negative clause holds
* General SAT is NP-hard (no known polynomial-time algorithm)

## Set Cover
* Input: $B = \{1, 2, \ldots, n\}$ and sets $S_1, S_2, \ldots, S_m \subseteq B$
* Goal: Find the fewest sets whose union is $B$
* Greedy algorithm:
    * Choose the set $S_i$ that contains the most uncovered elements
    * Remove the elements of $S_i$ from all other sets
    * Repeat
    * Number of iterations = number of sets chosen
* Analysis: suppose the best solution uses $k$ sets
    * Those $k$ sets cover all remaining elements, so some set covers at least a $\frac{1}{k}$ fraction of them
    * So in iteration $t$: $n_{t+1} \leq n_t (1 - \frac{1}{k})$
    * Induction: $n_t \leq n_0 (1- \frac{1}{k})^t$
    * Using $(1-x) \leq e^{-x}$: we stop ($n_t < 1$) by $t = k \ln n$
    * So greedy uses at most $k \ln n$ sets, a $\ln n$ factor worse than the assumed optimal $k$
    * Greedy is not exact, but exact set cover is NP-hard, so an approximation is the best we can expect in polynomial time

## Disjoint Set Path Compression
* During a find, link every node on the find path directly to the root
* A single find is still $O(\log n)$ in the worst case
    * But it flattens the tree, so future finds are faster
* $m$ finds take $O(\log^* n)$ amortized time each
    * $\log^* n \triangleq$ number of times you apply $\log$ until the value reaches 1
        * Equivalently: how high a stack of 2s ($2^{2^{\cdot^{\cdot^{2}}}}$) is needed to reach $n$
* At most $O(n/2^r)$ nodes have rank $r$
* Group nodes by rank: $O(n)$ work per group (geometric sum), $\log^*$ groups. Total: $O(n \log^* n)$
* Each find path costs $O(\log^* n)$, so $m$ finds cost $O(m\log^* n)$
* Total cost: $O((m+n) \log^* n)$

# Lecture 14: 10/2/19
* Dynamic and linear programming are more broadly applicable than divide and conquer

## Longest Path in a DAG
* Input: an unweighted DAG
* Goal: Find the length of the longest path
* Assume $1, 2, \ldots, n$ is a topological order
* Define subproblems:
    * Let $L[i]$ be the length of the longest path ending at vertex $i$
    * $n$ subproblems, one per vertex
    * Longest path in DAG = $\max \{ L[1], L[2], \ldots, L[n]\}$
* Write a recurrence relation for the subproblems
    * e.g., $L[7] = \max$ over all predecessors $j \rightarrow 7$ of $L[j]$, plus 1 for the edge
* Compute $L[i]$ in topological/linearized order using the recurrence relation
* Algorithm:
    * Initialize $L[i] = 0, \forall i = 1, \ldots, n$
    * for $i = 1, 2, \ldots, n$:
        * $L[i] = \max\limits_{j \rightarrow i} \{L[j] + 1\}$
    * Return $\max\{L[1], L[2], \ldots, L[n]\}$
* Runtime: $O(V+E)$

## Longest Increasing Subsequence (LIS)
* Input: array $A$ of $n$ integers
* Goal: Find the length of the LIS
* Idea: Create a DAG with one vertex per index
    * Add an edge $i \rightarrow j$ if $i < j$ and $A[i] < A[j]$
    * Longest path in this DAG = LIS (see the sketch below)
* Runtime: $O(n^2)$
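A minimal Python sketch of this $O(n^2)$ DP; the DAG edges $i \rightarrow j$ are implicit in the inner loop, and the function name and test values are just illustrative:

```python=
def lis_length(A):
    """Length of the longest increasing subsequence of A in O(n^2)."""
    n = len(A)
    if n == 0:
        return 0
    # L[j] = length of the LIS ending at index j (at least the element itself)
    L = [1] * n
    for j in range(n):
        for i in range(j):
            if A[i] < A[j]:          # implicit DAG edge i -> j
                L[j] = max(L[j], L[i] + 1)
    return max(L)

assert lis_length([5, 2, 8, 6, 3, 6, 9, 7]) == 4   # e.g., 2, 3, 6, 9
```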
# Lecture 15: 10/4/19

## Knapsack Problem
* Input: set of $n$ items, each with a (weight, value) pair
* Goal: Find a subset of items that maximizes total value with total weight $\leq$ max weight $W$
* If repetition is allowed, this is the same as "longest path in a DAG"
* Attempt 1: if item A is in the optimal knapsack, combine A with the optimal knapsack for weight $W-w_A$ without A
    * But this doesn't prevent repetitions
* Attempt 2: $K[w, F]$ = optimal knapsack with total weight $\leq w$, where $F$ is a set of forbidden elements
    * Too many subproblems, because there are exponentially many sets of forbidden elements

## Knapsack Solution
* $K[w,i] =$ value of the optimal knapsack with total weight $\leq w$ that uses elements from $\{1, \ldots, i\}$
* $K[w,i] = \max(V_i + K[w-w_i, i-1],\ K[w, i-1])$
    * Max of the knapsack that contains item $i$ and the one that does not
* $K[W,n]$ is the answer
* The subproblems form a 2-dimensional DAG
* Calculate subproblems in a double for loop

### Pseudocode
```python=
def knapsack(W, w, V):
    """w[i], V[i] = weight and value of item i+1; capacity W."""
    n = len(w)
    # K[x][i] = best value with weight budget x using items 1..i
    # (initializes the x = 0 row and i = 0 column to 0)
    K = [[0] * (n + 1) for _ in range(W + 1)]
    # Note: you could switch the order of the loops
    for i in range(1, n + 1):
        for x in range(1, W + 1):
            K[x][i] = K[x][i - 1]          # skip item i
            if w[i - 1] <= x:              # take item i only if it fits
                K[x][i] = max(K[x][i], V[i - 1] + K[x - w[i - 1]][i - 1])
    return K[W][n]
```

# Lecture 16: 10/7/19

## Min Edit Distance
* Input: 2 strings $x[1 \ldots n]$, $y[1 \ldots m]$
* Goal: Min number of operations/keystrokes to edit $x$ into $y$
* Operations: delete, insert, substitute
* Subproblem: $E[i,j] =$ edit distance from $x[1 \ldots i]$ to $y[1 \ldots j]$
* Recurrence relation: $E[i,j] = \min \begin{cases} 1 + E[i, j-1] & & \text{Insertion}\\ 1 + E[i-1, j] & & \text{Deletion}\\ 1 + E[i-1, j-1] & x[i] \neq y[j] & \text{Substitution}\\ E[i-1, j-1] & x[i] = y[j] & \text{Carry over} \end{cases}$
* Answer: $E[n,m]$
* Runtime: $O(nm)$

### Pseudocode
```python=
def edit_distance(x, y):
    n, m = len(x), len(y)
    # E[i][j] = edit distance between x[1..i] and y[1..j]
    E = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        E[i][0] = i                      # delete all of x[1..i]
    for j in range(m + 1):
        E[0][j] = j                      # insert all of y[1..j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = (x[i - 1] == y[j - 1])
            E[i][j] = min(1 + E[i][j - 1],                     # insertion
                          1 + E[i - 1][j],                     # deletion
                          E[i - 1][j - 1] + (0 if same else 1))  # sub / carry over
    return E[n][m]
```

## Gambling
* Play $n=500$ games in a casino
* Game A: $\begin{cases} 1/2 & \text{Earn } 2\\ 1/2 & \text{Lose } 2 \end{cases}$
* Game B: $\begin{cases} 2/3 & \text{Earn } 5\\ 1/3 & \text{Lose } 5 \end{cases}$
* Goal: Succeed if you are up exactly \$170 after $n$ games
* Given $m$ money and $l$ games left, compute the optimal strategy
* $P[m, l] =$ probability of success for the optimal strategy starting with $m$ money and $l$ games left
* Base case: $P[m,0] = \begin{cases} 1 & m = 170\\ 0 & \text{everywhere else} \end{cases}$
* $P[m,l] = \max \begin{cases} \frac{1}{2}P[m+2, l-1] + \frac{1}{2}P[m-2, l-1] & \text{Game A}\\ \frac{2}{3}P[m+5, l-1] + \frac{1}{3}P[m-5, l-1] & \text{Game B} \end{cases}$
* Answer: Return $P[0, n]$
* Solve subproblems in increasing order of $l$

# Lecture 17: 10/14/19

## Min Triangulation
* Input: $n$ points that form a convex polygon
* Goal: Minimize the cost of a triangulation (total perimeter of all triangles)
* Subproblems:
    * Naive: one subproblem per subset of points is an exponential number of subproblems, which is too many
    * Instead, observe that any edge $(v_i, v_{i+1})$ must be in some triangle
        * Ex. $(1,2)$ can be a part of triangle $(1,2,n)$, or $(1,2,k)$ for any third vertex $k$
        * Each choice of the third vertex creates 2 other, smaller polygons
    * There are $O(n^2)$ subproblems
* $T[i,j] =$ min cost of triangulating the convex polygon formed by $(v_i, v_{i+1}, \ldots, v_j)$
* Answer: $T[1, n]$
* Recurrence relation: $T[i, j] = \min\limits_{i < k < j} (T[i, k] + T[k, j] + \text{Perimeter}(i, j, k))$

## Formulating Subproblems
* Strings:
    * Prefixes, suffixes
* 2 strings:
    * (Prefix of $x$, prefix of $y$)
    * (Prefix of $x$, suffix of $y$)
* Triangulation:
    * $n^2$ sections $i \rightarrow j$
* Trees:
    * Rooted subtrees
* TSP:
    * All possible subsets

# Lecture 18: 10/16/19

## All-Pairs Shortest Paths (DP)
* Input: Graph $G$ with edge weights
* Goal: Compute shortest paths between all pairs of vertices
* Subproblem: $D[i,j,k] =$ length of the shortest path from $i$ to $j$ that uses only vertices in $\{1, \ldots, k\}$ as intermediates
* Shortest path from $i$ to $j$: $D[i,j,n]$
* Recurrence relation: either vertex $k$ is on the shortest path or not (in which case every intermediate vertex lies in $\{1, \ldots, k-1\}$)
* $D[i,j,k] = \min \begin{cases} D[i,j, k-1] & \text{does not use vertex } k \\ D[i, k, k-1] + D[k, j, k-1] & \text{uses vertex } k \end{cases}$
* This is the Floyd-Warshall algorithm (see the sketch below)
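A minimal sketch of this recurrence; the adjacency-matrix input convention and function name are assumptions, and the $k$ dimension of the table is collapsed by updating $D$ in place:

```python=
from math import inf

def all_pairs_shortest_paths(W):
    """W[i][j] = weight of edge i -> j (inf if absent, 0 on the diagonal).
    Assumes no negative cycles; returns the matrix of shortest distances."""
    n = len(W)
    D = [row[:] for row in W]        # D[i][j] = best i -> j distance so far
    for k in range(n):               # now allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                # either avoid vertex k, or go i -> k then k -> j
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D

W = [[0, 2, inf],
     [inf, 0, 3],
     [1, inf, 0]]
assert all_pairs_shortest_paths(W)[0][2] == 5    # path 0 -> 1 -> 2
```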
## Linear Programming
* A family of optimization problems that can be solved efficiently
* Variables: $x_1, \ldots, x_n \in \mathbb{R}$
* Max/min a linear function subject to linear constraints
* Feasible region: the set of variable assignments that satisfy all constraints
* Fact 1: the feasible region is a convex polygon/polyhedron
* Fact 2: the optimum of a linear program occurs at a vertex of the polygon/polyhedron
* Fact 3: a corner/vertex of the polygon is given by an intersection of constraints
* Convex: a set of points $S \subseteq \mathbb{R}^d$ is convex if $\forall a,b \in S$, the entire segment between $a$ and $b$ is in $S$

## Coffee Shop
* 1 cappuccino = 3 coffee, 1 milk
* 1 latte = 2 coffee, 4 milk
* Available: 6 coffee, 8 milk
* Goal: Maximize the number of drinks (nonnegative real numbers) $= x + y$
    * $x$ = number of cappuccinos
    * $y$ = number of lattes
* Constraints: $3x+2y \leq 6$, $x + 4y \leq 8$, $x, y \geq 0$
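A minimal sketch of this LP in code; `scipy.optimize.linprog` is an outside tool (not something the lecture uses), and since it minimizes by convention we negate the objective:

```python=
from scipy.optimize import linprog

c = [-1, -1]            # maximize x + y  <=>  minimize -x - y
A_ub = [[3, 2],         # 3x + 2y <= 6  (coffee)
        [1, 4]]         #  x + 4y <= 8  (milk)
b_ub = [6, 8]
res = linprog(c, A_ub=A_ub, b_ub=b_ub)   # default bounds are x, y >= 0
print(res.x, -res.fun)  # vertex (0.8, 1.8), 2.6 drinks
```

The optimum $(0.8, 1.8)$ sits exactly at the intersection of the two constraints, illustrating Fact 2 above.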
# Lecture 19: 10/18/19

## Convex Optimization
* Draw the convex body of the feasible region (points satisfying the inequalities)
* **Isocline**: a hyperplane on which all points have the same objective value
* The optimum is at an intersection of constraint boundaries
* If there are $n$ variables and $m$ constraints, there are $\binom{m}{n}$ possible vertices ($n$ constraints define a point)
* The number of vertices is finite but can be exponential

## Simplex
* Start at a vertex, move to a better neighboring vertex
* To certify optimality, combine constraint inequalities into an upper bound on the objective function
    * Any nonnegative combination of constraints that dominates the objective upper-bounds its value
    * There is always such a way to prove optimality
    * Figuring out which constraints to combine is duality
* Simplex's runtime is not polynomial in the worst case, but it is fast in practice

## Carpet Production
* Demands: $d_1, d_2, \ldots, d_{12}$, each in the range 440-920
* 30 employees; each makes 20 carpets/month and is paid \$2000/month
* Overtime: 80% extra pay, at most 30% extra output per employee
* Hiring/firing costs: \$320/\$400 per worker
* Storage: \$8/carpet per month, no storage at the end of the year
* Variables:
    * $w_i$ - workers in month $i$ ($w_0 = 30$)
    * $x_i$ - carpets made in month $i$
    * $o_i$ - overtime carpets in month $i$
    * $h_i, f_i$ - workers hired/fired in month $i$
    * $s_i$ - carpets stored at the end of month $i$ ($s_{12}=0$)
* Constraints:
    * Nonnegative: $w_i, x_i, o_i, h_i, f_i, s_i \geq 0$
    * Production: $x_i = 20w_i + o_i$
    * Employment: $w_i = w_{i-1} + h_i - f_i$
    * Inventory: $s_i = s_{i-1} + x_i - d_i$
    * Regulations: $o_i \leq 6w_i$ (30% of 20 carpets)
* Objective: $\min 2000\sum_i w_i + 320 \sum_i h_i + 400 \sum_i f_i + 8 \sum_i s_i + 180 \sum_i o_i$

## Bandwidth
* A network of users A, B, C
* Variables:
    * $X_{AB}$ - flow along path AabB, and similarly for the other routes

## Reduction to Standard Form
* Maximization to minimization?
    * Multiply the objective function by $-1$
* Less than into greater than?
    * Multiply the constraint by $-1$
* Inequalities and equalities:
    * $\sum_i a_i x_i \leq b$ into an equality: add a slack variable
        * $\sum_i a_i x_i + s = b$, $s \geq 0$
    * $\sum_i a_i x_i = b$ into inequalities:
        * $\sum_i a_i x_i \leq b$ and $\sum_i a_i x_i \geq b$
* Simulate an unrestricted variable $x$ with nonnegative variables: $x = x^+ - x^-$ with $x^+, x^- \geq 0$
* Can also write the LP in matrix form

# Lecture 20: 10/21/19

## Duality
* Primal LP:
    * $\max c^T x$
    * $Ax \leq b$, $x \geq 0$
* Dual LP:
    * $\min y^T b$
    * $y^T A \geq c^T$
    * $y \geq 0$
* Strong duality: if a linear program has a bounded value, its dual is bounded with the same value
* Complementary slackness: given $A,b,c$ and feasible solutions $x, y$, both are optimal iff $x_i(c_i - (y^TA)_i) = 0$ for all $i$ and $y_j(b_j - (Ax)_j) = 0$ for all $j$

## Simplex Algorithm
* Start at the origin
* If there is a positive $c_i$, increase $x_i$ until you hit a new constraint
    * Try all constraints; see which one stops you first
* If all $c_i \leq 0$, increasing any $x_i$ can only decrease the objective, so the current vertex is optimal
* Once you leave the origin, the new coordinates are the distances from the newly tight constraints
    * For tight constraint $i$: $y_i = b_i - a_i \cdot x$, so the current vertex sits at the origin of the new $(y_1, y_2, \ldots)$ coordinates
* Rewrite the LP in the coordinate system of the tight constraints at each step
* Move along an improving direction until a new constraint becomes tight
* If all coefficients are negative, the vertex is optimal

## Origin Not Feasible
* Introduce a nonnegative variable $z_i$ for each inequality $i$
    * Constraints: $a_i \cdot x - z_i \leq b_i$
    * Objective: $\max \sum_i -z_i$ (the original LP is feasible iff the optimum is 0)
* Degenerate vertices: an intersection of more constraints than dimensions (e.g., more than 2 in the plane)
    * Bland's rule prevents cycling
    * Or just perturb the constraints a little bit
* Unboundedness: some improving direction never hits a constraint
* Runtime per step:
    * Check optimality: $O(n)$
    * Find tight constraints: $O(m)$ total
    * New coordinate system (rewriting the LP):
        * $y_i = b_i - a_i \cdot x$: solve for $x_i$ in terms of $y_i$
        * Only 1 new constraint to update
        * $O(mn)$ time to update the LP
* There can be exponentially many simplex steps, but it is fast in practice

### Dual Solution
* Read off the dual solution as the negations of the coefficients of the final rewritten objective function

# Lecture 21: 10/23/19

## Max Flow
* Input: flow network $G$, source $s$, sink $t \in V$, capacities $c_e > 0$
* Find a flow $f_e$:
    * Capacity constraints: $0 \leq f_e \leq c_e$
    * Conservation: if $u \neq s, t$, then $\sum_{(w,u)} f_{wu} = \sum_{(u,w)} f_{uw}$
    * Maximize size$(f) = \sum_{(s,u)} f_{su}$
* $S$-$T$ cut: its capacity is the total capacity of edges from $S$ to $T$
* Augment: find a path with remaining capacity, add flow to the variables along the path, update the remaining capacities
* An integer-valued optimal flow exists (for integer capacities)
* Residual capacity:
    * $r_{uv} = c_{uv} - f_{uv}$ for a forward edge; $r_{uv} = f_{vu}$ for a backward edge

# Lecture 22: 10/25/19

## Optimality of Max Flow
* $S$-$T$ cut: a partition of the vertices that splits $s$ from $t$
* An integer vertex solution exists
* Repeat: find an $s$-$t$ path with remaining capacity
    * Add flow along the path (creating backward residual arcs)
    * Update the remaining capacity (on the forward arcs)
* Optimality:
    * The capacity of any $S$-$T$ cut is an upper bound on the flow:
        * Flow out of $S$ = flow out of $s$
        * Flow into $T$ = flow into $t$
        * Flow from $S$ into $T$ = sum of flows across edges from $S$ to $T$
    * At termination, no path has residual capacity across the cut
    * The vertices a DFS from $s$ can reach in the residual graph (which exclude $t$) give a min cut
    * Max flow = min $S$-$T$ cut

## Edmonds-Karp
* Augment along a shortest augmenting path, found by BFS
* $O(VE)$ augmentations; the flow values do not matter in the runtime
* Each edge can be saturated and removed from the residual graph $O(V)$ times, for $O(VE)$ removals; each BFS costs $O(E)$, so $O(VE^2)$ total time

## Maximum Matching
* Input: bipartite graph $G(L, R, E)$
* Goal: Find the largest subset of 1-1 matches
* Reduces to max flow (see the sketch below):
    * Add a source $s$ on the left, a sink $t$ on the right
    * Give every edge capacity 1 and run max flow
* Direct solution: augmenting alternating paths
    * Start at an unmatched node
    * Alternately follow unmatched and matched edges
    * Repeat until another unmatched node is reached, then flip the path
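A minimal Edmonds-Karp sketch matching the description above; the dict-of-dicts capacity format is an assumption, and every edge must appear in both directions (with capacity 0 if one direction is absent):

```python=
from collections import deque

def max_flow(cap, s, t):
    """cap[u][v] = residual capacity of u -> v; mutated in place."""
    flow = 0
    while True:
        # BFS for a shortest s-t path with remaining capacity
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                  # no augmenting path: flow is maximum
        # bottleneck = smallest residual capacity along the path
        bottleneck, v = float('inf'), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # augment: push flow on forward arcs, add residual capacity backward
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck
```

For bipartite matching, wire $s$ to every left vertex and every right vertex to $t$, all with capacity 1, as in the reduction above.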
# Lecture 23: 10/28/19

## Strategic Games
* $N$ players; player $i$ has a strategy set $S_i$
* Vector-valued payoff function: $u(s_1, \ldots, s_N)$
* Nash equilibrium: no player has an incentive to unilaterally change strategies

## Zero Sum Games
* $m \times n$ payoff matrix $A$, where the row player plays $i$ and the column player plays $j$
* Row mixed strategy: $x = (x_1, \ldots, x_m)$
* Column mixed strategy: $y = (y_1, \ldots, y_n)$
* Payoff: $p(x,y) = x^TAy = \sum_i \sum_j x_i a_{ij} y_j$
* Row maximizes, column minimizes
* Equilibrium pair $(x^*, y^*)$:
    * No better row strategy against the column player and vice versa
    * The best row response to $y^*$ is $\max\limits_i A^{(i)} \cdot y^*$, achieved by some pure strategy
* $R = \min\limits_y \max\limits_x(x^TAy)$
    * The row player sees the column strategy, so the row player has the advantage
* $C = \max\limits_x \min\limits_y(x^TAy)$
    * The column player sees the row strategy, so the column player has the advantage
* Strong duality: $R = C$, so it doesn't matter who plays first at the equilibrium point
* Mixed strategy: each player plays a distribution over strategies
* Pure strategy: each player plays a single strategy
* At equilibrium, both players put weight only on optimal strategies (like complementary slackness)
* Use a probability distribution to limit the opponent's payoff

## Zero Sum Games LP
* Row player maximizes the payoff it can guarantee:
    * $\max z$
    * $\forall j$: $(x^T A)_j \geq z$ (every column response pays the row player at least $z$)
    * $\sum\limits_i x_i = 1$, $x_i \geq 0$
* Column player minimizes the payoff it must concede:
    * $\min z$
    * $\forall i$: $a^{(i)} \cdot y \leq z$ (every row response pays at most $z$)
    * $\sum\limits_j y_j = 1$, $y_j \geq 0$
* These two LPs are duals of each other, which is why $R = C$

## Circuit Evaluation
* Input: DAG of boolean gates: AND, OR, NOT
* Variables: $0 \leq x_g \leq 1$ for each gate $g$
* Create inequalities that force the gates to work correctly
    * e.g., for an AND gate $g$ with inputs $h, h'$: $x_g \leq x_h$, $x_g \leq x_{h'}$, $x_g \geq x_h + x_{h'} - 1$

# Lecture 24: 10/30/19

## Expert Predictions Problem
* $n$ experts, each offers a prediction each day
* Alg 1: choose one of the (supposedly) perfect experts and follow it; switch when it errs
    * Mistake bound: $n-1$
* Alg 2: go with the majority of the previously correct experts
    * We don't need to find the perfect expert
    * Mistake bound: $\log n$
        * The number of perfect-so-far experts drops by a factor of 2 whenever the algorithm makes a mistake
* In general we don't always have an infallible expert

## Multiplicative Weights Algorithm
* Penalize inaccurate experts (a code sketch follows at the end of this section):
    * 1) Initialize $w_i = 1$
    * 2) Predict with the weighted majority of the experts
    * 3) $w_i \rightarrow w_i/2$ if expert $i$ was wrong
* Suppose the best expert makes $m$ mistakes
* Potential function: $\sum_i w_i$, initially $n$
* For the best expert $b$: $w_b \geq \frac{1}{2^m}$
* For each algorithm mistake, every incorrect expert's weight is multiplied by $\frac{1}{2}$
    * The incorrect experts formed the weighted majority, so their total weight was at least half the potential, and halving it removes at least $\frac{1}{4}$ of the total
    * Mistake $\rightarrow$ potential function decreases to at most $\frac{3}{4}$ of its value
* $$ \frac{1}{2^m} \leq \sum_i w_i \leq n\left( \frac{3}{4}\right)^M$$
    * $m$ - best expert's mistakes, $M$ - algorithm's mistakes
* Take the log of both sides and solve for $M$:
    * $M \leq (m + \log n)/ \log(4/3) \leq 2.4(m + \log n)$
* Using a $(1-\epsilon)$ penalty instead of $\frac{1}{2}$:
    * $M \leq 2(1 + \epsilon)m + \frac{2 \ln n}{\epsilon}$
    * Approaches a factor of 2 of the best expert's performance as $m \rightarrow \infty$
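A minimal sketch of the halving version above for binary predictions; the input encoding (lists of 0/1 predictions per day) is an assumption:

```python=
def weighted_majority(daily_predictions, outcomes):
    """daily_predictions[t][i] in {0,1} = expert i's call on day t;
    outcomes[t] in {0,1} = the truth. Returns the algorithm's mistakes."""
    n = len(daily_predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, truth in zip(daily_predictions, outcomes):
        # predict with the weighted majority of the experts
        vote_one = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote_zero = sum(wi for wi, p in zip(w, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != truth:
            mistakes += 1
        # halve the weight of every expert that was wrong today
        w = [wi / 2 if p != truth else wi for wi, p in zip(w, preds)]
    return mistakes
```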
## Randomization
* Suppose $A$ is correct on odd days and $B$ is correct on even days
    * The deterministic algorithm never converges (it can be wrong every day)
* Fix: choose expert $i$ with probability $\propto w_i$
* Useful inequality: for $\epsilon \leq 1$, $x \in [0,1]$:
    * $(1-\epsilon)^x \leq 1-\epsilon x$
* Setup: for $\epsilon \in [0, \frac{1}{2}]$, expert $i$ loses $l_i^t \in [0,1]$ in round $t$
* Algorithm:
    * 1) $w_i = 1$ for each expert $i$
    * 2) Choose expert $i$ with probability $\frac{w_i}{W}$, where $W = \sum_i w_i$
    * 3) $w_i \leftarrow w_i(1-\epsilon)^{l_i^t}$
* Analysis:
    * $W(0) = n$
    * If the best expert $b$ loses $L^*$ in total, then $W(t) \geq w_b \geq (1-\epsilon)^{L^*}$
    * $L_t = \sum_i \frac{w_i l_i^t}{W}$ is the expected loss of the algorithm in round $t$
    * For $\epsilon \leq \frac{1}{2}$: $W(t+1) \leq W(t)(1-\epsilon L_t)$
        * Any expected loss forces the total weight down
    * The resulting bound no longer carries the factor of 2
* The same framework handles gains instead of losses, and losses outside $[0,1]$ after rescaling

## Other Models
* What if you only observe the advice of the expert you pick?
    * Bandit problem (exploration vs. exploitation)
    * Reinforcement learning

## Optimization
* Minimize a convex function $f(x)$ over $x \in \mathbb{R}^n$
    * $x^*$ is the minimizer of $f(x)$
* Gradient descent: repeatedly step against the gradient (see the sketch at the end of these notes)

# Lecture 25: 11/1/19
* For any $\epsilon$, multiplicative weights finds an $\epsilon$-approximate equilibrium of a zero-sum game
* Complexity: $T = \frac{\ln n}{\epsilon^2}$ rounds, for $O(nm\frac{\log n}{\epsilon^2})$ total work
    * Basically linear in the size of the game!
    * Solving the LP exactly would cost around $O(n^3m)$
* Boosting: use a weak learner to produce a strong learner via multiplicative weights
    * The experts are the data points; the adversary is the weak learner
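To close out the gradient descent pointer above, a minimal sketch; the fixed step size and the quadratic example are illustrative assumptions, not from lecture:

```python=
def gradient_descent(grad_f, x0, eta=0.1, steps=1000):
    """Minimize a convex f by repeatedly stepping against its gradient.
    grad_f(x) returns the gradient at x; eta is a fixed step size."""
    x = list(x0)
    for _ in range(steps):
        g = grad_f(x)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x

# f(x, y) = (x - 3)^2 + (y + 1)^2 has its minimum at (3, -1)
grad = lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)]
print(gradient_descent(grad, [0.0, 0.0]))   # approaches [3.0, -1.0]
```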