# [Public] Multiplicative weight update and applications
By Danupon Nanongkai
Notes:
* While I tried to make this note self-contained, its main purpose is to supplement the lecture.
* I usually start with precise mathematical statements, followed by the discussions. If you are lost in the math, look at the discussions for some help.
## Notations
* $\mathbb{R}_+$ and $\mathbb{R}_{\geq 0}$ denote the sets of positive and non-negative reals respectively.
* For every vector $x\in \mathbb{R}^n$, $x_i$ is the $i^{th}$ coordinate of $x$.
* A probability distribution is a vector $p\in [0,1]^n$ such that $\sum_{i=1}^n p_i=1$.
* **Vectors on graphs:** For any graph $G=(V, E)$, let $\mathbb{R}^V$ be the set of $|V|$-dimensional vectors whose coordinates correspond to vertices in $G$. For any $x\in \mathbb{R}^V$ and $v\in V$, $x_v$ denotes the real number corresponding to vertex $v$. Similarly, we can define $\mathbb{R}^E$, $[0,1]^V$, $[0,1]^E$, etc.
* For any $x\in \mathbb{R}$, let $\exp(x)=e^x$, where $e\approx 2.71828$ is Euler's number.
* Let $\langle u, v\rangle$ be the inner product of two vectors $u$ and $v$, i.e. $\langle u, v\rangle=\sum_i u_iv_i$. We may write $u\cdot v$ instead of $\langle u, v\rangle$.
## 1. The MWU algorithm and its guarantee
*Comment:* The algorithm and theorem below will start to make more sense when we discuss their interpretation. Think of them as mathematical statements for now. This section will be useful for future reference.
---
### 1.1 Algorithm MWU($n,T,\epsilon, \rho, \{g^t\}_{t=1}^T$)
*Inputs:*
* Positive integers $n$ and $T$.
* Reals $0<\epsilon\leq 1/2$ and $\rho>0$.
* Vectors $g^1, \ldots, g^T$ in $[-\rho, \rho]^n$.
* Comment: We use "$g$" as these vectors will be called "gains" later. Note that $t$ is an index, not a power.
*Outputs:* Probability distributions $p^t\in [0,1]^n$ for $t=1, 2, ..., T$.
**Algorithm description:**
* Let $w^1\in \mathbb{R}_+^n$ be $(1,1,1,...)$
* For $t=1, 2, ..., T$:
1. Let $p_i^t=\frac{w_i^t}{\sum_j w_j^t}$
2. For all $i$, $w^{t+1}_i\leftarrow w_i^t \cdot \exp(\epsilon g^t_i/\rho)$.
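For concreteness, here is a minimal Python sketch of the algorithm above (the function name `mwu` and the list-based representation are my own choices):

```python
import math

def mwu(n, T, eps, rho, gains):
    """A sketch of Algorithm MWU. gains[t] is the gain vector g^{t+1} in
    [-rho, rho]^n; returns the probability distributions p^1, ..., p^T."""
    w = [1.0] * n                                      # w^1 = (1, 1, ..., 1)
    ps = []
    for t in range(T):
        total = sum(w)
        ps.append([wi / total for wi in w])            # p^t_i = w^t_i / sum_j w^t_j
        w = [w[i] * math.exp(eps * gains[t][i] / rho)  # w^{t+1}_i = w^t_i * exp(eps g^t_i / rho)
             for i in range(n)]
    return ps
```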
---
### 1.2 Guarantee
**Definitions:**
* $G:=\sum_t g^t$; i.e., $G_i=\sum_t g_i^t$ for every $i$.
* $G_*:=\max_i G_i = \max_i \sum_t g_i^t$.
* $G_A:=\sum_{t=1}^T \langle p^t, g^t\rangle$.
* "$A$" stands for "average". This word will make more sense in the interpretation below.
**Theorem 1.** Suppose $g^1, \ldots, g^T\in [0,\rho]^n$. For any $T\geq \frac{\rho\ln(n)}{\epsilon^2}$, we have $\frac{G_*}{T}\leq (1+\epsilon) \frac{G_A}{T}+\epsilon$.
---
### 1.3 Comments (can skip)
There are many other guarantees, which you may notice when reading elsewhere. Today we will focus only on Theorem 1 to avoid confusion. To give you an idea, below are some guarantees when $g^1, \ldots, g^T\in [-\rho, \rho]^n$ (instead of $[0,\rho]^n$ as required by Theorem 1):
**Theorem a.** For $\rho=1$ and any $T$, we have $G_*\leq G_A+\epsilon T + \frac{\ln n}{\epsilon}.$
**Corollary b.** For $\rho=1$ and any $T$, we have $G_*\leq G_A+2\sqrt{T\ln n}.$ (The term $2\sqrt{T\ln n}$ is also known as *regret*.)
Observe that all guarantees are in the same spirit: they bound the ratio or difference between $G_*$ and $G_A$. There are also guarantees that consider $G_{\min}:=\min_i G_i$ instead of $G_*$ ($=\max_i G_i$). For example:
**Theorem a'.** For $\rho=1$ and any $T$, we have $G_A\leq G_{\min}+\epsilon T + \frac{\ln n}{\epsilon}$.
---
**How I view the MWU algorithm when used in applications:**
The MWU algorithm interacts with another algorithm $A$. We will use a different $A$ in different applications.
During the interaction, the MWU algorithm generates probability distributions $p^t\in [0,1]^n$ for $t=1, 2, ..., T$ and $A$ generates $g^1, g^2, ..., g^T$ in $[0,\rho]^n$. (MWU also generates vectors $w^t\in \mathbb{R}_+^n$ but this is only to generate $p^t$.)
* Initially, $p^1=(1/n,1/n, ...)$.
* For any $t$, based on $p^1, \ldots, p^{t}$, algorithm $A$ returns $g^t$.
* Based on $g^t$, a probability distribution $p^{t+1}$ is computed by some "weighted majority" rule. In most applications, knowing this rule is not important, and we only need to know that each $p^t$ can be generated in $O(n)$ time.
Theorem 1 gives an *upper bound* of "the max of $g$" $G_*:=\max_i \sum_t g_i^t$. The upper bound is in the form of "the average" $G_A:=\sum_{t=1}^T \langle p^t, g^t\rangle.$
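A minimal Python sketch of this interaction, with a callable `A` standing in for the other algorithm (the names `mwu_interact` and `A` are my own):

```python
import math

def mwu_interact(n, T, eps, rho, A):
    """MWU interacting with an algorithm A, where A(p) returns a gain vector
    g^t in [0, rho]^n in response to the current distribution p^t."""
    w = [1.0] * n
    ps, gs = [], []
    for _ in range(T):
        total = sum(w)
        p = [wi / total for wi in w]   # in the first round, p^1 = (1/n, ..., 1/n)
        g = A(p)                       # A responds to p^t with g^t
        ps.append(p)
        gs.append(g)
        w = [w[i] * math.exp(eps * g[i] / rho) for i in range(n)]
    return ps, gs
```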
---
## 2. Interpretation as online decision making
*Comments:* I will discuss two views, starting with a non-conventional one (that fits my own taste better). Any of these or other views that make you feel comfortable with Theorem 1 will suffice.
---
### 2.1 Investment Diversification
At the beginning of every month, you want to invest 1 coin (or, say, a thousand Kr) across $n$ investment options. At the end of the month, the $i^{th}$ option gives you some **gain** in the range $[0, \rho]$ in return. How can you best invest your money?
* **Strategy 1 (fix an option):** One strategy is to study all $n$ options hard, pick the best one, and invest in it every month. If $g^t_i$ denotes the return from the $i^{th}$ option in the $t^{th}$ month, then the best you can get from this strategy is $G_*=\max_i \sum_t g^t_i$.
* **Strategy 2 (diversify):** Another strategy is to invest the same amount, i.e. $1/n$ coin, in every option in the first month. In the following months, we invest more in options with higher returns and less in the others. But how much more?
The MWU algorithm and Theorem 1 tell us that there is a way to adjust our investment so that over many months, the return from the second strategy is not bad compared to the first strategy.
More formally, if we invest $p_i^t$ coin in the $i^{th}$ option in the $t^{th}$ month, then after $T\geq \frac{\rho\ln(n)}{\epsilon^2}$ months, we have $\frac{G_*}{T}\leq (1+\epsilon) \frac{G_A}{T}+\epsilon$. That is, the average monthly return from the first strategy ($\frac{G_*}{T}$) is not much more than the average monthly return from the second strategy, which is $\frac{G_A}{T}=\frac{1}{T}\sum_{t=1}^T \langle p^t, g^t\rangle$.
### 2.2 Randomly following experts
Consider when there are $n$ experts giving advice each day. On each day we can follow one of the experts. On day $t$, if we follow expert $i$ (e.g. in buying stocks, betting on sports, etc.), then we get a gain of $g_i^t$ at the end of the day. Which expert should we follow?
One strategy is this. On the first day, since we don't know which experts are good or bad, we just follow one at random. Our *expected* gain would be $\frac{1}{n} \sum_i g^1_i = \langle p^1, g^1\rangle$ (recall that $p^1=(1/n, 1/n, ...)$). On the following days, we use the MWU algorithm to suggest how we should follow the experts, i.e. on day $t$, we follow expert $i$ with probability $p^t_i$. Then, our expected gain on day $t$ is $\langle p^t, g^t\rangle$. Thus, our expected total gain from this strategy is $G_A=\sum_{t=1}^T \langle p^t, g^t\rangle$.
How good is this strategy? Theorem 1 says that this strategy is not bad *compared to the best expert*. Observe that the best expert would gain $G_*=\max_i \sum_t g^t_i$. Theorem 1 says that after $T\geq \frac{\rho\ln(n)}{\epsilon^2}$ days, we have $\frac{G_*}{T}\leq (1+\epsilon) \frac{G_A}{T}+\epsilon$; i.e. the averaged daily gain of the best expert is not much more than our MWU strategy, except some $(1+\epsilon)$ multiplicative and $\epsilon$ additive errors.
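To see Theorem 1 in action, here is a toy simulation of the expert setting, reusing `mwu_interact` from Section 1.3 (the uniformly random gains are purely illustrative, not part of the lecture):

```python
import math, random

def experts_demo(n=10, eps=0.1, rho=1.0, seed=0):
    random.seed(seed)
    T = math.ceil(rho * math.log(n) / eps ** 2)
    # Each day, every expert receives an arbitrary gain in [0, rho] (here: random).
    A = lambda p: [random.uniform(0, rho) for _ in range(n)]
    ps, gs = mwu_interact(n, T, eps, rho, A)
    G_star = max(sum(g[i] for g in gs) for i in range(n))   # best single expert
    G_A = sum(sum(pi * gi for pi, gi in zip(p, g)) for p, g in zip(ps, gs))
    # Theorem 1: G_*/T <= (1+eps) * G_A/T + eps
    print(f"G_*/T = {G_star / T:.3f}  vs  bound {(1 + eps) * G_A / T + eps:.3f}")
```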
### 2.3 Discussions
**Online Decision Making:** The MWU algorithm is useful in a number of situations that involve *online decision making*, where we have to choose an action (e.g. which expert to follow) and then receive an outcome ($g^t$). We will see more online algorithms in the future.
## 3. Using the MWU algorithm in applications (can skip)
These are the key points I keep in mind when trying to apply the MWU algorithm in applications (like below). These points might not be clear now, but they might help in your second read and while working on exercises.
* Define vectors $g^t$ (what would be each coordinate?)
* Say something about the average $\langle p^t, g^t\rangle$ in each round.
* If we can bound the average, then over time the max will not be much more than the average.
## 4. Application to max-flow
*Comment:* You may ask why we need the algorithm below to solve maxflow when you have learned other (probably faster) algorithms already. The reason is that you can extend the technique below for a harder problem of *multicommodity flow*. Moreover, there is still some hope that the algorithm below can be really fast (this is still an on-going research).
---
**Problem:** We want to find a maximum $st$-flow in a **unit capacity** directed graph $H=(V,E)$.
**Assume:** We know the optimal max flow value, denoted by $OPT$. Note that $OPT\leq |E|$ because $H$ has unit capacity. (We can find $OPT$ by a standard binary search trick.)
**Goal:** For any $\epsilon$, we would like to output a flow $f$ of value at least $OPT/(1+O(\epsilon))$ (say, $OPT/(1+2\epsilon)$)
**Definitions:**
* Let $m=|E|$.
* Below we will work with vectors of $m$ dimensions, where each coordinate of these vectors corresponds to an edge $e\in E$. To simplify notation, we denote each coordinate of a vector $x$ by $x_e$.
* We will think of a flow $f$ as an $m$-dimensional vector too, where $f_{(u,v)}$ is the flow from $u$ to $v$ for every edge $(u,v)$. We sometimes write $f(u,v)$ instead of $f_{(u,v)}$ to simplify notation.
* Recall that a flow $f$ must satisfy these properties.
* *Flow conservation:* For every vertex $v\in V\setminus \{s,t\}$, $\sum_{(u,v)\in E} f(u,v) = \sum_{(v,u)\in E} f(v,u)$.
* *Capacity constraint:* For every edge $(u,v)$, $f(u,v) \leq 1$.
* Without the capacity constraint, $f$ is called a **pseudo-flow**. (See the sanity-check sketch after this list.)
* Careful: Pseudo-flow may be defined differently in other contexts!
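As a quick sanity check on these definitions, below is a small helper (names are mine) that tests whether a vector indexed by edges is a flow or merely a pseudo-flow:

```python
def is_flow(V, E, f, s, t):
    """f maps each directed edge (u, v) in E to a nonnegative value. Returns
    True iff f satisfies flow conservation and the unit-capacity constraint."""
    net = {v: 0.0 for v in V}          # inflow minus outflow at each vertex
    for (u, v) in E:
        if f[(u, v)] > 1:              # capacity constraint (unit capacities)
            return False
        net[v] += f[(u, v)]
        net[u] -= f[(u, v)]
    # flow conservation at every vertex except s and t
    return all(abs(net[v]) < 1e-9 for v in V if v not in (s, t))
```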
### 4.1 Algorithm
Let $\epsilon>0$ be an input parameter. Let $T=\frac{OPT\ln(m)}{\epsilon^2}$ (we apply Theorem 1 with $m$ coordinates, one per edge).
For $t=1, 2, ..., T$ do
1. Define a weighted graph $H^t=(V,E,p^t)$, i.e. $H^t$ is a copy of $H$ with every edge $e$ having weight $p^t_e$.
* E.g., $H^1$ is a copy of $H$ with all edges having weights $1/m$ (recall: $p^1=(1/m, 1/m, ...)$).
2. Find a shortest $st$-path $P^t$ in $H^t$.
3. Let $g^t$ be a pseudo-flow that sends $OPT$ units of flow along $P^t$, i.e. $g^t_e=OPT$ for all $e\in P^t$ and $g^t_e=0$ for other edges.
4. Call the MWU algorithm MWU($m,T,\epsilon, \rho=OPT, \{g^{t'}\}_{t'=1}^t$) to produce $p^{t+1}$.
Let $G=\sum_{t=1}^T g^t$, as in Section 1.2.
**RETURN** $f=\frac{G}{(1+2\epsilon)T}=\frac{\sum_{t=1}^T g^t}{(1+2\epsilon)T}$.
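A minimal Python sketch of this algorithm, assuming unit capacities and that some $st$-path exists (the helper `shortest_path` and all names are my own choices):

```python
import heapq, math

def mwu_maxflow(V, E, s, t, OPT, eps):
    """E is a list of directed edges (u, v); returns f as a list indexed like E."""
    m, rho = len(E), OPT
    out = {v: [] for v in V}
    for i, (u, v) in enumerate(E):
        out[u].append((v, i))
    T = math.ceil(OPT * math.log(m) / eps ** 2)
    w, G = [1.0] * m, [0.0] * m                   # G accumulates sum_t g^t
    for _ in range(T):
        total = sum(w)
        p = [wi / total for wi in w]              # edge weights of H^t
        for e in shortest_path(out, s, t, p):     # g^t_e = OPT on P^t, 0 elsewhere
            G[e] += OPT
            w[e] *= math.exp(eps * OPT / rho)     # MWU update with rho = OPT
    return [Ge / ((1 + 2 * eps) * T) for Ge in G]

def shortest_path(out, s, t, p):
    """Dijkstra with nonnegative edge weights p; returns the edge indices of a
    shortest s-t path."""
    dist, prev, done = {s: 0.0}, {}, set()
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v, i in out[u]:
            if v not in dist or d + p[i] < dist[v]:
                dist[v], prev[v] = d + p[i], (u, i)
                heapq.heappush(pq, (d + p[i], v))
    path, v = [], t
    while v != s:                                 # walk back from t to s
        u, i = prev[v]
        path.append(i)
        v = u
    return path
```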
---
### 4.2 Analysis
* Since each $g^t$ is a pseudo-flow of value $OPT$, $f$ clearly has value $OPT/(1+2\epsilon)$.
* Since the $g^t$ are **not** flows but pseudo-flows, we need to argue that $f$ is a flow. Flow conservation holds automatically, since each $g^t$ satisfies it and $f$ is a scaled sum; it remains to show the capacity constraint, i.e. $$\max_{e\in E} f_e\leq 1.$$
* The key is the following claim.
---
**Definition:** For any path $P$ and weight function $p^t$, let $p^t(P)$ denote the weight of $P$, i.e. $p^t(P)=\sum_{e\in P} p^t_e$.
**Claim 1:** For any $t$, the shortest $st$-path $P^t$ has weight $p^t(P^t)\leq 1/OPT$ in $H^t=(V, E, p^t)$.
**Proof sketch:**
* We use a basic fact (without proving it) that a flow of value $OPT$ can be decomposed into $OPT$ edge-disjoint $st$-paths $P^*_1, P^*_2, \ldots, P^*_{OPT}$.
* Since these paths are edge-disjoint and $\sum_{e\in E} p^t_e=1$, we have $\sum_i p^t(P^*_i)\leq 1$; hence there must be some $P^*_i$ with $p^t(P^*_i)\leq 1/OPT$. Since $P^t$ is a shortest path, $p^t(P^t)\leq p^t(P^*_i)\leq 1/OPT$.
**QED**
---
**Corollary 1:** For any $t$, $\langle p^t, g^t\rangle\leq 1.$
**Proof:**
* $\langle p^t, g^t\rangle=\sum_{e\in E} p^t_e g^t_e=\sum_{e\in P^t} p^t_e \cdot OPT$, since $g^t_e=OPT$ if $e\in P^t$ and $g^t_e=0$ otherwise.
* But $\sum_{e\in P^t} p^t_e\leq 1/OPT$ by Claim 1.
**QED**
---
* Now recall Theorem 1 (applied here with $m$ coordinates, one per edge): For any $T\geq \frac{\rho\ln(m)}{\epsilon^2}$, we have $\frac{G_*}{T}\leq (1+\epsilon) \frac{G_A}{T}+\epsilon$.
* Note that we can take $\rho=OPT$ because $g^t_e\in \{0,OPT\}$.
* So, we can conclude that after $T= \frac{OPT\ln(m)}{\epsilon^2}$ iterations, $\frac{G_*}{T}\leq (1+\epsilon) \frac{G_A}{T}+\epsilon\leq 1+2\epsilon.$
* (Corollary 1 implies that $\frac{G_A}{T}=\frac{\sum_{t=1}^T \langle p^t, g^t\rangle}{T}\leq 1$.)
* Since $f=\frac{G}{(1+2\epsilon)T}$, $\max_{e\in E} f_e = \max_{e\in E} \frac{1}{1+2\epsilon}\cdot\frac{G_e}{T}=\frac{1}{1+2\epsilon}\cdot\frac{G_*}{T}\leq 1$ as desired.
---
### 4.3 Discussions (can skip)
**Recap:**
* The algorithm is simply this: In each round $t$, we find a shortest path and augment a pseudo-flow $g^t$ of value $OPT$ along this path. The final output is essentially the average of these pseudo-flows.
* Intuitively, it helps to increase the weight of edges in the shortest paths so that the next shortest paths use other edges. We do exactly this via the MWU algorithm, which increases the weight (probability) $p^t_e$ for edges $e$ in the shortest paths.
* Each pseudo-flow $g^t$ may violate the capacity constraint: The "max violation", i.e. $\max_{e\in E} g^t_e$, can be as big as $OPT$. But the "average violation", i.e. $\sum_{e\in E} p^t_eg^t_e$, can be proven small ($\leq 1$).
* Key idea: $g^t$ is augmented along a path $P^t$ of weight $p^t(P^t)\leq 1/OPT$. (See Claim 1)
* The MWU theorem guarantees that over time, the max violation of the "averaged flow" $G/T=(\sum_t g^t)/T$ will be close to the average violation, i.e. $\max_e G_e/T\leq 1+O(\epsilon)$.
* For the final output, we scale $G/T$ down slightly so that we do not violate the capacity constraint.
**Extensions**
* This technique can be extended to *weighted graphs*.
* The total running time of the above algorithm might not be good, since each of the $T$ iterations needs roughly $O(m)$ time to find a shortest path. However, there is a possibility to find shortest paths faster using *dynamic algorithms*. This is an active research area. See, e.g., [[BGS21](https://arxiv.org/abs/2101.07149)]. Even though this technique only gives approximate maxflow, there are techniques to turn approximate maxflow into *exact* maxflow on directed weighted graphs.
* For an advanced version of maxflow, see [Sherman's approximate maxflow](https://arxiv.org/abs/1304.2077) (This also has something to do with both MWU and tree embedding algorithms)
* This technique can also be extended to a harder problem called *multicommodity flow* where, given a number of source-sink pairs $(s_1, t_1), \ldots, (s_k,t_k)$, we would like to maximize the total of the $(s_i,t_i)$-flows over all $i$. See, e.g., [[Madry10](https://arxiv.org/abs/1003.5907)].
## 5. Application to bipartite matching
**Problem:** Given a **bipartite graph** $H=(V, E)$, a matching is a set of edges $M\subseteq E$ no two of which share a vertex. A *maximum matching* is a matching of maximum size.
**Assume:** We know the size of a maximum matching, denoted by $OPT$.
**Goal:**
* A **fractional matching** is an $m$-dimensional vector $f\in[0,\infty)^m$, where each coordinate corresponds to an edge (denoted by $f_e$ or $f(e)$), such that for every vertex $v$, $\sum_{(u,v)\in E} f(u,v)\leq 1$.
* Observe: if $f\in \{0,1\}^m$ and satisfies the constraint above, then it is a matching.
* For any $\epsilon$, we would like to output a fractional matching of size at least $OPT/(1+O(\epsilon))$ (say, $OPT/(1+14\epsilon)$)
**Preliminaries:**
* Let $n=|V|$ and $m=|E|$.
* A **maximal matching** is a matching $M\subseteq E$ such that for any $e\in E\setminus M$, $M\cup \{e\}$ is not a matching.
* Note that a maximum matching must be maximal, but a maximal matching might not be a maximum matching.
* Below we will work with vectors of $n$ dimensions, where each coordinate of these vectors corresponds to a vertex $v\in V$. To simplify notation, we denote each coordinate of a vector $x$ by $x_v$.
* We will use the following fact without proving it: In any bipartite graph, minimum fractional vertex cover is equal to maximum fractional matching.
* Recall: A fractional vertex cover is a function $c:V\rightarrow [0, \infty)$ such that for every edge $(u,v)$, $c(u)+c(v)\geq 1$.
---
### 5.1 Algorithm
Let $\epsilon\in (0,1/4]$ be an input parameter. Let $T=\frac{\ln(|E|)}{\epsilon^3}.$
**Note:** Below, $p^t,g^t\in \mathbb{R}^V$, $f^t\in \mathbb{R}^E$, and $M^t\in \{0,1\}^E$. We will view $M^t$ as both a vector and a matching (set of edges). We use $|M^t|$ to denote its size when $M^t$ is viewed as a set.
For $t=1, 2, ..., T$ do
1. Let $E^t=\{(u,v)\in E\ |\ p^t(u)+p^t(v)<\frac{1}{(1-2\epsilon)OPT}\}$.
* E.g., since $p^1(u)+p^1(v)=2/n$ for every edge $(u,v)$, $E^1$ is either $E$ or $\emptyset$.
2. Let $M^t\in \{0,1\}^m$ be any maximal matching in $E^t$.
3. Let $f^t=M^t \times \frac{OPT}{|M^t|}$, i.e. $f^t(e)=\frac{OPT}{|M^t|}$ if $e$ is in $M^t$ and $f^t(e)=0$ otherwise.
* *Comment:* Observe that $\|f^t\|_1:=\sum_{e\in E} f^t(e)=OPT$.
4. For every $v\in V$, let $g^t_v=\sum_{(u,v)\in E} f^t(u,v)$;
* i.e. $g^t_v=\frac{OPT}{|M^t|}$ if $v$ is matched in $M^t$ and $g^t_v=0$ otherwise.
5. Call the MWU algorithm MWU($n,T,\epsilon, \rho=1/\epsilon, \{g^{t'}\}_{t'=1}^t$) to produce $p^{t+1}\in [0,1]^n$.
**RETURN** $m$-dimensional vector $f=\frac{1}{(1+14\epsilon)T}\sum_{t=1}^T f^t$
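A minimal Python sketch of this algorithm, assuming $OPT$ is known (the greedy maximal-matching loop and all names are my own choices):

```python
import math

def mwu_fractional_matching(V, E, OPT, eps):
    """V: vertices; E: list of edges (u, v) of a bipartite graph.
    Returns the fractional matching f as a list indexed like E."""
    rho = 1.0 / eps
    T = math.ceil(math.log(len(E)) / eps ** 3)
    w = {v: 1.0 for v in V}
    F = [0.0] * len(E)                           # F accumulates sum_t f^t
    for _ in range(T):
        total = sum(w.values())
        p = {v: wv / total for v, wv in w.items()}
        thresh = 1.0 / ((1 - 2 * eps) * OPT)
        matched, Mt = set(), []                  # greedy maximal matching in E^t
        for i, (u, v) in enumerate(E):
            if p[u] + p[v] < thresh and u not in matched and v not in matched:
                matched |= {u, v}
                Mt.append(i)
        val = OPT / len(Mt)                      # f^t(e) = OPT/|M^t|; M^t is nonempty by Claim 2
        for i in Mt:
            F[i] += val
        for v in matched:                        # g^t_v = OPT/|M^t| for matched v, 0 otherwise
            w[v] *= math.exp(eps * val / rho)
    return [Fi / ((1 + 14 * eps) * T) for Fi in F]
```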
---
### 5.2 Analysis
* Since each $f^t$ has value $OPT$, $f$ clearly has value $OPT/(1+14\epsilon)$.
* But a priori $f$ might **not** be a fractional matching, i.e. it could be that for some vertex $v$, we have $\sum_{(u,v)\in E} f(u,v)>1$. We will show that this does **not** happen, i.e. $\max_{v\in V}\sum_{(u,v)\in E} f(u,v) \leq 1$.
* The key is the following claims. Note that Claim 2 is new, while Claim 3 is similar to Claim 1 from max flow.
---
**Claim 2:** For every $t$, $|M^t|\geq \epsilon OPT$.
**Proof:**
* Assume for contradiction that $|M^t|<\epsilon OPT$.
* Consider a fractional vertex cover solution $c:V\rightarrow [0,1]$, where $c(v)=1$ if $v$ is matched in $M^t$ and $c(v)=p^t_v \cdot (1-2\epsilon)OPT$ otherwise.
* This is a vertex cover because (1) for every $(u,v)\in E^t$, either $u$ or $v$ is matched in $M^t$ (by maximality) and (2) for every $(u,v)\not\in E^t$, $p^t(u)+p^t(v)\geq \frac{1}{(1-2\epsilon)OPT}$; so $c(u)+c(v)\geq (1-2\epsilon)OPT\cdot (p^t(u)+p^t(v))\geq 1.$
* Since $\sum_{v\in V} p^t_v=1$, $c$ is a vertex cover of value at most $(1-2\epsilon)OPT + 2|M^t|<(1-2\epsilon)OPT+2\epsilon OPT=OPT$.
* This contradicts the fact that minimum fractional vertex cover is equal to maximum fractional matching.
**QED**
**Corollary 2:** For any $t$ and $v\in V$, $g^t_v\leq 1/\epsilon$. In other words, $\rho\leq 1/\epsilon$.
**Proof:** $g^t_v$ is either $0$ or $\frac{OPT}{|M^t|}$, and $|M^t|\geq \epsilon OPT$ by Claim 2; so $g^t_v\leq 1/\epsilon$. **QED**
---
**Claim 3:** For any $t$, $\langle p^t, g^t\rangle\leq \frac{1}{1-2\epsilon}.$
**Proof:**
* $\langle p^t, g^t\rangle=\sum_{v\in V} p^t_v g^t_v=\sum_{(u,v)\in M^t} (p^t_u+p^t_v)\frac{OPT}{|M^t|}$ since every *unmatched* $v\in V$ has $g^t_v=0$ and every *matched* $v\in V$ has $g^t_v=\frac{OPT}{|M^t|}$.
* For every $(u,v)\in M^t\subseteq E^t$, $p^t_u+p^t_v<\frac{1}{(1-2\epsilon)OPT}$.
* So, $\langle p^t, g^t\rangle<\sum_{(u,v)\in M^t} \frac{1}{(1-2\epsilon)OPT}\frac{OPT}{|M^t|}=\frac{1}{1-2\epsilon}$
**QED**
---
* Now recall Theorem 1: For any $T\geq \frac{\rho\ln(|E|)}{\epsilon^2}$, we have $\frac{G_*}{T}\leq (1+\epsilon) \frac{G_A}{T}+\epsilon$.
* $\rho\leq 1/\epsilon$ by Corollary 2.
* So, we can conclude that after $T= \frac{\ln(|E|)}{\epsilon^3}\geq \frac{\rho\ln(|E|)}{\epsilon^2}$ iterations, $$\frac{G_*}{T}\leq (1+\epsilon) \frac{G_A}{T}+\epsilon\leq \frac{1+\epsilon}{1-2\epsilon}+\epsilon\leq 1+14\epsilon.$$ (Details below.)
* By Claim 3, $\frac{G_A}{T}=\frac{\sum_{t=1}^T \langle p^t, g^t\rangle}{T}\leq \frac{1}{1-2\epsilon}$.
* $\frac{1}{1-x}\leq 1+3x$ for any $x\in (0, 1/2]$ (e.g. [this plot](https://www.google.com/search?q=plot+%281%2B3x%29-1%2F%281-x%29&sxsrf=ALiCzsYR8C69apXLJs6FksCyUkC9rqT-Dg%3A1654750063573&ei=b3uhYsXHIqyRxc8Pv9u84A8&ved=0ahUKEwiFu-ytyJ_4AhWsSPEDHb8tD_wQ4dUDCA4&uact=5&oq=plot+%281%2B3x%29-1%2F%281-x%29&gs_lcp=Cgdnd3Mtd2l6EAM6BwgAEEcQsAM6BggAEB4QCEoECEEYAEoECEYYAFDWEFjlE2CnF2gBcAF4AIABPYgBeZIBATKYAQCgAQHIAQjAAQE&sclient=gws-wiz)). So $\frac{1+\epsilon}{1-2\epsilon}+\epsilon\leq (1+\epsilon)(1+6\epsilon)+\epsilon=1+8\epsilon+6\epsilon^2\leq 1+14\epsilon$ for $\epsilon\leq 1/4$.
* Note: The term $1+14\epsilon$ is an overestimate.
* Thus, $\frac{G_*}{T}\leq 1+14\epsilon$.
* Recall that $f=\frac{1}{(1+14\epsilon)T}\sum_{t=1}^T f^t$.
* Observe that for every $v\in V$, $$\sum_{(u,v)\in E} f(u,v) = \frac{1}{(1+14\epsilon)T}\sum_{(u,v)\in E} \sum_{t=1}^T f^t(u,v) = \frac{1}{(1+14\epsilon)T} G_v\leq \frac{1}{1+14\epsilon}\cdot \frac{G_*}{T}$$ which is at most $1$ as desired.
---
### 5.3 Discussions (can skip)
**Recap:**
* The algorithm is simply this: In each round $t$, if we think of $p^t\in [0,1]^n$ as a *vertex cover*, we find a set of edges $E^t$ that are *not sufficiently covered*. We then find a maximal matching $M^t$ in this set and scale it up by $OPT/|M^t|$ to get a "pseudo-matching" $f^t$ of value $OPT$. The final fractional matching is essentially the average of these pseudo-matchings.
* I am using the word "pseudo-matching" informally here.
* The above pseudo-matching may *overload* some vertices horribly, since a matched vertex receives a load of $OPT/|M^t|$ from its matching edge. We represent this "load" of $v$ by $g^t_v$.
* While $g^t_v$ might be large for some $v$, we argue that the *average* $\sum_{v\in V} p^t_vg^t_v$ is low (Claim 3).
* This is simply because we select maximal matchings from edges $(u,v)$ such that $p^t_u+p^t_v$ is low.
* In addition, we can also argue that $\rho\leq 1/\epsilon$ (Corollary 2), by arguing that $|M^t|$ must be large (Claim 2).
* **This is the main difference from maxflow, where $\rho=OPT$. The key is Claim 2.**
* Finally, we apply Theorem 1 to argue about the *max load* $G_*=\max_{v\in V} G_v$, or rather the max average load $\frac{G_*}{T}$. Since the load is at most roughly $1$ on average in each round, Theorem 1 guarantees that $\frac{G_*}{T}$ is not much more than $1$ either.
**Matching vs. Maxflow:** It might help to recall that bipartite matching can be solved by max flow. Because of this, the matching algorithm above is similar to the maxflow algorithm in many ways, but there is a twist: instead of finding one edge (which would correspond to finding a flow along a single path), it finds a *maximal* matching and argues that it is large (Claim 2). This helps reduce the number of rounds.
**Extensions:** There are many algorithms for computing approximate matchings. An important feature of the above algorithm is that it can be extended to other settings, such as streaming and dynamic settings. See, e.g.
[[Gupta14](https://drops.dagstuhl.de/opus/volltexte/2014/4845/pdf/21.pdf)] and [[Ahn-Guha](https://arxiv.org/pdf/1104.2315.pdf)].
### 5.4 Matching on graph streams (sketch)
**Problem:** We want to design a streaming algorithm that uses $\tilde O(n)$ space to find a matching in an $n$-vertex input bipartite graph whose edges can be read in a streaming manner, i.e. we can read edges $e_1, e_2, ...$ one by one. (Observe: The algorithm does not have enough memory to remember all edges, and we cannot read $e_i$ before $e_{i-1}$.)
Moreover, after we finish reading all edges, we can read $e_1, e_2, ...$ again. The number of times we do this is called the **number of passes**.
**Goal:** We would like to find a matching of size at least $OPT/(1+O(\epsilon))$, assuming that we know $OPT$.
**Algorithm (sketch):** Observe that the algorithm above can be simulated in this situation with $\tilde O(n)$ space in $(\log(n)/\epsilon)^{O(1)}$ passes to get an approximate fractional matching.
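For instance, each round $t$ of the algorithm from Section 5.1 can be simulated in one pass over the stream with $O(n)$ words of space, roughly as in the following sketch (names are mine):

```python
def one_pass_round(stream, p, OPT, eps):
    """One pass of round t: greedily build a maximal matching among the edges
    of E^t, remembering only the matched vertices (O(n) space)."""
    thresh = 1.0 / ((1 - 2 * eps) * OPT)
    matched, Mt = set(), []
    for (u, v) in stream:              # edges arrive one by one
        if p[u] + p[v] < thresh and u not in matched and v not in matched:
            matched |= {u, v}
            Mt.append((u, v))
    return Mt   # then form f^t and update p exactly as in Section 5.1
```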
**Useful fact:** Given a fractional matching whose positive values are on edge $E'\subseteq E$, there exists a matching of the same value on $E'$.
## 6. Other applications
The MWU algorithm is extremely powerful and has been discovered repeatedly in very diverse fields. The [Wikipedia article](https://en.wikipedia.org/wiki/Multiplicative_weight_update_method) mentions machine learning, optimization, theoretical computer science, and game theory. I have already mentioned some applications to maxflow and matching. For further reading, below are some interesting sources.
* Check out Roughgarden's lectures [11](http://timroughgarden.org/w16/l/l11.pdf) & [12](http://timroughgarden.org/w16/l/l12.pdf) for, e.g., The minimax Theorem, Linear Classifiers, and Max-flow.
* Check out [Ghaffari's chapter 8](https://people.inf.ethz.ch/gmohsen/AA21/AAscript.pdf) for, e.g., solving linear programs and oblivious routing.
* The MWU algorithm also plays an important role in computing mincut, e.g. [Karger's near-linear time algorithm](https://arxiv.org/abs/cs/9812007) (in particular, the tree packing algorithm) and [Thorup's dynamic mincut](https://doi.org/10.1007/s00493-007-0045-2).