# Summary of Convergence Results for Games

## Bilinear Games

$$\min _{\mathbf{x} \in \mathbb{R}^{d}} \max _{\mathbf{y} \in \mathbb{R}^{p}} \mathbf{a}^\top \mathbf{x} + \mathbf{x}^{\top} \mathbf{B y} - \mathbf{c}^\top \mathbf{y}$$

$$\mathbf{J}=\left[\begin{array}{cc} \mathbf{0} & \mathbf{B} \\ -\mathbf{B}^{\top} & \mathbf{0}\end{array}\right]$$

We define $\kappa=\frac{\sigma_{\max}(B)}{\sigma_{\min}(B)}=\frac{\max_{\lambda \in \operatorname{Sp}(J)}|\Im(\lambda)|}{\min_{\lambda \in \operatorname{Sp}(J)}|\Im(\lambda)|}$.

---

#### Convergence rate of Extragradient (see Mokhtari et al 2019)

**Iteration complexity**: $\mathcal{O}\left(\kappa^2 \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

---

#### Convergence rate of Optimistic (see Mokhtari et al 2019)

**Iteration complexity**: $\mathcal{O}\left(\kappa^2 \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

---

#### Convergence rate of Negative Momentum (see Gidel et al 2019)

**Learning rate**: $\eta \leq \frac{1}{L_{xy}}$

**Convergence rate**: $\Delta_{t} \in O\left(\Delta_{0}\max \left\{\frac{1}{2}, 1-\frac{1}{16 \kappa^2}\right\}^{t} \right)$

---

#### Convergence rate of Hamiltonian method

**Lemma**: The Hamiltonian is $L_\mathcal{H}$-smooth and $\mu_\mathcal{H}$-quasi-strongly-convex with $L_\mathcal{H}=\sigma_{\max}(B)^2$ and $\mu_\mathcal{H}=\sigma_{\min}(B)^2$.

**Convergence rate (no momentum)**: $\Delta_{t} \in O\left(\left(1-\frac{1}{2\kappa^2}\right)^{t}\right)$

**Convergence rate (Polyak momentum)**: $\Delta_{t} \in O\left(\left(1-\frac{2}{1+\kappa}\right)^{t}\right)$

*Remark: The optimal method for solving the Hamiltonian problem is also optimal for bilinear games, both average-case and worst-case; see (Domingo-Enrich et al 2020).*

### Generalized Bilinear Games

$$\min _{x} \max _{y} F(x, y):=f(x)+x^{\top} B y-g(y)$$

where $f(x)$ is $\mu_{x}$-strongly convex and $g(y)$ is $\mu_{y}$-strongly convex. The coupling matrix $B$ satisfies $\|B\|_{2} \leq L_{x y}$.

**Lower bound** (proximal oracles): $\Omega\left(\sqrt{\frac{L_{x y}^{2}}{\mu_{x} \mu_{y}}+1} \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$, see (Zhang et al 2019).

## Quadratic Games

$$\min _{\mathbf{x} \in \mathbb{R}^{d}} \max _{\mathbf{y} \in \mathbb{R}^{p}} \frac{1}{2} \mathbf{x}^{\top} \mathbf{A} \mathbf{x}+\mathbf{x}^{\top} \mathbf{B y}-\frac{1}{2} \mathbf{y}^{\top} \mathbf{C} \mathbf{y}$$

$$\mathbf{J}=\left[\begin{array}{cc}\mathbf{A} & \mathbf{B} \\ -\mathbf{B}^{\top} & \mathbf{C}\end{array}\right]$$

**Remarks**:

- If $\mathbf{A} = \mathbf{C} = \mathbf{0}$, this reduces to a bilinear game.
- If $\mathbf{A}$ and $\mathbf{C}$ are positive definite, the game is strongly-convex strongly-concave.

### Convergence rate for Smooth Strongly-Convex Strongly-Concave Games

**Assumptions**: $\mu_{\mathbf{x}} \mathbf{I} \preceq \mathbf{A} \preceq L_{\mathbf{x}} \mathbf{I}, \quad \mu_{\mathbf{y}} \mathbf{I} \preceq \mathbf{C} \preceq L_{\mathbf{y}} \mathbf{I}, \quad \|\mathbf{B}\|_{2} \leq L_{\mathbf{x y}}$

We let $L:=\max \left\{L_{\mathbf{x}}, L_{\mathbf{y}}, L_{\mathbf{x y}}\right\}$ and $\mu:=\min \left\{\mu_{\mathbf{x}}, \mu_{\mathbf{y}}\right\}$, and define the condition number $\kappa:=L / \mu$.
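To make these quantities concrete, here is a minimal numerical sketch (Python/NumPy; `random_spd`, the dimensions, and the constants are illustrative choices, not taken from any of the cited papers). It builds a quadratic game satisfying the assumptions and checks that every eigenvalue of $\mathbf{J}$ has real part at least $\mu$, which is exactly the spectral condition defining the strongly monotone class of (Azizian et al 2020) below.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 5, 4

def random_spd(n, mu, L):
    """Random symmetric matrix with spectrum inside [mu, L]."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q @ np.diag(rng.uniform(mu, L, size=n)) @ Q.T

mu_x, L_x, mu_y, L_y = 0.5, 2.0, 0.3, 1.5
A = random_spd(d, mu_x, L_x)
C = random_spd(p, mu_y, L_y)
B = rng.standard_normal((d, p))
L_xy = np.linalg.norm(B, 2)          # spectral norm ||B||_2

L = max(L_x, L_y, L_xy)
mu = min(mu_x, mu_y)
print(f"L = {L:.3f}, mu = {mu:.3f}, kappa = {L / mu:.3f}")

# Jacobian of the vector field xi(x, y) = (grad_x F, -grad_y F)
J = np.block([[A, B], [-B.T, C]])
lams = np.linalg.eigvals(J)

# Strong monotonicity appears as a lower bound on the real parts of Sp(J)
print("min Re(lambda) =", lams.real.min())
assert lams.real.min() >= mu - 1e-8
```

The check works because for any eigenpair $(\lambda, v)$ with $v = (u, w)$, the cross terms $u^{H} B w - w^{H} B^{\top} u$ are purely imaginary, so $\Re(\lambda)\|v\|^2 = u^{H} A u + w^{H} C w \geq \mu \|v\|^2$.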
**Metric**: $\Delta_{t}=\left\|\mathbf{x}_{t}-\mathbf{x}^{*}\right\|_{2}^{2}+\left\|\mathbf{y}_{t}-\mathbf{y}^{*}\right\|_{2}^{2}$

---

#### Convergence rate of Sim-GDA

**Step-size**: $\eta=\frac{\mu}{2 L^{2}}$

**Spectral radius**: $\rho\left(\nabla F_{\eta}^{\operatorname{Sim}}\right)=1-\frac{1}{2 \kappa^{2}}$

**Convergence rate**: $\Delta_{t} \in \mathcal{O}\left(\Delta_{0}\left(1-\frac{1}{2 \kappa^{2}}\right)^{t}\right)$

**Iteration complexity**: $\mathcal{O}\left(\kappa^{2} \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

---

#### Convergence rate of Alt-GDA

**Step-size**: $\eta=\frac{1}{2 L}$

**Spectral radius**: $\rho\left(\nabla F_{\eta}^{\operatorname{Alt}}\right)=1-\frac{1}{2 \kappa}$

**Convergence rate**: $\Delta_{t} \in \mathcal{O}\left(\Delta_{0}\left(1-\frac{1}{2 \kappa}\right)^{t}\right)$

**Iteration complexity**: $\mathcal{O}\left(\kappa \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

---

#### Lower bound from (Azizian et al 2020)

**Problem class**: Smooth Strongly Monotone $:= \{J : \forall \lambda \in \operatorname{Sp}(J), \quad 0<\mu \leq \Re \lambda, \quad |\lambda| \leq L\}$

*Remark: This class is slightly more general than smooth strongly-convex strongly-concave games.*

**Worst case**: $\rho >\frac{L-\mu}{L+\mu}=1-\frac{2}{\kappa + 1}$

---

#### Lower bound from (Zhang et al 2019)

**Worst-case instance**:

$$\min _{x} \max _{y} F(x, y):=\frac{1}{2} x^{\top}\left(B_{x} A^{2}+\mu_{x} I\right) x+\frac{L_{x y}}{2} x^{\top} A y-\frac{1}{2} y^{\top}\left(B_{y} A^{2}+\mu_{y} I\right) y-b^{\top} y$$

with $B_{x}:=\frac{L_{x}-\mu_{x}}{4}$ and $B_{y}:=\frac{L_{y}-\mu_{y}}{4}$.

**Special case**: $L = L_x = L_y = L_{xy}$ and $\mu = \mu_x = \mu_y$ $\Rightarrow$ the bound reduces to $\Omega\left(\kappa \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$, see (Nemirovsky and Yudin 1983).

**General case**: $\Omega\left(\sqrt{\frac{L_{x}}{\mu_{x}}+\frac{L_{x y}^{2}}{\mu_{x} \mu_{y}}+\frac{L_{y}}{\mu_{y}}} \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

*Remark: This lower bound also holds for non-quadratic games.*

---

#### Lower bound from (Ibrahim et al 2020)

**Iteration complexity**: $\Omega\left(\sqrt{\frac{L_{x y}^{2}+\mu_{x} \mu_{y}}{\mu_{x y}^{2}+\mu_{x} \mu_{y}}} \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

## Strongly-Convex Strongly-Concave Games

### Special Case: $L = L_x = L_y = L_{xy}$ and $\mu = \mu_x = \mu_y$

*Remark: the results below also hold if we instead take $L=\max\{L_x, L_y, L_{xy}\}$ and $\mu=\min\{\mu_x, \mu_y\}$.*

---

#### Convergence rate of Mirror-Prox (see Nemirovsky 2004)

**Iteration complexity**: $\mathcal{O}\left(\kappa \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

---

#### Convergence rate of Extragradient (see Mokhtari et al 2019)

**Learning rate**: $\eta = \frac{1}{4L}$

**Convergence rate**: $\Delta_t\in \mathcal{O}\left(\Delta_0\left(1-\frac{c}{\kappa}\right)^{t}\right)$

**Iteration complexity**: $\mathcal{O}\left(\kappa \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

---

#### Convergence rate of Optimistic (see Mokhtari et al 2019)

**Learning rate**: $\eta = \frac{1}{4L}$

**Convergence rate**: $\Delta_t\in \mathcal{O}\left(\Delta_0\left(1-\frac{c}{\kappa}\right)^{t}\right)$

**Iteration complexity**: $\mathcal{O}\left(\kappa \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

---

#### Convergence rate of Dual Extrapolation (see Nesterov and Scrimali 2006)

**Iteration complexity**: $\mathcal{O}\left(\kappa \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$

**Modified version**: $\mathcal{O}\left(\sqrt{\frac{L_{x}^{2}}{\mu_{x}^{2}}+\frac{L_{x y}^{2}}{\mu_{x} \mu_{y}}+\frac{L_{y}^{2}}{\mu_{y}^{2}}} \cdot \ln \left(\frac{1}{\epsilon}\right)\right)$ *(not sure where to find this result...)*
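As a quick sanity check on the extragradient rates above, here is a minimal sketch (Python/NumPy; the quadratic instance, dimensions, and iteration count are illustrative). It runs extragradient with the step-size $\eta = 1/(4L)$ from the Mokhtari et al setting on a small strongly-convex strongly-concave quadratic game; the squared distance to the saddle point should decay linearly, consistent with the $\mathcal{O}(\kappa \ln(1/\epsilon))$ complexity.

```python
import numpy as np

rng = np.random.default_rng(1)
d = p = 4
L, mu = 4.0, 1.0                      # target constants, kappa = 4

# Quadratic SCSC instance: F(x, y) = 1/2 x'Ax + x'By - 1/2 y'Cy,
# saddle point at the origin since there are no linear terms.
A = np.diag(rng.uniform(mu, L, d))    # spectrum inside [mu, L]
C = np.diag(rng.uniform(mu, L, p))
B = rng.standard_normal((d, p))
B *= L / np.linalg.norm(B, 2)         # enforce ||B||_2 = L_xy = L

def xi(x, y):
    """Vector field (grad_x F, -grad_y F); stepping against it is GDA."""
    return A @ x + B @ y, C @ y - B.T @ x

eta = 1.0 / (4.0 * L)                 # step-size from Mokhtari et al 2019
x, y = rng.standard_normal(d), rng.standard_normal(p)
for t in range(500):
    gx, gy = xi(x, y)
    xh, yh = x - eta * gx, y - eta * gy   # extrapolation (look-ahead) step
    gx, gy = xi(xh, yh)
    x, y = x - eta * gx, y - eta * gy     # update with the look-ahead gradient

# Delta_t = ||x - x*||^2 + ||y - y*||^2 with (x*, y*) = (0, 0)
print("Delta_500 =", np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2)
```

Replacing the look-ahead gradient with a suitable combination of past gradients gives the optimistic variant, which avoids the second vector-field evaluation per iteration.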
---

#### Convergence rate of Hamiltonian Gradient Descent

**Assumptions**: $\|\xi(x)\| \leq L_{1}$, $\|\nabla \xi(x)\| \leq L_{2}$, and $\|\nabla \xi(x)-\nabla \xi(y)\| \leq L_{3}\|x-y\|$

*Remark: These assumptions make the Hamiltonian method very hard to compare to the rest of the literature... Can we weaken them? @Nicolas, maybe you have results for this?*

**Lemma**: $\mathcal{H}$ is PL with parameter $\alpha = \mu^2$ and is $\left(L_{1} L_{3}+L_{2}^{2}\right)$-smooth.

**Convergence rate**: $\left\|\xi_t\right\| \leq\left(1-\frac{\mu^{2}}{L_1L_3+L_2^2}\right)^{t / 2}\left\|\xi_0\right\|$

*Remark: The metric is different because the Hamiltonian is only PL, which makes this result hard to compare to the rest of the literature. The story would be different if the Hamiltonian were also strongly-convex strongly-concave. Maybe start by proving results on quadratic games? My intuition tells me we should be able to characterize the Hamiltonian for quadratic games.*

## Open Ideas and Questions

I can see two approaches to unifying the theory:

1. Consider a specific problem and identify the optimal method by comparing convergence rates on that problem.
2. Consider a specific method and identify the class of problems on which it performs best or achieves the optimal convergence rate.

(Azizian et al 2020) show that Nesterov acceleration is optimal for a class of problems with a particular spectral shape. They also show that methods can be seen as operators that modify the spectral shape of the original problem, and from this viewpoint they can easily derive lower bounds for specific spectral shapes.

## References

- [Azizian et al (AISTATS 2020), Accelerating Smooth Games by Manipulating Spectral Shapes](https://arxiv.org/abs/2001.00602)
- [Zhang et al (2021), Don't Fix What ain't Broke: Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization](https://arxiv.org/abs/2102.09468)
- [Zhang et al (2019), On Lower Iteration Complexity Bounds for the Saddle Point Problems](https://arxiv.org/abs/1912.07481)
- Nemirovsky and Yudin (1983), Problem Complexity and Method Efficiency in Optimization
- [Nemirovsky (2004), Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems](https://www2.isye.gatech.edu/~nemirovs/SIOPT_042562-1.pdf)
- [Mokhtari et al (2019), A Unified Analysis of Extra-gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach](https://arxiv.org/abs/1901.08511)
- [Nesterov and Scrimali (2006), Solving Strongly Monotone Variational and Quasi-Variational Inequalities](http://webdoc.sub.gwdg.de/ebook/serien/e/CORE/dp2006_107.pdf)
- [Ibrahim et al (ICML 2020), Linear Lower Bounds and Conditioning of Differentiable Games](https://arxiv.org/abs/1906.07300)
- [Gidel et al (AISTATS 2019), Negative Momentum for Improved Game Dynamics](https://arxiv.org/abs/1807.04740)
- [Lin et al (COLT 2020), Near-Optimal Algorithms for Minimax Optimization](https://arxiv.org/abs/2002.02417)
- [Domingo-Enrich et al (2020), Average-case Acceleration for Bilinear Games and Normal Matrices](https://arxiv.org/abs/2010.02076)