---
tags: math, convex optimization
---
# Convex optimization
## I. Convex set
### 1. Convex set
A set $C$ is called convex if:
$$x, y \in C \Rightarrow \theta x + (1-\theta)y \in C, \forall \theta \in [0,1]$$
In other words, a set $C$ is convex if the line segment between any two points in $C$ lies in $C$.
### 2. Convex combination
A convex combination of the points $x_1, .., x_k$ is a point of the form
$$\theta_1 x_1 + ...+ \theta_k x_k, $$
where $\theta_1 +...+\theta_k = 1$ and $\theta_i \geq 0$.
A set is convex iff it contains every convex combination of its points.
### 3. Convex hull
The **convex hull** of a set $C$ is the set of all convex combinations of points in $C$.

### 4. Convex cone
A set $C$ is called a **cone** if $x \in C \Rightarrow \theta x \in C, \forall \theta \geq 0$.
A set $C$ is called a **convex cone** if it is convex and a cone, i.e,
$$x_1, x_2 \in C \Rightarrow \theta_1 x_1 + \theta_2 x_2 \in C, \forall \theta_1, \theta_2 \geq 0$$
A point $\sum{\theta_i x_i}$ with $\theta_i \geq 0$ is called a **conic combination** of $x_1, ..., x_k$.
The **conic hull** of a set $C$ is the set of all conic combinations of points in $C$.

### 5. Hyperplanes and halfspaces
A **hyperplane** is a set of the form $\{x \in \mathbb{R}^n | a^Tx = b\}$ where $a \neq 0, b \in \mathbb{R}$.
A **halfspace** is a set of the form $\{x \in \mathbb{R}^n | a^Tx \leq b\}$ where $a \neq 0, b \in \mathbb{R}$.
Hyperplanes and halfspaces are convex.


### 6. Euclidean balls and ellipsoids
Euclidean ball in $\mathbb{R}^n$ with center $x_c$ and radius $r$:
$$B(x_c, r) = \{x| \|x-x_c\|_2 \leq r \} = \{ x_c + ru | \|u\|_2 \leq 1 \}$$
Ellipsoid in $\mathbb{R}^n$
$$\mathcal{E} = \{x | (x-x_c)^T P^{-1} (x-x_c) \leq 1\}$$
where $P \in S_{++}^n$ is a symmetric positive definite matrix.
### 7. Norm balls and norm cones
Norm ball with center $x_c$ and radius $r$: $\{x | \|x-x_c\| \leq r \}$
Norm cone: $C = \{(x,t) | \|x\| \leq t \} \subseteq \mathbb{R}^{n+1}$
### 8. Positive semidefinite cone
$S^n$: the set of symmetric $n \times n$ matrices.
$S^n_{+} = \{X \in S^n | X \succeq 0 \}$: symmetric positive semidefinite matrices
$S^n_{++} = \{X \in S^n | X \succ 0 \}$: symmetric positive definite matrices.
### 9. Operations that preserve convexity
#### a. Intersection
If $S_\alpha$ is convex for every $\alpha \in A$, then $\bigcap_{\alpha \in A} S_\alpha$ is convex.
#### b. Affine function
Suppose $f: R^n \rightarrow R^m$ is an affine function ($f(x) = Ax +b$). Then
- The image of a convex set under $f$ is convex
$S \subseteq R^n$ is convex $\Rightarrow f(S) = \{f(x) | x \in S \}$ is convex.
- The inverse image of a convex set under $f$ is convex
$B \subseteq R^m$ is convex $\Rightarrow f^{-1}(B) = \{x | f(x) \in B \}$ is convex.
#### c. Perspective and linear-fractional function
- Perspective function: $P: R^{n+1} \to R^n$:
$$P(x, t) = \frac{x}{t}, \quad dom P = \{(x, t) | t >0 \}$$
Images and inverse images of convex sets under $P$ are convex.
- Linear-fractional function: $f: R^n \to R^m$:
$$f(x) = \frac{Ax+b}{c^Tx+d}, \quad dom f = \{x | c^Tx + d >0 \}$$
Images and inverse images of convex sets under $f$ are convex.
## II. Convex function
### 1. Definition
$f: R^n \to R$ is convex if $dom f$ is a convex set and
$$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)$$
for all $x, y \in dom f$ and all $\theta \in [0,1]$.
The definition gives a clear geometric picture (the chord between any two points on the graph lies above the graph), but in practice it is often hard to prove convexity directly from the definition.
- If $f$ is convex, $-f$ is concave.
- $f$ is strictly convex if the inequality is strict whenever $x \neq y$ and $\theta \in (0,1)$.

### 2. Convex function
#### a. Examples on $R$
- Affine: $ax+b$
- Exponential: $e^{\alpha x}, \forall \alpha \in R$.
- Powers:
$x^a$ on $R_{++}$, for $a \geq 1$ or $a \leq 0$
$-x^a$ on $R_{++}$, for $0 \leq a \leq 1$
- Powers of absolute value: $|x|^p$ on $R$, $p \geq 1$
- Negative log: $-\log x$ on $R_{++}$
- $x\log x$ on $R_{++}$
#### b. Examples on $R^n$
- Every norm on $R^n$ is convex.
- Max function: $f(x) = \max\{x_1,...,x_n\}$ is convex.
- Quadratic-over-linear function: $f(x, y) = \frac{x^2}{y}$ is convex on $\{(x,y) | y > 0\}$
- Log-sum-exp: $f(x) = \log(e^{x_1}+...+e^{x_n})$ is convex on $R^n$
- Geometric mean: $f(x) = (\prod_{i=1}^n x_i)^{1/n}$ is concave on $dom f = R^n_{++}$
- Log determinant: $f(X) = \log\det X$ is concave on $dom f=S^n_{++}$
### 3. First-order conditions
If $f: R^n \to R$ is differentiable, then f is convex iff $domf$ is convex and:
$$f(y) \geq f(x) + \nabla f(x)^T(y-x), \forall x,y \in domf$$
### 4. Second-order conditions
If $f: R^n \to R$ is twice differentiable, then $f$ is convex iff $dom f$ is convex and its Hessian is positive semidefinite:
$$\nabla^2 f(x) \succeq 0, \forall x \in dom f$$
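For a quadratic $f(x) = \frac{1}{2}x^TPx + q^Tx$ the Hessian is the constant matrix $P$, so the second-order condition reduces to checking that $P$ is positive semidefinite. A minimal numerical sketch (the matrix below is purely illustrative):
```python
import numpy as np

# Second-order condition for a quadratic f(x) = 0.5 x^T P x + q^T x:
# the Hessian is the constant matrix P, so check its eigenvalues.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # symmetric example matrix (illustrative)

eigvals = np.linalg.eigvalsh(P)       # eigenvalues of a symmetric matrix
print("eigenvalues:", eigvals)
print("convex (P PSD):", bool(np.all(eigvals >= 0)))
print("strictly convex (P PD):", bool(np.all(eigvals > 0)))
```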
### 5. Operations preserving convexity
#### a. Positive weighted sum and composition with affine function
- $f$ is convex $\Rightarrow \alpha f$ is convex $\forall \alpha \geq 0$
- $f_1, f_2$ are convex $\Rightarrow f_1 + f_2$ is convex (extends to infinite sums and integrals)
- $f$ is convex $\Rightarrow f(Ax+b)$ is convex
#### b. Pointwise maximum
If $f_1, ..., f_m$ are convex, then $f(x) = \max\{f_1(x), ..., f_m(x)\}$ is convex
#### c. Pointwise supremum
If $f(x,y)$ is convex in $x$ for each $y \in A$, then $g(x) = \sup_{y \in A}f(x,y)$ is convex
#### d. Minimization
If $f(x,y)$ is convex in $(x, y)$ and $C$ is a convex nonempty set, then $g(x) = \inf_{y \in C} f(x,y)$ is convex
#### e. Composition with scalar functions (similar rules hold for vector compositions)
Composition of $g: R^n \to R$ and $h: R \to R$:
$$f(x) = h(g(x))$$
$f$ is convex if ($\tilde h$ denotes the extended-value extension of $h$)
- $g, h$ are convex and $\tilde h$ is nondecreasing (e.g. $e^{g(x)}$ is convex when $g$ is convex), or
- $g$ is concave, $h$ is convex, and $\tilde h$ is nonincreasing (e.g. $1/g(x)$ is convex when $g$ is concave and positive).
## III. Duality
### 1. The Lagrangian
We consider the **primal** problem:
$\min f_0(x)$
$s.t. f_i(x) \leq 0, i=1,...,m$
$h_i(x) = 0, i=1,...,p$
variable $x \in R^n$, domain $D = \bigcap^m_{i=0} dom f_i \cap \bigcap^p_{i=1} dom h_i$, optimal value $p^*$.
Then, we define the **Lagrangian** $L: R^n \times R^m \times R^p \to R$, with $dom L = D \times R^m \times R^p$:
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$$
$\lambda_i$ and $\nu_i$ are called Lagrange multipliers.
### 2. Lagrangian dual function
Lagrangian dual function: $g: R^m \times R^p \to R$,
$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) = \inf_{x \in D}\left(f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)\right)$$
Since the dual function is the pointwise infimum of a family of affine functions of $(\lambda, \nu)$, it is always concave, even when the primal problem is not convex.
**Theorem** (stated without proof):
For any $\lambda \geq 0$ and any $\nu$, we have $g(\lambda, \nu) \leq p^*$.
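As a standard worked example (not from the original notes), take the least-norm problem $\min x^Tx$ s.t. $Ax=b$. Its Lagrangian is $L(x, \nu) = x^Tx + \nu^T(Ax-b)$; setting $\nabla_x L = 2x + A^T\nu = 0$ gives $x = -\frac{1}{2}A^T\nu$, and substituting back yields
$$g(\nu) = -\frac{1}{4}\nu^T A A^T \nu - b^T\nu \leq p^*,$$
a concave quadratic that lower-bounds $p^*$ for every $\nu$.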
### 3. The dual problem
$\max g(\lambda, \nu)$
$s.t. \lambda \geq 0$
- Find the best lower bound on $p^*$
- A convex optimization problem, optimal value denoted $d^*$
- $(\lambda, \nu)$ is dual feasible if $\lambda \geq 0$ and $(\lambda,\nu) \in dom g$
- $(\lambda^*, \nu^*)$ are called the dual optimal (Lagrange) multipliers.
- $p^* - d^*$ is called the **optimal duality gap**.
### 4. Weak and strong duality
- Weak duality: $d^* \leq p^*$, which is always true.
- Strong duality: $d^* = p^*$, which does not hold in general.
### 5. KKT conditions
If strong duality holds, $x^*$ is primal optimal, $(\lambda^*, \nu^*)$ is dual optimal, and the $f_i, h_i$ are differentiable, then the four conditions below must hold:
1. Primal constraints: $f_i(x^*) \leq 0, i=1,...,m$ and $h_i(x^*) = 0, i=1,...,p$
2. Dual constraints: $\lambda^* \geq 0$
3. Complementary slackness: $\lambda_i^* f_i(x^*) = 0, i=1,...,m$
4. The gradient of the Lagrangian with respect to $x$ vanishes:
$$\nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i(x^*) = 0$$
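A small worked example (not from the original notes): minimize $x^2$ subject to $1 - x \leq 0$. The KKT conditions are $1 - x^* \leq 0$, $\lambda^* \geq 0$, $\lambda^*(1 - x^*) = 0$, and $2x^* - \lambda^* = 0$. If $\lambda^* = 0$ then $x^* = 0$, which violates the primal constraint, so $x^* = 1$ and $\lambda^* = 2$, which is indeed the constrained minimizer.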
## IV. Unconstrained minimization
We will discuss methods for solving the unconstrained optimization problem
$$\min f(x)$$
where $f: R^n \to R$ is convex and twice differentiable; the optimal value $p^* = \inf_x f(x)$ is assumed to be attained and finite.
### 1. Descent methods
The method produces a minimizing sequence $x^{(k)}, k=1,...$, where:
$x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}$

- $\Delta x^{(k)}$ is the descent direction, i.e. $f(x^{(k+1)}) < f(x^{(k)})$
- $t^{(k)} > 0$ is the step size
- Since $f$ is convex, a descent direction must satisfy $\nabla f(x^{(k)})^T \Delta x^{(k)} < 0$
#### a. Exact line search
One line search method sometimes used in practice is exact line search, in which $t$ is chosen to minimize $f$ along the ray $\{x + t\Delta x| t \geq 0\}$:
$$t = argmin_{s \geq 0}f(x+ s\Delta x)$$
#### b. Backtracking line search
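A common version (with parameters $\alpha \in (0, 0.5)$ and $\beta \in (0,1)$) starts with $t=1$ and repeats $t := \beta t$ until $f(x + t\Delta x) \leq f(x) + \alpha t \nabla f(x)^T \Delta x$. A minimal Python sketch (the function name and signature are illustrative, not a fixed API):
```python
import numpy as np

def backtracking_line_search(f, grad_f, x, dx, alpha=0.1, beta=0.7):
    """Backtracking line search (common textbook variant).

    Shrinks t until the sufficient-decrease condition
        f(x + t*dx) <= f(x) + alpha * t * grad_f(x)^T dx
    holds; alpha in (0, 0.5) and beta in (0, 1) are typical choices.
    """
    t = 1.0
    fx, slope = f(x), grad_f(x) @ dx   # slope < 0 for a descent direction
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t
```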

### 2. Gradient descent methods
A natural choice for the search direction is the negative gradient $\Delta x = -\nabla f(x)$.

- Stopping criterion: $\|\nabla f(x) \|_2 \leq \epsilon$
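A minimal sketch of gradient descent with the stopping criterion above (it reuses the `backtracking_line_search` sketch from the previous section; the quadratic test problem is just an illustration):
```python
import numpy as np

def gradient_descent(f, grad_f, x0, eps=1e-6, max_iter=10000):
    """Gradient descent: repeat x := x + t * (-grad f(x)) until ||grad f(x)||_2 <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:              # stopping criterion
            break
        dx = -g                                   # negative gradient direction
        t = backtracking_line_search(f, grad_f, x, dx)
        x = x + t * dx
    return x

# Example: minimize the convex quadratic f(x) = 0.5 x^T P x + q^T x.
P = np.array([[3.0, 0.5], [0.5, 1.0]])
q = np.array([-1.0, 2.0])
f = lambda x: 0.5 * x @ P @ x + q @ x
grad_f = lambda x: P @ x + q
print(gradient_descent(f, grad_f, np.zeros(2)))   # approaches -P^{-1} q
```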
### 3. Newton's method
- Newton step $\Delta x_{nt} =- \nabla^2 f(x)^{-1} \nabla f(x)$
- Newton decrement $\lambda(x) = (\nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x))^{1/2}$ is a measure of proximity of $x$ to $x^*$.
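A minimal sketch of the damped Newton method (again reusing the backtracking helper above; stopping on $\lambda(x)^2/2 \leq \epsilon$ as in the usual presentation):
```python
import numpy as np

def newton_method(f, grad_f, hess_f, x0, eps=1e-8, max_iter=50):
    """Damped Newton's method with backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad_f(x), hess_f(x)
        dx = np.linalg.solve(H, -g)     # Newton step: -H^{-1} grad f(x)
        lambda_sq = -g @ dx             # Newton decrement squared: g^T H^{-1} g
        if lambda_sq / 2.0 <= eps:      # stopping criterion
            break
        t = backtracking_line_search(f, grad_f, x, dx)
        x = x + t * dx
    return x
```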

## V. Equality constrained minimization
We will discuss the method for solving a convex optimization problem with equality constraints,
$$\min f(x) \\ s.t. Ax=b$$
where $f: R^n \to R$ is convex and twice continuously differentiable, and $A \in R^{p \times n}$ with $rank A = p$. Assume that $p^*$ is finite and attained.
Recalling the Lagrangian, we have the optimality condition: $x^*$ is optimal iff $\exists \nu^*$ such that
$\nabla f(x^*) + A^T \nu^*= 0, Ax^*=b$
### 1. Equality constrained convex quadratic minimization
$\min f(x) = (1/2)x^TPx + q^Tx+ r$ (with $P \in S^n_+$)
$s.t. Ax=b$
Then, the optimality conditions are:
$Ax^* = b; Px^* + q + A^T\nu^* = 0$
which we can write in matrix form:
$$
\begin{bmatrix}
P & A^T \\
A & 0
\end{bmatrix}
\begin{bmatrix}
x^* \\
\nu^*
\end{bmatrix} =
\begin{bmatrix}
-q \\
b
\end{bmatrix}
$$
- The coefficient matrix is called the **KKT matrix**
- The KKT matrix is nonsingular if
$Ax = 0, x \neq 0 \Rightarrow x^TPx > 0$,
or equivalently (given $P \in S^n_+$), $P + A^TA \succ 0$
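A small numerical sketch solving the KKT system directly with numpy (the problem data is purely illustrative):
```python
import numpy as np

# Equality constrained QP:  min 0.5 x^T P x + q^T x + r   s.t.  A x = b
P = np.array([[4.0, 1.0], [1.0, 2.0]])   # example P in S^2_{++}
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])               # single equality constraint
b = np.array([1.0])

n, p = P.shape[0], A.shape[0]
# Assemble and solve the KKT system  [P A^T; A 0] [x; nu] = [-q; b]
KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(KKT, rhs)
x_star, nu_star = sol[:n], sol[n:]
print("x* =", x_star, " nu* =", nu_star)
# Check the optimality conditions: A x* = b and P x* + q + A^T nu* = 0
print(np.allclose(A @ x_star, b), np.allclose(P @ x_star + q + A.T @ nu_star, 0))
```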
### 2. Newton's method with equality constraints
To derive the Newton step $\Delta x_{nt}$ for the equality constrained problem:
$\min f(x)$
$s.t. Ax = b$
at a feasible point $x$, we replace the objective with its second-order Taylor (not Swift) approximation near $x$ to form the problem:
$\min \hat f(x+v) = f(x) + \nabla f(x)^T v + (1/2)v^T \nabla^2 f(x) v$
$s.t. A(x+v) = b$, with variable $v$
We define $\Delta x_{nt}$, the Newton step at $x$, as the solution of this convex quadratic problem, assuming the associated KKT matrix is nonsingular. The optimality conditions can be written in matrix form:
$$
\begin{bmatrix}
\nabla^2 f(x) & A^T \\
A & 0
\end{bmatrix}
\begin{bmatrix}
\Delta x_{nt} \\
w
\end{bmatrix} =
\begin{bmatrix}
-\nabla f(x) \\
0
\end{bmatrix}
$$
where $w$ is the associated optimal dual variable ($\nu^*$) for the quadratic problem.
- Newton decrement
$\lambda(x) = (\Delta x_{nt}^T \nabla^2 f(x) \Delta x_{nt})^{1/2} = \|\nabla^2 f(x)^{1/2} \Delta x_{nt} \|_2$
- Newton's method with equality constraints has the same outline as in the unconstrained case, except that the Newton step is computed from the KKT system above and the initial point must be feasible; all iterates then remain feasible, so the method is called a *feasible descent method*.
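A compact sketch of this feasible-start variant (it assumes $Ax^{(0)} = b$ and reuses the `backtracking_line_search` helper from Section IV; names are illustrative):
```python
import numpy as np

def newton_equality(f, grad_f, hess_f, A, b, x0, eps=1e-8, max_iter=50):
    """Newton's method for  min f(x) s.t. Ax = b,  starting from a feasible x0."""
    x = np.asarray(x0, dtype=float)
    p = A.shape[0]
    for _ in range(max_iter):
        g, H = grad_f(x), hess_f(x)
        # Solve the KKT system  [H A^T; A 0] [dx; w] = [-g; 0]
        KKT = np.block([[H, A.T], [A, np.zeros((p, p))]])
        rhs = np.concatenate([-g, np.zeros(p)])
        dx = np.linalg.solve(KKT, rhs)[: x.size]
        lambda_sq = dx @ H @ dx          # Newton decrement squared
        if lambda_sq / 2.0 <= eps:       # stopping criterion
            break
        t = backtracking_line_search(f, grad_f, x, dx)
        x = x + t * dx                   # A dx = 0, so feasibility is preserved
    return x
```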

### 3. Newton's method with infeasible start
$$
\begin{bmatrix}
\nabla^2 f(x) & A^T \\
A & 0
\end{bmatrix}
\begin{bmatrix}
\Delta x_{nt} \\
w
\end{bmatrix} =
\begin{bmatrix}
-\nabla f(x) \\
-Ax + b
\end{bmatrix}
$$
We can develop an extension of Newton's method that uses the Newton step $\Delta x_{nt}$ defined by this system, starting from a point $x^{(0)} \in dom f$ that does not necessarily satisfy $Ax^{(0)}=b$; the second block of the right-hand side, $b - Ax$, is the residual of the equality constraints.

## VI. Interior-point methods
The method is for solving convex optimization problems that include inequality constraints,
$$\min f_0(x) \\ s.t. f_i(x) \leq 0, i = 1,...,m \\ Ax = b$$
where $f_i: R^n \to R$ are convex and twice differentiable, and $A \in R^{p \times n}$ with $rank(A)=p < n$. Assume that $p^*$ is finite and attained, and that the problem is **strictly feasible**; hence strong duality holds.
### 1. Logarithmic barrier
- Reformulate using indicator function:
$$\min f_0(x) + \sum_{i=1}^m I_-(f_i(x)) \\ s.t. Ax = b$$
where $I_-(u) = 0$ if $u \leq 0$ and $I_-(u) = \infty$ otherwise.
- We approximate the above problem using logarithmic barrier
$$\min f_0(x) - \frac{1}{t} \sum_{i=1}^m \log(-f_i(x)) \\ s.t. Ax = b$$
which is an equality constrained problem; the logarithmic barrier is a smooth approximation of the indicator function, and the approximation improves as $t \to \infty$.

### 2. Logarithmic barrier function
We define $\phi(x) = - \sum_{i=1}^m \log(-f_i(x))$, with $dom \phi = \{x | f_i(x) < 0, i=1,...,m\}$. The function is convex ($\nabla^2 \phi(x) \succeq 0$).
### 3. Central path
For $t>0$, define $x^*(t)$ to be the solution of
$$\min tf_0(x) + \phi(x) \\ s.t. Ax=b$$
The **central path** is $\{x^*(t) | t>0 \}$.
### 4. The barrier method
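A standard outline of the barrier method (stated without derivation): given a strictly feasible $x$, $t := t^{(0)} > 0$, $\mu > 1$, and tolerance $\epsilon > 0$, repeat:
1. *Centering step*: compute $x^*(t)$ by minimizing $tf_0 + \phi$ subject to $Ax=b$, starting at $x$ (e.g. with Newton's method from Section V).
2. *Update*: $x := x^*(t)$.
3. *Stopping criterion*: quit if $m/t < \epsilon$.
4. *Increase* $t$: $t := \mu t$.

The stopping criterion relies on the fact that $x^*(t)$ is $m/t$-suboptimal: $f_0(x^*(t)) - p^* \leq m/t$.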
