---
tags: math, convex optimization
---
# Convex optimization
## I. Convex set
### 1. Convex set
A set $C$ is called convex if:
$$x, y \in C \Rightarrow \theta x + (1-\theta)y \in C, \forall \theta \in [0,1]$$
In other words, a set $C$ is convex if the line segment between any two points in $C$ lies in $C$.
### 2. Convex combination
A convex combination of the points $x_1, .., x_k$ is a point of the form
$$\theta_1 x_1 + ...+ \theta_k x_k, $$
where $\theta_1 +...+\theta_k = 1$ and $\theta_i \geq 0$.
A set is convex iff it contains every convex combination of its points.
### 3. Convex hull
The **convex hull** of a set $C$ is the set of all convex combinations of points in $C$.

### 4. Convex cone
A set $C$ is called a **cone** if $x \in C \Rightarrow \theta x \in C, \forall \theta \geq 0$.
A set $C$ is called a **convex cone** if it is convex and a cone, i.e,
$$x_1, x_2 \in C \Rightarrow \theta_1 x_1 + \theta_2 x_2 \in C, \forall \theta_1, \theta_2 \geq 0$$
A point $\sum{\theta_i x_i}$ with $\theta_i \geq 0$ is called a **conic combination** of $x_1, ..., x_k$.
The **conic hull** of a set $C$ is the set of all conic combinations of points in $C$.

### 5. Hyperplanes and halfspaces
A **hyperplane** is a set of the form $\{x \in \mathbb{R}^n | a^Tx = b\}$ where $a \neq 0, b \in \mathbb{R}$.
A **halfspace** is a set of the form $\{x \in \mathbb{R}^n | a^Tx \leq b\}$ where $a \neq 0, b \in \mathbb{R}$.
Hyperplanes and halfspaces are convex.


### 6. Euclidean balls and ellipsoids
Euclidean ball in $\mathbb{R}^n$ with center $x_c$ and radius $r$:
$$B(x_c, r) = \{x| \|x-x_c\|_2 \leq r \} = \{ x_c + ru | \|u\|_2 \leq 1 \}$$
Ellipsoid in $\mathbb{R}^n$
$$\mathcal{E} = \{x | (x-x_c)^T P^{-1} (x-x_c) \leq 1\}$$
where $P \in S_{++}^n$ is a symmetric positive definite matrix.
### 7. Norm balls and norm cones
Norm ball with center $x_c$ and radius $r$: $\{x | \|x-x_c\| \leq r \}$
Norm cone: $C = \{(x,t) | \|x\| \leq t \} \subseteq \mathbb{R}^{n+1}$
### 8. Positive semidefinite cone
$S^n$: the set of symmetric $n \times n$ matrices.
$S^n_{+} = \{X \in S^n | X \succeq 0 \}$: symmetric positive semidefinite matrices
$S^n_{++} = \{X \in S^n | X \succ 0 \}$: symmetric positive definite matrices.
### 9. Operations that preserve convexity
#### a. Intersection
If $S_\alpha$ is convex for every $\alpha \in A$, then $\bigcap_{\alpha \in A} S_\alpha$ is convex.
#### b. Affine function
Suppose $f: R^n \rightarrow R^m$ is an affine function ($f(x) = Ax +b$). Then
- The image of a convex set under $f$ is convex
$S \subseteq R^n$ is convex $\Rightarrow f(S) = \{f(x) | x \in S \}$ is convex.
- The inverse image of a convex set under $f$ is convex
$B \subseteq R^m$ is convex $\Rightarrow f^{-1}(B) = \{x | f(x) \in B \}$ is convex.
#### c. Perspective and linear-fractional function
- Perspective function: $P: R^{n+1} \to R^n$:
$$P(x, t) = \frac{x}{t}, \quad dom P = \{(x, t) | t >0 \}$$
Images and inverse images of convex sets under $P$ are convex.
- Linear-fractional function: $f: R^n \to R^m$:
$$f(x) = \frac{Ax+b}{c^Tx+d}, \quad dom f = \{x | c^Tx + d >0 \}$$
Images and inverse images of convex sets under $f$ are convex.
## II. Convex function
### 1. Definition
$f: R^n \to R$ is convex if $dom f$ is a convex set and
$$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y)$$
for all $x, y \in dom f$ and all $\theta \in [0,1]$.
The definition gives a clear geometric picture (the chord between any two points on the graph lies above the graph), but in practice it is often hard to prove convexity directly from the definition.
- If $f$ is convex, $-f$ is concave.
- $f$ is strictly convex if the inequality is strict whenever $x \neq y$ and $\theta \in (0,1)$.

### 2. Convex function
#### a. Examples on $R$
- Affine: $ax+b$
- Exponential: $e^{\alpha x}, \forall \alpha \in R$.
- Powers:
$x^a$ on $R_{++}$, for $a \geq 1$ or $a \leq 0$
$-x^a$ on $R_{++}$, for $0 \leq a \leq 1$
- Powers of absolute value: $|x|^p$ on $R$, $p \geq 1$
- Negative log: $-\log x$ on $R_{++}$
- $x\log x$ on $R_{++}$
#### b. Examples on $R^n$
- Every norm on $R^n$ is convex.
- Max function: $f(x) = \max\{x_1,...,x_n\}$ is convex.
- Quadratic-over-linear function: $f(x, y) = \frac{x^2}{y}$ is convex on $\{(x,y) | y > 0\}$
- Log-sum-exp: $f(x) = \log(e^{x_1}+...+e^{x_n})$ is convex on $R^n$
- Geometric mean: $f(x) = (\prod_{i=1}^n x_i)^{1/n}$ is concave on $dom f = R^n_{++}$
- Log determinant: $f(X) = \log\det X$ is concave on $dom f=S^n_{++}$
### 3. First-order conditions
If $f: R^n \to R$ is differentiable, then f is convex iff $domf$ is convex and:
$$f(y) \geq f(x) + \nabla f(x)^T(y-x), \forall x,y \in domf$$
### 4. Second-order conditions
If $f: R^n \to R$ is twice differentiable, then $f$ is convex iff $dom f$ is convex and its Hessian is positive semidefinite:
$$\nabla^2 f(x) \succeq 0, \forall x \in dom f$$
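For a quadratic $f(x) = \frac{1}{2}x^TPx + q^Tx$ the Hessian is the constant matrix $P$, so the second-order condition reduces to checking that $P$ is positive semidefinite. A minimal numerical sketch (the matrix below is purely illustrative):
```python
import numpy as np

# Second-order condition for a quadratic f(x) = 0.5 x^T P x + q^T x:
# the Hessian is the constant matrix P, so check its eigenvalues.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # symmetric example matrix (illustrative)

eigvals = np.linalg.eigvalsh(P)       # eigenvalues of a symmetric matrix
print("eigenvalues:", eigvals)
print("convex (P PSD):", bool(np.all(eigvals >= 0)))
print("strictly convex (P PD):", bool(np.all(eigvals > 0)))
```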
### 5. Operations preserving convexity
#### a. Positive weighted sum and composition with affine function
- $f$ is convex $\Rightarrow \alpha f$ is convex $\forall \alpha \geq 0$
- $f_1, f_2$ are convex $\Rightarrow f_1 + f_2$ is convex (extends to infinite sums and integrals)
- $f$ is convex $\Rightarrow f(Ax+b)$ is convex
#### b. Pointwise maximum
If $f_1, ..., f_m$ are convex, then $f(x) = \max\{f_1(x), ..., f_m(x)\}$ is convex
#### c. Pointwise supremum
If $f(x,y)$ is convex in $x$ for each $y \in A$, then $g(x) = \sup_{y \in A}f(x,y)$ is convex
#### d. Minimization
If $f(x,y)$ is convex in $(x, y)$ and $C$ is a convex nonempty set, then $g(x) = \inf_{y \in C} f(x,y)$ is convex
#### e. Composition with scalar functions (similar rules hold for vector compositions)
Composition of $g: R^n \to R$ and $h: R \to R$:
$$f(x) = h(g(x))$$
$f$ is convex if ($\tilde h$ denotes the extended-value extension of $h$)
- $g, h$ are convex and $\tilde h$ is nondecreasing (e.g. $e^{g(x)}$ is convex when $g$ is convex), or
- $g$ is concave, $h$ is convex, and $\tilde h$ is nonincreasing (e.g. $1/g(x)$ is convex when $g$ is concave and positive).
## III. Duality
### 1. The Lagrangian
We consider the **primal** problem:
$\min f_0(x)$
$s.t. f_i(x) \leq 0, i=1,...,m$
$h_i(x) = 0, i=1,...,p$
variable $x \in R^n$, domain $D = \bigcap^m_{i=0} dom f_i \cap \bigcap^p_{i=1} dom h_i$, optimal value $p^*$.
Then, we define the **Lagrangian** $L: R^n \times R^m \times R^p \to R$, with $dom L = D \times R^m \times R^p$:
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$$
$\lambda_i$ and $\nu_i$ are called Lagrange multipliers.
### 2. Lagrangian dual function
Lagrangian dual function: $g: R^m \times R^p \to R$,
$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) = \inf_{x \in D}\left(f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)\right)$$
Since the dual function is the pointwise infimum of a family of affine functions of $(\lambda, \nu)$, it is always concave, even when the primal problem is not convex.
**Theorem** (stated without proof):
For any $\lambda \geq 0$ and any $\nu$, we have $g(\lambda, \nu) \leq p^*$.
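As a standard worked example (not from the original notes), take the least-norm problem $\min x^Tx$ s.t. $Ax=b$. Its Lagrangian is $L(x, \nu) = x^Tx + \nu^T(Ax-b)$; setting $\nabla_x L = 2x + A^T\nu = 0$ gives $x = -\frac{1}{2}A^T\nu$, and substituting back yields
$$g(\nu) = -\frac{1}{4}\nu^T A A^T \nu - b^T\nu \leq p^*,$$
a concave quadratic that lower-bounds $p^*$ for every $\nu$.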
### 3. The dual problem
$\max g(\lambda, \nu)$
$s.t. \lambda \geq 0$
- Find the best lower bound on $p^*$
- A convex optimization problem, optimal value denoted $d^*$
- $(\lambda, \nu)$ is dual feasible if $\lambda \geq 0$ and $(\lambda,\nu) \in dom g$
- $(\lambda^*, \nu^*)$ are called the dual optimal (Lagrange) multipliers.
- $p^* - d^*$ is called the **optimal duality gap**.
### 4. Weak and strong duality
- Weak duality: $d^* \leq p^*$, which is always true.
- Strong duality: $d^* = p^*$, which does not hold in general.
### 5. KKT conditions
If strong duality holds, $x^*$ is primal optimal, $(\lambda^*, \nu^*)$ is dual optimal, and the $f_i, h_i$ are differentiable, then the four conditions below must hold:
1. Primal constraints: $f_i(x^*) \leq 0, i=1,...,m$ and $h_i(x^*) = 0, i=1,...,p$
2. Dual constraints: $\lambda^* \geq 0$
3. Complementary slackness: $\lambda_i^* f_i(x^*) = 0, i=1,...,m$
4. The gradient of the Lagrangian with respect to $x$ vanishes:
$$\nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i(x^*) = 0$$
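A small worked example (not from the original notes): minimize $x^2$ subject to $1 - x \leq 0$. The KKT conditions are $1 - x^* \leq 0$, $\lambda^* \geq 0$, $\lambda^*(1 - x^*) = 0$, and $2x^* - \lambda^* = 0$. If $\lambda^* = 0$ then $x^* = 0$, which violates the primal constraint, so $x^* = 1$ and $\lambda^* = 2$, which is indeed the constrained minimizer.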
## IV. Unconstrained minimization
We will discuss methods for solving the unconstrained optimization problem
$$\min f(x)$$
where $f: R^n \to R$ is convex and twice differentiable; the optimal value $p^* = \inf_x f(x)$ is assumed to be attained and finite.
### 1. Descent methods
The method produces a minimizing sequence $x^{(k)}, k=1,...$, where:
$x^{(k+1)} = x^{(k)} + t^{(k)} \Delta x^{(k)}$

- $\Delta x^{(k)}$ is the descent direction, i.e. $f(x^{(k+1)}) < f(x^{(k)})$
- $t^{(k)} > 0$ is the step size
- Since $f$ is convex, a descent direction must satisfy $\nabla f(x^{(k)})^T \Delta x^{(k)} < 0$
#### a. Exact line search
One line search method sometimes used in practice is exact line search, in which $t$ is chosen to minimize $f$ along the ray $\{x + t\Delta x| t \geq 0\}$:
$$t = argmin_{s \geq 0}f(x+ s\Delta x)$$
#### b. Backtracking line search
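A common version (with parameters $\alpha \in (0, 0.5)$ and $\beta \in (0,1)$) starts with $t=1$ and repeats $t := \beta t$ until $f(x + t\Delta x) \leq f(x) + \alpha t \nabla f(x)^T \Delta x$. A minimal Python sketch (the function name and signature are illustrative, not a fixed API):
```python
import numpy as np

def backtracking_line_search(f, grad_f, x, dx, alpha=0.1, beta=0.7):
    """Backtracking line search (common textbook variant).

    Shrinks t until the sufficient-decrease condition
        f(x + t*dx) <= f(x) + alpha * t * grad_f(x)^T dx
    holds; alpha in (0, 0.5) and beta in (0, 1) are typical choices.
    """
    t = 1.0
    fx, slope = f(x), grad_f(x) @ dx   # slope < 0 for a descent direction
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t
```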

### 2. Gradient descent methods
A natural choice for the search direction is the negative gradient $\Delta x = -\nabla f(x)$.

- Stopping criterion: $\|\nabla f(x) \|_2 \leq \epsilon$
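A minimal sketch of gradient descent with the stopping criterion above (it reuses the `backtracking_line_search` sketch from the previous section; the quadratic test problem is just an illustration):
```python
import numpy as np

def gradient_descent(f, grad_f, x0, eps=1e-6, max_iter=10000):
    """Gradient descent: repeat x := x + t * (-grad f(x)) until ||grad f(x)||_2 <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:              # stopping criterion
            break
        dx = -g                                   # negative gradient direction
        t = backtracking_line_search(f, grad_f, x, dx)
        x = x + t * dx
    return x

# Example: minimize the convex quadratic f(x) = 0.5 x^T P x + q^T x.
P = np.array([[3.0, 0.5], [0.5, 1.0]])
q = np.array([-1.0, 2.0])
f = lambda x: 0.5 * x @ P @ x + q @ x
grad_f = lambda x: P @ x + q
print(gradient_descent(f, grad_f, np.zeros(2)))   # approaches -P^{-1} q
```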
### 3. Newton's method
- Newton step $\Delta x_{nt} =- \nabla^2 f(x)^{-1} \nabla f(x)$
- Newton decrement $\lambda(x) = (\nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x))^{1/2}$ is a measure of proximity of $x$ to $x^*$.
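A minimal sketch of the damped Newton method (again reusing the backtracking helper above; stopping on $\lambda(x)^2/2 \leq \epsilon$ as in the usual presentation):
```python
import numpy as np

def newton_method(f, grad_f, hess_f, x0, eps=1e-8, max_iter=50):
    """Damped Newton's method with backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g, H = grad_f(x), hess_f(x)
        dx = np.linalg.solve(H, -g)     # Newton step: -H^{-1} grad f(x)
        lambda_sq = -g @ dx             # Newton decrement squared: g^T H^{-1} g
        if lambda_sq / 2.0 <= eps:      # stopping criterion
            break
        t = backtracking_line_search(f, grad_f, x, dx)
        x = x + t * dx
    return x
```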

## V. Equality constrained minimization
We will discuss the method for solving a convex optimization problem with equality constraints,
$$\min f(x) \\ s.t. Ax=b$$
where $f: R^n \to R$ is convex and twice continuously differentiable, and $A \in R^{p \times n}$ with $rank A = p$. Assume that $p^*$ is finite and attained.
Recalling the Lagrangian, we have the optimality condition: $x^*$ is optimal iff $\exists \nu^*$ such that
$\nabla f(x^*) + A^T \nu^*= 0, Ax^*=b$
### 1. Equality constrained convex quadratic minimization
$\min f(x) = (1/2)x^TPx + q^Tx+ r$ (with $P \in S^n_+$)
$s.t. Ax=b$
Then, the optimality conditions are:
$Ax^* = b; Px^* + q + A^T\nu^* = 0$
which we can write in matrix form:
$$
\begin{bmatrix}
P & A^T \\
A & 0
\end{bmatrix}
\begin{bmatrix}
x^* \\
\nu^*
\end{bmatrix} =
\begin{bmatrix}
-q \\
b
\end{bmatrix}
$$
- The coefficient matrix is called the **KKT matrix**
- The KKT matrix is nonsingular if
$Ax = 0, x \neq 0 \Rightarrow x^TPx > 0$,
or equivalently (given $P \in S^n_+$), $P + A^TA \succ 0$
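A small numerical sketch solving the KKT system directly with numpy (the problem data is purely illustrative):
```python
import numpy as np

# Equality constrained QP:  min 0.5 x^T P x + q^T x + r   s.t.  A x = b
P = np.array([[4.0, 1.0], [1.0, 2.0]])   # example P in S^2_{++}
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])               # single equality constraint
b = np.array([1.0])

n, p = P.shape[0], A.shape[0]
# Assemble and solve the KKT system  [P A^T; A 0] [x; nu] = [-q; b]
KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(KKT, rhs)
x_star, nu_star = sol[:n], sol[n:]
print("x* =", x_star, " nu* =", nu_star)
# Check the optimality conditions: A x* = b and P x* + q + A^T nu* = 0
print(np.allclose(A @ x_star, b), np.allclose(P @ x_star + q + A.T @ nu_star, 0))
```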
### 2. Newton's method with equality constraints
To derive the Newton step $\Delta x_{nt}$ for the equality constrained problem:
$\min f(x)$
$s.t. Ax = b$
at a feasible point $x$, we replace the objective with its second-order Taylor (not Swift) approximation near $x$ to form the problem:
$\min \hat f(x+v) = f(x) + \nabla f(x)^T v + (1/2)v^T \nabla^2 f(x) v$
$s.t. A(x+v) = b$, with variable $v$
We define $\Delta x_{nt}$, the Newton step at $x$, as the solution of this convex quadratic problem, assuming the associated KKT matrix is nonsingular. The optimality conditions can be written in matrix form:
$$
\begin{bmatrix}
\nabla^2 f(x) & A^T \\
A & 0
\end{bmatrix}
\begin{bmatrix}
\Delta x_{nt} \\
w
\end{bmatrix} =
\begin{bmatrix}
-\nabla f(x) \\
0
\end{bmatrix}
$$
where $w$ is the associated optimal dual variable ($\nu^*$) for the quadratic problem.
- Newton decrement
$\lambda(x) = (\Delta x_{nt}^T \nabla^2 f(x) \Delta x_{nt})^{1/2} = \|\nabla^2 f(x)^{1/2} \Delta x_{nt} \|_2$
- Newton's method with equality constraints has the same outline as in the unconstrained case, except that the Newton step is computed from the KKT system above and the initial point must be feasible; all iterates then remain feasible, so the method is called a *feasible descent method*.
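A compact sketch of this feasible-start variant (it assumes $Ax^{(0)} = b$ and reuses the `backtracking_line_search` helper from Section IV; names are illustrative):
```python
import numpy as np

def newton_equality(f, grad_f, hess_f, A, b, x0, eps=1e-8, max_iter=50):
    """Newton's method for  min f(x) s.t. Ax = b,  starting from a feasible x0."""
    x = np.asarray(x0, dtype=float)
    p = A.shape[0]
    for _ in range(max_iter):
        g, H = grad_f(x), hess_f(x)
        # Solve the KKT system  [H A^T; A 0] [dx; w] = [-g; 0]
        KKT = np.block([[H, A.T], [A, np.zeros((p, p))]])
        rhs = np.concatenate([-g, np.zeros(p)])
        dx = np.linalg.solve(KKT, rhs)[: x.size]
        lambda_sq = dx @ H @ dx          # Newton decrement squared
        if lambda_sq / 2.0 <= eps:       # stopping criterion
            break
        t = backtracking_line_search(f, grad_f, x, dx)
        x = x + t * dx                   # A dx = 0, so feasibility is preserved
    return x
```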

### 3. Newton's method with infeasible start
$$
\begin{bmatrix}
\nabla^2 f(x) & A^T \\
A & 0
\end{bmatrix}
\begin{bmatrix}
\Delta x_{nt} \\
w
\end{bmatrix} =
\begin{bmatrix}
-\nabla f(x) \\
-Ax + b
\end{bmatrix}
$$
We can develop an extension of Newton's method that uses the Newton step $\Delta x_{nt}$ defined by this system, starting from a point $x^{(0)} \in dom f$ that does not necessarily satisfy $Ax^{(0)}=b$; the second block of the right-hand side, $b - Ax$, is the residual of the equality constraints.

## VI. Interior-point methods
The method is for solving convex optimization problems that include inequality constraints,
$$\min f_0(x) \\ s.t. f_i(x) \leq 0, i = 1,...,m \\ Ax = b$$
where $f_i: R^n \to R$ are convex and twice differentiable, and $A \in R^{p \times n}$ with $rank(A)=p < n$. Assume that $p^*$ is finite and attained, and that the problem is **strictly feasible**; hence strong duality holds.
### 1. Logarithmic barrier
- Reformulate using indicator function:
$$\min f_0(x) + \sum_{i=1}^m I_-(f_i(x)) \\ s.t. Ax = b$$
where $I_-(u) = 0$ if $u \leq 0$ and $I_-(u) = \infty$ otherwise.
- We approximate the above problem using logarithmic barrier
$$\min f_0(x) - \frac{1}{t} \sum_{i=1}^m \log(-f_i(x)) \\ s.t. Ax = b$$
which is an equality constrained problem; the logarithmic barrier is a smooth approximation of the indicator function, and the approximation improves as $t \to \infty$.

### 2. Logarithmic barrier function
We define $\phi(x) = - \sum_{i=1}^m \log(-f_i(x))$, with $dom \phi = \{x | f_i(x) < 0, i=1,...,m\}$. The function is convex ($\nabla^2 \phi(x) \succeq 0$).
### 3. Central path
For $t>0$, define $x^*(t)$ to be the solution of
$$\min tf_0(x) + \phi(x) \\ s.t. Ax=b$$
The **central path** is $\{x^*(t) | t>0 \}$.
### 4. The barrier method
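A standard outline of the barrier method (stated without derivation): given a strictly feasible $x$, $t := t^{(0)} > 0$, $\mu > 1$, and tolerance $\epsilon > 0$, repeat:
1. *Centering step*: compute $x^*(t)$ by minimizing $tf_0 + \phi$ subject to $Ax=b$, starting at $x$ (e.g. with Newton's method from Section V).
2. *Update*: $x := x^*(t)$.
3. *Stopping criterion*: quit if $m/t < \epsilon$.
4. *Increase* $t$: $t := \mu t$.

The stopping criterion relies on the fact that $x^*(t)$ is $m/t$-suboptimal: $f_0(x^*(t)) - p^* \leq m/t$.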
