# Chapter 4.1. Static Optimization Methods
### 4.1.1. Unconstrained Static Optimization
Suppose we have an (at least twice-differentiable) objective function of $n$ variables $$\max_{\mathbf{x} = \left(x_{1}, \ldots, x_{n}\right) \in \mathbb{R}^{n}} f\left(x_{1},\ldots,x_{n}\right). \tag{4.1a}$$ The first-order conditions when maximizing $f$ are:
\begin{align}
f_{1} &= \dfrac{\partial f}{\partial x_{1}} &= 0, \\
f_{2} &= \dfrac{\partial f}{\partial x_{2}} &= 0, \tag{4.1b}\\
\vdots & & \vdots \\
f_{n} &= \dfrac{\partial f}{\partial x_{n}} &= 0.
\end{align} The ==Hessian matrix== associated to our objective function $f$ is
$$Hf = \begin{bmatrix}
f_{11} & f_{12} & \cdots & \cdots & f_{1n} \\
f_{21} & f_{22} & \cdots & \cdots & f_{2n} \\
\vdots & \vdots & \ddots & & \vdots \\
\vdots & \vdots & & \ddots & \vdots \\
f_{n1} & f_{n2} & \cdots & \cdots & f_{nn}
\end{bmatrix},$$ where for $i,j=1,2,\ldots,n$, $f_{ij} = \dfrac{\partial^{2} f}{\partial x_{i}\partial x_{j}}$. The second-order condition for maximization of the objective function $f$ requires that $Hf$ be a ==negative definite matrix==, i.e., $(-1)^{k}M_{k} > 0$ for every $k=1,2,\ldots,n$, where $M_{k}$ is the ==$k$-th order leading principal minor== of $Hf$, given by
\begin{align*}
M_{1} &= f_{11} \\
M_{2} &= \left| \begin{array}{cc}
f_{11} & f_{12} \\
f_{21} & f_{22}
\end{array} \right| \\
&\vdots \\
M_{n} &= |Hf|
\end{align*}
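As a quick numerical check (a sketch of our own; the function name `is_negative_definite` and the example Hessian are illustrative choices, not part of the text), the sign pattern $(-1)^{k}M_{k}>0$ can be tested directly:
```python
import numpy as np

def is_negative_definite(H, tol=1e-12):
    """Check (-1)^k * M_k > 0 for k = 1, ..., n, where M_k is the k-th
    leading principal minor of the (symmetric) matrix H."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    for k in range(1, n + 1):
        M_k = np.linalg.det(H[:k, :k])       # k-th leading principal minor
        if (-1) ** k * M_k <= tol:
            return False
    return True

# Hessian of the strictly concave function f(x, y) = -x**2 + x*y - y**2:
H = np.array([[-2.0, 1.0],
              [1.0, -2.0]])
print(is_negative_definite(H))   # True: M_1 = -2 < 0 and M_2 = 3 > 0
```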
### 4.1.2. Static Optimization Problems with Equality Constraints
Consider the two-variable problem $$\max_{x,y} f(x,y)$$ subject to $g(x,y) = c$.
The constraint set (i.e. the set of all pairs $(x,y)$ for which $g(x,y) = c$) is a set of points in $(x,y)$-space. Assume that the functions $f$ and $g$ are at least twice-differentiable. At a solution $(x^{*},y^{*})$ of the problem, the constraint curve is tangent to a level curve of $f$, i.e.,
$$-\frac{f'_{1}(x^{*},y^{*})}{f'_{2}(x^{*},y^{*})} = -\frac{g'_{1}(x^{*},y^{*})}{g'_{2}(x^{*},y^{*})}$$
or,
$$\frac{f'_{1}(x^{*},y^{*})}{g'_{1}(x^{*},y^{*})} = \frac{f'_{2}(x^{*},y^{*})}{g'_{2}(x^{*},y^{*})} \tag{4.2}$$
assuming that neither $g'_{1}(x^{*},y^{*})$ nor $g'_{2}(x^{*},y^{*})$ is zero. Now introduce a new variable, $\lambda$, and set it equal to the common value of the quotients: $$\lambda = \frac{f'_{1}(x^{*},y^{*})}{g'_{1}(x^{*},y^{*})} = \frac{f'_{2}(x^{*},y^{*})}{g'_{2}(x^{*},y^{*})}.$$ It might appear that introducing a new variable merely complicates the problem, but in fact it is a clever step that allows the condition for a maximum to be expressed in an appealing way. Given the definition of $\lambda$, the condition for $(x^{*},y^{*})$ to solve the problem may be written as the two equations
\begin{align*}
f'_{1}(x^{*},y^{*})-\lambda g'_{1}(x^{*}, y^{*}) &= 0 \\
f'_{2}(x^{*},y^{*})-\lambda g'_{2}(x^{*}, y^{*}) &= 0.
\end{align*}
At a solution of the problem we need, in addition, $c = g(x^{*},y^{*})$ (the constraint is satisfied). Thus, the following conditions must be satisfied at a solution $(x^{*},y^{*})$:
\begin{align}
f'_{1}(x^{*},y^{*})-\lambda g'_{1}(x^{*}, y^{*}) &= 0 \\
f'_{2}(x^{*},y^{*})-\lambda g'_{2}(x^{*}, y^{*}) &= 0 \tag{4.3}\\
c-g(x^{*},y^{*}) &= 0.
\end{align}
The first two equations can conveniently be viewed as the conditions that the partial derivatives of the ==Lagrangian== $$\mathcal{L}(x,y,\lambda) = f(x,y)-\lambda(g(x,y)-c), \tag{4.4}$$ with respect to $x$ and $y$ be equal to zero.
They are known as the first-order conditions for the problem, and $\lambda$ is called the ==Lagrange multiplier==. Now suppose we have a function of $n$ variables and face $m$ equality constraints, where $m<n$. The problem is
$$\max_{x_{1}, \ldots, x_{n}} f\left(x_1, \ldots, x_n\right), \tag{4.5a}$$ subject to
\begin{align}
g^{1}\left(x_{1}, \ldots, x_{n}\right) &= c_{1} \\
g^{2}\left(x_{1}, \ldots, x_{n}\right) & =c_{2} \tag{4.5b}\\
\vdots &= \vdots \\
g^{m}\left(x_{1},\ldots, x_{n}\right) & = c_{m}.
\end{align}
To solve (4.5a-b), form the Lagrangian by multiplying each constraint term $g^{j}\left(x_{1},\ldots,x_{n}\right)-c_{j}$ by its own Lagrange multiplier $\lambda_{j}$ and subtracting the sum of these terms from the objective function $f$. For $\mathbf{x}=\left(x_{1}, \ldots, x_{n}\right)$ and $\mathbf{\lambda}=\left(\lambda_{1}, \ldots, \lambda_{m}\right)$, we obtain the function of $n+m$ variables $$\mathcal{L}(\mathbf{x}, \lambda)=f(\mathbf{x})-\sum_{j=1}^{m} \lambda_j \left(g^j(\mathbf{x})-c_{j}\right). \tag{4.6}$$ The first-order conditions again require that all partial derivatives of $\mathcal{L}$ be equal to zero at the optimum. Because $\mathcal{L}$ has $n+m$ variables, there will be a system of $n+m$ equations determining the $n+m$ unknowns $\mathbf{x}^{*}$ and $\mathbf{\lambda}^{*}$:
\begin{align}
\frac{\partial \mathcal{L}}{\partial x_{i}} &= \frac{\partial f\left(\mathbf{x}^{*}\right)}{\partial x_{i}}-\sum_{j=1}^m \lambda_{j}^{*} \frac{\partial g^{j}\left(\mathbf{x}^*\right)}{\partial x_{i}}=0 \mbox{ for } i=1, \ldots, n \tag{4.7}\\
\frac{\partial \mathcal{L}}{\partial \lambda_{j}} &= c_{j}-g^{j}\left(\mathbf{x}^{*}\right)=0 \mbox{ for } j=1, \ldots, m.
\end{align}
As in the case of unconstrained optimization, there are second-order conditions: it is sufficient for a local constrained maximum that the Hessian of the Lagrangian with respect to $\mathbf{x}$, evaluated at $(\mathbf{x}^{*},\mathbf{\lambda}^{*})$, be negative definite on the set of directions satisfying the linearized constraints.
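To illustrate the system (4.7) concretely (the objective $f(x,y)=xy$, the constraint $x+y=10$, and the use of SymPy are illustrative choices, not part of the text), the first-order conditions of the Lagrangian can be solved symbolically:
```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

# Illustrative data: f(x, y) = x*y subject to g(x, y) = x + y = 10.
f = x * y
g = x + y
c = 10

# Lagrangian as in (4.6): L = f - lambda * (g - c)
L = f - lam * (g - c)

# First-order conditions (4.7): dL/dx = 0, dL/dy = 0, and the constraint.
foc = [sp.diff(L, x), sp.diff(L, y), c - g]
print(sp.solve(foc, [x, y, lam], dict=True))   # [{x: 5, y: 5, lambda: 5}]
```
The solver reproduces the hand calculation: $y=\lambda$, $x=\lambda$ and $x+y=10$ give $x^{*}=y^{*}=5$ with $\lambda^{*}=5$.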
### 4.1.3. Static Optimization Problems with Inequality Constraints
We start by considering a general one-variable maximization problem (the analysis of the minimization problem is analogous, since minimizing $f(x)$ is the same as maximizing $-f(x)$) with a simple non-negativity constraint:
\begin{align}
\max_{x \in \mathbb{R}} & f(x), \tag{4.8} \\
\mbox{subject to } & x \geq 0.
\end{align}
With the non-negativity constraint on the domain of solutions, we need to modify the necessary conditions for $x^{*}$ to be a solution of the optimization problem. There are two cases: (i) if $x^{*}$ solves (4.8) and $x^{*} > 0$, then $x^{*}$ is an ==interior optimum== and we have the familiar result that the derivative of $f$ must be zero at $x^{*}$, i.e., $x^{*}> 0$ implies that $f'(x^{*})=0$; (ii) if $x^{*}$ solves (4.8) and $x^{*}=0$, then $x^{*}$ is a ==boundary optimum== and either $f'(x^{*})=0$ or $f'(x^{*})<0$, so that $f'(x^{*}) \leq 0$. Therefore, the first-order necessary conditions for $x^{*} \geq 0$ to solve (4.8) can be summarized as follows: $$f'(x^{*}) \leq 0 \mbox{ and } x^{*}f'(x^{*}) = 0.$$ The second condition is often referred to as a ==complementary slackness condition==: at the optimum at least one of $x^{*}$ and $f'(x^{*})$ must be zero, so if the constraint is ==slack== ($x^{*}>0$), the derivative condition must bind ($f'(x^{*})=0$).
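As a simple illustration (an example chosen here for concreteness, not taken from the text), consider maximizing $f(x) = -x^{2}-2x$ subject to $x \geq 0$. Since $$f'(x) = -2x-2 < 0 \quad \text{for all } x \geq 0,$$ the solution is the boundary optimum $x^{*}=0$, at which $f'(x^{*}) = -2 \leq 0$ and $x^{*}f'(x^{*}) = 0$, so both necessary conditions hold.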
We can generalize these observations immediately to an $n$-variable optimization problem:
\begin{align}
\max_{\mathbf{x} \in \mathbb{R}^{n}} & f(\mathbf{x}), \tag{4.9} \\
\mbox{subject to } & \mathbf{x} = (x_{1},\ldots,x_{n}) \geq (0,\ldots,0) = \mathbf{0}.
\end{align}
The first-order necessary conditions for $\mathbf{x}^{*} = (x_{1}^{*},\ldots,x_{n}^{*})$ to solve (4.9) are
\begin{equation}
f_{i}(\mathbf{x}^{*}) = \dfrac{\partial f(\mathbf{x}^{*})}{\partial x_{i}} \leq 0 \mbox{ and } x_{i}^{*}f_{i}(\mathbf{x}^{*}) = 0 \mbox{ for all } i=1,2,\ldots,n. \tag{4.10a}
\end{equation}
or, since for all $i=1,\ldots,n$, $x_{i}^{*} \geq 0$ and $f_{i}(\mathbf{x}^{*}) \leq 0$, (4.10a) is equivalent to
\begin{equation}
f_{i}(\mathbf{x}^{*}) = \dfrac{\partial f(\mathbf{x}^{*})}{\partial x_{i}} \leq 0 \mbox{ for all } i=1,2,\ldots,n, \mbox{ and } \sum_{i=1}^{n} x_{i}^{*}f_{i}(\mathbf{x}^{*}) = 0 \tag{4.10b}
\end{equation}
which in turn is equivalent to
\begin{equation}
\nabla f(\mathbf{x}^{*}) \leq \mathbf{0} \mbox{ and } \mathbf{x}^{*} \cdot \nabla f(\mathbf{x}^{*}) = 0 \tag{4.10c}
\end{equation}
where $\nabla f(\mathbf{x}^{*}) = \left(f_{1}(\mathbf{x}^{*}),\ldots,f_{n}(\mathbf{x}^{*})\right)$ is the ==gradient vector== of $f$ at $\mathbf{x}^{*}$. In the case of a minimization problem, the inequalities in (4.10a-c) are reversed. Note that the conditions in (4.10) are only necessary for a maximum; the usual second-order conditions are required for sufficiency.
Thus, for each $i$, the conditions ensure that either $x_{i}^{*} > 0$ and $f_{i}(\mathbf{x}^{*}) = 0$, or $x_{i}^{*} = 0$ and $f_{i}(\mathbf{x}^{*}) \leq 0$; this must hold for every $i=1,\ldots,n$.
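As a small numerical illustration of (4.10c) (the objective function, the candidate point and the helper `check_conditions` below are our own choices, not part of the text):
```python
import numpy as np

def grad_f(x):
    # Gradient of the illustrative objective f(x) = -(x1 - 1)^2 - (x2 + 1)^2,
    # whose maximizer over the non-negative orthant is (1, 0).
    return np.array([-2.0 * (x[0] - 1.0), -2.0 * (x[1] + 1.0)])

def check_conditions(x_star, tol=1e-8):
    # Conditions (4.10c): grad f(x*) <= 0 componentwise and x* . grad f(x*) = 0.
    g = grad_f(np.asarray(x_star, dtype=float))
    return bool(np.all(g <= tol) and abs(np.dot(x_star, g)) <= tol)

# At (1, 0): x1* > 0 with f_1 = 0, and x2* = 0 with f_2 = -2 < 0.
print(check_conditions([1.0, 0.0]))   # True
```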
So far, we have worked with simple non-negativity constraints on the domain of solutions. We are now in a position to analyse constrained optimization problems that involve functional inequality constraints:
\begin{align}
\max_{\mathbf{x} \in \mathbb{R}^{n}} & f(\mathbf{x}), \\
\mbox{subject to } & g(\mathbf{x}) \geq 0, \tag{4.11} \\
& \mathbf{x} \geq 0.
\end{align}
We convert the problem in (4.11) to a problem with equality constraints by introducing a ==slack variable== $s \geq 0$ as follows:
\begin{align}
\max_{\mathbf{x} \in \mathbb{R}^{n}} & f(\mathbf{x}), \\
\mbox{subject to } & g(\mathbf{x})-s = 0, \tag{4.12} \\
\mathbf{x} \geq 0 &\mbox{ and } s \geq 0.
\end{align}
The Lagrangian associated with the optimization problem in (4.12) is $$\mathcal{L}(\mathbf{x}, s, \lambda)=f(\mathbf{x})+\lambda\left(g(\mathbf{x})-s\right), \tag{4.13}$$ where $\lambda$ is the Lagrange multiplier.
A necessary condition for $\left(\mathbf{x}^{*}, s^{*}\right)$ to solve problem (4.12) is that there exists a $\lambda^{*}$ such that:
$$
\begin{align}
\mathcal{L}_{i}\left(\mathbf{x}^{*}, s^{*}, \lambda^{*}\right) &= f_{i}\left(\mathbf{x}^{*}\right)+\lambda^{*} g_{i}\left(\mathbf{x}^{*}\right) \leq 0 \quad \text { for all } i=1,2, \ldots, n, \\
\sum_{i=1}^n x_{i}^{*} \mathcal{L}_i\left(\mathbf{x}^{*}, s^{*}, \lambda^{*}\right) &= \sum_{i=1}^n x_{i}^{*}\left(f_{i}\left(\mathbf{x}^{*}\right)+\lambda^{*} g_{i}\left(\mathbf{x}^{*}\right)\right) = 0, \\
\mathcal{L}_{s}\left(\mathbf{x}^{*}, s^{*}, \lambda^{*}\right) &= -\lambda^{*} \leq 0, \tag{4.14} \\
s^* \mathcal{L}_s\left(\mathbf{x}^{*}, s^{*}, \lambda^{*}\right)&= -s^{*} \lambda^{*} = 0, \\
\mathcal{L}_{\lambda}\left(\mathbf{x}^{*}, s^{*}, \lambda^{*}\right) &= g\left(\mathbf{x}^{*}\right)-s^{*} = 0, \\
s^{*} \geq 0 \quad \text{ and } \quad &x_{i}^{*} \geq 0 \quad \text{ for all } i=1,2, \ldots, n .
\end{align}
$$
The conditions in (4.14) allow us to eliminate the slack variable $s^{*}$. From the third expression in (4.14) we have $\lambda^{*} \geq 0$, and from the fifth expression we have that $g\left(\mathbf{x}^{*}\right)=s^{*} \geq 0$. Combining these two results with the fourth expression in (4.14) gives us $\lambda^{*} g\left(\mathbf{x}^{*}\right)=0$. Replacing the expressions in (4.14) with the ones we just derived (which exclude $s^{*}$) and writing $\mathcal{L}$ for the reduced Lagrangian gives the following necessary conditions for $\mathbf{x}^{*} \in \mathbb{R}_{+}^{n}$ to solve the functionally constrained problem (4.12):
$$
\begin{align}
\mathcal{L}_{i}\left(\mathbf{x}^{*}, \lambda^{*}\right) & =f_{i}\left(\mathbf{x}^{*}\right)+\lambda^{*} g_{i}\left(\mathbf{x}^{*}\right) \leq 0 \quad \text { for all } i=1,2, \ldots, n, \\
\sum_{i=1}^n x_{i}^{*} \mathcal{L}_i\left(\mathbf{x}^{*}, \lambda^{*}\right) & =\sum_{i=1}^n x_{i}^{*}\left[f_i\left(\mathbf{x}^{*}\right)+\lambda^{*} g_{i}\left(\mathbf{x}^{*}\right)\right]=0, \\
\mathcal{L}_\lambda\left(\mathbf{x}^{*}, \lambda^{*}\right) & =g\left(\mathbf{x}^{*}\right) \geq 0, \tag{4.15} \\
\lambda^* \mathcal{L}_\lambda\left(\mathbf{x}^{*}, \lambda^{*}\right) & =\lambda^* g\left(\mathbf{x}^{*}\right)=0, \\
\lambda^* \geq 0 \quad & \text { and } \quad x_{i}^{*} \geq 0 \quad \text { for all } i=1,2, \ldots, n,
\end{align}
$$
where $\mathcal{L}(\mathbf{x},\lambda)=f(\mathbf{x})+\lambda g(\mathbf{x})$.
We are now in a position to extend the argument to the more general case of $m$ functional constraints. In the general $m$-constraint problem, there are $n$ variables and $m$ functional constraints with no necessary relationship between $n$ and $m$. The general problem is
\begin{align}
& \max_{\mathbf{x}} f(\mathbf{x}) \tag{4.16a}\\
& \text {subject to: } g^{1}(\mathbf{x}) \geq 0, \\
& g^{2}(\mathbf{x}) \geq 0, \\
& \vdots \tag{4.16b}\\
& g^{m}(\mathbf{x}) \geq 0, \\
& \mathbf{x} \geq \mathbf{0}.
\end{align}
The Lagrangian function $\mathcal{L}$ for the general problem (4.16) is
$$\mathcal{L}(\mathbf{x},\mathbf{\lambda})=f(\mathbf{x})+\sum_{j=1}^{m} \lambda_{j} g^{j}(\mathbf{x})=f(\mathbf{x})+\lambda^{T} g(\mathbf{x}) \tag{4.17}$$
where $\mathbf{\lambda}^{T}=\left(\lambda_1, \lambda_2, \ldots, \lambda_m\right)$ is the $m$-dimensional row vector of Lagrange multipliers (one for each constraint function) and $g(\mathbf{x})$ is the $m$-dimensional column vector of constraint functions with $[g(\mathbf{x})]^T=\left(g^{1}(\mathbf{x}), g^{2}(\mathbf{x}), \ldots, g^{m}(\mathbf{x})\right)$.
The main result of the theory of nonlinear programming is that if $\mathbf{x}^{*}$ is a solution to (4.16), then there must exist a $\mathbf{\lambda}^{*} \in \mathbb{R}_{+}^{m}$ such that the following ==Karush-Kuhn-Tucker (KKT) conditions== hold:
\begin{gather}
\mathcal{L}_{i}\left(\mathbf{x}^{*}, \mathbf{\lambda}^{*}\right) =f_{i}\left(\mathbf{x}^{*}\right)+\sum_{j=1}^{m} \lambda_{j}^{*} g_{i}^{j}\left(\mathbf{x}^{*}\right) \leq 0 \quad \text { for all } i=1,2, \ldots, n, \tag{4.18a} \\
x_{i}^{*} \mathcal{L}_{i}\left(\mathbf{x}^{*}, \mathbf{\lambda}^{*}\right) =x_{i}^{*}\left[f_{i}\left(\mathbf{x}^{*}\right)+ \sum_{j=1}^m \lambda_{j}^{*} g_{i}^{j}\left(\mathbf{x}^{*}\right)\right]=0 \quad \text { for all } i=1,2, \ldots, n, \tag{4.18b} \\
\mathcal{L}_{\lambda_j}\left(\mathbf{x}^{*}, \mathbf{\lambda}^{*}\right) =g^{j}\left(\mathbf{x}^{*}\right) \geq 0 \quad \text { for all } j=1,2, \ldots, m, \tag{4.18c} \\
\lambda_{j}^{*} \mathcal{L}_{\lambda_{j}}\left(\mathbf{x}^{*}, \mathbf{\lambda}^{*}\right) =\lambda_{j}^{*} g^{j}\left(\mathbf{x}^{*}\right)=0 \quad \text { for all } j=1,2, \ldots, m. \tag{4.18d}
\end{gather}
In addition, $\lambda_{j}^{*} \geq 0$ for all $j=1,2, \ldots, m$ and $x_{i}^{*} \geq 0$ for all $i=1,2, \ldots, n$.
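As a minimal numerical sketch (the helper `check_kkt` and the quadratic test problem below are illustrative choices, not part of the text), the conditions (4.18a-d) and the sign restrictions can be verified at a candidate pair $(\mathbf{x}^{*},\mathbf{\lambda}^{*})$:
```python
import numpy as np

def check_kkt(grad_f, grad_gs, gs, x_star, lam_star, tol=1e-8):
    """Check (4.18a-d) and the sign restrictions at a candidate (x*, lambda*).

    grad_f  : gradient of the objective f
    grad_gs : list of gradients of the constraint functions g^j
    gs      : list of the constraint functions g^j
    """
    x = np.asarray(x_star, dtype=float)
    lam = np.asarray(lam_star, dtype=float)
    # L_i = f_i(x*) + sum_j lambda_j^* g^j_i(x*)
    L_x = grad_f(x) + sum(l * dg(x) for l, dg in zip(lam, grad_gs))
    g_vals = np.array([g(x) for g in gs])
    return bool(
        np.all(L_x <= tol)                       # (4.18a)
        and np.all(np.abs(x * L_x) <= tol)       # (4.18b)
        and np.all(g_vals >= -tol)               # (4.18c)
        and np.all(np.abs(lam * g_vals) <= tol)  # (4.18d)
        and np.all(lam >= -tol) and np.all(x >= -tol)
    )

# Test problem: max -(x1-1)^2 - (x2-1)^2 subject to 1 - x1 - x2 >= 0, x >= 0;
# a candidate solution is x* = (1/2, 1/2) with lambda* = 1.
grad_f = lambda x: np.array([-2.0 * (x[0] - 1.0), -2.0 * (x[1] - 1.0)])
grad_gs = [lambda x: np.array([-1.0, -1.0])]
gs = [lambda x: 1.0 - x[0] - x[1]]
print(check_kkt(grad_f, grad_gs, gs, [0.5, 0.5], [1.0]))   # True
```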
The conditions in (4.18a) and (4.18c) suggest that the Lagrangian function is maximized with respect to the variables in $\mathbf{x}$ and minimized with respect to the variables in $\mathbf{\lambda}$. If these suggestions are correct, then the Lagrangian will satisfy the following condition:
\begin{align}
\mathcal{L}\left(\mathbf{x}, \mathbf{\lambda}^{*}\right) \leq \mathcal{L}\left(\mathbf{x}^{*}, \mathbf{\lambda}^{*}\right) \leq \mathcal{L}\left(\mathbf{x}^{*}, \mathbf{\lambda}\right) \quad \text { for all } \mathbf{x} \in \mathbb{R}_{+}^n \quad \text { and } \quad \mathbf{\lambda} \in \mathbb{R}_{+}^{m}. \tag{4.19}
\end{align}
The condition in (4.19) says that $(\mathbf{x}^{*}, \mathbf{\lambda}^{*})$ is a saddle point of the Lagrangian; the optimal value of the Lagrangian can be written as
$$\mathcal{L}\left(\mathbf{x}^{*}, \mathbf{\lambda}^{*}\right)=\max_{\mathbf{x} \geq \mathbf{0}}\min _{\mathbf{\lambda} \geq \mathbf{0}} \mathcal{L}\left(\mathbf{x}, \mathbf{\lambda}\right). \tag{4.20}$$
Thus on the basis of the KKT conditions, we conclude that solving (4.18) and finding a saddle point for the Lagrangian are equivalent problems.
> Example. Consider the following maximization problem with an inequality constraint, where the objective is $f(x,y)=xy$: \begin{align}
& \max_{(x,y) \in \mathbb{R}^{2}} f(x,y)\\
& \text {subject to: } g(x,y) = 2-x-y^{2} \geq 0, \\
& (x,y) \geq \mathbf{0},
\end{align}
The Lagrangian for this problem is $$\mathcal{L}\left(x,y,\lambda\right)=xy+\lambda\left(2-x-y^{2}\right).$$
The first-order conditions are:
\begin{align}
\mathcal{L}_{x}\left(x^{*},y^{*},\lambda^{*}\right)&=y^{*}-\lambda^{*} \leq 0 \\
x^{*}\mathcal{L}_{x}\left(x^{*},y^{*},\lambda^{*}\right)&=x^{*}\left(y^{*}-\lambda^{*}\right)=0 \\
\mathcal{L}_{y}\left(x^{*},y^{*},\lambda^{*}\right) &= x^{*}-2y^{*}\lambda^{*} \leq 0 \\
y^{*}\mathcal{L}_{y}\left(x^{*},y^{*},\lambda^{*}\right) &= y^{*}\left(x^{*}-2y^{*}\lambda^{*}\right) = 0 \\
\mathcal{L}_{\lambda}\left(x^{*},y^{*}, \lambda^{*}\right)&=2-x^{*}-\left(y^{*}\right)^{2} \geq 0 \\
\lambda^{*}\mathcal{L}_{\lambda}\left(x^{*},y^{*},\lambda^{*}\right)&=\lambda^{*}\left(2-x^{*}-\left(y^{*}\right)^{2}\right)=0
\end{align} where $x^{*},y^{*},\lambda^{*} \geq 0$.
We consider two cases.
**Case 1** ($\lambda^{*}=0$). The first condition gives $y^{*} \leq \lambda^{*} = 0$, and since $y^{*} \geq 0$ we have $y^{*}=0$. Similarly, the third condition gives $x^{*} \leq 2y^{*}\lambda^{*} = 0$, and since $x^{*} \geq 0$, we have $x^{*}=0$.
**Case 2** ($\lambda^{*}>0$). From the last FOC (the complementary slackness condition related to $\lambda^{*}$), we have $x^{*}+\left(y^{*}\right)^{2}=2$.
**Case 2.1.** Assume $x^{*}>0$. The second condition then implies $y^{*} = \lambda^{*} > 0$, so the fourth condition gives $x^{*}-2y^{*}\lambda^{*}=0$, i.e., $(y^{*})^{2} = \frac{x^{*}}{2}$. Substituting into the constraint, $x^{*}+\frac{x^{*}}{2}=2$, so $x^{*} = \frac{4}{3}$ and $y^{*} = \sqrt{\frac{2}{3}}$ (with $\lambda^{*} = \sqrt{\frac{2}{3}}$).
**Case 2.2.** Assume $x^{*}=0$. The constraint then gives $y^{*}=\sqrt{2}>0$, so the fourth condition requires $x^{*}-2y^{*}\lambda^{*} = 0$, which implies $\lambda^{*}=0$, contradicting the assumption $\lambda^{*}>0$. Hence the only candidates are $(0,0)$ from Case 1 and $\left(\frac{4}{3},\sqrt{\frac{2}{3}}\right)$ from Case 2.1; since $f\left(\frac{4}{3},\sqrt{\frac{2}{3}}\right)=\frac{4}{3}\sqrt{\frac{2}{3}} > 0 = f(0,0)$, the solution is $x^{*}=\frac{4}{3}$ and $y^{*}=\sqrt{\frac{2}{3}}$.
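The example's solution can also be confirmed numerically (a sketch using `scipy.optimize.minimize`; the starting point $(1,1)$ and the choice of the SLSQP method are arbitrary):
```python
import numpy as np
from scipy.optimize import minimize

# Maximize f(x, y) = x*y subject to 2 - x - y**2 >= 0 and x, y >= 0.
# scipy.optimize.minimize minimizes, so we pass -f as the objective.
objective = lambda v: -(v[0] * v[1])
constraints = [{"type": "ineq", "fun": lambda v: 2.0 - v[0] - v[1] ** 2}]
bounds = [(0.0, None), (0.0, None)]   # non-negativity of x and y

res = minimize(objective, x0=[1.0, 1.0], method="SLSQP",
               bounds=bounds, constraints=constraints)
print(res.x)                              # approximately [1.3333, 0.8165]
print([4.0 / 3.0, np.sqrt(2.0 / 3.0)])    # analytic solution from Case 2.1
```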