---
tags: Mathematical modeling and optimization
title: 5.1. Fundamental concepts in optimization process
---

## 5.1. Fundamental concepts in optimization process

A classical optimization assignment often starts like this:

*"A construction company is planning a project that involves the construction of multiple buildings. The company needs to allocate its resources efficiently to maximize the profit from the project. The resources include skilled labor, construction materials, and machinery."*

In this scenario, the operator has a range of feasible decisions for completing the project: in which sequence to build, how many skilled workers to assign to which building, and so on. Finding the optimal solution is a mathematical challenge and relates directly to the profit of the company. Such optimization assignments appear in many different forms; the field of science dealing with them is also called [__*operations research*__](https://en.wikipedia.org/wiki/Operations_research).

### 5.1.1 Elements of optimization process

In every optimization assignment, two fundamental elements should be clearly defined before the algorithm takes place. These are the __*cost function*__ and the __*constraints*__.

The __*cost function*__, also called the __*objective function*__, is a crucial component representing the objective to be minimized or maximized. It quantifies the performance or quality of a solution in the optimization problem. In the construction company scenario, the cost function can be the total project time or the total project cost, which are to be minimized.

__*Constraints*__ are conditions that must be satisfied during the optimization process. Constraints delineate the feasible region within which the solution must lie. In the construction company scenario, a constraint could be the total available skilled labor hours and/or the total available material quantity, etc.

These elements collectively define the optimization problem, guiding algorithms in navigating the solution space to identify the optimal outcome that fulfills the defined criteria.

### 5.1.2 Types of optimization problems

Optimization assignments can be generally categorized into several types, each serving a specific purpose. These categories include [__*linear programming*__](https://en.wikipedia.org/wiki/Linear_programming), [__*integer programming*__](https://en.wikipedia.org/wiki/Integer_programming), [__*nonlinear programming*__](https://en.wikipedia.org/wiki/Nonlinear_programming), and [__*dynamic programming*__](https://en.wikipedia.org/wiki/Dynamic_programming).

- __*Linear programming*__ focuses on optimizing linear objective functions subject to linear constraints. An example can be:

  Maximize: $3x + 2y$

  Subject to:

  $\qquad \qquad 2x + y \le 20$

  $\qquad \qquad 4x - 5y \ge -10$

  $\qquad \qquad x, \ y \ge 0$

  where both the objective function and the constraints are linear functions. Solving a linear programming assignment is straightforward: since all the expressions are linear, the optimum point lies at an intersection of the constraints (see the `linprog` sketch after this list).

  <img src="https://live.staticflickr.com/65535/53509667957_5de7b401a5_z.jpg">

- When variables are restricted to a discrete form, e.g. integer or Boolean variables, such assignments are called __*integer programming*__ problems, also known as _combinatorial problems_. Extending the linear programming example, $x$ and $y$ are restricted to integers, meaning

  $\qquad \qquad x, \ y \in \mathbb{Z}$.

  The solutions are then restricted to the green points in the figure.

  <img src="https://live.staticflickr.com/65535/53512478461_af79961b8b_z.jpg">

  It can be seen that the maximum objective value varies slightly due to the integer restriction. Most of the time, the optimal discrete solution is located close to the continuous solution (the sketch after this list shows the integer variant as well).

<!-- Another type of integer programming problem relates to __Boolean variables__, where the optimal solution is either $0$ or $1$. The mathematical expression of this condition yields $x, \ y, \ z \in \{0, 1\}$, which can be interpreted in the integer programming framework as $\qquad \qquad 0 \le x, \ y, \ z \le 1 \qquad \text{and} \qquad x, \ y, \ z \in \mathbb{Z}$ to describe the constraints. -->

- __*Nonlinear programming*__ extends the scope to non-linear objectives and constraints. Non-linear terms are simply elements such as $x^n$, $\exp(x)$, $\ln(x)$, etc. An example of non-linear programming can be:

  Minimize: $2x^2 - 100x$

  Subject to:

  $\qquad \qquad x \le 30$

  where the non-linear term $2x^2$ appears in the objective function. Since the objective function is non-linear, the minimum/maximum is no longer restricted to an intersection of constraints. For the example above, the minimum is attained at $x = 25$, which does not lie on the constraint line $x = 30$ (a `minimize` sketch follows in the __See also__ section below).

  <img src="https://live.staticflickr.com/65535/53511709829_211f52cf0c_z.jpg">

  Solving non-linear programming problems is more complex: gradient information is needed to reach the local minimum/maximum. Methods based on *gradient descent* are commonly utilized for such problems, see the next section.

- __*Dynamic programming*__ addresses problems with sequential decision-making over time.

These categories provide a structured approach to tackling diverse optimization challenges in the field of operations research.
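As a concrete illustration, the linear and integer examples above can be solved with [`scipy.optimize.linprog`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html). The following is a minimal sketch, assuming SciPy ≥ 1.9 (needed for the `integrality` option); since `linprog` minimizes by convention, the maximization is expressed by negating the objective coefficients:

```python
from scipy.optimize import linprog

# Maximize 3x + 2y  <=>  minimize -(3x + 2y)
c = [-3, -2]

# Inequalities in the form A_ub @ [x, y] <= b_ub:
#   2x + y  <= 20   ->  [ 2, 1] @ [x, y] <= 20
#   4x - 5y >= -10  ->  [-4, 5] @ [x, y] <= 10
A_ub = [[2, 1], [-4, 5]]
b_ub = [20, 10]
bounds = [(0, None), (0, None)]  # x, y >= 0

# Continuous (linear programming) solution
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.x, -res.fun)  # approx. [6.43, 7.14] with objective approx. 33.57

# Integer programming variant: require both variables to be integers
res_int = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds,
                  integrality=[1, 1], method="highs")
print(res_int.x, -res_int.fun)  # a nearby integer point, [7, 6] with objective 33
```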
<img src="https://live.staticflickr.com/65535/53512478461_af79961b8b_z.jpg">. It is seen that the maximum objectiv value slightly varies due to the integer restriction. Most of the time, the optimal discsret solution locates closed to the continuous solution. <!-- Another type of integer programming problem relates to the __boolean varaibles__, denoting the optimal solution are either $0$ or $1$. Mathematical expression of this condition yields $x, y, z \subset \{0, 1\}$, which can be interpreted in the integer programming framework as $\qquad \qquad 0\leq x, \ y, \ z\leq 1\qquad \text{and} \qquad x, \ y, \ z \subset \mathbb{Z}$ to describe the constraints. --> - __*Nonlinear programming*__ extends the scope to non-linear objectives and constraints. Non-linear terms are simply elements such as $x^n$, $\exp(x)$, $\ln(x)$, etc. An example of non-linear programming can be: Minimize: $2x^2 - 100x$ Subject to $\qquad \qquad x≤30$ where the non-linear term $2x^2$ appears in the condition Since the objective function is non-linear, the maximum/minimum is not restricted on the intersection of contraints. For the example above, the minimum value yields at $x=25$, where it is not the intersection to the contraint $x=30$ line <img src="https://live.staticflickr.com/65535/53511709829_211f52cf0c_z.jpg"> Solving non-linear programming problem is complex. We need the gradient information to achieve the local minimum/maximum. Methods based on *gradient decent* are commonly utilized for such problems, see next section. - __*Dynamic programming*__ addresses problems with sequential decision-making over time. These categories provide a structured approach to tackling diverse optimization challenges in the field of operations research. __See also__ [Scipy](https://docs.scipy.org/doc/scipy/reference/optimize.html) provide a complex package to deal with optimization problems, including [linear prgramming](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html) for linear problems and the add-on option *integrality* for integer programing problem. [minimize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) is the tool used to applied for non-linear problems. These can be directly called without implemented individually. [Pyomo](http://www.pyomo.org/) is an individual Python package designed to facilitate optimization modeling and solve complex engineering problems efficiently. It offers a flexible modeling environment that allows users to formulate optimization problems using a high-level modeling language including linear programming, mixed-integer linear programming, nonlinear programming, and stochastic programming. Besides, Pyomo seamlessly integrates with popular optimization solvers such as [GLPK](https://en.wikipedia.org/wiki/GNU_Linear_Programming_Kit), [CBC](https://coin-or.github.io/Cbc/intro.html), [Gurobi](https://www.gurobi.com/), and [CPLEX](https://ampl.com/products/solvers/solvers-we-sell/cplex/?gclid=Cj0KCQiAzoeuBhDqARIsAMdH14F1gfOssl3Sj-Mma4j1nfBBVoiTKVed8sLctsiwoL0IYij7GR-QWVUaAmvnEALw_wcB), allowing users to leverage state-of-the-art optimization algorithms to solve their engineering problems efficiently. $\color{red}{\text{Some examples lacking!!}}$ ### 5.1.3 Gradient Decent towards local optimum Gradient Descent is an optimization algorithm used to find the local minimum (or maximum) of a function. It is commonly applied in machine learning, deep learning, and various optimization problems. 
[Pyomo](http://www.pyomo.org/) is a standalone Python package designed to facilitate optimization modeling and to solve complex engineering problems efficiently. It offers a flexible modeling environment that allows users to formulate optimization problems in a high-level modeling language, covering linear programming, mixed-integer linear programming, nonlinear programming, and stochastic programming. Besides, Pyomo seamlessly integrates with popular optimization solvers such as [GLPK](https://en.wikipedia.org/wiki/GNU_Linear_Programming_Kit), [CBC](https://coin-or.github.io/Cbc/intro.html), [Gurobi](https://www.gurobi.com/), and [CPLEX](https://ampl.com/products/solvers/solvers-we-sell/cplex/), allowing users to leverage state-of-the-art optimization algorithms to solve their engineering problems efficiently.

### 5.1.3 Gradient Descent towards the local optimum

Gradient Descent is an optimization algorithm used to find the local minimum (or maximum) of a function. It is commonly applied in machine learning, deep learning, and various optimization problems.

The core idea behind Gradient Descent is to iteratively update the parameters of a model or the variables of an optimization problem by moving in the direction of the negative gradient of the objective function. This process continues until convergence is achieved, i.e., until the change in the objective function becomes negligible.

The mathematical description of gradient descent is straightforward. Suppose an objective function $f(x)$, where $x$ represents the parameters or variables of the function. The goal is to minimize $f(x)$. The gradient of $f(x)$ with respect to $x$ is denoted as $\nabla f(x)$; it is a vector that points in the direction of the steepest ascent.

The update rule for Gradient Descent can be expressed as:

$\qquad \qquad \large \displaystyle x_{n+1} = x_n - \alpha \ \nabla f(x_n)$

Where:

- $x_n$ is the current value of the parameters or variables.
- $\alpha$ is the learning rate, which determines the size of the step taken in each iteration. It is a positive scalar.
- $\nabla f(x_n)$ is the gradient of the objective function evaluated at $x_n$.

The learning rate $\alpha$ is a crucial hyperparameter in Gradient Descent. It controls the step size and influences the convergence speed and stability of the algorithm. Choosing an appropriate learning rate is essential to ensure efficient convergence without overshooting the optimum or getting stuck in local minima.

For the above example minimizing $2x^2 - 100x$, the derivative with respect to $x$ yields $4x - 100$. Putting this information into the gradient descent algorithm:

```python
import numpy as np

def gradient_descent(func, gradient, initial_point, learning_rate=0.01,
                     max_iterations=1000, tolerance=1e-6):
    """
    Gradient Descent optimization algorithm.

    Parameters:
    - func: The objective function to minimize.
    - gradient: The gradient of the objective function.
    - initial_point: The initial guess for the optimum.
    - learning_rate: The step size for each iteration (default is 0.01).
    - max_iterations: The maximum number of iterations (default is 1000).
    - tolerance: The convergence criterion (default is 1e-6).

    Returns:
    - optimum: The optimum point found by the algorithm.
    - optimum_value: The value of the objective function at the optimum.
    - iterations: The number of iterations performed.
    - trajectory: The sequence of points visited during the iteration.
    """
    trajectory = []
    current_point = np.array(initial_point)
    iterations = 0
    trajectory.append(current_point)

    while iterations < max_iterations:
        # Compute the gradient at the current point
        grad = gradient(current_point)

        # Step against the gradient (descent direction)
        new_point = current_point - learning_rate * grad

        # Stop once the step size falls below the tolerance
        delta = np.linalg.norm(new_point - current_point)
        if delta < tolerance:
            break

        current_point = new_point
        trajectory.append(current_point)
        iterations += 1

    optimum = current_point
    optimum_value = func(optimum)

    return optimum, optimum_value, iterations, trajectory
```

With

```python
def quadratic_function(x):
    return 2*x**2 - 100*x

def gradient_quadratic_function(x):
    return 4*x - 100

initial_guess = 0
optimum, optimum_value, iterations, trajectory = \
    gradient_descent(quadratic_function, gradient_quadratic_function, initial_guess)
```

Starting from $x = 0$, the trajectory approaching the optimal solution is shown below:

<img src="https://live.staticflickr.com/65535/53514556027_b834f23652_z.jpg">

The black points are the positions at each iteration. It can be seen that the gradient descent algorithm leads the solution to the bottom of the valley and stops there as the final result.
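The influence of the learning rate can be explored by rerunning `gradient_descent` with different values of $\alpha$ (a quick sketch reusing the functions defined above). For this quadratic the update reads $x_{n+1} = (1 - 4\alpha)\,x_n + 100\alpha$, so it only contracts towards $x = 25$ for $0 < \alpha < 0.5$; a value such as $\alpha = 0.6$ is expected to diverge:

```python
# Reusing gradient_descent, quadratic_function and gradient_quadratic_function
# from above
for lr in [0.01, 0.2, 0.6]:
    opt, val, its, _ = gradient_descent(quadratic_function,
                                        gradient_quadratic_function,
                                        initial_point=0.0,
                                        learning_rate=lr)
    # alpha = 0.01 converges slowly, alpha = 0.2 quickly; alpha = 0.6
    # oscillates with exponentially growing amplitude and never converges
    print(f"alpha = {lr}: x = {opt}, f(x) = {val}, iterations = {its}")
```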
Notably, this method can only lead the solution into the local valley: no additional mechanism is defined in gradient descent to move the solution uphill. Thus, a global solution can only be achieved if the cost function is __*convex*__, where the global minimum overlaps with the local minimum.

__See Also__

A [__*Convex Function*__](https://en.wikipedia.org/wiki/Convex_function) is a special type of mathematical function that has a particular geometric property. Imagine you have a curve representing a function on a graph. If you pick any two points on that curve and draw a straight line connecting them, the curve lies below the line. In simpler terms, a function is convex if it always curves upwards, so that the line segment connecting any two points on the graph lies above or on the graph itself.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c7/ConvexFunction.svg/1280px-ConvexFunction.svg.png" width=80%>

Key features of convex functions are:

The chord property: one simple way to visualize convexity is to look at chords. If you pick two points on the curve, the curve lies entirely below (or on) the straight line segment, the chord, connecting those points.

Positive second derivative: mathematically, a twice-differentiable function is convex if its second derivative is non-negative everywhere; a strictly positive second derivative guarantees strict convexity. For those unfamiliar with calculus, this condition ensures that the function always curves upward.
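The second-derivative condition can be checked directly for the objective $f(x) = 2x^2 - 100x$ from the gradient descent example above; a minimal numerical sketch using a central finite-difference approximation of $f''(x)$:

```python
import numpy as np

# f(x) = 2x^2 - 100x from the gradient descent example above
def f(x):
    return 2 * x**2 - 100 * x

# Central finite-difference approximation of the second derivative
x = np.linspace(-50, 50, 1001)
h = 1e-4
f_second = (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# For this function f''(x) = 4 > 0 everywhere, so the function is convex
print(np.all(f_second >= 0))  # True
```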