---
tags: Mathematical modeling and optimization
title: 5.3 Heuristic optimization
---
## 5.3 Heuristic optimization
In the previous sections, we learned to describe a static model and to fit the model parameters using the least-squares technique. However, an analytical solution seldom exists for dynamic systems, for which no static formulation can be derived to describe the system behavior. Thus, no gradient can be built, and system optimization using classical gradient-based methods is not feasible.
When exact methods are impractical or computationally infeasible, __*heuristic optimization*__ is an alternative for finding near-optimal solutions to complex optimization problems. The principle of heuristic optimization revolves around iteratively refining a candidate solution by exploring neighboring solutions and making incremental improvements. Instead of exhaustively evaluating every possible solution, heuristic optimization algorithms employ strategies to intelligently navigate the solution space, often favoring exploration over exploitation to avoid getting trapped in local optima.
Heuristic optimization methods can generally be categorized into two types: __*local-search based*__ and __*population based*__.
Local-search based methods usually start from a random point and then iteratively step towards the local minimum. In place of the gradient-descent mechanism, special algorithms are introduced to prevent the solution from being trapped in a local minimum. This mechanism allows the solution to move uphill and keep searching for the global minimum. Representative methods are __*Simulated Annealing*__ and __*Tabu Search*__.
__*Population based*__ methods, in contrast, start with a set of seeds (a population); through selection and/or information exchange, the seeds evolve over the iterations, modifying their characteristics and/or pooling information to approach the global minimum. Classical methods are the __*Genetic Algorithm*__, __*Evolutionary Strategy*__ and __*Particle Swarm Optimization*__.
### 5.3.1 Neighborhood search based methods: __*Simulated Annealing*__ and __*Tabu Search*__
Simulated Annealing and Tabu Search are two selected _local search_ based methods, which examine neighboring points to approach a better solution. Each introduces its own mechanism to avoid the solution getting stuck in a local minimum: Simulated Annealing rolls the dice against an annealing-analogous acceptance __probability limit__, while Tabu Search records recently visited solutions in a __tabu list__.
#### Simulated Annealing for continuous problems
Simulated Annealing is a probabilistic metaheuristic inspired by the annealing process in metallurgy, where a material is cooled slowly to reach a low-energy state. It operates by iteratively exploring the solution space in search of the optimum solution while allowing for occasional uphill moves to escape local optima. At each iteration, the algorithm considers a neighboring solution and decides whether to accept it based on a probability distribution derived from the [Metropolis criterion](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm).
A rough description of Simulated Annealing starts with
- A (randomly) chosen initial solution $x_0$ and an initial temperature $T_0$.
- In each iteration:
  1. A new candidate solution $x^\prime$ is generated by randomly perturbing the current solution. Then
     - compute the change in objective function value: $\displaystyle \Delta E=f(x^\prime)-f(x)$.
  2. If
     - $\Delta E < 0$ (i.e., the new solution is better), replace the current solution by the new one: $x_{t+1}=x^\prime$.
     - $\Delta E \geq 0$, accept the new solution with probability $\exp(\frac{-\Delta E}{T_t})$.
  3. The temperature $T$ is updated according to a given cooling rate parameter, so that the temperature decreases at each iteration.
     - Common cooling schedules include exponential decay, linear decay, and geometric decay (sketched right after this list).
  4. Check whether a stopping criterion is met to end the iteration.
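As a minimal sketch of these cooling schedules (the decay parameters `alpha` and `beta` below are example choices, not prescribed values):

```python
import numpy as np

# Illustrative cooling schedules; alpha and beta are assumed example parameters
def exponential_cooling(T0, k, beta=0.01):
    # T_k = T0 * exp(-beta * k)
    return T0 * np.exp(-beta * k)

def linear_cooling(T0, k, beta=0.5):
    # T_k = T0 - beta * k, floored at a small positive value
    return max(T0 - beta * k, 1e-12)

def geometric_cooling(T0, k, alpha=0.8):
    # T_k = T0 * alpha**k, i.e. T_{k+1} = alpha * T_k (used in the example below)
    return T0 * alpha**k
```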
The $\exp(\frac{-\Delta E}{T_t})$ in the method represents the probabilistic mechanism for accepting new solutions, analogous to the Boltzmann distribution in statistical mechanics. It governs the trade-off between [__exploration and exploitation__](https://www.researchgate.net/post/What_is_Exploitation_and_Exploration_in_Optimization_Algorithms), promoting exploration at higher temperatures and exploitation at lower temperatures, ultimately guiding the algorithm towards the global optimum.
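For instance, with $\Delta E = 1$, the acceptance probability is $\exp(-1/10) \approx 0.90$ at $T_t = 10$, but only $\exp(-1/0.1) \approx 4.5 \times 10^{-5}$ at $T_t = 0.1$: uphill moves become increasingly unlikely as the system cools.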
A Python example for the [Beale function](https://en.wikipedia.org/wiki/Test_functions_for_optimization) using Simulated Annealing can be implemented as follows:
```python
import numpy as np
import matplotlib.pyplot as plt

# Define the Beale function as the cost function
def BealeFunction(X):
    x = X[0]
    y = X[1]
    Beale = (1.5 - x + x*y)**2 + \
            (2.25 - x + x*y**2)**2 + \
            (2.625 - x + x*y**3)**2
    return Beale

# Simulated Annealing algorithm with a specified cost function
def simulated_annealing(cost_func, initial_solution, bounds,
                        max_iter=5000, initial_temp=2500, cooling_rate=0.8):
    current_solution = np.array(initial_solution, dtype=float)
    current_cost = cost_func(current_solution)
    best_solution = current_solution
    best_cost = current_cost
    temperature = initial_temp
    for i in range(max_iter):
        # Generate a new candidate solution by randomly perturbing
        # the current solution within bounds (step size 0.5 is a free choice)
        new_solution = np.clip(
            current_solution + np.random.uniform(-0.5, 0.5, size=len(current_solution)),
            bounds[0], bounds[1])
        # Compute the change in cost function value
        delta_E = cost_func(new_solution) - current_cost
        # Accept a better solution always, a worse one
        # with probability exp(-delta_E / temperature)
        if delta_E < 0 or np.random.rand() < np.exp(-delta_E / temperature):
            current_solution = new_solution
            current_cost = cost_func(new_solution)
            # Update the best solution found so far
            if current_cost < best_cost:
                best_solution = current_solution
                best_cost = current_cost
        # Update the temperature (geometric cooling, floored to avoid division by zero)
        temperature = max(temperature * cooling_rate, 1e-12)
    return best_solution, best_cost

# Define the bounds for the Beale function
bounds = (-5, 5)

# Perform Simulated Annealing to minimize the Beale function
best_solution, best_cost = simulated_annealing(BealeFunction, [0, 0], bounds)

# Plot the Beale function
x = np.linspace(bounds[0], bounds[1], 100)
y = np.linspace(bounds[0], bounds[1], 100)
X, Y = np.meshgrid(x, y)
Z = BealeFunction([X, Y])
plt.figure(figsize=(8, 6))
plt.contourf(X, Y, Z, levels=50, cmap='jet')
plt.colorbar(label='Cost Function Value')
plt.scatter(best_solution[0], best_solution[1],
            color='red', marker='*', label='Best Solution')
plt.title('Beale Function and Simulated Annealing Solution')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()

print("Best Solution:", best_solution)
print("Minimum Cost:", best_cost)
```
---
#### Tabu Search for combinatorial problems
Tabu Search is a metaheuristic optimization algorithm, which is inspired by the observation that some search moves may be counterproductive in optimization problems, especially in combinatorial optimization where traditional search methods might get trapped in local optima. Tabu Search overcomes this limitation by maintaining a memory structure, called a __Tabu list__, to prevent revisiting recently explored solutions and guide the search towards promising regions of the solution space.
Taking a [__Knapsack problem__](https://en.wikipedia.org/wiki/Knapsack_problem) for instance, the Tabu Search method can be briefly described as:
- A (randomly) chosen initial solution as a binary vector $x_0$ of length $n$, where $n$ is the number of items, corresponding to whether an item is included ($1$) or not included ($0$) in the knapsack.
- Iteratively approaching the optimum, whereby
  1. the neighborhood search structure generates neighboring solutions by performing small modifications to the current solution, such as flipping the inclusion status of a single item.
  2. a Tabu list is maintained to store recently visited solutions that are not allowed to be revisited for a certain number of iterations. This prevents the algorithm from cycling through previously explored solutions.
  3. optionally, aspiration criteria allow the optimization to override the Tabu status of certain moves, as sketched below. Typical criteria either accept a tabu move when it beats the best-so-far solution, diving into the local optimum (exploitation), or deliberately move to a different search area to break out of the local optimum (exploration).
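As a small sketch of the aspiration idea (assuming a minimization problem; the function name is an illustrative choice):

```python
def is_move_allowed(move, candidate_cost, tabu_list, best_cost):
    # Non-tabu moves are always allowed; a tabu move is still accepted
    # if it would beat the best cost found so far (the classic aspiration criterion)
    return move not in tabu_list or candidate_cost < best_cost
```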
A Python implementation could look like this:
```python
import numpy as np

# Define the Knapsack Problem instance (weights and values of items)
weights = [2, 3, 4, 5, 9]  # Example weights of items
values = [3, 4, 5, 8, 10]  # Example values of items
capacity = 10              # Knapsack capacity

# Objective function: total value of the items included in the knapsack
def knapsack_value(solution):
    return sum(values[i] for i in range(len(solution)) if solution[i] == 1)

# Tabu Search for the Knapsack Problem
def tabu_search(weights, values, capacity, max_iter=1000, tabu_list_size=10):
    n = len(weights)
    current_solution = np.zeros(n, dtype=int)  # Initialize with all items excluded
    best_solution = current_solution.copy()
    tabu_list = []
    for _ in range(max_iter):
        # Greedy neighborhood move: flip the non-tabu item promising the largest
        # value gain (add the most valuable excluded item, or remove the least
        # valuable included one)
        neighbor = current_solution.copy()
        available_items = [i for i in range(n) if i not in tabu_list]
        if available_items:
            item_to_flip = max(available_items,
                               key=lambda i: values[i] if current_solution[i] == 0 else -values[i])
            neighbor[item_to_flip] = 1 - current_solution[item_to_flip]
            # Accept the move only if it satisfies the capacity constraint
            if sum(weights[i] * neighbor[i] for i in range(n)) <= capacity:
                current_solution = neighbor
                if knapsack_value(neighbor) > knapsack_value(best_solution):
                    best_solution = neighbor
            # Update the Tabu list with the most recent move
            tabu_list.append(item_to_flip)
            if len(tabu_list) > tabu_list_size:
                tabu_list.pop(0)
    return best_solution

# Solve the Knapsack Problem using Tabu Search
best_solution = tabu_search(weights, values, capacity)

# Print the best solution and its total value
print("Best Solution (item indices included in the knapsack):",
      [i for i in range(len(best_solution)) if best_solution[i] == 1])
print("Total Value of Items in the Knapsack:", knapsack_value(best_solution))
```
---
#### See Also
Regarding the application category, Tabu Search is well suited to combinatorial problems. Since the tabu list documents recently visited (discrete) solutions, a continuous solution space cannot be covered by the list, and the solution trajectory might still remain in a local minimum. Simulated Annealing, on the other hand, can easily be applied to continuous problems thanks to its probability-based acceptance rule.
### 5.3.2 Population based methods: Genetic Algorithm (GA) and Particle Swarm Optimization (PSO)
Population-based methods are a class of optimization algorithms that maintain a population of candidate solutions (individuals) to iteratively explore the solution space and improve the quality of solutions over successive generations. These methods are inspired by natural phenomena such as evolution, swarm behavior, and social interactions. The two selected population-based methods here are Genetic Algorithms (GAs) and Particle Swarm Optimization (PSO).
#### Genetic Algorithms (GAs):
Genetic Algorithms are optimization algorithms based on the principles of natural selection and genetics. The algorithm operates by iteratively evolving a population of candidate solutions (chromosomes) over generations, mimicking natural selection through the operators __Crossover__, __Mutation__ and __Selection__.
__Crossover__
Crossover simulates the biological process of recombination. It involves combining genetic material from two parent solutions to create one or more offspring solutions. The objective is to explore the search space by generating new candidate solutions that inherit characteristics from both parents. For example, in a binary genetic algorithm, crossover typically involves selecting a random crossover point and swapping the genetic material between the parents beyond that point. Let's consider two parent solutions represented as binary strings:
Parent 1: __0101__ 101101011011
Parent 2: 1010 __010010100100__
Suppose we randomly select the crossover point as the 4th position. The crossover operation would produce two offspring solutions:
Offspring 1: 0101 (from Parent 1) + 010010100100 (from Parent 2) = 0101010010100100
Offspring 2: 1010 (from Parent 2) + 101101011011 (from Parent 1) = 1010101101011011
In real-valued genetic algorithms, crossover is applied differently based on the representation of the solutions. For example, in single-point crossover for real-valued representations, a random crossover point is chosen, and the values of the variables beyond that point are exchanged between the parents.
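The binary operation above can be written in a few lines (a minimal sketch; `single_point_crossover` is an illustrative name):

```python
import random

def single_point_crossover(parent1, parent2):
    # Pick a random crossover point strictly inside the strings
    point = random.randint(1, len(parent1) - 1)
    # Swap the tails of the two parents to produce two offspring
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

# Reproducing the example above with the crossover point fixed at position 4
p1, p2 = "0101101101011011", "1010010010100100"
print(p1[:4] + p2[4:])  # 0101010010100100
print(p2[:4] + p1[4:])  # 1010101101011011
```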
__Mutation__
Mutation introduces random changes in the genetic material of individual solutions. It helps maintain diversity in the population and prevents premature convergence by exploring new regions of the search space.
In a binary genetic algorithm, mutation involves flipping the bits of a solution with a certain probability. In real-valued genetic algorithms, mutation involves adding a small random value to the variables of a solution. This random perturbation introduces small changes in the solution's genotype, leading to potentially new phenotypic characteristics.
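Both variants can be sketched as follows (function names and rates are illustrative choices):

```python
import random

def bitflip_mutation(bits, p_mut=0.01):
    # Flip each bit independently with probability p_mut
    return "".join(('1' if b == '0' else '0') if random.random() < p_mut else b
                   for b in bits)

def gaussian_mutation(x, p_mut=0.1, sigma=0.1):
    # Perturb each variable by a small Gaussian step with probability p_mut
    return [xi + random.gauss(0, sigma) if random.random() < p_mut else xi
            for xi in x]

print(bitflip_mutation("0101101101011011", p_mut=0.1))
print(gaussian_mutation([1.0, -2.5], p_mut=0.5))
```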
__Selection__
The selection method is responsible for determining which individuals from the population will be chosen to produce offspring for the next generation. The selection process aims to favor individuals with higher fitness values, as they are more likely to contribute to the improvement of the population over successive generations. Common methods are [Roulette Wheel Selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection), [Tournament Selection](https://en.wikipedia.org/wiki/Tournament_selection), [Rank Selection](https://en.wikipedia.org/wiki/Selection_(genetic_algorithm)#Rank_Selection) and [Stochastic Universal Sampling](https://en.wikipedia.org/wiki/Stochastic_universal_sampling).
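For instance, tournament selection can be sketched as follows (assuming a minimization problem; the function name is an illustrative choice):

```python
import random

def tournament_selection(population, fitness, k=3):
    # Sample k individuals at random and return the fittest one
    # (smallest fitness value, since we minimize)
    contenders = random.sample(range(len(population)), k)
    winner = min(contenders, key=lambda i: fitness[i])
    return population[winner]

pop = [[0.5, 1.2], [3.0, 2.1], [-1.0, 0.3], [2.2, -0.7]]
fit = [4.2, 1.1, 3.7, 2.5]  # lower is better
print(tournament_selection(pop, fit))
```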
---
An example Genetic Algorithm implementation for the Himmelblau function in Python is shown below:
```python
import numpy as np

def genetic_algorithm(objective_function, bounds, population_size=20, num_generations=1000,
                      crossover_rate=0.8, mutation_rate=0.1):
    """
    Genetic Algorithm for continuous optimization problems with two variables.

    Parameters:
    objective_function (function): The objective function to be minimized.
    bounds (list of lists): The search space bounds as [[min_x, min_y], [max_x, max_y]].
    population_size (int): The number of individuals in each generation (default is 20).
    num_generations (int): The number of generations for the optimization process (default is 1000).
    crossover_rate (float): The probability of the crossover operation (default is 0.8).
    mutation_rate (float): The probability of the mutation operation (default is 0.1).

    Returns:
    tuple: The best solution found and its fitness value.
    """
    # Initialization
    search_space_dimension = len(bounds[0])
    population = np.random.uniform(low=bounds[0], high=bounds[1],
                                   size=(population_size, search_space_dimension))
    best_solution = None
    best_fitness = float('inf')

    # Main loop
    for _ in range(num_generations):
        # Fitness evaluation
        fitness_values = np.apply_along_axis(objective_function, 1, population)
        best_index = np.argmin(fitness_values)
        current_best_fitness = fitness_values[best_index]

        # Update global best solution
        if current_best_fitness < best_fitness:
            best_fitness = current_best_fitness
            best_solution = population[best_index].copy()

        # Selection: Roulette wheel selection (minimization, so invert fitness)
        probabilities = 1 / (fitness_values + 1e-10)  # Avoid division by zero
        probabilities /= np.sum(probabilities)
        selected_indices = np.random.choice(range(population_size), size=population_size, p=probabilities)
        population = population[selected_indices]

        # Crossover: Single-point crossover between consecutive pairs
        for i in range(0, population_size, 2):
            if np.random.rand() < crossover_rate:
                crossover_point = np.random.randint(1, search_space_dimension)
                temp = population[i, crossover_point:].copy()
                population[i, crossover_point:] = population[i+1, crossover_point:]
                population[i+1, crossover_point:] = temp

        # Mutation: Gaussian mutation
        for i in range(population_size):
            if np.random.rand() < mutation_rate:
                population[i] += np.random.normal(loc=0, scale=0.1, size=search_space_dimension)

        # Clip population within the search space
        for j in range(search_space_dimension):
            population[:, j] = np.clip(population[:, j], bounds[0][j], bounds[1][j])

    return best_solution, best_fitness

# Example usage: Himmelblau function
def objective_function(x):
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2

bounds = [[-5, -5], [5, 5]]  # Search space bounds: [[min_x, min_y], [max_x, max_y]]
best_solution, best_fitness = genetic_algorithm(objective_function, bounds,
                                                population_size=20, mutation_rate=0.5)
print("Best Solution:", best_solution)
print("Best Fitness:", best_fitness)
```
---
#### Particle Swarm Optimization (PSO):
Particle Swarm Optimization is an optimization algorithm inspired by the collective behavior of swarms, such as bird flocks or fish schools. In PSO, candidate solutions, called particles, move through the solution space while adjusting their positions based on their own experience and the collective information of the swarm.
The PSO algorithm operates on the principle of self-organization and collaboration. At the start, particles are initialized randomly within the search space. As the algorithm progresses, each particle adjusts its position and velocity based on two critical pieces of information:
__Personal Best__ (Pbest): Each particle tracks the best solution it has encountered so far. This information guides the particle towards regions of the search space where it has previously found success.
__Global Best__ (Gbest): The overall best solution discovered by any particle in the population. This global information enables particles to explore promising regions of the search space beyond their individual experiences.
Particles update their positions and velocities iteratively using simple mathematical equations. By balancing exploration (searching new regions) and exploitation (exploiting known solutions), PSO efficiently navigates the search space to converge towards the optimal solution.
The algorithm can be described in the following steps:
- Initialize a population of particles with random positions and velocities within the search space.
- In each iteration,
  - Evaluate the fitness of each particle using the objective function.
  - Update each particle's best-known position based on its current position and fitness.
  - Update the global best-known position based on the best position found by any particle in the population.
  - Update each particle's velocity and position using the following equations:
$\qquad \displaystyle v_i(t+1)=w\ v_i(t)+c_1 \cdot rand_1 \cdot (P\text{best}_i - x_i(t))+c_2 \cdot rand_2 \cdot (G\text{best}-x_i(t))$
$\qquad \displaystyle x_i(t+1)=x_i(t)+v_i(t+1)$
Where:
- $v_i(t)$ and $x_i(t)$ are the _velocity_ and _position_ of particle $i$ at time $t$.
- $P\text{best}_i$ is the best-known position of particle $i$.
- $G\text{best}$ is the global best-known position.
- $w$ is the inertia weight.
- $c_1$ and $c_2$ are acceleration coefficients.
- $rand_1$ and $rand_2$ are random values between 0 and 1.
A sample implementation reads:
```python
import numpy as np

def objective_function(x):
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2  # Himmelblau function

def particle_swarm_optimization(objective_function, bounds, num_particles=50,
                                num_iterations=1000, w=0.5, c1=1.5, c2=1.5):
    search_space_dimension = len(bounds[0])
    global_best_position = np.zeros(search_space_dimension)
    global_best_fitness = float('inf')

    # Initialize particles with random positions and velocities
    particles_position = np.random.uniform(bounds[0], bounds[1],
                                           size=(num_particles, search_space_dimension))
    particles_velocity = np.random.uniform(-1, 1, size=(num_particles, search_space_dimension))
    particles_best_position = particles_position.copy()
    particles_best_fitness = np.apply_along_axis(objective_function, 1, particles_position)

    # Main loop
    for _ in range(num_iterations):
        # Update global best position and fitness
        min_index = np.argmin(particles_best_fitness)
        if particles_best_fitness[min_index] < global_best_fitness:
            global_best_fitness = particles_best_fitness[min_index]
            global_best_position = particles_best_position[min_index].copy()

        # Update particles' velocity and position
        for i in range(num_particles):
            r1, r2 = np.random.rand(), np.random.rand()
            particles_velocity[i] = (w * particles_velocity[i]
                                     + c1 * r1 * (particles_best_position[i] - particles_position[i])
                                     + c2 * r2 * (global_best_position - particles_position[i]))
            particles_position[i] += particles_velocity[i]

            # Keep the particle within the search space bounds
            for j in range(search_space_dimension):
                particles_position[i][j] = np.clip(particles_position[i][j], bounds[0][j], bounds[1][j])

            # Update the particle's best position and fitness
            current_fitness = objective_function(particles_position[i])
            if current_fitness < particles_best_fitness[i]:
                particles_best_fitness[i] = current_fitness
                particles_best_position[i] = particles_position[i].copy()

    return global_best_position, global_best_fitness

# Define search space bounds for the Himmelblau function: [[min_x, min_y], [max_x, max_y]]
bounds = [[-5, -5], [5, 5]]

# Run the PSO algorithm
best_solution, best_fitness = particle_swarm_optimization(objective_function, bounds)

# Print the results
print("Best Solution (x, y):", best_solution)
print("Best Fitness:", best_fitness)
```
__See Also__
In terms of application, the GA's chromosome encoding and crossover/mutation operators make it applicable to combinatorial as well as continuous problems, whereas PSO's velocity-based update is formulated for continuous search spaces, where it often converges quickly, at the risk of premature convergence.
### 5.3.3 Handling of constraints
Handling constraints in heuristic optimization methods is crucial for ensuring that the generated solutions are feasible and adhere to the problem's constraints. Possible constraints in optimization tasks can be equality constraints (e.g., $g(x)=0$) or inequality constraints (e.g., $h(x)\leq 0$), where $x$ represents the decision variables.
Several strategies are commonly employed to handle constraints in heuristic optimization methods:
- __Penalty Function Method__: This approach penalizes infeasible solutions by __adding a penalty term to the objective function__. The penalty term increases as the violation of constraints increases. The objective function thus becomes a combination of the original objective and penalty terms.
The objective function with penalty terms can be expressed as:
$\qquad \displaystyle f(x)+\sum_i λ_i\ g_i(x)^2+ \sum_j \mu_j\max(0,h_j(x))$
where $g_i(x)$ and $h_j(x)$ represent the equality and inequality constraint functions, respectively, and $\lambda_i$ and $\mu_j$ are penalty parameters.
- __Feasibility-based Methods__: These methods __prioritize feasibility__ during the search process by incorporating feasibility checks at each iteration. If a candidate solution violates any constraint, it is either repaired to satisfy the constraints or discarded.
Feasibility checks are performed at each iteration to ensure that candidate solutions $x$ satisfy all constraints (a minimal check is sketched after this list):
$\qquad \displaystyle g_i(x)=0,i=1,2,…,m$
$\qquad \displaystyle h_j(x)\leq 0,j=1,2,…,n$
- __Repair Methods__: Repairing infeasible solutions involves modifying the decision variables $x$ to satisfy the constraints:
$\qquad \displaystyle x=\text{Repair}(x)$
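For the feasibility-based approach, a minimal check could look like this (the helper name `is_feasible` and the tolerance `eps` for the equality constraints are illustrative assumptions):

```python
def is_feasible(x, eq_constraints, ineq_constraints, eps=1e-6):
    # Equality constraints g_i(x) = 0, checked up to a small tolerance
    if any(abs(g(x)) > eps for g in eq_constraints):
        return False
    # Inequality constraints h_j(x) <= 0
    if any(h(x) > 0 for h in ineq_constraints):
        return False
    return True

# Example: x + y = 3.5 as equality; x > 3 and y < 1 rewritten as h(x) <= 0
eq = [lambda x: x[0] + x[1] - 3.5]
ineq = [lambda x: 3 - x[0], lambda x: x[1] - 1]
print(is_feasible([4.0, -0.5], eq, ineq))  # True
print(is_feasible([2.0, 1.5], eq, ineq))   # False: violates x > 3
```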
Take the [__Himmelblau function__](https://en.wikipedia.org/wiki/Himmelblau%27s_function) as a scenario. The function reads:
$\qquad {\displaystyle f(x,y)=(x^{2}+y-11)^{2}+(x+y^{2}-7)^{2}.\quad }$
if the constraints are set as:
$\qquad {\displaystyle x > 3\ ; \quad y < 1}$
$\qquad {\displaystyle x + y = 3.5\quad }$
The corresponding treatments are:
- __Penalty Function Method__: Augment the objective function with penalty terms to penalize violations of constraints. Penalize solutions where $x\leq 3$, $y \geq 1$, and $x+y \neq 3.5$ using penalty parameters.
$\displaystyle \text{obj}(x, y) = f(x,y)+ \lambda \cdot (x+ y -3.5)^2+\mu_x \cdot \max(0, 3-x)+\mu_y \cdot \max(0, y-1)$
- __Feasibility-based Methods__: Ensure that generated solutions satisfy all constraints during the optimization process. At each iteration, check if the current solution violates the constraints $x>3$, $y<1$, and $x+y= 3.5$. If so, discard the solution.
- __Repair Methods__: Modify infeasible solutions to make them feasible while preserving solution quality. Adjust the solution to satisfy the constraints $x>3$, $y<1$, and $x+y=3.5$ without significantly affecting the objective function value (a minimal repair sketch follows this list).
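A minimal repair sketch for these particular constraints (keeping $x$ where possible and restoring $y$ from the equality is one possible strategy, not the only one):

```python
def repair(x, y, eps=1e-6):
    # Enforce x > 3 by clipping, then restore the equality x + y = 3.5;
    # with x > 3 the resulting y = 3.5 - x is automatically below 0.5 < 1
    x_rep = max(x, 3 + eps)
    y_rep = 3.5 - x_rep
    return x_rep, y_rep

print(repair(2.0, 1.5))   # -> (3.000001, 0.499999)
print(repair(4.2, -1.0))  # -> (4.2, -0.7)
```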
A Python example using the penalty method within a genetic algorithm (GA):
```python
import numpy as np

# Define the Himmelblau function
def himmelblau(x, y):
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Equality constraint x + y = 3.5 as a squared violation term
# (matching the penalty formula above)
def constraint(x, y):
    return (x + y - 3.5)**2

# Penalty function method for handling the constraints;
# the penalty weights lam, mu_x, mu_y are tuning parameters
def penalty_function_method(x, y, lam=100.0, mu_x=100.0, mu_y=100.0):
    # Penalty terms for the inequality constraints x > 3 and y < 1
    penalty_term_x = max(0, 3 - x)
    penalty_term_y = max(0, y - 1)
    # Penalty term for the equality constraint x + y = 3.5
    penalty_term_x_y = constraint(x, y)
    # Objective function augmented with the penalty terms
    return himmelblau(x, y) + lam * penalty_term_x_y + mu_x * penalty_term_x + mu_y * penalty_term_y

# Define the optimization parameters
population_size = 100
num_generations = 1000
crossover_rate = 0.8
mutation_rate = 0.1

# Define the genetic algorithm
def genetic_algorithm():
    # Initialization within the box x in [3, 5], y in [-2, 1]
    population = np.random.uniform(low=[3, -2], high=[5, 1], size=(population_size, 2))
    best_solution = None
    best_fitness = float('inf')

    # Main loop
    for _ in range(num_generations):
        # Fitness evaluation of the penalized objective
        fitness_values = np.apply_along_axis(lambda x: penalty_function_method(x[0], x[1]), 1, population)
        best_index = np.argmin(fitness_values)
        current_best_fitness = fitness_values[best_index]

        # Update global best solution
        if current_best_fitness < best_fitness:
            best_fitness = current_best_fitness
            best_solution = population[best_index].copy()

        # Selection: Roulette wheel selection
        probabilities = 1 / (fitness_values + 1e-10)  # Avoid division by zero
        probabilities /= np.sum(probabilities)
        selected_indices = np.random.choice(range(population_size), size=population_size, p=probabilities)
        population = population[selected_indices]

        # Crossover: Single-point crossover (with two variables the point is always 1)
        for i in range(0, population_size, 2):
            if np.random.rand() < crossover_rate:
                crossover_point = np.random.randint(1, 2)
                temp = population[i, crossover_point:].copy()
                population[i, crossover_point:] = population[i+1, crossover_point:]
                population[i+1, crossover_point:] = temp

        # Mutation: Gaussian mutation
        for i in range(population_size):
            if np.random.rand() < mutation_rate:
                population[i] += np.random.normal(loc=0, scale=0.1, size=2)

        # Clip population within the search space
        population[:, 0] = np.clip(population[:, 0], 3, 5)   # x > 3
        population[:, 1] = np.clip(population[:, 1], -2, 1)  # y < 1

    return best_solution, best_fitness

# Optimize the Himmelblau function with constraints using the genetic algorithm
best_solution, best_fitness = genetic_algorithm()

# Print the results
print("Best Solution (x, y):", best_solution)
print("Best Fitness:", best_fitness)
```