---
tags: Discrete Optimization, Coursera
---
<h1> Local Search </h1>

In local search, intermediate assignments may violate the 'constraints' that you'd enforce in a constraint programming solution; constraint programming, by contrast, keeps every intermediate state consistent with all constraints.

**Satisfiability to Optimization:** A satisfaction problem can be recast as an optimization problem by minimizing the number of violated constraints.

How constraint violations are counted can differ; you could even build an arbitrary loss function on top of them, giving different weights to different violations.

:::spoiler N Queens
In the N-Queens problem, your local move can be 'move the queen within its column (change its row)', because we know exactly one queen sits in every column and queens are indistinguishable. This reduces our 'search space'.
:::

In the graph view, nodes are 'states' (assignments) and edges connect assignments that are one move apart. The graph is directed, since it may not make sense to go from a lower-loss assignment to a higher one in some cases; a backward edge can always be added if needed. The algorithm designer decides which edges to allow for each node: the fewer the edges, the lower the chance of reaching the optimal solution (up to a point), but the faster the algorithm runs.
**Local Minima:** Every possible move leads to an equal or worse configuration.

No guarantees for achieving global optima. Escaping bad local optima is a critical issue.
**Hard Constraints:** Constraints that are always satisfied during the search.
:::spoiler Examples
- Total no. of tasks scheduled
- Total no. of cars of each type on the assembly line (demand constraint)
- All numbers should be different in magic square assignment
:::
<br>
**Soft Constraints:** Constraints that might be violated during the search but are necessary in the final output.
:::spoiler Examples
- No window of k time should have >m tasks
- For each 'option' i.e. feature of car, no window of k slots should have >m cars requiring that option
- Sum of numbers on each row, column, diagonal in magic square should be equal
:::
<br>
**Swaps:** It may be wise to perform swaps of any 2 variables in the current assignment s.t. hard constraints are maintained.
:::spoiler Examples
- Swap 2 tasks in the current ordering
- Swap 2 cars in the current ordering
- Swap 2 numbers in the magic square
:::
<br>
**Travelling Salesman:**
A simple neighbourhood: take the current tour and replace 2 of its edges (notice it can't be just 1) with 2 other edges, while ensuring a circuit is still formed (reversing the segment of the tour between the two removed edges). This is called 2-opt.
But why 2? Why not 3? 4? More? The larger the set of edges swapped, the worse the time required.
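A minimal sketch of 2-opt as a greedy local search, assuming cities are indices into a distance matrix `dist` and the tour is a list of cities (both representations are assumptions, not from the lecture):

```python
def two_opt(tour, dist):
    """Repeatedly reverse the segment tour[i+1..j], which replaces edges
    (tour[i], tour[i+1]) and (tour[j], tour[j+1]) with (tour[i], tour[j])
    and (tour[i+1], tour[j+1]), whenever that shortens the tour."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            # skip j = n-1 when i = 0: those two edges share node tour[0]
            for j in range(i + 2, n - 1 if i == 0 else n):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```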
**K-Opt**
Instead of searching for an entire set of edges to swap, build the edges-to-be-swapped set incrementally. How?
1. Pick a long edge $e_1$, say $(u_1, u_2)$.
2. Pick a shorter edge $e_2$ from $u_2$ to another vertex $u_3$, intended to replace $e_1$. If no such edge exists, pick a new $e_1$ (restart).
3. Now one of $u_3$'s edges must be removed: call it $e_3$, from $u_3$ to $u_4$.
4. We could now close the circuit with the edge $c_1 = (u_1, u_4)$, but we won't necessarily stop here.
- First notice that doing so removes $e_1$ and $e_3$ while adding $e_2$ and $c_1$; compute the cost of this 2-opt-like swap.
- But we don't need to end here: treat $c_1$ as a new $e_1$ (as in step 1) and loop over steps 2-4; each iteration yields an (i+1)-opt move, as one more edge is removed (new step 3) and one more is added (new step 2).
Keep going for up to k-1 iterations, giving k-opt. Finally, see which iteration gave the lowest cost and execute the swaps that produce that iteration's configuration. A runnable sketch follows.
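A hedged, simplified sketch of this incremental scheme (a greedy cousin of Lin-Kernighan; tours as lists and `dist` a matrix are assumptions). At each depth it applies the best 2-opt-style closure, so depth d corresponds to a (d+1)-opt move overall, and it returns the best tour seen at any depth:

```python
def tour_length(tour, dist):
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def incremental_k_opt(tour, dist, k):
    """Start from the longest edge e1 = (u1, u2); repeatedly pick the
    best closing edge, apply the closure (a segment reversal), and let
    the closing edge become the new e1, exactly as in steps 1-4 above."""
    n = len(tour)
    # rotate so the longest edge is (cur[0], cur[1]) -- this is e1
    i = max(range(n), key=lambda i: dist[tour[i]][tour[(i + 1) % n]])
    cur = tour[i:] + tour[:i]
    best, best_len = list(cur), tour_length(cur, dist)
    for _ in range(k - 1):
        a, b = cur[0], cur[1]            # e1 = (u1, u2)
        def delta(j):                    # cost change of closing at j
            c, d = cur[j], cur[j + 1]    # e3 = (u4, u3), close = (u1, u4)
            return dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
        j = min(range(2, n - 1), key=delta)
        cur[1:j + 1] = reversed(cur[1:j + 1])   # apply the closure
        cur_len = tour_length(cur, dist)        # new cur[1] is u4
        if cur_len < best_len:
            best, best_len = list(cur), cur_len
    return best, best_len
```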
### Solving feasibility and optimality together
There are many problems with both a constraint and a quantity to optimize.
Eg: Graph Coloring
Constraint: No edge should connect two same-colored vertices
Optimize: Minimize the no. of colors used
There are 3 approaches to such problems:
1. Fix the quantity being optimized at a certain value and find a feasible solution. Then tighten the value slightly and switch things around to make the solution feasible again.
:::spoiler Graph coloring example

How to make solution feasible with k-1 colors? Try to minimize constraint violations using local search.
:::
<br>
2. Always stay in the feasible solution space, i.e. don't violate constraints at any stage. Then look for changes that improve the optimization quantity without violating the constraints. For this, you may want to use a different loss function that keeps the essence of the original optimization quantity but makes 'improvement' easier to evaluate.
:::spoiler Graph coloring example
The original loss function is just the number of colors. With a local move of 'change the color of one vertex', you will almost never improve that loss function directly. Therefore we need a loss function that carries more fine-grained information.

The loss function (maximize $\sum_i |C_i|^2$, where $C_i$ is the set of vertices colored $i$) improves whenever color classes get larger. So most color-change moves now affect the loss, and the change is for the better when a vertex moves from color $i$ to color $j$ with $|C_i| < |C_j|$, because we make progress towards eliminating the $i^{th}$ color. Of course, this is an approximation and might lead to bad local minima.
But we also have to ensure each move keeps the solution feasible. How? Each move is not a single color change, but a sequence of color changes that together preserve feasibility, called a **Kempe chain**.


Changing $v$ from $C_i$ to $C_j$ would create color clashes with some adjacent vertices, so we switch those adjacent vertices to $C_i$, and so on outward. Sometimes (as in the image examples) this improves the cost function (from 7-7 to 8-6), and these are the local-search moves we make. So our neighbourhood now consists of valid Kempe-chain moves, not single vertex color changes. A sketch of the chain computation is below.
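A minimal sketch, assuming `adj` maps each vertex to its neighbours and `color` is a dict (both representations assumed). The chain is the connected component containing $v$ in the subgraph induced by the two colors; swapping the colors inside it keeps the coloring proper:

```python
from collections import deque

def kempe_chain(adj, color, v, b):
    """BFS for the Kempe chain of vertex v towards target color b:
    the component of v among vertices colored a (= color[v]) or b."""
    a = color[v]
    chain = {v}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in chain and color[w] in (a, b):
                chain.add(w)
                queue.append(w)
    return chain

def apply_kempe(color, chain, a, b):
    # swap the two colors inside the chain; feasibility is preserved
    for u in chain:
        color[u] = b if color[u] == a else a
```

A move is then accepted if $\sum_i |C_i|^2$ improves after the swap.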
:::
<br>
3. Do a mix of the above 2 strategies, i.e. bring improvements in either feasibility or optimization-quantity or both with each move.

:::spoiler Graph coloring example
Define $B_i$ as the set of bad edges (edges whose endpoints both have color $i$). We want to drive each $|B_i|$ to 0 (feasibility) and maximize $|C_i|^2$ (to minimize colors). How to combine them? We want the combined objective to guarantee that local minima are at least feasible (even though we can't say anything about optimality). Here, this can be ensured by minimizing the objective $\sum_i \left( 2|B_i||C_i| - |C_i|^2 \right)$. Proof sketch:

If $|B_i| > 0$ for some $i$, take an endpoint $v$ of a bad edge in class $i$ and move it into a brand-new color class; a short calculation shows the objective strictly decreases, so such a state cannot be a local minimum.

Thus a state with $|B_i| > 0$ for any $i$ is never a local minimum, so all local minima are feasible, even though the search improves feasibility and optimality together, move by move. A sketch of the objective computation is below.
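A small sketch evaluating this objective, under the same assumed `adj`/`color` representation as the Kempe-chain sketch (vertex ids assumed comparable so each edge is counted once):

```python
from collections import defaultdict

def mixed_objective(adj, color):
    """Evaluate sum_i (2|B_i||C_i| - |C_i|^2): the objective whose
    local minima are guaranteed feasible."""
    C = defaultdict(set)   # color class -> vertices
    B = defaultdict(int)   # color class -> number of bad edges
    for v, c in color.items():
        C[c].add(v)
    for v in color:
        for w in adj[v]:
            if v < w and color[v] == color[w]:   # count each edge once
                B[color[v]] += 1
    return sum(2 * B[c] * len(C[c]) - len(C[c]) ** 2 for c in C)
```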
:::
#### Complex Neighbourhoods - Traveling Tournament Problem

The travel distance is the sum of distances travelled by each team over the entire tournament.
So essentially we have to fill a schedule table (Sch) of size teams x rounds, where entry $Sch[i][j]$ says the $i^{th}$ team plays team $Sch[i][j]$ in the $j^{th}$ round, plus information on which team is home and which is away in that matchup.
:::spoiler Intuition for first constraint
Had the first constraint not existed, it might have been optimal for a team to play a long run of home matches and then go out on a tour of away games. But this gets boring for fans during the away stretch of the season, and teams with an early home phase would take a lead only to be caught up later, i.e. the scoreboard becomes more subtle/complex, which is undesirable.
:::
<br>
The interesting part of this problem is defining a neighbourhood for a particular assignment. Something as simple as 'swap 2 rounds for a particular team' might not work well alone (because it also propagates changes elsewhere in the table for the opponent teams). The current best solution to this problem constructs the following moves to define a complex neighbourhood (a sketch of the table representation and one move follows the list):
1. **Swap homes**: For a given pair of teams A and B, swap which of their two games is at A's home. It doesn't affect much beyond the first constraint and possibly the travel distance.
2. **Swap rounds**: The exact round number of a matchup is irrelevant, so you could take 2 columns (rounds) and swap them. This is a big move, i.e. radical change is possible.
3. **Swap teams**: Take the sequences of matches of two teams (two rows) and swap them. But the opponents' schedules also get affected, so changes have to be made to restore the integrity of the table ($Sch[Sch[i][j]][j] = i$) under the new schedule; the two teams' games against each other stay as they are. Another radical change.
4. **Swap partial rounds**: Pick 2 rounds and, instead of swapping them entirely, exchange the schedule only for some teams within these 2 rounds while maintaining integrity.
5. **Swap partial teams**: Instead of swapping whole rows, swap a subset of the round matches of 2 picked teams. Integrity is then violated elsewhere in the table, so fix that too.
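A small sketch of the table representation, the integrity check, and the 'swap rounds' move; the concrete encoding (home/away flags omitted) is an assumption:

```python
def integrity_ok(sch):
    """Sch[Sch[i][j]][j] == i: if team i meets team k in round j,
    then team k must meet team i in round j."""
    return all(sch[sch[i][j]][j] == i
               for i in range(len(sch)) for j in range(len(sch[0])))

def swap_rounds(sch, j1, j2):
    """'Swap rounds' move: exchange two whole columns. Integrity is
    preserved automatically, since every pairing moves together."""
    for row in sch:
        row[j1], row[j2] = row[j2], row[j1]

# 4 teams, 3 rounds: round 0 pairs (0,1) and (2,3), and so on
sch = [[1, 2, 3],
       [0, 3, 2],
       [3, 0, 1],
       [2, 1, 0]]
assert integrity_ok(sch)
swap_rounds(sch, 0, 2)
assert integrity_ok(sch)
```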
### Non-global optima - Definitions
There are 2 main ways you can end up at a non-global (henceforth called bad) optimum:
1. You move into a bad optimum and there's no way to move out.
2. You start in a region disconnected from the global optimum, so you can never reach it and end at a bad optimum.
We define a 'connected' neighbourhood as one in which, from any configuration, some sequence of moves leads to at least one optimal solution.
Connectivity doesn't guarantee optimality: a greedy search strategy can still get stuck at a local minimum. But it is essential for there to be any hope of reaching the global minimum.
To prove connectivity, take the optimal solution and show that there is a sequence of moves within your defined neighbourhood that constructs it from any starting configuration.
:::spoiler Graph coloring example

:::
:::spoiler Car Scheduling - Swap Neighbourhood
Swapping can take you from any permutation to the optimal one; a constructive sketch is below.
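A tiny constructive sketch of that argument (selection-sort style; the list representation is assumed): it emits at most n-1 swaps turning any permutation into any target, so the optimum is always reachable.

```python
def swaps_to_reach(start, target):
    """Emit a sequence of position swaps transforming `start` into
    `target`, proving the swap neighbourhood is connected."""
    cur = list(start)
    swaps = []
    for i in range(len(cur)):
        if cur[i] != target[i]:
            j = cur.index(target[i], i + 1)  # target value sits further right
            cur[i], cur[j] = cur[j], cur[i]
            swaps.append((i, j))
    return swaps
```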

:::
:::spoiler Travelling Salesman - 2-opt
The neighbourhood defined by a 2-opt move (pick 2 non-adjacent edges, replace them with 2 different edges among the 4 endpoints) is connected. This is because a travelling salesman tour can be viewed as an array once we fix an arbitrary node as the starting point, and each 2-opt move (assuming a complete graph) corresponds to reversing exactly one contiguous subarray of size >= 2. Since such reversals generate all permutations, we can reach any optimal permutation from any starting configuration.

:::
Open question left by PvH (Pascal Van Hentenryck): is the travelling tournament problem's 5-move neighbourhood connected?!
### Local Search Implementation

$f$ is the objective function, $N$ is the neighbourhood function, $L$ extracts the legal neighbours from a given neighbourhood, and $S$ selects the next state from the legal neighbours; $s$ is the current state. The algorithm: start with an initial solution and maintain $s^*$, the best state seen so far; then run some number of trials, in each one moving to a state selected from the legal subset of the current neighbourhood. A minimal sketch:
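The parameter names mirror the notes; the loop structure is an assumption about the course pseudocode:

```python
def local_search(f, N, L, S, s0, max_trials=10_000):
    """Generic local search: f objective, N neighbourhood,
    L legality filter, S next-state selection."""
    s, s_star = s0, s0
    for _ in range(max_trials):
        legal = L(N(s), s)          # legal subset of the neighbourhood
        if not legal:
            break                   # nowhere left to move
        s = S(legal, s)             # pick the next state
        if f(s) < f(s_star):
            s_star = s              # remember the best state seen
    return s_star
```

For instance, $L$ might keep only improving neighbours and $S$ pick the best of them (greedy descent), or $S$ might be a random choice for a random walk.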

**Heuristics**: Choose the next neighbour, driving the search towards a local minimum.
**Metaheuristics**: Try to escape local minima, driving the search towards a global minimum. Typically include some memory or learning.
For example, the legal-neighbourhood function might be 'all states that reduce $f$ from the current state'; 'reduce' might be relaxed to 'don't degrade', or even to 'select all' (identity function). The selection function may be 'best neighbour', i.e. choose the one with the best objective value, or 'first neighbour', i.e. the first one found to be legal. Lastly, something mixed can be done, called 'multi-stage selection', where we first select one part of the neighbourhood and then choose the best within it.
:::spoiler Examples of multi-stage heuristics
**Max/Min Conflict**: Select the variable with the most violations; assign it the value that leads to the fewest violations.
**Min Conflict**: Randomly select a variable with violations; assign it the value that leads to the fewest violations. (A sketch follows the examples.)
:::
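A minimal sketch of min-conflict on N-Queens, reusing the reduced search space from earlier (one queen per column, a move changes that queen's row):

```python
import random

def min_conflict_queens(n, max_steps=100_000):
    """Stage 1: random violated variable; stage 2: best value for it."""
    row = [random.randrange(n) for _ in range(n)]  # row[c] = queen's row in column c

    def conflicts(c, r):
        # queens clash on a row or a diagonal (columns are distinct by construction)
        return sum(1 for c2 in range(n) if c2 != c and
                   (row[c2] == r or abs(row[c2] - r) == abs(c2 - c)))

    for _ in range(max_steps):
        bad = [c for c in range(n) if conflicts(c, row[c]) > 0]
        if not bad:
            return row                              # solved
        c = random.choice(bad)                      # stage 1
        row[c] = min(range(n), key=lambda r: conflicts(c, r))  # stage 2
    return None
```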
Essentially, choosing a multi-stage heuristic trades a smaller candidate set (possibly worse moves) against less time per move. Intuitively, if the neighbourhood is really large, it makes sense to choose a small part of it and pick the best within that; since each move is cheaper, you can run the local search for more iterations and find your way back if needed.
**Random Walks**: Choose a random neighbour; if it improves the objective, take it. Most effective when the neighbourhood is really large and there is no clear idea of how to logically reduce it to something smaller. Example: the travelling tournament problem. In fact, this is used by the best known result, for which the 5 neighbourhood moves were described earlier.
**Iterated Local Search**
Restart local search again and again, with either a different initial configuration (chosen by some defined logic or randomly) or some randomization inside the algorithm, and return the best state obtained across all these iterations as the final solution.
**Metropolis Metaheuristic**
If a move improves the objective function, take it. If it degrades it, take it with some probability; the probability gets lower the more the move degrades the objective. The classical acceptance probability for a degradation of $\Delta$ at temperature $t$ is $\exp(-\Delta/t)$.

So if $t$ is very large, the probability converges to 1: we almost always take the move, and the search tends to a random walk. If $t$ is very small, the probability converges to 0: we rarely take degrading moves, and the search tends to greedy descent. So we have to compromise between accepting degrading moves and moving only towards better solutions.
**Simulated Annealing**
Start with a very high temperature, essentially a random-walk-ish approach, then progressively cool it, becoming more and more greedy (towards a plain local-improvement search). A reasonably fast update-temperature schedule is geometric: $t_k = \alpha \cdot t_{k-1}$ with $0 < \alpha < 1$.

For an extremely (unrealizably) slow search with cooling speed tending to 0, it is proven to converge to the global optimum (the slow schedule gives it time to escape every non-global basin). But in practice it also does well with reasonably fast schedules on problems like TTP.
This can be combined with various techniques like restarts (iterated local search) and reheats (periodically increase the temperature so that the search gets alternating spurts of 'breadth' and 'depth' in the search space). It can also borrow insights from tabu search. A sketch of the core loop:
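A minimal sketch with Metropolis acceptance and geometric cooling; `random_neighbor(s)` is an assumed problem-specific routine returning a random neighbour of s:

```python
import math, random

def simulated_annealing(f, random_neighbor, s0, t0=100.0, alpha=0.95,
                        steps_per_t=100, t_min=1e-3):
    """Metropolis acceptance exp(-delta/t) with cooling t_k = alpha * t_{k-1}."""
    s, s_star = s0, s0
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):   # several iterations per temperature
            cand = random_neighbor(s)
            delta = f(cand) - f(s)     # > 0 means a degrading move
            if delta <= 0 or random.random() < math.exp(-delta / t):
                s = cand
            if f(s) < f(s_star):
                s_star = s
        t *= alpha                     # cool down
    return s_star
```

Note `steps_per_t > 1`, which matches the practical tip below about running several iterations at each temperature.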
#### Tabu Search
We maintain a list of 'tabu' states which we don't want to visit again. Intuitively, this ensures our metaheuristic won't backtrack, and instead forces us to find new states/regions even if they are worse than some previously seen states in terms of the objective function. It's not necessary to mark every visited state as tabu; the tabu selection function is also programmer-decided.
:::spoiler Graph coloring tabu-search performance graph

Y-axis is number of colors and X-axis is iterations.
:::
<br>

**Tabu-list selection**
It may not even make sense to maintain the list of all visited states, because each configuration is large to keep in memory and we will accumulate thousands/millions of configurations over time. We can use something like short-term memory, where we only keep the last X states, where X can change dynamically (e.g. higher X after a degrading move, lower after an improving one). But even then it can be quite costly.
So we may want to store an "abstraction" of the tabu list instead of explicitly storing disallowed states: store transitions rather than states. The idea is to avoid transitions that essentially undo recent transitions, because not a lot of 'context' would have changed in between.
:::spoiler Example/Implementation of Car assembly
If the transition is a swap of two positions, don't swap a recently swapped pair back.


Reduce violations by making swaps that are not tabu (in the slide's code, the red lines mark the tabu-search-specific parts). Notice that even if all moves are tabu right now, the iteration counter keeps increasing, so some moves will become available again. In practice, the cost of a move is calculated without actually performing it, unlike in that code example. A minimal sketch, assuming `cost(s)` counts violations:
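```python
def tabu_swap_search(cost, s0, tenure=7, max_iters=10_000):
    """Tabu search over a swap neighbourhood (e.g. car sequencing).
    The tabu list stores transitions (swapped index pairs), not states;
    each pair stays tabu for `tenure` iterations. Like the slide's code,
    this sketch performs a swap to evaluate it, then undoes it."""
    s = list(s0)
    n = len(s)
    best, best_cost = list(s), cost(s)
    tabu = {}                           # (i, j) -> iteration until which tabu
    for it in range(max_iters):
        candidates = []
        for i in range(n):
            for j in range(i + 1, n):
                if tabu.get((i, j), -1) >= it:
                    continue            # this swap was undone too recently
                s[i], s[j] = s[j], s[i]
                candidates.append((cost(s), i, j))   # evaluate, then undo
                s[i], s[j] = s[j], s[i]
        if not candidates:
            continue                    # everything tabu; wait for expiry
        c, i, j = min(candidates)       # best non-tabu swap
        s[i], s[j] = s[j], s[i]
        tabu[(i, j)] = it + tenure      # forbid undoing it for a while
        if c < best_cost:
            best, best_cost = list(s), c
    return best, best_cost
```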
:::
Calculating the cost of a move before actually making it is called 'differentiation'.
<br>
This can then be combined with a multi-stage heuristic, so instead of the quadratic all-pairs neighbourhood we can use a linear one (for a particular car, find which car to swap it with).
:::spoiler Tabu list - Queen's problem example

:::
Sometimes you may adopt a coarser tabu criterion that makes the search more diversified. Like in the queens problem: don't allow the same queen to be assigned a new value for a certain number of moves.
**Aspiration Criterion in Tabu Search**: If a move forbidden by the tabu list is too good, we might want to take it anyway. We can set a criterion for this, such as requiring more than a certain magnitude of improvement in the objective function.
Some more techniques:

In strategic oscillation, if we are spending too much time around a local minimum, we might increase the weight of the objective function and decrease the weight of the constraints to get out. On the other hand, if we're violating too many constraints, we may do the opposite. So the weights oscillate and take us back and forth between the feasible and infeasible regions. These ideas can be used within simulated annealing, tabu search, plain local search, etc.
### Advanced pointers on Simulated Annealing
Source: https://cdn.intechopen.com/pdfs/4631/InTech-Practical_considerations_for_simulated_annealing_implementation.pdf
- The number of iterations at a particular temperature may be > 1; this often works better.
- The choice of k is important to get right.
## Guided LS and Fast LS Meta-Heuristics
Reference: https://www.bracil.net/CSP/papers/VouTsa-Gls-MetaHeuristic2003.pdf
Guided local search is a penalty-based metaheuristic that wraps around an existing local search algorithm. It is a way to escape local optima and continue exploring.
Fast local search, on the other hand, is a way to reduce the neighbourhood size explored in each individual move while maintaining the same overall neighbourhood size/connectivity.
### Guided Local Search
#### Outline
For GLS, one defines a set of features present in candidate solutions. When LS gets trapped in a local optimum, certain features are selected and penalized, and the objective function evaluated during local moves is augmented with these feature penalties. These are described in more detail below. Care must be taken that too many penalties do not build up during the execution, because this could lead the algorithm away from good solutions and make the rate of improvement drop drastically over time.
GLS is closely related to tabu search as described earlier: it is a softer tabu that guides the search away from local minima without completely 'preventing' them from recurring. In fact, the idea of penalizing transitions instead of states (as in tabu lists) can be used when state features become too numerous. Moreover, ideas related to aspiration (taking tabu moves if the improvement is significant) can also be applied in a more advanced implementation.
#### Explanation using TSP
Let's take the travelling salesman problem as a running example to explain the procedure.
A possible 'feature' here could be whether the edge A->B exists in the current candidate solution. Each feature has an associated 'cost' and 'penalty'. The cost is often defined in the same terms as the objective function, e.g. in TSP the 'distance'/'weight' of the edge. The penalty of each feature is initialized to 0 and increased over time.
#### General Formulation
GLS defines a function $h$ that will be used for evaluating local moves: $h(s) = g(s) + \lambda \sum_i p_i \, I_i(s)$
Here $g$ is the original cost function, $\lambda$ is a coefficient for penalty weighting, $p_i$ is the penalty associated with the $i^{th}$ feature and $I_i(s)$ is 0 or 1 depending on whether the $i^{th}$ feature is present in the current state $s$.
How are penalties calculated? GLS aims to penalize *unfavorable features* in the local optima that *matter most*. Every time a local optimum is encountered, the feature with the highest $util$ value sees an increment of 1 in its penalty:
$util_i = I_i(s) * \frac{c_i}{1+p_i}$
where $c_i$ is the cost of the $i^{th}$ feature (how unfavorable it is, eg: edge-length in TSP).
The intuition is that the higher the cost of the feature, the more is the utility of penalizing it, and the more a feature has been already penalized, the less the utility of penalizing it again.
So while GLS moves towards local optima (as they have lower $h$ values), it comes out of them quickly due to the penalties that get applied.
The performance of GLS is not too sensitive to $\lambda$ on some problems, but on others it may be so. A recommended starting point before tuning is:
$\lambda = \alpha \cdot g(s_*)/NumFeaturesIn(s_*)$, where $\alpha$ is a constant that can be tuned per data instance and $s_*$ is the local minimum.
#### Pseudocode
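A minimal Python sketch of the GLS loop, under assumed interfaces (`g`, `cost`, `features`, and the inner `local_search` are placeholders for problem-specific pieces):

```python
def guided_local_search(g, cost, features, local_search, s0, lam, iters=100):
    """g: true objective; cost(i): cost of feature i; features(s): feature
    ids present in s; local_search(h, s): inner LS on objective h."""
    penalties = {}                                    # feature id -> p_i

    def h(s):                                         # augmented objective
        return g(s) + lam * sum(penalties.get(i, 0) for i in features(s))

    s, best, best_g = s0, s0, g(s0)
    for _ in range(iters):
        s = local_search(h, s)                        # local optimum of h
        if g(s) < best_g:                             # track best under g
            best, best_g = s, g(s)
        # penalize the feature(s) with maximal utility c_i / (1 + p_i)
        utils = {i: cost(i) / (1 + penalties.get(i, 0)) for i in features(s)}
        if not utils:
            continue
        top = max(utils.values())
        for i, u in utils.items():
            if u == top:
                penalties[i] = penalties.get(i, 0) + 1
    return best
```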


### Fast Local Search
Guided local search works best with fast methods for finding local minima, so that more penalty iterations can be applied in the same time. While fast local search is an independent technique in and of itself, it is thus often coupled with guided local search.
#### Outline
The neighbourhood is broken down into small sub-neighbourhoods that seem natural for the problem, and an activation bit is assigned to each; initially, all are active. Sub-neighbourhoods are examined in a fixed order. If examining a particular sub-neighbourhood yields no improving move, it is marked inactive. It is made active again when an *adjacent* sub-neighbourhood is penalized, because then this sub-neighbourhood's situation is likely to have changed. Once all sub-neighbourhoods become inactive, the local-search iteration is complete; from here, we can restart or try a different initialization.
#### Explanation Using TSP
For example, in TSP with 2-opt we initially have an $O(N^2)$ neighbourhood of all possible sub-segment reversals. A sub-neighbourhood can be a particular node considered as an endpoint for swaps. Whenever an edge incident on that node is penalized (when combined with guided local search), the node is again considered as an endpoint for sub-segment reversals.
#### Pseudocode
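A minimal sketch of the activation-bit loop, assuming the problem supplies the split into `n_subs` sub-neighbourhoods and an `improve_in(k)` routine:

```python
def fast_local_search(improve_in, n_subs):
    """improve_in(k) tries to find and apply an improving move inside
    sub-neighbourhood k, returning the list of sub-neighbourhoods the
    move affected ([] if no improving move exists). Terminates because
    each applied move strictly improves the objective (assumed)."""
    active = [True] * n_subs
    while any(active):
        for k in range(n_subs):
            if not active[k]:
                continue
            touched = improve_in(k)
            if touched:
                for t in touched:        # re-activate affected neighbours
                    active[t] = True
            else:
                active[k] = False        # nothing improving here for now
```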

### Guided Fast Local Search
So let's see how these two techniques look when combined:
#### Combined Pseudocode
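A minimal sketch of the combination under assumed interfaces: FLS plays the inner local search of GLS, and penalizing a feature re-activates only the sub-neighbourhoods that feature touches.

```python
def guided_fast_local_search(g, cost, features, feature_subs,
                             fls_pass, s0, lam, iters=100):
    """fls_pass(h, s, active) runs one fast-local-search pass on objective
    h from state s using the given activation bits (None = all active)
    and returns a local optimum; feature_subs(i) lists the
    sub-neighbourhoods touched by feature i. All interfaces assumed."""
    penalties, best = {}, s0

    def h(s):                                         # augmented objective
        return g(s) + lam * sum(penalties.get(i, 0) for i in features(s))

    s, active = s0, None
    for _ in range(iters):
        s = fls_pass(h, s, active)
        if g(s) < g(best):
            best = s
        utils = {i: cost(i) / (1 + penalties.get(i, 0)) for i in features(s)}
        if not utils:
            break
        i_star = max(utils, key=utils.get)            # highest-utility feature
        penalties[i_star] = penalties.get(i_star, 0) + 1
        active = feature_subs(i_star)   # re-activate only the affected subs
    return best
```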

