Coursera Discrete Optimization - Week 3
===
<h2> Constraint Programming </h2>
Try to make the constraints or relevant substructures of the problem as explicit as possible to the program.
Actively remove values that are impossible.
So basically, when no assignment is forced, make a choice. Then propagate the constraints that this choice implies (such as disallowing certain values for certain decision variables). Whenever an assignment is forced (a variable has only one option left), make it. Roll back to your last 'non-forced' choice if the current partial solution turns out infeasible or suboptimal.
<b> It's not a heuristic: it will find the optimal solution given enough time and space. </b>
The difference from branch and bound is that constraint programming focuses on feasibility: how to prune the search space and remove as many impossible assignments as possible. You don't take an arbitrary decision and follow its decision tree. Instead you systematically impose constraints, and only when you can't prune any more values automatically do you make a choice and explore its consequences.
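The propagate-then-branch loop described above can be sketched in miniature. This is a toy Python solver (my own illustration, not the course's implementation): domains are sets, each "propagator" prunes domains, and branching only happens once propagation reaches a fixpoint, with rollback handled by recursion.

```python
# Toy propagate-and-branch CP solver (illustrative sketch, not the course's code).
# A propagator shrinks domains and returns True if it changed anything.

def propagate(domains, propagators):
    """Run all propagators to a fixpoint; return False if some domain empties."""
    changed = True
    while changed:
        changed = any(prop(domains) for prop in propagators)
        if any(not d for d in domains.values()):
            return False          # infeasible: a variable has no value left
    return True

def search(domains, propagators):
    """Depth-first search: propagate, then branch on an unfixed variable."""
    domains = {v: set(d) for v, d in domains.items()}   # copy => easy rollback
    if not propagate(domains, propagators):
        return None
    unfixed = [v for v, d in domains.items() if len(d) > 1]
    if not unfixed:
        return {v: next(iter(d)) for v, d in domains.items()}
    var = unfixed[0]
    for val in sorted(domains[var]):
        trial = dict(domains)
        trial[var] = {val}        # the non-forced choice; recursion = rollback
        sol = search(trial, propagators)
        if sol is not None:
            return sol
    return None

# Tiny example: x != y and x < y over {1, 2, 3}.
def neq_xy(dom):
    changed = False
    for a, b in (("x", "y"), ("y", "x")):
        if len(dom[a]) == 1 and next(iter(dom[a])) in dom[b]:
            dom[b] = dom[b] - dom[a]
            changed = True
    return changed

def lt_xy(dom):
    if not dom["x"] or not dom["y"]:
        return False
    new_x = {v for v in dom["x"] if v < max(dom["y"])}   # x must be below max(y)
    new_y = {v for v in dom["y"] if v > min(dom["x"])}   # y must be above min(x)
    changed = new_x != dom["x"] or new_y != dom["y"]
    dom["x"], dom["y"] = new_x, new_y
    return changed

sol = search({"x": {1, 2, 3}, "y": {1, 2, 3}}, [neq_xy, lt_xy])
```

With value ordering smallest-first this finds `{"x": 1, "y": 2}`.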

So we keep iterating over the list of constraints, pruning whatever values we can, until we finally get a feasible solution.
Constraints can be arbitrarily complex, they can be optimization problems themselves!
Each constraint $c$ can have a different pruning algorithm, i.e. finding which values it rules out of our search space can require different approaches/algorithms. How good these pruning methods are determines how fast the overall algorithm runs.
<h3> Types of Constraints </h3>
<h4> Reification </h4>
It is the ability to convert a constraint into a 0/1 variable. So to count how often a value $k$ occurs, you can use the constraint:
$\sum_{i \in D} (a[i] = k)$
:::spoiler Example of Reified constraints

Now the constraint is that $s[k] = freq(k \in s)$. This by itself isn't understandable to the solver, so we break it down as:


So for each value $k$, we introduce boolean decision variables $b[i]$, where $b[i]$ is 1 if $s[i] = k$ and 0 otherwise. And now $s[k] = \sum_i b[i]$.
:::
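The magic series formulation above can be checked concretely. A small Python sketch (my own illustration): the reified booleans turn the frequency constraint into a plain sum, and for tiny $n$ we can brute-force all series rather than propagate.

```python
from itertools import product

# Magic series via reified constraints: s[k] must equal the number of
# positions i with s[i] == k. The booleans b_i = (s[i] == k) make the
# frequency constraint a plain sum: s[k] == sum_i b_i.

def is_magic(s):
    n = len(s)
    for k in range(n):
        b = [1 if s[i] == k else 0 for i in range(n)]   # reified 0/1 variables
        if s[k] != sum(b):
            return False
    return True

def magic_series(n):
    """Brute force over all n^n series (fine for tiny n; a real CP solver
    would propagate the reified sums instead of enumerating)."""
    return [s for s in product(range(n), repeat=n) if is_magic(s)]

solutions = magic_series(4)   # the two magic series of length 4
```

For $n = 4$ this yields $(1, 2, 1, 0)$ and $(2, 0, 2, 0)$: e.g. in $(1, 2, 1, 0)$ the value 0 occurs once, 1 occurs twice, 2 occurs once, and 3 never.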
<h4> Element Constraint </h4>
It is when a decision variable indexes an array of values or other decision variables. So suppose $x$, $y$ are decision variables and $c[]$ is an array of integers; then $x = c[y]$ is an element constraint.
Interestingly, in element constraints any restriction on $x$ propagates to a restriction on $y$ and vice versa.
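That two-way propagation can be sketched directly (an illustrative Python snippet, with `propagate_element` a name of my own): a value $v$ survives in $dom(y)$ only if $c[v]$ is still in $dom(x)$, and $x$ can only be one of the reachable $c[v]$.

```python
# Propagation for the element constraint x == c[y]: restricting x prunes y
# and vice versa, because each value v in dom(y) needs c[v] in dom(x).

def propagate_element(c, dom_x, dom_y):
    """Return pruned (dom_x, dom_y) for x == c[y], iterated to a fixpoint."""
    while True:
        new_y = {v for v in dom_y if 0 <= v < len(c) and c[v] in dom_x}
        new_x = {c[v] for v in new_y}      # x must be some still-reachable c[v]
        if new_x == dom_x and new_y == dom_y:
            return dom_x, dom_y
        dom_x, dom_y = new_x, new_y

c = [4, 7, 4, 9]
dx, dy = propagate_element(c, {7, 9}, {0, 1, 2, 3})
# y loses 0 and 2, since c[0] = c[2] = 4 is no longer in dom(x)
```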
<h3> Global Constraints </h3>
They are constraints that apply to more than just 2 decision variables. They can be used as building blocks for describing a wide variety of problems. The good part is that dedicated algorithms for many of these constraints exist and can be used by the solver. So your job is to break your constraint problem down into pieces that use these global constraints.
Example: <b> AllDifferent </b>
All variables passed to the AllDifferent() constraint must take different values.
:::spoiler Example - N queens in NxN Chessboard

:::
How can feasibility be checked? Make a bipartite graph from variables -> values and see if the maximum matching has size $|variables|$.
Pruning is much more interesting. The naive algorithm: find a maximum matching $M$, then for each non-matching edge, force it and check whether a maximum matching of the same size can still be formed. This takes $O(E \cdot Match)$. But we can do faster:
:::spoiler Faster algorithm, slides

The even-length alternating paths starting at a free vertex can be found using DFS from the free vertices. Checking if an edge is in an alternating cycle (all cycles in bipartite graphs are even) is simple: direct matching edges from L->R and non-matching edges from R->L (or vice versa), find the SCCs, and all edges inside an SCC are part of at least one cycle.
:::
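The feasibility half of this can be sketched in a few lines. Below is an illustrative Python implementation using Kuhn's augmenting-path algorithm (the function name is mine); a real solver would layer the SCC-based pruning from the slides on top.

```python
# Feasibility check for AllDifferent: build the variable -> value bipartite
# graph and test whether a maximum matching covers every variable.

def alldifferent_feasible(domains):
    """domains: list of sets of candidate values, one per variable.
    True iff every variable can receive a distinct value."""
    match = {}                    # value -> variable currently matched to it

    def try_assign(var, seen):
        # Try to match `var`, possibly re-matching current owners (augmenting path).
        for val in domains[var]:
            if val in seen:
                continue
            seen.add(val)
            if val not in match or try_assign(match[val], seen):
                match[val] = var
                return True
        return False

    return all(try_assign(v, set()) for v in range(len(domains)))

feasible = alldifferent_feasible([{1, 2}, {1, 2}, {2, 3}])   # True
```

Three variables sharing only two values, e.g. `[{1, 2}, {1, 2}, {1, 2}]`, correctly comes out infeasible.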
<br>
Can I detect feasibility / do pruning for global constraints fast? For some constraints you can; for others you cannot.
Example: <b> Table Constraint </b>
Given the Cartesian product of the domains of several decision variables, it defines the subset of tuples that the decision variables may jointly take.
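Pruning for a table constraint can be sketched as follows (an illustrative Python snippet, function name mine): a value survives only if it has "support", i.e. it appears in at least one allowed tuple that is itself still consistent with the other domains.

```python
# Pruning for a table constraint (generalized arc consistency by re-scanning).

def prune_table(table, domains):
    """table: list of allowed tuples; domains: list of sets.
    Returns the pruned domains."""
    while True:
        # keep only tuples still consistent with every variable's domain
        live = [t for t in table
                if all(t[i] in domains[i] for i in range(len(domains)))]
        # each variable keeps only values appearing in some live tuple
        new = [set(t[i] for t in live) for i in range(len(domains))]
        if new == list(domains):
            return new
        domains = new

allowed = [(1, 2, 3), (1, 3, 2), (2, 1, 3)]
pruned = prune_table(allowed, [{1, 2}, {3}, {2, 3}])
# only (1, 3, 2) survives, so every domain collapses to a single value
```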
<b> How to solve optimization problems using constraint programming? </b> The standard approach: solve the feasibility problem, and each time a solution with objective value $f^*$ is found, add the constraint $obj < f^*$ and keep searching. When the search exhausts, the last solution found is optimal.

<h3> Symmetries </h3>
When doing constraint programming, parts of the problem are often symmetric to one another. This means we don't need to solve them separately: solving one gives us information about what is possible for the other symmetric parts.
<h3> Redundant Constraints </h3>
The core constraints might already ensure that only valid solutions are picked, but extra constraints can be added for better pruning. For example, in the magic series problem the constraint $s[i] = freq(i)$ is enough to characterize the valid solutions. But adding constraints like $\sum s[i] = n$ and, combining this with the frequency constraint, $\sum i \cdot s[i] = n$ makes the pruning much faster without eliminating any valid solutions.
<h3> Surrogate Constraints </h3>
They are a type of redundant constraint used to make the existing constraints interact: a combination of the existing constraints. So for example, if we have to satisfy $N$ equations, we can sum them with different multipliers and get a combined global redundant constraint: $\sum_i \alpha_i \cdot LHS_i = \sum_i \alpha_i \cdot RHS_i$
Sometimes, given only the original constraints, the program detects infeasibility very late, towards the bottom of the decision tree. However, you can add weaker constraints (ones that don't eliminate any feasible solutions) that get evaluated at a shallower depth and do some basic pruning early on, so that hopeless partial solutions don't get propagated for too long.
<h3> Dual Modelling </h3>
The good thing about constraint programming is that if you see 2 ways to model a problem, you don't necessarily have to choose. You can throw both in. So for example in the 8 queens problem, you don't have to choose between having columns as decision variables that take rows as values, and vice versa. Just put both in and add the channeling constraint $r[i] = j \implies c[j] = i$. This might actually make the solving faster.
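A small Python sketch of the two viewpoints and their channeling (illustrative, with helper names of my own): `r[i]` is the row of the queen in column `i`, `c[j]` the column of the queen in row `j`, and channeling ties them together.

```python
# Dual modelling for n-queens: two viewpoints plus a channeling check.

def channelled_ok(r, c):
    """Channeling: r[i] == j exactly when c[j] == i."""
    n = len(r)
    return all((r[i] == j) == (c[j] == i) for i in range(n) for j in range(n))

def queens_ok(r):
    """No two queens share a row or a diagonal (columns are the indices)."""
    n = len(r)
    return all(r[i] != r[j] and abs(r[i] - r[j]) != abs(i - j)
               for i in range(n) for j in range(i + 1, n))

r = [1, 3, 0, 2]                     # a valid 4-queens solution, rows by column
c = [r.index(j) for j in range(4)]   # the dual viewpoint, derived by channeling
```

In a solver, any pruning on the `r` variables immediately propagates to the `c` variables through the channeling constraint, which is where the speedup comes from.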
Similarly, just put in as many constraints as you can (provided their feasibility/pruning algorithms aren't slow) and it'll improve performance. You don't really have to make too many choices.
<h3> Feasibility Checking and Pruning </h3>
So how do we do feasibility checking and pruning?
One way is to run a knapsack-like (pseudo-polynomial) algorithm on the decision variables.

Then we back-propagate and remove the arrows/states which don't lead to a feasible solution. If a particular variable (here $x_4$) can take only one value (here 1), we prune out the rest of its domain, update our current situation, and run again.
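The forward/backward pass just described can be sketched for a linear equality constraint over 0/1 variables (an illustrative Python snippet, function name mine): the forward pass builds the reachable partial sums, the backward pass keeps only states that still reach the target, and values on no surviving arc are pruned.

```python
# Pseudo-polynomial propagation for sum(w[i] * x[i]) == target.

def prune_linear_eq(w, domains, target):
    n = len(w)
    # forward pass: reachable[i] = partial sums achievable using x[0..i-1]
    reachable = [{0}]
    for i in range(n):
        reachable.append({s + w[i] * v for s in reachable[i] for v in domains[i]})
    # backward pass: keep states that can still hit the target, and keep a
    # variable's value only if it sits on some surviving arc
    alive = {target} & reachable[n]
    new_domains = [set() for _ in range(n)]
    for i in range(n - 1, -1, -1):
        prev_alive = set()
        for s in reachable[i]:
            for v in domains[i]:
                if s + w[i] * v in alive:
                    prev_alive.add(s)
                    new_domains[i].add(v)
        alive = prev_alive
    return new_domains

doms = prune_linear_eq([2, 3, 4], [{0, 1}] * 3, 7)
# only x = (0, 1, 1) sums to 7, so every domain is pruned to one value
```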

The AllDifferent constraint was explained earlier.
<h3> Search </h3>

In something like Euler's knight's tour, we want to always process positions where the knight has the fewest onward moves (fail-first). In general graphs, moving to the node with the fewest onward edges is called Warnsdorff's rule.
What if all domain sizes are equal? Suppose you have a graph-coloring problem: the idea is to start by coloring a node with maximum degree, as that prunes the most values from its neighbours. In the queens problem you could break ties by choosing placements near the center of the board (as those attack the most squares). To break ties further, you can use lexicographic ordering.
Sample search procedure for 8 queens problem:

tryall here tries values one at a time in the given order (the next value is only tried on backtracking). getSize() is used to choose the variable with the smallest domain, and ordering by $abs(r - n/2)$ tries rows closest to the middle first.
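This first-fail search can be sketched as a toy Python reimplementation (my own, not the course's OPL-style model): branch on the column with the smallest remaining domain, break ties toward the centre, and try centre-most rows first.

```python
# First-fail search for n-queens: smallest domain first, centre-most on ties.

def solve_queens(n):
    domains = [set(range(n)) for _ in range(n)]   # domains[c] = candidate rows

    def consistent(col, row, assignment):
        return all(row != r and abs(row - r) != abs(col - c)
                   for c, r in assignment.items())

    def search(domains, assignment):
        if len(assignment) == n:
            return dict(assignment)
        free = [c for c in range(n) if c not in assignment]
        # first-fail variable choice, ties broken by closeness to the centre
        col = min(free, key=lambda c: (len(domains[c]), abs(c - n // 2)))
        for row in sorted(domains[col], key=lambda r: abs(r - n // 2)):
            if not consistent(col, row, assignment):
                continue
            assignment[col] = row
            # propagate: remove attacked rows from the other free columns
            new_domains = [set(d) for d in domains]
            for c in free:
                if c != col:
                    new_domains[c] -= {row, row + (c - col), row - (c - col)}
            if all(new_domains[c] for c in free if c != col):
                sol = search(new_domains, assignment)
                if sol is not None:
                    return sol
            del assignment[col]
        return None

    return search(domains, {})

sol = solve_queens(8)     # a dict mapping column -> row, or None if infeasible
```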


Dynamic here means that the 2nd variable chosen may not have the 2nd smallest domain initially, it is the variable which has the smallest domain after the first choice was made.
Both the value and variable can be chosen using first-fail principle.
:::spoiler Eg for 8 Queens

:::
Though in practice, if your job is to find any one solution, you might want to choose values that keep the most solutions open. So here you choose the most constrained variable but the least constrained value.
When solving optimization problems, it might be helpful to reflect the objective function in the search strategy.
:::spoiler Example: Software -> Server assignment

Sep means components which must be allocated to separate locations; Col means components which must be in the same location.
The CP model:

In the variable/value labeling, we first select the software component with the highest communication frequency, because it affects the objective function the most. We then assign it the server which minimizes the cost, i.e. the number of hops to the servers this software has to communicate with.
:::
<h4> Value/Variable Labeling </h4>
Pick the value first, and then decide which variable takes it. This may be helpful in cases like timetable design, when you know a class must take place at a certain time. Or in the perfect-squares problem, where multiple smaller squares need to be fit together into a target square: based on the previously placed squares, there are points where some square must have its bottom-left corner, so you start by picking such a point and deciding which square goes there.
<h4> Domain Splitting </h4>
Imagine the magic square problem on an NxN grid. Even when N is 10, a cell has 100 candidate values. Assigning any one is a shot in the dark and a big commitment. So instead, you can 'split' the domain and say "I'm going to use one of the first half" (or the other half, etc.). That becomes your decision. So every choice shrinks the domain instead of fixing a specific value.

This approach helps in car sequencing because the constraints are on the options, not the cars themselves. Thus committing a slot to an option works better, and the domain is then reduced to the cars which have that option.
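The branching step of domain splitting is simple to sketch (an illustrative Python helper, name mine): rather than a 100-way choice of a single value, the first decision is just "lower half or upper half".

```python
# Domain splitting: branch on half-domains instead of single values.

def split_branches(domain):
    """Return the two half-domains to try: a weaker commitment than
    assigning one specific value, leaving the rest to propagation."""
    vals = sorted(domain)
    mid = len(vals) // 2
    return set(vals[:mid]), set(vals[mid:])

lo, hi = split_branches(set(range(1, 101)))   # 100 candidate values
# first decision becomes "x <= 50" vs "x >= 51" rather than a 100-way choice
```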
<h4> Symmetries and Search </h4>
We were earlier imposing constraints to break symmetries, like lexicographic order, or assigning from existing slots, or choosing $slot = max(s_1...s_{k-1}) + 1$. But these interfere with the first-fail search heuristic because they change the domains. Instead, we could break these symmetries during search itself rather than putting constraints for them in the model.

:::spoiler Example Scene Allocation

The days here are symmetric (days can be permuted in the final solution).

Symmetry breaking: only try one of the previously used days, or the lowest-numbered new one.

Choose a scene according to first-fail (smallest domain size; break ties by picking the most expensive scene, which affects the objective the most).
Then in tryall (the value search) itself, we assign it a day among the previously used ones, or the lowest-numbered new one.
:::
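The value ordering used in that tryall can be sketched as a tiny Python helper (illustrative, name mine): since days are interchangeable, only the already-used days plus one fresh day are worth branching on.

```python
# Symmetry breaking in search: used days + the lowest-numbered new day.

def candidate_days(assigned_days, n_days):
    """Days worth trying for the next scene; any other fresh day would give
    a solution symmetric to one we already explore."""
    used = sorted(set(assigned_days))
    unused = [d for d in range(n_days) if d not in used]
    return used + unused[:1]

days = candidate_days([0, 1, 0], 5)   # -> [0, 1, 2]: used days plus day 2
```

With 5 days and nothing assigned yet, only day 0 is tried, collapsing the 5-fold permutation symmetry of the first choice.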
<h4> Randomization and Restarts </h4>
Sometimes, a good search ordering is not known to us. There might well be some good ordering, but we as solvers can't find it. What can we do?

Essentially: run the search with a random ordering, and if progress looks bad after some time, abandon this run and restart with a new random order. We can also increase the time/failure limit after each failed run, to ensure the limit is never permanently too restrictive.
This can be done for eg in magic square, where we have no clear idea what to do.
:::spoiler Magic square randomization restart model

Instead of directly picking the smallest-domain variable, we can take the 3 smallest and randomly choose one of them.
:::
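The restart scheme described above can be sketched as a generic loop (illustrative Python; `solve_with_restarts`, `toy_search`, and the doubling schedule are my own choices, not from the course):

```python
import random

# Randomized restarts: run a randomized search under a budget; on failure,
# restart with a fresh random order and a geometrically larger budget.

def solve_with_restarts(randomized_search, base_limit=100, max_restarts=10):
    """randomized_search(limit, rng) returns a solution or None."""
    limit = base_limit
    for attempt in range(max_restarts):
        rng = random.Random(attempt)     # fresh random ordering each restart
        sol = randomized_search(limit, rng)
        if sol is not None:
            return sol
        limit *= 2                       # loosen the budget and try again
    return None

# Toy stand-in for a randomized search: "succeeds" when the rng draws a
# value under limit/1000, mimicking a lucky variable ordering.
def toy_search(limit, rng):
    return "solved" if rng.random() < limit / 1000 else None

result = solve_with_restarts(toy_search, base_limit=100)
```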