Gamma-opt bookclub 2021
===
[](https://hackmd.io/ZuG05Tv6SSyeKJ6Rdk6j7g)
Notes from the 2021 Gamma-opt book club, in which we read Integer Programming - Wolsey (2021)
### Participants
- Fabricio Oliveira
- Paula Weller
- Jaan Tollander de Balsch
- Olli Herrala
- Helmi Hankimaa
- Nikita Belyak
Chapter 1: Formulations
---
## 1.1 Introduction
Classical problems in integer (IP) and mixed-integer (MIP) programming include
- **Scheduling**: sequence activities in time, sometimes allocating resources to them. The goal can be either finding *a* feasible schedule, as in a feasibility problem (train schedule example), or finding the *best* schedule according to a given metric (airline crew example)
- **Production planning**: decide how inputs are converted into outputs, taking into account production system characteristics, such as capacities, minimum batches (production planning and cutting problem examples) and time-related relationships (commitment decisions, reserves, and ramping times, as in the electricity generation planning example).
- **Network design**: design an efficient distribution network in terms of delivery capacity. Might consider aspects related to distribution capacity and node capacity, perhaps combined with some of the above (e.g., supply chain management problems)
- **Assignment, covering and graph related**: problems that can be structured as a graph in which one wishes to identify structures such as assignments between nodes (as in the kidney exchange program example) or set covers (radiation therapy and other covering problems).
Basically, any problem in which decisions are discrete, most notably binary, representing Yes/No or True/False assignments.
## 1.2 What is an integer program?
We depart from the linear program
\begin{equation}
\text{max. } \left\{cx : Ax \leq b, x \geq 0\right\} \tag{LP}
\end{equation}
where $A$ is an $m \times n$ matrix, $c$ is an $n$-dimensional row vector and $b$ is an $m$-dimensional column vector; $x$ is an $n$-dimensional column vector of (continuous) decision variables
A mixed-integer linear program has the form
\begin{equation}
\text{max. } \left\{c^\top x + h^\top y : Ax + Gy \leq b, x \geq 0 \text{ and integer}, y \geq 0 \right\} \tag{MIP}
\end{equation}
where $G$ is an $m \times p$ matrix and $h$ is a $p$-dimensional row vector; $x$ is a column vector of $n$ integer decision variables and $y$ is a column vector of $p$ continuous decision variables.
If $p = 0$, we have an integer programming problem (IP); if furthermore $x \in \{0,1\}^n$, we have a binary integer programming problem (though everyone would call it an IP anyway).
A **combinatorial optimisation problem** is of the form
\begin{equation}
\text{min. }_{S \subseteq N} \left\{ \sum_{j \in S} c_j : S \in \mathcal{F} \right\} \tag{COP}
\end{equation}
In COP, the objective is to find a subset $S$ from a set of feasible subsets $\mathcal{F}$ of $N$ (i.e., combinations of elements of $N$) such that the total attribute value of the items $j \in S$ is optimised.
LP and MIP look very much alike, which is why LP theory underpins the understanding and solving of MIPs. One idea that immediately comes to mind is rounding, which does play a role in solving MIPs, but in a more sophisticated way than one might first think.


The optimal integer solution is (5,0), while the optimal LP solution is the point at the top of the triangle. Notice that no rounding of the LP solution can lead you to the optimal IP solution.
## 1.3 Formulating IPs and BIPs
One powerful tool from MIP is the notion of an $n$-dimensional 0-1 *incidence vector* $x^S$, which is connected to the notion of forming subsets $S \subseteq N$ in COPs:
$$
x_j^S = 1 \Leftrightarrow j \in S; x_j^S = 0 \text{ otherwise.}
$$
Let's look at 4 classic problems in IP:
### The assignment problem
$x_{ij}$ - 1 if person $i$ does job $j$, $x_{ij} = 0$ otherwise.
$c_{ij}$ - assignment cost.
\begin{equation}
\begin{aligned}
\text{min. } & \sum_{i=1}^n \sum_{j=1}^n c_{ij} x_{ij} \\
\text{s.t.: } & \sum_{j=1}^n x_{ij} = 1, \text{ for } i=1,\dots,n \\
& \sum_{i=1}^n x_{ij} = 1, \text{ for } j=1,\dots,n \\
& x_{ij} \in \{0,1\}, \text{ for } i=1,\dots,n, j=1,\dots,n.
\end{aligned} \tag{AP}
\end{equation}
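As a quick illustration (not from the book), a small (AP) instance can also be solved with a combinatorial routine rather than as an IP; a minimal sketch assuming SciPy is available, with a made-up cost matrix:

```python
# A minimal sketch of solving a small (AP) instance; the cost matrix is made up.
# linear_sum_assignment implements a Hungarian-type algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

c = np.array([[4, 2, 8],
              [4, 3, 7],
              [3, 1, 6]])          # c[i, j]: cost of person i doing job j

rows, cols = linear_sum_assignment(c)
for i, j in zip(rows, cols):
    print(f"person {i} -> job {j} (cost {c[i, j]})")
print("total cost:", c[rows, cols].sum())
```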
### The 0-1 knapsack problem
$x_{j}$ - 1 if item $j$ is selected, $x_{j} = 0$ otherwise.
$c_{j}$ - item value
$a_{j}$ - item weight
$b$ - weight capacity
\begin{equation}
\begin{aligned}
\text{max. } & \sum_{j=1}^n c_j x_j \\
\text{s.t.: } & \sum_{j=1}^n a_j x_j \leq b \\
& x_{j} \in \{0,1\}, \text{ for } j=1,\dots,n.
\end{aligned} \tag{KP}
\end{equation}
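For intuition (not from the book), a tiny (KP) instance can be solved by brute force over all $2^n$ incidence vectors, which also foreshadows the combinatorial explosion of Section 1.4; the data below are made up:

```python
# Brute-force sketch for a tiny 0-1 knapsack instance (made-up data).
from itertools import product

c = [12, 8, 17, 11]   # item values
a = [4, 3, 7, 5]      # item weights
b = 9                 # capacity

best_value, best_x = 0, None
for x in product([0, 1], repeat=len(c)):                   # all incidence vectors
    if sum(a_j * x_j for a_j, x_j in zip(a, x)) <= b:      # feasibility check
        value = sum(c_j * x_j for c_j, x_j in zip(c, x))
        if value > best_value:
            best_value, best_x = value, x
print(best_x, best_value)   # (1, 0, 0, 1) with value 23
```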
### The set covering problem
Set covering is strongly connected with contexts related to services, including emergencies. Let $M = \{1, \dots, m \}$ be a set of regions and $N = \{ 1, \dots, n\}$ be a set of candidate locations. Let $S_j \subseteq M$ be the regions that are served by a server located at $j \in N$. The equivalent COP is
\begin{equation}
\text{min. }_{T \subseteq N} \left\{ \sum_{j \in T} c_j : \bigcup_{j \in T} S_j = M \right\} \tag{SCP}
\end{equation}
where $c_j$ is a location cost. As it is the case with all COPs, we can formulate SCP as a 0-1 IP. For that we need an *incidence matrix* $A$ (with elements $a_{ij}$), where $a_{ij} = 1$ if $i \in S_j$, and 0 otherwise.
\begin{equation}
\begin{aligned}
\text{min. } & \sum_{j=1}^n c_j x_j \\
\text{s.t.: } & \sum_{j=1}^n a_{ij} x_j \geq 1, \text{ for } i=1,\dots,m\\
& x_{j} \in \{0,1\}, \text{ for } j=1,\dots,n.
\end{aligned} \tag{SCP}
\end{equation}
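As a small aside (not from the book), a common way to obtain a feasible (SCP) solution quickly is a greedy heuristic that repeatedly picks the location with the smallest cost per newly covered region; the instance below is made up:

```python
# A minimal greedy heuristic sketch for a tiny set covering instance (made-up data).
M = {1, 2, 3, 4, 5}                                    # regions
S = {1: {1, 2}, 2: {2, 3, 4}, 3: {4, 5}, 4: {1, 5}}    # S[j]: regions served by location j
cost = {1: 3, 2: 5, 3: 4, 4: 4}                        # c_j: location cost

uncovered, chosen = set(M), []
while uncovered:
    # pick the location with the smallest cost per newly covered region
    j = min((j for j in S if S[j] & uncovered),
            key=lambda j: cost[j] / len(S[j] & uncovered))
    chosen.append(j)
    uncovered -= S[j]
print("chosen locations:", chosen, "total cost:", sum(cost[j] for j in chosen))
```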
### The travelling salesman problem
Given a collection of cities $N$, form a tour that visits each city exactly once, traversing arcs $(i,j)$, $i,j \in N$, while minimising the total tour cost
$$
\sum_{i=1}^n \sum_{j=1}^n c_{ij} x_{ij},
$$
where $x_{ij} = 1$ if the arc $(i,j)$ is traversed (used to form the tour), 0 otherwise; $x$ is not defined for $i = j$.
\begin{equation}
\begin{aligned}
\text{min. } & \sum_{i=1}^n \sum_{j=1}^n c_{ij} x_{ij} \\
\text{s.t.: } & \sum_{j=1}^n x_{ij} = 1, \text{ for } i=1,\dots,n \\
& \sum_{i=1}^n x_{ij} = 1, \text{ for } j=1,\dots,n \\
& x_{ij} \in \{0,1\}, \text{ for } i=1,\dots,n, j=1,\dots,n.
\end{aligned} \tag{TSP}
\end{equation}
Notice that, as is, this is the same as (AP), which might allow for disconnected subtours over node subsets $S \subset N$, as shown below.

Either of the following additional constraint families can be used to prevent this (a detection sketch follows the list):
1. **The cut-set constraint**
$$
\sum_{i \in S} \sum_{j \not\in S} x_{ij} \geq 1 \text{ for } S \subset N, S \not= \emptyset
$$
2. **The subtour elimination constraint**
$$
\sum_{i \in S} \sum_{j \in S} x_{ij} \leq |S|-1 \text{ for } S \subset N, 1 < |S| < n.
$$
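A minimal sketch (not from the book) of how violated subtours could be detected in an assignment-type solution, here represented by a made-up successor map containing two subtours:

```python
# Detect the disjoint cycles (subtours) induced by an assignment solution x,
# represented as a successor map; data are made up.
succ = {1: 2, 2: 1, 3: 4, 4: 5, 5: 3}    # x_12 = x_21 = 1 and x_34 = x_45 = x_53 = 1

def subtours(succ):
    """Return the node sets of the disjoint cycles of the successor map."""
    unvisited, tours = set(succ), []
    while unvisited:
        i, tour = next(iter(unvisited)), []
        while i in unvisited:
            unvisited.remove(i)
            tour.append(i)
            i = succ[i]
        tours.append(set(tour))
    return tours

for S in subtours(succ):
    if len(S) < len(succ):   # a proper subtour: a subtour elimination cut is violated
        print(f"add cut: sum of x_ij over i, j in {S} <= {len(S) - 1}")
```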
## 1.4 The combinatorial explosion
Feasible space of some of the above problems:
- AP: $n!$
- KP and SCP: $2^n$
- TSP: $(n-1)!$

when I think about those, I like to contrast these numbers with these: https://www.physicsoftheuniverse.com/numbers.html
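A quick arithmetic check of how fast these counts grow (plain Python, values converted to floats only for printing):

```python
from math import factorial

for n in (10, 20, 30):
    print(f"n={n:2d}  AP: n! = {float(factorial(n)):.3e}   "
          f"KP/SCP: 2^n = {float(2 ** n):.3e}   TSP: (n-1)! = {float(factorial(n - 1)):.3e}")
```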
## 1.5 Mixed integer formulations
### Modelling fixed costs
Binary variables can be used to model specific nonlinear "behaviours", not necessarily representing an actual decision. For example, the function $h(y)$

can be modelled as $fx + py$ with the additional constraints $y \leq Cx, x \in \{0,1\}$. In this case specifically, notice that from an optimisation standpoint it makes no sense to have $y=0$ while $x=1$, though in principle it would be a feasible solution. Otherwise, a more sophisticated set of constraints (still IP) would be needed.
### Uncapacitated facility location
A MIP combining network design and production planning aspects.
- $x_j = 1$, if depot $j \in N$ is used, 0 otherwise.
- $y_{ij}$ - fraction of the demand of client $i \in M$ satisfied from depot $j$.
\begin{equation}
\begin{aligned}
\text{min. } & \sum_{i \in M} \sum_{j \in N} c_{ij} y_{ij} + \sum_{j \in N}f_j x_j \\
\text{s.t.: } & \sum_{i \in M} y_{ij} \leq m x_j, \text{ for } j=1,\dots,n \\
& \sum_{j=1}^n y_{ij} = 1, \text{ for } i=1,\dots,m \\
& x_{j} \in \{0,1\}, \text{ for } j=1,\dots,n \\
& y_{ij} \ge 0, \text{ for } i=1,\dots,m, j=1,\dots,n.
\end{aligned} \tag{UFL}
\end{equation}
### Uncapacitated lot sizing
Lot sizing is the common nomenclature for a classical production planning setting. The input data is
- $f_t$ - fixed cost of producing in time period $t \in \{1, \dots, n\}$
- $p_t$ - unit production cost in time period $t \in \{1, \dots, n\}$
- $h_t$ - unit storage cost in time period $t \in \{1, \dots, n\}$
- $d_t$ - demand in time period $t \in \{1, \dots, n\}$
The decision variables are:
- $y_t$ is the amount produced in period $t$
- $s_t$ is the amount stored (inventory) at the end of period $t$
- $x_t = 1$, if production occurs in $t$, 0 otherwise.
\begin{equation}
\begin{aligned}
\text{min. } & \sum_{t=1}^n p_t y_t + \sum_{t=1}^n h_t s_t + \sum_{t=1}^n f_t x_t \\
\text{s.t.: } & s_{t-1} + y_t = d_t + s_t, \text{ for } t = 1, \dots, n \\
& y_{t} \leq Mx_t, \text{ for } t = 1, \dots, n \\
& s_t, y_t \geq 0, \text{ for } t = 1, \dots, n \\
& x_t \in \{0,1\}, \text{ for } t = 1, \dots, n.
\end{aligned} \tag{ULS}
\end{equation}
Notice that in the absence of an upper bound for $y_{t}$, a sufficiently large constant $M$ must be used (here $M = \sum_{t=1}^n d_t$ works), which can lead to numerical issues if not chosen carefully.
### Discrete alternatives or disjunctions
Useful for modelling nonconvex affine regions. Assume that either
$$
a^1 y\le b_1 \text { or } a^2 y\le b_2
$$
must hold, with $0 \leq y \leq u$. Such case is illustrated in the picture below.

Define $M \geq \max_{i=1,2} \max\{a^i y - b_i : 0 \leq y \leq u\}$. Then, we can use the constraints
\begin{equation}
\begin{aligned}
& a^iy - b_i \leq M(1 - x_i), \text{ for } i=1,2 \\
& x_1 + x_2 = 1 \\
& x_i \in \{0,1\}, \text{ for } i=1,2.
\end{aligned}
\end{equation}
This type of disjunction arises in scheduling problems, for example. Let $t_i$ be the starting time for processing a job $i$ and $p_i$ the processing time. Then either
$$
t_2 \geq t_1 + p_1 \text{ or } t_1 \geq p_2 + t_2.
$$
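A concrete big-$M$ version of this scheduling disjunction (a sketch, assuming both jobs must be completed by some horizon $T$, so that $M = T$ is a valid constant) would be
\begin{equation}
\begin{aligned}
& t_1 + p_1 - t_2 \leq M(1 - x_1) \\
& t_2 + p_2 - t_1 \leq M(1 - x_2) \\
& x_1 + x_2 = 1, \quad x_1, x_2 \in \{0,1\},
\end{aligned}
\end{equation}
where $x_1 = 1$ forces job 1 to finish before job 2 starts and $x_2 = 1$ enforces the opposite order.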
## 1.6 Alternative formulations
Typically, an IP has a multitude of possible correct formulations. Let us look into what makes a formulation better than another. The definitions below state the notion of a polyhedral set (polyhedron) and exactly what a *formulation* is.

**Example 1.2**: Let $X = \{(1,1), (2,1), (3,1), (1,2), (2,2), (3,2), (2,3) \}$. The picture shows alternative formulations for $X$.

### Equivalent formulation for UFL
This is an example in which the alternative formulations use the same variables.
Notice that the constraint
$$
\sum_{i \in M} y_{ij} \leq m x_j, \text{ for } j \in N
$$
can be equivalently expressed as
$$
0 \le y_{ij} \le x_j, \text{ for } i \in M, j \in N
$$
leading to the alternative formulation
\begin{equation}
\begin{aligned}
\text{min. } & \sum_{i \in M} \sum_{j \in N} c_{ij} y_{ij} + \sum_{j \in N}f_j x_j \\
\text{s.t.: } & y_{ij} \leq x_j, \text{ for } i \in M, j \in N\\
& \sum_{j=1}^n y_{ij} = 1, \text{ for } i \in M \\
& x_{j} \in \{0,1\}, \text{ for } j \in N \\
& y_{ij} \ge 0, \text{ for } i \in M, j \in N.
\end{aligned} \tag{UFL}
\end{equation}
### Extended formulation for ULS
There are cases in which alternative formulations are obtained by means of different decision variables, which makes the comparison between them somewhat tricky. These are called extended formulations.
Let us redefine the variables in the ULS:
- $w_{it}$ - amount produced in period $i$ to satisfy the demand at period $t$.
- $x_t$ - if production occurs in period $t$ (like before).
And redefine the parameter $c_t = p_t + h_t + \dots + h_n$. Then the ULS problem can be reformulated as
\begin{equation}
\begin{aligned}
\text{min. } & \sum_{i=1}^n \sum_{t=1}^n c_i w_{it} + \sum_{t=1}^n f_t x_t \\
\text{s.t.: } & \sum_{i=1}^t w_{it} = d_t, \text{ for } t = 1, \dots, n \\
& w_{it} \leq d_t x_i, \text{ for } i \leq t, \ i,t = 1, \dots, n \\
& w_{it} \geq 0, \text{ for } i \leq t, \ i,t = 1, \dots, n \\
& x_t \in \{0,1\}, \text{ for } t = 1, \dots, n.
\end{aligned} \tag{ULS}
\end{equation}
## 1.7 Good and ideal formulation
The answer for understanding how a formulation can be better than another comes from geometrical insight.
Ideally, we want to be able to solve a MIP problem by solving its LP formulation, $P$, since solving LPs is "easy" (certainly easier than MIPs) from a complexity and computational standpoint. The picture below shows what an ideal formulation $P$ would look like:

This formalises the notion of ideal formulations:

**Proposition 1.1** can be seen by noticing that, by **Definition 1.3**, convex hulls are polyhedral sets in general. **Proposition 1.2** is a central result from linear optimisation.
Combining **Proposition 1.1** and **Proposition 1.2** leads us to the conclusion that we can replace the (M)IP
$$
\text{max.}\{cx : x \in X\} \text{ with } \text{max.}\{cx : x \in conv(X)\},
$$
which would allow us to solve the MIP as an LP. However, unless we already have an ideal formulation at hand (meaning that $P = conv(X)$), this is only of theoretical value, as describing $conv(X)$ requires a (typically too) large number of constraints that are not trivially identifiable.
But not everything is lost. We can use the above idea to define how one formulation can be better than another.

### Formulations for the UFL
Let $P_1$ be the formulation for the UFL with the aggregate constraint
\begin{equation}
\sum_{i \in M} y_{ij} \leq m x_j, \tag{1}
\end{equation}
and $P_2$ be the formulation with the $m$ individual constraints
\begin{equation}
y_{ij} \leq x_j, \text{ for }i \in M. \tag{2}
\end{equation}
The reasoning of $P_2 \subset P_1$ (i.e., that $P_2$ is better than $P_1$) is as follows.
1. Any point $(x,y)$ that satisfies $y_{ij} \leq x_j, \text{ for }i \in M$, also satisfies $\sum_{i \in M} y_{ij} \leq m x_j$ by simply summing in $i \in M$. Thus $P_2 \subseteq P_1$
2. Now we need to show that there are points in $P_1$ that are not in $P_2$. For that, let (x,y) be such that:
- Suppose that $n$ divides $m$: $m = k \times n$ with $k > 1$.
- For example: $m=4, n=2$. Assemble the solution: $x_j = k/m$ and $y_{ij}$ as
\begin{equation}
y_{11}, y_{21}, y_{32}, y_{42} = 1, 0 \text{ otherwise.}
\end{equation}
- This solution satisfies (1) but does not satisfy (2). Thus, $(x,y) \in P_1 \setminus P_2$, implying that $P_2 \subset P_1$ (a quick numerical check follows below).
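A quick numerical check of this point (not from the book; $m=4$, $n=2$, $k=2$), confirming it satisfies the aggregated constraint (1) but violates the disaggregated constraints (2):

```python
# Verify that the example point is in P1 but not in P2 (made-up small instance).
m, n, k = 4, 2, 2
x = [k / m] * n                               # x_j = k/m = 0.5
y = [[1, 0], [1, 0], [0, 1], [0, 1]]          # depot 1 serves clients 1-2, depot 2 serves 3-4

in_P1 = all(sum(y[i][j] for i in range(m)) <= m * x[j] for j in range(n))
in_P2 = all(y[i][j] <= x[j] for i in range(m) for j in range(n))
print("satisfies (1):", in_P1, "| satisfies (2):", in_P2)   # True, False
```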
### Formulations for the ULS
In the case of the ULS, things are more complicated because the problems have different variables. The way to go around that is to rely on *projections*.

Let $P_1$ be defined as
\begin{equation}
\begin{aligned}
& s_{t-1} + y_t = d_t + s_t, \text{ for } t = 1, \dots, n \\
& y_{t} \leq Mx_t, \text{ for } t = 1, \dots, n \\
& s_t, y_t \geq 0, \text{ for } t = 1, \dots, n \\
& 0 \leq x_t \leq 1, \text{ for } t = 1, \dots, n.
\end{aligned} \tag{$P_1$}
\end{equation}
Recall that $M = \sum_{t} d_t$, which is important in the argument made later on. Now $P_2$ would need to be the projection of $Q_2$ onto variables $(y,s,x)$ so we can compare them. $Q_2$ can be defined as:
\begin{equation}
\begin{aligned}
& s_{t-1} + y_t = d_t + s_t, \text{ for } t = 1, \dots, n \\
& \sum_{i=1}^t w_{it} = d_t, \text{ for } t = 1, \dots, n \\
& w_{it} \leq d_t x_i, \text{ for } i \leq t, \ i,t = 1, \dots, n \\
& y_i = \sum_{t=i}^n w_{it}, \text{ for } i = 1, \dots, n \\
& w_{it} \ge 0, \text{ for } i \leq t, \ i,t = 1, \dots, n \\
& 0 \leq x_t \leq 1, \ s_t \ge 0, \ y_t \ge 0, \text{ for } t = 1, \dots, n.
\end{aligned} \tag{$Q_2$}
\end{equation}
One way to see the projection of ($Q_2$) onto $(y,s,x)$ is to simply think of $w_{it}$ as "fixed".
Considering the point $(\overline{x},\overline{s}, \overline{y})$ with $\overline{y}_t = d_t$, $\overline{x}_t = d_t/M$, and $\overline{s}_t = 0$, we can see that it satisfies all constraints in $P_1$.
Now, suppose there were $w$ such that $(\overline{x},\overline{s}, \overline{y}, w) \in Q_2$. Summing the constraint $w_{it} \leq d_t \overline{x}_i$ over $i = 1, \dots, t$ gives
\begin{equation}
\sum_{i=1}^t w_{it} \leq d_t \sum_{i=1}^t \overline{x}_i = d_t \frac{\sum_{i=1}^t d_i}{M} < d_t, \text{ for } t < n
\end{equation}
(assuming positive demands), which violates the constraint $\sum_{i=1}^t w_{it} = d_t$ in $Q_2$. Hence the point is not in the projection $P_2$, and so $P_2 \subset P_1$. To see that a point that satisfies $Q_2$ also satisfies $P_1$, sum the constraint $w_{it} \leq d_t x_i$ over $t$, obtaining
$$
y_i = \sum_{t=i}^n w_{it} \leq \sum_{t=i}^n d_t x_i \leq M x_i, \text{ for } i = 1, \dots, n.
$$
Chapter 2: Optimality, Relaxations, and Bounds
---
## 2.1 Optimality and Relaxation
Given IP or COP
$$Z = \max\{c(x) : x \in X \subseteq \mathbb{Z}^n\}$$
How can we prove that $x^{*}$ is optimal?
Find a **lower bound** $\underline{Z} \leq Z$ and an **upper bound** $\overline{Z} \geq Z$ such that $\underline{Z}=Z=\overline{Z}.$
In practice, the algorithm will find a **decreasing** sequence of upper bounds and an **increasing** sequence of lower bounds, stopping when the difference between the bounds is small enough.
### Primal Bounds
Every **feasible solution** $x^{*} \in X$ provides a lower bound
$$\underline{Z}=c(x^{*}) \leq Z.$$
Only known method to obtain lower bounds.
Can be easy or as difficult as solving the IP itself.
### Dual Bounds
Most important approach to obtain upper bounds is **relaxation**.
**Idea of relaxation**: Replace difficult IP with simpler optimization problem whose optimal value is at least as large as $Z.$
There are two ways to obtain a relaxation:
1) Enlarge the set of feasible solutions so that we optimize over a larger set
2) Replace the max objective function by a function that has the same or larger value on all the original feasible solutions
**Definition 2.1**: Above statement formally. RP is the relaxation of IP.
1) The original feasible set is **subset** of relaxed feasible set, and
2) For all elements in the original feasible set, the objective value of the relaxed problem is **greater than or equal** to that of the original problem.
**Proposition 2.1** and **proof**: Proves that the optimal value of the relaxed problem is always greater than or equal to that of the original problem.
## 2.2 Linear Programming Relaxations
Idea: We can relax the **integer program** (IP) into a **linear program** (LP).
**Definition 2.2**: Formally defines the LP relaxation for an IP.
**Example 2.1**:
$$\begin{aligned}
Z = \max\, 4x_1-x_2 & \\
7x_1-2x_2&\leq 14 \\
x_2 &\leq 3 \\
2x_1-2x_2&\leq 3 \\
x &\in \mathbb{Z}_{+}^2
\end{aligned}$$
* Primal bound: $x=(2,1)$ is a feasible solution, and gives the lower bound $Z \geq 7$
* Dual bound: By relaxing to $x \in \mathbb{R}_{+}^2$ and solving the LP we obtain $x^{*}=(20/7, 3)$ and the upper bound $Z \leq 59/7.$ Since the optimal value must be integer, we obtain $Z \leq \lfloor 59/7 \rfloor = 8$ (a quick solver check follows below).
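A quick check of this LP relaxation bound (a sketch assuming SciPy is available; `linprog` minimizes, so the objective is negated):

```python
from scipy.optimize import linprog

# LP relaxation of Example 2.1: max 4x1 - x2 subject to the three constraints, x >= 0.
res = linprog(c=[-4, 1],
              A_ub=[[7, -2], [0, 1], [2, -2]],
              b_ub=[14, 3, 3],
              bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, -res.fun)   # approximately (20/7, 3) with objective 59/7
```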
---
Better formulations give tighter bounds.
**Proposition 2.2**: If formulation $P_1$ is better than $P_2$, i.e. $P_1 \subseteq P_2,$ then the linear programming relaxation for $P_1$ gives tighter bounds than $P_2.$
---
Relaxations sometimes allow us to prove **infeasibility** or **optimality**.
**Proposition 2.3** and **proof**:
1) If relaxation RP is infeasible, the original problem IP is infeasible. If the relaxed feasible set is empty, then the original feasible set is also empty, i.e. $X \subseteq T = \emptyset.$
2) An optimal solution $x^{*}$ to RP is an optimal solution to IP if it belongs to the original feasible set $x^{*} \in X$ and the objective values are equal, $f(x^{*})=c(x^{*}).$
* feasibility provides a lower bound: $Z \geq c(x^{*})=f(x^{*})=Z^{RP}$
* the relaxation provides an upper bound: $Z \leq Z^{RP}$
* combined: $c(x^{*})=Z=Z^{RP}$
**Example 2.2**:
$$\begin{aligned}
\max\, 7x_1 + 4x_2 + 5x_3 + 2x_4 \\
3x_1 + 3x_2 + 4x_3 + 2x_4 \leq 6 \\
x \in X=\{0,1\}^4
\end{aligned}$$
By solving the LP relaxation $x \in [0,1]^4$ we obtain $x=(1,1,0,0)$, which is in $X$ and therefore solves the original IP.
## 2.3 Combinatorial Relaxations
Examples of combinatorial relaxations for
1) **The Travelling Salesman Problem** (TSP)
* change constraints from tours to assignments
* that is, we drop the subtour elimination constraint
2) **Symmetric Traveling Salesman Problem**
3) **The Quadratic 0-1 Problem**
4) **The Integer Knapsack Problem**
* Modify the constraint by changing the item weights from $a_j$ to their floors $\lfloor a_j \rfloor$
* Also round the capacity parameter down from $b$ to $\lfloor b \rfloor$
* Explanation: knapsack is apparently easier to solve with integer parameter values. (Why? Presumably because with integer data the problem can be solved by dynamic programming over the capacities $0, 1, \dots, \lfloor b \rfloor$; see Chapter 5.)
## 2.4 Lagrangian Relaxation
**Idea**: Add complicated constraints into the objective function with Lagrange multipliers to obtain a relaxation.
Integer program (IP):
$$Z=\max\{cx : Ax \leq b, x \in X \subseteq \mathbb{Z}^n\}$$
Lagrangian relaxation of IP with Lagrangian multipliers $u \geq 0$ (a vector):
$$Z(u)=\max\{cx+u(b-Ax) : x \in X\}$$
**Proposition 2.4** and **proof**: Proves that the relaxation is an upper bound for IP, i.e. $Z(u) \geq Z$ for all $u \geq 0.$
1) Let $x^{*} \in X$ be optimal for IP
2) Since $u(b-Ax^{*}) \geq 0,$ we have $cx^{*} \leq cx^{*}+u(b-Ax^{*}) \leq Z(u)$ for all $u \geq 0.$
More in chapter 10, Lagrangian Duality.
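A small numerical sketch (not from the book) of evaluating $Z(u)$ for a tiny knapsack-type IP in which the single constraint $ax \leq b$ is dualized and $X = \{0,1\}^n$ is enumerated directly; the data are made up:

```python
from itertools import product

c = [12, 8, 17, 11]
a = [4, 3, 7, 5]
b = 9                      # IP optimum with the constraint enforced is 23

def Z(u):
    # Z(u) = max over x in X of cx + u(b - ax); X = {0,1}^n enumerated by brute force
    return max(sum((c_j - u * a_j) * x_j for c_j, a_j, x_j in zip(c, a, x)) + u * b
               for x in product([0, 1], repeat=len(c)))

for u in (0.0, 1.0, 2.0, 3.0):
    print(u, Z(u))         # each Z(u) is a valid upper bound on the IP optimum
```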
## 2.5 Duality
### Dual Pairs
We can obtain upper bounds for linear programs with **duality** -- does duality work for integer programs as well?
Any feasible solution to the dual provides an upper bound, without having to solve a relaxation to optimality.
**Definition 2.4**: Definition of weak and strong **dual pairs**.
- **weak** dual pair: the dual (D) objective value is greater than or equal to the IP objective value for all feasible solutions
- **strong** dual pair: the optimal objective values of the dual and the IP are equal
**Proposition 2.5**: Linear programming relaxation of IP and IP form a weak dual pair.
**Proposition 2.6**: Similar to proposition 2.3, we can sometimes prove infeasibility or optimality using the dual.
### A Matching Dual
TODO: image
Example: **A Matching Dual**. Given a graph $G=(V,E)$:
* **matching** $M \subseteq E$ is a set of disjoint edges
* **covering by nodes** is a set $R \subseteq V$ of nodes such that every edge has at least one endpoint in $R.$
**Proposition 2.7**: The problem of finding a **maximum cardinality matching**
$$\max_{M \subseteq E}\{|M| : M \text{ is a matching}\}$$
and the problem of finding a **minimum cardinality covering** by nodes
$$\min_{R \subseteq V}\{|R| : R \text{ is a covering by nodes}\}$$
form a **weak dual pair**.
**Proof** (Graph): Any covering must contain at least one of the endpoints of each edge in a matching. Therefore $|R| \geq |M|.$
---
**Definition 2.5**: We can also establish the same result using linear programming duality!
* **Node-edge incidence matrix** $A \in \{0,1\}^{n \times m}$ with $n=|V|$ and $m=|E|$, where $a_{j,e}=1$ if node $j$ is an endpoint of edge $e,$ and $a_{j,e}=0$ otherwise.
Maximum cardinality matching:
$$Z=\max\{\mathbf{1}x : Ax \leq \mathbf{1}, x \in \mathbb{Z}_{+}^m\}$$
Minimum cardinality covering by nodes:
$$W=\min\{y\mathbf{1} : yA \geq \mathbf{1}, y \in \mathbb{Z}_{+}^n\}$$
Then, their linear programming relaxations satisfy $Z^{LP}=W^{LP}$ (by LP duality), which establishes $$Z \leq Z^{LP}=W^{LP} \leq W.$$
### Superadditive Duals
**Superadditive duals** provide a generalization of linear programming duality.
**Definition 2.6**:
* superadditivity: $F:\mathbb{R}^m \to \mathbb{R}$ with $F(0)=0$ and $F(u)+F(v) \leq F(u+v)$ for $u,v \in \mathbb{R}^m$
* nondecreasing: $F(u) \leq F(v)$ for $u,v \in \mathbb{R}^m$ with $u \leq v$
**Theorem 2.1**: We can find a dual that forms strong dual pair with an IP among the set of nondecreasing superadditive functions.
## 2.6 Linear Programming and Polyhedra
Alternative ways to represent polyhedra using extreme points and rays.
## 2.7 Primal Bounds: Greedy and Local Search
Obtaining feasible solution with restrictions and greedy heuristics for primal bounds.
### Heuristics from Restrictions
Restriction is the converse of relaxation.
**Definition 2.8**:
1) Feasible set of the restriction is subset of the original feasible set.
2) Objective function of restriction is less or equal to original objective for all original feasible assignments.
**Proposition 2.10**: Optimal solution to the restriction is a feasible solution for the IP.
Some ways to create restrictions:
* Fix some variables
* Replace inequalities with equalities
* Add new constraints
Restriction should be easier to solve than the original problem.
---
**Example 2.5**: Restriction of the single-item lot sizing problem.
* **Replace** the inequality $y_t \leq C_t x_t$ by the equality $y_t=C_t x_t$
* Makes the new problem a restriction
* We can then **substitute** the equality for the $y_t$ variables
* It leads to a simpler problem which can be solved quickly to obtain good feasible solutions.
---
### Greedy and Local Search Heuristics
Greedy heuristic constructs a solution from scratch by choosing the best immediate reward at each step.
NOTE: sometimes greedy heuristics produce optimal solutions.
---
**Example 2.6**: TODO: Knapsack
$$\begin{aligned}
\max\, 12x_1 + 8x_2 + 17x_3 + 11x_4 + 6x_5 + 2x_6 + 2x_7 \\
4x_1 + 3x_2 + 7x_3 + 5x_4 + 3x_5 + 2x_6 + 3x_7 \leq 9 \\
x \in \{0,1\}^7
\end{aligned}$$
Variables are ordered such that $c_j/a_j \geq c_{j+1}/a_{j+1},$ thus we can easily form a greedy solution
1) Enough space to fit $x_1.$ Fix $x_1=1.$ Space remaining $9-4=5$
2) Enough space to fit $x_2.$ Fix $x_2=1.$ Space remaining $5-3=2$
3) Not enough space to fit $x_3,x_4,x_5$. Fix $x_3,x_4,x_5=0.$ Space remaining $2-0=2$
4) Enough space to fit $x_6.$ Fix $x_6=1.$ Space remaining $2-2=0$
5) Not enough space to fit $x_7.$ Fix $x_7=0$
Solution $x^{G}=(1,1,0,0,0,1,0)$ with value $Z^G=cx^G=22.$
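The same greedy pass written out as a short sketch (items assumed pre-sorted by $c_j/a_j$, as in the example):

```python
c = [12, 8, 17, 11, 6, 2, 2]
a = [4, 3, 7, 5, 3, 2, 3]
b = 9

x, space = [], b
for c_j, a_j in zip(c, a):          # items in nonincreasing c_j / a_j order
    take = 1 if a_j <= space else 0
    x.append(take)
    space -= take * a_j
print(x, sum(c_j * x_j for c_j, x_j in zip(c, x)))   # [1, 1, 0, 0, 0, 1, 0], value 22
```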
<br/><br/>
<br/><br/>
Chapter 3: Well-Solved Problems
---
## 3.1 Properties of Easy Problems
In this chapter we examine well-solved problems. A problem is considered to be well-solved if an "efficient" algorithm is known for solving all instances of the problem. Here we consider an algorithm to be efficient if its worst-case running time is polynomial, $O(m^p)$, where $m$ relates to the size of the problem in the following way: $m$ is the number of edges and $n$ is the number of nodes in a graph $G = (V, E)$, and we assume that $n \leq m$.
The following properties are often seen together when examining well-solved problems. Note that a problem is considered to be well-solved when (1) the efficient optimisation property holds.
### (1) Efficient Optimisation Property
For a given class of optimisation problems (P) $\max \{cx : x \in X \subseteq \mathbb{R}^n \}$, there exists an efficient polynomial-time algorithm.
### (2) Strong Dual Property
For a given problem class, there exists a strong dual problem (D) $\min \{\omega(u) : u \in U\}$ allowing us to obtain optimality conditions that can be quickly verified:
$x^* \in X$ is optimal in $P$ if and only if there exists $u^* \in U$ with $c^\top x^* = \omega(u^*)$.
### (3) Efficient Separation Property
There exists an efficient algorithm for the *separation problem* associated with the problem class.
**Definition 3.1.** (reworded)
Solving a linear program over $P$ is equivalent to solving the *separation problem* for $P$, i.e. given a point $y$, either assert that $y \in P$, or assert that $y \notin P$ and find a $\pi$ such that $\pi^\top x < \pi^\top y$ for all $x \in P$.
In the case of an IP, we consider $P = \text{conv}(X)$.
### (4) Explicit Convex Hull Property
A compact description of the convex hull $\text{conv}(X)$ is known, which in principle allows us to replace every instance by the linear program $\max\{cx : x \in \text{conv}(X)\}$.
In the following we will examine classes of problems for which these four properties typically hold. Note that it is common for these properties to hold simultaneously. For instance, if the explicit convex hull property holds, then the strong dual property holds and there is some likelihood that the efficient separation property holds.
## 3.2. IPs with Totally Unimodular Matrices
The key takeaway in this section is that if the constraint matrix $A$ in
$$ \text{(IP)} \quad \max\{c^T x: Ax \leq b, x \in \mathbb{Z}_+^n\}$$ is totally unimodular (TU), then the LP relaxation solves the IP. Properties (2), (3) and (4) hold for problems with TU constraint matrices.

<br></br>
Let's examine why a TU constraint matrix leads to the LP solving the IP and how we can identify TU matrices.

The proof of Observation 3.1 shows that if we have an integral basis matrix $B$ whose determinant is $\pm 1$, then the optimal vertex $x = B^{-1}b$ is integral (for integral $b$).
<br></br>


The conclusion of this chapter and the key takeaway is this:

## 3.3 Minimum Cost Network Flows
Key takeaway: minimum cost network flows have a totally unimodular constraint matrix and can thus be solved using the linear relaxation. Furthermore, the properties (2), (3) and (4) hold for this problem class.
The minimum cost network flow problem is formulated in the following way:

where $x_{ij}$ denotes the flow in arc $(i, j)$, $h_{ij}$ are the arc capacities, $c_{ij}$ the unit flow costs, and $b_i$ the demands at the nodes. The sets $V^+(i) = \{k : (i, k) \in A\}$ and $V^-(i) = \{k : (k, i) \in A\}$ give the outgoing and incoming arcs of node $i$. For the problem to be feasible, it must be that $\sum_{i \in V}b_i = 0$.


The additional constraints are the flow capacities $0 \leq x_{ij} \leq h_{ij}$.
We can see that the constraint matrix above is TU, because it fulfills the criteria outlined in Proposition 3.2. Thus, if the demands $b_i$ and the capacities $h_{ij}$ are integral, then the extreme points of the problem are integral. This further implies that constraints (3.2) and (3.3) describe the convex hull of the integral feasible flows.
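A brute-force sanity check (not from the book; exponential and only sensible for tiny matrices) that a node-arc incidence matrix is TU, i.e., every square submatrix has determinant in $\{-1, 0, +1\}$; the digraph is made up:

```python
from itertools import combinations
import numpy as np

nodes = [1, 2, 3, 4]
arcs = [(1, 2), (1, 3), (2, 3), (3, 4), (2, 4)]

# Node-arc incidence matrix: +1 if the arc leaves the node, -1 if it enters it.
A = np.zeros((len(nodes), len(arcs)), dtype=int)
for k, (i, j) in enumerate(arcs):
    A[nodes.index(i), k] = 1
    A[nodes.index(j), k] = -1

def is_totally_unimodular(A):
    m, n = A.shape
    for size in range(1, min(m, n) + 1):
        for rows in combinations(range(m), size):
            for cols in combinations(range(n), size):
                if round(np.linalg.det(A[np.ix_(rows, cols)])) not in (-1, 0, 1):
                    return False
    return True

print("TU:", is_totally_unimodular(A))   # expected: True
```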
## 3.4 Special Minimum Cost Flows
The key takeaway in this section is that special cases of problems with TU constraint matrices are also efficient to solve. The strong duality property can, for instance, be very valuable if the dual is a problem that is otherwise difficult to solve.
### 3.4.1 Shortest Path Problem
This is a variant of the minimum cost flow problem, where the source node $s$ has a "demand" of 1 and the terminal node $t$ a "demand" of -1. All other nodes have no demand and the flow capacities are unrestricted.

### 3.4.2 Maximum s-t Flow
This is a variant of the minimum cost flow problem, where we maximise the flow from the source node $s$ to the terminal node $t$. This is done by adding a backward arc from $t$ to $s$, which has an uncapacitated flow.


<br></br>
The minimum s-t cut problem looks like a difficult COP. However, the max s-t flow problem forms a strong dual pair with it, and thus the minimum cut can be computed efficiently.

## 3.5 Optimal Trees


### The Maximum Weight Tree problem
Key takeaway: This showcases an easy and intuitive algorithm that leads to an optimal solution. The greedy algorithm runs in polynomial time and thus the efficient optimisation property holds.
**Greedy algorithm**
Given a graph $G = (V, E)$ with $n$ nodes and $m$ edges, and edge weights $c_e$ for $e \in E$, the greedy algorithm for finding a maximum weight tree is the following.
Initialise the set of eligible edges $E^t$ and the set of chosen edges $T^t$ as
$$E^0 = E$$
$$T^0 = \emptyset$$
Order the edges in nonincreasing order of their weights: $c_1 \geq c_2 \geq ... \geq c_m$.
At each iteration $t$:
If $T^{t-1} \cup \{e_t\}$ contains no cycle, set $T^t = T^{t-1} \cup \{e_t\}$. Otherwise, $T^t = T^{t-1}$.
Set $E^t = E^{t-1} \setminus \{e_t\}$. If $|T^t| = n - 1$, stop with $T^t$ optimal. If $t = m$, stop with no feasible solution.
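A compact sketch of this greedy procedure (a Kruskal-type implementation using union-find for the cycle check; the graph and weights are made up):

```python
edges = [(5, 1, 2), (4, 1, 3), (3, 2, 3), (2, 3, 4), (1, 2, 4)]   # (c_e, u, v)
n = 4

parent = list(range(n + 1))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # path compression
        i = parent[i]
    return i

T = []
for c, u, v in sorted(edges, reverse=True):   # nonincreasing weights
    ru, rv = find(u), find(v)
    if ru != rv:                              # adding (u, v) creates no cycle
        parent[ru] = rv
        T.append((u, v, c))
    if len(T) == n - 1:
        break
print(T, "total weight:", sum(c for _, _, c in T))
```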


<br></br>
Does the explicit convex hull property hold? Yes.

However, the issue is that it has an exponential number of constraints. Thus, despite being a convex hull formulation, it is not necessarily great. A formulation with a polynomial number of constraints and variables can be achieved with directed flows or directed cuts.
## 3.7 Two Harder Network Flow Problems
The key idea is that we can use our understanding of minimum cost network flow problems and optimal trees to solve variant problems. Here the Steiner tree problem and the fixed charge network flow problem are examined.
In both of these problems, changing the problem to a directed version allows one to construct a tighter formulation.
<br/><br/>
<br/><br/>
Chapter 5: Dynamic Programming
---
In Dynamic Programming, we solve problems that can conveniently be divided into smaller subproblems for which the **principle of optimality** holds. The principle of optimality is "the property that pieces of optimal solutions are themselves optimal". An example: you have the shortest route from Espoo to Tampere, and it goes through HΓ€meenlinna. You then know that the subroutes Espoo-HΓ€meenlinna and HΓ€meenlinna-Tampere are also the shortest routes between these cities (Observation 5.1).
## 5.1 Some Motivation: Shortest Paths
Following the idea in the introductory example, we get

stating that if we know the shortest path from $s$ to each neighbor of $v$, we observe that the overall shortest path from $s$ to $v$ must visit one of these nodes. We can thus simply iterate over these neighbors and find the best one, leading to the optimal $s-t$ path.

How do we obtain the optimal subpaths $d(i)$ then? We could just recursively calculate $$d(i)=\min_{j \in V^-(i)} \{d(j)+c_{ji}\},$$ assuming the graph is acyclic.
A more general version of this idea is presented in the book: $$D_k(j)=\min \{ D_{k-1}(j), \min_{i \in V^-(j)} [D_{k-1}(i)+c_{ij}]\},$$ where $D_k(j)$ is the length of the shortest path from $s$ to $j$ with at most $k$ arcs. We can initialize $D_0(j)=\infty, j \neq s$ and $D_0(s)=0$, and then easily calculate $D_1(j)$, then $D_2(j)$ and so on. This is Dynamic Programming.
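A minimal sketch of the $D_k(j)$ recursion on a small made-up digraph (each pass allows one more arc, as in Bellman-Ford):

```python
import math

c = {("s", "a"): 2, ("s", "b"): 5, ("a", "b"): 1, ("a", "t"): 6, ("b", "t"): 2}
nodes = {"s", "a", "b", "t"}

D = {j: math.inf for j in nodes}   # D_0
D["s"] = 0
for k in range(1, len(nodes)):     # paths with at most k arcs
    D = {j: min(D[j], min((D[i] + c[i, j] for (i, jj) in c if jj == j),
                          default=math.inf))
         for j in nodes}
print(D["t"])   # 5: the path s -> a -> b -> t with cost 2 + 1 + 2
```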
## 5.2 Uncapacitated Lot-Sizing
The formulation from Section 1.4.

We also assume $s_n=0$, that is, no product is left in the storage at the end of the planning period (this makes sense in a deterministic problem).


TODO: Where does "an optimal extreme flow uses a set of arcs forming a tree" come from?
Given these statements, it can be seen that $s_t = \sum_{i=1}^t (y_i -d_i) = \sum_{i=1}^t y_i - d_{1t}$, using the given substitution $d_{it} = \sum_{j=i}^t d_j$. This leads to a nicer cost function without the $s$-variables. In the new function $\sum_{t=1}^n (c_t y_t + f_t x_t)$, we only consider production at each stage.


This might be a bit confusing because of using the numbers from previous subproblems directly. For example, the full calculation for $H(2)$ is
$$H(2) = min[H(0) + f_1 + c_1d_{12}, H(1) + f_2 + c_2d_{22}].$$
This is a good chance to mention memoization. If you do Dynamic Programming in practice, you will repeatedly need the solutions of the smaller subproblems. You want to store the results of each subproblem somewhere so you don't need to calculate them multiple times (e.g., recursive Fibonacci algorithms).
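A minimal sketch of this recursion, with $H$ stored in a dictionary (memoization) and interpreting $H(t)$ as "choose the period $i$ of the last production run covering demands $i,\dots,t$"; all data are made up:

```python
f = [0, 20, 20, 30, 25]   # f[t]: fixed cost of producing in period t (index 0 unused)
c = [0, 3, 3, 4, 4]       # c[t]: unit production-plus-storage cost
d = [0, 6, 4, 5, 3]       # d[t]: demand in period t
n = 4

H = {0: 0}                                    # memoized subproblem values
for t in range(1, n + 1):
    H[t] = min(H[i - 1] + f[i] + c[i] * sum(d[i:t + 1]) for i in range(1, t + 1))
print(H)   # H[n] is the minimum total cost over the whole horizon
```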

## 5.3 An Optimal Subtree of a Tree
- The problem: choose the maximum weight subtree rooted at $r$
- Observation: if we pick any node in the maximum weight subtree, the sub-subtree rooted at that node is also optimal
We define the weight of the optimal subtree rooted at $v$ recursively as $H(v) = \max \{0, c_v + \sum_{w \in S(v)} H(w) \}$. What this says is that either the subtree is discarded (because keeping it would contribute a negative weight), and its weight is thus zero, or it is kept with a positive weight. The weight is calculated as the sum of the weight of node $v$ and the weights of the optimal subtrees rooted at its child nodes $w \in S(v)$.
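A recursive sketch of $H(v)$ on a small made-up tree (not the tree from the book's figure):

```python
children = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}   # S(v)
c = {1: 2, 2: -1, 3: -4, 4: 3, 5: -2, 6: 1}                      # node weights

def H(v):
    # either discard the subtree rooted at v (value 0) or keep v plus its best sub-subtrees
    return max(0, c[v] + sum(H(w) for w in children[v]))

print(H(1))   # weight of the optimal subtree rooted at node 1
```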

Go through leaf nodes:

First, the H-value is calculated for all leaf nodes. If their weight is positive, they make it to the next round of recursion, if not, they are discarded and their weight in the following steps is zero. The calculated H-values are included under the nodes. Next update the first predecessor nodes.

We were able to calculate $H(5)$ and $H(8)$, both are positive. The root node has not been reached, continue upwards in the tree.

$H(2)$ is positive, $H(3)$ is not. The weight of the (non-empty) subtree at node 3 is 0, so it doesn't actually make a difference if it is cut off or not, but we cut it off to be consistent with the book. Finally, $H(1) = 2$ and the optimal subtree consists of nodes 1, 2, 5, 9 and 10.
## 5.4 Knapsack Problems
### 5.4.1 0-1 Knapsack Problems

We can solve subproblems $P_r(\lambda)$ of the form "what is the maximum value $f_r(\lambda)$ of the knapsack using only the first $r$ items and capacity $\lambda$".



### 5.4.2 Integer Knapsack Problems




Chapter 7: Branch-and-Bound
---
## 7.1 Divide and Conquer
In this section, we talk about the "divide and conquer" approach, which is based on the idea of dividing the original problem into more tractable subproblems, solving them, and then using the obtained information to derive a solution for the original problem.


A typical way to represent such an approach is via an enumeration tree. For instance, consider the travelling salesman problem with 4 cities. We need to find an optimal route $(i_1, i_2, i_3, i_4)$. We start by picking an arbitrary point for $i_1$, 1 in our case. Then we split our set $S$ into $S_{12} = \{(i_1, i_2, i_3, i_4) \in S : i_1 = 1, i_2 = 2\}$, $S_{13} = \{(i_1, i_2, i_3, i_4) \in S : i_1 = 1, i_2 = 3\}$ and $S_{14} = \{(i_1, i_2, i_3, i_4) \in S : i_1 = 1, i_2 = 4\}$. On the next level we fix the point $i_3$ in such a way that the arc $i_2 - i_3$ does not create a subtour, and so on until we reach all the points. As a result, the leaves of the tree correspond to all possible tours $(i_1, i_2, i_3, i_4, i_1)$, and the number of possible tours is $(n-1)!$.

## 7.2 Implicit Enumeration
Since complete enumeration is not possible once the number of variables is high enough, we need to somehow use the information on the bounds of $Z_k$.
Since we are maximising, we take $\overline{Z_k} = -\infty$ when $S_k = \emptyset$ and $\underline{Z_k} = -\infty$ when no feasible solution in $S_k$ has been found. The primal (lower) bounds are provided by feasible solutions and the dual (upper) bounds by relaxation or duality.

Therefore, there are three reasons that allow us to prune the tree and thus enumerate a large number of solutions implicitly


## 7.3 Branch and Bound: an Example

### Bounding
To obtain the upper bound we add auxiliary variables $x_3, x_4, x_5$, drop the integrality constraints and solve the LP.

We obtain the upper bound $\overline{Z} = \frac{59}{7}$ and solution $(x_1, x_2) = (\frac{20}{7},3)$. The solution is not integral, so we have no feasible solution yet and set $\underline{Z} = -\infty$
### Branching
Since $\underline{Z} < \overline{Z}$ we need to divide and branch.
To split up the feasible region we pick an integer variable $x_j$ that is fractional in the linear programming solution and create new sets

So then

and as the solution $\overline{x}$ is not feasible in $LP(S_1)$ or $LP(S_2)$, if there is no degeneracy then $\max\{\overline{Z_1}, \overline{Z_2}\} < \overline{Z}$ >> so the upper bound will strictly decrease

### Choosing node >> arbitrary >> $S_1$
### Reoptimizing
To reuse the information from the previous node we could apply dual simplex, as our previous optimal basis remains dual feasible after we added a single upper or lower bound constraint to the LP.




## 7.4 LP-Based Branch and Bound

### Storing the Tree
At a minimum, the best-known dual bound and the
variable lower and upper bounds needed to restore the subproblem are stored.
Generally >> depends on the particular instance and strategy.
One of the common rules to choose the branching variable >> choose the most fractional variable

### Choosing a node
While choosing the node one should keep in mind these two ideas:
* Depth-First Search strategy >> descend as
quickly as possible in the enumeration tree to find a first feasible solution + it is always easy to resolve the linear programming relaxation when a simple constraint is added. Once we have the feasible solution we can prune the tree significantly.
In the example, this would imply that after treating node
S1, the next node treated would be S3 or S4 rather than S2
* Best Node First strategy >> To minimize the total number of nodes evaluated in the tree. With such a rule, we will never divide any
node whose upper bound $\overline{Z_t}$ is less than the optimal value Z. In the example this would imply that after treating node S1, the next node chosen would be S2 with bound 59/7 from its predecessor, rather than S3 or S4 with bound 15/2.
In practice >> initial depth-first strategy until at least one feasible solution has been found, followed by a strategy mixing best node and depth first so as to try to prove optimality and also find better feasible solutions.
## 7.5 Using a Branch-and-Bound/Cut System
### Simplex Strategies
Different simplex pricing strategies may make a huge difference in running times for a given class of models. On very large models, interior point methods may be best for the solution of the first linear program.
### Branching and Node Selection
To choose the node >> a good idea is to estimate the optimal value that will be obtained from pursuing that branch:
$Z_{LP} - \sum_j \min [D^-_j, D^+_j]$, where $Z_{LP}$ is the value of the LP solution at the node, $D^-_j$ is the down degradation and $D^+_j$ is the up degradation

These values can also be used to choose a branching variable

### Strong Branching
The idea is to spend some more time to select a better branching variable. From the set $C$ of integer variables that are fractional in LP we branch up and down on each of them in turn, and reoptimize on each branch either to optimality, or for a specified number of dual simplex pivots >> for each $j \in C$ we get upper bounds $Z^D_j$ for the down-branch and $Z^U_j$ for the
up-branch. The variable having the largest effect (smallest dual bound) is then chosen and branching really takes place on this variable
### Special Ordered Sets
SOS1 (a special ordered set of type 1) is a set of variables $x_1, \dots, x_k$ of which at most one can be positive. Regular branching

would lead to an unbalanced tree, as the left branch would have $k-1$ options and the right one only one. So a better strategy:


For functions of the type $f(x_1, \dots, x_k ) = \sum_{i = 1}^k f_i(x_i)$ we can apply a linearisation

### Semicontinuous Variables
The condition $x = 0$ or $l \le x \le u$ can be represented as a semicontinuous variable $sc(x,l,u)$. If at some stage we get a new lower bound $l'$, we use $\max[l, l']$, and if we get a new upper bound $u' < l$, we can set $x = 0$
### Indicator Variables
Some MIP systems accept the use of a logical variable $z$ indicating whether some inequality constraint is active, rather than introducing a big-$M$ constraint
### Priorities
The user can set predefined information on which variables are prioritised to be integer; if some variables happen to be fractional after solving the LP, the system will take that information into account and choose the one with the highest priority among them for branching. At the same time, the user can specify a preferred branching direction, telling the system which of the two branches to explore first.
### Symmetry Breaking/Orbital Branching



So if we consider the example



## 7.6 Preprocessing or Presolve
The basic idea is to try to quickly detect and eliminate redundant
constraints and variables, and tighten bounds where possible. Then if the resulting linear/integer program is smaller/tighter, it will typically be solved much more quickly.
### Linear Programming Preprocessing


### Integer Programming Preprocessing
If the bounds on an integer variable are not integer, tighten them


### Clique Finding

### Strengthening Inequalities

Chapter 8: Cutting Plane Algorithms
---
## 8.1 Introduction
We already know from Chapter 1 that, since $conv(X)$ is a polyhedron (**Proposition 8.1**), an integer program (IP) can be formulated as a linear program (LP). However, it is usually much too difficult to find an explicit description of $conv(X)$, so we will in this chapter try to find effective ways of approximating it.
For this, we need to describe the inequalities which don't exclude solutions from the feasible set, and can thus help us to approximate $conv(X)$.
**Definition 8.1** An inequality $\pi x \leq \pi_0$ is a valid inequality for $X \subseteq \mathbb{R}^n$ if $\pi x \leq \pi_0$
for all $x \in X$.
For $X = \{x \in \mathbb{Z}^n : Ax \leq b\}$ and $conv(X) = \{x \in \mathbb{R}^n : \tilde{A}x \leq \tilde{b}\}$, examples of valid inequalities for $X$ are $a_i x \leq b_i$ and $\tilde{a}_i x \leq \tilde{b}_i$.
## 8.2 Some Simple Valid Inequalities
We have already seen some valid inequalities in Example 7.6.
**Example 8.3** Consider the set
$$
X = \{ (x,y): y\leq 10x, 0\leq y\leq 14, x\in \mathbb{Z}_+^1\}
$$

We can see in the picture that the line connecting the points $(1,10)$ and $(2,14)$ gives the valid inequality:
$$ y \leq 6+4x $$
More generally, when $C$ does not divide $b$, we have for
$$X = \{(x, y) : y \leq Cx, 0 \leq y \leq b, x \in \mathbb{Z}^1_+\}$$
the valid inequality
$$ y \leq b - \gamma(K - x),$$
where $K = \lceil \frac{b}{C} \rceil$ and $\gamma = b - (\lceil \frac{b}{C} \rceil - 1)C$.

**Question:** This cut is weaker, so why bother? Only to show the procedure?
Integer rounding can also be applied to mixed integer programs (Example 8.5).
I do not get this at all:


## 8.3 Valid Inequalities
To understand how to generate valid inequalities for integer programs, it is first necessary to understand valid inequalities for polyhedra (or linear programs).
So the first question is: When is the inequality $\pi x \leq \pi_0$ valid for $P = \{x : Ax \leq b, x \geq 0\}$?
- LP (Proposition 8.2): $\pi x \leq \pi_0$ is valid for $P = \{x : Ax \leq b, x \geq 0\} \neq \emptyset$ if and only if there exist $u \geq 0, v \geq 0$ such that $uA - v = \pi$ and $ub \leq \pi_0$, or alternatively there exists $u \geq 0$ such that $uA \geq \pi$ and $ub \leq \pi_0$.
- IP (Proposition 8.3): Let $X = \{x \in \mathbb{Z}^1 : x \leq b\}$, then the inequality $x \leq \lfloor b \rfloor$ is valid for $X$. (This is in some sense the complete answer to the question which inequalities are valid for $X$.) We have already used this idea in Example 8.5.
**Example 8.8** Let $X = P \cap \mathbb{Z}^2$ be the set of integer points in a polyhedron $P$, given by
$$\begin{aligned}
7x_1 - 2x_2 \leq 14 \\
x_2 \leq 3 \\
2x_1 - 2x_2 \leq 3 \\
x \geq 0.
\end{aligned}$$
1. Combining the constraints with nonnegative weights $u = (\frac{2}{7}, \frac{37}{63}, 0)$, we obtain the valid inequality for $P$
$$2x_1 + \frac{1}{63} x_2 \leq \frac{121}{21}.$$
2. As $x \geq 0$, the coefficients on the lhs can be rounded down to the nearest integer, giving the valid inequality for $P$:
$$2x_1 + 0x_2 \leq \frac{121}{21}.$$
3. Now, as the lhs is integral for all points of $X$, we can round the rhs down to the nearest integer, and we obtain the valid inequality for $X$:
$$2x_1 \leq \lfloor \tfrac{121}{21} \rfloor = 5.$$
Observe that if we repeat the procedure, and use a weight of $\frac{1}{2}$ on this last constraint, we obtain the tighter inequality $x_1 \leq \lfloor \tfrac{5}{2} \rfloor = 2$.
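A quick check of these Chvátal-Gomory steps with exact fractions (plain Python, just re-doing the arithmetic above):

```python
from fractions import Fraction as F
from math import floor

A = [[7, -2], [0, 1], [2, -2]]
b = [14, 3, 3]
u = [F(2, 7), F(37, 63), F(0)]                 # nonnegative weights

coeffs = [sum(u_i * A[i][j] for i, u_i in enumerate(u)) for j in range(2)]
rhs = sum(u_i * b_i for u_i, b_i in zip(u, b))
print(coeffs[0], coeffs[1], rhs)                   # 2, 1/63 and 121/21
print([floor(cj) for cj in coeffs], floor(rhs))    # rounded C-G cut: 2 x1 + 0 x2 <= 5
```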
Now, it is easy to describe the general procedure that we have been using.

Surprisingly, but conveniently, this simple procedure can generate all valid inequalities for an integer program (but only theoretically, since there may be infinitely many. The important thing here is that the inequalities of the convex hull are included!)
**Theorem 8.1** Every valid inequality for $X$ can be obtained by applying the Chvátal-Gomory procedure a finite number of times.
Proof: 2 pages, so maybe not today!
For inequalities generated by other arguments, it is sometimes interesting to see how easily they are generated as C-G inequalities, see Exercise 15.
Now that we have seen a variety of both ad hoc and general ways to derive valid inequalities, we turn to the important practical question of how to use them.
## 8.4 A Priori Addition of Constraints
The idea is the following:
- Examine the initial formulation $P = \{x: Ax\leq b, x\geq 0\}$ with $X=P\cap \mathbb{Z}^n$
- Find a set of valid inequalities $Qx\leq q$ for $X$ and add them to the formulation, giving a new formulation $P' = \{x: Ax\leq b, Qx\leq q, x\geq 0\}$ for the same $X$ (i.e., $X=P'\cap\mathbb{Z}^n$)
- Apply our favourite algorithm to $P'$.
**Advantages**
- we can use standard branch-and-cut software
- if the inequalities are well chosen the algorithm will be more effective
- the chances of finding feasible integer solutions increases
**Disadvantages**
- lots of inequalities to add, leading to longer solving times, or so many that standard branch-and-cut software cannot be used
In many instances, the feasible region $X$ can be written naturally as the intersection of two or more sets with structure. Decomposing or concentrating on one of the sets at a time may be a good idea.
- $X = X^1 \cap X^2$
- $X^2 = P^2 \cap \mathbb{Z}^n$, optimizing over this is easy
- Find $P'^2 = conv(P^2 \cap \mathbb{Z}^n)$, an explicit description
- Replace the initial formulation $P^1 \cap P^2$ by an improved formulation $P' = P^1 \cap P'^2$

The strong formulation is now commonly used for this problem despite having more constraints because the bound it provides is much stronger than that given by the weak formulation, and the linear programming relaxation has solutions that are close to being integral.
## 8.5 Automatic Reformulation or Cutting Plane Algorithms
- There are usually too many known valid inequalities for them to be added a priori
- We don't need the complete convex hull, only the part near the optimal solution
This is a basic cutting plane algorithm to generate "useful" inequalities from a family of valid inequalities $\mathcal{F}$ we know.
**Cutting Plane Algorithm**
_Initialization._ Set $t = 0$ and $P_0 = P$.
_Iteration t._
Solve the linear program:
$Z_t = \max\{cx : x \in P_t\}.$
Let $x^t$ be an optimal solution.
If $x^t \in \mathbb{Z}^n$, stop. $x^t$ is an optimal solution for IP.
If $x^t \notin \mathbb{Z}^n$, solve the separation problem for $x^t$ and the family $\mathcal{F}$.
If an inequality $(\pi^t, \pi^t_0) \in \mathcal{F}$ is found with $\pi^t x^t > \pi^t_0$ so that it cuts off $x^t$, set $P_{t+1} = P_t \cap \{x : \pi^t x \leq \pi^t_0\}$, and augment $t$. Otherwise stop.
If the algorithm terminates without finding an integral solution for IP, $P_t$ is an improved formulation that can be input to a branch-and-cut solver.
It should also be noted that in practice it is often better to add several violated cuts at each iteration, and not just one at a time.
In Section 8.6, we look at a specific implementation of this algorithm.
## 8.6 Gomory's Fractional Cutting Plane Algorithm
Here, we consider an IP.
The idea:
- Solve the LP relaxation and find an optimal basis
- choose a basic variable that is not integer
- generate a Chvátal-Gomory inequality from the row of this variable in the simplex tableau (where the basic variables form an identity matrix)
- cut off the LP solution
Suppose that, given an optimal basis, the problem is rewritten in the form:


The last part: The difference in 8.8 is integral, and replacing $x_b$ by something equal does not change that, hence the difference must still be integral in the solution.

We can also reformulate the cut in terms of the original variables.
## 8.7 Mixed Integer Cuts
### 8.7.1 The Basic Mixed Integer Inequality
We saw above that when $x \leq b, x \in \mathbb{Z}^1$, the rounding inequality $x \leq \lfloor b \rfloor$ suffices to generate all the inequalities for a pure integer program. Here we examine if there is a similar basic inequality for mixed integer programs.


The following corollary allows us to compare more directly with the all-integer case.
**Corollary 8.1** If $X^{\leq} = \{(x, y) \in \mathbb{Z}^1_+ \times \mathbb{R}^1_+ : x \leq b + y\}$ and $f = b - \lfloor b \rfloor > 0$, the inequality $$x \leq \lfloor b \rfloor + \frac{y}{1 - f}$$
is valid for $X^{\leq}$.
Thus, we see that when the continuous variable $y = 0$, we obtain the basic integer rounding inequality.
### 8.7.2 The Mixed Integer Rounding (MIR) Inequality
A slight variant of the basic inequality.

### 8.7.3 The Gomory Mixed Integer Cut
As for all integer programs in Section 8.6, any row of an optimal linear programming tableau, in which an integer variable is basic but fractional, can be used to generate a cut removing the optimal
linear programming solution.

### 8.7.4 Split Cuts
An alternative viewpoint on both MIR inequalities and Gomory mixed integer cuts
is provided by split cuts.


In fact, it is not difficult to show that MIR, split, and Gomory mixed integer cuts are essentially equivalent in the sense that any inequality that is of one type, can also be obtained as an inequality of the other two types.
<br></br>
<br></br>
<br></br>
Chapter 9: Strong Valid Inequalities
---
## 9.1 Introduction
In the last chapter, formulating valid inequalities was discussed, and it was mentioned that adding all possible valid inequalities is not necessarily tractable or beneficial. In this chapter we focus on finding strong valid inequalities that are hopefully more effective. We do this by defining what is meant by *strong* and then examining a few interesting families $\mathcal{F}$ of strong inequalities.
We also examine the related separation problems, which tell us whether a valid inequality in the family is violated by a given solution.
## 9.2 Strong Inequalities
**Dominance**

**Redundant**



**Strongest possible valid inequalities**
$F$ defines a **face** of the polyhedron $P$ if $F = \{x \in P : \pi x = \pi_0\}$ for some valid inequality $\pi x \leq \pi_0$ of $P$.
$F$ is a **facet** of $P$ if $F$ is a face of $P$ and $\dim(F) = \dim(P)-1$. A facet is the strongest possible valid inequality for a polyhedron.
**Proposition 9.1** If $P$ is fully-dimensional, a valid inequality $\pi x β€ \pi_0$ is necessary in the description of $P$ if and only if it defines a facet of $P$.

In theory the important inequalities β the ones that provide the best cuts β are those that are necessary for describing the polyhedron $P = \text{conv}(X)$. In practice when $\text{conv}(X)$ is not known, the goal is to avoid using a redundant inequality. We can avoid using a redundant constraint by checking if one that dominates it is readily available.
-> The ideal situation is to be able to generate facet-defining valid inequalities. (The book includes two approaches for showing that a valid inequality defines a facet of conv($X$).)
## 9.3 0-1 Knapsack Inequalities
In this section we consider the set $$X = \{x \in \{0, 1\}^n: \sum_{j=1}^{n}a_jx_j \leq b\}$$
### 9.3.1 Cover inequalities
Covers allow us to identify infeasible incidence vectors and using the cover inequalities we can cut infeasible solutions and tighten the formulation.

Note that $C$ is a cover if and only if its associated incidence vector $x^C$ is infeasible for $X$.

The idea is that not all $x_j$ such that $j \in C$ can be 1 in the solution, because then $x$ would be infeasible.

### 9.3.2 Strengthening cover inequalities
The cover inequality can be strengthened by extending the cover. Notice that the extended cover is never a minimal cover.


**Procedure to Lift Cover Inequalities**
Lifting cover inequalities is a process for making the inequality stronger, so that it is nonredundant (facet-defining). The procedure for lifting cover inequalities is posed as the problem to find the best possible values for $\alpha_j$ for $j \in N \setminus C$ such that the inequality
$$\sum_{j \in C} x_j + \sum_{j \in N \setminus C} \alpha_j x_j \leq |C| -1$$
is valid for $X$.

The value $\zeta$ describes how much space is used up by the variables in $C$ in the inequality when $x_1 = 1$.
This procedure can be continued for all variables $j \in N \setminus C$. This leads to a facet-defining inequality for conv($X$) when $C$ is a minimal cover and $a_j \leq b$ for all $j \in N$.
### 9.3.3 Separation for Cover Inequalities
We are given a nonintegral point $x^*$ with $0 \leq x_j^* \leq 1$ for all $j \in N$, and we wish to know whether $x^*$ satisfies all the cover inequalities.
Note that the cover inequality can be written as
$$\sum_{j \in C} (1-x_j) \geq 1.$$
Thus it suffices to answer the question:

Since the set $C$ is unknown, we can formulate this as a 0-1 integer program where the variable $z_j =1$ if $j \in C$ and $z_j =0$
otherwise.
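A brute-force sketch of this separation problem for a tiny made-up instance: minimise $\sum_{j \in C}(1 - x^*_j)$ over covers $C$; if the minimum is below 1, the corresponding cover inequality is violated by $x^*$:

```python
from itertools import product

a = [4, 3, 7, 5]                 # knapsack weights
b = 9                            # capacity
x_star = [1.0, 1.0, 0.3, 0.4]    # fractional point to separate

best = (float("inf"), None)
for z in product([0, 1], repeat=len(a)):                     # z: incidence vector of C
    if sum(a_j * z_j for a_j, z_j in zip(a, z)) > b:         # C is a cover
        slack = sum((1 - xs) * z_j for xs, z_j in zip(x_star, z))
        best = min(best, (slack, z))
print(best)   # (0.6, (1, 1, 0, 1)): the cover {1, 2, 4} gives a violated inequality
```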

<br></br>
## 9.4 Mixed 0-1 Inequalities


Notice the inequality: the incoming flow has to be *less than or equal* to the outgoing flow.
### 9.4.1 Flow Cover Inequalities

Here, the formulation of a cover is defined using the excess term $\lambda$ and an equality assertion instead of an inequality as was done in the previous section.

*Note: we can choose set $L_2$.*

### 9.4.2 Separation for Flow Cover Inequalities
Assuming that $y_j = a_j x_j$ and $a_j β₯ \lambda$ for all $j \in C_1$, and that $L_2 = N_2\setminus C_2$, with some derivation the flow cover inequality becomes

This is just like the cover inequality for a 0-1 knapsack set with both positive and negative coefficients. Thus we can use the knapsack problem in the separation heuristic for the flow cover inequalities.


<br></br>
## 9.5 The Optimal Subtour Problem
In this problem type we do not focus on valid inequalities, but instead on how to manage the exponential number of *generalised subtour elimination constraints*.

### 9.5.1 Separation for Generalised Subtour Constraints
Now $(x^*, y^*)$ violates the $(k, S)$ generalised subtour inequality if
$$\sum_{e \in E'(S)} x_e^* > \sum_{i \in S \setminus \{k\}} y_i^*,$$
and such a violated set $S$ exists if and only if $\zeta_k > 0$, where

This problem is a quadratic 0-1 program. We can turn it into a linear program, which has a TU constraint matrix, by introducing variables $\omega_e =1$ if $z_i=z_j=1$ for $e=(i,j) \in E'$.

The constraint matrix is TU, because once we drop 9.9 (it's redundant) the constraint matrix becomes:
$$ \begin{bmatrix}-1 & 0 & 1\\
0 & -1 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0\end{bmatrix} \begin{bmatrix} z_i \\ z_j \\ \omega_e \end{bmatrix} \leq \begin{bmatrix} 0 \\ 0 \\ 1 \\1 \end{bmatrix}$$
*An example of this kind of subtour problem is found in book on page 185.*
## 9.6 Branch-and-Cut
The *Branch-and-Cut Algorithm* is a branch-and-bound algorithm in which cutting planes are generated throughout the branch-and-bound tree. In this method, instead of focusing on reoptimising efficiently at each node, we instead use resources to obtain a tight dual bound at the node β we do more work at each node with the hope that we will then need to traverse less nodes.
Generating so many cuts requires a method for managing all this information, so the cuts are all kept in a *cut pool*. In the node list we store the bounds and a good basis, just as in B&B, but we also store pointers to the appropriate constraints in the cut pool.
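A minimal sketch of this bookkeeping (the class and field names are illustrative, not the book's):

```python
# Cuts live once in a shared pool; each node record keeps its bounds, a
# warm-start basis, and pointers (indices) into the pool.
from dataclasses import dataclass, field

@dataclass
class CutPool:
    cuts: list = field(default_factory=list)       # e.g. (coefficients, rhs) pairs
    def add(self, cut) -> int:
        self.cuts.append(cut)
        return len(self.cuts) - 1                   # pointer handed to the nodes

@dataclass
class NodeRecord:
    local_bounds: dict                              # variable bound changes at this node
    basis: object                                   # warm-start basis, as in plain B&B
    cut_ids: list = field(default_factory=list)     # pointers into the shared CutPool
```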

*An example of this kind of subtour problem is found in the book on page 189.*
Chapter 11 - Column and Row Generation Algorithms
---
## 11.1 Introduction
The Integer Program (IP) is given as $$\max\{cx : x\in X\}$$
Based on ideas from **decomposition**:
Feasible region $X$ can be written as the **intersection of two or more sets** with structure
$$X=\bigcap_{k=0}^{K} X^k,\quad K\geq 1.$$
---
We are interested in IPs whose feasible region $X$ is of the following form.
For $k=1,...,K$ we have **independent subsets**
$$X^k=\{x^k\in\mathbb{Z}_{+}^{n^k} : D^k x^k \leq d_k\}.$$
Also called **knapsack sets**.
**Joint constraints** link together the different sets of variables:
$$\sum_{k=1}^{K} A^k x^k\leq b.$$
Also called **assignment** or **complicating constraints**.
---
*Resource-Constrained Shortest Path Problem* has the above form with one subproblem $K=1.$

---
An objective function of the form $$\max \sum_{k=1}^{K} c^k x^k$$ allows us to exploit the problem structure in three ways:
1) Cut generation, by trying to generate valid inequalities for each subset $X^k, k=1,...,K$
2) Lagrangian relaxation, by dualizing the joint constraints to obtain a dual problem $\min_{u} L(u)$, where $$L(u)=\max\{\sum_{k=1}^K (c^k - uA^k) x^k + ub : x^k\in X^k, k=1,...,K\}$$ breaks up into $K$ distinct subproblems (a small evaluation sketch of $L(u)$ follows this list)
3) Assuming the subsets $X^k$ are bounded, solving an equivalent problem of the form $$\max\{\sum_{k=1}^K \gamma^k \lambda^k : \sum_{k=1}^K B^k \lambda^k = \beta, \lambda^k\geq 0 \text{ integer for } k=1,...,K\}$$ where each matrix $B^k$ has a very large number of columns, one for each of the extreme points of $\operatorname{conv}(X^k)$, and $\lambda^k$ is a vector of corresponding variables.
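A small sketch of point 2 for the special case of a single joint constraint (so $u$ is a scalar); the block data are made up and each $X^k$ is enumerated by brute force.

```python
# Evaluate the Lagrangian dual function L(u) for one dualized joint constraint
# sum_k A_k x^k <= b; each block's subproblem over X^k is solved separately.
from itertools import product

def L(u, blocks, b):
    """blocks[k] = (c_k, A_k, D_k, d_k); D_k x <= d_k defines X^k."""
    value = u * b
    for c_k, A_k, D_k, d_k in blocks:
        n = len(c_k)
        # subproblem: max {(c_k - u A_k) x : x in X^k}, by enumeration
        value += max(
            sum((c_k[j] - u * A_k[j]) * x[j] for j in range(n))
            for x in product([0, 1], repeat=n)
            if sum(D_k[j] * x[j] for j in range(n)) <= d_k
        )
    return value

# made-up data: two blocks, each a tiny 0-1 knapsack set
blocks = [([6, 7], [5, 8], [7, 8], 10), ([4, 3], [6, 4], [6, 3], 6)]
print("L(0.5) =", L(0.5, blocks, b=8))
```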
---
Integer problems with an **enormous number of variables** where **columns are described implicitly** as the incidence vectors of certain subsets of a set (tours, client subsets, etc.).
Reformulation can create these integer program formulations called **Master Problems**.
LP relaxation of Master Problems can be solved using the solution of the **Column Generation Subproblem(s)**.
When LP relaxation is not integral we must resort to enumeration which leads to **IP Column Generation** or **Branch-and-Price** algorithms.
Adding cutting planes gives rise to **Branch-Cut-and-Price**.
## 11.2 The Dantzig-Wolfe Reformulation of an IP
One subproblem $K=1$
$$Z_{IP}=\max\{cx : Ax\leq b, x\in X\}$$
$$X=\{x\in\mathbb{Z}_{+}^n : Dx\leq d\}$$
We can substitute $X$ with its **convex hull**:
$$Z_{IP}=\max\{cx : Ax\leq b, x\in\operatorname{conv}(X)\}$$
We can represent the convex hull using extreme points.
$$\operatorname{conv}(X)=\{x\in\mathbb{R}^n : x=\sum_{t=1}^T z^t \lambda_t, \sum_{t=1}^T \lambda_t=1, \lambda\in\mathbb{R}_+^T\}$$
where $\{z^t\}_{t=1}^T$ are the **extreme points** of the convex hull.
> Note: There are lots of extreme points.
Now, we obtain the **Master Problem** (denoted by **M**):
$$\begin{aligned}
Z_M=& \max \sum_{t=1}^T (c z^t)\lambda_t \\
& \sum_{t=1}^T (Az^t)\lambda_t \leq b \\
& \sum_{t=1}^T \lambda_t=1 \\
& \lambda\in\mathbb{R}_{+}^T \\
& x-\sum_{t=1}^T z^t\lambda_t=0 \\
& x\in\mathbb{Z}^n
\end{aligned}$$
Referred to as **convexification**.
## 11.3 Solving the LP Master Problem: Column Generation
Linear programming relaxation of the Master Problem (denoted by **LM**):
$$\begin{aligned}
Z_{LM}=& \max \sum_{t=1}^T (c z^t)\lambda_t \\
& \sum_{t=1}^T (Az^t)\lambda_t \leq b \\
& \sum_{t=1}^T \lambda_t=1 \\
& \lambda\in\mathbb{R}_{+}^T
\end{aligned}$$
> Note: Removes the variables $x$ and associated constraints.
---
Use a column generation algorithm to solve it.
$\{\pi_i\}_{i=1}^m$ denotes the **dual variables associated with the complicating constraints**.
$\pi_0$ denotes the **dual variable of the convexity constraint**.
We use **Primal Simplex**, choosing the column to enter by solving a largest-reduced-cost optimization problem over the set $X.$
> We can reoptimize with Primal Simplex easily after adding new columns to the simplex tableau.
> Open question: can an interior point method perform this reoptimization as easily?
---
*Initialization*: Assume we have extreme points $z^1,...,z^p$ available. We solve the **Restricted LP Master Problem**
$$\begin{aligned}
Z_{RLM}=& \max \sum_{t=1}^p (c z^t)\lambda_t \\
& \sum_{t=1}^p (Az^t)\lambda_t \leq b \\
& \sum_{t=1}^p \lambda_t=1 \\
& \lambda\in\mathbb{R}_{+}^p \\
We obtain a primal solution $\lambda^{*}$ and a dual solution $(\pi^{*},\pi_0^{*})\in\mathbb{R}_+^m\times\mathbb{R}^1.$
*Primal feasibility*: Any feasible solution of RLM is feasible for LM; in particular $\lambda^{*}$ is feasible (thus a primal bound).
*Optimality check for LM*: Check whether the dual solution is feasible for LM. Solve
$$\zeta^p=\max\{(c-\pi^{*}A)x-\pi_0^{*}:x\in X\}$$
with optimal solution $x^{*}.$
*Stopping criterion*: If $\zeta^p=0$, the solution is dual feasible and $\lambda^{*}$ is primal optimal.
*Generating a new column*: If $\zeta^p>0$
* Set the optimal solution as the new extreme point $z^{p+1}=x^{*}$
* Add the column $(cz^{p+1}, Az^{p+1}, 1)^T$
* This creates a new RLM which we can reoptimize easily using the Primal Simplex algorithm
*A Dual Upper Bound*: The algorithm terminates when $\zeta^p=0.$
*Alternative Stopping Criterion*: If $\zeta^p>0$ and $x^{*}$ satisfies the complicating constraints, then $x^{*}$ is an optimal solution of the LP.
*Finding Initial Feasible Solution*: ?
### Example 11.1
Initial problem with:
* Objective function:
$$cx=6x_1+7x_2+4x_3+3x_4+2x_5$$
* Complicating constraints:
$$Ax=5x_1+8x_2+6x_3+4x_4+2x_5\leq 8$$
* Subproblem constraints:
$$Dx=7x_1+8x_2+6x_3+3x_4+3x_5\leq 10$$
* Binary variables: $xβ\{0,1\}^5$
Restricted LP Master Problem with
$$\begin{aligned}
\max\, & 6\lambda_1+7\lambda_2+4\lambda_3 \\
& 5\lambda_1+8\lambda_2+6\lambda_3\leq 8 \\
& \lambda_1+\lambda_2+\lambda_3=1 \\
& \lambda\in\mathbb{R}_+^3
\end{aligned}$$
with initial extreme points $z^1=(1,0,0,0,0),$ $z^2=(0,1,0,0,0),$ and $z^3=(0,0,1,0,0)$ that satisfy the knapsack constraint.
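Below is a minimal, self-contained sketch (not from the book) of the column generation loop on these data. The restricted LP master is solved with SciPy's `linprog`; accessing the duals via `res.ineqlin.marginals` / `res.eqlin.marginals` assumes the HiGHS-based solver of SciPy ≥ 1.7. The pricing problem over $X$ is solved by brute-force enumeration of the 32 binary points.

```python
# Column generation sketch for Example 11.1: solve the RLM, read off the duals
# (pi*, pi0*), price over X, add the new column, and stop when zeta^p = 0.
from itertools import product
import numpy as np
from scipy.optimize import linprog

c = np.array([6, 7, 4, 3, 2])   # objective
A = np.array([5, 8, 6, 4, 2])   # complicating constraint  A @ x <= 8
D = np.array([7, 8, 6, 3, 3])   # subproblem constraint    D @ x <= 10

# X: all binary points satisfying the knapsack (subproblem) constraint
X = [np.array(x) for x in product([0, 1], repeat=5) if D @ np.array(x) <= 10]

Z = [np.eye(5, dtype=int)[i] for i in range(3)]   # initial extreme points z^1, z^2, z^3

for it in range(len(X)):                          # finite: each column is generated at most once
    # Restricted LP master: max sum_t (c z^t) lam_t  s.t.  sum_t (A z^t) lam_t <= 8,
    # sum_t lam_t = 1;  linprog minimises and uses lam >= 0 as its default bounds.
    res = linprog([-(c @ z) for z in Z],
                  A_ub=[[A @ z for z in Z]], b_ub=[8],
                  A_eq=[[1.0] * len(Z)], b_eq=[1.0])
    pi = -res.ineqlin.marginals[0]    # dual of the complicating constraint (pi >= 0)
    pi0 = -res.eqlin.marginals[0]     # dual of the convexity constraint
    # Pricing: zeta^p = max {(c - pi A) x - pi0 : x in X}, here by enumeration.
    x_star = max(X, key=lambda x: (c - pi * A) @ x - pi0)
    zeta = (c - pi * A) @ x_star - pi0
    print(f"iteration {it}: Z_RLM = {-res.fun:.2f}, zeta^p = {zeta:.2f}")
    if zeta <= 1e-9:                  # dual feasible for LM -> lambda* is optimal
        break
    Z.append(x_star)                  # add the column (c z^{p+1}, A z^{p+1}, 1)
```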

## 11.4 Solving the Master Problem: Branch-and-Price
If the solution $(\tilde{x},\tilde{\lambda})$ to the LP relaxation of the Master Problem is **not integer**, we need to resort to **branch-and-bound**, possibly with **cutting planes**.
At each node we must decide:
1. How to branch and
2. How to solve the linear programming Master Problem
### Branching at a Node
We can branch either on the variables $x$ or on $\lambda.$
Branching on $\lambda$ has undesirable properties.
> The alternative is to branch on some fractional $\lambda_t$ variable, fixing it to 0 and 1 respectively, see Figure 11.1b. Note, however, that on the branch in which $\lambda_t = 0$, just one column, corresponding to the solution $z^t$ of the subproblem, is excluded, so the resulting problem is almost identical to the original one. This means that the resulting enumeration tree has the undesirable property of being highly unbalanced. In addition, the new subproblems are not easy to solve.

It is better to branch on the variables $x$ whenever possible. This may not be sufficient in all cases.
### Solving the LP Master at a Node
### Example 11.2
Continue from example 11.1


### Example 11.3
Continue from example 11.1


Chapter 12: Benders' Algorithm
---
## 12.1 Introduction
Here we consider one approach for solving problems of the form
$$ Z = \max\{cx+hy: Fx+Gy\leq d, x\in X\subseteq\mathbb{Z}_+^n, y\in \mathbb{R}_+^p\} \quad \quad (12.1)$$
where in many cases $p$ is large, while $n$ is relatively small.
**Example: Capacitated facility location problem**
- facilities $j$, clients $i$
- capacities $d_j$, fixed costs $f_j$ for each facility
- demands $a_i$
- transportation costs $c_{ij}$
- $x_j$ usage of facility $j$
- $y_{ij}$ amount shipped from $j$ to $i$
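The formulation itself is not written out in these notes; a standard cost-minimising formulation with the data above (assuming $x_j$ is a 0-1 open/close decision) would be
$$\begin{aligned}
\min &\quad \sum_j f_j x_j + \sum_i \sum_j c_{ij} y_{ij} \\
\text{s.t.} &\quad \sum_j y_{ij} = a_i & \text{for all } i \\
&\quad \sum_i y_{ij} \leq d_j x_j & \text{for all } j \\
&\quad x_j \in \{0,1\}, \ y_{ij} \geq 0
\end{aligned}$$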

**Example: Extended formulation**
$Q = \{(x,y): Fx+Gy\leq d, x\in \mathbb{R}^n, y\in\mathbb{R}_+^p\}$ is an extended formulation for some set $P\subset\mathbb{R}^n$, i.e. $P = \operatorname{proj}_x(Q)$.
Then, $\max\{cx: x\in P\cap X\}$ can be replaced by $\max\{cx+0y: Fx+Gy\leq d, x\in X, y\in\mathbb{R}_+^p\}$, which is of the form (12.1).
Benders' approach is to decompose the problem so as to essentially solve it in the $x$-space. The underlying idea is that some of the variables are more "complicated" than others, for example the integer variables. Once these are set, solving the rest of the problem is relatively easy.
## 12.2 Benders' Reformulation
The main step in Benders' reformulation is to rewrite the original problem (12.1)
$$\begin{align*}
\max &\quad cx+hy \\
\text{s.t.} &\quad Fx+Gy\leq d \\
&\quad x\in X \\
&\quad y\in \mathbb{R}_+^p \\
\end{align*}$$
in terms of the $x$ variables:
$$\begin{align*}
\max &\quad cx+\phi(x) \\
\text{s.t.} &\quad x\in X \\
\end{align*}$$
where
$$\begin{align*}
\phi(x):= \max &\quad hy \\
\text{s.t.} &\quad Gy\leq d-Fx \\
&\quad y\in \mathbb{R}_+^p \\
\end{align*}$$
We assume for simplicity that $\phi(x)<\infty$ for all $x\in\mathbb{R}^n$. Dualizing the subproblem $\phi(x)$, we obtain the *dual separation problem (DSP)*
$$\begin{align*}
\phi(x):= \min &\quad u(d-Fx) \\
\text{s.t.} &\quad uG\geq h \\
&\quad u\in\mathbb{R}_+^m \\
\end{align*}$$
We can assume that this is feasible, and thus the set of feasible solutions $U = \{u\in\mathbb{R}_+^m: uG\geq h\}$ is nonempty and can be represented by a collection of extreme points $\{u^s\}_{s=1}^S$ and extreme rays $\{v^t\}_{t=1}^T$. We can now rewrite the dualized subproblem as
$$\begin{align*}
\phi(x):= \max &\quad \eta \\
\text{s.t.} &\quad v^t(d-Fx)\geq 0 & \text{for } t=1,\dots,T\\
&\quad u^s(d-Fx)\geq \eta & \text{for } s=1,\dots,S \\
&\quad \eta\in\mathbb{R}^1
\end{align*}$$
Substituting into the original problem, we obtain the Benders master problem:
$$\begin{align*}
\max &\quad cx + \eta \\
\text{s.t.} &\quad v^t(d-Fx)\geq 0 & \text{for } t=1,\dots,T\\
&\quad u^s(d-Fx)\geq \eta & \text{for } s=1,\dots,S \\
& x\in X, \eta\in\mathbb{R}^1
\end{align*}$$
This is a MIP with a very large number of constraints. With modern mixed-integer programming software, the natural way to solve such a problem is by branch-and-cut.
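A minimal sketch (not the book's code) of generating one optimality cut, under the text's simplifying assumption that $\phi(x)$ is finite for every $x$ (so no feasibility cuts / extreme rays are needed); the small numeric instance at the bottom is made up purely for illustration.

```python
# One Benders iteration: given a candidate x_hat, solve the dual subproblem
#   phi(x_hat) = min { u (d - F x_hat) : uG >= h, u >= 0 }
# and return the optimality cut  eta <= u* (d - F x)  for the master.
import numpy as np
from scipy.optimize import linprog

def benders_optimality_cut(F, G, d, h, x_hat):
    obj = d - F @ x_hat                       # coefficients of u in the dual objective
    res = linprog(obj, A_ub=-G.T, b_ub=-h)    # uG >= h  <=>  -G^T u <= -h; u >= 0 by default
    assert res.status == 0, "dual subproblem assumed feasible and bounded here"
    return res.fun, res.x

# tiny made-up instance: m = 2 constraints, n = 1 integer x, p = 2 continuous y
F = np.array([[1.0], [2.0]])
G = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([4.0, 6.0])
h = np.array([1.0, 1.0])

phi, u_star = benders_optimality_cut(F, G, d, h, x_hat=np.array([1.0]))
print("phi(x_hat) =", phi, " optimality cut:  eta <=", u_star, "@ (d - F x)")
```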








## 12.3 Benders' with Multiple Subproblems
In many applications, the MIP has block diagonal structure of the form

Benders' reformulations can now be viewed as

having $K$ distinct separation problems

in place of one large separation problem. The complete Benders' reformulation is now

Two examples of MIPs with such a block diagonal structure:


## 12.4 Solving the Linear Programming Subproblems
The solution of the dual of the LP separation problem does not always provide strong cuts. When the optimum is unbounded, the extreme ray that is found is somewhat arbitrary. Therefore, it is usual to add a normalization constraint so that the LP has a bounded optimal value.
Different normalizations such as $\sum_{i=1}^mu_i=1$ or $\sum_{i=1}^m w_iu_i=1$ have been suggested and tested, but some examples show that the resulting cuts can be weak.
Three approaches that can lead to improved performance of the algorithm:

**The In-Out Approach**
Here the idea is to modify the point to be cut off, typically by moving it nearer to a "central point" $(\eta^0, x^0)$ that could be the best solution found so far, or some point that is in (or close to) the feasible region. Thus, one replaces $(\eta^*, x^*)$ by a point
$(\hat{\eta},\hat{x}) = \alpha(\eta^*, x^*) + (1 - \alpha)(\eta^0, x^0)$ for some $0 < \alpha < 1$.
For feasibility, the motivation is that, if $\hat{x}$ is closer to the feasible region, then there are fewer inequalities $v^t(d - Fx) \geq 0$ cutting it off and thus there is a greater likelihood of generating an effective feasibility cut. For optimality cuts, the idea is similar. In the same vein, it has also been suggested to move the "central point" closer to the point $(\eta^*, x^*)$ after some iterations.

Chapter 13 - Primal Heuristics
---
## 13.1. Introduction
The basic idea in primal heuristics is that in at least some large problems, we have ways of obtaining feasible solutions faster than they would be found by the branch and bound algorithm.
Three reasons for using heuristics were given:
- Instance is too large (probably the case at least with some TSPs?)
- Branch-and-cut has trouble finding (good) feasible solutions
- We know something about the problem structure and can use that to our advantage
It is mentioned that there were some heuristics with worst-case guarantees in Section 6.7. For example, STSP heuristics that guarantee a solution within 2 or 1.5 times the (minimised) optimal value.
## 13.2. Greedy and Local Search
### 13.2.1. Greedy Search
We have an example of a reverse knapsack (why do you do this...).

The greedy heuristic explained:

In the beginning, no items/facilities are selected, the cost is zero and the problem is infeasible unless $b \le 0$.

From the remaining items/facilities, greedily choose the one with the best value/weight ratio. Couldn't we just have $j_t = \operatorname{argmin}_{j \in \{1,...,n\} \setminus S_{t-1}} \frac{c_j}{a_j}$?

The way this reverse knapsack greedy heuristic works is that we greedily pick items until we have a feasible set and the next item would make the solution worse (have a positive value $c_j$). $S^t$ is the solution set at time $t$, $S^G$ is the final greedy solution.

Basically return statements and setting up the next iteration of the loop.
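A simplified greedy sketch for this covering ("reverse") knapsack, using the $\operatorname{argmin} c_j/a_j$ rule raised in the question above rather than the book's exact rule; the data are made up.

```python
# Greedy for  min sum_j c_j x_j  s.t.  sum_j a_j x_j >= b,  x binary:
# repeatedly pick the remaining item with the best cost/weight ratio until
# the covering constraint is satisfied.
def greedy_cover(c, a, b):
    S, lhs = [], 0
    remaining = list(range(len(c)))
    while lhs < b and remaining:
        j = min(remaining, key=lambda j: c[j] / a[j])   # best value/weight ratio
        S.append(j)
        lhs += a[j]
        remaining.remove(j)
    return S if lhs >= b else None       # None: no feasible cover exists

c, a, b = [10, 6, 4, 7], [5, 4, 3, 6], 9
print("greedy solution S^G =", greedy_cover(c, a, b))
```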

Different heuristics for different problems. For STSP:
- Nearest neighbor: start from arbitrary node, go to the closest node, go from that node to the closest node (excluding those already visited)
- Pure greedy: choose edges in increasing weight order, skipping those that would create a subtour when added to the previously chosen set of edges
### 13.2.2. Local Search

An example of a composite objective function is coming up in the simulated annealing part.
Basic idea: start with a solution $S$, change it slightly (move to a new solution in the "neighborhood" $Q(S)$) to make it better. When you don't find better solutions in the neighborhood, the search terminates with a *local* optimum.
The choice of neighborhood depends on the problem. Generally, it should contain solutions that are close to the previous one. For example, add/remove one element from the set $S$ or do both at the same time. For STSP we have the well-known 2-Exchange heuristic (I think this is also called 2-opt). In this heuristic, you remove two edges from the tour and reconnect their endpoints the other way around, which reverses the path between them. Usually this helps most when the two removed edges cross each other.
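A small 2-exchange (2-opt) sketch along these lines (not the book's code; the 4-city distance matrix is made up):

```python
# 2-opt: remove edges (a,b) and (c,d) of the tour and reconnect as (a,c),(b,d)
# by reversing the segment in between; accept the move if it shortens the tour.
def two_opt(tour, dist):
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(n - 1):
            for j in range(i + 2, n - (i == 0)):   # skip pairs sharing an endpoint
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

dist = [[0, 2, 9, 3], [2, 0, 4, 8], [9, 4, 0, 5], [3, 8, 5, 0]]
print(two_opt([0, 2, 1, 3], dist))   # the crossing tour gets repaired to 0-1-2-3
```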


## 13.3. Improved Local Search Heuristics
Or: how to escape from a local optimum.
### 13.3.1. Tabu Search
When at a solution (perhaps a local optimum), pick the best solution in the neighborhood.
The problem: after escaping a local optimum this way, it would be tempting to go back to the previous solution as it was better.
Solution: forbid going back to the previous $k$ solutions (make them tabu).
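A tiny tabu-search skeleton matching this description; `neighbours` and `cost` are problem-specific callables supplied by the user (illustrative names), and the 1-D demo at the end is just for illustration.

```python
# Always move to the best non-tabu neighbour (even if worse than the current
# solution), keeping the last k visited solutions on a tabu list.
from collections import deque

def tabu_search(start, neighbours, cost, k=5, iters=50):
    current = best = start
    tabu = deque([start], maxlen=k)          # the k most recent solutions are forbidden
    for _ in range(iters):
        candidates = [s for s in neighbours(current) if s not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)  # best neighbour, possibly uphill
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best

# toy demo: minimise (x - 3)^2 over the integers with neighbours x - 1, x + 1
print(tabu_search(0, lambda x: [x - 1, x + 1], lambda x: (x - 3) ** 2))
```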



### 13.3.2. Simulated Annealing
Basic idea: pick a random solution in the neighborhood. If solution is better than current, we move there. If not, the probability of moving depends on the difference between the current value and new value, and the temperature of the system. Solution process starts with a "hot" system, which then cools down, reducing the probability of potentially bad moves.
The motivation behind this is that in the beginning we are mostly interested in finding the rough location of the best local optimum (the global optimum) and want to make big steps. Then we slowly cool the system down and there is less and less movement. This way, we reach the actual local optimum whose neighborhood we found in the earlier iterations.
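A minimal acceptance-loop sketch of this idea (the geometric cooling schedule, the callables and the toy 1-D problem are arbitrary illustrative choices, not the book's):

```python
# Accept improving moves always; accept worsening moves with probability
# exp(-delta / T), where T is cooled geometrically over the iterations.
import math
import random

def simulated_annealing(start, neighbour, cost, T=10.0, cooling=0.95, iters=500):
    current = best = start
    for _ in range(iters):
        candidate = neighbour(current)
        delta = cost(candidate) - cost(current)
        if delta <= 0 or random.random() < math.exp(-delta / T):
            current = candidate                   # move (always, if not worse)
            if cost(current) < cost(best):
                best = current
        T *= cooling                              # cool the system down
    return best

random.seed(0)
print(simulated_annealing(0, lambda x: x + random.choice([-1, 1]),
                          lambda x: (x - 7) ** 2))
```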




### 13.3.3. Genetic Algorithms

Interesting stuff, lots of stuff on YouTube at least.
## 13.4. Heuristics Inside MIP Solvers
### 13.4.1. Construction Heuristics
As the name suggests, the idea is to construct an integer feasible solution.





### 13.4.2. Improvement Heuristics
Instead of starting from an LP solution (or an arbitrary point), start from an integer feasible solution $x^* \in X$ and try to find better solutions.

At most $k$ variables may change their value (all variables binary).
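For binary variables and an incumbent $x^*$, this is presumably the local branching constraint, which bounds the Hamming distance to the incumbent:
$$\sum_{j:\, x_j^*=1}(1-x_j) + \sum_{j:\, x_j^*=0} x_j \leq k$$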

With as few changes as possible, improve the solution by at least $\delta$.

Solve an IP where the variables that are integer in the LP solution are fixed to those values.

Similar to RINS, but instead of the previous LP solution, we use a set of previous solutions to choose what variables are fixed.
## 13.5. User-Defined Heuristics
### 13.5.1. Relax-and-Fix


The variables are often broken into more than two subsets.
### 13.5.2. Large Neighborhood Search


### 13.5.3. Extended Formulation

### 13.5.4. Problem-specific
This includes anything where the user knows something about the problem structure that could be used to obtain (good) solutions. For example, the problem can be split into two parts that are easier to solve, and solving them separately provides a decent feasible solution.

