# Challenge no. 6: fast optimisation for constrained allocation
*Authors: Oisin Faust, Joerg Fliege, Justin Ward, Abdul-Lateef Haji-Ali*
Consider only observers/sensors on the ground.
Observers will be passive optical and radar.
Each sensor has a fixed, finite set of configurations. If we use binary variables for allocations, it is easiest to 'break up' sensors and allocations into separate sensors.
For a given observer on the ground, objects in low Earth orbit are above the horizon for roughly 15 minutes.
Assignment problem: assign sensors to configurations, assign some sensors to check for unknown objects, assign other sensors to groups of known objects.
Trajectories of known objects can be forecast by a separate code.
The next time a sensor observes an object, it can check the forecast against the actual location. Large differences mean that the object should have a higher priority for observation.
Other objects that should have a high priority for observation:
1. Objects close to burn up.
2. Objects on known collision courses or known close flybys.
What is the objective for optimisation?
Each object $i$ has an observation uncertainty $\sigma_i^2$. In each time step, compute a priority $p_i$ for object $i$, where $p_i > p_j$ means that object $i$ has higher priority than object $j$.
Potential objectives are
* $\min \sum_i \sigma_i^2$, the total (equivalently, average) uncertainty over objects $i$. This suffers from the type of poor trade-off mentioned in the problem description: it might be optimal under this objective to achieve very low uncertainty for a few objects at the cost of high uncertainty for many others. For example, $\sigma^2 = (0.1, 0.1, 5)$ has a smaller sum than $(2, 2, 2)$ but a far worse worst case.
* $\min \max \sigma_i^2$, which is the worst uncertainty over any object $i$.
* $\max \sum_i p_i \sum_k a_{i, k}$ where $a_{i, k} = 1$ if sensor $k$ is allocated to object $i$ and $a_{i, k} = 0$ else.
* $\min_{i,t} \sigma_{i,t}^2$, a time-dependent variant.
Computation loop:
1. Observe objects.
2. Compute new forecasts.
3. Compare observations with old forecasts.
4. Update uncertainty estimates.
5. Update priority parameters.
6. Solve optimisation problem.
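A self-contained toy version of this loop in Python may help fix ideas; every modelling choice below (random-walk dynamics, persistence forecasts, a trivial "optimiser" that points sensors at the highest-priority objects) is an illustrative placeholder, not part of the proposed method:

```python
import random

def run_loop(n_objects=5, n_sensors=2, n_steps=3):
    state = [random.gauss(0, 1) for _ in range(n_objects)]  # true positions
    forecast = list(state)                                  # old forecasts
    sigma2 = [1.0] * n_objects                              # uncertainties
    for t in range(n_steps):
        # 1. Observe objects (noisy measurements of the true state).
        obs = [s + random.gauss(0, 0.1) for s in state]
        # 2. Compute new forecasts (here: a trivial persistence forecast).
        new_forecast = list(obs)
        # 3. Compare observations with old forecasts.
        residual = [abs(o - f) for o, f in zip(obs, forecast)]
        # 4./5. Update uncertainty estimates and priority parameters.
        sigma2 = [0.5 * s2 + r for s2, r in zip(sigma2, residual)]
        priority = sorted(range(n_objects), key=lambda i: -sigma2[i])
        # 6. Solve the (here trivial) allocation problem.
        allocation = priority[:n_sensors]
        print(f"t={t}: sensors observe objects {allocation}")
        forecast = new_forecast
        state = [s + random.gauss(0, 0.2) for s in state]   # world evolves

run_loop()
```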
General MILP formulation:
Index sets:
* $I$: set of sensors.
* $J$: set of objects/tasks.
* For each $i \in I$, a partition $J_1 (i), \ldots, J_{m_i} (i)$ of $J$. Sensor $i$ can take on $m_i$ tasks simultaneously, but for each $k = 1, \ldots, m_i$, only one task from $J_k (i)$ can be taken on.
* $I_1, \ldots, I_m \subseteq I$: of the sensors in a set $I_k$, only one can be used at any given time.
* For each $\ell \in I$, a set $I(\ell) \subseteq I$ and a set $J(\ell) \subseteq J$: if sensor $\ell$ is used, the sensors from the set $I(\ell)$ cannot be used for tasks from $J(\ell)$.
Parameters: $r_{i, j}$ reward for allocating sensor $i$ to task $j$.
Decision variables: $x_{i, j}$ with $x_{i, j} = 1$ if sensor $i$ is allocated to task $j$ and $x_{i, j} = 0$ else.
Objective:
$$
\max_{x_{i, j}} \sum_{i, j} r_{i, j} x_{i, j}
$$
Constraints:
$$
x_{i, j} \in \{ 0, 1 \} .
$$
$$
\sum_{j \in J_k (i)} x_{i, j} \leq 1 \qquad \forall i \in I, \, \forall k = 1, \ldots, m_i.
$$
(At most one task $j \in J_k (i)$ can be taken on by sensor $i$; in particular, sensor $i$ can take on at most $m_i$ tasks simultaneously.)
$$
\sum_{i \in I_k} x_{i, j} \leq 1 \qquad \forall j, \forall k .
$$
(Of the sensors in the set $I_k$, only one can operate on any given task $j$. Note this is per task; enforcing that only one sensor from $I_k$ operates at all, as described above, would require auxiliary indicators $y_i \in \{0,1\}$ with $x_{i,j} \leq y_i$ and $\sum_{i \in I_k} y_i \leq 1$.)
$$
x_{i, j} \leq 1 - x_{\ell, j} \qquad \forall j \in J(\ell), \; \forall i \in I(\ell) .
$$
(If sensor $\ell$ is used for task $j$, then sensor $i$ cannot be used for task $j$. If using sensor $\ell$ for *any* task should block the sensors in $I(\ell)$, as in the description of $I(\ell)$ above, replace $x_{\ell, j}$ by $\sum_{j' \in J} x_{\ell, j'}$.)
If there is an "extra reward" if task $j$ is assigned by two sensors, then add a term
$$
\sum_{i_1, i_2 \in I} \hat{r}_{i_1, i_2, j} x_{i_1,j} x_{i_2, j}
$$
to the objective.
This term can be linearised into
$$
\sum_{i_1, i_2 \in I} \hat{r}_{i_1, i_2, j} z_{i_1, i_2, j}
$$
with additional binary variables $z_{i_1, i_2, j}$ and the additional constraints
$$
z_{i_1, i_2, j} \leq x_{i_1, j}, \qquad
z_{i_1, i_2, j} \leq x_{i_2, j}, \qquad
z_{i_1, i_2, j} \geq x_{i_1, j} + x_{i_2, j} - 1 .
$$
(The first two constraints ensure the extra reward is only collected when both sensors are assigned; the third is only needed when $\hat{r}_{i_1, i_2, j} < 0$. Together they force $z_{i_1, i_2, j} = x_{i_1, j} x_{i_2, j}$.)
The case of assigning more than two sensors to a task can be handled with an analogous approach.
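As an illustration, here is a minimal sketch of a small instance of this MILP in Python with PuLP, including one linearised pairwise bonus. This is *not* the model that produced the numerical results below; the rewards, the choice of partition, and the bonus term are random placeholders:

```python
import random
import pulp

n_sensors, n_tasks = 5, 20
sensors, tasks = range(n_sensors), range(n_tasks)
r = {(i, j): random.random() for i in sensors for j in tasks}

prob = pulp.LpProblem("sensor_allocation", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (sensors, tasks), cat="Binary")

# Hypothetical pairwise bonus r_hat = 0.5 when sensors 0 and 1 both
# observe task 0, linearised via an auxiliary binary variable z.
z = pulp.LpVariable("z_0_1_0", cat="Binary")
prob += (
    pulp.lpSum(r[i, j] * x[i][j] for i in sensors for j in tasks) + 0.5 * z
)
prob += z <= x[0][0]
prob += z <= x[1][0]

# Example partition constraint: each sensor can take on m_i = 2 tasks,
# at most one from each half of the task set.
halves = [range(0, n_tasks // 2), range(n_tasks // 2, n_tasks)]
for i in sensors:
    for J_k in halves:
        prob += pulp.lpSum(x[i][j] for j in J_k) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("objective:", pulp.value(prob.objective))
```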
Very preliminary numerical results on a random problem with $n=100$ sensors and $m=9000$ objects/tasks:
```
ilogcp 12.10.0: Version identifier: 12.10.0.0 | 2019-11-26 | 843d4de2ae
CPXPARAM_MIP_Display 0
ilogcp 12.10.0: optimal solution
0 nodes, 100 iterations, objective 899887.5017
ampl: display _solve_elapsed_time;
_solve_elapsed_time = 12.593
```
```
Gurobi 10.0.0: Gurobi 10.0.0: optimal solution; objective 899887.5017
0 simplex iterations
1 branching nodes
ampl: display _solve_elapsed_time;
_solve_elapsed_time = 10.109
```
(On a Dell with i5-7300U CPU @ 2.60GHz)
## A combinatorial approach
Suppose we have:
* a set $S$ of sensors
* a set $T$ of tasks.
We let $A = S \times T$ be the set of *assignments* of sensors to tasks. We now want to choose some subset $C \subseteq A$ that maximises our reward.
### Restricting the set of assignments with constraints
Note that we are not assuming anything about the structure of $C$ at this point---this will be part of the constraints we introduce into our problem. For example, we might require that $C$ satisfy the property that no task $j$ appear in more than one assignment, or that no sensor $i$ receive more assignments than some given budget $b_i$, or both. These constraints can be modelled, respectively, as a binary partition matroid on $A$, a general partition matroid on $A$, or a $b$-matching constraint on $A$.
It remains to describe how our reward for an assignment $C \subseteq A$ is calculated. In the simplest setting, our reward is a linear function over assignments $A$. Formally, for each $i \in S$ and $j \in T$ we have a reward $r_{i,j}$, and the total value of $C \subseteq A$ is given by $f(C) = \sum_{(i,j) \in C} r_{i,j}$. In this setting, there are efficient exact and approximate combinatorial algorithms for several classes of constraints, and even for very complicated constraints we can "typically" solve the problem exactly via mixed integer programming.
Note that the Generalised Assignment Problem can be formulated in the above language as follows: for each $(i,j) \in A$ we have a cost $c_{i,j}$ of assigning sensor $i$ to task $j$. We also have a budget $b_i$ for each sensor. Then, our constraint on assignments is just a knapsack constraint on each sensor $i$:
$$\sum_{j : (i,j) \in C} c_{i,j} \leq b_i,$$
and our objective is a linear objective of the sort described above.
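For concreteness, here is a small sketch of a feasibility check for these per-sensor knapsack constraints; the costs `c` and budgets `b` below are illustrative placeholders:

```python
# Check that an assignment C (a set of (sensor, task) pairs) respects
# the per-sensor knapsack constraints of the GAP.
def gap_feasible(C, c, b):
    spent = {}
    for i, j in C:
        spent[i] = spent.get(i, 0.0) + c[i, j]
    return all(spent.get(i, 0.0) <= b[i] for i in b)

c = {(0, 0): 2.0, (0, 1): 3.0, (1, 0): 1.0}   # cost of assigning i to j
b = {0: 4.0, 1: 2.0}                           # budget of each sensor
print(gap_feasible({(0, 0), (1, 0)}, c, b))    # True: 2.0 <= 4.0, 1.0 <= 2.0
```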
### Modelling more complex interactions between assignments
A natural generalisation would be to consider non-linear reward functions $f$ that have a diminishing returns property. Formally, for any $C \subseteq D \subseteq A$ and any $(i,j) \not\in D$ we suppose that:
$$
f(C \cup \{(i,j)\}) - f(C) \geq f(D \cup \{(i,j)\}) - f(D).
$$
In words, the above can be stated as follows: suppose we have two assignments $C, D$ of sensors to tasks and $C$ is a sub-assignment of $D$ (in the sense that $D$ makes all the same assignments that $C$ does, and possibly some additional assignments). Then, for any additional assignment $(i,j)$ not appearing in $C$ or $D$, the gain in reward that we realise from including $(i,j)$ in $D$ can never be larger than the gain we realise from including $(i,j)$ in the smaller assignment $C$. Thus, the marginal reward we realise for including any additional assignment *diminishes* as we build a larger and larger set of assignments. This property is called *submodularity* and is equivalent to the more succinct statement that for all $C, D \subseteq A$:
$$
f(C) + f(D) \geq f(C \cup D) + f(C \cap D).
$$
Submodular optimisation problems are typically NP-hard, but for many of the same families of constraints discussed above, there are still fast, combinatorial algorithms that return solutions no more than a constant factor worse than the optimal solution.
The requirement of submodularity is natural in several settings: for example, entropy is submodular, and so are rewards based on information, such as conditional reduction in entropy or maximisation of mutual information. Coverage-based objectives, in which we gain a reward only for the first or best sensor that is assigned to a task, are also submodular. In general, if we suppose that multiple assignments "interfere" with each other (in the sense that they are together worth *at most* the sum of their individual rewards), then our objective is submodular; but if they may "reinforce" one another (so that they are together worth *more than* the sum of their individual rewards), then our objective is not submodular.
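As a sketch of what such a combinatorial algorithm looks like, here is the classic greedy for maximising a monotone submodular function under a cardinality budget (it achieves a $1 - 1/e$ approximation in that setting, and $1/2$ under a general matroid constraint); the coverage objective below is a placeholder example:

```python
# Greedy maximisation of a monotone submodular f over assignments,
# subject to a simple cardinality budget on the chosen set.
def greedy(assignments, f, budget):
    chosen = set()
    while len(chosen) < budget:
        # Pick the assignment with the largest marginal gain.
        best, best_gain = None, 0.0
        for a in assignments - chosen:
            gain = f(chosen | {a}) - f(chosen)
            if gain > best_gain:
                best, best_gain = a, gain
        if best is None:  # no remaining assignment adds positive value
            break
        chosen.add(best)
    return chosen

# Coverage objective: reward 1 for each task covered by some sensor.
def coverage(C):
    return len({j for _, j in C})

A = {(i, j) for i in range(3) for j in range(4)}  # sensors x tasks
print(greedy(A, coverage, budget=3))  # covers 3 distinct tasks
```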
## Bayesian model
Let $t = 0,1,\ldots$ be a sequence of discrete time steps. As before, let $S$ represent the set of sensors and $T$ the set of tasks. Depending on the exact setting, we might regard the elements of $T$ as configurations or as objects to track; the general notion is that each sensor can in some way be assigned an element of $T$ as a "task".
For each sensor $i \in S$, task $j \in T$, and time step $t \in \mathbb{Z}_+$ we denote by $r_{i,j,t}$ the _reward_ that we gain when assigning $i$ to $j$ at time $t$. Concretely, $r_{i,j,t}$ may be a reduction in entropy, gain in information, etc. We will adopt a Bayesian approach in which each $r_{i,j,t}$ is in fact a random variable and we maintain a distribution on these across time steps.
Formally, let $t$ now denote some fixed time step and suppose that we have a prior distribution on the values the rewards $r_{i,j,t}$ take at this step. We then solve a constrained allocation problem (using for example either the MILP or combinatorial optimisation approaches discussed here) to maximise the rewards. Then, we configure our sensors and obtain some data $D$ about the world. This data can then be used to obtain a posterior distribution on rewards $r_{i,j,t+1}$ in the next time step. We then iterate this approach using $r_{i,j,t+1}$ as the "prior" in the next step.
Note that here we are encoding our belief about the world entirely in terms of the reward variables $r_{i,j,t}$. Depending on the exact setting, it may be beneficial to instead directly model some latent state of the world: for example, a position and velocity for each object, or a transmission frequency for each object, etc. Our rewards would then be calculated from this underlying state of the world. For now, we remain essentially model agnostic and suppose only that there is some way to use observations $D$ to update rewards.
Additionally, our single-step optimisation problem remains a general combinatorial optimisation problem. Specifically, the Bayesian model is used only to compute expected rewards at each step. The set of feasible task allocations is thus essentially independent of our observations and our underlying model of the world. There are two general ways to improve on this, depending on our exact setting and goals. If we have access to some state of the world, we could incorporate this directly to generate a new set of physical constraints at each time step, given our observations. On the other hand, if some sensor configurations are infeasible for some world states, we could capture this by simply setting the corresponding rewards to 0 in our world model. This has the advantage of keeping the world decoupled from our optimisation problem: the underlying physics and geometry enter our optimisation only indirectly via the computation of rewards $r_{i,j,t}$, and all constraints in our problem stem directly from operational concerns. On the other hand, in some settings it may be that retaining the physics and geometry of the problem gives the rewards $r_{i,j,t}$ some additional structure that can be used to develop a more efficient optimisation routine.
The above approach is still myopic, in that it greedily selects the best rewards at each time step.
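As one concrete (hypothetical) instance of the prior-to-posterior step described above: if the belief about a single reward is Gaussian with known observation noise, the posterior is given by the standard conjugate Normal update. The function and all numbers below are illustrative, not part of the proposed model:

```python
# Conjugate-Gaussian update for a single reward belief: prior N(mu, s2)
# on the mean reward, observation y with noise variance obs_var.
def gaussian_update(mu, s2, y, obs_var):
    precision = 1.0 / s2 + 1.0 / obs_var   # precisions add
    post_s2 = 1.0 / precision
    post_mu = post_s2 * (mu / s2 + y / obs_var)
    return post_mu, post_s2

mu, s2 = 0.0, 1.0                                    # prior on r_{i,j,t}
mu, s2 = gaussian_update(mu, s2, y=0.8, obs_var=0.25)
print(mu, s2)  # posterior mean moves towards the observation
```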
## First principles
Consider a set of space objects $i\in[N]$, sensors $k\in[K]$, and possible sensor configurations $p\in[P]$. We will maintain a point estimate $x_{i,t}$ for the state (position and velocity) of object $i$ at time $t$. We will also maintain an estimate $\sigma^2_{i,t}$ of our uncertainty for object $i$. The uncertainty should properly be a covariance matrix, but for simplicity we imagine that it can be represented well by a positive scalar. If we don't observe object $i$ at all at time $t$, then the dynamics are
$$
x_{i,t+1}\gets Lx_{i,t},\quad \sigma^2_{i,t+1}\gets\sigma^2_{i,t}+\epsilon^2,
$$
where $L$ is a linear operator and $\epsilon^2$ is the variance of the error incurred in integrating the equations of motion.
If we observe $i$ with sensor $k$, then the update for $x_{i,t+1}$ will incorporate this measurement, and the uncertainty updates as
$$
\frac{1}{\sigma^2_{i,t+1}}\gets\frac1{\sigma^2_{i,t}+\epsilon^2} + \frac1{\rho_k^2}
$$
where $\rho_k^2$ is the characteristic error of sensor $k$.
Let $z_{k,p,t}$ be an array of $\{0,1\}$ decision variables indicating whether, at time $t$, sensor $k$ is in configuration $p$. Then,
$$
\frac{1}{\sigma^2_{i,t+1}}\gets\frac1{\sigma^2_{i,t}+\epsilon^2} + \sum_{k,p}z_{k,p,t}C_{k,p}(x_{i,t})\frac1{\rho_k^2}.
$$
Here, the term $C_{k,p}(x_{i,t})$ deserves some explanation. The function $C_{k,p}$ is the indicator function for all spatial positions visible to sensor $k$ when in configuration $p$.
In summary,
- $x_{i,t}$: point estimate for object $i$ at time $t$
- $\sigma^2_{i,t}$: uncertainty associated to the estimate $x_{i,t}$
- $C_{k,p}(x_{i,t})$: 0/1 encodes whether object $i$ is visible to sensor $k$ when in configuration $p$
- $z_{k,p,t}$: 0/1 encodes configuration of sensor $k$ at time $t$
- $\rho_k$: error of sensor $k$
- $\epsilon^2$: error in updating the dynamics.
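A direct NumPy transcription of the update rule above, with placeholder data, may make these quantities concrete:

```python
import numpy as np

def update_sigma2(sigma2_i, eps2, z, C_i, rho2):
    """One-step uncertainty update for object i.

    z[k, p]   = z_{k,p,t}, the 0/1 configuration choices
    C_i[k, p] = C_{k,p}(x_{i,t}), the visibility indicators
    rho2[k]   = rho_k^2, the characteristic sensor error variances
    """
    precision = 1.0 / (sigma2_i + eps2) + np.sum(z * C_i / rho2[:, None])
    return 1.0 / precision

K, P = 3, 2
z = np.zeros((K, P)); z[0, 1] = 1      # sensor 0 in configuration 1
C_i = np.ones((K, P))                   # object visible to every sensor
rho2 = np.array([0.1, 0.2, 0.3])
print(update_sigma2(sigma2_i=1.0, eps2=0.01, z=z, C_i=C_i, rho2=rho2))
```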
We first focus on the one-step look ahead problem. We seek to minimise the total uncertainty (suppressing dependence on $t$ in the notation):
\begin{align*}
\min_{z_{k,p}\in\{0,1\}^{K\times P}} \sum_i \Bigl(\underbrace{\frac1{\sigma^2_{i}+\epsilon^2} + \sum_{k,p}z_{k,p}C_{k,p}(x_{i})\frac1{\rho_k^2}}_{\sigma^{-2}_{i,t+1}}\Bigr)^{-1}\qquad\text{ s.t. } \quad \sum_p z_{k,p}\le1\;\forall k.
\end{align*}
Here, $x_{i}$ and $\sigma^2_{i}$ are known in advance, so the objective has the form $f(z)=\sum_i\frac1{b_i+a_i^\top z}$ with $b_i > 0$ and $a_i \geq 0$, which is convex; however, there are integer constraints. Nevertheless, we can feed this problem to a mixed-integer conic program solver.
Note that this is not a generalised assignment problem, since the objective is nonlinear. It can be relaxed into GAP form by linearising the objective, but this would need some care.
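A sketch of this one-step problem in cvxpy, with random placeholder data: `inv_pos` of an affine expression keeps the objective DCP-convex, and solving the boolean problem requires a mixed-integer conic-capable solver (e.g. MOSEK or SCIP) to be installed:

```python
import numpy as np
import cvxpy as cp

K, P, N = 4, 3, 10
rng = np.random.default_rng(0)
rho2 = rng.uniform(0.1, 1.0, size=K)     # sensor error variances rho_k^2
C = rng.integers(0, 2, size=(N, K, P))   # visibility indicators C_{k,p}(x_i)
sigma2 = rng.uniform(0.5, 2.0, size=N)   # current uncertainties sigma_i^2
eps2 = 0.01                              # integration error epsilon^2

z = cp.Variable((K, P), boolean=True)
b = 1.0 / (sigma2 + eps2)                # prior precisions b_i
# Post-observation precision of object i: b_i + <a_i, z>.
precisions = [
    b[i] + cp.sum(cp.multiply(C[i] / rho2[:, None], z)) for i in range(N)
]
objective = cp.Minimize(sum(cp.inv_pos(p) for p in precisions))
constraints = [cp.sum(z, axis=1) <= 1]   # one configuration per sensor
prob = cp.Problem(objective, constraints)
prob.solve()  # needs a mixed-integer conic solver on the backend
print(prob.value)
```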
### Non-myopic problem
A possible extension, planning over a horizon of $T$ time steps, is
\begin{align*}
\min_{z_{k,p,t}\in\{0,1\}^{K\times P\times T}} \quad\max_{t=0,\dots,T-1} \quad\sum_i \sigma^{2}_{i,t+1}\quad\text{ s.t. } \quad &\sum_p z_{k,p,t}\le1\; \forall\;k,t,\\
&\sigma^{-2}_{i,t+1}=(\sigma^{2}_{i,t}+\epsilon^2)^{-1} + \sum_{k,p}z_{k,p,t}C_{k,p}(\mathbb{E}[x_{i,t+1} \, | \, x_{i,0}])\rho_k^{-2}\;\forall\;i,t.
\end{align*}
This formulation is quite unpleasant, since $\sigma^{-2}_{i,t+1}$ is no longer linear in the decision variables $z_{k,p,t}$ for $t \geq 1$.
#### Task:
Find a tractable reformulation or relaxation of the above non-myopic problem.
### Further clarifications:
Do we make predictions of what reward we would get? Yes.
Do we know the current positions? As distributions: we have a prior.
What are the rewards? They are expected rewards. Examples: sum of reductions in uncertainty, sum of information gains.
Is the action space discrete? This comes from the idea of discrete Markov decision processes. The action space can be seen as continuous, but the reward is often discrete 0/1 (see the target or not). A method that can handle such "discontinuous" effects would be useful.
We don't want to lose the ability to observe two things at once.
Objects can be grouped together, which makes certain assignments possible as group assignments.
Don't think too much about specific sensors (e.g. specific waveforms for radar).
Is the budget per sensor an operational cost? Yes. Some objects can be more expensive to track; this needs to be considered.
Is changing configuration instantaneous? Depends on the scenario; it is usually not a free lunch. Think of this as additional path constraints over longer time horizons (get out of the myopic view).