# Bluebird: Optimisation agent design notes
In the optimisation agent we will represent an aircraft's route through
space as a series of control points, with the first and last control points fixed
to the locations (latitude, longitude, flight level) and times at which the aircraft
enters and exits the sector, respectively. Fixing the last control point's location
and exit time means that the plane will always meet its coordination objective.

The above image shows two fixed control points (black circles) and two
optimisable control points (white circles). The plane enters the sector at the
left-hand control point and exits at the right-hand control point.
## Representation
The fixed starting control point exists at time $t=0$ and the ending control point
at time $t=1$, with time normalised such that $t \in [0, 1]$. Each control point's
time represents a "time of arrival at this location", so consecutive control points
must have strictly monotonically increasing times; deciding on a representation to
optimise that guarantees this is non-trivial.
One way to address this problem is to optimise the parameters of a curve which defines
the relative times between control points. The
[incomplete beta function](https://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function)
([scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.betainc.html))
gives us a way to map equidistant time points, $t_1$ and $t_2$ in the left-hand plot below,
to non-equidistant time points $t'_1$ and $t'_2$ through the parameterisable curve $I(a,b,t)$,
whose parameters $a, b$ control the shape of the curve. This function was chosen
because it has useful properties, such as mapping $[0, 1]$ to $[0, 1]$, and it produces
"nice"-looking time curves (right).
<img src="https://i.imgur.com/M3KfMsn.jpg" width="50%"><img src="https://i.imgur.com/wCp1oKz.png" width="50%" >
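As a sketch, this time warping can be done with scipy's `betainc`, which computes the *regularised* incomplete beta function (hence the $[0, 1] \to [0, 1]$ mapping); the parameter values and the `warp_times` helper below are illustrative:

```python
import numpy as np
from scipy.special import betainc

def warp_times(n_interior: int, a: float, b: float) -> np.ndarray:
    """Map equidistant times (including the fixed endpoints t=0 and t=1)
    to warped times t'_i via the regularised incomplete beta function."""
    t = np.linspace(0.0, 1.0, n_interior + 2)  # endpoints plus interior points
    return betainc(a, b, t)

# Example: two optimisable interior control points, skewed early by (a, b).
warped = warp_times(2, a=2.0, b=5.0)
```

Because $I(a, b, 0) = 0$, $I(a, b, 1) = 1$, and the curve is strictly increasing on $[0, 1]$, the fixed endpoints stay put and the strict monotonicity requirement on control-point times is satisfied for free.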
The $i$-th control point can be defined as a vector
$\mathbf{c}_i = [c_i^{(1)}, c_i^{(2)}, c_i^{(3)}]$,
where the three elements correspond to a $\{lat, lon, fl\}$ location. Therefore, we can
define the set of $N$ control points to optimise as
$$
\begin{align}
\mathbf{x} & = [a, b, \mathbf{c}_1, \dots, \mathbf{c}_N] \\
& = [a, b, c_1^{(1)}, c_1^{(2)}, c_1^{(3)}, \dots, c_N^{(1)}, c_N^{(2)}, c_N^{(3)}],
\end{align}
$$
i.e. a vector of length $3N + 2$, where $a, b$ are the parameters of the incomplete beta function.
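The packing described by this equation can be sketched with NumPy; the `pack`/`unpack` helper names are illustrative, not part of any existing API:

```python
import numpy as np

def pack(a: float, b: float, control_points: np.ndarray) -> np.ndarray:
    """Flatten (a, b) and an (N, 3) array of (lat, lon, fl) control points
    into a single solution vector x of length 3N + 2."""
    return np.concatenate(([a, b], control_points.ravel()))

def unpack(x: np.ndarray):
    """Invert pack(): recover a, b and the (N, 3) control-point array."""
    return x[0], x[1], x[2:].reshape(-1, 3)

# Two control points -> a vector of length 3 * 2 + 2 = 8.
x = pack(2.0, 5.0, np.array([[51.5, -0.1, 350.0],
                             [51.6,  0.2, 360.0]]))
```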
## Evaluation pipeline
The rough pipeline that needs to be carried out to *evaluate* a set of control points
is as follows:
```flow
soln=>start: Solution
tc=>operation: Trajectory Converter
traj=>start: Trajectory
evaluate=>operation: Evaluate Trajectory
reward=>end: Reward
soln(right)->tc(right)->traj(right)->evaluate(right)->reward
```
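The pipeline above amounts to a simple function composition (the function names here are illustrative placeholders for the components described below):

```python
def evaluate_solution(x, trajectory_converter, evaluate_trajectory) -> float:
    """Solution -> Trajectory Converter -> Trajectory -> Evaluate -> Reward."""
    trajectory = trajectory_converter(x)    # Solution -> Trajectory
    return evaluate_trajectory(trajectory)  # Trajectory -> Reward (or loss)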
### Trajectory Converter
This needs to convert a solution $\mathbf{x}$ into a `Trajectory`. This could be
as simple as linear interpolation between the control points, at speed(s) defined
by the *time* the aircraft needs to arrive at each one (Richard's implementation),
or something more complex, such as performing the "preferred flight level"-style
aircraft movement of Tim's representation.
We will, eventually, want a converter that runs the starling simulator with a given
pilot (the Linear pilot is roughly Richard's linear interpolation above). This will
involve using the (upcoming) `EnvironmentManager` loop to generate the trajectories.
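A minimal sketch of the linear-interpolation converter, assuming a trajectory is just a time-stamped array of positions (the `linear_trajectory` helper and the sampling scheme are my assumptions, not Richard's actual implementation):

```python
import numpy as np

def linear_trajectory(control_points: np.ndarray,
                      arrival_times: np.ndarray,
                      n_samples: int = 50) -> np.ndarray:
    """Interpolate each coordinate independently against the arrival times.

    control_points: shape (N + 2, 3), including the fixed entry/exit points.
    arrival_times: shape (N + 2,), strictly increasing on [0, 1].
    Returns an array of shape (n_samples, 4) with columns (t, lat, lon, fl).
    """
    t = np.linspace(0.0, 1.0, n_samples)
    coords = [np.interp(t, arrival_times, control_points[:, d]) for d in range(3)]
    return np.column_stack([t] + coords)
```

Because the interpolation is piecewise linear in (normalised) time, the implied speed between two control points is constant and determined entirely by the warped arrival times.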
### Trajectory Evaluation
Chuck the trajectories into the objectives and combine them via some predefined weighting to
get a final loss (negative reward).
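For instance, a weighted sum over a list of objective callables (the signature and the idea that lower is better are assumptions):

```python
def combined_loss(trajectory, objectives, weights) -> float:
    """Combine per-objective losses into one scalar loss (negative reward).

    objectives: callables Trajectory -> float, lower is better.
    weights: predefined weighting, one per objective.
    """
    return float(sum(w * obj(trajectory) for obj, w in zip(objectives, weights)))
```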
## Class Outline
The following is (barely) pseudo-code (more rambling) of what the various components are and how they *could* work.
```python
class OptimisationAgent(Agent):
    # Things the agent will need to set up:
    # - reward(s) and their weights -- or this could be the wrapped reward,
    #   so it is one reward only
    # - the number N of control points used (excluding the first/last) to
    #   represent a path
    # - settings of the optimiser (to be passed into it)
    # - other things I may not have thought of!
    def __init__(self, ..., settings: Dict[str, Any]):
        ...

    # Take in an observation of the environment and return a list of actions
    # telling the aircraft what, if anything, to do.
    def generate_actions(self, environment: Environment) -> List[Action]:
        # # # # optimisation stuff:
        # - decide if any aircraft we're allowed to tell what to do need optimising
        # - if so, call perform_optimisation with the list of callsigns,
        #   which gives back an optimal solution x*
        # - convert x* into a list of actions that need to be performed at
        #   specific times
        # - add this to the "actions to be performed" list/queue/??
        #
        # # # # always happens:
        # - check if the timestamp of the environment matches (or is past?/close
        #   to?) a time at which action(s) need to be performed
        # - if so, return a list of the actions that need to be performed and
        #   update our queue
        ...

    def perform_optimisation(self, environment: Environment, callsigns: List[str]) -> np.ndarray:
        # - create the things needed to perform the optimisation:
        #   - the function that needs to be called, i.e. f(x) where |x| = M(3N + 2)
        #   - the lower/upper bounds (lb/ub) on x,
        #     i.e. min/max values of lat/lon/fl/(speed/time???)
        #   - there may be more
        # - pass these to self.optimise(f, lb, ub) and get back an optimised
        #   solution x*. Note that this function would be algorithm(Agent)-specific.
        ...

    def identify_relevant_aircraft(self, environment: Environment) -> List[str]:
        # (needs a better name, "relevant" means something -- i.e. relevant traffic)
        # Take in the environment and return a list of the callsigns of the
        # aircraft that we're going to optimise. This can be empty.
        # This may be a 'strategy' kind of thing, as there is presumably no
        # 'right' answer. I think we'd want to start with the naive strategy of
        # identifying the aircraft that we haven't optimised yet and ignoring
        # those that we have. This will change later when we (probably) find out
        # that it isn't sufficient.
        ...

    def create_bounds(self, environment: Environment, N: int, M: int) -> Tuple[np.ndarray, np.ndarray]:
        # We need the environment here to get the upper/lower bounds on the
        # lat/lon/fl of the airspace; the minimum value of, for example, the fl
        # should be the lowest fl in the sector (i.e. we take the smallest
        # enclosing cuboid as the bounds on the parameters).
        # - create a lower bound array, same length as x, i.e. M(3N + 2)
        # - create an upper bound array, same length as x, i.e. M(3N + 2)
        # - return lb, ub
        ...

    def create_fitness_function(self, aircraft_names: List[str], environment: Environment) -> Callable:
        # Return a function that evaluates the quality of a solution.
        ...

    def optimise(self, fitness_function: Callable, lb: np.ndarray, ub: np.ndarray, settings: Dict[str, Any]) -> np.ndarray:
        # This will be the function that researchers will mainly focus on
        # writing. For the time being, it could just return the middle of the
        # bounds, i.e. lb + (ub - lb) / 2.
        ...

    def map_solution_to_actions(self, x: np.ndarray) -> List[Action]:
        # This could also be overridden (I guess?) with different ways of doing
        # this; however, just getting one way working should be fine for now.
        # This seems non-trivial but I haven't spent much time thinking about it.
        ...
```
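For illustration, the bounds construction could look like the following, written as a standalone function that takes (min, max) limit pairs directly rather than extracting them from the `environment`; the limit values and argument names are made up:

```python
import numpy as np

def create_bounds(lat_lim, lon_lim, fl_lim, ab_lim, N: int, M: int):
    """Build lb/ub arrays of length M(3N + 2) from the sector's smallest
    enclosing cuboid. Each *_lim is a (min, max) pair; ab_lim bounds both
    incomplete-beta parameters a and b."""
    point_lb = np.tile([lat_lim[0], lon_lim[0], fl_lim[0]], N)
    point_ub = np.tile([lat_lim[1], lon_lim[1], fl_lim[1]], N)
    # One aircraft's bounds are [a, b, c_1, ..., c_N]; tile for M aircraft.
    lb = np.tile(np.concatenate(([ab_lim[0], ab_lim[0]], point_lb)), M)
    ub = np.tile(np.concatenate(([ab_lim[1], ab_lim[1]], point_ub)), M)
    return lb, ub
```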