# Bluebird: Optimisation agent design notes
In the optimisation agent we will represent an aircraft's route through
space as a series of control points, with the first and last control points fixed
to the locations (latitude, longitude, flight level) and times at which the aircraft
enters and exits the sector, respectively. Fixing the last control point's location
and exit time means that the plane will always meet its coordination objective.

The above image shows two fixed control points (black circles) and two
optimisable control points (white circles). The plane enters the sector at the
left-hand control point and exits at the right-hand control point.
## Representation
The fixed starting control point exists at time $t=0$ and the ending control point
at time $t=1$, with time normalised such that $t \in [0, 1]$. Each control point's
time represents a "time of arrival at this location", so consecutive control points
must have strictly monotonically increasing times; deciding on a representation to
optimise that guarantees this is non-trivial.
One way to address this problem is to optimise the parameters of a curve which defines
the relative times between control points. The
[incomplete beta function](https://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function)
([scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.betainc.html))
gives us a way to map equidistant time points, $t_1$ and $t_2$ in the left-hand plot below,
to non-equidistant time points $t'_1$ and $t'_2$ through the parameterisable curve $I(a,b,t)$,
whose parameters $a, b$ control the shape of the curve. This function was chosen
because it has useful properties, such as mapping $[0, 1]$ to $[0, 1]$, and it produces
"nice"-looking time curves (right).
<img src="https://i.imgur.com/M3KfMsn.jpg" width="50%"><img src="https://i.imgur.com/wCp1oKz.png" width="50%" >
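As a sketch, this time warping can be done with scipy's `betainc`, which computes the *regularised* incomplete beta function (hence the $[0, 1] \to [0, 1]$ mapping); the parameter values and the `warp_times` helper below are illustrative:

```python
import numpy as np
from scipy.special import betainc

def warp_times(n_interior: int, a: float, b: float) -> np.ndarray:
    """Map equidistant times (including the fixed endpoints t=0 and t=1)
    to warped times t'_i via the regularised incomplete beta function."""
    t = np.linspace(0.0, 1.0, n_interior + 2)  # endpoints plus interior points
    return betainc(a, b, t)

# Example: two optimisable interior control points, skewed early by (a, b).
warped = warp_times(2, a=2.0, b=5.0)
```

Because $I(a, b, 0) = 0$, $I(a, b, 1) = 1$, and the curve is strictly increasing on $[0, 1]$, the fixed endpoints stay put and the strict monotonicity requirement on control-point times is satisfied for free.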
The $i$-th control point can be defined as a vector
$\mathbf{c}_i = [c_i^{(1)}, c_i^{(2)}, c_i^{(3)}]$,
where the three elements correspond to a $\{lat, lon, fl\}$ location. Therefore, we can
define the set of $N$ control points to optimise as
$$
\begin{align}
\mathbf{x} & = [a, b, \mathbf{c}_1, \dots, \mathbf{c}_N] \\
& = [a, b, c_1^{(1)}, c_1^{(2)}, c_1^{(3)}, \dots, c_N^{(1)}, c_N^{(2)}, c_N^{(3)}],
\end{align}
$$
i.e. a vector of length $3N + 2$, where $a, b$ are the parameters of the incomplete beta function.
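The packing described by this equation can be sketched with NumPy; the `pack`/`unpack` helper names are illustrative, not part of any existing API:

```python
import numpy as np

def pack(a: float, b: float, control_points: np.ndarray) -> np.ndarray:
    """Flatten (a, b) and an (N, 3) array of (lat, lon, fl) control points
    into a single solution vector x of length 3N + 2."""
    return np.concatenate(([a, b], control_points.ravel()))

def unpack(x: np.ndarray):
    """Invert pack(): recover a, b and the (N, 3) control-point array."""
    return x[0], x[1], x[2:].reshape(-1, 3)

# Two control points -> a vector of length 3 * 2 + 2 = 8.
x = pack(2.0, 5.0, np.array([[51.5, -0.1, 350.0],
                             [51.6,  0.2, 360.0]]))
```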
## Evaluation pipeline
The rough pipeline that needs to be carried out to *evaluate* a set of control points
is as follows:
```flow
soln=>start: Solution
tc=>operation: Trajectory Converter
traj=>start: Trajectory
evaluate=>operation: Evaluate Trajectory
reward=>end: Reward
soln(right)->tc(right)->traj(right)->evaluate(right)->reward
```
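The pipeline above amounts to a simple function composition (the function names here are illustrative placeholders for the components described below):

```python
def evaluate_solution(x, trajectory_converter, evaluate_trajectory) -> float:
    """Solution -> Trajectory Converter -> Trajectory -> Evaluate -> Reward."""
    trajectory = trajectory_converter(x)    # Solution -> Trajectory
    return evaluate_trajectory(trajectory)  # Trajectory -> Reward (or loss)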
### Trajectory Converter
This needs to convert a solution $\mathbf{x}$ into a `Trajectory`. This could be
as simple as linear interpolation between the control points, at speed(s) defined
by the *time* the aircraft needs to arrive at each one (Richard's implementation),
or something more complex, such as performing the "preferred flight level"-style
aircraft movement of Tim's representation.
We will, eventually, want a converter that runs the starling simulator with a given
pilot (the Linear pilot is roughly Richard's linear interpolation above). This will
involve using the (upcoming) `EnvironmentManager` loop to generate the trajectories.
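A minimal sketch of the linear-interpolation converter, assuming a trajectory is just a time-stamped array of positions (the `linear_trajectory` helper and the sampling scheme are my assumptions, not Richard's actual implementation):

```python
import numpy as np

def linear_trajectory(control_points: np.ndarray,
                      arrival_times: np.ndarray,
                      n_samples: int = 50) -> np.ndarray:
    """Interpolate each coordinate independently against the arrival times.

    control_points: shape (N + 2, 3), including the fixed entry/exit points.
    arrival_times: shape (N + 2,), strictly increasing on [0, 1].
    Returns an array of shape (n_samples, 4) with columns (t, lat, lon, fl).
    """
    t = np.linspace(0.0, 1.0, n_samples)
    coords = [np.interp(t, arrival_times, control_points[:, d]) for d in range(3)]
    return np.column_stack([t] + coords)
```

Because the interpolation is piecewise linear in (normalised) time, the implied speed between two control points is constant and determined entirely by the warped arrival times.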
### Trajectory Evaluation
Chuck the trajectories into the objectives and combine them via some predefined weighting to
get a final loss (negative reward).
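For instance, a weighted sum over a list of objective callables (the signature and the idea that lower is better are assumptions):

```python
def combined_loss(trajectory, objectives, weights) -> float:
    """Combine per-objective losses into one scalar loss (negative reward).

    objectives: callables Trajectory -> float, lower is better.
    weights: predefined weighting, one per objective.
    """
    return float(sum(w * obj(trajectory) for obj, w in zip(objectives, weights)))
```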
## Class Outline
The following is (barely) pseudo-code (more rambling) of what the various components are and how they *could* work.
```python
class OptimisationAgent(Agent):
    # Things the agent will need to set up:
    # - reward(s) and their weights -- or this could be the wrapped reward,
    #   so it is one reward only
    # - the number N of control points used (excluding the first/last) to
    #   represent a path
    # - settings of the optimiser (to be passed into it)
    # - other things I may not have thought of!
    def __init__(self, ..., settings: Dict[str, Any]):
        ...

    # Take in an observation of the environment and return a list of actions
    # telling the aircraft what, if anything, to do.
    def generate_actions(self, environment: Environment) -> List[Action]:
        # # # # optimisation stuff:
        # - decide if any aircraft we're allowed to tell what to do need optimising
        # - if so, call perform_optimisation with the list of callsigns,
        #   which gives back an optimal solution x*
        # - convert x* into a list of actions that need to be performed at
        #   specific times
        # - add this to the "actions to be performed" list/queue/??
        #
        # # # # always happens:
        # - check if the timestamp of the environment matches (or is past?/close
        #   to?) a time at which action(s) need to be performed
        # - if so, return a list of the actions that need to be performed and
        #   update our queue
        ...

    def perform_optimisation(self, environment: Environment, callsigns: List[str]) -> np.ndarray:
        # - create the things needed to perform the optimisation:
        #   - the function that needs to be called, i.e. f(x) where |x| = M(3N + 2)
        #   - the lower/upper bounds (lb/ub) on x,
        #     i.e. min/max values of lat/lon/fl/(speed/time???)
        #   - there may be more
        # - pass these to self.optimise(f, lb, ub) and get back an optimised
        #   solution x*. Note that this function would be algorithm(Agent)-specific.
        ...

    def identify_relevant_aircraft(self, environment: Environment) -> List[str]:
        # (needs a better name, "relevant" means something -- i.e. relevant traffic)
        # Take in the environment and return a list of the callsigns of the
        # aircraft that we're going to optimise. This can be empty.
        # This may be a 'strategy' kind of thing, as there is presumably no
        # 'right' answer. I think we'd want to start with the naive strategy of
        # identifying the aircraft that we haven't optimised yet and ignoring
        # those that we have. This will change later when we (probably) find out
        # that it isn't sufficient.
        ...

    def create_bounds(self, environment: Environment, N: int, M: int) -> Tuple[np.ndarray, np.ndarray]:
        # We need the environment here to get the upper/lower bounds on the
        # lat/lon/fl of the airspace; the minimum value of, for example, the fl
        # should be the lowest fl in the sector (i.e. we take the smallest
        # enclosing cuboid as the bounds on the parameters).
        # - create a lower bound array, same length as x, i.e. M(3N + 2)
        # - create an upper bound array, same length as x, i.e. M(3N + 2)
        # - return lb, ub
        ...

    def create_fitness_function(self, aircraft_names: List[str], environment: Environment) -> Callable:
        # Return a function that evaluates the quality of a solution.
        ...

    def optimise(self, fitness_function: Callable, lb: np.ndarray, ub: np.ndarray, settings: Dict[str, Any]) -> np.ndarray:
        # This will be the function that researchers will mainly focus on
        # writing. For the time being, it could just return the middle of the
        # bounds, i.e. lb + (ub - lb) / 2.
        ...

    def map_solution_to_actions(self, x: np.ndarray) -> List[Action]:
        # This could also be overridden (I guess?) with different ways of doing
        # this; however, just getting one way working should be fine for now.
        # This seems non-trivial but I haven't spent much time thinking about it.
        ...
```
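For illustration, the bounds construction could look like the following, written as a standalone function that takes (min, max) limit pairs directly rather than extracting them from the `environment`; the limit values and argument names are made up:

```python
import numpy as np

def create_bounds(lat_lim, lon_lim, fl_lim, ab_lim, N: int, M: int):
    """Build lb/ub arrays of length M(3N + 2) from the sector's smallest
    enclosing cuboid. Each *_lim is a (min, max) pair; ab_lim bounds both
    incomplete-beta parameters a and b."""
    point_lb = np.tile([lat_lim[0], lon_lim[0], fl_lim[0]], N)
    point_ub = np.tile([lat_lim[1], lon_lim[1], fl_lim[1]], N)
    # One aircraft's bounds are [a, b, c_1, ..., c_N]; tile for M aircraft.
    lb = np.tile(np.concatenate(([ab_lim[0], ab_lim[0]], point_lb)), M)
    ub = np.tile(np.concatenate(([ab_lim[1], ab_lim[1]], point_ub)), M)
    return lb, ub
```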