Quick Presentation on indirect methods
===
###### tags: `Presentation`
## Hamiltonian formulation and Pontryagin's maximum principle
:::warning
The following content is rocket science. Literally. Its first application was for the maximization of terminal rocket speed.
:rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket: :rocket:
:::
Given an optimization problem of the form:
$$
\begin{split}
&\max_u&\bigg{[}\phi(x(t_1)) + \int_{t_0}^{t_1} f(t,x(t),u(t))dt \bigg{]} \\
&\mathrm{s.t.}\qquad &x' = g(t, x(t),u(t))\\
& & x(t_0) = x_0
\end{split}
\tag{1}
$$
One can instead study and solve the following system, __hoping__ for an identical solution.
$$
\begin{split}
&\frac{\partial H}{\partial u} = 0 \\
& \lambda' = -\frac{\partial H}{\partial x} \\
&\lambda(t_1) = \phi'(x^*(t_1))
\end{split}
\\
\mathrm{where}\quad H = f(t,x(t),u(t)) + \lambda(t)g(t,x(t),u(t))
\tag{2}
$$
This follows from __Pontryagin's maximum principle__, which states that if an optimal pair $(x^*,u^*)$ exists for (1), then there exists a piecewise differentiable adjoint variable $\lambda(t)$ such that:
$$
H(t,x^*,u,\lambda^*)\leq H(t,x^*,u^*,\lambda^*)
$$
for all admissible controls $u$ at every $t$, where $H$ is as defined in the system (2).
:::danger
The maximum principle is a __necessary condition__ that the pair $(x^*,u^*)$ must satisfy in order to be considered a potential solution to (1).
To ensure that a solution to (2) is also a solution to (1), one must verify additional conditions, in a similar fashion to the Lagrangian and the KKT conditions.
A simple check is the concavity (convexity, for minimization) of both $f$ and $g$ with respect to $x$ and $u$.
:::
:::info
__Why indirect?__
Because you do not solve the problem directly. First you need to formulate the Hamiltonian and the control characterization; then you can solve the alternate problem.
:::
### Advantage of solving (2) over (1)
The optimality condition ($\frac{\partial H}{\partial u} = 0$) holds **pointwise**; this is an important simplification, as it means that instead of searching for the optimal $u(t)$ among an infinitely large family of functions, you can find $u^*(t)$ by selecting the control that maximizes $H$ at every $t$.
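As a quick illustration (a toy example of my own, not taken from the references): consider
$$
\max_u \int_0^1 \big(x(t) - u(t)^2\big)\,dt, \qquad x' = u,\; x(0) = 0.
$$
Here $H = x - u^2 + \lambda u$ and, since $\phi \equiv 0$, the system (2) gives, pointwise in $t$:
$$
\frac{\partial H}{\partial u} = -2u + \lambda = 0 \;\Rightarrow\; u^*(t) = \frac{\lambda(t)}{2}, \qquad
\lambda' = -\frac{\partial H}{\partial x} = -1,\;\lambda(1) = 0 \;\Rightarrow\; \lambda(t) = 1 - t,
$$
so $u^*(t) = \frac{1-t}{2}$: each value of the control is obtained by maximizing $H$ at that instant, with no search over a function space. Note that $f$ and $g$ are concave in $(x,u)$ here, so the sufficiency check mentioned above also holds.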
### Derivation
We can derive the Hamiltonian and the adjoint equation by leveraging tools from calculus of variations.
Let's define the objective we want to optimize with respect to $u$ as:
$$
J(u) = \phi(x(t_1)) + \int_{t_0}^{t_1} f(t,x(t),u(t)) dt
\tag{3}
$$
__Assumptions__: Let's assume an optimal piecewise continuous control $u^*(t)$ exists, along with its corresponding (piecewise differentiable) state $x^*(t)$. Let's also assume that $f$ and $g$ are continuously differentiable with respect to their three arguments ($t,x,u$). Finally, $\phi$ is also assumed to be continuously differentiable with respect to $x$.
We want to study variations in the vicinity of $u^*(t)$. Let's take a constant $\epsilon \in \mathbb{R}$ and a piecewise continuous variation function $h(t)$. We define $u^{\epsilon}(t)$ as
$$
u^{\epsilon}(t) = u^*(t) + \epsilon h(t)
$$
And it follows that the associated state $x^{\epsilon}$ must satisfy
$$
\frac{d}{dt}x^{\epsilon}(t) = g(t,x^{\epsilon}(t),u^{\epsilon}(t))
$$
With the boundary condition $x^{\epsilon}(t_0) = x_0$ still satisfied.
Obviously, as $\epsilon \to 0$ we have $u^{\epsilon}(t) \to u^*(t)$ and $x^{\epsilon}(t) \to x^*(t)$ since $g$ is differentiable w/r to $x$ and $u$. Let's remark that:
$$
\frac{\partial u^{\epsilon}(t)}{\partial\epsilon}\Bigg|_{\epsilon =0} = h(t)
$$
And
$$
\frac{\partial x^{\epsilon}(t)}{\partial\epsilon}\Bigg|_{\epsilon =0} \text{ exists, since } \lim_{\epsilon \to 0} x^{\epsilon}(t) \text{ exists}
$$
With this, we now want to study the objective at $u^\epsilon$, that is:
$$
J(u^\epsilon) = \phi(x^\epsilon(t_1))+\int_{t_0}^{t_1} f(t,x^\epsilon(t),u^\epsilon(t)) dt
$$
We now introduce the adjoint $\lambda(t)$, which for now we simply define as a piecewise differentiable function on $[t_0,t_1]$. From the fundamental theorem of calculus (FTC) and the boundary condition $x^\epsilon(t_0) = x_0$, we have
$$
\int_{t_0}^{t_1} \frac{d}{dt}[\lambda(t)x^\epsilon(t)]dt + \lambda(t_0)x_0 - \lambda(t_1)x^\epsilon(t_1) = 0
$$
And thus
$$
\begin{split}
J(u^\epsilon) &= \phi(x^\epsilon(t_1))+\int_{t_0}^{t_1} \bigg[f(t,x^\epsilon(t),u^\epsilon(t))+\frac{d}{dt}[\lambda(t)x^\epsilon(t)]\bigg] dt+ \lambda(t_0)x_0 - \lambda(t_1)x^\epsilon(t_1)\\
&= \phi(x^\epsilon(t_1)) + \int_{t_0}^{t_1} \bigg[f(t,x^\epsilon(t),u^\epsilon(t))+\lambda'(t)x^\epsilon(t)+\lambda(t)g(t,x^{\epsilon}(t),u^{\epsilon}(t))\bigg] dt \\
& \qquad \qquad \qquad \qquad \qquad +\lambda(t_0)x_0 - \lambda(t_1)x^\epsilon(t_1)
\end{split}
$$
We know the maximum of $J(u^\epsilon)$, viewed as a function of $\epsilon$, occurs at $\epsilon = 0$, i.e. at $u^*$ (by definition), so we have $\frac{d}{d\epsilon}J(u^\epsilon)\Big|_{\epsilon =0}=0$ and so
$$
0 = \int_{t_0}^{t_1} \frac{\partial}{\partial\epsilon}\bigg[f(t,x^\epsilon(t),u^\epsilon(t))+\lambda'(t)x^\epsilon(t)+\lambda(t)g(t,x^{\epsilon}(t),u^{\epsilon}(t))\bigg] dt\Bigg|_{\epsilon =0} - \frac{\partial}{\partial\epsilon}\lambda(t_1)x^\epsilon(t_1)\Big|_{\epsilon =0} \\
+\frac{\partial}{\partial\epsilon}\phi(x^\epsilon(t_1))\Big|_{\epsilon =0}
$$
We can carry out the differentiation (chain rule) and expand:
$$
\begin{split}
0 &= \int_{t_0}^{t_1} \bigg[f_x \frac{\partial x^{\epsilon}}{\partial\epsilon} + f_u \frac{\partial u^{\epsilon}}{\partial\epsilon} + \lambda'(t) \frac{\partial x^{\epsilon}}{\partial\epsilon} + \lambda(t)\bigg(g_x \frac{\partial x^{\epsilon}}{\partial\epsilon} + g_u \frac{\partial u^{\epsilon}}{\partial\epsilon} \bigg) \bigg]dt\Bigg|_{\epsilon =0} \\
& \qquad \qquad \qquad +\big(\phi'(x(t_1)) - \lambda(t_1)\big) \frac{\partial x^{\epsilon}}{\partial\epsilon}(t_1)\Bigg|_{\epsilon =0}\\
&= \int_{t_0}^{t_1} \bigg[ \big(f_x + \lambda(t)g_x + \lambda'(t) \big) \frac{\partial x^{\epsilon}}{\partial\epsilon}\Bigg|_{\epsilon =0} + \big(f_u + \lambda(t)g_u \big)h(t)\bigg]dt\\
& \qquad \qquad \qquad +\big(\phi'(x(t_1)) - \lambda(t_1)\big) \frac{\partial x^{\epsilon}}{\partial\epsilon}(t_1)\Bigg|_{\epsilon =0}\\
\end{split}
$$
Now, to exploit the freedom we have in choosing the adjoint, we want to simplify this expression as much as we can, namely by getting rid of the terms involving $\frac{\partial x^{\epsilon}}{\partial\epsilon}$, which we cannot compute directly. For that, we define the adjoint as:
$$
\lambda'(t) = -[f_x + \lambda(t)g_x]
\tag{4}
$$
With:
$$
\lambda(t_1) = \phi'(x(t_1))
\tag{5}
$$
Therefore, for this particular choice of adjoint we get
$$
0 = \int_{t_0}^{t_1} \big(f_u + \lambda(t)g_u \big)h(t)dt
$$
Since this must hold for every piecewise continuous $h(t)$, it must in particular hold for $h(t) = f_u + \lambda(t)g_u$, leading to:
$$
\begin{split}
0 &= \int_{t_0}^{t_1} \big(f_u + \lambda(t)g_u \big)^2dt\\
&\quad \implies f_u + \lambda(t)g_u = 0
\end{split}
\tag{6}
$$
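Recognizing $H = f + \lambda g$, the conditions (4), (5) and (6) are exactly the system (2) stated at the beginning: $\lambda' = -(f_x + \lambda g_x) = -\frac{\partial H}{\partial x}$, $\lambda(t_1) = \phi'(x^*(t_1))$, and $f_u + \lambda g_u = \frac{\partial H}{\partial u} = 0$.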
#### Control bounds
Bounds, say $a\leq u\leq b$, can be taken into account when deriving the necessary conditions, which in the case of a maximization problem leads to the following modified version of condition (6) at an optimal control $u^*(t)$ (reverse the inequality signs for a minimization problem):
$$
\begin{cases}
u^*(t)=a &\mathrm{if} &\frac{\partial H(t)}{\partial u(t)} <0 \; \mathrm{over\; all\;}[a,b]\\
a \leq u^*(t)\leq b &\mathrm{if} &\frac{\partial H(t)}{\partial u(t)} =0 \; \mathrm{somewhere\; on\;}[a,b]\\
u^*(t)=b &\mathrm{if} &\frac{\partial H(t)}{\partial u(t)} >0 \; \mathrm{over\; all\;}[a,b]
\end{cases}
$$
This shouldn't be surprising: it is equivalent to saying that in order to maximize (minimize) $H$ over $u \in [a,b]$, you can follow the gradient until you reach a stationary point or hit a bound.
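In practice, when $H$ is concave in $u$ (so the interior stationary point is the unconstrained maximizer), this case analysis amounts to clipping the unconstrained characterization onto $[a,b]$. A minimal sketch (the function and variable names are illustrative, not from the text):
```python
import numpy as np

def bounded_control(u_unconstrained, a, b):
    """Project the unconstrained solution of dH/du = 0 onto the bounds [a, b].

    Valid when H is concave in u: the maximizer of H over [a, b] is then the
    clipped interior stationary point (u* = a, interior, or b, matching the
    three cases above).
    """
    return np.clip(u_unconstrained, a, b)
```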
:::info
All the relationships that we just derived can be extended directly to problems with multiple states and multiple controls.
:::
### Discrete time setting
We can also derive the Hamiltonian conditions in discrete time by following steps similar to the ones above, replacing the integrals $\int$ with sums $\sum$.
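Under one common convention (assuming dynamics $x_{k+1} = g(k, x_k, u_k)$ for $k = 0,\dots,N-1$ and objective $\phi(x_N) + \sum_{k=0}^{N-1} f(k,x_k,u_k)$; indexing conventions vary across references), the resulting conditions read:
$$
\begin{split}
&H_k = f(k,x_k,u_k) + \lambda_{k+1}\, g(k,x_k,u_k)\\
&\frac{\partial H_k}{\partial u_k} = 0\\
&\lambda_k = \frac{\partial H_k}{\partial x_k} = f_x(k,x_k,u_k) + \lambda_{k+1}\, g_x(k,x_k,u_k), \qquad \lambda_N = \phi'(x_N)
\end{split}
$$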
#### Pierre-Luc's derivation in discrete time
Worth noting: in his class notes for IFT 6760C, Pierre-Luc derives the Hamiltonian and the adjoint equations for the discrete-time setting in a very interesting manner, by first posing the adjoint in the setting where there is only a terminal reward, then extending it to a "Bolza discrete form". See pp. 55-59 of his notes.
### Connecting this to the Lagrangian formulation
The Hamiltonian is indeed very similar to the Lagrangian. In fact, if you let the multiplier depend on $t$, the Lagrangian formulation of (1) yields the following expression:
$$
L = \int_{t_0}^{t_1} \Big(f(t,x(t),u(t)) + \lambda(t)\big[g(t, x(t),u(t)) - x'(t) \big]\Big)\,dt
$$
With integration by parts, you can re-express the last term as:
$$
-\int_{t_0}^{t_1} \lambda(t)x'(t)\,dt = -\lambda(t_1)x(t_1) + \lambda(t_0)x(t_0) + \int_{t_0}^{t_1} \lambda'(t)x(t)\,dt
$$
When substituting this expression back into the Lagrangian, we can see that we recover an expression very similar to what we encountered previously.
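Explicitly, substituting gives
$$
L = \int_{t_0}^{t_1} \big(f(t,x(t),u(t)) + \lambda(t)g(t,x(t),u(t)) + \lambda'(t)x(t)\big)\,dt - \lambda(t_1)x(t_1) + \lambda(t_0)x(t_0),
$$
i.e. the same integrand $f + \lambda g + \lambda' x$ and the same boundary terms that appeared in the expression for $J(u^\epsilon)$ above.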
## Forward-Backward Sweep Method (FBSM)
When you are able to find a characterization for $u^*$, that is if solving the equation $\frac{\partial H}{\partial u} = 0$ leads to an expression of the form:
$$
u^* = h(x(u^*),\lambda(u^*))
$$
One can be very tempted to apply fixed-point iteration, leading to the FBSM algorithm. The difficulty here resides in the fact that, to evaluate $h$ in the above equation, we must solve the ODEs for $x$ and $\lambda$.
Thus:
1) Pick an initial guess over $u$
2) With this $u$, solve for $x$ given $x'(u)$ and $x_0$ (forward sweep)
3) With $u$ and $x(u)$, solve for $\lambda$ given $\lambda'(x,u)$ and $\lambda(t_1)$ (backward sweep)
4) Calculate $h(x(u),\lambda(u))$ and take a step toward it (e.g. a convex combination of the new and the old control)
5) Repeat until convergence
Under some Lipschitz conditions on $f$ and $g$, the algorithm can be shown to converge; see [McAsey, Mou & Han](https://www.researchgate.net/publication/257549034_Convergence_of_the_forward-backward_sweep_method_in_optimal_control).
But, as is often the case in practice, it is much simpler to just run the algorithm and check whether the necessary conditions are satisfied once it converges.
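Below is a minimal Python sketch of steps 1)-5) on the toy problem used earlier ($\max_u \int_0^1 (x - u^2)\,dt$, $x' = u$, $x(0) = 0$); the discretization (forward/backward Euler), the relaxation weight and the tolerance are illustrative choices, not prescribed by the method:
```python
import numpy as np

# FBSM on the toy problem:  max  int_0^1 (x - u^2) dt,  x' = u,  x(0) = 0
# Necessary conditions:  H = x - u^2 + lam*u
#   adjoint:           lam' = -dH/dx = -1,  lam(1) = 0
#   characterization:  dH/du = -2u + lam = 0  =>  u = lam / 2
N = 1000
t = np.linspace(0.0, 1.0, N + 1)
dt = t[1] - t[0]
u = np.zeros(N + 1)                    # 1) initial guess for the control

for _ in range(200):
    u_old = u.copy()

    # 2) forward sweep: x' = u with x(0) = 0 (forward Euler)
    x = np.zeros(N + 1)
    for k in range(N):
        x[k + 1] = x[k] + dt * u[k]

    # 3) backward sweep: lam' = -(f_x + lam*g_x) = -1 with lam(1) = 0
    lam = np.zeros(N + 1)
    for k in range(N, 0, -1):
        lam[k - 1] = lam[k] + dt * 1.0

    # 4) control characterization, with a relaxation (averaging) step
    u = 0.5 * (lam / 2.0) + 0.5 * u_old

    # 5) repeat until the control stops changing
    if np.linalg.norm(u - u_old) <= 1e-9 * (np.linalg.norm(u) + 1e-12):
        break

# sanity check against the analytic solution u*(t) = (1 - t) / 2
print("max |u - (1 - t)/2| =", np.abs(u - (1.0 - t) / 2.0).max())
```
On this toy problem the forward sweep does not actually influence the adjoint (neither $\lambda'$ nor the characterization depends on $x$), but the structure shown is the generic one: in general both sweeps feed into the control update.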
### Examples from Lenhart
__Example 1__ : Bear population control


__Example 2__ : Bang-bang control -- Timber harvesting
The FBSM works well when we can find a characterization for $u^*$, that is
$$
\frac{\partial H}{\partial u} = 0 \implies u^* = \mathrm{something}
$$
This approach hits a wall when $H$ is linear with respect to $u$, as it is then impossible to find such a characterization.
But we can still use the FBSM in the particular case where $u$ is bounded in $[a,b]$. Indeed, since $H$ is linear in $u$, $\frac{\partial H}{\partial u}$ does not depend on $u$; whenever it is nonzero at a given $t$, we can follow the gradient to recover the maximum/minimum of $H$ on $[a,b]$ (as discussed previously when we introduced bounds).
This leads to so-called "bang-bang" solutions, named this way because the control jumps from one bound to the other.
Back to the timber harvesting example, we have that
$$
\frac{\partial H}{\partial u} = x(k\lambda - e^{-rt})
$$
After making sure that $x(t) > 0$, the sign of $\frac{\partial H}{\partial u}$ is that of the switching function $k\lambda(t) - e^{-rt}$, and the bang-bang case analysis above tells us which bound $u^*(t)$ sits at.
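A minimal sketch of how the control update inside the sweep changes in this bang-bang case (maximization, bounds $[a,b]$; the arrays and constants are illustrative and assume the switching function above):
```python
import numpy as np

def bang_bang_update(t, x, lam, k, r, a, b):
    """Bang-bang control update for a Hamiltonian that is linear in u.

    dH/du = x * (k * lam - exp(-r * t)) does not depend on u, so the optimal
    control sits at a bound: u* = b where dH/du > 0, u* = a where dH/du < 0
    (points where it vanishes are singular and would need separate treatment).
    """
    switching = x * (k * lam - np.exp(-r * t))
    return np.where(switching > 0.0, b, a)
```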
__Example 3__ : Invasive Plant Species
By taking into account the modified necessary conditions for the discrete case, we can extend the FBSM to discrete-time problems.
$$
\begin{split}
&\min_u& \sum_{j=1}^5\bigg{[}r_{j,T} + B \sum_{t=0}^{T-1}u_{j,t}^2 \bigg{]} \\
&\mathrm{s.t.}\qquad &r_{j,\, t+1}=\bigg(r_{j,t}+\frac{r_{j,t}k}{\epsilon + r_{j,t}}\bigg)(1-u_{j,t}),\;r_{j,0} = \rho_j\\
& & 0\leq u_{j,t} \leq 1
\end{split}
$$
## Versatility
The indirect approach is, in general, very versatile, as it is often possible to re-express a given problem in the form for which we previously developed Pontryagin's maximum principle.
### Higher order differential equations
If we are given a problem where a state variable is defined via a higher-order differential equation, say
$$
\begin{split}
\mathrm{s.t.}\quad &x^{(n+1)}(t) = g(t,x(t),x'(t),...,x^{(n)}(t),u_1(t),...,u_m(t))\\
& x(t_0) = \alpha_1, x'(t_0)=\alpha_2, ... , x^{(n)}(t_0)=\alpha_{n+1}
\end{split}
$$
Then we can easily convert this problem into the form we have been using by introducing $n+1$ state variables such that $x_1(t)=x(t), x_2(t)=x'(t),...,x_{n+1}(t)=x^{(n)}(t)$:
$$
\begin{split}
\mathrm{s.t.}\quad & x_1'(t) = x_2(t),\; x_1(t_0)=\alpha_1 \\
& x_2'(t) = x_3(t),\; x_2(t_0)=\alpha_2 \\
&\qquad\vdots \\
& x_n'(t) = x_{n+1}(t),\; x_n(t_0)=\alpha_n \\
& x_{n+1}'(t) = g(t,x_1(t),x_2(t),...,x_{n+1}(t),u_1(t),...,u_m(t)),\; x_{n+1}(t_0)=\alpha_{n+1}\\
\end{split}
$$
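For instance (a standard textbook illustration, not from the presentation itself), a double integrator $x''(t) = u(t)$ with $x(t_0)=\alpha_1$ and $x'(t_0)=\alpha_2$ becomes the two-state system
$$
x_1'(t) = x_2(t),\; x_1(t_0)=\alpha_1, \qquad x_2'(t) = u(t),\; x_2(t_0)=\alpha_2.
$$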
### Isoperimetric constraint
Imagine that we want to exhaust all the resources available to our control; for example, imagine $u$ is the drug amount administered during a treatment and we know that it is safe to use a total amount $B$ over the course of the treatment. If we want to administer exactly $B$, we are facing a problem of the form:
$$
\begin{split}
&\max_u&\bigg{[}\phi(x(t_1)) + \int_{t_0}^{t_1} f(t,x(t),u(t))dt \bigg{]} \\
&\mathrm{s.t.}\qquad &x' = g(t, x(t),u(t)),\;x(t_0) = x_0\\
& & \int_{t_0}^{t_1} h(t, x(t),u(t))dt=B\\
& & a\leq u(t) \leq b
\end{split}
$$
Where for the case we just discussed, we would have $h(t, x(t),u(t)) = u(t)$.
This kind of problem can also be brought back to the form we have been working with by again introducing a new state variable
$$
z(t) = \int_{t_0}^{t} h(s, x(s),u(s))\, ds
$$
And the stated problem becomes:
$$
\begin{split}
&\max_u&\bigg{[}\phi(x(t_1)) + \int_{t_0}^{t_1} f(t,x(t),u(t))dt \bigg{]} \\
&\mathrm{s.t.}\qquad &x' = g(t, x(t),u(t)),\;x(t_0) = x_0\\
& & z'(t) = h(t, x(t),u(t)),\;z(t_0) = 0,\, z(t_1)=B\\
& & a\leq u(t) \leq b
\end{split}
$$
### Free terminal time
It is worth noting that the maximum principle can be adapted to problems with free terminal time, that is, problems of the form:
$$
\begin{split}
&\max_{u,\, T}&\bigg{[}\phi(T,x(T)) + \int_{t_0}^{T} f(t,x(t),u(t))dt \bigg{]} \\
&\mathrm{s.t.}\qquad &x' = g(t, x(t),u(t)),\;x(t_0) = x_0\\
\end{split}
$$
In this case, a new necessary condition arises that the Hamiltonian must satisfy at an optimal control:
$$
H(T^*,x^*(T^*),u^*(T^*),\lambda^*(T^*)) + \frac{\partial \phi}{\partial t}(T^*, x^*(T^*)) = 0
$$
So if we are able to solve the initial problem as usual with $t_1=T^*$ treated as an unknown quantity, then we can use this new necessary condition to try to determine $T^*$.
See Lenhart, chapter 20, for additional information.
_________