# ICMS Modelling Camp 2020 - Problem 2 Tugboats
**Problem recap**

The goal is to minimise the amount of fuel consumed by a tugboat while parking a ship in a harbour.
We treat this as a reinforced learning problem where some actions are associated with a positive or negative reward. Consuming fuel is associated with a negative reward proportionnel to that consumption.
**Physics part**
Our goal here is to associate different types of possible actions with a fuel cost.
We do this for a rectilign movement, starting from 0 speed and reaching a certain desired speed and for a rotation of a certain desired angle.
In general, the amount of fuel consumed is given by the specific rate of consumption of the engine $S$ (L/W/s). If I pull with power $P$ for time $T$ then the volume of fuel consumed is
$$ V=\int_0^T P\times S \, dt$$
What is power?
$$P = F(t)v(t)$$
with $F(t)$ is the force of the tugboat and $v(t)$ is its velocity.
**Rectilign movement**
* First we have movement in a straight line in the direction the ship is pointing. We initially assume no drag (this is not realistic) amd a constant force $F_{tug}$ applied by the tugboat.

Using Newton's second law of motion and the above formulas, we can find the fuel consumption for the whole journey (acceleration from 0 to $v^*$ (cruising velocity) +cruising without resistance at this speed + deceleration).
The fuel cost is
$$
V=Smv^{*2}
$$
This doesn't depend on the distance travelled since there is no acceleration/force/fuel consumption during crusing.
* Now assuming drag
The drag force on an object is given by the formula
$$F_{drag}=\frac{1}{2}\rho C_d A v^2$$
where $\rho$ is the density of the fluid, $A$ is the surface area of the object, $v$ is the velocity, and $C_d$ is the drag coefficient.

This time we take $F_{tugA}$ and $F_{tugD}$ to be constant forces, applied by the tugboat at acceleration and deceleration respectively. We also set a safe desired velocity $v^*$.

During the initial acceleration phase, the force applied y the tugboat and the drag force are opposite in directions, so we have $$m \ddot{x}=F_{tugA}-F_{drag}
$$
This leads to the speed during this acceleration phase to be
$$
v_1(t)=\sqrt{\frac{F_{tugA}}{K}}\tanh \left( \frac{t \sqrt{F_{tugA} K}}{m} \right)
$$
so the amount of fuel $V_1$ required is
$$
\boxed{V_1=\frac{F_{tugA}Sm}{K}\ln\Bigg\{\cosh \left(\frac{\sqrt{F_{tugA}K}}{m}t_a \right)\Bigg\}}
$$
Where $K=\frac{1}{2}\rho C_d A$ is a constant associated with the drag force and $t_a=\frac{m}{\sqrt{F_{tugA}K}} tanh^{-1}\left( \sqrt{\frac{K}{F_{tugA}}}v^*\right)$.
$V_1(F_{tugA})$ is decreasing, so the optimum can be found on the right border of the support of $F_{tugA}$, that can be known from the physical boundaries of the engine.
The distance of the tugboat and ship (approximated as a point) from the starting point after the acceleration phase is going to be $$d(t_a)=\sqrt{\frac{F_{tugA}}{K}}\int_0^{t_a} tanh \left( \frac{t \sqrt {F_{tugA}K}}{m}\right)dt \\=\frac{m}{K}ln\Bigg|cosh\left( \frac{t_a\sqrt {F_{tugA}K}}{m}\right)\Bigg|\\
=\frac{m}{K}ln\Bigg|cosh\left( \sqrt {F_{tugA}}tanh^{-1}\left(v^*\sqrt{\frac{K}{F_{tugA}}}\right)\right)\Bigg|.$$
Now the fuel consumption $V_2$ for moving at the constant velocity $v^*$ for $t_b-t_a$ amount of time
$$
\boxed{V_2=SKv^{*3}(t_b-t_a)}
$$
That amount of time is such that $t_b=\frac{d-d_1-d_2}{v^*}+t_a$ for a given distance $d$ to be covered a cruising speed, where $d_1=d(t_a)$ is the distance covered during the acceleratin phase and $d_2$ is the distance covered during the deceleration phase. Either $d_2$ or the total time of the operation $T$ should be defined arbitrarily.
During the final deceleration phase, the force applied by the tugboat to break and the drag force are in the same direction, so we have $$m \ddot{x}=-F_{tugD}-F_{drag}
$$the velocity for deceleration is
$$
v_3(t) = \sqrt{\frac{F_{tugD}}{K}} \tan \left( \tan^{-1} \left( v^* \sqrt{\frac{K}{F_{tugD}}} \right) + \frac{\sqrt{F_{tugD}K} (t_b - t) }{m} \right).
$$
so fuel consumption for the deceleration part is
$$
\boxed{V_3= F_{tugD}\frac{Sm}{K} \log \Big\{\frac{{\cos(\alpha (t_b-T)+\beta)}}{\cos\beta}\Big\}}
$$
where $\alpha =\frac{\sqrt{K}F_{tugD}^{1/2}}{m}$ and $\beta=tan^{-1}(\sqrt\frac{K}{F_{tugD}}v^*).$ $T$ is the final time. It is such that $v_3(T)=0$ that is $$tan^{-1} \left( v^* \sqrt{\frac{K}{F_{tugD}}} \right) + \frac{\sqrt{F_{tugD}K} (t_b - T) }{m}=0.$$
So $$c:=T-t_b=\frac{m}{\sqrt{F_{tugD}K}}tan^{-1}\left(v^*\sqrt{\frac{K}{F_{tugD}}}\right).$$
Hence, the distance covered during the deceleration phase, $d_2$, is given by $$d_2=\int_{t_b}^{t_b+c}v_3(t)dt=\sqrt{\frac{F_{tugD}}{K}} \int_{t_b}^{t_b+c}\tan \left( \tan^{-1} \left( v^* \sqrt{\frac{K}{F_{tugD}}} \right) + \frac{\sqrt{F_{tugD}K} (t_b - t) }{m} \right)dt\\=-\frac{m}{K}ln\Bigg|cos \left( \tan^{-1} \left( v^* \sqrt{\frac{K}{F_{tugD}}} \right)\right) \Bigg|$$
In conclusion, for a rectilign movement with cruising speed, the total amount of fuel spend is
$$\boxed{V_{tot}(F_{tugA},F_{tugD},v^*,t_b)=V_1(F_{tugA},v^*)+V_2(v^*,t_b)+V_3(F_{tugD},t_b)}.$$
WORK IN PROGRESS: write $t_b$ as a function of $v^*$ and the distance $d$ and we are calculating the fuel costs of rotations.
NB:To incorporate the *added mass* we should correct the mass $m$ into $4/3 m$.
....stuff for the angular velocity...
For the acceleration we have the angular velocity
$$
\omega_1 (t) = \sqrt{\frac{F}{k}} \tanh \left( \frac{t \sqrt{Fk} }{3mr} \right)
$$
which results in the cross-radial speed
$$
v_1 (t) = \omega_1 (t) r
$$
and cost function
$$
\boxed{V_1=\frac{F_{tugA}Smr^2}{K}\ln\Bigg\{\cosh \left(\frac{\sqrt{F_{tugA}K}}{3mr}t_a \right)\Bigg\}}
$$
In the interval $(t_a, t_b)$ the boat moves with constant angular velocity where $\omega_2(t)=\sqrt{\frac{3F}{K}}$, so the fuel consumption for $t_a$ to $t_b$ is
Now, the decelaration part starting at $t_b$:
$$
\omega_3 (t) = \sqrt{ \frac{F}{k} } \tan \left( \tan ^{-1} \left( \omega ^* \sqrt{ \frac{k}{F}} \right) + \frac{\sqrt{Fk} }{3mr} (t_b - t) \right)
$$
This results in a cross-radial speed
$$
v_3 (t) = \omega_3 (t) r
$$
and a fuel cost
$$
V_3 (t) = \frac{3mr^2SF}{k} \ln \left( \frac{\cos \left( \tan^{-1}\left(\omega^* \sqrt{\frac{k}{F}} \right) + \frac{\sqrt{Fk}}{3mr}(t_b-t) \right) }{\cos \left( \tan^{-1}\left(\omega^* \sqrt{\frac{k}{F}} \right) \right) }\right)
$$
Now we want to express $t_{a,b,c}$ in terms of the desired rotation angle $\phi$ , the cruising speed $v^*$ and the tugboat forces. The time at which we reach cruising speed is just $t_a$ such that $v_1(t_a) = v^*$, i.e. $t_a =\frac{3mr}{\sqrt{Fk}} \tanh ^{-1} \left( \frac{v^* \sqrt{k}}{r\sqrt{F}} \right)$
and the time needed to decelerate from $v^*$ to 0 is $t_c - t_b = \frac{3mr \tan ^{-1} \left( \omega^* \sqrt{\frac{k}{F}} \right)}{ \sqrt{Fk}}$. In order to calculate $t_b$ we look at the total angle we would like to rotate:
$$
\phi = \int_0^{t_a} \omega_1 (t) dt + \int_{t_a}^{t_b} \omega_2 (t) dt \int_{t_b}^{t_c} \omega_3 (t) dt.
$$
From this, we can express
\begin{align*}
t_b - t_a &= \frac{r}{v^*} \left( \phi -\int_0^{t_a} \omega_1 (t) dt - \int_{t_b}^{t_c} \omega_3 (t) dt \right) \\
&= \frac{r}{v^*} \left( \phi -\frac{mr}{k} \ln \left(\frac{cosh\left(\frac{\sqrt{FK}}{3mr}t_a\right)\left(cos\left(tan^{-1}\left(w^*\sqrt{\frac{K}{F}}\right)\right)\right)^3}{\left(cos\left( tan^{-1}\left(w^*\sqrt{\frac{K}{F}}\right)+\frac{\sqrt{FK}}{3mr}(t_b-t_c)\right)\right)^3} \right)\right)
\end{align*}
(last bit is very messy)
Now the input parameters are:
$\phi$ - desired rotation angle
$r$ - half of boat length
$F$ tugboat forces
...and we could optimize the fuel comsumption w.r.t the tugboat forces $F$.
**Game part**
Denote by $S =\{0,..,n\}^2\times\{0,1,...,359\}$ our *state space*.
And let $$A=\{\uparrow,\downarrow,\to, \gets,\circlearrowleft,\circlearrowright\} $$
to be the *action space (control space)*.
Where $\uparrow,\downarrow$ is a translation of the ship along its long axis (forwards and backwards) and by $\to, \gets$ we denote translations of the ship perpendicular to that axis by one unit in each case.
$\circlearrowleft,\circlearrowright$ will be anticlockwise and clockwise rotations of the ship by one degree.
(Note: In future we should add momentum. The assumption that we should come to rest at the end of each of these small actions is clearly bad.)
Let the *reward function* $R:S\times A \to \mathbb{R}$ measure the cost of transition from state $s \in S$ under action $a \in A$ and the *environment function* be $E:S\times A \to S$ showing where in $S$ the object lands starting from $s$ under $a$.
Next, let $\pi:S\to A$ be the *policy* function which determines what action is taken given a state $s \in S$. A standard choice for this is the so called Greedy Function. The greedy function with respect to a given action-value cumulative reward function Q is given by $\pi_Q^*(s)=argmax_{a\in A} Q(s,a)$.
State-action cumulative reward function, $Q_{\pi,E}(s_0,a)=R(s_0,a)+\sum_{t=1}^\infty\gamma^t R(s_t,a_t)$. First action is given by $a$, and then after this the sequence of states and actions are defined by the environment $E$ and policy $\pi$ respectively.
Our objective is to find $\pi$ which $\textbf{maximises}$ *cumulative reward* $V_{\pi,E}(s_0) = \sum_{t=0}^\infty \gamma^tR(s_t,a_t)$, where $s_0,s_1,..$ and $a_0,a_1,...$ are the sequences of states and actions obtained by following $\pi$ from initial state $s_0$ in environment $E$.
Define place-holder reward function:
$$
R(s,a)=
\begin{cases}
-10,& if\quad a\in\{\uparrow,\downarrow\}\\
-100,& if \quad a\in\{\leftarrow,\rightarrow\}\\
-1,& if \quad a\in\{\circlearrowleft,\circlearrowright\}
\end{cases}
$$
Given $(x,y)$, orientation $\theta$ and a distance $d$ to travel, we'll end up:
$\uparrow$ : $x_{new} = x+d\cdot\cos(\frac{\theta \pi}{180})$
$\quad$ $y_{new} = y+d\cdot\sin(\frac{\theta \pi}{180})$
$\downarrow$ : $x_{new} = x+d\cdot\cos(\frac{\theta \pi}{180} +\pi)$
$\quad$ $y_{new} = y+d\cdot\sin(\frac{\theta \pi}{180} + \pi)$
$\leftarrow$: $x_{new} = x+d\cdot\cos(\frac{\theta \pi}{180} +\frac{\pi}{2})$
$\quad$ $y_{new} = y+d\cdot\sin(\frac{\theta \pi}{180} + \frac{\pi}{2})$
$\rightarrow$: $x_{new} = x+d\cdot\cos(\frac{\theta \pi}{180} -\frac{\pi}{2})$
$\quad$ $y_{new} = y+d\cdot\sin(\frac{\theta \pi}{180} - \frac{\pi}{2})$
This is a basic model which does not take into account time taken and very importantly the momentum because in our model above we assume that we move in increments: we start and end actions at a standstill.
First improvement would be to add time as a new dimension to the action space:
$$A=\{\uparrow,\downarrow,\to, \gets,\circlearrowleft,\circlearrowright\} \times \{1,10,50\} $$
which we could interprete as time taken in seconds to perform a given action (e.g. at slow, normal or fast speed).
This would result in reward function looking more like
$$
R(s,a)=
\begin{cases}
-10 \gamma_{1}(t),& if\quad a\in\{\uparrow,\downarrow\} \times \{1, 10, 50 \} \\
-100\gamma_{2}(t),& if \quad a\in\{\leftarrow,\rightarrow\} \times \{1, 10, 50 \} \\
-\gamma_{3}(t),& if \quad a\in\{\circlearrowleft,\circlearrowright\} \times \{1, 10, 50 \}
\end{cases}
$$
for some appropriate $\gamma_{i}:\{1, 10, 50 \} \to \mathbb{R}, i=1,2,3$.
Next improvement would be to enlarge the state space to include the speed of the boat at a given state:
$$S =\{0,..,n\}^2\times\{0,1,...,359\}\times \{0,1,...,20\} $$
In this case the cost of pefoming certain actions will change since we would not need to be at standstill after each movement and so there would be no deceleration force for instance except at the very end if we were to move in a straight line for several meters.