12-3 - HackMD

# 12-3 --- lang-ref: ch.12-3 title: MPC (EBM version) lecturer: Alfredo Canziani authors: Yang Zhou, Daniel Yao date: 28 Apr 2021 --- ## Action plan - Model predictive control **[Here we are today]** - Backprop through kinematic equation - Minimisation of the latent - Truck backer-upper - Learning an emulator of the kinematics from observations - Training a policy - PPUU - Stochastic environment - Uncertainty minimisation - Latent decoupling ## State transition equations -- Evolution of the state Here we discuss a state transition equation where $x$ represents the state, $u$ represents control. We can formulate the state transition function in a continuous-time system where $x(t)$ is a function of continuous variable $t$. $\dot{x} = f(x,u)$ $\frac{\partial x(t)}{\partial t} = f(x(t),u(t))$ ![](https://i.imgur.com/JVBgqAi.png) We use a tri-cycle as the example to study it. The orange wheel is the control $u$, $(x_c,y_c)$ is the instantaneous center of rotation. You can also have two wheels in the front. For simplicity, we use one wheel as the example. In this example $\boldsymbol{x}=(x,y,\theta,s)$ is the state, $\boldsymbol{u}=(\phi,\alpha)$ is the control. $$ \left\{\begin{array}{l} \dot{x}=s \cos \theta \\ \dot{y}=s \sin \theta \\ \dot{\theta}=\frac{s}{L} \tan \phi \\ \dot{s}=a \end{array}\right.e $$ We can reformulate the differential equation from continuous-time system to discreate-time system $$ \boldsymbol{x}[t]=\boldsymbol{x}[t-1]+f(\boldsymbol{x}[t-1], \boldsymbol{u}[t]) \mathrm{d} t $$ To be clear, we show the units of $\boldsymbol{x}, \boldsymbol{u}$. $$ \begin{array}{l} {[\boldsymbol{u}]=\left(\mathrm{rad}\ \frac{\mathrm{m}}{\mathrm{s}^{2}}\right)} \\ {[\boldsymbol{x}]=\left(\mathrm{m} \ \mathrm{m} \ \mathrm{rad} \ \frac{\mathrm{m}}{\mathrm{s}}\right)} \end{array} $$ Let's take a look at different examples. We use different color for variables we care about. ![](https://i.imgur.com/ifLPpYe.png) Example 1: Uniform Linear Motion: No acceleration, no steering ![](https://i.imgur.com/WgnhZ7p.png) ![](https://i.imgur.com/a3iqtrb.png) Example 2: Crush into itself: Negative acceleration, no steering ![](https://i.imgur.com/pTRalp7.png) ![](https://i.imgur.com/uI26lsC.png) Example 3: Sine wave: Positive steering for the first part, negative steering for the second part ![](https://i.imgur.com/08I8byO.png) ![](https://i.imgur.com/JEH80wn.png) ## Kelley-Bryson algorithm What if we want the tri-cycle to reach a specified destination with a specified speed? - This can be achieved by inference using **Kelley-Bryson algorithm**, which utilizes **backprop through time** and **gradient descent**. ### Recap of RNN We can compare the inference process here with the training process of RNN. Below is an RNN schematic chart. We feed variable $x[t]$ and the previous state $h[t-1]$ into the predictor, while $h[0]$ is set to be zero. The predictor outputs the hidden representation $h[t]$. ![](https://i.imgur.com/UnDL32U.png) ### Optimal control (inference) In optimal control (inference) shown as below, we feed the latent variable (control) $z[t]$ and the previous state $x[t-1]$ into the predictor, while $x[0]$ is set to be $x_0$. The predictor outputs the state $x[t]$. ![](https://i.imgur.com/inEEpbt.png) Backprop is implemented in both RNN and Optimal Control. However, gradient descent is implemented with respect to predictor's parameters in RNN, and is implemented wrt latent variable $z$ in optimal control. ### Unfolded version of optimal control In unfolded version of optimal control, cost can be set to either the final step of the tri-cycle or every step of the tri-cycle. Besides, cost functions can take many forms, such as Average Distance, Softmin, etc. #### Set the cost to the final step From the figure below, we can see there is only one cost $c$ set in the final step (step 5), which measures the distance of our target $y$ and state $x[5]$ with control $z[5]$ ![](https://i.imgur.com/peikP5A.png) $(1)$ If the cost function only involves the final position with no restrictions on the final speed, we can obtain the results after inference shown as below. ![](https://i.imgur.com/3PeOs2N.png) From the figure above, it is seen that when $T=5$ or $T=6$, the final position meets the target position, but when $T$ is above 6 the final position does not. $(2)$ If the cost function involves the final position and zero final speed, we can obtain the results after inference shown as below. ![](https://i.imgur.com/VWkUe4x.png) From the figure above, it is seen that when $T=5$ or $T=6$, the final position roughly meets the target position, but when $T$ is above 6 the final position does not. ### Set the cost to every step From the figure below, we can see there is a cost $c$ set in every step. ![](https://i.imgur.com/O1nSpxP.png) $(1)$ Cost Example: Average Distance ![](https://i.imgur.com/znOZ9wZ.png) $(2)$ Cost Example: Softmin ![](https://i.imgur.com/0Rv5g6c.png) Different forms of cost functions can be explored through experimentation. ## Optimization_Path_Planner-Notebook In this notebook, we use tri-cycle as an example as well. ### Define kinematic model of a tricycle $\dot{x}=f(x,u)$. * $x$ represents state: ($x$, $y$, $θ$, $s$) * $u$ represents control: ($ϕ$, $a$) * We feed $x[t-1]$ and $u[t]$ to obtain the next state $x[t]$ ``` def f(x, u, t=None): L = 1 # m x, y, θ, s = x ϕ, a = u f = torch.zeros(4) f[0] = s * torch.cos(θ) f[1] = s * torch.sin(θ) f[2] = s / L * torch.tan(ϕ) f[3] = a return f ``` ### Define several cost functions As mentioned above, cost functions can take various forms. In this notebook, we list 5 kinds as follows: * **vanilla_cost**: Focuses on the final position * **cost_with_target_s**: Focuses on the final position and final zero speed. * **cost_sum_distances**: Focuses on the position of every step, and minimizes the mean of the distances. * **cost_sum_square_distances**: Focuses on the position of every step, and minimizes the mean of squared distances. * **cost_logsumexp**: The distance of the closest position should be minimized. ``` def vanilla_cost(state, target): x_x, x_y = target return (state[-1][0] - x_x).pow(2) + (state[-1][1] - x_y).pow(2) def cost_with_target_s(state, target): x_x, x_y = target return (state[-1][0] - x_x).pow(2) + (state[-1][1] - x_y).pow(2) + (state[-1][-1]).pow(2) def cost_sum_distances(state, target): x_x, x_y = target dists = ((state[:, 0] - x_x).pow(2) + (state[:, 1] - x_y).pow(2)).pow(0.5) return dists.mean() def cost_sum_square_distances(state, target): x_x, x_y = target dists = ((state[:, 0] - x_x).pow(2) + (state[:, 1] - x_y).pow(2)) return dists.mean() def cost_logsumexp(state, target): x_x, x_y = target dists = ((state[:, 0] - x_x).pow(2) + (state[:, 1] - x_y).pow(2))#.pow(0.5) return -1 * torch.logsumexp(-1 * dists, dim=0) ``` ### Define path planning with cost * The optimizer is set to be SGD. * Time interval T is set to be 1s. * We need to compute every state from the initial state by the following code: ``` x = [torch.tensor((0, 0, 0, s),dtype=torch.float32)] for t in range(1, T+1): x.append(x[-1] + f(x[-1], u[t-1]) * dt) x_t = torch.stack(x) ``` * Then compute the cost: `cost = cost_f(x_t, (x_x, x_y)) costs.append(cost.item())` * Implement backprop and update $u$ ``` optimizer.zero_grad() cost.backward() optimizer.step() ``` * Now we can feed values to path_planning_with_cost to obtain inference results and plot trajectories. **Example**: `path_planning_with_cost(x_x=5, x_y=1, s=1, T=5, epochs=5, stepsize=0.01, cost_f=vanilla_cost, debug=False)` ## Summary We introduced the state transition function and the way to model a physical system with state and control. We discussed how to achieve optimal control by inference using Kelley-Bryson algorithm, which utilizes backprop through time and gradient descent. Finally, we explained the notebook of Optimization_Path_Planner, in which various cost functions are defined and path planning is implemented to guide a tri-cycle to reach the desired position with the specified speed.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.