# Denoising LTI Systems
In this lecture, we specialize our diffusion-based feedback control to linear time-invariant (LTI) systems, the Gaussian distributions of control theory (sorry, not sorry). Given that the literature on methods to stabilize LTI systems is *over-rich*, this might seem like a wasteful application.
So let me calm your *BUT WHYYY?!*-nerves with two reasons to explore this case:
1. It provides a fun new way to solve the well-known problem of steering a system to a target equilibrium using feedback.
2. It easily generalizes to broader objectives such as introducing multi-stability in a system.
Let's consider the control system
\begin{equation}
\dot{x} = Ax+Bu.
\end{equation} where $A$ and $B$ are matrices of appropriate dimensions.
The method of feedback control using the above model is identical to what we saw in the [driftless nonlinear situation](https://hackmd.io/QjdpXVeOSrWwPv_ON3QJNA).
**i) Initialize the forward noising process at states that one wants to steer the system into**:
\begin{equation}
dX = -AX\,dt+\sqrt{2}B\,dW, \qquad X_0 \sim p_0(x).
\end{equation}
**ii) Denoise the system back to goal configurations using the score function.**
\begin{equation}
\dot{x} = Ax+Bk(t,x).
\end{equation} where $k(t,x)$, learned from the forward process, is the denoising control. If the system is controllable, all points eventually get steered into the set where $p_0(x)$ is positive.
A nice thing about the linear case is that [we know](https://hackmd.io/@clopenloop/H1StU_wO-g) for the forward noising process the probability density $p_t(x)$ evolves according to
$$p_t(x) = \int_{\mathbb{R}^d} K(t,x,y) p_0(y)dy.$$
with kernel
$$K(t,x,y)=
\frac{1}{(2\pi)^{d/2} \, (\det \Sigma_t)^{1/2}}
\exp\!\left(
-\frac{1}{2}
\bigl(x - e^{-tA} y\bigr)^\top
\Sigma_t^{-1}
\bigl(x - e^{-tA} y\bigr)
\right),$$
where $$\Sigma_t = 2 \int_0^t e^{-sA} \, B B^\top \, e^{-sA^\top} \, ds .$$
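As a numerical sanity check, a minimal sketch of the forward process: we simulate the SDE with Euler–Maruyama for a hypothetical double-integrator example (the matrices $A$, $B$ below are my illustrative choice, not from the text), started from $X_0 = 0$, and compare the empirical covariance at time $t$ against the closed-form Gramian $\Sigma_t$.

```python
import numpy as np

# Forward noising process dX = -AX dt + sqrt(2) B dW, simulated with
# Euler-Maruyama for a hypothetical double-integrator example.
# Started from X_0 = 0 (p_0 = delta_0), the state at time t is Gaussian
# with covariance Sigma_t = 2 int_0^t e^{-sA} B B^T e^{-sA^T} ds.
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])

rng = np.random.default_rng(0)
n, steps, T = 20000, 400, 1.0
h = T / steps
X = np.zeros((n, 2))
for _ in range(steps):
    dW = rng.normal(scale=np.sqrt(h), size=(n, 1))
    X = X + h * (-X @ A.T) + np.sqrt(2.0) * dW @ B.T

# closed form of the Gramian for this (A, B): e^{-sA} B = (-s, 1)^T
t = T
Sigma = np.array([[2*t**3/3, -t**2], [-t**2, 2*t]])
emp = np.cov(X.T)
assert np.allclose(emp, Sigma, atol=0.1)
```

The empirical covariance matches the formula to within Monte-Carlo and discretization error, which is a useful check before trusting the Gramian in a feedback law.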
Let's see how this can be used in practice and how the feedback law simplifies in certain cases.
### Steering the System to the Origin using Time-varying Feedback
First, we consider the situation where the goal configuration is a single state, say the origin.
Then we initialize the forward process at the origin: $$p_0(x) = \delta_{0}$$ or $X_0(\omega) = 0$ with probability 1.
In this case,
$$p_t(x) = K(t,x,0)$$
Substituting the expression we get,
$$p_t(x) = \frac{1}{(2\pi)^{d/2} \, (\det \Sigma_t)^{1/2}}
\exp\!\left(
-\frac{1}{2}
x^\top
\Sigma_t^{-1}
x
\right).$$
Computing the score, we get
$$\frac{\nabla p_t(x)}{p_t(x)} = \frac{-\Sigma_t^{-1}x \, K(t,x,0)}{K(t,x,0)} = -\Sigma_t^{-1}x.$$
The forward process has the same probability density as the ODE:
$$\dot{x} = -Ax - Bu, \quad u = B^T\frac{\nabla p_t(x)}{p_t(x)} = -B^T\Sigma_t^{-1}x. $$
That the probability densities of the two processes are the same can be seen as before, using the corresponding equations for the evolution of probability densities.
We know that $p_0(x) = \delta_0$ and, since the system is controllable, $p_t(x) >0$ for all $x \in \mathbb{R}^d$, $t>0$.
Considering the reverse process we get,
$$\dot{x} = Ax + Bu, \quad u = -B^T\Sigma_{T-t}^{-1}x. $$
[Recall that](https://hackmd.io/QjdpXVeOSrWwPv_ON3QJNA) the Fokker-Planck equation of the forward process and the continuity equation of the reverse process have identical solutions, up to a time change:
*Fokker-Planck equation of the forward process is given by,*
$$\frac{\partial p_t(x)}{\partial t} = \nabla \cdot( [Ax] p_t(x)) + \nabla \cdot( BB^T \nabla p_t(x)) $$
*This equation can be equivalently written as the continuity equation with control $u = B^T \frac{\nabla p_t(x)}{p_t(x)} = -B^T\Sigma_t^{-1}x$:*
$$\frac{\partial p_t(x)}{\partial t} = \nabla \cdot( [Ax + Bu] p_t(x)).$$
Since $\lim_{t \rightarrow 0} p_t(x) = \delta_0$, we get that, along the control system (or the reverse process)
$$\dot{x} = Ax + Bu,$$ trajectories satisfy $$\lim_{t \rightarrow T} x(t) = 0$$ in a suitable sense.
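To see this convergence numerically, here is a minimal sketch that integrates the reverse-time closed loop for the same hypothetical double-integrator example (my illustrative $A$, $B$, with the Gramian inverse hard-coded in closed form), stopping shortly before $t = T$, where $\Sigma_{T-t}^{-1}$ degenerates.

```python
import numpy as np

# Reverse-time closed loop xdot = Ax + Bu, u = -B^T Sigma_{T-t}^{-1} x,
# for a hypothetical double-integrator example; every initial state should
# be driven (numerically: close) to the origin as t -> T.
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])

def Sigma_inv(s):
    # inverse of Sigma_s = [[2s^3/3, -s^2], [-s^2, 2s]] (det = s^4/3)
    return np.array([[6/s**3, 3/s**2], [3/s**2, 2/s]])

T, N = 1.0, 4000
h = T / N
x = np.array([3.0, -2.0])
x0_norm = np.linalg.norm(x)
for k in range(N - 40):          # stop just before t = T, where the
    s = T - k * h                # Gramian inverse blows up
    u = -B.T @ Sigma_inv(s) @ x
    x = x + h * (A @ x + B @ u)

assert np.linalg.norm(x) < 0.5 * x0_norm
```

One can check that $x^\top \Sigma_{T-t}^{-1} x$ is conserved along this flow, so the state shrinks at the scale of $\Sigma_{T-t}^{1/2}$; in the simulation the norm drops well below its initial value before the stopping time.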
### Comparison with Minimum Energy Control
It is an interesting phenomenon that we have recovered a control law using the controllability Gramian. This is similar to how the Gramian provides the [minimum energy control](https://en.wikipedia.org/wiki/Minimum_energy_control) to steer a system from one initial state to another. However, the minimum energy control is usually given in open-loop form. In order to get the solution in feedback form, one has to solve a Riccati equation, which for the finite-horizon case doesn't have a closed-form solution. While we might be tempted to conclude that we have recovered the minimum energy control in feedback form, the feedback control we have constructed blows up as $t \rightarrow T$ due to the degeneracy of the Gramian inverse in this limit.
### Stabilizing to Multiple Equilibria
A similar strategy works if we have multiple target equilibrium points. Let
$$p_0(x) = \frac{1}{2}\delta_{a} + \frac{1}{2}\delta_{b}$$ for two points $a,b \in \ker A$, so that $a$ and $b$ are equilibria of the uncontrolled dynamics.
In this case, since $a, b \in \ker A$ implies $e^{-tA}a = a$ and $e^{-tA}b = b$,
$$p_t(x) = \frac{1}{2}K(t,x,a)+ \frac{1}{2}K(t,x,b).$$ The score function in this case becomes
$$\frac{\nabla p_t(x)}{p_t(x)} = -\frac{ \Sigma_t^{-1}(x-a) K(t,x,a)+\Sigma_t^{-1}(x-b) K(t,x,b) }{K(t,x,a)+ K(t,x,b)}. $$
This gives us the closed-loop system in reverse:
$$\dot{x} = Ax + Bu, \quad u = B^T\frac{\nabla p_{T-t}(x)}{p_{T-t}(x)}. $$
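A minimal sketch of this bimodal case, again for a hypothetical double-integrator example (my illustrative $A$, $B$; here $\ker A = \{x_2 = 0\}$, so the targets $a$, $b$ below are valid equilibria): the mixture score is a softmax-weighted combination of the two Gaussian scores, and the reverse closed loop lands near one of the two targets.

```python
import numpy as np

# Score feedback for p_0 = (1/2) delta_a + (1/2) delta_b with a, b in ker A,
# for a hypothetical double-integrator example (ker A = {x2 = 0}).
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
a = np.array([-1.0, 0.0])
b = np.array([1.0, 0.0])

def Sigma(s):
    return np.array([[2*s**3/3, -s**2], [-s**2, 2*s]])

def score(s, x):
    # since a, b lie in ker A, e^{-sA} a = a and e^{-sA} b = b, so the
    # mixture score is a softmax-weighted sum of two Gaussian scores
    Sinv = np.linalg.inv(Sigma(s))
    qa = -0.5 * (x - a) @ Sinv @ (x - a)
    qb = -0.5 * (x - b) @ Sinv @ (x - b)
    m = max(qa, qb)                      # numerically stable weights
    wa, wb = np.exp(qa - m), np.exp(qb - m)
    return -(wa * Sinv @ (x - a) + wb * Sinv @ (x - b)) / (wa + wb)

T, N = 1.0, 4000
h = T / N
x = np.array([2.0, 1.0])
for k in range(N - 40):                  # stop just before t = T
    s = T - k * h
    u = B.T @ score(s, x)
    x = x + h * (A @ x + B @ u)

# the trajectory ends near one of the two targets
assert min(np.linalg.norm(x - a), np.linalg.norm(x - b)) < 1.0
```

Which target is selected depends on the initial condition: far from both, the softmax weights quickly collapse onto the nearer mode (in the Mahalanobis metric of $\Sigma_{T-t}$), after which the dynamics reduce to the single-target case.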
### Numerical Example
The following is an example of a four-dimensional linear system, with two position coordinates, two velocity coordinates, and accelerations as the controls. The system, which we first stabilize to the origin, is
\begin{align}
\begin{aligned}
\dot{x}_1 &= x_2, & \dot{x}_2 &= x_1 + u_1, \\
\dot{x}_3 &= x_4, & \dot{x}_4 &= x_3 + u_2.
\end{aligned}
\end{align}
The plots show the evolution of the position coordinates.
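Since the method relies on controllability, a quick sanity check on this example (a sketch; assuming the intended dynamics are the two decoupled blocks $\dot{x}_1 = x_2$, $\dot{x}_2 = x_1 + u_1$ and $\dot{x}_3 = x_4$, $\dot{x}_4 = x_3 + u_2$):

```python
import numpy as np

# System matrices for the 4D example (assuming two decoupled blocks:
# x1' = x2, x2' = x1 + u1 and x3' = x4, x4' = x3 + u2)
A = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])
B = np.array([[0., 0.],
              [1., 0.],
              [0., 0.],
              [0., 1.]])

# Kalman controllability matrix [B, AB, A^2 B, A^3 B]
C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(4)])
assert np.linalg.matrix_rank(C) == 4   # full rank: (A, B) is controllable
```

Full rank of the controllability matrix guarantees that $\Sigma_t$ is invertible for $t > 0$, so the Gramian-based feedback above is well defined for this example.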

In the next example, we stabilize to a combination of two target positions with $0$ velocities in both the $x$- and $y$-coordinates.

### Steering to More General Distributions $p_0(x)$
For more general distributions, computing the score of $p_t(x)$ in closed form can be expensive, so one instead learns it by minimizing the score-matching loss:
$$
\mathbb{E}_{x \sim p^{\rm fwd}_t} \left[ \left\| s_\theta(t, x) - \nabla \log p_t(x) \right\|^2 \right].
$$ See the post by [Yang Song](https://yang-song.net/blog/2021/score/) on how this loss can be minimized when one has samples of $p_t(x)$ (which we obtain by running the forward process from multiple initial conditions sampled from $p_0(x)$). Once the score has been learned, we apply the feedback law $u(t,x) = B^T s_\theta(T-t, x) \approx B^T \nabla \log p_{T-t}(x)$.
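A minimal sketch of the learning step at a single fixed time $t$: the forward kernel gives the conditional score $\nabla_x \log K(t,x,y) = -\Sigma_t^{-1}(x - e^{-tA}y)$ as a regression target (this is the denoising score matching trick). Taking $p_0 = \delta_0$ for the hypothetical double-integrator example and a linear model $s_\theta(t,x) = Wx$ (both my illustrative assumptions), least squares recovers the exact score $W = -\Sigma_t^{-1}$.

```python
import numpy as np

# Denoising score matching at a fixed time t: regress a linear model
# s_theta(t, x) = W x onto the conditional score targets
# -Sigma_t^{-1}(x - e^{-tA} y); with p_0 = delta_0 all y = 0.
t = 1.0
Sigma = np.array([[2*t**3/3, -t**2], [-t**2, 2*t]])  # closed-form Gramian
Sinv = np.linalg.inv(Sigma)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.linalg.cholesky(Sigma).T  # x ~ p_t
Y = -X @ Sinv.T                             # conditional score targets
W = np.linalg.lstsq(X, Y, rcond=None)[0].T  # least-squares fit of W x ~ Y

assert np.allclose(W, -Sinv, atol=1e-8)     # recovers the exact score
```

In practice $s_\theta$ is a neural network trained over all $t$ simultaneously, but the linear-Gaussian case makes the mechanics of the regression transparent.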
If one wants the solution of the reverse process to converge in distribution exactly to $p_0(x)$, and not just to the support of $p_0(x)$, it is also important that the forward process is stable and converges to an invariant distribution $p_{\infty}$:
$$p_{\infty}(x) =
\frac{1}{(2\pi)^{d/2} \, (\det \Sigma_{\infty})^{1/2}}
\exp\!\left(
-\frac{1}{2}
x ^\top
\Sigma_{\infty}^{-1}
x
\right),$$
where $$\Sigma_{\infty} = 2 \int_0^{\infty} e^{-sA} \, B B^\top \, e^{-sA^\top} \, ds.$$
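The stationary covariance is easy to verify numerically: when all eigenvalues of $A$ have positive real part (so the forward drift $-A$ is Hurwitz), $\Sigma_\infty$ solves the Lyapunov equation $A\Sigma + \Sigma A^\top = 2BB^\top$. A minimal sketch, with an illustrative stable example of my choosing:

```python
import numpy as np

# Stationary covariance of dX = -AX dt + sqrt(2) B dW exists when -A is
# Hurwitz (eigenvalues of A in the open right half-plane) and solves the
# Lyapunov equation A Sigma + Sigma A^T = 2 B B^T.
A = np.array([[2., 1.], [0., 1.]])   # eigenvalues 2 and 1: -A is Hurwitz
B = np.eye(2)
Q = 2.0 * B @ B.T

# integrate dSigma/ds = -A Sigma - Sigma A^T + Q until it equilibrates
S = np.zeros((2, 2))
h = 1e-3
for _ in range(20000):               # out to s = 20, far past the slowest
    S = S + h * (-A @ S - S @ A.T + Q)   # time constant of the process

# at stationarity the right-hand side vanishes, i.e. the Lyapunov
# equation holds
assert np.allclose(A @ S + S @ A.T, Q, atol=1e-3)
```

The fixed point of this iteration is exactly the solution of the Lyapunov equation, so the residual check confirms that the integral formula for $\Sigma_\infty$ and the algebraic equation agree.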
In general, the denoising algorithm can be summarized as follows.
$$\boxed{\begin{aligned}
&\textbf{Algorithm (Score Matching for LTI Feedback Control)}
\\[4pt]
&1.\ \text{Sample } x^i_0 \sim p_0(x). \\[4pt]
&2.\ \text{Simulate the forward process from each } x^i_0: \\
& \qquad \qquad dX = -AX\,dt + \sqrt{2}B\,dW. \\[4pt]
&3.\ \text{Learn } s_\theta \text{ by minimizing the score matching loss:} \\
&\qquad\qquad \mathcal{L}(\theta) = \mathbb{E}_{x \sim p^{\rm fwd}_t} \left[ \left\| s_\theta(t, x) - \nabla \log p_t(x) \right\|^2 \right]. \\[4pt]
&4.\ \text{Stabilize the control system using the learnt score:} \\
& \quad 4.1 \ \text{If the goal is density control to } p_0(x), \text{ sample } x_i(0) \sim p_\infty(x): \\
&\qquad\qquad \dot{x}_i(t) = A x_i(t) + B u(t), \quad u(t) = B^T s_\theta(T-t, x_i(t)). \\[4pt]
& \quad 4.2 \ \text{If the goal is to steer the system to the support of } p_0(x),\ x_i(0) \text{ can be arbitrary:} \\
&\qquad\qquad \dot{x}_i(t) = A x_i(t) + B u(t), \quad u(t) = B^T s_\theta(T-t, x_i(t)). \\[4pt]
\end{aligned}}
$$
## References
The contents of this blog are based on the presentation in our preprint, [Score Matching Diffusion Based Feedback Control and Planning of Nonlinear Systems](https://www.researchgate.net/publication/390746280_Score_Matching_Diffusion_Based_Feedback_Control_and_Planning_of_Nonlinear_Systems) by myself, Darshan Gadginmath and Fabio Pasqualetti.