# Denoising Nonlinear Systems with Drift
In this lecture, we consider the more general case of control-affine systems with drift.
Consider the control system
$$\dot{x} = g_0(x) + \sum_{i=1}^m u_i(t)g_i(x)$$
Feedback control design at this level of generality is hard. A general solution to designing the control so that the system converges to the noise distribution is not known. This is partly why we leaned so heavily on special structure in the cases of [driftless systems](https://hackmd.io/QjdpXVeOSrWwPv_ON3QJNA) and [linear time invariant systems](https://hackmd.io/FvqH88QjSliUN983DU8QVw). However, if the time-reversed system:
$$\dot{x} = -g_0(x) + \sum_{i=1}^m u_i(t)g_i(x) \tag{rev-sys}$$
is **globally reachable** — meaning that for every $T>0$ and $z,y \in \mathbb{R}^d$ there exists $\mathbf{u} = [u_1,\ldots,u_m]^T$ such that (rev-sys) satisfies
$$x(0) = y, \quad x(T) = z$$
— then we can still apply our denoising methodology for feedback control.
The workhorse of this approach is the following positivity theorem:
---
**Claim (Informal Positivity Theorem)**: Consider the Fokker-Planck equation
$$\partial_t p_t = -\mathcal{Y}^*_0 p_t + \sum_{i=1}^m (\mathcal{Y}_i^2)^* p_t$$
corresponding to the (Stratonovich) SDE:
$$\mathrm{d}X = -g_0(X)\,dt + \sqrt{2}\sum_{i=1}^m g_i(X) \odot dW_i$$
If the corresponding control system
$$\dot{x} = -g_0(x) + \sum_{i=1}^m u_i(t)\, g_i(x)$$
is globally reachable for any $T>0$, then the solution satisfies
$$p_t(x) > 0 \quad \text{for all } t>0 \text{ and } x \in \mathbb{R}^d$$
assuming $p_0(x)\,dx = \mathbb{P}(X(0) \in dx)$.
---
Recall that $\mathcal{Y}_i = \sum_{j=1}^d g^j_i(x)\, \partial_{x_j}$ is the differential operator associated with the vector field $g_i(x)$, and $\mathcal{Y}^*_i$ is its (formal $L^2$) adjoint.
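Written out explicitly, integration by parts gives the adjoint in divergence form:
$$\mathcal{Y}_i f = \sum_{j=1}^d g_i^j\,\partial_{x_j} f, \qquad \mathcal{Y}_i^* p = -\sum_{j=1}^d \partial_{x_j}\!\bigl(g_i^j\, p\bigr) = -\operatorname{div}(g_i\, p) = -\mathcal{Y}_i p - (\operatorname{div} g_i)\, p$$
This identity relating $\mathcal{Y}_i^*$ and $-\mathcal{Y}_i$ will reappear below when we choose the drift correction.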
## A First, Bad Attempt at Denoising-Based Control
The positivity result guarantees $p_t > 0$, so we can rewrite the Fokker-Planck equation as a continuity equation:
$$\partial_t p_t = -\mathcal{Y}^*_0 p_t + \sum_{i=1}^m \mathcal{Y}_i^*\!\left( \frac{\mathcal{Y}_i^* p_t}{p_t}\,p_t\right)$$
Hence, the forward process has the same time marginals as the ODE:
$$\dot{x}(t) = -g_0(x) + \sum_{i=1}^m \frac{\mathcal{Y}_i^* p_t}{p_t}\, g_i(x)$$
Reversing time, the corresponding reverse process is:
$$\dot{x}_{\rm rev}(t) = g_0(x_{\rm rev}) - \sum_{i=1}^m \frac{\mathcal{Y}_i^* p_{T-t}}{p_{T-t}}\, g_i(x_{\rm rev})$$
The solution satisfies $x_{\rm rev}(t) \sim p_{T-t}$. Therefore:
- If $p_0 = \delta_{x_0}$ for some $x_0 \in \mathbb{R}^d$, the control system is steered to $x_0$ as $t \to T$, using the feedback law $u_i = -\frac{\mathcal{Y}_i^* p_{T-t}}{p_{T-t}}$
- More generally, if $\Omega$ is the support of $p_0$, then the system is steered to $\Omega$ as $t \to T$.
## A Fix
While this first scheme is a valid denoising procedure, the quantity $\frac{\mathcal{Y}_i^* p_{T-t}}{p_{T-t}}$ is difficult to learn in practice. A more natural object is the generalized score $\frac{\mathcal{Y}_i\, p_{T-t}}{p_{T-t}}$, the quantity we worked with in previous posts. To obtain a reverse ODE in terms of this quantity, we introduce a modified forward SDE with a drift correction:
$$\mathrm{d}X = -g_0(X)\,dt + \sum_{i=1}^m v_i(X)\,g_i(X)\,dt + \sqrt{2}\sum_{i=1}^m g_i(X) \odot dW_i$$
where $v_i(x) = \operatorname{div} g_i(x)$.
This correction is chosen precisely so that the Fokker-Planck equation takes the form:
$$\partial_t p_t = -\mathcal{Y}^*_0 p_t - \sum_{i=1}^m \mathcal{Y}_i^*\mathcal{Y}_i\, p_t$$
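To verify this, one can use the adjoint identity $\mathcal{Y}_i^* = -\mathcal{Y}_i - v_i$ (with $v_i = \operatorname{div} g_i$ acting by multiplication):
$$(\mathcal{Y}_i^*)^2 p + \mathcal{Y}_i^*(v_i\, p) = \mathcal{Y}_i^*\bigl(-\mathcal{Y}_i p - v_i\, p + v_i\, p\bigr) = -\mathcal{Y}_i^* \mathcal{Y}_i\, p$$
where $(\mathcal{Y}_i^*)^2 p$ is the diffusion term of the original Fokker-Planck equation and $\mathcal{Y}_i^*(v_i\, p)$ is the contribution of the added drift $v_i\, g_i$.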
Repeating the continuity-equation algebra with $\mathcal{Y}_i$ in place of $\mathcal{Y}_i^*$, the corresponding reverse process is:
$$\dot{x}_{\rm rev}(t) = g_0(x_{\rm rev}) + \sum_{i=1}^m \frac{\mathcal{Y}_i\, p_{T-t}}{p_{T-t}}\, g_i(x_{\rm rev})$$
The feedback term $\frac{\mathcal{Y}_i\, p_{T-t}}{p_{T-t}}$ is now a generalized score that can be learned from data by standard regression techniques, just as in the Euclidean setting.
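Concretely, the same integration-by-parts trick as implicit score matching in the Euclidean setting removes the unknown score from the regression objective: assuming sufficient decay at infinity,
$$\mathbb{E}_{x \sim p_t}\sum_{i=1}^m \Bigl(s_i(x) - \frac{\mathcal{Y}_i\, p_t}{p_t}(x)\Bigr)^{\!2} = \mathbb{E}_{x \sim p_t}\sum_{i=1}^m \Bigl[\, s_i(x)^2 + 2\,\mathcal{Y}_i s_i(x) + 2\, v_i(x)\, s_i(x)\Bigr] + C$$
where $C$ does not depend on $s$, since $\int s_i\, \mathcal{Y}_i p_t\,\mathrm{d}x = -\mathbb{E}_{p_t}\bigl[\mathcal{Y}_i s_i + v_i s_i\bigr]$. The right-hand side can be estimated from forward samples alone.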
## Numerical Example: Inverted Pendulum
Consider a single inverted pendulum with state $x = (\theta, \omega)$, where $\theta$ is the angle measured from the upright position and $\omega = \dot{\theta}$ is the angular velocity. The dynamics are
$$\dot{\theta} = \omega, \qquad \dot{\omega} = \alpha \sin\theta - \gamma \omega + u$$
where $\alpha = g/\ell$ is the gravitational parameter, $\gamma$ is a damping coefficient, and $u$ is a scalar torque input. This is a control-affine system with drift
$$g_0(x) = \begin{pmatrix} \omega \\ \alpha \sin\theta - \gamma \omega \end{pmatrix}, \qquad g_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
so the system takes the form $\dot{x} = g_0(x) + u\, g_1$. The equilibrium $x = (0, 0)$ — pendulum balanced upright — is unstable under the uncontrolled dynamics.
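As a sanity check on the instability claim, here is a minimal Euler simulation sketch in Python. The parameter values $\alpha = 9.81$, $\gamma = 0.1$ are my own illustrative choices, not taken from the animation:

```python
import math

# Illustrative parameter values (my own choice, not from the animation):
ALPHA = 9.81   # alpha = g / ell, with ell = 1 m
GAMMA = 0.1    # damping coefficient

def g0(x):
    """Drift vector field g_0(x) for the state x = (theta, omega)."""
    theta, omega = x
    return (omega, ALPHA * math.sin(theta) - GAMMA * omega)

G1 = (0.0, 1.0)  # constant actuation vector field g_1

def step(x, u, dt):
    """One explicit Euler step of xdot = g_0(x) + u * g_1."""
    f = g0(x)
    return (x[0] + dt * (f[0] + u * G1[0]),
            x[1] + dt * (f[1] + u * G1[1]))

# A tiny perturbation of the upright equilibrium grows under u = 0,
# illustrating that (0, 0) is unstable for the uncontrolled dynamics.
x = (1e-3, 0.0)
for _ in range(2000):          # integrate over t in [0, 2]
    x = step(x, u=0.0, dt=1e-3)
print(x)  # theta has grown by orders of magnitude from 1e-3
```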
### The forward noising process
We inject noise through the actuation channel, turning the deterministic system into the SDE
$$\mathrm{d}X = -g_0(X)\,dt + \sqrt{2}\ g_1 \odot dW$$
In coordinates this can be expressed as:
$$\mathrm{d}\theta = -\omega\,dt, \qquad \mathrm{d}\omega = -\bigl(\alpha\sin\theta - \gamma\omega\bigr)\,dt + \sqrt{2}\,\, dW$$
Since $g_1$ is a constant vector, $\operatorname{div} g_1 = 0$, so the drift correction $v_1 = \operatorname{div} g_1$ from the general theory vanishes.
Starting from a concentrated initial distribution $p_0$ supported near some configuration — say, the pendulum balanced upright at $\theta = 0$ — the forward SDE diffuses the distribution outward, as visualized in the following gif.
**Forward noising.** Starting from the upright equilibrium $(\theta, \omega) = (0, 0)$, the forward SDE
$$\mathrm{d}\theta = -\omega\,dt, \qquad \mathrm{d}\omega = -(\alpha\sin\theta - \gamma\omega)\,dt + \sqrt{2}\,\, dW$$
diffuses the pendulum away from vertical. The negated drift $-g_0$ runs the dynamics in reverse while the Brownian noise excites the angular velocity, which couples into the angle through the kinematics. Over time the pendulums spread out, exploring the full phase space, as can be seen in the following simulation.
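This forward pass can be sketched with a plain Euler–Maruyama loop. Again the parameters are my own illustrative values, and the sample count is kept small for readability:

```python
import math
import random

random.seed(0)

# Illustrative parameters (my own choice):
ALPHA, GAMMA = 9.81, 0.1

def g0(x):
    theta, omega = x
    return (omega, ALPHA * math.sin(theta) - GAMMA * omega)

def forward_noising(x0, T=1.0, dt=1e-3):
    """Euler-Maruyama for dX = -g0(X) dt + sqrt(2) g_1 dW, g_1 = (0, 1).

    Since g_1 is constant, the Ito and Stratonovich interpretations
    coincide and the drift correction v_1 = div g_1 vanishes.
    """
    x = x0
    for _ in range(int(T / dt)):
        f = g0(x)
        dw = random.gauss(0.0, math.sqrt(dt))     # Brownian increment
        x = (x[0] - dt * f[0],
             x[1] - dt * f[1] + math.sqrt(2.0) * dw)
    return x

# Diffuse a cloud of pendulums initialised at the upright equilibrium.
samples = [forward_noising((0.0, 0.0)) for _ in range(200)]
spread = max(abs(s[1]) for s in samples)
print(spread)  # the angular velocities have spread well away from 0
```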

We then train a neural network $s_\theta(x,t)$ on samples from these forward trajectories to learn the generalized score $\frac{\mathcal{Y}_1 p_t}{p_t}$. Applying this feedback law to the control system, initialising it from multiple sampled initial conditions, we observe the following behavior.

In general, the denoising algorithm can be summarized in the following way.
$$\boxed{\begin{aligned}
&\textbf{Algorithm (Denoising Control of Control-Affine Systems)}
\\[6pt]
&\textbf{Setup:} \text{ Control-affine system } \dot{x} = g_0(x) + \sum_{i=1}^m u_i\, g_i(x). \\[6pt]
&1.\ \text{Choose an initial distribution } p_0 \text{ supported on the target set } \Omega. \\[4pt]
&2.\ \text{Simulate the forward noising process (Euler–Maruyama), with } v_i = \operatorname{div} g_i: \\
&\qquad\qquad \mathrm{d}X = -g_0(X)\,dt + \sum_{i=1}^m v_i(X)\, g_i(X)\,dt + \sqrt{2}\sum_{i=1}^m g_i(X) \odot dW_i \\
&\qquad\quad\text{Store samples } \{x^{(j)}_{t_k}\}_{j=1}^N \text{ at each time step } t_k. \\[4pt]
&3.\ \text{Train a score network } s_\theta(x,t) \in \mathbb{R}^m \text{ via score matching so that } s_{\theta,i}(x,t) \approx \tfrac{\mathcal{Y}_i p_t}{p_t}(x). \\[4pt]
&4.\ \text{Steer from any initial condition } x(0) \text{ using the reverse ODE:} \\
&\qquad\qquad \dot{x}(t) = g_0(x) + \sum_{i=1}^m u_i(t)\, g_i(x), \qquad u(t) = s_\theta(x(t),\, T-t). \\[4pt]
&\qquad\quad\text{As } t \to T,\ \text{the state } x(t) \to \Omega.
\end{aligned}}$$
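To make step 4 concrete, here is a runnable sketch of the reverse-ODE rollout for the pendulum. The trained network is replaced by a stand-in `score_stub` (a hand-tuned proportional-derivative pull toward the origin, purely illustrative and not a learned score), and the parameters are again my own choices:

```python
import math

# Illustrative parameters; the gains in score_stub are hand-tuned.
ALPHA, GAMMA = 9.81, 0.1
T = 1.0                      # diffusion horizon

def g0(x):
    theta, omega = x
    return (omega, ALPHA * math.sin(theta) - GAMMA * omega)

def score_stub(x, t):
    """Stand-in for the trained network s_theta(x, t).

    A real implementation would return the learned generalized score
    Y_1 p_t / p_t evaluated at (x, t); here a simple proportional-
    derivative pull toward the target (0, 0) keeps the sketch runnable.
    """
    return -30.0 * x[0] - 8.0 * x[1]

def reverse_rollout(x0, dt=1e-3):
    """Integrate xdot = g0(x) + u * g_1 with u(t) = s(x(t), T - t)."""
    x = x0
    for k in range(int(T / dt)):
        u = score_stub(x, T - k * dt)
        f = g0(x)
        x = (x[0] + dt * f[0], x[1] + dt * (f[1] + u))
    return x

x_final = reverse_rollout((0.5, 0.0))
print(x_final)  # the state has been steered close to the origin
```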
Note: While the term *stabilization* is used in the title of the animation, strictly speaking the controlled system is not guaranteed to be Lyapunov stable in any sense. It would be more accurate to call this *steering*.
---
The numerical example considered above is borrowed from the work of [Mei et al.](https://arxiv.org/abs/2504.00238), where a similar time-reversal scheme is introduced. One difference between the setting of that paper and this post is that the reverse process introduced here is deterministic, whereas Mei et al. consider stochastic reverse processes. In that sense the choice here is closer to my work with Darshan Gadginmath and Fabio Pasqualetti: [Score Matching Diffusion Based Feedback Control and Planning of Nonlinear Systems](https://arxiv.org/abs/2504.09836). It is also more standard in the diffusion literature to consider stochastic reverse processes, as [Mei et al.](https://arxiv.org/abs/2504.00238) do. This raises a natural question: in a control setting, what is the best choice of reverse process? These alternative stochastic formulations, and the tradeoffs they introduce, will be the subject of the next post.