Feedback Control Using Diffusion Models

# Path Planning and Feedback Control using Denoising Diffusion In this blogpost, we discuss an approach to control a nonlinear system using diffusion models. Before we address the general control problem, we first introduce denoising diffusion models, and how diffusion model can be used for path planning in a bounded compact domain. ## Planning Problem Suppose we have a domain a compact domain $\Omega$ in which, say a robot evolves. From any given point $x_0 \in \Omega$, we want to find a trajectory $\gamma(t)$ such that $\gamma(0) = x_0$ and $\gamma(T) \in \Omega_{\rm target}$. There are many ways to solve this problem in literature: RRT, dynamic programming, potential field based methods, harmonic functions, etc. Here, we present a generative modeling inspired approach to solve this problem. Recently, there have been some attempts to apply [diffusion models](https://arxiv.org/abs/2011.13456) to this problem. Diffusion models are an approach to sample from an underlying probability distribution based on data. This has proved very fruitful for image generation. One existing way in which diffusion models have been extended for planning by thinking of $\gamma(t)$ as an image. See a [recent survey](https://arxiv.org/abs/2408.10266) and references therein. However, there are some drawbacks to these existing approaches, as one performs diffusion on the space of trajectories. In the following post we discuss how instead path planning can be performed using diffusion on the state space instead. The state space (finite dimensional) of a robot is much smaller in dimension that the set of trajectories (infinite dimensional for continuous time models) of a robot. Lets see how we can use denoising diffusion can be used to solve this problem in a more tractable way. For this purpose, as in classical diffusion mdoels, we introduce to steps: 1) Forward noising phase 2) Reverse denoising phase **Forward Noising Phase** For the forward phase, we solve the following stochastic differential equation. \begin{align} & \textup{d} X^\text{f} = \sqrt{2D } \, \textup{d} W + \textup{d} Z \\ & X^f(0) \in \Omega_{\rm target} \end{align} Here, $W$ is the brownian motion. And $Z$ is a process that constraints the system in $\Omega$. This is an instance of a reflected stochastic differential equation. This equation can be seen as the continuous time limit of the discrete iteration \begin{align} & \tilde{X}_{t+1} = X_t + \sqrt{2D } \Delta W_t \\ & X_{t+1} = {\rm Project}_{\Omega}(\tilde{X}_t) \\ & X^f(0) \in \Omega_{\rm target} \end{align} where the random variables $\Delta W_t$ are independent and identically distributed normal random variables with expected value zero and variance $\Delta t$. The ${\rm Project}_{\Omega}$ is the projection operation onto the set $\Omega$. Suppose $\Omega_{\rm target}$ is a single point, the forward process for the evolution of mutliple samples looks like in the following gif. | ![forward_process_animation](https://hackmd.io/_uploads/H1o1PWxAJe.gif) | |:-----------------------------------------------------------------------:| | Forward noising process with obstacles with samples initialized at target point $(0.75,-0.75)$ | Now, if $p^{\rm fwd}_t$ denotes the probability density of the forward process. It is known that $p^{\rm fwd}_t$ evolves according to the heat equation. \begin{align} \partial_t p^{\rm fwd}_t = \Delta p^{\rm fwd}_t \\ \vec{n}(x)\cdot \nabla p^{\rm forward}_t(x) = 0 ~~~ \mbox{on} ~~~ \partial \Omega , \\ p^{\rm fwd}_0 = c\mathbf{1}_{\Omega_{\rm target}} \end{align} where $c$ is some normalization constant and $\vec{n}(x)$ is the unit vector normal to the boundary $\partial \Omega$ of the domain $\Omega$. One could also replace $c\mathbf{1}_{\Omega_{\rm target}}$ with some dirac mass $\delta_{x_{\rm target}}$ at a target point $\delta_{x_{\rm target}}$. The goal is to find trajectories from any point in space to $x_{\rm target}$. However, we have gone in the other direction. The forward process takes everything from $x_{\rm target}$ and spreads everywhere in space. In terms of the heat equation, this is characterized by the fact that $\lim_{t \rightarrow \infty}p^{\rm fwd}_t \rightarrow c\mathbf{1}_{\Omega}$ to the uniform distribution in $\Omega$. Next, we try to *reverse* this process. Toward this end, note that the heat equation can be written as \begin{equation} \partial_t p^{\rm fwd}_t = \nabla \cdot( \frac{\nabla p^{\rm fwd}_t}{p^{\rm fwd}_t} p^{\rm fwd}_t) \end{equation} where $\nabla \cdot$ is the divergence operator. To see how this equation can be used to achieve reverse we introduce the continuity equation. Let $v(t,x)$ be a velocity field. Then consider the following partial differential equation, \begin{equation} \partial_t p = -\nabla \cdot( v(t,x) p). \end{equation} This equation captures the evolution of the probability density function of the ordinary differential equation, \begin{align} \dot{X} = v(t,X)\\ \end{align} Following plot shows the 2D projection of a [3D Lorentz system](https://) and its corresponding density evolution according to the continuity equation. | ![lorenz_system_combined(2)](https://hackmd.io/_uploads/HJiuIN3VJx.gif) | |:-----------------------------------------------------------------------:| | Evolution of particles in a flow governed by continuity equaton. | Now, lets apply this idea to reverse the forward process using the relation between the heat equation and the continuity equation. If we make the choice \begin{align} v(t,x) =-\frac{\nabla p^{\rm fwd}_{t}(x)}{p^{\rm fwd}_{t}(x)} \end{align} then another forward process that also flows according to the heat equation is the ODE \begin{align} \dot{X} = -\frac{\nabla p^{\rm fwd}_{t}(X)}{p^{\rm fwd}_{t}(X)} \end{align} **Remark**: While both the noisy forward process and the ODE above have identical pdfs $p^{\rm fwd}_{t}$, they are very different processes microscopically at the individual sample level. One is a stochastic process, and the second, deterministic. **Reverse phase** Now, note that $p^{\rm fwd}_{t}$ takes the system from $\delta_{x_{\rm target}}$ to the uniform distribution. But we want to go in the opposite direction. If we could compute the quantity $\frac{\nabla p^{\rm fwd}_{t}}{p^{\rm fwd}_{t}}$, known as the **score**, then we go in the opposite direction by considering the reverse process \begin{align} \dot{X} = \frac{\nabla p^{\rm fwd}_{T-t}(X)}{p^{\rm fwd}_{T-t}(X)} \end{align} where $T$ is some large time instant so that $p^{\rm fwd}_{T}$ is approximately equal to the uniform distribution. So the pseudoalgorithm for the noising denoising planning is as follows: 1) Run forward noising process long enough so the samples converge to uniform distribution. 2) Learn the score function. 3) Initialize the reverse process by sampling from uniform distribution over the domain. 4) Run the reverse process using the score function. Now, for step 2, how do we learn this magical object known as the score $\frac{\nabla p^{\rm fwd}_t}{p^{\rm fwd}_t}$ ? This is where the idea of score matching comes in. We saw that **planning via diffusion** requires access to the **score function**: $$ s(t,x) := \frac{\nabla p^{\rm fwd}_t(x)}{p^{\rm fwd}_t(x)}, $$ which allows us to reverse the forward diffusion and guide particles toward the target. But how do we estimate this score function in practice? --- ### Score Matching **Score matching** is a technique that allows us to learn the score function without directly estimating the density $p^{\rm fwd}_t(x)$ Instead of learning the density, we train a model $s_{\theta}(t, x)$ to approximate the score by minimizing the expected squared error: $$ \mathbb{E}_{x \sim p^{\rm fwd}_t} \left[ \left\| s_\theta(t, x) - \nabla \log p^{\rm fwd}_t(x) \right\|^2 \right]. $$ We consider a parametric function $s_\theta(x) \approx \nabla \log p(x)$, and define the **score matching loss** as: $$ \mathcal{L}_{\text{SM}}(\theta) = \mathbb{E}_{x \sim p(x)} \left[ \left\| s_\theta(x) - \nabla \log p(x) \right\|^2 \right] $$ This expression depends on the **true score** $\nabla \log p(x)$, which we don't know. So, we expand and simplify it: \begin{align} \mathcal{L}_{\text{SM}}(\theta) = \mathbb{E}_{x \sim p(x)} [s^2_\theta(x) +(\nabla \log p(x))^2 -2s_\theta(x) \nabla \log p(x)] \\ = \mathbb{E}_{x \sim p(x)} s^2_\theta(x) + \text{const} - 2\mathbb{E}_{x \sim p(x)}s_\theta(x)p(x) \nabla \log p(x) \\ = \mathbb{E}_{x \sim p(x)} s^2_\theta(x) + \text{const} - 2\int s_\theta(x) \nabla p(x) dx \end{align} Apply **integration by parts** to each component (assuming vanishing boundary terms): \begin{align} \int s_\theta(x) \nabla p(x) dx = - \int \nabla \cdot s_\theta(x) p(x) dx= - \mathbb{E}_{p(x)} \nabla \cdot s_\theta(x) \end{align} So, the loss becomes: $$ \mathcal{L}_{\text{SM}}(\theta) = \mathbb{E}_{p(x)} \left[ \| s_\theta(x) \|^2 + 2 \nabla \cdot s_\theta(x) \right] + \text{const} $$ Therefore, the score can be learned purely by computing the emperical average of $\| s_\theta(x) \|^2$ and $\nabla \cdot s_\theta(x)$. In summary the computational steps for denoising are the following, 1) First we run multiple samples of $X(t)$ in the forward direction. 2) Then we learning a model of the score function $s_{\theta}(t,x)$ using the score matching loss. 3) Once trained, we can use $s_\theta(t, x)$ in a reverse-time ODE: $$ \dot{X}_t = s_\theta(T - t, X_t), $$ to generate paths that start from a uniform distribution and reach the target. The reverse process looks like in the following gif (if you tweak your hyperparameters for your training really well). |![reverse_process_animation](https://hackmd.io/_uploads/H1jZXVgAkg.gif) | |:-----------------------------------------------------------------------:| | Reverse denoising process transporting a family of initial conditions to target point $(0.75,-0.75)$. Next, we want to apply this idea to design feedback controllers for *control affine systems* of the form \begin{align} \dot x = g(x,u) = g_0(x)+\sum_{i=1}^m g_i(x) u_i \end{align} A challenge is this case is that there are constraints in the directions along which we can add noise to the samples according to the vector fields $g_i(x)$. Another difficulty is that it is not clear if the forward noising process can be made to converge to a given noise distribution, like the uniform distribution or a Gaussian. For this reason, we have to solve two problems in this more general case. 1) Construct a forward process for which the the probability distribution converges to a given noise distribution. 2) Reverse the process along admissible control trajectories. For the purpose of simplicity, we consider the simpler control system, with $g_0(x) \equiv 0$. Therefore, the system becomes *driftless*. \begin{align} \dot x = \sum_{i=1}^m g_i(x) u_i \end{align} We choose the following forward process. \begin{align} dX^f = &\sum_{i}^m v_i(X^f)g_i(X^f) \\ \nonumber &+ \sqrt{2}\sum_{i}^m g_i(X^f) \odot dW + \textup{d} Z. \end{align} Here, the goal of $v_i(x)$ is to ensure that the noising process makes the system converge to the noise distribution. The noise in the process acts only along the control vector fields $g_i(x)$ through the term $g_i(x) \odot dW$. Now, suppose that we can choose such a $v_i(x)$ such that $\lim_{t \rightarrow \infty}p^{fwd}_t = p_{\rm noise}$, then in order to understand how to reverse the process, we have to look at the analog of the heat equation for this stochastic differential equation. \begin{align} \partial_t p^{\rm fwd} =\sum_{i=1}^m (Y_i^*)^2 p^{\rm fwd} +\sum_{i=1}^m Y_i^*(w_i(x) p^{\rm fwd}) \end{align} Where $Y^*_i = \sum_{j=1}^d -\frac{\partial}{\partial x_j} (g_i^j ~~~\cdot)$ are differential operators along the vector fields $g_i$, and $w_i(x)$ is a function related to $v_i(x)$ through the operator $Y_i^*$. Similar to the heat equation we can write this equation as a continuity equation, \begin{align} \partial_t p^{\rm fwd}_t &= -\sum_{i=1}^m Y_i^*\left(\dfrac{Y_i p^{\rm fwd}_t}{p^{\rm fwd}_t}\right) p^{\rm fwd_t} +\sum_{i=1}^m Y_i^* ( w_i(x)p^{\rm fwd}) \end{align} Where $Y_i$ is another differential operator associated with $g_i$ given by \begin{align*} Y_i ~ \cdot = \sum_{j=1}^d g^j_i(x) \partial_{x_j} \end{align*} This equation can be alternatively written, by simplifying the differential operators, in terms of more familiar continuity equation as, \begin{equation} \partial_t p = -\nabla \cdot( v(t,x) p). \end{equation} with $v(t,x)$ given by \begin{equation} v(t,x) = -\sum_{i=1}^m g_i(x) [\left(\dfrac{Y_i p^{\rm fwd}_t}{p^{\rm fwd}_t}\right) + w_i(x)] \end{equation} Now, we can use this process to achieve our reversal, by considering the deterministic reverse ODE, \begin{align} \dot{X} = \sum_{i=1}^m g_i(x)[\left(\frac{Y_i p^{\rm fwd}_{T-t}}{p^{\rm fwd}_{T-t}}\right) - w_i(x)] \end{align} If the forward process takes the system from an initial target distribution to the noise distribution, the reverse process takes the system from everywhere to the target set. Hence, we have solved the feedback control problem by viewing *control as a denoising proces*. The generalized score can be learned by using the score loss function as before, and then estimating the quantity $\frac{Y_i p^{\rm fwd}_{T-t}}{p^{\rm fwd}_{T-t}} = Y_i \log p^{\rm fwd}_{T-t} = g_i^T(x) \nabla \log p^{\rm fwd}_{T-t}$ by projecting the estimated score $s_{\theta}(t,x)$ along the direction $g_i(x)$. Following is an example of the time reversal for the unicycle model on a bounded domain. \begin{align} &\dot{x}_1 = u_1 \sin (x_3) \\ &\dot{x}_2 = u_1 \cos (x_3) \\ &\dot{x}_3 = u_2 \end{align} |![big-obst(2)](https://hackmd.io/_uploads/BJPnaseC1e.gif) | |:-----------------------------------------------------------------------:| |Reverse denoising process for the Unicycle model transporting a family of initial conditions to target point $(0,0)$. Unlike the fully actuated case in the first section of this post, the trajectories do not follow straight line paths away from the obstacles. This is due to the constraints in the motion of the unicycle model. This method also works for linear systems of the form \begin{equation} \dot{x} = Ax+Bu. \end{equation} Following is an example for a four dimensional linear system, with two position coordinates and two velocity coordinates, with accelerations as the controls. The system is stabilized to a combinations of two Dirac measures. \begin{align} \begin{aligned} \dot{x}_1 &= x_2, \ \ \dot{x}_2 = x_1 + u_1 \ \ \dot{x}_3 = x_2, \ \ \dot{x}_4 = x_3 + u_2 \end{aligned} \end{align} The gif visualizes the evolution of the position coordinates of the system. ![trajectories](https://hackmd.io/_uploads/ByhmuRtAye.gif) **Does this method avoid the curse of dimensionality?** No. There is indeed big computational advantage gained by replacing an alternative method such as HJB based approaches, with a score matching regression problem, it is likely that the number of samples required to effectively approximate the reverse process scales with the dimension of the state space. This is a potential direction for further research. While in image generation problems very good approximations of the reverse process are not as important, in control problems this is very critical so that your car doesn't land in a ditch. --- The contents of this blog are based on the presentation in our preprint, [Score Matching Diffusion Based Feedback Control and Planning of Nonlinear Systems](https://www.researchgate.net/publication/390746280_Score_Matching_Diffusion_Based_Feedback_Control_and_Planning_of_Nonlinear_Systems) by myself, Darshan Gadginmath and Fabio Pasqualetti. For original ideas on score matching based denoising diffusion as applied to machine learning, the reader can find useful information in Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole. *Score-Based Generative Modeling through Stochastic Differential Equations Authors.* See also the following [blog post](https://yang-song.net/blog/2021/score/) by Yang Song.