# Non-Holonomic Fokker–Planck Equation and Shaping Invariant Distributions In the previous post, we studied how the Fokker–Planck equation characterizes the density evolution of the stochastic system $$ \mathrm{d}x = -\nabla U(x)\, \mathrm{d}t + \sqrt{2}\,\mathrm{d}W + \mathrm{d}Z, $$ and how choosing the drift $-\nabla U$ makes the system converge to the stationary density $$ \rho_\infty(x) = C\, e^{-U(x)}. $$ Thus **any** smooth, strictly positive target density $\rho_d$ can be realized by setting $$U = -\log \rho_d.$$ --- ## Generalization to Non-Holonomic Systems Many mechanical systems, such as cars, aircraft, and mobile robots, cannot actuate in arbitrary directions. These are *non-holonomic*: their motion is constrained by geometry, and this fundamentally changes how probability flows in the state space. How do we extend the approach of the previous post to these kinds of systems? Toward this end, we now consider systems where drift and noise act only along selected vector fields. Consider the controlled dynamics $$ \mathrm{d}x = \sum_{i=1}^m u_i(x)\, g_i(x)\, \mathrm{d}t + \mathrm{d}Z, $$ where $g_i$ are smooth vector fields on $\Omega$. The process $Z$ constrains the state to the compact domain $\Omega$. Suppose we inject drift and noise along these directions, $$ u_i(x)\,\mathrm{d}t = v_i(x)\,\mathrm{d}t + \mathrm{d}W_i, $$ which gives the Stratonovich SDE $$\mathrm{d}x = \sum_{i=1}^m v_i(x)\, g_i(x)\ \mathrm{d}t + \sum_{i=1}^m g_i(x)\,\mathrm{d}W_i + \mathrm{d}Z. $$ | ![nonholonomic_unicycle_langevin-min](https://hackmd.io/_uploads/B1dnNLZZZe.gif) | |:--------------------------------------------------------------------------------:| | A single unicycle performing nonholonomic gradient descent along a potential. | The animation above shows a **single realization** of the non-holonomic Langevin dynamics applied to a unicycle system. The simulated state is $$ (x(t),y(t),\theta(t))\in[0,1]^2\times\mathbb{S}^1. 
$$ The evolution used in the code is the Stratonovich SDE $$ \begin{aligned} \mathrm{d}x &= \cos\theta\,\big(u_1(x,y,\theta)\,\mathrm{d}t + \mathrm{d}W_1\big),\\ \mathrm{d}y &= \sin\theta\,\big(u_1(x,y,\theta)\,\mathrm{d}t + \mathrm{d}W_1\big),\\ \mathrm{d}\theta &= \mathrm{d}W_2, \end{aligned} $$ where $g_1(\theta)=(\cos\theta,\sin\theta)$ is the non-holonomic direction, and $\mathrm{d}W_1,\mathrm{d}W_2$ are independent Brownian noises. The drift injected along $g_1$ is $$ u_1(x,y,\theta) = -\,\partial_{g_1}\,U(x,y) = -\big(\nabla U(x,y)\cdot g_1(\theta)\big), $$ where in this experiment $$ U(x,y) = \tfrac12\big((x-0.5)^2 + (y-0.5)^2\big). $$ One can think of this process as *non-holonomic stochastic gradient descent*. Our goal is to understand how to solve the following feedback control problem: ***Design feedback laws $v_i(x)$ so that the probability $P(x(t) \in dx) = p_t(x)\,dx$ asymptotically converges to a given target probability density $p_{\infty}$.*** Here, $p_t$ is the *law* of the variable $x(t)$. --- To address the feedback control problem we need to understand how $p_t(x)$ evolves. This leads us to the non-holonomic generalization of the Fokker–Planck equation we saw in the last lecture. To define the partial differential equation that governs the evolution of the probability density $p_t(x)$, we introduce the **first-order differential operators** $$ \mathcal{X}_i f := \partial_{x_j}(g_i^j\, f), $$ and their formal adjoints $$ \mathcal{Y}_i f := -g_i^j \partial_{x_j}f, $$ with summation over the repeated index $j$. To see why we call them adjoints of each other, note that integration by parts formally gives $$\int_{\Omega}\mathcal{X}_i f(x)\, g(x)\,dx = \int_{\Omega} f(x)\, \mathcal{Y}_i g(x)\,dx $$ for all compactly supported functions $f(x),g(x)$. Together they generalize the gradient–divergence pair familiar from Euclidean diffusion. 
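The single-realization dynamics above are easy to simulate. Below is a minimal Python sketch (illustrative, not the code behind the animation) that integrates the unicycle SDE with an Euler–Maruyama step and plays the role of $Z$ with a reflecting boundary on $[0,1]^2$; since $g_1$ depends only on $\theta$, whose noise $W_2$ is independent of $W_1$, the Itô and Stratonovich interpretations coincide for this system.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n_steps = 2e-3, 100_000
x, y, theta = 0.9, 0.1, 0.0            # arbitrary initial state
traj = np.empty((n_steps, 2))

for k in range(n_steps):
    c, s = np.cos(theta), np.sin(theta)
    # drift along g_1 only: u_1 = -grad U . g_1, with U = 0.5*((x-0.5)^2 + (y-0.5)^2)
    u1 = -((x - 0.5) * c + (y - 0.5) * s)
    dW1 = np.sqrt(dt) * rng.standard_normal()
    dW2 = np.sqrt(dt) * rng.standard_normal()
    x += c * (u1 * dt + dW1)           # motion only along (cos(theta), sin(theta))
    y += s * (u1 * dt + dW1)
    theta += dW2                       # heading diffuses freely
    # reflecting process Z: fold the position back into [0, 1]^2
    if x < 0.0: x = -x
    if x > 1.0: x = 2.0 - x
    if y < 0.0: y = -y
    if y > 1.0: y = 2.0 - y
    traj[k] = x, y

print(traj[n_steps // 2:].mean(axis=0))   # long-run time average hovers near (0.5, 0.5)
```

Even though the vehicle can only move along its heading, the diffusing angle $\theta$ lets it reach (and equilibrate over) the whole square, which is the controllability phenomenon exploited below.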
## Non-Holonomic Fokker–Planck Equation For Stratonovich dynamics, the density $p_t$ satisfies the partial differential equation (PDE) $$\partial_t p_t= \sum_{i=1}^m \mathcal{X}_i^2 p_t - \mathcal{X}_i(v_i p_t). $$ Assume we can choose $v_i$ so that $$ \partial_t p_t = -\sum_{i=1}^m \big(\mathcal{Y}_i\big)^*\,\mathcal{Y}_i p_t + \big(\mathcal{Y}_i\big)^*(U_i\, p_t), $$ for some functions $U_i$. > **What this PDE means.** > The equation describes how probability mass spreads *only* along the admissible directions $g_i$. Because motion is restricted, diffusion in directions orthogonal to $\mathrm{span}(g_i)$ occurs indirectly, through repeated reorientation. Let $\rho_\infty$ be a desired stationary density. We choose $$U_i = \mathcal{Y}_i \log \rho_\infty. $$ > **Design principle.** > Our goal is to emulate the classical Langevin strategy, following the gradient of the log-density, but now restricted to the directions defined by the vector fields $g_i$. The term $\mathcal{Y}_i \log \rho_\infty$ extracts exactly this directional component. Then, since $U_i\,\rho_\infty = \rho_\infty\,\mathcal{Y}_i\log\rho_\infty = \mathcal{Y}_i\rho_\infty$, $$\partial_t \rho_\infty = -\sum_{i=1}^m (\mathcal{Y}_i)^* \mathcal{Y}_i \rho_\infty + (\mathcal{Y}_i)^*(\mathcal{Y}_i\rho_\infty) = 0,$$ so $\rho_\infty$ is a steady state (invariant density) of the PDE. --- In order to study the long-term behavior of the PDE, it is convenient to consider the following transformation. Define the normalized density $$ f_t := \frac{p_t}{\rho_\infty}. $$ Substituting the control law and using $\mathcal{Y}_i p_t - p_t\,\mathcal{Y}_i\log\rho_\infty = \rho_\infty\,\mathcal{Y}_i f_t$ gives the divergence-form evolution $$ \boxed{ \partial_t p_t = -\sum_{i=1}^m \mathcal{X}_i\!\big(\rho_\infty\, \mathcal{Y}_i f_t\big) } \qquad (f_t = p_t/\rho_\infty). $$ This generalizes the holonomic identity from Lecture 1, $$ \partial_t p_t = \nabla\!\cdot(\rho_\infty \nabla f_t). $$ --- ## Lyapunov Functional for Stability Define the weighted $L^2$ Lyapunov functional $$ \mathcal{F}(p_t) := \int_\Omega (f_t - 1)^2\, \rho_\infty\, dx = \| p_t - \rho_\infty \|_{L^2(\rho_\infty^{-1})}^2. 
$$ Differentiate along the flow, using the boxed PDE and the adjoint relation between $\mathcal{X}_i$ and $\mathcal{Y}_i$: \begin{aligned} \frac{d}{dt}\mathcal{F}(p_t) &= 2\!\int (f_t -1)\,\rho_\infty \,\partial_t f_t\\ &= -2\!\sum_{i=1}^m \int (f_t-1)\,\mathcal{X}_i(\rho_\infty \mathcal{Y}_i f_t)\\ &= -2\!\sum_{i=1}^m \int \mathcal{Y}_i(f_t-1)\,\rho_\infty\,\mathcal{Y}_i f_t\\[4pt] &= -2\sum_{i=1}^m \int \rho_\infty\,|\mathcal{Y}_i f_t|^2 \\ &= -2\,\mathcal{D}(f_t)\le0. \end{aligned} Thus $\mathcal{F}(p_t)$ is non-increasing. --- ### Why Controllability Implies Strict Decay of the Lyapunov Functional > **Connection to control theory.** > Controllability ensures that one can steer the system between any two points using only the allowed directions. > Analytically, the fact that the diffusion spreads everywhere is the key to the uniqueness of the long-term behavior of the process. Suppose the vector fields $\{g_i\}$ render the drift system $$ \dot{x}(t) = \sum_{i=1}^m u_i(t)\, g_i(x(t)) $$ **controllable** on $\Omega$. That is, for any $x_0, x_T \in \Omega$, there exists a control $u(t)$ steering $x(0)=x_0$ to $x(T)=x_T$. Now consider a smooth function $f$ satisfying $$ \mathcal{Y}_i f = -\,g_i \cdot \nabla f = 0, \qquad i=1,\dots,m. $$ Along any controlled trajectory $x(t)$ we have, by the chain rule, $$ \frac{d}{dt} f(x(t)) = \nabla f(x(t)) \cdot \dot{x}(t) = \sum_{i=1}^m u_i(t)\, \nabla f(x(t)) \cdot g_i(x(t)) = \sum_{i=1}^m u_i(t)\,(-\mathcal{Y}_i f(x(t))) = 0. $$ Thus **$f$ is constant along every admissible trajectory**. But controllability means that for any two points $x_0,x_T\in\Omega$ there is a trajectory connecting them. Therefore $$ f(x_0) = f(x_T) \qquad\forall\, x_0,x_T\in\Omega. $$ Hence $$ \fbox{\(\mathcal{Y}_i f = 0\ \forall i \ \Rightarrow\ f \text{ is constant on }\Omega\)} $$ This implies that the dissipation functional $$ \mathcal{D}(f_t) = \sum_{i=1}^m \int_\Omega \rho_\infty\, |\mathcal{Y}_i f_t|^2\, dx $$ vanishes **if and only if** $f_t$ is constant. 
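The integration-by-parts step used above, $\int \mathcal{X}_i f\, g\,dx = \int f\, \mathcal{Y}_i g\,dx$, can be sanity-checked numerically. The sketch below (an illustration, not part of the original experiments) works on a periodic 1D grid with a single field $g_1(x) = a(x)$; the periodic central-difference operator is antisymmetric, so the discrete identity holds up to rounding error.

```python
import numpy as np

N = 256
h = 1.0 / N
x = np.arange(N) * h                     # periodic grid on [0, 1)

def d(u):
    # periodic central difference; as a matrix it is antisymmetric,
    # which is exactly the discrete summation-by-parts property
    return (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * h)

a = 2.0 + np.sin(2.0 * np.pi * x)        # the "vector field" g_1 in 1D
f = np.exp(np.cos(2.0 * np.pi * x))      # arbitrary smooth test functions
g = 2.0 + np.cos(4.0 * np.pi * x)

Xf = d(a * f)                            # X_1 f = (g_1 f)'
Yg = -a * d(g)                           # Y_1 g = -g_1 g'

lhs = np.sum(Xf * g) * h                 # ∫ (X_1 f) g dx
rhs = np.sum(f * Yg) * h                 # ∫ f (Y_1 g) dx
print(lhs, rhs)                          # the two quadratures agree to rounding error
```

The same check works in any dimension; the only requirement is a discretization whose difference operator satisfies summation by parts, mirroring the boundary-term-free integration by parts used in the derivation.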
Because $p_t$ preserves mass, a constant $f_t$ must equal one: $$ 1 = \int_\Omega p_t\,dx = \int_\Omega f_t\,\rho_\infty\,dx \quad\Longrightarrow\quad f_t \text{ constant } \Rightarrow f_t \equiv 1. $$ Therefore the Lyapunov functional $$\mathcal{F}(p_t) = \int_\Omega (f_t-1)^2 \rho_\infty\, dx$$ is strictly decreasing **unless** $p_t = \rho_\infty$. $$\fbox{\(\text{Controllability } \Longrightarrow\ \text{strict decay of }\mathcal{F}(p_t) \text{ whenever } p_t \neq \rho_\infty\)} $$ From [LaSalle's invariance principle](https://en.wikipedia.org/wiki/LaSalle%27s_invariance_principle), one can conclude that $p_t \rightarrow \rho_{\infty}$ asymptotically. --- ## Exponential Stability (Non-Holonomic Poincaré Inequality) We now consider a stronger stability property: can we estimate the rate of convergence? Suppose a magical control theory fairy told us that the following weighted Poincaré inequality is true, $$ \boxed{ \|f - f_{\Omega,\rho_\infty}\|_{L^2(\rho_\infty)} \le C' \left( \sum_{i=1}^m \|\mathcal{Y}_i f\|^2_{L^2(\rho_\infty)} \right)^{1/2} } $$ where $f_{\Omega,\rho_\infty} := \int_\Omega f\,\rho_\infty\,dx$ is the $\rho_\infty$-weighted average of $f$. Since the weighted average of $f_t$ equals $\int_\Omega p_t\,dx = 1$, the inequality gives $\mathcal{F} \le (C')^2\mathcal{D}$. Combining this with $\displaystyle \frac{d}{dt}\mathcal{F} = -2\mathcal{D}$ and Grönwall's inequality, we obtain the exponential decay $$ \boxed{ \|p_t - \rho_\infty\|_{L^2(\rho_\infty^{-1})} \le e^{-t/(C')^2}\, \|p_0 - \rho_\infty\|_{L^2(\rho_\infty^{-1})}. } $$ --- ## When Does the Non-Holonomic Poincaré Inequality Hold? If the vector fields $\{g_i\}$ satisfy the **[bracket generating condition](https://en.wikipedia.org/wiki/Orbit_(control_theory))** and the compact domain $\Omega$ is regular enough, then a non-holonomic Poincaré inequality holds. In summary: **controllability ⇒ asymptotic convergence**; **controllability + domain regularity ⇒ exponential convergence**. ## Summary We have shown how to solve the feedback control problem of shaping the invariant distribution of a process. 
*If* $v_i(x)$ can be chosen so that $$ \partial_t p_t = -\sum_{i=1}^m \big(\mathcal{Y}_i\big)^*\,\mathcal{Y}_i p_t + \big(\mathcal{Y}_i\big)^*(U_i\, p_t), $$ for some functions $U_i$, then setting $U_i = \mathcal{Y}_i \log p_{\infty}$ solves our problem. So the question is: when does this assumption hold? One instance where it does is when $\mathcal{X}_i = - \mathcal{Y}_i$, i.e., when each $g_i$ is divergence-free (since $\mathcal{X}_i f = -\mathcal{Y}_i f + f\,\mathrm{div}\,g_i$). In that case we can take $v_i(x) = -U_i(x) = \mathcal{X}_i \log p_\infty$, and we have solved the feedback control problem. The examples in the following section consider such a case. The more general situation will be addressed in the next lecture. --- # Numerical Simulations We illustrate the behavior of the **non-holonomic Langevin dynamics** on a *unicycle model*. The state $$(x,y,\theta)\in [0,1]^2\times \mathbb{S}^1$$ evolves as \begin{aligned} dx &= \cos\theta\,\big(u_1(x,y,\theta)\,dt + dW_1\big),\\ dy &= \sin\theta\,\big(u_1(x,y,\theta)\,dt + dW_1\big),\\ d\theta &=dW_2. \end{aligned} We choose a target $p_\infty$ that depends only on $(x,y)$. According to the formula from our previous derivation, the feedback controls are $$ u_1(x,y,\theta) = \mathcal{X}_1\log p_\infty(x,y),\qquad u_2\equiv 0, $$ where $\mathcal{X}_1 = \cos(\theta)\,\partial_{x}+\sin(\theta)\,\partial_{y}$. --- ## Experiment 1 — Gaussian Target at \((0.5,0.5)\) In the first experiment, 2000 particles are initialized uniformly at random over the domain. For the target density we choose $$ p_\infty(x,y)\propto \exp\!\left( -\frac{(x-0.5)^2+(y-0.5)^2}{2\sigma^2} \right). $$ The evolution of the sample trajectories is shown in the following gif. | ![nonholonomic_particles-min(1)](https://hackmd.io/_uploads/rJzy9SbbWl.gif) | |:---------------------------------------------------------------------------:| | **Collection of unicycle trajectories converging to a Gaussian** | --- ## Experiment 2 — Gaussian Mixture Target In the second experiment, 2000 particles are initialized at $(0.5,0.5)$. 
For the target density we choose a mixture of four Gaussians, $$ p_\infty(x,y)\propto \sum_{k=1}^4 \exp\!\left( -\frac{(x-\mu_{x,k})^2+(y-\mu_{y,k})^2}{2\sigma^2} \right), $$ with centers $$ (\mu_{x,k},\mu_{y,k})\in \{(0.2,0.8),\,(0.8,0.2),\,(0.2,0.2),\,(0.8,0.8)\}. $$ ### Sample Trajectories | ![nonholonomic_particles-min](https://hackmd.io/_uploads/rJhvwS-bWe.gif) | |:-------------------------------------------------------------------------:| | **Trajectories converging to a four-component Gaussian mixture** | # References [Karthik Elamvazhuthi and Spring Berman. "Density stabilization strategies for nonholonomic agents on compact manifolds." IEEE Transactions on Automatic Control.](https://arxiv.org/abs/2308.15755)