## Overview

The goal of this series of notes is to frame control as a generative modeling problem and to incorporate modern diffusion-style deep learning methods into planning and control.

---

## Generative Modeling

Let's first review what a generative modeling problem is. Let $X$ be a random variable distributed according to a probability density $p_X(x)$, and let $Y$ be another random variable distributed according to $p_Y(y)$. The generative modeling problem is to find $G_\theta$ such that

$$Y \approx G_\theta(X).$$

---

## Pushforward (Measure Transport) Formulation

One can also use *measure pushforward notation* to pose this problem. The map $G_\theta: \mathbb{R}^d \to \mathbb{R}^d$ induces a transformation of probability measures (and, when they exist, densities) $p_X \mapsto p_{G_\theta(X)}$, denoted by $(G_\theta)_{\#}$. Using this notation, the problem can be posed in terms of probability densities.

:::info
**Goal (density matching):** Find $G_\theta$ such that
$$(G_\theta)_{\#}p_X = p_Y.$$
:::

---

## Control as Generative Modeling

In the control setting, the question takes the following form. Let

$$\frac{dX}{dt} = F(X, U), \qquad X(0) \sim p_{0}(x).$$

Can we find a (possibly time-dependent) feedback law $U_\theta(t,x)$ such that the closed-loop dynamics

$$\frac{dX}{dt} = F\bigl(X, U_\theta(t,X)\bigr), \qquad X(0) \sim p_{0}(x)$$

satisfy the terminal objective $X(T) \approx Y$?

:::info
**Control objective (terminal matching):**
$$X(T) \approx Y.$$
:::

---

## Pushforward View of Controlled Dynamics

One can again use pushforward notation to pose this problem in terms of probability densities. Let $\Phi_U: \mathbb{R}^d \rightarrow \mathbb{R}^d$ be the flow map induced by the controlled dynamics. Given the ODE

$$\frac{dx}{dt} = F\bigl(x, U(t,x)\bigr), \qquad x(0)=x_0,$$

define $\Phi_U(x_0) := x(T)$. Then the problem becomes:

:::info
**Goal (control as transport):** Find $U(t,x)$ such that
$$(\Phi_U)_{\#}p_{0} = p_Y.$$
:::
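To make the transport formulation concrete, here is a minimal numerical sketch of the flow map $\Phi_U$: sample initial conditions from $p_0$, integrate the closed-loop ODE with explicit Euler, and compare the empirical terminal law against $p_Y$. The single-integrator dynamics, the Gaussian choices of $p_0$ and $p_Y$, and the particular affine feedback law are illustrative assumptions made only for this sketch, not part of the formulation above.

```python
# Minimal sketch of "control as transport" (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)

T = 1.0                     # horizon
m0, s0 = -2.0, 0.5          # p_0 = N(m0, s0^2)  (assumed)
m1, s1 = 3.0, 1.5           # p_Y = N(m1, s1^2)  (assumed)

# Affine map y = a*x + b pushing p_0 onto p_Y (exact for 1-D Gaussians).
a = s1 / s0
b = m1 - a * m0

def U(t, x):
    """Feedback realizing the straight-line interpolation
    x(t) = (1 - t/T)*x0 + (t/T)*(a*x0 + b), rewritten as a function of (t, x)."""
    s = t / T
    x0 = (x - s * b) / (1.0 + s * (a - 1.0))   # invert the interpolation for x0
    return ((a - 1.0) * x0 + b) / T

def F(x, u):
    """Single-integrator dynamics dx/dt = u (assumed for illustration)."""
    return u

def Phi_U(x0, steps=200):
    """Flow map Phi_U: integrate the closed-loop ODE with explicit Euler."""
    x, dt = x0.copy(), T / steps
    for k in range(steps):
        x = x + dt * F(x, U(k * dt, x))
    return x

x0 = rng.normal(m0, s0, size=10_000)   # samples from p_0
xT = Phi_U(x0)                         # samples from (Phi_U)_# p_0

print(f"target  p_Y : mean {m1:.3f}, std {s1:.3f}")
print(f"terminal law: mean {xT.mean():.3f}, std {xT.std():.3f}")
```

For one-dimensional Gaussians the affine map $x \mapsto (s_1/s_0)(x - m_0) + m_1$ pushes $p_0$ exactly onto $p_Y$, so the printed terminal statistics should match the target up to sampling error; the methods in the roadmap below construct such transport-realizing feedback laws in much more general settings.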
### Why Generative Modeling for Control?

At first, it might seem like we have only made the problem harder. What once looked like a simple state-to-state control problem has been turned into a probabilistic task of transforming one density into another. This kind of "lifting" can feel like it is taking the problem in the wrong direction.

However, as we will see throughout this series, this perspective actually simplifies certain issues. The probabilistic formulation, in a sense, globally linearizes the problem, which makes it convenient for analysis. In some formulations, even when the underlying control system is nonlinear, the resulting control problem becomes convex. Another benefit is that this viewpoint integrates naturally with modern machine learning techniques such as diffusion models and flow matching. By viewing control as a denoising problem, we can repurpose classic control tools in new ways for feedback synthesis.

---

## Roadmap

We will explore several ways to solve this control problem by adapting existing (non-control) generative modeling methods to the control setting. The following is a non-exhaustive list of topics to be considered:

:::info
1. **Noise-Based Control**
2. **Optimal Transport**
3. **Normalizing Flows**
4. **Denoising Diffusion Models**
5. **Flow Matching**
:::

The focus of this course will be the continuous-time, continuous-state setting, which (in my view) provides the cleanest path for generalizing modern generative modeling methods from machine learning. Discrete-time or discrete-state formulations are also possible, but we will not consider them in this series.

---

## What this Course is Not: Generative Imitation Learning

In imitation learning, generative modeling is typically used to learn a *policy distribution* from demonstrations; this approach has become very popular in ML-based robotics. Given *expert trajectories* $\mathcal{D}=\{(X_t,U_t)\}_{t=0}^{T}$, one fits a conditional generative model (a policy)

$$U_t \sim \pi_\theta(\cdot \mid X_t) \quad \text{(more generally, } U_t \sim \pi_\theta(\cdot \mid c_t)\text{)}$$

so that the learned conditional $\pi_\theta$ reproduces the expert's behavior. The distributional matching target is therefore the expert's *behavior distribution*, expressed through the conditional law $\pi_\theta(U \mid X)$; a minimal sketch of this setup is given at the end of this section for contrast.

In contrast, this course starts from known controlled dynamics and poses feedback synthesis as an explicit measure-transport problem: choose a (possibly time-dependent) feedback law $U_\theta(t,x)$ so that the induced flow map $\Phi_{U_\theta}$ pushes $p_0$ to a desired terminal law:

$$(\Phi_{U_\theta})_{\#}p_0 = p_Y.$$
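To make the distinction concrete, here is a minimal behavior-cloning sketch of the imitation-learning setup: a conditional Gaussian policy $\pi_\theta(u \mid x)$ is fit by least squares to hypothetical expert state-action pairs. The expert, the linear policy class, and the Gaussian noise model are all assumptions made only for illustration.

```python
# Minimal behavior-cloning sketch (hypothetical expert data, for contrast only).
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical expert: linear feedback u = -2*x plus small observation noise.
X = rng.uniform(-3.0, 3.0, size=500)               # visited states
U_expert = -2.0 * X + 0.1 * rng.normal(size=X.shape)

# Conditional generative model pi_theta(u | x) = N(theta0 + theta1*x, sigma^2),
# fit by least squares (maximum likelihood under the Gaussian noise model).
A = np.stack([np.ones_like(X), X], axis=1)
theta, *_ = np.linalg.lstsq(A, U_expert, rcond=None)
sigma = np.std(U_expert - A @ theta)

def policy_sample(x):
    """Sample an action from the learned conditional law pi_theta(. | x)."""
    return theta[0] + theta[1] * x + sigma * rng.normal()

print(f"learned policy: u ~ N({theta[0]:.2f} + {theta[1]:.2f} x, {sigma:.2f}^2)")
```

Here the learned object is a distribution over actions given the current state; nothing in the fit refers to the dynamics $F$ or to a target terminal density $p_Y$, which is exactly the contrast with the transport formulation emphasized above.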