# Neural ODE

###### tags: `normalizing flows` `polynomials` `Jacobi`

> An intuitive proof of the instantaneous change of variables theorem in [[Neural Ordinary Differential Equations, Th. 1]](https://arxiv.org/pdf/1806.07366.pdf)

Below I provide a more intuitive proof of the change of variables theorem for probability distributions driven by ordinary differential equations.

### Theorem

Consider a family of random variables $(\mathbf{z}(t))_{t \in [0, T]}$ whose dynamics are described (almost surely) by the ODE $\dot{\mathbf{z}}(t) = f(\mathbf{z}(t), t)$, with probability densities $p(\mathbf{z}(t))$. Under regularity conditions we have
$$
\frac{\partial \ln p(\mathbf{z}(t))}{\partial t} = -\mathbf{Tr}\bigg(\frac{\partial f}{\partial \mathbf{z}}[\mathbf{z}(t), t]\bigg)
$$

### Proof

#### Step 1: Discrete normalizing flow

The first step starts similarly to the original proof: we consider the discretized system
$$
\mathbf{x}_{t+h} = \mathbf{x}_t + h f(\mathbf{x}_t, t)
$$
for which the Lebesgue change of variables theorem directly gives the well-known discrete normalizing flow formula:
$$
\ln p(\mathbf{x}_{t+h}) = \ln p(\mathbf{x}_t) - \ln \bigg| \mathrm{det}(I + h J_{\mathbf{x}}f)\bigg|
$$
which, for small enough $h$ (so that $\mathrm{det}(I + h J_{\mathbf{x}}f) > 0$), simplifies to $\ln p(\mathbf{x}_t) - \ln \mathrm{det}(I + h J_{\mathbf{x}}f).$

#### Step 2: Algebraic argument

We can now recognize the characteristic polynomial $P[-J_{\mathbf{x}}f]$ of $-J_{\mathbf{x}}f$ in the expression above. Indeed, if we factorize $h$ out we get:
$$
\mathrm{det}(I + h J_{\mathbf{x}}f) = h^n \mathrm{det}\bigg(\frac{1}{h} I - (-J_{\mathbf{x}}f)\bigg) = h^n P[-J_{\mathbf{x}}f]\bigg(\frac{1}{h}\bigg)
$$
Moreover, we know that the roots of the characteristic polynomial of $-J_{\mathbf{x}}f$ are exactly the eigenvalues of $-J_{\mathbf{x}}f$, and that the sum of these eigenvalues is $\mathbf{Tr}(-J_{\mathbf{x}}f)$.
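As a quick numerical sanity check of the identity in Step 2, we can compare $\mathrm{det}(I + h J_{\mathbf{x}}f)$ against $h^n\, P[-J_{\mathbf{x}}f](1/h)$ for a random matrix standing in for the Jacobian (a sketch using NumPy; the matrix `J` and step size `h` below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
J = rng.standard_normal((n, n))  # stand-in for the Jacobian J_x f
h = 1e-3

# Left-hand side: the determinant from the discrete normalizing flow formula
lhs = np.linalg.det(np.eye(n) + h * J)

# Right-hand side: h^n times the characteristic polynomial of -J evaluated at 1/h.
# np.poly(-J) returns the coefficients of det(X I - (-J)) in decreasing degree.
p = np.poly(-J)
rhs = h**n * np.polyval(p, 1.0 / h)

print(lhs, rhs)  # the two agree up to floating-point error
```

Note that `np.poly` computes the characteristic polynomial from the eigenvalues, so this check exercises exactly the factorization used in the proof rather than restating the determinant.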
#### Step 3: Analysis argument

Additionally, any polynomial factorized as $Q(X) = \prod_{i=1}^n (X - a_i)$ (the roots $a_i$ need not lie in the base field) can be written $Q(X) = X^n - X^{n-1}\sum_{i=1}^n a_i + \dots$. Here the $a_i$ are the eigenvalues of $-J_{\mathbf{x}}f$, so $-\sum_{i=1}^n a_i = \mathbf{Tr}(J_{\mathbf{x}}f)$, and we can write
\begin{align}
\ln \mathrm{det}(I + h J_{\mathbf{x}}f) &= \ln\Big(h^n\big(h^{-n} + h^{1-n}\mathbf{Tr}(J_{\mathbf{x}}f) + \dots\big)\Big) \\
&= \ln\Big(1 + h \mathbf{Tr}(J_{\mathbf{x}}f) + O(h^2)\Big)\\
&= h \mathbf{Tr}(J_{\mathbf{x}}f) + O(h^2)
\end{align}
Finally, $\ln p(\mathbf{x}_{t+h}) - \ln p(\mathbf{x}_t) = -h \mathbf{Tr}(J_{\mathbf{x}}f) + O(h^2)$, so dividing by $h$ and taking $h \to 0$ provides the result.

### What does this teach us about the result?

A known version of Jacobi's formula is the following expansion: for a matrix $A$,
$$
\mathrm{det}(\exp(tA)) = 1 + t \mathbf{Tr}(A) + o(t)
$$
so that if we were considering, instead of our non-linear ODE, the linear system
$$
\dot{\mathbf{z}}(t) = A \mathbf{z}(t)
$$
with solution $\mathbf{z}(t) = \exp(tA)\mathbf{z}(0)$, then the change of volume (equivalently, of the log-density) of the parallelepiped transported by the flow would be given to first order by the trace of $A$. What the theorem therefore tells us is that, under some regularity conditions, the rate of change in volume is the same as in the linearized system.
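For the linear system, Jacobi's formula actually holds exactly, not just to first order: $\mathrm{det}(\exp(tA)) = \exp(t\,\mathbf{Tr}(A))$. A small NumPy sketch illustrating this (the generator `A` is an arbitrary random matrix, and the matrix exponential is computed by eigendecomposition, which is valid since a random matrix is almost surely diagonalizable):

```python
import numpy as np

def expm(M):
    """Matrix exponential via eigendecomposition (M assumed diagonalizable)."""
    w, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))  # generator of the linear system z' = A z

for t in [0.5, 1.0, 2.0]:
    # Volume of the unit cube pushed forward by the flow z(0) -> exp(tA) z(0)
    vol = np.linalg.det(expm(t * A))
    # Jacobi's formula in exact form: det(exp(tA)) = exp(t Tr A)
    print(t, vol, np.exp(t * np.trace(A)))
```

The exact identity makes the closing remark concrete: for the linearized flow, the log-volume grows at the constant rate $\mathbf{Tr}(A)$ for all $t$, not merely in the $t \to 0$ limit.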