# Details for Adjoint Method in Neural ODE

###### tags: `NODE`

**In my opinion, this method essentially computes the gradients of a Neural ODE by solving another ODE.**

Let's build a Neural ODE like:

$$ x(t_{n}) = x(t_0) + \int_{t_0}^{t_{n}}f(x(t),t,\theta)\,dt $$

and a loss function $L(x(t_n),y)$, where $y$ is the ground-truth target for the output $x(t_n)$ (to simplify, we just write $L$ for $L(x(t_n),y)$). Our goal is to compute $\dfrac{\partial L}{\partial x(t_0)}$ and $\dfrac{\partial L}{\partial \theta}$. We achieve this goal using the adjoint method.

Define the adjoint state:

$$ a(t) = \frac{dL}{dx(t)} $$

By the chain rule, $a(t) = \dfrac{dL}{dx(t+\epsilon)}\dfrac{dx(t+\epsilon)}{dx(t)} = a(t+\epsilon)\dfrac{dx(t+\epsilon)}{dx(t)}$, so:

$$
\begin{aligned}
\frac{da(t)}{dt} &= \lim_{\epsilon\to0} \frac{a(t+\epsilon)-a(t)}{\epsilon} \\
&= \lim_{\epsilon\to0} \frac{a(t+\epsilon) - a(t+\epsilon)\frac{dx(t+\epsilon)}{dx(t)}}{\epsilon} \\
&= \lim_{\epsilon\to0} \frac{a(t+\epsilon) - a(t+\epsilon)\frac{d}{dx(t)}\left(x(t) + \epsilon f(x(t),t,\theta)+O(\epsilon^2)\right)}{\epsilon} \quad \text{(Taylor series around } x(t)\text{)} \\
&= \lim_{\epsilon\to0} \left(-a(t+\epsilon)\frac{\partial f(x(t),t,\theta)}{\partial x(t)} + O(\epsilon)\right) \\
&= -a(t)\frac{\partial f(x(t),t,\theta)}{\partial x(t)}
\end{aligned}
$$

(Here $a(t)$ is a row vector, so $a(t)\frac{\partial f}{\partial x(t)}$ is a vector-Jacobian product.)

Therefore, we have an ODE whose "initial" condition is given at $t_n$:

$$ a(t_n) = \frac{dL}{dx(t_n)} $$

$$ a(t_0) = a(t_n) + \int_{t_n}^{t_0}-a(t)\frac{\partial f(x(t),t,\theta)}{\partial x(t)}\,dt $$

For $\dfrac{\partial L}{\partial \theta}$, just consider the parameters $\theta$ as a state $\theta(t)$ that happens to be constant in $t$ (i.e. $\frac{d\theta(t)}{dt}=0$). Repeating the derivation above with $x(t)$ replaced by $\theta(t)$, and setting the initial value of this parameter adjoint at $t_n$ to $0$, gives:

$$ \frac{\partial L}{\partial \theta} = \int_{t_n}^{t_0}-a(t)\frac{\partial f(x(t),t,\theta)}{\partial \theta}\,dt $$

![](https://i.imgur.com/yy7JbV5.png)

In summary, both the forward and the backward pass of a Neural ODE are transformed into the problem of solving an ODE.

![](https://i.imgur.com/MP0CjAa.png)
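
Here is a minimal sketch of the whole procedure in JAX, assuming a toy vector field $f(x) = \tanh(Wx+b)$ and a fixed-step Euler solver. The names `odeint_euler` and `adjoint_backward` are my own, not from the paper; the vector-Jacobian products $a\,\partial f/\partial x$ and $a\,\partial f/\partial\theta$ are obtained with `jax.vjp`.

```python
# Minimal sketch of the adjoint method, assuming a toy dynamics function
# and a fixed-step Euler integrator (not the paper's implementation).
import jax
import jax.numpy as jnp

def f(x, t, theta):
    # Toy "neural" vector field: tanh(W x + b).
    W, b = theta
    return jnp.tanh(W @ x + b)

def odeint_euler(x0, t0, tn, theta, N=100):
    # Forward pass: x(t_n) = x(t_0) + \int_{t_0}^{t_n} f dt, Euler steps.
    dt = (tn - t0) / N
    x, t = x0, t0
    for _ in range(N):
        x = x + dt * f(x, t, theta)
        t = t + dt
    return x

def loss(xn, y):
    return 0.5 * jnp.sum((xn - y) ** 2)

def adjoint_backward(x0, t0, tn, theta, y, N=100):
    # Backward pass: integrate from t_n down to t_0 the adjoint ODEs
    #   da/dt       = -a * df/dx,      a(t_n)       = dL/dx(t_n)
    #   da_theta/dt = -a * df/dtheta,  a_theta(t_n) = 0
    # while re-integrating the state x(t) backwards alongside them.
    dt = (tn - t0) / N
    xn = odeint_euler(x0, t0, tn, theta, N)
    a = jax.grad(loss)(xn, y)                                  # a(t_n)
    a_theta = jax.tree_util.tree_map(jnp.zeros_like, theta)    # a_theta(t_n) = 0
    x, t = xn, tn
    for _ in range(N):
        # Vector-Jacobian products a * df/dx and a * df/dtheta via jax.vjp.
        _, vjp = jax.vjp(lambda x_, th_: f(x_, t, th_), x, theta)
        a_dfdx, a_dfdtheta = vjp(a)
        # One Euler step backwards in time (the step is -dt).
        x = x - dt * f(x, t, theta)
        a = a + dt * a_dfdx
        a_theta = jax.tree_util.tree_map(lambda g, d: g + dt * d,
                                         a_theta, a_dfdtheta)
        t = t - dt
    return a, a_theta    # dL/dx(t_0), dL/dtheta
```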
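
As a sanity check (again just an illustration with made-up shapes and values), the adjoint gradients can be compared with direct backpropagation through the unrolled Euler solver; the two agree up to the discretization and state re-integration error of the fixed-step scheme.

```python
# Hypothetical usage: compare adjoint gradients against jax.grad
# applied directly to the unrolled forward solver.
key = jax.random.PRNGKey(0)
theta = (0.1 * jax.random.normal(key, (3, 3)), jnp.zeros(3))
x0 = jnp.array([1.0, -0.5, 0.3])
y = jnp.array([0.0, 1.0, 0.0])

dLdx0_adj, dLdtheta_adj = adjoint_backward(x0, 0.0, 1.0, theta, y)

def end_to_end(x0, theta):
    return loss(odeint_euler(x0, 0.0, 1.0, theta), y)

dLdx0_bp, dLdtheta_bp = jax.grad(end_to_end, argnums=(0, 1))(x0, theta)
# dLdx0_adj ~ dLdx0_bp and dLdtheta_adj ~ dLdtheta_bp (approximately).
```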