# Details for Adjoint Method in Neural ODE
###### tags: `NODE`
**In my opinion, this method amounts to computing the gradients of a Neural ODE by solving another ODE.**
Let's build a Neural ODE of the form:
$$ x(t_{n}) = x(t_0) + \int_{t_0}^{t_{n}}f(x(t),t,\theta)dt $$
and a loss function $L(x(t_n),y)$, where $y$ is the ground-truth target for the output $x(t_n)$ (to simplify notation, we write $L$ for $L(x(t_n),y)$).
Our goal is to compute $\dfrac{\partial L}{\partial x(t_0)}$ and $\dfrac{\partial L}{\partial \theta}$. We achieve this with the adjoint method.
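To make the forward equation concrete, here is a minimal sketch of the forward pass with a fixed-step Euler solver in PyTorch. The `Dynamics` MLP, the step count, and the squared-error loss are illustrative assumptions of mine, not something the note prescribes; any differentiable $f$ and any black-box ODE solver would do.

```python
import torch
import torch.nn as nn

class Dynamics(nn.Module):
    """Stand-in for f(x, t, theta): a small MLP taking (x, t) as input.
    The architecture is an illustrative choice, not fixed by the note."""
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, x, t):
        t_col = torch.full((x.shape[0], 1), float(t))  # broadcast t over the batch
        return self.net(torch.cat([x, t_col], dim=1))


def odeint_euler(f, x0, t0, tn, n_steps=100):
    """Fixed-step forward Euler approximation of
    x(t_n) = x(t_0) + int_{t_0}^{t_n} f(x(t), t, theta) dt."""
    h = (tn - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        x = x + h * f(x, t)
        t = t + h
    return x


f = Dynamics()
x0 = torch.randn(8, 2)
y = torch.randn(8, 2)                       # ground-truth target for x(t_n)
xn = odeint_euler(f, x0, t0=0.0, tn=1.0)
loss = 0.5 * ((xn - y) ** 2).sum()          # L(x(t_n), y), an assumed squared-error loss
```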
Define the adjoint state:
$$ a(t) = \frac{dL}{dx(t)}$$
We have:
$$
\begin{aligned}
\frac{da(t)}{dt} &= \lim_{\epsilon\to0} \frac{a(t+\epsilon)-a(t)}{\epsilon} \\
&= \lim_{\epsilon\to0} \frac{a(t+\epsilon) - \frac{dL}{dx(t+\epsilon)}\frac{dx(t+\epsilon)}{dx(t)}}{\epsilon} \qquad\text{(chain rule: } a(t)=\tfrac{dL}{dx(t+\epsilon)}\tfrac{dx(t+\epsilon)}{dx(t)}\text{)}\\
&= \lim_{\epsilon\to0} \frac{a(t+\epsilon) - a(t+\epsilon)\frac{dx(t+\epsilon)}{dx(t)}}{\epsilon} \\
&= \lim_{\epsilon\to0} \frac{a(t+\epsilon) - a(t+\epsilon)\frac{d}{dx(t)}\big(x(t) + \epsilon f(x(t),t,\theta)+O(\epsilon^2)\big)}{\epsilon} \qquad\text{(Taylor series of } x(t+\epsilon) \text{ around } x(t)\text{)}\\
&= \lim_{\epsilon\to0} \frac{a(t+\epsilon) - a(t+\epsilon)\big(I + \epsilon\frac{\partial f(x(t),t,\theta)}{\partial x(t)}+O(\epsilon^2)\big)}{\epsilon} \\
&= \lim_{\epsilon\to0} \Big(-a(t+\epsilon)\frac{\partial f(x(t),t,\theta)}{\partial x(t)} + O(\epsilon)\Big) \\
&= -a(t)\frac{\partial f(x(t),t,\theta)}{\partial x(t)}
\end{aligned}
$$
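The product $a(t)\,\frac{\partial f}{\partial x(t)}$ appearing here is a vector-Jacobian product, which reverse-mode autodiff evaluates directly without ever forming the Jacobian. A minimal standalone illustration (the linear layer is only a placeholder of mine for $f(x,t,\theta)$):

```python
import torch
import torch.nn as nn

x_demo = torch.randn(4, 2, requires_grad=True)
a_demo = torch.randn(4, 2)        # current adjoint a(t)
f_demo = nn.Linear(2, 2)          # placeholder for f(x, t, theta)

# a(t) * df/dx as a vector-Jacobian product, one row per sample.
vjp = torch.autograd.grad(f_demo(x_demo), x_demo, grad_outputs=a_demo)[0]
```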
Therefore, $a(t)$ obeys another ODE. Solving it backward in time from $t_n$ to $t_0$, with initial condition
$$ a(t_n) = \frac{dL}{dx(t_n)}, $$
gives
$$ \frac{\partial L}{\partial x(t_0)} = a(t_0) = a(t_n) + \int_{t_n}^{t_0}-a(t)\frac{\partial f(x(t),t,\theta)}{\partial x(t)}\,dt. $$
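Here is a sketch of that backward solve, reusing the `Dynamics` model and Euler discretization from above together with the assumed squared-error loss $L = \tfrac12\|x(t_n)-y\|^2$. Each step evaluates $a(t)\frac{\partial f}{\partial x(t)}$ as a vector-Jacobian product; with the same step size, the result agrees with `torch.autograd.grad` taken through the unrolled forward solver up to floating-point error.

```python
def adjoint_dLdx0(f, x0, y, t0=0.0, tn=1.0, n_steps=100):
    """Solve the adjoint ODE backward from t_n to t_0 to get dL/dx(t_0),
    for the assumed loss L = 0.5 * ||x(t_n) - y||^2."""
    h = (tn - t0) / n_steps

    # Forward pass; the trajectory is stored for simplicity (the adjoint method
    # proper would instead recover x(t) by integrating the ODE backward as well).
    xs, ts = [x0.detach()], [t0]
    x, t = xs[0], t0
    for _ in range(n_steps):
        x = (x + h * f(x, t)).detach()
        t = t + h
        xs.append(x)
        ts.append(t)

    a = xs[-1] - y                          # a(t_n) = dL/dx(t_n) for this loss

    # Euler steps on da/dt = -a * df/dx, taken backward in time:
    # a(t_{k-1}) = a(t_k) + h * a(t_k) * df/dx, with df/dx evaluated at x(t_{k-1})
    # so that each step exactly reverses the forward Euler update above.
    for k in range(n_steps, 0, -1):
        xk = xs[k - 1].requires_grad_(True)
        fk = f(xk, ts[k - 1])
        vjp = torch.autograd.grad(fk, xk, grad_outputs=a)[0]  # a * df/dx as a VJP
        a = a + h * vjp
    return a                                # approximates dL/dx(t_0)


dLdx0 = adjoint_dLdx0(f, x0, y)
```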
For $\dfrac{\partial L}{\partial \theta}$, treat the parameters as part of an augmented state $\theta(t)$ with $\frac{d\theta(t)}{dt}=0$ (even though they do not actually change with $t$). Applying the same derivation to this augmented state gives a second adjoint $a_\theta(t) = \frac{dL}{d\theta(t)}$ with dynamics $\frac{da_\theta(t)}{dt} = -a(t)\frac{\partial f(x(t),t,\theta)}{\partial \theta}$ and initial value $a_\theta(t_n) = 0$ (the loss does not depend on $\theta$ directly at $t_n$), so that
$$ \frac{\partial L}{\partial \theta} = a_\theta(t_0) = a_\theta(t_n) + \int_{t_n}^{t_0}-a(t)\frac{\partial f(x(t),t,\theta)}{\partial \theta}\,dt. $$
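The same backward loop can accumulate this parameter gradient: start from $a_\theta(t_n)=0$ and add $h\,a(t)\frac{\partial f}{\partial \theta}$ at each step, again as a vector-Jacobian product. A sketch under the same assumptions as above:

```python
def adjoint_dLdtheta(f, x0, y, t0=0.0, tn=1.0, n_steps=100):
    """Backward pass that also accumulates dL/dtheta, starting from
    a_theta(t_n) = 0 and stepping da_theta/dt = -a(t) * df/dtheta."""
    h = (tn - t0) / n_steps
    params = list(f.parameters())

    # Same stored-trajectory forward pass as in adjoint_dLdx0.
    xs, ts = [x0.detach()], [t0]
    x, t = xs[0], t0
    for _ in range(n_steps):
        x = (x + h * f(x, t)).detach()
        t = t + h
        xs.append(x)
        ts.append(t)

    a = xs[-1] - y                                      # a(t_n) = dL/dx(t_n)
    grad_theta = [torch.zeros_like(p) for p in params]  # a_theta(t_n) = 0

    for k in range(n_steps, 0, -1):
        xk = xs[k - 1].requires_grad_(True)
        fk = f(xk, ts[k - 1])
        # One autograd call yields both VJPs: a * df/dx and a * df/dtheta.
        vjps = torch.autograd.grad(fk, [xk] + params, grad_outputs=a)
        grad_theta = [g + h * v for g, v in zip(grad_theta, vjps[1:])]
        a = a + h * vjps[0]
    return a, grad_theta                                # dL/dx(t_0), dL/dtheta


dLdx0, dLdtheta = adjoint_dLdtheta(f, x0, y)
```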

In summary, both the forward pass and the backward pass of a Neural ODE are reduced to the problem of solving an ODE.
