###### tags: `numerical analysis` `expository` `one-offs`
# Constructing Solutions to Ordinary Differential Equations
**Overview**: The goal of this note is to present one derivation of the class of Runge-Kutta methods for the numerical solution of ordinary differential equations (ODEs). The roadmap is to first describe a key result for proving that initial value ODE problems admit unique solutions *in theory* (the Cauchy-Lipschitz-Picard-Lindelöf Theorem), and to then explain how applying a constructive perspective to this theorem allows for the derivation of a practical algorithm for solving the ODE numerically *in practice*.
## Initial Value Problems
Restricting ourselves to one-dimensional problems for simplicity, an *initial value problem* (IVP) is typically given by specifying a *vector field*
\begin{align}
f : \mathbf{R} \times [0, T] \to \mathbf{R},
\end{align}
an *initial condition* $x_0 \in \mathbf{R}$, and seeking a differentiable path $x: [0, T] \to \mathbf{R}$ which satisfies
\begin{align}
x(0) &= x_0 \\
\dot{x}(t) &= f (x(t), t) \quad \text{for } 0 \leqslant t \leqslant T.
\end{align}
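For concreteness, a simple example to keep in mind throughout is the choice $f(x, t) = x$ with $x_0 = 1$, i.e.
\begin{align}
x(0) &= 1 \\
\dot{x}(t) &= x(t) \quad \text{for } 0 \leqslant t \leqslant T,
\end{align}
whose unique solution is the exponential $x(t) = e^t$.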
A priori, we do not know whether such equations admit solutions for arbitrary $(f, T, x_0)$. As such, before seeking a numerical solution to the IVP, it seems sensible to first examine whether the 'real' problem has a solution.
## Iterative Construction of Solutions
The IVP can be viewed as an abstract root-finding problem in $x$, i.e. defining an operator $F$ by
\begin{align}
F [x] (t) = \dot{x}(t) - f(x(t), t),
\end{align}
then we are seeking a solution to $F[x] = \mathbf{0}$, subject to the condition that $x(0) = x_0$.
Integrating the differential equation with respect to $t$ and incorporating the initial condition, we can reformulate the IVP as
\begin{align}
x(t) &= x_0 + \int_{0 \leqslant s \leqslant t} f(x(s), s) ds \\
&=: \Phi [x](t),
\end{align}
which means that we are now solving an abstract fixed-point problem for $x$.
The benefit of this reformulation is that it is possible to demonstrate that fixed-point equations admit solutions using quite generic tools. In particular, if we can show that the mapping $\Phi$ (or some iterate thereof) is contractive with respect to some metric, then the Banach Fixed-Point Theorem guarantees that $\Phi$ admits a unique fixed point. This will be our strategy; in this setting, the approach is known as 'Picard iteration'.
## The Cauchy-Lipschitz-Picard-Lindelöf Theorem
We begin by assuming that the vector field $f$ satisfies a Lipschitz condition, uniformly in time, of the form
\begin{align}
| f(y, t) - f(z, t) | \leqslant L | y - z | \quad \text{for all } y, z \in \mathbf{R}, t \in [0, T].
\end{align}
We now take two trial solutions to the ODE, $y$ and $z$, and consider how they behave under the application of $\Phi$:
\begin{align}
| \Phi [y] (t) - \Phi [z] (t) | &= \left| \left( x_0 + \int_{0 \leqslant s \leqslant t} f(y(s), s) ds \right) - \left( x_0 + \int_{0 \leqslant s \leqslant t} f(z(s), s) ds \right) \right| \\
&= \left| \int_{0 \leqslant s \leqslant t} \left( f(y(s), s) - f(z(s), s) \right) ds \right| \\
&\leqslant \int_{0 \leqslant s \leqslant t} \left| f(y(s), s) - f(z(s), s) \right| ds \\
&\leqslant \int_{0 \leqslant s \leqslant t} L \left| y(s) - z(s) \right| ds.
\end{align}
Now, we can begin to study how $\Phi$ brings solutions closer together. For example,
* If $y$ and $z$ are both bounded in absolute value by $B$, then it follows that $| \Phi [y] (t) - \Phi [z] (t) | \leqslant 2BLt$ for $0 \leqslant t \leqslant T$.
* If $| y (t) - z(t) | \leqslant 2BLt$, then it follows that $| \Phi [y] (t) - \Phi [z] (t) | \leqslant BL^2t^2$ for $0 \leqslant t \leqslant T$.
From these two examples, one could then conjecture that applying $\Phi$ $K$ times would result in a bound of the form $| \Phi^{\circ K} [y] (t) - \Phi^{\circ K} [z] (t) | \leqslant 2B \cdot \frac{L^K t^K}{K!}$, and indeed this holds true.
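To verify this, suppose inductively that $| \Phi^{\circ k} [y] (t) - \Phi^{\circ k} [z] (t) | \leqslant 2B \cdot \frac{L^k t^k}{k!}$ for $0 \leqslant t \leqslant T$; applying the integral inequality above to the pair $( \Phi^{\circ k} [y], \Phi^{\circ k} [z] )$ then gives
\begin{align}
| \Phi^{\circ (k + 1)} [y] (t) - \Phi^{\circ (k + 1)} [z] (t) | \leqslant \int_{0 \leqslant s \leqslant t} L \cdot 2B \cdot \frac{L^k s^k}{k!} ds = 2B \cdot \frac{L^{k + 1} t^{k + 1}}{(k + 1)!},
\end{align}
which is the claimed bound with $K = k + 1$.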
Using the bound $K! \geqslant \left( \frac{K}{e} \right)^K$, it is then relatively straightforward to deduce that for $K$ sufficiently large, $\Phi^{\circ K}$ is a contraction in the supremum norm. The Banach Fixed-Point Theorem then allows us to conclude that $\Phi$ admits a unique fixed point, which is the solution of our IVP.
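To make the construction concrete, here is a minimal sketch of Picard iteration in Python, using `sympy` to perform the integration symbolically (the helper name `picard_iterate` is mine, purely for illustration). On the running example $\dot{x} = x$, $x(0) = 1$, the $K$-th iterate reproduces the degree-$K$ Taylor polynomial of $e^t$, in line with the factorial-rate convergence above.
```python
import sympy as sp

t, s = sp.symbols("t s", nonnegative=True)

def picard_iterate(f, x0, n_iters):
    # Repeatedly apply Phi[x](t) = x0 + int_0^t f(x(s), s) ds,
    # starting from the constant path x(t) = x0.
    x = sp.Integer(x0)
    for _ in range(n_iters):
        x = x0 + sp.integrate(f(x.subs(t, s), s), (s, 0, t))
    return sp.expand(x)

# Running example: dx/dt = x, x(0) = 1, with exact solution exp(t).
print(picard_iterate(lambda x, s: x, 1, 5))
# -> t**5/120 + t**4/24 + t**3/6 + t**2/2 + t + 1
```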
## Constructing Numerical Solutions by Quadrature
In fact, our proof strategy gives us a little bit more, namely that the convergence to a fixed point is super-exponentially fast. For a numerical scheme, this would be great news, but there is a hitch: for general IVPs, applying the operator $\Phi$ in closed form is not possible. As such, we need to consider approximation strategies.
### Quadrature
We first make a brief detour into numerical quadrature, i.e. the task of estimating the value of integrals. For integrals over $[0, 1]$, a quadrature rule $Q$ is an estimate of the form
\begin{align}
\int_{0 \leqslant t \leqslant 1} F(t) dt &= Q(F) + \text{Err}_Q (F) \\
Q(F) &:= \sum_{i = 1}^N b_i F (t_i ),
\end{align}
where $\{t_1, \ldots, t_N \} \subset [0, 1]$, $\{ b_1, \ldots, b_N \} \subset \mathbf{R}$, and $\text{Err}_Q$ is hopefully small for the integrands $F$ of interest. Typical strategies involve choosing the 'nodes' $t_i$ to be appropriately dispersed across the interval, and choosing the 'weights' $b_i$ such that $\text{Err}_Q$ is exactly $0$ for all polynomials up to a certain degree. There are many other approaches one might consider, whose discussion I will omit here.
The crucial takeaway is that for nice problems in this setting, good quadrature rules exist and can be found easily enough.
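As a concrete illustration, here is a small sketch in Python: Simpson's rule on $[0, 1]$ uses nodes $\{0, 1/2, 1\}$ with weights $\{1/6, 4/6, 1/6\}$, chosen exactly so that polynomials of degree at most $3$ are integrated without error.
```python
import math

def quadrature(F, nodes, weights):
    # Estimate int_0^1 F(t) dt by the rule Q(F) = sum_i b_i F(t_i).
    return sum(b * F(t) for t, b in zip(nodes, weights))

# Simpson's rule: exact on polynomials of degree <= 3.
nodes, weights = [0.0, 0.5, 1.0], [1.0 / 6.0, 4.0 / 6.0, 1.0 / 6.0]
print(quadrature(lambda t: t ** 3, nodes, weights))  # 0.25 (up to rounding): exact
print(quadrature(math.exp, nodes, weights))          # ~1.71886, vs e - 1 ~ 1.71828
```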
### Picard Iteration by Quadrature
Given a quadrature rule $Q$, we can define a modified $\Phi$ operator by applying $Q$ to the integrand $t \mapsto f(x(t), t)$, i.e.
\begin{align}
\Phi^Q [x](1) &= x_0 + Q ( f \circ x) \\
&= x_0 + \sum_{i = 1}^N b_i f ( x(t_i), t_i).
\end{align}
Note that this is only defined for finding the value of $x$ at the terminal time $T = 1$; a general time interval can be reduced to this case by rescaling.
Now, this is *almost* tractable, except for the fact that we don't know the value of the $x(t_i)$ in question either (except if $t_i = 0$ for some $i$). We try to circumvent this in two steps:
1. Define variables $X_i$, with the intent that $X_i \approx x(t_i)$.
2. Use an inner quadrature rule to approximate the $X_i$ in terms of the values of $f$ at the other $X_j$, i.e. writing $F_i = f ( X_i, t_i)$, construct a set of coefficients $\{ a_{i, j}\}$ such that
\begin{align}
X_i &= x_0 + \sum_{j = 1}^N a_{i, j} F_j \quad \text{for } i = 1, \ldots, N.
\end{align}
If we can solve these equations for $X_1, \ldots, X_N$, we can then report a final estimate of
\begin{align}
x(1) \approx x_1 := x_0 + \sum_{i = 1}^N b_i F_i.
\end{align}
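Assembling the pieces, here is a minimal sketch in Python of a single step of such a scheme, restricted for simplicity to the case where the stage equations can be solved by forward substitution (i.e. $A$ strictly lower-triangular; see the discussion of solving the stage equations below). The helper name `rk_step` is mine, for illustration.
```python
def rk_step(f, x0, A, b, t_nodes):
    # One step of a Runge-Kutta scheme over [0, 1]. Assumes A is strictly
    # lower-triangular, so each stage value X_i depends only on the
    # already-computed F_1, ..., F_{i-1}.
    N = len(b)
    F = [0.0] * N
    for i in range(N):
        X_i = x0 + sum(A[i][j] * F[j] for j in range(i))
        F[i] = f(X_i, t_nodes[i])
    return x0 + sum(b[i] * F[i] for i in range(N))
```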
### Some Examples
* The quadrature rule
\begin{align}
\int_{0 \leqslant t \leqslant 1} F(t) dt \approx F(0)
\end{align}
(i.e. $N = 1, t_1 = 0, b_1 = 1$) yields the so-called *Explicit Euler* integrator
\begin{align}
X_1 &= x_0 \\
x_1 &= x_0 + f ( X_1, t_1) \\
&= x_0 + f ( x_0, 0).
\end{align}
* Along similar lines, the quadrature rule
\begin{align}
\int_{0 \leqslant t \leqslant 1} F(t) dt \approx F \left( \frac{1}{2} \right)
\end{align}
(i.e. $N = 1, t_1 = 1/2, b_1 = 1$) leads to
\begin{align}
x_1 &= x_0 + f ( X_1, t_1) \\
&= x_0 + f \left( X_1, \frac{1}{2} \right).
\end{align}
Now, we need to specify $X_1$ somehow. A simple approach would be to say
\begin{align}
X_1 &\approx x \left( \frac{1}{2} \right) \\
&= x_0 + \int_{0 \leqslant s \leqslant 1/2} f ( x (s), s) ds \\
&\approx x_0 + \frac{1}{2} f ( x_0, 0),
\end{align}
which leads to the so-called *midpoint integrator*; both examples are exercised in the code sketch below.
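Both examples fit the `rk_step` sketch above, once the inner Euler half-step of the midpoint integrator is recorded as an extra stage with weight zero. On the running example $\dot{x} = x$, $x_0 = 1$ (exact answer $e \approx 2.71828$):
```python
f = lambda x, t: x

# Explicit Euler: one node t_1 = 0 with weight b_1 = 1.
print(rk_step(f, 1.0, A=[[0.0]], b=[1.0], t_nodes=[0.0]))  # 2.0

# Midpoint integrator as a two-stage tableau: the inner Euler
# half-step becomes a first stage that receives weight zero.
print(rk_step(f, 1.0,
              A=[[0.0, 0.0], [0.5, 0.0]],
              b=[0.0, 1.0],
              t_nodes=[0.0, 0.5]))  # 2.5
```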
## Runge-Kutta Integrators and their Properties
The above derivation is one perspective on a class of methods for numerical integration known as multi-stage (not to be confused with multi-step) or *Runge-Kutta* methods. At this stage, you may have many questions, such as:
* How does one choose the nodes $t_i$, and the weights $b_i$?
* How does one choose the matrix of coefficients $A = \{ a_{i, j} \}$?
Such questions are very reasonable, and the reader can rest assured that they do have very concrete answers. To some extent, the first two questions force one to think about the broader goals of the numerical solution of differential equations: do we want a highly-accurate solution, or a good-enough solution quickly? What sort of vector fields $f$ are we up against, and what challenges do they each pose?
Without delving into the specifics of each answer, it is useful to remark that many of the criteria by which Runge-Kutta schemes are judged can be reduced to the study of algebraic properties of the coefficients $(A, b)$. As such, it is often possible to abstract away the genesis of these coefficients in quadrature rules, and focus directly on the coefficients themselves.
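For instance, writing $c_i := \sum_{j = 1}^N a_{i, j}$ for the effective location of the $i$-th node, two of the simplest such conditions are
\begin{align}
\sum_{i = 1}^N b_i = 1, \qquad \sum_{i = 1}^N b_i c_i = \frac{1}{2};
\end{align}
the first characterises schemes which are at least first-order accurate, and the two together characterise schemes which are at least second-order accurate, for sufficiently smooth vector fields.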
Another very reasonable question is:
* How does one solve the system of equations for $X_1, \ldots, X_N$?
This has a slightly more straightforward answer. If one chooses the matrix $A$ to be strictly lower-triangular, then the equations for the $X_i$ are fully explicit, and there is no non-trivial equation to solve. At the other end of the spectrum, if the matrix $A$ is dense, then one typically has to resort to numerical methods for solving nonlinear fixed-point equations, such as Newton's method. There are also intermediate situations, such as when $A$ is lower-triangular with non-zero diagonal entries (the 'diagonally implicit' case): instead of solving one nonlinear system involving $\mathcal{O}(N)$ variables, one can then solve $N$ separate nonlinear systems, each involving $\mathcal{O}(1)$ variables, at least for problems in fixed dimension.
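For the dense case, one minimal sketch (assuming the time step is small enough that the stage map is itself contractive, so that plain fixed-point iteration converges) reuses the Banach fixed-point idea at the level of the stage vector; the helper name and sweep count below are illustrative, and Newton's method would be the more robust choice in practice.
```python
import numpy as np

def rk_stages_fixed_point(f, x0, A, t_nodes, n_sweeps=100):
    # Solve the stage equations X_i = x0 + sum_j a_{ij} f(X_j, t_j)
    # for a general (possibly dense) matrix A by fixed-point iteration.
    A = np.asarray(A, dtype=float)
    X = np.full(len(t_nodes), float(x0))
    for _ in range(n_sweeps):
        F = np.array([f(X_j, t_j) for X_j, t_j in zip(X, t_nodes)])
        X = x0 + A @ F
    return X

# Example: the implicit midpoint rule (A = [[1/2]], t_1 = 1/2, b_1 = 1)
# on dx/dt = x with x_0 = 1; the stage solves X_1 = 1 + X_1 / 2 = 2,
# and the step reports x_1 = x_0 + b_1 f(X_1, t_1) = 3.
X = rk_stages_fixed_point(lambda x, t: x, 1.0, A=[[0.5]], t_nodes=[0.5])
print(1.0 + 1.0 * X[0])  # 3.0
```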
## Extensions
The above derivation provides one route to deriving Runge-Kutta methods for the numerical solution of ODE Initial Value Problems. The interested reader might find it instructive to revisit the derivations while considering other forms of evolution equation, such as diffusions, jump processes, and other more exotic classes of dynamical systems. Some of the ideas port over to these settings quite naturally, and others are more resistant.
## Conclusion
Runge-Kutta methods are a widely-used approach to the numerical solution of differential equations. It is appealing to me that they can be derived by first considering a theoretical tool (Picard iteration) for proving the existence and uniqueness of *exact* solutions to the ODE system, and then reshaping it into a practical numerical tool.
## Post-Script
It should be acknowledged that the above presentation draws on the pedagogy of many existing textbooks on the subject of numerical analysis. I would like to draw specific attention to the presentation in Arieh Iserles' excellent textbook [A First Course in the Numerical Analysis of Differential Equations](https://www.cambridge.org/core/books/first-course-in-the-numerical-analysis-of-differential-equations/2B4E05F5CFC58CFDC7BBBC6D1150661B) as having influenced my perspective here.