# Research: the Lagrangian pattern
###### tags: `cycle 14`
- Developers:
- Appetite:
## Scope
Find the common building block(s) of the double-nested k-loop pattern found in FV3's mapz, ICON's upwind_vflux_ppm4gpu and sedi_icon_box_core or FV3GFS's sedimentation melting and explore how it can be mapped to GT4Py. Either by developing a frontend feature and mapping to current Iterator IR or by extending it with a new builtin to cover the motif.
This document is a collection of examples of this pattern and previous explorations.
We (David, Hannes) considered this project suitable for a master thesis.
## The common pattern
... is a vertical loop that contains another loop from the current level to the start/end of the domain.
E.g.
```python!
def box_sedim_naive(v: npt.NDArray, z: npt.NDArray, phi: npt.NDArray, dt: npt.DTypeLike):
Pbar = np.zeros_like(v)
for k in range(v.shape[0]):
for m in range(k + 1):
z_up = min(z[k - m - 1], z[k] + np.abs(v[k - m]) * dt)
z_low = min(z[k - m], z_up)
Pbar[k] -= phi[k - m] * (z_up - z_low)
return Pbar
```
Several variations of this algorithm exist which try to improve performance by early exiting the inner loop.
### In other words
Computations in microphysics and also in advection seem to perform the following integral:
$F(z_0)$ = $\int_{z_0}^{\infty}$ $\phi$(z)$dz$
$\infty$ is used as the upper limit because the real limit is not known a priori. In practise, however, the upper limit is `approximated` by the domain top (or bottom).
## Examples
### Explicit hydrometeor sedimentation in the Seifert-Beheng 2-moment bulk microphysical scheme
This is the algorithm that we need for ICON's 2-moment microphysics.
A standalone description is available at
http://www.cosmo-model.org/content/model/documentation/core/docu_sedi_twomom.pdf.
If the box can be bounded at compile time, the algorithm can be unrolled to a single scan (with state as deep as the size of the box). In the limit of large box size the state will be large and a scan will not be efficient.
Hannes played with scan implementations of this algorithm in https://gist.github.com/havogt/c52d19f2f7557c2c048d12be7330bda8.
### Lagrangian-to-Eulerian remapping in FV3 (mapz)
Can be represented as a single scan if a `K_LIMIT` is introduced. However, this implemented was not clearly better performing and it is unclear if a compile-time `K_LIMIT` exists.
See a [master thesis report](http://www1.mate.polimi.it/~forma/Didattica/ProgettiPacs/Romani18-19/pacs.pdf) and a [post-thesis analysis by Linus]( https://polybox.ethz.ch/index.php/s/PfK4jDRVN9AS7gV).
GridTools implementations are in https://github.com/eth-cscs/GT_FV3.
### NOAA-GFDL's sedimentation melting
```python!
for k in range(ke, ks-1,-1):
if v_terminal[k] >= 1.e-10:
continue
if qice[k] > constants.QCMIN:
for m in range(k+1, ke+1):
if zt[k+1] >= ze[m]:
break
if (zt[k] < ze[m+1]) and (temperature[m] > constants.TICE):
cvm[k] = moist_heat_capacity(qvapor[k], qliquid[k], qrain[k], qice[k], qsnow[k], qgraupel[k])
cvm[m] = moist_heat_capacity(qvapor[m], qliquid[m], qrain[m], qice[m], qsnow[m], qgraupel[m])
dtime = min(timestep, (ze[m] - ze[m+1])/v_terminal[k])
dtime = min(1., dtime/tau_mlt)
sink = min(qice[k] * delp[k] / delp[m], dtime * (temperature[m] - constants.TICE) / icpk[m])
qice[k] -= sink * delp[m] / delp[k]
if zt[k] < zs:
r1 += sink*delp[m]
else:
qrain[m] += sink
temperature[k] = (temperature[k] * cvm[k] - li00 * sink * delp[m] / delp[k]) / moist_heat_capacity(qvapor[k], qliquid[k], qrain[k], qice[k], qsnow[k], qgraupel[k])
temperature[m] = (temperature[m] * cvm[m]) / moist_heat_capacity(qvapor[m], qliquid[m], qrain[m], qice[m], qsnow[m], qgraupel[m])
if qice[k] < QCMIN:
break
```
See https://github.com/GridTools/gt4py/issues/131.
### ICON Tracer Advection
See around https://github.com/C2SM/icon-exclaim/blob/e72b8427c0dc9ad110eb60d9a7beb022544abd0a/src/advection/mo_advection_vflux.f90#L1566
## Appendix
### Master thesis project outline
1. Getting started
- implement sedimentation http://www.cosmo-model.org/content/model/documentation/core/docu_sedi_twomom.pdf with NumPy
- rewrite as scan with k-limit
- study with idealized data (worst and best case)
- maybe analytical performance model? Need Torsten.
- (1a.) Study performance in kinematic model
2. Study realistic cases to derive reasonable k-limits in ICON
3. Reimplement in CUDA for performance comparison of the 2 implementations
- analyze performance as a function of the k-limit
4. Analyze common features of the double-k-loop algorithms that are translatable to scans and derive a GT4Py language construct that expresses this motif.
- Concrete for sedim: express the algorithm such that the k-limit is user input and generates code for given k-limit
- Generate GT4Py (frontend or IR) code from the above
5. Conclusion/discussion: Is this a pattern that makes sense for the domain scientist? Is it reasonable to have a k-limit for all use-cases, etc.