# Research: the Lagrangian pattern ###### tags: `cycle 14` - Developers: - Appetite: ## Scope Find the common building block(s) of the double-nested k-loop pattern found in FV3's mapz, ICON's upwind_vflux_ppm4gpu and sedi_icon_box_core or FV3GFS's sedimentation melting and explore how it can be mapped to GT4Py. Either by developing a frontend feature and mapping to current Iterator IR or by extending it with a new builtin to cover the motif. This document is a collection of examples of this pattern and previous explorations. We (David, Hannes) considered this project suitable for a master thesis. ## The common pattern ... is a vertical loop that contains another loop from the current level to the start/end of the domain. E.g. ```python! def box_sedim_naive(v: npt.NDArray, z: npt.NDArray, phi: npt.NDArray, dt: npt.DTypeLike): Pbar = np.zeros_like(v) for k in range(v.shape[0]): for m in range(k + 1): z_up = min(z[k - m - 1], z[k] + np.abs(v[k - m]) * dt) z_low = min(z[k - m], z_up) Pbar[k] -= phi[k - m] * (z_up - z_low) return Pbar ``` Several variations of this algorithm exist which try to improve performance by early exiting the inner loop. ### In other words Computations in microphysics and also in advection seem to perform the following integral: $F(z_0)$ = $\int_{z_0}^{\infty}$ $\phi$(z)$dz$ $\infty$ is used as the upper limit because the real limit is not known a priori. In practise, however, the upper limit is `approximated` by the domain top (or bottom). ## Examples ### Explicit hydrometeor sedimentation in the Seifert-Beheng 2-moment bulk microphysical scheme This is the algorithm that we need for ICON's 2-moment microphysics. A standalone description is available at http://www.cosmo-model.org/content/model/documentation/core/docu_sedi_twomom.pdf. If the box can be bounded at compile time, the algorithm can be unrolled to a single scan (with state as deep as the size of the box). In the limit of large box size the state will be large and a scan will not be efficient. Hannes played with scan implementations of this algorithm in https://gist.github.com/havogt/c52d19f2f7557c2c048d12be7330bda8. ### Lagrangian-to-Eulerian remapping in FV3 (mapz) Can be represented as a single scan if a `K_LIMIT` is introduced. However, this implemented was not clearly better performing and it is unclear if a compile-time `K_LIMIT` exists. See a [master thesis report](http://www1.mate.polimi.it/~forma/Didattica/ProgettiPacs/Romani18-19/pacs.pdf) and a [post-thesis analysis by Linus]( https://polybox.ethz.ch/index.php/s/PfK4jDRVN9AS7gV). GridTools implementations are in https://github.com/eth-cscs/GT_FV3. ### NOAA-GFDL's sedimentation melting ```python! for k in range(ke, ks-1,-1): if v_terminal[k] >= 1.e-10: continue if qice[k] > constants.QCMIN: for m in range(k+1, ke+1): if zt[k+1] >= ze[m]: break if (zt[k] < ze[m+1]) and (temperature[m] > constants.TICE): cvm[k] = moist_heat_capacity(qvapor[k], qliquid[k], qrain[k], qice[k], qsnow[k], qgraupel[k]) cvm[m] = moist_heat_capacity(qvapor[m], qliquid[m], qrain[m], qice[m], qsnow[m], qgraupel[m]) dtime = min(timestep, (ze[m] - ze[m+1])/v_terminal[k]) dtime = min(1., dtime/tau_mlt) sink = min(qice[k] * delp[k] / delp[m], dtime * (temperature[m] - constants.TICE) / icpk[m]) qice[k] -= sink * delp[m] / delp[k] if zt[k] < zs: r1 += sink*delp[m] else: qrain[m] += sink temperature[k] = (temperature[k] * cvm[k] - li00 * sink * delp[m] / delp[k]) / moist_heat_capacity(qvapor[k], qliquid[k], qrain[k], qice[k], qsnow[k], qgraupel[k]) temperature[m] = (temperature[m] * cvm[m]) / moist_heat_capacity(qvapor[m], qliquid[m], qrain[m], qice[m], qsnow[m], qgraupel[m]) if qice[k] < QCMIN: break ``` See https://github.com/GridTools/gt4py/issues/131. ### ICON Tracer Advection See around https://github.com/C2SM/icon-exclaim/blob/e72b8427c0dc9ad110eb60d9a7beb022544abd0a/src/advection/mo_advection_vflux.f90#L1566 ## Appendix ### Master thesis project outline 1. Getting started - implement sedimentation http://www.cosmo-model.org/content/model/documentation/core/docu_sedi_twomom.pdf with NumPy - rewrite as scan with k-limit - study with idealized data (worst and best case) - maybe analytical performance model? Need Torsten. - (1a.) Study performance in kinematic model 2. Study realistic cases to derive reasonable k-limits in ICON 3. Reimplement in CUDA for performance comparison of the 2 implementations - analyze performance as a function of the k-limit 4. Analyze common features of the double-k-loop algorithms that are translatable to scans and derive a GT4Py language construct that expresses this motif. - Concrete for sedim: express the algorithm such that the k-limit is user input and generates code for given k-limit - Generate GT4Py (frontend or IR) code from the above 5. Conclusion/discussion: Is this a pattern that makes sense for the domain scientist? Is it reasonable to have a k-limit for all use-cases, etc.