# Greenline projects
Authors: Magdalena Luz, Yilu Chen 15.12.2025
## What is this?
This document is a kind of informal task or project list (aka backlog, but we don't do backlogs) for the upcoming Jablonowski-Williamson (JW) test case in greenline.
It is not meant to prescribe any solution but to collect some things that I am aware of and that might be helpful for planning future projects.
It focuses on the JW test case and model infrastructure.
## A) Single node JW testcase
The driver code that runs a standalone single-node JW test case has been merged in [PR-902](https://github.com/C2SM/icon4py/pull/902), but it still sets up the configuration by hard-coding configuration dataclasses.
### 1. Finish Configuration
- Clean up the configuration ([PR-936](https://github.com/C2SM/icon4py/pull/936)) and integrate the changes from Enrique.
### 2. Plug the configuration into the driver
- Integrate the configuration into the driver and add what is missing from the original draft, e.g. configuration for the time loop that might still be missing from `RunConfig` and `ModelConfig` in the current version of [PR-936](https://github.com/C2SM/icon4py/pull/936).
- Add an IO config to the configuration.
- Make the runs reproducible and avoid overwriting output: the configuration can dump its final/processed state to a yaml file; this should happen automatically at the end of any configuration parsing.
- Make sure that these outputs do not get overwritten by subsequent runs from the same input data:
- For each run it should be clear what setup it has been using. This is most easily achieved by writing the output configuration as well as the model output to a subfolder created at model startup with a timestamp in its name (see the sketch below). It is then up to the user to delete output data that is not useful.
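A minimal sketch of what this could look like (the helper names are made up, nothing of this exists in icon4py yet): create a timestamped run directory at startup and dump the processed configuration into it.

```python
# Sketch only: hypothetical helpers, not existing icon4py code.
import datetime
import pathlib

import yaml


def create_run_dir(base: pathlib.Path, experiment_name: str) -> pathlib.Path:
    """Create a unique output directory such as `output/jw_20251215T143000/`."""
    stamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")
    run_dir = base / f"{experiment_name}_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse an existing directory
    return run_dir


def dump_config(config: dict, run_dir: pathlib.Path) -> None:
    """Write the final/processed configuration next to the model output."""
    with open(run_dir / "run_config.yaml", "w") as f:
        yaml.safe_dump(config, f)
```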
### 3. Think of automated "scientific" validation
It would be nice to run the JW test regularly, triggered from the CI (say in a nightly or weekly build pipeline). We should think of validation criteria that we can assert automatically, such as:
- mass conservation?
- energy conservation?
- what else? (maybe check the DKRZ MR for their *bubble* experiment for inspiration, see below)
- produce a nice plot?
MPI-M/DKRZ have a setup for their *bubble* experiment (https://gitlab.dkrz.de/icon/icon-mpim/-/merge_requests/919) that does something like this and might be a source of inspiration.
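Purely as an illustration of what such an automated assertion could look like (the field names, shapes and the tolerance are assumptions, not a worked-out criterion):

```python
# Sketch of a mass-conservation check: the globally integrated dry air mass
# should not drift by more than a relative tolerance over the run.
import numpy as np


def total_dry_mass(rho, cell_area, layer_thickness):
    """sum(rho * area * dz); rho and layer_thickness have shape (ncells, nlev), cell_area (ncells,)."""
    return float(np.sum(rho * cell_area[:, np.newaxis] * layer_thickness))


def check_mass_conservation(rho_start, rho_end, cell_area, layer_thickness, rtol=1e-9):
    m0 = total_dry_mass(rho_start, cell_area, layer_thickness)
    m1 = total_dry_mass(rho_end, cell_area, layer_thickness)
    drift = abs(m1 - m0) / abs(m0)
    assert drift <= rtol, f"relative mass drift {drift:.3e} exceeds {rtol}"
```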
## B) Parallel JW testcase
Parallelisation of the entire model infrastructure needs to be finished in order to do full parallel runs in the greenline. What we currently have is:
- halo exchanges in dycore and diffusion
- halo exchanges in the computation of static fields in the factories where necessary
- a first version of global reduction (`global_min`), added in [PR-966](https://github.com/C2SM/icon4py/pull/966)
### 1. MPI tests on CI
There is a [draft PR](https://github.com/C2SM/icon4py/pull/692) that I never got to run and that can be taken as inspiration (my findings and struggles are written down there). I take this to be the highest-priority project for the entire parallelization. As long as there was only a `datatest` for diffusion and the dycore, it was manageable to fix them once in a while, when running locally and realizing that they were broken (which happened a lot, and usually not due to changes in the halo exchange or decomposition infrastructure but due to other changes). With the number of tests increasing this becomes impossible, and we should add the parallel tests to the regular CI runs.
This is most probably a task for somebody knowledgeable about MPI, CI and the ALPS infrastructure. Once it runs, do consider adding icon4py as an example for the CSCS-CI in the knowledge base, because afaik there is no example of running containerized CI with MPI.
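For orientation, a parallel test could look roughly like this (the `mpi` marker and the launch command, e.g. `mpirun -n 2 pytest -m mpi`, are assumptions about a future CI setup, not what the draft PR does):

```python
# Sketch of an MPI test: each rank sends its id to the right neighbour
# and expects to receive the id of its left neighbour.
import pytest

pytest.importorskip("mpi4py")
from mpi4py import MPI


@pytest.mark.mpi
def test_ring_exchange():
    comm = MPI.COMM_WORLD
    size, rank = comm.Get_size(), comm.Get_rank()
    assert size >= 2, "launch with mpirun/srun and at least 2 ranks"
    right, left = (rank + 1) % size, (rank - 1) % size
    received = comm.sendrecv(rank, dest=right, source=left)
    assert received == left
```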
### 2. Global reduction for mean values
Add a `mean` (and maybe a `max`) to the global reductions, and compute `mean_cell_area`, `mean_edge_length`, `mean_dual_cell_area` and `mean_dual_edge_length` via global reduction in `geometry.py`. For this, [PR-848](https://github.com/C2SM/icon4py/pull/848) should be partially reverted such that the input fields for the mean values are computed in the factory instead of being read from the grid file, and the mean values are computed by `global_mean` (see the sketch below).
**Be aware** that the computation of `cell_area` used to be wrong in at least one of the grid file generators (Will fixed that and knows all about it), so validating the computation against serialized data will fail, because our grid files are older than this fix.
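A minimal sketch of what a `global_mean` could look like with mpi4py (the owner-mask argument and names are assumptions, not the actual icon4py decomposition API):

```python
# Sketch: mean of a cell/edge field over all globally owned entries.
import numpy as np
from mpi4py import MPI


def global_mean(field: np.ndarray, owner_mask: np.ndarray, comm=MPI.COMM_WORLD) -> float:
    """Average over owned entries only, so halo points are not counted twice."""
    local_sum = float(np.sum(field[owner_mask]))
    local_count = int(np.count_nonzero(owner_mask))
    global_sum = comm.allreduce(local_sum, op=MPI.SUM)
    global_count = comm.allreduce(local_count, op=MPI.SUM)
    return global_sum / global_count


# e.g. mean_cell_area = global_mean(cell_area, cell_owner_mask)
```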
### 3. domain decomposition itself
There is [PR-540](https://github.com/C2SM/icon4py/pull/540), which implements the feature for **global grids**. It needs to be reviewed and cleaned up. I (Magdalena) volunteer to still do fixes there. I also broke some tests again recently by adding more edges to the edge halo.
The halo construction done in this PR is very faithful to the current ICON setup. It could easily be run with a decomposition dumped from ICON, but it could also be extended or changed for further investigations.
#### 3a) Decomposition for LAM grids
This is *not* needed for JW test case but might be nice to have in the future.
It needs to be tested whether it works, or what needs to be done to make it work. I suspect it is mostly a matter of ordering such that the array structure is correct (lateral boundaries, nudging zone, ...).
### 4. IO
The IO module in `icon4py-common` does not deal with parallel IO. It is also not ready for at-scale runs, as it will slow down processing a lot.
There are several options:
1. Quick fix: parallel IO with netcdf4-python, https://unidata.github.io/netcdf4-python/#parallel-io
It would be lovely to be able to do that without necessarily needing a correctly pre-compiled version of netcdf4-python in a `uenv`, so that we can build with uv. This will not decouple IO enough to make it performant at scale.
- For that, it might be necessary to gather fields on an IO node and write from a single node.
2. Use [YAC](https://gitlab.dkrz.de/dkrz-sw/yac) and [hiopy](https://ican.pages.gwdg.de/hiopy/) to decouple IO from the model runs. The benefit is that we already have YAC available for coupling further components. YAC has Python bindings, but there might be some work to be done, which we could contribute, to improve its integration in the Python world.
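For option 1, a rough sketch of a parallel write with netcdf4-python (this assumes a netCDF4/HDF5 stack built with MPI support, which is exactly the `uenv` caveat above; the variable name and the equal-size decomposition are made up):

```python
# Sketch: every rank writes its own slice of a global cell field into one file.
import numpy as np
from mpi4py import MPI
from netCDF4 import Dataset

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
ncells_local = 1000  # assumption: equally sized patches

nc = Dataset("jw_output.nc", "w", parallel=True, comm=comm, info=MPI.Info())
nc.createDimension("cell", ncells_local * size)
temperature = nc.createVariable("temperature", np.float64, ("cell",))
temperature.set_collective(True)  # switch to collective access for the write

start = rank * ncells_local
temperature[start : start + ncells_local] = np.full(ncells_local, 273.15 + rank)
nc.close()
```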
### 5. Communicator structure
Currently there is only one communicator, and when launched with `n` MPI ranks all ranks are assigned a patch of the compute domain. It could be helpful to have dedicated IO rank(s).
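A minimal sketch of such a split (the `run_*` helpers are hypothetical; only `Split` is real mpi4py API):

```python
# Sketch: reserve the last world rank for IO, give everybody else a compute communicator.
from mpi4py import MPI

world = MPI.COMM_WORLD
is_io_rank = world.Get_rank() == world.Get_size() - 1

# ranks with the same color end up in the same sub-communicator
color = 1 if is_io_rank else 0
local_comm = world.Split(color=color, key=world.Get_rank())

if is_io_rank:
    run_io_server(local_comm, world)   # hypothetical: receive fields, write them out
else:
    run_model(local_comm, world)       # hypothetical: decomposition uses local_comm only
```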
### 6. Restart
Add restart functionality. In what sense does it restrict or impact the decisions taken for IO in the projects above?
## C) Warm bubble
Warm bubble runs on a torus grid and uses additional physics components that are not present in JW. With the parallel JW in place, the extra work is essentially in bringing in the additional components.
I would align our setup and tests with the MPI-M bubble experiment (https://gitlab.dkrz.de/icon/icon-mpim/-/merge_requests/919).
### 1. Torus grid
This is currently under active development.
### 2. Integration of physics components into the model
Warm bubble adds physics components, and it should be used as an opportunity to rework the way the driver calls components, in terms of general component interfaces. It is a good exercise and a step towards a flexible model where components become easily interchangeable (see the sketch after the list below). Warm bubble needs:
- Tracer Advection
- Turbulence
- Microphysics
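To make the direction concrete, a possible common interface (entirely hypothetical, nothing like this exists in icon4py yet) could be as small as a single call per time step:

```python
# Sketch of a minimal component interface the driver could loop over.
from typing import Protocol


class PhysicsComponent(Protocol):
    """Anything the driver can call inside the time loop."""

    def __call__(self, prognostic_state, diagnostic_state, dt: float) -> None:
        """Advance this component's contribution by one time step of length dt (seconds)."""
        ...


# driver time loop, with the components coming from the configuration:
# for component in (tracer_advection, turbulence, microphysics):
#     component(prognostic_state, diagnostic_state, dt)
```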
### 3. Tracer Advection
There is a tracer advection granule in `icon4py`, `icon4py-advection`, which has not received much love in recent months. It uses
| direction  | scheme                                  | limiter           |
| ---------- |:--------------------------------------- |:----------------- |
| horizontal | 2 - 2nd order Miura                     | positive definite |
| vertical   | 1 - upwind 1st order, 3 - PPM 3rd order | semi-monotonic    |
and runs on a single tracer field. The MCH setup runs scheme 52 (FFSL_HYB with miura_cycl); the stencils for it have been ported but are not used in the granule.
#### Work to be done
- On the granule: in terms of schemes and limiters, the simple schemes already implemented should be enough for a warm bubble, but the granule probably needs some refactoring, optimization, and parallelization over several tracer fields.
- **Least squares** coefficients: advection needs some least-squares interpolation coefficients that still need to be ported. They are available in the serialized data.
### 4. Initial conditions and additional diagnostics
- There might be different initial conditions (different from JW) -> Chia Rui knows about that.
- Possibly additional diagnostics (other than those used in microphysics) -> Chia Rui knows about that.
### 5. Turbulence
The 1d turbulence in Fortran has been "isolated" by Yilu, and Python bindings for it have been generated using `F2Py`. The granule version has `openacc` directives, and the functionality needs to be tested. The Python bindings need to be adjusted in order to pass GPU pointers to the Fortran component; `F2Py` may or may not support this.
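If `F2Py` turns out not to support device pointers, one possible fallback (a sketch with made-up library and routine names, assuming the Fortran routine is exposed via `iso_c_binding` and expects device memory) is to pass the raw CuPy device pointer through `ctypes`:

```python
# Sketch: call a Fortran/OpenACC routine with a GPU array allocated by CuPy.
import ctypes

import cupy as cp

lib = ctypes.CDLL("./libturbulence.so")           # hypothetical shared library
lib.turb_step.argtypes = [ctypes.c_void_p, ctypes.c_int]

field = cp.zeros(1000, dtype=cp.float64)          # lives in GPU memory
# pass the raw device pointer; the Fortran side must declare it as a device pointer
lib.turb_step(ctypes.c_void_p(field.data.ptr), ctypes.c_int(field.size))
```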
### 6. Land
There is something called "simple" land; I don't know what exactly it means, but it does not involve using a complicated land component like JSBACH.
### 7. Microphysics
Both ICON `microphysics` and `muphys` are ported and could be used.