- Shaped by: Christoph, Daniel, Abishek
- Appetite (FTEs, weeks):
- Developers:
## Problem
Currently the gt4py interface (integration) in ICON can only handle a single block, i.e. the outer block loop (usually the `jb` loop in Fortran) must have exactly one iteration. This forces the gt4py/DSL branch of ICON to set nproma to the number of edges: since there are more edges than cells or vertices, this guarantees that every field fits into a single block.
This puts us at a performance disadvantage of about 3% compared to OpenACC-compiled icon-nwp on GPUs. That variant can handle any number of blocks and can therefore run with the faster `nproma=num_cells` setting.
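
A minimal Python sketch of the single-block constraint, assuming the usual ICON storage layout in which a field with `n` entries is split into `nblks = ceil(n / nproma)` blocks of length `nproma` (the range of the `jb` loop). The entity counts are hypothetical, and `Blocking`/`blocking` are illustrative names, not part of ICON or gt4py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blocking:
    nproma: int  # block length: the inner, vectorised index
    nblks: int   # number of blocks: the range of the outer `jb` loop
    npromz: int  # number of valid entries in the last block


def blocking(n_entities: int, nproma: int) -> Blocking:
    """Split n_entities into blocks of length nproma (ceiling division)."""
    nblks = (n_entities - 1) // nproma + 1
    npromz = n_entities - (nblks - 1) * nproma
    return Blocking(nproma, nblks, npromz)


# Hypothetical entity counts for one domain. They follow the ratios of a global
# icosahedral triangle grid: edges = 1.5 * cells, vertices = cells / 2 + 2.
SIZES = {"cells": 10_000, "edges": 15_000, "vertices": 5_002}

if __name__ == "__main__":
    for nproma in (SIZES["cells"], SIZES["edges"]):
        print(f"nproma = {nproma}")
        for name, n in SIZES.items():
            b = blocking(n, nproma)
            print(f"  {name:8s}: nblks = {b.nblks}, npromz = {b.npromz}")
    # One block for every entity type requires nproma >= the largest count,
    # which is the number of edges.
    print("smallest single-block nproma:", max(SIZES.values()))
```

With `nproma=num_cells` the edge fields need a second block, which the current gt4py interface cannot handle; `nproma=num_edges` is the smallest single value that keeps every entity type in one block.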
## Background
The 3% was measured by running `exp.mch_bench_r19b07_icon2.run` with OpenACC-compiled icon-nwp on balfrin, once with `nproma=num_cells` and once with `nproma=num_edges`. The cell variant was 1.03x faster. The experiment was done in double precision; results for mixed precision could differ.
However, there are parts of ICON which are slightly faster with `nproma=num_edges`, so the most performant variant is probably to have two or three different npromas for edges, cells and vertices.
Note: Some sub-timers of ICON were _faster_ with `nproma=num_edges`, so completing this project will also slightly speed up OpenACC compiled icon-nwp. But this speedup is probably below 1%.
## Appetite
For the research task, the appetite is one week.
(For the actual project, the appetite is quite high, at least one full cycle, since this is a performance feature for production.)
## Solution
- Introduce two or three different npromas in ICON (`nproma_c`, `nproma_e`, `nproma_v`) so that GPU runs always use a single block while keeping memory usage minimal (cell and vertex fields are no longer padded up to the number of edges). The idea can be extended to nesting, where memory is currently over-allocated because the global `nproma_c` is also used for the nested domains.
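
The memory side of this trade-off can be estimated with a short Python sketch. It assumes fields are allocated as `nproma × nblks` entries per vertical level and uses hypothetical entity counts; `allocated` and `overhead` are illustrative helpers, not ICON routines.

```python
def allocated(n_entities: int, nproma: int) -> int:
    """Entries allocated per vertical level for an (nproma, nlev, nblks) field."""
    nblks = (n_entities - 1) // nproma + 1
    return nproma * nblks


def overhead(n_entities: int, nproma: int) -> float:
    """Fraction of the allocated entries that is padding."""
    return 1 - n_entities / allocated(n_entities, nproma)


# Hypothetical entity counts (ratios of a global icosahedral triangle grid).
SIZES = {"c": 10_000, "e": 15_000, "v": 5_002}

if __name__ == "__main__":
    single = SIZES["e"]      # current DSL setup: one nproma equal to num_edges
    per_type = dict(SIZES)   # proposed: nproma_c, nproma_e, nproma_v
    for kind, n in SIZES.items():
        print(f"{kind}: single nproma -> {overhead(n, single):.0%} padding, "
              f"nproma_{kind} -> {overhead(n, per_type[kind]):.0%} padding")

    # Nesting: a hypothetical child domain with 2,500 cells that reuses the
    # outer domain's cell nproma still allocates one full block of 10,000.
    child_cells = 2_500
    print("child, outer nproma_c:", allocated(child_cells, SIZES["c"]))
    print("child, own nproma_c:  ", allocated(child_cells, child_cells))
```

With a single nproma equal to the number of edges, this reports about 33% padding for cell fields and 67% for vertex fields; separate `nproma_c`, `nproma_e` and `nproma_v` remove that padding while each entity type still fits into one block.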
## Steps
- Confirm the 3% slowdown of icon-exclaim with all three MCH production experiments, using mixed-precision OpenACC-compiled icon-nwp on balfrin
- Approach Florian and Marek from DWD to shape this project in detail
- grep through icon-exclaim to find which parts are affected. Known parts: dycore, tracer advection, diffusion, ocean, turbulence, ...
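
For a first overview before reading code in detail, the grep step could be scripted. This Python sketch rests on assumptions about the Fortran sources: it counts occurrences of `nproma`, block counts named like `nblks_c`/`nblks_e`/`nblks_v`, and block loops written as `DO jb = ...`, grouped by top-level directory below the given root. The patterns, file suffixes and script name are hypothetical.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Assumed patterns for code that depends on the blocking layout.
PATTERNS = {
    "nproma": re.compile(r"\bnproma\b", re.IGNORECASE),
    "nblks": re.compile(r"\bnblks_[cev]\b", re.IGNORECASE),
    "jb_loop": re.compile(r"^\s*do\s+jb\s*=", re.IGNORECASE | re.MULTILINE),
}
FORTRAN_SUFFIXES = {".f90", ".f", ".inc"}  # assumed source file types


def scan(root: Path) -> dict[str, Counter]:
    """Count pattern matches per top-level directory below `root`."""
    hits: dict[str, Counter] = {}
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in FORTRAN_SUFFIXES:
            continue
        text = path.read_text(errors="replace")
        rel = path.relative_to(root)
        # Files directly in `root` are grouped under ".".
        key = rel.parts[0] if len(rel.parts) > 1 else "."
        counts = hits.setdefault(key, Counter())
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    return hits


if __name__ == "__main__" and len(sys.argv) == 2 and Path(sys.argv[1]).is_dir():
    for directory, counts in sorted(scan(Path(sys.argv[1])).items()):
        summary = "  ".join(f"{k}={v}" for k, v in counts.items())
        print(f"{directory:30s} {summary}")
```

Usage, with a hypothetical script name: `python scan_blocking.py path/to/icon-exclaim/src`. The counts only point to candidate components; each hit still needs a manual check.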