# [ICON4Py] Benchmarking infrastructure
- Shaped by: Till, Enrique
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->
## Problem
1. The StencilTests are currently running on unrepresentative grids.
2. We don't have a continuous benchmarking setup for representative use cases, i.e., MCH production and ICON-Exclaim users. As a result we often don't know how good or bad our overall performance actually is, or what setup is required to achieve it.
## Appetite
<!-- Explain how much time we want to spend and how that constrains the solution -->
## Solution
<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->
### 1. StencilTests with larger grids
Extend the bencher CI pipeline to execute on larger grids. For this, only the grid files are needed. See [here](https://github.com/C2SM/icon4py/blob/main/model/testing/src/icon4py/model/testing/pytest_config.py#L58) for the relevant fixture; a parameterization sketch follows the grid list below. Please make sure the testbed(s) in bencher are named nicely.
__TODO: What grids?__
- Global grid: R02B07.
- Local (LAM): the grid from MCH-CH2, plus a smaller grid from an older MCH experiment with roughly half the number of cells of the MCH-CH2 one (this should be more similar to the production setup for a single-node run).
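A minimal sketch of how such a grid fixture could be parameterized, assuming grids are pre-downloaded and located via an environment variable. The fixture name, the mapping, the environment variable, and the global grid file name are all hypothetical; the actual fixture is the one linked above:

```python
# conftest.py -- hypothetical sketch, not the actual icon4py fixture.
import os

import pytest

# Hypothetical mapping from a bencher testbed name to its grid file.
# The global file name is assumed; the local one is taken from the table
# in the "Local (1 GPU)" section below.
GRIDS = {
    "global-r02b07": "icon_grid_R02B07_G.nc",  # assumed file name
    "local-mch-ch2": "icon-2/icon_grid_0002_R19B07_mch.nc",
}


@pytest.fixture(params=sorted(GRIDS))
def grid_file(request):
    """Yield one grid-file path per benchmark run; the param id can
    double as the bencher testbed name."""
    grid_dir = os.environ.get("ICON4PY_GRID_DIR", ".")  # hypothetical variable
    return os.path.join(grid_dir, GRIDS[request.param])
```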
### 2. Blue-line with granule
1. Move the log parsing functionality from `icon-exclaim-perf-tools` into a new module `icon4py.tools.icon_log_utils` in icon4py.
- [log import functionality](https://github.com/C2SM/icon-exclaim-perf-tools/blob/master/src/icon_exclaim_perf_tools/log_import.py) -> `icon4py.tools.icon_log_utils.importer`
- [database schema](https://github.com/C2SM/icon-exclaim-perf-tools/blob/master/src/icon_exclaim_perf_tools/db/schema.py) -> `icon4py.tools.icon_log_utils.schema`
2. Add a new pipeline that executes the blue-line. The job should be parameterized by:
- Experiment
- Build type: openacc, granule
Make sure to coordinate this effort with the people working on / up to date on the uenv (Rico) and CI (Christoph). If this hasn't progressed far enough to be used right away, be pragmatic: install according to the build instructions at https://github.com/C2SM/icon-exclaim/tree/icon-dsl/dsl until then, but be careful not to burn too much CI time here.
3. Parse the log file using `icon4py.tools.icon_log_utils.importer.import_model_run_log_from_file`, export the relevant output of an `IconRun` instance into a JSON file, and upload it to bencher (see the sketch after this list). Ioannis & Christos know what format is expected (see also [this PR](https://github.com/C2SM/icon4py/pull/675/files)).
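A minimal sketch of step 3, assuming the module move from step 1 has happened. The `IconRun` attribute names and the exact metric layout are assumptions and must be checked against the PR linked above and against `icon-exclaim-perf-tools`:

```python
# Hypothetical sketch: parse an ICON run log and emit bencher-style JSON.
# `import_model_run_log_from_file` returns an `IconRun`; the attributes used
# below (a `timers` collection with name/avg/min/max fields) are assumptions.
import json

from icon4py.tools.icon_log_utils import importer  # module name after step 1

run = importer.import_model_run_log_from_file("LOG.exp.mch_icon-ch2.o")  # assumed log name

bmf = {  # Bencher Metric Format-like layout; verify against PR #675
    timer.name: {
        "latency": {
            "value": timer.avg,
            "lower_value": timer.min,
            "upper_value": timer.max,
        }
    }
    for timer in run.timers
}

with open("results.json", "w") as f:
    json.dump(bmf, f, indent=2)
```

The resulting `results.json` could then be uploaded via bencher's JSON adapter (e.g. `bencher run --adapter json --file results.json`, to be confirmed against the existing greenline pipeline).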
## Rabbit holes
<!-- Details about the solution worth calling out to avoid problems -->
## No-gos
<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->
## Progress
### Greenline
- [ ] Make sure the historical timeline is correct after moving to larger grids for the benchmarks
### Blueline
- [ ] Double check which system should be used to run the CI job: Santis (with FirecREST) or Balfrin (with FirecREST or Jenkins?)
- [ ] Check the easiest way to build a base environment for running the blueline: Matthieu's uenv / [Christoph's build scripts](https://github.com/C2SM/icon-exclaim/blob/icon-dsl/dsl/install_dependencies.sh)
## Discussion 20.5.2025
- Christoph
- Enrique
- Magdalena
## Global (1 GPU)
- R02B07 with the JW (Jablonowski-Williamson) initialization routines (greenline)
- R02B06 for blueline (R02B07 does not fit on 1 GPU)
### Initialization
- static fields:
  - geometry, interpolation coefficients, metrics: computed via the factory
  - vct_a: computed for all experiments
  - topography: read from the `extpar` netcdf file (`float topography_c(cell)`) for the MCH test cases (see the sketch below)
- dynamics:
  - initialization functions of JW (these are also used to initialize APE in the blueline)
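A minimal sketch of reading that topography field, assuming xarray is available; the actual reader in icon4py may differ:

```python
# Hypothetical sketch: read the cell topography from an extpar netcdf file.
import xarray as xr

ds = xr.open_dataset("external_parameter_icon_mch_opr_r4b7_DOM01_tiles.nc")
topography_c = ds["topography_c"].values  # float field over the cell dimension
```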
## Local (1 GPU)
| Experiment | Description | Grid size (cells) | Grid file | extpar file |
|:--- |:--- |:--- |:--- |:--- |
| mch_icon-ch2-small | verification of mch_icon-ch2 | ~11 000 | mch_opr_r4b7_DOM01.nc | external_parameter_icon_mch_opr_r4b7_DOM01_tiles.nc |
| mch_icon-ch2 | 2 km production / 5-day forecast | 283 876 | icon-2/icon_grid_0002_R19B07_mch.nc | icon-2/external_parameter_icon_grid_0002_R19B07_mch.nc |
| mch_icon-ch1_medium | old experiment with a medium-sized grid, revived by Christoph to benchmark single GPU | 44 528 | opr_r19b08/domain1_DOM01.nc | opr_r19b08/external_parameter_icon_domain1_DOM01_tiles.nc |
Path for grid files: `/scratch/mch/jenkins/icon/pool/data/ICON/mch/grids`:
- `/scratch/mch/jenkins/icon/pool/data/ICON/mch/grids/opr_r19b08/domain1_DOM01.nc`
- `/scratch/mch/jenkins/icon/pool/data/ICON/mch/grids/icon-2/icon_grid_0002_R19B07_mch.nc`

For mch_icon-ch2-small:
- `/scratch/mch/icontest/testing-input-data/c2sm/icon-2_small/`
## CI jobs
- Balfrin software stack: FirecREST? uenv-v2?
- Setups:
- Minimal:
- Use Santis for everything
- Fair:
- Global experiments on Santis
- Local experiments on Balfrin
- Ideal:
- Both experiments on both machines
- Environments:
- Try whether we can use Matthieu's uenv on both machines (`icon/25.2:v3`?)
- Ask Will
- Starting point:
- Take the existing CI job running ICON-exclaim on Santis, extract the performance data from the timer report, and upload it to bencher (see the sketch below)
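A minimal sketch of the timer-report extraction, assuming a simplified whitespace-separated table; the real ICON timer-report layout (units, nested timer names) must be checked against an actual log:

```python
# Hypothetical sketch: convert ICON timer-report lines into bencher-style JSON.
# Assumes rows of the form
#   <name> <#calls> <t_min> <t_avg> <t_max> <t_total>
# with plain float seconds -- verify against a real timer report.
import json


def parse_timer_report(text: str) -> dict:
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6 or not parts[1].isdigit():
            continue  # skip headers and non-timer lines
        name, _, t_min, t_avg, t_max, _ = parts
        try:
            metrics[name] = {
                "latency": {
                    "value": float(t_avg),
                    "lower_value": float(t_min),
                    "upper_value": float(t_max),
                }
            }
        except ValueError:
            continue  # non-numeric row
    return metrics


with open("icon_timer_report.txt") as f:  # hypothetical file name
    metrics = parse_timer_report(f.read())
with open("results.json", "w") as f:
    json.dump(metrics, f, indent=2)
```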