# [ICON4Py] Benchmarking infrastructure
- Shaped by: Till, Enrique
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->
## Problem
1. The StencilTests are currently running on unrepresentative grids.
2. We don't have a continuous benchmarking setup for representative use cases, i.e., MCH production and ICON-Exclaim users. As a result we often don't know how good or bad our overall performance actually is, or what setup is required to achieve it.
## Appetite
<!-- Explain how much time we want to spend and how that constrains the solution -->
## Solution
<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->
### 1. StencilTests with larger grids
Extend the bencher CI pipeline to execute on larger grids. For this, only the grid files are needed. See [here](https://github.com/C2SM/icon4py/blob/main/model/testing/src/icon4py/model/testing/pytest_config.py#L58) for the relevant fixture; a parameterization sketch follows the grid list below. Please make sure the testbed(s) in bencher are named nicely.
__TODO: What grids?__
- Global grid: R02B07.
- Local (LAM): the grid from MCH-CH2, plus a smaller grid from an older MCH experiment with roughly half the number of cells of the MCH-CH2 one (this should be more similar to the production setup for a single-node run).
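A minimal sketch of how such a grid fixture could be parameterized, assuming grids are pre-downloaded and located via an environment variable. The fixture name, the mapping, the environment variable, and the global grid file name are all hypothetical; the actual fixture is the one linked above:

```python
# conftest.py -- hypothetical sketch, not the actual icon4py fixture.
import os

import pytest

# Hypothetical mapping from a bencher testbed name to its grid file.
# The global file name is assumed; the local one is taken from the table
# in the "Local (1 GPU)" section below.
GRIDS = {
    "global-r02b07": "icon_grid_R02B07_G.nc",  # assumed file name
    "local-mch-ch2": "icon-2/icon_grid_0002_R19B07_mch.nc",
}


@pytest.fixture(params=sorted(GRIDS))
def grid_file(request):
    """Yield one grid-file path per benchmark run; the param id can
    double as the bencher testbed name."""
    grid_dir = os.environ.get("ICON4PY_GRID_DIR", ".")  # hypothetical variable
    return os.path.join(grid_dir, GRIDS[request.param])
```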
### 2. Blue-line with granule
1. Move the log parsing functionality from `icon-exclaim-perf-tools` into a new module `icon4py.tools.icon_log_utils` in icon4py.
- [log import functionality](https://github.com/C2SM/icon-exclaim-perf-tools/blob/master/src/icon_exclaim_perf_tools/log_import.py) -> `icon4py.tools.icon_log_utils.importer`
- [database schema](https://github.com/C2SM/icon-exclaim-perf-tools/blob/master/src/icon_exclaim_perf_tools/db/schema.py) -> `icon4py.tools.icon_log_utils.schema`
2. Add a new pipeline that executes the blue-line. The job should be parameterized by:
- Experiment
- Build type: openacc, granule
Make sure to coordinate this effort with the people working on / up to date on the uenv (Rico) and CI (Christoph). If this hasn't progressed far enough to be used right away, be pragmatic: install according to the build instructions at https://github.com/C2SM/icon-exclaim/tree/icon-dsl/dsl until then, but be careful not to burn too much CI time here.
3. Parse the log file using `icon4py.tools.icon_log_utils.importer.import_model_run_log_from_file`, export the relevant output of an `IconRun` instance into a JSON file, and upload it to bencher (see the sketch after this list). Ioannis & Christos know what format is expected (see also [this PR](https://github.com/C2SM/icon4py/pull/675/files)).
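A minimal sketch of step 3, assuming the module move from step 1 has happened. The `IconRun` attribute names and the exact metric layout are assumptions and must be checked against the PR linked above and against `icon-exclaim-perf-tools`:

```python
# Hypothetical sketch: parse an ICON run log and emit bencher-style JSON.
# `import_model_run_log_from_file` returns an `IconRun`; the attributes used
# below (a `timers` collection with name/avg/min/max fields) are assumptions.
import json

from icon4py.tools.icon_log_utils import importer  # module name after step 1

run = importer.import_model_run_log_from_file("LOG.exp.mch_icon-ch2.o")  # assumed log name

bmf = {  # Bencher Metric Format-like layout; verify against PR #675
    timer.name: {
        "latency": {
            "value": timer.avg,
            "lower_value": timer.min,
            "upper_value": timer.max,
        }
    }
    for timer in run.timers
}

with open("results.json", "w") as f:
    json.dump(bmf, f, indent=2)
```

The resulting `results.json` could then be uploaded via bencher's JSON adapter (e.g. `bencher run --adapter json --file results.json`, to be confirmed against the existing greenline pipeline).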
## Rabbit holes
<!-- Details about the solution worth calling out to avoid problems -->
## No-gos
<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->
## Progress
### Greenline
- [ ] Make sure the historical timeline is correct after moving to larger grids for the benchmarks
### Blueline
- [ ] Double check which system should be used to run the CI job: Santis (with FirecREST) or Balfrin (with FirecREST or Jenkins?)
- [ ] Check the easiest way to build a base environment for running the blueline: Matthieu's uenv / [Christoph's build scripts](https://github.com/C2SM/icon-exclaim/blob/icon-dsl/dsl/install_dependencies.sh)
## Discussion 20.5.2025
- Christoph
- Enrique
- Magdalena
## Global (1 GPU)
- R02B07 with the JW (Jablonowski-Williamson) initialization routines (greenline)
- R02B06 for blueline (R02B07 does not fit on 1 GPU)
### Initialization
- static fields:
  - geometry, interpolation coefficients, metrics: computed via the factory
  - vct_a: computed for all experiments
  - topography: read from the `extpar` netcdf file (`float topography_c(cell)`) for the MCH test cases (see the sketch below)
- dynamics:
  - initialization functions of JW (these are also used to initialize APE in the blueline)
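A minimal sketch of reading that topography field, assuming xarray is available; the actual reader in icon4py may differ:

```python
# Hypothetical sketch: read the cell topography from an extpar netcdf file.
import xarray as xr

ds = xr.open_dataset("external_parameter_icon_mch_opr_r4b7_DOM01_tiles.nc")
topography_c = ds["topography_c"].values  # float field over the cell dimension
```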
## Local (1 GPU)
| Experiment | Description | Grid size (cells) | Grid file | extpar file |
|:--- |:--- |:--- |:--- |:--- |
| mch_icon-ch2-small | verification of mch_icon-ch2 | ~11 000 | mch_opr_r4b7_DOM01.nc | external_parameter_icon_mch_opr_r4b7_DOM01_tiles.nc |
| mch_icon-ch2 | 2 km production / 5-day forecast | 283 876 | icon-2/icon_grid_0002_R19B07_mch.nc | icon-2/external_parameter_icon_grid_0002_R19B07_mch.nc |
| mch_icon-ch1_medium | old experiment with a medium-sized grid, revived by Christoph to benchmark single GPU | 44 528 | opr_r19b08/domain1_DOM01.nc | opr_r19b08/external_parameter_icon_domain1_DOM01_tiles.nc |
Path for grid files: `/scratch/mch/jenkins/icon/pool/data/ICON/mch/grids`:
- `/scratch/mch/jenkins/icon/pool/data/ICON/mch/grids/opr_r19b08/domain1_DOM01.nc`
- `/scratch/mch/jenkins/icon/pool/data/ICON/mch/grids/icon-2/icon_grid_0002_R19B07_mch.nc`

For mch_icon-ch2-small:
- `/scratch/mch/icontest/testing-input-data/c2sm/icon-2_small/`
## CI jobs
- Balfrin software stack: FirecREST? uenv-v2?
- Setups:
- Minimal:
- Use Santis for everything
- Fair:
- Global experiments on Santis
- Local experiments on Balfrin
- Ideal:
- Both experiments on both machines
- Environments:
- Try whether we can use Matthieu's uenv on both machines (`icon/25.2:v3`?)
- Ask Will
- Starting point:
- Take the existing CI job running ICON-exclaim on Santis, extract the performance data from the timer report, and upload it to bencher (see the sketch below)
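A minimal sketch of the timer-report extraction, assuming a simplified whitespace-separated table; the real ICON timer-report layout (units, nested timer names) must be checked against an actual log:

```python
# Hypothetical sketch: convert ICON timer-report lines into bencher-style JSON.
# Assumes rows of the form
#   <name> <#calls> <t_min> <t_avg> <t_max> <t_total>
# with plain float seconds -- verify against a real timer report.
import json


def parse_timer_report(text: str) -> dict:
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6 or not parts[1].isdigit():
            continue  # skip headers and non-timer lines
        name, _, t_min, t_avg, t_max, _ = parts
        try:
            metrics[name] = {
                "latency": {
                    "value": float(t_avg),
                    "lower_value": float(t_min),
                    "upper_value": float(t_max),
                }
            }
        except ValueError:
            continue  # non-numeric row
    return metrics


with open("icon_timer_report.txt") as f:  # hypothetical file name
    metrics = parse_timer_report(f.read())
with open("results.json", "w") as f:
    json.dump(metrics, f, indent=2)
```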