# [ICON4Py] Benchmarking infrastructure
- Shaped by: Enrique, Magdalena
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->
## Problem
The benchmarking infrastructure in ICON4Py has gone through several improvements over the last cycles, but it still needs more work:
1. The (single-stencil) `StencilTest`s are currently running on unrepresentative grids.
2. We don't have a continuous benchmarking setup for representative use cases, i.e., MCH production and ICON-Exclaim users. As a result, we often don't know how good or bad our overall performance actually is, or what setup is required to achieve it.
## Goal
The final goal is to bring the infrastructure to a state where it continuously measures the relevant data to support tuning the GT4Py/DaCe pipeline for the best performance.
The following table gives an overview of the benchmarking experiments/setups we want to target:
| Repository | System | Experiment | #GPUs | Grid file | Remarks |
|:------------ |:------- |:------------------------------------------------------- |:-----:|:---------------------------- |:-------------- |
| icon4py | Santis | StencilTests / exclaim_ape_R2B07 | 1 | icon_grid_R02B07_G.nc | |
| icon4py | Santis | StencilTests / mch_icon-ch2 | 1 | icon_grid_0002_R19B07_mch.nc | if it fits... |
| icon4py | Santis | Granule / exclaim_ape_R2B06 | 1 | icon_grid_R02B06_G.nc | G-B comparison |
| icon4py | Santis | Granule / mch_icon-ch1_medium | 1 | opr_r19b08/domain1_DOM01.nc | G-B comparison |
| icon-exclaim | Santis | exclaim_ape_R2B07 (essentially *new* exclaim_ape_R2B06) | 1 | icon_grid_R02B06_G.nc | G-B comparison |
| icon-exclaim | Santis | mch_icon-ch1_medium | 1 | opr_r19b08/domain1_DOM01.nc | G-B comparison |
| icon-exclaim | Balfrin | mch_icon-ch1_medium | 1 | opr_r19b08/domain1_DOM01.nc | |
| icon-exclaim | Balfrin | mch_icon-ch2 | 4 | icon_grid_0002_R19B07_mch.nc | |
## Tasks
### 1. StencilTests with larger grids
- Refactor testing infrastructure in ICON4Py (WIP [PR #768](https://github.com/C2SM/icon4py/pull/768/) by Enrique) (estimated 1-2 weeks of work)
- Switch to the grids mentioned above (half a day of work once the refactoring is finished)
- Clean out the "old" Bencher testbeds and make sure the remaining testbeds are named clearly and consistently
- Expand the parametrizations and benchmark switches to satisfy the requirements of the [DaCe benchmarks](https://hackmd.io/QhXmRJJUQMCllpdvTar1Sg)
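Expanding the parametrizations could look roughly like the sketch below; the grid and backend names are illustrative placeholders, not the actual icon4py identifiers:

```python
# Sketch: expand StencilTest parametrizations over the larger grids and
# backend switches. Grid and backend names are illustrative assumptions,
# not the real icon4py identifiers.
from itertools import product

GRIDS = {
    "exclaim_ape_R2B07": "icon_grid_R02B07_G.nc",
    "mch_icon-ch2": "icon_grid_0002_R19B07_mch.nc",
}
BACKENDS = ["gtfn_gpu", "dace_gpu"]  # hypothetical backend switch values


def benchmark_cases(grids=GRIDS, backends=BACKENDS):
    """Cartesian product of grids and backends as (test id, grid file, backend)."""
    return [
        (f"{experiment}-{backend}", grid_file, backend)
        for (experiment, grid_file), backend in product(grids.items(), backends)
    ]
```

Each tuple could then feed a `pytest.param(...)` entry, so every stencil benchmark runs once per grid/backend combination.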
### 2. Greenline standalone granule benchmark runs
Set up benchmark tests for granules in icon4py. In contrast to the datatests, they should run in "standalone mode", requiring only a grid file and a configuration (taken from the run scripts/namelists in icon-exclaim mentioned above). The benchmark results should be published to Bencher.
Refactoring of the infrastructure and setup of standalone tests for greenline granules can be done in parallel, using the "old" grids until the refactoring is merged.
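A minimal sketch of what the "grid file plus configuration" contract could look like; all names here are hypothetical placeholders, not the real icon4py API:

```python
# Sketch of the "standalone mode" contract: a benchmark run is fully described
# by a grid file and a configuration, with no serialized-data dependency.
# All names here are hypothetical placeholders, not the real icon4py API.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StandaloneBenchmarkConfig:
    experiment: str  # e.g. "mch_icon-ch1_medium"
    grid_file: str  # e.g. "opr_r19b08/domain1_DOM01.nc"
    num_levels: int = 80  # number of vertical levels (assumed default)
    extpar_file: Optional[str] = None  # topography source; None means APE (zeros)

    def is_ape(self) -> bool:
        # APE cases have no extpar file: topography is zero everywhere.
        return self.extpar_file is None
```

Keeping the configuration immutable and serializable makes it easy to reuse the same setup for both the Greenline granule tests and a Bencher testbed description.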
#### Initialization of the standalone tests
- static fields:
    - `IconGrid` from the grid file via `grid_manager`
    - geometry, interpolation coefficients, and metrics computed via the factories
    - `vct_a`: computed for all experiments using the function in `vertical.py`
- topography:
    - read from the `extpar` netCDF file (`float topography_c(cell)`) for the MCH (LAM) test cases; for APE it is all zeros
- dynamics:
    - initialization functions of JW (these are also used to initialize APE in the Blueline)
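The topography rule above (read `topography_c` for the LAM cases, zeros for APE) can be sketched as follows; `extpar_reader` is a hypothetical callable standing in for the actual netCDF access:

```python
# Sketch of the topography initialization: the MCH (LAM) cases read
# `float topography_c(cell)` from the extpar netCDF file, APE uses zeros.
# `extpar_reader` is a hypothetical stand-in for the real netCDF access.
import numpy as np


def load_topography(num_cells, extpar_reader=None):
    """Return the cell topography field for a standalone benchmark run."""
    if extpar_reader is None:
        # APE: no external parameters, flat zero topography.
        return np.zeros(num_cells)
    topography_c = np.asarray(extpar_reader("topography_c"), dtype=float)
    if topography_c.shape != (num_cells,):
        raise ValueError("extpar topography does not match the grid size")
    return topography_c
```

The shape check guards against pairing an extpar file with the wrong grid, which is easy to get wrong when mixing the experiments in the table below.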
All relevant grid files, except for icon_grid_R02B06.nc, are already on polybox and can be accessed via the URLs in `model/testing/src/icon4py/model/testing/definitions.py`.
The original locations of the local-area grids on the CSCS file systems are collected below. The table lists the extpar file from which we need to read the topography array for each MCH experiment.
| Experiment | Description | Grid size | Grid file | extpar file |
|:------------------- |:-------------------------------------------------------- |:------------ |:----------------------------------- |:--------------------------------------------------------- |
| mch_icon-ch2-small | verification of mch_icon-ch2 | 11 k cells | mch_opr_r4b7_DOM01.nc | external_parameter_icon_mch_opr_r4b7_DOM01_tiles.nc |
| mch_icon-ch2 | 2 km production / 5 day forecast | 283876 cells | icon-2/icon_grid_0002_R19B07_mch.nc | icon-2/external_parameter_icon_grid_0002_R19B07_mch.nc |
| mch_icon-ch1_medium | old experiment with medium grid to benchmark single GPU | 44528 cells | opr_r19b08/domain1_DOM01.nc | opr_r19b08/external_parameter_icon_domain1_DOM01_tiles.nc |
Path for grid files: `/scratch/mch/jenkins/icon/pool/data/ICON/mch/grids`:
- `/scratch/mch/jenkins/icon/pool/data/ICON/mch/grids/opr_r19b08/domain1_DOM01.nc`
- `/scratch/mch/jenkins/icon/pool/data/ICON/mch/grids/icon-2/icon_grid_0002_R19B07_mch.nc`

For icon2_small:
- `/scratch/mch/icontest/testing-input-data/c2sm/icon-2_small/`
### 3. Blue-line with granule
The main task is to set up the missing benchmark experiments from the table above, run them on CI, and upload their results to Bencher.
Extra/optional enhancements:
- Use `uv` to run the `icon-exclaim-perf-tools` script in CI instead of creating a venv
- Check if it is already possible to switch from Jenkins to CSCS-CI on Balfrin
- Find out where the documentation for the ICON CI scripts lives in the repo and update it with the new experiments (or document them from scratch if nothing exists yet).
## Rabbit holes
<!-- Details about the solution worth calling out to avoid problems -->
## No-gos
<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->
## Progress
### Greenline
- [ ] refactor infrastructure
- [ ] relevant grid files are available on CI
- [ ] bencher testbeds cleaned and adapted
- [ ] standalone pytest-benchmark test for diffusion
- [ ] add the test to bencher
- [ ] standalone pytest-benchmark test for dycore
- [ ] add the test to bencher
### Blueline
- [ ] Double check which system should be used to run the CI job: Santis (with FirecREST) or Balfrin (with FirecREST or Jenkins?)
- [ ] Check the easiest way to build a base environment to run the Blue line: Matthieu's uenv / [Christoph's build scripts](https://github.com/C2SM/icon-exclaim/blob/icon-dsl/dsl/install_dependencies.sh)