# [ICON4Py] Benchmarking (2nd try)
<!-- Add the tag for the current cycle number in the top bar -->
- Shaped by: Philip, Magdalena
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->
## Problem
<!-- The raw idea, a use case, or something we’ve seen that motivates us to work on this -->
We want to automatically track performance in ICON4Py. To this end, benchmark runs for entire granules and for individual stencils should be executed regularly and the results uploaded to bencher.
For stencil performance there is a custom repository that the DaCe performance engineers have been using to track the performance of the DaCe backend for ICON4Py stencils and to compare it against the OpenACC ICON baseline. This repository should become obsolete and its functionality should move into the ICON4Py repository.
## Appetite
<!-- Explain how much time we want to spend and how that constrains the solution -->
## Solution
<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->
### granule benchmarks
- Diffusion see [PR-747](https://github.com/C2SM/icon4py/pull/747)
- add an analogous test for the dycore
- Run them on the Benchmark grids:
- [MCH CH2][icon-ch2] production
- [MCH medium size][mch-medium]
- [R02B07][R02B07]
- The backend should run in non-blocking (async) mode, with a sync added after `granule.run` (see the sketch after this list)
- run the tests in CI and upload the results to [bencher]
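
A minimal sketch of what such a granule benchmark could look like with the pytest-benchmark fixture; `diffusion_granule` and `device_sync` are placeholder fixtures, not the current icon4py test API:

```python
# Sketch only: `diffusion_granule` and `device_sync` are placeholder fixtures.
import pytest


@pytest.mark.benchmark(group="granules")
def test_benchmark_diffusion_granule(benchmark, diffusion_granule, device_sync):
    def run_and_sync() -> None:
        # with a non-blocking backend this call only launches the kernels ...
        diffusion_granule.run(dtime=10.0)
        # ... so synchronize before the timer stops, otherwise we measure
        # launch overhead instead of the actual granule run time
        device_sync()

    benchmark(run_and_sync)
```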
### StencilTests
We want to re-use the benchmark option of the StencilTest to reproduce the setup of the Benchmark-repo:
- add compile-time args to the StencilTest
- allow the StencilTest to run several code paths, i.e. run a test for each relevant compile-time variant
- Allow parametrization of the stencil test over the relevant compile-time (and possibly runtime) variants; all parametrized tests should also run and pass in verification mode. (For verification this might require fixing the numpy references.)
- Separate verification runs from benchmark runs in StencilTests (the reason for this is that we want to use separate grids for validation and for benchmarking):
- Stencil Benchmarks should run on
- [Medium-sized grid for single-node MCH][mch-medium]
- [MCH ICON-CH2][icon-ch2] production grid
- [R02B07][R02B07]
- Verification should run on
- [MCH-ICON2-Small][icon-ch2-small]
- [R02B04][R02B04]
- customize the backend for single-stencil timings (see the sketch after this list):
- make sure caching is enabled and the first run (compilation!) is excluded from the benchmark timings (the `benchmark` fixture might already do some warm-up runs)
- make sure the backend does not launch asynchronously but waits at the end of a stencil run (or run the async backend and wait explicitly in the benchmark code)
- for other options - see the backend parametrization in the [Benchmark-repo][benchmark-repo]
- configure the pytest-benchmark fixture to our needs (possibly already done)
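
A rough sketch of how a single-stencil timing could be set up with the pytest-benchmark marker, covering the warm-up/compilation and synchronization points above; `compiled_stencil`, `stencil_args` and `device_sync` are placeholder fixtures and the numbers are illustrative:

```python
# Sketch only: the fixtures are placeholders; the marker options are standard
# pytest-benchmark settings.
import pytest


@pytest.mark.benchmark(
    group="stencils",
    warmup=True,      # warm-up iterations absorb the first (compiling) call
    min_rounds=10,
    disable_gc=True,
)
def test_benchmark_sample_stencil(benchmark, compiled_stencil, stencil_args, device_sync):
    def run_and_sync() -> None:
        compiled_stencil(**stencil_args)
        # wait for the device so an async backend does not skew the timing
        device_sync()

    benchmark(run_and_sync)
```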
### bencher
- Run with the DaCe and GTFN backends on the benchmark grids (see above)
- reset the data in [bencher][bencher]
- add the OpenACC baseline (could be added as a [pytest benchmark][pytest-benchmark] run to the repo, then use `--benchmark-compare`; see the sketch after this list)
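
A hedged sketch of the comparison workflow, assuming the OpenACC baseline can be stored as a saved pytest-benchmark run; the test directory, run id and file name are illustrative:

```python
# Sketch only: paths and the saved-run id are illustrative; the flags are
# standard pytest-benchmark options.
import pytest

# One-off: record the baseline and store it as a saved run in the repo
# (e.g. `pytest model -k benchmark --benchmark-autosave`).

# Regular CI run: compare against the saved baseline and write the JSON
# report that is then uploaded to bencher.
pytest.main([
    "model",
    "-k", "benchmark",
    "--benchmark-compare=0001",
    "--benchmark-json=benchmark_result.json",
])
```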
## Rabbit holes
<!-- Details about the solution worth calling out to avoid problems -->
## No-gos
<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->
[icon-ch2]: https://github.com/C2SM/icon4py/blob/6c97904d8bf3363703f1fc4fdfe4b673d79025ef/model/testing/src/icon4py/model/testing/definitions.py#L82
[mch-medium]: https://github.com/C2SM/icon4py/blob/6c97904d8bf3363703f1fc4fdfe4b673d79025ef/model/testing/src/icon4py/model/testing/definitions.py#L98
[R02B07]: https://github.com/C2SM/icon4py/blob/6c97904d8bf3363703f1fc4fdfe4b673d79025ef/model/testing/src/icon4py/model/testing/definitions.py#L74
[R02B04]: https://github.com/C2SM/icon4py/blob/6c97904d8bf3363703f1fc4fdfe4b673d79025ef/model/testing/src/icon4py/model/testing/definitions.py#L65
[icon-ch2-small]: https://github.com/C2SM/icon4py/blob/6c97904d8bf3363703f1fc4fdfe4b673d79025ef/model/testing/src/icon4py/model/testing/definitions.py#L90
[bencher]: https://bencher.dev/console/organizations/c2sm/projects
[pytest-benchmark]: https://pytest-benchmark.readthedocs.io/en/latest/
[benchmark-repo]: https://github.com/philip-paul-mueller/benchmark_2
## Progress
<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them. -->
- [x] Task 1 ([PR#xxxx](https://github.com/GridTools/gt4py/pulls))
- [x] Subtask A
- [x] Subtask X
- [ ] Task 2
- [x] Subtask H
- [ ] Subtask J
- [ ] Discovered Task 3
- [ ] Subtask L
- [ ] Subtask S
- [ ] Task 4
## SCRATCH
| scope \ mode | data test | standalone |
|:------------ | --------- | ---------- |
| unit test | -k 'datatest' | StencilTest |
| granule test | -k 'datatest' | Yilu's PR |
----
```python
# NOTE: the import paths below are assumptions about the icon4py module layout.
from typing import Any

import numpy as np
import pytest

import gt4py.next as gtx

from icon4py.model.common import dimension as dims
from icon4py.model.common.grid import base
from icon4py.model.common.utils import data_allocation as data_alloc
from icon4py.model.testing import helpers
from icon4py.model.testing.helpers import StencilTest


# `make_custom_backend` is a sketch of the planned backend-customization hook.
backend_variant1 = make_custom_backend(param1=True, param2="foo")
backend_variant2 = make_custom_backend(param1=False, param2="bar")
# run with: pytest ... --backend model.backends.module_foo.backend_variant1 --backend variant2


class SampleStencilTestForDace(StencilTest):
    PROGRAM = helpers.average_two_vertical_levels_downwards_on_cells
    OUTPUTS = ("average",)
    # map each compile-time variant to the arguments that are static for it
    STATIC_ARGS = {
        "variant1": None,
        "variant2": ("flag", "horizontal_start"),
    }

    # second step
    # @pytest.fixture
    # def backend(self, backend):
    #     if backend == dace_gpu:
    #         return backend_variant1
    #     return backend

    @staticmethod
    def reference(
        connectivities: dict[gtx.Dimension, np.ndarray],
        input_field: np.ndarray,
        **kwargs: Any,
    ) -> dict:
        shp = input_field.shape
        res = 0.5 * (input_field + np.roll(input_field, shift=-1, axis=1))[:, : shp[1] - 1]
        return dict(average=res)

    @pytest.fixture
    def input_data(self, grid: base.Grid) -> dict:
        input_field = data_alloc.random_field(grid, dims.CellDim, dims.KDim, extend={dims.KDim: 1})
        flag = 1
        horizontal_start = grid.get_horizontal_start(...)
        result = data_alloc.zero_field(grid, dims.CellDim, dims.KDim)
        return dict(
            input_field=input_field,
            average=result,
            flag=flag,
            horizontal_start=horizontal_start,
            horizontal_end=gtx.int32(grid.num_cells),
            vertical_start=gtx.int32(0),
            vertical_end=gtx.int32(grid.num_levels),
        )
```