[ICON4Py] DaCe benchmark in ICON4Py

# [ICON4Py] DaCe benchmark in ICON4Py

- Shaped by: Edoardo
- Appetite (FTEs, weeks):
- Developers:

## Problem

The SDFG transformations are currently tested and profiled in a separate benchmark [repo](https://github.com/philip-paul-mueller/benchmark_2). The goal is to use ICON4Py for benchmarking instead. The benchmark should cover all variants of a precompiled stencil and focus on the runtime of each stencil, assuming synchronous execution.

## Appetite

## Solution

The project is not fully shaped yet; it depends on the [`StencilTests` refactoring task](https://hackmd.io/nkaT0BGFRgevuDMW2t78xw?both#1-StencilTests-with-larger-grids) from the [[ICON4Py] Benchmarking infrastructure](https://hackmd.io/@gridtools/S1dAIqMSxl) project.

The benchmark [repo](https://github.com/philip-paul-mueller/benchmark_2) provides a large degree of flexibility, which is difficult to offer in the icon4py production repository. However, the following requirements should be met:

- It should be possible to run each stencil individually, not only as part of a granule or driver test.
- The benchmark should use a meaningful, large grid.
- It should be possible to run a battery of benchmarks and collect the results through Bencher.
- It should be possible to run the same benchmark on both the gtfn and dace backends.

For measuring the stencil time, we could use the performance metrics provided by the GT4Py backends (see PR [#1978](https://github.com/GridTools/gt4py/pull/1978)). This metric measures the total time spent in stencil execution, considering both host and device compiled code.

## Rabbit holes

## No-gos

## Progress

- [x] Task 1 ([PR#xxxx](https://github.com/GridTools/gt4py/pulls))
  - [x] Subtask A
  - [x] Subtask X
- [ ] Task 2
  - [x] Subtask H
  - [ ] Subtask J
- [ ] Discovered Task 3
  - [ ] Subtask L
  - [ ] Subtask S
- [ ] Task 4
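
As a rough illustration of the requirements listed under Solution (one stencil at a time, the same benchmark on several backends, synchronous timing, results collected in one report), a minimal sketch could look like the following. It does not use the GT4Py performance metrics from PR #1978 or any icon4py code: `time_stencil`, `to_report`, `make_placeholder_stencil` and the `no_sync` hook are hypothetical names, the stencil is a pure-Python stand-in, and the backend labels only tag the results. The report layout assumes the `latency` / `value` / `lower_value` / `upper_value` JSON structure of the Bencher Metric Format, which should be checked against the Bencher documentation before use.

```python
import json
import statistics
import time
from typing import Callable, Dict, List


def time_stencil(
    run: Callable[[], None],
    sync: Callable[[], None],
    repeats: int = 5,
    warmup: int = 1,
) -> List[float]:
    """Time `run` synchronously; returns one wall-clock sample (ns) per repeat."""
    for _ in range(warmup):
        run()
        sync()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        run()
        sync()  # wait for completion so the sample covers the whole execution
        samples.append(float(time.perf_counter_ns() - start))
    return samples


def to_report(results: Dict[str, List[float]]) -> Dict[str, dict]:
    """Summarise samples per benchmark name (assumed Bencher-style JSON layout)."""
    return {
        name: {
            "latency": {
                "value": statistics.median(samples),
                "lower_value": min(samples),
                "upper_value": max(samples),
            }
        }
        for name, samples in results.items()
    }


def make_placeholder_stencil(n: int) -> Callable[[], None]:
    """Hypothetical stand-in for a precompiled stencil: a 1D Laplacian in pure Python."""
    field = [float(i * i) for i in range(n)]
    out = [0.0] * n

    def run() -> None:
        for i in range(1, n - 1):
            out[i] = field[i - 1] - 2.0 * field[i] + field[i + 1]

    return run


def no_sync() -> None:
    """Placeholder: a real GPU run would block here until the device is idle."""


if __name__ == "__main__":
    # Backend names are labels only; both entries run the same placeholder code.
    backends = {
        "gtfn": make_placeholder_stencil(10_000),
        "dace": make_placeholder_stencil(10_000),
    }
    results = {
        f"placeholder_laplacian[{backend}]": time_stencil(run, no_sync)
        for backend, run in backends.items()
    }
    print(json.dumps(to_report(results), indent=2))
```

Taking the median with the minimum and maximum as bounds is only one possible summary. The explicit `sync` hook keeps the timed region synchronous, in line with the assumption stated in the Problem section.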