# [Blueline] Benchmarking icon-exclaim
- Shaped by: Till
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->
TODO(Christoph): Write how much time of you is needed here
## Problem
Finish the tasks left over from previous cycles so that icon-exclaim performance can also be measured.
## Appetite
<!-- Explain how much time we want to spend and how that constrains the solution -->
TODO(Christoph): Please fill in the time needed for the non-GT4Py task.
2 days for the GT4Py task (temporary pass extension).
## Solution
- Fused stencils integrated and validated (fusing itself is [a different project](https://hackmd.io/J0jTgdbKROeb0P-VaUOVDg))
- Stencils whose output fields mix `nlev` and `nlevp1` vertical levels need to be split into 2 `field_operator` calls (see below for approach).
    - Also separate istep 1 and 2
- ICON timers are added for fused stencils
- Log parser works in CI (proposal `tail -c+0 --pid=$PID -f log.txt`)
- Number of cells, edges, vertices used to allocate temporaries can be passed at runtime. Targeted approach: Extend the temporary pass to optionally use (configurable) fencil parameters for the sizes instead of resorting to offset providers given at compile time.
- Fused stencils are tested with temporaries. Working ones are enabled, non-working ones are disabled, and feedback is given to the GT4Py team (e.g. Till).
- Fused stencils can be enabled on a stencil-by-stencil basis
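
The runtime-size item above can be pictured with a small, framework-free sketch. This is not the GT4Py temporary pass: `resolve_temporary_sizes`, the `symbolic_sizes` mapping, and the parameter names `num_cells` and `num_edges` are hypothetical, chosen only to illustrate preferring a configured runtime parameter over a size fixed at compile time.

```python
from typing import Dict, Mapping, Optional


def resolve_temporary_sizes(
    compile_time_sizes: Mapping[str, int],
    symbolic_sizes: Optional[Mapping[str, str]] = None,
    runtime_args: Optional[Mapping[str, int]] = None,
) -> Dict[str, int]:
    """Pick the allocation size for each horizontal dimension.

    compile_time_sizes: sizes known at compile time (the current behaviour).
    symbolic_sizes: optional map from dimension to the name of the runtime
        parameter that carries its size (hypothetical configuration).
    runtime_args: parameter values supplied when the program is called.
    """
    symbolic_sizes = symbolic_sizes or {}
    runtime_args = runtime_args or {}
    sizes: Dict[str, int] = {}
    for dim, fallback in compile_time_sizes.items():
        param = symbolic_sizes.get(dim)
        if param is not None and param in runtime_args:
            # A runtime parameter is configured and was passed: use it.
            sizes[dim] = runtime_args[param]
        else:
            # Otherwise keep the compile-time size.
            sizes[dim] = fallback
    return sizes


sizes = resolve_temporary_sizes(
    {"Cell": 10, "Edge": 15, "Vertex": 6},
    symbolic_sizes={"Cell": "num_cells", "Edge": "num_edges"},
    runtime_args={"num_cells": 20480, "num_edges": 30720},
)
```

In this sketch `Vertex` has no configured parameter, so it keeps its compile-time size, which mirrors the "optionally use" wording of the targeted approach.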
Targeted approach for `nlev`, `nlevp1`:
```python
@field_operator
def fused_stencil(...) -> tuple[KField, Kp1Field]:
    ...

@program
def fused_program(...):
    fused_stencil(..., out=(k_field, kp1_field))

# Restricted variant: returns only the nlevp1-sized output
@field_operator
def fused_stencil_restricted(...) -> Kp1Field:
    return fused_stencil(...)[1]

@program
def mo_velocity_advection_stencil_fused_restricted(...):
    fused_stencil_restricted(..., out=kp1_field)  # bounds are not important as only used in blue-line

# alternative doesn't work out of the box, but should be simple to change in icon4pygen
@program
def fused_program_blueline(...):
    fused_stencil(..., out=(k_field, kp1_field), domain={K: (0, nlev), ...})
    fused_stencil_restricted(..., out=kp1_field, domain={K: (nlev, nlev+1), ...})
```
```
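
To make the vertical split concrete, here is a plain-Python sketch with no GT4Py involved. It assumes zero-based levels as in the snippet above; `split_vertical_domains`, `apply_split`, and the two `compute_*` callbacks are hypothetical stand-ins for the fused stencil bodies.

```python
def levels(domain):
    """Expand a half-open (start, stop) vertical domain into level indices."""
    start, stop = domain
    return list(range(start, stop))


def split_vertical_domains(nlev):
    """Return the two K-domains used by the split calls.

    The first covers the levels shared by both outputs, the second covers
    only the one extra level of the nlevp1-sized output.
    """
    return (0, nlev), (nlev, nlev + 1)


def apply_split(nlev, compute_k, compute_kp1):
    """Fill an nlev-sized and an (nlev+1)-sized output using the two domains."""
    k_field = [None] * nlev
    kp1_field = [None] * (nlev + 1)
    shared, extra = split_vertical_domains(nlev)
    for k in levels(shared):  # first call: writes both outputs
        k_field[k] = compute_k(k)
        kp1_field[k] = compute_kp1(k)
    for k in levels(extra):  # second (restricted) call: writes kp1 output only
        kp1_field[k] = compute_kp1(k)
    return k_field, kp1_field


k_field, kp1_field = apply_split(4, lambda k: k * 10, lambda k: k + 0.5)
```

Every level of both outputs is written exactly once, which is the property the two-call split needs to preserve.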
Fortran approach:
```fortran
!$DSL START STENCIL(name=mo_velocity_advection_stencil_fused_restricted; ...; vertical_lower=nlevp1; vertical_upper=nlevp1;)
! Top and bottom levels
!$ACC PARALLEL IF(i_am_accel_node) DEFAULT(PRESENT) ASYNC(1)
!$ACC LOOP GANG VECTOR
!DIR$ IVDEP
DO je = i_startidx, i_endidx
  p_diag%vn_ie(je,nlevp1,jb) = &
    p_metrics%wgtfacq_e(je,1,jb)*p_prog%vn(je,nlev,jb) + &
    p_metrics%wgtfacq_e(je,2,jb)*p_prog%vn(je,nlev-1,jb) + &
    p_metrics%wgtfacq_e(je,3,jb)*p_prog%vn(je,nlev-2,jb)
ENDDO
!$ACC END PARALLEL
!$DSL END STENCIL(name=mo_velocity_advection_stencil_fused_restricted)
!$DSL START STENCIL(name=mo_velocity_advection_stencil_fused; ...; vertical_lower=1; vertical_upper=nlev;)
...
!$DSL END STENCIL(name=mo_velocity_advection_stencil_fused)
```
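
For readers less familiar with the Fortran, the restricted stencil body above amounts to a three-point weighted sum per edge. The following plain-Python sketch restates it for a single block (the `jb` index is dropped) with zero-based levels; the function name `vn_ie_at_nlevp1` and the list-of-lists layout are assumptions made for illustration only.

```python
def vn_ie_at_nlevp1(vn, wgtfacq_e):
    """Compute the value of vn_ie on the extra (nlevp1) level for every edge.

    vn: one list per edge holding nlev values (zero-based levels).
    wgtfacq_e: one list per edge holding the three weights.
    Returns one value per edge.
    """
    result = []
    for vn_col, w in zip(vn, wgtfacq_e):
        nlev = len(vn_col)
        # Fortran levels nlev, nlev-1, nlev-2 are indices nlev-1, nlev-2, nlev-3 here.
        result.append(
            w[0] * vn_col[nlev - 1]
            + w[1] * vn_col[nlev - 2]
            + w[2] * vn_col[nlev - 3]
        )
    return result


values = vn_ie_at_nlevp1(
    vn=[[1.0, 2.0, 3.0, 4.0]],
    wgtfacq_e=[[0.5, 0.25, 0.25]],
)
```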
Previous projects:
- [Fused Liskov continuation and fused stencils](https://hackmd.io/iQsw7AbCSBC2q5OYSZpTrg)
- [CI: Performance measurement](https://hackmd.io/doZ9l59LTYejEuUWig4yfw)
- [ICON dycore optimizations](https://hackmd.io/We6afwwhTeaZwFvnqJxJ3g)
## Rabbit holes
<!-- Details about the solution worth calling out to avoid problems -->
## No-gos
<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->
## TODO
1. [ ] Split stencils with nlev and nlevp1 vertical layers into 2 field_operator calls (see the approach under Solution).
   - [ ] Separate istep 1
   - [ ] Separate istep 2
2. [ ] Add ICON timers for fused stencils.
3. [x] Ensure the log parser works in CI
4. [x] Allow the number of cells, edges, and vertices used to allocate temporaries to be passed at runtime.
   - [x] Extend the temporary pass in gt4py to optionally use configurable stencil parameters for sizes.
   - [x] Extend icon4py to pass grid size parameters.
   - [x] Extend icon-exclaim CMake to turn on and off temporaries.
   - [x] Pass Domain args to all programs
5. [ ] Test fused stencils with temporaries. Enable working ones, disable non-working ones, and provide feedback to GT4Py team (e.g., Till).
6. [ ] Implement the ability to enable fused stencils on a stencil-by-stencil basis.
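
One possible shape for the stencil-by-stencil switch in item 6 is sketched below. It is an assumption, not a description of how icon-exclaim or icon4py handle this: the function `parse_enabled_stencils`, the `"all"`/`"none"` keywords, and the stencil names in the example are all hypothetical.

```python
def parse_enabled_stencils(spec, known_stencils):
    """Turn a textual spec into an on/off flag per known fused stencil.

    spec: "all", "none" (or empty), or a comma-separated list of stencil names.
    known_stencils: names of all fused stencils that can be toggled.
    """
    known = list(known_stencils)
    spec = spec.strip()
    if spec == "all":
        return {name: True for name in known}
    if spec in ("", "none"):
        return {name: False for name in known}
    requested = {item.strip() for item in spec.split(",") if item.strip()}
    unknown = requested - set(known)
    if unknown:
        # Fail loudly on typos instead of silently disabling a stencil.
        raise ValueError(f"unknown fused stencils: {sorted(unknown)}")
    return {name: name in requested for name in known}


flags = parse_enabled_stencils(
    "fused_a, fused_c",
    ["fused_a", "fused_b", "fused_c"],
)
```

Rejecting unknown names keeps a misspelled stencil from quietly falling back to the non-fused path during benchmarking.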