# [Blueline] Benchmarking icon-exclaim

- Shaped by: Till
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->

TODO(Christoph): Specify how much of your time is needed here.

## Problem

Tasks remain from previous cycles that must be completed before icon-exclaim performance can also be measured.

## Appetite

<!-- Explain how much time we want to spend and how that constrains the solution -->

TODO(Christoph): Please fill in the time needed for the non-GT4Py task.

2 days for the GT4Py task (temporary pass extension).

## Solution

- Fused stencils are integrated and validated (fusing itself is [a different project](https://hackmd.io/J0jTgdbKROeb0P-VaUOVDg)).
  - Stencils whose output fields have `nlev` and `nlevp1` vertical layers need to be split into 2 `field_operator` calls (see the targeted approach below).
  - Also separate istep 1 and 2.
- ICON timers are added for fused stencils.
- Log parser works in CI (proposal: `tail -c+0 --pid=$PID -f log.txt`).
- The number of cells, edges, and vertices used to allocate temporaries can be passed at runtime.
  - Targeted approach: extend the temporary pass to optionally use (configurable) fencil parameters for the sizes instead of resorting to offset providers given at compile time.
- Fused stencils are tested with temporaries. Working ones are enabled, non-working ones are disabled, and feedback is given to the GT4Py team (e.g. Till).
- Fused stencils can be enabled on a stencil-by-stencil basis.

Targeted approach for `nlev`, `nlevp1`:

```python
@field_operator
def fused_stencil(...) -> tuple[KField, Kp1Field]:
    ...


@program
def fused_program(...):
    fused_stencil(..., out=(k_field, kp1_field))


# Restricted variant: returns only the nlevp1 output of the fused stencil.
@field_operator
def fused_stencil_restricted(...):
    return fused_stencil(...)[1]


@program
def mo_velocity_advection_stencil_fused_restricted(...):
    # bounds are not important as only used in blue-line
    fused_stencil_restricted(..., out=kp1_field)


# alternative doesn't work out of the box, but should be simple to change in icon4pygen
@program
def fused_program_blueline(...):
    fused_stencil(..., out=(k_field, kp1_field), domain={K: (0, nlev), ...})
    fused_stencil_restricted(..., out=kp1_field, domain={K: (nlev, nlev+1), ...})
```

Fortran approach:

```fortran
!$DSL START STENCIL(name=mo_velocity_advection_stencil_fused_restricted; ...; vertical_lower=nlevp1; vertical_upper=nlevp1;)
! Top and bottom levels
!$ACC PARALLEL IF(i_am_accel_node) DEFAULT(PRESENT) ASYNC(1)
!$ACC LOOP GANG VECTOR
!DIR$ IVDEP
DO je = i_startidx, i_endidx
  p_diag%vn_ie(je,nlevp1,jb) =                              &
    p_metrics%wgtfacq_e(je,1,jb)*p_prog%vn(je,nlev,jb) +    &
    p_metrics%wgtfacq_e(je,2,jb)*p_prog%vn(je,nlev-1,jb) +  &
    p_metrics%wgtfacq_e(je,3,jb)*p_prog%vn(je,nlev-2,jb)
ENDDO
!$ACC END PARALLEL
!$DSL END STENCIL(name=mo_velocity_advection_stencil_fused_restricted)

!$DSL START STENCIL(name=mo_velocity_advection_stencil_fused; ...; vertical_lower=1; vertical_upper=nlev;)
...
!$DSL END STENCIL(name=mo_velocity_advection_stencil_fused)
```

Previous projects:

- [Fused Liskov continuation and fused stencils](https://hackmd.io/iQsw7AbCSBC2q5OYSZpTrg)
- [CI: Performance measurement](https://hackmd.io/doZ9l59LTYejEuUWig4yfw)
- [ICON dycore optimizations](https://hackmd.io/We6afwwhTeaZwFvnqJxJ3g)

## Rabbit holes

<!-- Details about the solution worth calling out to avoid problems -->

## No-gos

<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the appetite or make the problem tractable -->

## TODO

1. [ ] Split stencils with `nlev` and `nlevp1` vertical layers into 2 `field_operator` calls (see the targeted approach in the Solution section).
   - [ ] Separate istep 1
   - [ ] Separate istep 2
2. [ ] Add ICON timers for fused stencils.
3. [x] Ensure the log parser works in CI.
4. [x] Allow the number of cells, edges, and vertices used to allocate temporaries to be passed at runtime.
   - [x] Extend the temporary pass in gt4py to optionally use configurable stencil parameters for sizes.
   - [x] Extend icon4py to pass grid size parameters.
   - [x] Extend icon-exclaim CMake to turn temporaries on and off.
   - [x] Pass Domain args to all programs.
5. [ ] Test fused stencils with temporaries. Enable working ones, disable non-working ones, and provide feedback to the GT4Py team (e.g., Till).
6. [ ] Implement the ability to enable fused stencils on a stencil-by-stencil basis.
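
The `nlev`/`nlevp1` split from the Solution section amounts to two calls that write disjoint vertical ranges of one field with `nlev + 1` levels. The following is a minimal plain-Python sketch of that idea, not GT4Py code. The list-of-lists layout and the names `write_interior_levels`, `write_last_level`, `interior_value` and `run_split` are assumptions made for illustration, and `interior_value` is only a placeholder for the elided stencil body. Only the extrapolation of the last level follows the Fortran snippet.

```python
def interior_value(vn, je, k):
    """Hypothetical placeholder for the elided fused stencil body."""
    return vn[je][k]


def write_interior_levels(vn, vn_ie, nlev):
    """Write levels [0, nlev) only, like domain={K: (0, nlev), ...}."""
    for je in range(len(vn)):
        for k in range(nlev):
            vn_ie[je][k] = interior_value(vn, je, k)


def write_last_level(vn, wgtfacq_e, vn_ie, nlev):
    """Write level nlev only (Fortran nlevp1), like domain={K: (nlev, nlev + 1), ...}."""
    for je in range(len(vn)):
        w = wgtfacq_e[je]
        # Fortran levels nlev, nlev-1, nlev-2 are 0-based indices nlev-1, nlev-2, nlev-3.
        vn_ie[je][nlev] = (
            w[0] * vn[je][nlev - 1]
            + w[1] * vn[je][nlev - 2]
            + w[2] * vn[je][nlev - 3]
        )


def run_split(vn, wgtfacq_e):
    """Fill an (nlev + 1)-level output with two calls on disjoint vertical ranges."""
    nlev = len(vn[0])
    vn_ie = [[None] * (nlev + 1) for _ in vn]
    write_interior_levels(vn, vn_ie, nlev)
    write_last_level(vn, wgtfacq_e, vn_ie, nlev)
    return vn_ie
```

Because the two ranges do not overlap, the calls can be timed and validated separately, which is what the two `!$DSL START STENCIL` regions in the Fortran approach rely on.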
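
The log parser proposal `tail -c+0 --pid=$PID -f log.txt` reads the log from the start, keeps following it, and stops once the process with the given PID has exited. If the parser were to consume the log directly instead of through a pipe, the same behaviour could be sketched as below. This is an illustration under assumptions, not the parser used in CI: `pid_alive` and `follow` are hypothetical names, the liveness check is POSIX-only, and it reports zombie processes as still alive.

```python
import os
import time


def pid_alive(pid):
    """Return True while a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def follow(path, is_alive, poll_interval=0.1):
    """Yield lines of `path` from the start until `is_alive()` turns False."""
    with open(path, "r") as log:
        while True:
            line = log.readline()
            if line:
                yield line
                continue
            if not is_alive():
                # Drain anything written between the last read and process exit.
                yield from log
                return
            time.sleep(poll_interval)
```

A caller would use it as `for line in follow("log.txt", lambda: pid_alive(pid)): ...`. Note that a line may arrive in pieces if the writer has not yet flushed it completely.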