# Summary

###### tags: `cycle 16`

<!-- Change to the current cycle number -->

Currently, we are only running the DaCe backend of gt4py.next on a subset of internal FieldView frontend tests. We do not understand well enough how complete the class of supported stencils is. The goal of this project is to run the diffusion of icon4py to assess what is needed to run it and, if full support is possible, establish a performance baseline.

- Appetite (FTEs, weeks): 3 weeks
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->

# Status at week 31

Summary of results on this task:

* 111 testcases in ``model/atmosphere/dycore``: 67 pass, 44 xfail
    * 32 xfail: tuple returns are not supported in DaCe
        * SPCL is adding support for nested data in this [PR](https://github.com/spcl/dace/pull/1324)
    * 12 xfail: neighbour reduction is missing support for a lambda as reduction input
        * we could use a map with a lambda nested-SDFG and a WCR-memlet
* Total measured time with the DaCe backend in each benchmark test is 1-2 seconds
    * profiling with ``pytest-benchmark`` includes all gt4py steps
        * AST parsing, ITIR transformations, DaCe transformations and validation, code generation, and finally program execution
    * use the DaCe CLI tool (``daceprof``) for profiling of SDFG execution
        * one drawback: testcases with in/out parameters will fail for #repetitions > 1
        * e.g.:

        ```
        daceprof --warmup 10 --csv --module pytest -v --disable-warnings --benchmark-disable model/atmosphere/dycore
        ```
* Quick try of [npbench](https://github.com/spcl/npbench) cpu transformations on a Hohgant cpu node
    * compare a) initial simplified sdfg, b) fusion, c) parallel and d) auto-opt
    * daceprof: warm-up 10 runs, profiling 100 runs, 16 failures because of in/out parameters
    * no noticeable improvement in execution time (median) compared to the initial sdfg
    * note: small problem size in icon4py benchmarks (mean/median < 1 ms)
* Quick try of gpu execution
    * small change required, pushed to the [dev](https://github.com/edopao/gt4py/tree/dace-gpu) branch
    * functional verification OK, still no optimization for gpu
    * icon4py benchmarks pass, only 1 failure in gt4py tests (``test_trivial``)
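
The in/out drawback noted for repeated profiling runs can be illustrated with a minimal, library-free sketch. It assumes the failures come from a test comparing its result against a reference computed from the original input, while each repetition overwrites that input in place. The names `scale_in_place`, `profile` and `make_args` are hypothetical and are not part of DaCe, `daceprof` or gt4py; the warm-up and repetition counts only mirror the ones quoted above.

```python
import statistics
import time


def scale_in_place(field):
    """Hypothetical stencil with an in/out parameter: doubles `field` in place."""
    for i in range(len(field)):
        field[i] *= 2.0


def profile(stencil, make_args, warmup=10, repetitions=100, fresh_args=True):
    """Run `stencil` warmup + repetitions times.

    Returns the median wall time of the measured runs (warm-up runs are
    discarded) together with the arguments used in the last run.
    """
    args = make_args()
    samples = []
    for run in range(warmup + repetitions):
        if fresh_args:
            # Reset the in/out buffers so every run starts from the same input.
            args = make_args()
        start = time.perf_counter()
        stencil(*args)
        elapsed = time.perf_counter() - start
        if run >= warmup:
            samples.append(elapsed)
    return statistics.median(samples), args


def make_args():
    # Small, fixed input, standing in for a test fixture.
    return ([1.0, 2.0, 3.0],)


# Reference result for a single application of the stencil.
expected = [2.0, 4.0, 6.0]

# Reusing the same buffer: three runs in total, so the data is doubled three times.
_, reused = profile(scale_in_place, make_args, warmup=1, repetitions=2, fresh_args=False)

# Resetting the buffer before each run keeps the final result verifiable.
median, fresh = profile(scale_in_place, make_args, fresh_args=True)

print(reused[0] == expected)  # False
print(fresh[0] == expected)   # True
```

Resetting the buffers outside the timed region keeps the measurement focused on the stencil itself. Whether an equivalent reset can be hooked into `daceprof` runs is not covered by this sketch.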