# Performance of GT4Py ICON dycore

###### tags: `functional cycle 10`

Developers: Matthias

Appetite: 1 week

## Report

There is a very short performance comparison in this [google doc](https://docs.google.com/document/d/1wtCraqP7apEeqy4YY-MiyYYpKylzMl1LF5wJACs7jY4/edit?usp=sharing).

There were performance issues at first, on both the dawn and the gt4py side. The dawn issues were caused by improperly set horizontal subdomains, see [PR](https://github.com/C2SM/icon-exclaim/pull/16). The gt4py issues were caused by missing neighbor checks, which have been addressed [here](https://github.com/C2SM/icon4py/pull/48) and [here](https://github.com/GridTools/gridtools/pull/1721).

For now, the gt4py performance is equal to the dusk/dawn performance up to measurement error. It might make sense to repeat this task once stencils using the `scan` feature and/or vertical index fields have been included.

---

## Problem

The dry dycore stencils that have already been translated to gt4py and integrated have to be timed to make sure their performance is comparable to dusk/dawn. Bigger discrepancies need to be documented and explored in follow-up tasks.

## Background

For the dry dycore there already exists a DSL translation and integration using dusk/dawn, which produces low-level CUDA code with rather easy-to-understand performance properties. With the dusk/dawn version and the OpenACC code to compare to, it is possible to assess the performance of the integrated gt4py stencils.

## Appetite

1 week

## Known steps

Compare nvprof (nsys) output for the gt4py and the dusk/dawn translated/integrated stencils.

## Things to keep in mind

All gt4py stencils appear in the timeline with the name "kernel". It is not yet clear how to work around that.

## Potential Rabbit Holes

This is just an exploration task; fixing possible performance issues is not part of it.
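
## Sketch: comparing timings per stencil

The comparison described under "Known steps" could be scripted roughly as follows. This is only a minimal sketch, not the procedure that was actually used: it assumes the per-kernel runtimes have already been extracted from the profiler output into plain Python dictionaries that map a stencil name to a list of runtime samples in milliseconds. Since all gt4py kernels show up as "kernel" in the timeline, that mapping to stencil names is assumed to have been done by hand. The function name `compare_timings`, the `n_sigma` threshold, the stencil names and all numbers are hypothetical.

```python
from statistics import mean, stdev


def compare_timings(gt4py_ms, dawn_ms, n_sigma=2.0):
    """Compare mean runtimes of stencils present in both measurements.

    gt4py_ms, dawn_ms: dict mapping stencil name -> list of runtime
    samples in ms (at least two samples each, needed for stdev).
    A difference is flagged as significant when it exceeds n_sigma times
    the combined sample standard deviation, a crude stand-in for
    "measurement error".
    """
    rows = []
    for name in sorted(gt4py_ms.keys() & dawn_ms.keys()):
        g, d = gt4py_ms[name], dawn_ms[name]
        g_mean, d_mean = mean(g), mean(d)
        # Combine the spread of both sample sets into one noise estimate.
        noise = (stdev(g) ** 2 + stdev(d) ** 2) ** 0.5
        rows.append(
            {
                "stencil": name,
                "gt4py_ms": g_mean,
                "dawn_ms": d_mean,
                "ratio": g_mean / d_mean,
                "significant": abs(g_mean - d_mean) > n_sigma * noise,
            }
        )
    return rows


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    gt4py = {"stencil_a": [1.00, 1.02, 0.98], "stencil_b": [2.0, 2.1, 1.9]}
    dawn = {"stencil_a": [1.01, 0.99, 1.00], "stencil_b": [1.0, 1.05, 0.95]}
    for row in compare_timings(gt4py, dawn):
        flag = "CHECK" if row["significant"] else "ok"
        print(f'{row["stencil"]}: ratio {row["ratio"]:.2f} ({flag})')
```

Stencils flagged this way would be candidates for the follow-up tasks mentioned under "Problem"; everything else counts as equal up to measurement error.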