# Performance of GT4Py ICON dycore
###### tags: `functional cycle 10`
Developers: Matthias
Appetite: 1 week
## Report
A brief performance comparison is available in this [Google Doc](https://docs.google.com/document/d/1wtCraqP7apEeqy4YY-MiyYYpKylzMl1LF5wJACs7jY4/edit?usp=sharing). At first there were performance issues on both the dawn and the gt4py side. The dawn issues were explained by improperly set horizontal subdomains, see [PR](https://github.com/C2SM/icon-exclaim/pull/16). The gt4py issues were due to missing neighbor checks, which have been addressed [here](https://github.com/C2SM/icon4py/pull/48) and [here](https://github.com/GridTools/gridtools/pull/1721).
For now, the performance using gt4py matches the dusk/dawn performance to within measurement error. It might make sense to repeat this task once stencils using the `scan` feature and/or vertical index fields have been included.
---
## Problem
The dry dycore stencils that have already been translated to gt4py and integrated have to be timed to make sure their performance is comparable to dusk/dawn.
Larger discrepancies need to be documented and explored in follow-up tasks.
## Background
For the dry dycore, a DSL translation and integration using dusk/dawn already exists. It produces low-level CUDA code with performance properties that are rather easy to understand.
With the dusk/dawn version and the OpenACC code as references, it is possible to assess the performance of the integrated gt4py stencils.
## Appetite
1 week
## Known steps
Compare the nvprof (nsys) output for the gt4py stencils with that for the dusk/dawn translated and integrated stencils.
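
Once per-stencil kernel durations have been extracted from the profiler output, the comparison itself could be sketched as below. This is only a minimal sketch under assumptions: the stencil names, the durations, and the 5% tolerance are made up for illustration, and the step of getting the timings out of the nvprof (nsys) output and attributing them to stencils is not shown.

```python
from statistics import median


def compare_timings(baseline, candidate, rel_tol=0.05):
    """Compare median kernel durations per stencil.

    baseline, candidate: dict mapping a stencil name to a list of
    measured durations (same unit in both dicts).
    rel_tol: assumed tolerance for "equal up to measurement error".
    Returns a dict: name -> (base_median, cand_median, rel_diff, flagged).
    """
    report = {}
    # Only stencils present in both measurements can be compared.
    for name in sorted(baseline.keys() & candidate.keys()):
        base = median(baseline[name])
        cand = median(candidate[name])
        rel_diff = (cand - base) / base
        report[name] = (base, cand, rel_diff, abs(rel_diff) > rel_tol)
    return report


# Hypothetical durations in microseconds, NOT real measurements.
dusk_dawn = {"stencil_a": [100.0, 102.0, 98.0], "stencil_b": [50.0, 51.0, 49.0]}
gt4py = {"stencil_a": [101.0, 103.0, 99.0], "stencil_b": [60.0, 61.0, 59.0]}

report = compare_timings(dusk_dawn, gt4py)
for name, (base, cand, rel_diff, flagged) in report.items():
    status = "DISCREPANCY" if flagged else "ok"
    print(f"{name}: dusk/dawn={base:.1f} gt4py={cand:.1f} diff={rel_diff:+.1%} {status}")
```

The median is used here rather than the mean so that single outlier launches (for example the first call) weigh less; that is a design choice of this sketch, not something prescribed by the task.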
## Things to keep in mind
All gt4py stencils appear in the timeline with the name "kernel". It is not yet clear how to work around that.
## Potential Rabbit Holes
This is just an exploration task; fixing possible performance issues is not part of it.