# Optimization of and Performance Experiments on the gt4py Dycore

###### tags: `cycle 14`

Developers: -

Support: -

Appetite: 1 week

## Background

The complete ICON dycore is now ported to gt4py. Out of the box, the gt4py dycore is slightly slower than the OpenACC dycore, cf. [report](https://docs.google.com/presentation/d/186OmCQ7Qhno9HW8jPdc0MBB2uNVAbRfdsrHXmZAohaw/edit#slide=id.p). It is important to understand the cause of this slowdown.

## Goals

* [This report](https://docs.google.com/presentation/d/186OmCQ7Qhno9HW8jPdc0MBB2uNVAbRfdsrHXmZAohaw/edit?usp=sharing) shows that the fused diffusion does not outperform the stencil-by-stencil diffusion (as was the case on tsa)
* Try the temporary heuristics to see if the situation improves
* If not, try to garner some understanding of what is going on

## Non-Goals

* Detailed nsight-compute investigations for each stencil are too time-intensive for this task, especially for the first goal.
* If nsight-compute is employed, time-box the efforts