# [DaCe] Optimization X
<!-- Optimizer-dza-dza! -->
- Shaped by: Philip
- Appetite (FTEs, weeks):
- Developers:
## Problem
We continue the _Long Optimization March_ that was started a long time ago.
This cycle is a little bit different since now the whole dycore is there.
In the last cycle it was noticed that DaCe is slower than GTFN, which is in turn slower than OpenACC.
The reason why DaCe is slower than GTFN is currently unknown.
Thus, the first step is to figure out why this is the case[^OtherProject].
A possible reason for the bad performance could be the anomaly seen in [`dy_15_to_28_predictor`](https://docs.google.com/presentation/d/1yYmSMU5JPHMUyor98ITKZsFTD0d0L7ULH1mZKuZn45s/edit?slide=id.g372d046ca7c_1_12#slide=id.g372d046ca7c_1_12); however, it is not clear whether it is the only issue.
Otherwise, the optimization of the `dy_{39, 41}_to_60` stencils should be continued, since their speedups are very low and they account for a significant proportion of the runtime.

Regarding this stencil, there are two aspects that should be further investigated.
First, there is an AccessNode that is not split.
This is because there is a special rule (if the producer is a Map then the whole output must be consumed) in the splitter that prevents that.
The rule was created to simplify the implementation, or rather to ensure that some cases do not show up.
It should be investigated if this rule can be removed.
Most likely the rule has to be removed, and whatever prevented its removal in the first place should be fixed.
The second point, which has been [shaped separately](https://hackmd.io/ot702TuJT7OJXp07WveiPA), is a special form of inlining.
This case appears in the dycore, and ICON-FORTRAN contains an `#ifdef` to switch to the inlined implementation on GPU.
Furthermore, SPCL mentioned that they have experienced that the transformation is important.
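
The splitter rule described above can be illustrated with a toy model. This is a minimal sketch and not the actual DaCe or GT4Py implementation: `Range`, `may_split`, and the `producer_is_map` flag are hypothetical names, data is modelled as a one-dimensional half-open index range, and the reading of "the whole output must be consumed" as "the consumer covers everything the producer writes" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Half-open 1D index range [start, stop); a stand-in for a real subset."""

    start: int
    stop: int

    def covers(self, other: "Range") -> bool:
        # True if `other` lies entirely inside this range.
        return self.start <= other.start and other.stop <= self.stop


def may_split(producer_is_map: bool, written: Range, consumed: Range) -> bool:
    """Toy version of the rule: the output of a Map producer must be consumed whole."""
    if not producer_is_map:
        # The special rule only restricts Map producers.
        return True
    return consumed.covers(written)
```

In this model a Map that writes `[0, 10)` while the consumer reads only `[0, 5)` is rejected; removing the rule would correspond to dropping the `producer_is_map` branch and handling the partially consumed case explicitly.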
#### Left Overs
A list of standing issues in DaCe and/or GT4Py can be found [here](https://hackmd.io/tZ3BKzwNTlWwv81fW2H2ww).
In previous cycles this was part of the respective shaping document, but it has now been decided that it should become its own document.
## Appetite
As long as it takes.
## Solution
[Shut up and optimize!](https://en.wikipedia.org/wiki/N._David_Mermin)
To figure out what makes the DaCe-powered dycore slower, one should probably generate a timeline using `nsys` or similar.
Afterwards the stencils should be further optimized.
Either one should start with optimizing `dy_15_to_28_predictor` or continue where [Ioannis left off](https://hackmd.io/KBSF3NsVS66ENDHVG_83bQ).
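
Once per-stencil timings have been extracted from such a timeline, ranking the stencils by how much time DaCe loses relative to GTFN gives one possible order of attack. The sketch below assumes the timings are already available as plain dictionaries keyed by stencil name; `rank_slowdowns` and its arguments are hypothetical and not part of `nsys`, DaCe, or GT4Py.

```python
def rank_slowdowns(
    dace_ms: dict[str, float], gtfn_ms: dict[str, float]
) -> list[tuple[str, float]]:
    """Return (stencil, extra time in ms) for stencils timed in both runs, worst first."""
    # Only stencils that were measured with both backends can be compared.
    common = dace_ms.keys() & gtfn_ms.keys()
    extra = [(name, dace_ms[name] - gtfn_ms[name]) for name in common]
    # Sort by absolute time lost, since that bounds the achievable gain.
    return sorted(extra, key=lambda item: item[1], reverse=True)
```

Sorting by absolute rather than relative difference is a deliberate choice here: a stencil that is twice as slow but takes little time overall matters less than a slightly slower stencil that dominates the runtime.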
## No-Go
## Rabbit holes
Uncountable.
## Progress
<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them. -->
- [x] Task 1 ([PR#xxxx](https://github.com/GridTools/gt4py/pulls))
- [x] Subtask A
- [x] Subtask X
- [ ] Task 2
- [x] Subtask H
- [ ] Subtask J
- [ ] Discovered Task 3
- [ ] Subtask L
- [ ] Subtask S
- [ ] Task 4
<!--=====================================-->
[^OtherProject]: It seems that [another project](https://hackmd.io/bBD94d-wRGqYLo_cJ77EaA) has been shaped about that.