# Cartesian GTC: DaCe Optimizations
###### tags: `cycle 2`
shaped by: Linus
## Assigned
Only Part a, assigned to Linus (full cycle) and Rico (second half of the cycle), together with [Backend](https://hackmd.io/NhISzhkGRfq88HqHr7IHdQ).
## Summary
This project is a logical next step in the development of the DaCe integration into GT4Py. The goal is to start developing tools that use the previous work on OIR-level DaCe to optimize GT4Py stencils.
The project is largely independent of, but related to, the "Cartesian GTC: DaCe backend" project. If it is not possible to allocate two developers to the projects, a single developer can implement Part a of each project in the current cycle.
## Appetite
full cycle
possible developers:
- Linus, Felix, Anyone?
### Part a: API for Graph-based OIR Transformations
Currently, the implemented OIR optimizations only try to merge nodes that are adjacent in their OIR representation. However, more candidates can exist, and a graph representation would make them easy to find.
#### Task: develop graph-based OIR-optimization API
For the existing OIR-visitor-based transformations, create an API to
1. produce candidates for merging in the SDFG
2. run the existing transformation's applicability check on each candidate
3. apply the transformation if it is feasible
A possible API for this could look similar to this:
```python
class FuseStencils(PathTransformation):
    # Node types to match, in order, along a path in the graph.
    pattern = [oir.HorizontalComputation, oir.HorizontalComputation]

    def can_be_applied(self, a: oir.HorizontalComputation, b: oir.HorizontalComputation) -> bool:
        if a.extents != b.extents:
            return False
        # ...
        return True

    def apply(self, sdfg: SDFG, a: oir.HorizontalComputation, b: oir.HorizontalComputation):
        # Further constructor arguments are elided in this sketch.
        return oir.HorizontalComputation(body=a.body + b.body, extents=a.extents)  # ...
```
This is similar to the DaCe transformation interface but tailored to the case of two merge candidates.
In this task, the goal is not to decide which transformation would be a good one, but only to list the valid options and apply them to the graph representation.
As a first step, we can still apply the transformation to all candidates eagerly.
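The three steps of the task (produce candidates, check applicability, apply eagerly) can be sketched on a toy dependency graph. This is only an illustration under stated assumptions: `Node`, `Graph`, `candidates`, `can_be_applied`, `apply` and `apply_eagerly` are hypothetical names, not part of GT4Py or DaCe, and the extra check against indirect paths is one possible way to avoid creating a cycle when two nodes are fused.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an OIR horizontal computation."""
    name: str
    extents: Tuple[int, int]
    body: Tuple[str, ...]


class Graph:
    """Toy dependency graph: an edge (a, b) means b consumes output of a."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: Set[Tuple[str, str]] = set()

    def add(self, node: Node, *deps: str) -> None:
        self.nodes[node.name] = node
        self.edges.update((dep, node.name) for dep in deps)

    def reachable(self, start: str, skip_edge: Tuple[str, str]) -> Set[str]:
        """Names reachable from `start` without using `skip_edge`."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            cur = stack.pop()
            for src, dst in self.edges:
                if src == cur and (src, dst) != skip_edge and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen


def candidates(graph: Graph) -> Iterator[Tuple[Node, Node]]:
    """Step 1: every producer/consumer pair is a merge candidate."""
    for src, dst in sorted(graph.edges):
        yield graph.nodes[src], graph.nodes[dst]


def can_be_applied(graph: Graph, a: Node, b: Node) -> bool:
    """Step 2: same extents, and no indirect path a -> ... -> b."""
    if a.extents != b.extents:
        return False
    return b.name not in graph.reachable(a.name, skip_edge=(a.name, b.name))


def apply(graph: Graph, a: Node, b: Node) -> Node:
    """Step 3: replace a and b by one fused node and rewire the edges."""
    merged = Node(f"{a.name}+{b.name}", a.extents, a.body + b.body)
    old = {a.name, b.name}
    rewired = set()
    for src, dst in graph.edges:
        src = merged.name if src in old else src
        dst = merged.name if dst in old else dst
        if src != dst:  # drop the edge that is now internal to the fused node
            rewired.add((src, dst))
    del graph.nodes[a.name], graph.nodes[b.name]
    graph.nodes[merged.name] = merged
    graph.edges = rewired
    return merged


def apply_eagerly(graph: Graph) -> int:
    """Apply the transformation to every valid candidate; return the count."""
    applied = 0
    while True:
        match = next(
            ((a, b) for a, b in candidates(graph) if can_be_applied(graph, a, b)),
            None,
        )
        if match is None:
            return applied
        apply(graph, *match)
        applied += 1
```

With nodes listed in the order `A`, `X`, `B`, where only `B` depends on `A` and `X` has different extents, no adjacent pair in the list can be fused, but the edge `A -> B` in the graph yields a valid candidate.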
### Part b: global, graph-based transformations
Using this API, better candidates can be generated for existing OIR transformations. To further use the graph representation, new transformations can be implemented (list not exhaustive):
* A transformation that splits the inner SDFG where possible, so that the resulting parts can be placed in separate `VerticalLoop` nodes. In its simplest form, this could target only independent graph components.
Example: In the following loop section, two completely independent sets of fields are being worked on, and it may be better to break them up.

* Reusing temporaries to reduce the memory footprint. If multiple temporaries are allocated but each is only used once, and the use of one always happens after the use of the other, it is possible to skip the second allocation and reuse the memory of the first, so that less memory is used. DaCe already has this kind of transformation for low-level graphs.
* Conversely, splitting transients that cause non-data dependency edges. This can help to enable more merges or allow more computations to run concurrently.
Example: the temporary "res" implies that `HorizontalExecution_101914` and `HorizontalExecution_101915` depend on each other. The code would also be valid if separate `res1` and `res2` were used, which would allow the two `HorizontalExecution`s to be executed concurrently.
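
The `res1`/`res2` idea from the last example can be sketched on a flat list of statements. This is a minimal illustration, not the proposed implementation: `Stmt`, `split_temporaries` and `dependencies` are hypothetical names, and the sketch assumes straight-line code in which every write fully overwrites the temporary. The dependency check is deliberately conservative and flags any read/write overlap between an earlier and a later statement.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Stmt:
    """Hypothetical statement: which fields it reads and which it writes."""
    name: str
    reads: Tuple[str, ...]
    writes: Tuple[str, ...]


def split_temporaries(stmts: List[Stmt], temporaries: Set[str]) -> List[Stmt]:
    """Give every write to a temporary a fresh name (res -> res1, res2, ...).

    Reads are redirected to the most recent version. Assumes each write
    fully overwrites the temporary.
    """
    version: Dict[str, int] = {}  # number of writes seen per temporary
    current: Dict[str, str] = {}  # name of the latest version per temporary
    result = []
    for stmt in stmts:
        # Reads see the version that was live before this statement's writes.
        reads = tuple(current.get(field, field) for field in stmt.reads)
        writes = []
        for field in stmt.writes:
            if field in temporaries:
                version[field] = version.get(field, 0) + 1
                current[field] = f"{field}{version[field]}"
                writes.append(current[field])
            else:
                writes.append(field)
        result.append(Stmt(stmt.name, reads, tuple(writes)))
    return result


def dependencies(stmts: List[Stmt]) -> Set[Tuple[str, str]]:
    """Conservative ordering constraints between statement pairs."""
    deps = set()
    for i, first in enumerate(stmts):
        for second in stmts[i + 1:]:
            raw = set(first.writes) & set(second.reads)   # read after write
            war = set(first.reads) & set(second.writes)   # write after read
            waw = set(first.writes) & set(second.writes)  # write after write
            if raw or war or waw:
                deps.add((first.name, second.name))
    return deps


program = [
    Stmt("s1", reads=("in_a",), writes=("res",)),
    Stmt("s2", reads=("res",), writes=("out_a",)),
    Stmt("s3", reads=("in_b",), writes=("res",)),
    Stmt("s4", reads=("res",), writes=("out_b",)),
]
split = split_temporaries(program, temporaries={"res"})
```

In this toy program, the shared `res` ties `s3` to `s1` and `s2` although no data flows between them. After the split, only the true producer/consumer pairs `(s1, s2)` and `(s3, s4)` remain in `dependencies(split)`, so the two halves could run concurrently.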
