# Extension of full program optimization to the entire dynamical core

## Goal

To reach our performance goal, we need to enable full program optimization across the entire dynamical core. The preliminary results for the DaCe-powered full program optimization are promising enough that we expect to come close to our goal if we can expose the full dycore to our full program optimization framework.

## Tasks

To expose the full dycore to our full program optimization framework, we need to tackle the following tasks:

### 1. Enable parsing of regions code

#### Prior work

Linus' work on regions in the DaCe backend is [here](https://github.com/gronerl/gt4py/commit/5b58abee6de2886867a6006a34f7e2efa21576cc).

#### Problem

An implementation of regions in the DaCe backend would allow us to represent that code in a way that is not plain numpy and would be a significant step towards parsing the entire dycore. Our current workaround is not sustainable because we are diverging further from the code currently in master.

#### Solution

Push the implementation over the finish line by completing the missing pieces of the extent analysis.

#### Goals

The **minimal goal** until the end of the year is to have a stable, validating version of this code in our develop branch that is tagged and used in AI2's master branch. The **ideal goal** is that this code is in gt4py/master.

### 2. Enable verified parsing of halo codes

#### Prior work

Right now we have an implementation of the DaCe haloexchange module that has working non-interface halo exchanges on CPU. It covers both vector data and single-data-point data.

#### Problem

We face two challenges here. The first is extending this framework to also do the right thing for vector interface halo exchanges (we have an implementation for it, but it does not validate yet). The second is fixing the framework so that the GPU halo exchanges pass as well.

#### Solution

Both main challenges are bug fixes in the implementation.
For the interfaces, the issues are on the DaCe side, where we either change how DaCe works or represent the target differently in the SDFG. For GPU validation, we do not fully understand the issue yet, but it seems to be related to how the code is generated.

#### Goals

The **minimal goal** is to have a validating version of the `DaceHaloExchanger` that can replace our current halo exchange framework, depending on the mode in which we execute our code. The **ideal goal** is to have an interface layer built on top of both halo exchange frameworks, with a common interface that is used in our dycore code and does the switch behind the scenes.

### 3. Enable final features in GT4Py master

#### Prior work

The code for while loops has an implementation in a [PR](https://github.com/GridTools/gt4py/pull/422), along with an equivalent for how to do it in the DaCe backend. The code for indirect addressing is already in gt4py master but does not have a DaCe implementation in master yet. We have a working prototype for this.

#### Problem

The code is not yet in a mature state, as it was meant as vertical prototypes.

#### Solution

We need cycles to clean up the PRs so that they meet the standards we have for code in the master branch.

#### Goals

The **ideal goal** is to push the while-loop code (with all backends) to master and to push the indirect addressing code for the DaCe backend to master.

## No-Goals

This project does not aim for convergence with the fv3core master branch just yet.
We are still in a phase where we allow ourselves rapid development that might break validation along the way. We want to move as quickly as possible to see validating full program code by the end of the cycle.

## Other issues

Since we are not aiming for convergence with the master branch very soon, someone needs to monitor what is happening there so that we do not find ourselves too far away once our demonstrator finally makes its way to master.