# Project Plan Tom
## Goal
The main goal of this project is to improve the state of the radiation code in pace.
This project has two main focus areas:
- Make efficient use of all the new features available in the radiation code
- Improve the performance of the code by applying transformations in the front end, moving away from a plain transpilation of the Fortran code toward a more optimized version
## Performance
### GPU port
The biggest performance win would be code that can run on the GPU.
#### Smaller Problem Size
Andrew was already able to run a subset of the code on the GPU, but it was very slow, presumably because the driver had not been ported.
> The subset of the code that could be run on the GPU at present was about 2.5x slower than on the CPU. Obviously improvements should be possible.
As an easy first win, we should re-establish that performance baseline and understand where the slowdown comes from.
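
To see where the slowdown comes from, per-stage timings are more useful than a single end-to-end number. The following is a minimal sketch, assuming we can wrap the driver's stages (or individual stencil calls) in Python; the `StageTimer` class and the stage names are hypothetical and not part of pace. Because GPU kernels may run asynchronously, the optional `sync` callable (assumed to be something like a device synchronize) should be passed for GPU runs so that each stage's time is attributed to the right stage.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage across repeated calls."""

    def __init__(self, sync=None):
        # Optional callable run before each clock read, e.g. a device
        # synchronize on GPU backends (assumption: supplied by the caller).
        self._sync = sync
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        if self._sync is not None:
            self._sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self._sync is not None:
                self._sync()
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (name, total_seconds, share_of_total), slowest stage first."""
        grand_total = sum(self.totals.values()) or 1.0
        rows = [(name, t, t / grand_total) for name, t in self.totals.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

Usage would look like `with timer.stage("longwave"): ...` around each hypothetical stage, followed by `timer.report()` after a few steps. Comparing the reports of a CPU and a GPU run then shows which stages account for the 2.5x gap.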
#### Reduction of temporaries
As Andrew noted: *"Currently some of the stencils have so many arguments that they exceed a hardware memory limit on the GPU and cannot be run."*
With more flexibility in turning fields into temporary variables, we can reduce the argument count.
This requires working with Johann to make sure we use temporaries correctly, supporting both higher-dimensional temporaries and accesses in the extra dimension whose indices are computed on the fly.
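
A first step is to find fields that are only ever touched by a single stencil. This is a minimal sketch, assuming each stencil's reads and writes can be listed by hand (the `StencilAccess` format and both functions are hypothetical, not a gt4py API): a field that is written by one stencil, used by no other stencil and not needed outside the scheme is a candidate for a temporary, which removes it from that stencil's argument list. It ignores read-before-write inside a stencil and higher-dimensional layouts, which would still need checking.

```python
from dataclasses import dataclass, field


@dataclass
class StencilAccess:
    """Fields read and written by one stencil (hypothetical description format)."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

    @property
    def accessed(self):
        return self.reads | self.writes


def temporary_candidates(stencils, external_fields):
    """Fields used by exactly one stencil, written there, and not needed outside.

    Such a field could be declared as a temporary inside that stencil instead
    of being passed in as an argument.
    """
    users = {}
    for stencil in stencils:
        for name in stencil.accessed:
            users.setdefault(name, set()).add(stencil.name)
    written = set().union(*(s.writes for s in stencils))
    return {
        name
        for name, owners in users.items()
        if len(owners) == 1 and name in written and name not in external_fields
    }


def argument_counts(stencils, temporaries):
    """Field arguments each stencil still needs once temporaries are removed."""
    return {s.name: len(s.accessed - temporaries) for s in stencils}
```

For example, if a hypothetical `compute_taus` stencil writes `tau` (also read by `compute_fluxes`) and a `scratch` field used nowhere else, only `scratch` is reported, and `compute_taus` drops one argument.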
#### Stencil splitting
If this is not sufficient, we should follow the strategy we used in microphysics and split stencils manually.
Stencil splitting can be done in a variety of ways:
- Set a hard limit on the number of distinct fields each stencil may access, and start a new stencil whenever the limit would be exceeded (naive forward process)
- Build a graph of all accesses, with dependencies on fields represented as edges, and find a split that minimizes the outgoing edges, i.e. the fields that have to be passed between stencils
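
The naive forward process can be sketched as a greedy pass over the statements in program order. The sketch assumes each statement is described by the set of field names it accesses (a hypothetical representation); `crossing_fields` then lists the fields shared between the resulting stencils, which corresponds to the outgoing edges that the graph-based approach would try to minimize. The graph partitioning itself is not implemented here.

```python
def split_greedy(statements, max_fields):
    """Split an ordered list of statements into groups (future stencils).

    Each statement is the set of field names it accesses. A new group starts
    whenever adding the next statement would push the number of distinct
    fields in the current group above ``max_fields``. Statement order is
    preserved, so data dependencies stay valid.
    """
    groups, current, current_fields = [], [], set()
    for index, accessed in enumerate(statements):
        if len(accessed) > max_fields:
            raise ValueError(
                f"statement {index} alone accesses {len(accessed)} fields "
                f"(limit {max_fields})"
            )
        if current and len(current_fields | accessed) > max_fields:
            groups.append(current)
            current, current_fields = [], set()
        current.append(index)
        current_fields |= accessed
    if current:
        groups.append(current)
    return groups


def crossing_fields(statements, groups):
    """Fields accessed in more than one group.

    These are the edges the split cuts: each must be a stencil argument in
    every group that touches it, rather than a local temporary.
    """
    seen_in = {}
    for group_id, group in enumerate(groups):
        for index in group:
            for name in statements[index]:
                seen_in.setdefault(name, set()).add(group_id)
    return {name for name, ids in seen_in.items() if len(ids) > 1}
```

The greedy pass is cheap and predictable, but it can cut through a field that is used heavily on both sides of a boundary; comparing `crossing_fields` for different limits or hand-made splits gives a first measure of how much the graph-based approach could gain.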
### Full code performance baseline
To establish a fair baseline and gauge the potential for improvement, we should collect performance numbers for the three relevant variants of the code:
1. CPU baseline in Fortran: for this we just need a run of the same problem. Andrew has already done this, so we know it is possible; it is a matter of re-establishing it.
2. CPU baseline in gt4py: we know this already works, but is slow. We want a reproducible way to obtain these numbers.
3. GPU baseline in gt4py: if the GPU port succeeds, this is the logical next number to look at.
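
For the two gt4py variants, numbers are easier to reproduce with a fixed measurement procedure. The sketch below assumes one model step can be wrapped in a zero-argument callable (`run_step`, hypothetical). It discards warm-up calls so that one-off costs such as stencil compilation are not counted, then reports the median of several repeats; `sync` plays the same role as above for asynchronous GPU work. The Fortran baseline would be timed with its own instrumentation.

```python
import statistics
import time


def benchmark(run_step, warmup=2, repeats=10, sync=None):
    """Time ``run_step`` after a number of warm-up calls.

    Warm-up calls absorb one-off costs (e.g. compilation, caches) and are not
    measured. ``sync`` is an optional callable invoked before each clock read
    so that asynchronous work is included in the timing.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run_step()
    if sync is not None:
        sync()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_step()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
        "repeats": repeats,
    }
```

Reporting the median rather than the mean keeps a single slow outlier, such as a noisy node, from skewing the comparison between variants.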
### Improving performance
#### FrozenStencil
One of the biggest performance factors has been the use of `FrozenStencil` to reduce Python overhead. We should apply it to this code early on.
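
The idea behind `FrozenStencil` is to pay argument checks and setup once, when the call is constructed, instead of on every call from Python. The sketch below illustrates that idea generically: `FrozenCall`, `check_region` and `scale_column` are hypothetical stand-ins and do not reproduce the pace class or gt4py's stencil call signature.

```python
def check_region(origin, domain):
    """Hypothetical stand-in for per-call argument validation."""
    if any(o < 0 for o in origin) or any(d <= 0 for d in domain):
        raise ValueError(f"invalid region origin={origin} domain={domain}")


def scale_column(values, *, origin, domain, factor=2.0):
    """Hypothetical 'stencil': scale the 1D slice selected by origin/domain."""
    start, size = origin[0], domain[0]
    return [v * factor if start <= i < start + size else v
            for i, v in enumerate(values)]


class FrozenCall:
    """Validate the fixed arguments once, then call ``func`` without re-checking."""

    def __init__(self, func, origin, domain, check=None):
        if check is not None:
            check(origin, domain)  # paid once, at construction time
        self._func = func
        self._kwargs = {"origin": origin, "domain": domain}

    def __call__(self, *args):
        # Hot path: no validation; the fixed keywords are reused as-is.
        return self._func(*args, **self._kwargs)
```

Constructing `FrozenCall(scale_column, origin=(1,), domain=(2,), check=check_region)` once and calling it every time step moves the check out of the time loop, which is where Python overhead accumulates.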
#### DaCe
It would be a valuable experiment to see how much we can gain with DaCe in a physics package. The open question is whether higher-dimensional fields are already supported in the GTC:Dace backend; we would need to investigate this before going that route.
#### Other improvements
By now we have a good arsenal of tools at our disposal to make gt4py code perform better:
- Reducing memory pressure by minimizing the number of temporaries needed
- Reordering statements to improve data locality
- Using features more efficiently:
  - Are there external constants that we can bake in at compile time?
  - Are there `if` statements that can be evaluated at compile time?
  - Is the `PARALLEL` keyword used wherever the vertical dependencies allow it?
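
Memory pressure and statement order are linked: how many temporaries must exist at once depends on how far apart each temporary's write and last read are. This is a minimal live-range sketch, assuming statements are described as `(written_field, read_fields)` pairs and that the backend can reuse storage for temporaries whose live ranges do not overlap (an assumption; whether this happens depends on the backend). `peak_live_temporaries` is hypothetical, not a gt4py API.

```python
def peak_live_temporaries(statements, temporaries):
    """Maximum number of temporaries that are live at the same time.

    ``statements`` is an ordered list of ``(written_field, read_fields)``.
    A temporary is live from the statement that first writes it up to its
    last read. Assuming temporaries with disjoint live ranges can share
    storage, the peak is the number of temporary buffers needed.
    """
    first_write, last_read = {}, {}
    for index, (written, reads) in enumerate(statements):
        if written in temporaries:
            first_write.setdefault(written, index)
        for name in reads & temporaries:
            last_read[name] = index
    peak = 0
    for index in range(len(statements)):
        live = sum(
            1
            for name, start in first_write.items()
            # A temporary that is never read is live only where it is written.
            if start <= index <= last_read.get(name, start)
        )
        peak = max(peak, live)
    return peak
```

For example, computing `t1`, then `t2`, then consuming both keeps two temporaries live at once, while consuming `t1` before computing `t2` needs only one buffer, without changing the result.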
## Other goals
To make the radiation scheme more usable for us, its overall state needs to improve. This means tackling a few core problems:
### Code structure
- Right now the testing infrastructure is embedded in the main code workflow. Ideally, we would separate the two.
- The code does not share the interface used by the other packages in `physics_standalone`; we should fix this.
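
One way to separate the two is to give the package a compute entry point that only maps inputs to outputs, and to keep reference loading and comparison in test code. The sketch below shows only the test side; the dict-of-fields interface and both function names are hypothetical and would need to follow whatever interface the other `physics_standalone` packages use.

```python
import math


def compare_outputs(computed, reference, rtol=1e-10, atol=0.0):
    """Return the sorted names of fields that differ beyond the tolerance.

    ``computed`` and ``reference`` map field names to flat sequences of
    floats. A field missing from ``computed`` or with a different length
    also counts as a mismatch.
    """
    failures = []
    for name, expected in reference.items():
        actual = computed.get(name)
        if actual is None or len(actual) != len(expected):
            failures.append(name)
            continue
        if not all(
            math.isclose(a, e, rel_tol=rtol, abs_tol=atol)
            for a, e in zip(actual, expected)
        ):
            failures.append(name)
    return sorted(failures)


def check_against_reference(run, inputs, reference, rtol=1e-10, atol=0.0):
    """Test-side driver: call the compute entry point, then validate.

    ``run`` is the package's compute function (hypothetical signature:
    dict of inputs -> dict of outputs). Loading reference data stays in the
    test code, outside ``run``.
    """
    outputs = run(inputs)
    return compare_outputs(outputs, reference, rtol=rtol, atol=atol)
```

With this split, the same comparison could back the Jenkins validation, while the compute entry point carries no test logic.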
### Validation
The [radiation_standalone Jenkins job](https://jenkins.ginko.ch/job/radiation_standalone/) is currently failing. We need to get it passing again.
## Notes
- For the code, see [here](https://github.com/ai2cm/physics_standalone/tree/radiation_dev/radiation)
- For Andrew's readme, see [here](https://docs.google.com/document/d/1m5UEtJuYb1w2iyQF4S2rgpRXDFkqqUQhS_8Bwu2cSVI/edit?usp=sharing)
- For the Jenkins plan, see [here](https://jenkins.ginko.ch/job/radiation_standalone/)