# [Greenline] Benchmarks on granules
<!-- Add the tag for the current cycle number in the top bar -->
- Shaped by:
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->
## Problem
<!-- The raw idea, a use case, or something we’ve seen that motivates us to work on this -->
We run benchmark tests with [pytest-benchmark](https://pytest-benchmark.readthedocs.io/en/latest/) on simple and combined stencils. The results are uploaded to [bencher](https://bencher.dev/perf/icon4py) for continuous performance tracking.
We would like to add data there for entire granule runs, such as the icon4py dycore and diffusion.
For these tests to be meaningful, they should run on grid sizes large enough to saturate a GPU.
## Appetite
<!-- Explain how much time we want to spend and how that constrains the solution -->
## Solution
<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->
1. Port the remaining static fields:
    - `rbf_vec_coeff_v`: *usage*: diffusion and dycore
    - `rbf_vec_coeff_e`: *usage*: dycore
    - `rbf_vec_coeff_c`: *usage*: initial conditions
2. Set up benchmark tests for diffusion and dycore:
    1. load a large enough grid
    2. compute the static fields for that grid: geometry, interpolation, metrics
    3. initialize the dynamic data (random data or Jablonowski-Williamson idealized initial conditions)
    4. run tests and upload pytest-benchmark result to bencher
### Details
The original computation in ICON can be found in `shr_horizontal/intp_rbf_coeffs.f90`:
- subroutine `rbf_vec_compute_coeff_cell`
- subroutine `rbf_vec_compute_coeff_vertex`
- subroutine `rbf_vec_compute_coeff_edge`
There is a draft [PR-709](https://github.com/C2SM/icon4py/pull/709) where the port was originally started.
The algorithm is something like:
1. Build RBF interpolation matrices:
    - This is an array of size `(dim, (rbf_vec_dim x rbf_vec_dim))`, where `rbf_vec_dim` is the rank of the RBF matrix for the dimension. For cells, for example, it is 9: the neighbor relation uses all edges of the triangles in `C2E2CO`, so something like `C2E(C2E2CO)` (`ptr_int%rbf_vec_idx_e`).
    - build the matrix:
```
a_ij = dot(pi, pj) * kernel(dist(i, j))
```
where `dist(i, j)` is the distance between the edge centers, and `pi`, `pj` are the normals to the edges. `kernel` is the RBF kernel to use, currently either Gaussian or inverse multiquadric.
2. Apply Cholesky decomposition to all those matrices
3. Solve for coefficients
4. Build the right-hand side (rhs) of the interpolation
5. Compute the vector coefficients
6. Solve the linear system
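The steps above can be sketched in NumPy for a batch of stencils. This is an illustration under stated assumptions, not a port of the ICON routines: all function names are hypothetical, the kernel formulas and the `scale` parameter are the usual textbook definitions, the distance is taken as Euclidean between points (the text above does not fix the metric), and the right-hand side is random placeholder data with two components per stencil.

```python
import numpy as np


def gaussian(r, scale):
    # Assumed kernel form: exp(-(r / scale)^2)
    return np.exp(-((r / scale) ** 2))


def inverse_multiquadric(r, scale):
    # Assumed kernel form: 1 / sqrt(1 + (r / scale)^2)
    return 1.0 / np.sqrt(1.0 + (r / scale) ** 2)


def rbf_matrix(centers, normals, kernel, scale):
    """a_ij = dot(p_i, p_j) * kernel(dist(i, j)) for every stencil.

    centers, normals: shape (n, rbf_vec_dim, 3)
    returns: shape (n, rbf_vec_dim, rbf_vec_dim)
    """
    # Pairwise Euclidean distances between edge centers (assumption).
    diff = centers[:, :, None, :] - centers[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Pairwise dot products of the edge normals.
    dots = np.einsum("nik,njk->nij", normals, normals)
    return dots * kernel(dist, scale)


def solve_cholesky(matrix, rhs):
    """Solve matrix @ x = rhs via matrix = L @ L^T, batched over axis 0."""
    lower = np.linalg.cholesky(matrix)
    y = np.linalg.solve(lower, rhs)  # forward step: L y = rhs
    return np.linalg.solve(np.swapaxes(lower, -1, -2), y)  # backward step: L^T x = y


# Placeholder input: 4 stencils of 9 edges each, with random geometry.
rng = np.random.default_rng(0)
n, rbf_vec_dim = 4, 9
centers = rng.normal(size=(n, rbf_vec_dim, 3))
normals = rng.normal(size=(n, rbf_vec_dim, 3))
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
rhs = rng.normal(size=(n, rbf_vec_dim, 2))  # placeholder right-hand side

matrix = rbf_matrix(centers, normals, gaussian, scale=1.0)
coeffs = solve_cholesky(matrix, rhs)
print(np.allclose(matrix @ coeffs, rhs))  # residual check
```

Both `np.linalg.cholesky` and `np.linalg.solve` accept stacked matrices, so the same code runs over all cells, edges, or vertices at once. A dedicated triangular solver would use the factorization more efficiently; `np.linalg.solve` is used here only to keep the sketch NumPy-only.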
[Notes](https://drive.google.com/file/d/1_fR4SM-_vduKMKAwOygPCtMhWn6dPhIK/view?usp=sharing)
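To illustrate the neighbor relation used in step 1 for cells, the sketch below composes two connectivity tables on a hypothetical four-cell mesh. The table contents and the helper name `rbf_stencil_edges` are made up for illustration; only the idea of applying `C2E` to the cells in `C2E2CO` comes from the description above.

```python
import numpy as np

# Hypothetical toy mesh: cell 0 is surrounded by cells 1, 2 and 3.
# c2e[c] lists the three edges of cell c; edges 0, 1, 2 belong to cell 0.
c2e = np.array([[0, 1, 2], [0, 3, 4], [1, 5, 6], [2, 7, 8]])
# c2e2co[c] lists cell c itself followed by its neighbors (only cell 0 here).
c2e2co = np.array([[0, 1, 2, 3]])


def rbf_stencil_edges(cell):
    """Edges of a cell and of its neighbors, i.e. C2E(C2E2CO), without duplicates."""
    candidates = c2e[c2e2co[cell]].ravel()  # 4 cells x 3 edges = 12 entries
    return np.unique(candidates)  # shared edges are kept only once


stencil = rbf_stencil_edges(0)
print(stencil.size)  # 9 distinct edges for this toy mesh
```

Note that `np.unique` also sorts the edge indices; a real port may need to preserve a specific stencil ordering instead.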
## Extra tasks
- Create a CI plan in GT4Py that triggers the ICON4Py benchmarking with the current GT4Py branch
## Rabbit holes
<!-- Details about the solution worth calling out to avoid problems -->
## No-gos
<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->
## Progress
<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them. -->
- [ ] RBF interpolation ([PR#709](https://github.com/C2SM/icon4py/pull/709))
    - [x] Global grid tests
        - [x] Cell
        - [x] Edge
        - [x] Vertex
    - [ ] Regional grid tests
        - [ ] Cell
        - [x] Edge
        - [ ] Vertex
    - [x] cupy support
- [ ] Single-node benchmark
## Scratch
- RBF (+ standalone single-node benchmark)
- Multi-node standalone
- Custom grid-generator/re-ordered grid
    - allow more optimizations that require certain properties
    - overlap with Michael's thesis
- Multi-node-aware GT4Py