## Performance monitoring (brainstorming)
Set up an automated system to track the performance history of GridTools-related projects (gridtools-cpp, gt4py, icon4py, pmap?, ...).
### Tasks
1. Create stable & reproducible environments for running automated benchmarks with different HW and system configurations (CI configuration?)
2. Create / select relevant benchmarks to track the performance of each project (project side)
3. Set up a system to store and query the gathered performance metrics (a possible ingestion sketch is shown below)
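As a rough illustration of task 3, the sketch below pushes a pytest-benchmark JSON report into an Elasticsearch index (one of the self-hosted options listed below). The endpoint, index name, and use of the v8-style `elasticsearch` Python client are assumptions for illustration, not decisions.

```python
"""Hedged sketch for task 3: ingest a pytest-benchmark report into Elasticsearch.

Assumptions (not from the notes above): benchmarks are run with
`pytest --benchmark-json=results.json`, an Elasticsearch instance is reachable
at ES_URL, and the index name "gridtools-benchmarks" is made up.
"""
import json
import os

from elasticsearch import Elasticsearch  # pip install elasticsearch (v8 client)

ES_URL = os.environ.get("ES_URL", "http://localhost:9200")  # hypothetical endpoint
INDEX = "gridtools-benchmarks"  # hypothetical index name


def ingest(results_file: str = "results.json") -> None:
    es = Elasticsearch(ES_URL)
    with open(results_file) as f:
        report = json.load(f)
    # One document per benchmark, annotated with machine/commit metadata so that
    # Kibana can later filter and aggregate by testbed and revision.
    for bench in report["benchmarks"]:
        doc = {
            "name": bench["name"],
            "stats": bench["stats"],              # min/max/mean/stddev, ...
            "machine_info": report["machine_info"],
            "commit_info": report["commit_info"],
            "datetime": report["datetime"],
        }
        es.index(index=INDEX, document=doc)


if __name__ == "__main__":
    ingest()
```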
### Performance dashboards
- Options:
- Externally hosted:
- [bencher.dev](https://bencher.dev/) (example: https://bencher.dev/perf/servo)
- [codspeed.io](https://codspeed.io/) (example: https://codspeed.io/astral-sh/uv)
- Self-hosted
- [ELK](https://www.elastic.co/elastic-stack/) stack at CSCS.ch (@mikael)
- [bencher.dev](https://bencher.dev/)
- [python/codespeed](https://github.com/python/codespeed) (castor or similar at CSCS?) (example: https://speed.python.org/)
- Static (non-hosted):
- [airspeed-velocity](https://github.com/airspeed-velocity/asv?tab=readme-ov-file) (example: https://pv.github.io/numpy-bench/, ingestion snippet: https://github.com/flatsurf/conda-snippets/blob/master/asv/__init__.py)
- Questions:
- Automation features (CI alarms, ...)
- Visualization features
- Handling of raw data (import/export, update, ...)
- Access control (public access for external users, restricted access for private projects, ...)
- Maintenance costs
- ....
- Idea for evaluation: try to reproduce the current [gridtools-cpp custom performance dashboard](https://github.com/GridTools/gridtools/tree/master/pyutils/perftest) with some of the tools.
## Ideas for roadmap
- Option 1: Use StencilTests in `icon4py` to gather the initial performance data
- Easy to do with most performance regression frameworks thanks to pytest-benchmark integration or Python APIs (see the sketch after this list)
- Option 2: Use GridTools C++ tests
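A minimal sketch of what Option 1 could look like with pytest-benchmark; the stencil entry point and its arguments are placeholders, not the actual icon4py test API:

```python
# Sketch of Option 1: wrap an existing icon4py stencil test with the
# pytest-benchmark fixture. `run_stencil` is a stand-in for a real
# icon4py StencilTest / gt4py program call.
import pytest


def run_stencil(field_size: int) -> float:
    # Placeholder workload instead of a real stencil computation.
    return sum(i * 0.5 for i in range(field_size))


@pytest.mark.parametrize("field_size", [1_000, 10_000])
def test_stencil_performance(benchmark, field_size):
    # pytest-benchmark repeats the call and records min/max/mean/stddev,
    # which the evaluated tools (Bencher, asv, CodSpeed) can then consume.
    result = benchmark(run_stencil, field_size)
    assert result >= 0.0
```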
## Evaluation
### Bencher
- Automation
- Can generate alerts based on thresholds/comparisons with past performance
- When an alert is generated in CI, the corresponding check fails in the PR
- Visualization
- Plots per benchmark
- Handling of raw data
- Easy to use with pytest
- Extra data can be handled via JSON files (see the sketch at the end of this section)
- Can be self-hosted (Docker container)
- Access control
- Depends on whether it is self-hosted or externally hosted
- Externally hosted projects are public, as far as I understand
- Self-hosted means we need to take care of backups, etc.
- Self-hosting is not possible without a license, according to my understanding [IM]
- The externally hosted service is free for public projects; if we want private projects, it costs $0.01 per metric gathered
- Maintenance
- Depends on what we want to report on; for simple pytest benchmarks it looks OK
- Additional reports can be produced via JSON files
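To illustrate the "extra data via JSON files" point, here is a rough sketch that reshapes a pytest-benchmark report into what I understand to be the Bencher Metric Format; the measure slugs, units, and file names are assumptions and should be checked against the Bencher docs.

```python
# Hedged sketch: convert a pytest-benchmark report into a JSON file following
# (my understanding of) the Bencher Metric Format, so custom measures can be
# attached to each benchmark. File names and measure slugs are assumptions.
import json


def pytest_benchmark_to_bmf(results_file: str = "results.json",
                            bmf_file: str = "bmf.json") -> None:
    with open(results_file) as f:
        report = json.load(f)

    bmf: dict[str, dict] = {}
    for bench in report["benchmarks"]:
        stats = bench["stats"]
        bmf[bench["name"]] = {
            "latency": {  # units depend on how the measure is defined in Bencher
                "value": stats["mean"],
                "lower_value": stats["min"],
                "upper_value": stats["max"],
            },
            # Additional custom measures (e.g. memory from memray) could go here.
        }

    with open(bmf_file, "w") as f:
        json.dump(bmf, f, indent=2)


if __name__ == "__main__":
    pytest_benchmark_to_bmf()
```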
### Airspeed velocity
- Automation
- Keeps a list of regressions. No built-in alerts on PRs; manual work is needed, e.g. using `asv compare` (against already saved data) or `asv continuous` (direct comparison of two branches/commits; this command returns success or failure, so it could easily be used in CI)
- Visualization
- Quite nice for plotting multiple systems together
- Less clear when reporting multiple metrics and tests via pytest-benchmark
- Handling of raw data
- Everything goes through its Python API (a minimal benchmark sketch follows this section)
- Can handle custom data
- We need to find a custom way to store the data; it could be Elasticsearch or github.io
- Access control
- Static website that can be hosted on github.io
- Maintenance
- Running pytest benchmarks requires a hacky workaround: [git branch](https://github.com/iomaganaris/icon4py/tree/asv_tests)
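For reference, a native asv benchmark module (the Python API mentioned above) would look roughly like this; the suite and the stencil workload are placeholders, not icon4py code:

```python
# Sketch of a native asv benchmark module (e.g. benchmarks/stencils.py), as an
# alternative to the hacky pytest bridge above. asv discovers `time_*` and
# `peakmem_*` methods and runs them once per parameter combination.


class StencilSuite:
    params = [1_000, 10_000]       # asv repeats the benchmark per parameter
    param_names = ["field_size"]

    def setup(self, field_size):
        # Prepare input data outside of the timed region.
        self.data = list(range(field_size))

    def time_stencil(self, field_size):
        # Timed by asv; results are stored in the asv results directory.
        sum(x * 0.5 for x in self.data)

    def peakmem_stencil(self, field_size):
        # Peak memory usage tracked by asv.
        [x * 0.5 for x in self.data]
```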
### Codspeed
- Automation
- Generates alerts on PRs
- Visualization
- Nice plots
- Handling of raw data
- Uses pytest
- Unclear whether custom data is supported
- Access control
- Hosted on their website (not sure how public/private is selected)
- Maintenance
- Uses pytest with an extra decorator (see the sketch after this section)
- Seems more restrictive, but the UI/UX appears nicer
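A sketch of the "extra decorator" point, assuming the pytest-codspeed plugin is installed; the marker usage reflects my understanding of the CodSpeed docs, and the stencil call is a placeholder:

```python
# Hedged sketch: with pytest-codspeed installed, marking a test with
# `@pytest.mark.benchmark` should be enough for CodSpeed to measure it when the
# suite is run with `pytest --codspeed` (to my understanding of their docs).
import pytest


def run_stencil(field_size: int) -> float:
    # Placeholder workload instead of a real icon4py stencil.
    return sum(i * 0.5 for i in range(field_size))


@pytest.mark.benchmark
def test_stencil_codspeed():
    assert run_stencil(10_000) >= 0.0
```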
## Progress
- [ ] icon4py
- [x] Run pytest benchmarks with the performance monitoring tools
- [x] Codspeed
- [x] Created [PR](https://github.com/C2SM/icon4py/pull/657) with PoC in icon4py
- Straightforward to use locally by taking advantage of pytest-benchmark
- Our CI setup (mirroring from GitHub to GitLab) is not fully supported yet, although the upcoming release should fix this case (Discord discussion + [open issue](https://github.com/CodSpeedHQ/runner/issues/58))
- [x] Airspeed velocity
- [x] Created [PR](https://github.com/C2SM/icon4py/pull/661) in icon4py
- Automatically handles pytest benchmarks
- Comments the state of the benchmarks on the PR
- Fails CI if there is a regression
- Pushes data to https://github.com/iomaganaris/icon4py_asv and plots to https://iomaganaris.github.io/icon4py_asv/
- Nice plots
- Lots of things had to be implemented manually
- [ ] Bencher
- [ ] [Historical Baseline](https://github.com/C2SM/icon4py/pull/666): Runs nightly & on push to main
- [x] Gathers historical data for the main branch and [creates a baseline](https://bencher.dev/perf/icon4py) against which we compare the feature branches (open PRs)
- [x] Uploads the performance reports to Kibana (Elasticsearch)
- [ ] [Feature Branches](https://github.com/C2SM/icon4py/pull/669): Runs on open PRs and compares against the baseline (fails above/below the thresholds)
- [x] [Comment](https://github.com/C2SM/icon4py/pull/669#issuecomment-2663592827) for the Bencher Dashboard
- [x] [Comment](https://github.com/C2SM/icon4py/pull/669#issuecomment-2663592800) for the performance tables
- [x] Memray integration and other metrics [draft PR](https://github.com/C2SM/icon4py/pull/675)
- [x] Shorten test names to improve readability
- [x] Check feasibility of self-hosted instance
- [x] Minimize test overhead
- For now, pytest-benchmark runtime statistics are enough. We can create more detailed reports once more fine-grained runtime measurements are available from gt4py
- [ ] Find larger grid for GPU execution
- [ ] Change the Bencher Cloud token from Christos' personal token to a Bencher icon4py organization token
- [ ] Change the GitHub token to an icon4py/gridtools organization token so that Bencher can comment on PRs
- [ ] Use `Bencher` with `gt4py` *?*
- [ ] Use `Bencher` with `gridtools` *?*