## Performance monitoring (brainstorming)
Set up an automated system to track the performance history of GridTools-related projects (gridtools-cpp, gt4py, icon4py, pmap?, ...).
### Tasks
1. Create stable & reproducible environments for running automated benchmarks with different HW and system configurations (CI configuration?)
2. Create / select relevant benchmarks to track the performance of each project (project side)
3. Set up a system to store and query the gathered performance metrics (a possible ingestion sketch is shown below)
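As a rough illustration of task 3, the sketch below pushes a pytest-benchmark JSON report into an Elasticsearch index (one of the self-hosted options listed below). The endpoint, index name, and use of the v8-style `elasticsearch` Python client are assumptions for illustration, not decisions.

```python
"""Hedged sketch for task 3: ingest a pytest-benchmark report into Elasticsearch.

Assumptions (not from the notes above): benchmarks are run with
`pytest --benchmark-json=results.json`, an Elasticsearch instance is reachable
at ES_URL, and the index name "gridtools-benchmarks" is made up.
"""
import json
import os

from elasticsearch import Elasticsearch  # pip install elasticsearch (v8 client)

ES_URL = os.environ.get("ES_URL", "http://localhost:9200")  # hypothetical endpoint
INDEX = "gridtools-benchmarks"  # hypothetical index name


def ingest(results_file: str = "results.json") -> None:
    es = Elasticsearch(ES_URL)
    with open(results_file) as f:
        report = json.load(f)
    # One document per benchmark, annotated with machine/commit metadata so that
    # Kibana can later filter and aggregate by testbed and revision.
    for bench in report["benchmarks"]:
        doc = {
            "name": bench["name"],
            "stats": bench["stats"],              # min/max/mean/stddev, ...
            "machine_info": report["machine_info"],
            "commit_info": report["commit_info"],
            "datetime": report["datetime"],
        }
        es.index(index=INDEX, document=doc)


if __name__ == "__main__":
    ingest()
```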
### Performance dashboards
- Options:
- Externally hosted:
- [bencher.dev](https://bencher.dev/) (example: https://bencher.dev/perf/servo)
- [codspeed.io](https://codspeed.io/) (example: https://codspeed.io/astral-sh/uv)
- Self-hosted
- [ELK](https://www.elastic.co/elastic-stack/) stack at CSCS.ch (@mikael)
- [bencher.dev](https://bencher.dev/)
- [python/codespeed](https://github.com/python/codespeed) (castor or similar at CSCS?) (example: https://speed.python.org/)
- Static (non-hosted):
- [airspeed-velocity](https://github.com/airspeed-velocity/asv?tab=readme-ov-file) (example: https://pv.github.io/numpy-bench/, ingestion snippet: https://github.com/flatsurf/conda-snippets/blob/master/asv/__init__.py)
- Questions:
- Automation features (CI alarms, ...)
- Visualization features
- Handling of raw data (import/export, update, ...)
- Access control (public access for external users, restricted access for private projects, ...)
- Maintenance costs
- ....
- Idea for evaluation: try to reproduce the current [gridtools-cpp custom performance dashboard](https://github.com/GridTools/gridtools/tree/master/pyutils/perftest) with some of the tools.
## Ideas for roadmap
- Option 1: Use StencilTests in `icon4py` to gather the initial performance data
- Easy to do with most performance regression frameworks thanks to pytest-benchmark integration or Python APIs (see the sketch after this list)
- Option 2: Use GridTools C++ tests
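A minimal sketch of what Option 1 could look like with pytest-benchmark; the stencil entry point and its arguments are placeholders, not the actual icon4py test API:

```python
# Sketch of Option 1: wrap an existing icon4py stencil test with the
# pytest-benchmark fixture. `run_stencil` is a stand-in for a real
# icon4py StencilTest / gt4py program call.
import pytest


def run_stencil(field_size: int) -> float:
    # Placeholder workload instead of a real stencil computation.
    return sum(i * 0.5 for i in range(field_size))


@pytest.mark.parametrize("field_size", [1_000, 10_000])
def test_stencil_performance(benchmark, field_size):
    # pytest-benchmark repeats the call and records min/max/mean/stddev,
    # which the evaluated tools (Bencher, asv, CodSpeed) can then consume.
    result = benchmark(run_stencil, field_size)
    assert result >= 0.0
```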
## Evaluation
### Bencher
- Automation
- Can generate alerts based on thresholds/comparisons with past performance
- When an alert is generated in CI, the corresponding check fails in the PR
- Visualization
- Plots per benchmark
- Handling of raw data
- Easy to use with pytest
- Extra data can be handled via JSON files (see the sketch at the end of this section)
- Can be self-hosted (Docker container)
- Access control
- Depends on whether it is self-hosted or externally hosted
- Externally hosted projects are public, as far as I understand
- Self-hosted means we need to take care of backups, etc.
- Self-hosting is not possible without a license, according to my understanding [IM]
- The externally hosted service is free for public projects; if we want private projects, it costs $0.01 per metric gathered
- Maintenance
- Depends on what we want to report on; for simple pytest benchmarks it looks OK
- Additional reports can be produced via JSON files
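To illustrate the "extra data via JSON files" point, here is a rough sketch that reshapes a pytest-benchmark report into what I understand to be the Bencher Metric Format; the measure slugs, units, and file names are assumptions and should be checked against the Bencher docs.

```python
# Hedged sketch: convert a pytest-benchmark report into a JSON file following
# (my understanding of) the Bencher Metric Format, so custom measures can be
# attached to each benchmark. File names and measure slugs are assumptions.
import json


def pytest_benchmark_to_bmf(results_file: str = "results.json",
                            bmf_file: str = "bmf.json") -> None:
    with open(results_file) as f:
        report = json.load(f)

    bmf: dict[str, dict] = {}
    for bench in report["benchmarks"]:
        stats = bench["stats"]
        bmf[bench["name"]] = {
            "latency": {  # units depend on how the measure is defined in Bencher
                "value": stats["mean"],
                "lower_value": stats["min"],
                "upper_value": stats["max"],
            },
            # Additional custom measures (e.g. memory from memray) could go here.
        }

    with open(bmf_file, "w") as f:
        json.dump(bmf, f, indent=2)


if __name__ == "__main__":
    pytest_benchmark_to_bmf()
```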
### Airspeed velocity
- Automation
- Keeps a list of regressions. No built-in alerts on PRs; manual work is needed, e.g. using `asv compare` (against already saved data) or `asv continuous` (direct comparison of two branches/commits; this command returns success or failure, so it could easily be used in CI)
- Visualization
- Quite nice for plotting multiple systems together
- Less clear when reporting multiple metrics and tests via pytest-benchmark
- Handling of raw data
- Everything goes through its Python API (a minimal benchmark sketch follows this section)
- Can handle custom data
- We need to find a custom way to store the data; it could be Elasticsearch or github.io
- Access control
- Static website that can be hosted on github.io
- Maintenance
- Running pytest benchmarks requires a hacky workaround: [git branch](https://github.com/iomaganaris/icon4py/tree/asv_tests)
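For reference, a native asv benchmark module (the Python API mentioned above) would look roughly like this; the suite and the stencil workload are placeholders, not icon4py code:

```python
# Sketch of a native asv benchmark module (e.g. benchmarks/stencils.py), as an
# alternative to the hacky pytest bridge above. asv discovers `time_*` and
# `peakmem_*` methods and runs them once per parameter combination.


class StencilSuite:
    params = [1_000, 10_000]       # asv repeats the benchmark per parameter
    param_names = ["field_size"]

    def setup(self, field_size):
        # Prepare input data outside of the timed region.
        self.data = list(range(field_size))

    def time_stencil(self, field_size):
        # Timed by asv; results are stored in the asv results directory.
        sum(x * 0.5 for x in self.data)

    def peakmem_stencil(self, field_size):
        # Peak memory usage tracked by asv.
        [x * 0.5 for x in self.data]
```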
### Codspeed
- Automation
- Generates alerts on PRs
- Visualization
- Nice plots
- Handling of raw data
- Uses pytest
- Unclear whether custom data is supported
- Access control
- Hosted on their website (not sure how public/private is selected)
- Maintenance
- Uses pytest with an extra decorator (see the sketch after this section)
- Seems more restrictive, but the UI/UX appears nicer
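A sketch of the "extra decorator" point, assuming the pytest-codspeed plugin is installed; the marker usage reflects my understanding of the CodSpeed docs, and the stencil call is a placeholder:

```python
# Hedged sketch: with pytest-codspeed installed, marking a test with
# `@pytest.mark.benchmark` should be enough for CodSpeed to measure it when the
# suite is run with `pytest --codspeed` (to my understanding of their docs).
import pytest


def run_stencil(field_size: int) -> float:
    # Placeholder workload instead of a real icon4py stencil.
    return sum(i * 0.5 for i in range(field_size))


@pytest.mark.benchmark
def test_stencil_codspeed():
    assert run_stencil(10_000) >= 0.0
```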
## Progress
- [ ] icon4py
- [x] Run pytest benchmarks with the performance monitoring tools
- [x] Codspeed
- [x] Created [PR](https://github.com/C2SM/icon4py/pull/657) with PoC in icon4py
- Straightforward to use locally by taking advantage of pytest-benchmark
- Our CI setup (mirroring from GitHub to GitLab) is not fully supported yet, although the upcoming release should fix this case (Discord discussion + [open issue](https://github.com/CodSpeedHQ/runner/issues/58))
- [x] Airspeed velocity
- [x] Created [PR](https://github.com/C2SM/icon4py/pull/661) in icon4py
- Automatically handles pytest benchmarks
- Comments the state of the benchmarks on the PR
- Fails CI if there is a regression
- Pushes data to https://github.com/iomaganaris/icon4py_asv and plots to https://iomaganaris.github.io/icon4py_asv/
- Nice plots
- Lots of things had to be implemented manually
- [ ] Bencher
- [ ] [Historical Baseline](https://github.com/C2SM/icon4py/pull/666): Runs nightly & on push to main
- [x] Gathers historical data for the main branch and [creates a baseline](https://bencher.dev/perf/icon4py) against which we compare the feature branches (open PRs)
- [x] Uploads the performance reports to Kibana (Elasticsearch)
- [ ] [Feature Branches](https://github.com/C2SM/icon4py/pull/669): Runs on open PRs and compares against the baseline (fails above/below the thresholds)
- [x] [Comment](https://github.com/C2SM/icon4py/pull/669#issuecomment-2663592827) for the Bencher Dashboard
- [x] [Comment](https://github.com/C2SM/icon4py/pull/669#issuecomment-2663592800) for the performance tables
- [x] Memray integration and other metrics [draft PR](https://github.com/C2SM/icon4py/pull/675)
- [x] Shorten test names to improve readability
- [x] Check feasibility of self-hosted instance
- [x] Minimize test overhead
- For now, pytest-benchmark runtime statistics are enough. We can create more detailed reports once more fine-grained runtime measurements are available from gt4py
- [ ] Find larger grid for GPU execution
- [ ] Change the Bencher Cloud token from Christos' personal token to a Bencher icon4py organization token
- [ ] Change the GitHub token to an icon4py/gridtools organization token so that Bencher can comment on PRs
- [ ] Use `Bencher` with `gt4py` *?*
- [ ] Use `Bencher` with `gridtools` *?*