# [Blueline] Towards a spack build of icon-exclaim

- Shaped by: Christoph, Daniel, Will
- Appetite (FTEs, weeks): 3
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->

## Problem

<!-- The raw idea, a use case, or something we’ve seen that motivates us to work on this -->

We need a more seamless build system and would like to build on the ICON uenv put together by Rico, which includes `icon4py`. This would eliminate at least one of the current scripts, making the build system more maintainable.

## Appetite

<!-- Explain how much time we want to spend and how that constrains the solution -->

Full Cycle for 1/2 FTE.

## Solution

<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->

### Preliminary testing of current icon-exclaim.git:icon-dsl

Preliminary tests indicate that `build_gpu2py` builds with uenv 25.2v3 out of the box on both Balfrin and Santis. Simple tests (e.g., mch_icon-ch1/2_small) run and verify. In addition, with some minor configuration changes, `exclaim_ape_R02B04` (a global test case) also runs and verifies. However,

* on Santis, `mch_icon-ch1` does not run on 2 nodes as it should (OOM error); it runs on 3 nodes but crashes with a horizontal CFL violation after about 570 steps. We suspect similar behavior on Balfrin;
* `mch_icon-ch2` runs on 2 nodes (OOM on 1 node), but hits a horizontal CFL violation after about 270 steps. Same behavior on Balfrin.

We suspect a bug in the `vn` calculation, which needs to be addressed in a separate task.

### Proposed solution for next cycle

As a first step, we will attempt to use Rico's new uenv for the ICON dependencies, thus avoiding the need for `build_dependencies.sh`. We will then use the same `setup.sh` script for `build_gpu2py`. We will verify with the tests listed above that we know are working.
In a second step, we will write a first version of a Spack package for icon-exclaim with a DSL variant, i.e., corresponding to the DSL variant it has in `configure.ac` and `configure`. It is not yet clear what exactly we want to expose to Spack.

Finally, in a third step, we will coordinate with Rico to put the full Spack-enabled solution into the uenv.

## Rabbit holes

<!-- Details about the solution worth calling out to avoid problems -->

We may run into problems with the new ICON uenv. Resolving them is not part of this task, but we will keep Rico informed.

## No-gos

<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the appetite or make the problem tractable -->

## Current status 08.08.25

There is a branch with some workarounds, see: https://github.com/C2SM/icon-exclaim/pull/366

Be aware that updating with icon-dsl will break compatibility with icon4py.

### Säntis

Using `uenv start icon-dsl/25.8:1965073287 --view=default`

| build               | builds | runs (mch_icon-ch2_small) | runs (mch_icon-ch2) | runs (mch_icon-ch1) | runs (mch_kenda-ch1) |
| ------------------- | ------ | ------------------------- | ------------------- | ------------------- | -------------------- |
| build_cpu           | yes    | yes | - | - | - |
| build_acc           | yes    | yes | - | - | - |
| build_gpu2py_verify | yes    | yes | - | - | - |
| build_gpu2py        | yes    | yes | almost: `horizontal CFL number exceeded`, and ran into the time limit after 307 time steps | no, OOM | no, OOM |
| build_cpu2py_verify | yes    | no, needs manually setting nblocks_e=1, error: `2025-08-08 14:16:43.277 - ERROR - A Python error occurred: ERROR: failed to allocate GPU memory` | - | - | - |
| build_cpu2py        | yes    | no, needs manually setting nblocks_e=1, error: `2025-08-08 14:16:43.277 - ERROR - A Python error occurred: ERROR: failed to allocate GPU memory` | - | - | - |

### Balfrin

| build               | builds | runs (mch_icon-ch2_small) | runs (mch_icon-ch2) | runs (mch_icon-ch1) | runs (mch_kenda-ch1) |
| ------------------- | ------ | ------------------------- | ------------------- | ------------------- | -------------------- |
| build_cpu           | yes    | yes | - | - | - |
| build_acc           | yes    | yes | - | - | - |
| build_gpu2py_verify | yes    | no, verification is slightly off (all above 95%), and then it crashes with `GTL_DEBUG: [1] cudaEventQuery: an illegal memory access was encountered` | - | - | - |
| build_gpu2py        | yes    | no, error: `FINISH PE: 0 mo_nh_vert_interp_ipz:z_at_plevels: Error in computing interpolation coefficients` | error in diffusion: `2025-08-08 14:00:48.215 - ERROR - A Python error occurred: cudaErrorIllegalAddress: an illegal memory access was encountered` | ? | ? |
| build_cpu2py_verify | yes    | no, needs manually setting nblocks_e=1, error: `2025-08-08 14:16:43.277 - ERROR - A Python error occurred: ERROR: failed to allocate GPU memory` | - | - | - |
| build_cpu2py        | yes    | no, needs manually setting nblocks_e=1, error: `2025-08-08 14:16:43.277 - ERROR - A Python error occurred: ERROR: failed to allocate GPU memory` | - | - | - |