# Blue-line packaging + testing

###### tags: `cycle 14`

Developers: Abishek, Christoph, Jonas

Appetite: 3 weeks

## Goals

- Spack can build one branch of icon-exclaim in all three modes - CPU, OpenACC and GT4Py - on Daint and Balfrin.
- Enable spack-buildbot tests for all three modes
- Finalize verification of the exclaim aquaplanet experiments

<!-- The raw idea, a use case, or something we’ve seen that motivates us to work on this -->

## Dependencies

## Known tasks

- Spack build for icon-exclaim
    - [x] Basic build working on Daint (Abishek)
    - [x] Basic build working on Balfrin (Christoph)
    - [x] Remove hacks/overrides (all)
    - [ ] Add spack build variants for dsl (Abishek, Christoph)
        - [x] DEP: `icon-liskov` merged into `icon-dsl`
        - [x] DEP: `master` (rename?) synced up with recent icon-nwp
        - [ ] DEP: `icon-dsl` merged into `master`
        - [x] Decide variants needed and setup
        - [ ] Initial testing of all DSL variants
    - [x] Resolve issue with blas/lapack dependencies (all)
        - Possibilities
            - openblas with gt4py, nvidia-blas/lapack with icon
            - **remove scipy from gt4py (Enrique)**
            - possibly link scipy/numpy as external in spack
- Adding a development environment (Jonas, Christoph)
    - Need some research. Consult with Enrique
    - Jonas has a PR (on spack?) with a few possibilities on doing this, but it involves copying the files from dev folder to spack install prefix. But perhaps it's possible to make pip do this instead?
- Testing with spack
    - [ ] Add test to spack-c2sm testing (Jonas)
    - [x] Test spack icon gt4py build when developing gt4py/icon4py (Jonas)
    - [x] enable testing of all buildbot tests via spack (Jonas)
        - Some will pass for dsl, deactivate rest temporarily

        ```
        Failing tests:
        -----------------
        dwd_run_ICON_09_R2B4N5_EPS
            out of memory allocating 36400000 bytes of device memory
            run with srun -n 16 --ntasks-per-node 2
        mch_bench_r19b07_dev_sppt
            FATAL ERROR in mo_nh_vert_interp_ipz:z_at_plevels: Error in computing interpolation coefficients
        mch_ch_lowres
            FAILED_TESTS: tolerance
        mch_opr_r04b07
            *** Error in scratch/snx3000/juckerj/EXCLAIM/dsl/icon-exclaim/bin/icon': corrupted size vs. prev_size: 0x0000000069cf9ad0 ***
        mch_opr_r04b07_lhn_00
            FAILED_TESTS: mpi tolerance
        mch_opr_r04b07_lhn_12
            FAILED_TESTS: mpi tolerance
        mch_opr_r04b07_sstice_inst
            horizontal CFL number exceeded (in divide_flux_area_list)
            at: je = 1437 jk = 79 jb = 1
            lon(deg)= 10.00 lat(deg)= 45.71
            vn(m/s)=********** vt(m/s)=**********
        mch_opr_r19b07_lpi
            call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
        mch_opr_r19b07_turb
            FAILED_TESTS: mpi tolerance

        Passing tests:
        -----------------
        atm_ape
            PASSED_TESTS: base restart nproma mpi tolerance
        atm_heldsuarez
            PASSED_TESTS: base restart nproma mpi
        ```

- Model Verification (Abishek)
    - [ ] Setup probtesting for `exclaim_ape_R2B05`
    - [ ] short report summarizing verification status
        - In progress here https://unlimited.ethz.ch/display/EXCLAIM/Aquaplanet+Verification+and+Validation

## More TODO

- [ ] Re-enable metadata generation in Liskov (Sam, Abishek, Jonas)

## Rabbit holes

- Need to balance time spent in research + work on development environment, with how much time we will spend in the future developing the blue-line.

## No-gos

- We will not try to get all buildbot tests passing with GT4Py dycore
- We don't plan to go into what other formalities may be required to release to the wider user community.
  This will involve top-level discussion regarding support for older versions of gt4py, etc.
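
## Appendix: splitting the buildbot summary

For the task "Some will pass for dsl, deactivate rest temporarily", a minimal sketch of how the list of experiments to deactivate could be pulled out of a pass/fail summary like the one under Known tasks. `split_summary` is a hypothetical helper, not part of spack-c2sm or buildbot, and it assumes a layout where experiment names start at column 0 and their detail lines are indented. The sample reuses a few entries from the summary.

```python
def split_summary(text):
    """Group experiment names by the section header they appear under.

    Assumed layout (not a documented buildbot format): a 'Failing tests:' or
    'Passing tests:' header, a dashed rule, then experiment names at column 0
    with indented detail lines below each name.
    """
    results = {"failing": [], "passing": []}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if lower.startswith("failing tests"):
            section = "failing"
        elif lower.startswith("passing tests"):
            section = "passing"
        elif (
            section is not None
            and line                          # skip blank lines
            and not line[0].isspace()         # skip indented detail lines
            and not stripped.startswith("-")  # skip the dashed rule
        ):
            results[section].append(stripped)
    return results


sample = """Failing tests:
-----------------
mch_ch_lowres
    FAILED_TESTS: tolerance
mch_opr_r19b07_turb
    FAILED_TESTS: mpi tolerance

Passing tests:
-----------------
atm_ape
    PASSED_TESTS: base restart nproma mpi tolerance
atm_heldsuarez
    PASSED_TESTS: base restart nproma mpi
"""

summary = split_summary(sample)
print("deactivate for dsl:", ", ".join(summary["failing"]))
print("keep enabled:", ", ".join(summary["passing"]))
```

The failing list would still need a manual pass, since some entries (for example the out-of-memory case with its `srun` note) may only need a different launch configuration rather than deactivation.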