ICON stencils on the Declarative GT4Py

# ICON stencils on the Declarative GT4Py ###### tags: `functional cycle 7` shaped by: Matthias, Hannes appetite: 1 CSCS (Rico) and 1 MCH (Matthias or Christoph) developer for second 1/2 of cycle ## Motivation As a starting point for the collaboration between CSCS and MCH and as a first test of the new toolchain we - implement selected stencils of the ICON dycore in the field view of the declarative GT4Py, and - integrate the generated code into the existing ICON infrastructure developed by MCH. Additionally, this project will allow knowledge transfer between the 2 teams: - from MCH to CSCS: - general knowledge about the ICON code (building, structure, etc) - details about the bindings of the generated code to Fortran - from CSCS to MCH: - the new DSL - deepen knowledge of the infrastructure (e.g. Eve and other utilities) ## Steps ### ICON build workshop Before the actual start of the project (i.e. in the first half of the cycle), we propose to organize a 1/2 day workshop for the whole team to build and test the ICON code. - building icon-dsl in verification and substitution mode - short overview of the integration and verification procedure ### Implementation #### Simple stencil in Python We select `mo_nh_diffusion_stencil_01` and `mo_nh_diffusion_stencil_05` which were already used in the hackathon 2021, see dusk and hackathon versions at the end of the file. As a very simple first step, we implement a stencil for which the reference can be computed with NumPy's advanced indexing and a simple triangular neighbortable. An example is available here https://github.com/havogt/gt4py/blob/icon_stencils_iterator_view/tests/iterator_tests/test_nh_diff_05.py. #### C++ code generator and bindings As a next step we use the same stencil to generate the C++ code and write Fortran bindings that allow to call the stencil from ICON Fortran. Here the developers should choose how to do validation in the next steps. a) Add temporarily some copy and validation statements in the ICON Fortran and keep the bindings minimal (i.e. just a direct call to the C++ stencil). We used this approach in the hackathon 2021. For inspiration, the bindings generator from the hackathon is here https://github.com/GridTools/gt4py/blob/icon_hackathon_cleanup/src/gtc_unstructured/irs/icon_bindings_codegen.py and a generated code example is at the end of the document. Note that the interface changed significantly, but the pieces are the same (e.g. the array views). b) Write the bindings such that they are compatible with the dawn validation approach (thicker bindings layer). Note: As a preparational step the Fortran integration could be tested in the following steps - with a simple copy stencil - with a simple neighbor reduction #### Conclusions for bindings design As a preparation for the next cycle, the conclusions from the previous step should be translated into a design for automatically generated bindings and (nice-to-have) a prototype could be developed. #### Performance comparison Simple performance comparison of the dawn generated code vs the code generated in this project. E.g. by running ICON through nvprof. Take care that - setup work (e.g. allocation of temporaries) are not measured - the same neighbor tables are used in both codes (in case there is a layout transformation involved) #### Optional: more stencils More stencils can be ported with a focus on testing more features: - field conditionals - `mo_velocity_advection_stencil_18` or `mo_velocity_advection_stencil_21` (more involved) - stencils that are fusable (this would allow us to test the inline transformation) - fusing `mo_nh_diffusion_stencil_16` with `mo_nh_diffusion_stencil_17` leads to a situation two subsequent reductions can be inlined - maybe mixing of staggered and non-staggered fields in the vertical - fusing `wrap_setup_mo_velocity_advection_stencil_05_a` and `wrap_setup_mo_velocity_advection_stencil_05_b` leads to such a situation Note that the `scan` will not be ready and therefore we should not try to implement a vertical solver. ### Appendix #### Selected stencils ##### mo_nh_diffusion_stencil_01 ###### dusk ```python= from dusk.script import * @stencil def icon_laplacian_diamond( diff_multfac_smag: Field[K], tangent_orientation: Field[Edge], inv_primal_edge_length: Field[Edge], inv_vert_vert_length: Field[Edge], u_vert: Field[Vertex, K], v_vert: Field[Vertex, K], primal_normal_x: Field[Edge > Cell > Vertex], primal_normal_y: Field[Edge > Cell > Vertex], dual_normal_x: Field[Edge > Cell > Vertex], dual_normal_y: Field[Edge > Cell > Vertex], vn: Field[Edge, K], kh_smag: Field[Edge, K], kh_smag_ec: Field[Edge, K], nabla2: Field[Edge, K], smag_limit: Field[K], smag_offset: Global ) -> None: vn_vert: Field[Edge > Cell > Vertex, K] dvt_tang: Field[Edge, K] dvt_norm: Field[Edge, K] kh_smag_1: Field[Edge, K] kh_smag_2: Field[Edge, K] with levels_upward: # fill sparse dimension vn vert using the loop concept with sparse[Edge > Cell > Vertex]: vn_vert = u_vert * primal_normal_x + v_vert * primal_normal_y # dvt_tang for smagorinsky dvt_tang = sum_over( Edge > Cell > Vertex, (u_vert * dual_normal_x) + (v_vert * dual_normal_y), weights=[-1.0, 1.0, 0.0, 0.0] ) dvt_tang = dvt_tang * tangent_orientation # dvt_norm for smagorinsky dvt_norm = sum_over( Edge > Cell > Vertex, u_vert * dual_normal_x + v_vert * dual_normal_y, weights=[0.0, 0.0, -1.0, 1.0], ) # compute smagorinsky kh_smag_1 = sum_over( Edge > Cell > Vertex, vn_vert, weights=[-1.0, 1.0, 0.0, 0.0] ) kh_smag_1 = (kh_smag_1 * tangent_orientation * inv_primal_edge_length) + ( dvt_norm * inv_vert_vert_length ) kh_smag_1 = kh_smag_1 * kh_smag_1 kh_smag_2 = sum_over( Edge > Cell > Vertex, vn_vert, weights=[0.0, 0.0, -1.0, 1.0] ) kh_smag_2 = (kh_smag_2 * inv_vert_vert_length) - ( dvt_tang * inv_primal_edge_length ) kh_smag_2 = kh_smag_2 * kh_smag_2 kh_smag = diff_multfac_smag * sqrt(kh_smag_2 + kh_smag_1) # compute nabla2 using the diamond reduction nabla2 = (sum_over( Edge > Cell > Vertex, vn_vert, weights=[1.0, 1.0, 0.0, 0.0], ) - 2.0 * vn) * (inv_primal_edge_length * inv_primal_edge_length) nabla2 = nabla2 + (sum_over( Edge > Cell > Vertex, vn_vert, weights=[0.0, 0.0, 1.0, 1.0], ) - 2.0 * vn) * (inv_vert_vert_length * inv_vert_vert_length) nabla2 = 4.0 * nabla2 kh_smag_ec = kh_smag kh_smag = max(0., kh_smag - smag_offset) kh_smag = min(kh_smag, smag_limit) ``` ###### GT4Py deprecated prototype ```python= def nh_diff_01( e2v: E2V, diff_multfac_smag: Field[K, dtype], tangent_orientation: Field[Edge, dtype], inv_primal_edge_length: Field[Edge, dtype], inv_vert_vert_length: Field[Edge, dtype], u_vert: Field[[Vertex, K], dtype], v_vert: Field[[Vertex, K], dtype], primal_normal_x: SparseField[E2V, dtype], primal_normal_y: SparseField[E2V, dtype], dual_normal_x: SparseField[E2V, dtype], dual_normal_y: SparseField[E2V, dtype], vn: Field[[Edge, K], dtype], kh_smag: Field[[Edge, K], dtype], kh_smag_ec: Field[[Edge, K], dtype], nabla2: Field[[Edge, K], dtype], smag_limit: Field[K, dtype], smag_offset: Field[K, dtype] ): with computation(FORWARD), interval(0, None): with location(Edge) as e: weights_tang = LocalField[E2V, dtype]([-1., 1., 0., 0.]) weights_norm = LocalField[E2V, dtype]([0., 0., -1., 1.]) weights_close = LocalField[E2V, dtype]([1., 1., 0., 0.]) weights_far = LocalField[E2V, dtype]([0., 0., 1., 1.]) # fill sparse dimension vn vert using the loop concept vn_vert = (u_vert[v] * primal_normal_x[e, v] + v_vert[v] * primal_normal_y[e, v] for v in e2v[e]) # dvt_tang for smagorinsky dvt_tang = sum(weights_tang[e, v]*((u_vert[v] * dual_normal_x[e, v]) + (v_vert[v] * dual_normal_y[e, v])) for v in e2v[e]) dvt_tang = dvt_tang * tangent_orientation # dvt_norm for smagorinsky dvt_norm = sum(weights_norm[e, v]*(u_vert[v] * dual_normal_x[e, v] + v_vert[v] * dual_normal_y[e, v]) for v in e2v[e]) # compute smagorinsky kh_smag_1 = sum(weights_tang[e, v]*vn_vert[e, v] for v in e2v[e]) kh_smag_1 = (kh_smag_1 * tangent_orientation * inv_primal_edge_length) + ( dvt_norm * inv_vert_vert_length) kh_smag_1 = kh_smag_1 * kh_smag_1 kh_smag_2 = sum(weights_norm[e, v]*vn_vert[e, v] for v in e2v[e]) kh_smag_2 = (kh_smag_2 * inv_vert_vert_length) - ( dvt_tang * inv_primal_edge_length) kh_smag_2 = kh_smag_2 * kh_smag_2 kh_smag = diff_multfac_smag * sqrt(kh_smag_2 + kh_smag_1) # compute nabla2 using the diamond reduction nabla2 = (sum(weights_close[e, v]*vn_vert[e, v] for v in e2v[e]) - 2.0 * vn) * (inv_primal_edge_length * inv_primal_edge_length) nabla2 = nabla2 + (sum(weights_far[e, v]*vn_vert[e, v] for v in e2v[e]) - 2.0 * vn) * (inv_vert_vert_length * inv_vert_vert_length) nabla2 = 4.0 * nabla2 kh_smag_ec = kh_smag kh_smag = max(0., kh_smag - smag_offset) kh_smag = min(kh_smag, smag_limit) ``` ##### mo_nh_diffusion_stencil_05 ###### dusk ```python= mo_nh_diffusion_stencil_05( z_nabla4_e2: Field[Edge, K], u_vert: Field[Vertex, K], v_vert: Field[Vertex, K], primal_normal_vert_v1: Field[Edge > Cell > Vertex], primal_normal_vert_v2: Field[Edge > Cell > Vertex], z_nabla2_e: Field[Edge, K], inv_vert_vert_length: Field[Edge], inv_primal_edge_length: Field[Edge], ): nabv_tang: Field[Edge, K] nabv_norm: Field[Edge, K] with domain.upward.across[nudging:halo]: nabv_tang = sum_over( Edge > Cell > Vertex, u_vert * primal_normal_vert_v1 + v_vert * primal_normal_vert_v2, weights=[1.0, 1.0, 0.0, 0.0], ) nabv_norm = sum_over( Edge > Cell > Vertex, u_vert * primal_normal_vert_v1 + v_vert * primal_normal_vert_v2, weights=[0.0, 0.0, 1.0, 1.0], ) z_nabla4_e2 = 4.0 * ( (nabv_norm - 2.0 * z_nabla2_e) * inv_vert_vert_length ** 2 + (nabv_tang - 2.0 * z_nabla2_e) * inv_primal_edge_length ** 2 ) ``` ###### GT4Py deprecated prototype ```python= def nh_diff_05( e2v: E2V, z_nabla4_e2: Field[[Edge, K], dtype], u_vert: Field[[Vertex, K], dtype], v_vert: Field[[Vertex, K], dtype], primal_normal_vert_v1: SparseField[E2V, dtype], primal_normal_vert_v2: SparseField[E2V, dtype], z_nabla2_e: Field[[Edge, K], dtype], inv_vert_vert_length: Field[Edge, dtype], inv_primal_edge_length: Field[Edge, dtype], ): with computation(FORWARD), interval(0, None): with location(Edge) as e: weights_0 = LocalField[E2V, dtype]([1., 1., 0., 0.]) weights_1 = LocalField[E2V, dtype]([0., 0., 1., 1.]) nabv_tang = sum(weights_0[e, v]*(u_vert[v]*primal_normal_vert_v1[e, v] + v_vert[v]*primal_normal_vert_v2[e, v]) for v in e2v[e]) nabv_norm = sum(weights_1[e, v]*(u_vert[v]*primal_normal_vert_v1[e, v] + v_vert[v]*primal_normal_vert_v2[e, v]) for v in e2v[e]) z_nabla4_e2 = 4.0*((nabv_norm[e] - 2.0*z_nabla2_e[e])*inv_vert_vert_length[e]*inv_vert_vert_length[e] + (nabv_tang[e] - 2.0*z_nabla2_e[e])*inv_primal_edge_length[e]*inv_primal_edge_length[e]) ``` #### Bindings example from 2021 hackathon ```cpp namespace icon_bindings_sten_impl { struct default_tag {}; template <int...> struct field_kind {}; // template<class Tag> using neigh_tbl_t = gridtools::fortran_array_view<int, 2, default_tag, false>; auto alloc_sten_impl(int n_edges, int n_k, neigh_tbl_t e2v) { return sten({-1, n_edges, -1, n_k}, icon::make_connectivity_producer<2>(e2v)); } BINDGEN_EXPORT_BINDING_WRAPPED(3, alloc_sten, alloc_sten_impl); // template<class Tag> using sten_t = decltype(alloc_sten_impl(0, 0, neigh_tbl_t /*<Tag>*/ {{}})); template <class T> void sten_impl(sten_t sten, gridtools::fortran_array_view<T, 1, field_kind<0>> field_in, gridtools::fortran_array_view<T, 1, field_kind<0>> field_out) { sten(field_in, field_out); } BINDGEN_EXPORT_GENERIC_BINDING_WRAPPED(3, sten, sten_impl, (double)); } // namespace icon_bindings_sten_impl ```