Continue: ICON stencils on the Declarative GT4Py

# Continue: ICON stencils on the Declarative GT4Py ###### tags: `functional cycle 8` shaped by: Matthias, Hannes appetite: half cycle developers: developers who are free in the 2nd half (+ Peter (EXCLAIM) with mentoring) ## Motivation The previous project was only partially completted with a simple point-wise stencil. After the ICON bindings are completed (or partly in parallel), we propose to finish this project. ## Implementation ### Simple stencil in Python Implement `mo_nh_diffusion_stencil_01` and `mo_nh_diffusion_stencil_05` first in pure Python, with validation against a NumPy solution, see https://github.com/havogt/gt4py/blob/icon_stencils_iterator_view/tests/iterator_tests/test_nh_diff_05.py. ### Integration into ICON with generated bindings With the result of the ICON bindings project, we can integrate these stencils into ICON. ### Performance comparison Simple performance comparison of the dawn generated code vs the code generated in this project. E.g. by running ICON through nvprof. Take care that - setup work (e.g. allocation of temporaries) are not measured - the same neighbor tables are used in both codes (in case there is a layout transformation involved) ### Optional: more stencils More stencils can be ported with a focus on testing more features: - field conditionals - `mo_velocity_advection_stencil_18` or `mo_velocity_advection_stencil_21` (more involved) - stencils that are fusable (this would allow us to test the inline transformation) - fusing `mo_nh_diffusion_stencil_16` with `mo_nh_diffusion_stencil_17` leads to a situation two subsequent reductions can be inlined - maybe mixing of staggered and non-staggered fields in the vertical - fusing `wrap_setup_mo_velocity_advection_stencil_05_a` and `wrap_setup_mo_velocity_advection_stencil_05_b` leads to such a situation Note that the `scan` will not be ready and therefore we should not try to implement a vertical solver. ### Appendix #### Selected stencils ##### mo_nh_diffusion_stencil_01 ###### dusk ```python= from dusk.script import * @stencil def icon_laplacian_diamond( diff_multfac_smag: Field[K], tangent_orientation: Field[Edge], inv_primal_edge_length: Field[Edge], inv_vert_vert_length: Field[Edge], u_vert: Field[Vertex, K], v_vert: Field[Vertex, K], primal_normal_x: Field[Edge > Cell > Vertex], primal_normal_y: Field[Edge > Cell > Vertex], dual_normal_x: Field[Edge > Cell > Vertex], dual_normal_y: Field[Edge > Cell > Vertex], vn: Field[Edge, K], kh_smag: Field[Edge, K], kh_smag_ec: Field[Edge, K], nabla2: Field[Edge, K], smag_limit: Field[K], smag_offset: Global ) -> None: vn_vert: Field[Edge > Cell > Vertex, K] dvt_tang: Field[Edge, K] dvt_norm: Field[Edge, K] kh_smag_1: Field[Edge, K] kh_smag_2: Field[Edge, K] with levels_upward: # fill sparse dimension vn vert using the loop concept with sparse[Edge > Cell > Vertex]: vn_vert = u_vert * primal_normal_x + v_vert * primal_normal_y # dvt_tang for smagorinsky dvt_tang = sum_over( Edge > Cell > Vertex, (u_vert * dual_normal_x) + (v_vert * dual_normal_y), weights=[-1.0, 1.0, 0.0, 0.0] ) dvt_tang = dvt_tang * tangent_orientation # dvt_norm for smagorinsky dvt_norm = sum_over( Edge > Cell > Vertex, u_vert * dual_normal_x + v_vert * dual_normal_y, weights=[0.0, 0.0, -1.0, 1.0], ) # compute smagorinsky kh_smag_1 = sum_over( Edge > Cell > Vertex, vn_vert, weights=[-1.0, 1.0, 0.0, 0.0] ) kh_smag_1 = (kh_smag_1 * tangent_orientation * inv_primal_edge_length) + ( dvt_norm * inv_vert_vert_length ) kh_smag_1 = kh_smag_1 * kh_smag_1 kh_smag_2 = sum_over( Edge > Cell > Vertex, vn_vert, weights=[0.0, 0.0, -1.0, 1.0] ) kh_smag_2 = (kh_smag_2 * inv_vert_vert_length) - ( dvt_tang * inv_primal_edge_length ) kh_smag_2 = kh_smag_2 * kh_smag_2 kh_smag = diff_multfac_smag * sqrt(kh_smag_2 + kh_smag_1) # compute nabla2 using the diamond reduction nabla2 = (sum_over( Edge > Cell > Vertex, vn_vert, weights=[1.0, 1.0, 0.0, 0.0], ) - 2.0 * vn) * (inv_primal_edge_length * inv_primal_edge_length) nabla2 = nabla2 + (sum_over( Edge > Cell > Vertex, vn_vert, weights=[0.0, 0.0, 1.0, 1.0], ) - 2.0 * vn) * (inv_vert_vert_length * inv_vert_vert_length) nabla2 = 4.0 * nabla2 kh_smag_ec = kh_smag kh_smag = max(0., kh_smag - smag_offset) kh_smag = min(kh_smag, smag_limit) ``` ###### GT4Py deprecated prototype ```python= def nh_diff_01( e2v: E2V, diff_multfac_smag: Field[K, dtype], tangent_orientation: Field[Edge, dtype], inv_primal_edge_length: Field[Edge, dtype], inv_vert_vert_length: Field[Edge, dtype], u_vert: Field[[Vertex, K], dtype], v_vert: Field[[Vertex, K], dtype], primal_normal_x: SparseField[E2V, dtype], primal_normal_y: SparseField[E2V, dtype], dual_normal_x: SparseField[E2V, dtype], dual_normal_y: SparseField[E2V, dtype], vn: Field[[Edge, K], dtype], kh_smag: Field[[Edge, K], dtype], kh_smag_ec: Field[[Edge, K], dtype], nabla2: Field[[Edge, K], dtype], smag_limit: Field[K, dtype], smag_offset: Field[K, dtype] ): with computation(FORWARD), interval(0, None): with location(Edge) as e: weights_tang = LocalField[E2V, dtype]([-1., 1., 0., 0.]) weights_norm = LocalField[E2V, dtype]([0., 0., -1., 1.]) weights_close = LocalField[E2V, dtype]([1., 1., 0., 0.]) weights_far = LocalField[E2V, dtype]([0., 0., 1., 1.]) # fill sparse dimension vn vert using the loop concept vn_vert = (u_vert[v] * primal_normal_x[e, v] + v_vert[v] * primal_normal_y[e, v] for v in e2v[e]) # dvt_tang for smagorinsky dvt_tang = sum(weights_tang[e, v]*((u_vert[v] * dual_normal_x[e, v]) + (v_vert[v] * dual_normal_y[e, v])) for v in e2v[e]) dvt_tang = dvt_tang * tangent_orientation # dvt_norm for smagorinsky dvt_norm = sum(weights_norm[e, v]*(u_vert[v] * dual_normal_x[e, v] + v_vert[v] * dual_normal_y[e, v]) for v in e2v[e]) # compute smagorinsky kh_smag_1 = sum(weights_tang[e, v]*vn_vert[e, v] for v in e2v[e]) kh_smag_1 = (kh_smag_1 * tangent_orientation * inv_primal_edge_length) + ( dvt_norm * inv_vert_vert_length) kh_smag_1 = kh_smag_1 * kh_smag_1 kh_smag_2 = sum(weights_norm[e, v]*vn_vert[e, v] for v in e2v[e]) kh_smag_2 = (kh_smag_2 * inv_vert_vert_length) - ( dvt_tang * inv_primal_edge_length) kh_smag_2 = kh_smag_2 * kh_smag_2 kh_smag = diff_multfac_smag * sqrt(kh_smag_2 + kh_smag_1) # compute nabla2 using the diamond reduction nabla2 = (sum(weights_close[e, v]*vn_vert[e, v] for v in e2v[e]) - 2.0 * vn) * (inv_primal_edge_length * inv_primal_edge_length) nabla2 = nabla2 + (sum(weights_far[e, v]*vn_vert[e, v] for v in e2v[e]) - 2.0 * vn) * (inv_vert_vert_length * inv_vert_vert_length) nabla2 = 4.0 * nabla2 kh_smag_ec = kh_smag kh_smag = max(0., kh_smag - smag_offset) kh_smag = min(kh_smag, smag_limit) ``` ##### mo_nh_diffusion_stencil_05 ###### dusk ```python= mo_nh_diffusion_stencil_05( z_nabla4_e2: Field[Edge, K], u_vert: Field[Vertex, K], v_vert: Field[Vertex, K], primal_normal_vert_v1: Field[Edge > Cell > Vertex], primal_normal_vert_v2: Field[Edge > Cell > Vertex], z_nabla2_e: Field[Edge, K], inv_vert_vert_length: Field[Edge], inv_primal_edge_length: Field[Edge], ): nabv_tang: Field[Edge, K] nabv_norm: Field[Edge, K] with domain.upward.across[nudging:halo]: nabv_tang = sum_over( Edge > Cell > Vertex, u_vert * primal_normal_vert_v1 + v_vert * primal_normal_vert_v2, weights=[1.0, 1.0, 0.0, 0.0], ) nabv_norm = sum_over( Edge > Cell > Vertex, u_vert * primal_normal_vert_v1 + v_vert * primal_normal_vert_v2, weights=[0.0, 0.0, 1.0, 1.0], ) z_nabla4_e2 = 4.0 * ( (nabv_norm - 2.0 * z_nabla2_e) * inv_vert_vert_length ** 2 + (nabv_tang - 2.0 * z_nabla2_e) * inv_primal_edge_length ** 2 ) ``` ###### GT4Py deprecated prototype ```python= def nh_diff_05( e2v: E2V, z_nabla4_e2: Field[[Edge, K], dtype], u_vert: Field[[Vertex, K], dtype], v_vert: Field[[Vertex, K], dtype], primal_normal_vert_v1: SparseField[E2V, dtype], primal_normal_vert_v2: SparseField[E2V, dtype], z_nabla2_e: Field[[Edge, K], dtype], inv_vert_vert_length: Field[Edge, dtype], inv_primal_edge_length: Field[Edge, dtype], ): with computation(FORWARD), interval(0, None): with location(Edge) as e: weights_0 = LocalField[E2V, dtype]([1., 1., 0., 0.]) weights_1 = LocalField[E2V, dtype]([0., 0., 1., 1.]) nabv_tang = sum(weights_0[e, v]*(u_vert[v]*primal_normal_vert_v1[e, v] + v_vert[v]*primal_normal_vert_v2[e, v]) for v in e2v[e]) nabv_norm = sum(weights_1[e, v]*(u_vert[v]*primal_normal_vert_v1[e, v] + v_vert[v]*primal_normal_vert_v2[e, v]) for v in e2v[e]) z_nabla4_e2 = 4.0*((nabv_norm[e] - 2.0*z_nabla2_e[e])*inv_vert_vert_length[e]*inv_vert_vert_length[e] + (nabv_tang[e] - 2.0*z_nabla2_e[e])*inv_primal_edge_length[e]*inv_primal_edge_length[e]) ``` #### Bindings example from 2021 hackathon ```cpp namespace icon_bindings_sten_impl { struct default_tag {}; template <int...> struct field_kind {}; // template<class Tag> using neigh_tbl_t = gridtools::fortran_array_view<int, 2, default_tag, false>; auto alloc_sten_impl(int n_edges, int n_k, neigh_tbl_t e2v) { return sten({-1, n_edges, -1, n_k}, icon::make_connectivity_producer<2>(e2v)); } BINDGEN_EXPORT_BINDING_WRAPPED(3, alloc_sten, alloc_sten_impl); // template<class Tag> using sten_t = decltype(alloc_sten_impl(0, 0, neigh_tbl_t /*<Tag>*/ {{}})); template <class T> void sten_impl(sten_t sten, gridtools::fortran_array_view<T, 1, field_kind<0>> field_in, gridtools::fortran_array_view<T, 1, field_kind<0>> field_out) { sten(field_in, field_out); } BINDGEN_EXPORT_GENERIC_BINDING_WRAPPED(3, sten, sten_impl, (double)); } // namespace icon_bindings_sten_impl ```