# New model project ###### tags: `archive` [TOC] ## Links to other documents - [Problems with the old models](https://hackmd.io/@havogt/rk_ELMUq_) - [Iterator view presentation](https://docs.google.com/presentation/d/1QjGZ3V8YIeOESTnp7bwF1AAate5v47Tc4IoVnqyDH1s/edit?usp=sharing ) - [Iterator view model description](https://hackmd.io/@gridtools/B1Yh6B8cd) - [Field view (without sizes)](https://hackmd.io/@havogt/ryaxJKX5_) - [Cartesian accessor view (first "new model")](https://hackmd.io/@gridtools/SkEKFlGEO) - [Repository with toy implementations](https://github.com/fthaler/gt4py_new_model): `main` contains the accessor view, then several branches with slightly different models for unstructured. - [Mixed / Till Model](https://hackmd.io/8WXjrwEGQaabUnqMFtvg5A?both) ## Meetings #### Agenda 1. Field view details: - Intersection of domain - `and` of fields ### 2021-07-11 4pm CET ### 2021-07-06 4pm-5pm CET #### Agenda 1. Discuss toy frontend and review discussion from Monday meeting ### 2021-07-05 4pm CET finalize iterator view #### Agenda 1. Review builtins 2. Review offset interpretation 3. Show toy frontend (skipped) 4. Decide and organize implementation steps ?? - embedded mode (temporary, waiting for Field view) - parse/trace to IR (temporary, waiting for Field view) - Codegen C++ - Python bindings to C++ - C++ backend #### Minutes Anton+Hannes start implementing iterator view with simple frontend ### 2021-06-30 4pm CET work on intermediate representation #### Agenda 1. I'm convinced that one can transform from the Field View to the Local View. However, is there a clear idea how this would be achieved? Would one try to rewrite a pass (as a first step of the toolchain) that transforms the python AST accordingly or...? 2. Today we have seen python and C++ representations of the IIR. It seems to me that the C++ representation is tightly coupled to the (U)SID concepts. Does the C++ way of doing things still allow to introduce a more "traditional" text based code generation (akin to e.g. https://github.com/GridTools/gt4py/blob/master/src/gtc/gtcpp/gtcpp_codegen.py) ? 3. BNF IR from Anton #### Minutes ##### Possible toolchain DSL -> Field view IR -> Local view IR -> Backend Code How does the generated code look like? C-like or higher-level C++? Matthias: MCH has the requirement to generate lower level code (for debuggability and profiling) Anton: in the first step we take away load from Python code generator and do more work in C++, if we need/want to change it, we can move more logic to the code generator. => The design should allow moving complexity between C++ and Python. ##### Anton's IR see EBNF section in https://hackmd.io/@gridtools/B1Yh6B8cd ### 2021-06-29 4pm CET summary of previous session #### Agenda 1. ~~Review updated FV3 examples~~ #### Minutes Reviewed model prototypes. ### 2021-06-29 9am Finalizing the iterator view #### Open questions/comments - Hannes: Describe OffsetProvider part of the model. - Hannes: Describe how Offsets interact with optimizations. - Hannes: Improve lift definition with semantic (roughly, lift behaves such that shifting lifted function is same as passing shifted iterators (with the same offset) to the non-lifted function) - Hannes: Describe "orthogonality" of dimensions in the context of `shift` to "non-existent location" (`shift(V2E, K(1), nth(0))` or `shift(V2E, E2C, nth(0), nth(1)`). #### Minutes ### 2021-06-22 4pm CET Sync #### Action items Anton: indirect -> strided Hannes: multishift with sparsefield (related to field view problem) Till: investigate built-ins (generic vs domain specific); fusion and blocking in Till's execution model; #### Agenda 1. Reduce and SparseFields #### Minutes ##### Reduce and SparseFields Discussed "reduce and sparse field" section in https://hackmd.io/@gridtools/B1Yh6B8cd. Questions: How are Offsets provided? - Hannes: Mapping of OffsetLiteral to OffsetProvider is argument of the fencil - Anton: Fencil needs to get OffsetProvider which can interpret all offsetliterals (and composed offsetliterals) in this fencil. We need OffsetLiteral interpretation also for optimizations. ### 2021-06-16 4pm CET Sync #### Agenda 1. Problems 2. Summary of discussions since last meeting - What is "offset" in the iterator view 3. Questions - neighbor `reduce` as built-in? - how to express sparse field in iterator view? #### Minutes ##### Problems ###### Not clear how unstructured to strided representation can be done Looks like color dimension is special - How does extend analysis work? ###### reduce and shift with offset range and sparse field How to make this play together ```python= def neighbors(iterator, offset_range) -> Iterator[SparseField[offset_range]]: ... def reduce(fun, init, *sparse_field) -> value: ... def shiftn(iterator: Tuple[Iterator], offset_range: Tuple[Offset]) -> Tuple[Iterator] def sten(): reduce(neighbors(neighbors(field, offsets) other_offsets)) ``` ### 2021-06-15 4pm CET Sync #### Agenda 1. Walk through use-cases #### Minutes ##### Walk through use-cases see comments in use case section #### Questions: - neighbor `reduce` as built-in? - how to express sparse field in iterator view? #### Action items Hannes: describe unstructured optimzations ### 2021-06-09 3pm CET Overview of current state of the 2 models #### Agenda 1. Presentation of local view 2. Presentation of field view 3. How do we organize work? Split team? 4. Postponed questions from previous meeting - Examples of IR extensions (in the vertical) - Examples for syntactic sugar extensions - Local AST transformations - Tracing #### Minutes ##### Local view https://docs.google.com/presentation/d/1QjGZ3V8YIeOESTnp7bwF1AAate5v47Tc4IoVnqyDH1s/edit?usp=sharing Open questions: - Do we need reduce? - Do we need Location anywhere? ##### Field view Open questions: - How to unambiguously track field sizes for eager execution. ##### How to organize work - Local view is better suitable for the performant backends - Field view is better for a NumPy execution and most likely the easier to explain to the user Split groups: Field view: Enrique Local view: Anton #### Action items Anton + Till: discuss Till's mixed model ### 2021-06-08 Kick-off meeting #### Agenda 1. Refine agenda 2. Design goals: + Embedded DSL (Optional: fast execution in embedded mode) + Represent use cases (which means we need to collect them, specially ICON cases from MCH) + Represent all the known optimizations 3. Organization of work + Collect use cases from all available models (ICON, FVM, FV3, ...) + Make a list of known optimizations + Discussions on both views: + Field view for a possible eager/fast frontend implementation + Local view for the backend implementations + Mapping from frontend to backend + Sync meetings: + How often? On demand possible? 4. Present examples of current models (cartesian or unstructured) + Local view: quick refresh ? + Field view: introduction #### Minutes ##### What's the outcome of the project? To contribute the discussion, I am repeating what I have said in the meeting: the output is a mature design (ready for implementation stage) accepted by the whole team and not rejected by the customers for the technical reason. In other words: we are going to develop unstructured new model to the level we currently have for cartesian. At the end of the cycle we will have: - all known use cases and optimisations are collected - the requirements are formed from the use cases - there is a design doc where the design is formulated - the design is verified during in-team discussion and the team has the consensus - the reference python implementation is written that allows to challenge the design against use cases - the design is presented to the customers - the customers do not have unsolved technical objections against the design ##### Design goals + Embedded DSL (Optional: fast execution in embedded mode) + Represent use cases (which means we need to collect them, specially ICON cases from MCH) + Can we make the DSL extensible by the users? + 2 types of extensions: + syntax sugar (no modifications to the IR) + extensions to the IR (e.g. special vertical patterns) + Represent all the known optimizations + What are the targets? + Efficient CUDA and OpenMP + DaCe + NumPy + Do we consider distributed execution in this project? + No, not possible, too big for this project. + If we support blocking, then halo exchanges should be doable. + Where do we get use-case from? Which use-cases are out-of-scope? + ICON (starting from existing dusk implementation) + horizontal indirections: we need this example! + FV3 core (fully implemented in GTScript) + do we consider Lagrangian remapping (vertical indirection)? + FV3 physics + FVM Cartesian + we don't consider the global reduction + Why don't we consider global reductions? + We are currently exploring a solution with DaCe. + It's not a stencil pattern. ##### Organization of work ###### Getting use-cases ... ###### Known optimizations Matthias, Hannes, Anton (for the known ones from Cartesian), Felix ###### Do we work on both views? Talk about it tomorrow ##### Action items for next meeting - Anton: present local view - Hannes: present field view + send meeting invite - Matthias: collect use-cases + optimizations - Johann: collect use-cases (+ optimizations) - Felix: collect optimizations ## Use-cases ### From unstructured #### Sparse fields - Are they input or output to stencils or only internal? In icon: they are either input or stencil local temporaries Assigning to a sparse field in dusk ```python # fill sparse dimension vn vert using the loop concept with sparse[Edge > Cell > Vertex]: vn_vert = u_vert * primal_normal_vert_x + v_vert * primal_normal_vert_y ``` Most common use case in ICON are "geometrical factors" computed at start up time and left constant during the simulation ```python= @stencil def mo_nh_diffusion_stencil_03(kh_smag_ec: Field[Edge, K], vn: Field[Edge, K], e_bln_c_s: Field[Cell > Edge], geofac_div: Field[Cell > Edge], diff_multfac_smag: Field[K], kh_c: Field[Cell, K], div: Field[Cell, K]): with domain.across[nudging:halo].upward: kh_c = sum_over(Cell > Edge, kh_smag_ec*e_bln_c_s)/diff_multfac_smag div = sum_over(Cell > Edge, vn*geofac_div) ``` Questions: - One way would be to represent sparse field as Field with Tuple values. - Maybe SparseField values should have the same interface as Iterator #### Horizontal indirection ```fortran DO jk = 1, nlev DO je = i_startidx, i_endidx ilc0 = btraj%cell_idx(je,jk) ! Calculate "edge values" of rho and theta_v ! Note: z_rth_pr contains the perturbation values of rho and theta_v, ! and the corresponding gradients are stored in z_grad_rth. z_rho_e(je,jk) = REAL(p_nh%metrics%rho_ref_me(je,jk),wp) & + z_rth_pr(1,ilc0,jk) & + btraj%distv_bary(je,jk,1) * z_grad_rth(1,ilc0,jk) & + btraj%distv_bary(je,jk,2) * z_grad_rth(2,ilc0,jk) z_theta_v_e(je,jk) = REAL(p_nh%metrics%theta_ref_me(je,jk),wp) & + z_rth_pr(2,ilc0,jk) & + btraj%distv_bary(je,jk,1) * z_grad_rth(3,ilc0,jk) & + btraj%distv_bary(je,jk,2) * z_grad_rth(4,ilc0,jk) ENDDO ENDDO ``` #### Vertical indirection ICON Example, dusk syntax ```python @stencil def mo_solve_nonhydro_stencil_19(theta_v: Field[Cell, K], ikidx: IndexField[Edge > Cell, K], zdiff_gradp: Field[Edge > Cell, K], theta_v_ic: Field[Cell, K], inv_ddqz_z_full: Field[Cell, K], inv_dual_edge_length: Field[Edge], z_hydro_corr: Field[Edge, K]): z_theta1: Field[Edge,K] z_theta2: Field[Edge,K] with domain.across[nudging:halo].upward[-1:]: z_theta1 = sum_over(Edge > Cell, theta_v[ikidx] + zdiff_gradp*(theta_v_ic[ikidx] - theta_v_ic[ikidx+1])*inv_ddqz_z_full[ikidx], weights=[1,0]) z_theta2 = sum_over(Edge > Cell, theta_v[ikidx] + zdiff_gradp*(theta_v_ic[ikidx] - theta_v_ic[ikidx+1])*inv_ddqz_z_full[ikidx], weights=[0,1]) z_hydro_corr = grav_o_cpd*inv_dual_edge_length*(z_theta2-z_theta1)*4./((z_theta1+z_theta2)) ``` **Questions/Comments:** - Possible representations: - static dispatch (because it's bounded) - index field - Can we express this with a generalized offset and then decide at code generation time, if we do dispatch (Cartesian-like offset) or index field (neighbor table offset). Open question: how is this "neighbor table" constructed. #### Vertical & Horizontal Subdomains ICON Example that runs from the 3rd onion from the start up until the first halo line on the horizontal domain, and on the top level in the vertical domain: ```python @stencil def mo_solve_nonhydro_stencil_04(wgtfacq_c: Field[Cell, K], z_exner_ex_pr: Field[Cell, K], z_exner_ic: Field[Cell, K]): with domain.across[lb+3:halo-1].upward[-1:] as k: z_exner_ic = wgtfacq_c[k-1]*z_exner_ex_pr[k-1] + wgtfacq_c[k-2]*z_exner_ex_pr[k-2] + wgtfacq_c[k-3]*z_exner_ex_pr[k-3] ``` [Supplementary Document for ICON subdomains](https://docs.google.com/document/d/1MtlHPB3-eO6sOk0TvHE1rnnbp5r5QbRXRoL8V3Tpw6M/edit?usp=sharing) #### Mutable Fields ICON Example that mutates a field (in-out field) ```python @stencil def mo_nh_diffusion_stencil_12(area: Field[Cell], z_nabla2_c: Field[Cell, K], geofac_n2s: Field[Origin + Cell > Edge > Cell], w: Field[Cell, K]): with domain.upward.across[lb:halo]: w = w - diff_multfac_w*area*area* sum_over(Origin + Cell > Edge > Cell, z_nabla2_c[Origin + Cell > Edge > Cell]*geofac_n2s) ``` Comments: - Live with a temporary copy during transition - Or: work on fencil optimizing this pattern #### Conditional Assignment ICON Example that mutates a field depending on a mask ```python @stencil def mo_solve_nonhydro_stencil_20(ipeidx_dsl: Field[Edge, K], pg_exdist: Field[Edge, K], z_hydro_corr: Field[Edge, K], z_gradh_exner: Field[Edge, K]): with domain.upward: if ipeidx_dsl: z_gradh_exner += z_hydro_corr*pg_exdist ``` Conclusion: - ternary #### Control Flow ICON Example that contains control flow ```python with domain.upward[0:-1] as k: if (levelmask[k] or levelmask[k+1]): w_con_e = reduce_over(Edge > Cell, c_lin_e * z_w_con_c_full, sum, init=0.0,) if (abs(w_con_e) > cfl_w_limit*ddqz_z_full_e): difcoef = scalfac_exdiff * \ min(0.85 - cfl_w_limit*d_time, abs(w_con_e) * d_time/ddqz_z_full_e - cfl_w_limit*d_time) ddt_vn_adv = ddt_vn_adv + difcoef * \ area_edge*(vn*geofac_grdiv_center + reduce_over(Edge > Cell > Edge, geofac_grdiv*vn[Edge > Cell > Edge], sum, init=0.0, )) + tangent_orientation * inv_primal_edge_length * \ reduce_over(Edge > Vertex, zeta, sum, init=0.0) ``` Comments: - If this is a physics `if` (different outputs in true and false branch), we need to find a DSL syntax to express this user-friendly. #### Dispatch on control flow variable Future: structure of the icon dycore is as follows ``` if (substep ==1) #predictor ... lots of stencils ... else #corrector ... even more stencils ... ``` Comments: - Could be in the driver code, e.g. handled by DaCe #### Sequential k-loop (solver) ICON vertical solver ```python @stencil def mo_solve_nonhydro_stencil_49(vwind_impl_wgt: Field[Cell], theta_v_ic: Field[Cell, K], ddqz_z_half: Field[Cell, K], z_alpha: Field[Cell, K], z_beta: Field[Cell, K], z_w_expl: Field[Cell, K], z_exner_expl: Field[Cell, K], z_q: Field[Cell, K], w: Field[Cell, K]): z_gamma: Field[Cell, K] z_a: Field[Cell, K] z_b: Field[Cell, K] z_c: Field[Cell, K] z_g: Field[Cell, K] with domain.across[nudging:halo].upward[2:] as k: z_gamma = dtime*cpd*vwind_impl_wgt*theta_v_ic/ddqz_z_half z_a = -z_gamma*z_beta[k-1]*z_alpha[k-1] z_c = -z_gamma*z_beta[k]*z_alpha[k+1] z_b = 1.+z_gamma*z_alpha*(z_beta[k-1]+z_beta) z_g = 1./(z_b+z_a*z_q[k-1]) z_q = - z_c*z_g w = z_w_expl - z_gamma *(z_exner_expl[k-1]-z_exner_expl[k]) w = (w - z_a*w[k-1])*z_g ``` Comments: - scan #### Weighted (and partial) reductions ICON example to compute an edge-centered gradient ```python @stencil def mo_solve_nonhydro_stencil_16(inv_dual_edge_length: Field[Edge], z_exner_ex_pr :Field[Cell, K], z_gradh_exner: Field[Edge, K]): with domain.across[nudging:halo].upward[:4]: #1, nflatlev(jg)-1 z_gradh_exner = inv_dual_edge_length * sum_over(Edge > Cell, z_exner_ex_pr, weights=[-1,1]) ``` ICON snippet (ab-)using weights to perform a partial reduction ```python dvt_tang = sum_over( Edge > Cell > Vertex, (u_vert * dual_normal_vert_x) + (v_vert * dual_normal_vert_y), weights=[-1.0, 1.0, 0.0, 0.0] ) ``` Comments: - Write as: `-deref(shift(z_exner_ex_pr, Edge(0)))+deref(shift(z_exner_ex_pr, Edge(1)))` #### Nested reductions No "direct" use case in ICON. Required for stencil inlining optimization ```python= @stencil def nested( a: Field[Edge], b: Field[Cell], c: Field[Vertex] ): with levels_downward: a = reduce_over(Edge > Cell, b*reduce_over(Cell > Vertex, c, sum, init=0.0), sum, init=0.0) ``` Comment: - composable shifts ### From structured #### FV3 dycore: specialized stencils at edges/corners ```python def update_vorticity_and_kinetic_energy(...): from __externals__ import i_end, i_start, j_end, j_start with computation(PARALLEL), interval(...): ke = uc if ua > 0.0 else uc[1, 0, 0] vort = vc if va > 0.0 else vc[0, 1, 0] with horizontal(region[:, j_start - 1], region[:, j_end]): vort = vort * sin_sg4 + u[0, 1, 0] * cos_sg4 if va <= 0.0 else vort with horizontal(region[:, j_start], region[:, j_end + 1]): vort = vort * sin_sg2 + u * cos_sg2 if va > 0.0 else vort with horizontal(region[i_end, :], region[i_start - 1, :]): ke = ke * sin_sg3 + v[1, 0, 0] * cos_sg3 if ua <= 0.0 else ke with horizontal(region[i_end + 1, :], region[i_start, :]): ke = ke * sin_sg1 + v * cos_sg1 if ua > 0.0 else ke ke = 0.5 * dt2 * (ua * ke + va * vort) ``` Comments: - Field view needs to be defined - Extents analysis is complicated, but required for "normal" boundary conditions anyway. #### FV3GFS physics: interval bounds from run-time variables access current index as limiter: ```python def limt_cloud_top(qa_dt: Field[IJK, float], ktop: Field[IJ, int]): with computation(PARALLEL): with interval(...): k_s = ... if k_s < ktop: # ktop different for each ij point qa_dt[0,0,0] = 0. # else: # qa_dt = qa_dt ``` Comments: - ternary #### FV3 dycore and physics: need for type hinting on temporary fields Need lower dimensional temporaries ```python # fill a 2d field inside computataions: def compute(...): with computation(BACKWARD): with interval(-1, None): if (prog_ccn == 0) and (use_ccn == 1): # ccn is formulted as ccn = ccn_surface * (den / den_surface) ccn = ccn * rdgas * tz / p1 with interval(0, -1): if (prog_ccn == 0) and (use_ccn == 1): # Propagate downwards previously computed values of ccn ccn = ccn[0, 0, +1] ``` Comments: - if this is a reduction-like operation in k: scan #### FV3 dycore: vertical reduction onto uniform grid (remapping) Simple example: ```python=3.9 def vertical_reduction( heights: Field[float], qin: Field[float], qout: Field[float], k_index: Field[IJ, int], next_height: Field[IJ, float], dz: float ): with computation(FORWARD), interval(...): next_height += dz while current_height < next_height: qout += qin[0, 0, k_index] current_height += heights[0, 0, k_index] k_index += 1 ``` In FV3, this is a particular reduction that takes a set of fields and reduces them onto a regular grid. This also interpolating the values in a complicated way. See [GDP-4](https://github.com/GridTools/gt4py/pull/426/files#diff-56bdca65cce5741927079c9fd01096c8f2f6adc0dbb14b360ca27676b78efa67R120). Comments: - For such patterns we might need the "do-whatever-you-want-in-k" (or keep it out of the language) #### FV3GFS physics: inner loops from current vertical index to end of vertical axis An example of this is in finding the mixing length in the [turbulence physics](https://github.com/VulcanClimateModeling/physics_standalone/blob/5ea17c3679046cdd2520f8c3911377da247330a2/turb/python/turb_gt.py#L1171). Note (JD): This example does not make much sense to me either... Reduced example: ```python # Plain Python code for k in range(km1): ... for n in range(k, km1): ... ptem = gotvx[i, 0, n] * (thvx[i, 0, n + 1] - thvx[i, 0, k]) * dz ``` **Another example**: Loop from current to top / bottom (with exit) and write to both indices: ```python do k = kbot - 1, k0, - 1 do m = k + 1, kbot if (zt (k + 1) >= ze (m)) exit [...] tz (m) = tz (m) - sink * icpk (m) qi (k) = qi (k) - sink * dp (m) / dp (k) ``` Here is a more complete loop: ``` if (k0 < kbot) then do k = kbot - 1, k0, - 1 if (qi (k) > qrmin) then do m = k + 1, kbot if (zt (k + 1) >= ze (m)) exit if (zt (k) < ze (m + 1) .and. tz (m) > tice) then dtime = min (1.0, (ze (m) - ze (m + 1)) / (max (vr_min, vti (k)) * tau_imlt)) sink = min (qi (k) * dp (k) / dp (m), dtime * (tz (m) - tice) / icpk (m)) tmp = min (sink, dim (ql_mlt, ql (m))) ql (m) = ql (m) + tmp qr (m) = qr (m) - tmp + sink tz (m) = tz (m) - sink * icpk (m) qi (k) = qi (k) - sink * dp (m) / dp (k) endif enddo endif enddo endif ``` *Note:* The if checks do not matter, as the port removes these. Comment: - We need the complete example to understand the pattern. #### FV3GFS physics: looping over data dimensions ```fortran do k = 1, nlay ... do ib = 1, nbdsw jb = nblow + ib - 1 taucw(k,ib) = tauliq(jb)+tauice(jb)+tauran+tausnw ssacw(k,ib) = ssaliq(jb)+ssaice(jb)+ssaran(jb)+ssasnw(jb) asycw(k,ib) = asyliq(jb)+asyice(jb)+asyran(jb)+asysnw(jb) enddo taucw = reduce(lambda ) ``` Comment: - Is `nblow` an index field or constant (compile-time/run-time)? Answer: Yes. #### FV3GFS physics: reduction pattern across k ```python def compute(...): with computation(FORWARD): with interval(0, 1): if qrz > qrmin: no_fall = 0 else: no_fall = 1 with interval(1, None): # no_fall value at current level is a function of the previous level value if no_fall[0, 0, -1] == 1: if (qrz > qrmin): no_fall = 0 else: no_fall = 1 else: no_fall = 0 ``` Workaround for not having a k access: this code would compute the level and then check if k is that. Now we need a boolean 3d field. Also see the forward parallel trick applied here. We need those to be in sequence: ```python def compute(...): # Find significant melting level """ k0 removed to avoid having to introduce a k_idx field """ with computation(**FORWARD**): with interval(0, 1): if tz > tice: stop_k = 1 else: stop_k = 0 with interval(1, -1): if stop_k[0, 0, -1] == 0: if tz > tice: stop_k = 1 else: stop_k = 0 else: stop_k = 1 with interval(-1, None): stop_k = 1 with computation(**PARALLEL**), interval(...): if stop_k == 1: # Melting of cloud ice (before fall) tc = tz - tice if (qiz > qcmin) and (tc > 0.): sink = min(qiz, fac_imlt * tc / icpk) tmp = min(sink, dim(ql_mlt, qlz)) qlz = qlz + tmp qrz = qrz + sink - tmp qiz = qiz - sink q_liq = q_liq + sink q_sol = q_sol - sink cvm = c_air + qvz * c_vap + q_liq * c_liq + q_sol * c_ice tz = tz - sink * lhi / cvm tc = tz - tice ``` Comments: scan #### IFS-FVM LAM Nabla with BC ```python @stencil(backend=backend, **backend_opts) def _nabla_x( pp : gtscript.Field[dtype], px : gtscript.Field[dtype], dx : dtype ): with computation(PARALLEL), interval(...): dx2i = 0.5 / dx px[0,0,0] = (pp[1,0,0] - pp[-1,0,0]) * dx2i ... class Nabla(Operator): def __init__(self, grid): Operator.__init__(self, grid) def __call__(self, pp, pdx, pdy, pdz): _nabla_x(self.grid, (I, J, K), pp=pp, px=pdx, dx=self.dx, subdomain=("interior", (1, 0, 0))) if self.bcx == 0: _nabla_wbcx0(self.grid, (I, J, K), pp=pp, px=pdx, dx=self.dx, subdomain=("prescribed_boundary", "west")) _nabla_ebcx0(self.grid, (I, J, K), pp=pp, px=pdx, dx=self.dx, subdomain=("prescribed_boundary", "east")) _nabla_y(self.grid, (I, J, K), pp=pp, py=pdy, dy=self.dy, subdomain=("interior", (0, 1, 0))) if self.bcy == 0: _nabla_sbcy0(self.grid, (I, J, K), pp=pp, py=pdy, dy=self.dy, subdomain=("prescribed_boundary", "south")) _nabla_nbcy0(self.grid, (I, J, K), pp=pp, py=pdy, dy=self.dy, subdomain=("prescribed_boundary", "north")) _nabla_z(self.grid, (I, J, K), pp=pp, pz=pdz, dz=self.dz, subdomain=("interior", (0, 0, 1))) if self.bcz == 0: _nabla_bbcz0(self.grid, (I, J, K), pp=pp, pz=pdz, dz=self.dz, subdomain=("prescribed_boundary", "bottom")) _nabla_tbcz0(self.grid, (I, J, K), pp=pp, pz=pdz, dz=self.dz, subdomain=("prescribed_boundary", "top")) ``` Comments: see corner-edge computation #### IFS-FVM LAM LinAlg ```python @stencil(backend=backend,**backend_opts) def _pressure_coefficients( g11 : gtscript.Field[dtype], g12 : gtscript.Field[dtype], g13 : gtscript.Field[dtype], g21 : gtscript.Field[dtype], g22 : gtscript.Field[dtype], g23 : gtscript.Field[dtype], g33 : gtscript.Field[dtype], pc11 : gtscript.Field[dtype], pc12 : gtscript.Field[dtype], pc13 : gtscript.Field[dtype], pc21 : gtscript.Field[dtype], pc22 : gtscript.Field[dtype], pc23 : gtscript.Field[dtype], pc31 : gtscript.Field[dtype], pc32 : gtscript.Field[dtype], pc33 : gtscript.Field[dtype] ): with computation(PARALLEL), interval(...): pc11[0,0,0] = g11[0,0,0]*g11[0,0,0] + g21[0,0,0]*g21[0,0,0] pc12[0,0,0] = g11[0,0,0]*g12[0,0,0] + g21[0,0,0]*g22[0,0,0] pc13[0,0,0] = g11[0,0,0]*g13[0,0,0] + g21[0,0,0]*g23[0,0,0] pc21[0,0,0] = g12[0,0,0]*g11[0,0,0] + g22[0,0,0]*g21[0,0,0] pc22[0,0,0] = g12[0,0,0]*g12[0,0,0] + g22[0,0,0]*g22[0,0,0] pc23[0,0,0] = g12[0,0,0]*g13[0,0,0] + g22[0,0,0]*g23[0,0,0] pc31[0,0,0] = g13[0,0,0]*g11[0,0,0] + g23[0,0,0]*g21[0,0,0] pc32[0,0,0] = g13[0,0,0]*g12[0,0,0] + g23[0,0,0]*g22[0,0,0] pc33[0,0,0] = g13[0,0,0]*g13[0,0,0] + g23[0,0,0]*g23[0,0,0] + g33[0,0,0]*g33[0,0,0] ``` Comments: - matrix, vector valued field, e.g. tuple - this is A\*A with symmetric A ## Optimizations ### dawn / MCH Optimizations currently either in dawn or being investigated as future optimizations by MCH - stage merging (we start at maximum split) - demotion of temporary fields to scalars - stencil inlining - nested reduction to weighted chained reduction* - (k caching) Most optimizations are explained [here](https://docs.google.com/presentation/d/1JvqmD_9xd1CSuWFN_erkNOl6Knn5b1Sqy4SIUcUaeyA/edit?usp=sharing), with the exception of *, which is explained [here](https://drive.google.com/file/d/1dUkZbhGWqfgg_kad-_aqyrBSTcqNIBP1/view?usp=sharing)