# Research: DSL preprocessor for verification/integration

###### tags: `functional cycle 11`

gt4icon (preliminary name, very open to suggestions) is a proposed preprocessor that facilitates the integration of **gt4**py code into the **icon** model. The motivation for this project is three-fold:

* DWD may not accept the changes made to the dycore that are required to call the compiled stencil sources.
* The integration code is essentially boilerplate. Dealing with it is cumbersome and error-prone.
* A DSL preprocessor could fuse and unfuse stencils as required for production or debug runs. This is especially worthwhile if code that is already integrated were to change.

gt4icon currently only exists as a rough idea, with some requirements and limitations discussed below. The aim of this project is to come up with a complete design, to be used as the basis for a future shaped project.

## Appetite

2 developers for a half-cycle (**?**)

## Tasks

* Research pp_ser
* Turn the rough design & requirements given in the section below into a proper design document, ready to be used as the basis for a shaped project. Use the learnings from the research task.

### Research pp_ser

[pp_ser](https://github.com/GridTools/serialbox/blob/master/src/serialbox-python/pp_ser/pp_ser.py) is a preprocessor that understands pragmas (directives) injected into Fortran code that enable the serialization (deserialization?) of Fortran arrays. It is related to this project in the sense that it is also a purpose-built Fortran preprocessor, so it is worthwhile to study it. This task entails contacting developers and users of pp_ser to answer questions like:

* What are the challenges unique to developing a Fortran preprocessor?
    * Apparently, there are various traps & pitfalls when creating such a preprocessor due to requirements of Fortran. We should research them properly to ensure we don't fall into them (e.g., older Fortran compilers limited the number of characters allowed per line, which the preprocessing directives and the translated code would have to respect). There are various people we could contact who have experience with such issues (e.g., Oli, Hannes, Xavier, Carlos).
* What is the user experience like? What do users expect from such a preprocessor? Are there any current pain points?
* ...

### Design gt4icon

#### Requirements

Currently, the typical integration code of a DSL stencil into Fortran looks like this:

```@FORTRAN
#ifdef __DSL_VERIFY
!$ACC PARALLEL IF( i_am_accel_node .AND. acc_on ) DEFAULT(NONE) ASYNC(1)
out_before(:,:,:) = out(:,:,:)
!$ACC END PARALLEL

!$ACC PARALLEL LOOP DEFAULT(NONE) GANG VECTOR COLLAPSE(2) ASYNC(1) IF( i_am_accel_node .AND. acc_on )
DO jk = 1, nlev
  DO jc = i_startidx, i_endidx
    out(jc,jk,jb) = ...
  ENDDO
ENDDO
!$ACC END PARALLEL LOOP
#endif

CALL wrap_run_mo_stencil_01(in_1 = in_1(:,:,1), in_2 = in_2(:,:,1), &
                            out = out(:,:,1), out_before = out_before(:,:,1), &
                            out_abs_tol=1e-18_wp, out_rel_tol=1e-18_wp)
```

Seen from the original `$ACC PARALLEL LOOP` region, there is a _header_ that saves the state before the loop runs and a _trailer_ that performs the actual DSL stencil call. The purpose of gt4icon is to automate the generation of these components as much as is reasonable.

If either the header or the trailer is to be auto-generated, gt4icon cannot be a "pure" Fortran preprocessor; it has to interact with the DSL toolchain to some degree. Otherwise the fields and their dimensions would need to be stated in the pragmas, which requires the same amount of information that is present today. That would be worse than the current approach: the same information needs to be injected, but since it is now contained in comments, there is no IDE support available.
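To make the header/trailer split concrete, the following sketch generates both blocks for the example above from a small stencil description. It is only an illustration under assumptions: the `Stencil` class, `make_header`, and `make_trailer` are hypothetical names that do not exist in gt4py or ICON, the `(:,:,1)` slices are copied from the example, and a real tool would obtain the field lists from the DSL toolchain rather than from hand-written dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Stencil:
    """Hypothetical stencil description; not an existing gt4py or ICON type."""
    name: str
    inputs: dict[str, str]    # DSL field -> Fortran array (read only)
    outputs: dict[str, str]   # DSL field -> Fortran array (written)
    tolerances: dict[str, str] = field(default_factory=dict)


def make_header(s: Stencil) -> str:
    """Open the verification guard and save every output field before the loop."""
    lines = [
        "#ifdef __DSL_VERIFY",
        "!$ACC PARALLEL IF( i_am_accel_node .AND. acc_on ) DEFAULT(NONE) ASYNC(1)",
    ]
    lines += [f"{dsl}_before(:,:,:) = {arr}(:,:,:)" for dsl, arr in s.outputs.items()]
    lines.append("!$ACC END PARALLEL")
    return "\n".join(lines)


def make_trailer(s: Stencil) -> str:
    """Close the verification guard and call the generated stencil wrapper."""
    args = [f"{dsl} = {arr}(:,:,1)" for dsl, arr in {**s.inputs, **s.outputs}.items()]
    args += [f"{dsl}_before = {dsl}_before(:,:,1)" for dsl in s.outputs]
    args += [f"{key} = {val}" for key, val in s.tolerances.items()]
    body = ", &\n".join(f"    {arg}" for arg in args)
    return f"#endif\nCALL wrap_run_{s.name}( &\n{body} &\n)"


stencil = Stencil(
    name="mo_stencil_01",
    inputs={"in_1": "in_1", "in_2": "in_2"},
    outputs={"out": "out"},
    tolerances={"out_abs_tol": "1e-18_wp", "out_rel_tol": "1e-18_wp"},
)
header = make_header(stencil)
trailer = make_trailer(stencil)
print(header)
print(trailer)
```

In this sketch the field names stand in for what the toolchain could provide, while the Fortran array names and the tolerances are exactly the pieces that still have to be supplied by some other means.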
It is observed:

* The fields touched by a stencil and their dimensions can be extracted from the toolchain.
* The information about which DSL field is bound to which Fortran field needs to be provided by some means.
* The same is true for the relative and absolute tolerances.

There is also another component to the integration: all the `*_before` fields need to be allocated on the Fortran side. In the future, this needs to be done in the stencil setup. In consequence, the introduction of gt4icon entails changes to the DSL integration code.

In the optimal case, the above code would be turned into the following when using gt4icon:

```@FORTRAN
!$DSL mo_stencil_01
!$ACC PARALLEL LOOP DEFAULT(NONE) GANG VECTOR COLLAPSE(2) ASYNC(1) IF( i_am_accel_node .AND. acc_on )
DO jk = 1, nlev
  DO jc = i_startidx, i_endidx
    out(jc,jk,jb) = ...
  ENDDO
ENDDO
!$ACC END PARALLEL LOOP
```

#### No-Go's

* gt4icon should not contain any components that need to parse Fortran code.
* Automatic fusion of stencils is out of scope for the _current_ iteration of gt4icon. At the same time, care should be taken not to design a solution that _prevents_ automatic fusion in some way.

#### Notes on front-end gt4icon coupling

Some specific questions that should be answered are:

* How is gt4icon coupled to a DSL front-end?
* Does it only work specifically with gt4py, or could the tool also be used for other front-ends (more general purpose)?
* How do gt4icon and a DSL front-end interface with each other?

In the following, some possibilities are presented to illustrate important aspects of these questions.

##### (Simple) Verbatim copying of header & trailer blocks

In the Fortran source file, the start and end of the Fortran block that should be replaced with a DSL stencil are marked, and the name of the DSL stencil is given:

```@FORTRAN
!$DSL start mo_stencil_01
!$ACC PARALLEL LOOP DEFAULT(NONE) GANG VECTOR COLLAPSE(2) ASYNC(1) IF( i_am_accel_node .AND. acc_on )
DO jk = 1, nlev
  DO jc = i_startidx, i_endidx
    out(jc,jk,jb) = ...
  ENDDO
ENDDO
!$ACC END PARALLEL LOOP
!$DSL end mo_stencil_01
```

The DSL front-end then generates two files, e.g., `mo_stencil_01_header` & `mo_stencil_01_trailer`. These files are pasted verbatim at the start and end of the Fortran block. For the current dusk/dawn integration these files would contain:

`mo_stencil_01_trailer`:

```@FORTRAN
#endif
CALL wrap_run_mo_stencil_01( &
    in_1 = in_1(:,:,1), &
    in_2 = in_2(:,:,1), &
    out = out(:,:,1), &
    out_before = out_before(:,:,1), &
    out_abs_tol=1e-18_wp, &
    out_rel_tol=1e-18_wp &
)
```

`mo_stencil_01_header`:

```@FORTRAN
#ifdef __DSL_VERIFY
!$ACC PARALLEL IF( i_am_accel_node .AND. acc_on ) DEFAULT(NONE) ASYNC(1)
out_before(:,:,:) = out(:,:,:)
!$ACC END PARALLEL
```

However, the header should probably also be reduced to a single call that creates copies of the fields to be used for verification:

`mo_stencil_01_header`:

```@FORTRAN
#ifdef __DSL_VERIFY
CALL create_field_copies_mo_stencil_01( &
    in_1 = in_1(:,:,1), &
    in_2 = in_2(:,:,1), &
    out = out(:,:,1) &
)
```

Then we wouldn't have to pass verification fields like `out_before` to the stencil call, since these copies would be handled internally by the generated C++ stencil object:

`mo_stencil_01_trailer`:

```@FORTRAN
#endif
CALL wrap_run_mo_stencil_01( &
    in_1 = in_1(:,:,1), &
    in_2 = in_2(:,:,1), &
    out = out(:,:,1), &
    out_abs_tol=1e-18_wp, &
    out_rel_tol=1e-18_wp &
)
```

It is the responsibility of the front-end to ensure that the content of these two files works correctly in the Fortran files. This would be similar to C++'s `#include` directives:

```@FORTRAN
#include "mo_stencil_01_header"
!$ACC PARALLEL LOOP DEFAULT(NONE) GANG VECTOR COLLAPSE(2) ASYNC(1) IF( i_am_accel_node .AND. acc_on )
DO jk = 1, nlev
  DO jc = i_startidx, i_endidx
    out(jc,jk,jb) = ...
  ENDDO
ENDDO
!$ACC END PARALLEL LOOP
#include "mo_stencil_01_trailer"
```

Issues: How does the front-end know the association of Fortran arrays (names, dimensions & layout) to the parameters/variables of the DSL stencils (e.g., `in_1 = in_1(:,:,1)`)?

Pros: Very simple, and gt4icon is only very weakly coupled to a DSL front-end.

##### (Medium) Templated header & trailer blocks

While the previous approach has the advantage that gt4icon is only very weakly coupled to a DSL front-end, it is unclear how the Fortran arrays are associated with the DSL stencil parameters/variables. An obvious extension is to template the header and trailer blocks and let the DSL directives pass a dictionary to the templates.

```@FORTRAN
!$DSL start mo_stencil_01
!$DSL in_1 = p_patch(jg)%ph_metric%in_1(:,:,1)
!$DSL in_2 = in_2(:,:,1)
!$DSL out = out(:,:,1)
!$DSL out_abs_tol=1e-18_wp
!$DSL out_rel_tol=1e-18_wp
!$ACC PARALLEL LOOP DEFAULT(NONE) GANG VECTOR COLLAPSE(2) ASYNC(1) IF( i_am_accel_node .AND. acc_on )
DO jk = 1, nlev
  DO jc = i_startidx, i_endidx
    out(jc,jk,jb) = ...
  ENDDO
ENDDO
!$ACC END PARALLEL LOOP
!$DSL end mo_stencil_01
```

This would create a dictionary:

```@PYTHON
integration_params = {
    "in_1": "p_patch(jg)%ph_metric%in_1(:,:,1)",
    "in_2": "in_2(:,:,1)",
    "out": "out(:,:,1)",
    "out_abs_tol": "1e-18_wp",
    "out_rel_tol": "1e-18_wp",
}
```

The header and trailer could then be templated and use this dictionary (here we demonstrate this with jinja2 templates):

`mo_stencil_01_header`:

```@FORTRAN
#ifdef __DSL_VERIFY
CALL create_field_copies_mo_stencil_01( &
{%- for name, value in integration_params.items() if not name.endswith("_tol") %}
    {{ name }} = {{ value }}{{ "," if not loop.last }} &
{%- endfor %}
)
```

`mo_stencil_01_trailer`:

```@FORTRAN
#endif
CALL wrap_run_mo_stencil_01( &
{%- for name, value in integration_params.items() %}
    {{ name }} = {{ value }}{{ "," if not loop.last }} &
{%- endfor %}
)
```

Pros: The association of Fortran arrays to DSL stencil parameters/variables is done with the directives, and the approach is still fairly simple.

Issues: What if the dictionary and the templates don't match up? Is it still robust in this case?

##### (Advanced) Specify integration in Python DSL code

This variant pushes some of the coupling onto the DSL front-end. The DSL directives in the Fortran source files would be quite simple:

```@FORTRAN
!$DSL start mo_stencil_01
!$ACC PARALLEL LOOP DEFAULT(NONE) GANG VECTOR COLLAPSE(2) ASYNC(1) IF( i_am_accel_node .AND. acc_on )
DO jk = 1, nlev
  DO jc = i_startidx, i_endidx
    out(jc,jk,jb) = ...
  ENDDO
ENDDO
!$ACC END PARALLEL LOOP
!$DSL end mo_stencil_01
```

But the integration and the association of the Fortran arrays to DSL stencil parameters/variables would happen in the Python DSL code.
Let's say this Fortran module with multiple stencils would be translated to a Python class with stencils like so (the DSL syntax is more or less made up here):

`nh_diffusion.py`:

```@python
# this class corresponds to `mo_nh_diffusion.f90`
class NonHydrostaticDiffusion(StencilModule):
    in_1: Field[Edge, K]
    in_2: Field[Cell, K]
    out: Field[Cell, K]
    # more fields could be declared for this module including temporaries

    @stencil
    def stencil_01(self):
        self.out = ...

    # more stencils could be declared for this module

    @program
    def do_nh_diffusion(...):
        # driver code for main non-hydrostatic diffusion
        # (corresponds to Fortran driver code, not needed for _fine-grained_ integration)
        ...
```

This module could be annotated with information such as the associations of:

1. Python class to Fortran source file
2. DSL stencil to DSL stencil directives
3. Fortran array to DSL stencil parameters/variables

This could look like this:

`nh_diffusion.py`:

```@python
from pathlib import Path

from fortran_integration import (
    FortranArray, Vertical, Horizontal, NPROMA, stencil_integration
)

# this class corresponds to `mo_nh_diffusion.f90`
class NonHydrostaticDiffusion(StencilModule):
    # Here we specify association (1)
    FortranFile = Path("atm_dyn_iconam") / "mo_nh_diffusion.f90"

    # Associations (3) are specified like this
    in_1: Field[Edge, K] = FortranArray(
        name = "in_1",
        layout = (Horizontal, Vertical, NPROMA),
    )
    in_2: Field[Cell, K] = FortranArray(
        name = "in_2",
        layout = (Horizontal, Vertical, NPROMA),
    )
    out: Field[Cell, K] = FortranArray(
        name = "out",
        layout = (Horizontal, Vertical, NPROMA),
    )
    # more fields could be declared for this module including temporaries

    # Association (2) is specified here:
    @stencil_integration("mo_stencil_01")
    @stencil
    def stencil_01(self):
        self.out = ...

    # more stencils could be declared for this module

    @program
    def do_nh_diffusion(...):
        # driver code for main non-hydrostatic diffusion
        # (corresponds to Fortran driver code, not needed for _fine-grained_ integration)
        ...
```

This _inline_ integration may not be desirable, because it mixes integration associations with the pure DSL stencils & fields. These annotations could also be added externally via _integration stubs_:

`nh_diffusion_integration.py`:

```@python
from pathlib import Path

from fortran_integration import (
    FortranArray, Vertical, Horizontal, NPROMA,
    stencil_integration, stencil_module_integration
)
from nh_diffusion import *

@stencil_module_integration(NonHydrostaticDiffusion)
class NonHydrostaticDiffusionIntegration:
    # Here we specify association (1)
    FortranFile = Path("atm_dyn_iconam") / "mo_nh_diffusion.f90"

    # Associations (3) are specified like this
    exner_exfac = FortranArray(
        name = "p_nh%metrics%exner_exfac",
        layout = (Horizontal, Vertical, NPROMA),
    )
    in_2 = FortranArray(
        name = "in_2",
        layout = (Horizontal, Vertical, NPROMA),
    )
    out = FortranArray(
        name = "out",
        layout = (Horizontal, Vertical, NPROMA),
    )
    # more fields could be declared for this module including temporaries

    # Association (2) is specified here:
    stencil_01 = stencil_integration("mo_stencil_01")

    # more stencils could be declared for this module
```

With such an integration stub, all the information needed to generate the appropriate header and trailer blocks is available.

Pros: Integration is specified in DSL code.

Issues: Requires that the DSL front-end allows integration associations to be specified (a rather heavy requirement). An external integration stub may be awkward to use.

# TODO list

* Research
    * Interview people with experience with `pp_ser`
        * Find people that have experience with
            * `pp_ser.py`
            * _Serialbox serialization directives_
            * `!$ser ...` directives
        * Remo (MCH)
        * Carlos (MCH)
        * (Will Sawyer (CSCS))
    * Interview about integration with gt4py
        * Some person/write on Slack
    * Write a summary of interviews
    * Research existing libraries
    * **How to enable fusion of stencils**
        - across simple if/else driver code
        - sometimes there are halo exchanges between stencils (nh_diffusion)
        - across function calls (nested function calls)
        - within/across loops?
        - See hand-fusion experiment: https://gitlab.dkrz.de/dsl/icon-cscs/-/merge_requests/3/
    - Contrast implementation methods (see above) and make a recommendation
    - Should also support profiling (adding nvptxStartProfiler/nvptxEndProfiler calls)
* Naming
    * Liskov + something (maybe not exclaim, because it's a limited-time project)
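
As a starting point for contrasting the implementation methods, the directive handling of the templated variant can be prototyped without parsing any Fortran. The sketch below is only an illustration under assumptions: `scan_directives` and `check_bindings` are hypothetical names, and the `!$DSL start`/`!$DSL end`/`!$DSL key = value` syntax is the one used in the examples of this document, which is not final. It also shows one possible answer to the question of what happens when the dictionary and the templates don't match up: compare the names before rendering anything.

```python
import re

# A directive is a comment line starting with `!$DSL`; everything else is ignored.
DSL_LINE = re.compile(r"^\s*!\$DSL\s+(.*\S)\s*$", re.IGNORECASE)


def scan_directives(lines):
    """Collect `!$DSL` blocks line by line, without parsing any Fortran."""
    blocks = {}      # stencil name -> {DSL parameter: Fortran expression}
    current = None   # name of the block we are currently inside, if any
    for number, line in enumerate(lines, start=1):
        match = DSL_LINE.match(line)
        if match is None:
            continue  # ordinary Fortran or another pragma: passed through untouched
        text = match.group(1)
        if "=" in text:
            if current is None:
                raise ValueError(f"line {number}: binding outside of a block")
            key, value = (part.strip() for part in text.split("=", 1))
            blocks[current][key] = value
            continue
        keyword, _, name = text.partition(" ")
        name = name.strip()
        if keyword.lower() == "start" and current is None and name:
            current = name
            blocks[name] = {}
        elif keyword.lower() == "end" and name and name == current:
            current = None
        else:
            raise ValueError(f"line {number}: unexpected directive {text!r}")
    if current is not None:
        raise ValueError(f"block {current!r} is never closed")
    return blocks


def check_bindings(expected, given):
    """Return (missing, unexpected) parameter names, both sorted."""
    return sorted(set(expected) - set(given)), sorted(set(given) - set(expected))


source = """\
!$DSL start mo_stencil_01
!$DSL in_1 = p_patch(jg)%ph_metric%in_1(:,:,1)
!$DSL out = out(:,:,1)
!$DSL out_abs_tol=1e-18_wp
DO jk = 1, nlev
  out(jc,jk,jb) = ...
ENDDO
!$DSL end mo_stencil_01
""".splitlines()

blocks = scan_directives(source)
# The expected names would come from the DSL toolchain; here they are hard-coded.
missing, unexpected = check_bindings(
    ["in_1", "in_2", "out", "out_abs_tol", "out_rel_tol"],
    blocks["mo_stencil_01"],
)
print(missing, unexpected)
```

Because the scan only looks at lines that start with `!$DSL`, it stays within the no-go that gt4icon must not parse Fortran, and reporting missing or unexpected names gives the user an error at preprocessing time instead of a broken `CALL` at compile time.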