# Project Liskov (aka the DSL preprocessor)

###### PR https://github.com/C2SM/icon4py/pull/117
###### tags: `functional cycle 12`

Icon-liskov (preliminary name) is a proposed preprocessor that facilitates integration of **gt4py** code into the **icon** model. The motivation for this project is three-fold:

* DWD may not accept the changes made to the dycore required to call the compiled stencil sources
* The integration code is essentially boilerplate. Dealing with it is cumbersome and error-prone.
* A DSL preprocessor could fuse and unfuse stencils as required for production or debug runs. This is especially worthwhile if code already integrated were to change.

Icon-liskov currently only exists as a [rough idea](https://hackmd.io/kxzXdBQfQW-fB-Bl-cFRAw?view#Design-gt4icon), with some design/implementation considerations and limitations discussed below.

The goals of the project are:

* Complete research to come up with a suitable set of design/implementation considerations and an idea for a solution
* Review proposed solution (inform and discuss with stake-holders)
* Implement icon-liskov (if possible, also take care of the complete integration within icon-exclaim/icon-dsl and icon4py)

Note: Implementing a *perfect solution* is outside the scope of this project. The proposed implementation should stay small. One consequence of this is that we will have to strike hard compromises and we will most likely not support all use-cases perfectly.

## Research topics

### Learn from `pp_ser.py` of serialbox

[Serialbox](https://github.com/GridTools/serialbox) ([doc](https://gridtools.github.io/serialbox/)) also implements a simple preprocessor ([`pp_ser.py`](https://github.com/GridTools/serialbox/blob/master/src/serialbox-python/pp_ser/pp_ser.py), [doc](https://github.com/GridTools/serialbox/blob/master/docs/latex/ppser_doc.pdf)). Serialbox preprocessor directives are Fortran comments of the form `!$ser ...`.
However, this preprocessor has been the source of many *pain-points* for its users. The goal is to learn from `pp_ser.py`, avoid its mistakes, and improve on it so that we can ensure a good user-experience for icon-liskov.

* [x] Interview people knowledgeable about `pp_ser.py` and take notes:
    * [x] Remo Dietlicher ([notes](https://docs.google.com/document/d/1eyjaqdMiSes5fIWeKWaVJeuphjeScACezJyVaGl5cwQ/edit?usp=sharing)) - Ben
    * [x] Carlos Osuna ([notes](https://docs.google.com/document/d/1eyjaqdMiSes5fIWeKWaVJeuphjeScACezJyVaGl5cwQ/edit?usp=sharing)) - Ben
    * [x] Will Sawyer ([notes](https://docs.google.com/document/d/1eyjaqdMiSes5fIWeKWaVJeuphjeScACezJyVaGl5cwQ/edit?usp=sharing)) - Sam & Ben
* [x] Review `pp_ser.py` ([code](https://github.com/GridTools/serialbox/blob/master/src/serialbox-python/pp_ser/pp_ser.py), [doc](https://github.com/GridTools/serialbox/blob/master/docs/latex/ppser_doc.pdf)) - Ben
* [x] Incorporate knowledge into design/implementation considerations

### User use-cases

Icon-liskov will be relevant for a wide variety of users. Their use-cases should be taken into consideration and flow into the design/implementation considerations and proposed solution. Some of the users include:

* Researchers/model developers
* Production weather forecasting
* Stencil porting/translating
* Performance engineers

One common important use-case is the verification of a gt4py stencil against its Fortran/OpenACC variant. The direct verification and possible data serialization (e.g., to disk) are explicitly not part of icon-liskov. However, icon-liskov should be extensible enough to enable verification by another component (see [extensibility](#Extensibility)). Verification in the presence of fusion is also an important aspect.

- [ ] Research model developers' use-cases (maybe interview a model developer, e.g., Stephanie?, David?, Anurag?)
- [x] Research production NWP use-cases, discuss with Carlos ([notes](https://docs.google.com/document/d/1eyjaqdMiSes5fIWeKWaVJeuphjeScACezJyVaGl5cwQ/edit?usp=sharing)) - Ben
- [ ] Incorporate into design/implementation considerations

### Research existing libraries/solutions

Could we reuse existing libraries? What libraries exist and could they be useful for icon-liskov?

#### [Fypp](https://github.com/aradi/fypp)

> Fypp is a Python powered preprocessor. It can be used for any programming languages but its primary aim is to offer a Fortran preprocessor, which helps to extend Fortran with conditional compiling and template metaprogramming capabilities. Instead of introducing its own expression syntax, it uses Python expressions in its preprocessor directives, offering the consistency and versatility of Python when formulating metaprogramming tasks. It puts strong emphasis on robustness and on neat integration into developing toolchains.

- [x] Research existing libraries
- [x] Evaluate suitability

### Support stencil fusion

Stencil fusion is important for performance and is directly tied to this project, because the preprocessor has to integrate both fused and *unfused* stencils. Stencil fusion is therefore an important requirement. However, supporting stencil fusion _perfectly_ is almost impossible, because stencil fusion has to account for Fortran *driver code*. The goal is to find a pragmatic solution that supports stencil fusion and unfusing. A meeting will be arranged with Christoph and Matthias to take difficult fusion use-cases into account and brain-storm practical solutions.

- [x] Do brain-storming session with Christoph & Matthias ([notes](https://docs.google.com/document/d/1eyjaqdMiSes5fIWeKWaVJeuphjeScACezJyVaGl5cwQ/edit?usp=sharing)) - Ben & Sam
- [x] Incorporate into design/implementation considerations

### Ensure integration is in line with gt4py's vision

Icon-liskov will implement a way to call gt4py code from Fortran code.
We should ensure that the implementation we propose doesn't conflict with gt4py's vision regarding such an integration. We should find out whether ideas about this topic already exist in gt4py and who would be suitable to talk to about it. Notes taken during this meeting should be incorporated into the design/implementation considerations.

- [x] Find out if there are such visions, possibly who would be suitable to talk to and take notes - Sam & Ben
- [x] Possibly incorporate notes into design/implementation considerations

### Extensibility

Generally, extensibility is desirable to delegate some responsibilities outside the tool. But an _extensions system_ should strive to not increase the complexity of icon-liskov too much.

Icon-liskov is in a unique place to enable instrumentation of stencils (both Fortran/OpenACC and gt4py), and several important use-cases (e.g., verification) are instrumentation use-cases. It would be nice if icon-liskov was extensible enough to accommodate possible future use-cases. One good example is instrumenting an external profiler (e.g., [NVIDIA's Nsight Compute](https://developer.nvidia.com/nsight-compute)) to compare the performance between Fortran/OpenACC and gt4py stencils. This use-case shouldn't be directly supported by icon-liskov. However, it should be extensible enough to allow adding such support externally (almost like some sort of plugin system). Delegating this outside the tool itself should also keep the tool smaller and simpler.

- [ ] Think about use-cases that would benefit from extensibility (e.g., profiler instrumentation)
- [ ] Put use-cases into design/implementation considerations
- [ ] Brain-storm solutions for extensibility

### Integration into icon-exclaim/icon-dsl and icon4py

The tool also has to find a suitable place within icon's build system. There exist already two preprocessors (serialbox's `pp_ser.py` and CLAW). Hopefully, it won't be too hard to add a third one.
But we should look at how the existing preprocessors are integrated and how icon-liskov fits there. Additionally, icon4py's bindings generator might need some adjustments which we should take into consideration.

- [ ] Research icon's build-system and how `pp_ser.py` and CLAW are integrated
- [ ] Consider how to best integrate icon-liskov into icon-build system
- [ ] Incorporate into design/implementation considerations

### Naming

The project is currently code-named icon-liskov after [Barbara Liskov](https://en.wikipedia.org/wiki/Barbara_Liskov) who is well known for the [Liskov substitution principle](https://en.wikipedia.org/wiki/Liskov_substitution_principle). This principle is about the exact conditions when substituting one type for another is valid (subtyping). This is also relevant for this project which aims to enable substitution of Fortran/OpenACC stencils with gt4py stencils. However, the name may still change if someone gets a better idea.

## Design/implementation considerations & solution candidates

After a research phase, we should try to collect the design/implementation considerations into a separate document. Then, various implementation methods & designs should be brainstormed (related: [previous shaped project](https://hackmd.io/kxzXdBQfQW-fB-Bl-cFRAw?view#Design-gt4icon)).

### Review with stake-holders and get approval

We should prepare a presentation and a meeting to review the proposal with stake-holders. It's probably not feasible to go too much into detail. However, we should get approval from various stake-holders to ensure the success of the project.

- [x] Review possible candidate solutions, see if there's a strong favorite or if there are multiple similarly good candidates. Make preselection.
- [x] Prepare presentation (present candidate solution(s) and contrast pro/con)
- [x] Do meeting (presentation & discussions) (currently 7. Dec.)
- [x] Incorporate into implementation

## Implementation & Integration

A completed implementation is also a goal of this project. However, it also needs to be integrated into icon-exclaim/icon-dsl (including its build system) and might require changes to the bindings generation (e.g., creating *before copies* should probably be taken care of by the bindings generator). Unfortunately, the design/implementation considerations and the proposed solution will only be finished during the project, which makes it difficult to estimate how long the implementation & integration will take. Additionally, the full integration is not strictly part of the tool itself. So the focus of the project is to finish the implementation itself. If possible, the full integration can also be started as part of the project, but the implementation takes precedence.

- [x] Take some time to think through DSL preprocessor (requirements/design) in isolation. Take notes and share with Ben. - Sam
- [x] Start with prototype and iterative development ([PR](https://github.com/C2SM/icon4py/pull/117)) - Sam
- [x] Consider possible changes that would be required in the bindings generator
- [x] Do second brain-storming session
- [x] Review how well design/implementation considerations can be fulfilled

# Design/Implementation considerations

Unordered list of design/implementation considerations that are important for the success of icon-liskov. Because the implementation should be kept pragmatic and reasonable, it is clear that not all of those can be fulfilled. Nevertheless, they can be used to compare different candidate solutions. This is currently work-in-progress and will be adjusted/restructured as the project moves forward.
* Use cases
    * Researchers/Model developers
        * TBD
    * Production/operational NWP forecasting
        * Error resilience
        * Ensure Fortran array is correctly on GPU/CPU
        * TBD
    * Stencil translating/porting
        * Verification
        * TBD
    * Performance engineering
        * Fusing multiple stencils together
        * Trying out different ways to fuse stencils
        * TBD
* Implementation
    * How do we deal with Fortran line-breaks?
    * TBD
* Integration
    * Into icon build-system
    * icon4py repository
    * icon4py bindings generator
    * TBD
* Extensibility
    * Possible use cases
        * Profiler instrumentation (`nvtxRangeStart`, `nvtxRangeEnd`)
        * Serialization of data to disk
        * TBD
    * TBD
* TBD

### Code Design

https://hackmd.io/dZ3wfkIRSGeDStDOrMPlEQ

### TODO

### High priority

- Parsing
    - [x] Improve error handling in `DirectivesSerialiser`.
    - [x] Parsing of gt4py stencil in order to deduce input/output type for each field in the stencil.
    - [x] Refactor parser to decouple `CodeGenInput` creation from parsing.
    - [x] Create more specific exception classes.
- Code generation
    - [x] Generate second `DATA CREATE` statement.
    - [x] Generation of in/out field declarations.
    - [x] Generation of wrap function call.
    - [x] Copying of output fields.
    - [x] Generation of profile call (requires adding command-line flag).
    - [x] Generation of wrapped function call import statements (requires adding `IMPORT` directive).
    - [x] Define `IntegrationWriter` class which writes generated code to file.
- Testing infrastructure
    - [x] Added basic parsing and integration tests.
    - [x] Expand the test suite
        - [x] Expand/improve `parser` tests
        - [x] Add `scanner` tests
        - [x] Add `serialiser` tests
        - [x] Add `utils` tests
        - [x] Add `generation` tests
        - [x] Add `writer` tests
- Other
    - [x] Start writing documentation (README.md)
    - [x] Static type checking
    - [x] Add docstrings where necessary
    - [x] Logging
    - [x] Test preprocessor on different stencils in mo_nh_diffusion.
    - [x] Identify supported patterns in the dycore and what is currently not yet supported.
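
The open question "How do we deal with Fortran line-breaks?" from the considerations above could be approached roughly as sketched below. This is only an illustration, not the actual icon-liskov scanner: the `!$DSL` prefix, the directive texts in the example, and the rule that a continued directive ends in `&` and repeats the prefix on the next line are all assumptions made for this sketch.

```python
from typing import Iterable, List, Tuple


def scan_directives(lines: Iterable[str], prefix: str = "!$DSL") -> List[Tuple[int, str]]:
    """Collect directive comments and join continued ones into one logical line.

    Returns (1-based start line, directive text) pairs. The prefix and the
    continuation convention are hypothetical, chosen only for illustration.
    """
    found: List[Tuple[int, str]] = []
    pending: List[str] = []  # pieces of a directive that is still being continued
    start = 0
    for number, raw in enumerate(lines, start=1):
        stripped = raw.strip()
        if not stripped.upper().startswith(prefix.upper()):
            if pending:
                # A continued directive must be followed by another directive line.
                raise ValueError(f"line {start}: unterminated directive continuation")
            continue  # ordinary Fortran code or comment, not our concern
        body = stripped[len(prefix):].strip()
        if not pending:
            start = number
        continued = body.endswith("&")
        pending.append(body.rstrip("&").strip())
        if not continued:
            found.append((start, " ".join(pending)))
            pending = []
    if pending:
        raise ValueError(f"line {start}: unterminated directive continuation")
    return found


# Hypothetical annotated Fortran snippet (directive syntax invented for the example).
example = [
    "SUBROUTINE diffusion()",
    "  !$DSL START STENCIL(name=foo; &",
    "  !$DSL               out=bar)",
    "  z = x + y  &",
    "        + w",
    "  !$dsl END STENCIL(name=foo)",
    "END SUBROUTINE",
]
directives = scan_directives(example)
```

Joining continued directives in the scanner keeps the parser free of line-break handling, and reporting the start line of each directive allows error messages to point back into the Fortran source. Continuation lines of ordinary Fortran statements are deliberately ignored here.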
### Future work

- Integration into ICON build system.
- Supporting stencil fusion.
- Generation/modification of `DATA CREATE` statement in Fortran code.
- Output field copies in the bindings generator?
- Annotation of more parts of the dycore.

### Supported stencil breakdown