# Migration plan for new Model

###### tags: `archive`

* How will current GT4Py code be supported in transition / long term?
* How will current and past contributions not lose value?
* How do we ensure we are not wasting time on something that will ultimately not work?
* How big / small will the discontinuity between DSL versions be?

## Tradeoffs

Concentrating all work on the new model (speed) vs. putting some effort into supporting and transitioning from the old model (safety & UX).

DSL: Functional purity in all things (simplicity, predictability) vs. resemblance to other implementation languages (teachability, learning curve, ease of use, ease of porting).

Potential of unexplored alternative execution models vs. bias for the current new execution model.

## Possible routes

### Incremental changes to the current execution model (no new execution model)

Support and collaborations proceed much the same; the execution model does not change fundamentally in one go. The final state of the execution model is open, and breaking changes are still likely.

#### Notes

* Features like composability, halo exchanges, etc. will probably still require backwards-incompatible changes to the model (and frontends & backends). Each step in the direction of one feature might make the next feature more difficult.
* Each big feature is likely to require multiple cycles (prototyping plus implementation)
* The DSL will still face increasing explainability problems, unless breaking changes are introduced.
* Team members working on the big tasks will not be available for support & collaborations

#### Advantages

* Some projects per cycle will be available for collaborations, feature requests, etc.
* Breaking changes will come one at a time and should be easy to track

#### Disadvantages

* The most important remaining bits of functionality have high uncertainty
* Progress on the important stuff will be slow
* Fewer people per cycle, one change at a time
* Exploration cycles per functionality, implementation restricted by minimizing visible changes
* If breaking changes are necessary, they will not be predictable beyond one cycle.
* For DSL code to track breaking changes, multiple passes of rewriting will likely be necessary.
* The shape of each next step will more or less dictate how many resources are spent on it vs. on collaborations & minor features

### Explore alternative new execution model

"Scrap" the proposed execution model and design an alternative one with the constraint of minimizing breaking changes during the transition from the current execution model.

#### Advantages

* If successful, the way forward will be clear, with low-risk cycles and predictable breaking changes
* The trade-off between ongoing work on the current system and exploration of the new execution model can be tuned

#### Disadvantages

* High risk of not being successful within 1-2 cycles
* Bias towards the ideas behind the currently proposed model
* Many constraints to take into account
* Possibly an overconstrained problem (no guarantee of success even given more time)

### Transition to proposed execution model

Clear up the most important uncertainties about the proposed execution model in 1-2 cycles by expanding the existing prototype and connecting it to the DSL / backends, focusing on integration and currently blocked functionality rather than supporting all existing features first. In the meantime, continue ongoing developments and / or additionally explore ways around current limitations.
After this exploration phase, a subset of existing GT4Py code should run (no performance guarantees). From there we work towards a new DSL ("version 2") which takes full advantage of the proposed execution model to be at least as productive as, and more explainable than, the current DSL, while supporting more functionality. This can in turn be more or less iterative to guarantee a reasonable migration path for existing DSL code.

#### Notes

Exploration early on would encompass:

* connecting the prototype to DaCe
* creating an unstructured example with the proposed execution model

#### Advantages

* The size and shape of breaking changes are known or worked out early on
* Reduces risk by failing quickly if unsuccessful
* The direction is clear
* Path to target state is iterative and can be tracked by current users & contributors
* Existing infrastructure is used where possible

#### Disadvantages

* Not the most direct path towards enabling blocked functionality by implementing the proposed model, trading off speed and resources for safety and user experience

### Rewrite from scratch aiming for the proposed execution model

Write a gt4py "version 2" from scratch and drop support for the current code if successful. Target the high-uncertainty tasks of the rewrite at the beginning to fail fast if unsuccessful.

#### Advantages

* Most effective use of resources
* High certainty of delivering the currently blocked functionality quicker than the other routes
* Likely to fail within the first two cycles if unsuccessful

#### Disadvantages

* Uncertain to be successful within 1-2 cycles
* Likely to alienate collaborators and users if it goes on longer