# Proposal: The Berlin Pipeline

###### tags: `Juvix` `juvix-project`

In this document we outline the Berlin Pipeline. The document is broken up into three core chunks, along with a miscellaneous section that outlines helper structures integral to the smooth operation of the pipeline. These four sections are:

1. The Berlin Pipeline Step
2. The Berlin Pipeline Proper
3. The Berlin Pipeline Automatisierung
4. The Berlin Pipeline Extensions

The fourth section covers extensions that have made it into the codebase and that may alter the shape of the first three sections as written. The following sections prop up the understanding of the model:

0. Motivation
1. Das Modell
2. Computational Meta Data: A Plan
3. Cyclic Lists and Their Uses
4. A Note about Notation

The full code that backs the proposal can be found [here](https://github.com/anoma/juvix/blob/mariari/pipeline-modeling/experimental/Pipeline/pipeline.mli). The last section of the document covers the expected time and how the tasks should be distributed.

## Motivation

One may ask why we should even begin the Berlin Pipeline; after all, we can currently get `LLVM`, `Michelson`, and `Plonk` to compile a term. However, this does not take into account the following properties:

1. Having the ability to stop between any two steps to see the results of a certain phase
2. Proper feedback through the steps of the compiler
   - A pass simply returns an either; it does not have the capacity to return the types
3. A consistent environment to thread through all passes
4. A consistent environment to compile not just a term, but a list of definitions
5. A consistent environment between compilation passes

Out of these issues, only issue `2.` is solvable with the current architecture. This is because the pipeline has the following structure:

```haskell=
type Pipeline = Feedback.FeedbackT [] [Char] IO

class HasBackend b where
  type Ty b = ty | ty -> b
  type Val b = val | val -> b
  type Err b = e | e -> b

  stdlibs :: b -> [FilePath]
  stdlibs _ = []

  -- | Parse juvix source code passing a set of libraries explicitly to have them in scope
  toML' :: [FilePath] -> b -> Text -> Pipeline [(NameSymbol.T, [Types.TopLevel])]

  toSexp :: b -> [(NameSymbol.T, [Types.TopLevel])] -> Pipeline (Context.T Sexp.T Sexp.T Sexp.T)

  toHR ::
    (Show (Ty b), Show (Val b)) =>
    Param.Parameterisation (Ty b) (Val b) ->
    Context.T Sexp.T Sexp.T Sexp.T ->
    Pipeline (Core.RawGlobals HR.T (Ty b) (Val b))

  toIR ::
    Core.RawGlobals HR.T (Ty b) (Val b) ->
    Pipeline (Core.PatternMap Core.GlobalName, Core.RawGlobals IR.T (Ty b) (Val b))

  toErased ::
    Constraints b =>
    Param.Parameterisation (Ty b) (Val b) ->
    (Core.PatternMap Core.GlobalName, Core.RawGlobals IR.T (Ty b) (Val b)) ->
    Pipeline (ErasedAnn.AnnTermT (Ty b) (Val b))

  typecheck ::
    Context.T Sexp.T Sexp.T Sexp.T ->
    Pipeline (ErasedAnn.AnnTermT (Ty b) (Val b))

  compile ::
    FilePath ->
    ErasedAnn.AnnTermT (Ty b) (Val b) ->
    Pipeline ()
```

Starting from `toSexp` we get the `Pipeline` type, which is just an alias for `Feedback`; thus no environment can be shared. However, even if we extend the `Pipeline` type with context information, which context do we share? All the passes up to `toHR` are compatible with the `Context.T` type, but at that point Core uses its own hash table structure. Thus, to properly fix this situation, we need a consistent type to thread between the backends.

Further, we can see that with this structure the details of each pass are not expressed.
In fact, `toSexp` hides over 7 passes that we have no way of stopping between unless we reconstruct the pipeline; hence the `Easy` module explicitly repeats this code. Thus we need some kind of structure that allows us to talk about passes and sub-passes.

The Berlin Pipeline seeks to unify both the type of a step and the type of the pipeline built up from those steps. Through this structure we can get all 5 of the missing properties and have a principled way of talking about what a pass is. This unification also allows tooling to be made around a pass, allowing the easy creation of a new pass and its placement in the pipeline.

## Das Modell

The model of the new pipeline is laid out in the following way:

1. There exists the Berlin Step. This notion will be carried out in the `Pipeline.Step` module, and `Pipeline.Step.t` will serve as the type any compiler step inhabits.
2. The Berlin Pipeline Proper refers to the `Pipeline.Environment` module, with `Pipeline.Environment.t` being the constructed compiler pipeline itself.
3. The Berlin Automatisierung concept refers to the `Pipeline.Automation` module, which has functions to make inhabiting `Pipeline.Step` easier.

In this model, `Pipeline.Environment` gives the pipeline organizer the tools to clearly lay out a straightforward pipeline. It also gives the caller of the pipeline the tools needed to introspect it, end the pipeline early, etc. The pipeline pass writer will utilize the `Pipeline.Automation` module to create passes for the pipeline, inhabiting `Pipeline.Step.t`, and finally naming the pass, giving the pipeline organizer the `Pipeline.Step.named` type.

## Computational Meta Data: A Plan

The current Juvix pipeline already has a notion of meta information through the use of the `Feedback` monad. However, the current `Feedback` monad conflates feedback with the result of computation, and is not easily modifiable to add extra meta information. Thus we extend and break apart the `Feedback` monad as such:

```ocaml=
module Meta : sig
  type t = { feedback : Feedback.t; trace : Trace.t }
end

module ComputationResult : sig
  type 'a t =
    | Success of {meta : Meta.t; result : 'a}
    | Failure of {meta : Meta.t; partial_feedback : 'a option}
end
```

The `Meta.t` type is filled with the meta information from computation, namely `Feedback.t` and `Trace.t`. The `Feedback.t` is the actual feedback part of the `Feedback` monad, and the `Trace.t` is any traces we may have gotten from a piece of computation. The `Meta.t` type can be extended if further information should be captured.

Further, we have the new concept of `ComputationResult.t`, stemming from the `Failure` and `Success` cases of the old `Feedback` monad. However, there is a twist: if computation fails, we may still get a partial result.
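To make the shape of `ComputationResult.t` concrete, here is a minimal sketch of a `map` helper over results. The helper itself is purely illustrative and not part of the proposal proper; it assumes the constructors are exposed as written above.

```ocaml=
(* A hypothetical helper: mapping over a result transforms both a full
 * success and whatever partial result a failure managed to produce,
 * while leaving the meta information untouched. *)
let map (f : 'a -> 'b) : 'a ComputationResult.t -> 'b ComputationResult.t = function
  | ComputationResult.Success {meta; result} ->
      ComputationResult.Success {meta; result = f result}
  | ComputationResult.Failure {meta; partial_feedback} ->
      ComputationResult.Failure {meta; partial_feedback = Option.map f partial_feedback}
```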
## Cyclic Lists and Their Uses

Another important structure in the Berlin Pipeline is the concept of the circular list. We give this type stub to represent the idea:

```ocaml=
(** A recursive list represents the concept of a list with a chance to nest.
 * This allows us to get a tree-like structure,
 * where a pipeline step can contain more steps. *)
module RecursiveList : sig
  type 'a t
end

(** A circular list represents the idea of steps that can be recursive
 * on each other. *)
module CircularList : sig
  (* TODO : add a recursion function to the recursive bit, that tells
   * it how it's recursive. Perhaps it uses a predicate function
   * on the 'a type to determine what function to go to next. *)
  type 'a recursive_schema =
    | Recursive of 'a list
    | NonRecursive of 'a

  type 'a t = 'a recursive_schema RecursiveList.t
end
```

The `CircularList.t` is broken down into two parts. The first is a `RecursiveList.t`, which represents a list that can nest. This is very similar to the `Sexp.t` concept, except that instead of holding s-expressions it holds arbitrary data. Using the recursive list, we are able to construct a circular list which has the notion of cycles. The cycle behavior is determined by the `recursive_schema` type, which states how the contents are recursive on each other. The `recursive_schema` type should be refined in future specification.
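To make these stubs concrete, here is a minimal, hypothetical sketch of one possible representation, under the assumption that nesting is the only structure `RecursiveList.t` needs; the `to_list` helper is likewise hypothetical and merely flattens the structure.

```ocaml=
(* A sketch only; the real modules are expected to be refined as the
 * recursion schema is specified. *)
module RecursiveList = struct
  type 'a t =
    | Nil
    | Cons of 'a * 'a t
    | Nested of 'a t * 'a t  (* a nested group, followed by the rest *)
end

module CircularList = struct
  type 'a recursive_schema =
    | Recursive of 'a list
    | NonRecursive of 'a

  type 'a t = 'a recursive_schema RecursiveList.t

  (* Flatten the structure, ignoring how steps recurse on each other;
   * useful for, e.g., listing the names of every registered pass. *)
  let rec to_list (l : 'a t) : 'a list =
    match l with
    | RecursiveList.Nil -> []
    | RecursiveList.Cons (NonRecursive x, rest) -> x :: to_list rest
    | RecursiveList.Cons (Recursive xs, rest) -> xs @ to_list rest
    | RecursiveList.Nested (group, rest) -> to_list group @ to_list rest
end
```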
## A Note about Notation

In this proposal we will be using OCaml code for signatures. OCaml, as opposed to Haskell, was chosen because it has an in-language notion of modules, so the specification can be laid out more straightforwardly. This does mean that for many examples we will have `Sexp.t list` instead of `List.t Sexp.t` or even `[Sexp.t]`. Further, when we convert the OCaml functions to Haskell, we will turn `snake_case` into `camelCase`. Since OCaml lacks a notion of effects, modules may be given that represent the Haskell effects, and Haskell may be given to supplement the OCaml where it may be confusing.

## The Berlin Pipeline Step

### Purpose and Specification

The Step module serves as the base interface for any compiler step. The module is quite minimal, with only a few types along with a small set of operations. The main type of the module, `t`, is simply a function type: it takes a structure called `computational_input`, which holds more than any one pass needs, and gives back the result of the computation plus `Meta` data.

```ocaml=
module Step : sig
  ...
  type t = computational_input -> working_environment ComputationResult.t
  ...
end
```

We say that the pass takes more than it needs because `computational_input` is composed not only of the `context` and the `current_expression` we are trying to compile against:

```ocaml=
(** [env_or_sexp] expresses data that may be already added to the environment
 * or still in a base sexp form that requires no lookup *)
type env_or_sexp =
  | InEnv of NameSymbol.t
  | Sexp of Sexp.t

type computational_input = {
  language_data : working_environment;
  ...
}

type working_environment = {
  current_expression : env_or_sexp list;
  context : Sexp.t Context.t
}
```

but also of a surrounding environment, which contains information that should be put into the pass's working environment.

```ocaml=
type computational_input = {
  language_data : working_environment;
  surrounding_data : surrounding_environment;
}

type surrounding_environment = {
  current_step_name : NameSymbol.t option;
  meta_information : Meta.t
}
```

This `surrounding_environment` is composed of the `Meta` information discussed in the Computational Meta Data section, along with the name of the current step. The usefulness of the name will be discussed in a later section, but this environment should be able to grow without affecting passes which don't touch the extra information.

The `Step` module also has a notion of a `named` pass:

```ocaml=
module Step : sig
  type t = computational_input -> working_environment ComputationResult.t

  type named = {
    name : NameSymbol.t;
    func : t
  }

  val name_pass : t -> NameSymbol.t -> named
end
```

The `named` type simply allows us to attach a name to any given pipeline step function. This will serve as the type that `Pipeline.Environment.register_step` deals with, and it should be exported opaquely. Lastly, the `name_pass` function is the public constructor of this type.
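As an illustration, a trivial do-nothing pass could be named as follows. This is a hypothetical sketch: `NameSymbol.of_string` stands in for whatever constructor `NameSymbol` actually exposes.

```ocaml=
(* A hypothetical identity step: it succeeds immediately, passing the
 * language data through untouched and echoing back the incoming
 * meta information. *)
let identity_step : Step.t =
  fun input ->
    ComputationResult.Success
      { meta = input.surrounding_data.meta_information;
        result = input.language_data }

(* [NameSymbol.of_string] is assumed for illustration only. *)
let identity_named : Step.named =
  Step.name_pass identity_step (NameSymbol.of_string "Example.identity")
```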
### Rationale

Overly unconstrained types like the input type of `Step.t` should be met with criticism; however, in the case of the `Step` module this is necessary to avoid enforcing the `Pipeline.Environment`'s effect type upon the passes. Since this separation is needed, the `Step` module should not be programmed against directly, but rather through the `Pipeline.Automation` module, which will be able to make a pass work over a single `env_or_sexp` and a `Context.t Sexp.t`, along with the pass's custom environment.

## The Berlin Pipeline Proper

Now that we have `Pipeline.Step` properly specified for the passes, we can get to the infrastructure it allows us to create.

### Specification

#### Reiteration

Before we get to the `Environment` in which the majority of the compiler pipeline lives, we must first take a closer look at the `surrounding_environment` the pass gets.

```ocaml=
(** [surrounding_environment] serves as the minimum surrounding information
 * in the environment of the Pipeline Steps. *)
type surrounding_environment = {
  (* This will be taken as a read value *)
  (** The [current_step_name] represents the current running step name *)
  current_step_name : NameSymbol.t option;
  meta_information : Meta.t
}

(** [computational_input] is the actual data passed into the [Step.t] type.
 * The [language_data] field serves as the direct data that a pass
 * wishes to take, while [surrounding_data] serves as extra environment
 * constraints that one wants to include. *)
type computational_input = {
  language_data : working_environment;
  surrounding_data : surrounding_environment;
}
```

We can see the input type of the pass along with the `surrounding_environment` it carries. For this proposal the `surrounding_environment` is quite simple: it has `meta_information`, which contains the `Trace.t` and `Feedback.t` inside of it. This inclusion lets trace and feedback information be threaded through the passes, as the output type directly carries new versions of these values. It also has the `current_step_name`, which reflects the name of the currently running pass. We will see some application of this in the Automation section.

The important part of this `surrounding_environment` is that, through the output type of `Pipeline.Step.t`, we have enough data to fill the dynamic parts of the `Pipeline.Environment` itself.

#### Pipeline Environment

The pipeline type is defined as follows:

```ocaml=
module Environment : sig
  type t = {
    (** [information] serves as the public facing data to the environment
     * that passes can access *)
    information : computational_input;
    (* All other information is made to be exclusive to the Pipeline as
     * a whole, which the Steps do not bother with.
     * This consists of everything from how to deal with Traces between
     * passes to the entire pass infrastructure itself. *)

    (** [registered_pipeline] is the pipeline itself.
     * It features all the steps named and registered with the system *)
    registered_pipeline : Step.named CircularList.t;
    stopping_step : NameSymbol.t option
  }
  ...
end
```

It contains a few pieces of information. The first piece is `information`, which contains the same type that `Pipeline.Step.t` takes. This is considered the "dynamic" data of the pass and serves a few purposes:

1. It contains enough information to support directly calling any pass.
   - Combining this with the output of the pass, we can, with a little glue, directly construct a new `computational_input`.
2. It alleviates the passes from having to use any capability regarding `Environment.t` directly, by feeding it directly as input.
   - Through `Automation` this becomes part of the running environment of the pass, but we avoid the issue of the passes needing to use anything in `Environment.t` that would stop a pass being run in its own environment. This is crucial, as we can't run partial effects with capabilities.
3. It clearly separates the concerns of the pass from the concerns of the pipeline itself.

The next few pieces of data relate to the pipeline itself, and will be constructed fully from functions that make up the pipeline or wish to manipulate its execution. The first piece of data in this list is the `registered_pipeline`, which is what records the passes. The type is a `CircularList.t` over the `Step.named` structure we analyzed previously. The `CircularList.t` type was chosen because its recursive nature gives us a few advantages over a normal list of `Step.named`:

1. We get the same behavior as `Tasty` with its `TestTree`s:
   ```haskell=
   testGroup :: TestName -> [TestTree] -> TestTree
   ```
2. If we wish to support mutually recursive passes, then having the ability to explicitly state how these passes are circular is important.

We will state more on how `registered_pipeline` works when we get to the API function portion. Lastly, for now, we also have the value `stopping_step`, which simply states at which step we should stop running the compiler and return any working data that is left.

Although we only include two fields exclusive to the pipeline environment, the design is such that more fields and behaviors can be added without much hassle:

- One example inclusion is the ability to skip passes, if a particular backend does not wish to have a pass or wants special treatment in some way (see the sketch after this list).
- Another route is to have alternative passes that are mutually exclusive but converge upon the same points.
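As a purely hypothetical illustration of how such an extension might look, a `skip_steps` field could be added to the record, to be consulted by the evaluator before each step; nothing here is part of the proposal proper.

```ocaml=
(* Hypothetical extension sketch: an extra field for pass skipping. *)
type t = {
  information : computational_input;
  registered_pipeline : Step.named CircularList.t;
  stopping_step : NameSymbol.t option;
  (* steps a particular backend asks the evaluator to bypass *)
  skip_steps : NameSymbol.t list
}
```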
#### API Functions

With the environment out of the way, let us consider how one is supposed to use this environment and the functions upon it. The pipeline is intended to grow a single compile function, which we shall call `eval`. We do this by registering `Pipeline.Step.named` values into the environment. After we register all the steps, we can choose whether the pipeline stops early, introspect if some pass fails, etc. When it comes time to execute the pipeline, a `run` function will be given that takes the `Pipeline.Environment.t` value and starts executing the registered passes, stopping either when a pass fails, when the `stopping_step` says we should stop, or when we have successfully compiled the given form.

```ocaml=
(** [register_step] registers the pipeline function to the environment *)
val register_step : Step.named CircularList.t -> unit
```

`register_step` has the simple job of taking a named pass and adding it to the `registered_pipeline`. Note that we take a `CircularList.t` over this type simply to make the operation monoidal. This will be a core way one interacts with the pipeline. An example implementation could be something as simple as:

```haskell=
registerStep pass = modify @"registeredPipeline" (<> pass)
```

While the above function is useful, it has no way of helping us build groups of steps. For that we have:

```ocaml=
(** [def_pipeline_group] creates a named group of pipeline steps or a
 * nested grouping of pipeline steps *)
val def_pipeline_group :
  NameSymbol.t -> Step.named CircularList.t list -> Step.named CircularList.t
```

This function should remind one of the `testGroup` function used in `Tasty` tests, and it has the same behavior; we can then register the resulting group with `register_step`.

The next set of functions deals with stopping the pipeline early.

```ocaml=
val stop_at : NameSymbol.t -> unit

val stop_at_nothing : unit
```

`stop_at` tells the environment to simply stop when we reach the pass with the given name. `stop_at_nothing` resets the stopping behavior to none.

The next function in the environment is the actual `eval` function, which is responsible for running the code.

```ocaml=
(** [eval] is responsible for taking the environment, running it
 * to the desired point, and giving back what data is left.
 * The output type should be more carefully considered
 * and have many operations on it. *)
val eval : t -> computational_input
```

The `eval` function has a tough job, as it is responsible for taking the information in the `t` type and properly threading the calls to the passes. As mentioned in the doc comment above the function, the output type is not fully considered; for this proposal we chose the `computational_input` type.
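To make the intended behavior concrete, here is a rough, hypothetical sketch of the control flow `eval` could have inside `Environment`. It treats the pipeline as a flat list via the `to_list` helper sketched earlier, stops after the `stopping_step`, and elides both the recursion schema and how a `Failure`'s partial data should be surfaced.

```ocaml=
(* A hypothetical sketch only; the real [eval] must also respect the
 * recursion schema of the circular list, the staging of new forms,
 * and update [current_step_name] per step (all elided here). *)
let eval (env : t) : computational_input =
  (* glue the output of a step back into the input of the next one *)
  let next_input input result meta =
    { language_data = result;
      surrounding_data =
        { input.surrounding_data with meta_information = meta } }
  in
  let rec go input steps =
    match steps with
    | [] -> input
    | step :: rest ->
      (match step.Step.func input with
       | ComputationResult.Failure _ ->
         input (* a pass failed; return the working data we have *)
       | ComputationResult.Success {meta; result} ->
         let input = next_input input result meta in
         if env.stopping_step = Some step.Step.name
         then input (* the caller asked us to stop at this step *)
         else go input rest)
  in
  go env.information (CircularList.to_list env.registered_pipeline)
```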
This ends the core API of the `Pipeline`. More functions will be needed in regards to introspecting the output type, but these are the core functions needed to properly construct and change the core environment.

#### The Full API

```ocaml=
module Pipeline : sig
  (** [env_or_sexp] expresses data that may be already added to the environment
   * or still in a base sexp form that requires no lookup *)
  type env_or_sexp =
    | InEnv of NameSymbol.t
    | Sexp of Sexp.t

  (** [working_environment] serves as the input data relevant to any
   * pipeline step function. This is what is actively being processed
   * in the pipeline itself. *)
  type working_environment = {
    current_expression : env_or_sexp list;
    context : Sexp.t Context.t
  }

  (** [surrounding_environment] serves as the minimum surrounding information
   * in the environment of the Pipeline Steps. *)
  type surrounding_environment = {
    (* This will be taken as a read value *)
    (** The [current_step_name] represents the current running step name *)
    current_step_name : NameSymbol.t option;
    meta_information : Meta.t
  }

  (** [computational_input] is the actual data passed into the [Step.t] type.
   * The [language_data] field serves as the direct data that a pass
   * wishes to take, while [surrounding_data] serves as extra environment
   * constraints that one wants to include. *)
  type computational_input = {
    language_data : working_environment;
    surrounding_data : surrounding_environment;
  }

  module Step : sig
    (** The pipeline step takes the entire [computational_input] as input.
     * This is quite a wide type, in that it has data that a pass does not
     * directly want to handle.
     * Thus it is expected one works with the [Automation] module, to make
     * fulfilling both the [ComputationResult.t] and the input form bearable. *)
    (* May promote [working_environment] to [information_record] *)
    type t = computational_input -> working_environment ComputationResult.t

    type named = {
      name : NameSymbol.t;
      func : t
    }

    val name_pass : t -> NameSymbol.t -> named

    val name_of_pass : named -> NameSymbol.t
  end

  module Environment : sig
    type t = {
      (** [information] serves as the public facing data to the environment
       * that passes can access *)
      information : computational_input;
      (* All other information is made to be exclusive to the Pipeline as
       * a whole, which the Steps do not bother with.
       * This consists of everything from how to deal with Traces between
       * passes to the entire pass infrastructure itself. *)

      (** [registered_pipeline] is the pipeline itself.
       * It features all the steps named and registered with the system *)
      registered_pipeline : Step.named CircularList.t;
      stopping_step : NameSymbol.t option
    }

    (** [register_step] registers the pipeline function to the environment *)
    val register_step : Step.named CircularList.t -> unit

    (** [def_pipeline_group] creates a named group of pipeline steps or a
     * nested grouping of pipeline steps. *)
    val def_pipeline_group :
      NameSymbol.t -> Step.named CircularList.t list -> Step.named CircularList.t

    (** [stop_at] tells the environment to stop at a particular step when
     * running the environment. *)
    val stop_at : NameSymbol.t -> unit

    (** [stop_at_nothing] tells the environment to run the compiler fully. *)
    val stop_at_nothing : unit

    (** [eval] is responsible for taking the environment, running it
     * to the desired point, and giving back what data is left.
     * The output type should be more carefully considered
     * and have many operations on it. *)
    val eval : t -> computational_input

    (** [run] serves as the running point. The unit argument is a placeholder
     * for the monadic functions used to build up the environment;
     * the [t] is the given pre-built environment. *)
    val run : unit -> t -> computational_input

    val extract : unit -> t
  end
end
```

### Rationale

This structure was chosen to maximize the following:

1. Find a way to minimize the Haskell hassle.
2. Give a straightforward way to obsolete any `EasyPipeline` notions.
3. Give a flexible architecture which can satisfy our current and future needs.

With that said, there are some questions one may have in regards to the API.

#### Why A Registration System?

This is brought up in regards to why we do not take a more functional approach to composition; indeed, the following does not look functional!

```haskell=
eval = Pipeline.extract $ do
  Pipeline.registerStep
    (Pipeline.defPipelineGroup "Desugar"
      [ Desugar.condToIf
      , Desugar.ifToCond
      , ...])
  Pipeline.registerStep
    (Pipeline.defPipelineGroup "Contextify"
      [ Contextify.resolveInfix
      , ...])
  ...
```

This was chosen over the more functional [approach given in the idealized pipeline proposal](https://heliaxdev.github.io/juvix-docs/docs/overview/pipeline/idealized/) because it allows us to name each step and stop in the middle of any of them. The more functional approach would leave that structure ambiguous, behaving much like a `tagless final` system where the constructors fold onto themselves, leaving no room for interpreting the bare structure. Further, interest was given in programmatically moving passes around to properly see what ordering makes the most sense. The given registration system, along with an extension to talk about dependencies between passes, would allow this experimentation.
### Examples

```haskell=
module EasyPipeline where

passNames :: CircularList.t NameSymbol.t
passNames = fmap Step.nameOfPass (Pipeline.registeredPipeline Compiler.eval)

prettyPassNames :: IO ()
prettyPassNames = CircularList.prettyPrintList passNames

runToCondWithTrace = do
  Pipeline.stopAt "Desugar.if-to-case"
  -- Extra functions to trace what we care about
  Trace.enableRecursively ["Pass.cond-to-if"]

λ> Pipeline.run runToCondWithTrace Compiler.eval >>| Pipeline.runTrace
```

## The Berlin Pipeline Automatisierung

We have seen in the Step section that we have a general, consistent data type for a pipeline step; however, we also discovered that the type:

- is too large, having matters a pass should not worry about;
- has no environment effect, so the pass is expected to be in IO to get information.

Thus we propose the `Automation` module to alleviate these issues.

### Specification

#### Input and Output Types

To simplify matters to what a pass cares about, we transform the `computational_input` form below

```ocaml=
type env_or_sexp =
  | InEnv of NameSymbol.t
  | Sexp of Sexp.t

type working_environment = {
  current_expression : env_or_sexp list;
  context : Sexp.t Context.t
}

type computational_input = {
  language_data : working_environment;
  surrounding_data : surrounding_environment;
}
```

into

```ocaml=
type pass_argument = {
  current : Pipeline.env_or_sexp;
  context : Sexp.t Context.t
}

type simplified_pass_argument = {
  current : Sexp.t;
  context : Sexp.t Context.t
}
```

Notice how instead of working over a `Pipeline.env_or_sexp list` for the current expression, we worry only about a single current expression. We even have a more simplified version, `simplified_pass_argument`, which removes the `InEnv` possibility of the `Pipeline.env_or_sexp`. This allows a pass that simply does not care to ignore whether a `Sexp.t` is referenced in the environment, letting all attention be on the transformation. Further, the entire `surrounding_environment` is dropped from the signature. This information is not completely gone; instead we relegate it to the environment surrounding the pass, rather than having it be a direct input to the pass.

The output of the pass also moves away from returning a `working_environment ComputationResult.t`

```ocaml=
type working_environment = {
  current_expression : env_or_sexp list;
  context : Sexp.t Context.t
}

module ComputationResult : sig
  type 'a t =
    | Success of {meta : Meta.t; result : 'a}
    | Failure of {meta : Meta.t; partial_feedback : 'a option}
end
```

to a more streamlined type

```ocaml=
type t =
  | ProcessJob of process_job_no_env
  | UpdateJob of { new_context : Sexp.t Context.t; process : process_job }

type job = t

type stage = Current | FromTopToCurrent | Eval

type process_job = {
  current : Pipeline.env_or_sexp;
  new_forms : (stage * Pipeline.env_or_sexp) list
}

type process_job_no_env = {
  current : Sexp.t;
  new_forms : (stage * Sexp.t) list
}
```

What changes is that instead of giving back a list of current expressions, we have the idea of a `process_job`, to go along with the new input type taking a single expression. What this means is that for any pass which defines new top-level definitions to be consumed, instead of adding them to the `current_expression` list, we add them to `new_forms`, letting the Automation handle how this commits back to `current_expression`. The current expression stays as the current expression; thus the pass takes a `Pipeline.env_or_sexp` and gives back a potentially new `Sexp.t`, with any new definitions as `new_forms`.

Further, we introduce the idea of an `UpdateJob` versus a `ProcessJob`: instead of every pass that does not touch the `context` just giving back the current `context`, we can denote with `ProcessJob` that our pass has no changes to the context itself, while `UpdateJob` indicates that we have changed the context, so the changed context should be committed back. This also explains the difference between `process_job_no_env` and `process_job`: since a `ProcessJob` cannot touch the `Context`, it cannot change any reference to a name found in a `Pipeline.env_or_sexp`.

Lastly, we introduce the concept of staging the newly defined expressions through the `stage` type, which can take the form of `Current`, `FromTopToCurrent`, or `Eval`:

- `Current` acts exactly like adding any new expression to the `current_expression` list.
- `FromTopToCurrent` acts almost like `Current`, but runs all passes that have previously run, from the top to the current position, on the newly defined `Sexp.t`.
- `Eval` runs the entire compiler on the newly defined `Sexp.t`, updating the `Context` in the process, before we proceed with the `current` form.

This is best seen in action in the Examples section; a quick sketch of constructing these jobs follows.
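Here is a hypothetical sketch of the two job shapes a pass might return, assuming we are inside the `Automation` module; the labeled arguments stand in for real values a pass would compute.

```ocaml=
(* A pass that only rewrites its expression: no context change, and one
 * helper definition staged to be compiled [Current]ly alongside it. *)
let rewrite_job ~(rewritten : Sexp.t) ~(new_definition : Sexp.t) : job =
  ProcessJob { current = rewritten; new_forms = [(Current, new_definition)] }

(* A pass that changed the context: the updated context is committed back,
 * and no new forms are introduced. *)
let update_job ~(new_context : Sexp.t Context.t)
               ~(current : Pipeline.env_or_sexp) : job =
  UpdateJob { new_context; process = { current; new_forms = [] } }
```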
#### Environment Types and Effects

With the input and output types out of the way (`pass_argument -> job`), we can now talk about the surrounding environment that was alluded to earlier.

```ocaml=
(* This represents the monad type that we require for a pass *)
module type MonadEff = sig
  type 'a t

  type 'a output_eff

  (* Trace State effect *)
  val get_trace : unit -> Trace.t t
  val set_trace : Trace.t -> unit t

  (* Feedback State effect *)
  val get_feedback : unit -> Feedback.t t
  val set_feedback : Feedback.t -> unit t

  (* This is in all likelihood what is needed to set up the environment,
   * usurping the above effects for actually setting up the pass;
   * we leave the above effects for demonstrative purposes *)
  val has_env : unit -> Pipeline.surrounding_environment t
  val set_env : Pipeline.surrounding_environment -> unit t

  (* Comonad effect *)
  val run : 'a t -> 'a output_eff
end
```

This OCaml module type refers to the following Haskell signatures:

```haskell=
class Functor m => Comonad m where
  run :: m a -> a

type HasTrace m = HasState "trace" Trace.t m

type HasFeedback m = HasState "feedback" Feedback.t m

type HasMeta m = (HasFeedback m, HasTrace m)

type HasEnv m = (HasMeta m, HasReader "currentStepName" (Maybe NameSymbol.t) m)
```

What is important here is that the `has_env`/`HasEnv` effect entails all the data that is passed through the `surrounding_environment`, plus any pass-specific data a pass may want. This allows the automation tooling to automatically hook into the environment of the pass, converting a `job` into a `job ComputationResult.t` for any given environment a pass may care about. We can see this signature in action with the automation functions of this module; first, a sketch of a possible instance.
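To ground the module type, here is a minimal, hypothetical instance of the shape `MonadEff` describes: a state monad over the `surrounding_environment`, where the output effect is taken to be the same state function for brevity (a real instance would run in IO).

```ocaml=
(* A hypothetical [MonadEff]-shaped instance for illustration only. *)
module StateEff = struct
  type 'a t =
    Pipeline.surrounding_environment -> 'a * Pipeline.surrounding_environment

  type 'a output_eff = 'a t

  let get_trace () (env : Pipeline.surrounding_environment) =
    (env.meta_information.trace, env)

  let set_trace trace (env : Pipeline.surrounding_environment) =
    ((), { env with meta_information = { env.meta_information with trace = trace } })

  let get_feedback () (env : Pipeline.surrounding_environment) =
    (env.meta_information.feedback, env)

  let set_feedback feedback (env : Pipeline.surrounding_environment) =
    ((), { env with meta_information = { env.meta_information with feedback = feedback } })

  let has_env () (env : Pipeline.surrounding_environment) = (env, env)

  let set_env new_env _env = ((), new_env)

  (* the Comonad-like effect: running is the identity here *)
  let run action = action
end
```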
#### Automation Abstractions

```ocaml=
(* Type class in Haskell that dictates being able to extract to a step *)
module Runable : functor (M : MonadEff) -> sig
  type 'a t = 'a M.t

  (** [run] runs the given environment, extracting out a result with meta data
   * over the resulting value. *)
  val run : 'a t -> 'a ComputationResult.t M.output_eff

  (** [apply_simplified_pass] serves as a HOF that allows passes to have
   * a more simplified type, namely a function that takes a [pass_argument]
   * to an effectful result over [job] that determines how the pass should
   * be brought together.
   * Note that [Pipeline.computational_input -> Pipeline.working_environment t]
   * is an approximation of the [Step.t] type without the Meta information attached *)
  val apply_simplified_pass :
    (pass_argument -> job t) ->
    Pipeline.computational_input ->
    Pipeline.working_environment t

  (** [run_simplified_pass] simply combines the [run] function with
   * [apply_simplified_pass], to get the output effect, which corresponds to a
   * [Pipeline.Step.t] with an effect attached to it *)
  val run_simplified_pass :
    (pass_argument -> job M.output_eff) ->
    Pipeline.computational_input ->
    Pipeline.working_environment ComputationResult.t M.output_eff

  (** [simplify] allows a pass to ignore the fact that the expression coming in
   * may already be added to the [Context.t], acting as if it were just a
   * normal [Sexp.t] being passed in. *)
  val simplify :
    (simplified_pass_argument -> job M.output_eff) ->
    (pass_argument -> job M.output_eff)
end
```

This roughly corresponds to the Haskell signatures:

```haskell=
class HasExtract a m | a -> m where
  extract :: a x -> m (ComputationResult.t x)

data Automation.PassArgument = PassArg
  { context :: Context.t Sexp.t,
    current :: Sexp.t
  }

transformOutputType :: [Automation.t] -> Pipeline.WorkingEnvironment

-- we share the one m monad here for the function
-- because we will extract it after we run the m effect
applySimplifiedPass ::
  (HasTrace m, HasMeta m) =>
  (Automation.PassArgument -> m Automation.t) ->
  -- These form Pipeline.Step.t for some m over the
  -- output modulo the ComputationResult.t over it
  Pipeline.ComputationalInput ->
  m Pipeline.WorkingEnvironment

runSimplifiedPass ::
  HasExtract _a m =>
  (Automation.PassArgument -> m Automation.t) ->
  Pipeline.ComputationalInput ->
  m (ComputationResult.t Pipeline.WorkingEnvironment)
runSimplifiedPass f = extract . applySimplifiedPass f
```

We can see that `extract` (or `run`, as the OCaml version calls it) takes the environment needed to run the pass, runs it, and gives us back a `ComputationResult.t` over the output type, inside the output effect. The `HasMeta` and `HasTrace` effects are needed in practice to support this. Note that we also get an output monad type: this is needed as the pass may invoke IO actions, and thus we really get back an `output_eff`ect from the pass itself.

`applySimplifiedPass` reflects the underlying purpose of the `Automation` module. Namely, a pass has become as simple as `(Automation.PassArgument -> m Automation.t)`, where we take only what we care about and return an effect over the `job` type. The goal of this function then becomes the following:

1. Splice the `surrounding_environment` into the environment of the pass.
2. Split the list of expressions apart and invoke the pass once per expression.
3. Take the resulting list of `job`/`Automation.t` values and give back a `Pipeline.WorkingEnvironment`.

Since the module has enough information via the constraints, this is fairly easy to achieve. This leads into `runSimplifiedPass`, which simply composes `applySimplifiedPass` and `extract`/`run`. We can see here how these two functions combine to fulfill the signature of `Pipeline.Step.t`, which we repeat here:

```ocaml=
module Step : sig
  type t = computational_input -> working_environment ComputationResult.t
```

The `simplify` function allows a pass not to worry about whether its argument comes from the context. This is useful for passes which do simple transformations but run after the point at which functions show up in the environment, or for passes before that point, where we know the terms don't show up in the environment.
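For intuition, here is a hypothetical sketch of what `simplify` might do, assuming we are inside the `Automation` module; `Context.lookup_sexp` is a made-up helper standing in for however the context resolves a name to its stored s-expression.

```ocaml=
(* A sketch only: resolve [InEnv] names before handing the expression to
 * the simplified pass. The result type is left polymorphic, so it covers
 * [job M.output_eff] as a special case. *)
let simplify (pass : simplified_pass_argument -> 'result)
    : pass_argument -> 'result =
  fun ({current; context} : pass_argument) ->
    let sexp =
      match current with
      | Pipeline.Sexp sexp -> sexp
      | Pipeline.InEnv name -> Context.lookup_sexp name context (* assumed *)
    in
    pass { current = sexp; context }
```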
#### Full API

```ocaml=
(** The job of [Automation] is to make the step function more amenable to
 * writing passes. It is not ergonomic to take extra information one may
 * not care about. *)
module Automation : sig
  type stage = Current | FromTopToCurrent | Eval

  type process_job = {
    current : Pipeline.env_or_sexp;
    new_forms : (stage * Pipeline.env_or_sexp) list
  }

  type process_job_no_env = {
    current : Sexp.t;
    new_forms : (stage * Sexp.t) list
  }

  type t =
    | ProcessJob of process_job_no_env
    | UpdateJob of { new_context : Sexp.t Context.t; process : process_job }

  type job = t

  (** [pass_argument] is the argument to a pipeline processing function;
   * namely, we wrap the current expression and the context into a single record *)
  type pass_argument = {
    current : Pipeline.env_or_sexp;
    context : Sexp.t Context.t
  }

  (** [simplified_pass_argument] is [pass_argument] with the current
   * expression already resolved to a plain [Sexp.t] *)
  type simplified_pass_argument = {
    current : Sexp.t;
    context : Sexp.t Context.t
  }

  (* This represents the monad type that we require for a pass *)
  module type MonadEff = sig
    type 'a t

    type 'a output_eff

    (* Trace State effect *)
    val get_trace : unit -> Trace.t t
    val set_trace : Trace.t -> unit t

    (* Feedback State effect *)
    val get_feedback : unit -> Feedback.t t
    val set_feedback : Feedback.t -> unit t

    (* combining the two effects above *)
    val get_meta : unit -> Meta.t t
    val set_meta : Meta.t -> unit t

    (* This is in all likelihood what is needed to set up the environment,
     * usurping the above effects for actually setting up the pass;
     * we leave the above effects for demonstrative purposes *)
    val has_env : unit -> Pipeline.surrounding_environment t
    val set_env : Pipeline.surrounding_environment -> unit t

    (* Comonad effect *)
    val run : 'a t -> 'a output_eff
  end

  val transform_output_type : job list -> Pipeline.working_environment

  (* Type class in Haskell that dictates being able to extract to a step *)
  module Runable : functor (M : MonadEff) -> sig
    type 'a t = 'a M.t

    (** [run] runs the given environment, extracting out a result with meta data
     * over the resulting value. *)
    val run : 'a t -> 'a ComputationResult.t M.output_eff

    (** [apply_simplified_pass] serves as a HOF that allows passes to have
     * a more simplified type, namely a function that takes a [pass_argument]
     * to an effectful result over [job] that determines how the pass should
     * be brought together.
     * Note that [Pipeline.computational_input -> Pipeline.working_environment t]
     * is an approximation of the [Step.t] type without the Meta information attached *)
    val apply_simplified_pass :
      (pass_argument -> job t) ->
      Pipeline.computational_input ->
      Pipeline.working_environment t

    (** [run_simplified_pass] simply combines the [run] function with
     * [apply_simplified_pass], to get the output effect, which corresponds to a
     * [Pipeline.Step.t] with an effect attached to it *)
    val run_simplified_pass :
      (pass_argument -> job M.output_eff) ->
      Pipeline.computational_input ->
      Pipeline.working_environment ComputationResult.t M.output_eff

    (** [simplify] allows a pass to ignore the fact that the expression coming in
     * may already be added to the [Context.t], acting as if it were just a
     * normal [Sexp.t] being passed in. *)
    val simplify :
      (simplified_pass_argument -> job M.output_eff) ->
      (pass_argument -> job M.output_eff)
  end
end
```

### Examples

It would be remiss of us not to include any examples in this demonstration, so let us model a simple pass that separates an ADT sum type declaration from the constructors it may have.
Let our running example be:

```lisp=
;; type foo = Bar int int | Baz int int
(:type foo ()
  (Bar int int)
  (Baz int int))
```

Let us first try to compile this down to:

```lisp=
(:declare-type foo ())
(:constructor Bar foo int int)
(:constructor Baz foo int int)
```

We can use the new pipeline like so:

```haskell=
deconstructPass :: Pipeline.Step.Named
deconstructPass =
  Step.namePass
    (Automation.runSimplifiedPass deconstructType)
    "Desugar.deconstruct-type"

deconstructType Automation.PassArg {current} =
  Trace.with "Desugar.deconstruct-type" [show current] $
    Sexp.foldSearchPredWithExtra (Automation.simplify f) (== Structure.typeName) current
      |> \Sexp.Extra {data, extra} ->
           Automation.ProcessJob {current = data, newForms = extra}
  where
    f car cdr =
      Trace.with "Desugar.deconstruct-type-pass" [show car, show cdr] $
        case Structure.toType (car Sexp.:> cdr) of
          Just typ ->
            let extraData =
                  (typ ^. body)
                    >>= (\case
                           Structure.Sum {name, arguments} ->
                             Structure.Constructor name (typ ^. name) arguments |> pure
                           _ -> [])
                    >>| \x -> (Automation.Current, Structure.fromConstructor x)
                currentForm =
                  Structure.DeclareType (typ ^. name) (typ ^. arguments)
                    |> Structure.fromDeclareType
             in Structure.Extra {data = currentForm, extra = extraData}
          Nothing ->
            -- we get back a Failure on the ComputationResult
            throw @"failure" (ComputationResult.InvalidForm (car Sexp.:> cdr))
```

`deconstructPass` is the pass itself, which we can directly register; the implementation is held in the function `deconstructType`. We show in gory detail precisely how this pass works: we search for a certain structure using the normal `Sexp` functions, and once it's found we run our pass, which we call `f`.

`f` constructs two pieces of data: the current form to return, along with any new forms defined from it. We construct the extra forms by checking whether the bodies are sum types; if they are, then each is a constructor. We then take the constructors and say that we will compile them `Current`ly, along with the `declare-type` form. Note that if we wanted to compile these ahead of time, we could simply replace `Current` with `Eval`; in this case, it makes more sense to compile them `Current`ly instead of ahead of time. As for the current form, it gets transformed into a generic declaration with the arguments intact, and given back as the current expression. Finally, `FromTopToCurrent` uses the `current_step_name` discussed in the Pipeline section to run the compiler up until the current point on a new form, and then work with it `Current`ly alongside the rest.

In 25 lines we have an entire new compiler pass! That may seem like a lot of lines; however, most of them are simply conversions into type-safe versions on which to do processing, or shifts from one data type to another.
## Berlin Pipeline: Extensions

### Injection Passes

In short, injection passes are a way to register pipeline passes that run on every step: both passes that take a singular `env_or_sexp` through `Automation`, and passes that ought to be run before and after the `Step.t` type.

To view the issue for the inspiration and state of this extension, please read https://linear.app/anoma/issue/JUVIX-939/injection-passes

The signatures laid out to do this work are as follows:

```ocaml=
open Pipeline

val Env.register_before_pass : Automation.pass_argument -> m Automation.job

val Env.register_after_pass : Automation.pass_argument -> m Automation.job

val Env.register_before_pass_entire_list : computational_input -> m computational_output

val Env.register_after_pass_entire_list : computational_input -> m computational_output
```

The `Env.register_*_pass` variants run on individual s-expressions, before or after the pass itself. When one runs `Automation.applySimplifiedPass`, that function will automatically apply the registered functions. The `Env.register_*_pass_entire_list` variants run before or after each step at the `eval` level; thus `eval` will handle running these registered steps before and after each step is called.

The implementation of this requires tweaking where functions are stored, since we want to change `surrounding_environment` to be the following:

```ocaml=
type around = Before | After

type surrounding_environment = {
  (* This will be taken as a read value *)
  (** The [current_step_name] represents the current running step name *)
  current_step_name : NameSymbol.t option;
  meta_information : Meta.t;
  on_single_pass :
    (around * (Automation.pass_argument -> Automation.pass_argument aroundIO)) list
}

type around_env = {
  trace : Trace.t;
  feedback : Feedback.t;
}

type 'a aroundIO = MinIO of { run : 'a (IO.t around_env State.t) Except.t }
```

Since this requires values in `Pipeline` to have information about `Pipeline.Automation`, we must remove all data declarations from `Pipeline.Automation`, put them in `Pipeline`, and re-export them from `Pipeline.Automation`.

<!--
## A Breakdown of Tasks

| Part                           | Person   | Time |
|:------------------------------ |:-------- |:----:|
| Library Creation               | April    |  2H  |
| Refactoring Feedback           | ????     |  4H  |
| Circular List                  | Terrence |  7H  |
| Meta Setup                     | ????     |  3H  |
| Berlin Step                    | April    |  3H  |
| Berlin Pipeline Proper         | ????     | 20H  |
| Berlin Pipeline Automat        | ????     | 10H  |
| Refactoring Frontend to Step   | ????     |  3H  |
| Refactoring Core to Step       | ????     |  7H  |
| Refactoring Pipeline to Berlin | ????     | 15H  |
-->