
Proposal: The Berlin Pipeline

tags: Juvix juvix-project

In this document we outline the Berlin Pipeline. The document is broken up into three core chunks along with a miscellaneous section that outlines helper structures which are integral to the smooth operation of the pipeline.

These four sections are

  1. The Berlin Pipeline Step
  2. The Berlin Pipeline Proper
  3. The Berlin Pipeline Automatisierung
  4. The Berlin Pipeline Extensions

The fourth section covers extensions that have made it into the codebase and may alter the shape of what is written in the first three sections.

The following sections prop up the understanding of the model:

  1. Motivation
  2. Das Modell
  3. Computational Meta Data: A Plan
  4. Cyclic Lists and Their Uses
  5. A Note about Notation

The full code that backs the proposal can be found here.

The last section of the document covers the expected time frame and how the tasks should be distributed.

Motivation

One may ask why we should even begin the Berlin Pipeline; after all, we can currently get LLVM, Michelson, and Plonk to compile a term. However, this does not take into account the following properties:

  1. Having the ability to stop between any step to see the results of a certain phase
  2. Proper feedback to the compiler through the steps
    • The pass simply returns an either; it does not have the capacity to return the types
  3. A consistent environment to thread through all passes
  4. A consistent environment to compile not just a term, but a list of definitions
  5. A consistent environment between compilation passes.

Out of these issues, only issue 2 is solvable with the current architecture.

This is because the pipeline has the following structure:

```haskell
type Pipeline = Feedback.FeedbackT [] [Char] IO

class HasBackend b where
  type Ty b = ty | ty -> b
  type Val b = val | val -> b
  type Err b = e | e -> b

  stdlibs :: b -> [FilePath]
  stdlibs _ = []

  -- | Parse juvix source code passing a set of libraries explicitly to have them in scope
  toML' :: [FilePath] -> b -> Text -> Pipeline [(NameSymbol.T, [Types.TopLevel])]

  toSexp :: b -> [(NameSymbol.T, [Types.TopLevel])] -> Pipeline (Context.T Sexp.T Sexp.T Sexp.T)

  toHR ::
    (Show (Ty b), Show (Val b)) =>
    Param.Parameterisation (Ty b) (Val b) ->
    Context.T Sexp.T Sexp.T Sexp.T ->
    Pipeline (Core.RawGlobals HR.T (Ty b) (Val b))

  toIR ::
    Core.RawGlobals HR.T (Ty b) (Val b) ->
    Pipeline (Core.PatternMap Core.GlobalName, Core.RawGlobals IR.T (Ty b) (Val b))

  toErased ::
    Constraints b =>
    Param.Parameterisation (Ty b) (Val b) ->
    (Core.PatternMap Core.GlobalName, Core.RawGlobals IR.T (Ty b) (Val b)) ->
    Pipeline (ErasedAnn.AnnTermT (Ty b) (Val b))

  typecheck ::
    Context.T Sexp.T Sexp.T Sexp.T ->
    Pipeline (ErasedAnn.AnnTermT (Ty b) (Val b))

  compile ::
    FilePath ->
    ErasedAnn.AnnTermT (Ty b) (Val b) ->
    Pipeline ()
```

Starting from toSexp we get the Pipeline type, which is just an alias for Feedback. Thus no environment can be shared. However, even if we extend the Pipeline type with context information, which context do we share? All the passes up to toHR are compatible with the Context.T type, but at that point Core uses its own hash table structure. Thus, to properly fix this situation we need a consistent type to thread between the backends.

Further, we can see with this structure that the details of each pass are not expressed. In fact, toSexp hides over 7 passes that we have no way of stopping between unless we reconstruct the pipeline. Hence, the Easy module explicitly repeats this code. Thus we need some kind of structure that allows us to talk about passes and sub-passes.

The Berlin Pipeline seeks to unify the type of a step and to build up the type of the pipeline itself. Through this structure we can get all 5 of the missing properties and have a principled way of talking about what a pass is. This unification also allows tooling to be built around passes, making it easy to create a new pass and place it in the pipeline.

Das Modell

The model of the new pipeline is laid out in the following way:

  1. There exists the Berlin Step. This notion will be carried out in the Pipeline.Step module and Pipeline.Step.t will serve as the type any compiler step inhabits.
  2. The Berlin Pipeline Proper refers to the Pipeline.Environment module, with the Pipeline.Environment.t being the constructed compiler pipeline itself.
  3. The Berlin Automatisierung concept refers to the Pipeline.Automation module which has functions to make inhabiting Pipeline.Step easier.

In this model, Pipeline.Environment will give the pipeline organizer the tools to clearly lay out a straightforward pipeline. It will also give the caller of the pipeline the tools needed to introspect it, end the pipeline early, etc.

The pipeline pass writer will utilize the Pipeline.Automation module to create passes for the pipeline, inhabiting Pipeline.Step.t and finally naming the pass, giving the pipeline organizer a Pipeline.Step.named value.

Computational Meta Data: A Plan

The current Juvix pipeline already has a notion of meta information through the use of the Feedback monad. However, the current Feedback monad conflates feedback with the result of computation, and it is not easily modifiable to add extra meta information.

Thus we extend and break apart the Feedback monad as follows:

```ocaml
module Meta : sig
  type t = { feedback : Feedback.t; trace : Trace.t }
end

module ComputationResult : sig
  type 'a t =
    | Success of { meta : Meta.t; result : 'a }
    | Failure of { meta : Meta.t; partial_feedback : 'a option }
end
```

The Meta.t type is filled with the meta information from computation, namely Feedback.t and Trace.t. The Feedback.t is the actual feedback part of the Feedback monad, and the Trace.t holds any traces we could have gotten from a piece of computation. The Meta.t type could be extended if further information should be captured.

Further, we have the new concept of ComputationResult.t, stemming from the Failure and Success cases of the old Feedback monad. However, there is a twist: even if computation fails, we may get back a partial result.
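As a quick illustration, here is a hedged sketch (the helper names are hypothetical) of how a pass might inhabit this type, producing a Success carrying its meta information, or a Failure that still carries a partial result:

```ocaml
(* A hypothetical sketch: wrapping a pass result in [ComputationResult.t].
 * [succeeded] reports a finished value; [failed_with_partial] reports a
 * failure that may still carry a partially compiled value. *)
let succeeded (meta : Meta.t) (sexp : Sexp.t) : Sexp.t ComputationResult.t =
  ComputationResult.Success { meta; result = sexp }

let failed_with_partial (meta : Meta.t) (partial : Sexp.t option) :
    Sexp.t ComputationResult.t =
  ComputationResult.Failure { meta; partial_feedback = partial }
```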

Cyclic Lists and Their Uses

Another important structure in the Berlin Pipeline is the concept of the circular list.

We give this type stub to represent the idea:

```ocaml
(** A Recursive list represents the concept of a list with a chance to nest.
  * This allows us to get a tree-like structure,
  * where a pipeline step can contain more steps *)
module RecursiveList : sig
  type 'a t
end

(** A circular list represents the idea of steps that can be recursive
  * on each other *)
module CircularList : sig
  (* TODO : add a recursion function to the recursive bit, that tells
   * it how it's recursive. Perhaps it uses a predicate function
   * on the 'a type to determine what function to go to next. *)
  type 'a recursive_schema =
    | Recursive of 'a list
    | NonRecursive of 'a

  type 'a t = 'a recursive_schema RecursiveList.t
end
```

The CircularList.t is broken down into two parts. The first is a RecursiveList.t, which represents a list that can recurse. This is very similar to the Sexp.t concept, except that instead of holding s-expressions it holds arbitrary data.

Using the recursive list, we are able to construct a circular list which has the notion of cycles. The cycle behavior is determined by the recursive_schema type which states how the contents are recursive on each other. The recursive_schema type should be refined in future specification.
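As a hedged sketch of the intent, assuming a hypothetical RecursiveList.of_list constructor and using plain strings in place of real passes, a pipeline fragment with two mutually recursive passes might be laid out as:

```ocaml
(* Three linear-looking entries, the middle one marking two passes that
 * recurse on each other. [RecursiveList.of_list] is assumed here. *)
let pipeline_shape : string CircularList.t =
  RecursiveList.of_list
    [ CircularList.NonRecursive "desugar";
      CircularList.Recursive [ "infix-resolution"; "symbol-resolution" ];
      CircularList.NonRecursive "contextify" ]
```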

A Note about Notation

In this proposal we will be using OCaml code for signatures. OCaml was chosen over Haskell as it has a notion of modules in the language, so the specification can be laid out more straightforwardly.

This does mean that for many examples we will have Sexp.t list instead of List.t Sexp.t or even [Sexp.t].

Further when we convert the OCaml functions to Haskell, we will turn snake_case into camelCase.

Since OCaml lacks notions of effects, modules may be given that represent the Haskell effects, and Haskell may be given to supplement the OCaml where it may be confusing.

The Berlin Pipeline Step

Purpose and Specification

The Step Module serves as the base interface for any compiler step. The module is quite minimal, with only a few types along with a small set of operations.

The main type of the module, t, is simply a function type. It takes a structure called computational_input, which has more than any single pass needs, and gives back the result of the computation plus meta-data.

```ocaml
module Step : sig
  ...
  type t = computational_input -> working_environment ComputationResult.t
  ...
end
```

We say that the pass takes more than it needs because not only is computational_input composed of the context and the current_expression we are trying to compile against:

```ocaml
(** [env_or_sexp] expresses data that may be already added to the environment
  * or are still in a base sexp form that requires no lookup *)
type env_or_sexp =
  | InEnv of NameSymbol.t
  | Sexp of Sexp.t

type computational_input = {
  language_data : working_environment;
  ...
}

type working_environment = {
  current_expression : env_or_sexp list;
  context : Sexp.t Context.t
}
```

but also a surrounding environment, which contains information that should be put into the pass's working environment.

```ocaml
type computational_input = {
  language_data : working_environment;
  surrounding_data : surrounding_environment;
}

type surrounding_environment = {
  current_step_name : NameSymbol.t option;
  meta_information : Meta.t
}
```

This surrounding_environment is composed of the meta information discussed in the Computational Meta Data section, along with the name of the current step. The usefulness of the name will be discussed in a later section, but this environment should be able to grow without affecting passes which don't touch the extra information.

The Step module also has a notion of a named pass:

```ocaml
module Step : sig
  type t = computational_input -> working_environment ComputationResult.t

  type named = { name : NameSymbol.t; func : t }

  val name_pass : t -> NameSymbol.t -> named
end
```

The named type simply allows us to attach a name to any given Pipeline Step function. This will serve as the type that Pipeline.Environment.register_step will deal with. This type should be exported opaquely.

Lastly, the name_pass function is a public constructor of this type.
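As a usage sketch, assuming a hypothetical step function desugar_cond that inhabits Step.t (and treating the string literal as a NameSymbol.t for brevity):

```ocaml
(* Wrap a raw step function into a pass the pipeline can register. *)
let cond_pass : Step.named =
  Step.name_pass desugar_cond "Desugar.cond-to-if"
```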

Rationale

Overly unconstrained types like the input type of Step.t should be met with criticism; however, in the case of the Step module this is necessary to avoid enforcing the Pipeline.Environment's effect type upon the passes.

Since this is a needed separation, the Step module should not be programmed against directly, but rather through the Pipeline.Automation module, which makes a pass work over a single env_or_sexp and a Sexp.t Context.t along with the pass's custom environment.

The Berlin Pipeline Proper

Now that we have Pipeline.Step properly specified for the passes, we can get to the infrastructure it allows us to create.

Specification

Reiteration

Before we get to the Environment in which the majority of the compiler pipeline lives, we must first take a closer look at the surrounding_environment that a pass receives.

```ocaml
(** [surrounding_environment] serves as the minimum surrounding information
  * in the environment of the Pipeline Steps. *)
type surrounding_environment = {
  (* This will be taken as a read value *)
  (** The [current_step_name] represents the current running step name *)
  current_step_name : NameSymbol.t option;
  meta_information : Meta.t
}

(** [computational_input] is the actual data passed into the [Step.t] type.
  * The [language_data] field serves as the direct data a pass wishes to
  * take, while [surrounding_data] serves as extra environment
  * constraints that one wants to include. *)
type computational_input = {
  language_data : working_environment;
  surrounding_data : surrounding_environment;
}
```

We can see the input type of the pass along with the surrounding_environment it carries. For this proposal the surrounding_environment is quite simple: it has meta_information, which contains the Trace.t and Feedback.t inside of it. This inclusion just lets Trace and Feedback information be threaded through the passes, as the output type directly has new versions of these values. It also has the current_step_name, which reflects the name of the currently running pass. We will see some application of this in the Automation section.

The important part of this surrounding_environment is that through the output type of Pipeline.Step.t, we can have enough data to fill the dynamic parts of the Pipeline.Environment itself.

Pipeline Environment

The Pipeline Type is defined as follows

```ocaml
module Environment : sig
  type t = {
    (** [information] serves as the public facing data to the environment
      * that passes can access *)
    information : computational_input;
    (* All other information is made to be exclusive to the Pipeline as
     * a whole; the Steps do not bother with it.
     * This consists of information like how to deal with Traces between passes
     * to the entire pass infrastructure itself. *)
    (** [registered_pipeline] is the pipeline itself.
      * It features all the steps named and registered with the system *)
    registered_pipeline : Step.named CircularList.t;
    stopping_step : NameSymbol.t option
  }
  ...
end
```

It contains a few pieces of information.

The first piece is the information field, which contains the same type that Pipeline.Step.t takes.

This is considered the "Dynamic" data of the pass and serves a few purposes:

  1. It contains enough information to support directly calling any pass.
    • Combining this with the output of the pass we can, with a little glue, directly construct a new computational_input (see the sketch after this list).
  2. It relieves the passes from having to use any capability regarding Environment.t directly, by feeding it directly as input.
    • This will, through Automation, become part of the running environment of the pass directly, but we avoid the issue of the passes needing to use anything in Environment.t that would stop them being run in their own environment. This is crucial, as we can't run partial effects with Capabilities.
  3. It clearly separates the concerns of the pass and the concerns of the pipeline itself.
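To make the glue of point 1 concrete, here is a hedged sketch using the records defined so far; returning None on Failure is a choice of this sketch, not something the specification mandates:

```ocaml
(* Combine the stored input with a pass's output to build the input of
 * the next pass, threading the new meta information along. *)
let next_input (env : Environment.t)
    (output : working_environment ComputationResult.t) :
    computational_input option =
  match output with
  | ComputationResult.Success { meta; result } ->
      Some
        { language_data = result;
          surrounding_data =
            { env.information.surrounding_data with meta_information = meta } }
  | ComputationResult.Failure _ -> None
```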

The next few pieces of data relate to the pipeline itself and will be constructed fully from functions that make up the pipeline or wish to manipulate the execution of the pipeline.

The first piece of data in this list is the registered_pipeline, which records the passes.

The type is a CircularList.t over the Step.named structure we analyzed previously. The CircularList.t type was chosen because its recursive nature gives us a few advantages over a normal list of Step.named:

  1. We get the same behavior as Tasty with its TestTree:
     testGroup :: TestName -> [TestTree] -> TestTree
  2. If we wish to support mutually recursive passes, then having the ability to explicitly state how these passes are circular is important.

We will state more on how registered_pipeline works when we get to the API function portion.

Lastly for now, we also have the value stopping_step, which simply states at which step we should stop running the compiler and return whatever working data is left.

Although we only include two fields exclusive to the pipeline environment, the design is such that more fields and behaviors can easily be added without much hassle.

  • An example of an inclusion is the ability to skip passes if a particular backend does not wish to have a pass or wants special treatment in some way.
  • Another route is to have Alternative passes that are mutually exclusive but converge upon the same points.

API Functions

With the environment out of the way, let us consider how one is supposed to use this environment and the functions on the environment.

The pipeline is intended to grow a single compile function, which we shall call eval. We do this by registering the Pipeline.Step.named values into the environment. After we register all the steps, we can choose whether the pipeline stops early, introspect if some pass fails, etc. When it comes time to execute the pipeline, a run function will be given that takes the Pipeline.Environment.t value and starts executing the registered passes, stopping either when a pass fails, when the stopping_step says we should stop, or when we have successfully compiled the given form.

```ocaml
(** [register_step] registers the pipeline function to the environment *)
val register_step : Step.named CircularList.t -> unit
```

register_step has the simple job of taking a named pass and adding it to the registered_pipeline. Note that we take a CircularList.t over this type simply to make the operation monoidal. This will be a core way one interacts with the pipeline. An example implementation could be something as simple as:

```haskell
registerStep pass = modify @"registeredPipeline" (<> pass)
```

While the above function is useful, it has no way of helping us build groups of steps. For that we have

```ocaml
(** [def_pipline_group] creates a named group of pipeline steps or a
  * nested grouping of pipeline steps *)
val def_pipline_group :
  NameSymbol.t -> Step.named CircularList.t list -> Step.named CircularList.t
```

This function should remind one of the testGroup function used in Tasty tests. It has the same behavior; we can then register the resulting group with register_step.
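A brief usage sketch, assuming hypothetical named passes cond_to_if and if_to_cond, and an assumed CircularList.init that lifts a single pass into a CircularList.t:

```ocaml
(* Group two desugaring passes under one name, then register the group. *)
let desugar_group : Step.named CircularList.t =
  def_pipline_group "Desugar"
    [ CircularList.init cond_to_if; CircularList.init if_to_cond ]

let () = register_step desugar_group
```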

The next set of functions deal with talking about stopping the pipeline early.

```ocaml
val stop_at : NameSymbol.t -> unit

val stop_at_nothing : unit
```

stop_at tells the environment to simply stop when we reach the pass with the given name; stop_at_nothing resets the stopping behavior to none.
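A small usage sketch (the pass name is hypothetical, and the string literal again stands in for a NameSymbol.t):

```ocaml
(* Halt the pipeline once the named pass has been reached ... *)
let () = stop_at "Desugar.if-to-cond"

(* ... and later reset the behavior so the compiler runs fully. *)
let () = stop_at_nothing
```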

The next function in the environment is the actual eval function which is responsible for running the code.

```ocaml
(** [eval] is responsible for taking the environment, running it
  * to the desired point and giving back what data is left.
  * The output type should be more carefully considered
  * and have many operations on it. *)
val eval : t -> computational_input
```

The eval function has a tough job, as it is responsible for taking the information in the t type and properly threading the calls to the passes.

As mentioned in the doc comment above the function, the output type is not fully considered. For this proposal we chose the computational_input type.

This ends the core API of the Pipeline; more functions will be needed for introspecting the output type, but these are the core functions to properly construct and change the core environment.

The Full API

```ocaml
module Pipeline : sig
  (** [env_or_sexp] expresses data that may be already added to the environment
    * or are still in a base sexp form that requires no lookup *)
  type env_or_sexp =
    | InEnv of NameSymbol.t
    | Sexp of Sexp.t

  (** [working_environment] serves as the input data relevant to any
    * pipeline step function. This is what is actively being processed
    * in the pipeline itself. *)
  type working_environment = {
    current_expression : env_or_sexp list;
    context : Sexp.t Context.t
  }

  (** [surrounding_environment] serves as the minimum surrounding information
    * in the environment of the Pipeline Steps. *)
  type surrounding_environment = {
    (* This will be taken as a read value *)
    (** The [current_step_name] represents the current running step name *)
    current_step_name : NameSymbol.t option;
    meta_information : Meta.t
  }

  (** [computational_input] is the actual data passed into the [Step.t] type.
    * The [language_data] field serves as the direct data a pass wishes to
    * take, while [surrounding_data] serves as extra environment
    * constraints that one wants to include. *)
  type computational_input = {
    language_data : working_environment;
    surrounding_data : surrounding_environment;
  }

  module Step : sig
    (** The pipeline step takes the entire [computational_input] as input.
      * This is quite a wide type in that it has data that a pass does not
      * directly want to handle.
      * Thus it is expected one works with the [Automation] module, to make
      * fulfilling both the [ComputationResult.t] and the input form bearable. *)
    (* May promote [working_environment] to [information_record] *)
    type t = computational_input -> working_environment ComputationResult.t

    type named = { name : NameSymbol.t; func : t }

    val name_pass : t -> NameSymbol.t -> named

    val name_of_pass : named -> NameSymbol.t
  end

  module Environment : sig
    type t = {
      (** [information] serves as the public facing data to the environment
        * that passes can access *)
      information : computational_input;
      (* All other information is made to be exclusive to the Pipeline as
       * a whole; the Steps do not bother with it.
       * This consists of information like how to deal with Traces between passes
       * to the entire pass infrastructure itself. *)
      (** [registered_pipeline] is the pipeline itself.
        * It features all the steps named and registered with the system *)
      registered_pipeline : Step.named CircularList.t;
      stopping_step : NameSymbol.t option
    }

    (** [register_step] registers the pipeline function to the environment *)
    val register_step : Step.named CircularList.t -> unit

    (** [def_pipline_group] creates a named group of pipeline steps or a
      * nested grouping of pipeline steps. *)
    val def_pipline_group :
      NameSymbol.t -> Step.named CircularList.t list -> Step.named CircularList.t

    (** [stop_at] tells the environment to stop at a particular step when
      * running the environment. *)
    val stop_at : NameSymbol.t -> unit

    (** [stop_at_nothing] tells the environment to run the compiler fully. *)
    val stop_at_nothing : unit

    (** [eval] is responsible for taking the environment, running it
      * to the desired point and giving back what data is left.
      * The output type should be more carefully considered
      * and have many operations on it. *)
    val eval : t -> computational_input

    (** [run] serves as the running point; the unit argument is a placeholder
      * for the monadic functions used to build up the environment,
      * and the [t] is the given pre-built environment *)
    val run : unit -> t -> computational_input

    val extract : unit -> t
  end
end
```

Rationale

This structure was chosen to maximize the following:

  1. Minimize the Haskell hassle.
  2. Give a straightforward way to obsolete any EasyPipeline notions.
  3. Give a flexible architecture which can satisfy our current and future needs.

With that said, there may be some questions one has in regards to the API.

Why A Registration System?

This is brought up in regards to why we did not choose a more functional approach to composition; indeed, the following does not look functional!

```haskell
eval =
  Pipeline.extract $ do
    Pipeline.registerStep
      (Pipeline.defPipelineGroup
         "Desugar"
         [ Desugar.condToIf,
           Desugar.ifToCond,
           ...
         ])
    Pipeline.registerStep
      (Pipeline.defPipelineGroup
         "Contextify"
         [ Contextify.resolveInfix,
           ...
         ])
    ...
```

This was chosen over the more functional approach given in the idealized pipeline proposal because it allows us to name each step and stop in the middle of any of them. The more functional approach would leave that structure ambiguous and behave much like a tagless final system, where the constructors fold onto themselves, leaving no room for interpretation of the bare structure.

Further, interest was given to programmatically moving passes around to properly see what makes the most sense. The given registration system, along with an extension to talk about dependencies between passes, would allow this experimentation.

Examples

```haskell
module EasyPipeline where

passNames :: CircularList.t NameSymbol.t
passNames =
  fmap (Step.nameOfPass . Pipeline.registeredPipeline) Compiler.eval

prettyPassNames :: IO ()
prettyPassNames = CircularList.prettyPrintList passNames

runToCondWithTrace = do
  Pipeline.stopAt "Desugar.if-to-case"
  -- Extra functions to trace what we care about
  Trace.enableRecursively ["Pass.cond-to-if"]
```

```
λ> Pipeline.run runToCondWithTrace Compiler.eval >>| Pipeline.runTrace
```

The Berlin Pipeline Automatisierung

We have seen in the Step section that we have a generally consistent data type for a Pipeline Step; however, we also discovered that the type:

  • is too large and has matters a pass should not worry about.
  • has no environment effect, so the pass is expected to be in IO to get information.

Thus we propose the Automation module to alleviate these issues.

Specification

Input and Output Types

To simplify matters to what a pass cares about we transform the computational_input form below

```ocaml
type env_or_sexp =
  | InEnv of NameSymbol.t
  | Sexp of Sexp.t

type working_environment = {
  current_expression : env_or_sexp list;
  context : Sexp.t Context.t
}

type computational_input = {
  language_data : working_environment;
  surrounding_data : surrounding_environment;
}
```

into

```ocaml
type pass_argument = {
  current : Pipeline.env_or_sexp;
  context : Sexp.t Context.t
}

type simplified_pass_argument = {
  current : Sexp.t;
  context : Sexp.t Context.t
}
```

Notice how instead of working over a Pipeline.env_or_sexp list for the current expression, we worry only about a single current expression. We even have a more simplified version that removes the InEnv potential of Pipeline.env_or_sexp in the simplified_pass_argument. This allows a pass to not worry about whether a Sexp.t is referenced in the environment, and allows all the attention to be on the transformation if the pass simply does not care.

Further, the entire surrounding_environment is dropped from the signature entirely.

This information is not completely gone, but instead we relegate it to the environment surrounding the pass, rather than a direct input to the pass.

The output of the pass also moves away from returning a working_environment ComputationResult.t

```ocaml
type working_environment = {
  current_expression : env_or_sexp list;
  context : Sexp.t Context.t
}

module ComputationResult : sig
  type 'a t =
    | Success of { meta : Meta.t; result : 'a }
    | Failure of { meta : Meta.t; partial_feedback : 'a option }
end
```

into a more streamlined type:

```ocaml
type stage = Current | FromTopToCurrent | Eval

type process_job = {
  current : Pipeline.env_or_sexp;
  new_forms : (stage * Pipeline.env_or_sexp) list
}

type process_job_no_env = {
  current : Sexp.t;
  new_forms : (stage * Sexp.t) list
}

type t =
  | ProcessJob of process_job_no_env
  | UpdateJob of { new_context : Sexp.t Context.t; process : process_job }

type job = t
```

What changes is that instead of giving back a list of current_expressions, we have the idea of a process_job to go along with the new input type of taking a single expression.

What this means is that for any pass which defines new top-level definitions to be consumed, instead of adding them to the current_expression list, we add them to the new_forms, letting the Automation handle how this commits back to current_expressions. The current expression stays as the current expression; thus the pass takes a Pipeline.env_or_sexp and gives back a potentially new Sexp.t, with any new definitions as new_forms.

Further, we introduce the idea of an UpdateJob vs a ProcessJob. Instead of every pass that does not touch the context giving back the current context, we can denote, with a ProcessJob, that our pass has no changes to the context itself. An UpdateJob indicates that we have changed the context, so the changed context is committed back.

This also explains the difference between process_job_no_env and process_job: since a ProcessJob cannot touch the Context, it cannot change any reference to a name found in Pipeline.env_or_sexp.
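The contrast is easiest to see in a small hedged sketch; the values are hypothetical:

```ocaml
(* A pure rewrite: no context change, so a [ProcessJob] suffices. *)
let pure_rewrite (rewritten : Sexp.t) : Automation.job =
  Automation.ProcessJob { current = rewritten; new_forms = [] }

(* A context-changing pass: commit the new context with an [UpdateJob]. *)
let context_update (new_context : Sexp.t Context.t)
    (current : Pipeline.env_or_sexp) : Automation.job =
  Automation.UpdateJob { new_context; process = { current; new_forms = [] } }
```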

Lastly, we introduce the concept of staging the newly defined expressions through the stage type. This can take the form of Current, FromTopToCurrent, or Eval (see the sketch after this list).

  • Current would act exactly like adding any new expression to the current_expression list.
  • FromTopToCurrent acts almost like Current but runs all passes that have previously run, up to the current position, on the newly defined Sexp.t
  • Eval runs the entire compiler on the newly defined Sexp.t, updating the Context in the process before we proceed with the current form.
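As a hedged sketch of staging (all values hypothetical): a pass that emits one helper to be compiled alongside the current form, and one form that must be fully compiled first, would build its job as:

```ocaml
(* [helper] rides along with the current expression; [macro] is fully
 * evaluated before the current form proceeds. *)
let staged_job (current : Sexp.t) (helper : Sexp.t) (macro : Sexp.t) :
    Automation.job =
  Automation.ProcessJob
    { current;
      new_forms = [ (Automation.Current, helper); (Automation.Eval, macro) ] }
```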

This is best seen in action in the Examples section.

Environment Types and Effects

With the input and output types out of the way (pass_argument -> job), we can now talk about the surrounding environment that was alluded to earlier.

```ocaml
(* This represents the monad type that we require for a pass *)
module type MonadEff = sig
  type 'a t
  type 'a output_eff

  (* Trace State effect *)
  val get_trace : unit -> Trace.t t
  val set_trace : Trace.t -> unit t

  (* feed_back State effect *)
  val get_feedback : unit -> Feedback.t t
  val set_feedback : Feedback.t -> unit t

  (* This is in all likelihood what is needed to setup the environment,
   * usurping the above effects for actually setting up the pass.
   * We leave the above effects for demonstrative purposes *)
  val has_env : unit -> Pipeline.surrounding_environment t
  val set_env : Pipeline.surrounding_environment -> unit t

  (* Comonad effect *)
  val run : 'a t -> 'a output_eff
end
```

This OCaml module refers to the following Haskell signatures

```haskell
class Functor m => Comonad m where
  run :: m a -> a

type HasTrace m = HasState @"trace" Trace.t
type HasFeedback m = HasState @"Feedback" Feedback.t
type HasMeta m = (HasFeedback m, HasTrace m)
type HasEnv m = (HasMeta m, HasReader @"CurrentStepName" (Maybe NameSymbol.t) m)
```

What is important here is that the has_env/HasEnv effect entails all the data that is passed through surrounding_environment, plus any pass-specific data that the pass may want.

This allows the automation tooling to automatically hook into the environment of the pass to convert the job into a ComputationResult.t job for any given environment that a pass may care about.

We can see this signature in action with the automation functions of this module.

Automation Abstractions

```ocaml
(* Type class in Haskell that dictates being able to extract to a step *)
module Runable : functor (M : MonadEff) -> sig
  type 'a t = 'a M.t

  (** [run] runs the given environment, extracting out a result with meta data
    * over the resulting value. *)
  val run : 'a t -> 'a ComputationResult.t M.output_eff

  (** [apply_simplified_pass] serves as a HOF that allows for passes to have
    * a more simplified type, namely a function that takes a [pass_argument]
    * to an effectual result over [job] that determines how the pass should
    * be brought together.
    * Note that [Pipeline.computational_input -> Pipeline.working_environment t]
    * is an approximation of the [Step.t] type without the Meta information attached *)
  val apply_simplified_pass :
    (pass_argument -> job t) ->
    Pipeline.computational_input ->
    Pipeline.working_environment t

  (** [run_simplified_pass] simply combines the [run] function with
    * [apply_simplified_pass], to get the output effect, which corresponds to a
    * [Pipeline.Step.t] with an effect attached to it *)
  val run_simplified_pass :
    (pass_argument -> job M.output_eff) ->
    Pipeline.computational_input ->
    Pipeline.working_environment ComputationResult.t M.output_eff

  (** [simplify] allows a pass to ignore the fact that the expression coming in may
    * be added to the [Context.t] already, and we can act as if it were just a
    * normal [Sexp.t] being passed in. *)
  val simplify :
    (simplified_pass_argument -> job M.output_eff) ->
    (pass_argument -> job M.output_eff)
end
```

This roughly corresponds to the Haskell Signature

```haskell
class HasExtract a m | a -> m where
  extract :: a x -> m (ComputationResult.t x)

data Automation.PassArgument = PassArg
  { context :: Context.t Sexp.t,
    current :: Sexp.t
  }

transformOutputType :: Automation.t -> Pipeline.workingEnvironment

-- we share the one m monad here for the function
-- because we will extract it after we run the m effect
applySimplifiedPass ::
  (HasTrace m, HasMeta m) =>
  (Automation.PassArgument -> m Automation.t) ->
  -- These form Pipeline.Step.t for some m over the
  -- output modulo the ComputationResult.t over it
  Pipeline.ComputationalInput ->
  m Pipeline.WorkingEnvironment

runSimplifiedPass ::
  (HasExtract _a m) =>
  (Automation.PassArgument -> m Automation.t) ->
  Pipeline.ComputationalInput ->
  m (ComputationResult.t Pipeline.WorkingEnvironment)
runSimplifiedPass f = extract . applySimplifiedPass f
```

We can see that extract, or run as the OCaml version calls it, takes the environment needed to run the pass, runs it, and gives us back a ComputationResult.t over the output type in the effect.

The HasMeta and HasTrace effects are needed in practice to support this. Note that we also get an output monad type. This is needed as the pass may invoke IO actions, and thus we really get back an output_eff from the pass itself.

The applySimplifiedPass function reflects the underlying purpose of the Automation module. Namely, a pass has become as simple as (Automation.PassArgument -> m Automation.t), where we take only what we care about and return an effect over the job type. The goal of this function then becomes the following (a sketch follows the list):

  1. Split the surrounding_environment into the env of the pass
  2. Split the list of expressions into single expressions and invoke the pass one or many times
  3. Take a list of job/Automation.t values and give back a Pipeline.WorkingEnvironment

Since the module has enough information via the constraints, this is fairly easy to achieve.
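As a hedged illustration of point 3, here is one way transform_output_type could fold jobs back into a working_environment. Unlike the signature given later, it threads an explicit starting context, and it ignores the staging of new forms; both are simplifications of this sketch:

```ocaml
(* Fold each job into the running environment: UpdateJobs commit their
 * context, and new forms are re-queued next to the current expression.
 * The ordering chosen here is illustrative only. *)
let transform_output_type (jobs : job list) (ctx : Sexp.t Context.t) :
    Pipeline.working_environment =
  let step (exprs, ctx) = function
    | ProcessJob { current; new_forms } ->
        ( exprs
          @ List.map (fun (_, s) -> Pipeline.Sexp s) new_forms
          @ [ Pipeline.Sexp current ],
          ctx )
    | UpdateJob { new_context; process = { current; new_forms } } ->
        (exprs @ List.map snd new_forms @ [ current ], new_context)
  in
  let current_expression, context = List.fold_left step ([], ctx) jobs in
  { Pipeline.current_expression; context }
```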

This leads into runSimplifiedPass, which simply composes applySimplifiedPass and extract/run. We can see here how these two functions combine and fulfill the signature of Pipeline.Step.t, which we repeat here:

```ocaml
module Step : sig
  type t = computational_input -> working_environment ComputationResult.t
  ...
end
```

The simplify function is a simplification that allows a pass to not worry about whether the argument has come from the context or not. This is useful for passes which do simple transformations but run after the point at which functions show up in the environment, or for any passes before that point, where we know the terms don't show up in the environment.
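A hedged sketch of how simplify might look inside the Runable functor; Context.lookup_sexp is a hypothetical helper that resolves an InEnv name to the s-expression stored in the context:

```ocaml
(* Resolve the incoming expression to a bare [Sexp.t] before handing it
 * to the simplified pass. *)
let simplify (pass : simplified_pass_argument -> job M.output_eff)
    (arg : pass_argument) : job M.output_eff =
  let sexp =
    match arg.current with
    | Pipeline.Sexp s -> s
    | Pipeline.InEnv name -> Context.lookup_sexp name arg.context
  in
  pass { current = sexp; context = arg.context }
```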

Full API

```ocaml
(** The job of [Automation] is to make the step function more amenable to writing passes.
  * It is not ergonomic to take extra information one may not care about *)
module Automation : sig
  type stage = Current | FromTopToCurrent | Eval

  type process_job = {
    current : Pipeline.env_or_sexp;
    new_forms : (stage * Pipeline.env_or_sexp) list
  }

  type process_job_no_env = {
    current : Sexp.t;
    new_forms : (stage * Sexp.t) list
  }

  type t =
    | ProcessJob of process_job_no_env
    | UpdateJob of { new_context : Sexp.t Context.t; process : process_job }

  type job = t

  (** [pass_argument] is the argument to a pipeline processing function; namely,
    * we wrap the current expression and the context into a single record *)
  type pass_argument = {
    current : Pipeline.env_or_sexp;
    context : Sexp.t Context.t
  }

  type simplified_pass_argument = {
    current : Sexp.t;
    context : Sexp.t Context.t
  }

  (* This represents the monad type that we require for a pass *)
  module type MonadEff = sig
    type 'a t
    type 'a output_eff

    (* Trace State effect *)
    val get_trace : unit -> Trace.t t
    val set_trace : Trace.t -> unit t

    (* feed_back State effect *)
    val get_feedback : unit -> Feedback.t t
    val set_feedback : Feedback.t -> unit t

    (* combining the two effects above *)
    val get_meta : unit -> Meta.t t
    val set_meta : Meta.t -> unit t

    (* This is in all likelihood what is needed to setup the environment,
     * usurping the above effects for actually setting up the pass.
     * We leave the above effects for demonstrative purposes *)
    val has_env : unit -> Pipeline.surrounding_environment t
    val set_env : Pipeline.surrounding_environment -> unit t

    (* Comonad effect *)
    val run : 'a t -> 'a output_eff
  end

  val transform_output_type : job list -> Pipeline.working_environment

  (* Type class in Haskell that dictates being able to extract to a step *)
  module Runable : functor (M : MonadEff) -> sig
    type 'a t = 'a M.t

    (** [run] runs the given environment, extracting out a result with meta data
      * over the resulting value. *)
    val run : 'a t -> 'a ComputationResult.t M.output_eff

    (** [apply_simplified_pass] serves as a HOF that allows for passes to have
      * a more simplified type, namely a function that takes a [pass_argument]
      * to an effectual result over [job] that determines how the pass should
      * be brought together.
      * Note that [Pipeline.computational_input -> Pipeline.working_environment t]
      * is an approximation of the [Step.t] type without the Meta information attached *)
    val apply_simplified_pass :
      (pass_argument -> job t) ->
      Pipeline.computational_input ->
      Pipeline.working_environment t

    (** [run_simplified_pass] simply combines the [run] function with
      * [apply_simplified_pass], to get the output effect, which corresponds to a
      * [Pipeline.Step.t] with an effect attached to it *)
    val run_simplified_pass :
      (pass_argument -> job M.output_eff) ->
      Pipeline.computational_input ->
      Pipeline.working_environment ComputationResult.t M.output_eff

    (** [simplify] allows a pass to ignore the fact that the expression coming in may
      * be added to the [Context.t] already, and we can act as if it were just a
      * normal [Sexp.t] being passed in. *)
    val simplify :
      (simplified_pass_argument -> job M.output_eff) ->
      (pass_argument -> job M.output_eff)
  end
end
```

Examples

It would be remiss of us to not include any examples in this demonstration, so let us model a simple pass that separates an ADT sum type declaration from the constructors it may have.

Let our running example be

```lisp
;; type foo = Bar int int | Baz int int
(:type foo ()
  (Bar int int)
  (Baz int int))
```

Let us first try to compile this down to:

```lisp
(:declare-type foo ())
(:constructor Bar foo int int)
(:constructor Baz foo int int)
```

We can use the new pipeline like so:

```haskell
deconstructPass :: Pipeline.Step.Named
deconstructPass =
  Step.namePass
    (Automation.runSimplifiedPass deconstructType)
    "Desugar.deconstruct-type"

deconstructType Automation.PassArg {current} =
  Trace.with "Desugar.deconstruct-type" [show current] $
    Sexp.foldSearchPredWithExtra (Automation.simplify f) (== Structure.typeName) current
      |> \Sexp.Extra {data, extra} ->
           Automation.ProcessJob {current = data, newForms = extra}
  where
    f car cdr =
      Trace.with "Desugar.deconstruct-type-pass" [show car, show cdr] $
        case Structure.toType (car Sexp.:> cdr) of
          Just typ ->
            let extraData =
                  (typ ^. body)
                    >>= ( \case
                            Structure.Sum {name, arguments} ->
                              Structure.Constructor name (typ ^. name) arguments
                                |> pure
                            _ -> []
                        )
                    >>| \x -> (Automation.Current, Structure.fromConstructor x)
                currentForm =
                  Structure.DeclareType (typ ^. name) (typ ^. arguments)
                    |> Structure.fromDeclareType
             in Structure.Extra {data = currentForm, extra = extraData}
          Nothing ->
            -- we get back a Failure on the ComputationResult
            throw @"failure" (ComputationResult.InvalidForm (car Sexp.:> cdr))
```

deconstructPass is the pass itself, which we can directly register. The implementation is held in the function deconstructType. We show in gory detail how precisely this pass works: we search for a certain structure using the normal Sexp functions; once it's found, we run our pass, which we call f.

f now constructs two pieces of data: the current form to return, along with any new forms which are defined from it. We construct the extra forms by checking whether the parts of the body are sum types; if they are, then they are constructors. We then take the constructors and say that we will compile them Currently, along with the declaration type. Note that if we wanted to compile these ahead of time, we could simply replace Current with Eval. In this case, it makes more sense to compile them Currently instead of ahead of time. As for the current form, it gets transformed into a generic declaration with the arguments intact and given back as the current expression. Finally, FromTopToCurrent uses the current_step_name discussed in the Pipeline section to run the compiler up until the current point and then work with the forms Currently together.

In 25 lines we have an entirely new compiler pass! That may seem like a lot of lines; however, most of them are simply conversions into type-safe versions on which to do processing, or shifts from one data type to another.

Berlin Pipeline: Extensions

Injection Passes

In short, injection passes are a way to register pipeline passes that run on every step: both passes that take a singular env_or_sexp through Automation, and passes that ought to be run before and after the Step.t type.

To view the issue covering the inspiration and state of this extension, please read:

https://linear.app/anoma/issue/JUVIX-939/injection-passes

The signatures laid out to do this work are as follows:

```ocaml
open Pipeline

val Env.registerBeforePass : Automation.pass_argument -> m Automation.job
val Env.registerAfterPass : Automation.pass_argument -> m Automation.job
val Env.registerBeforePassEntireList : computational_input -> m computational_output
val Env.registerAfterPassEntireList : computational_input -> m computational_output
```

The Env.register*Pass variants will run on individual s-expressions, before or after the pass itself. When one runs Automation.applySimplifiedPass, that function will automatically apply the registered functions.

The Env.register*PassEntireList variants will run before or after each step at the eval level. Thus eval will handle running these registered steps before and after each step is called.

The implementation of this requires tweaking where functions are stored, since we want to tweak surrounding_environment to be the following:

```ocaml
type around = Before | After

type surrounding_environment = {
  (* This will be taken as a read value *)
  (** The [current_step_name] represents the current running step name *)
  current_step_name : NameSymbol.t option;
  meta_information : Meta.t;
  on_single_pass :
    (around * (Automation.pass_argument -> Automation.pass_argument aroundIO)) list
}

type around_env = {
  trace : Trace.t;
  feedback : Feedback.t;
}

type 'a aroundEnvIO = MinIO of { run : Sexp.t (IO.t around_env State.t) Except.t }
```

Since this requires values in Pipeline to have information about Pipeline.Automation, we must remove all data declarations from Pipeline.Automation, put them in Pipeline, and re-export them from Pipeline.Automation.