In this document we outline the Berlin Pipeline. The document is broken up into three core chunks, along with a miscellaneous section that outlines helper structures integral to the smooth operation of the pipeline.
These four sections are:
The fourth section covers extensions that have made it into the codebase and may alter the shape of the first three sections.
The following sections prop up the understanding of the model:
The full code that backs the proposal can be found here.
The last section of the document covers the expected time frame and how the tasks should be distributed.
One may ask why we should even begin the Berlin Pipeline; after all, we can currently get a term to compile for the LLVM, Michelson, and Plonk backends. However, this does not take into account the following properties:
Out of these issues, only issue 2 is solvable with the current architecture.
This is because the pipeline has the following structure:
Starting from `toSexp` we get the `Pipeline` type, which is just an alias for `Feedback`; thus no environment can be shared. However, even if we extend the `Pipeline` type with context information, which context do we share? All the passes up to `toHR` are compatible with the `Context.T` type, but at that point Core uses its own hash table structure. Thus, to properly fix this situation, we need a consistent type to thread between the backends.
Further, we can see with this structure that the details of each pass are not expressed. In fact, `toSexp` hides over 7 passes that we have no way of stopping between unless we reconstruct the pipeline; hence the `Easy` module explicitly repeats this code. Thus we need some kind of structure that allows us to talk about passes and sub-passes.
The Berlin Pipeline seeks to unify both the type of a step and the type of the pipeline itself. Through this structure we can get all 5 of the missing properties and have a principled way of talking about what a pass is. This unification also allows tooling to be built around a pass, allowing easy creation of a new pass and its placement in the pipeline.
The model of the new pipeline is laid out in the following way:
- The `Pipeline.Step` module, whose `Pipeline.Step.t` will serve as the type any compiler step inhabits.
- The `Pipeline.Environment` module, with `Pipeline.Environment.t` being the constructed compiler pipeline itself.
- The `Pipeline.Automation` module, which has functions to make inhabiting `Pipeline.Step` easier.

In this model, `Pipeline.Environment` will give the pipeline organizer the tools to clearly lay out a straightforward pipeline. It will also give the caller of the pipeline the tools needed to introspect results, end the pipeline early, etc.
The pipeline pass writer will utilize the `Pipeline.Automation` module to create passes for the pipeline, inhabiting `Pipeline.Step.t` and finally naming the pass, giving the pipeline organizer the `Pipeline.Step.named` type.
The current Juvix pipeline already has a notion of meta information through the use of the `Feedback` monad. However, the current `Feedback` monad conflates feedback with the result of computation, and is not easily modifiable to add extra meta information.
Thus we extend and break apart the `Feedback` monad as such.
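A minimal sketch of the resulting `Meta.t`, assuming the two components named below:

```ocaml
(* Meta.t bundles the meta information every computation may produce:
   the feedback half of the old Feedback monad, plus any traces. *)
module Meta = struct
  type t = {
    feedback : Feedback.t;
    trace : Trace.t;
  }
end
```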
The `Meta.t` type is filled with the meta information from computation, namely `Feedback.t` and `Trace.t`. The `Feedback.t` is the actual feedback part of the `Feedback` monad, and the `Trace.t` is any traces we could have gotten from a piece of computation. The `Meta.t` type could be extended if further information should be captured.
Further, we have the new concept of `ComputationResult.t`, stemming from the `Failure` and `Success` cases of the old `Feedback` monad. However, there is a twist: if computation fails, we may still get a partial result.
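A sketch of `ComputationResult.t` under that reading; the inline-record layout is our assumption:

```ocaml
(* Success or failure of a computation, always carrying Meta.t;
   a failure may still hold a partial result. *)
module ComputationResult = struct
  type 'a t =
    | Success of { meta : Meta.t; result : 'a }
    | Failure of { meta : Meta.t; partial : 'a option }
end
```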
Another important structure in the Berlin Pipeline is the concept of the circular list. We give the following type stub to represent the idea.
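One possible shape for the stub, with `recursive_schema` left abstract:

```ocaml
(* like Sexp.t, but holding arbitrary data instead of s-expressions *)
module RecursiveList = struct
  type 'a t =
    | Atom of 'a
    | Group of 'a t list
end

module CircularList = struct
  (* states how the contents are recursive on each other;
     to be refined in a future specification *)
  type recursive_schema

  type 'a t = {
    contents : 'a RecursiveList.t;
    schema : recursive_schema option;
  }
end
```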
The `CircularList.t` is broken down into two parts. The first is a `RecursiveList.t`, which represents a list that can recurse. This is very similar to the `Sexp.t` concept, except that instead of holding s-expressions it holds arbitrary data.
Using the recursive list, we are able to construct a circular list which has the notion of cycles. The cycle behavior is determined by the `recursive_schema` type, which states how the contents are recursive on each other. The `recursive_schema` type should be refined in a future specification.
In this proposal we will be using OCaml code for signatures. OCaml was chosen over Haskell because it has an in-language notion of modules, so the specification can be laid out more straightforwardly.
This does mean that for many examples we will have `Sexp.t list` instead of `List.t Sexp.t` or even `[Sexp.t]`.
Further, when we convert the OCaml functions to Haskell, we will turn `snake_case` into `camelCase`.
Since OCaml lacks a notion of effects, modules may be given that represent the Haskell effects, and Haskell may be given to supplement the OCaml where it may be confusing.
The Step module serves as the base interface for any compiler step. The module is quite minimal, with only a few types along with a small set of operations.
The main type of the module, `t`, is simply a function type that takes a structure with more than it needs to do any pass, called `computational_input`, and gives back the result of the computation plus `Meta`-data.
We say that the pass takes more than it needs because `computational_input` is composed not only of the `context` and the `current_expression` we are trying to compile against, but also of a surrounding environment, which contains information that should be put in the pass's working environment.
This `surrounding_environment` is composed of `Meta` information, discussed in the [#Computational Meta Data section], along with the name of the `current_step`. The usefulness of the name will be discussed in a later section, but this environment should be able to grow without affecting passes that don't touch the extra information.
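Gathering the pieces so far, a minimal sketch of these types; the exact field and constructor names are our assumptions where the prose does not fix them:

```ocaml
(* An expression to compile: either already named in the context,
   or a bare s-expression. *)
type env_or_sexp =
  | InEnv of NameSymbol.t
  | Sexp of Sexp.t

type surrounding_environment = {
  current_step_name : NameSymbol.t option; (* name of the running pass *)
  meta_information : Meta.t;               (* Trace.t and Feedback.t *)
}

(* more than any single pass needs *)
type computational_input = {
  context : Sexp.t Context.t;
  current_expression : env_or_sexp list;
  surrounding_environment : surrounding_environment;
}

(* the result of computation a pass hands back *)
type working_environment = {
  context : Sexp.t Context.t;
  current_expression : env_or_sexp list;
}
```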
The `Step` module also has a notion of a `named` pass.
The `named` type simply allows us to attach a name to any given pipeline step function. This will serve as the type that `Pipeline.Environment.register_step` deals with. This type should be exported opaquely.
Lastly, the `name_pass` function is a public constructor of this type.
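A sketch of `named` and its constructor, with `Step.t` included for context; field names are assumptions, and `named` would be exported opaquely in practice:

```ocaml
module Step = struct
  (* the type any compiler step inhabits *)
  type t = computational_input -> working_environment ComputationResult.t

  (* a step together with its name; opaque outside this module *)
  type named = {
    name : NameSymbol.t;
    func : t;
  }

  (* public constructor for named *)
  let name_pass (func : t) (name : NameSymbol.t) : named = { name; func }
end
```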
Overly unconstraining types, like the input type to `Step.t`, should normally be met with criticism; however, in the case of the `Step` module it is necessary, to avoid trying to enforce the `Pipeline.Environment`'s effect type upon the passes.
Since this separation is needed, the `Step` module should not be programmed against directly, but rather through the `Pipeline.Automation` module, which will be able to make a pass work over a single `env_or_sexp` and a `Sexp.t Context.t`, along with the pass's custom environment.
Now that we have `Pipeline.Step` properly specified for the passes, we can get to the infrastructure it allows us to create.
Before we get to the `Environment` in which the majority of the compiler pipeline lives, we must first take a closer look at the `surrounding_environment` the pass gets.
Recall from the Step section the input type of the pass along with the `surrounding_environment` it carries. For this proposal the `surrounding_environment` is quite simple: it has `meta_information`, which contains the `Trace.t` and `Feedback.t`. This inclusion just lets trace and feedback information be threaded through the passes, as the output type directly carries new versions of these values. It also has the `current_step_name`, which reflects the name of the currently running pass. We will see some application of this in the ##Automation Section.
The important part of this `surrounding_environment` is that, through the output type of `Pipeline.Step.t`, we have enough data to fill the dynamic parts of the `Pipeline.Environment` itself.
The Pipeline Type is defined as follows
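A sketch of this record, with the field names taken from the discussion below:

```ocaml
module Environment = struct
  type t = {
    (* "dynamic" data: the same type Pipeline.Step.t takes *)
    information : computational_input;
    (* the passes recorded so far *)
    registered_pipeline : Step.named CircularList.t;
    (* the step to stop at, if any *)
    stopping_step : NameSymbol.t option;
  }
end
```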
It contains a few pieces of information.
The first piece is `information`, which contains the same type that `Pipeline.Step.t` takes.
This is considered the "dynamic" data of the pass and serves a few purposes:
- It supplies the `computation_input` each pass receives.
- Outside tooling can manipulate an `Environment.t` directly, by feeding this data in as input.
- Through `Automation`, these values become part of the running environment of the pass directly, but we avoid the issue of the passes needing to use anything in `Environment.t` that would stop a pass being run in its own environment. This is crucial, as we can't run partial effects with Capabilities.

The next few pieces of data relate to the pipeline itself and will be constructed from functions that make up the pipeline or wish to manipulate its execution.
The first piece of data in this list is the `registered_pipeline`, which records the passes.
Its type is a `CircularList.t` over the `Step.named` structure we analyzed previously. The `CircularList.t` type was chosen because its recursive nature gives us a few advantages over a normal list of `Step.named`:
- We can group steps together, much like `Testy` does with their `T.TestCase`.

We will state more on how `registered_pipeline` works when we get to the API function portion.
Lastly, for now, we also have the value `stopping_step`, which simply states at which step we should stop running the compiler and return any working data that is left.
Although we only include two fields exclusive to the pipeline environment, the design is such that more fields and behaviors should be easy to add without much hassle.
With the environment out of the way, let us consider how one is supposed to use it and the functions over it.
The pipeline is intended to grow into a single compile function, which we shall call `eval`. We do this by registering `Pipeline.Step.named` values into the environment. After we register all the steps, we can choose whether the pipeline stops early, introspect if some pass fails, etc. When it comes time to execute the pipeline, a `run` function will be given that takes the `Pipeline.Environment.t` value and starts executing the registered passes, stopping either when a pass fails, when the `stopping_step` says we should stop at this step, or when we have successfully compiled the given form.
`register_step` has the simple job of taking a named pass and adding it to the `registered_pipeline`. Note that we take a `CircularList.t` over this type simply to make the operation monoidal. An example implementation could be something as simple as the following.
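A sketch, assuming a `CircularList.append` helper:

```ocaml
(* Append the new (group of) named pass(es) to the recorded pipeline.
   CircularList.append is assumed to exist and be associative, which
   is what makes registration monoidal. *)
let register_step (pass : Step.named CircularList.t) (env : Environment.t) :
    Environment.t =
  { env with
    registered_pipeline = CircularList.append env.registered_pipeline pass }
```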
This will be a core way one interacts with the pipeline.
While the above function is useful, it has no way of helping us build groups of steps. For that we have a grouping function, sketched below.
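A hypothetical signature for such a grouping function, modeled on `testGroup`; the name is our assumption:

```ocaml
(* Bundle several registered steps under one name, so the group can
   itself be registered with register_step. *)
val def_pipeline_group :
  NameSymbol.t ->
  Step.named CircularList.t list ->
  Step.named CircularList.t
```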
This function should remind one of the `testGroup` function used in `Testy` tests. It has the same behavior; we can then register the resulting group with `register_step`.
The next set of functions deal with stopping the pipeline early.
`stop_at` tells the environment to simply stop when we reach the pass with the given name, and `stop_at_nothing` resets the stopping behavior to none.
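Plausible signatures, written as pure updates over the environment:

```ocaml
(* stop once the pass with the given name is reached *)
val stop_at : NameSymbol.t -> Environment.t -> Environment.t

(* reset the stopping behavior to none *)
val stop_at_nothing : Environment.t -> Environment.t
```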
The next function in the environment is the actual `eval` function, which is responsible for running the code.
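A sketch of its signature and doc comment; per the proposal, the output type is provisionally `computational_input`:

```ocaml
(* [eval env] threads the registered passes over the information in
   [env], stopping when a pass fails, when stopping_step is reached,
   or when compilation of the given form succeeds.
   Note: the output type is not fully considered; for this proposal
   we choose computational_input. *)
val eval : Environment.t -> computational_input
```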
The `eval` function has a tough job, as it is responsible for taking the information in the `t` type and properly threading the calls to the passes.
As mentioned in the doc comment above the function, the output type is not fully considered. For this proposal we chose the `computational_input` type.
This ends the core API of the `Pipeline`; more functions will be needed for introspecting the output type, but these are the core functions needed to properly construct and change the core environment.
This structure was chosen to maximize the following:
- Subsuming the current `EasyPipeline` notions.

With that said, there may be some questions one may have in regards to the API.
This is brought up in regards to why we did not take a more functional approach to composition; indeed, the following does not look functional!
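For instance, building a pipeline might read as a sequence of registrations; every pass name here is purely illustrative, and `CircularList.init` is an assumed singleton constructor:

```ocaml
(* Registration proceeds step by step, by name, rather than by
   composing the pass functions themselves. *)
let setup_pipeline (env : Environment.t) : Environment.t =
  env
  |> register_step (CircularList.init desugar_pass)
  |> register_step (CircularList.init contextify_pass)
  |> register_step (CircularList.init to_core_pass)
  |> stop_at (NameSymbol.of_string "to-core")
```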
This was chosen over the more functional approach given in the idealized pipeline proposal because it allows us to name each step and stop in the middle of any of them, whereas the more functional approach would leave that structure ambiguous and behave much like a tagless final system, where the constructors fold onto themselves, leaving no room for interpreting the bare structure.
Further, interest was given to programmatically moving passes around to properly see what makes the most sense. The given registration system, along with an extension to talk about dependencies between passes, would allow this experimentation.
We have seen in the Step section that we have a generally consistent data type for a pipeline step; however, we also discovered that the type is:
Thus we propose the `Automation` module to alleviate these issues.
To simplify matters to what a pass cares about, we transform the `computational_input` form from the Step section into the narrower types sketched below.
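A sketch of the narrowed argument types (field names assumed):

```ocaml
(* what a pass actually consumes: one expression plus the context *)
type pass_argument = {
  current : env_or_sexp;
  context : Sexp.t Context.t;
}

(* further simplified: the InEnv possibility is removed *)
type simplified_pass_argument = {
  current : Sexp.t;
  context : Sexp.t Context.t;
}
```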
Notice how, instead of working over a `Pipeline.env_or_sexp list` for the current expression, we instead worry about a single current expression. We even have a more simplified version, `simplified_pass_argument`, that removes the `InEnv` possibility of the `Pipeline.env_or_sexp`. This allows a pass to not worry about whether a `Sexp.t` is referenced in the environment, and lets all the attention be on the transformation if the pass simply does not care.
Further, the entire `surrounding_environment` is dropped from the signature entirely.
This information is not completely gone; instead we relegate it to the environment surrounding the pass, rather than making it a direct input to the pass.
The output of the pass also moves away from returning a `working_environment ComputationResult.t` to the more streamlined type sketched below.
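A sketch of the streamlined output type, assuming these constructor and field layouts:

```ocaml
(* how a newly defined form should be scheduled; see the list below *)
type stage =
  | Current
  | FromTopToCurrent
  | Eval

(* for passes that cannot touch the context *)
type process_job_no_env = {
  current : Sexp.t;                   (* potentially new current form *)
  new_forms : (stage * Sexp.t) list;  (* newly defined top-level forms *)
}

type process_job = {
  current : env_or_sexp;
  new_forms : (stage * Sexp.t) list;
}

type job =
  (* the pass made no changes to the context *)
  | ProcessJob of process_job_no_env
  (* the pass changed the context; commit it back *)
  | UpdateJob of { new_context : Sexp.t Context.t; process : process_job }
```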
What changes is that, instead of giving back a list of `current_expression`s, we have the idea of a `process_job` to go along with the new input type of taking a single expression.
This means that any pass which defines new top-level definitions to be consumed adds them to the `new_forms` instead of appending to the `current_expression` list, letting the Automation handle how these commit back to `current_expression`. The current expression stays the current expression; thus the pass takes a `Pipeline.env_or_sexp` and gives back a potentially new `Sexp.t`, with any new definitions in `new_forms`.
Further, we introduce the idea of an `UpdateJob` versus a `ProcessJob`. Instead of every pass that does not touch the `context` having to hand back the current `context`, a pass can denote with `ProcessJob` that it makes no changes to the context itself. The `UpdateJob` indicates that we have changed the context, so the changed context is committed back.
This also explains the difference between `process_job_no_env` and `process_job`: since a `ProcessJob` cannot touch the `Context`, it cannot change any reference to the names found in `Pipeline.env_or_sexp`.
Lastly, we introduce the concept of staging the newly defined expressions through the `stage` type. This can take the form of `Current`, `FromTopToCurrent`, or `Eval`:
- `Current` acts exactly like adding the new expression to the `current_expression` list.
- `FromTopToCurrent` acts almost like `Current`, but runs all passes that have previously run, up to the current position, on the newly defined `Sexp.t`.
- `Eval` runs the entire compiler on the newly defined `Sexp.t`, updating the `Context` in the process before we proceed with the current form.

This is best seen in action in the ###Examples Section.
With the input and output types out of the way (`pass_argument -> job`), we can now talk about the surrounding environment that was alluded to earlier.
This OCaml module refers to the following Haskell signatures
What is important here is that the `has_env`/`HasEnv` effect entails all the data that is passed through `surrounding_environment`, plus any pass-specific data that the pass may want.
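Since OCaml lacks effects, a module type can stand in for this constraint; a sketch:

```ocaml
(* a stand-in for the Haskell HasEnv constraint: access to an
   environment entailing surrounding_environment plus any
   pass-specific data *)
module type HAS_ENV = sig
  type env
  val get : unit -> env
  val put : env -> unit
end
```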
This allows the automation tooling to automatically hook into the environment of the pass to convert the `job` into a `ComputationResult.t job` for whatever environment a given pass may care about.
We can see this signature in action with the automation functions of this module.
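A rough OCaml rendering; the `with_env` type here is our assumption:

```ocaml
(* a computation that needs environment ['env] and yields ['a]
   together with meta information *)
type ('env, 'a) with_env = 'env -> 'a * Meta.t

(* run (extract, in Haskell): execute the computation in the
   environment it needs and package the outcome *)
val run : ('env, 'a) with_env -> 'env -> 'a ComputationResult.t
```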
This roughly corresponds to the Haskell signature, where `extract` (or `run`, as the OCaml version calls it) takes the environment needed to run the pass, runs it, and gives us back a `ComputationResult.t` over the output type in the effect.
The `HasMeta` and `HasTrace` effects are needed in practice to support this. Note that we also get an output monad type; this is needed because the pass may invoke IO actions, and thus we really get back an output effect from the pass itself.
The `applySimplifiedPass` reflects the underlying purpose of the `Automation` module. A pass has become as simple as `(Automation.PassArgument -> m Automation.t)`, where we take only what we care about and return an effect over the `job` type. The goal of this function then becomes the following:
- take the resulting `job`/`Automation.t` values and give back a `Pipeline.WorkingEnvironment`.

Since the module has enough information via the constraints, this is fairly easy to achieve.
This leads into `runSimplifiedPass`, which simply composes `applySimplifiedPass` and `extract`/`run`. We can see here how these two functions combine to fulfill the signature of `Pipeline.Step.t`, which we repeat here.
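Repeated from the Step section sketch:

```ocaml
(* Pipeline.Step.t *)
type t = computational_input -> working_environment ComputationResult.t
```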
The `simplify` function is a simplification that allows a pass not to worry about whether its argument has come from the context. This is useful for passes which do simple transformations but run after the point at which functions show up in the environment, or for earlier passes where we know the terms don't show up in the environment.
It would be remiss of us not to include any examples in this demonstration, so let us model a simple pass that separates an ADT sum type declaration from the constructors it may have.
Let our running example be:
Let us first try to compile this down like:
We can use the new pipeline like so:
`deconstructorPass` is the pass itself, which we can directly register. The implementation is held in the function `deconstructType`. We show in gory detail how precisely this pass works: we search for a certain structure using the normal `Sexp` functions; once it is found, we run our pass, which we call `f`.
`f` now constructs two pieces of data: the current form to return, along with any new forms defined from it. We construct the extra forms by checking whether they are sum types; if they are, then they are constructors. We then take the constructors and say that we will compile them `Current`ly along with the `declaration-type`. Note that if we wanted to compile these ahead of time, we could simply replace `Current` with `Eval`; in this case it makes more sense to compile them `Current`ly instead of ahead of time. As for the current form, it gets transformed into a generic declaration with the arguments intact and given back as the current expression. Finally, `FromTopToCurrent` uses the `current_step_name` discussed in the ##Pipeline Section to run the compiler up until the current point and then work with the forms `Current`ly together.
In 25 lines we have an entire new compiler pass! That may seem like a lot of lines; however, most of them are simply conversions to type-safe representations on which to do processing, or shifts from one data type to another.
In short, injection passes are a way to register pipeline passes that run on every step: both passes that take a singular `env_or_sexp` through `Automation`, and passes that ought to run before and after the `Step.t` type.
To view the issue for the inspiration and current state of this extension, please read:
https://linear.app/anoma/issue/JUVIX-939/injection-passes
The signatures laid out to do this work are as follows.
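Hypothetical signatures; the exact names and shapes are assumptions pending the extension's final design:

```ocaml
(* run on each individual s-expression, around the pass itself *)
val register_before_pass :
  (pass_argument -> job) -> Environment.t -> Environment.t
val register_after_pass :
  (pass_argument -> job) -> Environment.t -> Environment.t

(* run around each whole step, handled at the eval level *)
val register_before_pass_entire_list :
  Step.t -> Environment.t -> Environment.t
val register_after_pass_entire_list :
  Step.t -> Environment.t -> Environment.t
```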
The `Env.register*Pass` variants will run on individual s-expressions, before or after the pass itself. When one runs `Automation.applySimplifiedPass`, that function will automatically apply the registered functions.
The `Env.register*PassEntireList` variants will run before or after each step at the `eval` level; thus `eval` will handle running these registered steps before and after each step is called.
The implementation of this requires tweaking where functions are stored, since we want to tweak `surrounding_environment` to be the following.
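One possible shape; the field names for the injected passes are assumptions:

```ocaml
type surrounding_environment = {
  current_step_name : NameSymbol.t option;
  meta_information : Meta.t;
  (* injections around each individual s-expression; applied by
     Automation.applySimplifiedPass *)
  before_each_expression : (pass_argument -> job) list;
  after_each_expression : (pass_argument -> job) list;
  (* injections around each whole step; applied by eval *)
  before_each_step : Step.t list;
  after_each_step : Step.t list;
}
```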
Since this requires values in `Pipeline` to have information about `Pipeline.Automation`, we must remove all data declarations from `Pipeline.Automation` and put them in `Pipeline`, re-exporting them in `Pipeline.Automation`.