Juvix
juvix-project
Juvix as a project has decided upon the Berlin Pipeline as a means of unifying the codebase (see the writeup for motivation). The Berlin also enforces a strict structure on what is a pass. Namely, the data that gets sent between compiler passes must have the following properties:
Only a few representations fit the bill for all four of these properties, and out of those we have chosen S-expressions as the best candidate for this. This document will outline how S-expressions interact with 4.
, covering prior work and what we wish to do with it.
S-expression have a long storied history as programming language, a configuration language, and as a generic serializable format. Thus, what we present here is a tiny subset of total previous work, focusing on the use of s-expressions as a serialization format in OCaml.
The most user prominant use of S-expressions in OCaml, can be found in Jane Street's work. Here is a few places where they use it to serialize data.
Here are some Quotes from the Jane Street talk about how s-expressions have helped the generic process of OCaml's AST and allow more powerful tooling to be made.
So again, this was, I think people elsewhere in the OCaml community realized that this was important, and that it was specifically important to rationalize the tooling story and to get rid of a lot of the technical depth that had built up on top of Camlp4. And I think one of the reasons that we were nowhere near realizing this was important is our own engagement with the tooling at that point was still fairly limited, right? We were still much more in the user stance.
That was an incredibly powerful tool for us, because it meant all sorts of cases where we would otherwise need to do a bunch of boring and error-prone hand-coding of these serializers and de-serializers, we could replace with a system where we just once carefully wrote out a thing that automatically generates those for us. So it’s great, what is there to complain about? We have a metaprogramming facility, we can extend the language in whatever way we want.
Even firms outside of jane street find use of this serialization format
With the previous work demonstrated in OCaml, we can formulate our own serialization and de-serialization of datastructures with the following signature.
class Serializeable a where
toSexp :: a -> Sexp.T
fromSexp :: Sexp.T -> Maybe a
Thus if we had a minimal ADT like:
data Expression = Int Integer
| Add Expression Expression
| Multiply Expression Expression
Then we can write the serialization as follows:
instance Serializable Expression where
toSexp (Int integer) = Sexp.Num integer
toSexp (Add e1 e2) = Sexp.list [Sexp.symbol "Add", toSexp e1, toSexp e2]
toSexp (Multiply e1 e2) = Sexp.list [Sexp.symbol "Multiply", toSexp e1, toSexp e2]
fromSexp (Sexp.Num i) =
Int i
fromSexp (Sexp.List [Sexp.Symbol "Add", e1, e2]) =
Add <$> fromSexp e1 <*> fromSexp e2
fromSexp (Sexp.List [Sexp.Symbol "Multiply", e1, e2]) =
Multiply <$> fromSexp e1 <*> fromSexp e2
fromSexp _ =
Nothing
This is by doing the structure by hand, however for a function like fromSexp
what we may wish to use instead is our Sexp library which can generate toSexp
and fromSexp
for forms like
module Structure where
data Add = Add Sexp.T Sexp.T
data Multiply = Multiply Sexp.T Sexp.T
newtype Int = Int Integer
-- Now we can write
instance Serializable Expression where
fromSexp (Structure.Add e1 e2 <- Sturcture.toAdd) =
Add <$> fromSexp e1 <*> fromSexp e2
fromSexp (Structure.Multiply e1 e2 <- Structure.toMultiply)
Multiply <$> fromSexp e1 <*> fromSexp e2
fromSexp (Structure.Int i <- Structure.toInt) =
Int i
fromSexp _ =
Nothing
In the future, we could write a library like Jane streets to automate this process entirely, thus making this completely automated.
This is, however, not the end of the work that we can do. The given Structure
example, showcases other abilities this format can bring to working on serialized data. Namely the Structure
example shows transforming a shallow layer to and from s-expressions. However what if we could write something like this
data ExpressionSubset
= Add (Sexp.T ExpressionSubset) (Sexp.T ExpressionSubset)
| Multiply (Sexp.T ExpressionSubset) (Sexp.T ExpressionSubset)
deriving Sexp
What this means is that if Sexp.T
can be paramartirized over some primitive, then we can have functions which operate over subsets of the full expression. Like
-- recurse allows the function to work through the arguments
-- of Add and Multiply to have this automatically run on those
linearize
:: ExpressionSubset -> (Sexp.T ExpressionSubset -> Sexp.T ExpressionSubset) -> ExpressionSusbset
linearize (Add (Sexp.Param (Add a b)) y) recurse =
lineraize (Add a (Sexp.Param (Add b y))) recurse
linearize (Add a b) recurse =
Add (recurse a) (recurse b)
lineraize (Multiply ...) recurse =
...
flattenMath :: Sexp.T ExpressionSubset -> Sexp.T ExpressionSubset
flattenMath = Sexp.onParamarizations linearize
toExpressionSubset :: Sexp.T () -> Sexp.T ExpressionSubset
toExpressionSubset = Sexp.convertForms toExpressionSubset
linerizeTransform :: Sexp.T () -> Sexp.T ()
linearizeTransform sexp =
toExpressionSubset sexp
|> flattenMath
|> toUnitSexpression
lineraizeTransform <$> Sexp.parse "(Add (Add 4 (Sub 5 2)) 7)"
-->
Just "(Add 4 (Add (Sub 5 2) 7))"
lineraizeTransform <$> Sexp.parse "(Add (Add 4 (Sub (Add (Add 5 8) 6) 2)) 7)"
-->
Just "(Add 4 (Add (Sub (Add 5 (Add 8 6)) 2) 7))"
With the following Sexpression library additions
module Sexpression where
-- | @toUnitSexpression@, serializes out the parameterization of the S-expression itself
-- Giving back an unparameterized form which is marked by the `Unit` type.
toUnitSexpression :: Serializeable a => Sexp.T a -> Sexp.T ()
-- | @convertForms@ takes an un-parameterized s-expression and tries to run the given
-- transformation. If the function returns a `Just` then the Sexp is converted to
-- Sexp.Param a, otherwise it stays as the origional s-expressions and tries recursively.
convertForms :: (Sexp.T () -> Maybe a) -> Sexp.T () -> Sexp.T a
-- | @onParamrizations@ takes a transformation on the parameterized s-expression
-- and runs it every instance of every paramaretized atom in the given S-expression.
-- This function can not be ran automatically inside the paramaretized atom, and thu
-- a recursive function to continue this process is handed to it.
-- In the end we get back out a new paramaretized S-expression being changed by the
-- transofmration that was passed.
onParamarizations
:: (a -> (Sexp.T a -> Sexp.T b) -> b)
-- ^ The parameterized transformation function.
-- it takes the original parameterization `a`
-- along with a function which recurses on the `Sexp.T a`
-- that it may carry inside the `a` form. to allow nested changes.
-- Finally it produces a new parameterization with the desired
-- modifications preformed on the parameterization.
-> Sexp.T a
-> Sexp.T b
However this kind of transformation needs a Sexp.T
which can take a paramatirzation to work effectively.
Thus what we have seen in this section is three fold:
As the Prior Work has hinted at, Serialization is not just useful for unifying various intereconnected parts of code, it also acts as a great communication mechanism between languages and across the wired. This is less important right this second, however in the future the Context
data structure that is threaded through the Berlin Pipeline will be an in memory LLVM data structure. For when this time comes, S-expressions serve as an easy way to communicate between the code Juvix (The programming language) can store and the code that Haskell will operate and send data to.