Try   HackMD
tags: S-Juvix Juvix juvix-meeting-notes

The Why of S-Juvix

Juvix as a language is type theoretic in nature. In particular, it
wishes to leverage dependent type theory to make safe and fast
validity-predicates. In order to properly achieve this goal, it is
important to make a high level language that is fit for the domain. Thus
the following properties become of the utmost importance for Juvix.

  1. Correctness
  2. Speed
  3. Ergonomic Contract Code

Points 1. and 2. are mostly covered by the dependent type theory,
with parts of 2. being an implementation detail of the QTT logic
with practical compromises in the IR to allow impure data structures.
Even point 3. is somewhat satisfied if we ignore the boilerplate that comes
in trying to use Haskell/Idris style abstractions when it comes to
validity predicates. (See the link as to where this falls apart quite a
bit)

However, as we begin to write an implementation with compiling to a type
theoretic IR, we have noticed a lot of trouble in development

  1. The lack of Core being able to extend itself, leaves the requirement
    for a more complex front pass

    • There are only a couple of ways we can do this in Haskell

      1. Write out concrete data structure between each pass, and
        write a small pass between each structure
      2. Use Tagless Final to make changing a big AST less
        cumbersome
      3. Use Trees that Grow to simulate small changes within an
        AST
      4. Give up on small changes, and bundle together n changes in
        one pass.
      5. Use Generics to automate the structural changes.
      6. Use S-expressions to change the structures
    • The issue with 1) is that it incurs a code cost of the length
      of the structure you are working over. This means that for any
      unchanged part of the syntax, we must map it to itself recursing
      down.

    • The issue with 2) is that it becomes O(n) (find link to
      oleg's FTP site) to do any transformation which requires
      matching on arguments. Along with complications of forming the
      observation type in Haskell, make it a less compelling case
      than other options.

    • The issue with 3) is that it only automates forming the
      data-type itself, not any pass. Thus to make a transformation,
      it takes as much code as 1). With 5) the situation may
      improve but we will discuss it when we discuss 5).

    • The issue with 4) is that we now have to bundle n individual
      passes into one macro pass. This greatly simplifies the issues
      with the other passes, as we have very fixed layers of data
      types that we can scrutinize, however this leaves any proving or
      reasoning about the passes themselves much harder.

    • The issue with 5) is that we are leveraging on a system that
      splits information evenly at the type level and the term level.
      We can demonstrate this phenomenon by using to and from
      directly to see the representation this would take even for some
      very basic terms.

      ​​​​​​​​-- Getting the term from a basic structure
      ​​​​​​​​λ> Juvix.Library.from (Other 3)
      ​​​​​​​​M1 {unM1 = L1 (L1 (L1 (M1 {unM1 = M1 {unM1 = K1 {unK1 = 3}}})))}
      
      ​​​​​​​​-- Checking the type information which allows us to get back to a
      ​​​​​​​​-- concrete type
      
      ​​​​​​​​λ> :t Juvix.Library.from (Other 3)
      ​​​​​​​​Juvix.Library.from (Other 3)
      ​​​​​​​​  :: Num a =>
      ​​​​​​​​     D1
      ​​​​​​​​       ('MetaData "BinderPlus" "Juvix.Contextify.Binders" "main" 'False)
      ​​​​​​​​       (((C1
      ​​​​​​​​         …
      
      ​​​​​​​​λ> :t Juvix.Library.to
      ​​​​​​​​Juvix.Library.to :: Generic a => Rep a x -> a
      
      ​​​​​​​​λ> :t Juvix.Library.from
      ​​​​​​​​Juvix.Library.from :: Generic a => a -> Rep a x
      

      Here we can see that for the constructor Other, we have 3 L1
      structures and two M1 structures wee have to get rid of before
      we can get to the data. Worse if we look at the type level, we
      can see that the entire ADT is reserved through this. So if we
      wanted to create a pass that got rid of the boilerplate,
      translating from one ML data type to another, we can not just
      call to and from as the types are incompatible, even if we
      changed the subset of cases we wish to change. Thus we'd have
      to reconstruct the M1's and L1's to get back a proper typed
      representation. This means that we have to compute on the Rep
      representation of the type during the pass, calling to once
      all cases have been run, effectively ending up with the same
      amount of boilerplate at 1.

      One approach that might work is combining this with 3) to
      properly generate these transformations. However this would take
      more template Haskell to properly experiment with.

    • The issue with 6) is that we've expressed the passes as a
      bona fide macro-expansion, therefore proving that they can
      indeed be done within a language. Thus instead of working on
      getting the language to a state where it can support such
      expressions and centralizing work upon this effort, we put extra
      complexity on getting a large frontend language to meetup with
      the small core. Thus, infrastructure becomes an issue as we need
      proper tools to properly conduct these transformations.

  2. There is a large disconnect between Core and an ergonomic frontend
    puts extra cost upon the transformation.

    • An illustrative point of this is by seeing how many different
      kinds of ideas each can expressed. As by the calculation in
      Appendix A we can see that Core expresses a total of 36
      different types of ideas that can be fulfilled, whereas as the
      Frontend language exhibits a total of 65 ideas that can be
      expressed. Further, the forms of the Frontend language is more
      complex, thus forcing us to map them down into a much smaller
      core.
  3. What Core outputs is not an effective compilation target.

    • We can see this in that many high level ideas go through core
      such as

      1. Lambdas

      2. Pattern Matching in global function definitions

      3. List style records (not put together in an array
        representation for compilation)

      4. Deconstructors expressed in terms of higher order functions.

    • What this means is that we have a core language that then
      requires yet another series of passes to properly compile to a
      suitable backend.

In order to tackle these complexities to development and to uphold the
ideals of Juvix Ι propose the language that will become Juvix in time,
S-Juvix.

S-Juvix is an un-typed Lisp with the following properties:

  1. It's minimal
  2. It's a fast compile target
  3. It can extend itself
  4. It is easy to apply type theory onto

point 2. combined with 1. covers the awkwardness of making a
compiler for it, while also satisfying fast output code.

Point 3. is a direct answer to the complexity issues we've faced in
development, while also pushing for the ability to make ergonomic
contract code.

Lastly, point 4. is of special importance as this is what will allow
the language to be typed by any direction what our type theory takes us.
It further lets our type theory try out various ideas without throwing a
wrench in the rest of the compiler.

With the preface to the language lay out, let us introduce the full
language standard

The S-Juvix Language

S-Juvix is a rather standard low level lisp, offering very little
builtin definitions. Below we will briefly describe all components of
the base language. For a more complete specification, view Appendix B.

  • The builtin data-structures of the language are:
    • Arrays
    • Lists
    • Hash-tables
    • Fixed length products
    • The Image
    • Modules (derived from Hash-tables)
    • Symbols
    • Numbers
  • The builtin evaluation functions are:
    • (eval <form>)
    • (macroexpand <form> <recursive-or-not>)
    • (apply <form> <list>)
  • The main definition functions are:
    • (def <name> <body>)
    • (lambda <binder> <body>)
    • (def-local <name> <body>)
  • The usual suspects with lisps are:
    • (gensym <name>)
    • (read <value>)
    • (eq <item-1> <item-2>)
    • (quote <form>)
    • (do <body>+)
  • Branching primitives:
    • (case <number-or-symbol> <body>+)

There are of course more functions (mainly in regards to manipulating
builtin data-structures), however we can notice that the language itself
is quite small. Even the given base forms are quite low level, with the
def forms lacking even optional arguments to take!

Thus to make a function that takes any amount of arguments, we'd have
to write.

(def factorial
    (lambda (x)
      (case (>= 0 x)
        :true  1
        :false (+ x (factorial (- x 1))))))

Which is not very ergonomic, as our main way to define a function is not
able to take any arguments, we can derive it with the functions derived
in Appendix D.

(defmacro defun (name arguments body)
  `(def ,name (lambda ,arguemnts ,body)))

(defmacro if (pred then else)
  `(case ,pred
     :true ,then
     :else ,else))

(defun factorial (x)
  (if (>= 0 x)
      1
      (+ x (factorial (- x 1)))))

Now, we can see that the factorial function is defined like we'd expect
in a high level language.

Even the function/macro defmacro is defined from macro which is
defined from def. Due to how this language can grow, we can from a
small core, derive a full blown ergonomic high level language.

Deriving Types in S-Juvix

Typing is traditionally done by making sequent calculus rules and
deriving the syntax one wants from there. Likewise we can apply typing
rules upon the primitives of this language. Thus if we can do this, then
all that remains to be seen, is how would we extend the language with
enough information to support explicit declarations and places for the
typechecker to insert type information.

Thus I present one example strategy for including type-signatures with
global definitions

(defmacro sig (name type)
  (let ((name-generated
          (gensym (symbol-append name `signature))))
    `(do
      (def ,name-generated ,type)

      (set-meta-data ,name
       (insert (meta-data ,name)
               :type
               ,name-generated))

       (set-meta-data ,name-generated
                      (insert (meta-data ,name-generated)
                              :siganture
                              :true))
       ,name-generated)))

In this example, we abuse the fact that each global definition has a
meta-data slot, we use this to refer to the new signature. We also
define the signature as a normal declaration. We can see an example
usage here

(sig factorial (-> int int))
(defun factorial (x)
  (if (>= 0 x)
      1
      (+ x (factorial (- x 1)))))

Thus, the syntax of the language can be extended properly with a typing
system that Juvix Core will manipulate.

More examples of this kind of extension can be found in Appendix C.

The S-Juvix Abstract Machine

Compilation Considerations

Appendix A Size Calculations

-- Core
λ> import qualified Generics.Deriving.ConNames as ConNames
λ> length (ConNames.conNames (undefined :: Term Int Int Int))
20
λ> length (ConNames.conNames (undefined :: RawGlobal Int Int Int))
4
λ> length (ConNames.conNames (undefined :: Elim Int Int Int))
5
λ> length (ConNames.conNames (undefined :: Pattern Int Int Int))
7
λ> 20 + 4 + 5 + 7
36

-- Frontend
λ> length (ConNames.conNames (undefined :: Expression))
25
λ> length (ConNames.conNames (undefined :: TopLevel))
10
λ> length (ConNames.conNames (undefined :: MatchLogicStart))
4
λ> length (ConNames.conNames (undefined :: NameSet Int))
2
λ> length (ConNames.conNames (undefined :: Header Int))
2
λ> length (ConNames.conNames (undefined :: Computation))
2
λ> length (ConNames.conNames (undefined :: Constant))
2
λ> length (ConNames.conNames (undefined :: Numb))
2
λ> length (ConNames.conNames (undefined :: Arg))
2
λ> length (ConNames.conNames (undefined :: GuardBody Int))
2
λ> length (ConNames.conNames (undefined :: Product))
3
λ> length (ConNames.conNames (undefined :: Adt))
2
λ> length (ConNames.conNames (undefined :: Name))
2
λ> length (ConNames.conNames (undefined :: Data))
2
λ> length (ConNames.conNames (undefined :: InfixDeclar))
3

λ> 3 + 2 + 2 + 2 + 3 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 4 + 10 + 25
65

Appendix B Base Language

;; Definition Facilities

(def <symbol> <body>)

;; we may want to unify all "marking" functions under some kind of set
;; system
(mark-def-as-macro <symbol>)

;; we may want a separate local def form. That or rely on inlining
;; being quite good.
(lambda <name> <body>)

;; May not be needed, but is handy
(def-local <name> <body>)

;; Builtin Types

list       (immutable)
array      (mutable)
hash-table (mutable)

;; Reader extensions are omitted

;; Image commands

(save-juvix-and-die <location>)

;; evaluation

(eval <form>)

(macroexpand <form> <recursive-or-not>)

(apply <form> <list>)

;; Module Manipulators

(in-module <quoted-name>)

(new-module <quoted-name>)

(meta-data <symbol>)

(set-meta-data <symbol> <value>) ; data should be a hashtable

(open <name>)

(include <name>)

;; Miscellanies Functions

(eq <item-1> <item-2>)

(garbage-collect)

(de-allocate <ref>)

(read <value>)

(gensym <name>)

(<application> <arg-1><arg-n>)

(do <body>+) ; traditionally called progn (CL) or begin (Scheme), or do (Haskell, Clojure)

(quote <form>) ; Literal constructors

;; choice

(case <thing> <body>+)

;; products co-products

(product <list-of-values>)

;; List (immutable)

(car <list>)

(cdr <list>)

(cons <ele> <list>)

(pair? <list>)

(nil? <list>)

nil

;; Array (mutable)

(elt <array> <ele>)

(set <array> <ele> <value>)

(allocate <array> <length> <initial-value>)

;; Hashtable (mutable)

(lookup <hashtable> <key>)

(insert <hashtable> <key> <element>)

(remove <hashtable> <key>)

Appendix C Type Theoretic Macros

  • We will be relying on Appendix D to give us enough definitions to
    write the following code with ease
;; Signatures

;; In reality we'd want to check if the sig already exists, and work
;; off that gensym, instead of defining a new symbol every time we try
;; to work this.
(defmacro sig (name type)
  (let ((name-generated
          (gensym (symbol-append name `signature))))
    `(do
      (def ,name-generated ,type)

      (set-meta-data ,name
       (insert (meta-data ,name)
               :type
               ,name-generated))

       (set-meta-data ,name-generated
                      (insert (meta-data ,name-generated)
                              :siganture
                              :true))
       ,name-generated)))


;; Need to derive this more explicitly, rather than just stating it

;; Examples of Sum types without explicit sum types

;; note we use cons explicitly here to demonstrate how we can
;; construct it, we wouldn't do that in practice due to safety
;; concerns. This is the exact same as categorical co-product is it
;; not?

(defadt foo ()
  (foo bart)
  (baz baz)
  (bar bar))

(def my-bar
    (make-bar <bar-record>))

---> (def my-bar (cons :bar <bar-record>))

(def my-foo
    (make-foo <foo-record>))

---> (def my-foo (cons :foo <bart-record>))

;; type checker assisted, so will give a warning about missing cases
(match my-bar
  (bar <bar-record>)  <body-with-bar-record>
  (foo <bart-record>) <body-with-bart-record>
  (baz <baz-record>)  <body-with-baz-record>)

--->
(at-compile-time-thus-not-in-generated-code
 (if (all-arguments-checked (constructors of-case))
     t
     (warning "cases missed......")))
;; this code shows up
(case (car my-bar)
  ;; symbols are O(1) compare, so basically integers when interned
  :bar (let <bind-bar-record>  <body-with-bar-record>)
  :foo (let <bind-bart-record> <body-with-bart-record>)
  :baz (let <bind-baz-record>  <body-with-baz-record>))

Appendix D Derivations

  • All the code is valid in this specification of Juvix, except for
    values that are related to the Juvix reader itself (thus, reader
    extensions and some literals).
    • All these exceptions will be noted, otherwise all code is
      compiling and extend Juvix itself
(def list
    (lambda (&rest arguments)
      arguments))

;; First macro facility
(def macro (lambda (name args body)
             (list (quote do)
               (list (quote def) name args body)
               (list (quote mark-def-as-macro) name)
               name)))

(mark-def-as-macro (quote macro))

;; defmacro next level of macro lever
(macro defmacro (lambda (name args body)
                  (list (quote macro)
                        (list (quote lambda) args body))))

(defmacro if (pred then else)
  (list (quote case) pred
        :true  then
        :false else))

(defmacro and (x y)
  (list (quote if)
        x
        (list (quote if)
              y
              :true
              :false)
        :false))

;; getting List function
(def foldr
    (lambda (f list acc)
      (if (nil? list)
          acc
          (f (car x) (foldr f (cdr xs) acc)))))

;; need this for quasi-quote
(def conc
    (lambda (xs ys)
      (foldr cons xs ys)))

;; And now the tricky bit, we want to implement `
;; We will have literals we will deal with, like
;; #(1 2 3 4) Array
;; Quasiquote formally works like this.
;;
;; Note 'x  = (quote x)
;;      `x  = (qquote x)
;;      ,x  = (unquote x)
;;      ,@x = (splice x)
;;
;; `x       = 'x
;; `,x      = x
;; `(x y)   = (cons `x `(y))
;; `(,x y)  = (cons  x `(y))
;; `(,@x y) = (conc  x `(y))
;;
;; The code will look ugly as I say (quote x) instead 'x, and Ι lack
;; the proper tools, which I'll make with this macro qquote! All of
;; this is the pains of bootstrapping yourself, and nothing more.
(defmacro qquote (x)
  ;; if we had cond, this would be a lot prettier
  (if (array? x)
      ;; array-literal and array-list are left as an exercise to the
      ;; reader
      (list (quote array-literal)
            (list (quote qquote)
                  (array-list x)))
      ;; Check if it's a hashtable literal
      (if (hashtable-literal? x)
          ;; hashtable-literal, and hash-list left as an exercise to
          ;; the reader.
          (list (quote qquote)
                (hash-list x))
          ;; next check if it's not a list... if so we're golden.
          (if (not (pair? x))
              (list (quote quote) x)
              ;; Check if we have the following case
              ;; `,x --> x
              (if (eq (quote unquote) (car x))
                  (car (cdr x))
                  ;; Check if we the following case
                  ;; `(,x y) --> (cons  x `(y))
                  (if (and (pair? (car x))
                           (eq (quote unquote) (car (car x))))
                      (list (quote cons)
                            (car (cdr (car x)))
                            (list (quote qquote)
                                  (cdr x)))
                      ;; Check if we the following case
                      ;; `(,@x y) --> (conc  x `(y))
                      (if (and (pair? (car x))
                               (eq (quote splice) (car (car x))))
                          (list (quote conc)
                                (car (cdr (car x)))
                                (list (quote (cdr x))))
                          ;; no other case thus it's just a list
                          ;; `(x y) --> (cons `x `(y))
                          (list (quote cons)
                                (list (quote qquote) (car x))
                                (list (quote qquote) (cdr x))))))))))


;; register the reader extension somehow. This is the one part that is not specified
(reader-extension qquote ...)

;; We'll likely make this along with and CPU instructions
(defmacro or (x y)
  `(if ,x
       :true
       (if ,y
           :true
           :false)))

(def group-by-2
    (lambda (xs)
      (def-local recursive
          (lambda (xs acc)
            ;; drop an element if list is not empty
            ;; an assert would be better here
            (if (or (nil? xs) (nil? (cdr xs)))
                acc
                (group-by-2 (cdr (cdr xs))
                            (cons (list (car xs) (car (cdr xs)))
                                  acc)))))

      (recursive xs nil)))

(defmacro cond (&body lists)
  (foldr (lambda (x ys)
           `(if ,(car x)
                ,(car (cdr x))
                ,ys))
         (group-by-2 lists)
         nil))