# Experimental Solidity Type Checker
## Regular Calls and Notes
- [Calls Tuesdays 2pm CEST](https://meet.ethereum.org/programming-language-research)
- [Call Agenda](https://notes.ethereum.org/ZXVLmVDnRm2tUAtkESPXjQ)
- Repos:
  - [Surface Language](https://github.com/rodrigogribeiro/solcore)
  - [Core Language](https://github.com/mbenke/hm-typecheck)
  - [Standard Library Snippets](https://github.com/ekpyron/solstdlib/)
## Links to Further Notes
- [Compile-time constant expression evaluation](https://notes.ethereum.org/adUDeHnzRHSp-r18JhLeGg)
- [Syntax Questions](https://hackmd.io/SyAf5-ZkTGmOsGO1BXZRmQ?view)
- [ABI Encoding/Decoding snippets](https://notes.ethereum.org/6YDiLpOFRha6bmm95n-5ZA)
## Aims
- Basic Idea: Hindley-Milner-Style type system; polymorphic types; type classes
- Smallest possible set of builtin constructs.
- Sufficient expressivity to reconstruct variants of current Solidity's types with in-language definitions, including reference types.
- No implicit conversions - but convenient explicit conversion functions that can be polymorphically defined in-language.
- Type classes and type class instantiations as replacement for overloading.
- Previously builtin behaviour can be reconstructed by in-language implementations of type classes and well-defined syntactic sugar.
- Goal is to preserve decidable type inference and principal types (although this is not taken as a hard requirement).
Special decorators such as `memory` should become (almost) ordinary type constructors; assignment behaviour should be described using appropriate instances of dedicated type classes such as
```
class ref : Ref[deref] {
    load : ref -> deref;
    store : ref -> deref -> Unit;
};
```
(this is explained in more detail later)
## Motivating examples
**TODO:** add examples
- [Ideas for in-language implementations and translation to Yul](https://notes.ethereum.org/NJrYt1xTS82zjWDzZzLRpA?both)
### Previous (outdated) Documents
- [References and Assignments](https://notes.ethereum.org/JlWbXMDcRyeG-jO7u3tgKg)
- [Old Experimental Solidity Type System Overview](https://notes.ethereum.org/J94bU_4QRHyoH7sb1kMxHA)
## The core language
The surface language is desugared into a small core language. The desugaring details are described later.
### Expressions and statements
Core expressions include variables, literals, constructors, function definitions and calls.
Even though no let-polymorphism is envisioned, the usual `let` expression has been kept, mostly for testing purposes.
```
data Expr
= ELam [Arg] Expr -- function \args -> expr
| ELet Name Expr Expr -- local definition: let name = expr1 in expr2
| EApp Expr Expr -- function call: f(arg)
| EVar Name -- variable
| ECon Name -- value constructor
| EInt Integer -- integer literal
| EBlock [Stmt String] -- desugared statements annotated with their source form
```
The last form of an expression is a block of statements - so that we can easily desugar a Solidity function whose body is a block into a corresponding expression.
In the current prototype the core language has only two kinds of statements: allocation of a local (stack) variable of a given type, and expression statements. Assignments desugar into appropriate applications of the overloaded `store` function.
## Types
### Monomorphic types
Monomorphic types are reduced to two forms only: fully applied type constructors (such as `Int` or `Pair[Int,Bool]`) and type variables. Function arrows are treated as yet another two-argument type constructor (although they obviously have special concrete syntax).
```
data Type = TCon Name [Type] | TVar Tyvar
deriving Eq
pattern (:->) a b = TCon "(->)" [a,b]
```
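For illustration, the same representation can be sketched in Python (a sketch of ours, not the prototype's Haskell code; the names `tcon`, `tvar`, `arrow` and the rendering are our own):

```python
# Types are either fully applied constructors TCon(name, args) or
# type variables TVar(name); arrows are the constructor "(->)".

def tcon(name, args=()):
    return ("TCon", name, tuple(args))

def tvar(name):
    return ("TVar", name)

def arrow(a, b):
    # function arrows as a two-argument type constructor
    return tcon("(->)", [a, b])

def show(t):
    # render a type in the document's concrete syntax
    if t[0] == "TVar":
        return t[1]
    _, name, args = t
    if name == "(->)":
        return "%s -> %s" % (show(args[0]), show(args[1]))
    if not args:
        return name
    return "%s[%s]" % (name, ",".join(show(a) for a in args))
```

For example, `show(arrow(tcon("Int"), tvar("a")))` renders as `Int -> a`.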
### Algebraic types
The language allows declaring (polymorphic) types such as tuples, lists and such, e.g.
```
type Option[a] = None | Some[a];
type Pair[a,b] = Pair[a,b];
type List[a] = Nil | Cons[a,List[a]];
```
We do not envisage using higher kinds; therefore type constructors are always fully applied. On the other hand, value constructors can be used in expressions in a manner similar to ordinary functions and can be partially applied.
### Type classes
One of the goals of the current prototype was handling overloading with type classes.
Type classes come in two flavours: single parameter type classes define sets (or properties) of types; for example they are able to express that some types support (overloaded) equality:
```
class a : Eq { eq : a -> a -> Bool };
instance Int : Eq;
instance Bool : Eq;
instance a : Eq => List[a] : Eq;
instance (a:Eq, b:Eq) => Pair[a,b] : Eq;
```
Note that the last two instances are qualified: the type of lists `List[a]` supports equality provided that the type of elements `a` supports it (and similarly for pairs).
In the presence of type classes, overloaded functions have qualified types, for example the function `elem` (checking whether some element belongs to a list using equality) works for types supporting equality
```
elem : ∀a.(a:Eq) => a -> List[a] -> Bool
```
Multi-parameter type classes (MPTC) define relations between types. While admittedly a more advanced concept, we need them to describe references - the fact that the type `X` is a reference to `Int` is a relation between them, and we may have more than one type being a reference to `Int` (for example types representing stack memory and storage). Furthermore, accessing elements of containers like arrays, and in particular mappings, will require multiple type parameters.
The power of multi-parameter type classes comes with a cost: type inference is more complicated and its decidability is much less obvious (in fact, it is undecidable without careful restriction on the instance forms).
Luckily, it seems that we do not need the full power of MPTC to achieve our goals, but only a limited form, where one parameter fully determines the others. For example, if the type `X` is a reference to `Int`, it cannot be a reference to any other type.
We call such a form of MPTC "Weak MPTC" and write (the concrete syntax is obviously tentative) `a:C[b,...,x]` to express that
the types `a,b,...,x` are in relation `C` and `a` fully determines `b,...,x` (that is, there is no other tuple beginning with `a` in the relation `C`); in `a:C[b,...,x]` the type `a` is the main argument and `b,...,x` are called *weak arguments*.
For example
```
class ref : Ref[deref] {
    load : ref -> deref;
    store : ref -> deref -> Unit;
};
```
expresses that the type `ref` can be considered a reference to `deref`, provided appropriate `load` and `store` functions are defined (obviously we may want to separate loading and storing into different classes such as `Loadable` and `Storable`, but keeping them together simplifies the presentation).
#### Class declarations
Class declarations have the form
```
class type_variable : C[args] {
    method declarations
}
```
where
- `args` is an (optional) list of type variables representing weak type arguments, absent for single-parameter type classes (such as `Eq`).
- method declarations have the form `name : type` and introduce overloaded global names of type `(type_variable : C[args]) => type`; for example `load` from the `Ref` class described above has type `ref:Ref[deref] => ref -> deref`.
#### Instance declarations
Instance declarations have the form
```
instance Context => type : C[args] {
    method definitions
}
```
where
- `Context` (the part before the `=>`) is a tuple of qualifiers of the form `tvar : D[as]`;
- the part after `=>` is called the instance head;
- `args` is an (optional) list of type variables representing weak type arguments, absent for single-parameter type classes (such as `Eq`);
- method definitions are needed for program execution, but not crucially important for typechecking, therefore are omitted in the current prototype.
For example, one might use the following classes and instances to model array indexing:
```
class a:IndexAccessible[index, baseType] {
    indexAccess : a -> index -> baseType;
};
type MemoryArray[a];
instance (a:MemoryBaseType) => MemoryArray[a]:IndexAccessible[Int, Memory[a]];
```
### Equality constraints
To ease the modelling of Weak MPTC, apart from class constraints, another kind of constraint is used internally: equality constraints (there is no plan to expose them to the user, but this could easily be done if needed).
For example, consider the following instance of the class `Ref`:
```
instance Memory[a] : Ref[a]
```
Here the type variable `a` occurs more than once in the instance head, which makes type inference (and instance resolution in particular) more difficult; therefore the instance is internally translated into an administrative normal form in which all variables in the instance head are distinct:
```
instance (a ~ b) => Memory[a] : Ref[b]
```
where the constraint `(a ~ b)` is satisfied if the same type is substituted for both `a` and `b`.
The idea is borrowed from Haskell - see for example [GHC User guide](https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/equality_constraints.html#equality-constraints).
To see why this transformation is needed, consider the following example:
```
instance Memory[a]: Ref[a] ;
mi : Memory[Int];
pragma log;
x1 = load mi;
```
without equality constraints, we get
```
load : (r:Ref[d]) => r -> d
load mi : (Memory[Int]:Ref[d]) => d
```
and then trying to generalise the type of `x1` fails with
```
Error:
no instance of Memory[Int]:Ref[d]
```
Of course we might try to enforce functional dependencies by other means, but so far equality constraints seem cleanest (see also [Understanding functional dependencies via constraint handling rules](https://www.cambridge.org/core/journals/journal-of-functional-programming/article/understanding-functional-dependencies-via-constraint-handling-rules/49E533CD7975431B5339456255DA9BE5)).
## Type Inference
We want type inference, i.e. the ability to omit types (e.g. of function arguments) and have them inferred. In the typechecking process, unspecified types are therefore represented by existential (unification) variables, which are then either instantiated by solving constraints using unification, or quantified when generalising the type of a toplevel entity (such as a function declaration), for example
```
id x = x
id :: ∀c.c -> c
```
The argument of the function `id` has no declared type and in fact can be of any type, which is expressed by the polymorphic type of `id`.
On the other hand, the body of the function often enforces the argument type even if it is not specified, e.g. in
```
mul : Int -> Int -> Int;
square x = mul x x;
```
the argument of `square` must be of type `Int`.
Another possibility is that the function body only partially constrains the types of its arguments, e.g. in
```
eq : (a:Eq) => a -> a -> Bool
refl x = eq x x
```
in contrast to `id`, the function `refl` does not work for all argument types, but only for those supporting equality, thus
```
refl : (a:Eq) => a -> Bool
```
### Instance resolution
When an overloaded function is applied to arguments of concrete types, its class constraints are instantiated to the types of these arguments, e.g.
```
refl 42 : (Int:Eq) => Bool
```
However, such types are not useful; either there is an `Eq` instance for `Int` and the expression has just the type `Bool`, or there isn't, and the expression is not type-correct.
But because instances can have contexts (i.e. be conditional), we may encounter situations such as
```
instance (a:Eq, b:Eq) => Pair[a,b] : Eq
nested : Pair[Int,Pair[Int,Bool]]
refl nested : (Pair[Int,Pair[Int,Bool]] : Eq) => Bool
```
where instance resolution is a complex process:
- `Pair[Int,Pair[Int,Bool]] : Eq` if `Int:Eq` and `Pair[Int,Bool]:Eq`
- `Pair[Int,Bool]:Eq` if `Int:Eq` and `Bool:Eq`
Given that `Int` and `Bool` belong to class `Eq`, in this case the context is resolved completely. To ensure that this process always terminates we need to guarantee that every subsequent step solves a smaller problem. For this it is enough to introduce a (quite natural) restriction that, in an instance declaration:
- the main type begins with a constructor
- all constraints in the instance context are applied only to variables
- all the variables in the instance head are distinct
For example the following instance declarations might lead to nontermination and are forbidden:
```
instance (a:Bar) => a:Foo
instance (List[a]:Foo) => a:Foo
instance (a:Foo,a:Bar) => Pair[a,a] : Foo // head variables not distinct
```
The termination proof sketch is included in the Implementation Details section.
## Context reduction
This step attempts to simplify constraints in the context, so that for example instead of the context
```
(Pair[a,b]:Eq, List[a]: Eq)
```
we get a simpler constraint set
```
(a:Eq, b:Eq)
```
When the process reaches a constraint on a base type (such as `Int:Eq`), the constraint is eliminated if the corresponding instance is present; otherwise an appropriate error is reported.
## Expressive power examples
### Uniform references
The goal was a uniform treatment of references to various storage areas (stack, memory, storage); the following example shows that this is indeed possible:
```
class ref : Ref[deref] {
    load : ref -> deref;
    store : ref -> deref -> Unit;
};
instance Stack[Int] : Ref[Int];
instance Memory[a]: Ref[a] ;
siExample : Stack[Int];
instance (ra:Ref[a], rb:Ref[b]) => Pair[ra,rb] : Ref[Pair[a,b]];
mi = newMRef 42;
x1 = load mi;
x2 = load siExample;
x4 = pair mi siExample;
x5 = load x4;
x6 = store mi x2;
copy from to = store to (load from);
x7 = copy siExample mi;
```
**TODO:** More examples
## Desugaring statements
For now, we have the following kinds of statements:
* allocation `let x : t` - introduces a variable of type `Stack[t]`
* expression `e`
* assignment `e1 = e2` (note that LHS can be a complex expression such as `a[i].first`)
Only assignments are subject to nontrivial desugaring; the left-hand and right-hand sides are treated differently.
An assignment $e_1 = e_2$ desugars into $store(L[\![e_1]\!], R[\![e_2]\!])$, where $L$ and $R$ are desugaring schemes for the left- and right-hand side respectively.
### LHS desugaring
Numeric literals are not l-values.
Variables on the left stand for the cells to be assigned to, hence are kept unchanged: `L[x] = x`.
For function calls, the arguments are desugared using R but the function itself is desugared using L. For example if `f : int -> stackref(int)`, then `f(1) = 2` desugars to `store(f(1), 2)`
Member access: `L[a.f] = L[f](L[a])`; e.g. `L[a.first] = first(a)`
Chained member access: `L[x.first.second] = second(first(x))`
Arrays: `L[a[i]] = indexAccess R[a] R[i]`
### RHS desugaring
Variables on the right stand for the values they hold, hence
`R[x] = load(x)`.
In a function call `f(x)` we do not want to add `load` to `f`, but do want to add one to `x`; hence
`R[e1(e2)] = L[e1](R[e2])`
Member access: `R[e1.e2] = load(L[e2](L[e1])) `
E.g. `R[x.first] = load(first(x))`
**Note:** chained member access such as `x.first.second` is a bit more complicated:
`R[(x.first).second] = load(second(L[x.first])) = load(second(first(x))))`
Arrays: `R[a[i]] = load(L[a[i]])`
Other forms (literals, constructors, lambda) are kept unchanged.
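The desugaring schemes above can be sketched in Python over a tiny expression AST (our own illustrative encoding, not part of the prototype; `load`, `store` and `indexAccess` are the overloaded functions from the text):

```python
# Expressions: ("var", x), ("lit", n), ("app", f, arg),
# ("member", obj, field), ("index", arr, i)

def var(x): return ("var", x)
def lit(n): return ("lit", n)
def app(f, *args):
    # curried application: app(f, a, b) = ("app", ("app", f, a), b)
    e = f
    for a in args:
        e = ("app", e, a)
    return e
def member(obj, field): return ("member", obj, field)
def index(arr, i): return ("index", arr, i)

def desugar_lhs(e):
    tag = e[0]
    if tag == "var":
        return e                                 # L[x] = x
    if tag == "app":                             # L[e1(e2)] = L[e1](R[e2])
        return ("app", desugar_lhs(e[1]), desugar_rhs(e[2]))
    if tag == "member":                          # L[a.f] = f(L[a])
        return app(var(e[2]), desugar_lhs(e[1]))
    if tag == "index":                           # L[a[i]] = indexAccess R[a] R[i]
        return app(var("indexAccess"), desugar_rhs(e[1]), desugar_rhs(e[2]))
    raise ValueError("not an l-value: " + tag)   # e.g. numeric literals

def desugar_rhs(e):
    tag = e[0]
    if tag == "lit":
        return e
    if tag == "app":                             # R[e1(e2)] = L[e1](R[e2])
        return ("app", desugar_lhs(e[1]), desugar_rhs(e[2]))
    # variables, member and index access load from the corresponding l-value
    return app(var("load"), desugar_lhs(e))

def desugar_assign(l, r):                        # e1 = e2 ~> store(L[e1], R[e2])
    return app(var("store"), desugar_lhs(l), desugar_rhs(r))

def render(e):
    # after desugaring only var/lit/app remain
    if e[0] == "var": return e[1]
    if e[0] == "lit": return str(e[1])
    f, args = e, []
    while f[0] == "app":
        args.append(f[2]); f = f[1]
    return render(f) + "(" + ", ".join(render(a) for a in reversed(args)) + ")"
```

For example, `render(desugar_assign(var("x"), var("y")))` yields `store(x, load(y))`, matching the desugaring table in the examples below.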
### Examples and Discussion
#### Memory Variables
The code for copying between memory locations:
```
let x : Memory[Int];
let y : Memory[Int];
*x = 42;
x = y;
// *x = y does not work, but the following work
*x = y.load;
```
desugars as follows:
```
--------------------------------------------------------------
alloc x : Memory[Int] ~> x : Stack[Memory[Int]]
alloc y : Memory[Int] ~> y : Stack[Memory[Int]]
* x = 42 ~> store (load x) 42
x = y ~> store x (load y)
* x = y . load ~> store (load x) (load (load y))
* x = * y ~> store (load x) (load (load y))
// * x = y ~> store (load x) (load y)
```
OTOH, we might write and use a function for that:
```
// copy : ∀a b d.(a:Ref[d], b:Ref[d]) => a -> b -> Unit
copy from to = store to (load from);
let x : Memory[Int];
let y : Memory[Int];
copy x y;
```
##### Pairs
This seems to work well for ordinary nested pairs:
```
let q : Pair[Int,Pair[Int,Bool]];
q.second.first = 42;
x = q.second.first;
```
Memory pairs seem more tricky:
```
let x : Memory[Int];
let y : Memory[Int];
*x = 42;
x = y;
let p : Pair[Int, Bool];
let mp : Memory[Pair[Int, Bool]];
p.first = 1;
p.second = False;
p = pair 1 False;
*mp = p;
*mp.first = 42;
*x = mp.load.first;
// *x = mp.first does not work, but we may write this as *x = mp->first
```
At first this looks a bit awkward, but in fact it is not really different from copying integer values between memory locations, which we had to write as `*x = y.load`. We might also borrow another bit of syntactic sugar from C/C++ and write `mp->first` instead of `mp.load.first`.
Working with nested pairs follows similar patterns:
```
// Nested pair in contiguous memory area:
let mq : Memory[Pair[Int, Pair[Int,Bool]]];
// this is a problem - mq.second has wrong type
// *mq.second = mp
// these seem to work though
*mq.second = mp.load;
*mq.second = p;
*x = mq.load.second.first;
// Pair containing a reference to another pair:
let mpm : Memory[Pair[Int, Memory[Pair[Int,Bool]]]];
*mpm.second.load.first = 1;
```
##### Arrays
```
arrays = {
    let array : MemoryArray[Int];
    let x : Int;
    array[1] = 42;
    array[1] = x;
    x = array[1];
}
```
this desugars as follows
```
alloc array : MemoryArray[Int] ~> array : Stack[MemoryArray[Int]]
alloc x : Int ~> x : Stack[Int]
array [1] = 42 ~> store (indexAccess (load array) 1) 42
array [1] = x ~> store (indexAccess (load array) 1) (load x)
x = array [1] ~> store x (load (indexAccess (load array) 1))
```
## Implementation details
A program in our core language is a sequence of declarations; of these the most interesting are function definitions of the form
```
f arg1 ... argn = expression
```
Conventional function definitions of the form
```
function f(arg1, ... argn) { statements }
```
can be easily desugared into this form.
Type inference is conducted in the context of an environment containing information about the known variables, constructors, types, classes and their instances. After checking a declaration, information about the declared entity is added to the environment.
### Typechecking declarations
Typechecking a top-level function declaration
```
f args = body
```
consists of the following steps:
- save the current environment
- add types of the arguments to the environment; for arguments without a declared type, create a fresh type variable (unknown)
- infer the type of the body, which yields
- a qualified type of the form `constraints => type`
- a substitution of types for the unknown variables
- apply the substitution to the argument types
- restore the saved environment
- generalize the obtained function type, i.e.
- reduce the context (constraints)
- add a universal quantifier for every type variable not occurring free in the environment (so that, for example, the function `id x = x` gets the polymorphic type `forall a.a -> a`)
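The generalisation step can be sketched as follows (an illustrative Python sketch of ours; `ftv` computes free type variables over tuple-encoded types, and the environment is simplified to a list of types):

```python
def tcon(name, args=()): return ("TCon", name, tuple(args))
def tvar(n): return ("TVar", n)
def arrow(a, b): return tcon("(->)", [a, b])

def ftv(t):
    # free type variables of a type
    if t[0] == "TVar":
        return {t[1]}
    vs = set()
    for a in t[2]:
        vs |= ftv(a)
    return vs

def generalise(env_types, t):
    # quantify every variable of t not occurring free in the environment
    env_vars = set().union(*map(ftv, env_types)) if env_types else set()
    return (sorted(ftv(t) - env_vars), t)
```

For `id x = x` in an empty environment this quantifies the single argument variable, giving `forall a. a -> a`; a variable pinned by the environment is left unquantified.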
### Typechecking expressions
Because of the small number of constructs in our core language, typechecking expressions is not overly complicated: the most interesting case is function application (function call).
To infer type for the expression `e1(e2)`:
- infer the (qualified) types `ps => t1` and `qs => t2` for `e1` and `e2` respectively
- create a fresh type variable `tr` for the result type
- unify `t1` with `t2 -> tr`, amending the current substitution
- return the qualified type `ps, qs => tr`
Another interesting case is inferring a type for a lambda (function) whose argument types are not given, e.g. `\x y z -> x z (y z)`. In this case we
- introduce fresh type variables (unknowns) for the arguments
- typecheck the function body in the extended environment (this can place constraints on the argument types, solving which extends the current substitution)
- return the resulting function type with the current substitution applied.
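The application case can be sketched like this (a minimal Python sketch with unification and an occurs check; the representation and helper names are ours, not the prototype's):

```python
def tvar(n): return ("TVar", n)
def tcon(name, args=()): return ("TCon", name, tuple(args))
def arrow(a, b): return tcon("(->)", [a, b])

def walk(t, s):
    # follow substitution links for a variable (shallow)
    while t[0] == "TVar" and t[1] in s:
        t = s[t[1]]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t[0] == "TVar":
        return t[1] == v
    return any(occurs(v, a, s) for a in t[2])

def unify(t1, t2, s):
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1[0] == "TVar":
        if t1 == t2:
            return s
        if occurs(t1[1], t2, s):
            raise TypeError("occurs check failed")
        return {**s, t1[1]: t2}
    if t2[0] == "TVar":
        return unify(t2, t1, s)
    if t1[1] != t2[1] or len(t1[2]) != len(t2[2]):
        raise TypeError("cannot unify")
    for a, b in zip(t1[2], t2[2]):
        s = unify(a, b, s)
    return s

fresh_counter = [0]
def fresh():
    fresh_counter[0] += 1
    return tvar("$t%d" % fresh_counter[0])

def infer_app(t_fun, t_arg, s):
    # create a fresh result variable tr and unify t_fun with t_arg -> tr
    tr = fresh()
    s = unify(t_fun, arrow(t_arg, tr), s)
    return walk(tr, s), s
```

Applying a function of type `Int -> Bool` to an `Int` argument resolves the fresh result variable to `Bool`; the amended substitution is threaded through the rest of inference.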
## Context reduction and simplification
This involves two main operations on the context (described in more detail in the following subsections):
1. Reducing constraints to their head-normal forms (function `toHnfs`), so that for example instead of the context
```
(Pair[a,b]:Eq, List[a]: Eq)
```
we get a simpler constraint set
```
(a:Eq, b:Eq)
```
2. Simplifying equality constraints - function `simplifyEqualities`
This step eliminates equality constraints from the context, e.g. replacing
```
{ d:Ref[Int], d ~ Memory[Int] }
```
by
```
{ Memory[Int]:Ref[Int] }
```
Finally, we need to eliminate redundant constraints from the context. The current prototype does not implement superclasses, but in their presence (for example `class a:Eq => a:Ord`), we might simplify the context `a:Eq, a:Ord` to `a:Ord` since every instance of `Ord` must, by definition, be an instance of `Eq`.
Without superclasses, we only need to eliminate duplicates.
### Context reduction
Because our Weak MPTCs introduce equality constraints, step 1 is always preceded by solving such constraints (via unification) - function `simplifyEqualities`. After this procedure, the remaining constraints (under the current substitution) are reduced one by one, using the function `toHnf`.
Constraints of the form `a:D`, where `a` is a type variable, are considered to be in head normal form and are therefore not reducible.
For every constraint of the form `t0:D0`, where `t0 = T[arg1,...,argn]`, `T` is a type constructor and `D0` is a class, we search for an instance `(a1:C1,...,am:Cm) => t : D0` whose head `t` matches `t0` under a substitution `phi` (there can be at most one such instance, because instance declarations cannot overlap). If successful, we replace the original constraint by `phi((C1,...,Cm))`; otherwise we report an error: `no instance D0 for t0`.
For example, given the instances
```
instance Int:Eq
instance (a:Eq) => List[a]:Eq
```
the constraint `List[List[Int]]: Eq` goes through following reduction steps:
```
{ List[List[Int]] : Eq }
{ List[Int] : Eq }
{ Int : Eq }
{}
```
On the other hand `Int -> Int : Eq` leads to an error - there is no `Eq` instance for `->`.
This process can (at least temporarily) increase both the number and the total size of the constraints, e.g. with
```
instance (a:Eq, a:Ord, a:Show) => Set[a]:Show
```
`Set[u]:Show` reduces to `{ u : Eq, u : Ord, u : Show }`.
However, note that every step replaces a constraint with some number of constraints of smaller depth. Hence the process terminates, which can be proved using a multiset ordering.
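The reduction loop described above can be sketched as follows (Python, single-parameter classes only; the instance table encodes the `Eq` instances from the example — our own illustrative encoding, not the prototype's `toHnf`):

```python
def tcon(name, args=()): return ("TCon", name, tuple(args))
def tvar(n): return ("TVar", n)

def match(pat, t, s):
    # one-way matching: only variables in the instance head get bound
    if pat[0] == "TVar":
        if pat[1] in s:
            return s if s[pat[1]] == t else None
        return {**s, pat[1]: t}
    if t[0] != "TCon" or pat[1] != t[1] or len(pat[2]) != len(t[2]):
        return None
    for p, a in zip(pat[2], t[2]):
        s = match(p, a, s)
        if s is None:
            return None
    return s

def subst(t, s):
    if t[0] == "TVar":
        return s.get(t[1], t)
    return ("TCon", t[1], tuple(subst(a, s) for a in t[2]))

# instances as (context, class, head type):
#   instance Int:Eq
#   instance (a:Eq) => List[a]:Eq
INSTANCES = [
    ([], "Eq", tcon("Int")),
    ([("Eq", tvar("a"))], "Eq", tcon("List", [tvar("a")])),
]

def to_hnf(cls, t):
    if t[0] == "TVar":                    # already in head normal form
        return [(cls, t)]
    for ctx, icls, head in INSTANCES:
        if icls != cls:
            continue
        s = match(head, t, {})
        if s is not None:                 # at most one instance can match
            out = []
            for ccls, ct in ctx:
                out += to_hnf(ccls, subst(ct, s))
            return out
    raise TypeError("no instance of %s for %s" % (cls, t[1]))
```

Running this on `List[List[Int]]:Eq` walks exactly the reduction steps listed above and ends with the empty constraint set.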
### Simplifying equality constraints
For simple type classes the context reduction described above might be sufficient. However, in the presence of Weak MPTC and the equality constraints introduced to handle them, things get more complicated; consider for example
```
class a:MemoryBaseType {
    stride : Itself[a] -> Int;
};
instance Int : MemoryBaseType;
class a:IndexAccessible[baseType] {
    indexAccess : a -> Int -> baseType;
};
instance (a:MemoryBaseType) => MemoryArray[a]:IndexAccessible[Memory[a]];
array : MemoryArray[Int];
x44 = store (indexAccess array 1) 42;
```
While inferring the type of `x44` we need to reduce the context
```
{d3:Ref[Int],MemoryArray[Int]:IndexAccessible[d3]}
```
The first constraint is already in head normal form, while the second can be reduced using the instance
```
instance (a:MemoryBaseType) => MemoryArray[a]:IndexAccessible[Memory[a]];
```
the administrative nf of which is
```
instance (a:MemoryBaseType, b ~ Memory[a]) => MemoryArray[a]:IndexAccessible[b];
```
which leads us to the following reduced context (under the substitution `[a:=Int, b:=d3]`)
```
{ d3:Ref[Int], d3 ~ Memory[Int] }
```
The `simplifyEqualities` step simplifies this context to
```
{ Memory[Int] : Ref[Int] }
```
Now the subsequent `toHnfs` step can use the instance
```
instance Memory[a] : Ref[a]
```
administratively equivalent to
```
instance (c ~ a) => Memory[a] : Ref[c]
```
applying which leads to
```
{Int ~ Int}
```
which gets simplified away.
**TODO:** discuss how this influences termination.
Can equality constraints make the measure grow?
```
{ a ~ Pair[b,c], c ~ b, c ~ Pair[d,e], Memory[a]:Ref[e] }
```
Luckily loops like
```
{ (a ~ List[b]), b ~ List[a] }
```
get caught by unification.
## Monomorphisation (Work in Progress)
Our language allows parametrically polymorphic functions. There are two common ways of implementing such functions:
- allow polymorphic functions to be used only on boxed values (essentially pointers) - this is the approach taken in Haskell (or, more precisely, GHC);
- monomorphisation: create specialised versions of polymorphic function for all concrete types they are going to be used on.
**TODO:** discuss relative merits of both approaches.
(Note: perhaps a mixed approach is also possible: use the polymorphic version on boxed types, specialise for unboxed ones.)
On the other hand, we also support ad hoc polymorphism in the form of type classes (indeed they are one of the pillars of the type system). Again, there are two common ways of implementing them:
- dictionary passing - an overloaded function is augmented with additional parameters representing dictionaries containing implementations of all needed class methods
- monomorphisation, described above.
Monomorphisation is a solution to both of the above problems, hence it seems a natural candidate for the current implementation.
### General approach
Starting from a number of entrypoints (such as the public functions of a Solidity contract), we rewrite the program, creating specialised versions of the used functions. In our prototype, we assume a single entrypoint called `main`.
Invariant: a translated function always has concrete argument and result types.
A polymorphic function `f : forall a1 ... an.t` used at instantiation `[a1:=T1,...,an:=Tn]` is specialised to `f<T1,...,Tn>` (where `<...>`, possibly mangled, becomes part of the new function name).
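The naming scheme might be sketched as follows (Python; concrete monomorphic types are encoded as `(name, args)` pairs, and `mangle` is a hypothetical helper of ours, not the prototype's actual mangling):

```python
def show(t):
    # render a concrete (variable-free) type
    name, args = t
    if name == "(->)":
        return "%s -> %s" % (show(args[0]), show(args[1]))
    if not args:
        return name
    return "%s[%s]" % (name, ",".join(show(a) for a in args))

def mangle(fname, type_args):
    # f used at instantiation [a1:=T1,...,an:=Tn] becomes f<T1,...,Tn>
    return "%s<%s>" % (fname, ",".join(show(t) for t in type_args))
```

For example, `id` used at `Int -> Int` becomes `id<Int -> Int>`, matching the specialised names appearing in the examples below.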
Monomorphisation takes the source program and the resolution environment, which maps overloaded identifiers to their variants at the respective types. The resolution environment
```
type ResolutionEnv = Map Name [Resolution]
type Resolution = (Type, Expr)
```
is essentially a set of definitions; it relates a globally defined identifier, along with its type, to a source expression: the right-hand-side of its definition.
The original algorithm monomorphises only overloaded functions (such as equality), while parametrically polymorphic functions (such as `map`) are left untouched. To monomorphise such functions as well, we can add them to the resolution environment too (although they would have a unique resolution).
Monomorphisation works definition by definition, expression by expression, rewriting source expressions into target expressions and possibly updating the resolution environment. It recursively replaces all overloaded identifiers with whatever they resolve to in the environment, until no overloading is left.
This algorithm is based on <https://okmij.org/ftp/Computation/typeclass.html#inlining>, which in turn is based on work of Stefan Kaes ([article: Parametric overloading in polymorphic programming languages](https://link.springer.com/chapter/10.1007/3-540-19027-9_9) and [dissertation: Parametrischer Polymorphismus, Überladungen und Konversionen](https://tuprints.ulb.tu-darmstadt.de/epda/000544/))
A side effect of this is that functions get inlined. Ideas for non-inlining specialisation are outlined below.
### Examples of inlining specialisation
#### Polymorphic identity
```
id x = x;
main = id id 42
```
becomes
```
main = (\(x:Int -> Int) -> x) (\(x:Int) -> x) 42
```
#### Singleton list
```
singleton x = Cons x Nil
main = singleton 42
```
becomes
```
main = (\(x:Int) -> Cons<Int> x Nil<Int>) 42
```
#### Pairs:
```
type Pair [a, b] = Pair [a, b];
primFst : Pair [a, b] -> a;
fst p = primFst p;
main = fst (Pair 1 True)
```
becomes
```
main = (\(p:Pair[Int, Bool]) -> primFst p) (Pair<Int,Bool> 1 True)
```
(Note: `primFst` and primitives in general need some more thought.)
#### Simple classes
```
primMap : (a -> b) -> List[a] ->List[b];
primNegInt : Int -> Int;
map = primMap;
singleton x = Cons x Nil;
class a:Neg { neg : a -> a};
instance Bool:Neg { neg = not };
instance Int:Neg { neg = primNegInt };
instance (a:Neg) => List[a]:Neg { neg = map neg };
main = neg (singleton 42);
```
becomes
```
main = primMap primNegInt ((\(x:Int) -> Cons<Int> x Nil<Int>) 42)
```
while
```
type Pair[a,b] = Pair[a,b];
primNegInt : Int -> Int;
class a:Neg { neg : a -> a};
instance Bool:Neg { neg = not };
instance Int:Neg { neg = primNegInt };
instance (a:Neg, b:Neg) => Pair[a,b]:Neg {
neg p = Pair (neg (fst p)) (neg (snd p))
};
main = neg (Pair 42 True);
```
becomes
```
main = (\(p:Pair[Int, Bool]) ->
    Pair<Int,Bool>
        (primNegInt (fst p))
        (not (snd p)))
    (Pair<Int,Bool> 42 True)
```
#### MPTC - References
```
primMloadInt : M[Int] -> Int;
primMstoreInt : M[Int] -> Int -> Unit;
class r:Ref[d] { load: r -> d; store: r -> d -> Unit};
instance M[Int] : Ref[Int] { load = primMloadInt; store = primMstoreInt };
mi : M[Int];
main = load mi;
```
becomes
```
main = primMloadInt mi
```
while `main = store mi 42;` gets specialised to `primMstoreInt mi 42`
The next example is more challenging:
```
primMloadInt : M[Int] -> Int;
primMstoreInt : M[Int] -> Int -> Unit;
primLoadInt : Stack[Int] -> Int;
primStoreInt : Stack[Int] -> Int -> Unit;
class r:Ref[d] { load: r -> d; store: r -> d -> Unit};
instance M[Int] : Ref[Int] { load = primMloadInt; store = primMstoreInt };
instance Stack[Int] : Ref[Int] { load = primLoadInt; store = primStoreInt };
mi : M[Int];
si : Stack[Int];
// copy : ∀a b c.(b:Ref[a], c:Ref[a]) => b -> c -> Unit
copy from to = store to (load from);
main = copy si mi;
```
The problem here is that the resolution for `copy`:
```
copy : a -> b -> Unit = \from to -> store to (load from)
```
does not mention the intermediate type (in this case `Int`), and we fail to find proper resolutions for `load` and `store`.
Luckily, by using unification instead of matching when searching the resolution environment, we can recover enough information to get
```
main = (\(from:Stack[Int]) (to:M[Int]) -> primMstoreInt to (primLoadInt from)) si mi
```
This works even if the types of `si` and `mi` do not mention `Int` e.g.
```
type SI;
type MI;
primMloadInt : MI -> Int;
primMstoreInt : MI -> Int -> Unit;
primLoadInt : SI -> Int;
primStoreInt : SI -> Int -> Unit;
instance MI : Ref[Int] { load = primMloadInt; store = primMstoreInt };
instance SI : Ref[Int] { load = primLoadInt; store = primStoreInt };
mi : MI;
si : SI;
copy from to = store to (load from);
main = copy si mi;
```
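The difference between matching and unification when searching the resolution environment can be sketched as follows (Python; the entries encode the `load` resolutions from the example, and everything here is our own illustrative encoding):

```python
def tvar(n): return ("TVar", n)
def tcon(name, args=()): return ("TCon", name, tuple(args))
def fn(a, b): return tcon("(->)", [a, b])

def match(pat, t, s):
    # one-way: only variables from the resolution entry may be bound
    if pat[0] == "TVar":
        if pat[1] in s:
            return s if s[pat[1]] == t else None
        return {**s, pat[1]: t}
    if t[0] == "TVar" or pat[1] != t[1] or len(pat[2]) != len(t[2]):
        return None
    for p, a in zip(pat[2], t[2]):
        s = match(p, a, s)
        if s is None:
            return None
    return s

def unify(a, b, s):
    # two-way: variables on either side may be bound
    # (no occurs check needed for this small example)
    def walk(t):
        while t[0] == "TVar" and t[1] in s:
            t = s[t[1]]
        return t
    a, b = walk(a), walk(b)
    if a == b:
        return s
    if a[0] == "TVar":
        return {**s, a[1]: b}
    if b[0] == "TVar":
        return unify(b, a, s)
    if a[1] != b[1] or len(a[2]) != len(b[2]):
        return None
    for x, y in zip(a[2], b[2]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

# resolution entries for load (right-hand sides omitted):
#   load : SI -> Int   and   load : MI -> Int
RESOLUTIONS = [fn(tcon("SI"), tcon("Int")), fn(tcon("MI"), tcon("Int"))]

def resolve(wanted, use_unification):
    for rt in RESOLUTIONS:
        s = unify(rt, wanted, {}) if use_unification else match(rt, wanted, {})
        if s is not None:
            return rt
    return None
```

Looking up `load` at the partially known type `SI -> t0` fails with one-way matching (the unknown `t0` cannot be matched against `Int`), while unification binds `t0 := Int` and finds the `SI` entry.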
### Examples (noninlining)
Note: this is just an idea, not implemented yet.
Convention: names starting with `$`, such as `$addInt`, are considered primitive.
#### Parametric polymorphism
##### Polymorphic identity:
```
id (x:a) = x
main = id 42
```
becomes
```
id<Int> (x:Int) = x
main = id<Int> 42
```
This also works for polymorphic functions applied to polymorphic functions:
```
id (x:a) = x
main = id id 42
```
becomes
```
id<Int> (x:Int) = x
main = id<Int -> Int> id<Int> 42
```
##### singleton:
```
singleton x = Cons x Nil
main = singleton 42
```
becomes
```
singleton<Int> x = Cons<Int> x Nil<Int>
main = singleton<Int> 42
```
##### Pairs:
```
pair : a -> b -> Pair[a,b];
fst : Pair[a,b] -> a;
main = fst (pair 1 True);
```
becomes
```
pair<Int,Bool> : Int -> Bool -> Pair[Int,Bool]
fst<Int,Bool> : Pair[Int,Bool] -> Int
main = fst<Int,Bool> (pair<Int,Bool> 1 True)
```
#### Simple classes
```
map : (a -> b) -> List[a] ->List[b]
class a:Neg { neg : a -> a}
instance Bool:Neg { neg = not }
instance Int:Neg { neg = $negInt }
instance (a:Neg) => List[a]:Neg { neg = map (neg:a->a) }
main = neg (singleton 42)
```
becomes
```
neg<Bool> = not
neg<Int> = $negInt
neg<ListInt> = map<Int,Int> neg<Int>
main = neg<ListInt> (singleton<Int> 42)
```
#### Addition
```
class a:Add { add : a -> a -> a; zero : a}
instance Int:Add { add = $addInt; zero = 0 }
instance (a:Add, b:Add) => Pair[a,b] : Add {
add p q = pair (add (fst p) (fst q)) (add (snd p) (snd q));
zero = pair (zero:a) (zero:b)
}
sump : (a:Add) => Pair[a,a] -> a
sump t = add (fst t) (snd t)
main (x:Int) = sump (add (pair x x) (pair 1 1))
```
becomes
```
add<Int> = $addInt
zero<Int> = 0
sump<Int> (t:Pair[Int,Int]) = add<Int> (fst<Int,Int> t) (snd<Int,Int> t)
main (x:Int) = sump<Int> (add<Int,Int> (pair<Int,Int> x x) (pair<Int,Int> 1 1))
```
Variation: consider
```
sump : (a:Add) => Pair[a,a] -> a
sump t = add (add (fst t) (snd t)) zero
```
the difference is that we now use two methods from the `Add` class (TODO: check that this works out)
#### References
Weak MPTC by themselves do not seem to pose much of a problem:
```
class r:Ref[d] { load: r -> d; store: r -> d -> Unit};
instance M[Int] : Ref[Int];
mi : M[Int];
main = load mi;
```
becomes
```
load<M[Int],Int> : M[Int] -> Int
main = load<M[Int],Int> mi
```
but what to do with the following example?
```
class r:Ref[d] { load: r -> d; store: r -> d -> Unit};
instance M[Int] : Ref[Int];
instance Stack[Int] : Ref[Int];
mi : M[Int];
si : Stack[Int];
// copy : ∀a b c.(b:Ref[a], c:Ref[a]) => b -> c -> Unit
copy from to = store to (load from);
main = copy si mi;
```
This might become
```
copy<Int,Stack[Int],M[Int]> (from:Stack[Int]) (to:M[Int]) =
    store<M[Int],Int> to (load<Stack[Int],Int> from)
main = copy<Int,Stack[Int],M[Int]> si mi;
```
## Type simplification
The Core language has only very simple types: unit, int, sums and products. Hence the next step is to translate algebraic datatypes to these simple types.
```
type Color = Red | Green | Blue
```
may become `unit + (unit + unit)` and the constructors are translated as follows
```
Red ~> inl ()
Green ~> inr(inl ())
Blue ~> inr(inr ())
```
(an alternative might be to translate enumeration types to `int`, but we want to describe the general process here).
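The constructor translation for enumerations can be sketched as a small Python function (our own sketch; we write `unit` for the unit value, following the `Color` example above):

```python
def encode_enum(constructors, name):
    # i-th constructor of n: wrap i times in inr; the innermost value is
    # inl(unit), except for the last constructor, which is plain unit
    i = constructors.index(name)
    e = "unit" if i == len(constructors) - 1 else "inl(unit)"
    for _ in range(i):
        e = "inr(%s)" % e
    return e
```

For the three-constructor `Color` this reproduces the table above: `Red`, `Green` and `Blue` encode as `inl(unit)`, `inr(inl(unit))` and `inr(inr(unit))` respectively.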
If this step is done after monomorphisation, we do not need to translate polymorphic types (such as `Option[a]`) but only monomorphic ones, e.g.
`Option[Int] = None<Int> | Some<Int> Int ` which gets translated to `unit + int`, with constructors becoming
```
None<Int> ~> inl ()
Some<Int> x ~> inr x
```
There is of course a problem with recursive types such as `List`.
A solution might be to use explicit memory pointers e.g.
```
type Memory[a] = Word
type MList[a] = Nil | Cons a Memory[MList[a]]
```
then
```
MList[Int] ~> unit + (int, word)
```
### Translating `case` expressions to Core `match` statements
```
case x of { Red -> 1; Green -> 2; Blue -> 4}
```
becomes
```
let y : int
match x with {
    inl t1 => y := 1
    inr t2 => match t2 with {
        inl t3 => y := 2
        inr t4 => y := 4
    }
}
```
### Translating products and sums of products
Tuples:
```
type Triple[a] = Triple[a,a,a]
// Triple<Int> ~> (int, (int,int))
case triple of { Triple[x,y,z] -> f x y z }
```
should become
```
f(triple.fst, triple.snd.fst, triple.snd.snd)
```
Sums of products:
```
type Few[a] = Two a a | Three a a a
// Few<Int> ~> (int,int) + (int, (int,int))
case few of {
Two x y -> f x y 42;
Three x y z -> f x y z
}
```
should become
```
match few with {
    inl t2 => f(t2.fst, t2.snd, 42)
    inr t3 => f(t3.fst, t3.snd.fst, t3.snd.snd)
}
```
## Open questions
* how to resolve member access on objects of polymorphic type?
* recursive algebraic data types ("dead"/"live" type parameters?)
* polymorphic primitives (complex pairs on the stack)
* will it ever be necessary for a user to write a stackref type? Stackref/Localref?