:::warning Announcements - Help hours 3:45-5:15pm in L037 - A5 out now - Due Thursday - Late deadline: Tuesday after SB. - DM me if interested in a partner - Quiz 2: - Monday (Mar 31) the week after week back from SB - Content through Mutation (before SB) - Assignments through A5 (type checking) ::: :::success Agenda for today - Type checking/Assignment 5 continued - Parametric polymorphism - Type inference - Motivation/context - Examples - Exercise: type inference in OCaml ::: # Type checking continued Recall from last time: the high-level summary of type checking. ### Dynamic type-checking - Examples: Racket, Python - Little/no static checking (only basic parsing/syntax checking) - Check for type issues (e.g., invalid operator use) at runtime, report dynamic errors. E.g.: trying to add incompatible types, calling a function with the wrong number of arguments. - Halt actual program execution for most type issues :::danger - Cons: - Runtime checks can slow program execution - coverage: only catches errors on paths executed => could expose more crashes to customers ::: :::success - Pros: - Easier to implement (don't need a separate type checker) - Flexibility: programs that don't crash on the paths actually run are allowed to run successfully ::: ### Static type checking - Examples: Java, Rust, OCaml, Assignment 5 `MiniLangT` - Reject a program before it runs to prevent possibility of some classes of errors. - Part of the language definition :::success - Pros: - Reliability: considers all possible program paths. Does not crash already-running programs for many kinds of errors - Can speed up program execution: no need for runtime checks ::: :::danger - Cons: - Harder to implement - Overly conservative: restricts flexibility ::: When we say that static type checking is *conservative*, we mean that it might reject more programs than it actually needs to. Assignment 5 asks you to consider this for `MiniLangT` specifically. In general, for almost all interesting languages, we can think of the world of programs like this: ``` +--------------------------------------------------+ | all programs | | +------------------------------------------+ | | | programs that can run without | | | | dynamic errors | | | | +----------------------------------+ | | | | | statically type-checkable | | | | | | programs | | | | | +----------------------------------+ | | | +------------------------------------------+ | +--------------------------------------------------+ ``` That is, the set of all statically type-checkable programs is a subset of the set of programs that can run without dynamic errors. For most languages, that means there are programs that would run perfectly fine, but that are rejected by a static type checker. Assignment 5 asks you to write some of these for `MiniLangT`. ## More on type checking for Assignment 5 :::info What type should we assign to: `(lam (x) x)` ::: `MiniLangT` does not have polymorphic types, we we require an annotation: `(lam ((x : Bool)) x)` :::info Similarly, what type should we assign to: `empty` E.g., in the expression `(cons 1 empty)` ::: Again, because `MiniLangT` does not have polymorphic types, we require an annotation: `(empty : Num)` However, while `MiniLangT` rejects these programs: does OCaml? :::success No! OCaml accepts these programs because of _parametric polymorphism_. ::: For the two examples from above, OCaml accepts both of them using polymorphic types: ```ocaml utop # fun x -> x ;; - : 'a -> 'a = <fun> ``` ```ocaml utop # [];; - : 'a list = [] ``` # Parametric polymorphism Polymorphism in general: _many forms_. You've seen basic subtyping polymorphism in Java in that inheritance allows functions to take in an object of a superclass and call methods defined on it, even when passed a superclass (more on this later in the semester). _Parametric_ polymorphism is when we parameterize a type based on type variables, like we've been seeing in OCaml with these `'a` and `'b` types. We can think of the type `'a -> 'a` as having type parameter `'a`. :::info Java also has parametric polymorphism: remember what it's called? ::: Answer: _generics_. When we write something like `List<T>` in Java, `T` is a type variable analogous in many ways to `'a` in OCaml. Back to OCaml: if we bind the identity function to the identifier `id`: ```ocaml let id = fun x -> x ``` We get type: ```ocaml 'a -> 'a ``` -`'a` is a polymorphic type variable that stands in for any type – We can read this type as "`id` takes an argument of any type and returns a result of that same type." – For all OCaml types `'a`, `id:'a->'a` If we apply the function like so: ``` utop # id 1 ;; - : int = 1 ``` We would say that `int` instantiates `'a` to type integer. :::info Is `(fun x -> x) (fun x -> x)` valid in OCaml? ::: Answer: yes, it instantiates `'a` with the type `'a->'a`! How does OCaml actually determine these types? With type _inference_! # Type inference Problem: - Give every subexpression a type such that type checking succeeds, even when missing annotations. For OCaml: this also requires giving every binding a type, in order. - Fail if and only if no solution exists 3 outcomes, in a language with parametric polymorphism: - (1) a valid, fully-specified type (e.g., `int` or `bool list` or `int -> bool`) - (2) some contradiction, such that no type can be assigned (e.g., a type error) - (3) underconstrained: polymorphic types (e.g. `'a -> 'a` or `'a list`). Considered a valid type in a language like OCaml. Implementation: - Easy, difficult, or impossible, depending on the language! - Could be a pass before a type checker, but often implemented as part of the type checker itself ## Type inference example: `f` ```ocaml let x = 42 let f (y, z, w) = if y then z + x else 0 ``` :::info What is the type of `f` here? ::: `f` is let-bound with the syntax `let f (y, z, w) =`, so `f` must be an `->` function type of the form `(_ * _ * _) -> _`. `y` is used as the first argument of `if`, so `y` must be a bool. `x` is a int, and `+` is used, so `z` must be an int. `w` is never used, so it can be any type. We'll assign it a polymorphic type, alphabetically by convention. The branches of the `if` must have the same type, 0 is an `int` and `z + x` must be an int, so we are all good there: the return type of the function must also be `int`. So the type of `f` is: `(bool * int * 'a) -> int`. :::info What if I change the program to this? ::: ```ocaml let x = 42 let f (y, z, w) = if y then z + x else true ``` Here, these is no valid type that we can assign! Our reasoning is the same as before until we get to the return type for `f`. One branch has type `int`, the other has type `bool`: these are fundamentally incompatible types. This is a contraction! Our type inference-based type checker must reject this program. As this example shows, type inference is fundamentally more difficult than type checking (for most cases): it requires reasoning over **set of constraints**. For example, constraints might take the form "this variable has type `T1` or "this type is equal to this other type" or "this type must be a list, and this variable is the same type as the elements of the list". ## General algorithm for type inference Many modern languages have some form of type inference Most powerful (general systems of constraints across the entire program): - OCaml, Haskell, Rust, Swift, plait Type inference with some restrictions (some constraints around locality, e.g., determining types from a function body) - Typescript, Go, C++, Java ## Powerful general solution: Hindley–Milner type inference Full details of implementing this beyond the scope of CS251, but we do need to understand the general principals and be able to, as humans, use the strategy for many examples. The general idea of Hindly-Milner type inference is that we will break the problem of type inference into two parts. First, we'll generate a set of constraints over the types of expressions in our program. Then, we'll solve the constraints using a process that is essentially the "solve systems of simultaneous equations" you may have seen in school, but generalized. This second step is often called "type unification". Focusing in on OCaml, we can think of the general strategy as: 1. Determine types of bindings in order – Cannot use later bindings. 2. For each `let` binding - Analyze definition for all necessary facts (constraints). - Example: `x > 0` implies `x : int` - Type error if no way for all facts to hold (over-constrained) 4. Use type variables (`'a`, `'b` ...) for any unconstrained types. An important goal of Hindly-Milner type systems is to never reject a program that could type check, if the programmer had written down types. :::success **Takeaway**: implementing a Hindley–Milner type inference algorithm in code is beyond what we have time for in CS251. However, it is totally in scope to ask you, a human, to typecheck simple OCaml programs that involve manually reasoning over a small set of constraints. ::: ## Constraint-based type inference examples: `sum` and `broken_sum` Note: in OCaml, type inference is integrated with the type checker. ```ocaml let sum xs = match xs with | [] -> 0 | x::xs' -> x + (sum xs') (* sum : T1 -> T2 [must be a function; all functions take one argument] xs : T1 [must have type of f's argument] x : T3 [pattern match against T3 list] xs' : T3 list [pattern match against T3 list] T1 = T3 list [else pattern-match on xs doesn't type-check] 0 : int, so this arm : int, so body : int, so T2=int T3 = int [because x : T3 and is argument to addition] T2 = int [because result of recursive call is argument to addition] sum xs' type-checks because xs' has type T3 list and T1 = T3 list case-expression type-checks because both branches have type int from T1 = T3 list and T3 = int, we know T1 = int list from that and T2 = int, we know f : int list -> int *) let broken_sum xs = match xs with | [] -> 0 | x::xs' -> x + (broken_sum x) (* type inference proceeds exactly like for sum for most of it: broken_sum : T1 -> T2 [must be a function; all functions take one argument] xs : T1 [must have type of f's argument] x : T3 [pattern match against T3 list] xs' : T3 list [pattern match against T3 list] T1 = T3 list [else pattern-match on xs doesn't type-check] 0 : int, so this arm : int, so body : int, so T2=int T3 = int [because x : T3 and is argument to addition] T2 = int [because result of recursive call is argument to addition] but now to type-check (broken_sum x) we need T3 = T1 and T1 = T3 list, so we need T3 = T3 list, which is impossible for any T3. Note: The actual type-checker might gather facts in a different order and therefore report a different error, but it will report an error. *) ``` # Group exercise: type inference Human type inference: For each expression below, determining its type or state that the expression is ill-typed. (Note: adapted from a UW CS341 practice problem). Recall that: - `fst` and `snd` are builtin functions for getting the first and second elements of tuples. - `()` is the unit value; `() : unit`. ```ocaml let e1 = (1,2) let e2 = (fst e1) + 10 let e3 = (snd e1, e2, fst e1) let e4 = e3 + 5 let e5 = ((), "abc", fun x -> x + 1) let e6 = fun x y -> (x, y * 2) let e7 = ("abc", fun e -> (fst e + snd e)) let e8 = snd e7 (5,6) let e9 = snd e7 10 ``` ### Answer for e1 ```ocaml int * int ``` ### Answer for e2 ```ocaml int ``` `(fst e1)` has type `int`, and `10` and `(fst e1)` are both `int`s, `(fst e1) + 10` will also be an `int`. ### Answer for e3 ```ocaml int * int * int ``` ### Answer for e4 :::danger ill-typed ::: The `+` operator operates on two `int`s but `e3` is an `int * int * int`. ### Answer for e5 ```ocaml unit * string * (int -> int) ``` `()` has type `unit`, `"abc"` is a `string`, and `fun x -> x + 1` is a function from `int` to `int` because the `+` operator has type `int -> int -> int`. ### Answer for e6 ```ocaml ‘a -> int -> ‘a * int ``` `e6` is a function that takes an `'a` and returns a function that given an `int`, returns a tuple of type `'a * int`. The type of `x` is under-constrained, so it can be any type: `'a`. However, `y` is required to be an `int` because it is multiplied by `2` in the body of the `e6`. ### Answer for e7 ```ocaml string * (int * int -> int) ``` `"abc"` is a string, and` fun e -> (fst e + snd e)` is a function which we can reason about in the following way: First, `+` is an operator on two `int`s, so `fst e` and `snd e` must be `int`s. Now, observe that `fst e` and `snd e` tell us that `e` is a tuple where the first element has the same type as `fst e` (which is an `int`), and the second element has the same type as `snd e` (which is an `int`). The body of the function `(fst e + snd e)` will have type `int`, so `fun e -> (fst e + snd e)` has type `int * int -> int`. ### Answer for e8 ```ocaml int ``` As established above, `e7` has type `string * (int * int -> int)`, so `(snd e7)` will have type `int * int -> int`. `(5, 6)` has type `int * int`, so the result of applying `(snd e7)` to `(5, 6)` will be a value of type `int`. ### Answer for e9 :::danger ill-typed ::: As established, `(snd e7)` has type `int * int -> int`. However, `10` has type `int`. ## A final thought experiment :::info Consider our identity function, `let id = fun x -> x`. If we plug this into `utop`, we get ```ocaml val id : 'a -> 'a = <fun> ``` If we ignore side effects inside the function (like printing a value), is it possible to write another function `foo` that returns a different answer than `id`, but has the same type? ::: Answer: it _feels like_ the answer should be yes: can't we just pick some other value of type `'a` to return? But how can the body `foo` create another value of type `'a`? In general, it cannot! We can think of this multiple ways: one is that `foo` knows _nothing_ about `'a`, so it has no information to go off of to pick a different value. Another is that any changes the body could make to the argument would concretize it's type to a specific value, rather than ` 'a -> 'a`. For example, `let foo = fun x -> fst x` has type `'a * 'b -> 'a`, rather than `'a -> 'a`. So in short: _the answer is no_. Any function `foo` with the type `'a -> 'a` must return its argument, if it returns any value at all (though it can also have side effects or result in an exception, rather than a value).