Lecture 11: Type Inference

Lecture 11: Type Inference --- :::warning **Announcements** - **Please sit in the first 2 rows of the room**. - Quiz 1 graded. Let me know if you have questions! - `A5: Type Checking` due Thursday - Next help hours: - Mine 6:30-8:30pm tonight, L037 - Grace's 1-2pm Wednesday, Zoom ::: :::success Agenda for today - Type checking/Assignment 5 continued - Parametric polymorphism - Type inference - Motivation/context - Examples - Exercise: type inference in OCaml - Final thought experiment ::: ## More on type checking for Assignment 5 :::success What type should we assign to: `(lam (x) x)` ::: `MiniLangT` does not have polymorphic types, so we require an annotation: `(lam ((x : Bool)) x)` :::success Similarly, what type should we assign to: `empty` E.g., in the expression `(cons 1 empty)` ::: Again, because `MiniLangT` does not have polymorphic types, we require an annotation: `(empty : Num)` :::success However, while `MiniLangT` rejects these programs without the annotation: does OCaml? ::: :::spoiler Think, then click! No! OCaml accepts these programs because of _parametric polymorphism_: the special `'a` types we have seen OCaml assign to ambiguous expressions, such as `[]`. For the two examples from above, OCaml accepts both of them using polymorphic types: ```ocaml utop # fun x -> x ;; - : 'a -> 'a = <fun> ``` ```ocaml utop # [];; - : 'a list = [] ``` ::: # Parametric polymorphism The word *polymorphism* in general means _many forms_. You've seen basic *subtyping polymorphism* in Java, in that inheritance allows functions to take in an object of a superclass and call methods defined on it, allowing multiple subclasses to be passed in. That is, one Java function can accept arguments of different types. :::info We'll dig much more into Java's subclassing later in the semester, for `A9: MiniJava`! ::: *Parametric polymorphism* is when we parameterize a type based on type variables, as we've been seeing in OCaml with these `'a` and `'b` types. We can think of the type `'a -> 'a` as having type parameter `'a`: the type itself takes an argument (a concrete type). If later code supplies a specific concrete type, like `int`, then the function types becomes `int -> int`. :::info Java also has parametric polymorphism: does anyone remember from CS230 (or other Java experience) what it's called? ::: :::spoiler Think, then click! Java has parametric polymorphism via its support for _generics_. When we write something like `List<T>` in Java, `T` is a type variable analogous in many ways to `'a` in OCaml. ::: Moving back to OCaml: if we bind the identity function to the identifier `id`: ```ocaml let id = fun x -> x ``` We get type: ```ocaml 'a -> 'a ``` -`'a` is a polymorphic type variable that stands in for any type – We can read this type as: "`id` takes an argument of any type and returns a result of that same type." – For all OCaml types `'a`, `id: 'a->'a` If we apply the function like so: ``` utop # id 1 ;; - : int = 1 ``` We would say that `int` instantiates `'a` to type `int`. :::success Is `(fun x -> x) (fun x -> x)` valid in OCaml? That is, does it type check? ::: :::spoiler Think, then click! Yes, the function application instantiates `id` 's `'a` with the type `'a->'a`! ::: How does OCaml actually determine these types? With the magic of type _inference_! :::info Many other languages, from Rust to TypeScript, also employ type inference! ::: # Type inference We can specify the problem type inference is solving like so: :::info Problem: - Give every subexpression a type such that type checking succeeds, even when missing annotations. For OCaml: this requires giving every binding a type, in the order in which each binding appears in the program. - Fail if and only if no solution exists. ::: Type inference should have 3 possible outcomes in a language with parametric polymorphism: - (1) a valid, fully-specified ("concrete") type (e.g., `int` or `bool list` or `int -> bool`) - (2) some contradiction, such that no type can be assigned (e.g., a type error) - (3) underconstrained: a polymorphic type (e.g. `'a -> 'a` or `'a list`). Considered a valid type in a language like OCaml. The implementation of type inference: - Can be easy, difficult, or impossible (to solve 100% of the time), depending on the language! - Could be a pass before a type checker, but often implemented as part of the type checker itself. :::info We'll walk through an example before we lay out one algorithm for type inference. Note that we don't have time in CS251 to have you implement type inference on an assignment. However, you will be expected to be able to *perform* type inference yourself, given relatively short OCaml programs! This is a useful skill to build across any language with polymorphism (e.g, most modern languages). ::: ## Type inference example: `f` Consider the following OCaml program: ```ocaml let x = 42 let f (y, z, w) = if y then z + x else 0 ``` :::success What is the type of `f` here? ::: Some steps we can use to figure this out: - `f` is let-bound with the syntax `let f (y, z, w) =`, so `f` must be an `->` function type of the form `(_ * _ * _) -> _`. - `y` is used as the first argument of `if`, so `y` must be a bool. - `x` is a int, and `+` is used, so `z` must be an int. - `w` is never used, so it can be any type. We'll assign it a polymorphic type, alphabetically by convention. - The branches of the `if` must have the same type, 0 is an `int` and `z + x` must be an int, so we are all good there: the return type of the function must also be `int`. So the type of `f` is: `(bool * int * 'a) -> int`! :::success What if I change the program to this? ::: ```ocaml let x = 42 let f (y, z, w) = if y then z + x else true ``` :::spoiler Think, then click! Here, there is no valid type that we can assign! Our reasoning is the same as before until we get to the return type for `f`. One branch has type `int`, the other has type `bool`: these are fundamentally incompatible types. This is a contradiction! OCaml's type-inference-based type checker must reject this program. ::: As these examples show, type inference is fundamentally more difficult than type checking (for most cases): it requires reasoning over **set of constraints**. For example, constraints might take the form "this variable has type `T1` or "this type is equal to this other type" or "this type must be a list, and this variable is the same type as the elements of the list". ## General algorithm for type inference Many modern languages have some form of type inference. The most powerful (general systems of constraints across the entire program): - OCaml, Haskell, Rust, Swift, plait Type inference with some restrictions (some constraints around locality, e.g., determining types from a function body): - Typescript, Go, C++, Java ## Powerful general solution: Hindley–Milner type inference As we said before, full details of implementing this beyond the scope of CS251. But, we do need to understand the general principles and be able to, as humans, use the strategy for understanding OCaml programs! The general idea of Hindley–Milner type inference is that we will break the problem of type inference into two parts: First, we'll generate a set of constraints over the types of expressions in our program. Then, we'll solve the constraints using a process that is essentially the "solve systems of simultaneous equations" you may have seen in school, but generalized. This second step is often called "type unification". This is the more difficult part to implement as a general solution, but tends to be quickly solvable for the short OCaml programs we'll consider manually. Focusing in on OCaml, we can think of the general strategy as: 1. Determine types of bindings in order — we cannot use later bindings when typing earlier ones. 2. For each `let` binding: - Analyze definition for all necessary facts (constraints). - Examples: - `x > 0` implies `x : int` - `1::xs` implies `xs: int list` - `match x | [] -> ...` implies `x : 'a list` 3. Use type variables (`'a`, `'b` ...) for any unconstrained types. 4. Type error if no way for all facts to hold (over-constrained)! An important goal of Hindley–Milner type systems is to never reject a program that could type check, if the programmer had written down types. :::info **Takeaway**: implementing a Hindley–Milner type inference algorithm in code is beyond what we have time for in CS251. However, it is totally in scope to ask you, for the type of simple OCaml programs that involve manually reasoning over a small set of constraints. ::: ## Constraint-based type inference examples: `sum` and `broken_sum` Note: in OCaml, type inference is integrated with the type checker. ```ocaml let sum xs = match xs with | [] -> 0 | x::xs' -> x + (sum xs') (* sum : T1 -> T2 [must be a function; all functions take one argument] xs : T1 [must have type of f's argument] x : T3 [pattern match against T3 list] xs' : T3 list [pattern match against T3 list] T1 = T3 list [else pattern-match on xs doesn't type-check] 0 : int, so this arm : int, so body : int, so T2=int T3 = int [because x : T3 and is argument to addition] T2 = int [because result of recursive call is argument to addition] sum xs' type-checks because xs' has type T3 list and T1 = T3 list case-expression type-checks because both branches have type int from T1 = T3 list and T3 = int, we know T1 = int list from that and T2 = int, we know f : int list -> int *) let broken_sum xs = match xs with | [] -> 0 | x::xs' -> x + (broken_sum x) (* type inference proceeds exactly like for sum for most of it: broken_sum : T1 -> T2 [must be a function; all functions take one argument] xs : T1 [must have type of f's argument] x : T3 [pattern match against T3 list] xs' : T3 list [pattern match against T3 list] T1 = T3 list [else pattern-match on xs doesn't type-check] 0 : int, so this arm : int, so body : int, so T2=int T3 = int [because x : T3 and is argument to addition] T2 = int [because result of recursive call is argument to addition] but now to type-check (broken_sum x) we need T3 = T1 and T1 = T3 list, so we need T3 = T3 list, which is impossible for any T3. Note: The actual type-checker might gather facts in a different order and therefore report a different error, but it will report an error. *) ``` # Group exercise: type inference :::success Human type inference: For each expression below, determine its type or state that the expression is ill-typed. (Note: adapted from a UW CS341 practice problem). Recall that: - `fst` and `snd` are builtin functions for getting the first and second elements of tuples. - `()` is the unit value; `() : unit`. ```ocaml let e1 = (1,2) let e2 = (fst e1) + 10 let e3 = (snd e1, e2, fst e1) let e4 = e3 + 5 let e5 = ((), "abc", fun x -> x + 1) let e6 = fun x y -> (x, y * 2) let e7 = ("abc", fun e -> (fst e + snd e)) let e8 = snd e7 (5,6) let e9 = snd e7 10 ``` ::: :::spoiler Think, then click! ### Answer for e1 ```ocaml int * int ``` ### Answer for e2 ```ocaml int ``` `(fst e1)` has type `int`, and `10` and `(fst e1)` are both `int`s, `(fst e1) + 10` will also be an `int`. ### Answer for e3 ```ocaml int * int * int ``` ### Answer for e4 :::danger ill-typed ::: The `+` operator operates on two `int`s but `e3` is an `int * int * int`. ### Answer for e5 ```ocaml unit * string * (int -> int) ``` `()` has type `unit`, `"abc"` is a `string`, and `fun x -> x + 1` is a function from `int` to `int` because the `+` operator has type `int -> int -> int`. ### Answer for e6 ```ocaml ‘a -> int -> ‘a * int ``` `e6` is a function that takes an `'a` and returns a function that given an `int`, returns a tuple of type `'a * int`. The type of `x` is under-constrained, so it can be any type: `'a`. However, `y` is required to be an `int` because it is multiplied by `2` in the body of the `e6`. ### Answer for e7 ```ocaml string * (int * int -> int) ``` `"abc"` is a string, and` fun e -> (fst e + snd e)` is a function which we can reason about in the following way: First, `+` is an operator on two `int`s, so `fst e` and `snd e` must be `int`s. Now, observe that `fst e` and `snd e` tell us that `e` is a tuple where the first element has the same type as `fst e` (which is an `int`), and the second element has the same type as `snd e` (which is an `int`). The body of the function `(fst e + snd e)` will have type `int`, so `fun e -> (fst e + snd e)` has type `int * int -> int`. ### Answer for e8 ```ocaml int ``` As established above, `e7` has type `string * (int * int -> int)`, so `(snd e7)` will have type `int * int -> int`. `(5, 6)` has type `int * int`, so the result of applying `(snd e7)` to `(5, 6)` will be a value of type `int`. ### Answer for e9 :::danger ill-typed ::: As established, `(snd e7)` has type `int * int -> int`. However, `10` has type `int`. ::: ## A final thought experiment :::success Consider our identity function, `let id = fun x -> x`. If we plug this into `utop`, we get ```ocaml val id : 'a -> 'a = <fun> ``` If we ignore side effects inside the function (like printing a value), is it possible to write another function `foo` that returns a different answer than `id`, but has the same type? ::: :::spoiler Think, then click! It _feels like_ the answer should be yes: can't we just pick some other value of type `'a` to return? But how can the body `foo` create another value of type `'a`? In general, it cannot! We can think of this multiple ways: one is that `foo` knows _nothing_ about `'a`, so it has no information to go off of to pick a different value. Another is that any changes the body could make to the argument would concretize it's type to a specific value, rather than ` 'a -> 'a`. For example, `let foo = fun x -> fst x` has type `'a * 'b -> 'a`, rather than `'a -> 'a`. So in short: _the answer is no_. Any function `foo` with the type `'a -> 'a` must return its argument, if it returns any value at all (though it can also have side effects or result in an exception, rather than a value). :::