Lexical scope continued

:::warning Announcements - A2 due 11:59pm tonight. - A3 was released Sunday - Fill out pair form by end of class today! - Preliminary deadlines: Thurs, Sun, Tues. - Good-faith submissions contribute to design score - Quiz 1 next Monday, the 24th - Logistics page [here](https://cs.wellesley.edu/~cs251/s25/notes/quiz1.html). ::: :::success Agenda for today - Finish lexical vs dynamic scope _(end of Quiz 1 material)_ --- - Motivation/context for OCaml - Static types - Intro to OCaml - Bindings for evaluation *and* types - Recursion in OCaml - (next time) Basics of pattern matching, tuples, records, variant types ::: # Lexical scope continued Recall from last class that _lexical scope_ is one way of resolved how free variables are resolved when used in a function/lambda body. For example, if I have the lambda: ```racket (define (f y) (+ x y)) ``` Only `y` is bound by the lambda argument. `x` is free: we must get its value from the enclosing scope. Lexical vs dynamic scope are answers to the question of: do we look up the value based on where the lambda is _defined_, or based on where it is _called_/_applied_? - Lexical scope says it should be based on the _definition_ (of which there is only one, which we can statically find in one location. ) - Dynamic scope says it should be based on where it is _called_/_applied_ (of which there might be multiple call sites, we only know which is actually relevant dynamically at runtime, while the program is being evaluated). Racket, and almost every other language you might encounter, use **lexical scope**. Historically, the original version of LISP used dynamic scope. But most programming language designers now agree: _lexical scope is better_. Let's consider some reasons why. ## Why lexical scope? ### 1: Function meaning does not depend on name choices. Consider this program: ```racket= (define (f y) (let ([x (+ y 1)]) (lambda (z) (+ x y z)))) ``` Now imagine we change body of `f` to replace `x` with `q`, which we intend to be an equivalent program: ```racket= (define (f y) (let ([q (+ y 1)]) (lambda (z) (+ q y z)))) ``` Under **lexical scope**, this has no impact. The meaning of the lambda defined on line 3. Under **dynamic scope**: depending on whether `x` or `q` are in the scope where the lambda defined on line 3 is _used_, we get different results! This is harder to reason about! With *lexical scope*, we can understand the entire meaning of lambda just by examining its definition. There are no "hidden parameters" that are passed based on where the lambda later gets used. ### 2 Closures remember the data they need Closures automatically “remember” the data they need. For example, consider the function `greater-than-x` which produces a `lambda`: ```racket= (define (greater-than-x x) (lambda (y) (> y x))) (define (no-negs xs) (filter (greater-than-x -1) xs)) (define (all-greater xs n) (filter (lambda (x) (> x n)) xs) ``` Under **lexical scope**, when `(greater-than-x -1)` is evaluated on line 5, the lambda is saved as a closure over the environment that binds `x` to `-1`, as intended. When `(lambda (y) (> y x))` is evaluated inside the body of `filter`, `x` will be mapped to `-1`. Under **dynamic scope**, when `(greater-than-x -1)` is evaluated on line 5, the value of `-1` is **not saved**. When `(lambda (y) (> y x))` is evaluated inside the body of `filter`, one of two things will happen: 1. If there happens to be another `x` bound in that scope, that different value of `x` will be used! 2. Otherwise, we would get a `variable not found` error. Either case is not what we intended! As you will see on Assignment 3, this means that lambdas, when saved as values under a language with dynamic scope, need 3 things: the argument names, the body, and a copy of the environment at the time they were defined. ## When does what happen when, again? It's important to remember when things are evaluated! In Racket (and most other languages you've used): 1. A **function body** is not evaluated until the function is called. A function body is evaluated every time the function is called. For example, the definition of `(define (f x) (/ 1 0))` does not result in an error, because the body is only evaluated on the function being _called_. 3. A function's **arguments** are evaluated before the called function's body. By convention, they are evaluated from left to right. For example, in `(g (+ 1 2) (* 3 4))`, the sum happens first, then the product, then the evaluation of `g`'s body with the two new bindings for the arguments. 4. A **binding** evaluates its expression when the binding is created, not every time the variable is used. For example, in `(let ([x (+ 1 2)]) (* x x))`, the sum happens just once, when `x` is bound to 3. Each use of `x` in the product just looks up `x` in the environment to get `3`. As with lexical/dynamic scope, there are other options here _not_ used by Racket. We'll talk about this more later in the semester when we get to _lazy_ evaluation! :::success This is the end of the material covered by Quiz 1! ::: --- # New language: OCaml! For the next segment of the course, we'll see how these language fundamentals apply in a new context: in the functional, statically-typed language **OCaml**! ## Why OCaml? As we discussed in the start of the semester, new languages help us (1) become better programmers in the languages we already know, and (2) practice the skill of learning new languages. In addition, OCaml is just really fun! Some reasons: - Powerful static types - Good libraries, industry use (e.g., Jane Street) I highly recommend the (free, online, videos included) [OCaml textbook](https://cs3110.github.io/textbook/cover.html) linked to from the Resources page. ## A very brief history of OCaml **"The other"** ML: ML: Meta-Language for Theorem-Proving - Developed in the early 1970s at the University of Edinburgh, for treating programs as proofs. - It's creator, Milner, won the [Turing Award](https://amturing.acm.org/award_winners/milner_1569367.cfm) (the "CS Nobel Prize") for "ML, the first language to include polymorphic type inference together with a type-safe exception-handling mechanism". - OCaml is the most widely-used/taught modern descendant. --- # Types in general Zooming out a bit, let's consider _types_ in general. :::success Question: what is a _type_ (in a programming sense)? ::: A type is... - How data is represented in binary, as 0s and 1s - A set of values that share some property - A prediction about the value an expression will yield. E.g., if the expression `(+ 1 0)` has type `int`, we can predict that it will yield an int-shaped value even before evaluating the expression. --- :::success Question: Why do programming languages use types? ::: - Types help catch certain kinds of errors (see below). - Types help with documentation. Comments can get out-of-date/state: types cannot! - Compilers can exploit types to make code faster. For example, if the type system rules out invalid field access, then the compiled code does not need to check the type of the data before accessing the field. :::success What kind of errors can types rule out? ::: - Adding a number and a string - Calling a function that returns the wrong number or type of arguments - Calling a function that returns the wrong result - Storing data of the wrong kind to a data structure - For those who have taken 240: silently corrupting memory In more advanced languages, like Rust: - Eliminating concurrency bugs In practice, type systems often reflect categories of values that share implementation-independent properties. Integers: - support arithmetic: adding, subtracting, _truncating_ divide - Produce other integers as a result of the above - Can be formatted a certain way Lists: - Can be concatenated - Can have elements appended - Have a first item ## Static type checking In static type checking, a program is checked for type errors before it is run/evaluated. The benefit of this are that (1) it catches errors as early as possible, and (2) it enables languages implementations (compilers, interpreters) to produce more efficient code. The downside of this is that it will rules out programs that _may_ have run as intended, because the static type analysis determines that it is _possible_ for them to have an error. We'll see some examples of this in OCaml. --- # Types as a window into OCaml In Assignment 3, you'll use environments to track variable bindings. These are _dynamic_ environments: values are added to them as the program executes (not to be confused with dynamic scope, which is about where lambda/function environments are saved). This is essentially how the core of Racket operates. OCaml instead maintains **two kinds** of environments: 1. Dynamic Environment (runtime): Maps names to actual values. This is what we have been using so far. 2. :new: Static Environment (compile-time): Maps names to types. These environments help the compiler and the runtime system keep track of variable types and values. The meaning of a binding is to update the environments. --- ## Running OCaml We'll first show running OCaml in a Read-Eval-Print Loop (REPL) similar to the DrRacket interactions window. We'll run this REPL in any IDE: I'll show examples in VSCode's terminal. The OCaml REPL is called utop. - To start utop: Install OCaml and then `utop`, then type `utop` in your terminal. - Run commands: Type OCaml expressions and see immediate results. Use `;;` to signal the end of the top-level expressions. - Load a file: Use `#use "filename.ml";;` to load and execute a file. Example: ```ocaml #use "my_program.ml";; ``` In OCaml, we use `let` bindings to associate names with values (including functions). `let` takes the place of `define` in Racket: its used for both top-level and local bindings. An OCaml program is a series of bindings. ```ocaml (* Binding a variable *) let x = 3 (* Static Environment Update: { x : int } *) (* Dynamic Environment Update: { x : 3 } *) (* Binding another variable *) let y = 6 (* Static Environment Update: { x : int; y : int } *) (* Dynamic Environment Update: { x : 3; y : 6 } *) ``` For OCaml, we'll use VSCode. If you have not used VSCode before, install it [here](https://code.visualstudio.com/download). ## Conditionals in OCaml Unlike Racket, OCaml conditionals do use the `then` and `else` keywords: ```ocaml utop [5]: let result = if y > 5 then 0 else 1 ;; val result : int = 0 ``` Note we use `;;` to indicate the end of the top-level expressions that we want evaluated. It's not required in files. Beyond the syntactic differences, OCaml's static types give conditionals a different semantics to Racket's. In particular, the following is an error: ```ocaml utop [6]: let result = if y > 5 then 0 else "1" ;; Error: This expression has type string but an expression was expected of type int ``` :::success A student asked: _why_ is this an error? ::: In order to prevent bugs with types, OCaml needs to be able to assign a type to _every_ subexpression. To assign a type to `if y > 5 then 0 else "1"` _without_ evaluating the expression, we need to impose the more strict requirement that regardless of whether `y > 5`, the overall value should have the same type. # Functions in OCaml Like in Racket, functions are first-class values in OCaml. They can be defined, passed as arguments, and returned from other functions. ```ocaml (* Defining Functions Syntax: let function_name ((param_1: type_1), ..., (param_n: type_n)) : return_type = (* function body *) Example: *) let max ((a: int), (b: int)) : int = if a < b then b else a (* Function Name: max *) (* Parameters: two ints named a and b both of type int *) (* Return Type: int *) (* Function Body: Uses an `if` expression to determine the maximum *) (* We also can leave off the types *) let max2 a b = if a < b then b else a;; (* val max2 : 'a -> 'a -> 'a = <fun> *) (* (We'll cover _type inference_ later in the semester *) let increment x = x + 1 (* val increment : int -> int = <fun> increment is the identifier to which the value was bound. int -> int is the type of the value. This is the type of functions that take an int as their argument and produce an int as output. The value is a function, which the utop chooses to display just as <val> rather than the full body. *) ``` # Recursion in OCaml OCaml, as a functional language, of course supports recursion! We use `let rec` to indicate a binding that can refer to itself: ```ocaml let rec pow ((base: int), (exp: int)) : int = if exp = 0 then 1 else base * pow (base, exp - 1) (* Function Name: pow *) (* Parameters: two ints, base and exp, both of type int *) (* Return Type: int *) (* Base Case: If exp = 0, return 1 *) (* Recursive Case: Multiply base by pow (base, exp - 1) *) ``` We can call this function with: ```ocaml pow (2, 2) ``` # Lists in OCaml Lists are also a recursive data structure in OCaml: we have the empty list, then non-empty lists that _cons_ together an element and a list. When we enter the empty list into `utop`, we get: ```ocaml utop [12]: [] ;; - : 'a list = [] ``` Because _every_ expression in OCaml must have a type, OCaml uses `'a` to indicate that this is a list of some polymorphic, unspecified type. That is, the empty list could be a list of anything! This is similar to generic in Java, and we will explore it in much more detail moving forward. We can create a non-empty list with cons, which is written as an infix `::`: ```ocaml utop [13]: 1::[] ;; - : int list = [1] ``` Now, we see the type is `int list`. We could also have a list of lists of int: `int list list`. As we make the list longer: ```ocaml utop [14]: 1::2::[] ;; - : int list = [1; 2] ``` We see that OCaml prints list values as `[v0; ... ; vn]`. Note the `;` instead of `,`. OCaml's static types also rule out many lists that were allowed by Racket. For example: ```ocaml utop [16]: "a"::1::[] ;; Error: This expression has type int but an expression was expected of type string ``` That is, lists must be homogeneous: every element should have the same type. We can get the first (_head_) and rest (_tail_) with: ```ocaml utop [18]: List.hd ls;; - : int = 1 utop [19]: List.hd(ls);; - : int = 1 utop [21]: List.tl(ls) ;; - : int list = [2; 3; 4] ``` Note, we can leave off the parens when calling functions in OCaml if the expression does not need them to specify order of operations. # Group exercise Write a function that sums the numbers in a list using explicit natural recursion, then tail recursion, in OCaml. Use `List.hd`, `List.tl` to access the head and tail. Check for empty with `= []`. Solution: explicit natural recursion. Note we can leave the types off, since the `+` in OCaml can only apply to `int`s: so OCaml infers the full types for us. ```ocaml let rec sum_list lst = if lst = [] then 0 else List.hd lst + sum_list (List.tl lst) ``` Solution: explicit tail recursion ```ocaml let sum_list ls = let rec helper (ls, acc) = if ls = [] then acc else helper((List.tl ls), acc + (List.hd ls)) in helper (ls, 0) ``` Note, for local `let` bindings, we use `in` to separate the expression being bound from the body: `let x = e1 in body`. Next class, we'll see the more cannonical way to write this in OCaml with _pattern matching_.