:::warning
Announcements
- A2 deadline pushed to Tuesday (but no additional 48-hour extension).
- A3 deadline pushed to Thursday the 27th (out this weekend).
- Quiz 1 in 1.5 weeks: Monday the 24th.
- Topics: everything up to today's last topic of lexical vs. dynamic scope (which we will wrap up on Tuesday).
- New help hours moving forward:
- Mondays 3:45-5:15 in L043 (our classroom).
- Tuesday next week because of the holiday.
- Thursdays 3:45-5:15 in L037 (the systems lab).
:::
:::success
Agenda for today
- Representing programs as data
- Abstract syntax trees, just a bit on parsing
- Interpreters vs compilers
- Interpreter basics:
- Operating on programs as data
- A Racket #lang with types: plait
- Recursive functions on programs
- Calculator example
- Nuances of functions/lambdas as data
- Lexical vs. dynamic scope
- To be continued next class!
:::
# Warmup: programs as data
```racket
;; WARMUP QUESTION:
;; How do you think a language implementation
;; represents a nested expression in memory?
;; How do you think it represents a lambda?
;; What information does the lambda need to track?
;; e.g.
(+ (* 1 2) (- 5 1))
;; This evaluates to 6, but how does Racket get
;; there?
(define y 3)
(define f (lambda (x) (+ x y)))
;; What does Racket need to keep track of for f?
```
# Interpreter basics
## Representing programs as data
Programming language implementations (whatever actually lets you **run** your code) first need to go from the keystrokes we type into our IDE (i.e., strings) to treating the program itself as data: program representations in memory that are easier to analyze.
## A bit on parsing
For any language implementation, the first step is _parsing_: consuming a string, character by character, and producing a program representation in memory. This includes details like checking whether Racket parentheses are balanced.
[Parsing](https://en.wikipedia.org/wiki/Parsing) is a rich and interesting subfield, but not necessarily the most useful to understand for working programmers, so we won't have time to focus on it in CS251. I will typically provide parsers where needed for assignments.
## Abstract Syntax Trees (ASTs)
Usually, the next program representation after the source string is an _abstract syntax tree_. An AST represents the structure of a program (or pieces of it), with the outer-most operators at the top of each subtree.
For example, here are the abstract syntax trees for the warmup snippets.
An AST for `(+ (* 1 2) (- 5 1))` is:
```
+
/ \
* -
/ \ / \
1 2 5 1
```
`+` is the outer-most operator, so it is the root of the entire tree. `*` and `-` are each the root of their own subtrees. Note that order matters: `1` comes before `2` in the multiplication, so `1` is the left subtree/leaf.
An AST for `(define y 3)`:
```
define
/ \
y 3
```
An AST for `(define f (lambda (x) (+ x y)))`:
```
define
/ \
f lambda
/ \
x +
/ \
x y
```
Sometimes AST edges will be labeled: for example, we could label the first use of `x` with `argument` and the sum as `body` of the `lambda`. To the second warmup question: at the end of the lecture, we'll talk more about additional information language implementations need to track for lambdas.
After creating the AST, we have a choice in how to further evaluate this program. There are two primary methods:
- An _interpreter_ traverses the AST to compute results **directly** (typically by using the _host_ or _implementation_ language operators). For example, when evaluating `(+ (* 1 2) (- 5 1))`, an interpreter would first evaluate `(* 1 2)`, then `(- 5 1)`, and finally apply `+` to the results. The `*`, `+`, `-` being invoked would be based on what language the interpreter is written in: e.g. Racket's operators. Python and Racket are both usually interpreted.
- A _compiler_ traverses the AST and translates it to a lower-level representation: either bytecode (e.g., Java) or machine code (e.g., C). This lower-level code is then executed by a lower-level system: a language runtime or virtual machine, or directly by the hardware.
Modern languages might do a mix of interpreting and compiling to produce results. The same language can have both interpreter and compiler implementations. For example, JavaScript is usually "just-in-time (JIT)" compiled, which is a mix of these two primary strategies in that some aspects of the program are compiled to hardware instructions _while_ the program is being executed.
Interpreters are often slower, but are easier to implement and more directly map to the evaluation rules of a language's defined semantics. We'll build several interpreters throughout the remainder of this course, starting with A3 next week!
## Operating on programs as data
Part of the power of Racket is being able to build and work in many different sublanguages for different purposes. For A3, we'll use a new Racket sublanguage: `#lang plait` (named for the optional textbook, [Programming Languages: Application and Interpretation](https://www.plai.org/), + Typed). You can find the `plait` language documentation, including a tutorial, [here](https://docs.racket-lang.org/plait/). `plait` shares most of its syntax with Racket.
:::info
Install Plait by going into DrRacket, File | Package Manager… | Do What I Mean, and type `plait` (all lower-case) into the box before pressing `install`. DrRacket will find and install the latest version of `plait`. Then, use `#lang plait` at the start of your file.
:::
## Calculator example
Rather than starting off by showing a full interpreter in class, we will instead build a simplified one. Our simplified interpreter will essentially be a calculator for a math-based language, with expressions and constants, but no functions, let-bindings, etc. (those, you will implement on A3!).
### `CalcLang` definition
We want to support a simple calculator language with the following grammar:
```
e ::= n ; number
| const ; constant
| (+ e1 e2) ; addition
```
To start with, expressions can be either numbers, pre-defined mathematical constants, or additions on two expression arguments.
How can we translate this to a programming language implementation? We want numbers, constants, and additions to all "be" expressions. (Fun note: Alexa once got a similar question in a tech/finance interview).
In Java, we might express this by defining an `Expression` class with subclasses `Num`, `Const`, and `Addition`. But then the behavior that differs between variants would be spread out amongst the subclass definitions.
In `plait`, as with many functional languages, we'll instead use a recursive data definition:
```racket
(define-type MathExpr
(e-num [value : Number])
(e-const [name : Symbol])
(e-add [left : MathExpr]
[right : MathExpr]))
```
This definition says we have a new type, `MathExpr`, that can be one of 3 variants:
1. an `e-num` (read as: expression number) that contains a `value` of built-in type `Number`, or
2. an `e-const` that contains a `name` of built-in type `Symbol`, or
3. a recursive `e-add` that contains two `MathExpr`s, one named `left` and one named `right`.
You can read more about the `define-type` keyword in the [plait documentation](https://docs.racket-lang.org/plait/Definitions.html#%28form._%28%28lib._plait%2Fmain..rkt%29._define-type%29%29). This type of recursive type definition, also known as an [algebraic data type](https://en.wikipedia.org/wiki/Algebraic_data_type), is common across many languages: plait, OCaml (which we will start learning next week), Haskell, and Rust use them heavily, while they are also now supported in some form in many more languages, including Java, C++, Python, TypeScript, and Scala.
Now, we can write functions that take in a `MathExpr`, and define different behavior based on which of the 3 variants the data is.
A note on the `e-const` definition: Racket's [`Symbol` type](https://docs.racket-lang.org/reference/symbols.html) is distinct from its `String` type, and is a more general language term that is often used for _names_. Symbols differ from strings in that they cannot be broken down into substrings or characters: the symbol `'alexa` is equal to itself and not equal to `"alexa"`; only the latter can be broken into substrings. This allows languages to implement symbols more efficiently than strings: a symbol can be _interned_ to store the mapping from an id (which takes less space than a string) to the full name.
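For example, in the interactions window (a quick sketch, assuming plait's standard `equal?` and `substring` functions):
```racket
> (equal? 'alexa 'alexa)    ; a symbol is an atomic name: equal to itself...
- Boolean
#t
> (substring "alexa" 0 2)   ; ...while a string can be broken into pieces
- String
"al"
```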
Going back to our overall `MathExpr` type: we can now write a `calc` function to interpret nested math expressions!
First, some examples in the interactions window:
`plait`, unlike Racket, tells us both _the type of a term_ and its value after evaluation.
1 unsurprisingly has builtin type `Number`:
```racket
> 1
- Number
1
```
Some `MathExpr`s:
```racket
> (e-num 1)
- MathExpr
(e-num 1)
> (e-const 'pi)
- MathExpr
(e-const 'pi)
> (e-add (e-num 1) (e-const 'pi))
- MathExpr
(e-add (e-num 1) (e-const 'pi))
```
Note that plait gives us several things from a type definition:
- As shown, we can construct values of each variant type using its name, e.g., `(e-num n)` where `n` is a `Number`.
- We can use the variant name + a `?` to ask if a `MathExpr` is that variant:
```racket
> (e-add? (e-num 1))
- Boolean
#f
```
- We can access each field of the variant via `variant-id-field-id`, for example:
```racket
> (define e (e-add (e-num 1) (e-const 'pi)))
> (e-add-left e)
- MathExpr
(e-num 1)
> (e-add-right e)
- MathExpr
(e-const 'pi)
```
#### Defining `calc`
Our first `calc` case is easy: if we want to evaluate `MathExpr`s to final numeric values, then an `e-num` just needs to return its number field:
```racket
(define (calc (e : MathExpr)) : Number
; e has type MathExpr ^ ^ calc returns a Number
(type-case MathExpr e ; type, then the thing we want to case on
; [() ()] shape of each case: matching on variant for e, then result
; e-nums just evaluate to their value
[(e-num value) value]
; report an error for unhandled cases, for now
[else (error 'calc-error "cannot do calculation")]
))
```
`type-case` is similar to `cond` in `#lang racket`, except it matches an expression (here, `e`) to its variant (in each case). This is how we define different behavior per variant.
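For comparison, here is a sketch of the same dispatch written with the generated `e-num?` predicate and `e-num-value` accessor plus an ordinary `cond`. The `type-case` style above is preferred, since it can warn us when a variant of the type is not handled.
```racket
;; Same behavior as the type-case version above, written with the
;; predicate and accessor that plait generates for MathExpr.
(define (calc-with-cond (e : MathExpr)) : Number
  (cond
    [(e-num? e) (e-num-value e)]
    [else (error 'calc-error "cannot do calculation")]))
```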
#### Using a `hash` in `plait`
To support constants, we'll use a `hash` map from `Symbol` to `Number`. Note that the way we're doing this in class is not the best style (it's overly verbose), but I'm doing it this way to show you some `hash` functions you'll need for A3.
We can define a `hash` on these types as:
```racket
(define-type-alias Constants (Hashof Symbol Number))
```
We can then create an **immutable** hash. Like immutable lists, operations on this hash produce new hashes, rather than modifying the existing hash.
```racket
(define (create-constants)
; Make an initially empty immutable hash
(let* ([init (hash '())]
; Add pi
[just-pi (hash-set init 'pi 3.14159)]
; Add e
[more (hash-set just-pi 'e 2.71828)])
more))
; Use a top-level binding in this example, because our consts
; won't change. Note this is *not* what you'll do for A3, since
; there, expressions can add bindings.
(define consts (create-constants))
```
`hash-ref` is the function to look up a value in a hash given its key. Note that when we look up a key in our `hash`, plait makes a different design decision than e.g. Python. Rather than producing a runtime error, plait returns a `none` value when a key is **not** in the hash:
```racket
> (hash-ref consts 'alexa)
- (Optionof Number)
(none)
```
This `none` is one variant of an `Option` algebraic data type: another pattern that is common across many languages, including OCaml, Rust, and newer versions of e.g. Java.
In high level terms, an `Option` wraps another type and has two variants:
- `some v` indicates there is some result, `v`
- `none` indicates there is not a result
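Conceptually, `Optionof` is itself just another algebraic data type. Plait already provides it, but (for illustration only; don't re-define it in your own file) its definition is roughly:
```racket
(define-type (Optionof 'a)
  (none)
  (some [v : 'a]))
```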
Option types are nice for (at least!) two reasons:
1. They require programmers to explicitly handle no result, instead of falling back to e.g. `key not found` errors or null pointer exceptions.
2. Data structures that return option types sometimes allow us to write more efficient lookup code.
For example, in Python, a programmer might write the following to print a value if it is present in a dictionary:
```python
if key in dict:      # Check if key is there
    print(dict[key]) # Get value if so
```
This pattern checks the dictionary twice: once to check if `key` is present, and once to get its value. While both of these are amortized constant time, this is still somewhat redundant work. If a hashmap returns an option type, as we'll see below, we query the hashmap once, then do a very quick (worst-case constant time) check for whether the result is `some` or `none`.
Here is our case for looking up constants as part of `calc`:
```racket
(define (calc (e : MathExpr)) : Number
(type-case MathExpr e
[(e-num value) value]
[(e-const name)
(let ([n (hash-ref consts name)])
(if (some? n) ; if result
(some-v n) ; access the value field
(error 'calc-error "constant not found")))]
; ...
```
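A quick check of the new constant case in the interactions window (using the `consts` hash from above):
```racket
> (calc (e-num 42))
- Number
42
> (calc (e-const 'pi))
- Number
3.14159
```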
#### Group exercise: semantics of `/`
Interpreters allow us to implement whatever semantics we want for operators, even if these semantics differ from the host/implementation language.
For example, dividing 1 by 0 in Racket and plait produces an error:
```racket
> (/ 1 0)
- Number
. . /: division by zero
```
But maybe we want our `CalcLang` to instead produce ∞. In Racket/plait, one version of ∞ is `+inf.0` (we can have positive and negative infinity, but for simplicity of this example, we'll just say _any_ division by 0 should result in `+inf.0`).
We can add `/` to our type definition as such:
```racket
(define-type MathExpr
(e-num [value : Number])
(e-const [name : Symbol])
(e-add [left : MathExpr]
[right : MathExpr])
(e-divide [left : MathExpr]
[right : MathExpr]))
```
:::success
As a group, add a new case for `/` to `calc` such that division by 0 results in `+inf.0`. You can use `zero?` to check if a number is 0.
:::
**Solution:**
Most groups first came up with the following implementation:
```racket
[(e-divide x y)
(if (zero? (calc y))
+inf.0
(/ (calc x) (calc y)))]
```
However, consider if `y` is a very large subterm: this implementation has to run the recursive procedure `calc` on `y` (building a deep recursion stack) twice! It also processes `y` before `x`, which is usually not what we want.
The following is a better solution, since it (1) avoids the redundant recursion on `(calc y)` by using a let binding, and (2) processes the arguments to the division in left-to-right order, which matters here for producing the correct (left-most) error if both `x` and `y` contain errors.
```racket
[(e-divide x y)
(let ([x-result (calc x)]
[y-result (calc y)])
(if (zero? y-result)
+inf.0
(/ x-result y-result)))]
```
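With this case added to `calc`, a quick check in the interactions window:
```racket
> (calc (e-divide (e-num 1) (e-num 0)))
- Number
+inf.0
> (calc (e-divide (e-num 6) (e-num 3)))
- Number
2
```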
#### A note on the result type of `calc`
In the `calc` example, we only have one value type, `Number`, so we used plait's built-in `Number` type for simplicity. On A3, there will be more than one value type (numbers, strings, booleans _and_ lambdas), so the stencil code uses a new datatype, `Value`, as the result type.
#### Full `calc.rkt` file
```racket
#lang plait
(define-type MathExpr
(e-num [value : Number]) ; subtype, also known as variant
(e-const [name : Symbol])
(e-add [left : MathExpr] ; to access, use e-add-left
[right : MathExpr])
  (e-divide [left : MathExpr] ; to access, use e-divide-left
[right : MathExpr]))
; Define a specific hash type
(define-type-alias Constants (Hashof Symbol Number))
; Build our hash (verbose) to show immutable hash functions
(define (create-consts) : Constants
(let* ([init (hash '())]
         [justpi (hash-set init 'pi 3.14159265358979)]
[more (hash-set justpi 'e 2.71828)])
more))
(define consts (create-consts))
(define (calc (e : MathExpr)) : Number
(type-case MathExpr e ; type, then the thing we want to case on
; [() ()] shape of each case: matching on, then result
[(e-num value) value]
[(e-const name)
(let ([res (hash-ref consts name)])
(if (some? res)
(some-v res)
           (error 'calc-error "constant used is not defined")))]
[(e-add x y)
(+ (calc x) (calc y))]
;; fill in just this case such that anything/0 = +inf.0
[(e-divide x y)
(let ([x-result (calc x)]
[y-result (calc y)])
(if (zero? y-result)
+inf.0
(/ x-result y-result)))]
; Comment out once we have all cases
; [else (error 'calc-error "cannot do calculation")]
))
```
# Lexical vs dynamic scope
Circling back to the second warmup question: for A3, you'll need to implement creation of `lambda`s, as well as applications (or calls).
Let's think through another example of these:
```racket=
(define x 1)
(define f
(lambda (y) (+ x y))) ; which x is used here?
(define z
(let ([x 2]
[y 3])
(f (+ x y))))
```
Which value of `x` is used in line 3 (the body of the lambda) when the lambda is called on line 7?
`1`, based on the bindings on line 1 in scope at the definition of the lambda?
OR `2`, based on the binding on line 5 in scope at the application/call of the lambda?
Around half the class voted for each of these two responses! It turns out that these responses correspond to two distinct ways of handling scope: **lexical scope** corresponds to answer `1`, and **dynamic scope** corresponds to answer `2`. Real languages have used both, but the vast majority of languages use **lexical scope**.
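We can check what Racket itself does by evaluating `z`: Racket is lexically scoped, so the `x` in `f`'s body refers to the top-level binding of `1`, and `(f (+ 2 3))` is `(+ 1 5)`:
```racket
> z
6
```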
A bit more terminology: a variable, `x`, is free in an expression, `e`, if `x` is referenced in `e` outside the scope of any binding of `x` within `e`.
Which bindings do free variables of a function refer to when the function is applied? Lexical vs. dynamic scope determines the answer.
### Answer 1: lexical scope
> Free variables of a function refer to bindings in the environment where the function is defined, regardless of where it is applied.
In short: with **lexical scope**, function/lambda environments are created and remembered at **definition**.
### Answer 2: dynamic scope
> Free variables of a function refer to bindings in the environment where the function is applied, regardless of where it is defined.
In short: with **dynamic scope**, function/lambda environments are created at **application** (or call site).
In Racket (and most other languages!) closures implement lexical scope. This means: when you go to implement lambdas, the language implementation needs to **save the environment** at definition, along with the argument names and the lambda bodies.
Closures allow functions to use any binding in the environment where the function is defined, regardless of where it is applied.
:star: In class, someone asked if lambdas and closures are the same. They are closely related, but not quite the same. Lambdas refer to anonymous (unnamed) functions. Closures are lambdas that are closed over their environment (that is, that have lexical scope). For example, original LISP implemented lambdas, but used dynamic scope, and thus did not have "closures" by today's terminology.
At the start of the next class, we'll discuss _why_ lexical scope is almost universally preferred to dynamic scope.
# Lexical vs. dynamic scope definitions
Recall from last class that _lexical scope_ is one way of resolving free variables when they are used in a function/lambda body.
For example, if I have the lambda:
```racket
(define (f y) (+ x y))
```
Only `y` is bound by the lambda argument. `x` is free: we must get its value from the enclosing scope.
Lexical vs dynamic scope are answers to the question of: do we look up the value based on where the lambda is _defined_, or based on where it is _called_/_applied_?
- Lexical scope says it should be based on the _definition_ (of which there is only one, which we can statically find in one location).
- Dynamic scope says it should be based on where it is _called_/_applied_ (of which there might be multiple call sites; we only know which one is relevant dynamically, at runtime, while the program is being evaluated).
Racket, and almost every other language you might encounter, uses **lexical scope**.
Historically, the original version of LISP used dynamic scope.
But most programming language designers now agree: _lexical scope is better_. Let's consider some reasons why.
## Why lexical scope?
### 1: Function meaning does not depend on name choices.
Consider this program:
```racket=
(define (f y)
(let ([x (+ y 1)])
(lambda (z) (+ x y z))))
```
Now imagine we change the body of `f` to replace `x` with `q`, which we intend to be an equivalent program:
```racket=
(define (f y)
(let ([q (+ y 1)])
(lambda (z) (+ q y z))))
```
Under **lexical scope**, this renaming has no impact on the meaning of the lambda defined on line 3.
Under **dynamic scope**: depending on whether `x` or `q` is bound in the scope where the lambda defined on line 3 is _used_, we get different results! This is harder to reason about!
With *lexical scope*, we can understand the entire meaning of a lambda just by examining its definition. There are no "hidden parameters" that get passed based on where the lambda later gets used.
### 2: Closures remember the data they need
Closures automatically “remember” the data they need.
For example, consider the function `greater-than-x` which produces a `lambda`:
```racket=
(define (greater-than-x x)
(lambda (y) (> y x)))
(define (no-negs xs)
(filter (greater-than-x -1) xs))
(define (all-greater xs n)
  (filter (lambda (x) (> x n)) xs))
```
Under **lexical scope**, when `(greater-than-x -1)` is evaluated on line 4, the lambda is saved as a closure over the environment that binds `x` to `-1`, as intended. When `(lambda (y) (> y x))` is evaluated inside the body of `filter`, `x` will be mapped to `-1`.
Under **dynamic scope**, when `(greater-than-x -1)` is evaluated on line 4, the binding of `x` to `-1` is **not saved**. When `(lambda (y) (> y x))` is evaluated inside the body of `filter`, one of two things will happen:
1. If there happens to be another `x` bound in that scope, that different value of `x` will be used!
2. Otherwise, we would get a `variable not found` error.
Either case is not what we intended!
As you will see on Assignment 3, this means that lambdas, when saved as values in a language with lexical scope, need 3 things: the argument names, the body, and a copy of the environment at the time they were defined.
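As a rough sketch (hypothetical names only, not the actual A3 stencil), a closure value might be represented as a datatype variant that stores exactly these three things:
```racket
;; Hypothetical sketch only: the A3 stencil defines its own types.
;; A closure value stores the argument name, the body expression, and
;; the environment that was in place when the lambda was defined.
(define-type Value
  (v-num     [value : Number])
  (v-closure [arg  : Symbol]                   ; argument name
             [body : MathExpr]                 ; lambda body (A3 uses its own expression type)
             [env  : (Hashof Symbol Value)]))  ; environment saved at definition
```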
## When does what happen when, again?
It's important to remember when things are evaluated!
In Racket (and most other languages you've used):
1. A **function body** is not evaluated until the function is called, and it is evaluated every time the function is called. For example, the definition `(define (f x) (/ 1 0))` does not result in an error, because the body is only evaluated when the function is _called_ (see the short check after this list).
2. A function's **arguments** are evaluated before the called function's body. By convention, they are evaluated from left to right. For example, in `(g (+ 1 2) (* 3 4))`, the sum happens first, then the product, then the evaluation of `g`'s body with the two new bindings for the arguments.
3. A **binding** evaluates its expression when the binding is created, not every time the variable is used. For example, in `(let ([x (+ 1 2)]) (* x x))`, the sum happens just once, when `x` is bound to 3. Each use of `x` in the product just looks up `x` in the environment to get `3`.
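A quick check of the first point in the interactions window:
```racket
> (define (f x) (/ 1 0))   ; no error yet: the body is not evaluated at definition
> (f 5)                    ; the error happens only when f is actually called
. . /: division by zero
```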
As with lexical/dynamic scope, there are other options here _not_ used by Racket. We'll talk about this more later in the semester when we get to _lazy_ evaluation!
:::success
This is the end of the material covered by Quiz 1!
:::
---