Lecture 12: Mutation

Lecture 12: Mutation --- :::warning **Announcements** - **Please sit in the first 2 rows of the room**. - `A5: Type Checking` due tonight - Next help hours: - Emily's 3-5pm tomorrow, L037 ::: :::success Agenda for today - Mutation - References - Sequencing - Mutable records - Mutable arrays (more on this next time) ::: # Mutation We know that functional programming focuses on _pure_ computations that produce values without side effects, typically by prioritizing recursion over iteration. _Pure_ here means functions are mathematically defined by their inputs and outputs, without side effects. However, both Racket and OCaml are not 100% pure languages—they do allow side effects. :::success What are some reasons functional languages *do* need to allow some side effects? ::: :::spoiler Think, then click! One core reason: for interacting with the world! For example, input/output (IO), like printing to the terminal. For programs to have useful results, they often need side effects! Another necessary cause of side effects is dynamic errors, like division by zero. Finally, some algorithms and data structures are more efficient when *mutation* as a side effect is possible. ::: We'll start with the simplest OCaml type with mutation—`ref`—then build to some more complex use cases. ## Reference cells _References_ in OCaml are locations in memory whose contents can be mutated. :::info **New type**: `t ref` where `t` is a type References are first-class in OCaml (they are values that can be used anywhere other values can be used) New expressions to construct a `ref`: - `ref e` to create a reference with initial contents `e` - If `e` has type `t`, `ref e` has type `t ref` - Note the constructor name and the type name are the same: this is fine, OCaml will figure out which one you mean via the context. The constructor comes before the contents, the type `ref` comes after the element type (similar to the `list` type). New expressions to use a `ref`: - `e1 := e2` to update contents - `:=` is an operator with type `'a ref -> 'a -> unit` - `!e` to retrieve contents (`!` is not negation in OCaml, which is `not`) ::: Here is an example of constructing two references: ```ocaml let x = ref 10 (* : int ref *) (* ____ *) (* x -> | 10 | *) (* ---- *) let y = ref 20 (* : int ref *) (* ____ *) (* x -> | 10 | *) (* ---- *) (* ____ *) (* y -> | 20 | *) (* ---- *) ``` Then seeing the contents of reference `x` to be the contents of reference `y` + 1: ```ocaml let _ = x := !y + 1 (* ____ *) (* x -> | 21 | *) (* ---- *) (* ____ *) (* y -> | 20 | *) (* ---- *) ``` In code demo: ```ocaml (* refs and mutable state *) let x = ref 10 let y = ref 20 (* ____ *) (* x -> | 10 | *) (* ---- *) (* ____ *) (* y -> | 20 | *) (* ---- *) let z = x (* ____ *) (* x -> | 10 | *) (* z -> ---- *) (* ____ *) (* y -> | 20 | *) (* ---- *) let _ = x := 21 (* ____ *) (* x -> | 21 | *) (* z -> ---- *) (* ____ *) (* y -> | 20 | *) (* ---- *) let w = !x + !z (* 21 + 21 = 42! *) (* x + 1 *) (* does not typecheck! *) let w = (!y) + (!z) (* 41 *) ``` Note that `ref` first evaluates its argument to a value before it is saved as the contents: ```ocaml utop # ref (1 + 2) ;; - : int ref = {contents = 3} utop # ref (1 / 0) ;; Exception: Division_by_zero. ``` `ref`s themselves are also values. :::warning If you've programed in C, you'll notice that `ref`s are similar to pointers. However, unlike C, OCaml is _memory safe_ (more on this later in the semester), which in part means that `ref`s are much less flexible than C pointers. You cannot cast back and forth between integers and references, for example, and you cannot do pointer arithmetic in OCaml. ::: ### Variables vs. reference contents - A variable bound to a reference (e.g., `x`) is still immutable: it will always refer to the same reference cell (or spot in memory). - But the contents of the reference may change via `:=` - And there may be aliases to the reference, which sees the contents change! That is, the contents `x` references might change even if `x` itself is not used directly. E.g., `z` aliases `x` in the previous example! :::danger **Aliasing** in general: when one location for data in memory can be accessed through two or more different names in the program. Aliasing matters for references because of mutability! Aliasing has a long history as a challenge in analyzing programs, across many different programming languages. ::: ### Evaluation and typing rules for `ref` Constructing references :::info **Construct with**: `ref e` **Type rules** - If `e : t`, `ref e : t ref` **Eval rules** - Evaluate `e` to a value `v` - Allocate a new location `loc` in memory to hold `v` - Store `v` in `loc` - Return `loc` ::: Mutating references (ref assignment) :::info **Use with**: `e1 := e2` **Type rules** - If `e1 : t ref` and `e2: t`, then `e1 := e2 : unit` **Eval rules** - Evaluate `e2` to a value `v` and `e1` to a location `loc` - Store `v` in `loc` - Return `()` (the unit value) ::: Access contents (dereference) :::info **Use with**: `!e` **Type rules** - If `e : t ref`, then `!e : t` **Eval rules** - Evaluate `e` to a location `loc`. - Return the contents of `loc`. ::: ## Mutable record fields OCaml also lets you declare fields of a record as mutable, which allows their contents to change without constructing a new record. For example: ```ocaml type student = {name : string; year : int; mutable num_classes : int} ``` The operator `<-` updates a mutable field, for example, ```ocaml let s2 = {name = "alice"; year = 2027; num_classes = 4} ``` We can then update the field with: ```ocaml s2.num_classes <- 5 ``` ### What's the difference? Refs actually ==_are_== mutable record fields! OCaml actually implements the `ref` type via the mutable field mechanism. OCaml's standard library contains: ```ocaml type 'a ref = { mutable contents : 'a } ``` The other syntax we saw earlier are actually syntactic sugar! E.g., `!` and `:=` desugar to roughly: ```ocaml let ( ! ) r = r.contents let ( := ) r x = r.contents <- x ``` # Sequencing of side effects Sometimes when using expressions just for their side effects, it's useful to describe a chain of effectful expressions. We can do this with `;` (similar to non-expression languages). :::info **Sequence** `e1; e2` **Type checking**: if `e1: unit` and `e2 : t`, `e1; e2 : t`. Similar for longer chains (all `unit` until a `t` in the last expression). **Eval rules** - First evaluate `e1` to a value `v1`. - Then evaluate `e2` to a value `v2`. - Return `v2`. (`v1` is not used at all.) - If there are multiple expressions in a sequence, e.g., `e1; e2; ...; en`, then evaluate each one in order from left to right, returning only `vn`. ::: # Physical Equality The OCaml documentation states: > `e1 == e2` tests for physical equality of `e1` and `e2`. On mutable types such as references, arrays, byte sequences, records with mutable fields and objects with mutable instance variables, `e1 == e2` is true if and only if physical modification of `e1` also affects `e2`. On non-mutable types, the behavior of ( `==` ) is implementation-dependent; however, it is guaranteed that `e1 == e2` implies compare `e1 e2 = 0`. A `ref` is physically equal to itself and any aliases, but not to another `ref` that is a different location in memory, even if it has the same contents. Different types can have special equality: eg., `String.equal` looks for structural equality. We can see this with mutable records: ```ocaml let s3 = {name = "alice"; year = 2027; num_classes = 5} let physically_eq = s2 == s3 (* false *) let structurally_eq = s2 = s3 (* true *) ``` # Group exercises :::success Exercise 1: draw boxes for each reference location you encounter. Use them to predict the output of all bindings in this program. ```ocaml (* Exercise 1: draw boxes for reference locations. Use them to predict the output of all bindings in this program. *) let a = ref 1 let b = ref 2 let c = ref 1 let b1 = a = c let b2 = a == c let d = b let b3 : bool = d = b let b4 = d == b let _ = (d := 1) let b5 = a = b ``` ::: :::spoiler Think, then click! Solution, as provided by `utop`: ```ocaml utop # let a = ref 1 let b = ref 2 let c = ref 1 ;; val a : int ref = {contents = 1} val b : int ref = {contents = 2} val c : int ref = {contents = 1} utop # let b1 = a = c ;; val b1 : bool = true utop # let b2 = a == c ;; val b2 : bool = false utop # let d = b ;; val d : int ref = {contents = 2} utop # let b3 = d = b ;; val b3 : bool = true utop # let b4 = d == b ;; val b4 : bool = true utop # ignore (d := 1) ;; - : unit = () utop # d ;; - : int ref = {contents = 1} utop # b ;; - : int ref = {contents = 1} utop # let b5 = a = b ;; val b5 : bool = true ``` ::: :::success Exercise 2: Java and Python both have a `+=` operator. Informally describe what each one does. Then, define an OCaml equivalents on `ref` types, starting like so to define an infix function: ``` let ( += ) ... ``` The type of the binding should be `int ref -> int -> unit` (ask me if you don't understand why!) Note: exercise adapted from the [OCaml Programming textbook]( https://cs3110.github.io/textbook/chapters/mut/exercises.html) ::: :::spoiler Think, then click Here, we fine `+=` as an infix function that mutates `x` to the sum of the contents of `x` and `y`. ```ocaml let ( += ) x y = x := !x + y ``` ::: --- ## Mutable arrays Arrays in OCaml are similar to primitive arrays in Java or C in that they are ordered sequences of elements with constant-time access and mutation. :::success Recall from CS230: how does the $O(...)$ runtime differ between arrays and linked list (which `cons` lists are a form of)? ::: :::spoiler Think, then click! Accessing an element in a linked list (or cons list) of length $n$ is $O(n)$, while for arrays it is $O(1)$. ::: ## OCaml Arrays Arrays in OCaml are similar to primitive arrays in Java or C: they are ordered sequences of elements with constant-time access and mutation. In OCaml, array sizes must be specified at construction and cannot be changed. `array` is a type constructor, used in the same ways as `list`, e.g., `int array`. ```ocaml [|v1; v2; ...; vn|] ``` is an array literal. Note the vertical bar characters after the open square bracket and before the close square bracket. Otherwise the syntax is identical to list literals. For example: ```ocaml let a = [| 1; 2; 3|] ``` We can access and change array elements with indexing. - `a.(i)` returns the ith element of the array `a`. It throws an exception if `i` is out of bounds. Note: parens needed. - `a.(i) <- v` overwrites the ith element of the array `a` with a new value `v`. Throws an exception if `i` is out of bounds. `Array.make n d` allocates an array of length `n` where every element is equal to `d`. `Array.length a` returns the length of the array `a`. ### Evaluation and typing rules for `array`s Constructing arrays :::info **Construct with**: `[|v1; v2; …; vn|]` or `Array.make n d` **Type rules** - If `ei : t` for all the `ei`, then ``[|e1; e2; ...; en|] : t array`. - If `d : t` in `Array.make n d`, then the result has type `t array`. **Eval rules** - Evaluate each element to a value. - Allocate new space for an array of length `n`, and store each value in the array at its index. ::: Accessing an element at index :::info **Use with**: `e1.(e2)` **Type rules** - If `e1 : t array` and `e2 : int`, then `e1.(e2) : t`. **Eval rules** - Evaluate `e1` to an array value `v1`. - Evaluate `e2` to an integer `v2`. - If `v2` is not within the bounds of the array (`0` to `n-1`), raise an `Invalid_argument` exception. Otherwise, return the value at index `v2`, ::: Mutating an element at index :::info **Use with**: `e1.(e2) <- e3` **Type rules** - If `e1 : t array` and `e2 : int` and `e3 : t`, then `e1.(e2) <- e3 : unit` **Eval rules** - Evaluate `e1` to an array value `v1`. - Evaluate `e2` to an integer `v2`. - Evaluate `e3` to a value `v3` of type `t`. - If `v2` is not within the bounds of the array (`0` to `n-1`), raise an `Invalid_argument` exception. Otherwise, set the element of `v1` at index `v2` to be `v3`. ::: # Loops OCaml offers several ways to work with arrays. The standard library provides functions like `Array.fold_left`, similar to lists. :::success What type should we expect `Array.fold_left` to have? ::: ```ocaml utop # Array.fold_left ;; - : ('a -> 'b -> 'a) -> 'a -> 'b array -> 'a = <fun> ``` `Array.iter` works like map, but it performs side effects instead of returning a new list. :::success What type should we expect `Array.iter` to have? ::: ```ocaml utop # Array.iter ;; - : ('a -> unit) -> 'a array -> unit = <fun> ``` Because `Array.iter` works to perform side effects, the function it takes in first is one that returns `unit` and the overall return type for `iter` is `unit`. We can also explicitly use *loops*, like other less-functional languages you are more familiar with. For example: ``` while e1 do e2 done for x = e1 to e2 do e3 done ``` :::warning Note that `for` loop bounds are *inclusive*: `x` starts as `e1` in the first iteration and is `e2` in the last iteration. As explained [in the documentation](https://ocaml.org/docs/loops-recursion), `for` loops are just syntactic sugar for a series of `let` statements. ::: ## Silly loop example Here is a silly example to show loops with less straightforward structure: write a function that adds 1 to just the first multiple of 3 in an OCaml array. ```ocaml (* This version uses recursion rather than a loop *) let rec change_first_mul3_rec a = let rec change_first_mul3_rec a i = if i < Array.length a then if a.(i) mod 3 = 0 then a.(i) <- a.(i) + 1 else change_first_mul3_rec a (i + 1) in change_first_mul3_rec a 0 (* This version uses a for loop *) let change_first_mul3_for a = let n = Array.length a in let found = ref false in for i = 0 to n - 1 do if not !found && a.(i) mod 3 = 0 then begin a.(i) <- a.(i) + 1; found := true end done (* This version uses while, and assumes there is at least one multiple *) let change_first_mul3_while1 a = let i = ref 0 in while a.(!i) mod 3 > 0 do i := !i + 1 done; a.(!i) <- a.(!i) + 1 (* This version handles the case where there is no multiple found *) let change_first_mul3_while2 a = let n = Array.length a in let i = ref 0 in while (!i < n) && a.(!i) mod 3 > 0 do i := !i + 1 done; if !i < n then a.(!i) <- a.(!i) + 1; ``` # Group exercise: mutable arrays :::success Warmup question (and interview practice!): you've given a large array of integers, but you know it's already partially sorted. What's the best sorting algorithm to use? Why? ::: :::spoiler Think, then click! Answer: insertion sort, because it's fast (linear time) if the array is already sorted. It also does not require a new array (it can be done in place), for which we (in OCaml) need to use mutable arrays, not lists. ::: :::success Implement insertion sort on a mutable array of integers in OCaml. ::: We'll continue with this exercise next class!