Lecture 13: Lazy evaluation

Lecture 13: Lazy evaluation --- :::warning Announcements - **Please sit in the first 2 rows of the room**. - `A4: Static Types` graded - `A6: Types II` due Thursday - Quiz 2 next week Thursday, 11/6 - Practice quiz coming - Progress reports (from me) and mid-semester survey (from you) later this week - Next help hours: - Mine 6:30-8:30 tonight: - **W116** and broken up into individual sessions ::: :::success Agenda for today - Finish `array` sort exercise from last week - Delaying work with *thunks* - Infinite sequences: - Motivation, applications - Initial attempt - Streams: abstraction for infinite data using thunks ::: # Group exercise: mutable arrays :::success Warmup question (and interview practice!): you've given a large array of integers, but you know it's already partially sorted. What's the best sorting algorithm to use? Why? ::: :::spoiler Think, then click! Answer: insertion sort, because it's fast (linear time) if the array is already sorted. It also does not require a new array (it can be done in place), for which we (in OCaml) need to use mutable arrays, not lists. ::: :::success Implement insertion sort on a mutable array of integers in OCaml. ::: :::spoiler Think, then click! ```ocaml (* Only loops, pairwise swaps *) let insertion_sort_swap a = let n = Array.length a in for i = 1 to n - 1 do let swap_idx = ref i in (* move the element left by swapping pairwise *) while !swap_idx > 0 && a.(!swap_idx) < a.(!swap_idx - 1) do let tmp = a.(!swap_idx) in a.(!swap_idx) <- a.(!swap_idx - 1); a.(!swap_idx - 1) <- tmp; swap_idx := !swap_idx - 1 done done (* Only loops, shifts all values over then moves the current value *) let insertion_sort_loop a = let n = Array.length a in for curr_idx = 1 to (n - 1) do let curr_val = a.(curr_idx) in let switch_idx = ref (curr_idx - 1) in while !switch_idx > 0 && a.(!switch_idx) > curr_val do a.(!switch_idx + 1) <- a.(!switch_idx); switch_idx := !switch_idx - 1 done; a.(!switch_idx + 1) <- curr_val done (* Only recursion *) let rec insert a curr_val switch_idx = if switch_idx >= 0 && a.(switch_idx) > curr_val then ( a.(switch_idx + 1) <- a.(switch_idx); insert a curr_val (switch_idx - 1)) else a.(switch_idx + 1) <- curr_val let rec insertion_sort_helper a i n = if i < n then ( insert a a.(i) (i - 1); insertion_sort_helper a (i + 1) n) let insertion_sort a = let n = Array.length a in insertion_sort_helper a 1 n ``` ::: :::info This is the end of the material covered by Quiz 2! ::: --- Next, we'll consider different modes of computation, starting with *delayed computation*. # Thunks Say you wanted to define a debugging version of `if` that always printed its condition. Here is one potential implementation: ```ocaml let dbg_if (c : bool) (t : 'a) (e : 'a) : 'a = print_endline (string_of_bool c); match c with | true -> t | false -> e ``` :::success Would this function work as expected? ::: :::spoiler Think, then click! No! This implementation would cause problems because function calls evaluate their arguments before their body in OCaml. This is known as _eager_ evaluation. ::: In OCaml, as with most languages in the world (and that you have likely used before, including Racket, Python, and Java), arguments to function calls are evaluated eagerly (*before* the call itself). On the other hand, the branches of an `if` are evaluated _lazily_ (not eager). ## Delayed evaluation :::success How could we implement a version of `dbg_if` correctly, without changing the OCaml language semantics? Have we seen any ways to delay computation? That is, save a chunk of work to be done for later? ::: :::spoiler Think, then click! Yes, we're seen work "saved for later" before when understanding lambda function bodies! We can delay computation by wrapping in an “unnecessary” function. - The only purpose is to delay evaluation - Function with zero "useful arguments, used to delay evaluation, are called **thunks**. Since OCaml functions are always 1-argument functions, in OCaml, thunks take the `unit` type as argument. ::: ## Thunks Let's edit `dbg_if` to use thunks in its signature, so that we don't evaluate both arguments before the body: ```ocaml let dbg_if (c : bool) (t : unit -> 'a) (e : unit -> 'a) : 'a = ``` Here is the body: ```ocaml let dbg_if (c : bool) (t : unit -> 'a) (e : unit -> 'a) : 'a = print_endline (string_of_bool c); match c with | true -> t () | false -> e () ``` If the thunk is pure and there is no mutation, then the answer to "when do we execute the body" only affects performance. However, with mutation and side effects, when things evaluate may affect the result of a computation! More thunk examples: ```ocaml (* Happens immediately *) let f1 = print_endline "hello" (* Does not happen immediately! Not until it is applied. *) let f2 () = print_endline "hello" (* We can apply the same thunk multiple times *) let rec n_times_thunk f n = if n <= 0 then "done" else (f () ; n_times_thunk f (n - 1)) ;; let thunk_bad () = 1 / 0 (* thunk_bad is a function! Body will not have been evaluated *) (* result stores the actual result of the body, but in this case *) let result = thunk_bad () ``` ## Thunks and Performance Delaying computation can help, hurt, or not change performance - Help: if you never end up needing the result! - Hurt: if you need the result more than once - No change: if you need the result exactly once Though, notably, we can't always know ahead of time how many times we might need a result. We can use thunks to build other useful data structures. # Infinite sequence motivation In programming, it can be useful to work with abstractions of infinite sequence of values (for example, "all natural numbers", data from a network, clicks from a user interface, etc. ...). How do we define these semantically? We'll approach defining infinite sequences of values recursively. First, let's consider a simple case of defining an unbounded sequence of all `1`s. Just like we can have recursive functions, we can also have recursive lists, so let's try that first:[^1] [^1]: Examples from [OCaml Programming: Correct + Efficient + Beautiful](https://cs3110.github.io/textbook). ```ocaml let rec ones = 1 :: ones ``` :::success Do we expect OCaml to let us run this code? ::: :::spoiler Think, then click! Yes, this works! We get: ```ocaml val ones : int list = [1; <cycle>] ``` ::: We could also have a pattern of alternating zeros and ones: ```ocaml let rec a = 0 :: b and b = 1 :: a ``` ```ocaml val a : int list = [0; 1; <cycle>] val b : int list = [1; 0; <cycle>] ``` Because these list definitions define infinite, unbounded lists, `utop` cannot print the entire list. But, because in memory we have a finite representation (since `cons` lists are fundamentally linked-lists, which can have cycles), we *can* still define lists like this. Notably, these two prior examples are just infinite over repeating patterns. What if we want a stream that actually changes in more complex ways over time? Let's try defining a list of each successive integer from some `n`: ```ocaml (* the infinite list [n; n+1; n+2; ... ] *) let rec from n = n :: from (n + 1) ``` When we try to use this to define the natural numbers, we get: ```ocaml (* We want: [0; 1; 2; ... ] *) let nats = from 0 ``` :::success What do we think will happen here? ::: :::spoiler Think, then click! This one doesn't work! ``` Stack overflow during evaluation (looping recursion?). ``` We don't have a cycle in the `cons` linked-list: we are trying to have an infinite list. There is no way to represent this in memory, so even trying to build it overflows the stack. ::: This is where thunks come back in! We will use thunks to delay evaluation of non-repeating infinite sequences in a clever way, such that we can actually represent infinite lists even without cycles. # Streams: an infinite sequence implementation The core idea of a ==stream== is to use a thunk to help us represent “the rest of the stream”. Streams represent infinite lists, but have finite in-memory representations such that we can actually work with them without overflowing the stack. :::warning Note that while infinite lists/streams are a common programming concept, the language is less standardized than other topics we've covered in CS251. For example, the OCaml textbook calls the same concept "sequences", not "streams", and uses a slightly different implementation. But the core idea is the same. ::: :::info A **stream** is a sequence of data elements that are made available over time. ::: To define a stream, you need to know how to get the next item in the sequence, but you **don’t** need to be able to enumerate all items at once. Even though we aren't using all of the items at once, this is still a useful abstraction because of a common pattern in code called the producer-consumer pattern. The *producer* and *consumer* collaborate, sort of like a 2 player game. - The *producer* knows how to create all values. - The *consumer* decides how many values to ask for at once. We will define a stream as a **thunk that returns a pair** containing: - The “head of the stream” at that point - A stream of the rest of the values ```ocaml (* For CS251, we'll describe streams this way. Compared to other types we've seen, streams have less of a standard definition: people define them differently depending on the context. *) type 'a stream = Stream of (unit -> 'a * 'a stream) (* For example, the OCaml textbook instead uses a Cons-based definition *) ``` ## Implementing streams A stream is a thunk that returns a pair: - `fst` is the “head of the stream” at that point - `snd` is a stream of the rest of the values The producer must be prepared to generate any number of elements! - How does the producer do this? Recursion, based on the current context! - The key “gotcha” is to use a thunk in just the right place. ## Our first streams: ones and the natural numbers Let's try our "give me ones forever" idea from before, but with our new stream type ```ocaml (* "give me ones forever", but with our new stream type *) let rec ones = Stream (fun () -> (1, ones)) ``` As before, this works, though we don't see the same `cycle` result from `utop`. We can also re-do our "natural numbers" example, without overflowing the stack this time: ```ocaml (* More complex structure for non-repeating streams *) let nats = (* Helper produces the pair *) let rec helper start = (start, Stream (fun () -> helper (start + 1))) in (* Call the helper with our actual initial value *) Stream (fun () -> helper 0) ``` This now works without a stack overflow! ```ocaml val ones : int stream = Stream <fun> val nats : int stream = Stream <fun> ``` ## Using streams We can *consume* the values from the start of the stream. For example, we can get just the first element/head with: ```ocaml utop # let head = match ones with | Stream t -> fst (t ()) ;; val head : int = 1 ``` Or the let-match version: ```ocaml utop # let Stream t = nats in fst (t ()) ;; - : int = 0 ``` We can also write a function to take the first `n` values of a stream. ```ocaml (* A function that takes the first n elements of a stream s, returning them as a list in the same order. *) let rec take n s = if n = 0 then [] else match s with | Stream t -> let (head, rest) = t () in head :: take (n-1) rest (* We could also write this as this, using pattern matching in the argument itself *) let rec take n (Stream t) = if n = 0 then [] else let (head, rest) = t () in head :: take (n-1) rest ``` Next class, we'll continue with some stream exercises.