# Agenda

:::success
Agenda for today
- More efficient recursion
    - Recursion via accumulation (`reverse`)
    - Scope detail: shadowing
    - Understanding natural recursion
    - Understanding the accumulator pattern
    - Tail recursion
    - Avoiding redundant recursion (`bad-max`)
- Higher-order functions
    - `map` (just a taste!)
    - `filter`, `fold` (next time)
:::

# More efficient recursion: `reverse`

```racket=
;; Faster 'reverse' from last class
(define (reverse lst)
  (letrec
      ((helper
        (lambda (lst acc)
          ;; WARMUP QUESTION:
          ;; how do I know which `lst`
          ;; this next line refers to?
          (if (empty? lst)
              acc
              (helper (rest lst)
                      (cons (first lst) acc))))))
    (helper lst '())))
```

## Scope detail: shadowing

Line 9 refers to the argument of the `lambda` on line 5, rather than the argument of the entire function on line 2. Racket (like most other languages) allows naming conflicts to be resolved via shadowing: names resolve to the _inner-most_ scope where they are defined. We say a variable is _shadowed_ if it is redefined in a more deeply nested scope.

Scope: the region of the program where we can refer to a variable. The scope includes the block where the variable is defined, _minus_ any sub-blocks where the variable is shadowed.

Visualize scope via lexical contours:

```racket
(let ([x 2])              ; outer x is 2
  (+ x                    ; refers to the outer x (2)
     (let ([x (* x x)])   ; inner x shadows the outer x; its value is 4
       (+ x 3))))         ; refers to the inner x (4)
;; => (+ 2 (+ 4 3)) = 9
```

# Understanding natural recursion

Let's look back at the original, slower version of `reverse`:

```racket
(define (reverse lst)
  (if (empty? lst)
      '()
      (append (reverse (rest lst))
              (list (first lst)))))
```

Recall that programs store data in two forms of memory: _the stack_, which tracks context for function calls, and _the heap_, which tracks dynamic data (such as `cons` cells). The stack grows as functions are called, and shrinks when functions return. If a function calls itself recursively, we get one stack frame per call (that is, frames are per call, not per function).
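To make "frames are per call" concrete, here is a minimal sketch (`my-length` is a hypothetical example, not from class): each recursive call gets its own frame, and each frame holds its pending `(+ 1 ...)` work until the call below it returns.

```racket
;; Hypothetical example: natural recursion, one stack frame per call.
;; (my-length '(a b c)) creates four frames; each of the first three
;; holds pending work -- applying (+ 1 ...) to the result from below.
(define (my-length lst)
  (if (empty? lst)
      0
      (+ 1 (my-length (rest lst)))))

(my-length '(a b c)) ; => 3
```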
Here is one way to visualize the slow `reverse` called on the list `'(1 2 3)`:

```
View of the function call stack (grows downward)

Slow, natural reverse
---------------------------------
| reverse                       |
| lst = '(1 2 3) ; first is 1   |
| (first lst) = 1               |
| context for return:           |
|   (append (...) (1))          | <- append is linear time
| returns (3 2 1)               |
---------------------------------
| reverse                       |
| lst = '(2 3)   ; first is 2   |
| (first lst) = 2               |
| context for return:           |
|   (append (...) (2))          | <- append is linear time
| returns (3 2)                 |
---------------------------------
| reverse                       |
| lst = '(3)     ; first is 3   |
| (first lst) = 3               |
| context for return:           |
|   (append (...) (3))          | <- append is linear time
| returns (3)                   |
---------------------------------
| reverse                       |
| lst = '()      ; empty? is #t |
| returns ()                    | <- Base case, returns '()
---------------------------------
```

By `context for return`, we mean that each recursive stack frame needs to track the fact that the result of the next call to the function is used to compute the larger expression that is then returned:

```
(append (reverse (rest lst)) (list (first lst)))
        ^ recursive call     ^ current element
^ combine into a list for this level of the recursive stack
```

Note that the argument, `lst`, shrinks on every recursive call, while the returned value is first created at the base case, then grows (represents the result of more work) _after_ each recursive call, before being returned, as execution returns back up the stack. In general, _natural recursion_ shrinks the argument and grows the result. For this specific function, we can visualize this as:

```
Natural recursion

stack grows   argument shrinks   final result grows
    ↓               ↓↓↓↓               ↑↑↑↑    ; append with first
    ↓               ↓↓↓                ↑↑↑     ; append with first
    ↓               ↓↓                 ↑↑      ; append with first
    ↓               ↓                  ↑
    ↓            base case          base result
```

Here, this is slow because the `append` after each recursive call takes linear time, making the overall runtime $O(n^2)$ for a list of length $n$.
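As an aside on why `append` takes linear time: a sketch of a textbook definition (named `my-append` here to avoid clashing with the built-in) must rebuild every `cons` cell of its first argument, so each call costs time proportional to the length of that list -- and slow `reverse` pays that cost once per recursive call.

```racket
;; Sketch of a textbook append (hypothetical name my-append).
;; It walks and copies the entire first list, so it is linear
;; in (length xs); ys is shared, not copied.
(define (my-append xs ys)
  (if (empty? xs)
      ys
      (cons (first xs) (my-append (rest xs) ys))))

(my-append '(3 2) '(1)) ; => '(3 2 1)
```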
## Understanding the accumulator pattern

In the most efficient version of the function, the key insight is that the final result is more naturally built by _starting_ at the front of the input list, rather than starting at the base case with the base result.

The _accumulator pattern_ lets us do just what we need here: we add an additional argument to the recursive function to track an _accumulator_ that grows (or represents more work) conceptually as the stack _grows_ (rather than as the stack shrinks). The accumulator starts small at the first stack frame, and grows as we recur and grow the stack. The base case can then just return the final accumulated result.

```
Accumulator pattern

stack grows   argument shrinks   accumulator grows
    ↓               ↓↓↓↓               ↓            ↑↑↑↑↑
    ↓               ↓↓↓                ↓↓           ↑↑↑↑↑
    ↓               ↓↓                 ↓↓↓          ↑↑↑↑↑
    ↓               ↓                  ↓↓↓↓         ↑↑↑↑↑  return
    ↓            base case         final result  final result
```

In the accumulator pattern, by the time we reach the base case, we already have the final result!

For `reverse`, this allows a much more efficient implementation, because by starting with the front of the list, we can use `cons` instead of `append`. `cons` is constant time, so the overall runtime becomes linear instead of quadratic.

Even further, we can gain additional efficiency in terms of _space_ by examining how this changes the stack frames themselves!

## Tail recursion

```
View of the function call stack (grows downward)

Efficient, *tail* recursion
---------------------------------
| reverse                       |
| lst = '(1 2 3) ; first is 1   |
| acc = '()                     | <- cons to grow acc
| return result of recursion    |
---------------------------------
| reverse                       |
| lst = '(2 3)   ; first is 2   |
| acc = '(1)                    | <- cons to grow acc
| return result of recursion    |
---------------------------------
| reverse                       |
| lst = '(3)     ; first is 3   |
| acc = '(2 1)                  | <- cons to grow acc
| return result of recursion    |
---------------------------------
| reverse                       |
| lst = '()      ; empty? is #t |
| acc = '(3 2 1)                |
| returns (3 2 1)               | <- Base case, return acc
---------------------------------
```

Note that while we need an additional argument binding to track `acc`, each stack frame no longer needs to track the context of additional work to do with the result of the recursive call (for slow `reverse`, appending that result to the current `first` of the list). Put another way: once each frame delegates work to a recursive call, the only bit of work it has left to do is pass the return result up.

This is the key insight behind an essential optimization for making functional programming efficient: **tail call optimization**! In this optimization, the language implementation (the interpreter or the compiler, which we'll explore more as the semester goes on) _does not actually build new stack frames_. Instead, when a function is _tail recursive_, the implementation reuses the same stack frame (just updating the arguments on each call). This makes a tail recursive function more similar to a loop with an accumulator in an imperative language.

Implementations of languages that encourage a functional style of programming typically perform tail call optimization: Racket, OCaml, (usually) Rust, etc. Other languages, such as Java and C, do not guarantee tail call optimization.

### Tail position

What makes a function tail recursive? _Tail_ recursion is when the recursive call happens in the _tail position_ of the function. We can define the tail position recursively:

Tail position?
- In `(lambda (x1...xn) e)` or `(let (...) e)`, `e` is in the tail position
- In `(if e1 e2 e3)`, `e2` and `e3` are in the tail position
- In `(cond (c1 e1) (c2 e2) (c3 e3))`, `e1`, `e2`, and `e3` are in the tail position

More intuitively, a function definition is tail recursive if the recursive call is the last thing that happens in the dynamic execution of the function.

## Avoiding redundant recursion (`bad-max`)

Consider this code and the recursive calls it makes.
What is the asymptotic complexity in terms of space and time? For time, calls to `first`, `rest`, and `null?` are $O(1)$.

```racket
(define (bad-max xs)
  (if (null? xs)
      null ; not defined on empty list
      (if (null? (rest xs))
          (first xs)
          (if (> (first xs) (bad-max (rest xs)))
              (first xs)
              (bad-max (rest xs))))))
```

Answer: this function is _even worse_ than slow `reverse`: rather than calling a linear-time function on each recursive call, this implementation _makes an entire redundant recursive call_ whenever it is not at the base case. Since it makes 2 calls instead of 1, the shape of the recursive calls is not a line, but instead a tree:

```
        *
       / \
      *   *
     / \ / \
    *  * *  *
```

The redundant calls form a binary tree with height proportional to the length of the list $n$, meaning this function is $O(2^n)$: exponential, instead of linear or even quadratic!

We can split "fixing" this implementation into two steps. First, let's just remove the _redundant_ recursion using a `let` binding:

```racket
;; With natural recursion
(define (max-nat xs)
  (if (empty? (rest xs))
      (first xs) ; A single element is the max
      (let ((rest-max (max-nat (rest xs))))
        (if (> (first xs) rest-max)
            (first xs)
            rest-max))))
```

^ This version now requires linear time and linear additional space.

Second, we can switch to using the accumulator pattern *and* tail recursion:

```racket
;; With the accumulator pattern and tail recursion
;; Note we name the accumulator max-so-far
(define (max-acc xs)
  (letrec ((helper
            (lambda (xs max-so-far)
              (if (empty? xs)
                  max-so-far
                  (helper (rest xs)
                          (max max-so-far (first xs)))))))
    (helper xs (first xs))))
```

^ This version now requires linear time and **constant**, not linear, additional space.

# Group exercise

Write a tail-recursive factorial using this common syntactic transformation:

```racket
; Common syntactic transformation to convert to a tail call:
; - Add a helper w/ an extra accumulator arg
; - Identify the initial accumulator (base result)
; - The accumulator becomes the base case result
; - The recursive step applies the operation to the accumulator
(define (fact n)
  (if (= n 0)
      1
      (* n (fact (- n 1)))))
```

Solution:

```racket
; Tail-recursive factorial
(define (fact-tail n)
  (letrec ((helper
            (lambda (n acc)
              (if (= n 0)
                  acc
                  (helper (- n 1) (* n acc))))))
    (helper n 1)))
```

A few important points:
- This common syntactic transformation works when the operator we are applying to the accumulator is _commutative_.
- When the operator is not commutative, we can still convert some functions to use efficient tail recursion by switching the function we apply (often, to work in a different order).
    - See Assignment 2 for more musing on this point.
- While tail recursive functions usually use the accumulator pattern, not all functions that use an accumulator are tail recursive.

# Higher-order functions

We'll cover this topic more next lecture, but to preview the main point, consider these two functions: one doubles every element in a list, and the other adds one to every element in a list:

```racket=
(define (double ls)
  (if (empty? ls)
      '()
      (cons (* 2 (first ls))
            (double (rest ls)))))

;; Adds one to every element
(define (add-1 ls)
  (if (empty? ls)
      '()
      (cons (+ 1 (first ls))
            (add-1 (rest ls)))))
```

These functions share the majority of their implementation: only the choice of how to transform each element in the list (on lines 4 and 11, respectively) changes beyond simple renaming. The rest of each function is just the logic to apply the transformation to each element and build a new list with the results.

We have already seen that functions (also known as procedures) are first-class _values_ in Racket. This means we can pass one function as the argument to another! Next class, we'll see how to abstract out the function applied to each element (here, `(lambda (x) (* 2 x))` and `(lambda (x) (+ 1 x))`) and pass it as an argument to a higher-order function, `map`. We'll also look at other famous higher-order functions: `filter` and `fold`.
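As a taste of where this is headed, here is a sketch of both functions rewritten with the built-in `map` (the `-m` names are just placeholders to avoid clashing with the definitions above; we'll study `map` properly next class):

```racket
;; Sketch: map applies its function argument to every element of the
;; list, returning a new list of the results.
(define (double-m ls) (map (lambda (x) (* 2 x)) ls))
(define (add-1-m ls)  (map (lambda (x) (+ 1 x)) ls))

(double-m '(1 2 3)) ; => '(2 4 6)
(add-1-m  '(1 2 3)) ; => '(2 3 4)
```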