251 Assignment 2

Assignment 2 ------ This is a **pair or solo Gradescope assignment**—you will submit 5 Racket files and a README on Gradescope. :::info Recall that assignments, if completed as a pair, must be completed **together** rather than dividing work. See the syllabus for more. ::: ### Change log I will avoid making changes to the assignment after it is posted, but any necessary changes will be added here: - [02/08]: Clarify 2.1. fix a few typos. - [02/11]: Clarity that you can use `car` and `cdr` throughout the assignment. # Setup As with previous assignments, I suggest you complete this assignment in DrRacket. For this assignment, your Racket file should start with `#lang racket`. ## Testing :::success I suggest using the [RackUnit](https://docs.racket-lang.org/rackunit/quick-start.html) unit testing library with `(require rackunit)` to write `check-equal?` tests for your functions (as shown in the examples). Later in the semester, your tests will be an explicit part of your grade. ::: ## Functions in scope :::info For the Racket problems, you should write Racket solutions using primarily (1) the operators and functions we have seen so far in class, or (2) functions mentioned in the problem. Do not use Racket library functions that accomplish the full task, even if they exist. If you are ever unsure, feel free to ask on Zulip! ::: # P1: tall tails Answer the following questions in a file `dot-product-tail.rkt`. For the written questions, you can open and close a block comment with `#|` and `|#`. **1.1** Consider the syntactic transformation we used as a series of steps in class to make factorial tail recursive. Why does the function we apply to the accumulator need to be [commutative](https://en.wikipedia.org/wiki/Commutative_property) to use this purely syntactic transformation? --- **1.2** Is `append` commutative? Is `cons`? In your own words, explain how the tail recursive `reverse` we wrote in class relates to these answers (that is, why does it work?) --- **1.3** Using tail recursion (but no higher-order functions), write a function `dot-product-tail` that calculates the [dot product](https://en.wikipedia.org/wiki/Dot_product) of two lists of numbers: the dot product of two lists $(x_1 ... x_n)$ and $(y_1 ... y_n)$ is the sum of products $x_1*y_1 + ... + x_n*y_n$. You can assume that the two lists have the same length and that the dot product of two empty lists is 0. Examples: ```racket > (dot-product-tail `() `()) 0 > (dot-product-tail `(1 2 3) `(4 5 6)) 32 ``` --- # P2: nested lists Answer the following questions in a file `nested-lists.rkt`. Racket has two core operators for adding items to the front of a list. `append` combines lists (while we did not see this in class, the built in `append` can take any number of argument lists). `cons` takes two arguments and produces a pair in the form of a `cons` cell (and canonically, `empty` represents the empty list). Because the subset of Racket we have been working with does not have mutation, neither can modify an existing list. These Racket operators have a funny property: they do not always produce lists in the canonical sense. If their final argument is not a list, they produce an improper list, indicated with a period between the head and tail of the pair: ```racket > (cons 5 empty)` `(5) ; <- a proper list > (cons 5 6)` `(5 . 6) ; <- improper list, but still a pair ``` :::warning To access the first and second items of an improper list, use the following two functions we mentioned in class. These function names are historical artifacts from the original implementation of LISP, where they were shorthands access the two halves of an IBM 704 CPU registers. - [car](https://docs.racket-lang.org/reference/pairs.html#%28def._%28%28quote._~23~25kernel%29._car%29%29) returns the first element of a cons cell (or pair). - [cdr](https://docs.racket-lang.org/reference/pairs.html#%28def._%28%28quote._~23~25kernel%29._cdr%29%29) returns the second element of a cons cell (or pair). You can use these functions throughout the assignment. ::: _As we'll see later, pairs can be useful in their own right_. **2.1** Consider a version of append called `proper-append` that is guaranteed to return a proper list (i.e., throws a dynamic error if its arguments are not proper lists). Think through the recursion involved in terms of efficiency. In a block comment, answer: do you think `proper-append` can have the same asymtotic (Big-O) efficiency as the `my-append` from the class notes? Why or why not? Be sure to consider the case where the input lists have different sizes. _(Note: you do not need to implement `proper-append`.)_ --- **2.2** Write a function `steamroll` that takes a nested, potentially improper list and returns a list with all items within. Helpful functions: - The `cons?` function returns `#t` if its argument is a `cons` cell and `#f` otherwise. Add a comment to your function to specify which code is only hit if the list is improper. Examples: ```racket (check-equal? (steamroll '()) '()) (check-equal? (steamroll (list 1 (list 2))) '(1 2)) (check-equal? (steamroll (list 1 (list 1 2 4) (list (list (list 2 (list 5)))))) '(1 1 2 4 2 5)) ``` --- **2.3** Write a function `deep-rev-tail` that takes a proper list `x` and deeply reverses it using tail recursion. If `x` is an atom (any non `cons` cell value), `deep-rev-tail` should return it as is. If `x` is a `cons` cell, `deep-rev-tail` should assume it represents a proper list, reverses the list, and deeply reverse any lists that are elements in `x`. Note that the most concise and efficient implementations of this function will need **both** tail recursion and natural recursion! Examples: ```racket (check-equal? (deep-rev-tail (list 1)) '(1)) (check-equal? (deep-rev-tail empty) '()) (check-equal? (deep-rev-tail (list 1 2 3 4 5)) '(5 4 3 2 1)) (check-equal? (deep-rev-tail (list 1 (list 2 3) (list 4 5 (list 6 7 8)))) '(((8 7 6) 5 4) (3 2) 1)) ``` --- :::info The following problems in part rely on material we will cover in the second higher-order function lecture. If you would like to start on those parts before then, see the Racket documentation for [map](https://docs.racket-lang.org/reference/pairs.html#%28def._%28%28lib._racket%2Fprivate%2Fmap..rkt%29._map%29%29), [filter](https://docs.racket-lang.org/reference/pairs.html#%28def._%28%28lib._racket%2Fprivate%2Flist..rkt%29._filter%29%29), [foldl](https://docs.racket-lang.org/reference/pairs.html#%28def._%28%28lib._racket%2Fprivate%2Flist..rkt%29._foldl%29%29), and [foldr](https://docs.racket-lang.org/reference/pairs.html#%28def._%28%28lib._racket%2Fprivate%2Flist..rkt%29._foldr%29%29). ::: # P3: boolean questions Answer the following questions in a file `booleans.rkt`. **3.1** Implement two functions: `all?` and `all?-foldl`. Both functions should have the same behavior: they take a one-argument predicate function `p` and a list `xs` and: - returns a non-`#f` value if `(p x)` returns a non-`#f` value for all `x` in `xs` - returns `#f` if `(p x)` returns `#f` for any `x` in `xs` where `p` is defined on all elements preceding `x` in `xs`; and - is otherwise undefined. Be sure to consider the examples below carefully! - `all?` should use recursion, but no calls to higher-order functions (other than `p`, which, for all we know, might be higher order). * `all?-foldl` should not use recursion and instead use the builtin`foldl` (*fold left*) function, but no calls to higher-order functions other than `foldl` and `p`. In the following examples, both functions should have identical results: ```racket (check-equal? (all? (lambda (x) (> x 7)) (list 34 42 12 8 73)) #t) (check-equal? (all? (lambda (x) (= x 9)) null) #t) (check-equal? (all? not (list #f #f #t #f)) #f) ; Note the short circuiting behavior (check-equal? (all? (lambda (x) (= 2 (/ 8 x))) (list 8 4 2 0)) #f) ; Produces an error (check-exn exn:fail? (lambda () (all? (lambda (x) (> (/ 8 x) 3)) (list 2 0)))) ``` --- **3.2** Consider the function `any?` that takes a predicate function `p` and a list `xs` and: - returns a non-`#f` value if there exists some element `x` in `xs` such that `(p x)` returns a non-`#f` value and, for all elements `y` preceding `x` in `xs`, `(p y)` is defined; - returns `#f` if `(p x)` returns `#f` for all elements `x` in `xs`; and - is otherwise undefined. Write a **one-line** implementation of `any?` that uses one of your previously-defined functions (hint: consider the relationship between `any?` and `all?`). Examples: ```racket (check-equal? (any? (lambda (x) (> x 7)) (list 34 42 12 8 73)) #t) (check-equal? (any? (lambda (x) (= x 9)) null) #f) (check-equal? (any? not (list #f #f #t #f)) #t) ;; Note the short circuiting behavior (check-equal? (any? (lambda (x) (> (/ 8 x) 3)) (list 2 0)) #t) ; Produces an error (check-exn exn:fail? (lambda () (any? (lambda (x) (= 2 (/ 8 x))) (list 0 2 4 8)))) ``` --- **3.3** Re-implement `contains-multiple?` and `all-contain-multiple?` from Assignment 1 to using your new `all?` and `any?` higher order functions (with no explicit recursion). --- # P4: data processing Answer the following questions in a file `data.rkt`. An *association list* is a list of pairs that represents a mapping from key to value. Each pair of key and value is represented by a cons cell, with the key in the `first` and the value in the `rest` position. For example, the association list: ```racket (list (cons 2 3) (cons 5 1) (cons "mountain" #t)) ``` maps the key `2` to the value `3`, the key `5` to the value `1`, and the key `"mountain"` to the value `#t`. **4.1** Write a function `lookup` that takes a *key* `k` and an *association list* `associations` and returns: - `#f` if no mapping with key `k` was found in the list; and - a cons cell whose `first` is `k` and whose `rest` is the corresponding value for the shallowest mapping of `k` in the association list. Use the function `(equal? x y)` to test for equality of keys. This will support keys more interesting than just simple values. Examples ```racket (check-equal? (lookup 1 (list (cons 2 3) (cons 5 1) (cons "mountain" #t))) #f) (check-equal? (lookup 5 (list (cons 2 3) (cons 5 1) (cons "mountain" #t))) '(5 . 1)) (check-equal? (lookup 5 (list (cons 2 3) (cons 5 1) (cons 5 "river"))) '(5 . 1)) (check-equal? (lookup (list 3 5) (list (cons (list 3 5) 2) (cons 5 1))) '((3 5) . 2)) ``` ---- A *corpus* (for the purposes of this assignment) is a list of *documents*, where each *document* is a pair (`cons` cell) of a name and the contents (*i.e.*, the list of words in the text) of the document. Names and words may be any simple values, such as [symbols](https://docs.racket-lang.org/guide/symbols.html), strings, or numbers. A corpus may also be considered to be both: - an association list from document name to document contents - a list of lists where the first element of each list is the document name and the remaining elements are the document words. The following example corpus is written with a careful choice of quoted notation that helps illustrate the form as we will interpret it. As we have seen before, Racket may give a different display notation. Consider why. ```racket (define corpus '((hw . (hello world)) (hw2 . (hi world)) (wc . (world champion of the hi fi world)))) ``` --- **4.2** Write a function `count` that takes a corpus of `docs` and a `word` and returns the total number of occurrences of word in the corpus. Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces. An ideal solution never creates a fresh `cons` cell. Examples: ```racket (check-equal? (count corpus 'hi) 2) (check-equal? (count corpus 'world) 4) (check-equal? (count corpus 'hello) 1) (check-equal? (count corpus 'alexa) 0) (check-equal? (count corpus 'hw) 0) ``` --- **4.3** Write a function `grep` that takes a corpus `docs` and a `word` and returns an association list mapping document name to the number of occurrences of `word` in that document. Only documents in `docs` with at least 1 occurrence of `word` should appear as keys in the association list. Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces that are not part of the final result. ***Note: for this subproblem, order does not matter within the result!*** This problem is inspired by the [grep](https://en.wikipedia.org/wiki/Grep) Linux utility. Examples (though your order need not be the same): ```racket (check-equal? (grep corpus 'hi) '((wc . 1) (hw2 . 1))) (check-equal? (grep corpus 'world) '((wc . 2) (hw2 . 1) (hw . 1)) (check-equal? (grep corpus 'alexa) '()) (check-equal? (grep null 'world) '()) ``` # P5: MapReduce Answer the following questions in a file `mapreduce.rkt`. Read [MapReduce: Simplified Data Processing on Large Clusters](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/dean/dean.pdf) by Dean and Ghemawat. _Reduce_ is another name for _fold_, so this system is closely related to the _map_ and _fold_ we have been practicing in Racket. MapReduce was a pioneering system for distributed computing, but it's also is a prime example of how an idiom developed within a particularly programming paradigm can be applied even far outside the original languages. The MapReduce system itself is no longer used at Google, but its lessons live on in successor systems. **5.1** How are the semantics of `map` and `reduce` as described in the original paper different from the semantics in Racket? How are they similar? --- In the rest of this problem, you'll write and use your own simplified `mapreduce` function. Our simplified `mapreduce` runs on one computer and takes four arguments - a mapping function, `mapper`, that takes one element and returns one mapping result; - a reducing function, `reducer`, that takes one mapping result and one accumulator and returns one accumulator; - an initial accumulator, `init`; and - a list of elements, `elems` The `mapreduce` function applies the `mapper` function to each element of `elems` and uses the `reducer` function to combine these results of the `mapper`. One possible _inefficient_ implementation is: ``` (define (mapreduce-slow mapper reducer init elems) (foldl reducer init (map mapper elems))) ``` This implementation creates an intermediate list (the result of `map`) that is used only for `fold`ing and then discarded. If we imagined this running on massive datasets on distributed servers, this is a bummer! **5.2** Implement a version of `mapreduce` that creates no temporary cons cells (that is, the only `cons` cells should be those returned as part of the final result). An optimal answer is only a few lines. Examples: ```Racket > (define xs (list 1 2 3 4)) > (mapreduce (lambda (x) (* x x)) + 0 xs) 30 > (mapreduce (lambda (x) (* x x)) cons empty xs) '(16 9 4 1) > (mapreduce (lambda (x) (first (rest x))) cons empty corpus) '(world hi hello) ``` **5.3** Write a function `grep-mapreduce` that acts just like your previous `grep`, but is implemented via your `mapreduce`. The first expression evaluated in the body of `grep-mapreduce` must be a call to `mapreduce` (though you can write any number of inner helper functions before this call). Neither `grep-mapreduce` nor any helper function you write for it should use explicit recursion. Helper functions may use `mapreduce` or other higher-order functions. --- # Grading :::success In your README, also report: - Roughly how long did this assignment take you (in hours)? - Which was the most challenging subproblem? - Which your favorite subproblem? ::: This assignment is worth 100 points, broken down by: :::info - 15: Problem 1 - 20: Problem 2 - 20: Problem 3 - 20: Problem 4 - 25: Problem 5 ::: # Submission Submit the following **6 files** on **Gradescope**: - `dot-product-tail.rkt` - `nested-lists.rkt` - `booleans.rkt` - `data.rkt` - `mapreduce.rkt` - `README.txt` (or `README.md`) # Attribution Some exercises adapted from previous CS251 versions by Ben Wood and Carolyn Anderson.