Assignment 2: ==Functions== ------ This is a **pair or solo Gradescope assignment**—you will submit 3 Racket files and a README on Gradescope. :::info Recall that assignments, if completed as a pair, must be completed **together** rather than dividing work. See the syllabus for more. Pairs should submit a single submission on Gradescope. ::: ### Change log I will avoid making changes to the assignment after it is posted, but any necessary changes will be added here: - [9/23] Clarify boolean "identical" requirement. # Setup As with previous assignments, I suggest you complete this assignment in DrRacket. For this assignment, your Racket files should start with `#lang racket`. You should create the following files for this assignment: - `booleans.rkt` - `data.rkt` - `mapreduce.rkt` - `README.txt` (or `README.md`) ## Testing :::success Use the [RackUnit](https://docs.racket-lang.org/rackunit/quick-start.html) unit testing library with `(require rackunit)` to write `check-equal?`/`check-true`/etc tests for your functions (as shown in the examples). For this assignment, I expect to see roughly ==5+ tests per function==, including some new tests not given in the examples. ::: ## Functions in scope :::info For the Racket problems, you should write Racket solutions using primarily (1) the operators and functions we have seen so far in class, or (2) functions mentioned in the problem. Do not use Racket library functions that accomplish the full task, even if they exist. If you are ever unsure, please ask us on Zulip! ::: # P1: tall tails Answer the following questions in your `README`. Consider the syntactic transformation we used as a series of steps in class to make factorial tail recursive. :::success **1.1** Why does the function we apply to the accumulator need to be [commutative](https://en.wikipedia.org/wiki/Commutative_property) to use this purely syntactic transformation? ::: --- :::success **1.2** Is `append` commutative? Is `cons`? In your own words, explain how the tail recursive `reverse` we wrote in class relates to these answers (that is, why does it work?) ::: --- :::success **1.3** In your own words, explain how the tail call optimization can be used to rewrite a function that previously ran out of memory to one that successfully completes on data of the same size. ::: --- # P2: boolean questions Answer the following questions in a file `booleans.rkt`. :::success **2.1** Implement two functions: `all?` and `all?-foldl`. ::: Both functions should have the same behavior: they take a one-argument predicate function `p` and a list `xs` and: - returns a non-`#f` value if `(p x)` returns a non-`#f` value for all `x` in `xs` - returns `#f` if `(p x)` returns `#f` for any `x` in `xs` where `p` is defined on all elements preceding `x` in `xs`; and - is otherwise undefined. Be sure to consider the examples below carefully! - `all?` should use recursion, but no calls to higher-order functions (other than `p`, which, for all we know, might be higher order). * `all?-foldl` should not use recursion and instead use the builtin `foldl` (*fold left*) function, but no calls to higher-order functions other than `foldl` and `p`. In the following examples, both functions should have identical results: ```racket (check-equal? (all? (lambda (x) (> x 7)) (list 34 42 12 8 73)) #t) (check-equal? (all? (lambda (x) (= x 9)) null) #t) (check-equal? (all? not (list #f #f #t #f)) #f) ; Note the short circuiting behavior (check-equal? (all? (lambda (x) (= 2 (/ 8 x))) (list 8 4 2 0)) #f) ; Produces an error (check-exn exn:fail? (lambda () (all? (lambda (x) (> (/ 8 x) 3)) (list 2 0)))) ``` :star: If you use RackUnit's `check-false` and `check-not-false`, the functions should also produce identical results on any new tests you write (that is, they might produce different values Racket treats as "truthy" as long as both are not-`#f`). --- Consider the function `any?` that takes a predicate function `p` and a list `xs` and: - returns a non-`#f` value if there exists some element `x` in `xs` such that `(p x)` returns a non-`#f` value and, for all elements `y` preceding `x` in `xs`, `(p y)` is defined; - returns `#f` if `(p x)` returns `#f` for all elements `x` in `xs`; and - is otherwise undefined. :::success **2.2** Write a **concise, one-line** implementation of `any?` that uses one of your previously-defined functions (hint: consider the relationship between `any?` and `all?`). ::: Examples: ```racket (check-equal? (any? (lambda (x) (> x 7)) (list 34 42 12 8 73)) #t) (check-equal? (any? (lambda (x) (= x 9)) null) #f) (check-equal? (any? not (list #f #f #t #f)) #t) ;; Note the short circuiting behavior (check-equal? (any? (lambda (x) (> (/ 8 x) 3)) (list 2 0)) #t) ; Produces an error (check-exn exn:fail? (lambda () (any? (lambda (x) (= 2 (/ 8 x))) (list 0 2 4 8)))) ``` --- :::success **2.3** Re-implement `contains-multiple?` and `all-contain-multiple?` from `A1: Values` to using your new `all?` and `any?` higher order functions (with no explicit recursion). ::: --- # P3: data processing Answer the following questions in a file `data.rkt`. An *association list* is a list of pairs that represents a mapping from key to value. Each pair of key and value is represented by a cons cell, with the key in the `first` and the value in the `rest` position. For example, the association list: ```racket (list (cons 2 3) (cons 5 1) (cons "mountain" #t)) ``` maps the key `2` to the value `3`, the key `5` to the value `1`, and the key `"mountain"` to the value `#t`. :::success **3.1** Write a function `lookup` that takes a *key* `k` and an *association list* `associations` and returns: ::: - `#f` if no mapping with key `k` was found in the list; and - a cons cell whose `first` is `k` and whose `rest` is the corresponding value for the shallowest mapping of `k` in the association list. Use the function `(equal? x y)` to test for equality of keys. This will support keys more interesting than just integers. Examples ```racket (check-equal? (lookup 1 (list (cons 2 3) (cons 5 1) (cons "mountain" #t))) #f) (check-equal? (lookup 5 (list (cons 2 3) (cons 5 1) (cons 5 "river"))) '(5 . 1)) (check-equal? (lookup (list 3 5) (list (cons (list 3 5) 2) (cons 5 1))) '((3 5) . 2)) ``` ---- A *corpus* (for the purposes of this assignment) is a list of *documents*, where each *document* is a pair (`cons` cell) of a name and the contents (*i.e.*, the list of words in the text) of the document. Names and words may be any simple values, such as [symbols](https://docs.racket-lang.org/guide/symbols.html), strings, or numbers. A corpus may also be considered to be both: - an association list from document name to document contents - a list of lists where the first element of each list is the document name and the remaining elements are the document words. The following example corpus is written with a careful choice of quoted notation that helps illustrate the form as we will interpret it. As we have seen before, Racket may give a different display notation. Consider why. ```racket (define corpus '((hw . (hello world)) (hw2 . (hi world)) (wc . (world champion of the hi fi world)))) ``` --- :::success **3.2** Write a function `count` that takes a corpus of `docs` and a `word` and returns the total number of occurrences of word in the corpus. ::: Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces. An ideal solution never creates a fresh `cons` cell. Examples: ```racket (check-equal? (count corpus 'hi) 2) (check-equal? (count corpus 'world) 4) (check-equal? (count corpus 'hello) 1) (check-equal? (count corpus 'alexa) 0) (check-equal? (count corpus 'hw) 0) ``` --- :::success **3.3** Write a function `grep` that takes a corpus `docs` and a `word` and returns an association list mapping document name to the number of occurrences of `word` in that document. ::: Only documents in `docs` with at least 1 occurrence of `word` should appear as keys in the association list. Use higher-order functions to your advantage. Minimize the number of intermediate data structures (lists / cons cells) that your solution produces that are not part of the final result. ***Note: for this subproblem, order does not matter within the result!*** This problem is inspired by the [grep](https://en.wikipedia.org/wiki/Grep) Linux utility. Examples (though your order need not be the same): ```racket (check-equal? (grep corpus 'hi) '((wc . 1) (hw2 . 1))) (check-equal? (grep corpus 'world) '((wc . 2) (hw2 . 1) (hw . 1))) (check-equal? (grep corpus 'alexa) '()) (check-equal? (grep null 'world) '()) ``` # P4: MapReduce Answer the following questions in a file `mapreduce.rkt`. :::success Read [MapReduce: Simplified Data Processing on Large Clusters](https://www.usenix.org/legacy/event/osdi04/tech/full_papers/dean/dean.pdf) by Dean and Ghemawat. ::: As we mentioned in class, _reduce_ is another name for _fold_, so this system is closely related to the _map_ and _fold_ we have been practicing in Racket. MapReduce was a pioneering system for distributed computing, but it's also a prime example of how an idiom developed within a particularly programming paradigm can be applied even far outside the original languages. The MapReduce system itself is no longer used at Google, but its lessons live on in successor systems. For the written question, you can open and close a block comment in your Racket file with `#|` and `|#`. :::success **4.1** How are the semantics of `map` and `reduce` as described in the original paper different from the semantics in Racket? How are they similar? ::: --- In the rest of this problem, you'll write and use your own simplified `mapreduce` function. Our simplified `mapreduce` runs on one computer and takes four arguments - a mapping function, `mapper`, that takes one element and returns one mapping result; - a reducing function, `reducer`, that takes one mapping result and one accumulator and returns one accumulator; - an initial accumulator, `init`; and - a list of elements, `elems` The `mapreduce` function applies the `mapper` function to each element of `elems` and uses the `reducer` function to combine these results of the `mapper`. One possible _inefficient_ implementation is: ``` (define (mapreduce-slow mapper reducer init elems) (foldl reducer init (map mapper elems))) ``` This implementation creates an intermediate list (the result of `map`) that is used only for `fold`ing and then discarded. If we imagined this running on massive datasets on distributed servers, this is a bummer! :::success **4.2** Implement a version of `mapreduce` that creates no temporary cons cells (that is, the only `cons` cells should be those returned as part of the final result). ::: An optimal answer is only a few lines. Examples: ```Racket > (define xs (list 1 2 3 4)) > (mapreduce (lambda (x) (* x x)) + 0 xs) 30 > (mapreduce (lambda (x) (* x x)) cons empty xs) '(16 9 4 1) > (mapreduce (lambda (x) (first (rest x))) cons empty corpus) '(world hi hello) ``` :::success **4.3** Write a function `grep-mapreduce` that acts just like your previous `grep`, but is implemented via your `mapreduce`. ::: The first expression evaluated in the body of `grep-mapreduce` must be a call to `mapreduce` (though you can write any number of inner helper functions before this call). Neither `grep-mapreduce` nor any helper function you write for it should use explicit recursion. Helper functions may use `mapreduce` or other higher-order functions. # P5: HOFs in the wild :::warning For this problem, you should not use GenAI to help you find or explain these library functions. You can use it, however, to help you find interesting libraries to consider. ::: Answer this problem in your `README`. --- Many popular programming *frameworks* include higher-order library functions. Choose one or two Python frameworks you find interesting. For example, Django and Flask are popular web development frameworks. NumPy and Pandas are popular scientific computing frameworks. Scikit-learn, Pytorch, and Tensorflow are popular machine learning frameworks. :::success **5.1** Find at least 3 examples of library functions that take lambdas as arguments. ::: You don’t need to install or run these frameworks; you can use documentation, tutorials, online textbooks, or examples. For each library function, give a link to the documentation. Explain the details of the lambda the function expects (e.g., what arguments does it consume? What result should it produce?). Explain how the library uses the lambda: does it apply it once? Multiple times? :::success **5.2** Reflect on why frameworks might include functions that expect lambdas as arguments. Would it be possible to implement the same functionality without using higher-order functions? ::: --- # Grading :::success In your README, also report: - Roughly how long did this assignment take you (in hours)? - Which was the most challenging subproblem? - Which your favorite subproblem? ::: This assignment is worth 100 points, broken down by: :::info - 10: Problem 1 - 20: Problem 2 - 20: Problem 3 - 25: Problem 4 - 10: Problem 5 - 15: RackUnit tests ::: # Submission ==Have *one person* submit the following **4 files** on **Gradescope**==. If you are working with a partner, make sure the submitter adds the second partner via the Gradescope interface. :::success - `booleans.rkt` - `data.rkt` - `mapreduce.rkt` - `README.txt` (or `README.md`) - Be sure to answer the questions on timing/subproblems from above. ::: # Attribution Some exercises adapted from previous CS251 versions by Ben Wood and Carolyn Anderson.