Homework 4 - HackMD

--- title: Homework 4 tags: Homeworks-F21, 2021 --- # Homework 4: Data Drama **Due:** Tuesday, October 5, 2021 at 9:00PM ET. ## Setup and Handin ### Setup - **Task:** Create a file in [Pyret](http://code.pyret.org) called `hw4-code.arr`. You will write your solution in here. - **Task:** In the same way as in Homework 3, there's a file on Canvas called `hw4-helper.arr`. Copy and paste everything from this file into your new `hw4-code.arr` file. **Do not look at `hw4-helper.arr` before you turn in Homework 3.** We can tell when you look at something, and we'll be checking to see who peeks. :::spoiler *How do I open `hw4-helper.arr`?* Download this file from Canvas. You can open it in a plain-text editor (e.g. TextEdit, WordPad) to copy and paste into your `hw4-code.arr` file. If you want to see it in Pyret, navigate to [code.pyret.org](http://code.pyret.org) and hit "View in Google Drive". Once You're in the Google Drive folder, drag and drop the file in, right click on it > Open with > Pyret. ::: - **Task:** Create a file in your preferred word processor (Google Docs, Word, TextEdit, WordPad, etc) called `hw4-written`. - You will be graded on the quality of your examples for each function. :::spoiler What does this mean? - High-quality examples cover the simple/basic cases for each parameter, values at/around any boundaries mentioned in the problem statement, common cases, and potentially subtle cases (e.g., strings with two uses of a character where a function is looking for one use). - You do not need to include examples for erroneous inputs (such as negative numbers if only positives make sense for the problem). ::: - Do **not** put your name anywhere in the file. ### Handin - Hit "Run", then paste this check block into the interactions window. All the tests should pass: if they don't, this means you have a typo in a function header. ``` check: t1 = table: id :: Number, age :: Number, town :: String, hobby :: String, has-car :: Boolean end t2 = table: part-name :: String, material :: String row: "", "" end t3 = table: material :: String, price :: Number row: "", 0 end t4 = table: name :: String, gender :: String, occupation :: String, num-visits :: Number, fav-trip :: String end exists = lam(f): true end customers-to-show-ad-to(CUSTOMERS, AD-INTERACTION, 0) is t1 customers-in-kingman(t1) is t1 exists(show-age-distrib) is true exists(show-car-proportions) is true part-cost("", t2, t3) is 0 exists(clean-spork) end ``` - Download your code file and make sure is is called `hw4-code.arr`. Download or save your written file and **make sure it is a PDF** called `hw4-written.pdf`. If your written file is not a PDF, we might not be able to grade it. Hand in your work on [Gradescope](https://www.gradescope.com/courses/302007). ### Remember your resources! - [Pyret documentation](https://www.pyret.org/docs/latest/) - [CS0111 Table documentation](https://hackmd.io/@cs111/table) (use this **instead** of the official `Table` documentation) - [TA hours](http://cs.brown.edu/courses/csci0111/fall2021/calendar.html) - [Edstem](https://edstem.org/us/courses/12806/discussion/) and the [Homework 4 FAQs](https://edstem.org/us/courses/12806/discussion/583852)! - [Course missive](https://hackmd.io/@cs111/syllabus-f21) - [Code clarity guide](https://cs.brown.edu/courses/csci0111/fall2021/assets/docs/pyret-clarity-design-testing.html) ## The Assignment ## Part 1: Even more ads! :::info Learning Goals: - Practice writing functions that take tables as inputs - Practice testing tables - Work with library functions that create plots and charts of tabular data ::: ### Part 1A: Multiple ads :::danger *Remember to use the CS0111 `Table` documentation (linked in Setup and Handin), not the official documentation.* ***If your code uses constructs named sieve or extend (without functions), for example, you are using the wrong version.*** ::: In the last homework, we handled a growing collection of data by using a `Row` for a customer's data. This week, we want to handle a larger number of customers themselves, by having a multi-row `Table` of customers as the input. Since there will be many customers, we'll give each customer an ID number that we can use for reference. ID numbers are unique, but besides that, they can be any number. The code in `hw4-helper.arr` contains some `Tables` and functions for various parts of the homework. Copy and paste the code from the helper file into your hw4-code.arr file. In this part we'll use `CUSTOMERS` and `AD-INTERACTION`. The second `Table` is new. You can see what kind of data it contains using the interactions window. *Note: Please don't edit `CUSTOMERS`; if you want other example `Tables` which are similar, please define a new `Table`.* First, the Oatman Ad Agency also wants to have a way to find customers in any given `Table` who live in `"kingman"`. **Task:** Write a function called `customers-in-kingman`, which takes in a `Table` with **at least** column names `id :: Number` and `town :: String` (like `CUSTOMERS`) and returns a `Table` with the same column names and only the `Rows` of customers whose `town` is `"kingman"` (case sensitive). Now, we want to identify the customers to show our ad to.  **Task:** Write a function called `customers-to-show-ad-to`, which takes in two `Tables` and a `Number`. The first `Table` has the column names `id :: Number, age :: Number, town :: String, hobby :: String, has-car :: Boolean` (like `CUSTOMERS`). The second `Table` has the column names `customer-id :: String, ad-id :: String, interacted :: Boolean` (like `AD-INTERACTION`), and describes if each customer interacted with each ad. The `Number` is the `ad id` (either 1, 2, or 3 for the given `AD-INTERACTION` `Table`). This function outputs a new `Table` containing the rows of customers in the `CUSTOMERS`-like table who interacted with the given `ad-id` (with the columns `id :: Number, age :: Number, town :: String, hobby :: String, has-car :: Boolean`). In order to write examples for `customers-to-show-ad-to`, feel free to define new `Tables` explicitly. Your new tables need to have the same column names as `CUSTOMERS` and `AD-INTERACTION`. :::info ::: spoiler **Hint!** `filter-with` is going to be a very useful built-in function for this problem. As you can see in the [documentation](https://hackmd.io/@cs111/table), `filter-with` takes in a `Table` and a function with the type `(Row -> Boolean)`. Also, you don't need lists to complete this problem (we haven't introduced them in class yet, so you aren't expected to know them!) Instead, remember that `filter-with` iterates over the rows in a table. If you want to iterate over, say, a second table at the same time, you may want to consider using a second `filter-with` **inside** of the original... You can define a function inside of another function like so: ``` fun outer-table-helper(r :: Row) -> Boolean: doc: "your docstring" fun inner-table-helper(r2 :: Row) -> Boolean: doc: "another docstring" # can access both r and r2 in here! end # what should happen for every iteration through the rows of the outer table? end filter-with(my-table, outer-table-helper) ``` ### Stencil Here is a stencil that you can use to create your `customers-to-show-ad-to` function. It gives you the general structure, and where to nest functions, but you still need to fill in the parts where you see `TODO`! ``` fun customers-to-show-ad-to( customer-like-table :: Table, interaction-like-table :: Table, ad-id :: Number) -> Table: doc: ```Takes in a Table of potential customers and a Table of which customer interacted with which ad and returns the first Table, filtered by customers who interacted with the ad indicated by ad-id``` fun trim-interactions(r :: Row) -> Boolean: # TODO: Return true when the row's ad-id matches the ad-id we're # considering and customers have interacted with ad # Your code here! end # TODO: Create a new table called current-ad-data containing only rows from # interaction-like-table where ad-id's match up and interacted is true # Your code here! fun matching-customers(customer-row :: Row) -> Boolean: fun interactions-matching-customer(interaction-row :: Row) -> Boolean: # TODO: Return true if customer-id in interaction-row matches id in customer-row # Your code here! end # TODO: Filter current-ad-data using interactions-matching-customer. # If the resulting table has more than 0 rows, this should return true # (i.e. the customer had at least 1 interaction with this ad) # Your code here! end # TODO: Filter the customer-like-table with matching-customers to get # the customers that interacted with the given ad! # Your code here! end ``` ::: *Note: Pyret can read the Strings from the `customer-id` and `ad-id` columns in the `AD-INTERACTION` table and convert them to Numbers. If you choose to write custom ad-interaction tables to test with, your `customer-id` and `ad-id` columns can be of type `Number`.*   ### Part 1B: Visualization Let's help the Oatman Ad Agency visualize some of this customer data! Specifically, let's look at: 1. The proportion of customers in `CUSTOMERS` with a car vs. without a car 3. The distribution of customers in `CUSTOMERS` across the following age groups: 20 or under, 21-30, 31-40, and 41 or older You will refer to the CS111 `Table` documentation to find the appropriate plotting functions. **Task:** Develop a function `show-car-proportions` that takes a `Table` and produces an `Image` that conveys the proportion of customers that do/do not have cars. **Task:** Develop a function `show-age-distrib` that takes in a `Table` and produces a single `Image` of an appropriate plot or chart to show the distribution described in item 2. This raises an interesting question about how to deal with the blank values in the age column. Options include: - making blank values a separate category - omitting them from the visualization - counting them as part of "20 and under" Implement whichever (one!) of these you feel is the most reasonable approach to handling this data. **Task:** In your `hw4-written` file, briefly discuss when you believe each of these three options would be appropriate. Make sure your written answers like this are clearly marked by section (e.g. label this answer `"1B"`). :::spoiler *Hints!* *Hint 1: Decide which kind of plot or chart you want to produce. You may need to play with some of the functions in the Table documentation to figure out what makes sense.* *Hint 2: Plan out how you will get from the original data table to the inputs needed for the plot/chart function that you've chosen to use. Do you need a compute additional information? Limit to a certain set of rows? Do you need to do these in a particular order? List those out (in prose), then build expressions to perform each step.* *Hint 3: Look at the `count` builtin function. What does the table produced by the `count` function look like? What are the names for its columns?* ::: <br> *Note: You don't need to test functions that produce `Images`, but you do need to test any helpers which don't output `Images`. However, these functions should work on any `Table` with the column names used within the function, not just the `CUSTOMERS` table.*  **Task:** The Agency is thinking about producing a scatterplot that maps customers' IDs against their ages. **Do not** actually produce this scatterplot, but consider: What useful information would such a scatterplot provide? Give your answer in a block comment of a sentence or two under your implementation of `show-car-proportions`. ## Part 2: Ship Repair (Extraction) :::info Learning Goals: - Combine data across two tables to answer a question ::: Captain Cork realizes that his ship, the Starship Fisler, is in drastic need of repairs. He finds a list of parts that need fixing, but he finds that each part is priced by its *material*. As part of the code you copied over from `hw4-helper.arr`, there is one `Table` called `PART-INFO` that includes the name and material for each item, and another `Table` called `MATERIAL-PRICES` that shows the price of each material. :::spoiler How do I see what these `Tables` look like? To see what these `Tables` look like (since they're not explicitly defined like `CUSTOMERS`, feel free to type the name of the `Table` in the definitions window and hit "Run", or hit "Run" and then type the name in the interactions window and hit return. ::: <br> Cork wants to know if there is an easier way to figure out how much he has to spend for each item. **Task:** Write a function called `part-cost` which takes in a `String` representing the name of a part, a `Table` with column names `part-name :: String, material :: String` (like `PART-INFO`), and a `Table` with column names `material :: String, price :: Number` (like `MATERIAL-PRICES`). It should return a `Number` representing the cost associated with the input part name. You can assume that the input name exists once and only once in the first `Table`. :::spoiler *Hints!* *Hint 1: Work through a concrete example: pick a part name and work out a list of the computations you'll need to do (say, on each table) to compute the answer. Write expressions to do the computations just for your single concrete example. Then, step back and create functions to let you do the computation for any part.* *Hint 2: Consider creating a helper function to help break up the computation* :::  ## Part 3: Spork's Shuttle Service (Data Cleaning) **We will cover this part in lecture on Friday, 10/1** :::info Learning Goals: - Think about how table designs and code can embody assumptions (possibly limited) about people - Practice cleaning up raw data for reliable processing ::: Mr. Spork has a wonderful business shuttling aliens across space. He wants to roll out a new a customer loyalty program, and his plan is to start a trial run to see how the strategy will impact business. After gathering initial data about his trial participants, Spork finds the data is already suuuuper messy, and he needs your help before he can begin the trial! In the code you copied from `hw4-helper.arr`, there is a detailed `Table` called `SPORK-DATA`. It includes each participant's name, gender, occupation, number of visits in the past 90 days, and favorite constellation destination. Spork needs this data to launch a student/teacher discount and hold special sales. ### Part 3A: Written Before we try to clean Spork's data, we should think about what the valid contents of each field could be. Turns out, this can be pretty hard (many programmers and companies get this wrong in practice). **Task:** Read this short article on [falsehoods programmers believe about names](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/). **Task:** Read this article on [gender storage in databases](https://slate.com/technology/2019/10/gender-binary-nonbinary-code-databases-values.html). **Task:** Answer the following questions **in your `hw4-written` file** *before* moving on to the coding task below. Make sure your written answers like this are clearly marked by section and number (e.g. label these answer `"3A"`and label each question with its number). 1. Take a look at `SPORK-DATA`. Based on the two articles you've just read, what are two assumptions this `Table` seems to reflect, based on its structure or the data you see? Provide an example from `SPORK-DATA` of each assumption you name, and explain why the assumption is harmful or why it does not always hold true. :::spoiler *Hints!* *Hint: Because the articles are about names and gender, try paying special attention to those columns!* ::: 3. In order to help Spork out, Alyssa has decided to write some code to clean up their customer data. Looking at the function `clean-spork-names` and its helpers, which you copied from `hw4-helper.arr`, identify 3 separate assumptions Alyssa is making about names, giving an example from `SPORK-DATA` for why each assumption is false. :::spoiler *Hints!* *Hint 1: You can run the function in Pyret and see its effect on the `Table`!* *Hint 2: Pay special attention to the first ~30 rows of the `Table`!* ::: *Before proceeding to part 3B, make a short list of the issues that you see needing to be cleaned up before someone could work with the data. You aren't turning this in, but it is a good sanity check on your understanding of data cleaning before we tell you what to clean.* ### Part 3B: Designing and Coding :::danger **[Updated 10/1]:** We spent Friday talking about planning table programs rather than cleaning them. As a result, we haven't covered everything that you might need for these three coding problems. We are therefore going to give you two options here: you can write the code as originally assigned (since technically you've seen what you need for these questions), or you can turn this more into practice with planning out programs (which we expect would be valuable for many of you). It's entirely up to you. ::: #### Option 1: The original coding option Now it's your turn to try your hand at cleaning and preparing the data! **Task:** Write function called `clean-spork` that takes a `Table` with the columns of `SPORK-DATA` and produces a table with at least the same column names, as well as one new column called `"stud-teach"`. The cleaned table may have fewer rows than the original. The cleaned table should have: 1. normalized the names of constellations so that only the four strings`"libra"`, `"orion"`,`"aries"`, or `"other"` appear in the `"fav-trip"` column. For example, ``"laries"`` -> ``"aries"`` 3. only numbers between 1 and 200 in the `"num-visits"` column 4. a new `"stud-teach"` column that summarizes/bins the original `"occupation"` column. The new column should have only the values `"student"`, `"teacher"`, or `"other"`. Teacher is a catch-all term for professors, teachers, and instructors. *Note: Try calling `clean-spork` on `SPORK-DATA` itself to see what it does, but since it's such a big `Table` it doesn't make sense to make this an example in the `where` block for `clean-spork`. Instead, like some other functions in this homework, make a test table with rows and columns for the specific cases that you want to test. Make the data varied, but your example `Table` doesn't need to be too long if you design your rows carefully.* :::spoiler *Hints!* *Hint 1: You can and should visualize the data for outliers, use functions like `count`, etc., to help you figure out ways in which the data is messy.* *Hint 2: You might want to try writing more than one helper function for `clean-spork`!* ::: <br> **Task:** In your `hw4-written` file, write a few sentences to reflect on what you learned from this assignment. What have you learned about working with data? What have you learned about coding? We won't grade your answers, but we will look at them to see what's emerging for everyone. #### Option 2: The design practice option Consider the same `clean-spork` function that's described in the coding option, but **don't write the code**. **Task:** Write down a concrete example of a smaller version of `SPORK-DATA` to use as a potential testing input for `clean-spork`. There's no fixed number of rows required. Choose a collection of rows that you think highlight some of the key issues that your function would have had to handle. **Task:** Provide the corresponding output table that you think should result from calling `clean-spork` on your example input table. You can write your tables for these two tasks either in Pyret, as drawings on paper, or in something like google sheets (and upload a screenshot or PDF of the sheets). If you upload a file (rather than use Pyret), call it `clean-spork-tables.(png or pdf)`. **Task:** Write out a program-planning to-do list for writing `clean-spork`. Each of your list items should include a brief prose description of a computation that's needed, the name of a relevant built-in operator from Pyret (if applicable), or the name and input/output types of functions that you would need to write to accomplish that item. By the time you are done, you should have a list of most of the functions that you would need for this problem, but without the bodies of those functions filled in. Remember, this is not about writing code. It is about breaking down the `clean-spork` task into smaller concrete programming steps. One way to approach this is to think about what has to be done to produce each column value in the output table that is not in your input table.  ## Theme Song [Fly Me to the Moon](https://www.youtube.com/watch?v=ZEcqHA7dbwM) by Frank Sinatra ------ > Brown University CSCI 0111 (Fall 2020) > Do you have feedback? Fill out [this form](https://forms.gle/BuehRpxWnX97xYB68).