--- title: Homework 4 tags: Homeworks-F22, 2022 --- # Homework 4: Data Drama **Due:** Tuesday, October 4, 2022 at 11:59PM ET. ## Big Picture This assignment has you practice a multi-stage data-analysis task. We're going to write functions that draw on data from two tables to perform computations. We'll also do multiple versions of a function, where each version adds more features, so you can see an example of building up solutions a bit at a time. This assignment draws on all of the table lectures so far. The planning portion was covered in lecture on Wed 9/28 and again in lab 3 (9/29-30). While we have covered everything that you need as of 9/28, the examples we do in class on 9/30 will give more practice with the concepts in this assignment. ## Setup and Handin ### Setup - Create a file in [Pyret](http://code.pyret.org) called `hw4-code.arr` where you will write your Homework 4 solution. - Open the [planning tool through this link](https://snap.berkeley.edu/snap/snap.html#present:Username=kfisler&ProjectName=table-plans-hw4) (which sets up initial blocks). If you want the tool (Snap) to manage your files, create a Snap account (note: the tool does not autosave -- you have to save your work manually). The planning tool comes pre-populated with pink blocks for each of the art-store problems. Please put all of your plan work within the corresponding pink block (for grading purposes). - Create a file in your preferred word processor (Google Docs, Word, TextEdit, WordPad, etc) called `hw4-written.pdf`. This will hold your written work and graphs. - Do **not** put your name anywhere in the file. - Copy and paste the following starter code into `hw4-code.arr`: :::spoiler Expand here for HW4 starter code ``` include data-source include gdrive-sheets include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep") ALL-DATA = load-spreadsheet("1nR1VYygeDxFtaPFER8IXaeTokL5c_4uVuaGbZrs4Cts") # Tables for Part 1 ART = table: id :: Number, cost :: Number, currency :: String row: 23, 35, "USD" row: 85, 100, "CHF" row: 99, 90, "EUR" row: 56, 65, "USD" row: 56, 65, "USD" row: 250, 121, "CAD" row: 64, 33, "CNY" row: 87, 72, "JPY" row: 24, 120, "MXN" row: 59, 500, "JPY" row: 59, 500, "JPY" row: 81, 80, "USD" end CURRENCY-CONVERSION = table: from-c :: String, to-c :: String, conv-rate :: Number row: "USD", "EUR", 0.99 row: "EUR", "CHF", 1.05 row: "CNY", "CAD", 0.19 row: "MXN", "EUR", 0.05 row: "MXN", "EUR", 0.05 row: "USD", "MXN", 20.00 end # Tables for Part 2 CUSTOMERS = table: id :: Number, age :: Number, town :: String, device :: String, has-car :: Boolean row: 111, 21, "oatman", "pearPods", false row: 112, 23, "kingman", "pearPods", true row: 113, 25, "kingman", "pearPods", false row: 320, 57, "kingman", "pearTunes", true row: 1010, 0, "oatman", "pearPods", false row: 170, 20, "providence", "pearPods", true row: 180, 18, "oatman", "pearPods", true row: 190, 21, "oatman", "", true row: 150, -2, "kingman", "pearPods", true row: 160, 34, "providence", "pearTunes", false row: 220, 40, "oatman", "pearPods", true row: 330, 41, "kingman", "pearPods", true end AD-INTERACTION = load-table: customer-id :: String, ad-id :: String, interacted :: Boolean source: ALL-DATA.sheet-by-name("ad-interaction", true) end # Table for Part 3 titanic-raw-loader = load-spreadsheet("1ZqZWMY_p8rvv44_z7MaKJxLUI82oaOSkClwW057lr3Q") TITANIC-RAW = load-table: survived :: Number, pclass :: Number, raw-name :: String, sex :: String, age :: Number, sib-sp :: Number, par-chil :: Number, fare :: Number source: titanic-raw-loader.sheet-by-name("titanic", true) end ``` ::: ### Handin - Hit "Run", then paste this check block into the interactions window. All the tests should pass: if they don't, this means you have a typo in a function header and the *autograder won't run on your code!* ``` check "functions exist and have correct inputs": get-art-in-1(23, "USD") is 35 get-art-in-2(23, "USD") is 35 get-art-in-3(23, "USD") is 35 is-table(has-interacted(CUSTOMERS, AD-INTERACTION, 1)) is true end ``` <!-- t1 = table: id :: Number, age :: Number, town :: String, device :: String, has-car :: Boolean end t2 = table: customer-id :: String, ad-id :: String, interacted :: Boolean end exists = lam(f): true end exists(get-art-in-1) is true exists(get-art-in-2) is true exists(get-art-in-3) is true has-interacted(t1, t2, 0) is t1 end ``` --> - Download your planning file and make sure it is called `table-plans-hw4.xml` - Download your code file and make sure it is called `hw4-code.arr`. - Your written answers should be clearly marked with corresponding task numbers. Download or save your written file and **make sure it is a PDF** called `hw4-written.pdf`. If your written file is not a PDF, we might not be able to grade it. - Hand in `table-plans-hw4.xml`, `hw4-code.arr` and `hw4-written.pdf` on [Gradescope](https://www.gradescope.com/courses/423395). ### Remember your resources! - [Pyret documentation](https://www.pyret.org/docs/latest/) - [CS0111 Table documentation](https://hackmd.io/@cs111/table) (use this **instead** of the built-in `Table` documentation) - [Planning Tool Documentation](https://docs.google.com/document/d/1zHV0MMZ7rkqNeYB0fE04vTMfxi43ZwY_bffynYLsCGE/edit?usp=sharing) - [TA hours](https://brown-csci0111.github.io/calendar) - [Edstem](https://edstem.org/us/courses/27983/discussion/) and the Homework 4 FAQ (will link once it starts)! - [Course missive](https://hackmd.io/@cs111/syllabus-f22) - [Code clarity guide](https://cs.brown.edu/courses/csci0111/fall2021/assets/docs/pyret-clarity-design-testing.html) ## The Assignment :::danger *Remember to use the CS0111 `Table` documentation (linked in Setup and Handin), not the official documentation.* ***If your code uses constructs named sieve or extend (without functions), for example, you are using the wrong version and wil not get credit.*** ::: ## Part 1: Prices Under Exchange Rates :::info Learning Goals: - Practice planning programs - Combine data across two tables to answer a question ::: Companies that do business internationally need to be able to quote prices in different currencies, following a table of exchange rates. This is the kind of situation where one would have two separate tables: one with the prices of items, and another with the exchange rates per country. For this problem, we'll be computing currency conversions for an online art store in which artists and clients may be in different countries. The store has two tables, that are both in your starter code. One is for artwork, called `ART`: it tracks the unique id, cost and base currency for each piece. ``` table: id :: Number, cost :: Number, currency :: String ``` The other table is for currency conversion, called `CURRENCY-CONVERSION`. Each row holds the conversion rate (the multiplicative factor) to convert from the first currency (`from-c`) to the second (`to-c`). ``` table: from-c :: String, to-c :: String, conv-rate :: Number ``` We will use these tables to develop several programs for selling art across currencies. **This includes planning your programs.** You will develop both a plan (using the blocks-based planning tool) and the code for each of the following problems. :::spoiler How do I see what the `Tables` in this assignment look like? To see what these `Tables` look like, feel free to type the name of the `Table` in the definitions window and hit "Run", or hit "Run" first and then type the name in the interactions window and hit return. ::: <br> **Task 1:** Consider a function `get-art-in-1` that takes an artwork (by ID number) and a desired currency (a string) and returns the price of the artwork in the given currency. The given currency may or may not be the one listed for the piece in the artwork table. If the artwork is listed in a different currency, use the conversion table to produce the price. You can assume that every artwork queried is listed exactly once in the artwork table, and that every pair of currencies you need is listed exactly once in the currency conversion table. - produce a plan for this program in the planning tool - write the corresponding code in `hwk4-code.arr` :::spoiler Why are some of my tests not passing? There are some errors in `ART` and `CURRENCY-CONVERSION` – we will be tackling these in the next task! For now, make sure that the artworks and currencies you are using for testing are listed exactly once in their respective tables, and any conversions are shown from the existing currency to the desired currency. `get-art-in-1` does not need to handle cases where this is not true. ::: <br> Unfortunately, in practice, errors creep in. Either table may be faulty: entries may be deleted by accident, or there may be duplicate entries. A robust program checks for such conditions and raises errors if they occur. **Task 2:** Develop another version of the function and call it `get-art-in-2` (with the same input/output types). The revised version should raise an error if either table has missing or multiple entries for the info you are looking for. Your new program should only compute and return a numeric answer if none of these happen. - produce a plan for this program in the planning tool - write the corresponding code in `hwk4-code.arr` *Note: Consider whether there any helper functions you can write that can help you clean up your code across the versions. It's okay to only develop those when you write your code (as opposed to when you plan).* **Task 3:** Sometimes, the currency conversion table may not list the conversion from A to B, but it may list the conversion from B to A. Develop one more version, this time called `get-art-in-3` (same inputs/outputs) that uses the inverse conversion if necessary. (You compute the inverse of a number $N$ using $1/N$, where $/$ is division). - produce a plan for this program in the planning tool - write the corresponding code in `hwk4-code.arr` ## Part 2: Even More Ads! :::info Learning Goals: - Practice writing functions that take tables as inputs - Practice writing helper functions for use with tables - Practice testing tables ::: We are continuing with our exploration of how ads are matched to customers by now imagining that we have an entire table of customers (rather than a single customer). Specifically, we have a table with ad-interaction data showing which customers have clicked on which ads in the past. Modern ad-matching looks at the traits of other people who have clicked an ad in the past to estimate whether a new person might also respond to the same ad (based on having similar traits to people who clicked before). In this context, we are going to write a function that takes a table of customer data, a table of ad-interaction data and the ID of an ad, and returns a table with just the data of customers who clicked on that ad. An example of a customer-data table is available in your starter code, called `CUSTOMERS`. An example of an ad-interaction table is also available, called `AD-INTERACTION`. An ad-interaction table must at least have columns named `"customer-id"`, `"ad-id"`, and `"interacted"` (the first two are numbers, the second is a Boolean). **Task 4:** Develop three example ad-interaction tables in Pyret that you will use to test whether the function you are about to write is working. These tables should be smaller and more manageable than the original `AD-INTERACTION` table. In a comment above each table, explain what situation that table has been designed to check (including in contrast to your other two examples). *Note: You can copy and edit the `AD-INTERACTION` table from the starter code, but please leave the name `AD-INTERACTION` referring to the original table.* *Note: In the starter code, Pyret can read the Strings from the `customer-id` and `ad-id` columns in the `AD-INTERACTION` table and convert them to Numbers. When you write custom ad-interaction tables to test with, your `customer-id` and `ad-id` columns can be of type `Number`.* **Task 5:** Using the blocks-planning tool, develop a plan for creating a table of those customers who interacted with a specific ad number. If you want to fix an ad-number for now, that's fine. The output table will be a subset of a customer-data table. **Task 6:** Using your examples and your plan, develop a function called `has-interacted` that takes a customer-data table, an ad-interaction table and the ID of an ad and returns a table of data for customers who interacted with that ad. The output table will be a subset of the inputted customer-data table. **Task 7:** In a comment under your function, describe whether/how the examples and the plan were/weren't useful in developing your function. There's no right or wrong here. Reflecting on what you are doing while problem solving is an important step in internalizing how to solve future problems. ## Part 3: A Titanic Dataset :::info Learning Goals: - Think about how table designs and code can embody assumptions (possibly limited) about people - Practice cleaning up raw data for reliable processing ::: There are a few different versions of databases of passengers who sailed on the Titanic. (There is some ambiguity in this term, because the Titanic made a few stops even on its maiden voyage.) One such database is in your starter code, called `TITANIC-RAW`. We're going to do an analysis of this dataset, but as a slightly more open-ended problem. Overall, your goal is to determine three things from this table: - The six most popular first names of male passengers, - The six most popular first names of female passengers, - A graphical summary of the frequencies of the titles (Mr, etc) of passengers. Relative to the dataset, we define the relevant concepts as follows: - A man has sex field “male”, woman has it as “female” (we acknowledge that this binary may omit or offend some of you; hold on -- we return to this issue later in the exercise) - A “title” is the part of the raw-name field up to but not including the first period. - A “first name” is the part of raw-name between the first and second spaces, skipping over any leading parenthesis. *Note: you might want to review the Tables Documentation when looking for functions to help you with these problems.* *Note: Assume that the titles (Mr., Miss., etc) all end with a '.'.* **Task 8:** Develop a plan to determine the six most popular names of female passengers. Repeat for the six most popular names of male passengers. (Your final program will just need the names to appear in the interactions window in order, starting with the most popular one.) **Task 9:** Develop a plan to summarize the frequencies of the titles in an appropriate chart or graph (indicate which kind of chart you use within the text in one of your boxes). **Task 10:** Write a program that computes these values. Use sensible variable names and/or comments to make clear which parts of your program compute what. In your `hw4-written.pdf` file: - Write the 6 most common male first names in descending order - Write the 6 most common female first names in descending order - Include screenshots of your chart summarizing the frequencies of titles **Task 11:** In your `hw4-written.pdf` file, describe any observations you have about the three answers for task 10. **Task 12:** Read the following two short articles and include your answers to the questions below in `hw4-written.pdf` - [Falsehoods Programmers Believe About Names](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/). - [Gender Storage in Databases](https://slate.com/technology/2019/10/gender-binary-nonbinary-code-databases-values.html) Questions: - Discuss what assumptions were made about names in the Titanic dataset that we gave you. - Do the articles help explain what might have led to any of the interesting observations you made about the resulting data? ### Remember to submit your assignment on Gradescope! Check the handin info at the top for a summary of what we're expecting you to turn in. <!-- :::info #### `hw4-code.arr` Checklist - Part 1: - Task 1: `get-art-in-1` - Task 2: `get-art-in-2` - Task 3: `get-art-in-3` - Part 2: - Task 4: 3 example ad-interaction tables - Task 6: `has-interacted` - Task 7: Comment reflecting on examples and plan - Part 3: - Task 10: Program computing six most popular male and female names, and summarizing title frequency #### `table-plans-hw4.xml` - Downloaded from the Snap-based planning tool #### `hw4-written.pdf` Checklist - Part 3: - Task 10: - Top 6 female passenger names - Top 6 male passenger names - Screenshot of chart summarizing title frequency - Task 11: Observations about Task 10 answers - Task 12: Answers to 2 reflection questions ::: --> <!-- - Hit "Run", then paste the check block from the [Handin section](https://hackmd.io/Kff2t2B3Tmmx8Ht5FuIfiQ?both#Setup-and-Handin) above into the interactions window. All the tests should pass: if they don't, this means you have a typo in a function header. - Download your code file and make sure it is called `hw4-code.arr`. - Your written answers should be clearly marked with corresponding section numbers. Download or save your written file and **make sure it is a PDF** called `hw4-written.pdf`. If your written file is not a PDF, we might not be able to grade it. Hand in your work on [Gradescope](https://www.gradescope.com/courses/423395). --> ## Theme Song [Fly Me to the Moon](https://www.youtube.com/watch?v=ZEcqHA7dbwM) by Frank Sinatra ------ > Brown University CSCI 0111 (Fall 2022) > Do you have feedback? Fill out this form <iframe src="https://docs.google.com/forms/d/e/1FAIpQLSd1cjOjhS2vhnmC2ju5BuhO-vmUEU5b925EIcKfi0vBgBRjgg/viewform?embedded=true" width="640" height="375" frameborder="0" marginheight="0" marginwidth="0">Loading…</iframe>