Due: Tuesday, October 5, 2021 at 9:00PM ET.
Task: Create a file in Pyret called hw4-code.arr
. You will write your solution in here.
Task: In the same way as in Homework 3, there's a file on Canvas called hw4-helper.arr
. Copy and paste everything from this file into your new hw4-code.arr
file. Do not look at hw4-helper.arr
before you turn in Homework 3. We can tell when you look at something, and we'll be checking to see who peeks.
hw4-helper.arr
?Download this file from Canvas. You can open it in a plain-text editor (e.g. TextEdit, WordPad) to copy and paste into your hw4-code.arr
file. If you want to see it in Pyret, navigate to code.pyret.org and hit "View in Google Drive". Once You're in the Google Drive folder, drag and drop the file in, right click on it > Open with > Pyret.
Task: Create a file in your preferred word processor (Google Docs, Word, TextEdit, WordPad, etc) called hw4-written
.
You will be graded on the quality of your examples for each function.
Do not put your name anywhere in the file.
hw4-code.arr
. Download or save your written file and make sure it is a PDF called hw4-written.pdf
. If your written file is not a PDF, we might not be able to grade it. Hand in your work on Gradescope.Table
documentation)Learning Goals:
Remember to use the CS0111 Table
documentation (linked in Setup and Handin), not the official documentation. If your code uses constructs named sieve or extend (without functions), for example, you are using the wrong version.
In the last homework, we handled a growing collection of data by using a Row
for a customer's data. This week, we want to handle a larger number of customers themselves, by having a multi-row Table
of customers as the input.
Since there will be many customers, we'll give each customer an ID number that we can use for reference. ID numbers are unique, but besides that, they can be any number.
The code in hw4-helper.arr
contains some Tables
and functions for various parts of the homework. Copy and paste the code from the helper file into your hw4-code.arr file. In this part we'll use CUSTOMERS
and AD-INTERACTION
. The second Table
is new. You can see what kind of data it contains using the interactions window.
Note: Please don't edit CUSTOMERS
; if you want other example Tables
which are similar, please define a new Table
.
First, the Oatman Ad Agency also wants to have a way to find customers in any given Table
who live in "kingman"
.
Task: Write a function called customers-in-kingman
, which takes in a Table
with at least column names id :: Number
and town :: String
(like CUSTOMERS
) and returns a Table
with the same column names and only the Rows
of customers whose town
is "kingman"
(case sensitive).
Now, we want to identify the customers to show our ad to.
Task: Write a function called customers-to-show-ad-to
, which takes in two Tables
and a Number
. The first Table
has the column names id :: Number, age :: Number, town :: String, hobby :: String, has-car :: Boolean
(like CUSTOMERS
). The second Table
has the column names customer-id :: String, ad-id :: String, interacted :: Boolean
(like AD-INTERACTION
), and describes if each customer interacted with each ad. The Number
is the ad id
(either 1, 2, or 3 for the given AD-INTERACTION
Table
).
This function outputs a new Table
containing the rows of customers in the CUSTOMERS
-like table who interacted with the given ad-id
(with the columns id :: Number, age :: Number, town :: String, hobby :: String, has-car :: Boolean
).
In order to write examples for customers-to-show-ad-to
, feel free to define new Tables
explicitly. Your new tables need to have the same column names as CUSTOMERS
and AD-INTERACTION
.
filter-with
is going to be a very useful built-in function for this problem. As you can see in the documentation, filter-with
takes in a Table
and a function with the type (Row -> Boolean)
.
Also, you don't need lists to complete this problem (we haven't introduced them in class yet, so you aren't expected to know them!) Instead, remember that filter-with
iterates over the rows in a table. If you want to iterate over, say, a second table at the same time, you may want to consider using a second filter-with
inside of the original…
You can define a function inside of another function like so:
Here is a stencil that you can use to create your customers-to-show-ad-to
function. It gives you the general structure, and where to nest functions, but you still need to fill in the parts where you see TODO
!
Note: Pyret can read the Strings from the customer-id
and ad-id
columns in the AD-INTERACTION
table and convert them to Numbers. If you choose to write custom ad-interaction tables to test with, your customer-id
and ad-id
columns can be of type Number
.
Let's help the Oatman Ad Agency visualize some of this customer data! Specifically, let's look at:
CUSTOMERS
with a car vs. without a carCUSTOMERS
across the following age groups: 20 or under, 21-30, 31-40, and 41 or olderYou will refer to the CS111 Table
documentation to find the appropriate plotting functions.
Task: Develop a function show-car-proportions
that takes a Table
and produces an Image
that conveys the proportion of customers that do/do not have cars.
Task: Develop a function show-age-distrib
that takes in a Table
and produces a single Image
of an appropriate plot or chart to show the distribution described in item 2. This raises an interesting question about how to deal with the blank values in the age column. Options include:
Implement whichever (one!) of these you feel is the most reasonable approach to handling this data.
Task: In your hw4-written
file, briefly discuss when you believe each of these three options would be appropriate. Make sure your written answers like this are clearly marked by section (e.g. label this answer "1B"
).
Hint 1: Decide which kind of plot or chart you want to produce. You may need to play with some of the functions in the Table documentation to figure out what makes sense.
Hint 2: Plan out how you will get from the original data table to the inputs needed for the plot/chart function that you've chosen to use. Do you need a compute additional information? Limit to a certain set of rows? Do you need to do these in a particular order? List those out (in prose), then build expressions to perform each step.
Hint 3: Look at the count
builtin function. What does the table produced by the count
function look like? What are the names for its columns?
Note: You don't need to test functions that produce Images
, but you do need to test any helpers which don't output Images
. However, these functions should work on any Table
with the column names used within the function, not just the CUSTOMERS
table.
Task: The Agency is thinking about producing a scatterplot that maps customers' IDs against their ages. Do not actually produce this scatterplot, but consider: What useful information would such a scatterplot provide? Give your answer in a block comment of a sentence or two under your implementation of show-car-proportions
.
Learning Goals:
Captain Cork realizes that his ship, the Starship Fisler, is in drastic need of repairs. He finds a list of parts that need fixing, but he finds that each part is priced by its material.
As part of the code you copied over from hw4-helper.arr
, there is one Table
called PART-INFO
that includes the name and material for each item, and another Table
called MATERIAL-PRICES
that shows the price of each material.
Tables
look like?To see what these Tables
look like (since they're not explicitly defined like CUSTOMERS
, feel free to type the name of the Table
in the definitions window and hit "Run", or hit "Run" and then type the name in the interactions window and hit return.
Cork wants to know if there is an easier way to figure out how much he has to spend for each item.
Task: Write a function called part-cost
which takes in a String
representing the name of a part, a Table
with column names part-name :: String, material :: String
(like PART-INFO
), and a Table
with column names material :: String, price :: Number
(like MATERIAL-PRICES
). It should return a Number
representing the cost associated with the input part name. You can assume that the input name exists once and only once in the first Table
.
Hint 1: Work through a concrete example: pick a part name and work out a list of the computations you'll need to do (say, on each table) to compute the answer. Write expressions to do the computations just for your single concrete example. Then, step back and create functions to let you do the computation for any part.
Hint 2: Consider creating a helper function to help break up the computation
We will cover this part in lecture on Friday, 10/1
Learning Goals:
Mr. Spork has a wonderful business shuttling aliens across space. He wants to roll out a new a customer loyalty program, and his plan is to start a trial run to see how the strategy will impact business. After gathering initial data about his trial participants, Spork finds the data is already suuuuper messy, and he needs your help before he can begin the trial!
In the code you copied from hw4-helper.arr
, there is a detailed Table
called SPORK-DATA
. It includes each participant's name, gender, occupation, number of visits in the past 90 days, and favorite constellation destination. Spork needs this data to launch a student/teacher discount and hold special sales.
Before we try to clean Spork's data, we should think about what the valid contents of each field could be. Turns out, this can be pretty hard (many programmers and companies get this wrong in practice).
Task: Read this short article on falsehoods programmers believe about names.
Task: Read this article on gender storage in databases.
Task: Answer the following questions in your hw4-written
file before moving on to the coding task below. Make sure your written answers like this are clearly marked by section and number (e.g. label these answer "3A"
and label each question with its number).
SPORK-DATA
. Based on the two articles you've just read, what are two assumptions this Table
seems to reflect, based on its structure or the data you see? Provide an example from SPORK-DATA
of each assumption you name, and explain why the assumption is harmful or why it does not always hold true.Hint: Because the articles are about names and gender, try paying special attention to those columns!
clean-spork-names
and its helpers, which you copied from hw4-helper.arr
, identify 3 separate assumptions Alyssa is making about names, giving an example from SPORK-DATA
for why each assumption is false.Hint 1: You can run the function in Pyret and see its effect on the Table
!
Hint 2: Pay special attention to the first ~30 rows of the Table
!
Before proceeding to part 3B, make a short list of the issues that you see needing to be cleaned up before someone could work with the data. You aren't turning this in, but it is a good sanity check on your understanding of data cleaning before we tell you what to clean.
[Updated 10/1]: We spent Friday talking about planning table programs rather than cleaning them. As a result, we haven't covered everything that you might need for these three coding problems. We are therefore going to give you two options here: you can write the code as originally assigned (since technically you've seen what you need for these questions), or you can turn this more into practice with planning out programs (which we expect would be valuable for many of you). It's entirely up to you.
Now it's your turn to try your hand at cleaning and preparing the data!
Task: Write function called clean-spork
that takes a Table
with the columns of SPORK-DATA
and produces a table with at least the same column names, as well as one new column called "stud-teach"
. The cleaned table may have fewer rows than the original.
The cleaned table should have:
"libra"
, "orion"
,"aries"
, or "other"
appear in the "fav-trip"
column."laries"
-> "aries"
"num-visits"
column"stud-teach"
column that summarizes/bins the original "occupation"
column. The new column should have only the values "student"
, "teacher"
, or "other"
. Teacher is a catch-all term for professors, teachers, and instructors.Note: Try calling clean-spork
on SPORK-DATA
itself to see what it does, but since it's such a big Table
it doesn't make sense to make this an example in the where
block for clean-spork
. Instead, like some other functions in this homework, make a test table with rows and columns for the specific cases that you want to test. Make the data varied, but your example Table
doesn't need to be too long if you design your rows carefully.
Hint 1: You can and should visualize the data for outliers, use functions like count
, etc., to help you figure out ways in which the data is messy.
Hint 2: You might want to try writing more than one helper function for clean-spork
!
Task: In your hw4-written
file, write a few sentences to reflect on what you learned from this assignment. What have you learned about working with data? What have you learned about coding? We won't grade your answers, but we will look at them to see what's emerging for everyone.
Consider the same clean-spork
function that's described in the coding option, but don't write the code.
Task: Write down a concrete example of a smaller version of SPORK-DATA
to use as a potential testing input for clean-spork
. There's no fixed number of rows required. Choose a collection of rows that you think highlight some of the key issues that your function would have had to handle.
Task: Provide the corresponding output table that you think should result from calling clean-spork
on your example input table.
You can write your tables for these two tasks either in Pyret, as drawings on paper, or in something like google sheets (and upload a screenshot or PDF of the sheets). If you upload a file (rather than use Pyret), call it clean-spork-tables.(png or pdf)
.
Task: Write out a program-planning to-do list for writing clean-spork
. Each of your list items should include a brief prose description of a computation that's needed, the name of a relevant built-in operator from Pyret (if applicable), or the name and input/output types of functions that you would need to write to accomplish that item. By the time you are done, you should have a list of most of the functions that you would need for this problem, but without the bodies of those functions filled in.
Remember, this is not about writing code. It is about breaking down the clean-spork
task into smaller concrete programming steps. One way to approach this is to think about what has to be done to produce each column value in the output table that is not in your input table.
Fly Me to the Moon by Frank Sinatra
Brown University CSCI 0111 (Fall 2020)
Do you have feedback? Fill out this form.