--- tags: Labs-F20, 2020 title: Lab 03 --- # Lab 3: Tables As you’re researching new ways to make candy, you visit [Dylan's Candy Bar](https://en.wikipedia.org/wiki/Dylan%27s_Candy_Bar). You ask the owner for the recipes to try at home, but she tells you that the recipes are a family secret. However, she agrees to give you the data sets that they used in order to make the candies yourself! ## Setup In this lab you’re going to be working with **tables**! [FiveThirtyEight](https://en.wikipedia.org/wiki/FiveThirtyEight) conducted a survey in which tens of thousands of people were asked to choose between two candies; from the survey's responses, they computed the winning percentage of each candy. Then, they compiled a data-set with `Number` attributes like the winning percentage, relative price, and sugar percentage, and also `Boolean` attributes such as chocolate, fruity, caramel, and hard. This data has been published in a FiveThirtyEight article called [“The Ultimate Halloween Candy Power Ranking,"](https://fivethirtyeight.com/features/the-ultimate-halloween-candy-power-ranking/) which is definitely worth a read after the lab. In this lab, you’ll be looking at the **relationships between these columns**. --- **NOTE:** For Lab 3 (as well as Homework 4 and the upcoming project), we are asking you to use a different version of Pyret. The Pyret development team (at Brown) wants to understand the kinds of errors that students make when working with tables, so they can improve the language. This new version of Pyret makes a copy of your program each time you press Run; the Pyret team will look at these copies for their analysis. They will only have access the programs that you Run, and not any data about you. If you are not comfortable with the Pyret team seeing your work, simply use the normal Pyret version. Links to each follow. :::spoiler How will this affect my grade? Not at all. The CS111 staff will **not** know which version you chose to use. ::: :::spoiler What data will be shared with the research team? The research team will only get the contents of your definitions window each time you press Run. Your Pyret logon email address will also be stored so the staff can link multiple Runs to the same student. None of this information will be shared beyond the Pyret development team (which again, is based at Brown, led by Brown CS Professor [Shriram Krishnamurthi](https://cs.brown.edu/~sk/)). ::: <br> **Task:** Either - access the [Pyret research version](https://pyret.cs.brown.edu/assignment/1lavp3rXQqpXbwIsS7pmR6fVlaXl161T_), which will have set up a file called `lab3-code.arr` where you will write your solution, or - create a file in regular [Pyret](http://code.pyret.org) called `lab3-code.arr`. You will write your solution in here. --- First, include this code at the top of your program: ``` provide * provide-types * include tables include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX") include gdrive-sheets include image # spreadsheet id from Google Sheets ssid = "1XzeWZToT-lqPFVpp-RZLimdoGdGaTcVQZ_8VbDbAkOc" data-sheet = load-spreadsheet(ssid) candy-data = load-table: name, chocolate, fruity, caramel, nutty, nougat, crisped-rice, hard, bar, pluribus, sugar-percent, price-percent, win-percent source: data-sheet.sheet-by-name("candy-data", true) end ``` This code loads a Pyret table from a Google Sheet. Press the "Run" button in the top right corner to process the code in the definitions window. Then type `candy-data` in the interactions window to see the data as a table. You’ll want to refer to the [CS111 Pyret Tables Documentation](https://hackmd.io/@cs111/table) for this lab (*not* the built-in Pyret documentation). ## Part 1: Filtering Let's use the powers of filtering to learn more from our candy data. ### 1.1: Sugar Rush We want to know which candies have the most sugar. Write an expression in the definitions window that produces a table containing only the candies where `sugar-percent` > 0.75. *Hint: Take a look at the `filter-with` function, and remember that functions can be passed as inputs to other functions. Call over a TA if you want help using this structure.* Suggestion: When you’re done writing the expression, copy and paste this line ``` #----------------------------------- ``` into the definitions window directly below it to separate it from the next part. ### 1.2: Bougie Now that we know how to satisfy our sweet tooth. Write an expression in the definitions window that produces a table containing only the candies where `price-percent` is greater than 0.90. Suggestion: Once again, separate it from the next part by pasting this line ``` #----------------------------------- ``` on the line below it. ### 1.3: Chocolate How many of the candies have chocolate? Write an expression in the definitions window that outputs this number in the interactions window. You can get the length of a table by writing `<table>.length()` where `<table>` is the name of (or an expression that evaluates to) a table. Suggestion: Once again, separate it from the next part by pasting this line ``` #----------------------------------- ``` on the line below it. ### 1.4: Chocolate and caramel Of the candies that have chocolate, what proportion also have caramel? Write an expression in the definitions window that outputs this proportion in the interactions window. Suggestion: Once again, separate it from the next part by pasting this line ``` #----------------------------------- ``` on the line below it. ### 1.5: Chocolate and nutty Of the candies that have chocolate, what proportion are also nutty? Write an expression in the definitions window that outputs this proportion in the interactions window. Suggestion: Once again, separate it from the next part by pasting this line ``` #----------------------------------- ``` on the line below it. ___ ### *CHECKPOINT:* Call over a TA once you reach this point. ___ ### 1.6: Chocolate and anything What attribute is paired most frequently with chocolate? To help answer this question, write a function that generalizes the expressions from 1.4 and 1.5. This function will take in a `String` representing the name of the ingredient being paired with chocolate and a `Table` that is only candies that have the chocolate attribute. This function returns a number. The function declaration should look something like this: `candy-with-attr(table :: Table, attr :: String)-> Number:` Look for commonalities across the code you wrote for caramel (1.4) and nutty (1.5). Compare the results of your function on the inputs “fruity,” “nutty,” and “caramel.” You don’t have to write code for this comparison -- just use your new function to compute all three proportions and compare them manually. ___ ### *CHECKPOINT:* Call over a TA once you reach this point. ___ ## Part 2: Building columns and analyzing them ### 2.1: New column Build a column of `Boolean` values that indicates whether a candy is fruity and hard, but not a pluribus. This should produce a new table with an added column. Write an expression in the definitions window that uses this new column to compute how many candies meet this condition. (Pluribus means that there are multiple candies in a packet, ex: Skittles, M&M’s, and Smarties) ***Hint:** Take a look at the `build-column` function. Call over a TA if you want help using this.* ### 2.2: Maximum Of the candies for which this `Boolean` is true (fruity and hard, but not a pluribus), which has the highest winning percentage? ***Hint:** This requires the use of the `sort-by` function, in addition to the `row-n` and `filter-with` functions.* An example of the `row-n` function to take the fifth row of a table: ``` table.row-n(4) ``` ### 2.3: Mean Of the candies for which this `Boolean` is true, what’s the average winning percentage? ***Hint:** This requires the use of the `mean` function.* ___ ### *CHECKPOINT:* Call over a TA once you reach this point. ___ ## 3: Scatterplot What’s the relationship between sugar and winning percentage? Do these two attributes seem correlated? One way to gain intuition on this is to create a scatterplot that puts one attribute on each axis. The [tables documentation](https://hackmd.io/@cs111/table) includes a scatter-plot function. Look at the documentation and try to figure out how to use it to generate a scatterplot of sugar versus winning percentage (it does not matter which variable goes on each axis). Write an expression in the definitions window that creates the scatterplot. In a comment, summarize the relationship in a sentence or two. ## 4: Comparing Attributes ***This is a discussion question -- you are NOT being asked to write code to do this analysis. In a comment, write a few sentences to summarize your ideas.*** Candy enthusiasts down at the saloon want to understand the relative frequencies of the attributes in the table across the collection of candies studied. They want a function where they can give you names of two attributes (column names in the dataset), and you tell them which is more frequent (your function can return `"same"` if the two attributes are in the same number of candies). Design this function. Discuss with your partner how you might go about testing this function. Write a comment describing your testing plan. Now imagine that the analysts want to see the relative frequencies across all of the attributes. How would you want to report that information, and what would be needed to produce that report from the given dataset? You can think about the functions in the documentation, or you can think about reorganizing the data in the table to make it easier. ## 5: Exploration (if you have time) Now think about questions you have about this data, or come up with interesting questions to ask. Investigate (by writing and running expressions), and write a couple sentences in a comment about what you found. ## Takeaways This lab has mostly been about getting you comfortable working with tabular data and practicing some common operators on tables. It also gets you thinking about our course's focus on data: what patterns of manipulating data do we often use in computations? How does the organization of our data impact our ability to answer these questions? Here, we see that filtering, ordering, and summarizing data are some of the key operations. So far we’ve only looked at these operations on tabular data, but these same building blocks will arise many times through this course. When you have a computation to perform around data, you should start by thinking through what combinations of filtering, sorting and summarizing will help you compute your answer. ## Formula Cracked! Great job! After analyzing the candy data, you figured out the [secret formula](https://media1.tenor.com/images/00031ae30ffb6703f227a7623ecb19cf/tenor.gif?itemid=4943915), and now you can have the tasty desserts whenever you want!