NOTE: Use the 111 Tables Documentation for this lab!
The objective of this lab is to gain familiarity with sanitizing, visualizing, and gaining insights from data.
In all past assignments, the data that you worked with had been sanitized for you (i.e. you could expect Pyret to process the data without any extra work). The data in this lab, however, is being imported directly from Google Forms submissions, so there exists data that either isn't valid or isn't useful, and you need to find ways to effectively sort through such cases so that you can gather the insights you need.
We are working up to the course project, in which you will get a real-world data set, prepare it for use, and build some tables and charts/plots to present your analysis of a specific question. Homework 4 focused on combining data across tables. This lab focuses on cleaning up data so that it’s ready for analysis and on manually creating tables that summarize analysis results.
Imagine that you are helping with a research project that is studying aspects of student life at Brown. The lab you're working with has a survey form. You're going to participate in the survey (as a respondent), then help analyze the collected data.
Task 1: Your lab TAs will give you a link to a Google Form with the professor's questions. Respond to the questions (as if you were a participant in the study – don't worry, the survey is anonymous)
Task 2: Now we're going to load your lab's data into Pyret. Go to the response spreadsheet (also from your TAs), make a copy of it (select "File" > "Make a Copy"), and import it into your program. Make sure to convert the format of the cells to Plain Text (select "Format" > "Number" > "Plain text") and update the sharing permissions to "Anyone with the link". This copy of the spreadsheet is yours to tweak manually. There are some values that Pyret will not allow you to read in, such as empty cells and other lovely surprises for you to discover. Pyret will let you know when it finds a value it doesn't like, and you will have to correct these by hand directly in the spreadsheet. After you do a few of these corrections, you might be able to spot a pattern for the sorts of data Pyret doesn't like. To save time, feel free to correct multiple issues at once before trying to load the data into Pyret.
To import the spreadsheet, include this at the top of your program: (replace the ssid with that of your own copy.)
Note: The highlighted portion is what your ssid looks like:
STOP! Run your code! If you get an error like No Spreadsheet with id ... found
, make sure you have updated the sharing permissions to "Anyone with the link can view". Try to display the table to make sure it was imported correctly!
Now that we have the data collected, we need to make sure that the data is clean and processed before analysis.
Task 3: Talk about why it will be difficult to draw insights from the columns about concentration name and hours of extracurricular obligations. Be able to explain your reasoning to a TA. Create a new table (within Google Sheets) that doesn't include these columns, and use this new table for the rest of the lab.
Hint: For sheets with multiple tables, you can change which one is being referenced by changing the string in the line source: data-sheet.sheet-by-name("Form Responses 1", true)
from "Form Responses 1"
to whatever your new table is called. Make sure to also change the columns being read in based on the ones that remain!
Task 4: Once you import your table of student data, you will notice that the values in certain columns are in the format some(x)
or none
. In order to turn these values into the data types we all know and love (String
, Number
, etc.) use the sanitize functions upon import. Section 3.6.2 of the Pyret Tables documentation gives a quick overview of each sanitizer.
Before writing your sanitizers, remember: There are some values that Pyret will not allow you to read in. Pyret will let you know when it finds a value it doesn't like. If you can't fix it with a sanitizer, you will have to correct these by hand directly in the spreadsheet by either editing the cells or deleting the entry itself.
Task 5: Look at the other columns. Discuss with your partner which columns might need to be cleaned up to make sense, and come up with a strategy to do this.
In particular, come up with functions to clean up the columns about the number of classes, the hours spent doing schoolwork, the number of extracurriculars, and the commute time. Think about some times when it might be better to clean the data using sanitizers and other times when it might be more prudent to do so manually.
Task 6: Write these functions, and create a new table with cleaned columns.
Now that the data have been tidied up, let's do some analysis on the data. We are interested in relationships between the different pieces of information that were collected on the form.
Task 7: First, let's understand the lives of students who get 6-8 hours of sleep on an average weeknight. We want to know:
Write the code to compute these two answers.
Task 8: Now, we want to look at the total time students are reporting spending on various activities each day. Create a new column that sums the number of hours spent sleeping and doing homework. Do you notice any impossible results (you might or might not)?
Task 9: Calculate the ratio of hours that each student spends doing homework to hours they spend sleeping on weeknights. Create a chart or plot to convey your results so that data is displayed for each student.
As the professor starts finishing their paper, they realize they are interested in a different question: what are the average commute times and average hours of weekday sleep for each of on-campus and off-campus students. The professor wants this data in table form, not just as separate numbers. Specifically, something like:
| Location | Commute time | Sleep on weekdays |
| ---------------- | ------------ | ----------------- |
| On campus | Average | Average |
| Off campus | Average | Average |
Task 10: The table operations we've seen so far don't seem useful here. Discuss why not (your answer should have something to do with rows and columns).
Task 11: Seeing as the built-in table operations aren't suited to this, we will build the needed table manually. Remember we do this as follows:
For this task, you will replace each ...
with an expression that computes the value that should go in that cell.
Discuss the answers to the following questions with your partner:
Brown University CSCI 0111 (Spring 2025)
Feedback form: tell us about your lab experience today here!