Oh no! You're skydiving in Wii Sports Resort and are having trouble getting the parachute open! The only member of your crew who knew how to open it is still on the plane (oops). You will need to scour the interwebs to find resources that can help you land safely. To do this, it helps to know how to search effectively online for operations and information on programming languages. This week's lab is designed to help you and your crewmate, Toshi, learn how to do this! In other words, getting stuck and unstuck is part of the point this week, so don't get frustrated.
We will be working on searching the Internet for useful and trustworthy sources to get information and debug your code.
We'll go through a couple of scenarios in which you might want to search online for information. Each scenario has several queries. For each of these scenarios:
Then, after you've made a ranking for each scenario, think about the common elements of the good and bad queries.
Toshi decides to start out with some simple code and writes the following: 2+"1", which throws an error: TypeError: unsupported operand type(s) for +: 'int' and 'str'. Assist him in searching online for help understanding the error.
Example queries:
python 2+"1"
TypeError: unsupported operand type(s) for +: 'int' and 'str'
python add int and string
python strings
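When writing a query, it helps to have the exact error text in hand. The following is a minimal sketch that triggers Toshi's error on purpose and stores its message so it can be copied into a search; the variable names are illustrative, not part of the lab.

```python
# Reproduce Toshi's error on purpose and capture its exact message.
try:
    result = 2 + "1"  # adding an int to a str is not allowed
except TypeError as error:
    # Build the same "ErrorType: details" text that Python shows in a traceback.
    message = f"{type(error).__name__}: {error}"
    print(message)
```

Having the message as plain text makes it easy to compare queries that include it word for word with queries that only describe the problem.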
Toshi has grown quite fond of ASCII art, and he wants to be able to use print, but without a new line (as is printed by default). In other words, he wants print("(\\") and print("(\\") to print:
(\(\
rather than:
(\
(\
Example queries:
print("(\") but without new line
python print
python print without new line
Toshi is doing list operations and wants to write a base case that checks whether a list is empty, but he doesn't remember how to do that.
Example queries:
“[]” python
check if list is empty
python lists
python check if list is empty
python list length
TASK: For scenarios 1, 2, and 3, which queries were the best? Which ones were the worst? Write your responses to these questions in a Google Doc or on a piece of paper. For each scenario, include one or two bullet points explaining your answers.
Call a TA over to discuss the questions above!
pandas
Toshi is now equipped to navigate and effectively use the Internet to learn about programming! He wants to test his skills by tackling this topic that he's been hearing about a lot: file input and file output in Python.
Toshi learns that you can write a Python program that reads the contents of a file on your computer, makes calculations, and even writes data to a new file on your machine.
We'll start with a brief explanation of what a package is and how to install one in VSCode. Then, the rest of the lab will consist of a number of explanations and practice problems meant to familiarize you with popular Python packages.
First, some terminology:
There are hundreds of thousands of Python packages available online. Some are so commonly used that you'll find them in almost every large-scale Python application; others serve highly specific purposes.
VSCode makes it very easy to download and use packages in your projects.
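Before relying on a package, it can help to check whether Python can already find it. The sketch below uses only the standard library; is_installed is a hypothetical helper name chosen for illustration. Installing a missing package is a separate step, typically done with pip install followed by the package name in the VSCode terminal.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a package with this import name."""
    return importlib.util.find_spec(package_name) is not None


# "json" ships with Python; "pandas" has to be installed separately.
for name in ["json", "pandas"]:
    status = "installed" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```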
pandas
pandas is a really powerful and fun Python library for data manipulation/analysis, with easy syntax and fast operations. Because of this, it is probably the most popular library for data analysis in the Python programming language.
In this lab section, we're going to practice the basics of pandas by using its functionality to analyze some datasets.
To start using pandas in your code, include this line at the top of your Python file:
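A minimal sketch of that import, assuming the pd alias. The alias is a widely used community convention rather than something this lab text specifies.

```python
# Conventional import; the alias "pd" is an assumed (but very common) choice.
import pandas as pd

# Quick sanity check that the import worked: show the installed version.
print(pd.__version__)
```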
pandas is built around the concept of a DataFrame. Simply said, a DataFrame is a table. It has rows and columns. Each column in a DataFrame is a Series data structure, and each row is made up of one element from each Series.
A DataFrame can be constructed using built-in Python dictionaries:
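As a minimal sketch, with made-up sample data (the column names and values are hypothetical), each dictionary key becomes a column name and each list becomes that column's values:

```python
import pandas as pd

# Hypothetical sample data: keys become column names,
# and each list becomes the values of that column.
data = {
    "name": ["Apple", "Banana", "Cherry"],
    "price": [0.50, 0.25, 3.00],
}
df = pd.DataFrame(data)

print(df)                 # the whole table
print(type(df["price"]))  # a single column is a pandas Series
print(df.shape)           # (number of rows, number of columns)
```

All lists in the dictionary need to have the same length, since each one fills a column of the same table.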
Reading and writing file data is incredibly easy using pandas, and pandas supports many file formats, including CSV, XML, HTML, Excel, JSON, and many more (check out the official pandas documentation).
For example, if we wanted to save our previous DataFrame df to a CSV file (spreadsheet), we only need a single line of code:
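A minimal sketch of saving with DataFrame.to_csv. The sample df and the file name fruit.csv are hypothetical stand-ins, and the temporary folder is only there so the sketch runs without cluttering your project; the single line that matters is the to_csv call.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the DataFrame built earlier.
df = pd.DataFrame({"name": ["Apple", "Banana"], "price": [0.50, 0.25]})

# Write into a temporary folder; in a project you would likely
# pass a plain file name such as "fruit.csv" instead.
out_path = Path(tempfile.mkdtemp()) / "fruit.csv"

# index=False leaves out the row labels (0, 1, ...) pandas would otherwise write.
df.to_csv(out_path, index=False)

# Show what ended up in the file.
print(out_path.read_text())
```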
We have saved our DataFrame, but what about reading data? No problem:
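A minimal sketch of reading with pd.read_csv. So that it runs on its own, it first writes a small hypothetical CSV file to a temporary folder; the file name and contents are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a small CSV file first so this sketch is self-contained.
csv_path = Path(tempfile.mkdtemp()) / "fruit.csv"
csv_path.write_text("name,price\nApple,0.5\nBanana,0.25\n")

# The line that does the reading: the header row becomes the column names.
df = pd.read_csv(csv_path)

print(df)
print(df.dtypes)  # pandas infers a type for each column
```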
Now that we know the basics of pandas, let's go ahead and analyze some datasets! Here are some links to the official documentation and a cheat sheet if you get stuck. And don't forget about our good friend Stack Overflow!
Toshi has recently been craving candy a lot, so we've been requested to revisit the candy dataset from Lab 3.
Unlike Pyret, Python has no built-in table functionality (like reading a table directly from Google Sheets, table functions, etc.). To complete this lab, we're going to have to take advantage of Python's ability to mutate data, iterate through data, and read and write data to and from input/output files, specifically using pandas.
Save the dataset as candy-data.csv inside your current directory.
If the file is saved as a .webarchive (not a .csv), instead create a new file in your VSCode project called candy-data.csv. This will open a blank file, and you can copy and paste all the 'Raw' data into it.
Create a file called lab11.py. This is where you'll implement the functions below.
pandas
You and Toshi should be experts on surfing the web for relevant information and answers now, so let's put those skills to the test. We aren't going to give you much guidance about how to complete these tasks; remember the takeaways from Part 1, and try to use online resources (but if you get stuck, the TAs are still here to help).
Write a function that reads from candy-data.csv and calculates the name of the candy with the highest win percentage.
HINT: If you're not sure where to start, try following the steps below:
pandas
DataFrame
).
Write a second function that writes the results of your answer to Question 1 to a file named result-1.csv.
HINT: This time, start by writing a function that just writes the string "Hello, World" onto the first line of a file. Once you've done so, integrate data from the CSV. This might be easier with just the normal Python write function (look up the documentation!).
Write a function that reads from candy-data.csv and writes the names of the candies with chocolate to a file named result-2.csv, such that each name is on a separate line.
HINT: Consider which parts of the code you wrote for problems 1 and 2 help you solve this problem.
NOTE:
Call a TA over to go over your work from above!
Use pandas to get the candy with the highest sugar percentage.
Use pandas to get candy that contains both chocolate and caramel.
Save the DataFrame with candy containing chocolate and caramel as a CSV file called chocolate_and_caramel.csv.
Use pandas to find the top 5 most "boujee" candies, aka the ones with the highest price percents.
Use pandas to find the top 3 most liked and popular non-chocolate candies (highest win percents).
Use pandas to add a column to the candy data called too-sugary, which will store a Boolean value (True or False, rather than 1s and 0s) for each candy depending on whether it's too sugary. In this case, if sugar-percent is 0.50 or higher, then it's too sugary.
Call a TA over to go over your work from above!
Things to consider when googling in order to debug your code:
More information about python packages (you can browse this whole site for more specific info):
CSV files:
Using pandas to manipulate CSVs
pandas
If you want to learn more about the power of pandas, below are a few resources that you can explore: