# Lab 11: Using Resources and Pandas

Oh no! You're flying over the city streets but can't seem to track down a frog on the loose! You will need to scour the interwebs to find information that can help you find the frog's path. To do this, it helps to know how to search effectively online. This week's lab is designed to help you and your professional frog-whisperer, Jeremy, learn how to do this! In other words, getting stuck and unstuck is part of the point this week, so don't get frustrated.

We will be working on searching the Internet for useful and trustworthy sources to get information and debug your code:

- We'll start with a couple of example scenarios and potential queries to familiarize you with what an effective search query looks like.
- Then we'll apply these skills to write a program that reads from and writes to a file while utilizing Python packages, with a particular emphasis on [`pandas`](https://pandas.pydata.org/docs/getting_started/overview.html).

## Problem 1 - Googling for Python Questions

### Instructions

We'll go through a couple of scenarios in which you might want to search online for information. Each scenario has several queries. For each of these scenarios:

1. Predict which example queries might be good and which might be bad.
2. Google the queries and look into a couple of the search results. Try to note whether the same sites are consistently helpful.
3. Write a ranking of the queries.

Then, after you've made a ranking for each scenario, think about the common elements of the good and bad queries.

### Scenario #1: Getting Help on Error Messages

Jeremy decides to start out with some simple code, and writes the following: `2+"1"`, which raises an error: `TypeError: unsupported operand type(s) for +: 'int' and 'str'`. Assist him in searching online for help understanding the error.

Example queries:

- `python 2+"1"`
- `TypeError: unsupported operand type(s) for +: 'int' and 'str'`
- `python add int and string`
- `python strings`

### Scenario #2: Getting Details on an Operation

Jeremy has grown quite fond of [ASCII art](https://i.pinimg.com/originals/d9/83/1d/d9831d5626c42e481cd4d96b3938f6f2.jpg), and he wants to be able to use `print`, but without the new line that is printed by default. In other words, he wants `print("(\\")` and `print("(\\")` to print:

`(\(\`

rather than:

`(\`
`(\`

Example queries:

- `print("(\\") but without new line`
- `python print`
- `python print without new line`

### Scenario #3: Finding an Appropriate Operation

Jeremy is doing list operations and wants to write a base case that checks if a list is empty, but he doesn't know how to do that.

Example queries:

- `"[]" python`
- `check if list is empty`
- `python lists`
- `python check if list is empty`
- `python list length`

***TASK:*** For scenarios 1, 2, and 3, which queries were the best? Which ones were the worst? Write your responses to these questions in a Google Doc or on a piece of paper. For each scenario, make sure to write one or two bullet points to explain your answers.

___

### CHECKPOINT:

**Call a TA over to discuss the questions above!**

___

## Problem 2 - Intro to Python Packages/`pandas`

### Instructions

Jeremy is now equipped to navigate and effectively use the Internet to learn about programming! He wants to test his skills by tackling a topic that he's been hearing about a lot: file input and file output in Python. Jeremy learns that you can write a Python program that reads the contents of a file on your computer, makes calculations, and even writes data to a new file on your machine.

We'll start with a brief explanation of what a package is. Then, the rest of the lab will consist of a number of explanations and practice problems meant to familiarize you with popular Python packages.

### What's a package?

First, some terminology:

- A **package** is a collection of related **modules**.
- A **module** is a file containing Python definitions and statements which can be *imported* into your code. A more general, informal word for a module or package is a *library*.

There are *hundreds of thousands* of Python packages available online. Some are so commonly used that you'll find them in almost every large-scale Python application; others serve highly specific purposes. VSCode makes it very easy to download and use packages in your projects.

### Installing Packages using VSCode

If you followed our lab 0 instructions, you should already have `pandas` in your `cs111-env` environment.

If you want to install a new package in this environment, you should go to VSCode --> Terminal --> New Terminal, check that the `cs111-env` environment is activated, and use the command `conda install -c conda-forge [package name]`.

For example, you should install `matplotlib` (a plotting package) if you haven't already, by running `conda install -c conda-forge matplotlib`. We won't use it for this lab, but you will use it in the mini project!

---

### `pandas`

`pandas` is a really powerful and fun Python library for data manipulation/analysis, with easy syntax and fast operations. Because of this, it is probably the most popular library for data analysis in Python.

In this lab section, we're going to learn the basics of `pandas` and use its functionality to analyze some datasets.

To start using `pandas` in your code, include this line at the top of your Python file:

```
import pandas as pd
```

#### Understanding DataFrames

`pandas` is built around the concept of a `DataFrame`. Simply said, a `DataFrame` is a table. It has rows and columns. Each column in a `DataFrame` is a `Series`, and each row is made up of one element from each `Series`.

A `DataFrame` can be constructed using built-in Python lists and dictionaries:

```
>>> import pandas as pd
>>> df = pd.DataFrame([
...     {'country': 'Kazakhstan', 'population': 17.04, 'square': 2724902},
...     {'country': 'Russia', 'population': 143.5, 'square': 17125191},
...     {'country': 'Belarus', 'population': 9.5, 'square': 207600},
...     {'country': 'Ukraine', 'population': 45.5, 'square': 603628}
... ])
```

```
>>> df
      country  population    square
0  Kazakhstan       17.04   2724902
1      Russia      143.50  17125191
2     Belarus        9.50    207600
3     Ukraine       45.50    603628
```

:::spoiler **An alternate way to construct a `DataFrame`**
You can also define a `DataFrame` as a `dict` of columns:

```
>>> df = pd.DataFrame({
...     'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
...     'population': [17.04, 143.5, 9.5, 45.5],
...     'square': [2724902, 17125191, 207600, 603628]
... })
```
:::
<br>

#### Reading and Writing to Files

Reading and writing file data is incredibly easy using `pandas`, and `pandas` supports many file formats, including CSV, XML, HTML, Excel, JSON, and many more (check out the official `pandas` documentation).

For example, if we wanted to save our previous DataFrame `df` to a [CSV file](https://en.wikipedia.org/wiki/Comma-separated_values) (spreadsheet), we only need a single line of code:

```
>>> df.to_csv('filename.csv')
```

We have saved our DataFrame, but what about reading data? No problem:

```
>>> df = pd.read_csv('filename.csv', sep=',')
```

Now that we know the basics of `pandas`, let's go ahead and analyze some datasets! The links at the end of this section point to our documentation and a cheat sheet in case you get stuck. Sometimes it is also helpful to find answers on Stack Overflow, but be careful -- there are many ways to perform a single action in `pandas`, and it can be easy to copy-paste a line of code without understanding what it does (which is a whole mess when it comes to debugging!)

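
As a hedged illustration of the save-and-load steps from this section, the sketch below writes the country table to a CSV file and reads it back. The file name `countries.csv`, the temporary folder, and the variable name `loaded` are placeholders chosen for this example, not part of the lab. `index=False` is an optional `to_csv` argument that the lab text above does not use.

```python
import os
import tempfile

import pandas as pd

# Same table as in the DataFrame example above.
df = pd.DataFrame({
    'country': ['Kazakhstan', 'Russia', 'Belarus', 'Ukraine'],
    'population': [17.04, 143.5, 9.5, 45.5],
    'square': [2724902, 17125191, 207600, 603628],
})

# Each column of a DataFrame is a Series.
print(type(df['population']))

# Write to a temporary folder so the sketch does not clutter the lab folder.
# 'countries.csv' is a placeholder name chosen for this example.
path = os.path.join(tempfile.mkdtemp(), 'countries.csv')

# index=False leaves the row labels (0, 1, 2, 3) out of the file.
df.to_csv(path, index=False)

# Read the file back into a new DataFrame.
loaded = pd.read_csv(path)
print(loaded)
print(list(loaded.columns))
```

Without `index=False`, `to_csv` also writes the row labels as an unnamed first column, and `read_csv` then loads that column back under the name `Unnamed: 0`.
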
* [CS111 Pandas Operation Summary](https://docs.google.com/document/d/1JYXpqRE0Xwe5do6VzjM4TYiTwjnfs9Ra1xU4SjtsXJI/edit?usp=sharing) (from lecture)
* [Pandas Cheat Sheet](https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf)
* [Official Documentation](http://pandas.pydata.org/pandas-docs/stable/index.html) -- you can refer to it if you need, but it has a *lot* of information to comb through

---

### Candy Data

Jeremy has recently been craving candy a lot, so we've been asked to revisit the candy dataset from [Lab 3](https://hackmd.io/QIoBCUADSLmvwvEZknXfcA).

Unlike Pyret, Python has no built-in table functionality (like reading a table directly from Google Sheets, table functions, etc). To complete this lab, we're going to have to take advantage of Python's ability to mutate data, iterate through data, and read and write data to and from input/output files, specifically using `pandas`.

### Setup

1. Create a folder for this lab on your computer and open it in VSCode (or open an existing folder where you store your Python code in VSCode).
2. Navigate to [this GitHub file](https://github.com/fivethirtyeight/data/blob/master/candy-power-ranking/candy-data.csv). In the top right of the dataset, click 'Raw'. You should see a wall of text in your browser.
3. Right-click the page of text and click "Save as..." to save the file as `candy-data.csv` inside the folder you are using from Step 1.
4. If this doesn't work (it might say the page can only be saved as a `.webarchive`, not a `.csv`), instead create a new file in your VSCode project called `candy-data.csv`. This will open a blank file, and you can copy and paste all the 'Raw' data into it.
5. Create a Python file called `lab11.py`. This is where you'll implement the functions below.

### Task 1: Read Candy Data using `pandas`

You and Jeremy should be experts on surfing the web for relevant information and answers now, so let's put those skills to the test.
We aren't going to give you much guidance about how to complete these tasks; remember the takeaways from Problem 1, and try to use online resources (but if you get stuck, the TAs are still here to help).

1. Write a function or series of expressions that reads from `candy-data.csv` and finds the name of the candy with the highest win percentage.

:::info
**HINT:** If you're not sure where to start, try following the steps below:
- Figure out how to read a CSV file into Python using `pandas` -- note that this is different from what we saw in class, where we downloaded a CSV from the internet. Here, we're asking you to figure out how to read in a CSV file that is stored on your computer. *Note: the path of your CSV file will be "candy-data.csv", as long as this file exists in the top-level folder you have open in VSCode.*
- Try to print out a few win percentages for different candies (figure out how to access them from a `DataFrame` using our reference document).
- Plan out your code! You should use function(s) from our Pandas Operations Summary to help you (even ones you might not have seen demonstrated in class yet!). You might run into an issue with row labels -- expand the hint below for more guidance.
- Test your code to ensure that it works properly.

:::spoiler **Dealing with row labels**
If you tried to use `sort_values` in your plan, that's great! You're on the right track. You might notice that `sort_values` does not renumber the row labels -- so the row at the top does not magically become row 0. Try to use your newfound search skills to figure out how to use `sort_values` in a way that *does* re-label the rows!
:::
<br>

2. Write another function or series of expressions that writes the results of your answer to Question 1 to a file named `result-1.csv`. How the results are stored in this file is up to you (as long as it's a valid CSV that opens in a spreadsheet program such as Excel or Google Sheets).

:::info
**HINT:** This time, start by writing code that just writes the string `"Hello, World"` onto the first line of a file. Once you've done so, integrate data from the CSV. This might be easier with Python's ordinary file `write` method (look up the documentation!).
:::

3. Write a function or series of expressions that reads from `candy-data.csv` and writes the names of the candies with chocolate to a file named `result-2.csv`, such that each name is on a separate line. Your solution should **not** use a `for`-loop.

:::warning
**NOTE:**
- Your solutions should read directly from the file. Make sure not to copy the contents of the file into your code.
:::

___

### CHECKPOINT:

**Call a TA over to go over your work from above!**

___

### Task 2: More Candy Data Manipulation

:::warning
**HINT:** `pandas` provides a built-in function that allows us to read a CSV and save it as a `DataFrame`. Check out the documentation for this function [here](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).
:::

Again, for the following, refrain from using `for`-loops. Most of these can be done by using operations in our Pandas Operations Summary, along with the knowledge you learned in Task 1. For questions that ask for "top N" rows, you will have to search the web!

1. Use `pandas` to get the candy with the highest sugar percentage.
2. Use `pandas` to get the candies that contain both chocolate and caramel.
3. Save the `DataFrame` with candy containing chocolate and caramel as a CSV file called `chocolate_and_caramel.csv`.
4. Use `pandas` to find the top 5 most "boujee" candies, aka the ones with the highest price percents.
5. Use `pandas` to find the top 3 most liked and popular **non-chocolate** candies (highest win percents).
6. Now, use `pandas` to add a column to the candy data called `too-sugary`, which will store a Boolean value (`True` or `False`, rather than 1s and 0s) for each candy depending on whether it's too sugary.
In this case, if sugar-percent is 0.50 or higher, then it's too sugary.

___

### CHECKPOINT:

**Call a TA over to go over your work from above!**

___

### Task 3: Continuing your Journey in CS!

Take some time to read [this article](https://www.geeksforgeeks.org/imposter-syndrome-in-software-developers-am-i-a-fake-developer/) about imposter syndrome (a feeling of not accomplishing enough or being unable to accomplish goals) among programmers. Reflect on the following questions with your lab partner:

- Why did you take csci0111? Did this experience change your potential area of study/interest?
- How does csci0111 apply to your own interests inside and/or outside the field of CS?

As the semester comes to an end, remember that the TAs (in this course and more advanced CS courses) are here to support you in navigating your own pathway within computer science, whether you plan to go into industry, want to research, or apply computer science to another field of study!

Please remember to take care of yourself, and congratulations on finishing the last lab of csci0111!

___

## Key Takeaways + Resources

### Googling for Python Questions

Things to consider when googling in order to debug your code:

- [Stack Overflow](https://stackoverflow.com/) is a website useful for answering specific coding questions (but try to find posts with lots of upvotes)
- Websites with tutorials such as [GeeksForGeeks](https://www.geeksforgeeks.org/) are more useful for explaining a particular concept or algorithm
- If an answer contains concepts that you haven't seen before, keep searching -- there are often many ways to implement the same feature, and a different one might be more familiar
- If you're not sure why an answer isn't working, double check that it uses Python 3.7 or higher (and not Python 2). The specific version of Python for CS 111 this semester is 3.11 (configured when you created your `cs111-env` environment).

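
For the last point in the list above, one way to see which Python version is actually running your code is to ask the interpreter itself. This is a minimal sketch rather than a required step; the variable names `major`, `minor`, and `meets_minimum` are made up for this example.

```python
import sys

# sys.version_info describes the interpreter that is running this script.
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")

# The takeaway above asks for Python 3.7 or higher.
meets_minimum = (major, minor) >= (3, 7)
print("Meets the 3.7 minimum:", meets_minimum)
```

If the version printed is not the one you expect, the `cs111-env` environment may not be the one that is active in your terminal.
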
### Python Packages

More information about Python packages (you can browse this whole site for more specific info):

- [An Overview of Packaging for Python](https://packaging.python.org/overview/)

CSV files:

- CSV files are plain text files that arrange tabular data, with each piece of data separated by a comma
- CSV files make it easy to import/export large chunks of data from spreadsheets or databases
- You can use `pandas` to manipulate CSVs
- You can also use Python's built-in `csv` module, which can be imported without installing anything (not covered in this lab)

### `pandas`

If you want to learn more about the power of `pandas`, below are a few resources that you can explore:

- [12 Useful Pandas Techniques in Python for Data Manipulation](https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/)
- [Pandas Tutorial](https://www.python-course.eu/pandas.php)

---

> Brown University CSCI 0111 (Spring 2024)
> Feedback form: tell us about your lab experience today [here](https://forms.gle/WPXM7ja6KwHdK8by6)!
