---
title: Project 1 Fall-2021
tags: Projects, Project 1
---

# Project 1: Extraterrestrial Tables!

## Due date information

**Out:** Oct 6, 11:00 AM EST
**Project 1 Partner Form Deadline:** Oct 8, 5:00 PM EST
**In:** Oct 19, 9:00 PM EST

## Summary

It's time to try your data science skills on real datasets! For this assignment, you will choose one of three datasets to work on. You'll then apply what we've learned so far to clean up the data, analyze the data, and write a report that presents the results of your analysis. There's no difference in difficulty across the datasets -- we're merely letting you choose which dataset/question you are most interested in exploring.

The project occurs in two stages. During the first week, you'll work on the design of your tables and functions, reviewing your work with a TA during a **Design Check**. In the second week, you'll implement your design, presenting your results in a written report that includes charts and plots from the tables that you created. The report and the code you used to process the data get turned in for the **Final Handin**.

This is a pair project. You and a partner should complete all project work together. You can find your own partner or we can match you with someone. Note that you have to work with different partners on each of the three projects in the course.

## Dataset options

Your dataset/question options are:

- global CO2 emissions data since 1960 by country, with a second table showing temperature changes per country since 1960. Your analysis will look at how temperature changes relate to total emissions in different regions of the world.
- bikeshare data from New York City for the month of October 2020, combined with a table of zipcodes in which bike stations are located. Your analysis will look at how far people are traveling and whether that varies by part of the city.
- data on the number of grocery/convenience stores per county in the USA, with another table of county-level population data.
Your analysis will look at whether some states offer better access to grocery stores than others.

Whichever dataset you choose, you will be given a collection of questions to answer in your analysis. You will also provide a function (`summary-table`) that can be used to generate summary data about a specific aspect of your dataset. The `summary-table` function will allow the user to customize which statistic (such as average, sum, median) gets used to generate the table data.

Detailed instructions for accessing each dataset, and the corresponding analysis and summary requirements, are in the following expansion options. The rest of the handout (after the expansion options) explains general requirements that apply regardless of which option you have chosen.

### CO2 Emissions

:::spoiler Instructions
**Analysis:** If you choose this dataset, your analysis should answer the following questions about the CO2 Emissions data:

- What Region (third column in the warming-deg-c-table) had the highest cumulative CO2 emissions since 1960? Has this Region had the highest emissions in each year since 1960?
- Do countries with higher emissions overall have a greater increase in temperature over time?
- Do the countries with the highest total emissions all belong to the same Region?

**Summary Table:** Your `summary-table` function should produce a Pyret table of the following form:

```
| region        | avg-warming | CO2-summary |
| ------------- | ----------- | ----------- |
| Oceania       | avg-warm    | num1        |
| Africa        | ...         | ...         |
| Asia          | ...         | ...         |
| S. America    | ...         | ...         |
...
where num1 might be the sum of all the CO2 emitted from Oceania since 1960,
or the yearly average CO2 emitted from Oceania since 1960, etc.
```

Each row represents a Region. The `avg-warming` column contains the average warming across all countries in that Region since 1960. The `CO2-summary` column summarizes some statistic about the emissions across countries in that Region since 1960.
The CO2 statistic might be the total emission, the average, the median, and so on. The person who calls your `summary-table` function will indicate which summary method to use by passing another function as input.

[Clarified 10/13] As a suggestion for the summary-table function, consider one of the following headers:

```
# the summary-func takes a smaller table as input
fun summary-table(t :: Table, summary-func :: (Table -> Number)) -> Table:
  doc: ```Produces a table that uses the given function to summarize CO2 emissions
       from 1960 to 2014 for every region (Oceania/Asia/Europe/Africa/
       SouthAmerica/NorthAmerica/Other). The outputted table should also
       have the average warming in every region.```
  ...
end

# the summary-func takes a smaller table and a column name (the String input)
fun summary-table(t :: Table, summary-func :: (Table, String -> Number)) -> Table:
  doc: ```Produces a table that uses the given function to summarize CO2 emissions
       from 1960 to 2014 for every region (Oceania/Asia/Europe/Africa/
       SouthAmerica/NorthAmerica/Other). The outputted table should also
       have the average warming in every region.```
  ...
end
```

Passing a function as an argument is like what you have done when using list or table operations. This might be called as `summary-table(mytable, sum)` or `summary-table(mytable, mean)` to summarize the total or average CO2 emissions for each Region represented in `mytable`.

:::info
Note: The exact columns/structure of the input `t` to `summary-table` is up to you, but it should contain enough information to be able to produce the output of `summary-table`.
:::

:::spoiler Stencil
Copy and paste the following code to load the datasets into Pyret.
```
include tables
include gdrive-sheets
include image
include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")
import math as M
import statistics as S
import data-source as DS

google-id = "1XBaxyRvMQpTec0mRVAvJsplMr2iEcGetSMpguRnU3qo"

warming-deg-c-unsanitized-table = load-spreadsheet(google-id)
co2-emissions-unsanitized-table = load-spreadsheet(google-id)
years-1960-2014-unsanitized-table = load-spreadsheet(google-id)

warming-deg-c-table = load-table: country :: String, warming-deg-c-since-1960 :: String, region :: String
  source: warming-deg-c-unsanitized-table.sheet-by-name("warming-degc", true)
  sanitize country using DS.string-sanitizer
  sanitize warming-deg-c-since-1960 using DS.string-sanitizer
  sanitize region using DS.string-sanitizer
end

co2-emissions-table = load-table: year :: Number, country :: String, total-co2-emission-metric-ton :: Number, per-capita :: Number
  source: warming-deg-c-unsanitized-table.sheet-by-name("fossil-fuel-co2-emissions-by-nation_csv", true)
  sanitize year using DS.strict-num-sanitizer
  sanitize country using DS.string-sanitizer
  sanitize total-co2-emission-metric-ton using DS.strict-num-sanitizer
  sanitize per-capita using DS.strict-num-sanitizer
end

years-1960-2014-table = load-table: year :: Number
  source: years-1960-2014-unsanitized-table.sheet-by-name("years", true)
  sanitize year using DS.strict-num-sanitizer
end
```
:::

### Bikeshare Data

:::spoiler Instructions
**Analysis:** If you choose this dataset, your analysis should answer the following questions about the bikeshare data:

- Which zipcodes are the most popular starting and ending points for rides (i.e., what are the top 5 most popular starting zipcodes and top 5 most popular ending zipcodes)? How do the most popular zipcodes for starting or ending rides (pick one) differ between Subscribers and Customers? For example, do the most popular starting zipcodes have more Subscribers than Customers? Across age groups?
- Which zipcodes are more popular with Customers than Subscribers?
- Is there a relationship between the duration of rides and either the age or gender of the rider (pick either age or gender)?

**Summary Table:** Your `summary-table` function should produce a Pyret table of the following form:

```
| zipcode | ride-count | duration-summary |
| ------- | ---------- | ---------------- |
| 10020   | 1500       | 853.7            |
| 11101   | ...        | ...              |
| 10451   | ...        | ...              |
| 11237   | ...        | ...              |
...
```

Each row has data for a particular zipcode. The `ride-count` column contains the total number of unique rides that started or ended in that zipcode. The `duration-summary` column summarizes some statistic about the durations of the unique rides around that zipcode. The `duration-summary` statistic might be the total duration, the average duration, the median, and so on. The person who calls your `summary-table` function will indicate which summary method to use by passing another function as input.

[Clarified 10/13] As a suggestion for the summary-table function, consider one of the following headers:

```
# the summary-func takes a smaller table as input
fun summary-table(t :: Table, summary-func :: (Table -> Number)) -> Table:
  doc: ```Produces a table that uses the given function to summarize durations
       of unique rides around that zipcode. The outputted table should also
       have the total number of unique rides that started or ended in that zipcode.```
  ...
end

# the summary-func takes a smaller table and a column name (the String input)
fun summary-table(t :: Table, summary-func :: (Table, String -> Number)) -> Table:
  doc: ```Produces a table that uses the given function to summarize durations
       of unique rides around that zipcode. The outputted table should also
       have the total number of unique rides that started or ended in that zipcode.```
  ...
end
```

Note that passing a function as an argument is like what you have done when using list or table operations. This might be called as `summary-table(mytable, sum)` or `summary-table(mytable, mean)` to summarize the total or average duration of the rides around each zipcode represented in `mytable`.

:::info
Note: The exact columns/structure of the input `t` to `summary-table` is up to you, but it should contain enough information to be able to produce the output of `summary-table`.
:::

:::spoiler Stencil
Copy and paste the following code to load the datasets into Pyret.

```
include tables
include gdrive-sheets
include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")
import math as M
import statistics as S
import data-source as DS

google-id = "1iSAp4AXNNcfdxm7cBCSPcy_KPINpgf0nPYGvxxIZXV4"

october-2020-citibike-unsanitized-table = load-spreadsheet(google-id)
stations-unsanitized-table = load-spreadsheet(google-id)
sorted-zip-codes-unsanitized-table = load-spreadsheet(google-id)

#|
   Note: for the gender column
   0 represents unknown
   1 represents male
   2 represents female
|#
october-2020-citibike-table = load-table: trip-duration :: Number, start-time :: String, stop-time :: String, start-station-id :: Number, start-station-name :: String, end-station-id :: Number, end-station-name :: String, bike-id :: Number, user-type :: String, birth-year :: Number, gender :: Number
  source: october-2020-citibike-unsanitized-table.sheet-by-name("october-2020-citibike-sample", true)
  sanitize trip-duration using DS.strict-num-sanitizer
  sanitize start-time using DS.string-sanitizer
  sanitize stop-time using DS.string-sanitizer
  sanitize start-station-id using DS.strict-num-sanitizer
  sanitize start-station-name using DS.string-sanitizer
  sanitize end-station-id using DS.strict-num-sanitizer
  sanitize end-station-name using DS.string-sanitizer
  sanitize bike-id using DS.strict-num-sanitizer
  sanitize user-type using DS.string-sanitizer
  sanitize birth-year using DS.strict-num-sanitizer
  sanitize gender using DS.strict-num-sanitizer
end

stations-table = load-table: station-name :: String, zipcode :: Number
  source: stations-unsanitized-table.sheet-by-name("station-dataset", true)
  sanitize station-name using DS.string-sanitizer
  sanitize zipcode using DS.strict-num-sanitizer
end

sorted-zip-codes-table = load-table: zipcode :: Number
  source: sorted-zip-codes-unsanitized-table.sheet-by-name("zipcodes", true)
  sanitize zipcode using DS.strict-num-sanitizer
end
```
:::

### Grocery Stores

:::spoiler Instructions
**Analysis:** If you choose this dataset, your analysis should answer the following questions about the grocery and population data:

- Which state has the largest number of grocery and convenience stores per capita?
- Do states with the largest populations have the most grocery and convenience stores? Are counties with the largest populations in the states with the largest number of stores per capita?
- Which states have the largest percentage of convenience stores among the total numbers of convenience and grocery stores?

**Summary Table:** Your `summary-table` function should produce a Pyret table of the following form:

```
| state         | num-stores | per-capita-summary |
| ------------- | ---------- | ------------------ |
| Rhode Island  | 145,000    | 0.001              |
| Colorado      | ...        | ...                |
| Maryland      | ...        | ...                |
| South Dakota  | ...        | ...                |
...
```

Each row is a State. The `num-stores` column contains the total number of grocery and convenience stores in that state. The `per-capita-summary` column summarizes some statistic about the total number of stores (grocery and convenience) *per capita* across counties in that state. The per-capita statistic might be the total, average, median, etc. of stores per capita across counties in that state. For instance, if the `mean` function were passed into your `summary-table` function, the `per-capita-summary` column should contain the average value of *total stores per capita* across all counties in the state for that row.
If the `sum` function were passed into your `summary-table` function, the `per-capita-summary` column should contain the sum of the *total stores per capita* across all the counties in that state. The person who calls your `summary-table` function will indicate which summary method to use by passing another function as input.

[Clarified 10/13] As a suggestion for the summary-table function, consider one of the following headers:

```
# the summary-func takes a smaller table as input
fun summary-table(t :: Table, summary-func :: (Table -> Number)) -> Table:
  doc: ```Produces a table that uses the given function to summarize stores
       per capita across counties. The outputted table should also have
       total number of grocery and convenience stores for every state.```
  ...
end

# the summary-func takes a smaller table and a column name (the String input)
fun summary-table(t :: Table, summary-func :: (Table, String -> Number)) -> Table:
  doc: ```Produces a table that uses the given function to summarize stores
       per capita across counties. The outputted table should also have
       total number of grocery and convenience stores for every state.```
  ...
end
```

Note that passing a function as an argument is like what you have done when using list or table operations. This might be called as `summary-table(mytable, sum)` or `summary-table(mytable, mean)` to summarize the total or average stores per capita of counties within each state represented in `mytable`.

:::info
Note: The exact columns/structure of the input `t` to `summary-table` is up to you, but it should contain enough information to be able to produce the output of `summary-table`.
:::

:::spoiler Stencil
Copy and paste the following code to load the dataset into Pyret:

```
include tables
include gdrive-sheets
include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")
import math as M
import statistics as S
import data-source as DS

google-id = "17OCB7nDBepuvxHrDKB4qMPcI0_UHbTzNwMP_2s0WkXw"

county-population-unsanitized-table = load-spreadsheet(google-id)
county-store-count-unsanitized-table = load-spreadsheet(google-id)
state-abbv-unsanitized-table = load-spreadsheet(google-id)

county-population-table = load-table: county :: String, state :: String, population-estimate-2016 :: Number
  source: county-population-unsanitized-table.sheet-by-name("county-population", true)
  sanitize county using DS.string-sanitizer
  sanitize state using DS.string-sanitizer
  sanitize population-estimate-2016 using DS.strict-num-sanitizer
end

county-store-count-table = load-table: state :: String, county :: String, num-grocery-stores :: Number, num-convenience-stores :: Number
  source: county-store-count-unsanitized-table.sheet-by-name("county-store-count", true)
  sanitize state using DS.string-sanitizer
  sanitize county using DS.string-sanitizer
  sanitize num-grocery-stores using DS.strict-num-sanitizer
  sanitize num-convenience-stores using DS.strict-num-sanitizer
end

state-abbv-table = load-table: state :: String, abbv :: String
  source: state-abbv-unsanitized-table.sheet-by-name("state-abbv", true)
  sanitize state using DS.string-sanitizer
  sanitize abbv using DS.string-sanitizer
end
```
:::

---

:::info
:::spoiler Confused about `Option` values? (`some()` / `none`)
If you are having trouble converting an `Option` type (`some()`, `none`) to a number look no further! Let's walk through an example:

```
fun mult-by-two(num-string :: String) -> Number:
  num = string-to-number(num-string)
  num * 2
where:
  mult-by-two("7") is 14
end
```

I've created a function, `mult-by-two`, that takes in a string and produces a number, specifically the input multiplied by two.
What is wrong with this? There are two primary issues: `string-to-number` does not return a number, and the input `num-string` might just be a random string with no relation to a number, like "this is number".

So what does `string-to-number` *actually* return? An `Option`!

```
>>> x = string-to-number("17")
>>> x
x = some(17)
>>> y = string-to-number("0")
>>> y
y = some(0)
>>> z = string-to-number("three")
>>> z
z = none
```

All of our inputs to `string-to-number` will return either a `some(#)` or `none`! A `none` is returned when the value passed to `string-to-number` cannot be converted to a Number (e.g. "three", "this is a number", or "HTA Erick is the best"). A `some(#)` is returned when the string CAN be converted to a Number.

Let's reexamine our `mult-by-two` function with this new information:

```
fun mult-by-two(num-string :: String) -> Number:
  num = string-to-number(num-string)
  num * 2
where:
  mult-by-two("7") is 14
end
```

In `mult-by-two`, our `num` variable is either a `some(#)` or a `none`. How can we tell what it is? Well, there are two handy functions, `is-some()` and `is-none()`, that can help us determine what the output of `string-to-number` is. Let's take a look:

```
>>> is-some(some(7))
true
>>> is-some(none)
false
>>> is-none(some(-1))
false
>>> is-none(none)
true
```

With this, we can now make progress on our `mult-by-two` function!

```
fun mult-by-two(num-string :: String) -> Number:
  num = string-to-number(num-string)
  if (is-some(num)):
    num * 2
  else:
    raise("ERROR: input cannot be converted to a Number")
  end
where:
  mult-by-two("7") is 14
end
```

Now, this will still produce an error. Why? We can't multiply an Option with a Number! Thus, we will have to extract the value from our `num` variable. Conveniently enough, there exists an Option function to help us do that.
Let's do some examples:

```
>>> some(7).or-else("not a number")
7
>>> none.or-else("not a number")
"not a number"
>>> some(2).or-else(-1)
2
>>> none.or-else(-1)
-1
>>> some(-1).or-else(-1)
-1
```

As we can see, the `.or-else` function either returns the contents of a `some(#)` value or returns the input to `.or-else`. As you may have noticed, it doesn't matter what gets passed into the `.or-else`. It can be a number or a string! Let's now wrap up our `mult-by-two` function.

```
fun mult-by-two(num-string :: String) -> Number:
  num = string-to-number(num-string)
  if (is-some(num)):
    num.or-else("This won't get returned") * 2
  else:
    raise("ERROR: input cannot be converted to a Number")
  end
where:
  mult-by-two("7") is 14
end
```

Note: Why could we not have simply done this?

```
fun mult-by-two(num-string :: String) -> Number:
  num = string-to-number(num-string).or-else(-1)
  num * 2
where:
  mult-by-two("7") is 14
end
```

Answer: What if `num-string` was `"-1"`? At that point you'd have a hard time differentiating between `none` and `some(#)` values. If you changed the `.or-else` to be `.or-else("Not a valid number")`, then you would have to check whether `num` was equal to that string or was a number. We find ourselves still having to check what type the `num` value is.
:::

## Deadline 1: The Design Stage

The design check is a one-on-one meeting between your team and a TA to review your project plans and to give you feedback well before the final deadline. Many students make changes to their designs following the check: doing so is common and will not cost you points.

**Task -- Find a Partner and Sign Up for a Slot**

* Find a partner for the project. If you prefer, you can also use [this when2meet](https://www.when2meet.com/?8191851-VKA8i) to find a partner whose schedule matches yours: just put down the timeslots when you are free (use your Brown email as your name!)
and reach out to the people who have already indicated that they are free during the same time periods.
* Fill out the Project 1 Partner Form [here](https://docs.google.com/forms/d/e/1FAIpQLSf-1OOBOAnawjKcFPoP7jtEBK-B6Fwzl0_m65L_fy7hAoVkVA/viewform) before **Friday, October 8th at 5:00PM**. If you don't have a partner by then, fill out the form anyway! We will pair you up randomly with another student.

Design checks are held Sunday through Tuesday (the 10th to 12th) and are mandatory for all groups.

Only one of you has to fill out the Project 1 Partner form; after one of you fills out the form, you will both receive an email by the following morning from your design check TA to schedule a meeting for your design check! If you couldn't find a partner, you will be notified of your random assignment before EOD Friday **October 8th**.

<!-- :::spoiler Inviting your partner to the Google Calendar event {%youtube x05KHShhA-Q %} **Note:** You will not see a name (like Eli Berkowitz on the video), but you will receive a notification by 7PM on Friday who your Design Check TA is going to be! ::: -->

**Task -- Data-cleaning plan:** Look at your datasets and identify the cleaning, normalization, and other pre-processing steps that will need to happen to prepare your datasets for use. Make a list of the steps that you'll need to perform, with a couple of notes about how you will do each step (which table operation, whether you need helper functions, etc).

**Task -- Analysis plan:** For each of the analysis questions listed for your dataset, describe how you plan to do the analysis. You should try to answer these questions:

* What charts, plots and statistics do you plan to generate to answer the analysis questions? Why? What are the types and the axes of these charts, plots and statistics?
* What table(s) will you need to generate those charts, plots and statistics?
* If the table(s) you need have different columns or rows than those that we gave you, provide a sample of the table that you need.
* For each of the new tables that you identified, describe how you plan to create the table from the ones that we've given you. Make sure to list all Pyret operators and functions (with input/output types and a description of what they do, but without the actual code).

If you don't know how to create a table, discuss it with the TA at your design check.

:::info
***Sample Answer:*** Assume you had a dataset with information on changes in city populations over time. If you were asked to analyze whether cities with population (in 2000) larger than 30,000 have an increase or decrease in population, your answer to this design-check question might be:

"I'd start with a table of cities that have a population in 2000 of over 30,000, and then make a scatterplot of the population of those cities in 2000 and 2010. I'd add a linear regression line, then check whether there was a pattern in changes between the two population values. I'd obtain a table of cities with a population of greater than 30,000 in 2000 by using the `filter-with` function."
:::

**Task -- Summary Table example:** Draw (or type up) a concrete example of a summary table as required for your chosen dataset. Write down some concrete ideas for how you will produce that table (or specific questions that you want to ask the TA about how to do so).

### Design Check Handin

++Before++ your design check starts, submit your work for the design check as a PDF file named `project-1-design-check.pdf` to "Project 1 Design Check" on Gradescope. Please add your project partner to your submission on Gradescope as well. You can create a PDF by writing in your favorite word processor (Word, Google Docs, etc) then saving or exporting to PDF. Ask the TAs if you need help with this. Please put both your and your partner's login information at the top of the file.
### Design Check Logistics

* Bring your work for the design phase to the meeting either on a laptop (files already open and ready to go) or as a printout. Use whichever format you will find it easier to take notes on.
* We expect that both partners have participated in designing the project. The TA may ask either one of you to answer questions about the work you present. Splitting the work such that each of you does 1-2 of the analysis questions is likely to backfire, as you might have inconsistent tables or insufficient understanding of work done by your partner.
* Be on time to your design check. If one partner is sick, contact the TA and try to reschedule rather than have only one person do the design check.

### Design Check Grading

Your design check grade will be based on whether you had viable ideas for each of the questions and were able to explain them adequately to the TA (for example, we expect you to be able to describe why you picked a particular plot or table format). Your answers do not have to be perfect, but they do need to illustrate that you've thought about the questions and what will be required to answer them. The TA will give you feedback to consider as part of your final implementation of the project.

Your design check grade will be worth roughly a third of your overall project grade. Failure to account for key design feedback in your final solution may result in a deduction on your analysis stage grade (for example, a check moving to a check minus).

***Note:** We believe the hardest part of this assignment lies in figuring out what analyses you will do and in creating the tables you need for those analyses. Once you have created the tables, the remaining code should be similar to what you have written for homework and lab. Take the Design Check seriously. Plan enough time to think out your table and analysis designs.*

## Deadline 2: Perform Your Analysis

The deliverables for this stage include:

1. A Pyret file named `analysis.arr` that contains the function `summary-table`, the tests for the function, and all the functions used to generate the report (charts, plots, and statistics).
2. A report file named `report.pdf`. Include in this file the copies of your charts and the written part of your analysis. Your report should address the three analysis questions outlined for your chosen dataset. Your report should also contain responses to the Reflection questions described below.

**Note:** Please connect the code in your `analysis` file and the results in your `report` with specific comments and labels in each. For example:

:::info
***Sample Linking:** See the comment in the code file:*

```
# Analysis for question on cities with population over 30K
fun more-than-thirty-thousand(r :: Row) -> Boolean:
  ...
end

qualifying-munis = filter-with(municipalities, more-than-thirty-thousand)
munis-ex1-ex2-scatter = lr-plot(qualifying-munis, "population-2000", "population-2010")
```

*Then, your report might look like this:*

![](https://i.imgur.com/2ld32PX.png)
:::

#### Guidelines on the Analysis

In order to do these analyses, you will need to combine data from the multiple tables in your chosen dataset. For each dataset/problem option, the tables use slightly different formats of the information used to link data across the tables (such as different date formats).

*You should handle aligning the datasets in Pyret code, not by editing the Google Sheets prior to loading them into Pyret.* Making sure you know how to use coding to massage tables for combining data is one of our goals for this project. [Pyret String documentation](https://www.pyret.org/docs/latest/strings.html) might be your friend!

**Hint:** If you feel your code is getting too complicated to test, add helper functions! You will almost certainly have computations that get done multiple times with different data for this problem. Create and test a helper or two to keep the problem manageable.
You don't need helpers for everything, though -- it is fine for you to have nested `build-column` expressions in your solution, for example. Don't hesitate to reach out to us if you want to review your ideas for breaking down this problem.

### Report

Your report should contain any relevant plots and tables, any conclusions you have made, and your reflection on the project (see next section). We are not looking for fancy or specific formatting, but you should put some effort into making the report read well (use section headings, full sentences, spell-check it, etc). There's no specified length -- just say what you need to say to present your analyses.

**Note:** Pyret makes it easy to extract image files of plots to put into your report. When you make a plot, there is an option in the top left hand side of the window to save the chart as a `.png` file which you can then copy into your document. Additionally, whenever you output a table in the interactions window, Pyret gives you the option to copy the table. If you copy the table into a spreadsheet, it will be formatted as a table that you can then copy into Word or Google Docs.

### Reflection

Have a section in your report document with answers to each of the following questions ++after you have finished the coding portion of the project++:

1. Describe one key insight that each partner gained about programming or data analysis from working on this project and one mistake or misconception that each partner had to work through.
2. Based on the data and analysis techniques you had, how confident are you in the quality of your results? What other information or skills could have improved the accuracy and precision of your analysis?
3. State one or two follow-up questions that you have about programming or data analysis after working on this project.

### Final Handin

For your final handin, submit one code file named `analysis.arr` containing all of your code for producing plots and tables for this project.
Put a summary of the plots, tables, and conclusions into a separate document called `report.pdf`. Your project reflection goes into the report file.

Nothing is required to print in the interactions window when we run your analysis file, but your analysis answers should include comments indicating which variable names or expressions yield the data on which you based your answers.

### Final Grading

You will get grades on each of Functionality, Design, and Testing for this assignment.

Functionality -- Key metrics:

* Does your code accurately produce the data you needed for your analyses?
* Are you able to use code to perform the table transformations required for your analyses?
* Is your `summary-table` function working?

Testing -- Key metrics:

* Have you tested your functions well, particularly those that do computations more interesting than extracting cells and comparing them to other values?
* Have you shown that you understand how to set up smaller tables for testing functions before using them on large datasets?

Design -- Key metrics:

* Have you chosen suitable charts and statistics for your analysis?
* Have you identified appropriate table formats for your analysis tasks?
* Have you created helper functions as appropriate to enable reuse of computations?
* Have you chosen appropriate functions and operations to perform your computations?
* Have you used docstrings and comments to effectively explain your code to others?
* Have you named intermediate computations appropriately to improve readability of your code? This includes both what you named and whether the names are sufficiently descriptive to convey useful information about your computation.
* Have you followed the other guidelines of the style guide (line length, naming convention, etc.)?

You can pass the project even if you either (a) skip the `summary-table` function or (b) have to massage some of the tables by hand rather than through code.
A project that does not meet either of these baseline requirements will fail the functionality portion. A high score on functionality will require that you wrote appropriate code to perform each analysis and wrote a working `summary-table` function. The difference between high and mid-range scores will lie in whether you chose and used appropriate functions to produce your tables and analyses.

For design, the difference between high and mid-range scores will lie in whether your computations that create additional tables are clear and well-structured, rather than appearing as if you made some messy choices just to get things to work.

----------------------

Have feedback on the class or for this project? Submit it [here](https://docs.google.com/forms/d/e/1FAIpQLSc_LDM-CJWTQlfv2QcXOJ3G2uEKh6qbxprNXDrXGvNoWe1XtQ/viewform).
