# Project 1: Tables

## Due Date Information

This project will be due before the end of the semester.

## Summary

It’s time to try your data science skills on real datasets! For this assignment, you will choose one of three datasets to work on. You’ll then apply what we’ve learned so far to clean up the data, analyze the data, and write a report that presents the results of your analysis. There’s no difference in difficulty across the datasets – we’re merely letting you choose which dataset/question you are most interested in exploring.

The project occurs in two stages. During the first week, you’ll work on the design of your tables and functions, reviewing your work with Mr. Baker during a **Design Check**. In the second week, you’ll implement your design, presenting your results in a written report that includes charts and plots from the tables that you created. The report and the code you used to process the data get turned in for the Final Handin.

This is a pair project. You and a partner should complete all project work together. You can find your own partner or we can match you with someone. Note that you have to work with different partners on each of the three projects in the course.

## Resources

[https://hackmd.io/@cs111/table](https://hackmd.io/@cs111/table)

## Dataset Options

Your dataset/question options are:

* **Climate Change**: Annual CO2 emissions data since 1960 by country, with a second table showing temperature changes per country since 1960. Your analysis will look at how temperature changes relate to total emissions in different regions of the world.
* **Bikeshare**: Bikeshare data from New York City for the month of October 2020, combined with a table of zipcodes in which bike stations are located. Your analysis will look at how far people are traveling and whether that varies by part of the city.
* **Grocery Stores**: Data on the number of grocery/convenience stores per county in the USA, with another table of county-level population data.
Your analysis will look at whether some states offer better access to grocery stores than others. Whichever dataset you choose, you will be given a collection of questions to answer in your analysis. You will also provide a function (`summary-generator`) that can be used to generate summary data about a specific aspect of your dataset. The `summary-generator` function will allow the user to customize which statistic (such as average, sum, median) gets used to generate the table data. Detailed instructions for accessing each dataset, and the corresponding analysis and summary requirements, are in the following expansion options. The rest of the handout (after the expansion options) explains general requirements that apply regardless of which option you have chosen. <div class="alert alert-info part" data-startline="46" data-endline="47" data-position="2938" data-size="998"> <p data-position="2946" data-size="0"><span data-position="2946" data-size="278">I suggest skimming the instructions for your chosen dataset, and then reading the design check instructions, and then reading the instructions for your chosen dataset in detail. The overarching goal of this project is to answer the analysis questions and to write and test the </span><code data-position="3225" data-size="17">summary-generator</code><span data-position="3243" data-size="122"> function, and the rest of this handout walks you through those goals. We expect that the specific analysis questions and </span><code data-position="3366" data-size="17">summary-generator</code><span data-position="3384" data-size="548"> function description will take a few read-throughs to thoroughly understand, and that you’ll switch between the general instructions and the specific instructions for your dataset while making your design check plan. 
Do not worry if you and your partner do not immediately arrive at the list of tasks you need to do in order to complete the project – one of the skills you are practicing is how to break a large analysis down into smaller steps. The design check with Mr. Baker will be one opportunity to let you know if you are on the right track.</span></p> </div> ### CO2 Emissions <details class="part" data-startline="52" data-endline="161" data-position="3959" data-size="9173" open=""><summary>Instructions</summary> <p data-position="3985" data-size="0"><strong data-position="3985" data-size="0"><span data-position="3987" data-size="9">Overview:</span></strong><span data-position="3998" data-size="128"> This dataset covers country-level carbon emissions and climate change measurements since 1960. The main table in this dataset (</span><code data-position="4127" data-size="20">co2-emissions-table </code><span data-position="4148" data-size="469">) gives the total and per-capita emissions for every country and year in the dataset. You are also given a table of all years from 1960 until the end of the dataset, a table of all regions, and a table that has a row for each country with the region it is in and its cumulative warming in degrees since 1960. 
Take a minute to load the tables (using the stencil code below) into Pyret and take a look at the contents of each table before continuing reading this section.</span></p> <p data-position="4619" data-size="0"><strong data-position="4619" data-size="0"><span data-position="4621" data-size="9">Analysis:</span></strong><span data-position="4633" data-size="93"> If you choose this dataset, your analysis should answer the following questions about the CO</span><sub><span data-position="4619" data-size="0">2</span></sub><span data-position="4728" data-size="17"> Emissions data: </span><strong data-position="4745" data-size="0"><span data-position="4747" data-size="98">All analysis questions should be answered with both a visualization/chart and corresponding table</span></strong></p> <ul> <li class="" data-position="4851" data-size="0" data-startline-back="59" data-endline-back="60"><span data-position="4851" data-size="34">Which region (third column in the </span><code data-position="4886" data-size="19">warming-deg-c-table</code><span data-position="4906" data-size="18">) had the highest </span><em data-position="4924" data-size="0"><span data-position="4925" data-size="10">cumulative</span></em><span data-position="4937" data-size="3"> CO</span><sub><span data-position="4851" data-size="0">2</span></sub><span data-position="4942" data-size="22"> emissions since 1960?</span> <ul> <li class="" data-position="4972" data-size="0" data-startline-back="60" data-endline-back="60"><span data-position="4972" data-size="74">Did this region always have the highest emissions in each year since 1960?</span></li> </ul> </li> <li class="" data-position="5049" data-size="0" data-startline-back="61" data-endline-back="61"><span data-position="5049" data-size="121">Is there a relationship between cumulative emissions since 1960 and the increase in temperature since 1960 for countries?</span></li> <li class="" data-position="5173" data-size="0" data-startline-back="62" 
data-endline-back="63"><span data-position="5173" data-size="131">Of the top 10 countries with the highest cumulative emissions since 1960, what proportion of these countries belong to each region?</span></li> </ul> <p data-position="5306" data-size="0"><span data-position="5306" data-size="97">Note: Use the following function that’s included in your stencil code instead of Pyret’s builtin </span><code data-position="5404" data-size="16">string-to-number</code><span data-position="5421" data-size="30"> function! Take a look at the </span><code data-position="5452" data-size="5">where</code><span data-position="5458" data-size="29"> block to see how it is used.</span></p>
<pre><code>fun string-to-num-project1(s :: String) -&gt; Number:
  doc: "converts string to number value"
  n = string-to-number(s)
  cases (Option) n:
    | some(v) =&gt; v
    | none =&gt; raise("Non-number value passed in: " + s)
  end
where:
  string-to-num-project1("3") is 3
  string-to-num-project1("3.5") is 3.5
  string-to-num-project1("-10") is -10
  string-to-num-project1("hello") raises "Non-number value passed in: hello"
end
</code></pre>
<p data-position="5936" data-size="0"><strong data-position="5936" data-size="0"><span data-position="5938" data-size="18">Summary Generator:</span></strong><span data-position="5958" data-size="24"> You will also create a </span><code data-position="5983" data-size="17">summary-generator</code><span data-position="6001" data-size="97"> function that can be used to generate summaries of country-level emissions and warming datasets.</span></p> <p data-position="6100" data-size="0"><span data-position="6100" data-size="271">Imagine that there are multiple country-level emissions and warming datasets starting in 1960, and that you need to create summaries of them with different types of statistics. 
For example, in one case, you may get the dataset from 1960 through 2020 and need to find the </span><strong data-position="6371" data-size="0"><span data-position="6373" data-size="5">total</span></strong><span data-position="6381" data-size="3"> CO</span><sub><span data-position="6100" data-size="0">2</span></sub><span data-position="6386" data-size="197"> emissions over all years in the time period, over all of the countries in each region. Or in another case, you may get the dataset with a different set of years or countries, and need to find the </span><strong data-position="6583" data-size="0"><span data-position="6585" data-size="7">average</span></strong><span data-position="6595" data-size="3"> CO</span><sub><span data-position="6100" data-size="0">2</span></sub><span data-position="6600" data-size="218"> emissions over all years in the time period, over all of the countries in each region. This all requires building a function that is flexible in terms of which data it presents and what kind of statistics it computes.</span></p> <p data-position="6820" data-size="0"><span data-position="6820" data-size="5">Your </span><code data-position="6826" data-size="17">summary-generator</code><span data-position="6844" data-size="58"> will take in a table that contains the following columns:</span></p> <ul> <li class="" data-position="6905" data-size="0" data-startline-back="91" data-endline-back="91"><span data-position="6905" data-size="40">“year”: the year this row’s data is from</span></li> <li class="" data-position="6948" data-size="0" data-startline-back="92" data-endline-back="92"><span data-position="6948" data-size="63">“country”: the name of the country that this row’s data is from</span></li> <li class="" data-position="7014" data-size="0" data-startline-back="93" data-endline-back="93"><span data-position="7014" data-size="81">“total-co2-emission-metric-ton”: the country’s total emissions for the given year</span></li> <li class="" data-position="7098" 
data-size="0" data-startline-back="94" data-endline-back="94"><span data-position="7098" data-size="72">“warming-deg-c-since-1960”: the total warming of the country in Celsius </span><em data-position="7170" data-size="0"><span data-position="7171" data-size="51">over the entire timespan of the dataset, since 1960</span></em><span data-position="7223" data-size="2"> (</span><strong data-position="7225" data-size="0"><span data-position="7227" data-size="127">this means that any two rows with the same “country” value but different “year” values will have the same value in this column!</span></strong><span data-position="7356" data-size="1">)</span></li> <li class="" data-position="7360" data-size="0" data-startline-back="95" data-endline-back="96"><span data-position="7360" data-size="114">“region”: the region of the world (Africa, Asia, Europe, Oceania, N. America, S. America, Other) the country is in</span></li> </ul> <p data-position="7476" data-size="0"><span data-position="7476" data-size="44">For example, the input table might look like</span></p>
<pre><code>| year  | country  | total-co2-emission-metric-ton | warming-deg-c-since-1960 | region     |
| ----- | -------- | ----------------------------- | ------------------------ | ---------- |
| 1960  | Bolivia  | emi1                          | warm1                    | S. America |
| 1961  | Bolivia  | emi2                          | warm1                    | ...        |
| 1960  | Peru     | emi3                          | warm2                    | ...        |
| 1961  | Peru     | ...                           | ...                      | ...        |
| ...   | ...      | ...                           | ...                      | ...        |
...
</code></pre>
<p data-position="8186" data-size="0"><span data-position="8186" data-size="4">The </span><code data-position="8191" data-size="17">summary-generator</code><span data-position="8210" data-size="21"> will also take in a </span><a href="https://hackmd.io/@cs111/table#Summarizing-Columns" target="_blank" rel="noopener"><span data-position="8231" data-size="16">summary function</span></a><span data-position="8300" data-size="32">. 
Take a look at the functions (</span><code data-position="8333" data-size="3">sum</code><span data-position="8337" data-size="2">, </span><code data-position="8340" data-size="4">mean</code><span data-position="8345" data-size="62">, etc.) on the summary function documentation. Each takes in a </span><code data-position="8408" data-size="5">Table</code><span data-position="8414" data-size="5"> and a </span><code data-position="8420" data-size="6">String</code><span data-position="8427" data-size="109"> representing a column name, and applies the relevant math over that column of the table to produce a single </span><code data-position="8537" data-size="6">Number</code><span data-position="8544" data-size="15"> (for example, </span><code data-position="8560" data-size="4">mean</code><span data-position="8565" data-size="55"> produces the mean of all of the values in the column).</span></p> <p data-position="8622" data-size="0"><span data-position="8622" data-size="17">The goal of your </span><code data-position="8640" data-size="17">summary-generator</code><span data-position="8658" data-size="165"> is to figure out how to use the summary function and the input table to produce an output table with only these columns: “region”, “avg-warming”, and “CO2-summary”:</span></p>
<pre><code>| region        | avg-warming | CO2-summary |
| ------------- | ----------- | ----------- |
| Oceania       | avg-warm    | num1        |
| Africa        | ...         | ...         |
| Asia          | ...         | ...         |
| S. America    | ...         | ...         |
...
</code></pre>
<p data-position="9114" data-size="0"><span data-position="9114" data-size="60">Each row represents the statistics for a single region. The </span><code data-position="9175" data-size="11">avg-warming</code><span data-position="9187" data-size="88"> column contains the average warming across all countries in that region since 1960. 
</span><strong data-position="9275" data-size="0"><span data-position="9277" data-size="130">The “region” and “avg-warming” columns will be the same for a given input table, no matter what summary function you give to your </span><code data-position="9408" data-size="17">summary-generator</code><span data-position="9426" data-size="1">.</span></strong><span data-position="9429" data-size="5"> The </span><code data-position="9435" data-size="11">CO2-summary</code><span data-position="9447" data-size="106"> column summarizes some statistic about the emissions per year across countries in the region since 1960, </span><strong data-position="9553" data-size="0"><span data-position="9555" data-size="35">based on the summary function input</span></strong><span data-position="9593" data-size="8">. The CO</span><sub><span data-position="9114" data-size="0">2</span></sub><span data-position="9603" data-size="94"> statistic might be the total, average, median, etc emissions across countries in each region.</span></p> <p data-position="9699" data-size="0"><span data-position="9699" data-size="21">For instance, if the </span><code data-position="9721" data-size="4">mean</code><span data-position="9726" data-size="32"> function were passed into your </span><code data-position="9759" data-size="17">summary-generator</code><span data-position="9777" data-size="15"> function, the </span><code data-position="9793" data-size="11">CO2-summary</code><span data-position="9806" data-size="37"> column should contain the average CO</span><sub><span data-position="9699" data-size="0">2</span></sub><span data-position="9845" data-size="92"> emissions over all years of the data set, over all of the countries in that region. 
If the </span><code data-position="9938" data-size="3">sum</code><span data-position="9942" data-size="32"> function were passed into your </span><code data-position="9975" data-size="17">summary-generator</code><span data-position="9993" data-size="15"> function, the </span><code data-position="10009" data-size="11">CO2-summary</code><span data-position="10021" data-size="120"> column should contain the total CO2 emissions over all years of the data set, over all of the countries in that region.</span></p> <p data-position="10143" data-size="0"><span data-position="10143" data-size="8">For the </span><code data-position="10152" data-size="17">summary-generator</code><span data-position="10170" data-size="36"> function, use the following header:</span></p>
<pre><code># the summary-func takes a smaller table and a column name (the String input)
fun summary-generator(t :: Table, summary-func :: (Table, String -&gt; Number)) -&gt; Table:
  doc: ```Produces a table that uses the given function to summarize CO2
       emissions for every region (Oceania/Asia/Europe/Africa/
       SouthAmerica/NorthAmerica/Other). The outputted table should also
       have the average warming in every region.```
  ...
end
</code></pre>
<p data-position="10655" data-size="0"><span data-position="10655" data-size="24">This might be called as </span><code data-position="10680" data-size="31">summary-generator(mytable, sum)</code><span data-position="10712" data-size="4"> or </span><code data-position="10717" data-size="32">summary-generator(mytable, mean)</code><span data-position="10751" data-size="37"> to summarize the total or average CO</span><sub><span data-position="10655" data-size="0">2</span></sub><span data-position="10790" data-size="72"> emissions since 1960 across countries in each region as represented in </span><code data-position="10863" data-size="7">mytable</code><span data-position="10871" data-size="1">.</span></p> <p data-position="10875" data-size="0"><strong data-position="10875" data-size="0"><span data-position="10877" data-size="4">Note</span></strong><span data-position="10883" data-size="2">: </span><code data-position="10886" data-size="3">sum</code><span data-position="10890" data-size="5"> and </span><code data-position="10896" data-size="4">mean</code><span data-position="10901" data-size="171"> here are built-in functions (that you do not write), as described above. 
Passing a function as an argument is like what you have done when using table operations like </span><code data-position="11073" data-size="12">build-column</code><span data-position="11086" data-size="2">, </span><code data-position="11089" data-size="11">filter-with</code><span data-position="11101" data-size="6">, and </span><code data-position="11108" data-size="16">transform-column</code><span data-position="11125" data-size="1">.</span></p> <p data-position="11128" data-size="0"><span data-position="11128" data-size="5">Your </span><code data-position="11134" data-size="17">summary-generator</code><span data-position="11152" data-size="10"> function </span><strong data-position="11162" data-size="0"><span data-position="11164" data-size="10">should not</span></strong><span data-position="11176" data-size="68"> reference any tables from outside the function except the provided </span><code data-position="11245" data-size="13">regions-table</code><span data-position="11259" data-size="52">. While producing your output table, you should use </span><code data-position="11312" data-size="13">regions-table</code><span data-position="11326" data-size="101"> as a starting point (to build columns for the output table and to extract data from the input table </span><code data-position="11428" data-size="1">t</code><span data-position="11430" data-size="34">). 
Also, your output table should </span><strong data-position="11464" data-size="0"><span data-position="11466" data-size="3">not</span></strong><span data-position="11471" data-size="109"> contain any columns other than those shown in the example above: “region”, “avg-warming”, and “CO2-summary”.</span></p> <p data-position="11582" data-size="0"><strong data-position="11582" data-size="0"><span data-position="11584" data-size="5">Note:</span></strong><span data-position="11591" data-size="25"> You do not need to test </span><code data-position="11617" data-size="17">summary-generator</code><span data-position="11635" data-size="22">. However, please run </span><code data-position="11658" data-size="17">summary-generator</code><span data-position="11676" data-size="138"> twice outside of the function with two different summary functions. Make sure the output makes sense! This will look something like this:</span></p>
<pre><code>fun summary-generator(...):
  # your code
end

summary-generator(your-input-table, func1)
summary-generator(your-input-table, func2)
</code></pre>
<p data-position="11959" data-size="0"><strong data-position="11959" data-size="0"><span data-position="11961" data-size="5">Hints</span></strong><span data-position="11968" data-size="1">:</span></p> <ol data-position="11972" data-size="0"> <li class="" data-position="11975" data-size="0" data-startline-back="159" data-endline-back="159"><span data-position="11975" data-size="322">You will have to construct an example input table with the column names “year”, “country”, “total-co2-emission-metric-ton”, “warming-deg-c-since-1960”, and “region” yourself. Plan out how you will do this in the design check! 
It will also help to manually create smaller input tables that you can use while developing the </span><code data-position="12298" data-size="17">summary-generator</code><span data-position="12316" data-size="1">.</span></li> <li class="" data-position="12321" data-size="0" data-startline-back="160" data-endline-back="160"><span data-position="12321" data-size="198">The format of the output suggests that you will have to call the summary function once for every region to generate the specific summary value for that region’s row. The summary function takes in a </span><code data-position="12520" data-size="5">Table</code><span data-position="12526" data-size="7"> and a </span><code data-position="12534" data-size="6">String</code><span data-position="12541" data-size="221">. For each region, what does the input table to the summary function look like in order to get the desired output? It may help to draw out an example table for a specific region. Then, think about how to create those </span><code data-position="12763" data-size="5">Table</code><span data-position="12769" data-size="28">s out of the input table to </span><code data-position="12798" data-size="17">summary-generator</code><span data-position="12816" data-size="1">.</span></li> <li class="" data-position="12821" data-size="0" data-startline-back="161" data-endline-back="161"><span data-position="12821" data-size="70">Go back to the analysis questions. Can you use tables created by your </span><code data-position="12892" data-size="17">summary-generator</code><span data-position="12910" data-size="163"> to answer some of those questions? What summary functions would you use? 
Understanding this question will go a long way in helping you understand the goal of the </span><code data-position="13074" data-size="17">summary-generator</code><span data-position="13092" data-size="36"> function and the entire assignment.</span></li> </ol> </details> <details class="part" data-startline="164" data-endline="232" data-position="13134" data-size="2231" open=""><summary>Stencil</summary> <p data-position="13153" data-size="0"><span data-position="13153" data-size="66">Copy and paste the following code to load the datasets into Pyret.</span></p>
<pre><code>include tables
include gdrive-sheets
include image
include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")
import math as M
import statistics as S
import data-source as DS

google-id = "1igZMhJUpAiKg3U6775pcWFIRyDBNmTTLwlrXoUem29M"
warming-deg-c-unsanitized-table = load-spreadsheet(google-id)
co2-emissions-unsanitized-table = load-spreadsheet(google-id)
years-1960-2014-unsanitized-table = load-spreadsheet(google-id)
regions-unsanitized-table = load-spreadsheet(google-id)

################## Importing tables ##################

warming-deg-c-table = load-table:
  country :: String,
  warming-deg-c-since-1960 :: String,
  region :: String
  source: warming-deg-c-unsanitized-table.sheet-by-name("warming-degc", true)
  sanitize country using DS.string-sanitizer
  sanitize warming-deg-c-since-1960 using DS.string-sanitizer
  sanitize region using DS.string-sanitizer
end

co2-emissions-table = load-table:
  year :: Number,
  country :: String,
  total-co2-emission-metric-ton :: Number,
  per-capita :: Number
  source: co2-emissions-unsanitized-table.sheet-by-name(
      "fossil-fuel-co2-emissions-by-nation_csv", true)
  sanitize year using DS.strict-num-sanitizer
  sanitize country using DS.string-sanitizer
  sanitize total-co2-emission-metric-ton using DS.strict-num-sanitizer
  sanitize per-capita using DS.strict-num-sanitizer
end

years-1960-2014-table = load-table:
  year :: Number
  source: years-1960-2014-unsanitized-table.sheet-by-name("years", true)
  sanitize year using DS.strict-num-sanitizer
end

regions-table = load-table:
  region :: String
  source: regions-unsanitized-table.sheet-by-name("regions", true)
  sanitize region using DS.string-sanitizer
end

fun string-to-num-project1(s :: String) -&gt; Number:
  doc: "converts string to number value"
  n = string-to-number(s)
  cases (Option) n:
    | some(v) =&gt; v
    | none =&gt; raise("Non-number value passed in: " + s)
  end
where:
  string-to-num-project1("3") is 3
  string-to-num-project1("3.5") is 3.5
  string-to-num-project1("-10") is -10
  string-to-num-project1("hello") raises "Non-number value passed in: hello"
end
</code></pre>
</details>

### Bikeshare Data

<details class="part" data-startline="237" data-endline="325" data-position="15387" data-size="7247"><summary>Instructions</summary> <p data-position="15412" data-size="0"><strong data-position="15412" data-size="0"><span data-position="15414" data-size="9">Overview:</span></strong><span data-position="15425" data-size="162"> This dataset covers customers’ starting and stopping zip codes for a bikeshare service. The data was collected over October 2020. The main table in this dataset (</span><code data-position="15588" data-size="27">october-2020-citibike-table</code><span data-position="15616" data-size="406">) shows information about the start/stop bikeshare stations customers have used and trip duration, as well as the age/gender of customers. Additionally, we include a table with the zipcodes of each bikeshare station, and a table of all zipcodes. 
Take a minute to load the tables (using the stencil code below) into Pyret and take a look at the contents of each table before continuing reading this section.</span></p> <p data-position="16024" data-size="0"><strong data-position="16024" data-size="0"><span data-position="16026" data-size="9">Analysis:</span></strong><span data-position="16037" data-size="107"> If you choose this dataset, your analysis should answer the following questions about the bikeshare data: </span><strong data-position="16144" data-size="0"><span data-position="16146" data-size="98">All analysis questions should be answered with both a visualization/chart and corresponding table</span></strong></p> <ul> <li class="" data-position="16250" data-size="0" data-startline-back="243" data-endline-back="243"><span data-position="16250" data-size="88">Which 5 zip codes were used the most number of times as start stations? As end stations?</span></li> <li class="" data-position="16341" data-size="0" data-startline-back="244" data-endline-back="244"><span data-position="16341" data-size="264">Find the top 5 starting zipcodes for customers, and the top 5 starting zipcodes for subscribers. Compare these sets of zip codes. Are there any similarities? If there are similar zipcodes between the sets, are there more customers or subscribers at those zipcodes?</span></li> <li class="" data-position="16608" data-size="0" data-startline-back="245" data-endline-back="245"><span data-position="16608" data-size="167">Find the top 5 starting zip codes for the following age groups: 20 or under, 21-30, 31-40, and 41 or older. 
Are there any similarities between these sets of zip codes?</span></li> <li class="" data-position="16778" data-size="0" data-startline-back="246" data-endline-back="248"><span data-position="16778" data-size="72">Is there a relationship between the length of a ride and the gender of the rider?</span></li> </ul> <p data-position="16853" data-size="0"><strong data-position="16853" data-size="0"><span data-position="16855" data-size="18">Summary Generator:</span></strong><span data-position="16875" data-size="23"> You must now create a </span><code data-position="16899" data-size="17">summary-generator</code><span data-position="16917" data-size="71"> function that can be used to generate summaries of bikeshare datasets.</span></p> <p data-position="16991" data-size="0"><span data-position="16991" data-size="210">Imagine that there are many bikeshare datasets, and that you need to create summaries of them with different types of statistics. For example, in one case, you may get the dataset from 2019 and need to find the </span><strong data-position="17201" data-size="0"><span data-position="17203" data-size="7">average</span></strong><span data-position="17212" data-size="108"> ride duration for each zipcode. Or in another case, you may get the dataset from 2022 and need to find the </span><strong data-position="17320" data-size="0"><span data-position="17322" data-size="6">median</span></strong><span data-position="17330" data-size="163"> ride duration for each zipcode. 
This all requires building a function that is flexible in terms of which data it presents and what kind of statistics it computes.</span></p> <p data-position="17495" data-size="0"><span data-position="17495" data-size="5">Your </span><code data-position="17501" data-size="17">summary-generator</code><span data-position="17519" data-size="49"> will take in a table with the following columns:</span></p> <ul> <li class="" data-position="17571" data-size="0" data-startline-back="254" data-endline-back="254"><span data-position="17571" data-size="45">“start-zip”: the starting zip code for a ride</span></li> <li class="" data-position="17619" data-size="0" data-startline-back="255" data-endline-back="255"><span data-position="17619" data-size="41">“end-zip”: the ending zip code for a ride</span></li> <li class="" data-position="17663" data-size="0" data-startline-back="256" data-endline-back="256"><span data-position="17663" data-size="27">“age”: the age of the rider</span></li> <li class="" data-position="17693" data-size="0" data-startline-back="257" data-endline-back="258"><span data-position="17693" data-size="36">“duration”: the duration of the ride</span></li> </ul> <p data-position="17731" data-size="0"><span data-position="17731" data-size="44">For example, the input table might look like</span></p>
<pre><code>| start-zip | end-zip  | age      | duration |
| --------- | -------- | -------- | -------- |
| zip1      | zip2     | age1     | dur1     |
| zip3      | zip4     | age2     | dur2     |
| zip5      | zip6     | age3     | dur3     |
| ...       | ...      | ...      | ...      |
</code></pre>
<p data-position="18064" data-size="0"><span data-position="18064" data-size="4">The </span><code data-position="18069" data-size="17">summary-generator</code><span data-position="18088" data-size="21"> will also take in a </span><a href="https://hackmd.io/@cs111/table#Summarizing-Columns" target="_blank" rel="noopener"><span data-position="18109" data-size="16">summary function</span></a><span data-position="18178" data-size="32">. 
Take a look at the functions (</span><code data-position="18211" data-size="3">sum</code><span data-position="18215" data-size="2">, </span><code data-position="18218" data-size="4">mean</code><span data-position="18223" data-size="62">, etc.) on the summary function documentation. Each takes in a </span><code data-position="18286" data-size="5">Table</code><span data-position="18292" data-size="5"> and a </span><code data-position="18298" data-size="6">String</code><span data-position="18305" data-size="109"> representing a column name, and applies the relevant math over that column of the table to produce a single </span><code data-position="18415" data-size="6">Number</code><span data-position="18422" data-size="15"> (for example, </span><code data-position="18438" data-size="4">mean</code><span data-position="18443" data-size="55"> produces the mean of all of the values in the column).</span></p> <p data-position="18500" data-size="0"><span data-position="18500" data-size="17">The goal of your </span><code data-position="18518" data-size="17">summary-generator</code><span data-position="18536" data-size="171"> is to figure out how to use the summary function and the input table to produce an output table with only these columns: “zipcode”, “average-age”, and “duration-summary”:</span></p> <pre><code>| zipcode | average-age | duration-summary |
| ------- | ----------- | ---------------- |
| 10020   | 24.7        | 853.7            |
| 11101   | ...         | ...              |
| 10451   | ...         | ...              |
| 11237   | ...         | ...              |
...
</code></pre> <p data-position="18994" data-size="0"><span data-position="18994" data-size="48">Each row has data for a particular zipcode. The </span><code data-position="19043" data-size="11">average-age</code><span data-position="19055" data-size="100"> column contains the average age of the riders across all rides that started or ended in that zipcode. 
</span><strong data-position="19155" data-size="0"><span data-position="19157" data-size="4">The </span><code data-position="19162" data-size="7">zipcode</code><span data-position="19170" data-size="5"> and </span><code data-position="19176" data-size="11">average-age</code><span data-position="19188" data-size="100"> columns will be the same for a given input table, no matter what summary function you give to your </span><code data-position="19289" data-size="17">summary-generator</code><span data-position="19307" data-size="1">.</span></strong><span data-position="19310" data-size="5"> The </span><code data-position="19316" data-size="16">duration-summary</code><span data-position="19333" data-size="95"> column summarizes some statistic about the durations of the unique rides around that zipcode, </span><strong data-position="19428" data-size="0"><span data-position="19430" data-size="35">based on the summary function input</span></strong><span data-position="19467" data-size="6">. The </span><code data-position="19474" data-size="16">duration-summary</code><span data-position="19491" data-size="84"> statistic might be the total duration, the average duration, the median, and so on.</span></p> <p data-position="19578" data-size="0"><span data-position="19578" data-size="21">For instance, if the </span><code data-position="19600" data-size="4">mean</code><span data-position="19605" data-size="32"> function were passed into your </span><code data-position="19638" data-size="17">summary-generator</code><span data-position="19656" data-size="15"> function, the </span><code data-position="19672" data-size="16">duration-summary</code><span data-position="19689" data-size="83"> column should contain the average duration over all rides in the data set around that row's zipcode. 
If the </span><code data-position="19773" data-size="3">sum</code><span data-position="19777" data-size="32"> function were passed into your </span><code data-position="19810" data-size="17">summary-generator</code><span data-position="19828" data-size="15"> function, the </span><code data-position="19844" data-size="16">duration-summary</code><span data-position="19861" data-size="73"> column should contain the total duration over all rides in the data set around that row's zipcode.</span></p> <p data-position="19936" data-size="0"><span data-position="19936" data-size="8">For the </span><code data-position="19945" data-size="17">summary-generator</code><span data-position="19963" data-size="36"> function, use the following header:</span></p> <pre><code>fun summary-generator(t :: Table, summary-func :: (Table, String -&gt; Number)) -&gt; Table:
  doc: ```Produces a table that uses the given function to summarize duration of rides across zipcodes. The outputted table should also have average age of riders for each zipcode.```
  ...
end </code></pre> <p data-position="20301" data-size="0"><span data-position="20301" data-size="24">This might be called as </span><code data-position="20326" data-size="32">summary-generator(mytable, sum)</code><span data-position="20359" data-size="4"> or </span><code data-position="20364" data-size="32">summary-generator(mytable, mean)</code><span data-position="20397" data-size="84"> to summarize the total or average duration of rides in each zipcode represented in </span><code data-position="20482" data-size="7">mytable</code><span data-position="20490" data-size="1">.</span></p> <p data-position="20493" data-size="0"><strong data-position="20493" data-size="0"><span data-position="20495" data-size="4">Note</span></strong><span data-position="20501" data-size="2">: </span><code data-position="20504" data-size="3">sum</code><span data-position="20508" data-size="5"> and </span><code data-position="20514" data-size="4">mean</code><span data-position="20519" data-size="171"> here are built-in functions (that you do not write), as described above. 
Passing a function as an argument is like what you have done when using table operations like </span><code data-position="20691" data-size="12">build-column</code><span data-position="20704" data-size="2">, </span><code data-position="20707" data-size="11">filter-with</code><span data-position="20719" data-size="6">, and </span><code data-position="20726" data-size="16">transform-column</code><span data-position="20743" data-size="1">.</span></p> <p data-position="20746" data-size="0"><span data-position="20746" data-size="5">Your </span><code data-position="20752" data-size="17">summary-generator</code><span data-position="20770" data-size="10"> function </span><strong data-position="20780" data-size="0"><span data-position="20782" data-size="10">should not</span></strong><span data-position="20794" data-size="68"> reference any tables from outside the function except the provided </span><code data-position="20863" data-size="22">sorted-zip-codes-table</code><span data-position="20886" data-size="65">. While producing your output table, you should build columns onto </span><code data-position="20952" data-size="22">sorted-zip-codes-table</code><span data-position="20975" data-size="150">. Also, your output table should not contain any columns other than those shown in the example above: “zipcode”, “average-age” and “duration-summary”.</span></p> <p data-position="21127" data-size="0"><strong data-position="21127" data-size="0"><span data-position="21129" data-size="5">Note:</span></strong><span data-position="21136" data-size="25"> You do not need to test </span><code data-position="21162" data-size="17">summary-generator</code><span data-position="21180" data-size="22">. However, please run </span><code data-position="21203" data-size="17">summary-generator</code><span data-position="21221" data-size="138"> twice outside of the function with two different summary functions. Make sure the output makes sense! 
This will look something like this:</span></p> <pre><code>fun summary-generator(...):
  # your code
end

summary-generator(your-input-table, func1)
summary-generator(your-input-table, func2)
</code></pre> <p data-position="21504" data-size="0"><strong data-position="21504" data-size="0"><span data-position="21506" data-size="5">Hints</span></strong><span data-position="21513" data-size="1">:</span></p> <ol data-position="21517" data-size="0"> <li class="" data-position="21520" data-size="0" data-startline-back="323" data-endline-back="323"><span data-position="21520" data-size="275">You will have to construct an example input table with the column names “start-zip,” “end-zip,” “age,” and “duration” yourself. Plan out how you will do this in the design check! It will also help to manually create smaller input tables that you can use while developing the </span><code data-position="21796" data-size="17">summary-generator</code><span data-position="21814" data-size="1">.</span></li> <li class="" data-position="21819" data-size="0" data-startline-back="324" data-endline-back="324"><span data-position="21819" data-size="200">The format of the output suggests that you will have to call the summary function once for every zipcode to generate the specific summary value for that zipcode’s row. The summary function takes in a </span><code data-position="22020" data-size="5">Table</code><span data-position="22026" data-size="7"> and a </span><code data-position="22034" data-size="6">String</code><span data-position="22041" data-size="223">. For each zipcode, what does the input table to the summary function look like in order to get the desired output? It may help to draw out an example table for a specific zipcode. 
Then, think about how to create those </span><code data-position="22265" data-size="5">Table</code><span data-position="22271" data-size="28">s out of the input table to </span><code data-position="22300" data-size="17">summary-generator</code><span data-position="22318" data-size="1">.</span></li> <li class="" data-position="22323" data-size="0" data-startline-back="325" data-endline-back="325"><span data-position="22323" data-size="70">Go back to the analysis questions. Can you use tables created by your </span><code data-position="22394" data-size="17">summary-generator</code><span data-position="22412" data-size="163"> to answer some of those questions? What summary functions would you use? Understanding this question will go a long way in helping you understand the goal of the </span><code data-position="22576" data-size="17">summary-generator</code><span data-position="22594" data-size="36"> function and the entire assignment.</span></li> </ol> </details> <details class="part" data-startline="328" data-endline="383" data-position="22636" data-size="2054"><summary>Stencil</summary> <p data-position="22655" data-size="0"><span data-position="22655" data-size="66">Copy and paste the following code to load the datasets into Pyret.</span></p> <pre><code>include tables
include gdrive-sheets
include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")
import math as M
import statistics as S
import data-source as DS

google-id = "1iSAp4AXNNcfdxm7cBCSPcy_KPINpgf0nPYGvxxIZXV4"

october-2020-citibike-unsanitized-table = load-spreadsheet(google-id)
stations-unsanitized-table = load-spreadsheet(google-id)
sorted-zip-codes-unsanitized-table = load-spreadsheet(google-id)

#| Note: for the gender column
   0 represents unknown
   1 represents male
   2 represents female |#

october-2020-citibike-table = load-table:
  trip-duration :: Number, start-time :: String, stop-time :: String,
  start-station-id :: Number, start-station-name :: String,
  end-station-id :: Number, end-station-name :: String,
  bike-id :: Number, user-type :: String, birth-year :: Number, gender :: Number
  source: october-2020-citibike-unsanitized-table.sheet-by-name("october-2020-citibike-sample", true)
  sanitize trip-duration using DS.strict-num-sanitizer
  sanitize start-time using DS.string-sanitizer
  sanitize stop-time using DS.string-sanitizer
  sanitize start-station-id using DS.strict-num-sanitizer
  sanitize start-station-name using DS.string-sanitizer
  sanitize end-station-id using DS.strict-num-sanitizer
  sanitize end-station-name using DS.string-sanitizer
  sanitize bike-id using DS.strict-num-sanitizer
  sanitize user-type using DS.string-sanitizer
  sanitize birth-year using DS.strict-num-sanitizer
  sanitize gender using DS.strict-num-sanitizer
end

stations-table = load-table:
  station-name :: String, zipcode :: Number
  source: stations-unsanitized-table.sheet-by-name("station-dataset", true)
  sanitize station-name using DS.string-sanitizer
  sanitize zipcode using DS.strict-num-sanitizer
end

sorted-zip-codes-table = load-table:
  zipcode :: Number
  source: sorted-zip-codes-unsanitized-table.sheet-by-name("zipcodes", true)
  sanitize zipcode using DS.strict-num-sanitizer
end
</code></pre> </details>

### Grocery Stores

<details class="part" data-startline="388" data-endline="481" data-position="24712" data-size="7894"><summary>Instructions</summary> <p data-position="24737" data-size="0"><strong data-position="24737" data-size="0"><span data-position="24739" data-size="9">Overview:</span></strong><span data-position="24750" data-size="131"> This dataset covers the county-level quantities of grocery and convenience stores, as well as county populations. The main table (</span><code data-position="24882" data-size="24">county-store-count-table</code><span data-position="24907" data-size="331">) shows the numbers of grocery and convenience stores in counties. You are also given a table with the populations of counties, as well as a table of state abbreviations. 
Take a minute to load the tables (using the stencil code below) into Pyret and take a look at the contents of each table before continuing reading this section.</span></p> <p data-position="25240" data-size="0"><strong data-position="25240" data-size="0"><span data-position="25242" data-size="9">Analysis:</span></strong><span data-position="25253" data-size="120"> If you choose this dataset, your analysis should answer the following questions about the grocery and population data: </span><strong data-position="25373" data-size="0"><span data-position="25375" data-size="98">All analysis questions should be answered with both a visualization/chart and corresponding table</span></strong></p> <p data-position="25478" data-size="0"><span data-position="25478" data-size="81">Note: “combined stores” here refers to the sum of grocery and convenience stores.</span></p> <ul> <li class="" data-position="25564" data-size="0" data-startline-back="398" data-endline-back="398"><span data-position="25564" data-size="65">Which state has the largest number of combined stores per capita?</span></li> <li class="" data-position="25633" data-size="0" data-startline-back="399" data-endline-back="399"><span data-position="25633" data-size="74">Do states with the largest populations also have the most combined stores?</span></li> <li class="" data-position="25710" data-size="0" data-startline-back="400" data-endline-back="400"><span data-position="25710" data-size="138">Is there a correlation between county populations and the number of combined stores per capita in the states to which the counties belong?</span></li> <li class="" data-position="25851" data-size="0" data-startline-back="401" data-endline-back="404"><span data-position="25851" data-size="79">Which 5 states have the largest ratio of convenience stores to combined stores?</span></li> </ul> <p data-position="25934" data-size="0"><strong data-position="25934" data-size="0"><span data-position="25936" data-size="18">Summary 
Generator:</span></strong><span data-position="25956" data-size="23"> You must now create a </span><code data-position="25980" data-size="17">summary-generator</code><span data-position="25998" data-size="83"> function that can be used to generate summaries of county-level store datasets.</span></p> <p data-position="26084" data-size="0"><span data-position="26084" data-size="230">Imagine that there are many county-level store datasets in the US, and that you need to create summaries of them with different types of statistics. For example, in one case, you may get the dataset from 1970 and need to find the </span><strong data-position="26314" data-size="0"><span data-position="26316" data-size="7">maximum</span></strong><span data-position="26325" data-size="128"> stores-per-capita value for counties in each state. Or in another case, you may get the dataset from 2021 and need to find the </span><strong data-position="26453" data-size="0"><span data-position="26455" data-size="7">average</span></strong><span data-position="26464" data-size="177"> stores-per-capita for counties in each state. 
This all requires building a function that is flexible in terms of which data it presents and what kind of statistics it computes.</span></p> <p data-position="26643" data-size="0"><span data-position="26643" data-size="5">Your </span><code data-position="26649" data-size="17">summary-generator</code><span data-position="26667" data-size="58"> will take in a table that contains the following columns:</span></p> <ul> <li class="" data-position="26729" data-size="0" data-startline-back="411" data-endline-back="411"><span data-position="26729" data-size="27">“county”: the county’s name</span></li> <li class="" data-position="26759" data-size="0" data-startline-back="412" data-endline-back="412"><span data-position="26759" data-size="76">“state”: the county’s state, either as the state’s full name or abbreviation</span></li> <li class="" data-position="26838" data-size="0" data-startline-back="413" data-endline-back="413"><span data-position="26838" data-size="85">“num-stores-county”: the total number of grocery and convenience stores in the county</span></li> <li class="" data-position="26926" data-size="0" data-startline-back="414" data-endline-back="415"><span data-position="26926" data-size="128">“stores-per-capita-county”: the total number of grocery and convenience stores in the county, divided by the county’s population</span></li> </ul> <p data-position="27056" data-size="0"><span data-position="27056" data-size="44">For example, the input table might look like:</span></p> <pre><code>| county   | state | num-stores-county | stores-per-capita-county |
| -------- | ----- | ----------------- | ------------------------ |
| county 1 | RI    | num1              | num2                     |
| county 2 | CO    | ...               | ...                      |
| county 3 | MD    | ...               | ...                      |
| county 4 | SD    | ...               | ...                      |
...
</code></pre> <p data-position="27553" data-size="0"><span data-position="27553" data-size="4">The </span><code data-position="27558" data-size="17">summary-generator</code><span data-position="27577" data-size="21"> will also take in a </span><a href="https://hackmd.io/@cs111/table#Summarizing-Columns" target="_blank" rel="noopener"><span data-position="27598" data-size="16">summary function</span></a><span data-position="27667" data-size="32">. Take a look at the functions (</span><code data-position="27700" data-size="3">sum</code><span data-position="27704" data-size="2">, </span><code data-position="27707" data-size="4">mean</code><span data-position="27712" data-size="62">, etc.) on the summary function documentation. Each takes in a </span><code data-position="27775" data-size="5">Table</code><span data-position="27781" data-size="5"> and a </span><code data-position="27787" data-size="6">String</code><span data-position="27794" data-size="109"> representing a column name, and applies the relevant math over that column of the table to produce a single </span><code data-position="27904" data-size="6">Number</code><span data-position="27911" data-size="15"> (for example, </span><code data-position="27927" data-size="4">mean</code><span data-position="27932" data-size="55"> produces the mean of all of the values in the column).</span></p> <p data-position="27989" data-size="0"><span data-position="27989" data-size="17">The goal of your </span><code data-position="28007" data-size="17">summary-generator</code><span data-position="28025" data-size="178"> is to figure out how to use the summary function and the input table to produce an output table with only these columns: “state”, “abbv”, “num-stores”, and “per-capita-summary”:</span></p> <pre><code>| state        | abbv | num-stores | per-capita-summary |
| ------------ | ---- | ---------- | ------------------ |
| Rhode Island | RI   | 145,000    | 0.001              |
| Colorado     | CO   | ...        | ...                |
| Maryland     | MD   | ...        | ...                |
| South Dakota | SD   | ...        | ...                |
...
</code></pre> <p data-position="28578" data-size="0"><span data-position="28578" data-size="25">Each row is a state. The </span><code data-position="28604" data-size="10">num-stores</code><span data-position="28615" data-size="83"> column contains the total number of grocery and convenience stores in that state. </span><strong data-position="28698" data-size="0"><span data-position="28700" data-size="136">The “state”, “abbv” and “num-stores” columns will be the same for a given input table, no matter what summary function you give to your </span><code data-position="28837" data-size="17">summary-generator</code><span data-position="28855" data-size="1">.</span></strong><span data-position="28858" data-size="5"> The </span><code data-position="28864" data-size="18">per-capita-summary</code><span data-position="28883" data-size="93"> column summarizes some statistic about the total number of stores (grocery and convenience) </span><em data-position="28976" data-size="0"><span data-position="28977" data-size="10">per capita</span></em><span data-position="28988" data-size="34"> across counties in that state, </span><strong data-position="29022" data-size="0"><span data-position="29024" data-size="35">based on the summary function input</span></strong><span data-position="29061" data-size="116">. 
The per-capita statistic might be the total, average, median, etc stores per capita across counties in that state.</span></p> <p data-position="29180" data-size="0"><span data-position="29180" data-size="21">For instance, if the </span><code data-position="29202" data-size="4">mean</code><span data-position="29207" data-size="32"> function were passed into your </span><code data-position="29240" data-size="17">summary-generator</code><span data-position="29258" data-size="15"> function, the </span><code data-position="29274" data-size="18">per-capita-summary</code><span data-position="29293" data-size="44"> column should contain the average value of </span><em data-position="29337" data-size="0"><span data-position="29338" data-size="23">total stores per capita</span></em><span data-position="29362" data-size="55"> across all counties in the state for that row. If the </span><code data-position="29418" data-size="3">sum</code><span data-position="29422" data-size="32"> function were passed into your </span><code data-position="29455" data-size="17">summary-generator</code><span data-position="29473" data-size="15"> function, the </span><code data-position="29489" data-size="18">per-capita-summary</code><span data-position="29508" data-size="38"> column should contain the sum of the </span><em data-position="29546" data-size="0"><span data-position="29547" data-size="23">total stores per capita</span></em><span data-position="29571" data-size="39"> across all the counties in that state.</span></p> <p data-position="29613" data-size="0"><span data-position="29613" data-size="26">The person who calls your </span><code data-position="29640" data-size="17">summary-generator</code><span data-position="29658" data-size="89"> function will indicate which summary method to use by passing another function as input.</span></p> <p data-position="29749" data-size="0"><span data-position="29749" data-size="8">For the </span><code data-position="29758" 
data-size="17">summary-generator</code><span data-position="29776" data-size="36"> function, use the following header:</span></p> <pre><code>fun summary-generator(t :: Table, summary-func :: (Table, String -&gt; Number)) -&gt; Table:
  doc: ```Produces a table that uses the given function to summarize stores per capita across counties. The outputted table should also have total number of grocery and convenience stores for every state.```
  ...
end
</code></pre> <p data-position="30144" data-size="0"><span data-position="30144" data-size="24">This might be called as </span><code data-position="30169" data-size="31">summary-generator(mytable, sum)</code><span data-position="30201" data-size="4"> or </span><code data-position="30206" data-size="32">summary-generator(mytable, mean)</code><span data-position="30240" data-size="35"> to summarize the total or average stores per capita across the counties of each state represented in </span><code data-position="30350" data-size="7">mytable</code><span data-position="30358" data-size="1">.</span></p> <p data-position="30362" data-size="0"><strong data-position="30362" data-size="0"><span data-position="30364" data-size="4">Note</span></strong><span data-position="30370" data-size="2">: </span><code data-position="30373" data-size="3">sum</code><span data-position="30377" data-size="5"> and </span><code data-position="30383" data-size="4">mean</code><span data-position="30388" data-size="171"> here are built-in functions (that you do not write), as described above. 
Passing a function as an argument is like what you have done when using table operations like </span><code data-position="30560" data-size="12">build-column</code><span data-position="30573" data-size="2">, </span><code data-position="30576" data-size="11">filter-with</code><span data-position="30588" data-size="6">, and </span><code data-position="30595" data-size="16">transform-column</code><span data-position="30612" data-size="1">.</span></p> <p data-position="30615" data-size="0"><span data-position="30615" data-size="5">Your </span><code data-position="30621" data-size="17">summary-generator</code><span data-position="30639" data-size="10"> function </span><strong data-position="30649" data-size="0"><span data-position="30651" data-size="10">should not</span></strong><span data-position="30663" data-size="68"> reference any tables from outside the function except the provided </span><code data-position="30732" data-size="16">state-abbv-table</code><span data-position="30749" data-size="52">. While producing your output table, you should use </span><code data-position="30802" data-size="16">state-abbv-table</code><span data-position="30819" data-size="101"> as a starting point (to build columns for the output table and to extract data from the input table </span><code data-position="30921" data-size="1">t</code><span data-position="30923" data-size="157">). Also, your output table should not contain any columns other than those shown in the example above: “state”, “abbv”, “num-stores” and “per-capita-summary”.</span></p> <p data-position="31082" data-size="0"><strong data-position="31082" data-size="0"><span data-position="31084" data-size="5">Note:</span></strong><span data-position="31091" data-size="25"> You do not need to test </span><code data-position="31117" data-size="17">summary-generator</code><span data-position="31135" data-size="22">. 
However, please run </span><code data-position="31158" data-size="17">summary-generator</code><span data-position="31176" data-size="138"> twice outside of the function with two different summary functions. Make sure the output makes sense! This will look something like this:</span></p> <pre><code>fun summary-generator(...):
  # your code
end

summary-generator(your-input-table, func1)
summary-generator(your-input-table, func2)
</code></pre> <p data-position="31459" data-size="0"><strong data-position="31459" data-size="0"><span data-position="31461" data-size="5">Hints</span></strong><span data-position="31468" data-size="1">:</span></p> <ol data-position="31472" data-size="0"> <li class="" data-position="31475" data-size="0" data-startline-back="478" data-endline-back="478"><span data-position="31475" data-size="299">You will have to construct an example input table with the column names “state”, “county,” “num-stores-county” and “stores-per-capita-county” yourself. Plan out how you will do this in the design check! It will also help to manually create smaller input tables that you can use while developing the </span><code data-position="31775" data-size="17">summary-generator</code><span data-position="31793" data-size="1">.</span></li> <li class="" data-position="31798" data-size="0" data-startline-back="479" data-endline-back="479"><span data-position="31798" data-size="196">The format of the output suggests that you will have to call the summary function once for every state to generate the specific summary value for that state’s row. The summary function takes in a </span><code data-position="31995" data-size="5">Table</code><span data-position="32001" data-size="7"> and a </span><code data-position="32009" data-size="6">String</code><span data-position="32016" data-size="219">. For each state, what does the input table to the summary function look like in order to get the desired output? It may help to draw out an example table for a specific state. 
Then, think about how to create those </span><code data-position="32236" data-size="5">Table</code><span data-position="32242" data-size="28">s out of the input table to </span><code data-position="32271" data-size="17">summary-generator</code><span data-position="32289" data-size="1">.</span></li> <li class="" data-position="32294" data-size="0" data-startline-back="480" data-endline-back="481"><span data-position="32294" data-size="70">Go back to the analysis questions. Can you use tables created by your </span><code data-position="32365" data-size="17">summary-generator</code><span data-position="32383" data-size="163"> to answer some of those questions? What summary functions would you use? Understanding this question will go a long way in helping you understand the goal of the </span><code data-position="32547" data-size="17">summary-generator</code><span data-position="32565" data-size="36"> function and the entire assignment.</span></li> </ol> </details> <details class="part" data-startline="484" data-endline="524" data-position="32608" data-size="1554"><summary>Stencil</summary> <p data-position="32627" data-size="0"><span data-position="32627" data-size="65">Copy and paste the following code to load the dataset into Pyret:</span></p> <pre><code>include tables
include gdrive-sheets
include shared-gdrive("dcic-2021", "1wyQZj_L0qqV9Ekgr9au6RX2iqt2Ga8Ep")
import math as M
import statistics as S
import data-source as DS

google-id = "17OCB7nDBepuvxHrDKB4qMPcI0_UHbTzNwMP_2s0WkXw"

county-population-unsanitized-table = load-spreadsheet(google-id)
county-store-count-unsanitized-table = load-spreadsheet(google-id)
state-abbv-unsanitized-table = load-spreadsheet(google-id)

county-population-table = load-table:
  county :: String, state :: String, population-estimate-2016 :: Number
  source: county-population-unsanitized-table.sheet-by-name("county-population", true)
  sanitize county using DS.string-sanitizer
  sanitize state using DS.string-sanitizer
  sanitize population-estimate-2016 using DS.strict-num-sanitizer
end

county-store-count-table = load-table:
  state :: String, county :: String,
  num-grocery-stores :: Number, num-convenience-stores :: Number
  source: county-store-count-unsanitized-table.sheet-by-name("county-store-count", true)
  sanitize state using DS.string-sanitizer
  sanitize county using DS.string-sanitizer
  sanitize num-grocery-stores using DS.strict-num-sanitizer
  sanitize num-convenience-stores using DS.strict-num-sanitizer
end

state-abbv-table = load-table:
  state :: String, abbv :: String
  source: state-abbv-unsanitized-table.sheet-by-name("state-abbv", true)
  sanitize state using DS.string-sanitizer
  sanitize abbv using DS.string-sanitizer
end
</code></pre> </details>

---

## Deadline 1: The Design Stage

The design check is a 30-minute one-on-one meeting between your team and a TA to review your project plans and to give you feedback well before the final deadline. Many students make changes to their designs following the check: doing so is common and will not cost you points.