<style type="text/css"> .reveal .slides { min-height: 100%; width: 100%; height: 96mm; } .reveal .slides section h1 { color: #ffefef; text-shadow: -1px -1px 0 rgba(230,2,255, 0.7), 2px -2px 0 rgba(230,2,255, 0.7), -3px 3px 0 rgba(230,2,255, 0.7), 3px 3px 0 rgba(230,2,255, 0.7); } .reveal .slides section h3 { color: #ffefef; text-shadow: -1px -1px 0 rgba(230,2,131, 0.7), 1px -1px 0 rgba(230,2,131, 0.7), -2px 2px 0 rgba(230,2,131, 0.7), 2px 2px 0 rgba(230,2,131, 0.7); } .reveal img, .reveal video, .reveal iframe { width: 100%; min-width: 100%; max-height: 100%; } html, .reveal { color: #CFC7BF; } ol, ul { font-size: 3.5rem; } .reveal pre code { display: block; width:100%; line-height: 1.5em; border-radius: 0.9rem; font-size: 2rem; padding: 3rem; overflow: auto; } </style> --- # Final Project ## [Northeastern University](https://www.northeastern.edu/) 27 September 2022 ### by Kelsey Packard --- ![](https://i.imgur.com/5pcy9BS.jpg) --- ### Objectives - Explore data warehouses - Practice using SQL - Understand how data travels as `.csv` files - Look for correlation between maks wearing and public health --- # Mask Implementation during COVID-19 ## Did you wear your mask during the Pandemic? --- ![masks in class](https://i.imgur.com/ctCV7Ag.jpg) --- # SQL queries --- # The Schema of Table 1 --- ![](https://i.imgur.com/Wk66lK9.png) --- # The Schema of Table 2 --- ![](https://i.imgur.com/9O3em0z.png) --- ## I used the following SQL code on BigQuery public data sets --- ```sql SELECT m.*, c.* FROM `bigquery-public-data.covid19_nyt.mask_use_by_county` AS m INNER JOIN `bigquery-public-data.covid19_nyt.us_counties` AS c ON c.county_fips_code = m.county_fips_code; ``` --- ## The first query generated over 2 million records so I tried a different one. Basically, to start with I wanted from table c only the name of the county and the name of the state --- ```sql= SELECT DISTINCT(c.county_fips_code), m.*, c.county, c.state_name FROM `bigquery-public-data.covid19_nyt.mask_use_by_county` AS m INNER JOIN `bigquery-public-data.covid19_nyt.us_counties` AS c ON c.county_fips_code = m.county_fips_code; ``` --- ## I used DISTINCT to only get each county once --- ## Then I uploaded the `.csv` into a Jupyter nootebook --- ![](https://i.imgur.com/zAk631N.png) --- ## Pandas: data exploration 🧭 --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/f896aeb6a0304397b994d3a37be93145?height=134.17897415161133" height="134.17897415161133" width="500"/> --- ## The first graph 👇 --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/b582b3d0-399f-4dee-88a7-309ce0171951/b16620a7-1450-444b-827f-6599d328bb37/8f66c56bef1a4c6fb4f86f368d027716?height=600" height="600" width="500"/> --- ## The Second --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/b582b3d0-399f-4dee-88a7-309ce0171951/b16620a7-1450-444b-827f-6599d328bb37/3d1b0ad0844442af988b4cc97a4cecad?height=600" height="600" width="500"/> --- # The Third --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/3388c81f1a5f4f82b19461d221ba31c2?height=599.9744262695312" height="599.9744262695312" width="500"/> --- ## Some data cleaning --- ![](https://i.imgur.com/nr4FBYe.png) --- ## Choropleth without data cleaning --- ``` import plotly.express as px fig = px.choropleth(mask_use_by_county_df, geojson=counties, locations='fips', color='% always', color_continuous_scale="Viridis", range_color=(0, 12), scope="usa", labels={'county':'county name'} ) fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0}) fig.show() ``` --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/f46d73326ccc4d7e8b253b86e3f9ae5d?height=599.9744262695312" height="599.9744262695312" width="500"/> --- The graph (it is called a choropleth) above has California broken. I Googled aboout FIPS format and found that it had to be formatted as a 5 digit string which is what the code below does. --- ## **Data Cleaning and Manipulation** ``` mask_use_by_county_df['county_fips_code'].apply(lambda x: str(x) if len(str(x))>4 else '0'+str(x)) ``` --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/ef8da8e82c7a4fdeb5b597c2ecbba203?height=599.9957275390625" height="599.9957275390625" width="500"/> --- ## Other types of data viz --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/d2f9038010d0489c9d390932f2c979c5?height=629.9744262695312" height="629.9744262695312" width="500"/> --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/63ca423e7f4d4318872a3196a61aaaea?height=629.9744262695312" height="629.9744262695312" width="500"/> --- ![](https://i.imgur.com/PYHvvqB.png) --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/86423ae5c1da42eb8aa719187909742b?height=599.9744262695312" height="599.9744262695312" width="500"/> --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/40fbbec1d7ed44e1ada634285a101a08?height=599.9744262695312" height="599.9744262695312" width="500"/> --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/581ee4e9e09048df9a06ceb4b9f944cd?height=599.9744262695312" height="599.9744262695312" width="500"/> --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/ab06081dbd8340388665a84a3e2cf82f?height=599.9744262695312" height="599.9744262695312" width="500"/> --- <iframe title="Embedded cell output" src="https://embed.deepnote.com/5e0e7109-c2dd-4701-a62c-9c6f35b1ab25/b16620a7-1450-444b-827f-6599d328bb37/84c3cedf39804a8f8c595c287d8f86b0?height=683.9943237304688" height="683.9943237304688" width="500"/> ---
{"metaMigratedAt":"2023-06-17T10:15:15.659Z","metaMigratedFrom":"YAML","title":"Kelsey Packard presentation","breaks":true,"slideOptions":"{\"theme\":\"dark\",\"transition\":\"fade\"}","contributors":"[{\"id\":\"28161843-f826-46bb-ab16-4f11acb3bfad\",\"add\":8624,\"del\":1006},{\"id\":null,\"add\":5097,\"del\":5744}]"}
    199 views