Chia Shen Tsai
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    --- title: "MLR_lab_CJS" author: "Simon Sharp, Jon Ekberg, Chia Shen Tsai" output: html_document date: "2022-11-13" --- ## Introduction The central question Spotwood et al. investigated is the existence of an association between COVID-19 cases, nature accessibility, and other sociodemographic characteristics. They attempted to utilize ZIP-Code-scale data to examine whether the negative association between COVID- 19 case rates and greenness shown with county-level data in the United States still holds at the finer geospatial level. The variables are all in the scale of ZIP Code Tabulation Areas (ZCTAs). The primary response variable is the number of COVID cases per 100,000 people between March and September 2020 (fetched on 1 October 2020). The explanatory variables include greenness, such as the Normalized Difference Vegetation Index (NDVI) and park access, and sociodemographic features, such as the proportion of White people and people of colour (POC), age, income, and population density. Notably, Spotswood et al. excluded rural areas from their analysis while we kept the rural data in our model. Furthermore, the "Urban" variable is the only binary variable in our dataset. We hypothesized that more green space and higher income would lead to lower Covid-19 case rates. In order to test this hypothesis, we used Covid-19 case rates as our response variable, and selected median income, proportion of white residents, NDVI, percent parks, median age, and aridity, as well as the dummy variable urban, as our explanatory variables. We selected Arizona, Florida, and Maine as the states on which to conduct our analysis. <!-- About ZCTAs: https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html --> ## Summary statistics We chose Florida (n=912), Arizona (n=353), and Maine (n=376) as the analysis target. The distributions of variables across different State varied (See Table1). Overall, Florida has the most COVID-19 cases rates (mean = 3064.34, median = 2611.34), Arizona the second (mean = 2645.94, median = 1968.03), and Maine the least (mean = 238.479, median = 149.115). The greenness also differ across states (mean = 0.32(AZ), 0.58(FL), 0.78(ME)), but the standard deviations (AZ 0.09, Fl 0.10, ME 0.04) show that the difference among each state is relatively small. In terms of population structure, Maine has more White Majority communities, Florida the second, and Arizona the least. In Arizona, the distribution of COVID-19 cases per 100,000 people is highly positively skewed, having a mean of 2645.94 and a standard deviation of 5618.20. For Maine, the distribution of population density is the variable that skewed the most significantly (skewness = 12.558) while all the population density values across the dataset is close to zero in the rough scale. ## Model interpretation and limitations ### Florida Model For Florida, we created our model using the original data with the dependent variable (Covid19) logged. Our group used 8 explanatory variables to describe our dependent variable; we used the variable to describe whether a zip code is urban or rural, we used median income, proportion of white population, vegetation index, percent park, median age, aridity index, and population density. We logged two of the variables: Covid19 (our dependent variable) and popDens (one of our explanatory variables). The results of the model show that the White population proportion is strongly with covid 19 cases in Florida (See Figure 2). The results of the model also show that the vegetation index (NDVI) and aridity is moderatly associated with covid-19 cases in Florida(See Figure 2). For every 1% decrease in proportion white population there is a 1.35 percent increase in the rate of covid 19 cases in the state of Florida holding all other variables constant. For every 1 unit decrease in the vegetation index there is a 1.25 unit increase in the rate of covid 19 cases in the state of Florida holding all other variables constant. And for every 1% decrease in aridity there is a 1.02 percent increase in the rate of covid 19 cases in the state of Florida holding all other variables constant. The adjusted R^2^ value for our Flroida model is 0.2124, which is lower than the other two models we created for Florida, but we decided to use the logged data for our final model since this model has the lowest value in the AIC table (a value of 2,220.13 versus 16,430.77 for the original and scaled data) (See Table 5). Therefore we decided that this model with two of the variables logged is the model that fits our data best. This model has a F-statistic of 30.79 on 8 and 876 degrees of freedom with a very low p-value of 2.2e-16 (See Figure 2). Since this p-value is less than 0.05 there is at least one explanatory variable in our model that is statistically significant with our response variable. ### Arizona Model For Arizona, we used the original data and logged the aridity and the population density variables to establish the linear regression model. One observation was removed due to the unusual occurance of 0 in White population proportion, which is highly possible a mistake from the original data. We selected the model because the adjusted R^2^ (0.458) is the highest compared to models established with scaled data (See Table 7) and the AIC (544.39) is the lowest (see Table 8). The results show that the White population proportion, the aridity and the population density are strongly associated with the number of COVID-19 cases in Arizona (See Table 7). Every 1% decrease in the White population proportion is associated with a 1.34% change in the rate of confirmed cases, holding all other variables constant. Furthermore, every 1% decrease in aridity also asscoiated with a 35% increase in the amounts, and every 1% increase in the population density is moderately associated with a 7.26% increase in COVID-19 cases, holding all other variables constant. The percentage of park area is mildly positively associated with the change in COVID-19 case rates in ZIP code areas. Every 1% increase in the percentage is associated with 2.09% increase in COVID-19 case rates, holding all other variables constant. The p-value of F-statitics is remarkably small (F-statistic=35.47, df=8 and 319, p-value < 0.001), indicating there is at least one explanatory variable associated with the change of the response variable. ### Maine Model For Maine, we used the original data with the response variable logged (Covid-19 case rate), and two explanatory variables logged (NDVI and Population Density). The results of the model show that whether a zip code is urban or rural, the median income, and NDVI, median age, and the population density all have statistically significant impacts on Covid-19 case rates (see Table 14). An area being urban as opposed to rural was associated with an 81% increase in Covid-19 case rates, holding all other variables constant. An increase of Median Income of $1 is associated with a 0.002% increase in Covid-19 case rates, holding all other variables equal. An increase in NDVI of 1% was associated with a decrease of 2.7% in Covid-19 case rates, holding all other variables equal. An increase of one year in median age is associated with a 2.09% increase in Covid-19 case rates, holding all other variables constant. Finally, an increase of 1% in population density is associated with a decrease of 0.13% in Covid-19 case rates. The adjusted R^2^ value for the model was 0.197, which was the highest among the three models I constructed (original data, scaled data, logged data). Additionally, when comparing the models using AIC, the logged model has the lowest value (see Table 13). Therefore, we determined that this model best fit the data. Based on the F-statistic (F-statistic=8.826, df=8 and 248, p-value<0.001) we are able to reject the null hypothesis that all slope coefficients are equal to zero. ### Modelling Limitations The OLS model we used and the negative binomial mixed effect model Spotwood et al. used are potentially confronted with the spatial autocorrelation and endogeneity issues. Spotwood et al. attempted to fix the spatial autocorrelation issue with simultaneous autoregressive models (SAR). For potential endogeneity problem which may omit important variables, they incorporated the Instrument Variable (IV) regression using the two-stage least squares method. ## Assumptions For the Florida model, the results of the Breusch-Pagan test (BP=23.114, df=8, p-value=0.003221) indicate that we cannot reject the null hypothesis that homoscedasticity is present in the model (See Table 3). This result can be confirmed with the residual vs fitted plot for Florida which shows an expected mean close to zero(See Figure 3). The results of the VIF test indicate that there is no multicollinearity between the explanatory variables (VIF scores all lower than 5) (See Table 4). For the Arizona model, given that there is no number larger than 5, the result of VIF test indicates there is no multicollinearity problem (see Table 9). In terms of heteroskedasticity, although the residual v.s. fitted plot (See Figure 7) shows the average of the expected value on each fitted point is close to zero, the Breush-Pagen test manifests a strong possibilty that the variables are heteroskedastic (BP=35.24 ,df=8, p-value<0.001, Table 10). To address the violation of assumption, we transformed the standard error with heteroskedasticity-consistent standard error estimators (Hayes & Cai 2007), the results are shown in Table 6. For the Maine model, the results of the Breusch-Pagan test (BP=10.594, df=8, p-value=0.2258) indicate that we cannot reject the null hypothesis that homoscedasticity is present in the model (see Table 11). This is consistent with the residuals vs. fitted graph for Maine, which shows an expected mean relatively close to zero (see Figure 10). The results of the VIF test indicate that there is no multicollinearity between the explanatory variables (VIF scores all lower than 5) (see Table 12). ## Conclusions The results of our models indicated different impacts of explanatory variables across the states we examined. For example, while NDVI had a statistically significant negative association with COVID-19 in both Florida and Arizona, it did not have a statistically significant impact in Arizona. This could be due to the fact that Arizona is a dryer state with less natural greenery. Aridity also had a different impact of COVID-19 case rates based on the state. In Florida, aridity had a statistically significant positive correlation with COVID-19 case rates in Florida, while it had a statistically significant negative association with case rates in Arizona. Median age also showed different results across different states, with a positive association with COVID-19 case rates in Maine, but a negative association in Florida. Overall, neither variable we were interested in (access to greenery and higher income) showed to have consistent results across all states examined. This suggests that the pattern we were hoping to see does not exist as clearly at such a fine spatial level as zip codes. This does not show the same negative correlation between income, green space, and percent POC that Spotswood et. al. found in their analysis. ## Appendix **Table 1. Summary Table for Explanatory Variables** ![](https://i.imgur.com/dYne9St.png) ### Florida Model Figures **Figure 1.** ![](https://i.imgur.com/RLcedlR.png) **Florida Log-transformed Variables Scatterplot Matrix** **Figure 2.** ![](https://i.imgur.com/U6l32ws.png) Florida Log-transformed Model **Figure 3.** ![](https://i.imgur.com/QaoJacX.png) **Florida Log-transformed Residual vs Fitted** **Figure 4.** ![](https://i.imgur.com/bTbUlfO.png) **Florida Log-transformed Cook's Distance Plot** **Figure 5.** ![](https://i.imgur.com/FWlC02d.png) **Florida Log-transformed Breusch Pagan Test** **Table 2. Florida Log-transformed VIF Test** ![](https://i.imgur.com/BrOEffH.png) **Table 3. Breusch Pagan Table** ![](https://i.imgur.com/CXf65kS.png) **Table 4. VIF Table of Florida Model** ![](https://i.imgur.com/kPNLYhg.png) **Table 5. Florida AIC Comparison** ![](https://i.imgur.com/ZG03f3v.png) **Table 6. Florida Model Comparison** ![](https://i.imgur.com/kgyoLBP.png) ### Arizona Model Figures **Figure 6.** ![](https://i.imgur.com/iLZ88uP.jpg) **Arizona's Scatterplot Matrix** **Figure 7.** ![](https://i.imgur.com/OIU05Oh.png) **Residual v Fitted Plot of Arizona Model without Outlier** **Figure 8.** ![](https://i.imgur.com/lFORAM0.png) **Cook's Distance Plot of Arizona Model without Outlier** **Table 7. Arizona Model Results Comparison** ![](https://i.imgur.com/P3RQaoh.png) **Table 8. Arizona AIC Comparison** ![](https://i.imgur.com/BdxPmj7.png) **Table 9. Arizona VIF test** ![](https://i.imgur.com/JpxO18Y.png) **Table 10. Breusch-Pagen Test of Arizona Model** ![](https://i.imgur.com/yuOpvcj.png) ### Maine Model Figures Figures **Figure 9.** ![](https://i.imgur.com/oDgxGqR.png) Maine Scatter Matrix **Figure 10.** ![](https://i.imgur.com/OL3hjTU.png) Maine Residual vs. Fitted **Table 11. Maine BP Test** ![](https://i.imgur.com/95378WK.png) **Table 12. Maine VIF Test** ![](https://i.imgur.com/ThRVhaa.png) **Table 13. Maine AIC Comparison** ![](https://i.imgur.com/m2Zh3nM.png) **Table 14. Maine Model Comparison** ![](https://i.imgur.com/SsMVn0y.png) ## Reference Hayes, Andrew F., and Li Cai. "Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation." Behavior research methods 39.4 (2007): 709-722. <!-- Aricle Link: https://link.springer.com/content/pdf/10.3758/BF03192961.pdf --> ### Division of labor We each ran all tests and created all graphs and tables for one state. (1) Florida: Jon (2) Maine: Simon (3) Arizona: Chia Shen We all worked together on the writing of the report, the stitching together of the code, and the final troubleshooting and knitting. <!-- # Code ## Communal Code ### Common Environment ```{r libraries, warning= FALSE, message = FALSE} # Require Packages library(GGally) library(devtools) library(dplyr) library(ggplot2) library(ggthemes) library(gridExtra) library(gt) library(knitr) library(moments) library(tidyverse) library(raster) library(janitor) library(sp) library(rgdal) library(spdep) library(car) library(showtext) library(olsrr) library(IDPmisc) library(sjPlot) library(sjmisc) library(sjlabelled) library(lmtest) library(sandwich) rm(list = ls()) ``` ### data preparation: extract ```{r read and scale, warning= FALSE, message = FALSE} states<-readOGR("Covid_Nature_Data_FINAL.shp") states$NDVI_scaled <- as.vector(scale(states$NDVI)) states$PctPark_scaled <- as.vector(scale(states$PctPark)) states$popDens_scaled <- as.vector(scale(states$popDens)) states$medAge_scaled <- as.vector(scale(states$medAge)) states$Days_scaled <- as.vector(scale(states$Days)) states$propWht_scaled <- as.vector(scale(states$propWht)) states$propPOC_scaled <- as.vector(scale(states$propPOC)) states$medIncome_scaled <- as.vector(scale(states$medInc)) states$BUI_scaled <- as.vector(scale(as.numeric(states$BUI))) states$AridIndex_scaled <- as.vector(scale(states$Arid)) ``` ```{r echo = FALSE, warning= FALSE, message = FALSE} states.df<-states@data head(states.df) ``` ### Summary Table ```{r summary table, warning= FALSE, message = FALSE} summary_states.df<-states.df%>% dplyr::filter(STATE=="Maine"| STATE=="Florida" | STATE=="Arizona") summary_states.df <- summary_states.df[,c(1,2,7,9,10,13,16,17,18,19,20)] ``` ``` {r tranfsorm to long 2, warning= FALSE, message = FALSE} summary_states.long <- summary_states.df%>% tidyr::pivot_longer(cols=-c(STATE, County), names_to="variable",values_to="value") head(summary_states.long) ``` ``` {r summary table 2} summary.table<-summary_states.long%>% group_by(STATE, variable)%>% dplyr::summarize( obs=length(value[!is.na(value)]), max=round(max(value, na.rm=TRUE),3), min=round(min(value, na.rm=TRUE),3), mean=round(mean(value, na.rm=TRUE),3), median=round(median(value, na.rm=TRUE),3), sd=round(sd(value, na.rm=TRUE),3), skew=round(skewness(value, na.rm = TRUE),3) ) head(summary.table,20) ``` ```{r summary table gt} summary.table %>% gt() %>% tab_header( title = md("Summary Statistics of COVID-19 Case Rates and All Explanatory Variables in Arizona, Florida, and Maine")) %>% fmt_passthrough (columns=c(`STATE`)) %>% fmt_number(columns = c(obs), decimals = 0) %>% fmt_number(columns = c(max), decimals = 2) %>% fmt_number(columns = c(min), decimals = 2) %>% fmt_number(columns = c(mean), decimals = 2) %>% fmt_number(columns = c(median), decimals = 2) %>% fmt_number(columns = c(sd), decimals = 2) %>% fmt_number(columns = c(skew), decimals = 2)%>% cols_label( STATE="State", obs = "Observations", max = "Maximum", min = "Minimum", mean = "Mean", median = "Median", sd = "SD", skew = "Skewness") ``` ### data preparation: state filter ``` {r maine, warning= FALSE, message = FALSE} maine.df <- filter(states.df, states.df$STATE == "Maine") head(maine.df) ``` ``` {r maine log transformations, warning= FALSE, message = FALSE} maine.df.log<-maine.df%>% mutate(Covid19.log=log(Covid19), NDVI.log=log(NDVI), popDens.log=log(popDens) ) maine.df.log ``` ``` {r maine remove INF, warning= FALSE, message = FALSE} maine.df.log$Covid19.log[is.infinite(maine.df.log$Covid19.log)] <- NA ``` ## Arizona Model (Chia Shen) ```{r arizona, warning= FALSE, message = FALSE} Arizona.df <- filter(states.df, states.df$STATE == "Arizona") head(Arizona.df) AZ.df <- dplyr::select(Arizona.df, County, STATE, Urban, medInc, propWht, NDVI, PctPark, medAge, Arid, popDens, Covid19) head(AZ.df) AZ.s.df <- dplyr::select(Arizona.df, County, STATE, Urban, medIncome_scaled, propWht_scaled, NDVI_scaled, PctPark_scaled, medAge_scaled, AridIndex_scaled, popDens_scaled, Covid19) head(AZ.s.df) AZ.df[c(201,224),] AZ.df[AZ.df$log.Covid19 < 0.001,] ``` ### matrix ```{r matrix, warning= FALSE, message = FALSE} AZ.m.df <- dplyr::select(AZ.df, c(3,4,5,6,7,8,9,10,11)) ggpairs(AZ.m.df, columns = 1:9, title = "Original Variables Correlation Matrix", axisLabels = "show") AZ.m.s.df <- dplyr::select(AZ.s.df, c(3,4,5,6,7,8,9,10,11)) ggpairs(AZ.m.s.df, columns = 1:9, title = "Scaled Variables Correlation Matrix", axisLabels = "show") ``` ### model 1.1: original, log-transform some variables, remove NA ```{r log-transformed model, warning= FALSE, message = FALSE} AZ.df$log.Arid <- log(AZ.df$Arid) AZ.df$log.popDens <- log(AZ.df$popDens) AZ.df$log.Covid19 <- log(AZ.df$Covid19) AZ.m.df <- dplyr::select(AZ.df, c(3,4,5,6,7,8,12,13,14)) ggpairs(AZ.m.df, columns = 1:9, title = "Log-transformed Variables Correlation Matrix", axisLabels = "show") AZ.df[which(is.infinite(AZ.m.df$log.Covid19)),] AZ.m.df$log.Covid19[is.infinite(AZ.m.df$log.Covid19)] <- NA #There are actually Covid cases in these ZCTA, but somehow they are 0 in the data ggpairs(AZ.m.df, columns = 1:9, title = "Log-transformed Variables Correlation Matrix", axisLabels = "show") ``` ```{r Logged orginal model, warning= FALSE, message = FALSE} f1 <- log.Covid19 ~ Urban + medInc + propWht + NDVI + PctPark + medAge + log.Arid + log.popDens m1 <- lm(f1, data=AZ.m.df, na.action = na.exclude) #Exclude the data that has NA in it summary(m1) ``` ```{r M1.1 Assumption test, warning= FALSE, message = FALSE} vif(m1) plot(m1,1) plot(m1,4) #cook's distance ols_plot_cooksd_bar(m1) bptest(m1) #breusch-pagen test result reflects heteroskedasticiy problem coeftest(m1, vcov = vcovHC(m1, type = "HC1")) ``` ### model 1.2: original, log-transform some variables, remove NA, outliers ```{r inspect and exclude outlier in log data, warning= FALSE, message = FALSE} AZ.df[c(201,224),] #Yuma actually have much more White population, not 0 as the data indicates AZ.m.df.nol <- AZ.m.df[-c(224),] m1.nol <- lm(f1, data=AZ.m.df.nol) summary(m1.nol) AZ.vif <- vif(m1.nol) plot(m1.nol,1) ols_plot_cooksd_bar(m1.nol) #cook's distance bptest(m1.nol) AZ.bp <- data.frame( bp = as.numeric(c(35.24)), df = as.numeric(c(8)), "p-value" = as.numeric(c(0.00002)) ) AZ.bp %>% gt() %>% tab_options( table.width = pct(80)) %>% cols_width( everything() ~ px(40)) %>% tab_header( title = md("BP Test Result of Arizona Model")) AZ.vifp <- as.data.frame(t(as.data.frame(AZ.vif))) AZ.vifp %>% gt() %>% tab_options( table.width = pct(80)) %>% cols_width( everything() ~ px(80)) %>% tab_header( title = md("VIF of Arizona Model"))%>% fmt_number(columns = c(Urban), decimals = 2) %>% fmt_number(columns = c(medInc), decimals = 2) %>% fmt_number(columns = c(propWht), decimals = 2) %>% fmt_number(columns = c(NDVI), decimals = 2) %>% fmt_number(columns = c(PctPark), decimals = 2)%>% fmt_number(columns = c(medAge), decimals = 2) %>% fmt_number(columns = c(log.Arid), decimals = 2) %>% fmt_number(columns = c(log.popDens), decimals = 2) ``` ```{r robust standard error for m1.nol, warning= FALSE, message = FALSE} coeftest(m1.nol, vcov = vcovHC(m1.nol, type = "HC1")) #source: https://www.r-econometrics.com/methods/hcrobusterrors/ ``` ### model 2.1: scaled ```{r scaled model, warning= FALSE, message = FALSE} f2 <- Covid19 ~ Urban + medIncome_scaled + propWht_scaled + NDVI_scaled + PctPark_scaled + medAge_scaled + AridIndex_scaled + popDens_scaled m2 <- lm(f2, data=AZ.m.s.df) summary(m2) ``` ```{r Assumption test for scaled model AZ, warning= FALSE, message = FALSE} bptest(m2) vif(m2) plot(m2,1) plot(m2,4) #cook's distance AZ.df[224,] #Yuma's propWht is 0, which is not reasonable coeftest(m2, vcov = vcovHC(m1.nol, type = "HC1")) ``` ### model 2.2: scaled, remove outlier ```{r remove outlier, warning= FALSE, message = FALSE} AZ.m.s.df.nol <- AZ.m.s.df[-c(224),] m2.nol <- lm(f2, data=AZ.m.s.df.nol) summary(m2.nol) bptest(m2.nol) vif(m2.nol) plot(m2.nol,1) ols_plot_cooksd_bar(m2.nol) ``` ```{r robust standard error for m2.nol, warning= FALSE, message = FALSE} coeftest(m2.nol, vcov = vcovHC(m2.nol, type = "HC1")) ``` ```{r AIC comparison, warning= FALSE, message = FALSE} AIC_AZ <- AIC(m1.nol,m2,m2.nol) mr <- c("Logged Model Without Outlier","Scaled Model","Scaled Model Without Outlier") AIC_AZ <- cbind(AIC_AZ,mr) AIC_AZ = AIC_AZ[,c(3,1,2)] AIC_AZ ``` ```{r model comparison table, warning= FALSE, message = FALSE} AIC_AZ %>% gt() %>% tab_options( table.width = pct(80)) %>% cols_width( everything() ~ px(80)) %>% tab_header( title = md("AIC for Arizona Models")) %>% fmt_number(columns = c(df), decimals = 0) %>% fmt_number(columns = c(AIC), decimals = 2)%>% cols_label( mr = "Model") tab_model(m1.nol,m2,m2.nol, dv.labels = c("Logged Data without Outlier","Scaled Data", "Scaled Data without Outlier"), vcov.fun = "HC1", show.se = TRUE, CSS = list(css.table = '+font-family: Arial;')) #source:https://cran.r-project.org/web/packages/sjPlot/vignettes/tab_model_estimates.html #change the font: https://stackoverflow.com/questions/73103366/how-to-change-font-style-to-arial-for-tab-model-output ``` ## Jon Ekberg's Florida Model ### Scaled Variables ```{r scaling, warning= FALSE, message = FALSE} # Scale all variables states$NDVI_scaled <- as.vector(scale(states$NDVI)) states$PctPark_scaled <- as.vector(scale(states$PctPark)) states$popDens_scaled <- as.vector(scale(states$popDens)) states$medAge_scaled <- as.vector(scale(states$medAge)) states$Days_scaled <- as.vector(scale(states$Days)) states$propWht_scaled <- as.vector(scale(states$propWht)) states$propPOC_scaled <- as.vector(scale(states$propPOC)) states$medIncome_scaled <- as.vector(scale(states$medInc)) states$BUI_scaled <- as.vector(scale(as.numeric(states$BUI))) states$AridIndex_scaled <- as.vector(scale(states$Arid)) ``` ### Loading Dataframes ```{r, warning= FALSE, message = FALSE} table(states.df$STATE) ``` ```{r, warning= FALSE, message = FALSE} states.df<-states@data head(states.df) ``` ```{r, warning= FALSE, message = FALSE} Florida.df <- dplyr::filter(states.df, states.df$STATE == "Florida") head(Florida.df) ``` ```{r, warning= FALSE, message = FALSE} FL.df <- dplyr::select(Florida.df, County, STATE, Urban, medInc, propWht, NDVI, PctPark, medAge, Arid, popDens, Covid19) head(FL.df) ``` ```{r, warning= FALSE, message = FALSE} FL.m.df <- dplyr::select(Florida.df, Urban, medInc, propWht, NDVI, PctPark, medAge, Arid, popDens, Covid19) ``` ```{r, warning= FALSE, message = FALSE} FL.s.df <- dplyr::select(Florida.df, County, STATE, Urban, medIncome_scaled, propWht_scaled, NDVI_scaled, PctPark_scaled, medAge_scaled, AridIndex_scaled, popDens_scaled, Covid19) head(FL.s.df) ``` ```{r, warning= FALSE, message = FALSE} FL.m.s.df <- dplyr::select(Florida.df, Urban, medIncome_scaled, propWht_scaled, NDVI_scaled, PctPark_scaled, medAge_scaled, AridIndex_scaled, popDens_scaled, Covid19) head(FL.m.s.df) ``` library(GGally) ### Florida Scatterplot Matrix 1 (Original Variables) ```{r scatter matrix 1, warning= FALSE, message = FALSE} FL.Matrix.Original<-c("Urban","medInc","propWht","NDVI","PctPark","medAge","Arid","popDens","Covid19") ggpairs(FL.m.df,columns=FL.Matrix.Original,title="Florida Original Variables Correlation Matrix",axisLabels="show") ``` ![](https://i.imgur.com/rzU9vCm.jpg) ### Florida Scatterplot Matrix 2 (Scaled Variables) ```{r scatter matrix 2, warning= FALSE, message = FALSE} FL.Matrix.Scaled <-c("Urban","medIncome_scaled","propWht_scaled","NDVI_scaled","PctPark_scaled","medAge_scaled","AridIndex_scaled","popDens_scaled","Covid19") ggpairs(FL.m.s.df,columns=FL.Matrix.Scaled,title="Florida Scaled Variables Correlation Matrix",axisLabels="show") ``` ![](https://i.imgur.com/7dZGXTL.jpg) ### Florida Scatterplot Matrix 3 (Logged Variables) ```{r scatter matrix 3, warning= FALSE, message = FALSE} FL.df$log.popDens <- log(FL.df$popDens) FL.df$log.Covid19 <- log(FL.df$Covid19) FL.Matrix.Logged <-c("Urban","medInc","propWht","NDVI","PctPark","medAge","Arid","log.popDens","log.Covid19") ggpairs(FL.df,columns=FL.Matrix.Logged,title="Florida Logged Variables Correlation Matrix",axisLabels="show") FL.df$log.popDens[is.infinite(FL.df$log.popDens)] <-0 FL.df$log.Covid19[is.infinite(FL.df$log.Covid19)] <- NA ggpairs(FL.df, columns = FL.Matrix.Logged, title = "Florida Log-transformed Variables Correlation Matrix", axisLabels = "show") ``` ![](https://i.imgur.com/RLcedlR.png) ### Florida Model 1 (Original Variables) ```{r florida model 1, warning= FALSE, message = FALSE} FL.Model.Original<- lm(Covid19 ~ Urban+medInc+propWht+NDVI+PctPark+medAge+Arid+popDens, data=FL.df) summary(FL.Model.Original) ``` ![](https://i.imgur.com/ocGCKFU.png) ```{r florida model 1 residual vs fitted, warning= FALSE, message = FALSE} plot(FL.Model.Original,1) ``` ![](https://i.imgur.com/mWWGAmN.png) ```{r florida model 1 cooks distance, warning= FALSE, message = FALSE} plot(FL.Model.Original,4) ``` ![](https://i.imgur.com/PrnVsef.png) ```{r florida model 1 VIF test, warning= FALSE, message = FALSE} vif(FL.Model.Original) ``` ![](https://i.imgur.com/KMu8TkG.png) ```{r florida model 1 BP test, warning= FALSE, message = FALSE} bptest(FL.Model.Original) ``` ![](https://i.imgur.com/QLMKn1O.png) ### Florida Model 2 (Scaled Variables) ```{r florida model 2 scaled, warning= FALSE, message = FALSE} FL.Model.Scaled<- lm(Covid19 ~ Urban+medIncome_scaled+propWht_scaled+NDVI_scaled+PctPark_scaled+medAge_scaled+AridIndex_scaled+popDens_scaled, data=FL.m.s.df) summary(FL.Model.Scaled) ``` ![](https://i.imgur.com/jApAyn6.png) ```{r florida model 2 residual vs fitted, warning= FALSE, message = FALSE} plot(FL.Model.Scaled,1) ``` ![](https://i.imgur.com/sWA4y53.png) ```{r florida model 2 cooks distance, warning= FALSE, message = FALSE} plot(FL.Model.Scaled,4) ``` ![](https://i.imgur.com/edQQwwS.png) ```{r florida model 2 VIF test, warning= FALSE, message = FALSE} vif(FL.Model.Scaled) ``` ![](https://i.imgur.com/efymSoS.png) ```{r florida model 2 BP test, warning= FALSE, message = FALSE} bptest(FL.Model.Scaled) ``` ![](https://i.imgur.com/ldWtdEj.png) ### Florida Model 3 (Logged Some Original Variables) ```{r florida model logged, warning= FALSE, message = FALSE} Florida.model.logged<- lm(log.Covid19 ~ Urban+medInc+propWht+NDVI+PctPark+medAge+Arid+log.popDens, data=FL.df, na.action = na.exclude) summary(Florida.model.logged) ``` ![](https://i.imgur.com/U6l32ws.png) ```{r florida model 3 residual vs fitted, warning= FALSE, message = FALSE} plot(Florida.model.logged,1) ``` ![](https://i.imgur.com/QaoJacX.png) ```{r florida model 3 cooks distance, warning= FALSE, message = FALSE} plot(Florida.model.logged,4) ``` ![](https://i.imgur.com/bTbUlfO.png) ```{r florida model 3 VIF test, warning= FALSE, message = FALSE} FL.vif <- vif(Florida.model.logged) FL.vif ``` ![](https://i.imgur.com/BrOEffH.png) ```{r florida model 3 BP test, warning= FALSE, message = FALSE} bptest(Florida.model.logged) ``` ![](https://i.imgur.com/FWlC02d.png) ```{r florida combined bp vif, warning= FALSE, message = FALSE} FL.bp <- data.frame( bp = c(19.307), df = c(7), pvalue = c(0.007277) ) FL.bp %>% gt() %>% tab_options( table.width = pct(80)) %>% cols_width( everything() ~ px(40)) %>% tab_header( title = md("BP Test Result of Florida Model") ) FL.vifp <- as.data.frame(t(as.data.frame(FL.vif))) FL.vifp %>% gt() %>% tab_options( table.width = pct(80)) %>% cols_width( everything() ~ px(80)) %>% tab_header( title = md("VIF of Florida Model"))%>% fmt_number(columns = c(medInc), decimals = 2) %>% fmt_number(columns = c(propWht), decimals = 2) %>% fmt_number(columns = c(NDVI), decimals = 2) %>% fmt_number(columns = c(PctPark), decimals = 2)%>% fmt_number(columns = c(medAge), decimals = 2) %>% fmt_number(columns = c(Arid), decimals = 2) %>% fmt_number(columns = c(log.popDens), decimals = 2) ``` ![](https://i.imgur.com/uK3SSdh.png) ![](https://i.imgur.com/kPNLYhg.png) ### Florida AIC Comparison ```{r florida AIC, warning= FALSE, message = FALSE} AIC_FL <- AIC(FL.Model.Original,FL.Model.Scaled,Florida.model.logged) AIC_FL$model <- c("Original Data", "Scaled Data", "Log Transformed Data") AIC_FL <- relocate(AIC_FL,"model",) AIC_FL AIC_FL %>% gt() %>% tab_options( table.width = pct(80)) %>% cols_width( everything() ~ px(80)) %>% tab_header( title = md("AIC for Florida Models")) %>% fmt_number(columns = c(df), decimals = 0) %>% fmt_number(columns = c(AIC), decimals = 2)%>% cols_label( model = "Model") tab_model(FL.Model.Original,FL.Model.Scaled,Florida.model.logged, vcov.fun = "HC1", dv.labels=c("Original Data","Scaled Data","Log Transformed Data"),show.se = TRUE, CSS = list(css.table = '+font-family: Arial;')) #source:https://cran.r-project.org/web/packages/sjPlot/vignettes/tab_model_estimates.html #change the font: https://stackoverflow.com/questions/73103366/how-to-change-font-style-to-arial-for-tab-model-output ``` ![](https://i.imgur.com/ZG03f3v.png) ![](https://i.imgur.com/kgyoLBP.png) ## Maine Model (Simon) ### Scatter Matrixes ### Original Data ``` {r scatter matrix original maine, warning= FALSE, message = FALSE} columns.select.original<-c("Urban","medInc","propWht","NDVI","PctPark","medAge","Arid","popDens","Covid19") ggpairs(maine.df,columns=columns.select.original,title="",axisLabels="show") ``` ![](https://i.imgur.com/WPYAMqH.png) ### Scaled Data ``` {r scatter matrix scaled maine, warning= FALSE, message = FALSE} columns.select.s<-c("Urban","medIncome_scaled","propWht_scaled","NDVI_scaled","PctPark_scaled","medAge_scaled","AridIndex_scaled","popDens_scaled","Covid19") ggpairs(maine.df,columns=columns.select.s,title="",axisLabels="show") ``` ![](https://i.imgur.com/UB0axer.png) ### Logged Data ``` {r remove INF, warning= FALSE, message = FALSE} maine.df.log$Covid19.log[is.infinite(maine.df.log$Covid19.log)] <- NA ``` ``` {r scatter matrix logged maine, warning= FALSE, message = FALSE} columns.select.logged<-c("Urban","medInc","propWht","NDVI.log","PctPark","medAge","Arid","popDens.log","Covid19.log") ggpairs(maine.df.log,columns=columns.select.logged,title="",axisLabels="show") ``` ![](https://i.imgur.com/oDgxGqR.png) ### model 1: original data ``` {r linear model original maine, warning= FALSE, message = FALSE} model.original.ME<- lm(Covid19 ~ Urban+medInc+propWht+NDVI+PctPark+medAge+Arid+popDens, data=maine.df) summary(model.original.ME) ``` ![](https://i.imgur.com/oDvtavZ.png) ``` {r residual vs fitted original maine, warning= FALSE, message = FALSE} plot(model.original.ME,1) plot(model.original.ME,4) ``` ![](https://i.imgur.com/3V3hUi5.png) ![](https://i.imgur.com/hpFIS0S.png) ### model 2: scaled data ``` {r linear model scaled maine, warning= FALSE, message = FALSE} model.scaled.ME<- lm(Covid19 ~ Urban+medIncome_scaled+propWht_scaled+NDVI_scaled+PctPark_scaled+medAge_scaled+AridIndex_scaled+popDens_scaled, data=maine.df) summary(model.scaled.ME) ``` ![](https://i.imgur.com/eMiJ2G0.png) ``` {r residual vs fitted scaled maine, warning= FALSE, message = FALSE} plot(model.scaled.ME,1) plot(model.scaled.ME,4) ``` ![](https://i.imgur.com/FvkN95M.png) ![](https://i.imgur.com/G1PwolU.png) ### model 3: log-transformed data, excl. NA ``` {r linear model log transformed maine, warning= FALSE, message = FALSE} model.logged.ME<- lm(Covid19.log ~ Urban+medInc+propWht+NDVI.log+PctPark+medAge+Arid+popDens.log, data=maine.df.log, na.action = na.exclude) summary(model.logged.ME) ``` ![](https://i.imgur.com/4qB9dwS.png) ``` {r residuals vs fitted logged, warning= FALSE, message = FALSE} plot(model.logged.ME,1) plot(model.logged.ME,4) ``` ![](https://i.imgur.com/OL3hjTU.png) ![](https://i.imgur.com/ZethF9N.png) ``` {r bp test logged maine, warning= FALSE, message = FALSE} bptest(model.logged.ME) ``` ``` {r bp test table, warning= FALSE, message = FALSE} bp_ME <- data.frame( BP = c(10.59), df = c(8), "p-value" = c(0.2258) ) bp_ME %>% gt() %>% tab_options( table.width = pct(80)) %>% cols_width( everything() ~ px(80)) %>% tab_header( title = md("Breusch-Pagan Test for Maine Model")) ``` ``` {r vif test logged maine, warning= FALSE, message = FALSE} vif(model.logged.ME) ``` ``` {r vif test table, warning= FALSE, message = FALSE} vif_ME <- data.frame( Urban = c(1.65), medInc = c(1.46), propWht = c(1.33), NDVI.log = c(1.92), PctPark = c(1.12), medAge = c(1.38), Arid = c(1.60), popDens.log = c(2.74) ) vif_ME %>% gt() %>% tab_options( table.width = pct(80)) %>% cols_width( everything() ~ px(100)) %>% tab_header( title = md("VIF Test for Maine Model")) ``` ### Maine AIC Comparison ```{r maine AIC comparison, warning= FALSE, message = FALSE} AIC_ME <- AIC(model.original.ME,model.scaled.ME,model.logged.ME) AIC_ME$model <- c("Original Data", "Scaled Data", "Log Transformed Data") AIC_ME <- relocate(AIC_ME,"model",) AIC_ME ``` ```{r maine AIC comparison table, warning= FALSE, message = FALSE} AIC_ME %>% gt() %>% tab_options( table.width = pct(80)) %>% cols_width( everything() ~ px(80)) %>% tab_header( title = md("AIC for Maine Models")) %>% fmt_number(columns = c(df), decimals = 0) %>% fmt_number(columns = c(AIC), decimals = 2)%>% cols_label( model = "Model") tab_model(model.original.ME,model.scaled.ME,model.logged.ME, dv.labels = c("Original Data", "Scaled Data","Logged Data"), show.se = TRUE, CSS = list(css.table = '+font-family: Arial;')) ``` -->

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully