---
title: "Studio 12: Visualizing Election Data"
layout: post
label: studio
tags: studio
---

# CS100: Studio 12

### Visualizing Election Data

##### December 7, 2022

### Instructions

During today’s studio, you will be creating data visualizations in R. Please write all of your code, and answers to the questions, in an R Markdown document.

### Objectives

By the end of this studio, you will know:

- what FIPS is

By the end of this studio, you will be able to:

- plot all the counties in a single state, or the entire United States, colored by election results

### Data

[Here](https://cs.brown.edu/courses/cs100/studios/data/9/nyt-election2016-county-result.csv) are the results of the 2016 Presidential Election, as reported by the New York Times. Read in the data as follows:

~~~
results <- read.csv('nyt-election2016-county-result.csv', na.strings = 'NULL')
~~~

Look over `results`. Observe the different observations and variables, as well as the latter’s types.

Your goal in today’s studio is to create a map of the US in which counties are colored either red or blue depending on how their constituents voted. (Blue for Democrat, and red for Republican.)

### Mapping Election Results with the "maps" Package

You’ll be visualizing the results of the 2016 presidential election with the `maps` package. Please install and load this package now:

~~~
install.packages("maps")
library(maps)
~~~

To create a vector of the appropriate colors, you can use an `ifelse` statement. Here is the generic form of `ifelse`:

~~~
ifelse(test, yes, no)
~~~

Here, `test` is the predicate, `yes` is the value that should be returned if `test` is `TRUE`, and `no` is the value that should be returned if `test` is `FALSE`.

In our case, the following `ifelse` statement creates the appropriate color vector. We store it both as a column of `results` (so that it is carried along when we merge data frames later on) and as a standalone vector:

~~~
results$county_color <- ifelse(results$PercentGOP2016 > results$PercentDEM2016, "red", "blue")
county_color <- results$county_color
~~~

Seems like we’re on the right track.
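
As a quick illustration of how `ifelse` behaves, here is a minimal sketch on a made-up three-row data frame. The name `toy` and all of its values are hypothetical; only the two percentage column names mirror those in `results`. It also shows what happens when one of the percentages is missing.

```r
# Hypothetical toy data standing in for `results` (values are invented)
toy <- data.frame(
  county = c("A", "B", "C"),
  PercentGOP2016 = c(0.60, 0.35, NA),
  PercentDEM2016 = c(0.35, 0.60, 0.50)
)

# ifelse is vectorized: the test is evaluated once per row
toy_color <- ifelse(toy$PercentGOP2016 > toy$PercentDEM2016, "red", "blue")
print(toy_color)
# [1] "red"  "blue" NA

# A comparison involving NA is itself NA, so county "C" receives no color
print(is.na(toy_color))
# [1] FALSE FALSE  TRUE
```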
Let’s try plotting our results:

~~~
map('county', col = county_color, fill = TRUE, lty = 0)
map('county', regions = "Massachusetts", col = county_color, fill = TRUE, lty = 0)
map('county', regions = "Rhode Island", col = county_color, fill = TRUE, lty = 0)
~~~

Hmmm... something’s wrong. [Not a single county in Massachusetts went for Trump](https://en.wikipedia.org/wiki/United_States_presidential_election_in_Massachusetts,_2016), and in Rhode Island, [all the coastal counties went for Hillary, while inland counties went for Trump](http://www.providencejournal.com/news/20161112/two-rhode-islands-blue-state-red-state) (more or less mirroring the behavior of the country itself).

The Federal Information Processing Standard (FIPS) assigns codes to geographical locations (regions, countries, states, counties, etc.) for data processing. There are two FIPS databases in the `maps` package: `state.fips` and `county.fips`.

The counties listed in `county.fips` are precisely those that are mapped by the `map` function. How many of these are there? (*Hint:* Use `nrow`.)

Let’s make sure that we have the same number of counties in `results` as there are in `county.fips`. Use `nrow` to count the observations in `results`.

Well, would you look at that? The data are not aligned! That’s at least part of the reason our maps didn’t look right. Let’s fix this problem.

The `%in%` predicate tests whether a datum is an element of a vector or, more generally, which elements of a first vector appear in a second. For example, `1 %in% c(1, 2, 3)` evaluates to `TRUE`. Similarly, `c(-1, 1, 3) %in% c(1, 2, 3)` evaluates to `[1] FALSE TRUE TRUE`.

Apply `%in%` to the vectors `results$CountyFips` and `county.fips$fips`. Do it twice, so that you can determine which entries in the first vector are also in the second, and vice versa.

It should not bother us too much if there are counties in `results` that do not also appear in `county.fips`.
These counties won’t show up on our plots, but there are plenty of other counties that will, so we can safely ignore these missing data. On the other hand, we do have to figure out a way to handle counties that appear in `county.fips` for which we do not have results.

Which county is present in `county.fips`, but missing from `results`? Store this information, so you can rectify this situation in a moment.

The next step is to merge the results with `county.fips` on the `fips` variable. Because the merge carries over the columns of `results`, the colors must be stored as a column of `results` (that is, as `results$county_color`) at this point. Rename the FIPS variable in `results` (the `%>%` pipe and `rename` come from the `dplyr` package), and then merge as follows:

~~~
library(dplyr)
results <- results %>% rename(fips = CountyFips)
# all.x = TRUE keeps every row of county.fips, even those without results
merged_results <- merge(x = county.fips, y = results, by = "fips", all.x = TRUE)
~~~

In `merged_results`, there is one county without a color, since there were no results for this county. Assign this county the color `white`.

What we would like to do at this point is map `county`, coloring the counties according to the `county_color` variable we defined earlier:

~~~
map('county', col = merged_results$county_color, fill = TRUE, lty = 1)
~~~

Unfortunately, doing so again yields nonsense. To see why, define `map_names` as follows:

~~~
map_names <- map('county', plot = FALSE)$names
~~~

This vector stores the names of all the counties in the order in which `map` expects them. The problem is that the counties (and hence their colors) are not listed in the same order in `merged_results` as they are in `map_names`. So, we need to reorder `county_color` before sending it off to `map`, so that the colors are listed in this order.

To do so, we will create a `sorted_colors` vector, which reorders all the counties’ colors so that they correspond to the order in which counties are listed in `map_names`. Below is code that accomplishes this task. It is *really* confusing, so feel free to call over a TA for an explanation.

~~~
# For each name in map_names, find its row number in merged_results
map_index <- match(map_names, merged_results$polyname)
# Pull out the colors in that order
sorted_colors <- merged_results$county_color[map_index]
~~~

And that’s it.
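
Because the `match` step is the trickiest part of this studio, here is a minimal sketch of the same reordering idea on made-up data. The names `toy_map_names` and `toy_merged`, and the colors assigned to them, are hypothetical stand-ins for `map_names` and `merged_results`; they are not taken from the real data.

```r
# Hypothetical miniature versions of map_names and merged_results
toy_map_names <- c("kentucky,adair", "kentucky,allen", "kentucky,anderson")
toy_merged <- data.frame(
  polyname = c("kentucky,anderson", "kentucky,adair", "kentucky,allen"),
  county_color = c("blue", "red", "white"),
  stringsAsFactors = FALSE
)

# match(x, table) returns, for each element of x, its position in table
toy_index <- match(toy_map_names, toy_merged$polyname)
print(toy_index)
# [1] 2 3 1

# Indexing the colors by those positions puts them in map order
toy_sorted <- toy_merged$county_color[toy_index]
print(toy_sorted)
# [1] "red"   "white" "blue"
```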
At this point, you can create a map of the United States with all counties for which the NYT obtained results in the 2016 presidential election colored either red or blue:

~~~
map('county', col = sorted_colors, fill = TRUE, lty = 0)
~~~

If you want, you can insert a thick, black outline around each state:

~~~
map('state', col = 'black', fill = FALSE, lty = 1, lwd = 3, add = TRUE)
~~~

That’s a lot of information! To make sense of it all (and to verify that our map is now correct), let’s take a closer look at a few individual states. To do so, we follow the same procedure as above, but restrict `regions` to one state, say Kentucky:

~~~
# County names for Kentucky only, in the order in which map draws them
map_names_state <- map('county', regions = 'Kentucky', plot = FALSE)$names
map_index_state <- match(map_names_state, merged_results$polyname)
sorted_colors_state <- merged_results$county_color[map_index_state]
map('county', regions = 'Kentucky', col = sorted_colors_state, fill = TRUE, lty = 1)
# Mark cities with a population of at least 100,000
map.cities(us.cities, minpop = 100000)
~~~

You’ll notice that Kentucky is primarily red but has two blue counties. This is largely a sign of the rural-urban divide: cities (even when they’re located in red states) tend to be bastions of Democratic support. The last line of the code above, `map.cities(us.cities, minpop = 100000)`, lets us verify this using the built-in `us.cities` database we covered earlier. You should see that the two cities in Kentucky with a population of 100,000+ (Louisville and Lexington) are both in blue counties.

Now create maps for some other states (e.g., Rhode Island and Massachusetts). See if they also show signs of the rural-urban divide. Discuss your overall findings with your partner.

### End of Studio

When you finish, call over a TA to look at your visualizations and check you off for coming to today’s studio.