Tidy Tuesdays Week 1: U.S. Measles Vaccination Data

To start please visit: https://github.com/BEES-Tidy-Tuesdays/home

You will find a link to this collaborative document called "Week 1 notes".

This is a collaborative markdown document: feel free to add, change, and improve it. We will upload the final document to github after this Tidy Tuesday session and use parts of it as a template for future sessions.

If something is unclear or doesn't make sense, fix it, or make a comment.

A. Preparation (~5-10 minutes)

A1. Make sure R and R studio are installed and running

A2. Download and start exploring the data

Please access the data here. Download and extract to a specified folder. We encourage you to start working by creating a new Rproject, and use best practices for file management.

To clone the data set from github using git in RStudio:

  1. Select "New Project"
  2. Select "Version control"
  3. Select "Git"
  4. Paste "https://github.com/WSJ/measles-data" as the URL, and select wher you want to clone the files to on your computer

N.B. You need git installed on your computer, you can download it here:

A2. What are the files? What are the variables?

Files

The data is broken down into a few files:

File Description
all-measles-rates.csv Data for each individual school
state-overview.csv More generalized data by state counties or state school districts
individual-states/[STATE].csv Same data as all-measles-rates, but seperated by state

Variables/Attributes

Attribute Description Optional?
index
state School's state
county School's county y
district School's district y
name School name
type Whether a school is public, private, charter y
enroll Enrollment* y
mmr School's Measles, Mumps, and Rubella (MMR) vaccination rate y
overall School's overall vaccination rate y
xmed Percentage of students exempted from vaccination for medical reasons y
xper Percentage of students exempted from vaccination for personal reasons y
xrel Percentage of students exempted from vaccination for religious reasons y
lat School latitude (only in individual state files)
lng School longitude (only in individual state files)

A3. Pre-processing

  1. Are the data 'tidy'?
  • All of the data (individual states, state overviews, and all measles rate) are tidy.
  • However, the state-overviews.csv, and the all-measles-rate.csv don't have the geographical location information.
  1. What else do we need to do to make this data ready to use?
  • What does -1 mean?
    • Sometimes datasets use "-1" or "-999" instead of NA
  • Do we need to filter out schools with missing data?

B. Think of questions we can ask from the data (10 minutes)

Talk to the people nearest you and brainstorm some questions we can ask with this dataset. What could this data tell us? What are some interesting questions we could ask? How do you plan to visualise it?

Question ideas

  • Are there different rates of vaccinations betwen different types of school?

C. Some plots we've done (30 minutes)

Preprocessing code

Read all data from the individual-states

library(tidyverse)

ind_stats <-
    list.files(path = "individual-states/", # locate the folder
    pattern = "*.csv", # identify the type of files we want to read
    full.names = T) %>% # tell R to give us the whole directory
  map_df(~read_csv(., col_types = cols(.default = "c")))
  
View(ind_stats)

Can we map the where the missing data are?

ind_stats %>% 
  sample_frac(0.2) %>% # random sample just 20% of the data
naniar::vis_miss() # map the missing data

Q1 Do vaccination rates go up over time? (Alex)

Answer
ggplot(data_rates, aes(year, mmr))+
  geom_point()

Hmm, doesn't seem to show much. Only 3 time points.

Q2 Let's see the vaccination rates in the different states (Gian)

Answer
ggplot(state_overviews, aes(fct_reorder(state, mmr), mmr)) +
  geom_boxplot() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) # rotate x axis label

Q3 Does the type of school have an effect on vaccination rates

library(ggbeeswarm)
ind_stats %>% 
  drop_na(type,mmr) %>% 
  mutate(mmr = as.numeric(mmr)) %>% 
  ggplot(., aes(x = fct_reorder(type, (100-mmr)), y = (100-mmr))) +
  geom_quasirandom() +
  theme_bw() +
  scale_y_log10()+
  xlab("School Type")+
  ylab("Vaccination Rate")

note that I had a problem with question 4 plotting because mmr was a character variable - need to change this when reading code (see last line):

ind_stats <-
    list.files(path = "individual-states/",
    pattern = "*.csv", full.names = T) %>%
  map_df(~read_csv(., col_types = cols(.default = "c"))) %>%
  mutate(mmr=as.integer(mmr))

D. Wrap up (10 minutes)

Cool R things I learnt this week

What could be improved from the next meeting?

  • get some fooood

E Help! - ask questions about today's Tidy Tuesday here

Q. How do I change the background?

A. Use theme() https://ggplot2.tidyverse.org/reference/theme.html

F General R help - Stuck on something? Need advice on you current project? Can you help answer someone else's question?

(N.B. this document is public, so don't include sensitive or private information)

Q. How do I use gganimate to transition between two 'sets' of columns? - Alex

Example columns I have: Species, Current.Temp, Future.Temp, Current.Rain, Future.Rain, Current.Risk, Future.Risk

Want something like:

ggplot(data, (x = Current.Temp, y = Current.Temp, colour = Current.Risk)) +
geom_point()

transitioning to:

ggplot(data, (x = Future.Temp, y = Future.Temp, colour = Future.Risk)) +
geom_point()

A.

Select a repo