Tidy Tuesdays Week 1: U.S. Measles Vaccination Data

To start, please visit: https://github.com/BEES-Tidy-Tuesdays/home

There you will find a link to this collaborative document, called "Week 1 notes".

This is a collaborative markdown document: feel free to add to it, change it, and improve it. We will upload the final document to GitHub after this Tidy Tuesday session and use parts of it as a template for future sessions.

If something is unclear or doesn't make sense, fix it, or make a comment.

A. Preparation (~5-10 minutes)

A1. Make sure R and RStudio are installed and running

A2. Download and start exploring the data

Please access the data here. Download and extract it to a folder of your choice. We encourage you to start by creating a new R project, and to use best practices for file management.

To clone the data set from GitHub using Git in RStudio:

Select "New Project"
Select "Version control"
Select "Git"
Paste "https://github.com/WSJ/measles-data" as the URL, and select where you want to clone the files to on your computer

N.B. You need Git installed on your computer. You can download it here:

https://gitforwindows.org/ (Windows)
https://git-scm.com/download/mac (Mac)

A2. What are the files? What are the variables?

Files

The data is broken down into a few files:

File	Description
all-measles-rates.csv	Data for each individual school
state-overviews.csv	More generalized data by state counties or state school districts
individual-states/[STATE].csv	Same data as all-measles-rates.csv, but separated by state

Variables/Attributes

Attribute	Description	Optional?
index
state	School's state
county	School's county	y
district	School's district	y
name	School name
type	Whether a school is public, private, charter	y
enroll	Enrollment*	y
mmr	School's Measles, Mumps, and Rubella (MMR) vaccination rate	y
overall	School's overall vaccination rate	y
xmed	Percentage of students exempted from vaccination for medical reasons	y
xper	Percentage of students exempted from vaccination for personal reasons	y
xrel	Percentage of students exempted from vaccination for religious reasons	y
lat	School latitude	(only in individual state files)
lng	School longitude	(only in individual state files)

A3. Pre-processing

Are the data 'tidy'?

All of the data (the individual state files, state-overviews.csv, and all-measles-rates.csv) are tidy.
However, state-overviews.csv and all-measles-rates.csv don't include the geographical location columns (lat, lng).

What else do we need to do to make this data ready to use?

What does -1 mean?
- Sometimes datasets use "-1" or "-999" instead of NA
Do we need to filter out schools with missing data?
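If -1 does turn out to be a missing-value code, one way to handle it is sketched below. This is only a sketch on made-up rows: the `schools` tibble and its values are hypothetical, and whether -1 really means "missing" in this dataset is an assumption you should check first.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy rows (not real schools) that use -1 as a missing-value code
schools <- tibble(
  name    = c("School A", "School B", "School C"),
  mmr     = c(95.5, -1, 88),
  overall = c(-1, 92, -1)
)

# Replace -1 with NA in the rate columns (assumes -1 means "not reported")
schools_clean <- schools %>%
  mutate(across(c(mmr, overall), ~ na_if(.x, -1)))

# Keep only schools that report an MMR rate
schools_with_mmr <- schools_clean %>%
  drop_na(mmr)
```

Converting to NA first keeps the choice open: summaries can use `na.rm = TRUE`, and `drop_na()` can be applied only for the columns a given question needs.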

B. Think of questions we can ask from the data (10 minutes)

Talk to the people nearest you and brainstorm some questions we can ask with this dataset. What could this data tell us? What are some interesting questions we could ask? How do you plan to visualise it?

Question ideas

Are there different rates of vaccination between different types of school?

C. Some plots we've done (30 minutes)

Preprocessing code

Read all data from the individual-states folder

library(tidyverse)

ind_stats <-
  list.files(path = "individual-states/", # locate the folder
             pattern = "\\.csv$", # regular expression: file names ending in .csv
             full.names = TRUE) %>% # return the full path of each file, not just its name
  map_df(~ read_csv(., col_types = cols(.default = "c"))) # read every column as character, then stack the files
  
View(ind_stats)

Can we map where the missing data are?

ind_stats %>%
  sample_frac(0.2) %>% # randomly sample just 20% of the data
  naniar::vis_miss() # map the missing data
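A numeric summary can complement the missingness map. The sketch below computes the proportion of NA values per column. The `toy` tibble is hypothetical and stands in for ind_stats. Note that values coded as "-1" are not counted as missing here.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy rows standing in for ind_stats (all columns read as character)
toy <- tibble(
  name = c("A", "B", "C", "D"),
  type = c("Public", NA, "Private", NA),
  mmr  = c("95", "-1", NA, "88")
)

# Proportion of NA values in each column, largest first
missing_by_column <- toy %>%
  summarise(across(everything(), ~ mean(is.na(.x)))) %>%
  pivot_longer(everything(), names_to = "column", values_to = "prop_missing") %>%
  arrange(desc(prop_missing))
```

Replacing `toy` with `ind_stats` should give the same kind of table for the real data.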

Q1 Do vaccination rates go up over time? (Alex)

Answer

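# data_rates is not defined in these notes; it is assumed to be
# all-measles-rates.csv read in with read_csv()
ggplot(data_rates, aes(year, mmr)) +
  geom_point()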

Hmm, doesn't seem to show much. Only 3 time points.

Q2 Let's see the vaccination rates in the different states (Gian)

Answer

ggplot(state_overviews, aes(fct_reorder(state, mmr), mmr)) +
  geom_boxplot() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) # rotate x axis label

Q3 Does the type of school have an effect on vaccination rates?

library(ggbeeswarm)
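ind_stats %>%
  mutate(mmr = as.numeric(mmr)) %>% # mmr was read in as character
  drop_na(type, mmr) %>%
  ggplot(aes(x = fct_reorder(type, 100 - mmr), y = 100 - mmr)) + # plot the unvaccinated percentage
  geom_quasirandom() +
  theme_bw() +
  scale_y_log10() + # schools with mmr = 100 give 0 here and are dropped with a warning
  xlab("School Type") +
  ylab("Percentage not vaccinated for MMR (100 - mmr)")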

Note: I had a problem plotting this because mmr was a character variable. It needs to be converted when reading in the data (see the last line):

ind_stats <-
  list.files(path = "individual-states/",
             pattern = "\\.csv$", full.names = TRUE) %>%
  map_df(~ read_csv(., col_types = cols(.default = "c"))) %>%
  mutate(mmr = as.numeric(mmr)) # as.numeric keeps decimals; as.integer would truncate them
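An alternative to converting columns one at a time is to let readr re-guess the types after the files have been stacked, using `readr::type_convert()`. The sketch below uses a hypothetical `toy` tibble in place of ind_stats. Whether the guessed types suit every column in the real data is something to check.

```r
library(readr)
library(tibble)

# Hypothetical toy rows standing in for ind_stats, with every column read as character
toy <- tibble(
  name   = c("School A", "School B", "School C"),
  mmr    = c("95.5", "88", NA),
  enroll = c("120", "45", "300")
)

# Re-guess the type of every character column
toy_typed <- type_convert(toy)

# For comparison: converting to integer drops the decimal part
truncated <- as.integer("95.5")
```

Reading everything as character and converting afterwards avoids errors when the same column is guessed as different types in different state files.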

D. Wrap up (10 minutes)

Cool R things I learnt this week

What could be improved for the next meeting?

get some fooood

E. Help! - ask questions about today's Tidy Tuesday here

Q. How do I change the background?

A. Use theme() https://ggplot2.tidyverse.org/reference/theme.html
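A minimal sketch of what that can look like. The built-in mtcars dataset is only a stand-in for your own data, and the colours are arbitrary choices:

```r
library(ggplot2)

# mtcars is a stand-in dataset; swap in your own data and aesthetics
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme(
    panel.background = element_rect(fill = "white", colour = "grey50"), # area behind the points
    plot.background  = element_rect(fill = "grey95"),                   # area around the panel
    panel.grid.major = element_line(colour = "grey90")                  # major grid lines
  )

p
```

Complete themes such as `theme_bw()` or `theme_minimal()` change the background in one step; put any extra `theme()` tweaks after them so they are not overwritten.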

F. General R help - Stuck on something? Need advice on your current project? Can you help answer someone else's question?

(N.B. this document is public, so don't include sensitive or private information)

Q. How do I use gganimate to transition between two 'sets' of columns? - Alex

Example columns I have: Species, Current.Temp, Future.Temp, Current.Rain, Future.Rain, Current.Risk, Future.Risk

Want something like:

ggplot(data, aes(x = Current.Temp, y = Current.Rain, colour = Current.Risk)) +
  geom_point()

transitioning to:

ggplot(data, aes(x = Future.Temp, y = Future.Rain, colour = Future.Risk)) +
  geom_point()
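A. (one possible approach) gganimate transitions between values of a single variable, so it may help to reshape the data so that "Current" and "Future" become rows rather than columns, and then use `transition_states()`. The sketch below assumes temperature on x and rainfall on y. The toy values are made up, and the column names follow the pattern listed in the question.

```r
library(tidyr)
library(tibble)
library(ggplot2)
library(gganimate)

# Hypothetical toy data with the column layout described in the question
data <- tibble(
  Species      = c("sp1", "sp2", "sp3"),
  Current.Temp = c(10, 15, 20),
  Future.Temp  = c(12, 18, 23),
  Current.Rain = c(800, 600, 400),
  Future.Rain  = c(750, 500, 420),
  Current.Risk = c(0.1, 0.4, 0.7),
  Future.Risk  = c(0.3, 0.6, 0.9)
)

# Split "Current.Temp" into period = "Current" and a Temp column (same for Rain, Risk)
long <- data %>%
  pivot_longer(
    cols = -Species,
    names_to = c("period", ".value"),
    names_sep = "\\."
  )

# One point per species; group = Species tells gganimate which points to link
p <- ggplot(long, aes(x = Temp, y = Rain, colour = Risk, group = Species)) +
  geom_point() +
  transition_states(period, transition_length = 2, state_length = 1) +
  labs(title = "Period: {closest_state}")

# animate(p) # renders the animation (needs a renderer package such as gifski)
```

After reshaping, `long` has one row per species per period, which is the layout `transition_states()` expects.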
