https://ucsdlib.github.io/2020-02-04-UCSDsocsci/
If you'd like this workshop to show up on your co-curricular record, follow the instructions in the CCR portal, then email Stephanie (slabou@ucsd.edu) two items:
Stephanie Labou (Library) - slabou@ucsd.edu
Reid Otsuji (Library) - rotsuji@ucsd.edu
Rick McCosh (OBGYN/Reprod Sci)
Justin Shaffer (Pediatrics)
These are the collaborative notes for the workshop.
Please type your notes here!
Blue - "I need help"
Orange - "I'm good!"
Full Name, Affiliation (Faculty, student, post-doc, staff), department or Lab
Reid Otsuji, Librarian, Library
Rick McCosh, Postdoc, OB/GYN + Repro. Sci
Ryan Johnson, Librarian, Library
Stephanie Labou, Librarian, Library
Leonidas Mylonakis, Alumni, History
Veronica Hoyo, Staff, ACTRI-Health Sciences
Evie Xinqi Guo, Graduate Student, Experimental Psychology
Alexandre Gomide, Visiting Scholar, GPS-UCSD
Julian Borba, Visiting Scholar, CILAS-UCSD
Peipei Zhu, post-doc, Croker Lab, Pediatrics
Danielle Fritts, Graduate Student, Visiting
Rita Kuckertz, Visiting Graduate Student
Bryan Kehr, Library IT
Lesson information:
https://datacarpentry.org/spreadsheets-socialsci/
https://ndownloader.figshare.com/articles/6262019/versions/4
https://datacarpentry.org/openrefine-socialsci/setup.html
Download and unzip on your desktop:
https://librarycarpentry.org/lc-shell/data/shell-lesson.zip
Git Bash for Windows setup:
https://librarycarpentry.org/lc-shell/setup.html
Exercise: what's wrong with the spreadsheet?
Problems identified:
How to clean up:
Consistent values
be explicit - cleanup will add more values to the spreadsheet, and that's OK; R can handle it
R uses 'NA' for missing values
R can handle long column names, but not spaces or numbers at the beginning (use underscores: _)
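A minimal base-R sketch of the two tips above, using a made-up CSV (the column names and the `-999` placeholder are hypothetical, not from the workshop data). It shows that `read.csv()` treats a literal `NA` as missing by default, that other placeholders only become missing if listed in `na.strings`, and how `make.names()` rewrites names that contain spaces or start with a number.

```r
# Hypothetical CSV text: "NA" marks a missing value, -999 is a legacy placeholder
csv_text <- "household id,rooms,1st_visit
1,2,yes
2,NA,no
3,-999,yes"

# Default read: only the literal "NA" becomes missing
# (check.names = FALSE keeps the raw, problematic column names)
raw <- read.csv(text = csv_text, check.names = FALSE)
sum(is.na(raw$rooms))     # 1

# Also declare -999 as a missing-value code
clean <- read.csv(text = csv_text, na.strings = c("NA", "-999"), check.names = FALSE)
sum(is.na(clean$rooms))   # 2

# How R repairs names with spaces or a leading number
make.names(names(raw))    # "household.id" "rooms" "X1st_visit"
```

Using underscores and letters from the start avoids relying on these automatic repairs.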
Common formatting problems:
using multiple tables and/or tabs
not filling in zeros
confusing field names
inconsistent column names
Excel stores dates internally as serial numbers (days counted from a reference date), which other software can misread
Tip for dates in Excel: store year, month, and day in separate columns
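A short base-R sketch of what "dates as numbers" means, assuming the Windows/1900 Excel date system (Mac files using the 1904 system need a different origin). The serial number 43865 is an illustrative value corresponding to the workshop date.

```r
# Excel (1900 date system) counts 1900-01-01 as day 1 and also counts a
# non-existent 29 Feb 1900, so for modern dates the effective origin is 1899-12-30
excel_serial <- 43865
interview_date <- as.Date(excel_serial, origin = "1899-12-30")
interview_date   # "2020-02-04"

# Store year, month and day in separate columns, as the tip suggests
dates_split <- data.frame(
  year  = as.integer(format(interview_date, "%Y")),
  month = as.integer(format(interview_date, "%m")),
  day   = as.integer(format(interview_date, "%d"))
)
dates_split
```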
To import .zip or .tar projects, use the Import Project command
Why use OpenRefine:
Data is often very messy. OpenRefine provides a set of tools to allow you to identify and amend the messy data.
It is important to know what you did to your data. Additionally, journals, granting agencies, and other institutions are requiring documentation of the steps you took when working with your data. With OpenRefine, you can capture all actions applied to your raw data and share them with your publication as supplemental material.
All actions are easily reversed in OpenRefine.
If you save your work it will be to a new file. OpenRefine always uses a copy of your data and does not modify your original dataset.
Data cleaning steps often need repeating with multiple files. OpenRefine keeps track of all of your actions and allows them to be applied to different datasets.
Some concepts such as clustering algorithms are quite complex, but OpenRefine makes it easy to introduce them, use them, and show their power.
For more OpenRefine information:
search for "OpenRefine libguide"
You can find out a lot more about OpenRefine at http://openrefine.org
More about GREL:
https://github.com/OpenRefine/OpenRefine/wiki/GREL-Functions
Data Sharing SNAFU YouTube video:
https://www.youtube.com/watch?v=66oNv_DJuPc
ICPSR (social science data repository - great for finding data, can also deposit data): https://www.icpsr.umich.edu/icpsrweb/
UCSD's research data curation group in the library: https://library.ucsd.edu/research-and-collections/data-curation/
R package for non-English language text mining: https://www.bnosac.be/index.php/blog/72-natural-language-processing-for-non-english-languages-with-udpipe
Full Name, Affiliation (Faculty, student, post-doc, staff), department or Lab
Reid Otsuji, Librarian, Library
Rick McCosh, Postdoc, OB/GYN + Repro. Sci.
Stephanie Labou, Librarian, Library
Veronica Hoyo, Staff, ACTRI-Health Sciences
Peipei Zhu, Postdoc, Pediatrics
Leonidas Mylonakis, Alumni, History
Alexandre Gomide, Visiting Scholar, GPS-UCSD
Danielle Fritts, Visiting Graduate Student
Justin Shaffer, Postdoc, Pediatrics
Bryan Kehr, Library IT
Rita Kuckertz, Visiting Graduate Student
Evie Xinqi Guo, Graduate Student, Experimental Psychology
https://datacarpentry.org/r-socialsci/
https://ndownloader.figshare.com/articles/6262019/versions/4
https://datacarpentry.org/socialsci-workshop/data/
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Logic.html
https://vita.had.co.nz/papers/tidy-data.pdf
http://docs.ggplot2.org/current/ggtheme.html
https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf
Assigning variables:
assign variables using the <- assignment operator
be thoughtful about variable naming:
variable name styles:
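A small sketch of assignment and common naming styles (the variable names and the hectare-to-acre factor of about 2.47 are illustrative). The style itself matters less than using one consistently, since R treats differently cased names as different objects.

```r
# Assign with <- (the arrow points from the value to the name)
area_hectares <- 1.0
area_acres <- area_hectares * 2.47   # reuse one variable to compute another

# Common naming styles; pick one and stick with it
household_size <- 4   # snake_case (used throughout these notes)
householdSize  <- 4   # camelCase
household.size <- 4   # dot.case (common in older base R code)

# Names are case sensitive: this object was never created
exists("Household_size")   # FALSE
```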
Examples:
sqrt(2) # square root function
b <- sqrt(2) # assign the result of a function to a variable
round(3.1415)
args(round)
?round #for function help
Function errors will display in the console.
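A quick sketch of how the optional `digits` argument shown by `args(round)` changes the result:

```r
# round() takes the number plus an optional `digits` argument (default 0)
round(3.14159)               # 3
round(3.14159, digits = 2)   # 3.14

# args() lists a function's arguments and their defaults
args(round)
```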
Examples:
hh_members <- c(3, 7, 10, 6) # c() means concatenate; it is used to create vectors
hh_wall_type <- c("muddaub", "burntbricks", "sunbricks")
length(hh_members)
length(hh_wall_type)
class(hh_members)
class(hh_wall_type)
R data types:
Tip: R is indexed beginning at 1
Tip: remember R is case sensitive - be consistent
possessions <- c("bike","radio", "tv")
possessions <- c(possessions, "cellPhone")
possessions <-
Data structures
num_char <- c(1,2,3,"a")
class(num_char)
tricky <- c(1,2,3, "4")
num <- c(1,2,3)
class(num)
int <- as.integer(num)
class(int)
# as.integer()
# as.character()
# as.numeric()
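A sketch of the coercion rules behind `num_char` and `tricky`: a vector holds one type, so mixed inputs are silently converted to the most general type, and the `as.*()` functions convert explicitly.

```r
# Mixing types coerces everything to the most general type
num_char <- c(1, 2, 3, "a")
class(num_char)      # "character"

num_logical <- c(1, 2, 3, TRUE)
class(num_logical)   # "numeric" (TRUE becomes 1)

# Explicit conversion with the as.*() family
as.numeric(c("1", "2", "3"))        # 1 2 3
suppressWarnings(as.numeric("a"))   # NA: "a" cannot become a number (R normally warns here)
as.integer(3.9)                     # 3: truncates, does not round
```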
hh_wall_type
hh_wall_type[2]
hh_wall_type[c(1,3)]
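hh_wall_type[1,3] # error: a vector has only one dimension; use hh_wall_type[c(1,3)]
more_wall_types <- hh_wall_type[c(1,2,3,3,4,1)] # index 4 is beyond the vector's length, so it returns NA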
more_wall_types
hh_members
hh_members > 5
hh_members[hh_members >5]
possessions
possessions[possessions == "tv" | possessions == "bike"] # | (OR) combines one comparison per item
possessions %in% c("car", "bicycle", "motorcycles", "boat") # checks every element against a set of items at once
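A compact comparison of the two approaches above on a made-up `possessions` vector: `==` with `|` needs one comparison per item, while `%in%` returns one TRUE/FALSE per element for the whole set, which can then be used to subset.

```r
possessions <- c("bike", "radio", "tv", "cellPhone")

# One comparison per item, combined with | (OR)
possessions[possessions == "tv" | possessions == "bike"]   # "bike" "tv"

# %in% tests each element against the whole set at once
possessions %in% c("bike", "tv")                # TRUE FALSE TRUE FALSE
possessions[possessions %in% c("bike", "tv")]   # "bike" "tv"
```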
rooms <- c(2,1,1,NA,4)
mean(rooms)
max(rooms)
mean(rooms, na.rm = TRUE)
args(mean)
is.na(rooms)
rooms[!is.na(rooms)]
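A short sketch tying the missing-value functions together on the same `rooms` vector: count the NAs, see how they propagate, then drop or ignore them.

```r
rooms <- c(2, 1, 1, NA, 4)

sum(is.na(rooms))           # 1: number of missing values
mean(rooms)                 # NA: a single NA propagates
mean(rooms, na.rm = TRUE)   # 2: (2 + 1 + 1 + 4) / 4
rooms_complete <- na.omit(rooms)   # drops the NAs (and records which were removed)
length(rooms_complete)      # 4
```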
install.packages("tidyverse")
library(tidyverse)
interviews <- read_csv("SAFI_clean.csv")
View(interviews)
head(interviews)
tail(interviews)
class(interviews)
str(interviews)
nrow(interviews)
ncol(interviews)
names(interviews)
summary(interviews)
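If the SAFI file isn't at hand, the same exploration functions can be tried on a tiny hand-made data frame. This is a sketch only: `interviews_demo` and its values are hypothetical, not the workshop data.

```r
# A tiny made-up stand-in for the interviews data (values are hypothetical)
interviews_demo <- data.frame(
  key_ID    = 1:4,
  village   = c("God", "God", "Chirodzo", "Ruaca"),
  no_membrs = c(3, 7, 10, 6),
  rooms     = c(1, 1, 1, 2),
  stringsAsFactors = FALSE
)

dim(interviews_demo)     # 4 rows, 4 columns
nrow(interviews_demo)    # 4
names(interviews_demo)   # column names
str(interviews_demo)     # type of each column
summary(interviews_demo$no_membrs)
```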
interviews[1,6] # output as a tibble (data frame)
interviews[1]
interviews[[1]] #output as a vector
interviews[1:6, ] # rows 1-6, all columns
interviews[4:7, ] # subset of rows 4-7
interviews[ , -1] # all rows, all columns except the first
interviews["village"]
interviews[["village"]]
interviews$village # use $ to access a column, e.g. dataFrameName$columnName
village # error: object 'village' not found; columns only exist inside the data frame
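A base-R sketch of the `[` vs `[[` vs `$` distinction on a made-up data frame. One caveat: on a plain `data.frame`, `df[, "col"]` drops to a vector, whereas the tibbles returned by `read_csv()` keep it as a tibble.

```r
interviews_demo <- data.frame(
  village   = c("God", "God", "Chirodzo"),
  no_membrs = c(3, 7, 10),
  rooms     = c(1, 2, 1),
  stringsAsFactors = FALSE
)

interviews_demo["village"]     # single bracket: still a data frame (1 column)
interviews_demo[["village"]]   # double bracket: the column as a plain vector
interviews_demo$village        # same as [["village"]]
interviews_demo[2, ]           # row 2, all columns
interviews_demo[, -1]          # all rows, every column except the first
interviews_demo[, "village"]   # data.frame only: drops to a vector
```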
# factors
respondent_floor_type <- factor(c("earth","cement","cement","earth"))
class(respondent_floor_type)
levels(respondent_floor_type)
unique(respondent_floor_type)
respondent_floor_type
as.character(respondent_floor_type)
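A sketch of what a factor stores: the levels are sorted alphabetically by default, and each value is kept as an integer code pointing into those levels.

```r
respondent_floor_type <- factor(c("earth", "cement", "cement", "earth"))

levels(respondent_floor_type)       # "cement" "earth" (alphabetical by default)
as.integer(respondent_floor_type)   # 2 1 1 2: the integer codes stored internally
table(respondent_floor_type)        # counts per level
```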
memb_assoc <- interviews$memb_assoc
memb_assoc <- as.factor(memb_assoc)
memb_assoc
plot(memb_assoc)
levels(memb_assoc)
R and R packages prefer dates in YYYY-MM-DD format
install.packages("lubridate") #if you need to install the package
library(lubridate)
dates <- interviews$interview_date
interviews$day <- day(dates)
interviews$month <- month(dates)
interviews$year <- year(dates)
View(interviews)
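For comparison, a base-R sketch that pulls out the same day/month/year parts without lubridate, via `as.POSIXlt()`. The two dates are hypothetical; note the offsets, since `$mon` starts at 0 and `$year` counts from 1900.

```r
dates <- as.Date(c("2016-11-17", "2017-04-02"))   # hypothetical interview dates
dates_lt <- as.POSIXlt(dates)

dates_parts <- data.frame(
  day   = dates_lt$mday,         # day of month, 1-31
  month = dates_lt$mon + 1,      # $mon counts from 0 (January = 0)
  year  = dates_lt$year + 1900   # $year counts years since 1900
)
dates_parts
```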
select(interviews, village, no_membrs, years_liv)
filter(interviews, village == "God")
interviews2 <- filter(interviews, village == "God")
interviews2
interviews_god <- select(interviews2, no_membrs, years_liv)
interviews_god
interviews_god <- select(filter(interviews, village == "God"), no_membrs, years_liv)
interviews_god <- interviews %>%
filter(village =="God") %>%
select(no_membrs, years_liv)
interviews_god
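The same filter-then-select step can be sketched in base R with `subset()`, shown here on hypothetical data. It also illustrates why `"god"` and `"God"` matter: `==` is case sensitive, so the lowercase version matches no rows.

```r
interviews_demo <- data.frame(
  village   = c("God", "Chirodzo", "God", "Ruaca"),
  no_membrs = c(3, 7, 10, 6),
  years_liv = c(4, 9, 15, 40),
  stringsAsFactors = FALSE
)

# Rows where village is "God", keeping two columns
interviews_god <- subset(interviews_demo, village == "God",
                         select = c(no_membrs, years_liv))
interviews_god

# Case matters: no village is spelled "god"
nrow(subset(interviews_demo, village == "god"))   # 0
```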
Exercise solution:
interviews %>%
filter(memb_assoc == "yes") %>%
select(affect_conflicts, liv_count, no_meals)
new_variable <- interviews %>%
mutate(people_per_room = no_membrs / rooms)
View(new_variable)
interviews %>%
filter(!is.na(memb_assoc)) %>%
mutate(people_per_room = no_membrs / rooms)
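A base-R sketch of what `mutate()` does here: compute a new column from existing ones, row by row. The values are made up; note that an NA in either input gives NA in the result.

```r
interviews_demo <- data.frame(no_membrs = c(3, 7, 10), rooms = c(1, 2, NA))

# New column computed element-wise from two existing columns
interviews_demo$people_per_room <- interviews_demo$no_membrs / interviews_demo$rooms
interviews_demo   # people_per_room: 3, 3.5, NA
```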
Exercise solution:
interviews %>%
mutate(total_meals = no_membrs * no_meals) %>%
select(village, total_meals)
interviews_total_meals <- interviews %>%
mutate(total_meals = no_membrs * no_meals) %>%
filter(total_meals > 20) %>%
select(village, total_meals)
View(interviews_total_meals)
Average household size by village
interviews %>%
group_by(village) %>%
summarize(mean_no_membrs = mean(no_membrs))
interviews %>%
filter(!is.na(memb_assoc)) %>%
group_by(village, memb_assoc) %>%
summarize(mean_no_membrs = mean(no_membrs))
interviews %>%
filter(!is.na(memb_assoc)) %>%
group_by(village, memb_assoc) %>%
summarize(mean_no_membrs = mean(no_membrs), min_membrs = min(no_membrs))
interviews %>%
filter(!is.na(memb_assoc)) %>%
group_by(village, memb_assoc) %>%
summarize(mean_no_membrs = mean(no_membrs), min_membrs = min(no_membrs)) %>%
arrange(desc(min_membrs))
interviews %>%
count(village)
interviews %>%
count(village, sort = TRUE)
Exercise solution
interviews %>%
group_by(village) %>%
summarize(mean_no_membrs = mean(no_membrs),
min_no_membrs = min(no_membrs),
max_no_membrs = max(no_membrs),
n = n()
)
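The split-apply-combine idea behind `group_by()` + `summarize()` can be sketched in base R with `split()` and `sapply()`. This is an illustration on hypothetical data, not how dplyr is implemented.

```r
interviews_demo <- data.frame(
  village   = c("God", "God", "Chirodzo", "Ruaca", "Ruaca"),
  no_membrs = c(3, 7, 10, 6, 8),
  stringsAsFactors = FALSE
)

# Split: one vector of no_membrs per village (villages sorted alphabetically)
groups <- split(interviews_demo$no_membrs, interviews_demo$village)

# Apply + combine: one row of summary statistics per village
summary_by_village <- t(sapply(groups, function(x)
  c(mean_no_membrs = mean(x), min_no_membrs = min(x),
    max_no_membrs = max(x), n = length(x))))
summary_by_village
```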
interviews_spread <- interviews %>%
mutate(wall_type_logical = TRUE) %>%
spread(key = respondent_wall_type, value = wall_type_logical, fill = FALSE)
interviews_gather <- interviews_spread %>%
gather(key = respondent_wall_type, value = "wall_type_logical",
burntbricks:sunbricks)%>%
filter(wall_type_logical) %>%
select(-wall_type_logical)
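A base-R sketch of the spread/gather round trip on made-up data: `table()` builds the wide household-by-wall-type grid (TRUE where that type applies), and `which(..., arr.ind = TRUE)` recovers the long form by keeping only the TRUE cells, much like the `filter(wall_type_logical)` step.

```r
walls <- data.frame(
  key_ID = c(1, 2, 3, 4),
  respondent_wall_type = c("muddaub", "burntbricks", "muddaub", "sunbricks"),
  stringsAsFactors = FALSE
)

# "spread": logical grid, one row per household, one column per wall type
wide <- unclass(table(walls$key_ID, walls$respondent_wall_type)) > 0
wide

# "gather" back to long form: keep only the TRUE cells
true_cells <- which(wide, arr.ind = TRUE)
walls_long <- data.frame(
  key_ID = rownames(wide)[true_cells[, "row"]],
  respondent_wall_type = colnames(wide)[true_cells[, "col"]],
  stringsAsFactors = FALSE
)
walls_long
```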
# each cell should contain only a single value
interviews_items_owned <- interviews %>%
separate_rows(items_owned, sep=";") %>%
mutate(items_owned_logical = TRUE) %>%
spread(key = items_owned, value = items_owned_logical, fill = FALSE)
View(interviews_items_owned)
nrow(interviews_items_owned)
interviews_items_owned %>%
filter(bicycle) %>%
group_by(village) %>%
count(bicycle)
interviews_items_owned %>%
mutate(number_items = rowSums(select(., bicycle:television))) %>%
group_by(village) %>%
summarize(mean_items = mean(number_items))
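A base-R sketch of the `separate_rows()` + `spread()` + `rowSums()` pattern for semicolon-separated cells, using hypothetical households. One difference: tidyr creates a `<NA>` column for households that listed nothing, while in this sketch those households simply get FALSE in every item column.

```r
interviews_demo <- data.frame(
  key_ID = 1:3,
  items_owned = c("bicycle;radio", "radio;television", NA),
  stringsAsFactors = FALSE
)

# Split each cell on ";" (one character vector per household)
items_list <- strsplit(interviews_demo$items_owned, ";")
all_items <- sort(unique(unlist(items_list)))   # sort() drops the NA

# One logical column per item: TRUE if the household listed it
items_wide <- t(sapply(items_list, function(x) all_items %in% x))
colnames(items_wide) <- all_items

# Number of items per household, like rowSums(select(., bicycle:television))
number_items <- rowSums(items_wide)
number_items   # 2 2 0
```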
Exercise Solution part 1:
interviews_months_lack_food <- interviews %>%
separate_rows(months_lack_food, sep=";") %>%
mutate(months_lack_food_logical = TRUE) %>%
spread(key = months_lack_food, value = months_lack_food_logical, fill = FALSE)
Exercise Solution part 2:
How many months (on average) were respondents without food if they did belong to an irrigation association? What about if they didn’t?
interviews_months_lack_food %>%
mutate(number_months = rowSums(select(., Apr:Sept))) %>%
group_by(memb_assoc) %>%
summarize(mean_months = mean(number_months))
# This is what to export for plotting:
interviews_plotting <- interviews %>%
## spread data by items_owned
separate_rows(items_owned, sep=";") %>%
mutate(items_owned_logical = TRUE) %>%
spread(key = items_owned, value = items_owned_logical, fill = FALSE) %>%
rename(no_listed_items = `<NA>`) %>%
## spread data by months_lack_food
separate_rows(months_lack_food, sep=";") %>%
mutate(months_lack_food_logical = TRUE) %>%
spread(key = months_lack_food, value = months_lack_food_logical, fill = FALSE) %>%
## add some summary columns
mutate(number_months_lack_food = rowSums(select(., Apr:Sept))) %>%
mutate(number_items = rowSums(select(., bicycle:television)))
# saving to .csv
write_csv(interviews_plotting, path = "data_output/interviews_plotting.csv")
library(tidyverse)
ggplot(data = interviews_plotting)
ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items))
ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items)) + geom_point()
Assign the ggplot object to a variable:
interviews_plot <- ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items))
Now you can reuse the variable and add different geom_ layers:
interviews_plot +
geom_point()
https://datacarpentry.org/r-socialsci/04-ggplot2/index.html
#######################################################################################################################
#######################################################################################################################
#######################################################################################################################
#######################################################################################################################
#######################################################################################################################
getwd()
setwd("~/Google-Drive-UCSD/R/2020_carpentries_social_sci/")
#######################################################################################################################
install.packages("tidyverse") # includes readr, tidyr, dplyr and ggplot2
install.packages("tidyr")
install.packages("ggplot2")
install.packages("plyr")
install.packages("dplyr")
library(tidyverse)
library(ggplot2)
#######################################################################################################################
interviews <- read_csv("data/SAFI_clean.csv", na = "NULL")
#######################################################################################################################
interviews
View(interviews)
class(interviews)
dim(interviews)
nrow(interviews)
ncol(interviews)
head(interviews)
tail(interviews)
names(interviews)
str(interviews)
summary(interviews)
#######################################################################################################################
interviews[1, 1]
interviews[1, 6]
interviews[[1]]
interviews[1]
interviews[1:3, 7]
interviews[3, ]
interviews[, -1]
interviews[-c(7:131), ]
interviews["village"] # Result is a data frame
interviews[, "village"] # Result is a data frame
interviews[["village"]] # Result is a vector
interviews$village # Result is a vector
#######################################################################################################################
interviews_100 <- interviews[100, ]
# store the row count once to improve readability and reduce duplication
n_rows <- nrow(interviews)
interviews_last <- interviews[n_rows, ]
interviews_middle <- interviews[(n_rows / 2), ]
interviews_head <- interviews[-(7:n_rows), ]
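A caution about `interviews[(n_rows / 2), ]`: with an odd row count (the interviews data has 131 rows), `n_rows / 2` is 65.5, and R silently truncates a fractional index toward zero. This sketch uses a plain vector as a stand-in for the rows.

```r
n_rows <- 131   # an odd number of rows, as in the interviews data
n_rows / 2      # 65.5

x <- seq_len(n_rows)
x[n_rows / 2]                # 65: the fractional index is truncated
x[median(seq_len(n_rows))]   # 66: the actual middle position
```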
#######################################################################################################################
respondent_floor_type <- factor(c("earth", "cement", "cement", "earth"))
levels(respondent_floor_type)
nlevels(respondent_floor_type)
respondent_floor_type
respondent_floor_type <- factor(respondent_floor_type, levels = c("earth", "cement"))
respondent_floor_type
levels(respondent_floor_type)
levels(respondent_floor_type)[2] <- "brick"
levels(respondent_floor_type)
respondent_floor_type
#######################################################################################################################
as.character(respondent_floor_type)
year_fct <- factor(c(1990, 1983, 1977, 1998, 1990))
as.numeric(year_fct)
as.numeric(as.character(year_fct))
as.numeric(levels(year_fct))[year_fct]
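What the three conversions above return: `as.numeric()` on a factor gives the level codes, not the years, so convert through the level labels instead.

```r
year_fct <- factor(c(1990, 1983, 1977, 1998, 1990))

levels(year_fct)                         # "1977" "1983" "1990" "1998"
as.numeric(year_fct)                     # 3 2 1 4 3: level codes, not years!
as.numeric(as.character(year_fct))       # 1990 1983 1977 1998 1990
as.numeric(levels(year_fct))[year_fct]   # same years; each level is converted only once
```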
#######################################################################################################################
memb_assoc <- interviews$memb_assoc
memb_assoc <- as.factor(memb_assoc)
memb_assoc
plot(memb_assoc)
memb_assoc <- interviews$memb_assoc
memb_assoc[is.na(memb_assoc)] <- "undetermined"
memb_assoc <- as.factor(memb_assoc)
memb_assoc
plot(memb_assoc)
#######################################################################################################################
levels(memb_assoc)
levels(memb_assoc) <- c("No", "Undetermined", "Yes")
memb_assoc <- factor(memb_assoc, levels = c("No", "Yes", "Undetermined"))
plot(memb_assoc)
##############################################################################################################################################################################################################################################
#######################################################################################################################
#######################################################################################################################
library(tidyverse)
interviews <- read_csv("data/SAFI_clean.csv", na = "NULL")
interviews
View(interviews)
#######################################################################################################################
select(interviews, village, no_membrs, years_liv)
filter(interviews, village == "God")
#######################################################################################################################
interviews2 <- filter(interviews, village == "God")
interviews2
interviews_god <- select(interviews2, no_membrs, years_liv)
interviews_god
interviews_god <- select(filter(interviews, village == "God"), no_membrs, years_liv)
interviews_god <- interviews %>%
filter(village == "God") %>%
select(no_membrs, years_liv)
interviews_god
#######################################################################################################################
#######################################################################################################################
interviews %>%
filter(memb_assoc == "yes") %>%
select(affect_conflicts, liv_count, no_meals)
#######################################################################################################################
new_variable <- interviews %>%
mutate(people_per_room = no_membrs / rooms)
View(new_variable)
interviews %>%
filter(!is.na(memb_assoc)) %>%
mutate(people_per_room = no_membrs / rooms)
#######################################################################################################################
interviews %>%
mutate(total_meals = no_membrs * no_meals) %>%
select(village, total_meals)
interviews_total_meals <- interviews %>%
mutate(total_meals = no_membrs * no_meals) %>%
filter(total_meals > 20) %>%
select(village, total_meals)
View(interviews_total_meals)
#######################################################################################################################
interviews %>%
group_by(village) %>%
summarize(mean_no_membrs = mean(no_membrs))
interviews %>%
group_by(village, memb_assoc) %>%
summarize(mean_no_membrs = mean(no_membrs))
interviews %>%
filter(!is.na(memb_assoc)) %>%
group_by(village, memb_assoc) %>%
summarize(mean_no_membrs = mean(no_membrs))
interviews %>%
filter(!is.na(memb_assoc)) %>%
group_by(village, memb_assoc) %>%
summarize(mean_no_membrs = mean(no_membrs), min_membrs = min(no_membrs))
interviews %>%
filter(!is.na(memb_assoc)) %>%
group_by(village, memb_assoc) %>%
summarize(mean_no_membrs = mean(no_membrs), min_membrs = min(no_membrs)) %>%
arrange(min_membrs)
interviews_new <- interviews %>%
filter(!is.na(memb_assoc)) %>%
group_by(village, memb_assoc) %>%
summarize(mean_no_membrs = mean(no_membrs), min_membrs = min(no_membrs)) %>%
arrange(desc(min_membrs))
View(interviews_new)
#######################################################################################################################
interviews %>%
count(village)
interviews %>%
count(village, sort = TRUE)
#######################################################################################################################
interviews %>%
count(no_meals)
interviews %>%
group_by(village) %>%
summarize(
mean_no_membrs = mean(no_membrs),
min_no_membrs = min(no_membrs),
max_no_membrs = max(no_membrs),
n = n()
)
#######################################################################################################################
interviews_spread <- interviews %>%
mutate(wall_type_logical = TRUE) %>%
spread(key = respondent_wall_type, value = wall_type_logical, fill = FALSE)
View(interviews_spread)
interviews_gather <- interviews_spread %>%
gather(key = respondent_wall_type, value = "wall_type_logical", burntbricks:sunbricks)
View(interviews_gather)
interviews_gather <- interviews_spread %>%
gather(key = "respondent_wall_type", value = "wall_type_logical",
burntbricks:sunbricks) %>%
filter(wall_type_logical) %>%
select(-wall_type_logical)
View(interviews_gather)
#######################################################################################################################
View(interviews)
str(interviews$items_owned)
interviews_items_owned <- interviews %>%
separate_rows(items_owned, sep=";") %>%
mutate(items_owned_logical = TRUE) %>%
spread(key = items_owned, value = items_owned_logical, fill = FALSE)
View(interviews_items_owned)
nrow(interviews_items_owned)
interviews_items_owned <- interviews_items_owned %>%
rename(no_listed_items = `<NA>`)
interviews_items_owned %>%
filter(computer) %>%
group_by(village) %>%
count(computer)
interviews_items_owned %>%
mutate(number_items = rowSums(select(., bicycle:television))) %>%
group_by(village) %>%
summarize(mean_items = mean(number_items))
#######################################################################################################################
interviews_months_lack_food <- interviews %>%
separate_rows(months_lack_food, sep=";") %>%
mutate(months_lack_food_logical = TRUE) %>%
spread(key = months_lack_food, value = months_lack_food_logical, fill = FALSE)
View(interviews_months_lack_food)
interviews_months_lack_food %>%
mutate(number_months = rowSums(select(., Apr:Sept))) %>%
group_by(memb_assoc) %>%
summarize(mean_months = mean(number_months))
#######################################################################################################################
interviews_plotting <- interviews %>%
separate_rows(items_owned, sep=";") %>%
mutate(items_owned_logical = TRUE) %>%
spread(key = items_owned, value = items_owned_logical, fill = FALSE) %>%
rename(no_listed_items = `<NA>`) %>%
separate_rows(months_lack_food, sep=";") %>%
mutate(months_lack_food_logical = TRUE) %>%
spread(key = months_lack_food, value = months_lack_food_logical, fill = FALSE) %>%
mutate(number_months_lack_food = rowSums(select(., Apr:Sept))) %>%
mutate(number_items = rowSums(select(., bicycle:television)))
write_csv(interviews_plotting, path = "data_output/interviews_plotting.csv")
##############################################################################################################################################################################################################################################
#######################################################################################################################
#######################################################################################################################
library(tidyverse)
ggplot(data = interviews_plotting)
ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items))
ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items)) +
geom_point()
interviews_plot <- ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items))
interviews_plot +
geom_point()
interviews_plot
ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items)) +
geom_point()
ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items)) +
geom_point(alpha = 0.5)
ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items)) +
geom_jitter(alpha = 0.5)
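The idea behind `geom_jitter()` (nudging overlapping points by a small random amount) can be seen with base R's `jitter()` on a numeric vector. This illustrates the concept only; it is not how ggplot2 computes its default jitter width.

```r
set.seed(1)   # make the random noise reproducible
no_membrs <- c(3, 3, 3, 7, 7)   # hypothetical values with many ties

# Each value gets uniform noise in [-0.2, 0.2], so tied points separate
jittered <- jitter(no_membrs, amount = 0.2)
jittered
```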
ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items)) +
geom_jitter(alpha = 0.7, color = "blue")
ggplot(data = interviews_plotting, aes(x = no_membrs, y = number_items)) +
geom_jitter(aes(color = village), alpha = 0.5)
ggplot(data = interviews_plotting, aes(x = village, y = rooms)) +
geom_jitter(aes(color = respondent_wall_type), alpha = 0.5)
ggplot(data = interviews_plotting, aes(x = village, y = rooms)) +
geom_jitter(aes(color = respondent_wall_type))
#######################################################################################################################
ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = rooms)) +
geom_boxplot()
ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = rooms)) +
geom_boxplot(alpha = 0) +
geom_jitter(alpha = 0.5, color = "tomato")
ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = rooms)) +
geom_jitter(alpha = 0.5, color = "tomato") +
geom_boxplot()
#######################################################################################################################
ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = rooms)) +
geom_violin(alpha = 0) +
geom_jitter(alpha = 0.5, color = "tomato")
ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = liv_count)) +
geom_boxplot(alpha = 0) +
geom_jitter(alpha = 0.5)
ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = liv_count)) +
geom_boxplot(alpha = 0) +
geom_jitter(aes(shape = memb_assoc), alpha = 0.5)
#######################################################################################################################
ggplot(data = interviews_plotting, aes(x = respondent_wall_type)) +
geom_bar()
ggplot(data = interviews_plotting, aes(x = respondent_wall_type)) +
geom_bar(aes(fill = village))
ggplot(data = interviews_plotting, aes(x = respondent_wall_type)) +
geom_bar(aes(fill = village), position = "dodge")
percent_wall_type <- interviews_plotting %>%
filter(respondent_wall_type != "cement") %>%
count(village, respondent_wall_type) %>%
group_by(village) %>%
mutate(percent = n / sum(n)) %>%
ungroup()
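The `count()` + `group_by()` + `mutate(percent = n / sum(n))` pipeline computes row-wise proportions; a base-R sketch with `prop.table()` on hypothetical data gives the same kind of table, where each village's proportions sum to 1.

```r
walls <- data.frame(
  village = c("God", "God", "God", "Ruaca", "Ruaca"),
  respondent_wall_type = c("muddaub", "muddaub", "burntbricks", "burntbricks", "sunbricks"),
  stringsAsFactors = FALSE
)

counts <- table(walls$village, walls$respondent_wall_type)
percent_wall_type <- prop.table(counts, margin = 1)   # divide each row by its total
percent_wall_type
```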
View(percent_wall_type)
ggplot(percent_wall_type, aes(x = village, y = percent, fill = respondent_wall_type)) +
geom_bar(stat = "identity", position = "dodge")
#######################################################################################################################
percent_memb_assoc <- interviews_plotting %>%
filter(!is.na(memb_assoc)) %>%
count(village, memb_assoc) %>%
group_by(village) %>%
mutate(percent = n / sum(n)) %>%
ungroup()
View(percent_memb_assoc)
ggplot(percent_memb_assoc, aes(x = village, y = percent, fill = memb_assoc)) +
geom_bar(stat = "identity", position = "dodge")
#######################################################################################################################
ggplot(percent_wall_type, aes(x = village, y = percent, fill = respondent_wall_type)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title="Proportion of wall type by village", x="Village", y="Percent")
######################################################################################################################
ggplot(percent_wall_type, aes(x = respondent_wall_type, y = percent)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title="Proportion of wall type by village",
x="Wall Type",
y="Percent") +
facet_wrap(~ village, ncol=1)
ggplot(percent_wall_type, aes(x = respondent_wall_type, y = percent)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title="Proportion of wall type by village",
x="Wall Type",
y="Percent") +
facet_wrap(~ village) +
theme_bw() +
theme(panel.grid = element_blank())
table(interviews$village)
percent_items <- interviews_plotting %>%
gather(items, items_owned_logical, bicycle:no_listed_items) %>%
filter(items_owned_logical) %>%
count(items, village) %>%
mutate(people_in_village = case_when(village == "Chirodzo" ~ 39,
village == "God" ~ 43,
village == "Ruaca" ~ 49)) %>%
mutate(percent = n / people_in_village)
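The village sizes (39, 43, 49) are typed in by hand from `table(interviews$village)`. A base-R sketch of looking them up by name instead, so the numbers stay correct if the data changes; `interviews_demo` and `owners` are hypothetical stand-ins for `interviews` and the counted item table.

```r
interviews_demo <- data.frame(
  village = c("Chirodzo", "God", "God", "Ruaca", "Ruaca", "Ruaca"),
  stringsAsFactors = FALSE
)
people_in_village <- table(interviews_demo$village)   # respondents per village

# Counted owners per item and village (made-up values)
owners <- data.frame(
  items   = c("radio", "radio", "bicycle"),
  village = c("God", "Ruaca", "Ruaca"),
  n       = c(1, 3, 2),
  stringsAsFactors = FALSE
)

# Look up each row's village size by name rather than hard-coding it
owners$percent <- owners$n / as.numeric(people_in_village[owners$village])
owners
```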
ggplot(percent_items, aes(x = village, y = percent)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ items) +
theme_bw() +
theme(panel.grid = element_blank())
######################################################################################################################
######################################################################################################################
ggplot(percent_items, aes(x = village, y = percent)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ items) +
labs(title = "Percent of respondents in each village who owned each item",
x = "Village",
y = "Percent of Respondents") +
theme_bw()
ggplot(percent_items, aes(x = village, y = percent)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ items) +
labs(title = "Percent of respondents in each village who owned each item",
x = "Village",
y = "Percent of Respondents") +
theme_bw() +
theme(text=element_text(size = 16))
ggplot(percent_items, aes(x = village, y = percent)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ items) +
labs(title = "Percent of respondents in each village \n who owned each item",
x = "Village",
y = "Percent of Respondents") +
theme_bw() +
theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 45, hjust = 0.5, vjust = 0.5),
axis.text.y = element_text(colour = "grey20", size = 12),
text = element_text(size = 16))
grey_theme <- theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 45, hjust = 0.5, vjust = 0.5),
axis.text.y = element_text(colour = "grey20", size = 12),
text = element_text(size = 16),
plot.title = element_text(hjust = 0.5))
ggplot(percent_items, aes(x = village, y = percent)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ items) +
labs(title = "Percent of respondents in each village \n who owned each item",
x = "Village",
y = "Percent of Respondents") +
grey_theme
my_plot <- ggplot(percent_items, aes(x = village, y = percent)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ items) +
labs(title = "Percent of respondents in each village \n who owned each item",
x = "Village",
y = "Percent of Respondents") +
theme_bw() +
theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 45, hjust = 0.5, vjust = 0.5),
axis.text.y = element_text(colour = "grey20", size = 12),
text = element_text(size = 16),
plot.title = element_text(hjust = 0.5))
ggsave("items_by_village_barplot.png", my_plot, width = 15, height = 10)