Collaborative Document. Day 3, Sept 28th

# Collaborative Document. Day 3, Sept 28th

2022-09-28 R for Social Scientists

Welcome to The Workshop Collaborative Document.

This document is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

----------------------------------------------------------------------------

This is the document for the 26th: [link](https://hackmd.io/@o3DWHyfCQNqBUaAA1JO-_A/HJkxiTA-o/edit)

This is the document for the 27th: [link](https://hackmd.io/@o3DWHyfCQNqBUaAA1JO-_A/H1iIiNkGj/edit)

This is the document for today: [link](https://hackmd.io/@o3DWHyfCQNqBUaAA1JO-_A/HJfCD0lGo/edit)

## 👮Code of Conduct

Participants are expected to follow these guidelines:

* Use welcoming and inclusive language
* Be respectful of different viewpoints and experiences
* Gracefully accept constructive criticism
* Focus on what is best for the community
* Show courtesy and respect towards other community members

## ⚖️ License

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/

## 🙋Getting help

To ask a question or to get help, type in the chat window. You can also ask questions in this document; helpers will try to help you.

## 🖥 Workshop website

The workshop website can be found [here](https://steltenpower.github.io/2022-09-26-dc-socsci-R-nlesc-dccpo-online/).

### 🛠 Setup

The general setup of the workshop can be found [here](https://datacarpentry.org/socialsci-workshop/setup-r-workshop.html).
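The setup mostly comes down to installing a few packages once. A minimal base-R sketch that installs only the packages that are still missing, so re-running a setup script does not reinstall everything. `missing_packages()` and `install_missing()` are hypothetical helper names made up for this example, not functions from any package.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# installed.packages() returns a matrix whose row names are package names.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only what is missing and
# (invisibly) return the names of the packages it tried to install.
install_missing <- function(pkgs) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    install.packages(todo)
  }
  invisible(todo)
}

# Packages used today:
# install_missing(c("tidyverse", "here"))
```

The install call is left commented out so that sourcing the file only defines the helpers; uncomment it (or run it in the console) when you actually want to install.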
#### For today:

- R
- RStudio
- `install.packages("tidyverse")` and `install.packages("here")`
- [SAFI_clean.csv](https://ndownloader.figshare.com/files/11492171)

## About the data

For more information about the dataset and to download it from Figshare, check out the [Social Sciences workshop data page](http://www.datacarpentry.org/socialsci-workshop/data).

## 👩‍🏫👩‍💻🎓 Instructors

Ruud Steltenpool, Rick de Klerk

## 🧑‍🙋 Helpers

Rins Rutgers, Margriet Miedema

## Check-in: name (pronouns) | organisation | your best language after English and Dutch

## Starting with data

### Importing and loading data

```r=
library(tidyverse)
library(here)

interviews <- read_csv(
  here("data", "SAFI_clean.csv"),
  na = "NULL")
```

RStudio may show warnings as long as the libraries have not been loaded yet.

For managing packages reproducibly: https://rstudio.github.io/packrat/walkthrough.html

### Inspecting

```r=
View(interviews)
print(interviews)

# size
dim(interviews)
nrow(interviews)
ncol(interviews)

# content
head(interviews)
tail(interviews)
head(interviews, n = 9)
tail(interviews, n = 3)

# names
names(interviews)
names(interviews)[1:3]
colnames(interviews)

# structure and summary
str(interviews)
summary(interviews)
glimpse(interviews)
```

`print(interviews, n = 10, width = Inf)` prints 10 rows and all columns.

### Indexing

```r=
interviews[1, 1]      # [row, column]: a 1 x 1 tibble holding the dbl 1
interviews[1, 2]      # a 1 x 1 tibble holding the chr "God"
interviews[[1]]       # the first column as a plain vector
sum(interviews[[1]])
interviews[1, ]       # the entire first row
interviews[2:4]       # the second through fourth columns
interviews[2:4, ]     # the second through fourth rows
interviews[-1]        # everything except the first column
subset <- interviews[-c(2:130), 1:2]  # note: -2:130 means (-2):130, so wrap the range in c() or parentheses
interviews["village"]
interviews[1:3, "village"]
interviews[1, ]
interviews$village    # a vector
```

**Exercise**

1. Create a tibble (`interviews_100`) containing only the data in row 100 of the `interviews` dataset.
2. Notice how `nrow()` gave you the number of rows in the tibble?
    - Use that number to pull out just that last row in the tibble.
    - Compare that with what you see as the last row using `tail()` to make sure it’s meeting expectations.
    - Pull out that last row using `nrow()` instead of the row number.
    - Create a new tibble (`interviews_last`) from that last row.
3. Using the number of rows in the `interviews` dataset that you found in question 2, extract the row that is in the middle of the dataset. Store the content of this middle row in an object named `interviews_middle`. (Hint: this dataset has an odd number of rows, so finding the middle is a bit trickier than dividing `n_rows` by 2. Use the `median()` function and what you’ve learned about sequences in R to extract the middle row!)
4. Combine `nrow()` with the `-` notation above to reproduce the behavior of `head(interviews)`, keeping just the first through sixth rows of the `interviews` dataset.

```r=
## 1.
interviews_100 <- interviews[100, ]

## 2.
# Saving `n_observations` to improve readability and reduce duplication
n_observations <- nrow(interviews) # total number of rows
interviews_last <- interviews[n_observations, ]

## 3.
interviews_middle <- interviews[median(1:n_observations), ]

## 4.
interviews_head <- interviews[-(7:n_observations), ]
```

## Factors

```r=
## Creating a factor with 2 levels:
respondent_floor_type <- factor(c("earth", "cement", "cement", "earth"))

## Showing levels:
levels(respondent_floor_type)

## Count levels:
nlevels(respondent_floor_type)

respondent_floor_type <- fct_recode(respondent_floor_type, brick = "cement")
levels(respondent_floor_type)

respondent_floor_type <- factor(respondent_floor_type, ordered = TRUE)
respondent_floor_type

text <- as.character(respondent_floor_type)

year_fct <- factor(c(1990, 1983, 1977, 1998, 1990))
as.numeric(year_fct) # does not work: this returns the underlying integer codes, not the years
as.numeric(as.character(year_fct))

memb_assoc <- interviews$memb_assoc
memb_assoc
memb_assoc <- factor(memb_assoc)
memb_assoc

## Include the NAs as well
memb_assoc <- interviews$memb_assoc
# if a value is NA, fill in "undetermined"
memb_assoc[is.na(memb_assoc)] <- "undetermined"
# make it a factor again (to save memory)
memb_assoc <- as.factor(memb_assoc)
# rename the levels:
memb_assoc <- fct_recode(memb_assoc, No = "no", Undetermined = "undetermined", Yes = "yes")
memb_assoc <- factor(memb_assoc, levels = c("No", "Yes", "Undetermined"))
plot(memb_assoc)
```

### Dates

```r=
library(lubridate)
```

## Data wrangling

```r=
## load the tidyverse
library(tidyverse)
library(here)

interviews <- read_csv(here("data", "SAFI_clean.csv"), na = "NULL")

## inspect the data
interviews

## preview the data
# view(interviews)
```

### Select and filter

```r=
select(interviews, village, no_membrs, months_lack_food) # seleCt for Columns
select(interviews, c("village", "no_membrs", "months_lack_food"))
select(interviews, village:respondent_wall_type)

filter(interviews, village == "Chirodzo") # filteR for Rows
filter(interviews, village == "Chirodzo", rooms > 1, no_meals > 2)
filter(interviews, village == "Chirodzo" & rooms > 1 & no_meals > 2) # does the same as the previous statement
filter(interviews, village == "Chirodzo" | village == "Ruaca") # | means OR
```

### Pipes

```r=
filter(select(interviews, village:respondent_wall_type), village == "Chirodzo" | village == "Ruaca")

# %>% is the pipe from the tidyverse (magrittr); |> has been part of base R since R 4.1.0
interviews %>%
  filter(village == "Chirodzo") %>%
  select(village:respondent_wall_type)

subset <- interviews %>%
  filter(village == "Chirodzo") %>%
  select(village:respondent_wall_type)
# the result is now stored in the variable `subset`
```

**Exercise**

Using pipes, subset the `interviews` data to include interviews where respondents were members of an irrigation association (`memb_assoc`) and retain only the columns `affect_conflicts`, `liv_count`, and `no_meals`.

```r=
interviews %>%
  filter(memb_assoc == "yes") %>%
  select(affect_conflicts, liv_count, no_meals)
```

### Mutate

```r=
interviews %>%
  mutate(people_per_room = no_membrs / rooms) %>%
  select(no_membrs, rooms, people_per_room)

interviews %>%
  filter(!is.na(memb_assoc)) %>%
  mutate(people_per_room = no_membrs / rooms) %>%
  select(no_membrs, rooms, people_per_room)
```

**Exercise**

Create a new dataframe from the `interviews` data that meets the following criteria: it contains only the `village` column and a new column called `total_meals` containing a value that is equal to the total number of meals served in the household per day on average (`no_membrs` times `no_meals`). Only the rows where `total_meals` is greater than 20 should be shown in the final dataframe.

**Hint:** think about how the commands should be ordered to produce this data frame!

```r=
interviews_total_meals <- interviews %>%
  mutate(total_meals = no_membrs * no_meals) %>%
  filter(total_meals > 20) %>%
  select(village, total_meals)
```

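The Indexing section comments on what each expression returns: single brackets `[` keep the result a tibble, while `[[` and `$` return a plain vector. A minimal sketch on a small made-up tibble (the values are invented for illustration, not taken from SAFI_clean.csv), assuming the tibble package, which is installed with the tidyverse, is available. It also repeats the `nrow()`/`median()`/negative-index tricks from the exercise on this toy data.

```r
library(tibble)

# Made-up stand-in for `interviews`: 5 rows, 3 columns
toy <- tibble(
  key_ID  = c(1, 2, 3, 4, 5),
  village = c("God", "God", "Chirodzo", "Ruaca", "Ruaca"),
  rooms   = c(1, 1, 2, 1, 3)
)

one_cell  <- toy[1, 1]     # single brackets: still a 1 x 1 tibble
first_col <- toy[[1]]      # double brackets: a plain numeric vector
villages  <- toy$village   # $: a plain character vector

n <- nrow(toy)
last_row   <- toy[n, ]              # last row via nrow()
middle_row <- toy[median(1:n), ]    # n is odd, so the median is a whole row index
first_two  <- toy[-(3:n), ]         # drop rows 3..n, keeping what head(toy, 2) shows
```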
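The Factors section warns that `as.numeric(year_fct)` does not give back the years. A minimal base-R sketch of why: a factor stores integer codes that point into its sorted `levels`. Besides `as.numeric(as.character(...))`, the `?factor` help page recommends `as.numeric(levels(f))[f]`, which converts each distinct level only once.

```r
year_fct <- factor(c(1990, 1983, 1977, 1998, 1990))

# Levels are sorted: "1977" "1983" "1990" "1998"
lv <- levels(year_fct)

# The underlying integer codes index into those levels
codes <- as.integer(year_fct)   # 3 2 1 4 3, not the years

# Two ways to recover the original numbers
years_a <- as.numeric(as.character(year_fct))
years_b <- as.numeric(levels(year_fct))[year_fct]  # numeric levels, indexed by the codes
```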
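The `memb_assoc` steps in the Factors section replace `NA` with `"undetermined"` before building the factor, because `factor()` leaves missing values out of the levels. A minimal sketch on made-up answers; unlike the workshop code, it renames the levels with base R's `labels` argument of `factor()` instead of forcats' `fct_recode()`.

```r
# Made-up answers, including missing values
answers <- c("yes", NA, "no", "yes", NA)

# factor() keeps NA as a missing value, not as a level
f1 <- factor(answers)

# Replace NA first, then fix the level order and the level names in one step
answers[is.na(answers)] <- "undetermined"
f2 <- factor(answers,
             levels = c("no", "yes", "undetermined"),
             labels = c("No", "Yes", "Undetermined"))

counts <- table(f2)   # one count per level, in the order given above
```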
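The Dates section only loads lubridate. A minimal sketch of typical use, assuming lubridate is installed: `ymd()` parses year-month-day text into `Date` objects, and `year()`, `month()` and `day()` extract the parts. The dates below are made up; in the workshop data the date column is presumably `interview_date` (an assumption; check with `names(interviews)`).

```r
library(lubridate)

# Made-up dates as text, in year-month-day order
raw <- c("2016-11-17", "2016-12-05", "2017-04-30")

dates <- ymd(raw)       # parse into Date objects

yrs  <- year(dates)     # year as a number
mons <- month(dates)    # month as a number (1-12)
days <- day(dates)      # day of the month

# label = TRUE gives an ordered factor of (locale-dependent) month names
month_names <- month(dates, label = TRUE)
```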
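The Pipes section writes the same subset as a nested call and with `%>%`, and mentions the base-R pipe `|>`. A minimal sketch, on a made-up tibble and assuming dplyr is installed and R is 4.1.0 or later (needed for `|>`), showing that all three forms produce the same result; the pipes only change the reading order from inside-out to top-to-bottom.

```r
library(dplyr)

# Made-up stand-in for `interviews`
toy <- tibble(
  village   = c("God", "Chirodzo", "Ruaca", "Chirodzo"),
  no_membrs = c(3, 7, 10, 6),
  rooms     = c(1, 2, 1, 3)
)

# Nested: read from the inside out
nested <- select(filter(toy, village == "Chirodzo"), village, rooms)

# magrittr pipe (re-exported by dplyr): read top to bottom
piped <- toy %>%
  filter(village == "Chirodzo") %>%
  select(village, rooms)

# Base R pipe, R >= 4.1.0
base_piped <- toy |>
  filter(village == "Chirodzo") |>
  select(village, rooms)
```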
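The Mutate exercise hints that the order of the commands matters. A minimal sketch on made-up data, assuming dplyr is installed: `total_meals` has to be created by `mutate()` before `filter()` can use it, and `select()` comes last so only the wanted columns remain. It also shows that `filter()` keeps only rows where the condition is `TRUE`, so a row whose condition is `NA` (here the missing `no_meals`) is dropped as well.

```r
library(dplyr)

# Made-up stand-in for `interviews`, with some missing values
toy <- tibble(
  village    = c("God", "Chirodzo", "Ruaca", "God"),
  memb_assoc = c("yes", NA, "no", "yes"),
  no_membrs  = c(3, 12, 10, 6),
  no_meals   = c(2, 3, 3, NA)
)

# Keep only rows where membership is known, as in the Mutate section
members <- toy %>%
  filter(!is.na(memb_assoc))

# mutate() -> filter() -> select(): each step needs what the previous one made
total <- toy %>%
  mutate(total_meals = no_membrs * no_meals) %>%  # 6, 36, 30, NA
  filter(total_meals > 20) %>%                    # NA > 20 is NA, so that row is dropped
  select(village, total_meals)
```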