Portfolio assignment 4

--- title: "Portfolio assignment 4" author: "Laura Sørine Voldgaard & Sofie Bøjgaard Thomsen" date: '2022-11-22' output: pdf_document: default --- ```{r setup, include=FALSE, warning=FALSE} knitr::opts_chunk$set(echo = FALSE) # setting working directory knitr::opts_knit$set(root.dir = '/work/CogSci_Methods01/data') ``` ```{r, include=FALSE, warning=FALSE} # loading packages pacman::p_load(tidyverse) library("lme4") library("lmerTest") library("boot") library("caret") library("e1071") # clear environment. rm(list=ls()) # clears global workspace. ``` ```{r, include=FALSE, warning=FALSE} # Loading the cake data cake_data <- read_csv("cake.csv") ``` # Analysis 1 (LSV) R (R Core Team, 2019) was used to perform a linear mixed effects analysis in order to find the best model for predicting the breakage angle of chocolate cakes with the variables 'recipe', 'temp' and 'replicate'. We compared different theoretically justifiable models and chose the best one by the lowest AIC score, where recipe and temp were used as fixed effects since they were expected to be the predictors of influence on the DV (Angle). Intercept for replicate was used as random effect to account for the variation in DV (Angle) between the replicates. It has the following syntax: lmerTest::lmer(Angle ~ temp + (1 + recipe|replicate)) The whole model accounted for roughly 70% of variance in cake breakage angle (*R2 conditional = 0.70*), while the fixed effect for roughly 11% (*R2 marginal = 0.106*). Our residual plot (*figure 1*) below reveals no obvious deviations from linearity or homoscedasticity. The breakage angle of the cake is significantly modulated by the recipe used for the cake as well as the temperature with which the cake was baked, *beta = 0.158, SE = 0.016, t = 9.8, p < .001*. ```{r, include=FALSE, warning=FALSE} m1 <- lmerTest::lmer(angle ~ recipe + temp + (1|recipe:replicate), data = cake_data, REML = F) m2 <- lmerTest::lmer(angle ~ recipe * temp + (1|recipe:replicate), data = cake_data, REML = F) m3 <- lmerTest::lmer(angle ~ recipe + temp + (1|replicate), data = cake_data, REML = F) m4 <- lmerTest::lmer(angle ~ recipe + (1|replicate), data = cake_data, REML = F) m5 <- lmerTest::lmer(angle ~ temp + (1|replicate), data = cake_data, REML = F) m6 <- lmerTest::lmer(angle ~ temp + (1|recipe:replicate), data = cake_data, REML = F) m7 <- lmerTest::lmer(angle ~ temp + (1 + recipe|replicate), data = cake_data, REML = F) AIC(m1, m2, m3, m4, m5, m6, m7) %>% arrange(AIC) ``` ```{r, include=TRUE} # Testing visually for linearity and homoscedasticity of the residuals plot(m7, main = "Figure 1") ``` ```{r, include=TRUE} # Making table with summary output of the model with lowest AIC score summary(m7) ``` ```{r, include=FALSE, warning=FALSE} # Calculating R scores MuMIn::r.squaredGLMM(m7) ``` ```{r, include=FALSE} # Making it pretty r2 <- c(0.1062652, 0.7023687) R2 <- c("R2 marginal", "R2 conditional") r2_df <- data.frame(R2, r2) knitr::kable(r2_df, caption = "R2 scores", digits = c(4,4,4,4,30)) ``` # Analysis 2 (SBT) A generalized linear model was made to predict the survival rate of the passengers on the RMS Titanic by the following variables; their age, gender and the class they were travelling on. The syntax of the model: base::glm(Survived ~ Age + Sex + Pclass) The summary output from our model (*table 2*) shows the intercept *beta = 3.63492*, given Sex = female, Pclass = 1, Age = 0. The other coefficient estiamtes explain the change in log odds when the predictor variables increase with 1. E.g., this means that the older the passenger is (pr. year) the likelihood of surviving decreases with -0.034 in log odds scale. The p-values for all the predictors of the model are extremely low (p<.05), which indicates that all the predictors have a statistically significant influence on our model. Also, the very low p-value at the *Chi-squared* value of 381.18 with 4 degress of freedom tells us that the model is very useful. Furthermore, there were no significant violations of the assumptions of the model according to the DHARMa test. ```{r, include=FALSE, warning=FALSE} # Loading the Titanic data titanic <- read_csv("titanic.csv") ``` ```{r, include=FALSE, warning=FALSE} # Mutating survival, gender and class to make sure they are all factor titanic$Survived <- as.factor(titanic$Survived) titanic$Sex <- as.factor(titanic$Sex) titanic$Pclass <- as.factor(titanic$Pclass) ``` ```{r} tmodel1 <- glm(Survived ~ Age + Sex + Pclass, data = titanic, family = binomial) summary(tmodel1) tmodel2 <- glm(Survived ~ Age, data = titanic, family = binomial) tmodel3 <- glm(Survived ~ Sex, data = titanic, family = binomial) tmodel4 <- glm(Survived ~ Pclass, data = titanic, family = binomial) ``` ```{r, include = FALSE} # checking for assumptions simulationOutput <- DHARMa::simulateResiduals(tmodel1, quantreg=T, plot = T) plot(simulationOutput) plot(tmodel1) ``` The probability of surviving for gender (at the median age of 27 for female and 28 for male passengers) on the different classes is shown in the table below (*table 3*), where females in general have a higher chance of surviving than males. This has been computed from the summary output by including the relevant beta coefficients in the following model: boot::inv.logit(Beta_0 +(MeanAgeForGender * Beta_1)+ Beta_2 + (Beta_3 | Beta_4)) The highest survivability rate is for female 1st class passengers (*probability of surviving = 93.8%*) and the lowest survivability rate is for male 3rd class passengers (*probability of surviving = 8.6%*). ```{r, include =FALSE} # Finding median age for both male and female titanic %>%group_by(Sex) %>% summarise(median(Age)) ``` ```{r, include=FALSE} # calculating probabilities of surviving F1 <- boot::inv.logit(3.63492 + (27 * -0.03427)) # 1st class female F2 <- boot::inv.logit(3.63492 + (27 * -0.03427) -1.19911) # 2nd class female F3 <- boot::inv.logit(3.63492 + (27 * -0.03427) -2.45544) # 3rd class female M1 <- boot::inv.logit(3.63492 - 2.58872 + (28 * -0.03427)) # 1st class male M2 <- boot::inv.logit(3.63492 - 2.58872 - 1.19911 + (28 * -0.03427)) # 2nd class male M3 <- boot::inv.logit(3.63492 - 2.58872 - 2.45544 + (28 * -0.03427)) # 3rd class male # creating a tibble with the probabilities prob_tib <- tibble(Sex = c("Female", "Male"), "1st class" = c(F1*100, M1*100), "2nd class" = c(F2*100, M2*100), "3rd class" = c(F3*100, M3*100)) prob_tib ``` ```{r, include=TRUE} # Making nice table for suvival probabilities knitr::kable(prob_tib, caption = "Survival probabilities (in %)", digits = 1) ``` The survival distribution across the variables class and gender is illustrated in the jitter plot below (*figure 2*). ```{r, include=TRUE, warning = FALSE} Class <- titanic$Pclass titanic %>% ggplot(aes(x = Sex, y = Survived, col = Class))+ geom_jitter(shape = 16) + geom_label( label="Survived", x=1.5, y=1.6, label.size = 0.2, color = "black") + geom_hline(yintercept = 1.5, color = "black", linetype = "dashed") + geom_label( label= "Deceased", x=1.5, y=1.4, label.size = 0.2, color = "black") + labs(title = "Figure 2") + theme_bw() ``` The distribution of age across gender, survival rate and class is illustrated in the plots below (*figure 3*). ```{r, include=FALSE, warning=FALSE} #preparation for fancy plot titanic_new <- titanic %>% mutate(genderlive = case_when(Sex == "female" & Survived == 1 ~ 1, Sex == "male" & Survived == 1 ~ 2, Sex == "female" & Survived == 0 ~ 3, Sex == "male" & Survived == 0 ~ 4)) as.factor(titanic_new$genderlive) genderlive.names <- list( '1'="Surviving women", '2'="Surviving men", '3'="Deceased women", '4'="Deceased men") genderlive_labeller <- function(variable,value){ return(genderlive.names[value]) } ``` ```{r, include=TRUE, warning=FALSE} # Making fancy plot ggplot(titanic_new, aes(x = Age, color = Pclass, fill= Pclass)) + geom_dotplot(dotsize = 0.4, binwidth = 4) + facet_wrap(~genderlive, labeller= genderlive_labeller) + labs(title = "Figure 3") ```

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.