owned this note
owned this note
Published
Linked with GitHub
---
title: "Portfolio assignment 4"
author: "Laura Sørine Voldgaard & Sofie Bøjgaard Thomsen"
date: '2022-11-22'
output:
pdf_document: default
---
```{r setup, include=FALSE, warning=FALSE}
knitr::opts_chunk$set(echo = FALSE)
# setting working directory
knitr::opts_knit$set(root.dir = '/work/CogSci_Methods01/data')
```
```{r, include=FALSE, warning=FALSE}
# loading packages
pacman::p_load(tidyverse)
library("lme4")
library("lmerTest")
library("boot")
library("caret")
library("e1071")
# clear environment.
rm(list=ls()) # clears global workspace.
```
```{r, include=FALSE, warning=FALSE}
# Loading the cake data
cake_data <- read_csv("cake.csv")
```
# Analysis 1 (LSV)
R (R Core Team, 2019) was used to perform a linear mixed effects analysis in order to find the best model for predicting the breakage angle of chocolate cakes with the variables 'recipe', 'temp' and 'replicate'.
We compared different theoretically justifiable models and chose the best one by the lowest AIC score, where recipe and temp were used as fixed effects since they were expected to be the predictors of influence on the DV (Angle). Intercept for replicate was used as random effect to account for the variation in DV (Angle) between the replicates. It has the following syntax:
lmerTest::lmer(Angle ~ temp + (1 + recipe|replicate))
The whole model accounted for roughly 70% of variance in cake breakage angle (*R2 conditional = 0.70*), while the fixed effect for roughly 11% (*R2 marginal = 0.106*). Our residual plot (*figure 1*) below reveals no obvious deviations from linearity or homoscedasticity. The breakage angle of the cake is significantly modulated by the recipe used for the cake as well as the temperature with which the cake was baked, *beta = 0.158, SE = 0.016, t = 9.8, p < .001*.
```{r, include=FALSE, warning=FALSE}
m1 <- lmerTest::lmer(angle ~ recipe + temp + (1|recipe:replicate), data = cake_data, REML = F)
m2 <- lmerTest::lmer(angle ~ recipe * temp + (1|recipe:replicate), data = cake_data, REML = F)
m3 <- lmerTest::lmer(angle ~ recipe + temp + (1|replicate), data = cake_data, REML = F)
m4 <- lmerTest::lmer(angle ~ recipe + (1|replicate), data = cake_data, REML = F)
m5 <- lmerTest::lmer(angle ~ temp + (1|replicate), data = cake_data, REML = F)
m6 <- lmerTest::lmer(angle ~ temp + (1|recipe:replicate), data = cake_data, REML = F)
m7 <- lmerTest::lmer(angle ~ temp + (1 + recipe|replicate), data = cake_data, REML = F)
AIC(m1, m2, m3, m4, m5, m6, m7) %>%
arrange(AIC)
```
```{r, include=TRUE}
# Testing visually for linearity and homoscedasticity of the residuals
plot(m7, main = "Figure 1")
```
```{r, include=TRUE}
# Making table with summary output of the model with lowest AIC score
summary(m7)
```
```{r, include=FALSE, warning=FALSE}
# Calculating R scores
MuMIn::r.squaredGLMM(m7)
```
```{r, include=FALSE}
# Making it pretty
r2 <- c(0.1062652, 0.7023687)
R2 <- c("R2 marginal", "R2 conditional")
r2_df <- data.frame(R2, r2)
knitr::kable(r2_df, caption = "R2 scores", digits = c(4,4,4,4,30))
```
# Analysis 2 (SBT)
A generalized linear model was made to predict the survival rate of the passengers on the RMS Titanic by the following variables; their age, gender and the class they were travelling on. The syntax of the model:
base::glm(Survived ~ Age + Sex + Pclass)
The summary output from our model (*table 2*) shows the intercept *beta = 3.63492*, given Sex = female, Pclass = 1, Age = 0. The other coefficient estiamtes explain the change in log odds when the predictor variables increase with 1. E.g., this means that the older the passenger is (pr. year) the likelihood of surviving decreases with -0.034 in log odds scale.
The p-values for all the predictors of the model are extremely low (p<.05), which indicates that all the predictors have a statistically significant influence on our model. Also, the very low p-value at the *Chi-squared* value of 381.18 with 4 degress of freedom tells us that the model is very useful.
Furthermore, there were no significant violations of the assumptions of the model according to the DHARMa test.
```{r, include=FALSE, warning=FALSE}
# Loading the Titanic data
titanic <- read_csv("titanic.csv")
```
```{r, include=FALSE, warning=FALSE}
# Mutating survival, gender and class to make sure they are all factor
titanic$Survived <- as.factor(titanic$Survived)
titanic$Sex <- as.factor(titanic$Sex)
titanic$Pclass <- as.factor(titanic$Pclass)
```
```{r}
tmodel1 <- glm(Survived ~ Age + Sex + Pclass, data = titanic, family = binomial)
summary(tmodel1)
tmodel2 <- glm(Survived ~ Age, data = titanic, family = binomial)
tmodel3 <- glm(Survived ~ Sex, data = titanic, family = binomial)
tmodel4 <- glm(Survived ~ Pclass, data = titanic, family = binomial)
```
```{r, include = FALSE}
# checking for assumptions
simulationOutput <- DHARMa::simulateResiduals(tmodel1, quantreg=T, plot = T)
plot(simulationOutput)
plot(tmodel1)
```
The probability of surviving for gender (at the median age of 27 for female and 28 for male passengers) on the different classes is shown in the table below (*table 3*), where females in general have a higher chance of surviving than males. This has been computed from the summary output by including the relevant beta coefficients in the following model:
boot::inv.logit(Beta_0 +(MeanAgeForGender * Beta_1)+ Beta_2 + (Beta_3 | Beta_4))
The highest survivability rate is for female 1st class passengers (*probability of surviving = 93.8%*) and the lowest survivability rate is for male 3rd class passengers (*probability of surviving = 8.6%*).
```{r, include =FALSE}
# Finding median age for both male and female
titanic %>%group_by(Sex) %>%
summarise(median(Age))
```
```{r, include=FALSE}
# calculating probabilities of surviving
F1 <- boot::inv.logit(3.63492 + (27 * -0.03427)) # 1st class female
F2 <- boot::inv.logit(3.63492 + (27 * -0.03427) -1.19911) # 2nd class female
F3 <- boot::inv.logit(3.63492 + (27 * -0.03427) -2.45544) # 3rd class female
M1 <- boot::inv.logit(3.63492 - 2.58872 + (28 * -0.03427)) # 1st class male
M2 <- boot::inv.logit(3.63492 - 2.58872 - 1.19911 + (28 * -0.03427)) # 2nd class male
M3 <- boot::inv.logit(3.63492 - 2.58872 - 2.45544 + (28 * -0.03427)) # 3rd class male
# creating a tibble with the probabilities
prob_tib <- tibble(Sex = c("Female", "Male"), "1st class" = c(F1*100, M1*100), "2nd class" = c(F2*100, M2*100), "3rd class" = c(F3*100, M3*100))
prob_tib
```
```{r, include=TRUE}
# Making nice table for suvival probabilities
knitr::kable(prob_tib, caption = "Survival probabilities (in %)", digits = 1)
```
The survival distribution across the variables class and gender is illustrated in the jitter plot below (*figure 2*).
```{r, include=TRUE, warning = FALSE}
Class <- titanic$Pclass
titanic %>%
ggplot(aes(x = Sex, y = Survived, col = Class))+
geom_jitter(shape = 16) +
geom_label(
label="Survived",
x=1.5,
y=1.6,
label.size = 0.2,
color = "black") +
geom_hline(yintercept = 1.5, color = "black", linetype = "dashed") +
geom_label(
label= "Deceased",
x=1.5,
y=1.4,
label.size = 0.2,
color = "black") +
labs(title = "Figure 2") +
theme_bw()
```
The distribution of age across gender, survival rate and class is illustrated in the plots below (*figure 3*).
```{r, include=FALSE, warning=FALSE}
#preparation for fancy plot
titanic_new <- titanic %>%
mutate(genderlive = case_when(Sex == "female" & Survived == 1 ~ 1,
Sex == "male" & Survived == 1 ~ 2,
Sex == "female" & Survived == 0 ~ 3,
Sex == "male" & Survived == 0 ~ 4))
as.factor(titanic_new$genderlive)
genderlive.names <- list(
'1'="Surviving women",
'2'="Surviving men",
'3'="Deceased women",
'4'="Deceased men")
genderlive_labeller <- function(variable,value){
return(genderlive.names[value])
}
```
```{r, include=TRUE, warning=FALSE}
# Making fancy plot
ggplot(titanic_new, aes(x = Age, color = Pclass, fill= Pclass)) + geom_dotplot(dotsize = 0.4, binwidth = 4) + facet_wrap(~genderlive, labeller= genderlive_labeller) + labs(title = "Figure 3")
```