# PSYC3361 Verification Report
## Did Social Connection Decline During the First Wave of COVID-19?: The Role of Extraversion
**Authors:** Saraa Al-Saddik
**Group:** Tuesday 1-3pm Group 5
**Date:** 7th August 2022
- [Article](https://online.ucpress.edu/collabra/article/6/1/37/114469/Did-Social-Connection-Decline-During-the-First)
**R code**:
- [Study 1 R code](https://osf.io/f7z5e?view_only=b6f77e38fbb54f839d6c2bd1fb8d41c2)
- [Study 2 R code](https://osf.io/bz6yk?view_only=14b462058b2745f8a51997a49f8b62e0)
**CSV files**:
- [Study 1 csv](https://osf.io/zj9fv?view_only=b6f77e38fbb54f839d6c2bd1fb8d41c2)
- [Study 2 csv](https://osf.io/6s94g?view_only=14b462058b2745f8a51997a49f8b62e0)
## Part 1 Summary and Reaction:
### Summary
COVID-19 is a recent crisis, and the psychological impact of the social behaviour changes that arose in response to it (chiefly social distancing) is a new area of research that has only just begun to be investigated.
During COVID-19, keeping a physical/social distance of six feet from anyone outside one's household became a well-known preventative measure. The question Folk and colleagues (2020) aimed to address is whether this change in social behaviour altered people's sense of social connection and wellbeing before versus during the pandemic. The question was then narrowed down to individual characteristics: specifically, whether the change in a person's social connectedness and wellbeing depended on their level of extraversion.
Thus, the study aimed to determine changes in social connectedness and wellbeing in introverts versus extraverts. It was an open question whether introverts' or extraverts' overall wellbeing/connectedness would be differentially affected by COVID-related changes in social behaviour. Extraverts are known to have stronger social relationships and therefore a stronger support network to help them through a crisis; however, an extravert's wellbeing also depends on their level of social connectedness. Introverts, meanwhile, may be less adversely impacted by changes in social behaviour because they already had fewer social interactions before the pandemic, and so may not show much of a difference in social connectedness and wellbeing before versus during the pandemic.
In Study 1, 467 undergraduate students from a Canadian university completed a survey at two timepoints, before the pandemic (T1) and during the pandemic (T2), with the same participants from T1 recruited again at T2. The survey contained demographic items and measures of social connectedness, lethargy (which acted as a proxy for subjective wellbeing) and extraversion.
Study 1 found that participants reported lower levels of social connectedness during the pandemic than before, although the drop was small in magnitude. Moreover, extraverted participants reported larger drops in social connectedness from T1 to T2 than introverted participants; however, when connectedness levels before the pandemic were controlled for, extraverts coped better than introverts. As an indicator of wellbeing, lethargy levels (the wellbeing proxy) increased from T1 to T2.
It was also found that changes in lethargy were significantly correlated with changes in social connectedness. Although the data provide insight into the psychological aspects of coping with the pandemic, and show that coping varies with individual differences, Study 1 had a non-representative sample: results obtained from undergraduate students cannot be generalised to the broader population. Thus, Study 2 was conducted.
In Study 2, 336 community adults (primarily from the US and UK) were likewise surveyed before and during the pandemic. Social connectedness was assessed using two scales: relatedness (using the Balanced Measure of Psychological Needs, BMPN) and loneliness. Life satisfaction (using the Satisfaction With Life Scale, SWLS) and each participant's pre-pandemic level of extraversion were also measured. In both Study 1 and Study 2, each participant's level of physical distancing was measured as well.
Study 2 found no difference in relatedness levels before (T1) versus during (T2) the pandemic. Moreover, participants reported feeling slightly, but significantly, less lonely. In relation to extraversion, the most introverted participants showed a significant decrease in loneliness, whereas the most extraverted showed no improvement. Finally, levels of life satisfaction did not change before versus during the pandemic.
Overall, both studies suggest that, on average, people experienced relatively small changes in social connectedness despite the behavioural changes caused by COVID. This suggests that even when a crisis limits our face-to-face interactions, we are able to adapt our social behaviour, largely by moving interactions online, to meet our need for social connection and belonging.
In relation to the overall relationship between extraversion and connectedness, Study 1 indicated that extraverts were worse off than introverts, while Study 2 indicated that introverts experienced less loneliness. However, once initial (pre-pandemic) social connectedness levels were controlled for, the effect of extraversion on connectedness reversed or disappeared. The conclusion is that extraverts appear to have fared worse simply because they had more social connections to lose than introverts, and so showed lower levels of social connectedness and, in turn, wellbeing. This again provides insight into how individual differences shape the way people cope with a crisis and how much coping abilities vary.
### Reaction:
*The paper reminded me of….* my own experience during the pandemic. It is true that individual differences can affect the psychological impact of changes in social behaviour. I classify myself as somewhere between an introvert and an extravert, so I found myself able to relate to both. My levels of loneliness and connectedness constantly alternated: sometimes I was fine not socialising with anyone for weeks, and at other times I found myself constantly going to the park just to see people outside of my household.
*I wonder whether… the authors should have investigated* other measures of individual differences, such as household composition. Although extraversion can be an important factor, it would also be worth comparing individuals who live on their own with those who live with family (I live in a family of six). In that case, all of the measured variables, particularly loneliness (refer to the figure below), should differ substantially, and significantly, based on an individual's household composition. Individuals who live alone and are not in a relationship would be at higher risk of loneliness, reduced wellbeing, and increased psychological distress, such as depression (Flood, 2005; Lauder et al., 2004; Relationships Australia, 2011). This is important to investigate because it would let us identify higher-risk individuals and provide them with the mental health support they need, given that many people may still be experiencing the pandemic's aftereffects.
**Graph (ABS, 2020):**

*I was surprised that…* changes in relatedness did not vary before versus during the pandemic. As relatedness was measured with items such as "I felt close and connected with people who are important to me", I would have thought that being quarantined with your household members while going through a crisis would have made people feel closer and more connected. Again, this ties into the household composition point, since not everyone lives with their family; due to rules such as not being allowed to visit other households, some individuals may have only been able to communicate with family members online and therefore felt a sense of disconnection. Yet there was essentially no change at all, with a mean change in relatedness score of M = -0.01. So although relatedness decreased slightly, the change was definitely not significant.
## Part 2:
### Verification goals
The goal of the verification section of this report was to reproduce the demographic descriptives, the means/SDs reported in the text, and all figures from the article. This was done to get a better sense of the challenges that come up when reproducing other researchers' work; these challenges are highlighted throughout the report.
### Demographic statistics reported in the study:
#### Study 1:

#### Study 2:
- Time 1:

- Time 2:

### Means/SD reported in the text:
#### Study 1:
In-text results:
- Physical/Social distancing:

- Changes in social connectedness due to COVID:

- Social connectedness x extraversion levels:

- Lethargy levels (wellbeing proxy):

Table results:
- Mean and SD reported in table 1:

#### Study 2:
In-text results:
- Physical/Social distancing:

- Extraversion levels:

- Change in relatedness due to COVID:

- Change in loneliness levels due to COVID:

- Loneliness levels x extraversion levels:

- Changes in life satisfaction levels due to COVID:

Table results:

### Figures reported in the text:
#### Study 1:
Figure 1:

#### Study 2:
Figure 2:

Figure 3:

# Steps to replicating data:
### Locate the open data
The first step was to locate the open data for both studies, which was easily done by following the OSF links under the Methods section of each study.
The OSF Repositories are:
- [Study 1](https://osf.io/zj9fv?view_only=b6f77e38fbb54f839d6c2bd1fb8d41c2)
- [Study 2](https://osf.io/6s94g?view_only=14b462058b2745f8a51997a49f8b62e0)
## Code book:
We then located the codebook for each study so we could see how each variable was labelled:
- Study 1: 
- Study 2: 
# Replication process:
As mentioned earlier, there were two studies in this article, and therefore two separate csv files, one per study. Replication for each study was carried out independently, starting with Study 1.
### Packages and reading data
```
# Read the Data STUDY 1-----------------------------------------------------------
# install the packages
install.packages("tidyverse") # for most of the functions needed
install.packages("dplyr") # for efficient data manipulation
install.packages("lsr") # used to compute mode
install.packages("psych") # used to create table1
install.packages("cowplot") # to combine ggplots into a grid
install. packages ("ggplot2") # data visualisations
# load the packages
library(tidyverse)
library(lsr)
library(psych)
library(dplyr)
library(cowplot)
library(ggplot2)
```
Note that we knew which packages to load by looking at the R code the authors provided in the OSF repository, keeping the ones we deemed necessary for reproducing the data.
# Demographic statistics:
## Study 1:
### Reading Study 1 data
Raw Study 1 data from the OSF repository was saved as "study1_data.csv" in my working directory. The .csv format allowed the data to be read into the Rmd file using the read_csv() function, which imports the data into R as a tibble (a tidyverse-style data frame that is easy to manipulate and print).
The data is then named "study1_raw" by assigning the output of read_csv() to that name with <-.
```
# read the data
study1_raw <- read_csv(file = "study1_data.csv")
```
#### Exclusion criteria:
Lucky for us, there was no issue reading the data: the exclusion criteria had already been applied for both studies, so there were 467 rows of data, one row for each participant who met the inclusion criteria, as mentioned earlier. This made our job easier because, when we first read the exclusion criteria, we had a mini panic attack on seeing the following:

Although the authors didn't set out to make our replication journey easier, after seeing the struggles other groups encountered when applying their exclusion criteria, we were lucky to have skipped this obstacle and could go straight to reproducing the demographics.
#### Starting the actual process:

##### Gender identity of participants
Another blessing was that the authors recorded gender as "man", "woman", "other" or "decline to answer", so we did not have to rename any variables and easily reproduced the gender percentages. As a beginner to coding, Google immediately became my best friend from the first step. After a quick search for 'how to find percentage in R', the recommendation was that the easiest way was the dplyr package, using this (adapted) example code:
```
library(dplyr)
study1_raw %>%
group_by(Gender) %>%
summarise(percent = 100 * n()/nrow(study1_raw)) #Success
```
```
Gender percent
<chr> <dbl>
1 [Decline to Answer] 0.428
2 Man 21.6
3 Other 0.857
4 Woman 77.1 # study reported women % = 77%... YES!
```
#### Age descriptives (or as we call it... the 'Age Dilemma'):
This section was our *main challenge* in the demographics section of the replications.
So initially we decided to filter out the “decline to answer” responses (where people did not want to give researchers their ages) and use the summarise() function to find the mean age and SD of participants:
```
#average age attempt 1
study1_raw %>%
filter(age != "[Decline to Answer]") %>% # filter out the “decline to answer” responses
summarise(mean(age), sd(age))
#FAIL (would work if age was numeric)
```
To which the output was:
```
`mean(Age)` `sd(Age)`
<dbl> <dbl>
1 NA 3.03
Warning message:
In mean.default(Age) : argument is not numeric or logical: returning NA # Age needs to be a numeric variable.
```
BUTTTT.... we got the SD (YAY!), so we know the summarise() function is working. Now to make Age a numeric rather than a character variable. In my first attempt, the suggested process for converting to numeric started with checking the class type:
```
# check variable type
class(study1_raw) # supposed to be getting [1] "character"???? (oops — we ran class() on the whole tibble rather than study1_raw$Age)
```
```
[1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
```
and then to use recode() from the dplyr package, then ifelse() to convert from categorical to numerical, then the match() function to turn the character variable into a numeric one. However, after seeing the output of that first line (and being heavily confused as to what I was supposed to type in the brackets, based on the example code on this [website](https://universeofdatascience.com/how-to-recode-character-variables-in-r/)), I abandoned that method pretty quickly and ran off to find an easier one. What did pique my interest, though, is how recoded variables can be converted between types using the as.numeric() and as.character() functions.
So then began the process of combining different methods of filtering the "Decline to Answer" responses and trying to convert the age variable into a numeric one:
```
#convert age to numeric variable attempt 2
age.subset <- age.subset %>%
as.numeric(as.character(age)) #FAIL
#attempt 3
demographics %>% mutate(new_age = age != "[Decline to Answer]") # use dplyr to create a new variable ('new_age') #FAIL
#attempt 4
age.subset %>% mutate(age_test = numeric(age.subset)) #FAIL
```
It was just a repetitive process of trial and error until eventually we figured out the amazing $. This allowed us to extract the Age column from the study1_raw data, and we then converted Age into a numeric variable using the as.numeric() function. We were now able to calculate the age mean and SD, and by using na.rm we removed the "Decline to Answer" (now NA) values:
```
#THE FIX
study1_raw$Age <- as.numeric(study1_raw$Age)
#Converts "chr" to numeric form.
study1_age <- study1_raw %>%
summarise(mean_age = mean(Age, na.rm = TRUE),
sd_age = sd(Age, na.rm = TRUE),
min_age = min(Age, na.rm = TRUE),
max_age = max(Age, na.rm = TRUE)) #SUCCESS
```
Output:
```
mean_age sd_age min_age max_age
<dbl> <dbl> <dbl> <dbl>
1 20.9 3.03 17 44 # Study reported that mean age = 20.89, SD = 3.03.
```
Although the minimum and maximum age were not reported in the article, we decided to flex our newfound abilities ;)
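Side note: in hindsight, the mutate()-based route we fumbled with earlier does work. Here is a sketch (not what we actually ran), assuming Age is still the raw character column containing "[Decline to Answer]":
```
# A sketch of the mutate()-based alternative (assumes Age is still a character
# column in which missing ages are recorded as "[Decline to Answer]")
library(dplyr)
study1_age_alt <- study1_raw %>%
  mutate(Age = as.numeric(na_if(Age, "[Decline to Answer]"))) %>% # non-numeric entries become NA
  summarise(mean_age = mean(Age, na.rm = TRUE),
            sd_age = sd(Age, na.rm = TRUE))
```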
## Study 2:
We again started off by loading the relevant packages and reading the data.
```
# load packages
library(tidyverse) #for most of the functions needed
library(dplyr) # tools for efficient data manipulation
# read the data
study2_raw <- read_csv(file= "Study 2.csv")
```
Then we began the replication process starting with Time 1.
### Time 1:

As mentioned for Study 1, the exclusion criteria had already been applied, so we only have data for the N = 336 participants who "completed both the Time 1 and Time 2 surveys". Because we were missing the 50 rows of participants who only completed Time 1, we were unable to reproduce the demographic statistics at Time 1, only at Time 2.
### Time 2:

As we had already reproduced the age and gender demographics in Study 1, we figured it would be easiest to start with those (and we were right: no problems). For gender, we assigned a new variable (which we called 'study2_gender') in the environment from study2_raw. Using the dplyr package, we used the group_by() function to group the data frame by the gender variable, then the summarise() function to create a new data frame with one row per gender group (as seen in the output below: 'Male', 'Female', 'Non-Binary'). We then calculated the percentage for each group and sorted them in descending order to determine the most common gender among participants:
#### Gender:
```
study2_gender <- study2_raw %>% #Ensures a new table is saved to the environment
group_by(Gender) %>% #Groups the data based on the specified variable.
summarise(n = n(), #Counts the number of unique instances for the variable.
Percent = 100 * n()/nrow(study2_raw)) %>% #Calculates the percentage - representative percentage for the instance (100 x number of instances/total rows)
arrange(desc(n)) #arranges from largest to smallest
```
Output:
```
Gender n Percent
<chr> <int> <dbl>
1 Male 184 54.8 # close enough if rounded, i guess?
2 Female 151 44.9
3 Non-Binary 1 0.298
```
For the age variable, we realised that there were no "Decline to Answer" responses (thank god!), so the age variable was already numeric. We therefore did not encounter the issues faced in Study 1 and could simply assign 'study2_age' from study2_raw and calculate the means as usual using the summarise() function from dplyr.
#### Age:
```
study2_age <- study2_raw %>%
summarise(mean_age = mean(Age),
sd_age = sd(Age),
min_age = min(Age),
max_age = max(Age))
```
```
mean_age sd_age min_age max_age
<dbl> <dbl> <dbl> <dbl>
1 32.0 11.9 18 72 # SUCCESS (or close enough when rounded)
```
Next we needed to reproduce the percentages of the ethnicities and countries that participants belonged to. Since these were percentages, we adapted the gender-percentage method from Study 1.
We assigned the variables ('study2_ethnicity' or 'study2_country') from 'study2_raw' to create a new table in the environment with the relevant percentages for each variable. The data were again arranged in descending order, using arrange(desc(n)) from the dplyr package, so that the percentages run from the most to the least common participant ethnicity/country:
#### Ethnicity:
```
study2_ethnicity <- study2_raw %>% #Identical code used here as was used for gender.
group_by(Ethnicity) %>%
summarise(n = n(), Percent = 100 * n()/nrow(study2_raw)) %>%
arrange(desc(n))
```
```
Ethnicity n Percent
<chr> <int> <dbl>
1 White 268 79.8 # SUCCESS
2 Asian 30 8.93
3 Hispanic 14 4.17
4 More than one 10 2.98
5 Black 9 2.68
6 Middle-Eastern 3 0.893
7 American Indian/Alaskan Native 1 0.298
8 Other 1 0.298
```
#### Country:
```
study2_country <- study2_raw %>% #Identical code used here as was used for gender.
group_by(Country) %>%
summarise(n = n(), Percent = 100 * n()/nrow(study2_raw)) %>%
arrange(desc(n))
```
```
Country n Percent
<chr> <int> <dbl>
1 USA 104 31.0 # not 32%???? not sure why (*cue OCD*)
2 UK 90 26.8 # close enough :)
3 Poland 29 8.63
4 Portugal 23 6.85
5 Canada 15 4.46
6 Greece 11 3.27
7 Italy 10 2.98
8 Germany 6 1.79
9 Mexico 6 1.79
10 Slovenia 5 1.49
# … with 19 more rows
```
Although we did not get exactly 32% for participants from the USA, it was close (104/336 is 31.0%, so rounding alone can't quite get it to 32%; we suspect a small difference in how the authors calculated it, but we can't be sure). "May have been rounding errors?" is something we will say a lot, as values often came out very close but not exactly the same.
#### Marriage stats:
The main issue we encountered when reproducing the Study 2 demographics is that the authors did not provide their marriage data, so we were unable to reproduce the '45% single/never married' statistic (which felt random, since they provided everything else). Samuel decided to email the authors asking for the marriage data. Dunigan Folk was emailed first and, unexpectedly, responded; however, he directed us to email his co-author Dr Okabe-Miyamoto, as she had the data.

However, we never got a response and therefore could not replicate the data. So on to Stage 2: MEAN/SD!
# Mean/SD:
## Study 1:
Having identified the means/SDs that needed reproducing, we decided to start with physical distancing:
#### Physical distancing:

As before, the physical distancing percentage was produced by adapting the original gender-percentage code, so we were able to reproduce it easily. Again, we assigned the 'study1_socialdistancing' variable from study1_raw, grouped the data by the 'SocialDistancing' variable, and then used the summarise() function from dplyr to determine the percentage of people who physically distanced:
```
# physical/social distancing
study1_socialdistancing <- study1_raw %>% # creates a new variable in the environment
group_by(SocialDistancing) %>% # groups data by social distancing
summarise(n = n(), # counts the number of instances
percent = 100 * n()/nrow(study1_raw)) # find the percentage
```
```
SocialDistancing n percent
<dbl> <int> <dbl>
1 1 460 98.5 # % who socially distanced #YAY
2 3 7 1.50 # did not socially distance
```
Woohoo! Next we reproduced the six-feet distancing statistics. Here I had to resort to Google to work out how a mode can be calculated. It was suggested to install the modeest package. After assigning variables, mlv() was used as a generic function to estimate the mode of the study1_sixfeet data, with the "mfv" (most frequent value) method returning the mode:
```
# attempt 1
library(modeest)
study1_sixfeet <- study1_raw %>%
mlv(study1_sixfeet, method = "mfv") # FAIL
```
However, the following error message was produced:
```
Error in mlv.default(., study1_sixfeet, method = "mfv") :
is.numeric(x) is not TRUE
```
After pasting the error message into Google, it was suggested to use shapiro.test() to check whether the data were normally distributed:
```
#edit:
library(modeest)
study1_sixfeet <- study1_raw
shapiro.test(study1_sixfeet)
mlv(study1_sixfeet, method = "mfv")
```
```
Error in shapiro.test(study1_sixfeet) : is.numeric(x) is not TRUE
> mlv(study1_sixfeet, method = "mfv")
Error in mlv.default(study1_sixfeet, method = "mfv") :
is.numeric(x) is not TRUE
```
I felt this method was too complicated just to calculate a mode, so I changed tactics. As R does not have a built-in mode function, another website suggested writing a user-defined function that takes a vector as input (here, the SixFeet column) and returns its mode:
```
#attempt 2:
# Create the function
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Calculate mode using the user function on the SixFeet column
six_feet <- getmode(study1_raw$SixFeet)
print(six_feet)
# Create vector and summarise()
study1_sixfeet <- study1_raw %>%
summarise(mode_six_feet = getmode(SixFeet)) # SUCCESS
```
Output:
```
# A tibble: 1 × 1
mode_six_feet
<dbl>
1 0 # SUCCESS
```
However, at a group meeting I realised that although Samuel and I had calculated the mode in a similar way, Georgia had written a much more efficient version by installing the lsr package and using its modeOf() function. She was also able to calculate the mode while simultaneously calculating the mean and SD:
```
install.packages("lsr") # used to compute mode
six_feet <- study1_raw %>% # creates a variable from raw data
summarise(modeOf(SixFeet), mean(SixFeet), sd(SixFeet)) # calculates mode, mean and sd
```
However, when I tried to run the code in my R console, I kept receiving an error message:
```
! could not find function "modeOf"
```
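In hindsight, the error was most likely just because I had only installed lsr and never loaded it; my guess is that Georgia's code runs fine once the library() call is added:
```
library(lsr) # load (not just install) the package that provides modeOf()
six_feet <- study1_raw %>%
  summarise(modeOf(SixFeet), mean(SixFeet), sd(SixFeet))
```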
Luckily for me, while doing the Study 2 mode, I came across a more efficient way to calculate the mode, mean and SD in a single call:
```
study1_sixfeet <- study1_raw %>%
summarise(mode_six_feet = getmode(SixFeet),mean(study1_raw$SixFeet), sd(study1_raw$SixFeet))
```
Output:
```
# A tibble: 1 × 3
mode_six_feet `mean(study1_raw$SixFeet)` `sd(study1_raw$SixFeet)`
<dbl> <dbl> <dbl>
1 0 0.767 1.39 # YAY
```
Overall, this was a lesson in the beauty of collaborating rather than purely dividing and conquering: it is one thing to reproduce the data successfully, but the ultimate goal is to reproduce it efficiently. It also shows that there is more than one correct way to reproduce the same value, and that working with your team members brings diverse and new approaches.
For the following reproduced values, we aimed to reproduce them using both the in-text and table 1 values.
#### Table 1 and Lethargy:
Next we decided to reproduce Table 1, as it also included most of the in-text values:

We started by creating a new data frame, study1_table1, from the study1_raw data, using the select() function from dplyr to drop the unnecessary columns and keep only the variables reported in Table 1 (as shown below):
```
study1_table1 <- study1_raw %>% # creating a new data frame
select(LETHAVERAGE.T1, # this data frame only includes the variables that are in table 1
LETHAVERAGE.T2,
LethDiff,
SCAVERAGE.T1,
SCAVERAGE.T2,
SCdiff,
EXTRAVERSION)
```
We then added another 'table1' variable to the environment, using the describe() function from the psych package to get descriptive statistics for the 'study1_table1' dataset.
```
library(psych)
table1 <- describe(study1_table1)
```
However, we realised that it gives you *all* of the varying types of descriptives:

Since we only wanted means and SDs, we again used the select() function to keep only the mean and sd columns. All of the values matched those given in Table 1.
```
table1 <- describe(study1_table1) %>% # create a table that summarises table 1 variables
select(mean, sd) # only include means and sd's
```
```
mean sd
LETHAVERAGE.T1 2.60 1.16
LETHAVERAGE.T2 3.16 1.27
LethDiff 0.56 1.33
SCAVERAGE.T1 4.11 0.88
SCAVERAGE.T2 3.97 0.85
SCdiff -0.14 0.71
EXTRAVERSION 4.17 1.01 #SUCCESS
```
#### Social connectedness x extraversion levels:

As soon as I saw quantiles... I knew I was finished. So I decided to look at the authors' R code (in the OSF repository) to see how they had done it. Here is their method:
##### Most introverted x social connectedness:
```
#Have most introverted participants experienced changes in connectedness
quantile(study1_table1$EXTRAVERSION) #finding quartiles
BottomQuarter<- subset(study1_table1, EXTRAVERSION <= 3.41667) #Making subset of data with only participants in bottom quartile of extraversion
BottomQuarter
BottomQuarter<-as_tibble(BottomQuarter)
BottomQuarter
mean(BottomQuarter$SCAVERAGE.T2)
sd(BottomQuarter$SCAVERAGE.T2)
mean(BottomQuarter$SCAVERAGE.T1)
sd(BottomQuarter$SCAVERAGE.T1)
```
Output:
```
> mean(BottomQuarter$SCAVERAGE.T2)
[1] 3.351172 # mean for most introverted at Time 2
> sd(BottomQuarter$SCAVERAGE.T2)
[1] 0.655226 # SD for most introverted at Time 2
> mean(BottomQuarter$SCAVERAGE.T1)
[1] 3.446572 # mean for most introverted at Time 1
> sd(BottomQuarter$SCAVERAGE.T1)
[1] 0.7024337 # SD for most introverted at Time 1
# SUCCESS... they all match the in-text reported data (making life so much easier buttt...huh?)
```
Although the code worked for both the most introverted and most extraverted participants, it was definitely far from easy to read and understand, especially for someone as helpless at coding as me. So we decided to develop our own method.
#### Our way:
The first step was to determine the quantiles for extraversion using quantile() and filtering out the extraversion data using $:
```
# Introverts vs Extraverts -----------------------------------------------
# quantile split by score (quantile cutoff)
study1_quantile <- quantile(study1_raw$EXTRAVERSION) # find quantiles
```
```
0% 25% 50% 75% 100%
1.50000 3.41667 4.16667 4.83333 6.75000
```
After determining the quantiles, we created data subsets for the most introverted and most extraverted participants. We used select() to keep only the variables the authors used (extraversion and social connectedness at T1 and T2), then filtered the data so that only the most introverted or most extraverted participants remained in their respective subsets.
**Introverts**:
The most introverted participants were deemed to be those who fell between the 0% and 25% quantiles (i.e., the first quartile or below):
```
study1_introverts <- study1_raw %>% # create new variable using raw data
select(EXTRAVERSION, SCAVERAGE.T1, SCAVERAGE.T2) %>% # only including three variables
filter(EXTRAVERSION <= 3.41667) %>% # filtering out data so that only the most introverted (first quantile) are included in this subset
mutate(Type = "Most Introverted") # create new variable from data subset called "most introverted"
# this results in 119 people in this group ^ (as we can see in the 119 rows)
```
Output: we have 119 participants in the "Most Introverted" subset.

**Extroverts**:
The most extraverted participants were those who fell at the 75% quantile (third quartile) or above:
```
study1_extraverts <- study1_raw %>% # create a new variable using raw data
select(EXTRAVERSION, SCAVERAGE.T1, SCAVERAGE.T2) %>% # only including three variables of interest
filter(EXTRAVERSION >= 4.83333) %>% # only including the most extraverted (4th quartile cutoff)
mutate(Type = "Most Extraverted") # create new variable from data subset called "most extraverted"
# This results in 130 people in this group ^
```
Output: this left us with 130 participants in the "Most Extraverted" subset:

We then moved on to determining the mean and SD for each group (most introverted / most extraverted) using describe() and selecting the mean and sd columns:
**Introverts**:
```
# Finding the means and Sd's
introvert_meanSC <- describe(study1_introverts) %>% # new variable using most introverted data
select(mean, sd) # calculates only means and sds
```
Output:
```
introvert_meanSC
mean sd
EXTRAVERSION 2.89 0.39
SCAVERAGE.T1 3.45 0.70 #YES
SCAVERAGE.T2 3.35 0.66 # YES
Type* 1.00 0.00
```
**Extraverts**:
```
extravert_meanSC <- describe(study1_extraverts) %>% # new variable using most extraverted data
select(mean, sd) # calculates means and sds
```
Output:
```
extravert_meanSC
mean sd
EXTRAVERSION 5.42 0.47
SCAVERAGE.T1 4.70 0.72 # YES
SCAVERAGE.T2 4.45 0.84 # YES
Type* 1.00 0.00
```
Although this was the most challenging part of the mean/SD process, it showed me the importance of taking things step by step and dissecting the error messages that R provides.
## Study 2:
### Physical distancing:

Again, we started off with replicating the social distancing descriptives. Here, we used the same method as Study 1:
```
study2_socialdistancing <- study2_raw %>% # creates a new variable in the environment
group_by(SocialDistancing) %>% # groups data by social distancing
summarise(n = n(), # counts the number of instances
percent = 100 * n()/nrow(study2_raw)) # find the percentage
```
However, my value was different to what was reported in the text:
```
# A tibble: 2 × 3
SocialDistancing n percent
<int> <int> <dbl>
1 0 23 6.85
2 1 313 93.2 # not 92.9%???
```
After running my other group members' code, they had also received the same output, so it seems to be another small discrepancy on the authors' part.
Next we replicated the six feet distancing values again using the same code as in Study 1:
```
# Create vector and summarise()
study2_sixfeet <- study2_raw %>%
summarise(mode_six_feet = getmode(SixFeet),mean(study2_raw$SixFeet), sd(study2_raw$SixFeet))
```
Output:
```
mode_six_feet mean(study2_raw$SixFeet) sd(study2_raw$SixFeet)
1 0 1.116071 1.753864 #SUCCESS (more efficient code that produces mode, mean and sd but.....)
```
Why is the SD = 1.75 rather than 0.75? When I compared my group members' code to mine, they had also received the same result (and theirs looked a lot more efficient than mine). We then decided to run the authors' R code:
```
# Social Distancing
install.packages("psych")
install.packages("plyr R")
count(study2_raw$SixFeet)
describeBy(study2_raw$SixFeet))
```
Although I tried running the code multiple times and made sure I had installed the right packages, typed the commands correctly, etc., I could not get rid of the following error messages:
```
Error in UseMethod("count") :
no applicable method for 'count' applied to an object of class "c('integer', 'numeric')"
or
Error in describeBy(study2_raw$SocialDistancing) :
could not find function "describeBy"
```
and having searched the internet for ways to fix the error (maybe describe.by? nope), the authors' code still produced SD = 1.75 for my group members when they ran it, so I will take their word for it to reduce my anxiety.
It really is a crisis :)
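For completeness, our best guess (untested) is that the authors' snippet fails for us because psych and plyr were installed but never loaded, so R falls back on dplyr's count(), which doesn't accept a bare vector. Something along these lines should run (note we swapped describeBy() for describe(), since no grouping variable is supplied):
```
# Our guess at a runnable version of the authors' snippet (assumes the intended
# packages were plyr and psych, loaded rather than just installed)
library(plyr)   # provides count(), which accepts a plain vector
library(psych)  # provides describe()/describeBy()
# (loading plyr after dplyr masks a few dplyr functions; fine for this quick check)
count(study2_raw$SixFeet)     # frequency table of SixFeet responses
describe(study2_raw$SixFeet)  # mean, sd, etc. for SixFeet
```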
### Table 3:

Next we decided to replicate Table 3, as it also contained the in-text values (or so we thought). We started by using the describe() function to calculate the descriptive statistics and the select() function to keep only the mean/sd output, specifying the variables listed in Table 3:
```
library(psych)
study2_summary <- describe(study2_raw[, c('T1SWLS', #T1 life satisfaction
'T2SWLS', # T2 life satisfaction
'SWLS_Diff', # Life satisfaction change
'T1BMPN', # T1 Relatedness score
'T2BMPN', #T2 Relatedness score
'BMPN_Diff', # Relatedness difference score
'T1Lonely', # T1 Loneliness
'T2Lonely', # T2 Loneliness
'Lonely_Diff', # Loneliness change (T2-T1)
'T1Extraversion')]) %>% # Extraversion level pre-pandemic
select(mean,sd)
```
Output:
```
mean sd
T1SWLS 3.97 1.53
T2SWLS 3.99 1.45
SWLS_Diff 0.02 0.88
T1BMPN 4.92 1.09
T2BMPN 4.91 1.14
BMPN_Diff -0.01 1.11
T1Lonely 2.12 0.65
T2Lonely 2.06 0.62
Lonely_Diff -0.06 0.40
T1Extraversion 3.90 0.79 # All the same as table 3 values! Yay!
```
Now, while my output matched the table values, an issue we found is the discrepancy between the table and the in-text values:
- Change in relatedness due to COVID:

For example, the relatedness score reported in the text is "relatedness prior to the pandemic (Time 1: M = 4.90, SD = 1.11)", whereas in the table the reported mean is M = 4.92 and the reported SD is 1.09 (not 1.11). This inconsistency was repeated for relatedness at T2, the loneliness scores, etc. It caused us a lot of unnecessary frustration: while Samuel was using the in-text values, Natasha and I were using the table values and were confused as to how he was getting them "wrong". As we said in our presentation, this brought into question the reliability of the peer review process.
### Exploratory analysis: Extraversion x Loneliness:
For this exploratory analysis, I tried to use the authors' code to see whether it would work as well; however, I ran into the same issue of finding their method too difficult to follow, so I used our method again. Once again, I found the quantiles for Study 2 extraversion.
```
# STUDY2: Split Quantile by scores
study2_quantile <- quantile(study2_raw$T1Extraversion)
```
Output:
```
0% 25% 50% 75% 100%
2.083333 3.333333 3.833333 4.416667 6.000000
```
I then created the subsets for 'most introverted' versus 'most extraverted', picking the variables of interest using select() to isolate the T1Extraversion, T1Lonely and T2Lonely variables specified by the authors. I then filtered the rows by extraversion: the most introverted had to fall at or below the 25% quantile (<= 3.33), and the most extraverted at or above the 75% quantile (>= 4.416667).
```
# Create and filter lower quantile of extroverts (introverts)
library(dplyr)
study2_introverts <- study2_raw %>%
select(T1Extraversion, T1Lonely, T2Lonely) %>%
filter(T1Extraversion <= 3.33)
# There is 80 people as a result in this group (Quantile 1).
# Create and filter upper quantile of extroverts (most extroverted)
study2_extroverts <- study2_raw %>%
select(T1Extraversion, T1Lonely, T2Lonely) %>%
filter(T1Extraversion >=4.416777)
# This results in 83 people as being categorised as the most extroverted
# Calculate mean and sd's of each of the above subsets
library(psych)
introvert_meanL <- describe(study2_introverts) %>%
select(mean,sd)
```
```
introvert_meanL
mean sd
T1Extraversion 2.90 0.29
T1Lonely 2.53 0.63
T2Lonely 2.31 0.63 # T1Lonely is off by 0.02 for both mean and SD
```
Most Extraverted:
```
extrovert_meanL <- describe(study2_extroverts) %>%
select(mean,sd)
```
```
> extrovert_meanL
mean sd
T1Extraversion 4.95 0.34
T1Lonely 1.63 0.49 # T1Lonely is off by 0.02 for both mean and SD
T2Lonely 1.67 0.49
```
However, for Study 2 we were unable to reproduce exactly the same values (off by .02), and as much as I have tried, I am not sure why. One hunch (unverified) is that our typed cutoffs (<= 3.33 and >= 4.416777) are not quite the exact quartile values (3.333333 and 4.416667), so anyone sitting exactly on a boundary would be dropped from a subset. Once again... what a crisis :)
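If we had had time, a quick (hypothetical) check of that hunch would have been to redo the subsets with the exact quartile values returned by quantile() and see whether the group sizes change:
```
# Hypothetical check of the cutoff hunch: use the exact quartile values
study2_introverts_check <- study2_raw %>%
  select(T1Extraversion, T1Lonely, T2Lonely) %>%
  filter(T1Extraversion <= study2_quantile[["25%"]])
study2_extroverts_check <- study2_raw %>%
  select(T1Extraversion, T1Lonely, T2Lonely) %>%
  filter(T1Extraversion >= study2_quantile[["75%"]])
nrow(study2_introverts_check) # compare with our 80
nrow(study2_extroverts_check) # compare with our 83
```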
# Figures:
## Study 1:
### Figure 1:

When replicating Figure 1, I started by identifying what type of graph it was (a histogram), then simply searched 'how to create a histogram in R' and voila: an amazing [youtube video](https://www.youtube.com/watch?v=tp_BG5wDeVU&t=112s) took me step by step through making a histogram, producing the following code using the base hist() function:
```
figure1 <- study1_raw
hist(study1_raw$SCdiff, # to make histogram
breaks = 13, # number of bars in histogram
name = "Distribution of Social Connectedness Difference Scores")
```
However, the video did not cover how to label the axes or how to title the graph (which I tried to fluke by writing name =, but as you can guess that did not work, so Google it is!). A quick search told me to use the main, xlab and ylab parameters:
```
#attempt 2
figure1 <- study1_raw
hist(study1_raw$SCdiff, # create a histogram for SCdiff
breaks = 13, # number of bars in the histogram
main = "Distribution of Social Connectedness Difference Scores", # name of graph
xlab = "Social Connectedness Difference Score (T2-T1)", # x axis label
ylab = "Frequency") # y axis label
#SUCCESS
```
This produced the following graph:

Yes, they look similar! (Never have I been more grateful that the authors did not try to be fancy by adding different colours.)
However, when liaising with my team members, I found that they were not so lucky and had tried to use ggplot and the geom_histogram() function to produce the graph:
```
# Failed Attempts
figure1<- ggplot(study1_raw, aes(x = SCdiff)) +
geom_histogram(colour = "black" , fill = "grey", bins=12) +
ggtitle("Distribution of Social Connectedness Difference Scores") + # for the main title
xlab("Social Connectedness Difference Score (T2 - T1)") + # x axis label
ylab("Frequency") # y axis label
# doesnt work ^^
#still doesnt work
figure1 <- ggplot(study1_raw) + geom_histogram(
mapping = aes(x = SCdiff), colour = "black", fill = "grey", bins = 13)
```
Until eventually they had reached the same code I had:
```
# successful attempt:
figure1 <- hist(study1_raw$SCdiff, # create a histogram for SCdiff variable
main = "Distribution of Social Connectedness Difference Scores", # for the main title
xlab = "Social Connectedness Difference Score (T2 - T1)") # for the x axis label
```
Keep in mind that our approach was divide and conquer, so no, it was not the best idea: we could have saved a lot of time if we had worked on these together and communicated more consistently. It was also hard to see what other team members were doing in their RStudio, so we had to wait until they posted their code in the shared HackMD document before we could compare (you live and you learn). Not very efficient, but we got there in the end.
However, upon editing our code, we came across a more effective method using ggplot:
```
figure1<- ggplot(study1_raw, aes(x = SCdiff)) +
geom_histogram(colour = "black" , fill = "grey", breaks = seq(-4, 4, by = 0.5)) +
scale_x_continuous(breaks = seq(-3, 3, by = 1)) +
scale_y_continuous(expand = expansion(mult = 0, add = 0)) +
theme(
axis.line = element_line(colour = "black"),
panel.background = element_blank())+
ggtitle("Distribution of Social Connectedness Difference Scores") +
xlab("Social Connectedness Difference Score (T2 - T1)") +
ylab("Frequency")
```
Produced:

## Study 2:
### Figure 2:
We then began replicating Figure 2, starting with the "Distribution of Relatedness Difference Scores". We started off by making a bunch of mistakes for relatedness and then applied the same (eventually working) code to loneliness (why not just do what we did in Study 1? Don't ask).

### Relatedness:
Initially we loaded the packages, including ggplot2 for more complex plots, read the Study 2 data into 'study2data', and created a 'Relatedness_diff' data frame using dplyr's select() to keep only the 'BMPN_Diff' (relatedness difference score) column for plotting:
```
# Load packages
library(tidyverse)
library(ggplot2)
# Read the data
study2data <- read_csv(file= "Study 2.csv")
# Create data frame
Relatedness_diff <- study2data %>%
select(BMPN_Diff)
```
So we initially made the mistake of trying to use ggplot, specifically the geom_histogram() function, to create our graph (not sure why I thought it would work this time). We tried to add features such as colour and the number of bars:
```
relatedness <- ggplot(Relatedness_diff, aes(x = BMPN_Diff)) +
geom_histogram(colour = "black" , fill = "grey", bins = 18) +
ggtitle("Distribution of Relatedness Difference Scores") + # for the main title
xlab("Relatedness Difference Score (T2 - T1)") + # x axis label
ylab("Frequency") # y axis label
```
And yes it did work!

Or so we thought. Upon closer inspection, we saw that the frequencies of the difference scores in our graph look different from those in the study; this is particularly evident between -1 and 1, where our columns show more variability (most likely because geom_histogram() with bins = 18 places its bin edges differently from hist()'s default breaks). So we decided to use the Study 1 method.
Here, we used the hist() function to produce the distribution of relatedness difference scores and labelled the x and y axes as in Study 1. The process was then repeated for the loneliness half of Figure 2. We then looked for ways to combine the two graphs into a grid. We originally tried the cowplot package, but realised it only works with ggplots, so we set out to find a different method. We asked our tutor Yooki, who suggested using the par() function to place the two histograms side by side:
```
par(mfrow=c(1,2))
hist(study2_raw$Lonely_Diff, main = "Distribution of Loneliness Scores",
xlab = "Loneliness Difference Score (T2-T1)", xlim=c(-2,2), ylim=c(0,50), breaks = 40)
hist(study2_raw$BMPN_Diff, main = "Distribution of Relatedness Difference Scores",
xlab = "Relatedness Difference Score (T2-T1)", xlim=c(-6,4), ylim=c(0,80), breaks = 20)
```
This produced:

So...yay!
Upon editing our work, we decided we wanted a graph that allows for more customisation (via ggplot). This was mainly for smaller details, such as having the x and y axes joined and making the graph look more like the one in the original study. I have provided a breakdown of our steps in the code:
```
# Figure 2
library(ggplot2)
# Figure 2- Relatedness:
figure2_relatedness<- ggplot(study2_raw, aes(x = BMPN_Diff)) + # create a new graph based on relatedness scores
geom_histogram(colour = "black" , fill = "grey", breaks = seq(-6, 4, by = 0.5)) + # customise graph, seq() allows for x axis length to be customised, by = allows us to specify the number of columns we want per one unit on the x axis
scale_x_continuous(breaks = seq(-5, 4, by = 1)) + # specify x axis (from -5 to 4) with each x value being plotted a unit away (ie specify we want it as -5, -4, -3, etc)
scale_y_continuous(expand = expansion(mult = 0, add = 0)) + # customise space between data and axis using expansion (by making it = 0 we are indicating we want no gaps)
theme(
axis.line = element_line(colour = "black"), # customise axes
panel.background = element_blank())+ # remove graph background
ggtitle("Distribution of Relatedness Difference Scores") + # name of graph
xlab("Relatedness Difference Score (T2 - T1)") + # name axes
ylab("Frequency")
# Figure 2:- Loneliness
figure2_loneliness<- ggplot(study2_raw, aes(x = Lonely_Diff)) + # create a new graph based on loneliness scores
geom_histogram(colour = "black" , fill = "grey", breaks = seq(-2, 2, by = 0.1)) + # customise x axis to range from -2 to 2 as in the study and to have a column every .1 unit on the x axis
scale_x_continuous(breaks = seq(-2, 2, by = 1)) + # customise x axis
scale_y_continuous(expand = expansion(mult = 0, add = 0)) + # no gaps
theme(
axis.line = element_line(colour = "black"),
panel.background = element_blank())+
ggtitle("Distribution of Loneliness Difference Scores") +
xlab("Loneliness Difference Score (T2 - T1)") +
ylab("Frequency")
# Joint graph:
library(cowplot)
figure2<- plot_grid(figure2_relatedness, figure2_loneliness) # combine two ggplots in a grid
```
This produced:

This process showed me again that although it is great to have reproduced the original study's statistics, it is also important to go over your code and make sure it is the *best* way to reproduce them. It is always worth coming back to your original code to see if there are better (and perhaps more efficient) ways of reproducing the data.
## Figure 3:
Before I get into our final solution, I would like to explore some of the errors we came across (not all, as there are way too many). We initially underestimated what a nightmare this graph would be, so here are a couple of our failed attempts:
1.
```
figure3 <- ggplot(figure3_study1)+
geom_line(mapping = aes(
x = Time,
y = SCAVERAGE,
lty = Type,
group = Type)) # FAIL
```
Producing:

2.
```
# add in stat summary
figure3 <- ggplot(figure3_study1)+
geom_line(mapping = aes(
x = Time,
y = SCAVERAGE,
lty = Type,
group = Type)) + stat_summary(aes(x = Time, y = SCAVERAGE, group = Type)) # FAIL
```
Produced:

3.
```
# doesnt work
figure3 <- ggplot(figure3_study1)+
stat_summary(aes(x = Time, y = SCAVERAGE, group = Type)) +
geom_line(aes(x = Time, y = mean(SCAVERAGE), lty = Type, group = Type))
```
Produced:

At least the points were plotted? (In hindsight, the earlier attempts failed because we were feeding ggplot every individual observation rather than one mean per group.)
Anyway, we decided to go back and re-read the article's method. As soon as we saw that the sample was "subdivided" into "the top and bottom quartiles for extraversion", we knew to use the same method as in our previous exploratory analyses. This figure involved two panels, social connectedness and loneliness, and we started with social connectedness.
### Social connectedness:
Here we needed to use the study 1 data and determine the quantiles for extraversion levels. This was done using the previous methods:
```
# quantile split by score (quantile cutoff)
study1_quantile <- quantile(study1_raw$EXTRAVERSION)
```
```
0% 25% 50% 75% 100%
1.50000 3.41667 4.16667 4.83333 6.75000
```
We then created the data subsets for the most introverted and most extraverted participants, again using the same method as before. We also used the mutate() function to add a new 'Type' variable, because once we combine the data later it would be hard to distinguish who is most introverted and who is most extraverted (this makes the data easier to read and tell apart):
```
#STEP ONE - create data subsets that filter out the most introverted and the most extraverted adding a new variable that classifies each as either extravert/introvert
study1_introverts <- study1_raw %>% # create new variable using raw data
select(EXTRAVERSION, SCAVERAGE.T1, SCAVERAGE.T2) %>% # only including three variables
filter(EXTRAVERSION <= 3.41667) %>% # filtering out data so that only the most introverted (first quantile) are included in this subset
mutate(Type = "Most Introverted")
study1_extraverts <- study1_raw %>% # create a new variable using raw data
select(EXTRAVERSION, SCAVERAGE.T1, SCAVERAGE.T2) %>% # only including three variables of interest
filter(EXTRAVERSION >= 4.83333) %>% # only including the most extraverted (4th quartile cutoff)
mutate(Type = "Most Extraverted") # adding a variable so its easier to graph
```
Output:

Having worked on my exploratory analysis for a while, I knew that to visualise these data we needed to combine the two datasets (most introverted + most extraverted), since we cannot plot two separate datasets on the one graph. We used the bind_rows() function from dplyr to stack the two datasets:
```
study1_extreme <- bind_rows(study1_introverts, study1_extraverts)
```
Now we needed to separate this combined 'study1_extreme' data into pre- and during-COVID timeframes. Since this graph centres on social connectedness, we selected the SCAVERAGE scores and used mutate() to create a new variable labelled 'Before Pandemic'; once the timeframes are combined again, it would otherwise be hard to tell which rows are 'before pandemic' and which are 'during pandemic'. We also renamed SCAVERAGE.T1 to just SCAVERAGE:
```
# STEP THREE - create new variables that have separated the different time variables but includes both introverts and extraverts
study1_extreme_before <- study1_extreme %>% #creating a new variable using the extreme data (most introverted and extraverted)
select(SCAVERAGE.T1, Type) %>% #only including variables of interest --> added type using the mutate function
mutate(Time = "Before Pandemic") %>% # create a new variable that classifies the time period
rename(SCAVERAGE = SCAVERAGE.T1) # renaming so that this can be plotted on the y axis
```
The same method was repeated for 'during the pandemic' timeframe:
```
study1_extreme_during <- study1_extreme %>% # same but for the second time measurement
select(SCAVERAGE.T2, Type) %>%
mutate(Time = "During Pandemic") %>%
rename(SCAVERAGE = SCAVERAGE.T2)
```
Output:

We then bound the data together again (we cannot visualise multiple datasets on the one graph). This creates a dataset in which every row is categorised as 'Most Introverted' vs 'Most Extraverted' *and* 'Before' or 'During' the pandemic, with the corresponding social connectedness (SCAVERAGE) score:
```
figure3_study1 <- bind_rows(study1_extreme_before, study1_extreme_during) # combining the two variables created earlier into one data set stacked on top of each other
```
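A side note for future us (a sketch, not what we ran): tidyr's pivot_longer() could replace the whole separate-then-rebind dance, reshaping the two SCAVERAGE columns into long format in one step:
```
# Sketch of an alternative reshape using tidyr (assumes study1_extreme as created above)
library(tidyr)
figure3_study1_alt <- study1_extreme %>%
  select(Type, SCAVERAGE.T1, SCAVERAGE.T2) %>%
  pivot_longer(cols = c(SCAVERAGE.T1, SCAVERAGE.T2),
               names_to = "Time", values_to = "SCAVERAGE") %>%
  mutate(Time = ifelse(Time == "SCAVERAGE.T1", "Before Pandemic", "During Pandemic"))
```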
Now that all of the data were combined and categorised, we needed to determine the descriptives, particularly the group means, for each category:
```
# STEP FIVE create a summary table with all the SC means of the four groups
figure3_study1_summary <- figure3_study1 %>% # new variable
group_by(Type, Time) %>%
summarise(SCAVERAGE = mean(SCAVERAGE))
```
Output:
```
# A tibble: 4 × 5
# Groups: Type [2]
Type Time SCAVERAGE lower upper
<chr> <chr> <dbl> <dbl> <dbl>
1 Most Extraverted Before Pandemic 4.70 4.58 4.83
2 Most Extraverted During Pandemic 4.45 4.3 4.59
3 Most Introverted Before Pandemic 3.45 3.32 3.57
4 Most Introverted During Pandemic 3.35 3.23 3.47
```
Now we needed to isolate the variables again so we could build a graph that differentiates between before vs during the pandemic and most extraverted vs most introverted. That is, we created variables specific to each combination of extraversion level and time point. For example, we assigned a new variable called 'study1_extreme_before_introverts' from the 'study1_extreme_before' dataset (pre-pandemic values only) that keeps only the 'Most Introverted' participants, using filter():
```
study1_extreme_before_introverts <- study1_extreme_before %>%
filter(Type == "Most Introverted") # only want most introverted (and before the pandemic)
```
Output:

So we have created a variable that is specific to 'before pandemic' and only includes the 'most introverted' participants. We then converted this into a plain vector of values, since we only need the SCAVERAGE scores (no longer a full data frame). This was done using $ to pull out the SCAVERAGE column (our only variable of interest):
```
study1_extreme_before_introverts <- study1_extreme_before_introverts$SCAVERAGE
```
Output:

We can see that it is now just a string of values (not categorised).
This was repeated for each required variable:
- During pandemic + Most introverted:
```
# during pandemic - introverts
study1_extreme_during_introverts <- study1_extreme_during %>%
filter(Type == "Most Introverted")
study1_extreme_during_introverts <- study1_extreme_during_introverts$SCAVERAGE
```
- Before pandemic + Most extraverted:
```
# before pandemic - extraverts
study1_extreme_before_extraverts <- study1_extreme_before %>%
filter(Type == "Most Extraverted")
study1_extreme_before_extraverts <- study1_extreme_before_extraverts$SCAVERAGE
```
- During pandemic + Most extraverted:
```
# during pandemic - extraverts
study1_extreme_during_extraverts <- study1_extreme_during %>%
filter(Type == "Most Extraverted")
study1_extreme_during_extraverts <- study1_extreme_during_extraverts$SCAVERAGE
```
Now for the error bars! The figure caption stated that "95% CI error bars" were used, so we could not just use SE error bars. After several rounds of trial and error trying to work out the CIs ourselves, we eventually looked at the authors' code to see how they computed theirs, and used a slightly altered version for our dataset:
```
t.test(study1_extreme_during_introverts,
alternative = "two.sided", paired = TRUE)
```
To which we got an error:
```
Error in t.test.default(study1_extreme_during_introverts, alternative = "two.sided", :
'y' is missing for paired test
```
So I decided to remove the 'paired = TRUE' and see what that produced:
```
t.test(study1_extreme_during_introverts,
alternative = "two.sided")
```
Output:
```
data: study1_extreme_during_introverts
t = 55.793, df = 118, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
3.232228 3.470116
sample estimates:
mean of x
3.351172 # getting somewhere
```
So that was the method we used to determine the upper and lower limits of our CIs for each variable, using a one-sample rather than a paired-samples t-test:
- Before pandemic + Most introverted:
```
t.test(study1_extreme_before_introverts,
alternative = "two.sided")
```
- During pandemic + Most introverted:
```
t.test(study1_extreme_during_introverts,
alternative = "two.sided")
```
- Before pandemic + Most extraverted:
```
t.test(study1_extreme_before_extraverts,
alternative = "two.sided")
```
- During pandemic + Most extraverted:
```
t.test(study1_extreme_during_extraverts,
alternative = "two.sided")
```
We then created vectors that explicitly defined upper and lower CI values using c():
```
# STEP EIGHT - create data for upper and lower limits using the results from the t test above
figure3_study1_summary$lower <- c(4.579, 4.3, 3.319, 3.232) # creating a variable that contains the lower CI limits of the four groups
figure3_study1_summary$upper <- c(4.83, 4.59, 3.574, 3.47) #creating a variable that contains the upper CI limits of the four groups
```
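Looking back, a less manual option (a sketch, not what we ran) would be to compute the same normal-theory 95% CIs (mean ± t × SE, which is what t.test() reports) directly when building the summary table, instead of copying the limits in by hand:
```
# Sketch: compute group means and 95% CIs in one pipeline (assumes figure3_study1 as above)
figure3_study1_summary_alt <- figure3_study1 %>%
  group_by(Type, Time) %>%
  summarise(n = n(),
            sd_SC = sd(SCAVERAGE),
            SCAVERAGE = mean(SCAVERAGE),
            .groups = "drop") %>%
  mutate(lower = SCAVERAGE - qt(0.975, n - 1) * sd_SC / sqrt(n),   # lower 95% CI limit
         upper = SCAVERAGE + qt(0.975, n - 1) * sd_SC / sqrt(n))   # upper 95% CI limit
```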
Now that we were done with all the pre-work, we could finally graph it! Here I have provided comments to help follow along with the graph:
```
# READY TO PLOT THE GRAPH!!!
figure3_SC <- ggplot(figure3_study1_summary) + # used data frame used to find SCAVERAGE mean and SD for all extraversion levels and time points
geom_line(aes(x = Time, y = SCAVERAGE, lty = Type, group = Type))+ # draws the lines, lty specifies the line type (line type varies with extraversion type)
geom_point(aes(x = Time, y = SCAVERAGE, group = Type))+ # add the points of the SC means for each extraversion type
ylim(3, 5.0)+ # rescales the y axis so it starts from the value of 3 and ends at the value of 5
theme(legend.title = element_blank(), # removes a title from the legend
panel.grid.major = element_blank(), # removes grid lines
panel.grid.minor = element_blank(), # removes grid lines
axis.line = element_line(colour = "black"), # adds lines on the axis and makes them black
panel.background = element_blank())+ #sets the background to white
geom_errorbar(aes( # defining the aesthetics of the CI error bars
x = Time, # defining the x variable (before or during pandemic)
y = SCAVERAGE, # defining the y variable
group = Type, # defining how to group (most extraverted or most introverted)
ymin = lower, # setting the lower error bar to the corresponding lower CI value as defined earlier by c()
ymax= upper, # setting the upper error bar to the corresponding upper CI value as defined earlier by c()
width = .1))+ # sets the width of the CI error bars
ggtitle("Social Connectedness Changes
Based on Extraversion") + # for the main title (put enter as the title is too long when we placed the SC graph beside loneliness- see later)
ylab("Mean Social Connectedness") # title for the Y axis
print(figure3_SC)
```
This produces the following graph:

#### Loneliness:
The same process was then repeated for the loneliness graph, so only a brief summary of the method is provided here.
Here, Study 2 data was used and the quantiles of the extraversion scores were determined:
```
# quantile split by score (quantile cutoff)
study2_quantile <- quantile(study2data$T1Extraversion)
# Output:
> study2_quantile
0% 25% 50% 75% 100%
2.083333 3.333333 3.833333 4.416667 6.000000
```
Then the data subsets were created based on extraversion level:
```
study2_introverts <- study2data %>% # create new variable using raw data
select(T1Extraversion, T1Lonely, T2Lonely) %>% # only including three variables
filter(T1Extraversion <= 3.833333) %>% # keeping only the most introverted participants (extraversion at or below the 50% cutoff shown above)
mutate(Type = "Most Introverted")
study2_extraverts <- study2data %>% # create a new variable using raw data
select(T1Extraversion, T1Lonely, T2Lonely) %>% # only including three variables of interest
filter(T1Extraversion >= 4.416667) %>% # only including the most extraverted (third quartile cutoff or higher)
mutate(Type = "Most Extraverted") # adding a variable so it's easier to graph
```
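As a side note, the cutoffs could also be pulled from the named `study2_quantile` vector rather than typed in by hand, which keeps the subsets in sync with the printed quantiles (an illustrative sketch, not the original script's approach):
```
# Sketch: reference the quantile cutoffs by name instead of hard-coding the values
study2_introverts <- study2data %>%
  select(T1Extraversion, T1Lonely, T2Lonely) %>%
  filter(T1Extraversion <= study2_quantile[["50%"]]) %>%
  mutate(Type = "Most Introverted")
study2_extraverts <- study2data %>%
  select(T1Extraversion, T1Lonely, T2Lonely) %>%
  filter(T1Extraversion >= study2_quantile[["75%"]]) %>%
  mutate(Type = "Most Extraverted")
```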
We then combined the subsets and repeated the same process again:
```
# STEP ONE - combining the introvert and extravert subset data into one - stacking on top of each other
study2_figure3_Bind <- bind_rows(study2_introverts, study2_extraverts)
# STEP TWO - create new variables that separate the two time points but include both introverts and extraverts - stacking on top of each other
study2_figure3_before <- study2_figure3_Bind %>% #creating a new variable using the extreme data (most introverted and extraverted)
select(T1Lonely, Type) %>% #only including variables of interest
mutate(Time = "Before Pandemic") %>% # create a new variable that classifies the time period
rename(Loneliness = T1Lonely) # renaming so that this can be plotted on the y axis
study2_figure3_during <- study2_figure3_Bind %>% # same but for the second time measurement (during pandemic)
select(T2Lonely, Type) %>%
mutate(Time = "During Pandemic") %>%
rename(Loneliness = T2Lonely)
# STEP THREE - bind the two new variables on top of each other (now categorised based on extraversion level and time point)
#- this makes a data set that includes a categorisation of introvert vs extravert and before or during with the corresponding Loneliness score
figure3_study2_data <- bind_rows(study2_figure3_before, study2_figure3_during) # combining the two variables created earlier into one data set stacked on top of each other
# STEP FOUR - create a summary table with the loneliness means of the four groups
figure3_study2_summary <- figure3_study2_data %>% # new variable
group_by(Type, Time) %>%
summarise(Loneliness = mean(Loneliness))
# Output:
> figure3_study2_summary
# A tibble: 4 × 5
# Groups: Type [2]
Type Time Loneliness lower upper # lower and upper are the CI limits that we add later on (this printout was captured after they were added)
<chr> <chr> <dbl> <dbl> <dbl>
1 Most Extraverted Before Pandemic 1.63 1.52 1.74
2 Most Extraverted During Pandemic 1.67 1.57 1.78
3 Most Introverted Before Pandemic 2.40 2.30 2.50
4 Most Introverted During Pandemic 2.29 2.19 2.38
```
Next we created separate variables for each combination of extraversion level and time point, and converted each one into a vector of values in the environment:
```
# STEP FIVE - create a vector for each extraversion level / time point combination
# Before pandemic - introverts
study2_figure3_introverts_before <- study2_figure3_before %>%
filter(Type == "Most Introverted") # only want most introverted
study2_figure3_introverts_before <- study2_figure3_introverts_before$Loneliness # keep only the Loneliness variable, extracted as a vector using $
# Before pandemic - extraverts
study2_figure3_extroverts_before <- study2_figure3_before %>%
filter(Type == "Most Extraverted")
study2_figure3_extroverts_before <- study2_figure3_extroverts_before$Loneliness
# During pandemic - introverts
study2_figure3_introverts_during <- study2_figure3_during %>%
filter(Type == "Most Introverted")
study2_figure3_introverts_during <- study2_figure3_introverts_during$Loneliness
# During pandemic - extraverts
study2_figure3_extroverts_during <- study2_figure3_during %>%
filter(Type == "Most Extraverted")
study2_figure3_extroverts_during <- study2_figure3_extroverts_during$Loneliness
# STEP SIX - find the CI for each group via a one-sample t-test
#Before pandemic - introverts
t.test(study2_figure3_introverts_before,
alternative = "two.sided")
# During pandemic - introverts
t.test(study2_figure3_introverts_during,
alternative = "two.sided")
# Before pandemic - extraverts
t.test(study2_figure3_extroverts_before,
alternative = "two.sided")
# During pandemic - extraverts
t.test(study2_figure3_extroverts_during,
alternative = "two.sided")
```
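If the hypothetical `ci_limits()` helper sketched in the social connectedness section were used, each group's limits could be captured directly instead of read off the four printouts, for example:
```
# Sketch reusing the hypothetical ci_limits() helper defined earlier
ci_limits(study2_figure3_introverts_before) # should match the introvert/before values entered below
```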
We then created vectors to specify the lower and upper values of the CIs:
```
figure3_study2_summary$lower <- c(1.519254, 1.566918, 2.300463, 2.191586)
figure3_study2_summary$upper <- c(1.735026, 1.779943, 2.495222, 2.381804)
```
Now we can finally graph it (same approach as for social connectedness)!
```
# TIME TO GRAPH!
figure3_lonely <- ggplot(figure3_study2_summary) +
geom_line(aes(x = Time, y = Loneliness, lty = Type, group = Type))+
geom_point(aes(x = Time, y = Loneliness, group = Type))+
ylim(1, 3.0)+
theme(legend.title = element_blank(),
legend.key = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black"),
panel.background = element_blank())+
geom_errorbar(aes(
x = Time,
y = Loneliness,
group = Type,
ymin = lower,
ymax= upper,
width = .05))+
ggtitle("Loneliness Changes
Based on Extraversion") +
ylab("Mean Loneliness")
plot(figure3_lonely)
```
This produces:

#### Finalised figure 3:
Now that we have produced the two graphs, the final step is to arrange them side by side so the figure looks like the one in the study:

I used the following code to combine the two ggplots:
```
library(cowplot) # allows ggplots to be arranged as a grid using plot_grid() function
final_figure3 <- plot_grid(figure3_SC+ theme(legend.position = "none"), figure3_lonely) # legend.position -> used to remove the legend from the connectedness graph
```
This produced:

Yup...definitely a triumph.
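One optional refinement (a sketch only, not the code used above): cowplot can also extract a single legend and give the combined figure one shared key, instead of leaving the legend attached to the loneliness panel alone:
```
# Sketch: one shared legend for both panels
shared_legend <- get_legend(figure3_lonely)
final_figure3_shared <- plot_grid(
  figure3_SC + theme(legend.position = "none"),
  figure3_lonely + theme(legend.position = "none"),
  shared_legend,
  nrow = 1, rel_widths = c(1, 1, 0.4))
```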