# Continued Verification Report

## Did Social Connection Decline During the First Wave of COVID-19?: The Role of Extraversion

**Authors:** Saraa Al-Saddik
**Group:** Tuesday 1-3pm Group 5
**Date:** 7th August 2022

- [Article](https://online.ucpress.edu/collabra/article/6/1/37/114469/Did-Social-Connection-Decline-During-the-First)

# Part 3: Exploratory

## Exploratory 1: Relationship between age and levels of social connectedness:

I had just begun my first year of university when COVID first hit, so I did not really have an opportunity to develop strong social relationships outside of my high school friends and family. During this time, I envied students in their third or fourth years of university, as they had had time to develop strong social relationships and a support network at university. This made me feel as though I had lower levels of social connectedness than older university students. So, I decided to test my hypothesis that students in their first years of university (first_few) experienced lower levels of social connectedness than those in their third or fourth years (last_few).

In the Study 1 data, we only have the ages of the 467 university participants, so I had to make a couple of assumptions in order to conduct this exploratory analysis. Here, I divide the ages into quartiles: participants at or below the 25% quartile are deemed to be in their first few years of university, and those at or above the 75% quartile are deemed to be in their later years. Of course, I am aware that there can be older students in their first few years of university due to reasons such as gap years and so on; however, we will just follow the aforementioned assumptions.
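As a compact sketch of the classification just described (an illustration only: `YearGroup` is a hypothetical helper column introduced here, and this sketch assumes `study1_raw$Age` is already numeric, which is dealt with properly in the analysis that follows):

```
# Sketch: quantile-based classification of students into year groups.
# Assumes study1_raw$Age is already numeric; "Decline to answer"/NA
# handling is addressed in the actual analysis further on.
q <- quantile(study1_raw$Age, na.rm = TRUE)
study1_raw$YearGroup <- ifelse(study1_raw$Age <= q["25%"], "First few years",
                        ifelse(study1_raw$Age >= q["75%"], "Final years", NA))
```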
Now, regardless of whether older participants are in their first few years of university, the argument here is that those who were 17-18 years old most likely transitioned straight from school into university. They did not have much of an opportunity to immerse themselves in society and form strong social relationships outside of school and family. Older students, however, no matter what year of university they are in, have had experience with social interactions prior to the pandemic and even prior to starting university. Therefore, the overall hypothesis is that, on average, the older the undergraduate student, the stronger their support network, and so their levels of social connectedness will not change as much as those of younger students, who feel more isolated from their school friends and university classmates.

As seen in our earlier Study 1 demographics, the maximum age was 44 whilst the minimum age was 17:

```
# A tibble: 1 × 4
  mean_age sd_age min_age max_age
     <dbl>  <dbl>   <dbl>   <dbl>
1     20.9   3.03      17      44
```

Thus, there is a range of ages to work with. (Note that the following method is a re-adaptation of the figure 3 method.)

### Descriptives:

I started by determining the quartiles, in order to classify undergraduate students as being in their first few years of university (25% quartile and below) or their later years (75% quartile and above):

```
#### Has social connectedness changed more for younger or older undergraduate students?
# quantile split by score (quantile cutoff)
study1_quantile <- quantile(study1_raw$Age) # find quantiles
```

To which I got an error message:

```
Error in (1 - h) * qs[i] : non-numeric argument to binary operator
```

Oh yes, the age dilemma.
So I decided to apply our previous solution to the dilemma by selecting only the Age data from the Study 1 data, turning it into a numeric variable and then filtering out the 'Decline to answer' responses:

```
# attempt 2
# Filter out 'Decline to answer' responses and make age numeric
study1_raw$Age <- as.numeric(study1_raw$Age) # turn the Age data into a numeric variable
# Output: Warning message: NAs introduced by coercion

# quantile split by score (quantile cutoff)
study1_quantile <- quantile(study1_raw$Age, na.rm = TRUE) # find quantiles and remove "Decline to answer"/NA responses
```

Next, as per usual, I created data subsets based on whether participants were classified as being in their first few or later years of university:

```
# create data subsets
first_few <- study1_raw %>% # create new variable using raw data
  select(Age, SCAVERAGE.T1, SCAVERAGE.T2, SCdiff, EXTRAVERSION) %>% # only including four variables
  filter(Age <= 19) %>% # only the younger participants (first quartile) are included in this subset
  mutate(Type = "First few years") # creating new 'Type' variable to graph
# this results in 143 people in this group ^

final_few <- study1_raw %>% # create a new variable using raw data
  select(Age, SCAVERAGE.T1, SCAVERAGE.T2, SCdiff, EXTRAVERSION) %>% # only including four variables of interest
  filter(Age >= 22) %>% # only including participants at or above the 75% quartile
  mutate(Type = "Final years") # creating new 'Type' variable to graph
# This results in 118 people in this group ^
```

I then combined the two data subsets into one so the data could be visualised later on:

```
# combining first few years and final years data into one
study1_age_extremes <- bind_rows(first_few, final_few)
```

I then created new variables based on both time point (before or during the pandemic) and age (first or final years):

```
# New variable based on time point and age
age_extreme_before <- study1_age_extremes %>% # creating a new variable using the extreme data (first years and last years)
  select(SCAVERAGE.T1, Type) %>% # only including variables of interest --> added Type using the mutate function
  mutate(Time = "Before Pandemic") %>% # create a new variable that classifies the time period
  rename(SCAVERAGE = SCAVERAGE.T1) # renaming so that this can be plotted on the y axis

age_extreme_during <- study1_age_extremes %>% # same but for the second time measurement
  select(SCAVERAGE.T2, Type) %>%
  mutate(Time = "During Pandemic") %>%
  rename(SCAVERAGE = SCAVERAGE.T2)
```

And then combined the data frames once again, now categorised based on time and age:

```
# Combining the two variables (age and time point)
age_time <- bind_rows(age_extreme_before, age_extreme_during) # stacking the two data frames created earlier on top of each other
```

Now that all variables are categorised into the one data frame, the mean for each group can be determined:

```
# Finding the means:
age_time_summary <- age_time %>% # new variable
  group_by(Type, Time) %>%
  summarise(SCAVERAGE = mean(SCAVERAGE))
```

Same as before, we then create separate variables categorised based on age (a proxy for first or last years of university) and time of measurement. (Note that the filters below use the labels "First Years"/"Last Years", which do not match the "First few years"/"Final years" labels assigned to Type earlier, so these vectors come out empty; this matters later.)

```
# Create separate variables for each group specific to age and time of measurement
# Before pandemic - first years
study1_extreme_before_first <- age_extreme_before %>%
  filter(Type == "First Years") # only want first years (and before the pandemic)
study1_extreme_before_first <- study1_extreme_before_first$SCAVERAGE # only want the SCAVERAGE variable, extracted as a vector using $

# before pandemic - last years
study1_extreme_before_last <- age_extreme_before %>%
  filter(Type == "Last Years")
study1_extreme_before_last <- study1_extreme_before_last$SCAVERAGE

# during pandemic - first years
study1_extreme_during_first <- age_extreme_during %>%
  filter(Type == "First Years")
study1_extreme_during_first <- study1_extreme_during_first$SCAVERAGE

# during pandemic - last years
study1_extreme_during_last <- age_extreme_during %>%
  filter(Type == "Last Years")
study1_extreme_during_last <- study1_extreme_during_last$SCAVERAGE
```

So now we are ready to plot our graph!

### Visualisation:

```
# Plot graph
age_SC <- ggplot(age_time_summary) + # used the data frame of group means
  geom_line(aes(x = Time, y = SCAVERAGE, lty = Type, group = Type)) + # draws the lines
  geom_point(aes(x = Time, y = SCAVERAGE, group = Type)) + # adds the points on the means
  ylim(3, 5.0) + # rescales the y axis
  theme(legend.title = element_blank(), # removes the title from the legend
        panel.grid.major = element_blank(), # removes grid lines
        panel.grid.minor = element_blank(), # removes grid lines
        axis.line = element_line(colour = "black"), # adds lines on the axes and makes them black
        panel.background = element_blank()) + # sets the background to white
  ggtitle("Social Connectedness Changes Based on Age") + # the main title
  ylab("Mean Social Connectedness") # label the y axis
```

### Results:

The graph tells us that first and last year university students initially had the same level of social connectedness, but that during the pandemic their levels began to differ. Although both groups showed a decrease in social connectedness, first years did in fact drop to lower levels of social connectedness than students in the final years of their undergraduate degree. This raised the question of whether this difference in social connectedness between the two groups during the pandemic is significant.

#### Test of significance:

To determine the CI, I tried using the same method as figure 3; however, it did not work.
Here's an example:

```
t.test(study1_extreme_before_first, alternative = "two.sided")
# Error code:
# Error in t.test.default(study1_extreme_before_first, alternative = "two.sided") :
#   not enough 'x' observations
```

(In hindsight, this error arises because `study1_extreme_before_first` is empty: the filter used `Type == "First Years"` while the subsets were labelled "First few years", so no rows matched.) I then tried other methods to test significance, such as:

```
wilcox.test(study1_extreme_during_first, study1_extreme_during_last, alternative = "g") # FAIL
# or
wilcox.test(study1_extreme_before_first, conf.int = TRUE) # FAIL
```

It was suggested that a reason for this error was NA values, and given the age dilemma I was worried about that, so I tried removing NA values:

```
na <- na.omit(study1_extreme_during_first, study1_extreme_during_last)
# to remove NA values; did not work - gave 0 observations in the environment
```

After multiple rounds of trial and error, I decided to try another data frame, as I only needed to test the significance of the 'during the pandemic' time frame anyway:

```
# try a different data frame?
SCdiff <- age_time_summary %>% # create new variable using the summary data
  select(SCAVERAGE, Type, Time) %>% # only including the variables of interest
  filter(Time == "During Pandemic")

# Test whether the difference in social connectedness during the pandemic is
# statistically significant between first vs last uni years:
t.test(SCdiff$SCAVERAGE, alternative = "two.sided") # SUCCESS
```

Thus, this supports the hypothesis that students in their first few years of university experienced significantly greater changes in social connectedness than older students, who had more time to develop strong social connections prior to the pandemic. This is seen in the p < .05 value and in the CI not containing zero, thereby rejecting the null hypothesis that there is no difference in social connectedness between those in their first few years and those in their last few years of university. (A caveat: this one-sample t-test is run on only the two group means held in `SCdiff`, so strictly it tests whether their average differs from zero rather than directly comparing the two groups.)
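As a sketch of a more direct comparison (not part of the original analysis): a Welch two-sample t-test on the participant-level during-pandemic scores, using the Type labels actually assigned to the subsets earlier. The object `age_extreme_during` and the labels "First few years"/"Final years" come from the code above; `during_first` and `during_last` are new names introduced here for illustration.

```
# Sketch only: compare the two groups' during-pandemic scores directly,
# using the Type labels assigned earlier ("First few years"/"Final years").
during_first <- age_extreme_during %>%
  filter(Type == "First few years") %>%
  pull(SCAVERAGE) # participant-level scores, not group means

during_last <- age_extreme_during %>%
  filter(Type == "Final years") %>%
  pull(SCAVERAGE)

# Welch two-sample t-test: does mean social connectedness during the
# pandemic differ between first-few-years and final-years students?
t.test(during_first, during_last, alternative = "two.sided")
```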
This study is beneficial in allowing university support systems to conduct workshops and activities that give first year students an opportunity to work on enhancing their social connectedness and developing a strong social network.

## Exploratory 2: Relationship between the number of people participants encountered during social distancing and loneliness levels:

Although the researchers did compare levels of loneliness for individuals who reported undergoing social distancing, I decided to go a bit deeper and explore whether the *number* of people that individuals encountered within six feet of them (outside of their household) affected their loneliness levels. It was hypothesised that the more people participants encountered within six feet, the smaller the change in their loneliness levels, as their need for belonging and social interaction is being addressed (to some extent) more than for those with minimal to no contact. To test this, I again re-adapted the figure 3 method:

```
# quantile split by score (quantile cutoff)
study2_quantile <- quantile(study2_raw$SixFeet) # find quantiles for six feet physical distancing

# create data subsets
study2_no <- study2_raw %>% # create new variable using raw data
  select(SixFeet, T1Lonely, T2Lonely) %>% # only including the desired variables
  filter(study2_raw$SixFeet <= 0) %>% # only the people who reported seeing no one (first quartile) are included in this subset
  mutate(Type = "No Contact")
# this results in 204 people in this group ^

study2_min <- study2_raw %>% # create a new variable using raw data
  select(SixFeet, T1Lonely, T2Lonely) %>%
  filter(study2_raw$SixFeet > 0, study2_raw$SixFeet <= 2) %>% # only the people who reported seeing a small number of people are included in this subset
  mutate(Type = "Minimal Contact")
# This results in 64 people in this group ^

study2_max <- study2_raw %>% # create a new variable using raw data
  select(SixFeet, T1Lonely, T2Lonely) %>% # only including the variables of interest
  filter(study2_raw$SixFeet > 2) %>% # only including participants above the 75% quartile
  mutate(Type = "Max Contact")
# This results in 68 people in this group ^

# Combining the no, min and max six ft social distancing subsets into one - stacking on top of each other
study2_sixft_Bind <- bind_rows(study2_no, study2_min, study2_max)

# Create new variables separated by time point but including all forms of distancing
study2_sixft_before <- study2_sixft_Bind %>% # creating a new variable using the bind data
  select(T1Lonely, Type) %>% # only including variables of interest
  mutate(Time = "Before Pandemic") %>% # create a new variable that classifies the time period
  rename(Loneliness = T1Lonely) # renaming so that this can be plotted on the y axis

study2_sixft_during <- study2_sixft_Bind %>% # same but for the second time measurement
  select(T2Lonely, Type) %>%
  mutate(Time = "During Pandemic") %>%
  rename(Loneliness = T2Lonely)

# Bind the two new variables on top of each other
# this makes a data set categorised by level of contact and before/during, with the corresponding loneliness score
sixft_study2_data <- bind_rows(study2_sixft_before, study2_sixft_during)

# Create a summary table with the mean loneliness of the six groups
sixft_study2_summary <- sixft_study2_data %>% # new variable
  group_by(Type, Time) %>%
  summarise(Loneliness = mean(Loneliness))

# Create separate variables for each group specific to level of contact and time of measurement
# Before pandemic x no contact
study2_no_before <- study2_sixft_before %>%
  filter(Type == "No Contact") # only want the no-contact group
study2_no_before <- study2_no_before$Loneliness # only want the Loneliness variable, extracted as a vector using $

# Before pandemic x minimal contact
study2_min_before <- study2_sixft_before %>%
  filter(Type == "Minimal Contact")
study2_min_before <- study2_min_before$Loneliness

# Before pandemic x max contact
study2_max_before <- study2_sixft_before %>%
  filter(Type == "Max Contact")
study2_max_before <- study2_max_before$Loneliness

# During pandemic - no contact
study2_no_during <- study2_sixft_during %>%
  filter(Type == "No Contact")
study2_no_during <- study2_no_during$Loneliness

# During pandemic - min contact
study2_min_during <- study2_sixft_during %>%
  filter(Type == "Minimal Contact")
study2_min_during <- study2_min_during$Loneliness

# During pandemic - max contact
study2_max_during <- study2_sixft_during %>%
  filter(Type == "Max Contact")
study2_max_during <- study2_max_during$Loneliness

# Plot
sixft_lonely <- ggplot(sixft_study2_summary) +
  geom_line(aes(x = Time, y = Loneliness, colour = Type, group = Type)) +
  geom_point(aes(x = Time, y = Loneliness, group = Type)) +
  theme(legend.title = element_blank(),
        legend.key = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        panel.background = element_blank()) +
  ggtitle("Loneliness Changes Based on Six Feet Social Distancing Levels") +
  ylab("Mean Loneliness")
```

#### Results:

The pattern of results indicates that, across all levels of social distancing, participants experienced lower levels of loneliness during the pandemic, as reported by the study. However, if we break these changes in loneliness down, we see that participants who had no contact within six feet of individuals outside of their household reported the highest levels of loneliness. What I found most surprising is that those who had maximum contact with individuals outside of their household (more than 2 people) actually experienced higher levels of loneliness than participants who had minimal contact (1-2 people) with people outside their household.
In fact, those with minimal contact experienced the greatest drop in loneliness levels during the pandemic. Looking into this result can help fill gaps in the literature and improve our understanding of the nature of social relationships, particularly during a crisis such as the pandemic, where, surprisingly, encountering fewer people was associated with reduced loneliness.

## Why do individuals with minimal contact have the biggest drop in loneliness?

The purpose of this second part of the exploratory question is to determine the contributing factors behind the large drop in loneliness levels for those with minimal contact. My hypothesis was that individuals who have minimal contact tend to have larger drops in loneliness due to personality factors/individual differences. These differences may arise from these 64 participants having higher levels of relatedness and/or higher life satisfaction. In the study, relatedness is measured by scoring items such as "I felt close and connected with people who are important to me", and life satisfaction by scoring items such as "I am satisfied with my life". By being high in relatedness and life satisfaction, individuals with minimal contact do not define their social interactions by the number of people they encounter; rather, they define them by the quality of those interactions (less is more).
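Before turning to the group-level comparisons, one simple way to probe the contact-loneliness relationship without the tertile split is a correlation on the raw counts. This is a sketch, not part of the original analysis; `lonely_change` and `LonelyDiff` are names introduced here (the variables `SixFeet`, `T1Lonely` and `T2Lonely` are from the Study 2 data used above):

```
# Sketch: treat SixFeet as continuous and correlate it with the change
# in loneliness from before to during the pandemic.
lonely_change <- study2_raw %>%
  mutate(LonelyDiff = T2Lonely - T1Lonely) # hypothetical difference score

cor.test(lonely_change$SixFeet, lonely_change$LonelyDiff)
```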
##### Relatedness:

To test this theory, the same data subsets were used as in Exploratory question 2 (study2_no, study2_min, study2_max):

```
# create data subsets
study2_no <- study2_raw %>% # create new variable using raw data
  select(SixFeet, BMPN_Diff) %>% # only including the desired variables
  filter(study2_raw$SixFeet <= 0) %>% # only the people who reported seeing no one (first quartile)
  mutate(Type = "No Contact")
# this results in 204 people in this group ^

study2_min <- study2_raw %>% # create a new variable using raw data
  select(SixFeet, BMPN_Diff) %>%
  filter(study2_raw$SixFeet > 0, study2_raw$SixFeet <= 2) %>% # only the people who reported seeing a small number of people
  mutate(Type = "Minimal Contact")
# This results in 64 people in this group ^

study2_max <- study2_raw %>% # create a new variable using raw data
  select(SixFeet, BMPN_Diff) %>% # only including the variables of interest
  filter(study2_raw$SixFeet > 2) %>% # only including participants above the 75% quartile
  mutate(Type = "Max Contact")
# This results in 68 people in this group ^

# Combining the no, min and max six ft social distancing subsets into one - stacking on top of each other
study2_sixft_BMPN <- bind_rows(study2_no, study2_min, study2_max)
```

Then a summary table was created to determine the mean relatedness change in each category, in relation to the number of people encountered outside the participant's household whilst social distancing at six feet:

```
# Create a summary table with the mean relatedness change of the three groups
sixft_BMPN_summary <- study2_sixft_BMPN %>% # new variable
  group_by(Type) %>%
  summarise(Relatedness = mean(BMPN_Diff))
```

As we can already see in the output, most values are close to zero. That was the first indication that there may not be a significant result.
I then tested whether these means are significantly different from zero via a one-sample t-test:

```
# CI limits
t.test(sixft_BMPN_summary$Relatedness, alternative = "two.sided")

# Specify CI limits to visualise in the graph
sixft_BMPN_summary$lower <- c(-0.1003813)
sixft_BMPN_summary$upper <- c(0.1200232)
```

As we can see in the output, the change in relatedness in relation to level of social distancing was not significant (let alone for minimal contact specifically); zero is contained within the CI, so we fail to reject the null hypothesis that the mean change in relatedness is zero. This also goes against the hypothesis that there would be a particularly pronounced change in relatedness for minimal-contact individuals. (Note that this t-test is run on the three group means, and the same CI limits are recycled across all three bars below.)

I then decided to visualise the data, to be able to visually compare changes in relatedness score across types of social distancing whilst inserting 95% CI error bars. Note that the graph is a re-adaptation of the method used in Figure 3:

```
# Plot
library(ggplot2)
sixft_BMPN <- ggplot(sixft_BMPN_summary, aes(x = Type, y = Relatedness, colour = Type)) +
  geom_col(width = .5) +
  ylim(-.5, .5) +
  theme(legend.title = element_blank(),
        legend.key = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        panel.background = element_blank()) +
  geom_errorbar(aes(x = Type, y = Relatedness, group = Type,
                    ymin = lower, ymax = upper, width = .05)) +
  ggtitle("Relatedness Levels Based on Six Feet Social Distancing") +
  ylab("Mean Relatedness Change")
```

##### Life satisfaction:

I then repeated the exact same method as for 'Relatedness' (but used the life satisfaction data, SWLS_Diff, instead of BMPN_Diff):

```
# Exploratory: those who had minimal six feet contact vs those who had a lot
# create data subsets
study2_no <- study2_raw %>% # create new variable using raw data
  select(SixFeet, SWLS_Diff) %>% # only including the desired variables
  filter(study2_raw$SixFeet <= 0) %>% # only the people who reported seeing no one (first quartile)
  mutate(Type = "No")
# this results in 204 people in this group ^

study2_min <- study2_raw %>% # create a new variable using raw data
  select(SixFeet, SWLS_Diff) %>%
  filter(study2_raw$SixFeet > 0, study2_raw$SixFeet <= 2) %>% # only the people who reported seeing a small number of people
  mutate(Type = "Min")
# This results in 64 people in this group ^

study2_max <- study2_raw %>% # create a new variable using raw data
  select(SixFeet, SWLS_Diff) %>% # only including the variables of interest
  filter(study2_raw$SixFeet > 2) %>% # only including participants above the 75% quartile
  mutate(Type = "Max")
# This results in 68 people in this group ^

# Combining the no, min and max six ft social distancing subsets into one - stacking on top of each other
study2_sixft_SWLS <- bind_rows(study2_no, study2_min, study2_max)

# Create a summary table with the mean life satisfaction change of the three groups
sixft_SWLS_summary <- study2_sixft_SWLS %>% # new variable
  group_by(Type) %>%
  summarise(Satisfaction = mean(SWLS_Diff))
# Again, all values are close to zero across all levels of social distancing

# CI limits
t.test(sixft_SWLS_summary$Satisfaction, alternative = "two.sided")
# Output: p-value = 0.4288 # not significant
# 95 percent confidence interval: -0.2160440 0.3441894
# zero is contained within the CI limits, failing to reject the null hypothesis
# that the mean change in life satisfaction is zero

# Specify CI limits to visualise in the graph
sixft_SWLS_summary$lower <- c(-0.2160440)
sixft_SWLS_summary$upper <- c(0.3441894)

# Plot
sixft_SWLS <- ggplot(sixft_SWLS_summary, aes(x = Type, y = Satisfaction, colour = Type)) +
  geom_col(width = .5) +
  ylim(-.5, .5) +
  theme(legend.title = element_blank(),
        legend.key = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        panel.background = element_blank()) +
  geom_errorbar(aes(x = Type, y = Satisfaction, group = Type,
                    ymin = lower, ymax = upper, width = .05)) +
  ggtitle("Life Satisfaction Levels Based on Six Feet Social Distancing") +
  ylab("Mean Life Satisfaction Change")

# Joining the two graphs for comparison:
library(cowplot)
joint_figure <- plot_grid(sixft_BMPN + theme(legend.position = "none"), sixft_SWLS)
```

#### Results:

As we can see, changes in relatedness and life satisfaction difference scores (from before to during the pandemic) are close to zero. These slight differences are not statistically significant, which goes against the alternative hypothesis that individual differences in relatedness and life satisfaction affect how many social interactions a participant is satisfied with. Nor was there a distinct effect of these individual differences for individuals with minimal contact. Although minimal contact did have the highest mean change in life satisfaction (M = 0.184), though not in relatedness (M = 0.0286), this was not significant enough to support the hypothesis. Thus, relatedness and life satisfaction appear unrelated to the number of people encountered outside the household during social distancing: the changes are close to zero and non-significant across all types of social distancing, with no distinct outcome for minimal contact.
Thus, the cause of the drastic drop in loneliness for the minimal contact group is inconclusive, and further research needs to be conducted.

### Exploratory 3: Whether younger participants are more resilient to changes in social behaviour (caused by the pandemic) than older participants

The aim of this exploratory analysis was to determine whether the younger population was more resilient, showing higher levels of social connectedness than the older population when undergoing changes in social behaviour via social distancing. This was left as an open-ended question: the younger population may have been better able to adapt by obtaining their needs for social connection and belonging through online platforms such as social media. However, older individuals may instead be more adaptable and have higher levels of social connectedness, as they have stronger social relationships and support networks they can rely on during a crisis (such as a pandemic). Note that the study operationalised social connectedness as a combination of relatedness and loneliness scores.

To test this hypothesis, the figure 3 method was once again re-adapted with some slight changes. I began by referring back to the Study 2 age descriptives, where I found that ages ranged from 18 to 72:

```
# Age descriptives:
print(study2_age) # Age ranges from 18-72
```

I then used quantile() to categorise participants into their respective age groups. Those at or below the 25% quartile were deemed the younger age group, those between the 25% and 75% quartiles the middle-aged group, and those above the 75% quartile the older age group. I started by creating data sets for relatedness levels (which was then repeated for loneliness levels). These age subsets were bound together, then split again into subsets based on the time of the relatedness measurement (before or during the pandemic), and bound once more.
This formed categories based on the participant's age group and the time of pandemic.

```
# Develop quantile
study2_quantile <- quantile(study2_raw$Age) # find quantiles

# Relatedness
# create data subsets
younger <- study2_raw %>% # create new variable using raw data
  select(Age, T1BMPN, T2BMPN) %>% # only including three variables of interest
  filter(Age <= 23) %>% # only the younger participants (first quartile) are included in this subset
  mutate(Type = "Younger") # creating new 'Type' variable to graph
# this results in 94 people in this group ^

mid <- study2_raw %>% # create a new variable using raw data
  select(Age, T1BMPN, T2BMPN) %>% # only including three variables of interest
  filter(Age > 23, Age <= 38) %>% # only including participants between the 25% and 75% quartiles
  mutate(Type = "Middle") # creating new 'Type' variable to graph
# This results in 159 people in this group ^

older <- study2_raw %>% # create a new variable using raw data
  select(Age, T1BMPN, T2BMPN) %>% # only including three variables of interest
  filter(Age > 38) %>% # only including participants above the 75% quartile
  mutate(Type = "Older") # creating new 'Type' variable to graph
# This results in 83 people in this group ^

# combining all age subset data into one
study2_all_ages <- bind_rows(younger, mid, older)

# New variable based on time point and age
age_all_before <- study2_all_ages %>% # creating a new variable using the age groups data
  select(T1BMPN, Type) %>% # only including variables of interest --> added Type using the mutate function
  mutate(Time = "Before Pandemic") %>% # create a new variable that classifies the time period
  rename(Relatedness = T1BMPN) # renaming so that this can be plotted on the y axis

age_all_during <- study2_all_ages %>% # same but for the second time measurement
  select(T2BMPN, Type) %>%
  mutate(Time = "During Pandemic") %>%
  rename(Relatedness = T2BMPN)

# Combining the two variables (age and time point)
age_groups_time <- bind_rows(age_all_before, age_all_during) # stacking the two data frames created earlier on top of each other

# Finding the means:
age_relatedness_study2_summary <- age_groups_time %>% # new variable
  group_by(Type, Time) %>%
  summarise(Relatedness = mean(Relatedness))

# Create separate variables for each group specific to age and time of measurement
# Before pandemic - Younger
study2_age_before_younger <- age_all_before %>%
  filter(Type == "Younger") # only want the younger group (and before the pandemic)
study2_age_before_younger <- study2_age_before_younger$Relatedness # only want the Relatedness variable, extracted as a vector using $

# Before pandemic - Mid-Age
study2_age_before_mid <- age_all_before %>%
  filter(Type == "Middle")
study2_age_before_mid <- study2_age_before_mid$Relatedness

# Before pandemic - Older
study2_age_before_older <- age_all_before %>%
  filter(Type == "Older")
study2_age_before_older <- study2_age_before_older$Relatedness

# during pandemic - Younger
study2_age_during_younger <- age_all_during %>%
  filter(Type == "Younger")
study2_age_during_younger <- study2_age_during_younger$Relatedness

# during pandemic - Mid-Age
study2_age_during_mid <- age_all_during %>%
  filter(Type == "Middle")
study2_age_during_mid <- study2_age_during_mid$Relatedness

# during pandemic - Older
study2_age_during_older <- age_all_during %>%
  filter(Type == "Older")
study2_age_during_older <- study2_age_during_older$Relatedness
```

I then created CI limits for the 95% CI error bars. As none of the CIs contain zero, we can reject the null hypothesis that mean relatedness (and, later on, loneliness) is zero in each group: participants did report relatedness and loneliness both before and during the pandemic.
```
# CI limits:
# Find the CI for each group via one-sample t-tests
t.test(study2_age_before_younger, alternative = "two.sided")
t.test(study2_age_before_mid, alternative = "two.sided")
t.test(study2_age_before_older, alternative = "two.sided")
t.test(study2_age_during_younger, alternative = "two.sided")
t.test(study2_age_during_mid, alternative = "two.sided")
t.test(study2_age_during_older, alternative = "two.sided")

# Create data for the upper and lower limits using the results from the t-tests above
age_relatedness_study2_summary$lower <- c(4.545670, 4.678196, 5.016795, 4.497347, 4.636523, 5.071762) # lower CI limits of the six groups
age_relatedness_study2_summary$upper <- c(4.968515, 5.024110, 5.477181, 4.967192, 5.000793, 5.514584) # upper CI limits of the six groups
```

I then plotted the relatedness graph:

```
# READY TO PLOT THE GRAPH!!!
Relatedness_ex3 <- ggplot(age_relatedness_study2_summary) + # uses the data frame holding the means and CI limits
  geom_line(aes(x = Time, y = Relatedness, colour = Type, group = Type)) + # draws the lines
  geom_point(aes(x = Time, y = Relatedness, group = Type)) + # adds the points on the means
  ylim(2, 6.0) + # rescales the y axis
  theme(legend.title = element_blank(), # removes the title from the legend
        panel.grid.major = element_blank(), # removes grid lines
        panel.grid.minor = element_blank(), # removes grid lines
        axis.line = element_line(colour = "black"), # adds black lines on the axes
        panel.background = element_blank()) + # sets the background to white
  geom_errorbar(aes( # defining the aesthetics of the error bars
    x = Time, # the x variable
    y = Relatedness, # the y variable
    group = Type, # how to group
    ymin = lower, # lower error bar set to the corresponding lower CI value defined earlier
    ymax = upper, # upper error bar set to the corresponding upper CI value defined earlier
    width = .1)) + # sets the width of the error bars
  ggtitle("Relatedness Changes Based on Age and Time of Pandemic") + # main title
  ylab("Mean Relatedness") # title for the y axis
```

The same process for relatedness was repeated for loneliness, where only the selected variables (T1Lonely and T2Lonely) changed:

```
# Loneliness
# Create data subsets
younger <- study2_raw %>% # create a new variable using the raw data
  select(Age, T1Lonely, T2Lonely) %>% # only including the three variables of interest
  filter(Age <= 23) %>% # only the younger participants (first quartile) in this subset
  mutate(Type = "Younger") # creating a new 'Type' variable to graph
# This results in 94 people in this group

mid <- study2_raw %>% # create a new variable using the raw data
  select(Age, T1Lonely, T2Lonely) %>% # only including the three variables of interest
  filter(Age > 23, Age <= 38) %>% # only participants between the 25% and 75% quantiles
  mutate(Type = "Middle") # creating a new 'Type' variable to graph
# This results in 159 people in this group

older <- study2_raw %>% # create a new variable using the raw data
  select(Age, T1Lonely, T2Lonely) %>% # only including the three variables of interest
  filter(Age > 38) %>% # only participants above the 75% quantile
  mutate(Type = "Older") # creating a new 'Type' variable to graph
# This results in 83 people in this group

# Combining all age subset data into one
study2_all_ages <- bind_rows(younger, mid, older)

# New variables based on time point and age
age_all_before <- study2_all_ages %>% # creating a new variable using the age groups data
  select(T1Lonely, Type) %>% # only the variables of interest (Type was added with mutate)
  mutate(Time = "Before Pandemic") %>% # new variable classifying the time period
  rename(Loneliness = T1Lonely) # renaming so that this can be plotted on the y axis

age_all_during <- study2_all_ages %>% # same but for the second time measurement
  select(T2Lonely, Type) %>%
  mutate(Time = "During Pandemic") %>%
  rename(Loneliness = T2Lonely)

# Combining the two variables (age and time point)
age_groups_time <- bind_rows(age_all_before, age_all_during) # one data set stacked on top of the other

# Finding the means:
age_time_study2_summary <- age_groups_time %>% # new summary variable
  group_by(Type, Time) %>%
  summarise(Loneliness = mean(Loneliness))

# Create separate variables for each group specific to age and time of measurement
# Before pandemic - Younger
study2_age_before_younger <- age_all_before %>%
  filter(Type == "Younger") # only want the younger group (and before the pandemic)
study2_age_before_younger <- study2_age_before_younger$Loneliness # only want the Loneliness variable, extracted as a vector using $

# Before pandemic - Mid-Age
study2_age_before_mid <- age_all_before %>%
  filter(Type == "Middle")
study2_age_before_mid <- study2_age_before_mid$Loneliness

# Before pandemic - Older
study2_age_before_older <- age_all_before %>%
  filter(Type == "Older")
study2_age_before_older <- study2_age_before_older$Loneliness

# During pandemic - Younger
study2_age_during_younger <- age_all_during %>%
  filter(Type == "Younger")
study2_age_during_younger <- study2_age_during_younger$Loneliness

# During pandemic - Mid-Age
study2_age_during_mid <- age_all_during %>%
  filter(Type == "Middle")
study2_age_during_mid <- study2_age_during_mid$Loneliness

# During pandemic - Older
study2_age_during_older <- age_all_during %>%
  filter(Type == "Older")
study2_age_during_older <- study2_age_during_older$Loneliness

# CI limits:
# Find the CI for each group via one-sample t-tests
t.test(study2_age_before_younger, alternative = "two.sided")
t.test(study2_age_before_mid, alternative = "two.sided")
t.test(study2_age_before_older, alternative = "two.sided")
t.test(study2_age_during_younger, alternative = "two.sided")
t.test(study2_age_during_mid, alternative = "two.sided")
t.test(study2_age_during_older, alternative = "two.sided")

# Create data for the upper and lower limits using the results from the t-tests above
age_time_study2_summary$lower <- c(2.085014, 2.048396, 1.821376, 1.998295, 2.021845, 1.761657) # lower CI limits of the six groups
age_time_study2_summary$upper <- c(2.339398, 2.258125, 2.096189, 2.244706, 2.221121, 2.017671) # upper CI limits of the six groups

# READY TO PLOT THE GRAPH!!!
Loneliness_ex3 <- ggplot(age_time_study2_summary) + # uses the data frame holding the means and CI limits
  geom_line(aes(x = Time, y = Loneliness, colour = Type, group = Type)) + # draws the lines
  geom_point(aes(x = Time, y = Loneliness, group = Type)) + # adds the points on the means
  ylim(0, 3) + # rescales the y axis
  theme(legend.title = element_blank(), # removes the title from the legend
        panel.grid.major = element_blank(), # removes grid lines
        panel.grid.minor = element_blank(), # removes grid lines
        axis.line = element_line(colour = "black"), # adds black lines on the axes
        panel.background = element_blank()) + # sets the background to white
  geom_errorbar(aes( # defining the aesthetics of the error bars
    x = Time, # the x variable
    y = Loneliness, # the y variable
    group = Type, # how to group
    ymin = lower, # lower error bar set to the corresponding lower CI value defined earlier
    ymax = upper, # upper error bar set to the corresponding upper CI value defined earlier
    width = .1)) + # sets the width of the error bars
  ggtitle("Loneliness Changes Based on Age and Time of Pandemic") + # main title
  ylab("Mean Loneliness") # title for the y axis

# Combine graphs:
library(cowplot)
final_ex3 <- plot_grid(Relatedness_ex3 + theme(legend.position = "none"), Loneliness_ex3)
```

I then tested the significance of the difference between relatedness and loneliness in younger vs older participants during the pandemic:

```
# Test significance of difference during pandemic
Relatedness_diff <- age_relatedness_study2_summary %>%
  select(Relatedness, Type, Time) %>%
  filter(Time == "During Pandemic")

Loneliness_diff <- age_time_study2_summary %>%
  select(Loneliness, Type, Time) %>%
  filter(Time == "During Pandemic")

# Test whether the difference in social connectedness during the pandemic is statistically significant between the age groups:
t.test(Relatedness_diff$Relatedness, alternative = "two.sided") # significant, p < .05
t.test(Loneliness_diff$Loneliness, alternative = "two.sided") # significant, p < .05
```

#### Results:

Thus, we can see that participants in the oldest age group experienced the highest level of relatedness and the lowest level of loneliness, and both of these results are statistically significant (p < .05). If anything, younger individuals experienced the lowest level of relatedness and the highest level of loneliness. This indicates that online platforms are not in fact coping mechanisms; rather, they may even contribute to increasing loneliness during the pandemic. If this is the case, it is important to discard the assumption that social media platforms can reduce loneliness and enhance social connectedness, as the people encountered online are not people we feel "close and connected with", nor are they people who are important to us. It is important to raise awareness that they may cause more harm than good. However, attributing this reduced social connectedness to technology is itself an assumption, and more targeted studies need to investigate this relationship.

# Part 4: Recommendations

*Make access to aspects of the OSF file easier:*

The pre-registration PDF, measures, raw data csv files for each study, the provided R code and the code book were each provided in separate links spread throughout the study. However, there was a link toward the end of the study (under the 'Data Accessibility Statement') for each study's respective OSF files. This was nicely laid out, with subheadings indicating which link was for the R code, csv file, etc.
However, each time you clicked to open one of the files, a new tab would open. This left me with multiple tabs open, constantly switching between them trying to find whichever file I needed. Sometimes I had eight tabs open at once, and that was just to see the data; it does not even include the tabs open when I used multiple websites to research how to replicate a certain piece of code.

One way to resolve this is to have subheadings in the raw data that *make sense*. For example, I would constantly get confused as to whether SWLS stood for life satisfaction or relatedness. Something as simple as renaming the variables to 'Satisfaction' or 'Relatedness' would have saved me time switching between the raw data, R code and code book.

Moreover, they could have had a side tab that allows you to switch between aspects of the OSF file without opening a new tab. For example, if I am on the Study 1 data.csv and decide I now need the Study 2 csv, there would be a side tab where I can click that heading and access the data, rather than having to exit the OSF file as a whole, find the study's OSF link, and open the study 2.csv raw data in a new tab. Something as small as this can save time and make reproducibility more time efficient and less frustrating, whilst also not using a significant amount of energy on our devices and causing everything (particularly RStudio) to run slower.

*Explain code to individuals who are not from a coding background:*

As I mentioned throughout my report, although we are grateful the authors left us with *some* R code, when, for example, I was using that code to replicate the 'most introverted' vs 'most extraverted' quantile data in the Mean/SD stage, I had no idea what the code was doing and was unable to read and understand it. I would read through 12 lines of code where the only comment was '#Making subset of data with only participants in top quartile of extraversion' and my only response was "huh?".
Although I would try to break down what the chunks of code meant, it only left me feeling more overwhelmed. In fact, it made so little sense that I decided to develop the code from scratch rather than reusing the code provided to us in the R doc. When undertaking open data, you want individuals from any background to be able to open the R doc and follow along with the steps taken to produce the values, even if it is just by reading through the '#' comments next to your code to understand how the study's values were reproduced. Moreover, do not be hesitant to break down steps into headings using # to explain each step that was taken. Recommendations along these lines can be found on this [website](https://swcarpentry.github.io/r-novice-inflammation/06-best-practices-R/). Moreover, researchers should explain WHAT each line of code is doing and WHY that line of code is needed (how it affects the output). We do not only want to see that the value was produced; we also want to see the R coding journey the authors went through to write that code. Among the key suggestions [found here](https://www.r-bloggers.com/2019/03/writing-clean-and-readable-r-code-the-easy-way/), authors should make use of RStudio, write out their coding journey, and explain each line of code. Suggestions on the layout of this documentation, along with [tips](https://bookdown.org/marius_mather/Rad/tips-for-effective-r-programming.html) to make reproducibility easier, are also available. Not only does writing out the code step-by-step in detail make reproducibility easier and more time efficient, it also allows for transparency, where issues, such as an illogical exclusion criterion used to produce significant results, become more visible.
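To illustrate the level of commenting being recommended, here is a hedged sketch of a 'top quartile of extraversion' subset with a comment on every step; the `Extraversion` column name and the toy data frame are my assumptions, not the authors' actual variables:

```r
# Toy stand-in for the raw data (the real analysis would use the study's csv)
study_data <- data.frame(
  ID           = 1:8,                                       # participant identifier
  Extraversion = c(2.1, 4.5, 3.3, 4.9, 1.8, 3.9, 4.2, 2.7)  # extraversion scores
)

# Find the 75% quantile cutoff: participants at or above this score count as
# the 'most extraverted' group. WHY: the paper compares top vs bottom quartiles.
cutoff <- quantile(study_data$Extraversion, probs = 0.75)

# Keep only the rows at or above the cutoff -- this IS the top-quartile subset
top_quartile <- study_data[study_data$Extraversion >= cutoff, ]

# Sanity check: roughly a quarter of the sample should remain
nrow(top_quartile)
```

Even a reader who has never used R can follow the WHAT and WHY of each line from the comments alone.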
Moreover, it allows authors to proofread their code and ensure that it does in fact produce the reported data and that their code is [logical](https://duckly.com/blog/improve-how-to-code-with-rubber-duck-debugging/) (as their code will be scrutinised by other scientists), whilst also helping them avoid issues such as the discrepancies between the in-text and Table 3 values, issues which bring the integrity of their code into question.

*Writing code in the order it occurred:*

This again ties into the previous point of providing a detailed report of the coding journey and using # to break down steps. The main aspect I would like to focus on is to START FROM THE BEGINNING. For example, upon opening the R code for Study 1, the first line of code is titled: 'Has social connection changed as a result of the COVID-19 Pandemic'. Okay... but what about the reported participant demographics? How did you produce the physical distancing mean/SD and inferential statistics? They skipped multiple steps ahead. Moreover, their Study 1 code is approximately 80 lines long (including installing packages and reading the csv file) and mainly focuses on how they produced inferential statistics such as t-tests, Cohen's d and regression. There is absolutely no mention of how they produced their tables and figures. These issues are also seen in the Study 2 R code. The point is that the authors have provided very brief code that in no way made replicating their descriptive statistics easier. If third-year university students are able to write up an RStudio file detailing the steps they undertook to reproduce the values, so can the authors! In fact, providing a detailed explanation of the authors' coding journey should become a requirement in the peer review process before papers can be published. Although it may be time consuming for authors, it is more time consuming for the scientific community to go about reproducing these values.
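Starting the script from the beginning costs very little. A hedged sketch of the kind of demographics block that could open the Study 1 script, mirroring the age summary reported earlier; the `Age` column is taken from the report, but the toy data frame here is hypothetical, not the real 467-participant data:

```r
library(dplyr)

# Toy ages standing in for the real study1_raw$Age column (hypothetical values)
study1_raw <- data.frame(Age = c(17, 18, 19, 20, 21, 22, 25, 30, 44))

# Report the participant demographics FIRST, before any inferential statistics:
age_summary <- study1_raw %>%
  summarise(mean_age = mean(Age),  # average participant age
            sd_age   = sd(Age),    # spread of ages
            min_age  = min(Age),   # youngest participant
            max_age  = max(Age))   # oldest participant

age_summary
```

Four commented lines like these would have reproduced the mean/SD/min/max age tibble without any detective work.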
Implementing this practice has scientific benefits, including increasing reproducibility and encouraging multiple perspectives and results in publications. It allows varied input on how to better analyse the original data and helps avoid issues such as the falsification of data. This practice can be encouraged by incentivising data sharing, where authors who undertake it receive benefits such as increased recognition of the intellectual value of their data and more citations. However, this open data practice should be mandated rather than merely recommended by the peer review board [(Levenstein & Lyle, 2018)](https://journals.sagepub.com/doi/10.1177/2515245918758319).