# Group 3 Progress Notes: Our reproducibility plan
The paper we are planning to reproduce is:
- [Behavioral Immune Trade-Offs: Interpersonal Value Relaxes Social Pathogen Avoidance](https://journals.sagepub.com/doi/full/10.1177/0956797620960011)
- [OSF](https://osf.io/4agk8/)
## Plan
There are 3 studies reported in this paper. For each study, the goal is to reproduce:
1. the demographic descriptives reported in the Participants sections (e.g., % male, mean and SD of age, r values)
2. the figures: a violin plot (with a box plot inside it) of contact comfort by target type, and a scatter plot of contact comfort by WTR with linear regression lines
3. all means and standard deviations mentioned in the text
### Study 1

### Study 2

### Study 3

## Figures


# Reproduction Doc
[Reproduction Doc](https://hackmd.io/@caitie11122/rJkfT3CYc)
## 24.06.22
Dataset: 0 = prioritise themselves (self), 1 = prioritise the other person (other)
- We standardised this coding so that the numbers are consistent across columns
- We are trying to count the number of switch points (places where a participant's choice changes between self and other)
Working backwards:
- Replicating the descriptive statistics and graphs (for which no code was provided)
- Our current goal is to replicate: '*Mean contact comfort was 0.06 (SD = 1.97).*'
- Confusingly, the paper reports this value before it mentions the 'exclusion criteria'
Note: When we looked into the WTR task and dataset, we tried to understand what the numbers actually meant. Since WTR is a ratio, we tried dividing the numbers to see whether the results matched the given dataset. This worked for some of the numbers but not for others, which could mean the values were not recorded accurately or that some numbers were changed.
## 28.06.22
Currently struggling to understand what the 'Caulsum' variable means
- Calculating the contact comfort mean: we have tested code that calculates the equivalent mean, but it gives a different output from what is reported
- Possible reasons: either the reported value is wrong, or we are not using the same input data the authors used
- We noticed variables named 'Caulsum' and 'Caulperson', but neither the codebook nor the paper indicates what they mean
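One way to narrow down why our contact comfort mean differs from the reported 0.06 (SD = 1.97) is to compute it under each candidate input, e.g. before vs after exclusions, with missing values dropped. A minimal base R sketch; the data frame, the column name `contact_comfort` and the `excluded` flag are hypothetical placeholders, and the values are toy numbers, not the study data:

```r
# Toy data standing in for the real dataset (hypothetical column names)
dat <- data.frame(
  contact_comfort = c(2, -1, 0, NA, 3, -2),
  excluded        = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Helper: mean and SD with missing values dropped
describe <- function(x) {
  c(M = mean(x, na.rm = TRUE), SD = sd(x, na.rm = TRUE))
}

# Candidate 1: all participants
all_stats <- describe(dat$contact_comfort)

# Candidate 2: only participants who survive the exclusion criteria
kept_stats <- describe(dat$contact_comfort[!dat$excluded])

round(rbind(all = all_stats, kept = kept_stats), 2)
```

If neither version matches the paper, the mismatch is more likely to come from a different input variable (e.g. one of the undocumented columns) than from the exclusions.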
Creating a matrix using all 9 points
- 'Matrix': why are there two commas?
- `matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)` is the usage that comes up when we look up `matrix` in R
- A matrix is a 2-dimensional array with m rows and n columns (from Google)
- End goal is a separate dataset: the original dataset minus the participants flagged by the exclusion criteria (this will then be used to calculate the contact comfort mean and to draw the box plots)
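To make the `matrix()` usage above concrete, here is a minimal base R sketch with toy 0/1 choices (illustrative values, not the study data). It also shows how matrix indexing works: inside `m[i, j]` the value before the comma selects rows and the value after it selects columns, and leaving either side blank selects all of them.

```r
# Toy choices: 3 participants (rows) x 9 choice points (columns)
# 0 = chose self, 1 = chose other (illustrative values only)
choices <- c(0, 0, 0, 1, 1, 1, 1, 1, 1,
             0, 1, 0, 1, 0, 1, 1, 1, 1,
             0, 0, 0, 0, 0, 0, 0, 0, 0)

# byrow = TRUE fills the matrix one participant (row) at a time
m <- matrix(choices, nrow = 3, ncol = 9, byrow = TRUE)

dim(m)    # 3 rows, 9 columns
m[2, ]    # all 9 choices of participant 2 (blank column index = all columns)
m[, 1]    # first choice point for every participant (blank row index = all rows)
m[2, 4]   # a single cell: participant 2, choice point 4
```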
**Suggestion**: mutate a number of new columns that hold the difference between each pair of adjacent columns, so that 0 = no change and a non-zero value (1 or -1, both coded as 1) = change, then count how many non-zero values there are.
- Reproduce and clearly explain the code even if it is long and inefficient
Count function (total of non-zeros)
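The suggestion above can be sketched for a single participant in base R: `diff()` gives the change between each pair of neighbouring choices, so 0 means no change and 1 or -1 means a switch; `abs()` recodes -1 to 1, and counting the non-zeros gives the number of switch points. The choice vectors are made-up examples:

```r
# One participant's choices across the 9 points of one anchor
# (0 = self, 1 = other; illustrative values only)
consistent   <- c(0, 0, 0, 1, 1, 1, 1, 1, 1)
inconsistent <- c(0, 1, 0, 1, 0, 1, 1, 1, 1)

count_switches <- function(x) {
  changes <- abs(diff(x))  # 0 = no change, 1 = change (-1 recoded to 1 by abs)
  sum(changes != 0)        # total of non-zeros = number of switch points
}

count_switches(consistent)    # 1 switch point
count_switches(inconsistent)  # 5 switch points
```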
## Today's progress
Restarting the code for creating the exclusion criteria dataset
**Step 1**: Mutating 60 new columns, each calculating the difference between two adjacent columns
- Using Danielle's Week 3 module Video 7, we were able to make a difference column with the `mutate` function: `diff_1 <- study_1_WTF %>% mutate(diff = c(1) - c(2)) %>% arrange(diff)`. Note that inside `mutate`, `c(1) - c(2)` is just the number -1 repeated in every row, not column 1 minus column 2; referring to columns by position would need something like `study_1_WTF[[1]] - study_1_WTF[[2]]`
- Originally we did it with the column names, `diff_1 <- study_1_WTF %>% mutate(diff = X37_.13 - X37_.6) %>% arrange(diff)`, which gave the output successfully!
**Step 2**: To avoid mutating this for every column individually:
- Use for loops over the columns and datasets
- We want to generate 60 new columns; `transmute` could also be used, but unlike `mutate` it keeps only the newly created columns and drops the existing ones
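Instead of writing one `mutate()` call per pair of columns, a loop can build every difference column. A base R sketch with a toy data frame; the column names `c1` to `c4` are hypothetical stand-ins for the WTR columns:

```r
# Toy WTR data: 2 participants x 4 choice columns (hypothetical names)
wtr <- data.frame(c1 = c(0, 0), c2 = c(0, 1), c3 = c(1, 0), c4 = c(1, 1))

choice_cols <- names(wtr)

# For each adjacent pair of columns, add a new column holding their difference
for (i in seq_len(length(choice_cols) - 1)) {
  new_name <- paste0("diff_", i)
  wtr[[new_name]] <- wtr[[choice_cols[i + 1]]] - wtr[[choice_cols[i]]]
}

wtr
```

Because the original columns are kept, this behaves like `mutate()` rather than `transmute()`.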
Problem we found: the coding should be 0 = self, 1 = other. The codebook says '-0.25 = self and -0.45 = other', but for some reason column X37_13 has this the other way round, based on the original code. This is inconsistent with the other columns, which makes the subsequent output confusing.
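If a column is coded 0/1 but in the opposite direction to the others, `1 - x` flips it. A minimal sketch, assuming the column has already been converted to 0/1 coding; the data frame is a toy example and `X37_.13` is used only because it is the column name in our dataset:

```r
# Toy data: X37_.13 coded the other way round (1 = self, 0 = other)
dat <- data.frame(X37_.13 = c(1, 0, 1))

# Flip 0 <-> 1 so the column matches 0 = self, 1 = other
dat$X37_.13 <- 1 - dat$X37_.13

dat$X37_.13
```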
Installed a new package, matrixStats, because it has functions that work across all rows and columns of a matrix (e.g. `rowDiffs()` for row differences)
1. Turned the WTR data into a matrix and applied a row-difference function
2. Turned it back into a data frame (not a matrix)
3. Changed all -1 values to 1
4. This gave the switch points, which we counted for each group using `rowSums`; this worked
5. Now trying to exclude participants with more than 2 switch points
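The matrix steps above, plus the exclusion rule (more than 2 switch points within any anchor), can be sketched in base R. Subtracting each column from the next stands in for a row-difference function such as matrixStats' `rowDiffs()`; the toy data has 2 anchors of 4 choice points each (the real data has 6 anchors), and the `anchor_cols` grouping is a hypothetical layout:

```r
# Toy data: 3 participants, 2 anchors x 4 choice points (0 = self, 1 = other)
wtr <- matrix(c(0, 0, 1, 1,   0, 1, 1, 1,
                0, 1, 0, 1,   0, 0, 1, 1,
                0, 0, 0, 1,   1, 0, 1, 0),
              nrow = 3, byrow = TRUE)

# Hypothetical layout: which columns belong to which anchor
anchor_cols <- list(anchor1 = 1:4, anchor2 = 5:8)

# Switch points per participant (rows) for each anchor (columns)
switches <- sapply(anchor_cols, function(cols) {
  # Row differences: each column minus the previous one within the anchor
  d <- wtr[, cols[-1], drop = FALSE] - wtr[, cols[-length(cols)], drop = FALSE]
  rowSums(abs(d))  # -1 recoded to 1 by abs(), then counted per participant
})

# Exclude anyone with more than 2 switch points within any anchor
exclude <- apply(switches > 2, 1, any)
kept <- wtr[!exclude, , drop = FALSE]
```

Here participants 2 and 3 each have 3 switch points in one anchor, so only participant 1 is kept.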
Throughout the week, we each worked on the code individually.
The data exclusion criteria are:
1. Participants with more than 2 switch points within any of the 6 WTR anchors (n=35)
2. 3 participants whose descriptions of partners were nonsensical or demonstrated poor English
3. 2 participants who selected a gender option indicating they were neither a man nor a woman
Therefore total excluded participants = 35 + 3 + 2 = 40
This left 464 participants
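Adding the per-criterion counts only gives the true number excluded if no participant meets more than one criterion. A base R sketch with toy logical flags (the names are hypothetical) showing how combining them with `|` handles overlap:

```r
# Toy exclusion flags for 6 participants (TRUE = meets the criterion)
wtr_fail  <- c(TRUE,  FALSE, TRUE,  FALSE, FALSE, FALSE)
bad_desc  <- c(TRUE,  FALSE, FALSE, FALSE, TRUE,  FALSE)
gender_ne <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)

# Adding per-criterion counts double-counts participant 1
sum(wtr_fail) + sum(bad_desc) + sum(gender_ne)   # 5

# Combining the flags first counts each participant once
excluded <- wtr_fail | bad_desc | gender_ne
sum(excluded)                                    # 4

# The analysis dataset would then be the rows where excluded is FALSE,
# e.g. dat[!excluded, ] for a data frame dat with one row per participant
```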
However, we suspect there may be overlap between the exclusion criteria, because the numbers of remaining participants we get are:
- excluding based on WTR only: 472 participants
- excluding based on language and gender (criteria 2 and 3): 499 participants
- excluding based on all criteria: 470 participants
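These counts can be checked with simple arithmetic. This sketch assumes the full dataset contains 464 + 40 = 504 participants (the paper's final N plus its stated exclusions); if our raw file has a different N, the numbers change:

```r
n_total <- 464 + 40                # assumed full sample size (504)

removed_wtr   <- n_total - 472     # WTR criterion only: 32 (paper: 35)
removed_other <- n_total - 499     # language + gender criteria: 5 (paper: 3 + 2)
removed_all   <- n_total - 470     # all criteria together: 34 (paper: 40)

# If the criteria did not overlap, removed_all would equal the sum of the parts
overlap <- removed_wtr + removed_other - removed_all   # 32 + 5 - 34 = 3
```

On these numbers, our WTR criterion removes 3 fewer participants than the paper reports, and about 3 participants appear to meet both the WTR criterion and one of the other criteria.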
## 10.07.22 Progress
Goals: look over the figures and create the scatterplot
Before Tuesday
- Helen: Scatterplot Code Creation
- Caitlyn: Recreating the means & confidence intervals
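A minimal sketch of a mean with a t-based 95% confidence interval in base R, on the assumption that this is the kind of interval the paper reports (its exact method should be checked). `x` is toy data:

```r
x <- c(1.5, -0.5, 2.0, 0.0, 1.0)   # toy contact comfort scores

n  <- length(x)
m  <- mean(x)
se <- sd(x) / sqrt(n)               # standard error of the mean
ci <- m + c(-1, 1) * qt(0.975, df = n - 1) * se

# Cross-check: t.test() reports the same interval
all.equal(ci, as.numeric(t.test(x)$conf.int))
```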
Before Friday
- Complete the script so that the presentation can be made in the meeting
- Make sure everyone is happy with the script and allocate parts to record
- Have a finished product by Monday night (and make sure everyone has watched it)
#### Helen's Progress
- Wanted to remove the background of the figure so that it looks similar to the one in the paper
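A ggplot2 sketch of a contact comfort by WTR scatter plot with one linear regression line per target type, using `theme_classic()` to drop the grey panel background and gridlines. The data are simulated, the column names (`wtr`, `comfort`, `target`) and target labels are hypothetical, and whether `theme_classic()` matches the paper's styling exactly is an assumption:

```r
library(ggplot2)

set.seed(1)
plot_dat <- data.frame(
  wtr    = runif(60),
  target = rep(c("target_1", "target_2", "target_3"), each = 20)  # hypothetical labels
)
plot_dat$comfort <- 2 * plot_dat$wtr + rnorm(60)   # toy outcome

p <- ggplot(plot_dat, aes(x = wtr, y = comfort, colour = target)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +  # one line per target
  labs(x = "WTR", y = "Contact comfort") +
  theme_classic()   # white background, no gridlines

p
```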
The scatterplots illustrate hierarchical regression analyses, which test whether WTR relates to contact comfort independently of relationship type
1. In the first step (R² = .08), contact comfort was regressed on variables unrelated to WTR (participant sex, income, target sex and pathogen-disgust sensitivity)
- Note: regressing y on x estimates the extent to which a dependent variable (y) can be explained or predicted by one or more independent variables (x)
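A base R sketch of a two-step (hierarchical) regression: the first step fits the covariates, the second adds WTR, and the change in R² plus an F test from `anova()` shows whether WTR explains variance beyond the first step. In `lm(y ~ x)`, the formula reads "y regressed on x". The data are simulated and the variable names are hypothetical stand-ins for participant sex, income, target sex, disgust sensitivity and WTR:

```r
set.seed(42)
n <- 200
d <- data.frame(
  p_sex   = factor(sample(c("female", "male"), n, replace = TRUE)),
  income  = rnorm(n),
  t_sex   = factor(sample(c("female", "male"), n, replace = TRUE)),
  disgust = rnorm(n),
  wtr     = runif(n)
)
d$comfort <- -0.5 * d$disgust + 1.5 * d$wtr + rnorm(n)   # toy outcome

# Step 1: covariates unrelated to WTR
step1 <- lm(comfort ~ p_sex + income + t_sex + disgust, data = d)

# Step 2: add WTR
step2 <- lm(comfort ~ p_sex + income + t_sex + disgust + wtr, data = d)

r2_step1  <- summary(step1)$r.squared
r2_change <- summary(step2)$r.squared - r2_step1   # extra variance explained by WTR

anova(step1, step2)   # F test for the R-squared change
```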