# Telling more about “telling more than we can know”: Inferences from mouse tracking and introspective reports
## Study Information
### 1. Title
Telling more about “telling more than we can know”: Inferences from mouse tracking and introspective reports
### 2. Authorship
Rima-Maria Rahal^1^ & Michael Schulte Mecklenbeck^2^
^1^ Max Planck Institute for Research on Collective Goods<br>
^2^ University of Bern & Max Planck Institute for Human Development
### 3. Description
#### Decision Effort
We study cognitive processes during decision making via mousetracking in repeated decisions, testing the relation between preference strength and decision effort, inferred from mouse-cursor trajectories. We test two types of theoretical predictions:
1. Predictions derived from Dual Process Theory, suggesting that certain types of decisions are based on effortless vs. effortful processing (e.g., effortful (deliberate) processing supports utilitarian judgment, Greene et al., 2001)
2. Predictions derived from a Choice Discriminability perspective, suggesting that extreme preferences support less effortful decision processes compared to mixed preferences (e.g., Kim et al., 2018)
These predictions lead to different expectations about the pattern of results, which could provide evidence for one account and against the other (or fail to provide evidence for either). If the Dual Process perspective holds, certain preferences (utilitarian, selfish, ...) should be associated with higher processing effort (longer decision times, higher AUC, higher MAD) than other preferences (deontological, prosocial, ...). If the Choice Discriminability perspective holds, an inverse U-shaped relation between preferences and decision effort should be detected: more extreme preferences yield low-effort processes (shorter decision times, lower AUC, lower MAD), while mixed preferences yield high-effort processes (longer decision times, higher AUC, higher MAD). See section 4 for details.
In addition to studying these predictions for four types of preferential choices (incentivized moral decisions, non-incentivized trolley-type moral decisions, social dilemmas, risky choices), we include one task with knowledge-based decisions (animal classification). We test the following predictions: If the Dual Process perspective holds, poorer abilities to classify animals should be associated with higher processing effort (longer decision times, higher AUC, higher MAD, ...). If the Choice Discriminability perspective holds, an inverse U-shaped relation between abilities and decision effort should be detected: more extreme abilities to classify animals (extremely good, extremely poor) yield low-effort processes (shorter decision times, lower AUC, lower MAD), while mixed abilities yield high-effort processes (longer decision times, higher AUC, higher MAD). See section 4 for details.
#### Introspection vs. Mouse Movements
We additionally ask for introspective self-reports during a replay of the information acquisition phase for each decision, where participants see either their own mouse movements and choices or pre-recorded, prototypical trajectories (straight, curved, change of mind; see Wulff et al., 2019). Our aim is to understand whether participants' self-reported assessment of how effortful a decision was (in terms of difficulty, conflict, certainty) correlates with the mouse-tracking-based assessment of decision effort (in terms of, e.g., AUC and MAD), which is derived from the trajectories alone. See section 4 for details.
### 4. Hypotheses
1. **Decision time**
Higher decision effort is reflected in longer *decision times*.
- Dual Process: Longer decision times for
- more individualistic decision makers compared to more prosocial decision makers.
- more utilitarian decision makers compared to more deontological decision makers.
- more risk-averse decision makers compared to more risk-seeking decision makers.
- decision makers who are worse at categorizing animals.
- Choice Discriminability: Shorter decision times for
- more extreme preferences compared to mixed preferences.
- more extreme animal categorization abilities (extremely good, extremely poor) compared to moderate abilities.
2. **Mouse movement**
Higher decision effort in mouse movements is inferred from an increase in the following measures:
1) Number of xflips (directional change along the x-axis),
2) Area under the curve (AUC, geometric area between the observed trajectory and the direct line between start- and choice-button),
3) Maximum absolute deviation (MAD), maximum deviation of a trajectory to a straight line between starting point and end point,
4) Sample entropy, i.e., spatio-temporal disorder of a trajectory,
5) Motion time, time for ongoing movement,
6) Idle time, total time during which there is no movement.
These measures are, to some extent, correlated. Hence, we expect an increase on all measures for higher conflict.
- Dual Process: Higher decision effort inferred from mouse movements for
- more individualistic decision makers compared to more prosocial decision makers.
- more utilitarian decision makers compared to more deontological decision makers.
- more risk-averse decision makers compared to more risk-seeking decision makers.
- decision makers who have a worse ability to categorize animals.
- Choice Discriminability: Lower decision effort inferred from mouse movements for
- more extreme preferences compared to mixed preferences.
- more extreme animal categorization abilities (extremely good, extremely poor) compared to moderate abilities.
3. **Choices**
- Decision makers are more likely to make preference-consistent choices than preference-inconsistent choices.
- Decision makers with higher abilities to classify animals are more likely to make correct classification choices than incorrect classification choices.
4. **Introspection**
We study whether participants' post-hoc introspection on how effortful the solution of a specific trial was (more effortful = higher mean score of difficulty, conflict and certainty rating) correlates with how effortful this trial was based on the mouse movements during the task (more effortful = longer decision time, higher AUC, higher MAD).
If the correlation between introspection-based and mouse-based decision effort is higher for playbacks of their own trials compared to playbacks of others' trials, then participants can discriminate between own and others' playbacks better.
## Design Plan
### 5. Study Type
Experiment
### 6. Blinding
Participants will not know the treatment group to which they have been assigned. The lab managers running the study will be blind to its purpose. Given that the structure of the different tasks gives away the conditions, no blinding is planned for the analysis.
### 7. Is there any additional blinding in this study?
No.
### 8. Study design
We run a repeated measures design with choice preferences as between-subjects, continuous predictors.
In the *online phase*, to be completed before coming to the lab, participants fill in introspective preference measures (see 17.1 for details). Participants are screened for handedness, such that those indicating that they use their left hand to write cannot complete the online questionnaire and are asked to resign from the experiment.
In the *lab phase*, five blocks are presented to each participant.
1. Participants receive instructions.
2. Participants complete two practice trials.
3. Participants complete 5 blocks, in randomized order, with 12 trials each (see details on block types below). For these trials, we record choices, decision times and mouse trajectories.
4. Participants complete a replay phase, which shows playbacks of trials' mouse trajectories. Randomly for each trial, participants either see their own choice and mouse trajectories, or a pre-recorded choice and mouse trajectories from a prototypical mouse movement (see Wulff et al., 2022). After each playback, we ask participants to report on: 1) choice difficulty, 2) conflict, and 3) certainty they experienced regarding the presented trial.
5. Participants complete a final questionnaire asking them to evaluate their choices in the preceding block on the same three dimensions (difficulty, conflict, certainty).
*Block types:*
1. Risky Choices
- 12 gambles selected from Stillman, Krajbich, & Ferguson (2020), Study 1
- choices between option to play or to forgo the game
- when the game is played, there is a 50% chance to win the gain amount, and a 50% chance to lose the loss amount
- when the game is forgone, participants receive 0€ for certain
2. Moral Machine
- 12 trolley-type dilemma tasks from Awad et al. (2018)
- choices between option to swerve, sacrificing a vehicle's passengers, or to stay on collision course with pedestrians (or animals) on the street
3. Moral DM
- 12 moral reallocation tasks from Rahal, Hoeft & Fiedler (in preparation)
- choices between option to leave donations with pre-selected recipients or reallocate to benefit other recipients
4. Social Dilemma
- 12 social dilemma games from Kieslich & Hilbig (2014):
- 4 trials of chicken games
- 4 trials of prisoners' dilemmas
- 4 trials of stag hunt games
- choices between options to cooperate or defect (presented without the cooperation frame as Options A and B)
5. Animal Classification
- 12 animal classification tasks from Kieslich et al. (2020)
- choices between categorizing animal exemplars (e.g., dog) as one of two classes (e.g., mammal or fish)
Finally, we ask participants to indicate their handedness.
### 9. Randomization
We randomize the sequence in which blocks are displayed.
Within blocks, the sequence of trials is randomized.
Participants randomly see playback of own vs. others' decision and mouse trajectories per trial.
## Sampling Plan
We collect data from the participant pool of the MPI DecisionLab in Bonn (Germany), consisting mainly of students. Participants are recruited via HROOT in winter/spring 2023.
### 10. Existing data
A pilot sample of 5 participants was collected to check the procedure, data quality, and data collection processes. These 5 participants will not be included in the final analysis.
Registration prior to creation of data: As of the date of submission of this research plan for preregistration, the data have not yet been collected, created, or realized.
### 11. Explanation of existing data
Not applicable.
### 12. Data collection procedures
We collect data from the participant pool of the MPI DecisionLab in Bonn (Germany), consisting mainly of students. Participants are recruited via HROOT in winter/spring 2023. Participants will be invited to take part, and informed that they are eligible to sign up if they are above 18 and below 35 years of age, speak German well and are right-handed. Participants receive an average of 12€ per hour for participating in the study, consisting of a show-up fee and a variable additional payment depending on their decisions in the following tasks:
- animal classification (piece rate of 0.10€ per correct answer, both online and in the lab)
- moral dm (one choice in lab phase from random participant paid to charity)
- risky choice (one random choice paid out per participant, both online and in the lab)
- SVO (one random choice paid out per participant, randomly either as receiver or as dictator)
- inequality aversion (one random choice of one random participant paid to three random receivers)
### 13. Sample size
We plan to recruit 150 participants (5 pilot participants whose data is not used in the analyses + 145 for the main sample).
### 14. Sample size rationale
We determine the sample size based on feasibility of data collection and funding available for participant payments.
### 15. Stopping rule
We will initially invite 5 + 145 people to the experiment: 5 for the pilot and 145 for the main experiment. If no-shows and technical failures reduce the number of people actually participating in the main study, we will continue to invite people in groups of 3 until we have reached or exceeded the target sample size of 145 participants or we run out of participant funds.
## Variables
### 16. Manipulated variables
- Risky Choices
- Trials vary regarding the difference of expected value between the option to play or to forgo the game.
- More difficult trials have a smaller EV difference.
- Moral Machine
- Trials vary regarding the number and types of lives saved or sacrificed.
- More difficult trials have a more similar number of lives saved in the car or on the street. More difficult trials have more similar types of lives saved (man-man, woman-woman).
- Moral DM
- Trials vary regarding the original recipient (one person vs. group of people) and the number of people in the group (2 vs. 3 vs. 4). Target trials are trials in which a single person is the original recipient and would obtain the donation by default, and the group of people would receive the donation due to a decision to reallocate.
- More difficult trials have a smaller number of people in the group.
- Social Dilemma
- Trials vary regarding the sucker prize in case of one-sided cooperation, the reward for joint cooperation, the temptation prize in case of defection, and the punishment for joint defection.
- More difficult trials have a smaller difference between outcomes for the different options.
- Animal Classification
- Trials vary regarding the typicality of the animal exemplar for its category (e.g., typical mammal: horse, atypical mammal: dolphin). Depending on the exemplar, one correct and one incorrect category are displayed (fish, mammal, reptile, bird, insect, amphibian).
- More difficult trials present atypical animal exemplars.
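For the risky-choice block, the expected-value (EV) difference that defines trial difficulty can be made explicit with a small sketch. It assumes the payoff structure described in section 8 (50% chance to win the gain amount, 50% chance to lose the loss amount, 0€ for forgoing); the function name `ev_difference` and the example amounts are illustrative and are not taken from the Stillman, Krajbich, & Ferguson (2020) gamble set.

```python
def ev_difference(gain, loss, p_win=0.5):
    """Expected value of playing minus expected value of forgoing the game.

    `loss` is given as a positive amount. Forgoing pays 0 EUR for certain,
    so the difference equals the expected value of playing.
    """
    return p_win * gain - (1 - p_win) * loss

# Hypothetical gambles: the closer the difference is to 0, the harder the trial.
easy_trial = ev_difference(gain=10.0, loss=2.0)   # 4.0
hard_trial = ev_difference(gain=6.0, loss=5.0)    # 0.5
```

Under this reading, the absolute value of the difference could serve as a trial-level difficulty index for the exploratory analysis.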
### 17. Measured variables
1. **Preference Strength / Abilities**
- Risky Choices
- We measure risk preferences using the Holt & Laury (2002) risk measure.
- Moral Machine & Moral DM
- We measure moral preferences in two ways: with trolley-type tasks based on Awad et al. (2018), in which single (or multiple) passengers of a car are pitted against different types of pedestrians, and with the number of utilitarian choices made in 10 classic trolley-type dilemmas (Rahal, Hoeft & Fiedler, in preparation).
- Social Dilemma
- We measure social preferences using the Social Value Orientation slider measure (Murphy, Ackermann & Handgraaf, 2011).
- Animal Classification
- We measure the ability to classify animals by assessing how many animals are correctly classified in 10 atypical exemplar cases.
2. **Additional Introspective Reports**
- Equality-efficiency trade-off: We measure participants’ equality-efficiency trade-off in third-party games via a new measure, based on Engelmann and Strobel (2004), that assesses the degree to which participants follow each motivation. Participants are asked to make 3 decisions between two options each, where money is allocated between three hypothetical players (Person A, Person B and Person C). The decision makers’ own payoffs are not affected by their choices. In each decision task, choices for each of the two options are motivated by one motivation: inefficiency aversion (Option 2) and variant inequality (the sum of all pairwise differences between the values, Option 1).
- Indecisiveness: We measure indecisiveness using a scale by Frost & Shows (1993).
3. **Choices**
- Risky Choices
- risky choice: choose option to play
- safe choice: choose option to forgo the game
- Moral Machine
- utilitarian choice: try to maximize number of survivors
- deontological choice: stay on course to avoid doing harm
- Moral DM
- in target trials (original recipient is one person)
- utilitarian choice: reallocate to maximize number of recipients by
- deontological choice: maintain original allocation to avoid doing harm
- Social Dilemma
- prosocial choice: cooperate to maximize joint payoff
- selfish choice: defect to maximize own payoff
- Animal Classification
- correct choice: selecting the correct category (e.g., mammal if the exemplar is "dog")
- incorrect choice: selecting the incorrect category (e.g., reptile if the exemplar is "dog")
4. **Decision Time**
We measure decision time as the time that passes in each trial between the onset of the decision screen and the moment the participant logs their decision.
5. **Mouse Movement**
Using mousetrap (Kieslich et al., 2020) we measure the following movement related variables:
- x, y coordinates of mouse movements in each task type
- time between stimulus onset and choice
6. **Introspection about the decision process**
On a 10 point Likert scale (0 not at all - 100 extremely), we measure:
- Difficulty: How difficult was this decision for you?
- Conflict: How much did you think back and forth between the options?
- Certainty: How certain are you that you made the right choice?
7. **Handedness**
Participants indicate their handedness in terms of which hand they use to write (left or right). In the lab stage, we additionally ask participants to indicate which hand they used to move the mouse (left or right).
### 18. Indices
#### Mouse Movement indices
From the recorded x-y coordinates, we calculate several indices after common data cleaning steps (for details see Wulff et al., 2019):
1) adjusting spatial arrangement of mouse trajectories, i.e., mapping all tasks onto one side and letting trajectories all start at the same coordinate and
2) re-sampling trajectories, so that each trajectory consists of the same number of points (most often 101 points).
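The two preparation steps above can be illustrated with a minimal Python sketch. It is not the mousetrap implementation: the function names (`align_start`, `remap_to_left`, `resample_trajectory`), the choice to move the start point to the origin, and the mirroring around x = 0 are assumptions made for illustration. Timestamps are assumed to be strictly increasing.

```python
from bisect import bisect_right

def align_start(x, y):
    """Shift a trajectory so that its first sample lies at (0, 0)."""
    return [xi - x[0] for xi in x], [yi - y[0] for yi in y]

def remap_to_left(x):
    """Mirror a start-aligned trajectory around x = 0 if it ends on the right."""
    return [-xi for xi in x] if x[-1] > 0 else list(x)

def resample_trajectory(t, x, y, n_points=101):
    """Linearly interpolate (x, y) at n_points equally spaced time steps."""
    if len(t) < 2 or not (len(t) == len(x) == len(y)):
        raise ValueError("need at least two samples and equal-length inputs")
    t_start, t_end = t[0], t[-1]
    resampled = []
    for k in range(n_points):
        tk = t_start + (t_end - t_start) * k / (n_points - 1)
        # Index of the recorded segment [t[i], t[i + 1]] that contains tk.
        i = min(max(bisect_right(t, tk) - 1, 0), len(t) - 2)
        weight = (tk - t[i]) / (t[i + 1] - t[i])
        resampled.append((x[i] + weight * (x[i + 1] - x[i]),
                          y[i] + weight * (y[i + 1] - y[i])))
    return resampled

# Hypothetical raw recording: timestamps in ms, screen coordinates in pixels.
x0, y0 = align_start([100, 105, 110], [50, 60, 70])
points = resample_trajectory([0, 10, 20], remap_to_left(x0), y0)
```

After these steps every trajectory starts at the same coordinate, ends on the same side, and has the same number of points, so indices can be compared across trials.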
Once these preparations are done, we will calculate the following indices:
- Maximum Absolute Deviation (MAD): the maximum distance of a trajectory to a straight line between starting point and end point
- Area under the curve (AUC): area between trajectory and straight line between starting point and end point
- x-flips: number of directional changes on the x-axis
- sample entropy: spatio-temporal disorder of a trajectory
- motion time: time for ongoing movement
- idle time: total time during which there is no movement
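As an illustration of the geometric definitions above, the following Python sketch computes MAD, AUC, x-flips and idle time for a single trajectory. It is a sketch under stated assumptions, not the mousetrap code: the function names are hypothetical, the y-axis is assumed to point upward, and the sign convention (positive values lie to the right of the direct path when moving from start to end, i.e., toward the unchosen option once trajectories are mapped to the left) is our own choice. Sample entropy and motion time are not covered.

```python
import math

def deviations(x, y):
    """Signed perpendicular distance of every point from the start-end line."""
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end point coincide")
    return [(dy * (xi - x[0]) - dx * (yi - y[0])) / length
            for xi, yi in zip(x, y)]

def mad(x, y):
    """Deviation with the largest absolute value (sign retained)."""
    return max(deviations(x, y), key=abs)

def auc(x, y):
    """Signed area between trajectory and direct path (shoelace formula).

    The polygon is closed by connecting the end point back to the start,
    so areas on opposite sides of the direct path cancel each other.
    """
    n = len(x)
    twice_area = sum(x[i] * y[(i + 1) % n] - x[(i + 1) % n] * y[i]
                     for i in range(n))
    return twice_area / 2

def x_flips(x):
    """Number of changes in movement direction along the x-axis."""
    steps = [b - a for a, b in zip(x, x[1:]) if b != a]  # ignore pauses
    return sum(1 for s0, s1 in zip(steps, steps[1:]) if (s0 > 0) != (s1 > 0))

def idle_time(t, x, y):
    """Total time between consecutive samples with identical position."""
    return sum(t[i + 1] - t[i] for i in range(len(t) - 1)
               if x[i + 1] == x[i] and y[i + 1] == y[i])

# Hypothetical trajectory: straight up, then straight left to the chosen option.
xs, ys = [0, 0, -1], [0, 1, 1]
```

For this right-angled example, the enclosed triangle has an area of 0.5 and the corner point lies about 0.71 units away from the direct path.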
## Analysis Plan
### 19. Statistical models
1. **Decision time**
Dual Process: Linear mixed effect model, predicting decision times with decision makers' preferences/abilities controlling for task and participant id as random effects.
Choice Discriminability: Interrupted linear regression for repeated measures, where the breakpoint to interrupt the regression is the midpoint of the scale for preferences/abilities, i.e.,
- 50% utilitarian responses in 10 classic trolley-type dilemmas
- median SVO score measured in the sample
- 50% correct answers in animal classification task
- risk neutral (4 safe choices)
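The logic of a regression that is interrupted at a fixed breakpoint can be sketched as follows. This is a deliberately simplified illustration: it fits one ordinary least-squares slope on each side of the breakpoint and ignores the repeated-measures structure (random effects for task and participant) that the planned model includes. The function names and the example data are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance in this segment")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    return sxy / sxx

def two_segment_slopes(x, y, breakpoint):
    """Fit separate slopes for x <= breakpoint and x > breakpoint."""
    low = [(xi, yi) for xi, yi in zip(x, y) if xi <= breakpoint]
    high = [(xi, yi) for xi, yi in zip(x, y) if xi > breakpoint]
    slope_low = ols_slope([p[0] for p in low], [p[1] for p in low])
    slope_high = ols_slope([p[0] for p in high], [p[1] for p in high])
    return slope_low, slope_high

# Hypothetical data: number of utilitarian responses (0-10) and an effort
# measure that peaks at the scale midpoint (5 of 10 = 50%).
preference = list(range(11))
effort = [5 - abs(p - 5) for p in preference]
slope_low, slope_high = two_segment_slopes(preference, effort, breakpoint=5)
```

An inverse U-shaped pattern, as expected under Choice Discriminability, would show up as a positive slope below the breakpoint and a negative slope above it.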
2. **Mouse Movement**
Dual Process: Linear mixed effect model predicting [AUC, MAD, entropy] with decision makers' preferences/abilities controlling for task and participant id as random effects.
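Choice Discriminability: Interrupted linear regression for repeated measures predicting [AUC, MAD, entropy], with the same breakpoints as specified for decision time above.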
3. **Choices**
Logistic regression for repeated measures, predicting the odds of making preference-consistent choices from the relevant preference, i.e.,
- higher odds of making utilitarian choices for more utilitarian decision makers
- higher odds of making cooperative choices for more prosocial decision makers
- higher odds of making risk-averse choices for more risk-averse decision makers.
Logistic regression for repeated measures, predicting the odds of making correct classifications from their classification abilities, i.e., higher odds of correctly classifying for more able decision makers.
For all analyses, we control for item-specific/participant variation.
4. **Introspection**
We compare the correlation between introspection-based and mouse-based decision effort for playbacks of own trials with the corresponding correlation for playbacks of others' trials (dependent groups, non-overlapping) using Zou’s (2007) confidence interval (see https://cran.r-project.org/web/packages/cocor/cocor.pdf).
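To make the comparison concrete, the following Python sketch computes a Zou-type confidence interval for the difference between two dependent, non-overlapping correlations. It is an illustration under assumptions and not the cocor code: the variable numbering (1 = introspection, own playback; 2 = mouse-based effort, own playback; 3 = introspection, others' playback; 4 = mouse-based effort, others' playback), the function names and the example values are ours. The correlations of interest are `r12` and `r34`; the four cross-correlations are only needed to estimate how strongly the two correlations depend on each other.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, conf=0.95):
    """Confidence interval for one correlation via Fisher's z transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2)
    half_width = z_crit / math.sqrt(n - 3)
    return (math.tanh(math.atanh(r) - half_width),
            math.tanh(math.atanh(r) + half_width))

def corr_between_corrs(r12, r34, r13, r14, r23, r24):
    """Correlation between r12 and r34 when all four variables are related."""
    numerator = (0.5 * r12 * r34 * (r13**2 + r14**2 + r23**2 + r24**2)
                 + r13 * r24 + r14 * r23
                 - (r12 * r13 * r14 + r12 * r23 * r24
                    + r13 * r23 * r34 + r14 * r24 * r34))
    return numerator / ((1 - r12**2) * (1 - r34**2))

def zou_ci(r12, r34, n, c, conf=0.95):
    """Confidence interval for r12 - r34, given their correlation c."""
    l1, u1 = fisher_ci(r12, n, conf)
    l2, u2 = fisher_ci(r34, n, conf)
    diff = r12 - r34
    lower_sq = (r12 - l1)**2 + (u2 - r34)**2 - 2 * c * (r12 - l1) * (u2 - r34)
    upper_sq = (u1 - r12)**2 + (r34 - l2)**2 - 2 * c * (u1 - r12) * (r34 - l2)
    # max(..., 0) guards against tiny negative values from rounding.
    return (diff - math.sqrt(max(lower_sq, 0.0)),
            diff + math.sqrt(max(upper_sq, 0.0)))

# Hypothetical correlations from n = 145 paired observations.
c = corr_between_corrs(r12=0.5, r34=0.2, r13=0.3, r14=0.1, r23=0.1, r24=0.3)
lower, upper = zou_ci(r12=0.5, r34=0.2, n=145, c=c)
```

If the resulting interval excludes zero, the two correlations differ at the chosen confidence level.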
### 20. Transformations
- For models including interaction terms, we center all predictors.
- For decision times, we conduct Shapiro-Francia normality tests and use log-transformations if the assumption of normal distribution is violated.
### 21. Inference criteria
We use the standard p<.05 criterion for determining whether the test results suggest that the data are significantly different from those expected if the null hypotheses were correct. Additionally, we will report CIs and effect sizes where appropriate and computationally possible.
### 22. Data exclusion
Data from participants who indicate that they are not right-handed (i.e., indicating that they use predominantly their left hand to write in the online or lab stage, or that they used the mouse in the lab stage with their left hand) will be excluded.
We will exclude trials in which participants did not initiate mouse movements for 5 seconds.
Because choices are forced in order to move forward within the mouse-tracking tasks, trial completion cannot serve as an exclusion criterion. However, very fast trial completion times might indicate that participants did not take the tasks seriously. Hence, we will exclude trials that are two SDs below the average task completion time within one task type. If more than 50% of the trials of a task type are excluded in this way, we will exclude all data from this task type. If more than 50% of task types are excluded, we will exclude the participant from the analysis.
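The speed-based exclusion cascade can be sketched in Python as follows. The sketch rests on several assumptions that the text above leaves open: the mean and SD are computed over all trials of a task type across participants, the sample standard deviation is used, and the 50% rules are applied per participant. The dictionary keys (`participant`, `task`, `rt`) and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def flag_fast_trials(trials, n_sd=2.0):
    """Flag trials more than n_sd SDs below the mean time of their task type."""
    times_by_task = defaultdict(list)
    for trial in trials:
        times_by_task[trial["task"]].append(trial["rt"])
    # stdev needs at least two trials per task type.
    cutoff = {task: mean(times) - n_sd * stdev(times)
              for task, times in times_by_task.items()}
    return [trial["rt"] < cutoff[trial["task"]] for trial in trials]

def apply_exclusions(trials, n_sd=2.0):
    """Return the trials that survive trial-, task- and participant-level rules."""
    flags = flag_fast_trials(trials, n_sd)
    counts = defaultdict(lambda: [0, 0])  # (participant, task) -> [fast, total]
    for trial, fast in zip(trials, flags):
        entry = counts[(trial["participant"], trial["task"])]
        entry[0] += int(fast)
        entry[1] += 1
    # Drop a participant's task type if more than half of its trials are fast.
    dropped_blocks = {key for key, (fast, total) in counts.items()
                      if fast > total / 2}
    n_tasks, n_dropped = defaultdict(int), defaultdict(int)
    for participant, task in counts:
        n_tasks[participant] += 1
        n_dropped[participant] += int((participant, task) in dropped_blocks)
    # Drop a participant if more than half of their task types were dropped.
    dropped_participants = {p for p in n_tasks
                            if n_dropped[p] > n_tasks[p] / 2}
    return [trial for trial, fast in zip(trials, flags)
            if not fast
            and (trial["participant"], trial["task"]) not in dropped_blocks
            and trial["participant"] not in dropped_participants]

# Hypothetical data: participant A clicks through, participant B does not.
example = ([{"participant": "A", "task": "risk", "rt": 100} for _ in range(2)]
           + [{"participant": "B", "task": "risk", "rt": 1000} for _ in range(18)])
kept = apply_exclusions(example)
```

In this example both of participant A's trials fall below the cutoff, so A's only task type, and therefore A, is removed, while all of B's trials are kept.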
### 23. Missing data
We will exclude data from participants with missing mouse recordings.
We will exclude data from participants with incomplete data in the moral inclinations measure.
### 24. Exploratory analysis
Decision makers who are more utilitarian are more likely to make utilitarian choices the cheaper the moral good for a single person in the group (compared to the individual person).
More difficult trials (see definition in 16) are associated with higher decision effort, both in terms of self-reported introspection and mouse-movement-based inference.
## Other
### 25. Other
Not applicable.