# Intuition and Mousetracking
## Study Information
### 1. Title
Telling more about “telling more than we can know”: Inferences from mouse tracking and introspective reports
### 2. Authorship
Rima-Maria Rahal^1^ & Michael Schulte-Mecklenbeck^2^
^1^ Max Planck Institute for Research on Collective Goods <br>
^2^ University of Bern & Max Planck Institute for Human Development
### 3. Description
We study cognitive processes during decision making via mouse tracking. In repeated decisions from five domains (risky choice, moral decision making: trolley problems, moral decision making: reallocation, social dilemmas, simple classification), we test the relation between preference strength and decision conflict between options, inferred from mouse-cursor trajectories. We additionally ask for introspective self-reports during a replay of the mouse-cursor trajectories for each choice, where participants either see their own trajectories or pre-recorded prototypical trajectories. These data will allow us to test whether common assumptions about the link between mouse movements and cognitive processes, e.g., increased lateral movement with increased conflict, hold.
Braindump Mouse Intuition – just read this right away when we talk
Program
- replay choices that are not mine (i.e., to the wrong side) – is that good or bad?
- labels are always on the right/left (i.e., not counterbalanced) – is that good or bad?
Preregistration
- RT appears twice (once again for conflict measures)
- section on MT measures missing: Michael?
- incentivised or no?
- sample size: effect size estimates?
- Drop? 6. Evaluations of mouse movements: In the playback phase, we ask participants to indicate if they thought a played back mouse movement was their own or somebody else's.
### 4. Hypotheses
1. **Choices**
Decision makers are more likely to make preference-consistent choices than preference-inconsistent choices. Preference tasks completed before the experiment "match" behavior within the experiment XXX [MSM] ... see 17.1 for details on the tasks.
2. **Decision time**
* *Dual Process:* *Longer* decision times for
- More individualistic decision makers compared to more prosocial decision makers.
- More utilitarian decision makers compared to more deontological decision makers.
- More risk-averse decision makers compared to more risk-seeking decision makers. ([Kirchler et al 2017](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5486903/))
    * False categorizations compared to correct categorizations. :question: XXX [MSM]
* *Alternative prediction:* *Shorter* decision times for
* More extreme preferences compared to mixed preferences.
* Categorizations? :question: XXX [MSM] (see DP above)
3. **Mouse movement**
* *Dual Process:* *Higher* conflict inferred from mouse movements for
    * More individualistic decision makers compared to more prosocial decision makers.
* More utilitarian decision makers compared to more deontological decision makers.
* More risk-averse decision makers compared to more risk-seeking decision makers.
    * False categorizations compared to correct categorizations. :question:
* *Alternative prediction:* *Lower* decision conflict for
* More extreme preferences compared to mixed preferences.
* Categorizations? :question: XXX [MSM] (see DP above)
> We are still missing the section on MT measures here (and RT should not appear there again.)
>
:ski: Mouse tracking offers different measures of conflict, which we will evaluate in turn.
Higher conflict means an increase in *reaction time* (RT, defined as the time from mouse-movement onset to button click, i.e., choice), in the number of *x-flips* (directional changes along the x-axis), and in the *area under the curve* (AUC, the geometric area between the observed trajectory and the direct line between the start and choice buttons).
utilitarian > deontological
Green - deontological first, then after an x-flip a direction change to the utilitarian choice
- Acceleration (...)
4. **Introspection** :question:
5. **Recognition of Mouse Trajectories** :question:
- self - introspection
- other - typical trajectories from Wulff et al
- additional question on recall
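The trajectory-based conflict measures named under Hypothesis 3 (x-flips and AUC) can be sketched as follows. The trajectory format, a list of (x, y) samples from start button to click, is an assumption of this illustration, not the format of any specific mouse-tracking package:

```python
# Illustrative computation of two mouse-tracking conflict measures:
# x-flips (directional changes along the x-axis) and AUC (geometric
# area between the trajectory and the direct start-to-end line).

def x_flips(trajectory):
    """Count directional changes along the x-axis."""
    dxs = [x2 - x1 for (x1, _), (x2, _) in zip(trajectory, trajectory[1:])]
    dxs = [dx for dx in dxs if dx != 0]  # ignore samples without horizontal motion
    return sum(1 for a, b in zip(dxs, dxs[1:]) if a * b < 0)

def auc(trajectory):
    """Area between the trajectory and the straight start-to-end line,
    via the shoelace formula on the polygon whose closing edge is that line."""
    s = 0.0
    n = len(trajectory)
    for i in range(n):
        x1, y1 = trajectory[i]
        x2, y2 = trajectory[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Example: a trajectory that first veers left, then flips right once.
traj = [(0, 0), (-1, 1), (-1, 2), (1, 3), (2, 4)]
print(x_flips(traj))  # 1
print(auc(traj))      # 4.0
```

Note that this simple AUC variant treats crossings of the direct line as additive area; established packages distinguish signed and absolute area.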
## Design Plan
### 5. Study Type
Experiment
### 6. Blinding
Participants will not know the treatment group to which they have been assigned.
### 7. Is there any additional blinding in this study?
No.
### 8. Study design
We run a repeated measures design with preferences as between-subjects continuous predictors.
In the online phase, to be completed before coming to the lab, participants complete introspective preference measures.
In the lab phase, in randomized order, five blocks are shown. In each block:
1. Participants receive instructions.
2. Participants complete two practice trials.
3. Participants are shown a number of trials (details see below) in randomized order. For these trials, we record decisions, decision times and mouse trajectories.
4. Participants complete a replay phase, which plays back decisions and mouse trajectories from the preceding trials. Randomly for each trial, participants either see their own decision and mouse trajectories, or pre-recorded decisions and mouse trajectories. After each playback, we ask participants to report on the choice difficulty, conflict, and certainty they experienced regarding the presented trial.
5. Participants complete a final questionnaire asking them to evaluate their choices in the preceding block on the same three dimensions (difficulty, conflict, certainty).
Block types:
* Risky Choices
* 12 gambles selected from Stillman, Krajbich, & Ferguson (2020), Study 1
* choices between option to play or to forgo the game
  * when the game is played, there is a 50% chance to win the gain amount, and a 50% chance to lose the loss amount
* when the game is forgone, participants receive 0€ for certain
* Moral Machine
  * 12 trolley-type dilemma tasks from [Awad et al., 2018](https://www.nature.com/articles/s41586-018-0637-6)
* choices between option to swerve, sacrificing a vehicle's passengers, or to stay on collision course with pedestrians on the street
* Moral DM
* 12 moral reallocation tasks from Rahal, Hoeft & Fiedler (in preparation)
  * choices between the option to leave donations with pre-selected recipients or to reallocate them to benefit other recipients
* Social Dilemma
* 12 social dilemma games from Kieslich & Hilbig (2014):
* 4 trials of chicken games
* 4 trials of prisoners' dilemmas
* 4 trials of stag hunt games
* choices between options to cooperate or defect (presented without the cooperation frame as Options A and B)
* Animal Classification
* 12 animal classification tasks from Kieslich et al., 2020
* choices between categorizing animal exemplars (e.g., dog) as one of two classes (e.g., mammal or fish)
### 9. Randomization
We randomize the sequence in which blocks are displayed.
Within blocks, the sequence of trials is randomized, too.
Participants randomly see playback of own vs. others' decision and mouse trajectories per trial.
## Sampling Plan
<!--
In this section we’ll ask you to describe how you plan to collect samples, as well as the number of samples you plan to collect and your rationale for this decision. Please keep in mind that the data described in this section should be the actual data used for analysis, so if you are using a subset of a larger dataset, please describe the subset that will actually be used in your study.
-->
Participants are sampled from the participant pool of the MPI DecisioLab in Bonn (Germany). Based on a power analysis ... Which study should be the basis? All of them? :question:
check Kieslich for effect size ...
### 10. Existing data
Registration prior to creation of data: As of the date of submission of this research plan for preregistration, the data have not yet been collected, created, or realized.
### 11. Explanation of existing data
Not applicable.
### 12. Data collection procedures
We collect data from the participant pool of the MPI DecisioLab in Bonn (Germany), consisting mainly of students. Participants are recruited via HROOT in winter 2023. Participants will be invited to take part, and informed that they are eligible to sign up if they are above 18 and below 35 years of age, and speak good German. Participants receive an average of 12€ per hour for participating in the study, consisting of a show-up fee and variable components depending on their own decisions. :question:
### 13. Sample size
We plan to collect data from 100 participants. :question: sample size estimation above ...
### 14. Sample size rationale
We determine the sample size based on feasibility of data collection and funding available for participant payments.
Given the repeated measures design, where for each block participants complete 12 trials, we foresee that the experiment will be adequately powered :question:
### 15. Stopping rule
We will initially invite 100 people to the experiment. If no-shows and technical failures reduce the number of people actually participating in the study, we will continue to invite people in groups of 3 until we have reached or exceeded the target sample size of 100 participants.
## Variables
### 16. Manipulated variables
<!--
Describe all variables you plan to manipulate and the levels or treatment arms of each variable. This is not applicable to any observational study.
Example: We manipulated the percentage of sugar by mass added to brownies. The four levels of this categorical variable are: 15%, 20%, 25%, or 40% cane sugar by mass.
More information: For any experimental manipulation, you should give a precise definition of each manipulated variable. This must include a precise description of the levels at which each variable will be set, or a specific definition for each categorical treatment. For example, “loud or quiet,” should instead give either a precise decibel level or a means of recreating each level. 'Presence/absence' or 'positive/negative' is an acceptable description if the variable is precisely described.
-->
* Risky Choices
  * Trials vary regarding the difference in expected value (EV) between the option to play and the option to forgo the game; probabilities are kept constant at 50% throughout all gambles.
  * More difficult trials have a smaller EV difference; the overall EV range is between 0.625 and 6.75 Euros.
* Moral Machine
  * Trials vary regarding the number (1 to 4) and types (male/female, adult/child, normal/burglar, young/old) of lives (human, animal) saved or sacrificed.
* More difficult trials have a smaller difference of lives saved. More difficult trials have more similar types of lives saved.
* Moral DM
  * Trials vary regarding the original recipient (one person vs. group of people) and the number of people in the group (2 vs. 3 vs. 4). Target trials are trials in which a single person is the original recipient and would obtain the donation by default, and the group of people would receive the donation due to a decision to reallocate.
* More difficult trials have a smaller number of people in the group.
* Social Dilemma
  * Trials vary regarding the sucker's payoff in case of one-sided cooperation, the reward for joint cooperation, the temptation payoff in case of defection, and the punishment for joint defection.
* More difficult trials :question:
* Animal Classification
  * Trials vary regarding the typicality of the animal exemplar for its category (e.g., typical mammal: horse, atypical mammal: dolphin). Depending on the exemplar, correct and incorrect categories are displayed (fish, mammal, reptile, bird, insect, amphibian).
* More difficult trials present atypical animal exemplars.
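The difficulty manipulation in the Risky Choices block rests on the EV difference between playing and forgoing. A minimal sketch, where the gain and loss amounts are hypothetical values chosen only to reproduce the endpoints of the stated EV range, not actual trial parameters from Stillman, Krajbich, & Ferguson (2020):

```python
# Expected value of a 50/50 gamble vs. the sure 0 EUR for forgoing.
# Gain/loss amounts below are hypothetical illustrations.

def ev_difference(gain, loss, p_win=0.5):
    """EV(play) - EV(forgo); forgoing pays 0 EUR for certain."""
    ev_play = p_win * gain - (1 - p_win) * loss
    return ev_play - 0.0

# A smaller absolute EV difference marks a more difficult trial.
easy = ev_difference(gain=20.0, loss=6.5)   # 6.75 EUR
hard = ev_difference(gain=8.25, loss=7.0)   # 0.625 EUR
print(easy, hard)
```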
### 17. Measured variables
1. **Preference Strength**
* Risky Choices
* We measure risk preferences using the Holt & Laury (2002) risk measure.
* Moral Machine & Moral DM
* We measure moral preferences by assessing how many utilitarian choices are made in 10 classic trolley-type dilemmas (Rahal, Hoeft & Fiedler, in preparation).
* Social Dilemma
* We measure social preferences using the Social Value Orientation slides measure (Murphy, Ackermann & Handgraaf, 2011).
* Animal Classification
* We measure classification ability by assessing how many out of 10 classification choices are made correctly.
2. **Additional Introspective Reports**
* Equality-efficiency trade-off: We measure participants’ equality-efficiency trade-off in third-party games via a new measure assessing the degree to which participants follow each motivation, based on Engelmann and Strobel (2004). Participants are asked to make 3 decisions between two options each (see Figure 1), where money is allocated between three hypothetical players (Person A, Person B and Person C). The decision makers’ own payoffs are not affected by their choices. In each decision task, each of the two options is motivated by one motivation: inefficiency aversion (Option 2) or Varian inequality (the sum of all pairwise differences between the values; Option 1).
* Indecisiveness: We measure indecisiveness using a scale by Frost & Shows (1993).
3. **Choices**
* Risky Choices
* risky choice: choose option to play
* safe choice: choose option to forgo the game
* Moral Machine
* utilitarian choice: swerve to maximize number of survivors :question:
* deontological choice: stay on course to avoid doing harm :question:
* Moral DM
* in target trials (original recipient is one person)
    * utilitarian choice: reallocate to maximize the number of recipients
* deontological choice: maintain original allocation to avoid doing harm
* Social Dilemma
* prosocial choice: cooperate to maximize joint payoff
* selfish choice: defect to maximize own payoff
* Animal Classification
* correct choice: selecting the correct category (e.g., mammal if the exemplar is "dog")
* incorrect choice: selecting the incorrect category (e.g., reptile if the exemplar is "dog")
4. **Decision Time**
We measure decision time by assessing, for each trial, the time passing between the onset of the decision screen presentation and the moment the participant logs in their decision.
5. :ski: **Mouse Movement**
6. **Introspection about the decision process**
On a 10-point Likert scale (0 = not at all, 100 = extremely), we measure:
* Difficulty: How difficult was this decision for you?
* Conflict: How much did you think back and forth between the options?
* Certainty: How certain are you that you made the right choice?
Additionally, we ask participants to indicate which choice they had made during the choice trial (response options: left, right, don't remember). :question:
### 18. Indices
> continue here!
<!--
If any measurements are going to be combined into an index (or even a mean), what measures will you use and how will they be combined? Include either a formula or a precise description of your method. If your are using a more complicated statistical method to combine measures (e.g. a factor analysis), you can note that here but describe the exact method in the analysis plan section.
Example: We will take the mean of the two questions above to create a single measure of ‘brownie enjoyment.’
More information: If you are using multiple pieces of data to construct a single variable, how will this occur? Both the data that are included and the formula or weights for each measure must be specified. Standard summary statistics, such as “means” do not require a formula, though more complicated indices require either the exact formula or, if it is an established index in the field, the index must be unambiguously defined. For example, “biodiversity index” is too broad, whereas “Shannon’s biodiversity index” is appropriate.
-->
**Moral inclinations**<br>
The degree of utilitarianism (vs. deontological inclinations) is the percentage of trolley-type dilemmas where the utilitarian option was chosen.
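As a minimal sketch, the index can be computed as below; the 0/1 coding of utilitarian vs. deontological choices is an assumption of this illustration:

```python
# Moral-inclinations index: the percentage of trolley-type dilemmas
# (out of 10) in which the utilitarian option was chosen.
# Choices are coded 1 = utilitarian, 0 = deontological (assumption).

def moral_inclination(choices):
    """Return the percentage of utilitarian choices."""
    choices = list(choices)
    return 100.0 * sum(choices) / len(choices)

print(moral_inclination([1, 1, 0, 1, 0, 0, 1, 1, 0, 1]))  # 60.0
```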
## Analysis Plan
<!--
You may describe one or more confirmatory analysis in this preregistration. Please remember that all analyses specified below must be reported in the final article, and any additional analyses must be noted as exploratory or hypothesis generating.
A confirmatory analysis plan must state up front which variables are predictors (independent) and which are the outcomes (dependent), otherwise it is an exploratory analysis. You are allowed to describe any exploratory work here, but a clear confirmatory analysis is required.
-->
### 19. Statistical models
<!--
What statistical model will you use to test each hypothesis? Please include the type of model (e.g. ANOVA, multiple regression, SEM, etc) and the specification of the model (this includes each variable that will be included as predictors, outcomes, or covariates). Please specify any interactions, subgroup analyses, pairwise or complex contrasts, or follow-up tests from omnibus tests. If you plan on using any positive controls, negative controls, or manipulation checks you may mention that here. Remember that any test not included here must be noted as an exploratory test in your final article.
Example: We will use a one-way between subjects ANOVA to analyze our results. The manipulated, categorical independent variable is 'sugar' whereas the dependent variable is our taste index.
More information: This is perhaps the most important and most complicated question within the preregistration. As with all of the other questions, the key is to provide a specific recipe for analyzing the collected data. Ask yourself: is enough detail provided to run the same analysis again with the information provided by the user? Be aware for instances where the statistical models appear specific, but actually leave openings for the precise test. See the following examples:
If someone specifies a 2x3 ANOVA with both factors within subjects, there is still flexibility with the various types of ANOVAs that could be run. Either a repeated measures ANOVA (RMANOVA) or a multivariate ANOVA (MANOVA) could be used for that design, which are two different tests.
If you are going to perform a sequential analysis and check after 50, 100, and 150 samples, you must also specify the p-values you’ll test against at those three points.
-->
1. **Choices**<br>
Mixed effects logistic repeated measures regression, predicting the odds of making a utilitarian choice from the decision makers’ moral inclinations, while controlling for the decision maker’s tendency to indecisiveness, equality-efficiency trade-off, as well as item-specific variation.
2. **Decision Time**<br>
*Dual Process:*
Mixed effects linear repeated measures regression, predicting decision times from the decision makers’ moral inclinations, while controlling for the decision maker’s tendency to indecisiveness, equality-efficiency trade-off, as well as item-specific variation and the trial number.<br>
*Alternative:*
Interrupted mixed effects linear repeated measures regression, predicting decision times from the decision makers’ moral inclinations, while controlling for the decision maker’s tendency to indecisiveness, equality-efficiency trade-off, as well as item-specific variation and the trial number. As the breakpoint to interrupt the regression, we use a moral inclinations score of 0.5, indicating a fully mixed moral type.
3. :ski: **Mouse Movement**<br>
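The mixed effects logistic model in item 1 could be written as follows; the predictor shorthands and the crossed random intercepts for participants and items are our assumptions, to be finalized with the full model specification:

```latex
\log \frac{P(\text{utilitarian}_{ij} = 1)}{1 - P(\text{utilitarian}_{ij} = 1)}
  = \beta_0 + \beta_1\,\text{moral}_i + \beta_2\,\text{indecisiveness}_i
  + \beta_3\,\text{eq-eff}_i + u_i + v_j
```

where $u_i$ and $v_j$ denote random intercepts for participant $i$ and item $j$.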
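The breakpoint in the interrupted regression (item 2, alternative prediction) amounts to adding a hinge predictor at the moral inclinations score of 0.5. A minimal sketch with hypothetical coefficients, illustrating the predicted inverted-U pattern (longest decision times at fully mixed preferences); the actual model is fit as a mixed effects regression with the controls listed above:

```python
# Piecewise (interrupted) predictor with a breakpoint at a moral
# inclinations score of 0.5: below the breakpoint the slope is b1,
# above it the slope changes to b1 + b2. Coefficients are hypothetical.

BREAKPOINT = 0.5

def hinge(m, breakpoint=BREAKPOINT):
    """Second-segment predictor: 0 below the breakpoint,
    distance above it otherwise."""
    return max(0.0, m - breakpoint)

def predicted_dt(m, b0=2.0, b1=1.0, b2=-2.0):
    """Piecewise-linear prediction of decision time (arbitrary units)."""
    return b0 + b1 * m + b2 * hinge(m)

# With b1 > 0 and b1 + b2 < 0, decision times peak at mixed preferences.
print(predicted_dt(0.25))  # 2.25
print(predicted_dt(0.50))  # 2.5
print(predicted_dt(0.75))  # 2.25
```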
### 20. Transformations
<!--
If you plan on transforming, centering, recoding the data, or will require a coding scheme for categorical variables, please describe that process.
Example: The “Effect of sugar on brownie tastiness” does not require any additional transformations. However, if it were using a regression analysis and each level of sweet had been categorically described (e.g. not sweet, somewhat sweet, sweet, and very sweet), ‘sweet’ could be dummy coded with ‘not sweet’ as the reference category.
More information: If any categorical predictors are included in a regression, indicate how those variables will be coded (e.g. dummy coding, summation coding, etc.) and what the reference category will be.
-->
For models including interaction terms, we center all predictors.
For decision times, we conduct Shapiro-Francia normality tests and use log-transformations if the assumption of normal distribution is violated.
### 21. Inference criteria
<!--
What criteria will you use to make inferences? Please describe the information you'll use (e.g. p-values, bayes factors, specific model fit indices), as well as cut-off criterion, where appropriate. Will you be using one or two tailed tests for each of your analyses? If you are comparing multiple conditions or testing multiple hypotheses, will you account for this?
Example: We will use the standard p<.05 criteria for determining if the ANOVA and the post hoc test suggest that the results are significantly different from those expected if the null hypothesis were correct. The post-hoc Tukey-Kramer test adjusts for multiple comparisons.
More information: P-values, confidence intervals, and effect sizes are standard means for making an inference, and any level is acceptable, though some criteria must be specified in this or previous fields. Bayesian analyses should specify a Bayes factor or a credible interval. If you are selecting models, then how will you determine the relative quality of each? In regards to multiple comparisons, this is a question with few “wrong” answers. In other words, transparency is more important than any specific method of controlling the false discovery rate or false error rate. One may state an intention to report all tests conducted or one may conduct a specific correction procedure; either strategy is acceptable.
-->
We use the standard p < 0.05 criterion for determining whether the test results suggest that the data are significantly different from those expected if the null hypotheses were correct.
### 22. Data exclusion
<!--
How will you determine what data or samples, if any, to exclude from your analyses? How will outliers be handled? Will you use any awareness check?
Example: No checks will be performed to determine eligibility for inclusion besides verification that each subject answered each of the three tastiness indices. Outliers will be included in the analysis.
More information: Any rule for excluding a particular set of data is acceptable. One may describe rules for excluding a participant or for identifying outlier data.
-->
Describe your data exclusion criteria here or state not applicable.
### 23. Missing data
<!--
How will you deal with incomplete or missing data?
Example: If a subject does not complete any of the three indices of tastiness, that subject will not be included in the analysis.
More information: Any relevant explanation is acceptable. As a final reminder, remember that the final analysis must follow the specified plan, and deviations must be either strongly justified or included as a separate, exploratory analysis.
-->
We exclude data from participants with missing mouse recordings. We further exclude data from participants with incomplete data in the moral inclinations measure.
### 24. Exploratory analysis
Decision makers who are more utilitarian are more likely to make utilitarian choices the cheaper the moral good is for a single person in the group (compared to the individual person).
## Other
### 25. Other
<!--
If there is any additional information that you feel needs to be included in your preregistration, please enter it here. Literature cited, disclosures of any related work such as replications or work that uses the same data, or other context that will be helpful for future readers would be appropriate here.
-->
Not applicable.
## References
Enter any references used throughout the text here.