---
title: "Registered Report - Scales Experiment"
author: "Moss, F. and Eerola, T."
date: "18/7/2022"
geometry: a4paper
output: pdf_document
bibliography: references.bib
csl: apa.csl
header-includes:
  - \usepackage{fancyhdr}
  - \pagestyle{fancy}
  - \fancyhead[HR]{Preregistration - Registered Report}
---

# Introduction

- Scales are one of the _fundamental_ building blocks of music.
- Most musical systems in the world utilise five to seven notes within the octave, although variants (sometimes called modes) exist [@savage2015statistical].
- Why have only a handful of scales been favoured, when the division of the octave into 12-TET could lend itself to a large number of different scales?
- Various theoretical properties of scales have been identified (cardinality, symmetry, well-formedness, etc.) [@harasim2020axiomatic].
- Scale structures may be linked to the physics of overtones and consonance, or to vocal production [@gill2009biological].
- In this work, we make several strong **assumptions**:
    + octave equivalence
    + equal temperament, or at least division of the octave into 12 x 100 cents
    + cultural competence of the listeners (non-expert Western listeners as a baseline)

## Research questions

- How can we probe the structural differences of the scales empirically in an economic and viable fashion?
- How are structural features of musical scales related to perception? In other words, are the differences between the scales readily picked up by listeners?
- Are listeners able to discriminate the differences between the scales, and if so, do the underlying theoretical properties of the scales predict the perceived structures (that is, the distances between scales)?

## Hypotheses

Our hypotheses of the expected results are as follows:

1. **Common scales vs rare scales**: We expect listener judgements to relate to familiarity, which will be measured with relatively uncontroversial measures (overlapping tones, voice leading, or the Johnson-Laird "Tonal Dissonance" model).
2.
**Cardinality**: We expect cardinality to have a large impact on differences between scales.
3. **Symmetry**: We foresee symmetry (rotational and mirror) having an impact on the perception of scales.
4. **Consonance of the scales**: We expect listener judgements to relate to the consonance of the scales when the scale tones are treated as simultaneous pitches (measurable with roughness and harmonicity models).
5. **Common attributes of existing music using these scales**: Pitch proximity (i.e., small intervals) is a significant aspect of melodies found across the world [@huron2001; @savage2015statistical].

We will compare the impact of these factors on participants' perception of scale differences.

# Method

We want to avoid labelling and rating known properties of the scales, as these are difficult to articulate verbally and would be dependent on musical training. Instead, we choose to collect _implicit similarity data_ by asking the participants to nominate the instance of a scale that is different from the two other instances. This method, called _spot the odd one out_, has been successfully used in colour research [@griffin2004optimality] and in studies of reading comprehension [@seigneuric2000working], as well as in music-related tasks such as judging the similarity of commercial music tracks [@wolff2014spot] and testing memory for melodies [@mullensiefen2015investigating].

## Potential scales

See the [scale_properties.ipynb](scale_properties.ipynb) notebook for details.

## Stimulus details

The tone clouds are detailed in [tone_clouds.ipynb](tone_clouds.ipynb). A pilot experiment will be used to optimise the duration of the tones and other properties of the tone clouds.

- Shepard tones

## Procedure

We will collect the data using *PsychTestR* / *PsyNet*, a web-based service designed for large-scale iterative experiments [@harrison2020c].
The participants will be required to use headphones to take the experiment; we will implement the headphone check proposed by @woods2017headphone, which utilises phase information to create differences in dynamics that can be easily detected with headphones but not with external speakers such as those found in laptops. We follow the guidelines set out by Woods et al. for the headphone check (i.e., the participants will hear 6 separate items and will need to get at least 5 of them correct in order to proceed).

The experiment proper will utilise a task called "spot the odd one out". Each item will consist of three tone clouds separated by a silence of XXXX ms. After hearing the three tone clouds, participants will have to choose the one that does not fit the others. In addition, confidence ratings will be collected (on a scale of 1-4).

Participants will first complete a XXX-item familiarisation block, which will be followed by the experimental block of 24 items. In addition, there will be 4-6 extra items:

- 3 attention control triplets that contain the maximally different triplets (separate items)
- 3 transposition control trials that contain variants of the previous trials transposed -1 and +6 semitones
- timbre controls for a subsample, to test that the similarity ratings do not change too much (i.e., that they are timbre-invariant)

## Sample size

Previous studies have ....?

In the current study, we want to keep the experiment length compact and compensate for the number of observations by increasing the sample size compared to the related studies. For our design (xxx triplets.....), we will collect data from XXX participants, as this yields a number of observations per cell comparable to other studies (???? 10 readings). In addition, we have carried out a power analysis and sample size estimation that support the proposed _N_ (reported in a later section).
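To illustrate the kind of simulation-based power analysis we have in mind, here is a minimal sketch. It is a simplified placeholder, not the final calculation: the effect size `beta`, the chance-level intercept of 1/3 (one of three tone clouds), the trial counts, and the plain logistic model without random effects are all assumptions for illustration only.

```
# Hedged power sketch: simulate binary odd-one-out decisions with a
# hypothesised log-odds effect of a (standardised) scale-structure
# predictor, and count how often a logistic regression detects it.
set.seed(1)

power_sim <- function(n_participants, n_items, beta = 0.4, n_sims = 200) {
  hits <- 0
  for (s in seq_len(n_sims)) {
    x <- rnorm(n_participants * n_items)      # placeholder predictor
    p <- plogis(qlogis(1/3) + beta * x)       # 1/3 = chance baseline
    y <- rbinom(length(x), 1, p)              # simulated decisions
    fit <- glm(y ~ x, family = binomial)
    pval <- summary(fit)$coefficients["x", "Pr(>|z|)"]
    if (pval < 0.05) hits <- hits + 1
  }
  hits / n_sims                               # estimated power
}

power_sim(n_participants = 100, n_items = 24)
```

The final, reported power analysis will instead be based on the mixed-model structure described under Proposed Analyses.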
We will recruit participants from [Prolific.ac.uk](http://Prolific.ac.uk) with two recruitment criteria: 1) English as a native language, to ensure a clear understanding of the instructions.

## Data analysis and quality control

We will run quality control analyses and discard participants who fail 2 or more control items. We will initially recruit XX participants to obtain the planned sample size (XXX, see the following section). After this recruitment, we will not analyse the data except to check its quality (missed answers, speed of responses) in order to determine how many participants have not fulfilled the criteria. If we do not reach the target sample size after eliminating those who do not fulfil the quality criteria, we will recruit the missing number of participants, and repeat the quality control and re-recruitment process until we have the target number of participants. Based on our past experience, this process will not take long, as typically only a small number of participants (\<5%) will be discarded.

The initial data screening involves discarding participants who

(a) fail the headphone check,
(b) fail to respond to all items, or
(c) respond incorrectly to 2 or more control items.

We will report the outcome of these data exclusion operations (how many participants are eliminated and the reason for their elimination). Some of the reaction time data will be missing because participants respond too slowly or too quickly, or respond incorrectly; all such data will be removed. If background data (age, gender, musical sophistication) is missing, we will retain the participant in the data and allow incomplete background information, as these variables are not essential for the main research questions.

The experiment has been approved by the ethics committee of the Department of Music at Durham University and will be conducted in accordance with its guidelines and regulations (`MUS-2022-CODE HERE`).
In terms of demographic background questions, we will also collect age, gender, and musical sophistication from all participants, the latter using the Ollen Musical Sophistication Index's 1-item measure [for a successful implementation, see @Zhang2019].

# Proposed Analyses

Our study design is a within-participant experiment in which we manipulate various **scale structure** properties (cardinality, symmetry, familiarity, and roughness). We have a within-subjects design with XXX factors:

- *cardinality* with 14 levels (0 to 13)
- *symmetry* with XX levels (XXXXX)
- *familiarity* with XX levels (XXXXX) / continuous?
- *roughness* with continuous values

As we have a sparse design in which participants only rate a subset of items (balanced how?), our analysis will be divided into two parts: raw decisions and aggregated decisions.

## Analysis of raw decisions

In our first model, we evaluate whether the decisions are influenced by the properties of the scales (and transposition).

```
library(lme4)

model1 <- glmer(decision ~ cardinality + symmetryRot + symmetryMir +
                  roughness + register +
                  (1 | transposition) + (1 | order) + (1 | ID),
                family = binomial(logit))
```

We then explore the degree to which the decisions are impacted by the theoretical distance:

```
model2 <- glmer(decision ~ VSLdistance + cardinality +
                  symmetryRot + symmetryMir + roughness + register +
                  (1 | transposition) + (1 | order) + (1 | ID),
                family = binomial(logit))
anova(model1, model2)
```

## Analysis of aggregated decisions (distances)

At this point, we will aggregate the data across participants by summing the triplet results together to obtain a measure of distance/similarity. We then apply a linear model ...

```
model3 <- lm(distance ~ cardinality + symmetryRot + symmetryMir +
               roughness + register + transposition)
```

We will then quantify the impact of the predictors using standard regression diagnostics (semi-partial correlations?).

Scripts and data will be made available on GitHub at [https://github.com/fabianmoss/scale_properties/](https://github.com/fabianmoss/scale_properties/).

# References