# Pupil-size-and-microsaccade-analysis
Matlab scripts for preprocessing pupil size timeseries, detecting microsaccades, extracting features, simple plotting and coordinating with R to perform LMM.
Table of contents
[toc]
# I. Import data and data structure
## A. Folder structure
The folder should be conduct with this structure.
Project
└─ rawdata
└─ derivatives
└─ raw
├─ sub-01
├─sub-01_task-Stroop_info
├─sub-01_task-Stroop_beh
└─sub-01_task-Stroop_pup
└─ prec
├─ olFalg.mat
├─ VarName.mat
├─ sub-01
├─sub-01_task-Stroop_info
├─sub-01_task-Stroop_beh
└─sub-01_task-Stroop_pup
# II. Create project and preprocess data
## A. Create project
```m
createProject();
```
This function simply create raw folders under project path and convert files in rawdata into a standard format used in this toolbox.
## B. Preprocess data
```matlab
doPreprocess(projectDir)
```
Input project path as variable *projectDir*.
This function would performe multiple steps to preprocess pupillometry data, detect microsaccades, and save processed data in prec folder. Steps included are followed:
### 1. Upsample Data
```matlab
nData = upSample(Data);
```
This function upsample raw pupillometry data to 1000 Hz.
### 2. Convert pixel data into diameter
```matlab
nDataa = cvtDia(Data);
```
This function linear calibrates the square root of pixel data into diameter data by linear function ```p = polyfit([sqrt(6607) sqrt(3796) sqrt(1624) sqrt(414)],[8 6 4 2],1);```
### 3. Remove blink and invalid interval
```matlab
[nData, rmFlag] = rmBlink(Data,'interp','left');
```
This step adopts a function ```rawDataFilter()``` from a pupil size preprocessing toolbox (Kret & Sjak-Shie, 2018) to detect blink and invalid interval left eye data, and interpolates these invalid interval. Trials that valid point rate below 30% would be target as outlier trial.
See:
>1. Kret, M. E., & Sjak-Shie, E. E. (2019). Preprocessing pupil size data: Guidelines and code. Behavior research methods, 51, 1336-1342.
>2. [Github repository link](https://github.com/ElioS-S/pupil-size)
### 4. Epoch data
```matlab
evtList = targeton;
tw = [-1000 4000];
[l_epData, r_epData, rmFlag] = doEpoch(nData.rbkData,evtList,tw);
```
This function epoch pupil data according to a eventlist and a epoch window. Here, this example takes target onset as event, and sets the epoch window from 1000 before to 4000 ms after target onset.
### 5. Baseline correction
```matlab
tw = [-200 0];
tw = tw - (-1000); % Make baseline window matching data index
nData = Data - mean(Data(tw(1):tw(2),:));
```
This step subtracts averaging data from 200 ms before to target onset for each trial.
### 6. Extract microsaccade features
```m
% To detect sac, remove blink and invalid interval without interpolating
[nData.nanData, ~] = rmBlink(nData.cvtData,'nan','left');
[nData.nanData, ~] = rmBlink(nData.nanData,'nan','right');
% Detect binocular candidates for saccades (Engbert & Mergenthaler, 2006).
[sacData, rmFlag.sac] = detectSac(nData.nanData,evtList);
% Remove saccades those exceed specific criteria
tw = [-1000 3999]; % Use whole window, we can epoch small window later
thr = [0.1 2]; % Angle amplitude criteria
vthr = [0 200]; % Velocity criteria
dur = false; % Duration of microsaccades
intv = false; % The interval between two microsaccades
Sac = rmSac(sacData, tw, thr, vthr, dur, intv);
% Extract feature of microsaccads and epoch
tw = [-1000 4000];
[RateMatx, AmpMatx, VelMatx, AngMatx] = getSacMTX(Sac,tw);
```
First, This step generates data serie that is blink removed and invalid interval keeped. Then, it detects saccade by applying a toolbox from Engbert and Mergenthaler in 2006 with setting relative velocity threshold as 5, and output rmFlag indicating outlier trial that falied to detect saccad. Also, it only keeps saccades that are coincident in overlaping time at both eyes. Following, it removes saccades that dismatch features of microsaccade. Finnaly, it generates four matrixs, including:
1. RateMatx: 1 represents microsaccads occuring at that time point.
2. AmpMatx: Store amplitude value at that time point.
3. VelMatx: Store velocity value at that time point.
4. AngMatx: Store theta value from polar coordinates system at that time point.
See:
>1. Engbert, R., & Mergenthaler, K. (2006). Microsaccades are triggered by low retinal image slip. Proceedings of the National Academy of Sciences, 103(18), 7192-7197.
## C. Exclude outlier trials and participants
```matlab
excludeOutliers(projectDir)
```
This function aims to indicates bad trials produced in previous steps and are excluded in following analysis, including:
1. Incorrect response
2. Reaction times exceed 3 standard deviations from the mean
3. No enough time point for epoching
4. Valid point rate below 30% when remove blink
5. Fail to detect saccade
Also, this step exclude participants that:
1. has low accuracy in any congruency condition (< 48%)
2. detects none microsaccades in each trial within epoch window.
# III. Main functions
This section would introduce some main functions that this toolbox could implement, including getting interested values, t-test, two-way mixed anova, dymamic, main sequence, polar and box plot. To perform these steps with following instruction, see `Exp_session_02.m`
## A. Extracte interested values
To extract interested values, we use this function:
```m
Data = getValues(projectDir, ...
dv, ... % str; dependent variable
tw, ... % [num,num], two element array; if dv is time series variable, input array
% to average target window; otherwise, input empty array [].
iv, ... % cell; independent variable; {'none'} for no IV; {'cond1'} for one IV;
% {'cond1', 'cond2'} for 2 IVs.
ivVals);% cell; IV values; indicate certain subcondition for extracting.
```
### 1. Non-time series data
Let's say we were interesting in reaction time for two group, we could apply:
```m
Data = getValues(projectDir, ...
'RT', ...
[], ...
{'group'}, ...
{});
```
Here, the `dv` was setted as `'RT'` which stand for reaction times; RTs is not a time series variable, hence, we leave `tw` empty, `[]`. We also interest in RTs in different group, so we assign `iv` as `{'group'}`. And for now we leave `ivVal` empty, cause we don't want to extract subcondition values. Below is head result of `Data`:

First column stand for the index of subject; second for number of trials used to average; third is the iv condition and fourth is the dv value extracted.
:::info
Note: To check what variable could be used to assign as DV and IV, you could check *VarName.mat* in *prec* folder and it depents on how you setting behavioural table.
:::
However, as you can see, the 16th subject has only 19 trials qualified to generate mean RT, and it may lead to higher measurement error. To address with it, we could consider remove subjects with few trials.
```m
Data = rmInsff(Data, n)
```
`n` stand for the criteria number that used to exclud subject who has lower trials than it.
### 2. Time series data
If we want to extract time series data, let's say it's left pupil diameter size, we could apply:
```
Data = getValues(projectDir, ...
'Pupil_l', ...
[], ...
{'group'}, ...
{});
```
We assigned `DV` as `'Pupil_l'`, standing for left pupil size, `tw` as `[]`, `IV` as `{'group'}`. Here is head result:

The dynamic data is stored in `Pupil_l` column as array format.
If we want to extract average value based on certain window, from 1000 to 2000 ms, from dynamic data, we could use follow input:
```
Data = getValues(projectDir, ...
'Pupil_l', ...
[1000 2000], ...
{'group'}, ...
{});
```
And the output is:

### 3. Extract value according to multiple conditions
If we want to extract mean left pupil size value based on both group and condition, we could apply:
```
Data = getValues(projectDir, ...
'Pupil_l', ...
[1000 2000], ...
{'group','cond'}, ...
{});
```

By including additional IV, `'cond'`, we have values that extracted from each sub condition.
If we only interest in certain sub condition, we could achive it by assign values in `ivVal`. For example, we want extract value that under condition 2 under both group:
```
Data = getValues(projectDir, ...
'Pupil_l', ...
[1000 2000], ...
{'group','cond'}, ...
{1, 2; ...
2, 2});
```
Here, we assign `ivVal` as `{1, 2; 2, 2}`, it means we want to extract values from group 1 under condition 2 and from group 2 under condition 2.

As you can see, we extracted values based on the subcondition we indicated. However, there are `nan` values, that is because dismatch between subject and group. We could simply apply `rmInsff()` to remove these nan rows:
```
Data = rmInsff(Data, 0);
```

## B. Plot data
### 1. Plot dynamic data
To plot dynamic data, we use this function:
```m
plotDynamic(Data,varargin)
% Input:
%
% Data: table; dynamic data that used to plot
%
% Optional input:
%
% tw: [st,ed,epochST]; default [1, 5000, 1], time window to plot. st: start point;
% ed: end point; epochST: epoch start point.
% col: cell; default jet(numCond).
% xlab: str; default '', label of x axis.
% ylab: str; default '', label of y axis.
% lnstyle: cell; default '-'.
```
We use left pupil size as example. First, we extract data following steps mentioned previously.
```m
Data = getValues(projectDir, ...
'Pupil_l', ...
[], ...
{'group'}, ...
{});
```
Then, we plot it within window of -200 to 4000 ms
```m
plotDynamic(Data, ...
'tw', [-200 3999 -1000], ...
'xlab', 'Time from target onset (ms)', ...
'ylab', 'Pupil size (mm)', ...
'col', [[0 .5 .8];[.8 .1 0]])
```

Here is examplt for two condition, group and condition:
```m
Data = getValues(projectDir, ...
'Pupil_l', ...
[], ...
{'group','cond'}, ...
{});
plotDynamic(Data, ...
'tw', [-200 3999 -1000], ...
'xlab', 'Time from target onset (ms)', ...
'ylab', 'Pupil size (mm)', ...
'col', [[0 .5 .8];[.8 .1 0];[0 .5 .8];[.8 .1 0]], ...
'lnstyle', {'-','-','--','--'})
```

### 2. Box plot
In order to generate these plot, following function is used:
```m
plotBox(Data,varargin)
% Input:
%
% Data: table; dynamic data that used to plot
%
% Optional input:
%
% dot: boolean; default false, plot subject values or not.
% col: cell; default jet(numCond).
% ylab: str; default '', label of y axis.
```
We take left pupil size mean value in window 1000 to 2000 ms as example.
```m
Data = getValues(projectDir, ...
'Pupil_l', ...
[1000 2000], ...
{'group','cond'}, ...
{});
plotBox(Data, ...
'dot', true, ...
'col', [[0 .5 .8];[.8 .1 0];[0 .5 .8];[.8 .1 0]], ...
'ylab', 'Pupil size (mm)');
```

### 3. Plot main sequence
To do it so, we use:
```m
op = plotSQ(projectDir, ...
tw, % [num,num]; window used to extract microsaccades
iv, % cell; independent variables
ivVal, % cell; IV values; indicate certain subcondition for extracting
col % cell; color variable)
```
For example, we want to plot the main sequence within 0 to 2000 ms.
```
op = plotSQ(projectDir, ...
[0 2000], ...
{'group','cond'}, ...
{}, ...
[[0 .5 .8];[.8 .1 0];[0 .5 .8];[.8 .1 0]]);
```

### 4. Plot polar rate distribution
To do it so, we use:
```m
op = plotPolar(projectDir,
tw, % [num,num]; window used to extract microsaccades
iv, % % cell; independent variables
ivVal,% cell; IV values; indicate certain subcondition for extracting
nbins,% num; how many bins used to plot
col, % cell; color variable
lty, % cell; linestyle variable
rmFlat); % boolean; remove subject with flat eye
```
For example, we want to plot polar distrubution within 0 to 2000 ms.
```
op = plotPolar(projectDir, ...
[0 2000], ...
{'group','cond'}, ...
{}, ...
20, ...
[[0 .5 .8];[.8 .1 0];[0 .5 .8];[.8 .1 0]], ...
{'-','-','--','--'}, ...
true);
```

## C. Analysis
### 1. t-test
This function is apply to construct t-test:
```m
[df,tval,pval,cd,f] = doTTest(op,dv,iv,ivVal,paired,tail)
% Input:
%
% op: table; data used to perform t-test
% dv: str; dependent variables
% iv: str; independent variables
% ivVal: cell; values used to compare
% paired: boolean; paired sample or not
% tail: 'both','left','right'; used to indicate test tail
%
% Output
%
% df: degread of freedom of t-test
% tval: t value
% pval: p value
% cd: Cohen's d
% f :result in sentence
```
For example, we would like to compare RT difference between group. First, we extract values and take it as input to `doTTest()`:
```m
Data = getValues(projectDir, ...
'RT', ...
[], ...
{'group'}, ...
{});
doTTest(Data, ...
'RT', ...
'group', ...
{1 2}, ...
false, ...
'both');
```
It will generate the result:
```m
t(45) = -0.74, p = 0.462, d = -0.22
```
### 2. Two-way mixed ANOVA
```m
result = mixed_anova(op,dv,bt,wt,plotflag)
% Input:
%
% op: table; data used to perform t-test
% dv: str; dependent variables
% bt: str; between factor
% wt: str; within factor
% plotflag: boolean; plot interaction or not
%
% Output
%
% result: degread of freedom of t-test
```
Here is an example attempt to perform 2way mixed anova on RT to investigate group and condition effect:
```m
Data = getValues(projectDir, ...
'RT', ...
[], ...
{'group','cond'}, ...
{});
mixed_anova(Data, ...
'RT', ...
'group', ...
'cond', ...
false);
```
It's the result output generated:
```
Btw: F(1,45) = 0.56, p = 0.459, pes = 0.12
Wth: F(1,45) = 36.22, p = 0.000, pes = 0.45
Int: F(1,45) = 3.22, p = 0.079, pes = 0.07
Btw1: p(bonferroni) = 0.001
Btw2: p(bonferroni) = 0.000
```
First line indicates main between effect, second for main within effect, third for interaction effet, fourth for simple within effect under between factor 1, fifth for simple within effect under between factor 2.
:::warning
`mixed_anova()` is generate by chatGPT, it should be examied in furtur by comparing to result from R.
:::
### 3. Principle component analysis (PCA)
This analysis adopt ep_toolkit by Joseph Dien. For more detail, see:
>1. [ep_toolkit](https://sourceforge.net/projects/erppcatoolkit/)
>2. Chang Y, Chen H, Barquero C, et al. Linking tonic and
phasic pupil responses to P300 amplitude in an emotional
face-word Stroop task. Psychophysiology. 2023;61:e14479.
To perform PCA, use:
```m
doPCA(Data,% table; data used to perform pca
dv, % str; dependent variable
tw) % [st,ed,epochST]; array used to crop data
```
For example, we want to do PCA on left pupil size within window of 1 to 4000 ms and attempt to compare group and condition.
```m
Data = getValues(projectDir, ...
'Pupil_l', ...
[], ...
{'group','cond'}, ...
{});
doPCA(Data,'Pupil_l',[1 4000 -1000])
```
The result output is:
```
Componemnt 1 score:
Btw: F(1,45) = 0.00, p = 0.960, pes = 0.00
Wth: F(1,45) = 0.09, p = 0.760, pes = 0.00
Int: F(1,45) = 0.15, p = 0.703, pes = 0.00
Btw1: p(bonferroni) = 0.920
Btw2: p(bonferroni) = 0.707
Componemnt 2 score:
Btw: F(1,45) = 0.02, p = 0.876, pes = 0.01
Wth: F(1,45) = 2.95, p = 0.093, pes = 0.06
Int: F(1,45) = 0.35, p = 0.557, pes = 0.01
Btw1: p(bonferroni) = 0.172
Btw2: p(bonferroni) = 0.213
```
If the we extract data with two factors, it will generate mixed anova result for each component score.
:::info
For now, this function is setted to generate only two components. And if two factors are inputted, first one shall be an between factor, second one shall be a within factor.
:::
It also generate several figures in order to check how this analysis performed:






# IV. Procedure